Chinese Academy of Sciences Institutional Repositories Grid
Multimodal Spatiotemporal Representation for Automatic Depression Level Detection

Document type: Journal article

Authors: Mingyue Niu (2, 3); Jianhua Tao (1, 2, 3); Bin Liu (2, 3); Jian Huang (2, 3); Zheng Lian (2, 3)
Journal: IEEE Transactions on Affective Computing
Publication date: 2020
Issue: 0; Pages: 0
Keywords: Multimodal depression detection; Spatio-Temporal Attention; Audio/Video Segment-Level Feature; Eigen Evolution Pooling; Audio/Video Level Feature; Multimodal Attention Feature Fusion
Abstract (English)

Physiological studies have shown that speech and facial activity differ between depressed and healthy individuals. Based on this observation, we propose a novel Spatio-Temporal Attention (STA) network and a Multimodal Attention Feature Fusion (MAFF) strategy to obtain a multimodal representation of depression cues for predicting an individual's depression level. Specifically, we first divide the speech amplitude spectrum/video into fixed-length segments and feed these segments into the STA network, which not only integrates spatial and temporal information through an attention mechanism but also emphasizes the audio/video frames relevant to depression detection. The audio/video segment-level feature is taken from the output of the last fully connected layer of the STA network. Second, we employ the eigen evolution pooling method to summarize how each dimension of the audio/video segment-level features changes across segments, aggregating them into an audio/video-level feature. Third, MAFF generates a multimodal representation that carries complementary information from both modalities, and this representation is fed into a support vector regression predictor to estimate depression severity. Experimental results on the AVEC2013 and AVEC2014 depression databases demonstrate the effectiveness of our method.
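The aggregation pipeline described in the abstract can be sketched in NumPy as below. This is a minimal, hedged illustration, not the authors' code. The STA network is replaced by a placeholder (the mean over the frames of each segment). `eigen_evolution_pool` assumes the temporal basis is computed per recording from the N×N Gram matrix of the segment-feature time series; the paper's exact variant may differ. `attention_fuse` is a hypothetical softmax-weighted concatenation standing in for MAFF, whose formulation is not given here. All function names, segment lengths, hop sizes and `k` are illustrative assumptions.

```python
import numpy as np


def split_into_segments(frames: np.ndarray, seg_len: int, hop: int) -> np.ndarray:
    """Cut a (T, F) frame sequence (e.g. amplitude-spectrum frames) into
    fixed-length segments of shape (N, seg_len, F). Trailing frames that do
    not fill a whole segment are dropped (an assumption of this sketch)."""
    if frames.shape[0] < seg_len:
        raise ValueError("sequence shorter than one segment")
    starts = range(0, frames.shape[0] - seg_len + 1, hop)
    return np.stack([frames[s:s + seg_len] for s in starts])


def eigen_evolution_pool(seg_feats: np.ndarray, k: int) -> np.ndarray:
    """seg_feats: (N, D) segment-level features in temporal order.
    Each of the D columns is treated as a length-N time series; the top-k
    eigenvectors of their N x N Gram matrix form a temporal basis, and every
    series is projected onto it, giving a flattened (D * k,) summary."""
    X = seg_feats.T                                # (D, N): one row per dimension
    if k > X.shape[1]:
        raise ValueError("k cannot exceed the number of segments")
    gram = X.T @ X                                 # (N, N) = sum_d x_d x_d^T
    _, eigvecs = np.linalg.eigh(gram)              # eigenvalues in ascending order
    basis = eigvecs[:, ::-1][:, :k]                # (N, k) top-k eigenvectors
    # Eigenvectors are defined only up to sign; make each column's
    # largest-magnitude entry positive so the output is reproducible.
    signs = np.sign(basis[np.abs(basis).argmax(axis=0), np.arange(k)])
    basis = basis * signs
    return (X @ basis).reshape(-1)                 # (D, k) -> (D * k,)


def attention_fuse(audio: np.ndarray, video: np.ndarray,
                   w_audio: np.ndarray, w_video: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for MAFF: score each modality with its own
    weight vector, softmax the two scores, and concatenate the re-weighted
    modality vectors (dimensions may differ between modalities)."""
    scores = np.array([audio @ w_audio, video @ w_video])
    weights = np.exp(scores - scores.max())        # numerically stable softmax
    weights /= weights.sum()
    return np.concatenate([weights[0] * audio, weights[1] * video])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    spec = rng.standard_normal((300, 64))          # toy: 300 frames x 64 bins
    video = rng.standard_normal((200, 128))        # toy: 200 frames x 128 dims
    # Placeholder for the STA network: average the frames of each segment.
    feats_a = split_into_segments(spec, seg_len=64, hop=32).mean(axis=1)   # (8, 64)
    feats_v = split_into_segments(video, seg_len=32, hop=16).mean(axis=1)  # (11, 128)
    pooled_a = eigen_evolution_pool(feats_a, k=3)  # (192,)
    pooled_v = eigen_evolution_pool(feats_v, k=3)  # (384,)
    fused = attention_fuse(pooled_a, pooled_v,
                           rng.standard_normal(pooled_a.size) * 0.01,
                           rng.standard_normal(pooled_v.size) * 0.01)
    print(fused.shape)                             # (576,)
```

In the described pipeline, the fused vector would then be passed to a support vector regression model (for example scikit-learn's `sklearn.svm.SVR`) trained on the depression-severity labels. Computing the basis per recording keeps the sketch independent of the number of segments, at the cost of the sign-fixing step shown above.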

Language: English
Source URL: http://ir.ia.ac.cn/handle/173211/44397
Collection: National Laboratory of Pattern Recognition, Intelligent Interaction
Corresponding author: Jianhua Tao
Affiliations:
1. CAS Center for Excellence in Brain Science and Intelligence Technology, Beijing, China
2. School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China
3. National Laboratory of Pattern Recognition, CASIA, Beijing, China
Recommended citation
GB/T 7714: Mingyue Niu, Jianhua Tao, Bin Liu, et al. Multimodal Spatiotemporal Representation for Automatic Depression Level Detection[J]. IEEE Transactions on Affective Computing, 2020(0): 0.
APA: Mingyue Niu, Jianhua Tao, Bin Liu, Jian Huang, & Zheng Lian. (2020). Multimodal Spatiotemporal Representation for Automatic Depression Level Detection. IEEE Transactions on Affective Computing, (0), 0.
MLA: Mingyue Niu, et al. "Multimodal Spatiotemporal Representation for Automatic Depression Level Detection." IEEE Transactions on Affective Computing 0 (2020): 0.

Ingest method: OAI harvesting

Source: Institute of Automation

Unless otherwise stated, all content in this system is protected by copyright, and all rights are reserved.