Chinese Academy of Sciences Institutional Repositories Grid
Enhancing Dimensional Emotion Recognition from Speech through Modulation-Filtered Cochleagram and Parallel Attention Recurrent Network

Document Type: Journal Article

Authors: Peng, Zhichao (1); Zeng, Hua (1); Li, Yongwei (2); Du, Yegang (3); Dang, Jianwu (4,5)
Journal: ELECTRONICS
Publication Date: 2023-11-01
Volume: 12; Issue: 22; Pages: 15
Keywords: modulation-filtered cochleagram; parallel attention recurrent neural network; dimensional emotion recognition; auditory signal processing; noise-robust
DOI: 10.3390/electronics12224620
Corresponding Authors: Peng, Zhichao (zcpeng@tju.edu.cn); Dang, Jianwu (jdang@jaist.ac.jp)
Abstract: Dimensional emotion describes rich, fine-grained emotional states better than categorical emotion. In human-robot interaction, continuously recognizing dimensional emotions from speech lets robots track the temporal dynamics of a speaker's emotional state and adjust their interaction strategies in real time. In this study, we present an approach that enhances dimensional emotion recognition through a modulation-filtered cochleagram and a parallel attention recurrent neural network (PA-net). First, the multi-resolution modulation-filtered cochleagram is derived from speech signals through auditory signal processing. Subsequently, the PA-net establishes multi-temporal dependencies from features at diverse scales, enabling it to track the dynamic variations of dimensional emotion within auditory modulation sequences. Experiments on the RECOLA dataset demonstrate that, at the feature level, the modulation-filtered cochleagram surpasses the other assessed features in predicting valence and arousal, with a particularly pronounced advantage under high signal-to-noise ratios. At the model level, the PA-net attains the highest predictive performance for both valence and arousal, clearly outperforming alternative regression models. Furthermore, experiments on the SEWA dataset demonstrate the substantial improvements the proposed method brings to valence and arousal prediction. These results collectively highlight the effectiveness of our approach in advancing dimensional speech emotion recognition.
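This record carries metadata only, but the abstract outlines a two-stage pipeline: multi-resolution modulation-filtered cochleagram features, then a parallel attention recurrent network for regression. As a rough illustration of the model stage only, below is a minimal PyTorch sketch of a parallel attention recurrent network: several GRU branches read the feature sequence at different temporal strides, each branch pools its hidden states with additive attention over time, and the pooled vectors are concatenated and regressed to valence and arousal. The branch count, layer sizes, strides, feature dimensions, and pooling scheme are illustrative assumptions, not the authors' published configuration.

# Hypothetical sketch of a parallel attention recurrent network (PA-net)
# for dimensional emotion regression. All hyperparameters below are
# assumptions for illustration; the paper's exact configuration is not
# reproduced here.
import torch
import torch.nn as nn


class AttentiveGRUBranch(nn.Module):
    """One temporal branch: a GRU over a (possibly downsampled) feature
    sequence, followed by additive attention pooling over time."""

    def __init__(self, input_dim: int, hidden_dim: int, stride: int):
        super().__init__()
        self.stride = stride  # temporal downsampling factor for this scale
        self.gru = nn.GRU(input_dim, hidden_dim, batch_first=True)
        self.attn = nn.Linear(hidden_dim, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, input_dim); subsample time axis for coarser scales
        x = x[:, :: self.stride, :]
        h, _ = self.gru(x)                       # (batch, time', hidden_dim)
        w = torch.softmax(self.attn(h), dim=1)   # attention weights over time
        return (w * h).sum(dim=1)                # (batch, hidden_dim)


class PANet(nn.Module):
    """Parallel branches at multiple temporal scales, concatenated and
    regressed to continuous valence/arousal values."""

    def __init__(self, input_dim: int = 140, hidden_dim: int = 64,
                 strides=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList(
            [AttentiveGRUBranch(input_dim, hidden_dim, s) for s in strides]
        )
        self.head = nn.Linear(hidden_dim * len(strides), 2)  # valence, arousal

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        pooled = torch.cat([b(x) for b in self.branches], dim=-1)
        return torch.tanh(self.head(pooled))     # outputs bounded in [-1, 1]


# Usage: a batch of 4 modulation-filtered cochleagram sequences,
# 300 frames x 140 channels (both dimensions are placeholders).
model = PANet()
features = torch.randn(4, 300, 140)
print(model(features).shape)  # torch.Size([4, 2])

Strided subsampling is just one cheap way to expose multiple temporal scales to the parallel branches; per the abstract, the paper's multi-resolution structure instead comes from the modulation filterbank applied during auditory signal processing.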
WOS Keywords: SPEAKER INDIVIDUALITY; TEMPORAL ENVELOPE; VOCAL-EMOTION; PERCEPTION; FEATURES
Funding Project: Hunan Provincial Natural Science Foundation of China
WOS Research Areas: Computer Science; Engineering; Physics
Language: English
Publisher: MDPI
WOS Accession Number: WOS:001118263900001
Funding Organization: Hunan Provincial Natural Science Foundation of China
Source URL: http://ir.ia.ac.cn/handle/173211/55066
Collection: National Laboratory of Pattern Recognition - Intelligent Interaction
Author Affiliations:
1. Hunan Univ Humanities Sci & Technol, Sch Informat, Loudi 417000, Peoples R China
2. Chinese Acad Sci, Inst Automat, Natl Lab Pattern Recognit, Beijing 100045, Peoples R China
3. Waseda Univ, Future Robot Org, Tokyo 1698050, Japan
4. Tianjin Univ, Coll Intelligence & Comp, Tianjin 300072, Peoples R China
5. Pengcheng Lab, Shenzhen 518066, Peoples R China
Recommended Citation:
GB/T 7714
Peng, Zhichao, Zeng, Hua, Li, Yongwei, et al. Enhancing Dimensional Emotion Recognition from Speech through Modulation-Filtered Cochleagram and Parallel Attention Recurrent Network[J]. ELECTRONICS, 2023, 12(22): 15.
APA: Peng, Zhichao, Zeng, Hua, Li, Yongwei, Du, Yegang, & Dang, Jianwu. (2023). Enhancing Dimensional Emotion Recognition from Speech through Modulation-Filtered Cochleagram and Parallel Attention Recurrent Network. ELECTRONICS, 12(22), 15.
MLA: Peng, Zhichao, et al. "Enhancing Dimensional Emotion Recognition from Speech through Modulation-Filtered Cochleagram and Parallel Attention Recurrent Network". ELECTRONICS 12.22 (2023): 15.

Ingestion Method: OAI Harvesting

Source: Institute of Automation


Unless otherwise noted, all content in this system is protected by copyright, with all rights reserved.