Enhancing Dimensional Emotion Recognition from Speech through Modulation-Filtered Cochleagram and Parallel Attention Recurrent Network
Document type | Journal article
Authors | Peng, Zhichao1; Zeng, Hua1; Li, Yongwei2; Du, Yegang3; Dang, Jianwu4,5
Journal | ELECTRONICS
Publication date | 2023-11-01
Volume | 12; Issue | 22; Pages | 15
Keywords | modulation-filtered cochleagram; parallel attention recurrent neural network; dimensional emotion recognition; auditory signal processing; noise-robust
DOI | 10.3390/electronics12224620 |
Corresponding authors | Peng, Zhichao (zcpeng@tju.edu.cn); Dang, Jianwu (jdang@jaist.ac.jp)
Abstract | Dimensional emotion describes rich, fine-grained emotional states better than categorical emotion. In human-robot interaction, continuously recognizing dimensional emotions from speech allows robots to capture the temporal dynamics of a speaker's emotional state and adjust their interaction strategies in real time. In this study, we present an approach that enhances dimensional emotion recognition through a modulation-filtered cochleagram and a parallel attention recurrent neural network (PA-net). First, a multi-resolution modulation-filtered cochleagram is derived from the speech signal through auditory signal processing. Then, the PA-net establishes multi-temporal dependencies across feature scales, tracking the dynamic variation of dimensional emotion within auditory modulation sequences. Experiments on the RECOLA dataset show that, at the feature level, the modulation-filtered cochleagram surpasses the other assessed features in predicting valence and arousal, with a particularly pronounced advantage at high signal-to-noise ratios. At the model level, the PA-net attains the highest predictive performance for both valence and arousal, clearly outperforming alternative regression models. Experiments on the SEWA dataset likewise demonstrate substantial improvements in valence and arousal prediction. Together, these results highlight the effectiveness of our approach for dimensional speech emotion recognition.
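The front end described in the abstract, cochlear filtering followed by envelope extraction and modulation filtering, can be sketched as below. This is a minimal illustration, not the paper's implementation: the 4th-order gammatone filterbank, Hilbert-envelope extraction, Butterworth modulation bandpass, and all band counts, cutoffs, and frame sizes are assumptions chosen for the sketch.

```python
# Illustrative modulation-filtered cochleagram front end.
# All parameters (channel count, modulation band, frame sizes) are
# assumptions for this sketch, not the paper's exact configuration.
import numpy as np
from scipy.signal import hilbert, butter, lfilter

def erb(f):
    """Equivalent rectangular bandwidth (Glasberg & Moore) in Hz."""
    return 24.7 * (4.37 * f / 1000.0 + 1.0)

def gammatone_ir(fc, fs, dur=0.064, order=4):
    """FIR approximation of a gammatone filter centred at fc Hz."""
    t = np.arange(int(dur * fs)) / fs
    b = 1.019 * erb(fc)
    g = t ** (order - 1) * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * fc * t)
    return g / np.sum(np.abs(g))  # rough gain normalisation

def modulation_cochleagram(x, fs, n_chan=16, mod_band=(2.0, 16.0),
                           frame_len=0.025, hop=0.010):
    """Return a (n_frames, n_chan) modulation-filtered cochleagram."""
    # Cochlear channels on a log scale from 100 Hz to 0.4 * fs.
    fcs = np.geomspace(100.0, 0.4 * fs, n_chan)
    bb, ab = butter(2, [mod_band[0] / (fs / 2), mod_band[1] / (fs / 2)],
                    btype="band")
    flen, fhop = int(frame_len * fs), int(hop * fs)
    n_frames = 1 + (len(x) - flen) // fhop
    out = np.zeros((n_frames, n_chan))
    for c, fc in enumerate(fcs):
        y = np.convolve(x, gammatone_ir(fc, fs), mode="same")  # cochlear filtering
        env = np.abs(hilbert(y))                               # temporal envelope
        mod = lfilter(bb, ab, env)                             # modulation bandpass
        for i in range(n_frames):
            seg = mod[i * fhop:i * fhop + flen]
            out[i, c] = np.sqrt(np.mean(seg ** 2))             # per-frame RMS energy
    return out

fs = 16000
t = np.arange(fs) / fs                      # 1 s test signal: 440 Hz tone with 4 Hz AM
x = np.sin(2 * np.pi * 440 * t) * (1 + 0.5 * np.sin(2 * np.pi * 4 * t))
feat = modulation_cochleagram(x, fs)
print(feat.shape)                           # (n_frames, n_chan)
```

A multi-resolution variant would repeat the modulation-filtering stage with several modulation bands and frame rates, giving the parallel feature streams that the PA-net then models.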
WOS keywords | SPEAKER INDIVIDUALITY; TEMPORAL ENVELOPE; VOCAL-EMOTION; PERCEPTION; FEATURES
Funding project | Hunan Provincial Natural Science Foundation of China
WOS research areas | Computer Science; Engineering; Physics
Language | English
Publisher | MDPI
WOS accession number | WOS:001118263900001
Funding agency | Hunan Provincial Natural Science Foundation of China
Source URL | [http://ir.ia.ac.cn/handle/173211/55066]
Collection | State Key Laboratory of Pattern Recognition, Intelligent Interaction
Author affiliations | 1. Hunan Univ Humanities Sci & Technol, Sch Informat, Loudi 417000, Peoples R China; 2. Chinese Acad Sci, Inst Automat, Natl Lab Pattern Recognit, Beijing 100045, Peoples R China; 3. Waseda Univ, Future Robot Org, Tokyo 1698050, Japan; 4. Tianjin Univ, Coll Intelligence & Comp, Tianjin 300072, Peoples R China; 5. Pengcheng Lab, Shenzhen 518066, Peoples R China
Recommended citation (GB/T 7714) | Peng, Zhichao, Zeng, Hua, Li, Yongwei, et al. Enhancing Dimensional Emotion Recognition from Speech through Modulation-Filtered Cochleagram and Parallel Attention Recurrent Network[J]. ELECTRONICS, 2023, 12(22): 15.
APA | Peng, Zhichao, Zeng, Hua, Li, Yongwei, Du, Yegang, & Dang, Jianwu. (2023). Enhancing Dimensional Emotion Recognition from Speech through Modulation-Filtered Cochleagram and Parallel Attention Recurrent Network. ELECTRONICS, 12(22), 15.
MLA | Peng, Zhichao, et al. "Enhancing Dimensional Emotion Recognition from Speech through Modulation-Filtered Cochleagram and Parallel Attention Recurrent Network." ELECTRONICS 12.22 (2023): 15.
Ingest method: OAI harvesting
Source: Institute of Automation