Enhancing Dimensional Emotion Recognition from Speech through Modulation-Filtered Cochleagram and Parallel Attention Recurrent Network
Document type | Journal article
Authors | Peng, Zhichao1; Zeng, Hua1; Li, Yongwei2; Du, Yegang3; Dang, Jianwu4,5
Journal | ELECTRONICS
Publication date | 2023-11-01
Volume | 12; Issue | 22; Pages | 15
Keywords | modulation-filtered cochleagram; parallel attention recurrent neural network; dimensional emotion recognition; auditory signal processing; noise-robust
DOI | 10.3390/electronics12224620 |
Corresponding authors | Peng, Zhichao (zcpeng@tju.edu.cn); Dang, Jianwu (jdang@jaist.ac.jp)
Abstract | Dimensional emotion describes rich, fine-grained emotional states better than categorical emotion. In human-robot interaction, continuously recognizing dimensional emotions from speech allows robots to capture the temporal dynamics of a speaker's emotional state and adjust their interaction strategies in real time. In this study, we present an approach that enhances dimensional emotion recognition through a modulation-filtered cochleagram and a parallel attention recurrent neural network (PA-net). First, a multi-resolution modulation-filtered cochleagram is derived from the speech signal through auditory signal processing. Then, the PA-net establishes multi-temporal dependencies across feature scales, tracking the dynamic variation of dimensional emotion within auditory modulation sequences. Experiments on the RECOLA dataset show that, at the feature level, the modulation-filtered cochleagram surpasses the other assessed features in predicting valence and arousal, with a particularly pronounced advantage at high signal-to-noise ratios. At the model level, the PA-net attains the highest predictive performance for both valence and arousal, clearly outperforming alternative regression models. Experiments on the SEWA dataset likewise demonstrate substantial improvements in valence and arousal prediction. Together, these results highlight the effectiveness of our approach for dimensional speech emotion recognition.
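The front end described in the abstract, cochlear filtering followed by envelope extraction and modulation filtering, can be sketched as below. This is a minimal illustration, not the paper's implementation: the 4th-order gammatone filterbank, Hilbert-envelope extraction, Butterworth modulation bandpass, and all band counts, cutoffs, and frame sizes are assumptions chosen for the sketch.

```python
# Illustrative modulation-filtered cochleagram front end.
# All parameters (channel count, modulation band, frame sizes) are
# assumptions for this sketch, not the paper's exact configuration.
import numpy as np
from scipy.signal import hilbert, butter, lfilter

def erb(f):
    """Equivalent rectangular bandwidth (Glasberg & Moore) in Hz."""
    return 24.7 * (4.37 * f / 1000.0 + 1.0)

def gammatone_ir(fc, fs, dur=0.064, order=4):
    """FIR approximation of a gammatone filter centred at fc Hz."""
    t = np.arange(int(dur * fs)) / fs
    b = 1.019 * erb(fc)
    g = t ** (order - 1) * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * fc * t)
    return g / np.sum(np.abs(g))  # rough gain normalisation

def modulation_cochleagram(x, fs, n_chan=16, mod_band=(2.0, 16.0),
                           frame_len=0.025, hop=0.010):
    """Return a (n_frames, n_chan) modulation-filtered cochleagram."""
    # Cochlear channels on a log scale from 100 Hz to 0.4 * fs.
    fcs = np.geomspace(100.0, 0.4 * fs, n_chan)
    bb, ab = butter(2, [mod_band[0] / (fs / 2), mod_band[1] / (fs / 2)],
                    btype="band")
    flen, fhop = int(frame_len * fs), int(hop * fs)
    n_frames = 1 + (len(x) - flen) // fhop
    out = np.zeros((n_frames, n_chan))
    for c, fc in enumerate(fcs):
        y = np.convolve(x, gammatone_ir(fc, fs), mode="same")  # cochlear filtering
        env = np.abs(hilbert(y))                               # temporal envelope
        mod = lfilter(bb, ab, env)                             # modulation bandpass
        for i in range(n_frames):
            seg = mod[i * fhop:i * fhop + flen]
            out[i, c] = np.sqrt(np.mean(seg ** 2))             # per-frame RMS energy
    return out

fs = 16000
t = np.arange(fs) / fs                      # 1 s test signal: 440 Hz tone with 4 Hz AM
x = np.sin(2 * np.pi * 440 * t) * (1 + 0.5 * np.sin(2 * np.pi * 4 * t))
feat = modulation_cochleagram(x, fs)
print(feat.shape)                           # (n_frames, n_chan)
```

A multi-resolution variant would repeat the modulation-filtering stage with several modulation bands and frame rates, giving the parallel feature streams that the PA-net then models.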
WOS keywords | SPEAKER INDIVIDUALITY; TEMPORAL ENVELOPE; VOCAL-EMOTION; PERCEPTION; FEATURES
Funding project | Hunan Provincial Natural Science Foundation of China
WOS research areas | Computer Science; Engineering; Physics
Language | English
Publisher | MDPI
WOS accession number | WOS:001118263900001
Funding agency | Hunan Provincial Natural Science Foundation of China
Source URL | [http://ir.ia.ac.cn/handle/173211/55066]
Collection | State Key Laboratory of Pattern Recognition, Intelligent Interaction
Author affiliations | 1. Hunan Univ Humanities Sci & Technol, Sch Informat, Loudi 417000, Peoples R China; 2. Chinese Acad Sci, Inst Automat, Natl Lab Pattern Recognit, Beijing 100045, Peoples R China; 3. Waseda Univ, Future Robot Org, Tokyo 1698050, Japan; 4. Tianjin Univ, Coll Intelligence & Comp, Tianjin 300072, Peoples R China; 5. Pengcheng Lab, Shenzhen 518066, Peoples R China
Recommended citation (GB/T 7714) | Peng, Zhichao, Zeng, Hua, Li, Yongwei, et al. Enhancing Dimensional Emotion Recognition from Speech through Modulation-Filtered Cochleagram and Parallel Attention Recurrent Network[J]. ELECTRONICS, 2023, 12(22): 15.
APA | Peng, Zhichao, Zeng, Hua, Li, Yongwei, Du, Yegang, & Dang, Jianwu. (2023). Enhancing Dimensional Emotion Recognition from Speech through Modulation-Filtered Cochleagram and Parallel Attention Recurrent Network. ELECTRONICS, 12(22), 15.
MLA | Peng, Zhichao, et al. "Enhancing Dimensional Emotion Recognition from Speech through Modulation-Filtered Cochleagram and Parallel Attention Recurrent Network." ELECTRONICS 12.22 (2023): 15.
Ingest method: OAI harvesting
Source: Institute of Automation