Efficient multimodal transformer with dual-level feature restoration for robust multimodal sentiment analysis
Document Type: Journal Article
Authors | Licai Sun 2,3; Zheng Lian; Bin Liu; Jianhua Tao
Journal | IEEE Transactions on Affective Computing
Publication Date | 2023
Pages | 1-17
Abstract | With the proliferation of user-generated online videos, Multimodal Sentiment Analysis (MSA) has attracted increasing attention recently. Despite significant progress, there are still two major challenges on the way towards robust MSA: 1) inefficiency when modeling cross-modal interactions in unaligned multimodal data; and 2) vulnerability to random modality feature missing, which typically occurs in realistic settings. In this paper, we propose a generic and unified framework to address them, named Efficient Multimodal Transformer with Dual-Level Feature Restoration (EMT-DLFR). Concretely, EMT employs utterance-level representations from each modality as the global multimodal context to interact with local unimodal features, so that the two promote each other. It not only avoids the quadratic scaling cost of previous local-local cross-modal interaction methods but also leads to better performance. To improve model robustness in the incomplete modality setting, on the one hand, DLFR performs low-level feature reconstruction to implicitly encourage the model to learn semantic information from incomplete data. On the other hand, it innovatively regards complete and incomplete data as two different views of one sample and utilizes siamese representation learning to explicitly attract their high-level representations. Comprehensive experiments on three popular datasets demonstrate that our method achieves superior performance in both complete and incomplete modality settings.
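The abstract describes two mechanisms: global-local cross-modal attention (a single utterance-level vector per modality exchanges information with local unimodal sequences, giving linear rather than quadratic cost in sequence length) and the dual-level restoration losses (low-level feature reconstruction plus high-level siamese attraction between complete and incomplete views). The following is a minimal PyTorch sketch of these two ideas as the abstract states them; all names (`GlobalLocalBlock`, `dlfr_losses`) and design details (residual connections, SimSiam-style stop-gradient) are illustrative assumptions, not the authors' released code.

```python
# Hypothetical sketch of the abstract's two ideas; not the authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GlobalLocalBlock(nn.Module):
    """Global-local cross-modal interaction: a per-modality utterance-level
    (global) vector attends over local unimodal features and is refined by
    them in return -- O(L) attention per modality, avoiding the O(L^2) cost
    of local-local cross-modal attention on unaligned sequences."""
    def __init__(self, d, n_heads=4):
        super().__init__()
        self.g2l = nn.MultiheadAttention(d, n_heads, batch_first=True)  # global queries locals
        self.l2g = nn.MultiheadAttention(d, n_heads, batch_first=True)  # locals query global

    def forward(self, g, x):
        # g: (B, 1, d) global multimodal context; x: (B, L, d) local features
        g = g + self.g2l(g, x, x)[0]  # global vector gathers local evidence
        x = x + self.l2g(x, g, g)[0]  # local features are refined by the global context
        return g, x

def dlfr_losses(x_complete, x_recon, miss_mask, z_complete, z_incomplete):
    """Dual-level feature restoration (sketch).
    Low level: reconstruct the randomly missing low-level features
    (MSE restricted to the masked positions).
    High level: treat complete/incomplete inputs as two views of one sample
    and attract their utterance-level representations (negative cosine
    similarity with stop-gradient on the complete view, SimSiam-style -- an
    assumed instantiation of the abstract's siamese representation learning)."""
    recon = F.mse_loss(x_recon * miss_mask, x_complete * miss_mask)
    attract = -F.cosine_similarity(z_incomplete, z_complete.detach(), dim=-1).mean()
    return recon, attract

# Usage example with toy shapes:
blk = GlobalLocalBlock(d=32)
g = torch.randn(2, 1, 32)    # utterance-level global context
x = torch.randn(2, 50, 32)   # local unimodal sequence (e.g., audio frames)
g, x = blk(g, x)
```

Because the global context is a single token, each cross-attention is linear in the local sequence length, which is the efficiency argument the abstract makes against local-local interaction methods.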
Source URL | http://ir.ia.ac.cn/handle/173211/57088
Collection | State Key Laboratory of Multimodal Artificial Intelligence Systems
Corresponding Authors | Bin Liu; Jianhua Tao
Author Affiliations | 1. Department of Automation, Tsinghua University 2. School of Artificial Intelligence, University of Chinese Academy of Sciences 3. Institute of Automation, Chinese Academy of Sciences
Recommended Citation (GB/T 7714) | Licai Sun, Zheng Lian, Bin Liu, et al. Efficient multimodal transformer with dual-level feature restoration for robust multimodal sentiment analysis[J]. IEEE Transactions on Affective Computing, 2023: 1-17.
APA | Licai Sun, Zheng Lian, Bin Liu, & Jianhua Tao. (2023). Efficient multimodal transformer with dual-level feature restoration for robust multimodal sentiment analysis. IEEE Transactions on Affective Computing, 1-17.
MLA | Licai Sun, et al. "Efficient multimodal transformer with dual-level feature restoration for robust multimodal sentiment analysis." IEEE Transactions on Affective Computing (2023): 1-17.
Ingestion Method: OAI harvesting
Source: Institute of Automation