Efficient multimodal transformer with dual-level feature restoration for robust multimodal sentiment analysis
Document Type: Journal Article
Authors | Licai Sun 2,3; Zheng Lian; Bin Liu; Jianhua Tao
Journal | IEEE Transactions on Affective Computing
Publication Date | 2023
Pages | 1-17
Abstract | With the proliferation of user-generated online videos, Multimodal Sentiment Analysis (MSA) has attracted increasing attention recently. Despite significant progress, there are still two major challenges on the way towards robust MSA: 1) inefficiency when modeling cross-modal interactions in unaligned multimodal data; and 2) vulnerability to random modality feature missing, which typically occurs in realistic settings. In this paper, we propose a generic and unified framework to address them, named Efficient Multimodal Transformer with Dual-Level Feature Restoration (EMT-DLFR). Concretely, EMT employs utterance-level representations from each modality as the global multimodal context to interact with local unimodal features, so that the two promote each other. It not only avoids the quadratic scaling cost of previous local-local cross-modal interaction methods but also leads to better performance. To improve model robustness in the incomplete modality setting, on the one hand, DLFR performs low-level feature reconstruction to implicitly encourage the model to learn semantic information from incomplete data. On the other hand, it innovatively regards complete and incomplete data as two different views of one sample and utilizes siamese representation learning to explicitly attract their high-level representations. Comprehensive experiments on three popular datasets demonstrate that our method achieves superior performance in both complete and incomplete modality settings.
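The abstract describes two mechanisms: global-local cross-modal attention (a single utterance-level vector per modality exchanges information with local unimodal sequences, giving linear rather than quadratic cost in sequence length) and the dual-level restoration losses (low-level feature reconstruction plus high-level siamese attraction between complete and incomplete views). The following is a minimal PyTorch sketch of these two ideas as the abstract states them; all names (`GlobalLocalBlock`, `dlfr_losses`) and design details (residual connections, SimSiam-style stop-gradient) are illustrative assumptions, not the authors' released code.

```python
# Hypothetical sketch of the abstract's two ideas; not the authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GlobalLocalBlock(nn.Module):
    """Global-local cross-modal interaction: a per-modality utterance-level
    (global) vector attends over local unimodal features and is refined by
    them in return -- O(L) attention per modality, avoiding the O(L^2) cost
    of local-local cross-modal attention on unaligned sequences."""
    def __init__(self, d, n_heads=4):
        super().__init__()
        self.g2l = nn.MultiheadAttention(d, n_heads, batch_first=True)  # global queries locals
        self.l2g = nn.MultiheadAttention(d, n_heads, batch_first=True)  # locals query global

    def forward(self, g, x):
        # g: (B, 1, d) global multimodal context; x: (B, L, d) local features
        g = g + self.g2l(g, x, x)[0]  # global vector gathers local evidence
        x = x + self.l2g(x, g, g)[0]  # local features are refined by the global context
        return g, x

def dlfr_losses(x_complete, x_recon, miss_mask, z_complete, z_incomplete):
    """Dual-level feature restoration (sketch).
    Low level: reconstruct the randomly missing low-level features
    (MSE restricted to the masked positions).
    High level: treat complete/incomplete inputs as two views of one sample
    and attract their utterance-level representations (negative cosine
    similarity with stop-gradient on the complete view, SimSiam-style -- an
    assumed instantiation of the abstract's siamese representation learning)."""
    recon = F.mse_loss(x_recon * miss_mask, x_complete * miss_mask)
    attract = -F.cosine_similarity(z_incomplete, z_complete.detach(), dim=-1).mean()
    return recon, attract

# Usage example with toy shapes:
blk = GlobalLocalBlock(d=32)
g = torch.randn(2, 1, 32)    # utterance-level global context
x = torch.randn(2, 50, 32)   # local unimodal sequence (e.g., audio frames)
g, x = blk(g, x)
```

Because the global context is a single token, each cross-attention is linear in the local sequence length, which is the efficiency argument the abstract makes against local-local interaction methods.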
Source URL | http://ir.ia.ac.cn/handle/173211/57088
Collection | State Key Laboratory of Multimodal Artificial Intelligence Systems
Corresponding Authors | Bin Liu; Jianhua Tao
Author Affiliations | 1. Department of Automation, Tsinghua University 2. School of Artificial Intelligence, University of Chinese Academy of Sciences 3. Institute of Automation, Chinese Academy of Sciences
Recommended Citation (GB/T 7714) | Licai Sun, Zheng Lian, Bin Liu, et al. Efficient multimodal transformer with dual-level feature restoration for robust multimodal sentiment analysis[J]. IEEE Transactions on Affective Computing, 2023: 1-17.
APA | Licai Sun, Zheng Lian, Bin Liu, & Jianhua Tao. (2023). Efficient multimodal transformer with dual-level feature restoration for robust multimodal sentiment analysis. IEEE Transactions on Affective Computing, 1-17.
MLA | Licai Sun, et al. "Efficient multimodal transformer with dual-level feature restoration for robust multimodal sentiment analysis." IEEE Transactions on Affective Computing (2023): 1-17.
Ingestion Method: OAI harvesting
Source: Institute of Automation