Chinese Academy of Sciences Institutional Repositories Grid
Explicit Cross-Modal Representation Learning for Visual Commonsense Reasoning

Document Type: Journal Article

Authors: Zhang, Xi (2,3); Zhang, Feifei (3); Xu, Changsheng (1,2,3)
Journal: IEEE TRANSACTIONS ON MULTIMEDIA
Publication Date: 2022
Volume: 24, Pages: 2986-2997
Keywords: Cognition; Video recording; Syntactics; Visualization; Task analysis; Semantics; Linguistics; Visual Commonsense Reasoning; explicit reasoning; syntactic structure; interpretability
ISSN: 1520-9210
DOI: 10.1109/TMM.2021.3091882
Corresponding Author: Xu, Changsheng (csxu@nlpr.ia.ac.cn)
Abstract: Given a question about an image, Visual Commonsense Reasoning (VCR) must provide not only a correct answer but also a rationale that justifies the answer. VCR is a challenging task because it requires proper semantic alignment and reasoning between the image and the linguistic expression. Recent approaches show great promise by exploring holistic attention mechanisms or graph-based networks, but most of them perform implicit reasoning and ignore the semantic dependencies within the linguistic expression. In this paper, we propose a novel explicit cross-modal representation learning network for VCR that incorporates syntactic information into visual reasoning and natural language understanding. The proposed method enjoys several merits. First, based on a two-branch neural module network, it performs explicit cross-modal reasoning guided by the high-level syntactic structure of the linguistic expression. Second, the semantic structure of the linguistic expression is incorporated into a syntactic GCN to facilitate language understanding. Third, our explicit cross-modal representation learning network provides a traceable reasoning flow, which offers visible, fine-grained evidence for the answer and rationale. Quantitative and qualitative evaluations on the public VCR dataset demonstrate that our approach performs favorably against state-of-the-art methods.
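
The abstract's second point hinges on a syntactic GCN that propagates token features along dependency-parse edges so that language representations reflect sentence structure. The sketch below is a minimal, hypothetical illustration of such a layer in PyTorch, not the authors' released code; the class name, single-layer design, and feature dimensions are assumptions chosen for clarity.

# Illustrative sketch (assumed implementation, not the paper's code) of a
# syntactic GCN layer: token features are aggregated over dependency-parse
# neighbours, then linearly transformed, so each token absorbs the semantic
# structure of the linguistic expression.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SyntacticGCNLayer(nn.Module):
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, word_feats: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # word_feats: (batch, num_tokens, in_dim) token embeddings
        # adj:        (batch, num_tokens, num_tokens) dependency adjacency,
        #             1.0 where a parse edge links two tokens
        adj = adj + torch.eye(adj.size(-1), device=adj.device)   # add self-loops
        deg = adj.sum(dim=-1, keepdim=True).clamp(min=1.0)       # node degrees
        norm_adj = adj / deg                                      # row-normalize
        # Aggregate each token's syntactic neighbours, then transform.
        return F.relu(self.linear(torch.bmm(norm_adj, word_feats)))


if __name__ == "__main__":
    # Toy usage: 4 tokens with 300-d embeddings and a small dependency graph.
    feats = torch.randn(1, 4, 300)
    adj = torch.zeros(1, 4, 4)
    adj[0, 0, 1] = adj[0, 1, 0] = 1.0   # edge between tokens 0 and 1
    adj[0, 1, 2] = adj[0, 2, 1] = 1.0   # edge between tokens 1 and 2
    out = SyntacticGCNLayer(300, 512)(feats, adj)
    print(out.shape)  # torch.Size([1, 4, 512])

In the paper's framing, the output of a layer like this would feed the language branch of the two-branch neural module network; the visual branch and the traceable reasoning flow are beyond the scope of this small sketch.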
Funding Projects: National Key Research and Development Program of China [2018AAA0100604]; National Natural Science Foundation of China [61720106006]; National Natural Science Foundation of China [62002355]; National Natural Science Foundation of China [61721004]; National Natural Science Foundation of China [61832002]; National Natural Science Foundation of China [61532009]; National Natural Science Foundation of China [61751211]; National Natural Science Foundation of China [62072455]; National Natural Science Foundation of China [U1705262]; National Natural Science Foundation of China [U1836220]; Key Research Program of Frontier Sciences of CAS [QYZDJSSW-JSC039]; National Postdoctoral Program for Innovative Talents [BX20190367]; Beijing Natural Science Foundation [L201001]; Jiangsu Province Key Research and Development Plan [BE2020036]
WOS Research Areas: Computer Science; Telecommunications
Language: English
WOS Record Number: WOS:000809408000024
Publisher: IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
Funding Organizations: National Key Research and Development Program of China; National Natural Science Foundation of China; Key Research Program of Frontier Sciences of CAS; National Postdoctoral Program for Innovative Talents; Beijing Natural Science Foundation; Jiangsu Province Key Research and Development Plan
Source URL: [http://ir.ia.ac.cn/handle/173211/49629]
Collection: Institute of Automation, National Laboratory of Pattern Recognition, Multimedia Computing and Graphics Team
Corresponding Author: Xu, Changsheng
Author Affiliations:
1. Peng Cheng Lab, Shenzhen 518066, Peoples R China
2. Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing 100049, Peoples R China
3. Chinese Acad Sci, Inst Automat, Natl Lab Pattern Recognit, Beijing 100190, Peoples R China
Recommended Citation:
GB/T 7714
Zhang, Xi, Zhang, Feifei, Xu, Changsheng. Explicit Cross-Modal Representation Learning for Visual Commonsense Reasoning[J]. IEEE TRANSACTIONS ON MULTIMEDIA, 2022, 24: 2986-2997.
APA: Zhang, Xi, Zhang, Feifei, & Xu, Changsheng. (2022). Explicit Cross-Modal Representation Learning for Visual Commonsense Reasoning. IEEE TRANSACTIONS ON MULTIMEDIA, 24, 2986-2997.
MLA: Zhang, Xi, et al. "Explicit Cross-Modal Representation Learning for Visual Commonsense Reasoning." IEEE TRANSACTIONS ON MULTIMEDIA 24 (2022): 2986-2997.

Deposit Method: OAI Harvesting

Source: Institute of Automation


Unless otherwise stated, all content in this system is protected by copyright, and all rights are reserved.