Chinese Academy of Sciences Institutional Repositories Grid
Adversarial Multimodal Network for Movie Story Question Answering

Document type: Journal article

Authors: Yuan, Zhaoquan [1]; Sun, Siyuan [2,3]; Duan, Lixin [2,3]; Li, Changsheng [4]; Wu, Xiao [1]; Xu, Changsheng [5]
Journal: IEEE TRANSACTIONS ON MULTIMEDIA
Publication date: 2021
Volume: 23; Pages: 1744-1756
Keywords: Knowledge discovery; Motion pictures; Visualization; Task analysis; Generators; Gallium nitride; Natural languages; Movie question answering; adversarial network; multimodal understanding
ISSN: 1520-9210
DOI: 10.1109/TMM.2020.3002667
Corresponding authors: Duan, Lixin (lxduan@uestc.edu.cn); Li, Changsheng (lcs@bit.edu.cn)
Abstract: Visual question answering by using information from multiple modalities has attracted more and more attention in recent years. However, it is a very challenging task, as the visual content and natural language have quite different statistical properties. In this work, we present a method called Adversarial Multimodal Network (AMN) to better understand video stories for question answering. In AMN, we propose to learn multimodal feature representations by finding a more coherent subspace for video clips and the corresponding texts (e.g., subtitles and questions) based on generative adversarial networks. Moreover, a self-attention mechanism is developed to enforce our newly introduced consistency constraint in order to preserve the self-correlation between the visual cues of the original video clips in the learned multimodal representations. Extensive experiments on the benchmark MovieQA and TVQA datasets show the effectiveness of our proposed AMN over other published state-of-the-art methods.
Funding projects: Major Project for New Generation of AI [2018AAA0100400]; National Natural Science Foundation of China [61802053]; National Natural Science Foundation of China [61772436]; National Natural Science Foundation of China [61772118]; National Natural Science Foundation of China [61806044]; Sichuan Science and Technology Program [2020YJ0037]; Sichuan Science and Technology Program [2020YJ0207]; Foundation for Department of Transportation of Henan Province [2019J-2-2]; Fundamental Research Funds for the Central Universities [2682019CX62]
WOS research areas: Computer Science; Telecommunications
Language: English
WOS accession number: WOS:000655830300021
Publisher: IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
Funding agencies: Major Project for New Generation of AI; National Natural Science Foundation of China; Sichuan Science and Technology Program; Foundation for Department of Transportation of Henan Province; Fundamental Research Funds for the Central Universities
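Source URL: http://ir.ia.ac.cn/handle/173211/45316
Collection: Institute of Automation / National Laboratory of Pattern Recognition / Multimedia Computing and Graphics Team
Corresponding authors: Duan, Lixin; Li, Changsheng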
Author affiliations:
1. Southwest Jiaotong Univ, Sch Informat Sci & Technol, Chengdu 610031, Peoples R China
2. Univ Elect Sci & Technol China, Big Data Res Ctr, Chengdu 610051, Peoples R China
3. Univ Elect Sci & Technol China, Sch Comp Sci & Engn, Chengdu 610051, Peoples R China
4. Beijing Inst Technol, Sch Comp Sci & Technol, Beijing 100081, Peoples R China
5. Chinese Acad Sci, Inst Automat, Natl Lab Pattern Recognit, Beijing 100190, Peoples R China
Recommended citation:
GB/T 7714: Yuan, Zhaoquan, Sun, Siyuan, Duan, Lixin, et al. Adversarial Multimodal Network for Movie Story Question Answering[J]. IEEE TRANSACTIONS ON MULTIMEDIA, 2021, 23: 1744-1756.
APA: Yuan, Zhaoquan, Sun, Siyuan, Duan, Lixin, Li, Changsheng, Wu, Xiao, & Xu, Changsheng. (2021). Adversarial Multimodal Network for Movie Story Question Answering. IEEE TRANSACTIONS ON MULTIMEDIA, 23, 1744-1756.
MLA: Yuan, Zhaoquan, et al. "Adversarial Multimodal Network for Movie Story Question Answering." IEEE TRANSACTIONS ON MULTIMEDIA 23 (2021): 1744-1756.

Ingest method: OAI harvesting

Source: Institute of Automation


Unless otherwise stated, all content in this system is protected by copyright, and all rights are reserved.