Adversarial Multimodal Network for Movie Story Question Answering
Document type: Journal article
Authors | Yuan, Zhaoquan1; Sun, Siyuan2,3; Duan, Lixin2,3; Li, Changsheng4; Wu, Xiao1; Xu, Changsheng5 |
Journal | IEEE TRANSACTIONS ON MULTIMEDIA |
Publication date | 2021 |
Volume | 23; Pages: 1744-1756 |
Keywords | Knowledge discovery; Motion pictures; Visualization; Task analysis; Generators; Gallium nitride; Natural languages; Movie question answering; adversarial network; multimodal understanding |
ISSN | 1520-9210 |
DOI | 10.1109/TMM.2020.3002667 |
Corresponding authors | Duan, Lixin (lxduan@uestc.edu.cn); Li, Changsheng (lcs@bit.edu.cn) |
Abstract | Visual question answering by using information from multiple modalities has attracted more and more attention in recent years. However, it is a very challenging task, as the visual content and natural language have quite different statistical properties. In this work, we present a method called Adversarial Multimodal Network (AMN) to better understand video stories for question answering. In AMN, we propose to learn multimodal feature representations by finding a more coherent subspace for video clips and the corresponding texts (e.g., subtitles and questions) based on generative adversarial networks. Moreover, a self-attention mechanism is developed to enforce our newly introduced consistency constraint in order to preserve the self-correlation between the visual cues of the original video clips in the learned multimodal representations. Extensive experiments on the benchmark MovieQA and TVQA datasets show the effectiveness of our proposed AMN over other published state-of-the-art methods. |
Funding projects | Major Project for New Generation of AI[2018AAA0100400] ; National Natural Science Foundation of China[61802053] ; National Natural Science Foundation of China[61772436] ; National Natural Science Foundation of China[61772118] ; National Natural Science Foundation of China[61806044] ; Sichuan Science and Technology Program[2020YJ0037] ; Sichuan Science and Technology Program[2020YJ0207] ; Foundation for Department of Transportation of Henan Province[2019J-2-2] ; Fundamental Research Funds for the Central Universities[2682019CX62] |
WOS research areas | Computer Science ; Telecommunications |
Language | English |
WOS accession number | WOS:000655830300021 |
Publisher | IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC |
Funding organizations | Major Project for New Generation of AI ; National Natural Science Foundation of China ; Sichuan Science and Technology Program ; Foundation for Department of Transportation of Henan Province ; Fundamental Research Funds for the Central Universities |
Source URL | [http://ir.ia.ac.cn/handle/173211/45316] |
Collection | Institute of Automation_National Laboratory of Pattern Recognition_Multimedia Computing and Graphics Team |
Corresponding authors | Duan, Lixin; Li, Changsheng |
Author affiliations | 1.Southwest Jiaotong Univ, Sch Informat Sci & Technol, Chengdu 610031, Peoples R China 2.Univ Elect Sci & Technol China, Big Data Res Ctr, Chengdu 610051, Peoples R China 3.Univ Elect Sci & Technol China, Sch Comp Sci & Engn, Chengdu 610051, Peoples R China 4.Beijing Inst Technol, Sch Comp Sci & Technol, Beijing 100081, Peoples R China 5.Chinese Acad Sci, Inst Automat, Natl Lab Pattern Recognit, Beijing 100190, Peoples R China |
Recommended citation (GB/T 7714) | Yuan, Zhaoquan,Sun, Siyuan,Duan, Lixin,et al. Adversarial Multimodal Network for Movie Story Question Answering[J]. IEEE TRANSACTIONS ON MULTIMEDIA,2021,23:1744-1756. |
APA | Yuan, Zhaoquan,Sun, Siyuan,Duan, Lixin,Li, Changsheng,Wu, Xiao,&Xu, Changsheng.(2021).Adversarial Multimodal Network for Movie Story Question Answering.IEEE TRANSACTIONS ON MULTIMEDIA,23,1744-1756. |
MLA | Yuan, Zhaoquan,et al."Adversarial Multimodal Network for Movie Story Question Answering".IEEE TRANSACTIONS ON MULTIMEDIA 23(2021):1744-1756. |
Ingest method: OAI harvesting
Source: Institute of Automation