DMRM: A Dual-Channel Multi-Hop Reasoning Model for Visual Dialog
Document type: Conference paper
Author | Feilong Chen 1,2,4,5 |
Publication date | 2020 |
Conference date | February 2020 |
Conference location | New York, USA |
Abstract | Visual Dialog is a vision-language task that requires an AI agent to engage in a conversation with humans grounded in an image. It remains a challenging task since the agent must fully understand a given question before making an appropriate response, drawing not only on the textual dialog history but also on the visually grounded information. Previous models typically leverage single-hop or single-channel reasoning to deal with this complex multimodal reasoning task, which is intuitively insufficient. In this paper, we thus propose a novel and more powerful Dual-channel Multi-hop Reasoning Model for Visual Dialog, named DMRM. DMRM synchronously captures information from the dialog history and the image to enrich the semantic representation of the question by exploiting dual-channel reasoning. Specifically, DMRM maintains a dual channel to obtain the question- and history-aware image features and the question- and image-aware dialog history features by a multi-hop reasoning process in each channel. Additionally, we also design an effective multimodal attention to further enhance the decoder to generate more accurate responses. Experimental results on the VisDial v0.9 and v1.0 datasets demonstrate that the proposed model is effective and outperforms the compared models by a significant margin. |
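To make the dual-channel multi-hop idea described in the abstract concrete, the following is a minimal sketch in PyTorch. It is not the authors' implementation: the module names, tensor shapes, number of hops, and the simple additive-attention update rule are assumptions for illustration, and the cross-conditioning between channels and the multimodal decoder from the paper are omitted.

```python
# Minimal sketch of dual-channel multi-hop reasoning (illustrative, not DMRM's actual code).
import torch
import torch.nn as nn
import torch.nn.functional as F


class AttentionHop(nn.Module):
    """One reasoning hop: refine a query vector by attending over a set of features."""

    def __init__(self, query_dim, feat_dim, hidden_dim):
        super().__init__()
        self.proj_q = nn.Linear(query_dim, hidden_dim)
        self.proj_f = nn.Linear(feat_dim, hidden_dim)
        self.score = nn.Linear(hidden_dim, 1)
        self.update = nn.Linear(query_dim + feat_dim, query_dim)

    def forward(self, query, feats):
        # query: (B, query_dim); feats: (B, N, feat_dim)
        logits = self.score(torch.tanh(self.proj_q(query).unsqueeze(1) + self.proj_f(feats)))
        attn = F.softmax(logits, dim=1)                # attention weights over N features
        context = (attn * feats).sum(dim=1)            # attended feature summary, (B, feat_dim)
        return torch.tanh(self.update(torch.cat([query, context], dim=-1)))


class DualChannelReasoner(nn.Module):
    """Run several hops in two channels: one over image regions, one over dialog history."""

    def __init__(self, q_dim, img_dim, hist_dim, hidden_dim, num_hops=2):
        super().__init__()
        self.img_hops = nn.ModuleList(
            AttentionHop(q_dim, img_dim, hidden_dim) for _ in range(num_hops))
        self.hist_hops = nn.ModuleList(
            AttentionHop(q_dim, hist_dim, hidden_dim) for _ in range(num_hops))

    def forward(self, question, img_feats, hist_feats):
        # question: (B, q_dim); img_feats: (B, R, img_dim); hist_feats: (B, T, hist_dim)
        q_img, q_hist = question, question
        for img_hop, hist_hop in zip(self.img_hops, self.hist_hops):
            q_img = img_hop(q_img, img_feats)      # image channel, refined each hop
            q_hist = hist_hop(q_hist, hist_feats)  # history channel, refined each hop
        return q_img, q_hist                       # in DMRM these feed a multimodal decoder
```

In this simplified form, each channel iteratively sharpens the question representation against one modality over multiple hops; the paper additionally conditions each channel on the other modality and fuses both outputs with a multimodal attention in the decoder.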
Source URL | [http://ir.ia.ac.cn/handle/173211/51918]
Collection | Research Center for Digital Content Technology and Services_Auditory Models and Cognitive Computing |
Corresponding author | Jiaming Xu |
Author affiliations | 1. University of Chinese Academy of Sciences 2. Research Center for Brain-inspired Intelligence, CASIA 3. Center for Excellence in Brain Science and Intelligence Technology, CAS, China 4. Institute of Automation, Chinese Academy of Sciences (CASIA) 5. Pattern Recognition Center, WeChat AI, Tencent Inc., China |
Recommended citation (GB/T 7714) | Feilong Chen, Fandong Meng, Jiaming Xu, et al. DMRM: A Dual-Channel Multi-Hop Reasoning Model for Visual Dialog[C]. In: . New York, USA. 2020.2. |
Deposit method: OAI harvesting
Source: Institute of Automation