Chinese Academy of Sciences Institutional Repositories Grid
Entity-level Cross-modal Learning Improves Multi-modal Machine Translation

Document Type: Conference Paper

Authors: Huang X (黄鑫)1,2; Zhang JJ (张家俊)1,2; Zong CQ (宗成庆)1,2
Publication Date: 2021-11
Conference Date: 2021-11-7
Conference Venue: Punta Cana, Dominican Republic
Abstract

Multi-modal machine translation (MMT) aims to improve translation performance by incorporating visual information. Most studies leverage visual information either by integrating global image features as auxiliary input or by attending to relevant local regions of the image during decoding. However, this way of using visual information makes it difficult to figure out how the visual modality helps and why it works. Inspired by the finding of (CITATION) that entities are the most informative elements in an image, we propose an explicit entity-level cross-modal learning approach that aims to augment the entity representation. Specifically, the approach is framed as a reconstruction task that recovers the original textual input from a multi-modal input in which entities are replaced with visual features. A multi-task framework then combines the translation task with the reconstruction task to make full use of cross-modal entity representation learning. Extensive experiments demonstrate that our approach achieves comparable or even better performance than state-of-the-art models. Furthermore, our in-depth analysis shows how visual information improves translation.
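To make the abstract's description concrete, the following is a minimal PyTorch sketch (not the authors' released code) of the multi-task objective it describes: a shared encoder-decoder is trained on (a) the translation task and (b) a reconstruction task that recovers the source text from a multi-modal input in which entity token embeddings are replaced by projected visual region features. All module names, dimensions, batch-field names, and the loss weight lambda_rec are illustrative assumptions.

```python
# Hedged sketch of entity-level cross-modal multi-task training.
# Assumed, not from the paper: all hyperparameters, names, and shapes.
import torch
import torch.nn as nn

class EntityLevelMMT(nn.Module):
    def __init__(self, vocab_size=10000, d_model=256, img_feat_dim=2048):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        # Projects visual region features into the word-embedding space
        # so they can stand in for entity token embeddings.
        self.visual_proj = nn.Linear(img_feat_dim, d_model)
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True),
            num_layers=2)
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, nhead=4, batch_first=True),
            num_layers=2)
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, src_ids, tgt_in_ids, entity_mask=None, region_feats=None):
        # src_ids: (B, S) source tokens; entity_mask: (B, S) bool, True at
        # entity positions; region_feats: (B, S, img_feat_dim) visual features
        # aligned to those positions (zeros elsewhere).
        src = self.embed(src_ids)
        if entity_mask is not None and region_feats is not None:
            vis = self.visual_proj(region_feats)
            # Swap entity embeddings for projected visual features.
            src = torch.where(entity_mask.unsqueeze(-1), vis, src)
        memory = self.encoder(src)
        dec = self.decoder(self.embed(tgt_in_ids), memory)
        return self.out(dec)

def multitask_loss(model, batch, lambda_rec=1.0):
    ce = nn.CrossEntropyLoss(ignore_index=0)  # assumes pad id 0
    # Translation task: plain text input -> target-language tokens.
    trans_logits = model(batch["src"], batch["tgt_in"])
    l_trans = ce(trans_logits.flatten(0, 1), batch["tgt_out"].flatten())
    # Reconstruction task: entity slots filled with visual features
    # -> original source text.
    rec_logits = model(batch["src"], batch["src_in"],
                       entity_mask=batch["entity_mask"],
                       region_feats=batch["regions"])
    l_rec = ce(rec_logits.flatten(0, 1), batch["src_out"].flatten())
    return l_trans + lambda_rec * l_rec
```

In this reading, the reconstruction loss forces the projected visual features to carry enough information to regenerate the entity words, which is one plausible way to realize the "augmented entity representation" the abstract refers to.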

Source URL: http://ir.ia.ac.cn/handle/173211/52157
Collection: National Laboratory of Pattern Recognition_Natural Language Processing
Corresponding Author: Zong CQ (宗成庆)
Author Affiliations: 1. National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences
2. School of Artificial Intelligence, University of Chinese Academy of Sciences
Recommended Citation (GB/T 7714):
Huang X, Zhang JJ, Zong CQ. Entity-level Cross-modal Learning Improves Multi-modal Machine Translation[C]. Punta Cana, Dominican Republic, 2021-11-7.

Deposit Method: OAI harvesting

Source: Institute of Automation
