Entity-level Cross-modal Learning Improves Multi-modal Machine Translation
Document Type | Conference Paper
Authors | Huang X (黄鑫)1,2; Zhang JJ; Zong CQ (宗成庆)
Publication Date | 2021-11
Conference Date | 2021-11-7
Conference Venue | Punta Cana, Dominican Republic
Abstract | Multi-modal machine translation (MMT) aims to improve translation performance by incorporating visual information. Most studies leverage visual information either by integrating global image features as auxiliary input or by attending to relevant local regions of the image during decoding. However, such usage of visual information makes it difficult to determine how the visual modality helps and why it works. Inspired by the finding of (CITATION) that entities are the most informative elements in an image, we propose an explicit entity-level cross-modal learning approach that aims to augment the entity representation. Specifically, the approach is framed as a reconstruction task that reconstructs the original textual input from a multi-modal input in which entities are replaced with visual features. A multi-task framework then combines the translation task and the reconstruction task to make full use of cross-modal entity representation learning. Extensive experiments demonstrate that our approach achieves comparable or even better performance than state-of-the-art models. Furthermore, our in-depth analysis shows how visual information improves translation.
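The abstract describes a multi-task objective: a translation loss combined with a reconstruction loss, where entity tokens in the source are replaced by visual features and the model must recover the original text. Below is a minimal, hypothetical PyTorch sketch of that idea; the class name, dimensions, entity mask, and dummy data are all illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of the entity-level cross-modal multi-task objective
# described in the abstract. Module names, dimensions, and the dummy data
# below are illustrative assumptions, not the authors' released code.
import torch
import torch.nn as nn

class EntityCrossModalMMT(nn.Module):
    def __init__(self, vocab_size=1000, d_model=256, img_feat_dim=2048):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        # Project regional visual features into the text embedding space
        # so they can stand in for entity token embeddings.
        self.img_proj = nn.Linear(img_feat_dim, d_model)
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True),
            num_layers=2,
        )
        self.translate_head = nn.Linear(d_model, vocab_size)    # translation task
        self.reconstruct_head = nn.Linear(d_model, vocab_size)  # reconstruction task

    def forward(self, src_tokens, entity_mask, img_feats):
        # src_tokens:  (batch, seq)           source-language token ids
        # entity_mask: (batch, seq) bool      True where a token is an entity mention
        # img_feats:   (batch, seq, img_dim)  visual feature aligned to each position
        emb = self.embed(src_tokens)
        vis = self.img_proj(img_feats)
        # Replace entity token embeddings with their projected visual features.
        mixed = torch.where(entity_mask.unsqueeze(-1), vis, emb)
        enc = self.encoder(mixed)
        return self.translate_head(enc), self.reconstruct_head(enc)

model = EntityCrossModalMMT()
src = torch.randint(0, 1000, (2, 8))
mask = torch.zeros(2, 8, dtype=torch.bool)
mask[:, 2] = True  # pretend position 2 is an entity mention
imgs = torch.randn(2, 8, 2048)
trans_logits, recon_logits = model(src, mask, imgs)

ce = nn.CrossEntropyLoss()
tgt = torch.randint(0, 1000, (2, 8))  # dummy target-language token ids
# Multi-task objective: translation loss plus reconstruction of the original text.
loss = ce(trans_logits.reshape(-1, 1000), tgt.reshape(-1)) \
     + ce(recon_logits.reshape(-1, 1000), src.reshape(-1))
loss.backward()
```

The key design point this sketch illustrates is that the reconstruction head can only recover the replaced entity tokens if the projected visual features carry enough information about the entities, which is what pushes the model toward grounded entity representations.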
Source URL | http://ir.ia.ac.cn/handle/173211/52157
Collection | National Laboratory of Pattern Recognition, Natural Language Processing
Corresponding Author | Zong CQ (宗成庆)
Affiliations | 1. National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences 2. School of Artificial Intelligence, University of Chinese Academy of Sciences
Recommended Citation (GB/T 7714) | Huang X, Zhang JJ, Zong CQ. Entity-level Cross-modal Learning Improves Multi-modal Machine Translation[C]. Punta Cana, Dominican Republic, 2021-11-7.
Deposit Method: OAI Harvesting
Source: Institute of Automation