Modal Contrastive Learning Based End-to-End Text Image Machine Translation
Document type: Journal article
Authors | Ma, Cong2,3 |
Journal | IEEE/ACM Transactions on Audio, Speech, and Language Processing (IEEE/ACM TASLP) |
Publication date | 2023-10 |
Issue | 32; Pages: 2153-2165 |
English abstract | Text image machine translation (TIMT) aims at directly translating text in the source language embedded in images into the target language. Most existing systems follow the cascaded pipeline diagram from recognition to translation, which suffers from the problems of error propagation, parameter redundancy, and information reduction. The end-to-end model has the potential to alleviate these issues via bridging the recognition and translation models. However, the challenge is the data limitation and modality gap between text and image. In this paper, we propose a novel end-to-end model, namely Modal contrastive learning based End-to-end Text Image Machine Translation (METIMT), which alleviates these issues through end-to-end text image machine translation architecture and modal contrastive learning. Specifically, an image encoder is designed to encode images into the same feature space of corresponding text sentences, with the guidance of an intra-modal and inter-modal contrastive learning module. To further promote the research of text image machine translation, we have constructed one synthetic and two real-world datasets. Extensive experiments show that our lighter, faster model outperforms not only existing pipeline methods but also state-of-the-art end-to-end models on both synthetic and real-world evaluation sets. Our code and dataset will be released to the public. |
Source URL | [http://ir.ia.ac.cn/handle/173211/57613] |
Collection | National Laboratory of Pattern Recognition_Natural Language Processing
Corresponding author | Zhang, Yaping
Author affiliations | 1. Fanyu AI Laboratory, Zhongke Fanyu Technology Co., Ltd, Beijing 100190, P.R. China 2. State Key Laboratory of Multimodal Artificial Intelligence Systems (MAIS), Institute of Automation, Chinese Academy of Sciences, Beijing, China 3. School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, P.R. China |
Recommended citation GB/T 7714 | Ma, Cong,Han, Xu,Wu, Linghui,et al. Modal Contrastive Learning Based End-to-End Text Image Machine Translation[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing (IEEE/ACM TASLP),2023(32):2153-2165. |
APA | Ma, Cong.,Han, Xu.,Wu, Linghui.,Zhang, Yaping.,Zhao, Yang.,...&Zong, Chengqing.(2023).Modal Contrastive Learning Based End-to-End Text Image Machine Translation.IEEE/ACM Transactions on Audio, Speech, and Language Processing (IEEE/ACM TASLP)(32),2153-2165. |
MLA | Ma, Cong,et al."Modal Contrastive Learning Based End-to-End Text Image Machine Translation".IEEE/ACM Transactions on Audio, Speech, and Language Processing (IEEE/ACM TASLP) .32(2023):2153-2165. |
Ingestion method: OAI harvesting
Source: Institute of Automation