Chinese Academy of Sciences Institutional Repositories Grid
Modal Contrastive Learning Based End-to-End Text Image Machine Translation

Document type: Journal article

Authors: Ma, Cong (2,3); Han, Xu (2,3); Wu, Linghui (2,3); Zhang, Yaping (2,3); Zhao, Yang (2,3); Zhou, Yu (1,2); Zong, Chengqing (2,3)
Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing (IEEE/ACM TASLP)
Publication date: 2023-10
Issue: 32; Pages: 2153-2165
Abstract

Text image machine translation (TIMT) aims at directly translating text in the source language embedded in images into the target language. Most existing systems follow the cascaded pipeline diagram from recognition to translation, which suffers from the problem of error propagation, parameter redundancy, and information reduction. The end-to-end model has the potential to alleviate these issues via bridging the recognition and translation models. However, the challenge is the data limitation and modality gap between text and image. In this paper, we propose a novel end-to-end model, namely Modal contrastive learning based End-to-end Text Image Machine Translation (METIMT), which alleviates these issues through end-to-end text image machine translation architecture and modal contrastive learning. Specifically, an image encoder is designed to encode images into the same feature space of corresponding text sentences, with the guidance of an intra-modal and inter-modal contrastive learning module. To further promote the research of text image machine translation, we have constructed one synthetic and two real-world datasets. Extensive experiments show that our lighter, faster model outperforms not only existing pipeline methods but also state-of-the-art end-to-end models on both synthetic and real-world evaluation sets. Our code and dataset will be released to the public.

Source URL: http://ir.ia.ac.cn/handle/173211/57613
Collection: National Laboratory of Pattern Recognition_Natural Language Processing
Corresponding author: Zhang, Yaping
Author affiliations:
1. Fanyu AI Laboratory, Zhongke Fanyu Technology Co., Ltd, Beijing 100190, P.R. China
2. State Key Laboratory of Multimodal Artificial Intelligence Systems (MAIS), Institute of Automation, Chinese Academy of Sciences, Beijing, China
3. School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, P.R. China
Recommended citation
GB/T 7714
Ma, Cong,Han, Xu,Wu, Linghui,et al. Modal Contrastive Learning Based End-to-End Text Image Machine Translation[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing (IEEE/ACM TASLP),2023(32):2153-2165.
APA Ma, Cong.,Han, Xu.,Wu, Linghui.,Zhang, Yaping.,Zhao, Yang.,...&Zong, Chengqing.(2023).Modal Contrastive Learning Based End-to-End Text Image Machine Translation.IEEE/ACM Transactions on Audio, Speech, and Language Processing (IEEE/ACM TASLP)(32),2153-2165.
MLA Ma, Cong,et al."Modal Contrastive Learning Based End-to-End Text Image Machine Translation".IEEE/ACM Transactions on Audio, Speech, and Language Processing (IEEE/ACM TASLP) .32(2023):2153-2165.

Ingest method: OAI harvesting

Source: Institute of Automation

Unless otherwise stated, all content in this system is protected by copyright, and all rights are reserved.