E2TIMT: Efficient and Effective Modal Adapter for Text Image Machine Translation
Document type: Conference paper
Authors | Ma, Cong1,2 |
Publication date | 2023-08 |
Conference dates | August 21-26, 2023 |
Conference location | San José, California, USA |
Abstract | Text image machine translation (TIMT) aims to translate texts embedded in images from one source language to another target language. Existing methods, both two-stage cascade and one-stage end-to-end architectures, suffer from different issues. The cascade models can benefit from the large-scale optical character recognition (OCR) and MT datasets but the two-stage architecture is redundant. The end-to-end models are efficient but suffer from training data deficiency. To this end, in our paper, we propose an end-to-end TIMT model fully making use of the knowledge from existing OCR and MT datasets to pursue both an effective and efficient framework. More specifically, we build a novel modal adapter effectively bridging the OCR encoder and MT decoder. End-to-end TIMT loss and cross-modal contrastive loss are utilized jointly to align the feature distribution of the OCR and MT tasks. Extensive experiments show that the proposed method outperforms the existing two-stage cascade models and one-stage end-to-end models with a lighter and faster architecture. Furthermore, the ablation studies verify the generalization of our method, where the proposed modal adapter is effective at bridging various OCR and MT models. |
Proceedings | Proceedings of the 17th International Conference on Document Analysis and Recognition (ICDAR 2023) |
Source URL | [http://ir.ia.ac.cn/handle/173211/57621] |
Collection | National Laboratory of Pattern Recognition_Natural Language Processing |
Corresponding author | Zhang, Yaping |
Author affiliations | 1. State Key Laboratory of Multimodal Artificial Intelligence Systems (MAIS), Institute of Automation, Chinese Academy of Sciences, Beijing, China; 2. School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, P.R. China; 3. Fanyu AI Laboratory, Zhongke Fanyu Technology Co., Ltd, Beijing 100190, P.R. China; 4. Samsung Research China - Beijing (SRC-B) |
Recommended citation (GB/T 7714) | Ma, Cong, Zhang, Yaping, Tu, Mei, et al. E2TIMT: Efficient and Effective Modal Adapter for Text Image Machine Translation[C]. In: . San José, California, USA. August 21-26, 2023. |
Ingestion method: OAI harvesting
Source: Institute of Automation
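
The abstract states that an end-to-end TIMT loss and a cross-modal contrastive loss are used jointly to align the OCR and MT feature distributions, but this record does not give their exact form. The following is a minimal sketch only, assuming a symmetric InfoNCE-style contrastive term over paired, pooled image and text feature vectors; the function names, the `temperature` and `weight` parameters, and the weighted-sum combination are illustrative assumptions, not the authors' implementation.

```python
import math


def _normalize(vec):
    """Scale a vector to unit L2 norm (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def _diagonal_cross_entropy(logits):
    """Mean cross-entropy where row i's correct class is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        row_max = max(row)  # subtract the max for numerical stability
        log_z = row_max + math.log(sum(math.exp(x - row_max) for x in row))
        total += log_z - row[i]
    return total / len(logits)


def contrastive_loss(image_feats, text_feats, temperature=0.1):
    """Symmetric InfoNCE-style loss (assumed form).

    image_feats[i] and text_feats[i] are treated as a matching pair;
    every other combination in the batch is treated as a negative.
    """
    img = [_normalize(v) for v in image_feats]
    txt = [_normalize(v) for v in text_feats]
    # Cosine similarity between every image/text pair, scaled by temperature.
    logits = [
        [sum(a * b for a, b in zip(i_vec, t_vec)) / temperature for t_vec in txt]
        for i_vec in img
    ]
    logits_t = [list(col) for col in zip(*logits)]  # text-to-image direction
    return 0.5 * (_diagonal_cross_entropy(logits) + _diagonal_cross_entropy(logits_t))


def joint_loss(timt_loss, image_feats, text_feats, weight=1.0, temperature=0.1):
    """Hypothetical joint objective: translation loss plus weighted contrastive term."""
    return timt_loss + weight * contrastive_loss(image_feats, text_feats, temperature)
```

In this sketch, a batch whose image and text features are correctly paired yields a lower contrastive term than the same batch with the pairs shuffled, which is the alignment pressure the abstract describes.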