Chinese Academy of Sciences Institutional Repositories Grid
E2TIMT: Efficient and Effective Modal Adapter for Text Image Machine Translation

Document type: Conference paper

Authors: Ma, Cong (1,2); Zhang, Yaping (1,2); Tu, Mei (4); Zhao, Yang (1,2); Zhou, Yu (1,3); Zong, Chengqing (1,2)
Publication date: 2023-08
Conference dates: August 21-26, 2023
Conference location: San José, California, USA
Abstract

Text image machine translation (TIMT) aims to translate texts embedded in images from one source language to another target language. Existing methods, both two-stage cascade and one-stage end-to-end architectures, suffer from different issues. The cascade models can benefit from the large-scale optical character recognition (OCR) and MT datasets but the two-stage architecture is redundant. The end-to-end models are efficient but suffer from training data deficiency. To this end, in our paper, we propose an end-to-end TIMT model fully making use of the knowledge from existing OCR and MT datasets to pursue both an effective and efficient framework. More specifically, we build a novel modal adapter effectively bridging the OCR encoder and MT decoder. End-to-end TIMT loss and cross-modal contrastive loss are utilized jointly to align the feature distribution of the OCR and MT tasks. Extensive experiments show that the proposed method outperforms the existing two-stage cascade models and one-stage end-to-end models with a lighter and faster architecture. Furthermore, the ablation studies verify the generalization of our method, where the proposed modal adapter is effective to bridge various OCR and MT models.
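
The abstract states that an end-to-end TIMT loss and a cross-modal contrastive loss are used jointly, but this record does not give their form. The following is only a minimal sketch of one common way such a pairing is written, assuming an InfoNCE-style contrastive term over pooled features. The function names (`cosine`, `info_nce`, `joint_loss`), the `temperature` and `weight` parameters, and the use of cosine similarity are illustrative assumptions, not the authors' implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(image_feats, text_feats, temperature=0.1):
    """Image-to-text InfoNCE loss (an assumed form of the contrastive term).

    image_feats[i] and text_feats[i] are treated as a positive pair;
    every other text feature in the batch acts as a negative.
    """
    total = 0.0
    for i, img in enumerate(image_feats):
        logits = [cosine(img, txt) / temperature for txt in text_feats]
        # Numerically stable log-sum-exp over the batch.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_denom - logits[i]
    return total / len(image_feats)


def joint_loss(timt_loss, image_feats, text_feats, weight=1.0, temperature=0.1):
    """Hypothetical joint objective: translation loss plus weighted contrastive term."""
    return timt_loss + weight * info_nce(image_feats, text_feats, temperature)
```

In this sketch, a batch whose image-side and text-side features are already aligned yields a contrastive term near zero, while mismatched pairs are penalized, which is the sense in which such a term pulls the two feature distributions together.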

Proceedings: Proceedings of the 17th International Conference on Document Analysis and Recognition (ICDAR 2023)
Source URL: http://ir.ia.ac.cn/handle/173211/57621
Collection: National Laboratory of Pattern Recognition, Natural Language Processing
Corresponding author: Zhang, Yaping
Author affiliations:
1. State Key Laboratory of Multimodal Artificial Intelligence Systems (MAIS), Institute of Automation, Chinese Academy of Sciences, Beijing, China
2. School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, P.R. China
3. Fanyu AI Laboratory, Zhongke Fanyu Technology Co., Ltd, Beijing 100190, P.R. China
4. Samsung Research China - Beijing (SRC-B)
Recommended citation
GB/T 7714
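Ma, Cong, Zhang, Yaping, Tu, Mei, et al. E2TIMT: Efficient and Effective Modal Adapter for Text Image Machine Translation[C]. In: Proceedings of the 17th International Conference on Document Analysis and Recognition (ICDAR 2023). San José, California, USA. August 21-26, 2023.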

Ingestion method: OAI harvesting

Source: Institute of Automation
