Chinese Academy of Sciences Institutional Repositories Grid
Multi-teacher Knowledge Distillation for End-to-End Text Image Machine Translation

Document type: Conference paper

Authors: Ma, Cong1,2; Zhang, Yaping1,2; Tu, Mei3; Zhao, Yang1,2; Zhou, Yu2,4; Zong, Chengqing1,2
Publication date: 2023-08
Conference dates: August 21-26, 2023
Conference venue: San José, California, USA
Abstract

Text image machine translation (TIMT) has been widely used in various real-world applications; it translates source-language text embedded in images into a target-language sentence. Existing methods for TIMT fall into two main categories: the recognition-then-translation pipeline model and the end-to-end model. However, how to transfer knowledge from the pipeline model into the end-to-end model remains an unsolved problem. In this paper, we propose a novel Multi-Teacher Knowledge Distillation (MTKD) method to effectively distill knowledge from the pipeline model into the end-to-end TIMT model. Specifically, three teachers are utilized to improve the performance of the end-to-end TIMT model. The image encoder in the end-to-end TIMT model is optimized with knowledge distillation guidance from the recognition teacher encoder, while the sequential encoder and decoder are improved by transferring knowledge from the translation sequential-encoder and decoder teacher models. Furthermore, both token-level and sentence-level knowledge distillation are incorporated to further boost translation performance. Extensive experimental results show that our proposed MTKD effectively improves text image translation performance and outperforms existing end-to-end and pipeline models with fewer parameters and less decoding time, illustrating that MTKD can take advantage of both pipeline and end-to-end models.
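The distillation objectives described in the abstract can be sketched as follows. This is a minimal NumPy illustration with hypothetical tensor shapes and loss weights (`alpha`, `beta`, `gamma`), not the paper's actual implementation; sentence-level distillation (training the student on the teacher's fully decoded output) is omitted for brevity.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def feature_kd_loss(student_feat, teacher_feat):
    """Feature-level distillation: MSE between student and teacher
    encoder outputs (e.g. image encoder vs. recognition-teacher encoder)."""
    return float(np.mean((student_feat - teacher_feat) ** 2))

def token_kd_loss(student_logits, teacher_logits, temperature=2.0):
    """Token-level distillation: KL(teacher || student) over
    temperature-softened output distributions, averaged over tokens."""
    p_t = softmax(teacher_logits / temperature)
    p_s = softmax(student_logits / temperature)
    kl = np.sum(p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12)), axis=-1)
    return float(np.mean(kl) * temperature ** 2)

def mtkd_loss(img_enc_s, img_enc_t, seq_enc_s, seq_enc_t,
              dec_logits_s, dec_logits_t,
              alpha=1.0, beta=1.0, gamma=1.0):
    """Combine the three teachers' guidance (hypothetical weights):
    recognition-teacher encoder features, translation-teacher sequential
    encoder features, and translation-teacher decoder distributions."""
    return (alpha * feature_kd_loss(img_enc_s, img_enc_t)
            + beta * feature_kd_loss(seq_enc_s, seq_enc_t)
            + gamma * token_kd_loss(dec_logits_s, dec_logits_t))
```

In this reading, the combined loss is zero when the student exactly matches all three teachers, and each weight controls how strongly one teacher's signal shapes the corresponding student component.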

Proceedings: Proceedings of the 17th International Conference on Document Analysis and Recognition (ICDAR 2023)
Source URL: http://ir.ia.ac.cn/handle/173211/57620
Collection: State Key Laboratory of Pattern Recognition_Natural Language Processing
Corresponding author: Zhang, Yaping
Author affiliations:
1.School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, P.R. China
2.State Key Laboratory of Multimodal Artificial Intelligence Systems (MAIS), Institute of Automation, Chinese Academy of Sciences, Beijing, China
3.Samsung Research China - Beijing (SRC-B)
4.Fanyu AI Laboratory, Zhongke Fanyu Technology Co., Ltd, Beijing 100190, P.R. China
Recommended citation (GB/T 7714):
Ma, Cong, Zhang, Yaping, Tu, Mei, et al. Multi-teacher Knowledge Distillation for End-to-End Text Image Machine Translation[C]. In: Proceedings of the 17th International Conference on Document Analysis and Recognition (ICDAR 2023). San José, California, USA, August 21-26, 2023.

Deposit method: OAI harvesting

Source: Institute of Automation


Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.