Chinese Academy of Sciences Institutional Repositories Grid
Gloss-free Sign Language Translation: Improving from Visual-Language Pretraining

Document Type: Conference Paper

Authors: Benjia Zhou 2; Zhigang Chen 3,4; Albert Clapes 1,5; Jun Wan 2,3,4; Yanyan Liang 2; Sergio Escalera 1,5,6; Zhen Lei 3,4,7; Du Zhang 2
Publication Date: 2023-07
Conference Date: 2023-10
Conference Venue: Paris, France
Abstract

Sign Language Translation (SLT) is a challenging task due to its cross-domain nature, involving the translation of visual-gestural language to text. Many previous methods employ an intermediate representation, i.e., gloss sequences, to facilitate SLT, thus transforming it into a two-stage task of sign language recognition (SLR) followed by sign language translation (SLT). However, the scarcity of gloss-annotated sign language data, combined with the information bottleneck in the mid-level gloss representation, has hindered the further development of the SLT task. To address this challenge, we propose a novel Gloss-Free SLT method based on Visual-Language Pretraining (GFSLT-VLP), which improves SLT by inheriting language-oriented prior knowledge from pre-trained models, without any gloss annotation assistance. Our approach involves two stages: (i) integrating Contrastive Language-Image Pre-training (CLIP) with masked self-supervised learning to create pre-tasks that bridge the semantic gap between visual and textual representations and restore masked sentences, and (ii) constructing an end-to-end architecture with an encoder-decoder-like structure that inherits the parameters of the pre-trained Visual Encoder and Text Decoder from the first stage. The seamless combination of these novel designs forms a robust sign language representation and significantly improves gloss-free sign language translation. In particular, we have achieved unprecedented improvements in terms of BLEU-4 score on the PHOENIX14T dataset (>=+5) and the CSL-Daily dataset (>=+3) compared to state-of-the-art gloss-free SLT methods. Furthermore, our approach also achieves competitive results on the PHOENIX14T dataset when compared with most of the gloss-based methods.
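The core of stage (i) is a CLIP-style symmetric contrastive objective that pulls matched visual and textual embeddings together. The sketch below is a minimal, hedged illustration of that kind of loss, not the authors' implementation: the function name, the batch shapes, and the temperature value are assumptions for illustration only.

```python
import numpy as np

def clip_style_contrastive_loss(visual_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired embeddings.

    visual_emb, text_emb: (batch, dim) arrays; row i of each is a matched
    sign-video / sentence pair. Returns a scalar loss (lower = better aligned).
    """
    # L2-normalize rows so dot products become cosine similarities.
    v = visual_emb / np.linalg.norm(visual_emb, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)

    logits = (v @ t.T) / temperature        # (batch, batch) similarity matrix
    labels = np.arange(logits.shape[0])     # matched pairs lie on the diagonal

    def cross_entropy(lg):
        # Numerically stable row-wise log-softmax, then pick diagonal entries.
        lg = lg - lg.max(axis=1, keepdims=True)
        log_probs = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -log_probs[labels, labels].mean()

    # Average the vision->text and text->vision directions, as in CLIP.
    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))
```

Correctly paired batches should score a lower loss than mispaired ones, which is what the pre-task exploits to bridge the visual-textual semantic gap.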

Source URL: [http://ir.ia.ac.cn/handle/173211/57266]
Collection: Institute of Automation, State Key Laboratory of Pattern Recognition, Center for Biometrics and Security Research
Corresponding Author: Jun Wan
Author Affiliations:
1.Universitat de Barcelona, Spain
2.MUST, Macau, China
3.UCAS, China
4.MAIS, CASIA, China
5.Computer Vision Center, Spain
6.AAU, Aalborg, Denmark
7.CAIR, HKISI, CAS, Hong Kong, China
Recommended Citation (GB/T 7714):
Benjia Zhou, Zhigang Chen, Albert Clapes, et al. Gloss-free Sign Language Translation: Improving from Visual-Language Pretraining[C]. Paris, France, 2023-10.

Deposit Method: OAI harvesting

Source: Institute of Automation


Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.