Chinese Academy of Sciences Institutional Repositories Grid
A Pyramid Semi-Autoregressive Transformer with Rich Semantics for Sign Language Production

Document Type: Journal Article

Authors: Cui, Zhenchao 2; Chen, Ziang 2; Li, Zhaoxin 1; Wang, Zhaoqi 1
Journal: SENSORS
Publication Date: 2022-12-01
Volume: 22; Issue: 24; Pages: 15
Keywords: human pose generation; sign language production; semi-autoregressive transformer; deep learning
DOI: 10.3390/s22249606
Abstract: As a typical sequence-to-sequence task, sign language production (SLP) aims to automatically translate spoken language sentences into the corresponding sign language sequences. Existing SLP methods can be classified into two categories: autoregressive and non-autoregressive SLP. Autoregressive methods suffer from high latency and error accumulation caused by the long-term dependence between the current output and the previous poses, while non-autoregressive methods suffer from repetition and omission during the parallel decoding process. To remedy these issues in SLP, we propose a novel method named Pyramid Semi-Autoregressive Transformer with Rich Semantics (PSAT-RS) in this paper. In PSAT-RS, we first introduce a pyramid semi-autoregressive mechanism that divides the target sequence into groups in a coarse-to-fine manner, which globally keeps the autoregressive property while locally generating target frames in parallel. Meanwhile, a relaxed masked attention mechanism is adopted so that the decoder not only captures the pose sequences in the previous groups but also attends to the current group. Finally, considering the importance of spatial-temporal information, we also design a Rich Semantics embedding (RS) module to encode sequential information along both the time dimension and spatial displacement into the same high-dimensional space. This significantly improves the coordination of joint motion, making the generated sign language videos more natural. Results of our experiments conducted on the RWTH-PHOENIX-Weather-2014T and CSL datasets show that the proposed PSAT-RS is competitive with state-of-the-art autoregressive and non-autoregressive SLP models, achieving a better trade-off between speed and accuracy.
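The relaxed masked attention described in the abstract can be illustrated as a group-level causal mask: each decoder position attends to every frame in earlier groups (autoregressive across groups) and to every frame within its own group (parallel within a group). The sketch below is an illustration of this general semi-autoregressive masking idea, assuming a fixed group size; the function name and interface are hypothetical and not taken from the paper's code.

```python
import numpy as np

def group_causal_mask(seq_len: int, group_size: int) -> np.ndarray:
    """Boolean attention mask for semi-autoregressive decoding.

    Position i may attend to position j iff j's group index is <= i's:
    earlier groups are fully visible, and all frames inside the same
    group see each other, so a whole group can be generated in parallel.
    """
    groups = np.arange(seq_len) // group_size       # group index per position
    return groups[None, :] <= groups[:, None]       # True = attention allowed

# Example: 6 frames in groups of 2 -> block lower-triangular mask.
mask = group_causal_mask(6, 2)
```

With `group_size=1` this reduces to the standard causal mask of autoregressive decoding, and with `group_size=seq_len` to fully parallel non-autoregressive decoding; the pyramid scheme in the paper varies the grouping coarse-to-fine between these extremes.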
Funding: National Key Research and Development Program of China; Postgraduate's Innovation Fund Project of Hebei University; National Natural Science Foundation of China; Scientific Research Foundation for Talented Scholars of Hebei University; Scientific Research Foundation of Colleges and Universities in Hebei Province; [2020YFC1523302]; [HBU2022ss014]; [62172392]; [521100221081]; [QN2022107]
WOS Research Areas: Chemistry; Engineering; Instruments & Instrumentation
Language: English
Publisher: MDPI
WOS Record: WOS:000902932900001
Source URL: [http://119.78.100.204/handle/2XEOYT63/20179]
Collection: Journal Articles, Institute of Computing Technology, Chinese Academy of Sciences
Corresponding Author: Li, Zhaoxin
Affiliations: 1. Chinese Acad Sci, Inst Comp Technol, Beijing 100190, Peoples R China
2. Hebei Univ, Hebei Machine Vis Engn Res Ctr, Sch Cyber Secur & Comp, Baoding 071002, Peoples R China
Recommended Citation:
GB/T 7714: Cui, Zhenchao, Chen, Ziang, Li, Zhaoxin, et al. A Pyramid Semi-Autoregressive Transformer with Rich Semantics for Sign Language Production[J]. SENSORS, 2022, 22(24): 15.
APA: Cui, Zhenchao, Chen, Ziang, Li, Zhaoxin, & Wang, Zhaoqi. (2022). A Pyramid Semi-Autoregressive Transformer with Rich Semantics for Sign Language Production. SENSORS, 22(24), 15.
MLA: Cui, Zhenchao, et al. "A Pyramid Semi-Autoregressive Transformer with Rich Semantics for Sign Language Production". SENSORS 22.24 (2022): 15.

Deposit Method: OAI harvesting

Source: Institute of Computing Technology


Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.