Chinese Academy of Sciences Institutional Repositories Grid
A Pyramid Semi-Autoregressive Transformer with Rich Semantics for Sign Language Production

Document Type: Journal Article

Authors: Cui, Zhenchao 2; Chen, Ziang 2; Li, Zhaoxin 1; Wang, Zhaoqi 1
Journal: SENSORS
Publication Date: 2022-12-01
Volume: 22; Issue: 24; Pages: 15
Keywords: human pose generation; sign language production; semi-autoregressive transformer; deep learning
DOI: 10.3390/s22249606
Abstract: As a typical sequence-to-sequence task, sign language production (SLP) aims to automatically translate spoken language sentences into the corresponding sign language sequences. Existing SLP methods can be classified into two categories: autoregressive and non-autoregressive SLP. Autoregressive methods suffer from high latency and error accumulation caused by the long-term dependence between the current output and the previous poses, while non-autoregressive methods suffer from repetition and omission during the parallel decoding process. To remedy these issues in SLP, we propose a novel method named Pyramid Semi-Autoregressive Transformer with Rich Semantics (PSAT-RS) in this paper. In PSAT-RS, we first introduce a pyramid semi-autoregressive mechanism that divides the target sequence into groups in a coarse-to-fine manner, which globally keeps the autoregressive property while locally generating target frames in parallel. Meanwhile, a relaxed masked attention mechanism is adopted so that the decoder not only captures the pose sequences in the previous groups but also attends to the current group. Finally, considering the importance of spatial-temporal information, we also design a Rich Semantics embedding (RS) module to encode sequential information along both the time dimension and spatial displacement into the same high-dimensional space. This significantly improves the coordination of joint motion, making the generated sign language videos more natural. Results of our experiments conducted on the RWTH-PHOENIX-Weather-2014T and CSL datasets show that the proposed PSAT-RS is competitive with state-of-the-art autoregressive and non-autoregressive SLP models, achieving a better trade-off between speed and accuracy.
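The relaxed masked attention described in the abstract can be illustrated as a group-level causal mask: each decoder position attends to every frame in earlier groups (autoregressive across groups) and to every frame within its own group (parallel within a group). The sketch below is an illustration of this general semi-autoregressive masking idea, assuming a fixed group size; the function name and interface are hypothetical and not taken from the paper's code.

```python
import numpy as np

def group_causal_mask(seq_len: int, group_size: int) -> np.ndarray:
    """Boolean attention mask for semi-autoregressive decoding.

    Position i may attend to position j iff j's group index is <= i's:
    earlier groups are fully visible, and all frames inside the same
    group see each other, so a whole group can be generated in parallel.
    """
    groups = np.arange(seq_len) // group_size       # group index per position
    return groups[None, :] <= groups[:, None]       # True = attention allowed

# Example: 6 frames in groups of 2 -> block lower-triangular mask.
mask = group_causal_mask(6, 2)
```

With `group_size=1` this reduces to the standard causal mask of autoregressive decoding, and with `group_size=seq_len` to fully parallel non-autoregressive decoding; the pyramid scheme in the paper varies the grouping coarse-to-fine between these extremes.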
Funding: National Key Research and Development Program of China; Postgraduate's Innovation Fund Project of Hebei University; National Natural Science Foundation of China; Scientific Research Foundation for Talented Scholars of Hebei University; Scientific Research Foundation of Colleges and Universities in Hebei Province; [2020YFC1523302]; [HBU2022ss014]; [62172392]; [521100221081]; [QN2022107]
WOS Research Areas: Chemistry; Engineering; Instruments & Instrumentation
Language: English
Publisher: MDPI
WOS Record: WOS:000902932900001
Source URL: [http://119.78.100.204/handle/2XEOYT63/20179]
Collection: Journal Articles, Institute of Computing Technology, Chinese Academy of Sciences
Corresponding Author: Li, Zhaoxin
Affiliations: 1. Chinese Acad Sci, Inst Comp Technol, Beijing 100190, Peoples R China
2. Hebei Univ, Hebei Machine Vis Engn Res Ctr, Sch Cyber Secur & Comp, Baoding 071002, Peoples R China
Recommended Citation:
GB/T 7714: Cui, Zhenchao, Chen, Ziang, Li, Zhaoxin, et al. A Pyramid Semi-Autoregressive Transformer with Rich Semantics for Sign Language Production[J]. SENSORS, 2022, 22(24): 15.
APA: Cui, Zhenchao, Chen, Ziang, Li, Zhaoxin, & Wang, Zhaoqi. (2022). A Pyramid Semi-Autoregressive Transformer with Rich Semantics for Sign Language Production. SENSORS, 22(24), 15.
MLA: Cui, Zhenchao, et al. "A Pyramid Semi-Autoregressive Transformer with Rich Semantics for Sign Language Production". SENSORS 22.24 (2022): 15.

Deposit Method: OAI harvesting

Source: Institute of Computing Technology


Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.