Chinese Academy of Sciences Institutional Repositories Grid

Dual-Path Transformer for 3D Human Pose Estimation

Document type: Journal article


Authors	Zhou Lu (4); Chen Yingying (4); Wang Jinqiao (1,2,3,4)
Journal	IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY
Publication date	2024
Volume	34 Issue: 5 Pages: 3260-3270
Abstract	Video-based 3D human pose estimation has achieved great progress; however, it is still difficult to learn a precise 2D-3D projection in some hard cases. Multi-level human knowledge and motion information are two key elements for overcoming these challenges: the former encodes human structure information spatially, and the latter captures motion change temporally. Inspired by this, we propose DualFormer (dual-path transformer), a network that encodes multiple human contexts and motion detail to perform spatial-temporal modeling. Firstly, motion information, which depicts the movement change of the human body, is embedded to provide an explicit motion prior for the transformer module. Secondly, a dual-path transformer framework is proposed to model long-range dependencies of both the joint sequence and the limb sequence. Parallel context embedding is performed initially, and a cross transformer block is then appended to promote interaction between the dual paths, which greatly improves feature robustness. Specifically, predictions of multiple levels can be acquired simultaneously. Lastly, we employ a weighted distillation technique to accelerate the convergence of the dual-path framework. We conduct extensive experiments on three benchmarks, i.e., Human3.6M, MPI-INF-3DHP and HumanEva-I. We mainly compute the MPJPE, P-MPJPE, PCK and AUC to evaluate the effectiveness of the proposed approach, and our work achieves competitive results compared with state-of-the-art approaches. Specifically, the MPJPE is reduced to 42.8 mm on Human3.6M, 1.5 mm lower than PoseFormer, which demonstrates the efficacy of the proposed approach.
Source URL	[http://ir.ia.ac.cn/handle/173211/57148]
Collection	Zidong Taichu Foundation Model Research Center (紫东太初大模型研究中心)
Corresponding author	Chen Yingying
Author affiliations	1. Wuhan AI Research; 2. School of Artificial Intelligence, University of Chinese Academy of Sciences; 3. Peng Cheng Laboratory; 4. Foundation Model Research Center, Institute of Automation, Chinese Academy of Sciences
Recommended citation (GB/T 7714)	Zhou Lu, Chen Yingying, Wang Jinqiao. Dual-Path Transformer for 3D Human Pose Estimation[J]. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34(5): 3260-3270.
APA	Zhou Lu, Chen Yingying, & Wang Jinqiao. (2024). Dual-Path Transformer for 3D Human Pose Estimation. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 34(5), 3260-3270.
MLA	Zhou Lu, et al. "Dual-Path Transformer for 3D Human Pose Estimation." IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY 34.5 (2024): 3260-3270.
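
The abstract reports results in MPJPE and PCK without defining them. As a rough illustration only, the sketch below computes both under their commonly used definitions: MPJPE as the mean Euclidean distance between predicted and ground-truth 3D joints, and PCK as the percentage of joints whose error falls within a threshold. The function names, the toy coordinates and the 10 mm threshold are assumptions made for this example; they are not taken from the paper, and the authors' evaluation code may differ (for instance in root alignment, or the rigid alignment used for P-MPJPE).

```python
import math


def mpjpe(pred, gt):
    """Mean per-joint position error: average Euclidean distance between
    corresponding predicted and ground-truth 3D joints (same unit as input)."""
    if len(pred) != len(gt) or not pred:
        raise ValueError("pred and gt must be non-empty and of equal length")
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(pred)


def pck(pred, gt, threshold):
    """Percentage of joints whose Euclidean error is at most `threshold`."""
    if len(pred) != len(gt) or not pred:
        raise ValueError("pred and gt must be non-empty and of equal length")
    hits = sum(1 for p, g in zip(pred, gt) if math.dist(p, g) <= threshold)
    return 100.0 * hits / len(pred)


# Toy two-joint pose in millimetres (illustrative values, not real data).
gt = [(0.0, 0.0, 0.0), (100.0, 0.0, 0.0)]
pred = [(3.0, 4.0, 0.0), (100.0, 0.0, 12.0)]

# Per-joint errors are 5 mm and 12 mm.
print(mpjpe(pred, gt))                # 8.5
print(pck(pred, gt, threshold=10.0))  # 50.0
```

In practice these metrics are averaged over every frame and joint of a test set; the two-joint pose here only makes the arithmetic easy to check by hand.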

Ingestion method: OAI harvesting

Source: Institute of Automation
