Chinese Academy of Sciences Institutional Repositories Grid
Searching Multi-Rate and Multi-Modal Temporal Enhanced Networks for Gesture Recognition

Document type: Journal article

Authors: Zitong, Yu [4]; Benjia, Zhou [5]; Jun, Wan [6]; Pichao, Wang [1]; Haoyu, Chen [4]; Xin, Liu [2]; Stan, Z., Li [3]; Guoying, Zhao [4]
Journal: IEEE Transactions on Image Processing
Publication date: 2021
Volume: 30; Pages: 5626-5640
Abstract (English)

Gesture recognition has attracted considerable attention owing to its great potential in applications. Although great progress has recently been made in multi-modal learning, existing methods still lack effective integration to fully exploit the synergies among spatio-temporal modalities for gesture recognition. This is partly because existing manually designed network architectures are inefficient at jointly learning multiple modalities. In this paper, we propose the first neural architecture search (NAS)-based method for RGB-D gesture recognition. The proposed method includes two key components: 1) enhanced temporal representation via the proposed 3D Central Difference Convolution (3D-CDC) family, which captures rich temporal context by aggregating temporal difference information; and 2) optimized backbones for multi-sampling-rate branches and lateral connections among varied modalities. The resultant multi-modal multi-rate network provides a new perspective for understanding the relationship between the RGB and depth modalities and their temporal dynamics. Comprehensive experiments are performed on three benchmark datasets (IsoGD, NvGesture, and EgoGesture), demonstrating state-of-the-art performance in both single- and multi-modality settings. The code is available at https://github.com/ZitongYu/3DCDC-NAS.
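
The abstract does not define the 3D-CDC operator, so the following is only a minimal sketch under stated assumptions. It assumes the generic central-difference form, in which the output is a vanilla convolution minus `theta` times the center input times the sum of the kernel weights that take part in the difference term. The function name `cdc3d`, the `temporal_only` flag, the single-channel 3x3x3 setting, and the default `theta` are all hypothetical choices made for illustration; the linked repository holds the authors' actual implementation.

```python
def cdc3d(x, w, theta=0.5, temporal_only=False):
    """Valid (unpadded) 3x3x3 central difference convolution, one channel.

    x: nested list indexed [t][h][w].
    w: 3x3x3 kernel indexed [dt][dh][dw].
    theta: weight of the difference term (0 gives a vanilla convolution).
    temporal_only: if True, only the two adjacent-frame kernel slices
        contribute to the difference term (an assumed temporal variant).
    """
    T, H, W = len(x), len(x[0]), len(x[0][0])
    # Kernel time slices that take part in the difference term.
    diff_slices = (0, 2) if temporal_only else (0, 1, 2)
    w_diff = sum(
        w[dt][dh][dw]
        for dt in diff_slices for dh in range(3) for dw in range(3)
    )
    out = []
    for t in range(T - 2):
        plane = []
        for h in range(H - 2):
            row = []
            for c in range(W - 2):
                # Ordinary convolution response at this location.
                vanilla = sum(
                    w[dt][dh][dw] * x[t + dt][h + dh][c + dw]
                    for dt in range(3) for dh in range(3) for dw in range(3)
                )
                # Center of the 3x3x3 receptive field.
                center = x[t + 1][h + 1][c + 1]
                row.append(vanilla - theta * center * w_diff)
            plane.append(row)
        out.append(plane)
    return out


# A clip that is constant in space and time has no differences to aggregate.
clip = [[[2.0] * 3 for _ in range(3)] for _ in range(3)]
kernel = [[[1.0] * 3 for _ in range(3)] for _ in range(3)]
print(cdc3d(clip, kernel, theta=1.0))  # [[[0.0]]]
```

With `theta=1.0` the full spatio-temporal variant responds only to differences from the center value, so a constant clip yields zero. With `temporal_only=True` the same clip yields the response of the middle kernel slice alone, because only the adjacent frames are differenced.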

Language: English
Source URL: http://ir.ia.ac.cn/handle/173211/57115
Collection: State Key Laboratory of Multimodal Artificial Intelligence Systems
Corresponding authors: Jun, Wan; Guoying, Zhao
Author affiliations:
1. Alibaba Group
2. Tianjin University
3. Westlake University
4. University of Oulu
5. Macau University of Science and Technology
6. Institute of Automation, Chinese Academy of Sciences
Recommended citation
GB/T 7714: Zitong, Yu,Benjia, Zhou,Jun, Wan,et al. Searching Multi-Rate and Multi-Modal Temporal Enhanced Networks for Gesture Recognition[J]. IEEE Transactions on Image Processing, 2021, 30: 5626-5640.
APA: Zitong, Yu.,Benjia, Zhou.,Jun, Wan.,Pichao, Wang.,Haoyu, Chen.,...&Guoying, Zhao. (2021). Searching Multi-Rate and Multi-Modal Temporal Enhanced Networks for Gesture Recognition. IEEE Transactions on Image Processing, 30, 5626-5640.
MLA: Zitong, Yu,et al. "Searching Multi-Rate and Multi-Modal Temporal Enhanced Networks for Gesture Recognition". IEEE Transactions on Image Processing 30 (2021): 5626-5640.

Ingest method: OAI harvesting

Source: Institute of Automation

