A Unified Multimodal De- and Re-Coupling Framework for RGB-D Motion Recognition
Document type: Journal article
Authors | Zhou, Benjia1; Wang, Pichao2,3; Wan, Jun1,4,5; Liang, Yanyan1; Wang, Fan2 |
Journal | IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE |
Publication date | 2023-10-01 |
Volume | 45 |
Issue | 10 |
Pages | 11428-11442 |
ISSN | 0162-8828 |
Keywords | Spatiotemporal phenomena; Representation learning; Training; Optimization; Task analysis; Three-dimensional displays; Solid modeling; Complement feature; late fusion; motion recognition; RGB-D; video augmentation |
DOI | 10.1109/TPAMI.2023.3274783 |
Corresponding authors | Wang, Pichao (pichaowang@gmail.com); Wan, Jun (jun.wan@ia.ac.cn) |
Abstract | Motion recognition is a promising direction in computer vision, but training video classification models is much harder than training image models because of insufficient data and the large number of parameters. To get around this, some works explore multimodal cues from RGB-D data. Although these methods improve motion recognition to some extent, they remain sub-optimal in the following aspects: (i) data augmentation, i.e., the scale of RGB-D datasets is still limited, and few efforts have been made to explore novel data augmentation strategies for videos; (ii) optimization mechanism, i.e., the tightly space-time-entangled network structure makes spatiotemporal information modeling more challenging; and (iii) cross-modal knowledge fusion, i.e., the high similarity between multimodal representations leads to insufficient late fusion. To alleviate these drawbacks, this article improves RGB-D-based motion recognition from both the data and the algorithm perspectives. First, we introduce a novel video data augmentation method dubbed ShuffleMix, which acts as a supplement to MixUp and provides additional temporal regularization for motion recognition. Second, a Unified Multimodal De-coupling and multi-stage Re-coupling framework, termed UMDR, is proposed for video representation learning. Finally, a novel cross-modal Complement Feature Catcher (CFCer) is explored to mine potential common features in multimodal information as an auxiliary fusion stream, improving the late fusion results. The seamless combination of these designs forms a robust spatiotemporal representation and achieves better performance than state-of-the-art methods on four public motion datasets. Specifically, UMDR achieves an unprecedented improvement of 4.5% on the Chalearn IsoGD dataset. |
WoS keywords | SCALE GESTURE RECOGNITION ; FUSION ; NETWORKS |
Funding projects | National Key Research and Development Plan ; External cooperation key project of Chinese Academy Sciences[2021YFE0205700] ; Science and Technology Development Fund of Macau[173211KYSB20200002] ; Science and Technology Development Fund of Macau[0123/2022/A3] ; Science and Technology Development Fund of Macau[0070/2020/AMJ] ; Guangdong Provincial Key R&D Programme[0004/2020/A1] ; Open Research Projects of Zhejiang Lab[2019B010148001] ; CCF-Zhipu AI Large Model OF[2021KH0AB07] ; Alibaba Group through Alibaba Research Intern Program ; [202219] |
WoS research areas | Computer Science ; Engineering |
Language | English |
Publisher | IEEE COMPUTER SOC |
WoS accession number | WOS:001068816800002 |
Funding agencies | National Key Research and Development Plan ; External cooperation key project of Chinese Academy Sciences ; Science and Technology Development Fund of Macau ; Guangdong Provincial Key R&D Programme ; Open Research Projects of Zhejiang Lab ; CCF-Zhipu AI Large Model OF ; Alibaba Group through Alibaba Research Intern Program |
Source URL | [http://ir.ia.ac.cn/handle/173211/53023] |
Collection | State Key Laboratory of Multimodal Artificial Intelligence Systems |
Corresponding authors | Wang, Pichao; Wan, Jun |
Affiliations | 1. Macau Univ Sci & Technol, Taipa 999078, Macau, Peoples R China; 2. Alibaba Grp US Inc, DAMO Acad, Bellevue, WA 98004 USA; 3. Amazon, Seattle, WA 98109 USA; 4. Chinese Acad Sci CASIA, Inst Automat, State Key Lab Multimodal Artificial Intelligence S, Beijing 100190, Peoples R China; 5. Univ Chinese Acad Sci UCAS, Sch Artificial Intelligence, Beijing 100049, Peoples R China |
Recommended citation (GB/T 7714) | Zhou, Benjia,Wang, Pichao,Wan, Jun,et al. A Unified Multimodal De- and Re-Coupling Framework for RGB-D Motion Recognition[J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE,2023,45(10):11428-11442. |
APA | Zhou, Benjia,Wang, Pichao,Wan, Jun,Liang, Yanyan,&Wang, Fan.(2023).A Unified Multimodal De- and Re-Coupling Framework for RGB-D Motion Recognition.IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE,45(10),11428-11442. |
MLA | Zhou, Benjia,et al."A Unified Multimodal De- and Re-Coupling Framework for RGB-D Motion Recognition".IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE 45.10(2023):11428-11442. |
Ingestion method: OAI harvesting
Source: Institute of Automation
Unless otherwise stated, all content in this system is protected by copyright, and all rights are reserved.