Chinese Academy of Sciences Institutional Repositories Grid
Attentional Composition Networks for Long-Tailed Human Action Recognition

Document type: Journal article

Authors: Wang, Haoran (5); Wang, Yajie (5); Yu, Baosheng (1); Zhan, Yibing (2); Yuan, Chunfeng (3); Yang, Wankou (4)
Journal: ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS
Publication date: 2024
Volume: 20; Issue: 1; Pages: 18
ISSN: 1551-6857
Keywords: Compositional learning; long tail; few-shot; zero-shot; action recognition
DOI: 10.1145/3603253
Corresponding author: Wang, Haoran (wanghaoran@ise.neu.edu.cn)
Abstract: The problem of long-tailed visual recognition has been receiving increasing research attention. However, the long-tailed distribution problem remains underexplored for video-based visual recognition. To address this issue, in this article we propose a compositional learning based solution for video-based human action recognition. Our method, named Attentional Composition Networks (ACN), first learns verb-like and preposition-like components, then shuffles these components to generate samples for the tail classes in the feature space to augment the data for the tail classes. Specifically, during training, we represent each action video by a graph that captures the spatial-temporal relations (edges) among detected human/object instances (nodes). Then, ACN utilizes the position information to decompose each action into a set of verb and preposition representations using the edge features in the graph. After that, the verb and preposition features from different videos are combined via an attention structure to synthesize feature representations for tail classes. This way, we can enrich the data for the tail classes and consequently improve the action recognition for these classes. To evaluate compositional human action recognition, we further contribute a new human action recognition dataset, namely NEU-Interaction (NEU-I). Experimental results on both Something-Something V2 and the proposed NEU-I demonstrate the effectiveness of the proposed method for long-tailed, few-shot, and zero-shot problems in human action recognition. Source code and the NEU-I dataset are available at https://github.com/YajieW99/ACN.
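The composition step described in the abstract can be illustrated with a minimal sketch. It is not the ACN implementation (the authors' code is at the GitHub link in the abstract); it assumes that verb and preposition features have already been extracted as plain vectors, and it uses hypothetical helper names (`attend`, `compose`) with a simple scaled dot-product attention standing in for whatever attention structure the paper actually uses.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax (subtract the max before exponentiating)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Vector, keys: List[Vector]) -> Vector:
    """Scaled dot-product attention where the keys also serve as the values.

    Returns the attention-weighted average of `keys`; `keys` must be non-empty.
    """
    d = len(query)
    weights = softmax([dot(query, k) / math.sqrt(d) for k in keys])
    return [sum(w * k[i] for w, k in zip(weights, keys)) for i in range(d)]


def compose(verb: Vector, prepositions: List[Vector]) -> Vector:
    """Hypothetical composition: verb feature (from one video) concatenated with
    an attention-pooled preposition feature (from other videos), yielding a
    synthetic feature-space sample for a tail class."""
    return verb + attend(verb, prepositions)


if __name__ == "__main__":
    verb_from_video_a = [1.0, 0.0]
    prepositions_from_other_videos = [[1.0, 0.0], [0.0, 1.0]]
    print(compose(verb_from_video_a, prepositions_from_other_videos))
```

Using the verb as the attention query is one simple way to weight candidate preposition components by their compatibility with that verb; recombining components taken from different videos is what lets such a scheme produce extra samples for under-represented classes without new videos.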
Funding: Major Science and Technology Innovation 2030 New Generation Artificial Intelligence key project [2021ZD0111700]; Fundamental Research Funds for the Central Universities of China [N2304012]; National Natural Science Foundation of China [61773117; 61972397; 62276061; 62002090]
WOS research area: Computer Science
Language: English
Publisher: ASSOC COMPUTING MACHINERY
WOS accession number: WOS:001080441800008
Funding agencies: Major Science and Technology Innovation 2030 New Generation Artificial Intelligence key project; Fundamental Research Funds for the Central Universities of China; National Natural Science Foundation of China
Source URL: http://ir.ia.ac.cn/handle/173211/52978
Collection: State Key Laboratory of Multimodal Artificial Intelligence Systems
Affiliations:
1. Univ Sydney, Sch Comp Sci, Fac Engn, Darlington, NSW 2008, Australia
2. JD Explore Acad, Beijing 100176, Peoples R China
3. Chinese Acad Sci, Inst Automat, Natl Lab Pattern Recognit, Beijing 100190, Peoples R China
4. Southeast Univ, Sch Automat, Nanjing, Peoples R China
5. Northeastern Univ, Coll Informat Sci & Engn, Shenyang 110819, Peoples R China
Recommended citation
GB/T 7714: Wang, Haoran, Wang, Yajie, Yu, Baosheng, et al. Attentional Composition Networks for Long-Tailed Human Action Recognition[J]. ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2024, 20(1): 18.
APA: Wang, Haoran, Wang, Yajie, Yu, Baosheng, Zhan, Yibing, Yuan, Chunfeng, & Yang, Wankou. (2024). Attentional Composition Networks for Long-Tailed Human Action Recognition. ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 20(1), 18.
MLA: Wang, Haoran, et al. "Attentional Composition Networks for Long-Tailed Human Action Recognition". ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS 20.1 (2024): 18.

Deposit method: OAI harvesting

Source: Institute of Automation


Unless otherwise stated, all content in this system is protected by copyright, and all rights are reserved.