Attentional Composition Networks for Long-Tailed Human Action Recognition
Document type: Journal article
Authors | Wang, Haoran5; Wang, Yajie5; Yu, Baosheng1; Zhan, Yibing2; Yuan, Chunfeng3; Yang, Wankou4 |
Journal | ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS |
Publication date | 2024 |
Volume | 20 Issue: 1 Pages: 18 |
ISSN | 1551-6857 |
Keywords | Compositional learning; long tail; few-shot; zero-shot; action recognition |
DOI | 10.1145/3603253 |
Corresponding author | Wang, Haoran (wanghaoran@ise.neu.edu.cn) |
Abstract (English) | The problem of long-tailed visual recognition has been receiving increasing research attention. However, the long-tailed distribution problem remains underexplored for video-based visual recognition. To address this issue, in this article we propose a compositional-learning-based solution for video-based human action recognition. Our method, named Attentional Composition Networks (ACN), first learns verb-like and preposition-like components, then shuffles these components to generate samples for the tail classes in the feature space to augment the data for the tail classes. Specifically, during training, we represent each action video by a graph that captures the spatial-temporal relations (edges) among detected human/object instances (nodes). Then, ACN utilizes the position information to decompose each action into a set of verb and preposition representations using the edge features in the graph. After that, the verb and preposition features from different videos are combined via an attention structure to synthesize feature representations for tail classes. This way, we can enrich the data for the tail classes and consequently improve the action recognition for these classes. To evaluate the compositional human action recognition, we further contribute a new human action recognition dataset, namely NEU-Interaction (NEU-I). Experimental results on both Something-Something V2 and the proposed NEU-I demonstrate the effectiveness of the proposed method for long-tailed, few-shot, and zero-shot problems in human action recognition. Source code and the NEU-I dataset are available at https://github.com/YajieW99/ACN. |
Funding projects | Major Science and Technology Innovation 2030 New Generation Artificial Intelligence key project[2021ZD0111700] ; Fundamental Research Funds for the Central Universities of China[N2304012] ; National Natural Science Foundation of China[61773117] ; National Natural Science Foundation of China[61972397] ; National Natural Science Foundation of China[62276061] ; National Natural Science Foundation of China[62002090] |
WOS research area | Computer Science |
Language | English |
Publisher | ASSOC COMPUTING MACHINERY |
WOS accession number | WOS:001080441800008 |
Funding organizations | Major Science and Technology Innovation 2030 New Generation Artificial Intelligence key project ; Fundamental Research Funds for the Central Universities of China ; National Natural Science Foundation of China |
Source URL | [http://ir.ia.ac.cn/handle/173211/52978] |
Collection | State Key Laboratory of Multimodal Artificial Intelligence Systems |
Corresponding author | Wang, Haoran |
Author affiliations | 1. Univ Sydney, Sch Comp Sci, Fac Engn, Darlington, NSW 2008, Australia; 2. JD Explore Acad, Beijing 100176, Peoples R China; 3. Chinese Acad Sci, Inst Automat, Natl Lab Pattern Recognit, Beijing 100190, Peoples R China; 4. Southeast Univ, Sch Automat, Nanjing, Peoples R China; 5. Northeastern Univ, Coll Informat Sci & Engn, Shenyang 110819, Peoples R China |
Recommended citation (GB/T 7714) | Wang, Haoran,Wang, Yajie,Yu, Baosheng,et al. Attentional Composition Networks for Long-Tailed Human Action Recognition[J]. ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS,2024,20(1):18. |
APA | Wang, Haoran,Wang, Yajie,Yu, Baosheng,Zhan, Yibing,Yuan, Chunfeng,&Yang, Wankou.(2024).Attentional Composition Networks for Long-Tailed Human Action Recognition.ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS,20(1),18. |
MLA | Wang, Haoran,et al."Attentional Composition Networks for Long-Tailed Human Action Recognition".ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS 20.1(2024):18. |
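The abstract describes combining verb and preposition features from different videos through an attention structure to synthesize features for tail classes. The following is a minimal illustrative sketch of that general idea only, not the authors' implementation (their code is at the repository linked in the abstract). It assumes plain scaled dot-product attention, and the names `softmax` and `compose`, the two-dimensional toy vectors, and the concatenation step are all hypothetical choices made for this example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def compose(verb, prepositions):
    """Attend from one verb feature over candidate preposition features.

    Assumption: scaled dot-product attention stands in for the paper's
    attention structure, whose exact form the abstract does not specify.
    Returns the concatenated [verb, attended preposition] feature and
    the attention weights.
    """
    scale = math.sqrt(len(verb))
    # Similarity between the verb and each candidate preposition feature.
    scores = [
        sum(v * p for v, p in zip(verb, prep)) / scale
        for prep in prepositions
    ]
    weights = softmax(scores)
    # Attention-weighted average of the preposition features.
    attended = [
        sum(w * prep[i] for w, prep in zip(weights, prepositions))
        for i in range(len(verb))
    ]
    return verb + attended, weights


# Toy data (hypothetical): a verb feature from one video and
# preposition features taken from two other videos.
verb = [1.0, 0.0]
prepositions = [[1.0, 0.0], [0.0, 1.0]]
feature, weights = compose(verb, prepositions)
print(feature, weights)
```

In this toy case the first preposition feature aligns with the verb feature, so it receives the larger attention weight; the synthesized vector could then serve as an extra training sample for an under-represented class.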
Ingestion method: OAI harvesting
Source: Institute of Automation