Chinese Academy of Sciences Institutional Repositories Grid
Contextualized Relation Predictive Model for Self-Supervised Group Activity Representation Learning

Document type: Journal article

Authors: Zhou, Wanting (1); Kong, Longteng (1,2); Han, Yushan (1); Qin, Jie (3); Sun, Zhenan (4)
Journal: IEEE TRANSACTIONS ON MULTIMEDIA
Publication date: 2024
Volume: 26; Pages: 353-366
Keywords: Group activity representation learning; group activity recognition; self-supervised learning; transformer; predictive coding
ISSN: 1520-9210
DOI: 10.1109/TMM.2023.3265280
Corresponding authors: Kong, Longteng (konglongteng@bupt.edu.cn); Qin, Jie (jie.qin@nuaa.edu.cn)
Abstract: Group activity analysis has attracted considerable attention recently owing to its widespread applications in security, entertainment, and the military. This article targets learning group activity representations with self-supervision, in contrast to the majority of existing methods, which rely heavily on manually annotated labels. Moreover, existing Self-Supervised Learning (SSL) methods for videos are sub-optimal for generating such representations because of the complex context dynamics in group activities. In this article, an end-to-end framework termed Contextualized Relation Predictive Model (Con-RPM) is proposed for self-supervised group activity representation learning with predictive coding. It involves the Serial-Parallel Transformer Encoder (SPTrans-Encoder) to model the context of spatial interactions and temporal variations, and the Hybrid Context Transformer Decoder (HConTrans-Decoder) to predict the future spatio-temporal relations guided by holistic scene context. Additionally, to improve the discriminability and consistency of prediction, we introduce a united loss that integrates group-wise and person-wise contrastive losses at the frame level with an adversarial loss at the global sequence level. Consequently, Con-RPM learns robust group representations by explicitly describing the temporal evolution of individual relationships and scene semantics. Extensive experimental results on downstream tasks indicate the effectiveness and generalization of our model in self-supervised learning, and show state-of-the-art performance on the Volleyball, Collective Activity, VolleyTactic, and Choi's New datasets.
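As an illustration of the frame-level contrastive losses mentioned in the abstract, the sketch below computes a generic InfoNCE-style loss in plain Python. It is not the authors' implementation: the function name `info_nce`, the `temperature` value, and the toy feature vectors are all assumptions made for this example, and the record does not specify the paper's actual loss formulation.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def info_nce(predicted, targets, temperature=0.1):
    """Mean InfoNCE loss (hypothetical helper, not from the paper).

    predicted[i] is treated as the positive match of targets[i];
    every other target serves as a negative for predicted[i].
    """
    preds = [normalize(p) for p in predicted]
    tgts = [normalize(t) for t in targets]
    total = 0.0
    for i, p in enumerate(preds):
        logits = [dot(p, t) / temperature for t in tgts]
        # Numerically stable log-sum-exp over all candidate targets.
        m = max(logits)
        log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
        # Cross-entropy with the matching target as the correct class.
        total += log_sum - logits[i]
    return total / len(preds)


# Toy features standing in for predicted and ground-truth future frames.
future = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
aligned_loss = info_nce(future, future)
# Rotating the targets pairs every prediction with the wrong frame.
misaligned_loss = info_nce(future, future[1:] + future[:1])
```

In this toy setting the loss is small when each prediction lines up with its own target and large when the pairing is shifted, which is the behaviour a contrastive objective is meant to reward.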
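Funding project: National Natural Science Foundation of China
WOS research areas: Computer Science; Telecommunications
Language: English
WOS accession number: WOS:001157873000028
Publisher: IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
Funding organization: National Natural Science Foundation of China
Source URL: http://ir.ia.ac.cn/handle/173211/57758
Collection: State Key Laboratory of Multimodal Artificial Intelligence Systems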
Author affiliations:
1. Beijing Univ Posts & Telecommun, Beijing 100876, Peoples R China
2. Beihang Univ, Beijing Adv Innovat Ctr Big Data & Brain Comp, Beijing 100191, Peoples R China
3. Nanjing Univ Aeronaut & Astronaut, Nanjing 211106, Peoples R China
4. Chinese Acad Sci, Inst Automat, Beijing 100190, Peoples R China
Recommended citation:
GB/T 7714: Zhou, Wanting, Kong, Longteng, Han, Yushan, et al. Contextualized Relation Predictive Model for Self-Supervised Group Activity Representation Learning[J]. IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26: 353-366.
APA: Zhou, Wanting, Kong, Longteng, Han, Yushan, Qin, Jie, & Sun, Zhenan. (2024). Contextualized Relation Predictive Model for Self-Supervised Group Activity Representation Learning. IEEE TRANSACTIONS ON MULTIMEDIA, 26, 353-366.
MLA: Zhou, Wanting, et al. "Contextualized Relation Predictive Model for Self-Supervised Group Activity Representation Learning". IEEE TRANSACTIONS ON MULTIMEDIA 26 (2024): 353-366.

Ingest method: OAI harvesting

Source: Institute of Automation


Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.