Contextualized Relation Predictive Model for Self-Supervised Group Activity Representation Learning
Document type: Journal article
Author | Zhou, Wanting1 |
Journal | IEEE TRANSACTIONS ON MULTIMEDIA |
Publication date | 2024 |
Volume | 26; Pages: 353-366 |
Keywords | Group activity representation learning; group activity recognition; self-supervised learning; transformer; predictive coding |
ISSN | 1520-9210 |
DOI | 10.1109/TMM.2023.3265280 |
Corresponding authors | Kong, Longteng (konglongteng@bupt.edu.cn); Qin, Jie (jie.qin@nuaa.edu.cn) |
Abstract | Group activity analysis has recently attracted considerable attention owing to its widespread applications in security, entertainment, and the military. This article targets learning group activity representations with self-supervision, in contrast to the majority of methods, which rely heavily on manually annotated labels. Moreover, existing Self-Supervised Learning (SSL) methods for videos are sub-optimal for generating such representations because of the complex context dynamics in group activities. In this article, an end-to-end framework termed Contextualized Relation Predictive Model (Con-RPM) is proposed for self-supervised group activity representation learning with predictive coding. It uses the Serial-Parallel Transformer Encoder (SPTrans-Encoder) to model the context of spatial interactions and temporal variations, and the Hybrid Context Transformer Decoder (HConTrans-Decoder) to predict future spatio-temporal relations guided by holistic scene context. Additionally, to improve the discriminability and consistency of prediction, we introduce a united loss integrating group-wise and person-wise contrastive losses at the frame level and an adversarial loss at the global sequence level. Consequently, Con-RPM learns robust group representations by explicitly describing the temporal evolution of individual relationships and scene semantics. Extensive experimental results on downstream tasks indicate the effectiveness and generalization ability of our model in self-supervised learning, with state-of-the-art performance on the Volleyball, Collective Activity, VolleyTactic, and Choi's New datasets. |
Funding project | National Natural Science Foundation of China |
WOS research area | Computer Science; Telecommunications |
Language | English |
WOS accession number | WOS:001157873000028 |
Publisher | IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC |
Funding organization | National Natural Science Foundation of China |
Source URL | [http://ir.ia.ac.cn/handle/173211/57758] |
Collection | State Key Laboratory of Multimodal Artificial Intelligence Systems |
Corresponding authors | Kong, Longteng; Qin, Jie |
Author affiliations | 1. Beijing Univ Posts & Telecommun, Beijing 100876, Peoples R China; 2. Beihang Univ, Beijing Adv Innovat Ctr Big Data & Brain Comp, Beijing 100191, Peoples R China; 3. Nanjing Univ Aeronaut & Astronaut, Nanjing 211106, Peoples R China; 4. Chinese Acad Sci, Inst Automat, Beijing 100190, Peoples R China |
Recommended citation: GB/T 7714 | Zhou, Wanting,Kong, Longteng,Han, Yushan,et al. Contextualized Relation Predictive Model for Self-Supervised Group Activity Representation Learning[J]. IEEE TRANSACTIONS ON MULTIMEDIA,2024,26:353-366. |
APA | Zhou, Wanting,Kong, Longteng,Han, Yushan,Qin, Jie,&Sun, Zhenan.(2024).Contextualized Relation Predictive Model for Self-Supervised Group Activity Representation Learning.IEEE TRANSACTIONS ON MULTIMEDIA,26,353-366. |
MLA | Zhou, Wanting,et al."Contextualized Relation Predictive Model for Self-Supervised Group Activity Representation Learning".IEEE TRANSACTIONS ON MULTIMEDIA 26(2024):353-366. |
Ingestion method: OAI harvesting
Source: Institute of Automation