Chinese Academy of Sciences Institutional Repositories Grid (CAS IR Grid): Contextualized Relation Predictive Model for Self-Supervised Group Activity Representation Learning

Contextualized Relation Predictive Model for Self-Supervised Group Activity Representation Learning

Document type: Journal article


Authors	Zhou, Wanting [1]; Kong, Longteng [1,2]; Han, Yushan [1]; Qin, Jie [3]; Sun, Zhenan [4]
Journal	IEEE TRANSACTIONS ON MULTIMEDIA
Publication date	2024
Volume	26; Pages: 353-366
Keywords	Group activity representation learning; group activity recognition; self-supervised learning; transformer; predictive coding
ISSN	1520-9210
DOI	10.1109/TMM.2023.3265280
Corresponding authors	Kong, Longteng (konglongteng@bupt.edu.cn); Qin, Jie (jie.qin@nuaa.edu.cn)
Abstract	Group activity analysis has attracted considerable attention recently owing to its widespread applications in security, entertainment, and the military. This article aims to learn group activity representations with self-supervision, in contrast to most existing methods, which rely heavily on manually annotated labels. Moreover, existing Self-Supervised Learning (SSL) methods for videos are sub-optimal for generating such representations because of the complex context dynamics in group activities. In this article, an end-to-end framework termed Contextualized Relation Predictive Model (Con-RPM) is proposed for self-supervised group activity representation learning with predictive coding. It comprises the Serial-Parallel Transformer Encoder (SPTrans-Encoder), which models the context of spatial interactions and temporal variations, and the Hybrid Context Transformer Decoder (HConTrans-Decoder), which predicts future spatio-temporal relations guided by holistic scene context. Additionally, to improve the discriminability and consistency of the predictions, we introduce a united loss that integrates group-wise and person-wise contrastive losses at the frame level with an adversarial loss at the global sequence level. Consequently, Con-RPM learns robust group representations by explicitly describing the temporal evolution of individual relationships and scene semantics. Extensive experimental results on downstream tasks indicate the effectiveness and generalization of our model in self-supervised learning, and show state-of-the-art performance on the Volleyball, Collective Activity, VolleyTactic, and Choi's New datasets.
Funding project	National Natural Science Foundation of China
WOS research areas	Computer Science; Telecommunications
Language	English
WOS accession number	WOS:001157873000028
Publisher	IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
Funding organization	National Natural Science Foundation of China
Source URL	[http://ir.ia.ac.cn/handle/173211/57758]
Collection	State Key Laboratory of Multimodal Artificial Intelligence Systems
Corresponding authors	Kong, Longteng; Qin, Jie
Author affiliations	1. Beijing Univ Posts & Telecommun, Beijing 100876, Peoples R China; 2. Beihang Univ, Beijing Adv Innovat Ctr Big Data & Brain Comp, Beijing 100191, Peoples R China; 3. Nanjing Univ Aeronaut & Astronaut, Nanjing 211106, Peoples R China; 4. Chinese Acad Sci, Inst Automat, Beijing 100190, Peoples R China
Recommended citation (GB/T 7714)	Zhou, Wanting, Kong, Longteng, Han, Yushan, et al. Contextualized Relation Predictive Model for Self-Supervised Group Activity Representation Learning[J]. IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26: 353-366.
APA	Zhou, Wanting, Kong, Longteng, Han, Yushan, Qin, Jie, & Sun, Zhenan. (2024). Contextualized Relation Predictive Model for Self-Supervised Group Activity Representation Learning. IEEE TRANSACTIONS ON MULTIMEDIA, 26, 353-366.
MLA	Zhou, Wanting, et al. "Contextualized Relation Predictive Model for Self-Supervised Group Activity Representation Learning". IEEE TRANSACTIONS ON MULTIMEDIA 26 (2024): 353-366.

Ingest method: OAI harvesting

Source: Institute of Automation
