Snippet-to-Prototype Contrastive Consensus Network for Weakly Supervised Temporal Action Localization
Document type: Journal article
Authors | Shao, Yuxiang 1; Zhang, Feifei 2,3; Xu, Changsheng 4,5,6 |
Journal | IEEE TRANSACTIONS ON MULTIMEDIA |
Publication date | 2024 |
Volume | 26 |
Pages | 6717-6729 |
Keywords | Contrastive learning; knowledge distillation; weakly-supervised temporal action localization |
ISSN | 1520-9210 |
DOI | 10.1109/TMM.2024.3355628 |
Corresponding author | Xu, Changsheng (csxu@nlpr.ia.ac.cn) |
Abstract | Weakly-supervised temporal action localization aims to localize action instances in untrimmed videos with only video-level labels. Due to the lack of frame-wise annotations, most methods embrace a localization-by-classification paradigm. However, the large supervision gap between classification and localization prevents models from obtaining accurate snippet-wise classification sequences and action proposals. We propose a snippet-to-prototype contrastive consensus network (SPCC-Net) that simultaneously generates feature-level and label-level supervision to narrow the gap between classification and localization. Specifically, the network adopts a two-stream framework incorporating optical-flow and fusion streams to fully leverage the motion and complementary information of multiple modalities. First, a snippet-to-prototype contrast module is executed within each stream to learn prototypes for all categories and contrast them with action snippets, guaranteeing intra-class compactness and inter-class separability of snippet features. Second, to generate accurate label-level supervision from the complementary information of multimodal features, a multi-modality consensus module ensures not only category consistency, through knowledge distillation, but also semantic consistency, through contrastive learning. Finally, we introduce an auxiliary multiple-instance-learning (MIL) loss to alleviate the tendency of existing MIL-based methods to localize only sparse discriminative snippets. Extensive experiments on two public datasets, THUMOS-14 and ActivityNet-1.3, demonstrate the superior performance of our method over state-of-the-art methods. |
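The snippet-to-prototype contrast described in the abstract can be sketched as an InfoNCE-style objective: each snippet is pulled toward its class prototype and pushed away from the other prototypes. The sketch below is a minimal NumPy illustration of that general idea only, not the authors' implementation; the function name, temperature value, and synthetic data are assumptions for the example.

```python
import numpy as np

def snippet_prototype_contrastive_loss(snippets, prototypes, labels, tau=0.1):
    """Illustrative InfoNCE-style snippet-to-prototype loss (sketch).

    snippets:   (N, D) L2-normalised snippet features
    prototypes: (C, D) L2-normalised class prototypes
    labels:     (N,)   class index of each snippet
    tau:        temperature (assumed value for this example)
    """
    # Cosine similarity between every snippet and every class prototype.
    sims = snippets @ prototypes.T / tau                # (N, C)
    # Log-softmax over prototypes; subtract the row max for stability.
    sims = sims - sims.max(axis=1, keepdims=True)
    log_prob = sims - np.log(np.exp(sims).sum(axis=1, keepdims=True))
    # Negative log-likelihood of each snippet's own class prototype.
    return -log_prob[np.arange(len(labels)), labels].mean()

def l2norm(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

# Synthetic check: snippets are noisy copies of their class prototypes.
rng = np.random.default_rng(0)
protos = l2norm(rng.normal(size=(3, 8)))                 # 3 toy classes
labels = np.array([0, 0, 1, 2])
snips = l2norm(protos[labels] + 0.1 * rng.normal(size=(4, 8)))
loss = snippet_prototype_contrastive_loss(snips, protos, labels)
```

Minimising this loss maximises each snippet's similarity to its own prototype relative to all others, which is one way to obtain the intra-class compactness and inter-class separability the abstract attributes to the snippet-to-prototype contrast module.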
Funding project | National Key Research and Development Plan of China |
WOS research areas | Computer Science; Telecommunications |
Language | English |
WOS accession number | WOS:001200272600003 |
Publisher | IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC |
Funding agency | National Key Research and Development Plan of China |
Source URL | http://ir.ia.ac.cn/handle/173211/58653 |
Collection | Institute of Automation_National Laboratory of Pattern Recognition_Multimedia Computing and Graphics Team
作者单位 | 1.Tianjin Univ Technol, Tianjin 300382, Peoples R China 2.Tianjin Univ Technol, Sch Comp Sci & Engn, Tianjin 300384, Peoples R China 3.Tianjin Univ Technol, Key Lab Comp Vis & Syst, Minist Educ, Tianjin 300384, Peoples R China 4.Chinese Acad Sci, Inst Automat, State Key Lab Multimodal Artificial Intelligence S, Beijing 100190, Peoples R China 5.Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing 100049, Peoples R China 6.Peng Cheng Lab, Shenzhen 518066, Peoples R China |
Recommended citation (GB/T 7714) | Shao, Yuxiang, Zhang, Feifei, Xu, Changsheng. Snippet-to-Prototype Contrastive Consensus Network for Weakly Supervised Temporal Action Localization[J]. IEEE TRANSACTIONS ON MULTIMEDIA, 2024, 26: 6717-6729. |
APA | Shao, Yuxiang, Zhang, Feifei, & Xu, Changsheng. (2024). Snippet-to-Prototype Contrastive Consensus Network for Weakly Supervised Temporal Action Localization. IEEE TRANSACTIONS ON MULTIMEDIA, 26, 6717-6729. |
MLA | Shao, Yuxiang, et al. "Snippet-to-Prototype Contrastive Consensus Network for Weakly Supervised Temporal Action Localization". IEEE TRANSACTIONS ON MULTIMEDIA 26 (2024): 6717-6729. |
Ingestion method: OAI harvesting
Source: Institute of Automation