Chinese Academy of Sciences Institutional Repositories Grid
Learnable Feature Augmentation Framework for Temporal Action Localization

Document type: Journal article

Authors: Tang, Yepeng1,2,3; Wang, Weining4; Zhang, Chunjie1,2; Liu, Jing5,6; Zhao, Yao1,2
Journal: IEEE TRANSACTIONS ON IMAGE PROCESSING
Publication date: 2024
Volume: 33; Pages: 4002-4015
Keywords: Feature extraction; Task analysis; Semantics; Location awareness; Detectors; Data augmentation; Training; Temporal action detection; temporal action localization; feature augmentation
ISSN: 1057-7149
DOI: 10.1109/TIP.2024.3413599
Corresponding author: Zhang, Chunjie (cjzhang@bjtu.edu.cn)
Abstract: Temporal action localization (TAL) has drawn much attention in recent years; however, the performance of previous methods is still far from satisfactory due to the lack of annotated untrimmed video data. To deal with this issue, we propose to improve the utilization of current data through feature augmentation. Given an input video, we first extract video features with pre-trained video encoders, and then randomly mask various semantic contents of video features to consider different views of video features. To avoid damaging important action-related semantic information, we further develop a learnable feature augmentation framework to generate better views of videos. In particular, a Mask-based Feature Augmentation Module (MFAM) is proposed. The MFAM has three advantages: 1) it captures the temporal and semantic relationships of original video features, 2) it generates masked features with indispensable action-related information, and 3) it randomly recycles some masked information to ensure diversity. Finally, we input the masked features and the original features into shared action detectors respectively, and perform action classification and localization jointly for model learning. The proposed framework can improve the robustness and generalization of action detectors by learning more and better views of videos. In the testing stage, the MFAM can be removed, so it does not bring extra computational costs. Extensive experiments are conducted on four TAL benchmark datasets. Our proposed framework significantly improves different TAL models and achieves state-of-the-art performance.
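The random-masking step described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' MFAM, which is learnable; it only shows the simpler random baseline (mask some feature entries, then randomly "recycle" a fraction of them). The function name `random_mask_features`, the parameters `mask_ratio` and `recycle_ratio`, and the feature shape `(T, C)` are all assumptions made for illustration and do not come from the paper.

```python
import numpy as np

def random_mask_features(features, mask_ratio=0.3, recycle_ratio=0.2, rng=None):
    """Return a randomly masked view of (T, C) video features and the keep-mask.

    Hypothetical helper: a random baseline, not the learnable MFAM.
    """
    rng = np.random.default_rng() if rng is None else rng
    # 1 = keep, 0 = masked; each entry is masked independently with prob. mask_ratio.
    keep = (rng.random(features.shape) >= mask_ratio).astype(features.dtype)
    # "Recycle": restore each entry independently with prob. recycle_ratio,
    # so some masked information returns and the views stay diverse.
    recycled = (rng.random(features.shape) < recycle_ratio).astype(features.dtype)
    keep = np.maximum(keep, recycled)
    return features * keep, keep

# Stand-in for features from a pre-trained video encoder (assumed shape: T=128, C=512).
rng = np.random.default_rng(0)
features = rng.standard_normal((128, 512)).astype(np.float32)
masked, keep = random_mask_features(features, rng=rng)
```

In a training loop under these assumptions, both `features` and `masked` would be passed through the same detector and their losses summed; at test time only `features` would be used, which matches the abstract's statement that the augmentation adds no inference cost.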
WOS keywords: REPRESENTATION
Funding projects: Fundamental Research Funds for the Central Universities ; National Natural Science Foundation of China[U21B2043] ; National Natural Science Foundation of China[62072026] ; National Natural Science Foundation of China[62102419] ; National Natural Science Foundation of China[62120106009] ; Beijing Natural Science Foundation[JQ20022] ; Chinese Association for Artificial Intelligence (CAAI)-Compute Architecture for Neural Networks (CANN) Open Fund, developed on OpenI Community
WOS research areas: Computer Science ; Engineering
Language: English
WOS accession number: WOS:001258857900003
Publisher: IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
Funding organizations: Fundamental Research Funds for the Central Universities ; National Natural Science Foundation of China ; Beijing Natural Science Foundation ; Chinese Association for Artificial Intelligence (CAAI)-Compute Architecture for Neural Networks (CANN) Open Fund, developed on OpenI Community
Source URL: [http://ir.ia.ac.cn/handle/173211/59160]
Collection: Institute of Automation_National Laboratory of Pattern Recognition_Image and Video Analysis Group
Corresponding author: Zhang, Chunjie
Author affiliations:
1.Beijing Jiaotong Univ, Beijing Key Lab Adv Informat Sci, Sch Comp Sci & Technol, Beijing 100044, Peoples R China
2.Inst Informat Sci, Beijing Jiaotong Univ, Sch Comp Sci & Technol, Beijing 100044, Peoples R China
3.Chinese Acad Sci, Inst Automat, Beijing 100190, Peoples R China
4.Chinese Acad Sci, Inst Automat, Lab Cognit & Decis Intelligence Complex Syst, Beijing 100190, Peoples R China
5.Chinese Acad Sci, Inst Automat, Lab Cognit & Decis Intelligence Complex Syst, Beijing 100190, Peoples R China
6.Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing 100049, Peoples R China
Recommended citation
GB/T 7714: Tang, Yepeng, Wang, Weining, Zhang, Chunjie, et al. Learnable Feature Augmentation Framework for Temporal Action Localization[J]. IEEE TRANSACTIONS ON IMAGE PROCESSING, 2024, 33: 4002-4015.
APA: Tang, Yepeng, Wang, Weining, Zhang, Chunjie, Liu, Jing, & Zhao, Yao. (2024). Learnable Feature Augmentation Framework for Temporal Action Localization. IEEE TRANSACTIONS ON IMAGE PROCESSING, 33, 4002-4015.
MLA: Tang, Yepeng, et al. "Learnable Feature Augmentation Framework for Temporal Action Localization". IEEE TRANSACTIONS ON IMAGE PROCESSING 33 (2024): 4002-4015.

Ingest method: OAI harvesting

Source: Institute of Automation
