Chinese Academy of Sciences Institutional Repositories Grid (CAS IR GRID): Learnable Feature Augmentation Framework for Temporal Action Localization

Document type: Journal article


Authors	Tang, Yepeng1,2,3; Wang, Weining4; Zhang, Chunjie1,2; Liu, Jing5,6; Zhao, Yao1,2
Journal	IEEE TRANSACTIONS ON IMAGE PROCESSING
Publication date	2024
Volume	33 Pages: 4002-4015
Keywords	Feature extraction; Task analysis; Semantics; Location awareness; Detectors; Data augmentation; Training; Temporal action detection; temporal action localization; feature augmentation
ISSN	1057-7149
DOI	10.1109/TIP.2024.3413599
Corresponding author	Zhang, Chunjie(cjzhang@bjtu.edu.cn)
Abstract (English)	Temporal action localization (TAL) has drawn much attention in recent years, however, the performance of previous methods is still far from satisfactory due to the lack of annotated untrimmed video data. To deal with this issue, we propose to improve the utilization of current data through feature augmentation. Given an input video, we first extract video features with pre-trained video encoders, and then randomly mask various semantic contents of video features to consider different views of video features. To avoid damaging important action-related semantic information, we further develop a learnable feature augmentation framework to generate better views of videos. In particular, a Mask-based Feature Augmentation Module (MFAM) is proposed. The MFAM has three advantages: 1) it captures the temporal and semantic relationships of original video features, 2) it generates masked features with indispensable action-related information, and 3) it randomly recycles some masked information to ensure diversity. Finally, we input the masked features and the original features into shared action detectors respectively, and perform action classification and localization jointly for model learning. The proposed framework can improve the robustness and generalization of action detectors by learning more and better views of videos. In the testing stage, the MFAM can be removed, which does not bring extra computational costs. Extensive experiments are conducted on four TAL benchmark datasets. Our proposed framework significantly improves different TAL models and achieves the state-of-the-art performances.
WOS keywords	REPRESENTATION
Funding projects	Fundamental Research Funds for the Central Universities ; National Natural Science Foundation of China[U21B2043] ; National Natural Science Foundation of China[62072026] ; National Natural Science Foundation of China[62102419] ; National Natural Science Foundation of China[62120106009] ; Beijing Natural Science Foundation[JQ20022] ; Chinese Association for Artificial Intelligence (CAAI)-Compute Architecture for Neural Networks (CANN) Open Fund, developed on OpenI Community
WOS research areas	Computer Science ; Engineering
Language	English
WOS accession number	WOS:001258857900003
Publisher	IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
Funding organizations	Fundamental Research Funds for the Central Universities ; National Natural Science Foundation of China ; Beijing Natural Science Foundation ; Chinese Association for Artificial Intelligence (CAAI)-Compute Architecture for Neural Networks (CANN) Open Fund, developed on OpenI Community
Source URL	[http://ir.ia.ac.cn/handle/173211/59160]
Collection	Institute of Automation_National Laboratory of Pattern Recognition_Image and Video Analysis Group
Corresponding author	Zhang, Chunjie
Author affiliations	1.Beijing Jiaotong Univ, Beijing Key Lab Adv Informat Sci, Sch Comp Sci & Technol, Beijing 100044, Peoples R China; 2.Inst Informat Sci, Beijing Jiaotong Univ, Sch Comp Sci & Technol, Beijing 100044, Peoples R China; 3.Chinese Acad Sci, Inst Automat, Beijing 100190, Peoples R China; 4.Chinese Acad Sci, Inst Automat, Lab Cognit & Decis Intelligence Complex Syst, Beijing 100190, Peoples R China; 5.Chinese Acad Sci, Inst Automat, Lab Cognit & Decis Intelligence Complex Syst, Beijing 100190, Peoples R China; 6.Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing 100049, Peoples R China
Recommended citation (GB/T 7714)	Tang, Yepeng,Wang, Weining,Zhang, Chunjie,et al. Learnable Feature Augmentation Framework for Temporal Action Localization[J]. IEEE TRANSACTIONS ON IMAGE PROCESSING,2024,33:4002-4015.
APA	Tang, Yepeng,Wang, Weining,Zhang, Chunjie,Liu, Jing,&Zhao, Yao.(2024).Learnable Feature Augmentation Framework for Temporal Action Localization.IEEE TRANSACTIONS ON IMAGE PROCESSING,33,4002-4015.
MLA	Tang, Yepeng,et al."Learnable Feature Augmentation Framework for Temporal Action Localization".IEEE TRANSACTIONS ON IMAGE PROCESSING 33(2024):4002-4015.
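
The abstract describes a baseline step in which pre-extracted video features are randomly masked and then fed, together with the original features, into a shared action detector. The following is only a minimal sketch of that random-masking idea, not the learnable MFAM from the paper. It assumes that "semantic contents" correspond to feature channels; the helper names (random_feature_mask, shared_detector), the array sizes, and the linear "detector" are all hypothetical placeholders chosen for illustration.

```python
import numpy as np


def random_feature_mask(features, mask_ratio=0.3, rng=None):
    """Zero out a random subset of feature channels (hypothetical helper).

    features: array of shape (T, C), with T time steps and C channels.
    Returns the masked copy and a boolean keep-mask of shape (C,).
    """
    rng = np.random.default_rng() if rng is None else rng
    num_channels = features.shape[1]
    num_masked = int(round(mask_ratio * num_channels))
    # Pick distinct channel indices to mask (assumption: masking by channel).
    masked_idx = rng.choice(num_channels, size=num_masked, replace=False)
    keep = np.ones(num_channels, dtype=bool)
    keep[masked_idx] = False
    return features * keep, keep


def shared_detector(features, weights):
    """Toy stand-in for an action detector: per-time-step class scores."""
    return features @ weights


rng = np.random.default_rng(0)
features = rng.normal(size=(8, 16))  # 8 time steps, 16 channels (made-up sizes)
weights = rng.normal(size=(16, 4))   # 4 action classes (made-up)

masked, keep = random_feature_mask(features, mask_ratio=0.25, rng=rng)

# Both views pass through the same weights, so one detector sees both.
scores_original = shared_detector(features, weights)
scores_masked = shared_detector(masked, weights)
```

In a training loop, the classification and localization losses computed on both score sets would be combined; at test time only the original features would be used, which matches the abstract's remark that the augmentation module adds no inference cost.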

Ingestion method: OAI harvesting

Source: Institute of Automation
