Chinese Academy of Sciences Institutional Repositories Grid System: Compositional Prompting Video-language Models to Understand Procedure in Instructional Videos

Chinese Academy of Sciences Institutional Repositories Grid

Compositional Prompting Video-language Models to Understand Procedure in Instructional Videos

Document type: Journal article


Authors	Guyue Hu (2); Bin He (1); Hanwang Zhang (2)
Journal	Machine Intelligence Research
Publication date	2023
Volume	20, Issue 2, Pages 249-262
Keywords	Prompt learning; video-language pretrained models; instructional videos; procedure understanding; knowledge distilling
ISSN	2731-538X
DOI	10.1007/s11633-022-1409-1
Abstract	Instructional videos are very useful for completing complex daily tasks, which naturally contain abundant clip-narration pairs. Existing works for procedure understanding are keen on pretraining various video-language models with these pairs and then fine tuning downstream classifiers and localizers in predetermined category space. These video-language models are proficient at representing short-term actions, basic objects, and their combinations, but they are still far from understanding long-term procedures. In addition, the predetermined procedure category faces the problem of combination disaster and is inherently inapt to unseen procedures. Therefore, we propose a novel compositional prompt learning (CPL) framework to understand long-term procedures by prompting short-term video-language models and reformulating several classical procedure understanding tasks into general video-text matching problems. Specifically, the proposed CPL consists of one visual prompt and three compositional textual prompts (including the action prompt, object prompt, and procedure prompt), which could compositionally distill knowledge from short-term video-language models to facilitate long-term procedure understanding. Besides, the task reformulation enables our CPL to perform well in all zero-shot, few shot, and fully-supervised settings. Extensive experiments on two widely-used datasets for procedure understanding demonstrate the effectiveness of the proposed approach.
Source URL	[http://ir.ia.ac.cn/handle/173211/55978]
Collection	Institute of Automation_Academic Journals_International Journal of Automation and Computing
Author affiliations	1. The 15th Research Institute of China Electronics Technology Group Corporation, Beijing 100083, China; 2. Nanyang Technological University, Singapore 639798, Singapore
Recommended citation (GB/T 7714)	Guyue Hu, Bin He, Hanwang Zhang. Compositional Prompting Video-language Models to Understand Procedure in Instructional Videos[J]. Machine Intelligence Research, 2023, 20(2): 249-262.
APA	Guyue Hu, Bin He, & Hanwang Zhang. (2023). Compositional Prompting Video-language Models to Understand Procedure in Instructional Videos. Machine Intelligence Research, 20(2), 249-262.
MLA	Guyue Hu, et al. "Compositional Prompting Video-language Models to Understand Procedure in Instructional Videos". Machine Intelligence Research 20.2 (2023): 249-262.

Ingest method: OAI harvesting

Source: Institute of Automation
