Coarse-to-Fine Recurrently Aligned Transformer with Balance Tokens for Video Moment Retrieval and Highlight Detection
Document Type: Conference Paper
Authors | Pan Yi2 |
Publication Date | 2024-06 |
Conference Date | 2024-06 |
Conference Venue | Yokohama, Japan |
Abstract | Video moment retrieval (MR) and highlight detection (HD) are two user-oriented video understanding tasks aimed at extracting query-dependent or highlighted moments to provide valuable content for users. While many recent works have proposed solutions for the joint task of MR and HD leveraging transformer architecture, we argue that existing approaches have not adequately aligned the video and text modalities using basic transformer encoders, and have overlooked the misalignment between irrelevant video clips and text queries. To address these issues, we introduce COREBA: a Coarse-to-Fine Recurrently Aligned Transformer with Balance Tokens. Firstly, we design a plug-and-play Coarse-to-Fine Cross-modal interaction (CFC) module, replacing the original transformer encoder to align the two modalities in a progressive manner. Secondly, we present a novel Recurrent Alignment Mechanism (RAM) to deeply align the modalities in a recurrent fashion. Thirdly, to mitigate the misalignment problem, we append text queries with learnable Balance Tokens to restrict the text information fused with irrelevant clips. Extensive experiments validate the effectiveness and superiority of our proposed method. |
Proceedings Publisher | IJCNN |
Source URL | [http://ir.ia.ac.cn/handle/173211/57093] |
Collection | State Key Laboratory of Multimodal Artificial Intelligence Systems |
Corresponding Author | Chang Hui |
Author Affiliations | 1. First Medical Center, Chinese PLA General Hospital; 2. Institute of Automation, Chinese Academy of Sciences |
Recommended Citation (GB/T 7714) | Pan Yi, Zhang Yujia, Chang Hui, et al. Coarse-to-Fine Recurrently Aligned Transformer with Balance Tokens for Video Moment Retrieval and Highlight Detection[C]. Yokohama, Japan, 2024-06. |
Deposit Method: OAI Harvesting
Source: Institute of Automation
Unless otherwise stated, all content in this system is protected by copyright, and all rights are reserved.