Many Hands Make Light Work: Transferring Knowledge from Auxiliary Tasks for Video-Text Retrieval
Document type: Journal article
Author | Wang, Wei |
Journal | IEEE Transactions on Multimedia |
Publication date | 2022 |
Pages | 1-15 |
Abstract | The problem of video-text retrieval, which searches videos via natural language descriptions or vice versa, has attracted growing attention due to the explosive scale of videos produced every day. The dominant approaches for this problem follow a pipeline that first learns compact feature representations of videos and texts, and then jointly embeds them into a common feature space where matched video-text pairs are close and unmatched pairs are far apart. However, most of them neither consider the structural similarities among cross-modal samples in a global view, nor leverage useful information from other relevant retrieval processes. We argue that both kinds of information have great potential for video-text retrieval. In this paper, we propose to extract useful knowledge from the retrieval process by exploiting structural similarities via Graph Neural Networks (GNNs) and then progressively transfer useful knowledge from relevant retrieval processes in a general-to-specific manner to assist the current retrieval process. Specifically, for the retrieval of the current query, we first construct a sequence of query-graphs whose central queries are chosen from distant to close to the current query. Then we conduct knowledge-guided message passing in each query-graph to exploit regional structural similarities and gather knowledge of different levels from the updated query-graphs with a knowledge-based attention mechanism. Finally, we transfer the extracted useful knowledge from general to specific to assist the current retrieval process. Extensive experimental results show that our model outperforms state-of-the-art methods on four benchmarks. |
Source URL | [http://ir.ia.ac.cn/handle/173211/51525] |
Collection | State Key Laboratory of Multimodal Artificial Intelligence Systems |
Author affiliations | 1. NLPR, Institute of Automation, Chinese Academy of Sciences 2. School of Artificial Intelligence, University of Chinese Academy of Sciences 3. Peng Cheng Laboratory |
Recommended citation GB/T 7714 | Wang, Wei, Gao, Junyu, Yang, Xiaoshan, et al. Many Hands Make Light Work: Transferring Knowledge from Auxiliary Tasks for Video-Text Retrieval[J]. IEEE Transactions on Multimedia, 2022: 1-15. |
APA | Wang, Wei, Gao, Junyu, Yang, Xiaoshan, & Xu, Changsheng. (2022). Many Hands Make Light Work: Transferring Knowledge from Auxiliary Tasks for Video-Text Retrieval. IEEE Transactions on Multimedia, 1-15. |
MLA | Wang, Wei, et al. "Many Hands Make Light Work: Transferring Knowledge from Auxiliary Tasks for Video-Text Retrieval." IEEE Transactions on Multimedia (2022): 1-15. |
Ingestion method: OAI harvesting
Source: Institute of Automation