Learning Coarse-to-Fine Graph Neural Networks for Video-Text Retrieval
Document type: Journal article
Author | Wang, Wei1,2,3 |
Journal | IEEE TRANSACTIONS ON MULTIMEDIA |
Publication date | 2021 |
Volume | 23; Pages: 2386-2397 |
Keywords | Feature extraction; Encoding; Task analysis; Semantics; Data models; Cognition; Focusing; Video-text retrieval; graph neural network; coarse-to-fine strategy |
ISSN | 1520-9210 |
DOI | 10.1109/TMM.2020.3011288 |
Corresponding author | Xu, Changsheng (csxu@nlpr.ia.ac.cn) |
Abstract (English) | We address the problem of video-text retrieval, which searches for videos via natural language descriptions or vice versa. Most state-of-the-art methods consider cross-modal learning only for two or three data points in isolation and fail to exploit the structural information of other data points from a global view. In this paper, we propose to exploit the comprehensive relationships among cross-modal samples via Graph Neural Networks (GNN). To improve the discriminative ability for accurately finding the positive sample, a Coarse-to-Fine GNN is constructed, which can progressively optimize the retrieval results via multi-step reasoning. Specifically, we first adopt heuristic edge features to represent relationships. Then we design a scoring module in each layer to rank the edges connected to the query node and drop the edges with lower scores. Finally, to alleviate the class imbalance issue, we propose a random-drop focal loss to optimize the whole framework. Extensive experimental results show that our method consistently outperforms state-of-the-art methods on four benchmarks. |
WOS keywords | FEATURES ; IMAGE |
Funding projects | National Key Research and Development Program of China[2018AAA0102200] ; National Natural Science Foundation of China[61720106006] ; National Natural Science Foundation of China[61721004] ; National Natural Science Foundation of China[61832002] ; National Natural Science Foundation of China[61702511] ; National Natural Science Foundation of China[61751211] ; National Natural Science Foundation of China[61532009] ; National Natural Science Foundation of China[U1836220] ; National Natural Science Foundation of China[U1705262] ; National Natural Science Foundation of China[61872424] ; National Natural Science Foundation of China[61936005] ; Key Research Program of Frontier Sciences of CAS[QYZDJSSWJSC039] ; Research Program of National Laboratory of Pattern Recognition[Z-2018007] |
WOS research areas | Computer Science ; Telecommunications |
Language | English |
WOS accession number | WOS:000679533800018 |
Publisher | IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC |
Funding organizations | National Key Research and Development Program of China ; National Natural Science Foundation of China ; Key Research Program of Frontier Sciences of CAS ; Research Program of National Laboratory of Pattern Recognition |
Source URL | [http://ir.ia.ac.cn/handle/173211/45590] |
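Collection | Institute of Automation_National Laboratory of Pattern Recognition_Multimedia Computing and Graphics Team; State Key Laboratory of Multimodal Artificial Intelligence Systems |
Corresponding author | Xu, Changsheng |
Author affiliations | 1. Chinese Acad Sci, Inst Automat, Natl Lab Pattern Recognit, Beijing 100190, Peoples R China; 2. Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing 100049, Peoples R China; 3. PengCheng Lab, Shenzhen, Peoples R China |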
Recommended citation (GB/T 7714) | Wang, Wei,Gao, Junyu,Yang, Xiaoshan,et al. Learning Coarse-to-Fine Graph Neural Networks for Video-Text Retrieval[J]. IEEE TRANSACTIONS ON MULTIMEDIA,2021,23:2386-2397. |
APA | Wang, Wei,Gao, Junyu,Yang, Xiaoshan,&Xu, Changsheng.(2021).Learning Coarse-to-Fine Graph Neural Networks for Video-Text Retrieval.IEEE TRANSACTIONS ON MULTIMEDIA,23,2386-2397. |
MLA | Wang, Wei,et al."Learning Coarse-to-Fine Graph Neural Networks for Video-Text Retrieval".IEEE TRANSACTIONS ON MULTIMEDIA 23(2021):2386-2397. |
Ingest method: OAI harvesting
Source: Institute of Automation
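
The abstract describes two mechanisms: a per-layer scoring module that ranks the edges connected to the query node and drops the lower-scoring ones, and a random-drop focal loss for class imbalance. The following is a minimal Python sketch of those two ideas, not the authors' implementation. The record does not give the details, so everything here is an assumption: the function names, the `keep_ratio` and `drop_prob` parameters, the use of precomputed per-layer scores in place of a learned scoring module, and the reading of "random-drop" as randomly discarding negative edges before averaging a standard binary focal loss.

```python
import math
import random


def prune_edges(scores, keep_ratio):
    """Keep the highest-scoring fraction of candidate edges (at least one)."""
    k = max(1, math.ceil(len(scores) * keep_ratio))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k]


def coarse_to_fine(layer_scores, keep_ratio=0.5):
    """Prune layer by layer; each layer only rescores surviving candidates.

    layer_scores is a list of dicts (candidate id -> score), one per layer.
    These dicts are hypothetical stand-ins for a learned scoring module.
    """
    survivors = list(layer_scores[0])
    for scores in layer_scores:
        current = {c: scores[c] for c in survivors}
        survivors = prune_edges(current, keep_ratio)
    return survivors


def focal_loss(p, label, gamma=2.0):
    """Standard binary focal loss for one edge; p must lie strictly in (0, 1)."""
    p_t = p if label == 1 else 1.0 - p
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


def random_drop_focal_loss(probs, labels, drop_prob=0.5, gamma=2.0, rng=None):
    """Mean focal loss after randomly dropping negative edges (assumed form)."""
    rng = rng or random.Random(0)
    kept = [(p, y) for p, y in zip(probs, labels)
            if y == 1 or rng.random() >= drop_prob]
    if not kept:
        return 0.0
    return sum(focal_loss(p, y, gamma) for p, y in kept) / len(kept)


if __name__ == "__main__":
    layers = [
        {"a": 0.9, "b": 0.8, "c": 0.2, "d": 0.1},  # coarse layer
        {"a": 0.3, "b": 0.7, "c": 0.9, "d": 0.9},  # finer layer
    ]
    print(coarse_to_fine(layers, keep_ratio=0.5))
    print(random_drop_focal_loss([0.9, 0.2, 0.4], [1, 0, 0]))
```

In this sketch, candidates pruned at an earlier layer are never rescored, so a high score at a later layer cannot bring them back. That is one simple way to realize progressive refinement; the paper may handle it differently.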
Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.