Chinese Academy of Sciences Institutional Repositories Grid
Robust Video-Text Retrieval Via Noisy Pair Calibration

Document type: Journal article

Authors: Zhang, Huaiwen [1,2,3]; Yang, Yang [1,2,3]; Qi, Fan [4]; Qian, Shengsheng [5,6]; Xu, Changsheng [5,6]
Journal: IEEE TRANSACTIONS ON MULTIMEDIA
Publication date: 2023
Volume: 25; Pages: 8632-8645
ISSN: 1520-9210
Keywords: Noise calibration; uncertainty; video text retrieval
DOI: 10.1109/TMM.2023.3239183
Corresponding author: Qian, Shengsheng (shengsheng.qian@nlpr.ia.ac.cn)
Abstract: Video-text retrieval is a fundamental task in managing the emerging massive amounts of video data. The main challenge focuses on learning a common representation space for videos and queries where the similarity measurement can reflect the semantic closeness. However, existing video-text retrieval models may suffer from the following noise in the common space learning procedure: First, the video-text correspondences in positive pairs may not be exact matches. The crowdsourcing annotation for existing datasets leads to inevitable tagging noise for non-expert annotators. Second, the learning of video-text representation is based on the negative samples randomly sampled. Instances that are semantically similar to the query may be incorrectly categorized as negative samples. To alleviate the adverse impact of these noisy pairs, we propose a novel robust video-text retrieval method that protects the model from noisy positive and negative pairs by identifying and calibrating noisy pairs with their uncertainty score. In particular, we propose a noisy pair identifier, which divides the training dataset into noisy and clean subsets based on the estimated uncertainty of each pair. Then, with the help of uncertainties, we calibrate the two types of noisy pairs with an adaptive margin triplet loss and a weighted triplet loss function, respectively. To verify the effectiveness of our methods, we conduct extensive experiments on three widely used datasets. Experimental results show that the proposed robust video-text retrieval methods successfully identify and calibrate the noisy pairs and improve retrieval performance.
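The abstract names three components (a noisy pair identifier, an adaptive margin triplet loss, and a weighted triplet loss) but gives no formulas. The following is only an illustrative sketch of how such pieces could fit together, not the authors' formulation. It assumes each pair already carries an uncertainty score in [0, 1], that the adaptive margin applies to noisy positive pairs and the weighting to noisy negative pairs (inferred from the order in the abstract), and that both scale linearly with uncertainty. All function names, the threshold, and the margin values are hypothetical.

```python
from typing import List, Sequence, Tuple


def split_by_uncertainty(
    uncertainties: Sequence[float], threshold: float = 0.5
) -> Tuple[List[int], List[int]]:
    """Split pair indices into (clean, noisy) subsets.

    Assumption: a pair counts as noisy when its uncertainty reaches the
    threshold; the record does not say how the split is actually made.
    """
    clean = [i for i, u in enumerate(uncertainties) if u < threshold]
    noisy = [i for i, u in enumerate(uncertainties) if u >= threshold]
    return clean, noisy


def adaptive_margin_triplet_loss(
    sim_pos: float, sim_neg: float, u_pos: float, base_margin: float = 0.2
) -> float:
    """Triplet hinge loss whose margin shrinks for uncertain positive pairs.

    Assumption: margin = base_margin * (1 - u_pos), so a positive pair that
    is likely mismatched is pushed less strongly ahead of the negative.
    """
    margin = base_margin * (1.0 - u_pos)
    return max(0.0, margin - sim_pos + sim_neg)


def weighted_triplet_loss(
    sim_pos: float, sim_neg: float, u_neg: float, margin: float = 0.2
) -> float:
    """Triplet hinge loss down-weighted for uncertain negative pairs.

    Assumption: weight = 1 - u_neg, so a sampled negative that may in fact
    be semantically relevant contributes less to the total loss.
    """
    weight = 1.0 - u_neg
    return weight * max(0.0, margin - sim_pos + sim_neg)


if __name__ == "__main__":
    clean, noisy = split_by_uncertainty([0.1, 0.8, 0.3])
    print("clean:", clean, "noisy:", noisy)
    print(adaptive_margin_triplet_loss(sim_pos=0.5, sim_neg=0.4, u_pos=0.0))
    print(weighted_triplet_loss(sim_pos=0.5, sim_neg=0.4, u_neg=0.5))
```

In this sketch, a fully uncertain positive pair (u_pos = 1) reduces the margin to zero, and a fully uncertain negative pair (u_neg = 1) contributes no loss at all. The published method may use a different uncertainty estimator and different functional forms.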
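Funding project: National Natural Science Foundation of China
WOS research areas: Computer Science; Telecommunications
Language: English
Publisher: IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
WOS accession number: WOS:001125902000070
Funding agency: National Natural Science Foundation of China
Source URL: http://ir.ia.ac.cn/handle/173211/54883
Collection: State Key Laboratory of Multimodal Artificial Intelligence Systems
Corresponding author: Qian, Shengsheng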
Author affiliations:
1. Inner Mongolia Univ, Coll Comp Sci, Mongolia 010031, Peoples R China
2. Natl & Local Joint Engn Res Ctr Intelligent Inform, Mongolia 010031, Peoples R China
3. Inner Mongolia Key Lab Mongolian Informat Proc Tec, Hohhot 010021, Peoples R China
4. Tianjin Univ Technol, Sch Comp Sci & Engn, Tianjin 300384, Peoples R China
5. Chinese Acad Sci, Inst Automat, Natl Lab Pattern Recognit, Beijing 100190, Peoples R China
6. Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing 100049, Peoples R China
Recommended citation:
GB/T 7714: Zhang, Huaiwen, Yang, Yang, Qi, Fan, et al. Robust Video-Text Retrieval Via Noisy Pair Calibration[J]. IEEE TRANSACTIONS ON MULTIMEDIA, 2023, 25: 8632-8645.
APA: Zhang, Huaiwen, Yang, Yang, Qi, Fan, Qian, Shengsheng, & Xu, Changsheng. (2023). Robust Video-Text Retrieval Via Noisy Pair Calibration. IEEE TRANSACTIONS ON MULTIMEDIA, 25, 8632-8645.
MLA: Zhang, Huaiwen, et al. "Robust Video-Text Retrieval Via Noisy Pair Calibration". IEEE TRANSACTIONS ON MULTIMEDIA 25 (2023): 8632-8645.

Ingest method: OAI harvesting

Source: Institute of Automation


Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.