Chinese Academy of Sciences Institutional Repositories Grid (CAS IR GRID): DARTScore: DuAl-Reconstruction Transformer for Video Captioning Evaluation

Document type: Journal article


Authors	Chen, Yuxin (6,7); Zhang, Ziqi (6); Qi, Zhongang (1); Yuan, Chunfeng (6); Wang, Jie (2); Shan, Ying (1); Li, Bing (3,6); Hu, Weiming (6,7,8); Qie, Xiaohu (4); Wu, Jianping (5)
Journal	IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY
Publication date	2024-04-01
Volume	34 Issue: 4 Pages: 2041-2055
Keywords	Chinese video captioning evaluation dual-reconstruction transformer
ISSN	1051-8215
DOI	10.1109/TCSVT.2023.3299932
English abstract	Video captioning evaluation aims at assessing the semantic consistency between a video and a candidate text, which should be measured from two aspects: faithfulness (whether the information conveyed by the candidate is correct w.r.t. the video) and comprehensiveness (whether the main video content is covered by the candidate). However, previous approaches have difficulty evaluating faithfulness and comprehensiveness because of their heavy reliance on references or the heterogeneity of visual and textual data. In this paper, we propose a vision-involved evaluation metric based on a novel DuAl-Reconstruction Transformer, named DARTScore. DARTScore formulates the caption evaluation task as a dual-reconstruction problem to evaluate both faithfulness and comprehensiveness explicitly. Since a word in a candidate is usually related to several frames, DARTScore adaptively collects relevant frames to reconstruct the word and computes the reconstruction accuracy as faithfulness, which inherently reflects whether the word information is contained in the video. Conversely, DARTScore reconstructs each frame from relevant words to evaluate comprehensiveness. By integrating fine-grained bidirectional reconstruction accuracies, DARTScore drills into each word in the candidate and each frame in the video to fully evaluate the semantic consistency. Furthermore, we collect and annotate two Chinese datasets with a large domain gap, named CREATE-EVAL and VATEX-ZH-EVAL, to systematically evaluate existing metrics and fill the gap in Chinese video captioning evaluation. Experimental results show that DARTScore achieves higher correlation with human judgments, has lower reference reliance, and generalizes well to data from different domains.
WOS keywords	NETWORK
Funding project	Beijing Natural Science Foundation
WOS research area	Engineering
Language	English
WOS accession number	WOS:001197960500021
Publisher	IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
Funding organization	Beijing Natural Science Foundation
Source URL	[http://ir.ia.ac.cn/handle/173211/57056]
Collection	Institute of Automation_National Laboratory of Pattern Recognition_Video Content Security Team
Corresponding author	Yuan, Chunfeng
Author affiliations	1. Tencent PCG, ARC Lab, Shenzhen 518057, Peoples R China; 2. Tencent PCG, IPS Search, Shenzhen 518057, Peoples R China; 3. People AI Inc, Beijing 100190, Peoples R China; 4. Tencent PCG, Shenzhen 518057, Peoples R China; 5. Tsinghua Univ, Dept Comp Sci & Technol, Beijing 100084, Peoples R China; 6. Chinese Acad Sci, Inst Automat, State Key Lab Multimodal Artificial Intelligence S, Beijing 100190, Peoples R China; 7. Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing 100190, Peoples R China; 8. ShanghaiTech Univ, Sch Informat Sci & Technol, Shanghai 201210, Peoples R China
Recommended citation (GB/T 7714)	Chen, Yuxin,Zhang, Ziqi,Qi, Zhongang,et al. DARTScore: DuAl-Reconstruction Transformer for Video Captioning Evaluation[J]. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY,2024,34(4):2041-2055.
APA	Chen, Yuxin.,Zhang, Ziqi.,Qi, Zhongang.,Yuan, Chunfeng.,Wang, Jie.,...&Wu, Jianping.(2024).DARTScore: DuAl-Reconstruction Transformer for Video Captioning Evaluation.IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY,34(4),2041-2055.
MLA	Chen, Yuxin,et al."DARTScore: DuAl-Reconstruction Transformer for Video Captioning Evaluation".IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY 34.4(2024):2041-2055.
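
The dual-reconstruction idea described in the abstract can be illustrated with a toy sketch. This is not the authors' model: DARTScore is a trained transformer, whereas the code below swaps in plain softmax attention over cosine similarities. The function names, the `temperature` parameter, the assumption that words and frames share one embedding space, and the harmonic-mean combination are all hypothetical choices made only for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def reconstruction_accuracy(targets, sources, temperature=0.1):
    """Rebuild each target from attention-weighted sources; return the mean
    similarity between each target and its reconstruction.

    Assumption: attention weights come from cosine similarity, standing in
    for the learned "adaptive collection" of relevant frames or words.
    """
    accuracies = []
    for t in targets:
        weights = softmax([cosine(t, s) / temperature for s in sources])
        recon = [sum(w * s[d] for w, s in zip(weights, sources))
                 for d in range(len(t))]
        # Clamp at 0 so the score stays in [0, 1].
        accuracies.append(max(0.0, cosine(t, recon)))
    return sum(accuracies) / len(accuracies)

def toy_dual_score(word_vecs, frame_vecs):
    """Return (faithfulness, comprehensiveness, combined) for one caption."""
    # Faithfulness: can each word be reconstructed from the frames?
    faithfulness = reconstruction_accuracy(word_vecs, frame_vecs)
    # Comprehensiveness: can each frame be reconstructed from the words?
    comprehensiveness = reconstruction_accuracy(frame_vecs, word_vecs)
    denom = faithfulness + comprehensiveness
    # Hypothetical combination rule: harmonic mean of the two directions.
    combined = 2 * faithfulness * comprehensiveness / denom if denom > 0 else 0.0
    return faithfulness, comprehensiveness, combined

# Two frames showing different content, and a caption that mentions only one.
frames = [[1.0, 0.0], [0.0, 1.0]]
words = [[1.0, 0.0]]
faith, comp, score = toy_dual_score(words, frames)
```

In this example the single word is fully supported by the first frame, so faithfulness is close to 1, while the second frame cannot be reconstructed from the caption, so comprehensiveness is 0.5. The two directions separate "what the caption says is correct" from "the caption covers the video".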

Ingest method: OAI harvesting

Source: Institute of Automation
