Chinese Academy of Sciences Institutional Repositories Grid
Find objects and focus on highlights: Mining object semantics for video highlight detection via graph neural networks

Document type: Conference paper

Authors: Zhang, Yingying [1,4]; Gao, Junyu [1,4]; Yang, Xiaoshan [1,2,4]; Liu, Chang [3]; Li, Yan [3]; Xu, Changsheng [1,2,4]
Publication date: 2020-04-03
Conference date: 2020-02-07
Conference location: Palo Alto, California, USA
Abstract

With the increasing prevalence of portable computing devices, browsing unedited videos is time-consuming and tedious. Video highlight detection, which discovers the moments in a video that are of major or special interest to the user, has the potential to significantly ease this situation. Existing methods suffer from two problems. First, most existing approaches focus only on learning holistic visual representations of videos and ignore object semantics when inferring video highlights. Second, current state-of-the-art approaches often adopt a pairwise ranking-based strategy, which cannot exploit global information to infer highlights. We therefore propose a novel video highlight framework, named VH-GNN, that constructs an object-aware graph and models the relationships between objects from a global view. To reduce computational cost, we decompose the whole graph into two types of graphs: a spatial graph that captures the complex interactions of objects within each frame, and a temporal graph that obtains an object-aware representation of each frame and captures the global information. In addition, we optimize the framework via a proposed multi-stage loss, in which the first stage aims to determine the highlight probability and the second stage leverages the relationships between frames and focuses on hard examples from the first stage. Extensive experiments on two standard datasets show that VH-GNN achieves significant performance gains over state-of-the-art methods.

Source URL: http://ir.ia.ac.cn/handle/173211/51531
Collection: State Key Laboratory of Multimodal Artificial Intelligence Systems
Author affiliations:
1. National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences
2. Peng Cheng Laboratory
3. Kuaishou Technology
4. School of Artificial Intelligence, University of Chinese Academy of Sciences
Recommended citation
GB/T 7714
Zhang, Yingying, Gao, Junyu, Yang, Xiaoshan, et al. Find objects and focus on highlights: Mining object semantics for video highlight detection via graph neural networks[C]. In: . Palo Alto, California USA. 2020-02-07.

Ingest method: OAI harvesting

Source: Institute of Automation
