Find objects and focus on highlights: Mining object semantics for video highlight detection via graph neural networks
Document type: Conference paper
Author | Zhang, Yingying (affiliations 1, 4) |
Publication date | 2020-04-03 |
Conference date | 2020-02-07 |
Conference location | Palo Alto, California, USA |
English abstract | With the increasing prevalence of portable computing devices, browsing unedited videos has become time-consuming and tedious. Video highlight detection, which discovers moments of major or special interest to the user in a video, has the potential to significantly ease this situation. Existing methods suffer from two problems. First, most existing approaches focus only on learning holistic visual representations of videos and ignore object semantics when inferring video highlights. Second, current state-of-the-art approaches often adopt a pairwise ranking-based strategy, which cannot exploit global information to infer highlights. Therefore, we propose a novel video highlight detection framework, named VH-GNN, that constructs an object-aware graph and models the relationships between objects from a global view. To reduce computational cost, we decompose the whole graph into two types of graphs: a spatial graph that captures the complex interactions of objects within each frame, and a temporal graph that obtains an object-aware representation of each frame and captures global information. In addition, we optimize the framework via a proposed multi-stage loss, where the first stage aims to determine the highlight probability and the second stage leverages the relationships between frames and focuses on hard examples from the former stage. Extensive experiments on two standard datasets show that VH-GNN achieves significant performance gains over state-of-the-art methods. |
Source URL | http://ir.ia.ac.cn/handle/173211/51531 |
Collection | State Key Laboratory of Multimodal Artificial Intelligence Systems |
Author affiliations | 1. National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences; 2. Peng Cheng Laboratory; 3. Kuaishou Technology; 4. School of Artificial Intelligence, University of Chinese Academy of Sciences |
Recommended citation (GB/T 7714) | Zhang, Yingying, Gao, Junyu, Yang, Xiaoshan, et al. Find objects and focus on highlights: Mining object semantics for video highlight detection via graph neural networks[C]. In:. Palo Alto, California USA. 2020-02-07. |
Source: Institute of Automation
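The abstract names VH-GNN's components (a spatial graph over the objects in each frame, a temporal graph over all frames, and a two-stage loss) but not their equations. The pure-Python sketch below only illustrates that decomposition under assumptions of our own: fully connected graphs, one round of mean-aggregation message passing with ReLU in the spatial graph, softmax dot-product edge weights in the temporal graph, a logistic highlight score with a hypothetical weight vector `w`, and a focal-style re-weighting standing in for the second stage's hard-example focus. It is not the authors' implementation, and every function name is hypothetical.

```python
import math
from typing import List

# Hypothetical sketch, NOT the authors' VH-GNN code; all layer choices are assumptions.
Vector = List[float]

def mean(vectors: List[Vector]) -> Vector:
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def relu(v: Vector) -> Vector:
    return [max(0.0, x) for x in v]

def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))

def spatial_graph(objects: List[Vector]) -> Vector:
    """Spatial graph for ONE frame (assumed fully connected): each object node
    adds the mean of all nodes and passes through ReLU; nodes are then
    mean-pooled into a single object-aware frame vector."""
    msg = mean(objects)
    updated = [relu([o + m for o, m in zip(obj, msg)]) for obj in objects]
    return mean(updated)

def temporal_graph(frames: List[Vector]) -> List[Vector]:
    """Temporal graph over ALL frames (global view): assumed edge weights are a
    softmax over dot-product similarities; each frame keeps a residual copy of
    itself plus its weighted neighbourhood."""
    out = []
    for f in frames:
        scores = [dot(f, g) for g in frames]
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]  # numerically stable softmax
        z = sum(exps)
        agg = [sum(e / z * g[i] for e, g in zip(exps, frames)) for i in range(len(f))]
        out.append([a + b for a, b in zip(f, agg)])
    return out

def highlight_probs(video: List[List[Vector]], w: Vector) -> List[float]:
    """video[t] holds the feature vectors of the objects detected in frame t."""
    frame_vecs = [spatial_graph(objs) for objs in video]
    return [1.0 / (1.0 + math.exp(-dot(h, w))) for h in temporal_graph(frame_vecs)]

def bce(p: float, y: int, eps: float = 1e-7) -> float:
    p = min(max(p, eps), 1 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def two_stage_loss(p1: List[float], p2: List[float], labels: List[int],
                   gamma: float = 2.0) -> float:
    """Stage 1: plain BCE on first-stage probabilities p1.
    Stage 2: BCE on refined probabilities p2, weighted by (1 - p_t)^gamma from
    stage 1, so frames that stage 1 got wrong count more (focal-style stand-in)."""
    stage1 = sum(bce(p, y) for p, y in zip(p1, labels)) / len(labels)
    weights = [(1 - (p if y == 1 else 1 - p)) ** gamma for p, y in zip(p1, labels)]
    stage2 = sum(wt * bce(p, y) for wt, p, y in zip(weights, p2, labels)) / len(labels)
    return stage1 + stage2

# Toy video: three frames with 2-D object features (made-up numbers).
video = [
    [[1.0, 0.0], [0.8, 0.2]],
    [[0.0, 1.0]],
    [[0.9, 0.1], [0.1, 0.9], [0.5, 0.5]],
]
probs = highlight_probs(video, w=[1.0, -1.0])
print(probs)
```

In this toy run frame 0 scores highest and frame 1 lowest, because their pooled object features align with and against `w` respectively. The same probabilities are passed as both `p1` and `p2` only for demonstration; a real second stage would produce its own refined scores.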