Chinese Academy of Sciences Institutional Repositories Grid
Find and Focus: Retrieve and Localize Video Events with Natural Language Queries

Document type: Conference paper

Authors: Dian Shao; Yu Xiong; Yue Zhao; Qingqiu Huang; Yu Qiao; Dahua Lin
Publication date: 2018
Conference date: 2018
Abstract: The growth of video-sharing services brings new challenges to video retrieval, e.g., the rapid increase in video duration and content diversity. Meeting these challenges calls for new techniques that can effectively retrieve videos with natural language queries. Existing methods along this line, which mostly embed each video as a whole, remain far from satisfactory for real-world applications because of their limited expressive power. In this work, we aim to move beyond this limitation by delving into the internal structures of both sides, the queries and the videos. Specifically, we propose a new framework called Find and Focus (FIFO), which not only performs top-level matching (paragraph vs. video) but also makes part-level associations, localizing a video clip for each sentence in the query with the help of a focusing guide. The two levels are complementary: the top-level matching narrows the search, while the part-level localization refines the results. On both the ActivityNet Captions and modified LSMDC datasets, the proposed framework achieves remarkable performance gains.
Source URL: http://ir.siat.ac.cn:8080/handle/172644/13691
Collection: Shenzhen Institutes of Advanced Technology_Institute of Advanced Integration Technology
Recommended citation
GB/T 7714
Dian Shao, Yu Xiong, Yue Zhao, et al. Find and Focus: Retrieve and Localize Video Events with Natural Language Queries[C]. In: . 2018.

Ingest method: OAI harvesting

Source: Shenzhen Institutes of Advanced Technology


Unless otherwise stated, all content in this system is protected by copyright, and all rights are reserved.