Multimodal graph neural network for video procedural captioning
Document type: Journal article
Authors | Ji, Lei (1,2,3); Tu, Rongcheng (5); Lin, Kevin (4); Wang, Lijuan (4); Duan, Nan (1) |
Journal | NEUROCOMPUTING |
Publication date | 2022-06-01 |
Volume | 488 |
Pages | 88-96 |
Keywords | Multimodal video captioning; Graph neural network |
ISSN | 0925-2312 |
DOI | 10.1016/j.neucom.2022.02.062 |
Abstract | Video procedural captioning aims to generate detailed descriptive captions for all steps in a long instructional video. The distinctive challenge of this problem is the procedural dependency between events, which must be modeled to generate captions that are consistent across the video. However, existing video (dense) captioning methods consider only intra-event context or sequential inter-event context, and struggle to model non-sequential context dependencies between events. In this paper, inspired by the recent success of graph neural networks in capturing relations in structured data, we propose a novel Multimodal Graph Neural Network (MGNN) for dense video procedural captioning that captures the procedural structure between events. Specifically, we construct a temporal sequential graph and a semantic non-sequential graph to form a multimodal heterogeneous graph. Moreover, we adopt a graph neural network to enhance the visual and text features, and fuse both features for caption generation. Extensive experiments on the YouCook2 and ActivityNet Captions benchmarks demonstrate that the proposed MGNN is effective in generating coherent captions. (c) 2022 Elsevier B.V. All rights reserved. |
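The abstract describes the graph construction only at a high level. The following is a minimal, hypothetical sketch, not the authors' MGNN: it assumes each event (step) has one visual and one text feature vector, links consecutive events with temporal sequential edges, and links non-adjacent events whose text features have cosine similarity at or above an assumed threshold of 0.5 with semantic non-sequential edges. For simplicity it collapses the heterogeneous graph into a single event graph, replaces a learned GNN layer with one parameter-free mean-aggregation step, and fuses the two modalities by concatenation. The names `build_event_graph`, `mean_aggregate`, and `enhance_and_fuse` are illustrative only.

```python
import math
from typing import List

Matrix = List[List[float]]


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_event_graph(text_feats: Matrix, threshold: float = 0.5) -> List[List[int]]:
    """Hypothetical event graph: self-loops, temporal edges between consecutive
    events, and semantic edges between similar non-adjacent events."""
    n = len(text_feats)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        adj[i][i] = 1  # self-loop keeps each event's own feature in the average
    for i in range(n - 1):
        adj[i][i + 1] = adj[i + 1][i] = 1  # temporal sequential edge
    for i in range(n):
        for j in range(i + 2, n):  # skip pairs already linked temporally
            if cosine(text_feats[i], text_feats[j]) >= threshold:  # assumed threshold
                adj[i][j] = adj[j][i] = 1  # semantic non-sequential edge
    return adj


def mean_aggregate(adj: List[List[int]], feats: Matrix) -> Matrix:
    """One parameter-free message-passing step: average each event's neighbours
    (including itself). A real GNN layer would add learned weights."""
    out = []
    for row in adj:
        nbrs = [j for j, edge in enumerate(row) if edge]  # never empty (self-loop)
        dim = len(feats[nbrs[0]])
        out.append([sum(feats[j][d] for j in nbrs) / len(nbrs) for d in range(dim)])
    return out


def enhance_and_fuse(visual: Matrix, text: Matrix, threshold: float = 0.5) -> Matrix:
    """Enhance both modalities over the same graph, then concatenate per event."""
    adj = build_event_graph(text, threshold)
    v_enh = mean_aggregate(adj, visual)
    t_enh = mean_aggregate(adj, text)
    return [v + t for v, t in zip(v_enh, t_enh)]
```

For example, with four events whose text features are `[[1.0, 0.0], [0.0, 1.0], [1.0, 0.1], [0.0, 1.0]]`, events 0 and 2 and events 1 and 3 receive semantic edges in addition to the temporal chain, so information can flow between repeated steps that are not adjacent in time.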
WoS research area | Computer Science |
Language | English |
WoS accession number | WOS:000782470900008 |
Publisher | ELSEVIER |
Source URL | http://119.78.100.204/handle/2XEOYT63/18894 |
Collection | Institute of Computing Technology, Chinese Academy of Sciences: Journal Articles (English) |
Corresponding author | Ji, Lei |
Affiliations | 1. Microsoft Res Asia, Beijing, Peoples R China; 2. Chinese Acad Sci, Inst Comp Technol, Beijing, Peoples R China; 3. Univ Chinese Acad Sci, Beijing, Peoples R China; 4. Microsoft, Redmond, WA USA; 5. Beijing Inst Technol, Beijing, Peoples R China |
Recommended citation (GB/T 7714) | Ji, Lei, Tu, Rongcheng, Lin, Kevin, et al. Multimodal graph neural network for video procedural captioning[J]. NEUROCOMPUTING, 2022, 488: 88-96. |
APA | Ji, Lei, Tu, Rongcheng, Lin, Kevin, Wang, Lijuan, & Duan, Nan. (2022). Multimodal graph neural network for video procedural captioning. NEUROCOMPUTING, 488, 88-96. |
MLA | Ji, Lei, et al. "Multimodal graph neural network for video procedural captioning." NEUROCOMPUTING 488 (2022): 88-96. |
Ingest method: OAI harvesting
Source: Institute of Computing Technology