Chinese Academy of Sciences Institutional Repositories Grid
Scene captioning with deep fusion of images and point clouds

Document type: Journal article

Authors: Yu, Qiang (2,4); Zhang, Chunxia (1); Weng, Lubin (4); Xiang, Shiming (2,3); Pan, Chunhong (3)
Journal: PATTERN RECOGNITION LETTERS
Publication date: 2022-06-01
Volume: 158; Pages: 9-15
Keywords: Scene captioning; Point cloud; Deep fusion
ISSN: 0167-8655
DOI: 10.1016/j.patrec.2022.04.017
Corresponding author: Yu, Qiang (qiang.yu@ia.ac.cn)
Abstract: Recently, the fusion of images and point clouds has received considerable attention in various fields such as autonomous driving, where its advantage over single-modal vision has been verified. However, it has not been extensively explored for the scene captioning task. In this paper, a novel scene captioning framework with deep fusion of images and point clouds, based on region correlation and attention, is proposed to improve the performance of captioning models. In our model, a symmetrical processing pipeline is designed for point clouds and images. First, 3D and 2D region features are generated, respectively, through region proposal generation, proposal fusion, and region pooling modules. Then, a feature fusion module integrates the features according to the region correlation rule and the attention mechanism, which increases the interpretability of the fusion process and yields a sequence of fused visual features. Finally, the fused features are transformed into captions by an attention-based caption generation module. Comprehensive experiments indicate that the performance of our model reaches the state of the art. (c) 2022 Elsevier B.V. All rights reserved.
Funding projects: National Key Research and Development Program of China [2020AAA0104903]; National Natural Science Foundation of China [62072039]; National Natural Science Foundation of China [62076242]; National Natural Science Foundation of China [61976208]
WOS research area: Computer Science
Language: English
WOS accession number: WOS:000797731300002
Publisher: ELSEVIER
Funding organizations: National Key Research and Development Program of China; National Natural Science Foundation of China
Source URL: http://ir.ia.ac.cn/handle/173211/49500
Collection: Institute of Automation_National Laboratory of Pattern Recognition_Remote Sensing Image Processing Team
Corresponding author: Yu, Qiang
Author affiliations:
1. Beijing Inst Technol, Sch Comp Sci & Technol, Beijing 100081, Peoples R China
2. Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing 100049, Peoples R China
3. Chinese Acad Sci, Inst Automat, Natl Lab Pattern Recognit, Beijing 100190, Peoples R China
4. Chinese Acad Sci, Inst Automat, Res Ctr Aerosp Informat, Beijing 100190, Peoples R China
Recommended citation
GB/T 7714: Yu, Qiang, Zhang, Chunxia, Weng, Lubin, et al. Scene captioning with deep fusion of images and point clouds[J]. PATTERN RECOGNITION LETTERS, 2022, 158: 9-15.
APA: Yu, Qiang, Zhang, Chunxia, Weng, Lubin, Xiang, Shiming, & Pan, Chunhong. (2022). Scene captioning with deep fusion of images and point clouds. PATTERN RECOGNITION LETTERS, 158, 9-15.
MLA: Yu, Qiang, et al. "Scene captioning with deep fusion of images and point clouds". PATTERN RECOGNITION LETTERS 158 (2022): 9-15.

Ingest method: OAI harvesting

Source: Institute of Automation


Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.