Chinese Academy of Sciences Institutional Repositories Grid (CAS IR Grid): Scene captioning with deep fusion of images and point clouds

Scene captioning with deep fusion of images and point clouds

Document type: Journal article


Authors	Yu, Qiang2,4 ; Zhang, Chunxia1 ; Weng, Lubin4 ; Xiang, Shiming2,3 ; Pan, Chunhong3
Journal	PATTERN RECOGNITION LETTERS
Publication date	2022-06-01
Volume	158; pages 9-15
Keywords	Scene captioning ; Point cloud ; Deep fusion
ISSN	0167-8655
DOI	10.1016/j.patrec.2022.04.017
Corresponding author	Yu, Qiang (qiang.yu@ia.ac.cn)
Abstract	Recently, the fusion of images and point clouds has received appreciable attentions in various fields, for example, autonomous driving, whose advantage over single-modal vision has been verified. However, it has not been extensively exploited in the scene captioning task. In this paper, a novel scene captioning framework with deep fusion of images and point clouds based on region correlation and attention is proposed to improve performances of captioning models. In our model, a symmetrical processing pipeline is designed for point clouds and images. First, 3D and 2D region features are generated respectively through region proposal generation, proposal fusion, and region pooling modules. Then, a feature fusion module is designed to integrate features according to the region correlation rule and the attention mechanism, which increases the interpretability of the fusion process and results in a sequence of fused visual features. Finally, the fused features are transformed into captions by an attention-based caption generation module. Comprehensive experiments indicate that the performance of our model reaches the state of the art. (c) 2022 Elsevier B.V. All rights reserved.
Funding projects	National Key Research and Development Program of China[2020AAA0104903] ; National Natural Science Foundation of China[62072039] ; National Natural Science Foundation of China[62076242] ; National Natural Science Foundation of China[61976208]
WOS research area	Computer Science
Language	English
WOS accession number	WOS:000797731300002
Publisher	ELSEVIER
Funding organizations	National Key Research and Development Program of China ; National Natural Science Foundation of China
Source URL	[http://ir.ia.ac.cn/handle/173211/49500]
Collection	Institute of Automation_National Laboratory of Pattern Recognition_Remote Sensing Image Processing Team
Author affiliations	1. Beijing Inst Technol, Sch Comp Sci & Technol, Beijing 100081, Peoples R China ; 2. Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing 100049, Peoples R China ; 3. Chinese Acad Sci, Inst Automat, Natl Lab Pattern Recognit, Beijing 100190, Peoples R China ; 4. Chinese Acad Sci, Inst Automat, Res Ctr Aerosp Informat, Beijing 100190, Peoples R China
Recommended citation (GB/T 7714)	Yu, Qiang,Zhang, Chunxia,Weng, Lubin,et al. Scene captioning with deep fusion of images and point clouds[J]. PATTERN RECOGNITION LETTERS,2022,158:9-15.
APA	Yu, Qiang,Zhang, Chunxia,Weng, Lubin,Xiang, Shiming,&Pan, Chunhong.(2022).Scene captioning with deep fusion of images and point clouds.PATTERN RECOGNITION LETTERS,158,9-15.
MLA	Yu, Qiang,et al."Scene captioning with deep fusion of images and point clouds".PATTERN RECOGNITION LETTERS 158(2022):9-15.

Ingestion method: OAI harvesting

Source: Institute of Automation
