Chinese Academy of Sciences Institutional Repositories Grid

MRFTrans: Multimodal Representation Fusion Transformer for Monocular 3D Semantic Scene Completion

Document type: Journal article


Authors	Xu RT(许镕涛); Jiguang Zhang; Jiaxi Sun; Changwei Wang; Yifan Wu; Shibiao Xu; Weiliang Meng; Xiaopeng Zhang
Journal	Information Fusion
Publication date	2024-05
Pages	102493
Abstract	The complete understanding of 3D scenes is crucial in robotic visual perception, impacting tasks such as motion planning and map localization. However, due to the limited field of view and scene occlusion constraints of sensors, inferring complete scene geometry and semantic information from restricted observations is challenging. In this work, we propose a novel Multimodal Representation Fusion Transformer framework (MRFTrans) that robustly fuses semantic, geometric occupancy, and depth representations for monocular-image-based scene completion. MRFTrans centers on an affinity representation fusion transformer, integrating geometric occupancy and semantic relationships within a transformer architecture. This integration enables the modeling of long-range dependencies within scenes for inferring missing information. Additionally, we present a depth representation fusion method, efficiently extracting reliable depth knowledge from biased monocular estimates. Extensive experiments demonstrate MRFTrans’s superiority, setting a new benchmark on SemanticKITTI and NYUv2 datasets. It significantly enhances completeness and accuracy, particularly in large structures, movable objects, and scene components with major occlusions. The results underscore the benefits of the affinity-aware transformer and robust depth fusion in monocular-image-based completion.
Source URL	[http://ir.ia.ac.cn/handle/173211/57546]
Collection	National Laboratory of Pattern Recognition_3D Visual Computing
Affiliation	Institute of Automation, Chinese Academy of Sciences
Recommended citation (GB/T 7714)	Xu RT, Jiguang Zhang, Jiaxi Sun, et al. MRFTrans: Multimodal Representation Fusion Transformer for Monocular 3D Semantic Scene Completion[J]. Information Fusion, 2024: 102493.
APA	Xu RT., Jiguang Zhang., Jiaxi Sun., Changwei Wang., Yifan Wu., ... & Xiaopeng Zhang. (2024). MRFTrans: Multimodal Representation Fusion Transformer for Monocular 3D Semantic Scene Completion. Information Fusion, 102493.
MLA	Xu RT, et al. "MRFTrans: Multimodal Representation Fusion Transformer for Monocular 3D Semantic Scene Completion". Information Fusion (2024): 102493.

Ingestion method: OAI harvesting

Source: Institute of Automation

