Chinese Academy of Sciences Institutional Repositories Grid
MRFTrans: Multimodal Representation Fusion Transformer for Monocular 3D Semantic Scene Completion

Document type: Journal article

Authors: Xu RT (许镕涛); Jiguang Zhang; Jiaxi Sun; Changwei Wang; Yifan Wu; Shibiao Xu; Weiliang Meng; Xiaopeng Zhang
Journal: Information Fusion
Publication date: 2024-05
Pages: 102493
Abstract

The complete understanding of 3D scenes is crucial in robotic visual perception, impacting tasks such as motion planning and map localization. However, due to the limited field of view and scene occlusion constraints of sensors, inferring complete scene geometry and semantic information from restricted observations is challenging. In this work, we propose a novel Multimodal Representation Fusion Transformer framework (MRFTrans) that robustly fuses semantic, geometric occupancy, and depth representations for monocular-image-based scene completion. MRFTrans centers on an affinity representation fusion transformer, integrating geometric occupancy and semantic relationships within a transformer architecture. This integration enables the modeling of long-range dependencies within scenes for inferring missing information. Additionally, we present a depth representation fusion method, efficiently extracting reliable depth knowledge from biased monocular estimates. Extensive experiments demonstrate MRFTrans’s superiority, setting a new benchmark on SemanticKITTI and NYUv2 datasets. It significantly enhances completeness and accuracy, particularly in large structures, movable objects, and scene components with major occlusions. The results underscore the benefits of the affinity-aware transformer and robust depth fusion in monocular-image-based completion.

Source URL: http://ir.ia.ac.cn/handle/173211/57546
Collection: National Laboratory of Pattern Recognition_3D Visual Computing
Author affiliation: Institute of Automation, Chinese Academy of Sciences
Recommended citation
GB/T 7714
Xu RT, Jiguang Zhang, Jiaxi Sun, et al. MRFTrans: Multimodal Representation Fusion Transformer for Monocular 3D Semantic Scene Completion[J]. Information Fusion, 2024: 102493.
APA Xu RT, Jiguang Zhang, Jiaxi Sun, Changwei Wang, Yifan Wu, ... & Xiaopeng Zhang. (2024). MRFTrans: Multimodal Representation Fusion Transformer for Monocular 3D Semantic Scene Completion. Information Fusion, 102493.
MLA Xu RT, et al. "MRFTrans: Multimodal Representation Fusion Transformer for Monocular 3D Semantic Scene Completion". Information Fusion (2024): 102493.

Ingestion method: OAI harvesting

Source: Institute of Automation

Unless otherwise stated, all content in this system is protected by copyright, and all rights are reserved.