MRFTrans: Multimodal Representation Fusion Transformer for Monocular 3D Semantic Scene Completion
Document type: Journal article
Author | Xu RT (许镕涛) |
Journal | Information Fusion |
Publication date | 2024-05 |
Article number | 102493 |
Abstract | The complete understanding of 3D scenes is crucial in robotic visual perception, impacting tasks such as motion planning and map localization. However, due to the limited field of view and scene occlusion constraints of sensors, inferring complete scene geometry and semantic information from restricted observations is challenging. In this work, we propose a novel Multimodal Representation Fusion Transformer framework (MRFTrans) that robustly fuses semantic, geometric occupancy, and depth representations for monocular-image-based scene completion. MRFTrans centers on an affinity representation fusion transformer, integrating geometric occupancy and semantic relationships within a transformer architecture. This integration enables the modeling of long-range dependencies within scenes for inferring missing information. Additionally, we present a depth representation fusion method, efficiently extracting reliable depth knowledge from biased monocular estimates. Extensive experiments demonstrate MRFTrans’s superiority, setting a new benchmark on SemanticKITTI and NYUv2 datasets. It significantly enhances completeness and accuracy, particularly in large structures, movable objects, and scene components with major occlusions. The results underscore the benefits of the affinity-aware transformer and robust depth fusion in monocular-image-based completion. |
Source URL | http://ir.ia.ac.cn/handle/173211/57546 |
Collection | State Key Laboratory of Pattern Recognition, 3D Visual Computing |
Affiliation | Institute of Automation, Chinese Academy of Sciences |
Recommended citation (GB/T 7714) | Xu RT, Jiguang Zhang, Jiaxi Sun, et al. MRFTrans: Multimodal Representation Fusion Transformer for Monocular 3D Semantic Scene Completion[J]. Information Fusion, 2024: 102493. |
APA | Xu, R. T., Zhang, J., Sun, J., Wang, C., Wu, Y., ... & Zhang, X. (2024). MRFTrans: Multimodal Representation Fusion Transformer for Monocular 3D Semantic Scene Completion. Information Fusion, 102493. |
MLA | Xu RT, et al. "MRFTrans: Multimodal Representation Fusion Transformer for Monocular 3D Semantic Scene Completion." Information Fusion (2024): 102493. |
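The abstract says only that MRFTrans integrates geometric occupancy and semantic relationships inside a transformer to model long-range dependencies; it does not describe the layer itself. As a hedged, generic illustration (not the authors' implementation), the sketch below shows scaled dot-product self-attention over flattened voxel features in which a pairwise affinity matrix is added to the attention logits, so every voxel can attend to every other voxel while scene relationships steer the weights. The helper names `semantic_affinity` and `affinity_biased_attention`, the hard label-agreement rule, and the `penalty` value are all hypothetical assumptions.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def semantic_affinity(labels: Sequence[int], penalty: float = -1e4) -> Matrix:
    """Hypothetical pairwise affinity (an assumption, not MRFTrans's rule):
    0 for voxels sharing a semantic label, a large negative penalty otherwise."""
    return [[0.0 if a == b else penalty for b in labels] for a in labels]


def affinity_biased_attention(
    q: Matrix, k: Matrix, v: Matrix, affinity: Matrix
) -> Tuple[Matrix, Matrix]:
    """Scaled dot-product attention whose logits are shifted by an additive
    affinity bias. Returns (outputs, attention_weights)."""
    scale = 1.0 / math.sqrt(len(q[0]))
    outputs: Matrix = []
    weights: Matrix = []
    for i, qi in enumerate(q):
        # Logit for every key j: feature similarity plus the pairwise affinity bias.
        logits = [
            scale * sum(a * b for a, b in zip(qi, kj)) + affinity[i][j]
            for j, kj in enumerate(k)
        ]
        # Numerically stable softmax over all keys; attending to every voxel,
        # not just neighbours, is what gives attention its long-range reach.
        m = max(logits)
        exps = [math.exp(x - m) for x in logits]
        total = sum(exps)
        row = [e / total for e in exps]
        weights.append(row)
        # Each output is the attention-weighted average of the value vectors.
        outputs.append(
            [sum(row[j] * v[j][c] for j in range(len(v))) for c in range(len(v[0]))]
        )
    return outputs, weights


if __name__ == "__main__":
    # Three toy voxels with 2-D features; voxels 0 and 1 share label 0.
    feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    labels = [0, 0, 1]
    out, w = affinity_biased_attention(feats, feats, feats, semantic_affinity(labels))
    for row in w:
        print([round(x, 3) for x in row])
```

A large finite penalty is used instead of `-inf` so the arithmetic stays in ordinary floats; a soft or learned affinity would be a natural alternative to this hard label-agreement rule.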
Ingestion method: OAI harvesting
Source: Institute of Automation
Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.