Chinese Academy of Sciences Institutional Repositories Grid
DefFusion: Deformable Multimodal Representation Fusion for 3D Semantic Segmentation

Document Type: Conference Paper

Authors: Xu RT (许镕涛); Changwei Wang; Duzhen Zhang; Man Zhang; Shibiao Xu; Weiliang Meng; Xiaopeng Zhang
Publication Date: 2024-05
Conference Date: 2024-05
Conference Venue: Yokohama, Japan
Abstract
The complementarity between camera and LiDAR data makes fusion methods a promising approach to improving 3D semantic segmentation performance. Recent transformer-based methods have also demonstrated superiority in segmentation. However, multimodal solutions incorporating transformers are underexplored and face two key inherent difficulties: over-attention and noise from different modal data. To overcome these challenges, we propose a Deformable Multimodal Representation Fusion (DefFusion) framework consisting mainly of a Deformable Representation Fusion Transformer and Dynamic Representation Enhancement Modules. The Deformable Representation Fusion Transformer introduces the deformable mechanism into multimodal fusion, avoiding over-attention and improving efficiency by adaptively modeling a 2D key/value set for a given 3D query, thus enabling multimodal fusion with higher flexibility. To enhance the 2D and 3D representations, the Dynamic Representation Enhancement Module is proposed to dynamically remove noise in the input representations via Dynamic Grouped Representation Generation and Dynamic Mask Generation. Extensive experiments validate that our model achieves the best 3D semantic segmentation performance on the SemanticKITTI and nuScenes benchmarks.
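To make the core idea concrete, below is a minimal PyTorch sketch of deformable 3D-to-2D cross-attention in the style the abstract describes: for each 3D query, a small set of 2D sampling locations is predicted around its projected image position, image features are bilinearly sampled there as the key/value set, and a learned weighting aggregates them. This is not the authors' implementation; the class and parameter names (DeformableFusion, num_points, ref_pts) are illustrative, and it assumes projected 2D reference points in [0, 1] are computed beforehand from camera calibration.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class DeformableFusion(nn.Module):
        """Illustrative deformable cross-attention from 3D queries to a 2D feature map."""
        def __init__(self, dim=64, num_points=4):
            super().__init__()
            self.num_points = num_points
            # Predict per-query 2D sampling offsets and attention weights from the 3D feature.
            self.offset_head = nn.Linear(dim, num_points * 2)
            self.weight_head = nn.Linear(dim, num_points)
            self.proj = nn.Linear(dim, dim)

        def forward(self, query_3d, feat_2d, ref_pts):
            # query_3d: (N, C) per-point 3D features
            # feat_2d:  (1, C, H, W) image feature map
            # ref_pts:  (N, 2) projected reference points in [0, 1], (x, y) order
            N, C = query_3d.shape
            # Small learned offsets around each reference point (scale is a design choice).
            offsets = self.offset_head(query_3d).view(N, self.num_points, 2).tanh() * 0.1
            weights = self.weight_head(query_3d).softmax(dim=-1)            # (N, P)
            # Map sampling locations to [-1, 1] as required by grid_sample.
            loc = (ref_pts.unsqueeze(1) + offsets).clamp(0, 1) * 2 - 1      # (N, P, 2)
            sampled = F.grid_sample(feat_2d, loc.view(1, N, self.num_points, 2),
                                    align_corners=False)                    # (1, C, N, P)
            sampled = sampled.squeeze(0).permute(1, 2, 0)                   # (N, P, C)
            # Weighted aggregation of the adaptively chosen 2D key/value set.
            fused = (weights.unsqueeze(-1) * sampled).sum(dim=1)            # (N, C)
            return query_3d + self.proj(fused)

Under this reading, each query attends to only P sampled pixels instead of all H x W positions, which is how a deformable key/value set can avoid over-attention and keep per-query cost at O(P) rather than O(HW).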
Source URL: http://ir.ia.ac.cn/handle/173211/57545
Collection: National Laboratory of Pattern Recognition_3D Visual Computing
Affiliation: Institute of Automation, Chinese Academy of Sciences
Recommended Citation (GB/T 7714):
Xu RT, Changwei Wang, Duzhen Zhang, et al. DefFusion: Deformable Multimodal Representation Fusion for 3D Semantic Segmentation[C]. In: . Yokohama, Japan. 2024-05.

Deposit Method: OAI harvesting

Source: Institute of Automation

