DefFusion: Deformable Multimodal Representation Fusion for 3D Semantic Segmentation
Document type: Conference paper
Author | Xu RT (许镕涛) |
Publication date | 2024-05 |
Conference date | 2024-05 |
Conference venue | Yokohama, Japan |
Abstract | The complementarity between camera and LiDAR data makes fusion methods a promising approach to improve 3D semantic segmentation performance. Recent transformer-based methods have also demonstrated superiority in segmentation. However, multimodal solutions incorporating transformers are underexplored and face two key inherent difficulties: over-attention and noise from different modal data. To overcome these challenges, we propose a Deformable Multimodal Representation Fusion (DefFusion) framework consisting mainly of a Deformable Representation Fusion Transformer and Dynamic Representation Augmentation Modules. The Deformable Representation Fusion Transformer introduces the deformable mechanism in multimodal fusion, avoiding over-attention and improving efficiency by adaptively modeling a 2D key/value set for a given 3D query, thus enabling multimodal fusion with higher flexibility. To enhance the 2D representation and 3D representation, the Dynamic Representation Enhancement Module is proposed to dynamically remove noise in the input representation via Dynamic Grouped Representation Generation and Dynamic Mask Generation. Extensive experiments validate that our model achieves the best 3D semantic segmentation performance on SemanticKITTI and NuScenes benchmarks. |
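The core idea the abstract describes, adaptively modeling a small 2D key/value set for each 3D query instead of attending over the whole image, can be illustrated with a minimal deformable cross-attention sketch. This is not the paper's implementation: the projection matrices `W_off` and `W_attn`, the number of sampling points `K`, and the single-head, single-query formulation are all simplifying assumptions made here for clarity.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def bilinear_sample(feat_map, x, y):
    """Bilinearly sample a (H, W, C) feature map at a continuous (x, y) location."""
    H, W, _ = feat_map.shape
    x = np.clip(x, 0, W - 1)
    y = np.clip(y, 0, H - 1)
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1, y1 = min(x0 + 1, W - 1), min(y0 + 1, H - 1)
    wx, wy = x - x0, y - y0
    return ((1 - wx) * (1 - wy) * feat_map[y0, x0] + wx * (1 - wy) * feat_map[y0, x1]
            + (1 - wx) * wy * feat_map[y1, x0] + wx * wy * feat_map[y1, x1])

def deformable_cross_attention(query, ref_xy, feat_2d, W_off, W_attn, K=4):
    """One 3D query attends to K query-dependent 2D locations around its
    projected reference point, rather than to every pixel (hypothetical
    parameterization of the deformable mechanism)."""
    offsets = (query @ W_off).reshape(K, 2)   # predicted sampling offsets, (K, 2)
    weights = softmax(query @ W_attn)         # attention weights over the K samples
    samples = np.stack([bilinear_sample(feat_2d, ref_xy[0] + dx, ref_xy[1] + dy)
                        for dx, dy in offsets])  # sampled 2D values, (K, C)
    return weights @ samples                  # fused feature for this query, (C,)

# Toy shapes: a C-channel 2D feature map and a d-dimensional 3D query.
H, W, C, d, K = 8, 8, 16, 16, 4
feat_2d = rng.standard_normal((H, W, C))
query = rng.standard_normal(d)
W_off = rng.standard_normal((d, K * 2)) * 0.5
W_attn = rng.standard_normal((d, K))
fused = deformable_cross_attention(query, np.array([3.5, 4.2]), feat_2d, W_off, W_attn)
print(fused.shape)  # (16,)
```

Because each query reads only K sampled locations instead of all H×W pixels, the cost per query is O(K·C) rather than O(H·W·C), which is the efficiency argument the abstract makes against full cross-attention.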
Source URL | http://ir.ia.ac.cn/handle/173211/57545 |
Collection | National Laboratory of Pattern Recognition, 3D Visual Computing |
Author affiliation | Institute of Automation, Chinese Academy of Sciences |
Recommended citation (GB/T 7714) | Xu RT, Changwei Wang, Duzhen Zhang, et al. DefFusion: Deformable Multimodal Representation Fusion for 3D Semantic Segmentation[C]. In: . Yokohama, Japan, 2024-05. |
Deposit method: OAI harvest
Source: Institute of Automation
Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.