Chinese Academy of Sciences Institutional Repositories Grid
BEVBert: Multimodal Map Pre-training for Language-guided Navigation

Document Type: Conference Paper

Authors: Dong An; Yuankai Qi; Yangguang Li; Yan Huang; Liang Wang; Tieniu Tan; Jing Shao
Publication Date: 2023-10
Conference Date: 2023-10-02
Conference Venue: Paris, France
English Abstract

Large-scale pre-training has shown promising results on the vision-and-language navigation (VLN) task. However, most existing pre-training methods employ discrete panoramas to learn visual-textual associations. This requires the model to implicitly correlate incomplete, duplicate observations within the panoramas, which may impair an agent's spatial understanding. Thus, we propose a new map-based pre-training paradigm that is spatial-aware for use in VLN. Concretely, we build a local metric map to explicitly aggregate incomplete observations and remove duplicates, while modeling navigation dependency in a global topological map. This hybrid design balances the demands of VLN for both short-term reasoning and long-term planning. Then, based on the hybrid map, we devise a pre-training framework to learn a multimodal map representation, which enhances spatial-aware cross-modal reasoning and thereby facilitates the language-guided navigation goal. Extensive experiments demonstrate the effectiveness of the map-based pre-training route for VLN, and the proposed method achieves state-of-the-art results on four VLN benchmarks.
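The abstract describes a two-level map: a global topological graph over visited viewpoints for long-term planning, and a local egocentric metric (bird's-eye-view) grid that fuses overlapping panorama observations for short-term reasoning. As a minimal illustration of that data structure only, here is a Python sketch; the class name, grid parameters, and the running-mean fusion are assumptions for exposition and are not taken from the BEVBert paper or codebase.

```python
import numpy as np

# Hypothetical sketch of the hybrid topological-metric map from the abstract.
# None of these names or parameters come from the BEVBert implementation.
class HybridMap:
    def __init__(self, grid_size=21, cell_meters=0.5, feat_dim=768):
        # Global topological map: nodes are visited viewpoints, edges record
        # navigability between them (supports long-term planning).
        self.nodes = {}     # viewpoint_id -> feature vector
        self.edges = set()  # undirected (viewpoint_id, viewpoint_id) pairs
        # Local metric map: an egocentric BEV grid around the current
        # viewpoint (supports short-term spatial reasoning).
        self.grid_size = grid_size
        self.cell_meters = cell_meters
        self.grid_feats = np.zeros((grid_size, grid_size, feat_dim))
        self.grid_count = np.zeros((grid_size, grid_size, 1))

    def add_node(self, viewpoint_id, feature, neighbors=()):
        """Register a visited viewpoint and its navigable neighbors."""
        self.nodes[viewpoint_id] = feature
        for nb in neighbors:
            self.edges.add(tuple(sorted((viewpoint_id, nb))))

    def aggregate_view(self, rel_xy, feature):
        """Project one panoramic view feature into the local BEV grid.

        rel_xy is the view's (x, y) offset in meters from the agent
        (e.g., derived from depth and heading). Keeping a per-cell running
        mean merges duplicate observations of the same location instead of
        leaving them as separate, redundant panorama views.
        """
        center = self.grid_size // 2
        i = int(round(rel_xy[1] / self.cell_meters)) + center
        j = int(round(rel_xy[0] / self.cell_meters)) + center
        if 0 <= i < self.grid_size and 0 <= j < self.grid_size:
            self.grid_count[i, j] += 1
            self.grid_feats[i, j] += (
                feature - self.grid_feats[i, j]
            ) / self.grid_count[i, j]


# Toy usage: one graph node plus one aggregated view feature.
m = HybridMap()
m.add_node("vp_0", np.random.rand(768), neighbors=["vp_1"])
m.aggregate_view((1.2, -0.4), np.random.rand(768))
```

The per-cell running mean here merely stands in for the "explicit aggregation" the abstract mentions; the actual pre-training framework learns how to fuse and reason over these maps rather than averaging fixed features.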

Proceedings: Proceedings of the IEEE International Conference on Computer Vision
Language: English
Source URL: http://ir.ia.ac.cn/handle/173211/56611
Collection: Institute of Automation, Center for Research on Intelligent Perception and Computing
Author Affiliations:
1. Institute of Automation, Chinese Academy of Sciences
2. Australian Institute for Machine Learning, University of Adelaide
3. Shanghai AI Laboratory
4. Nanjing University
5. SenseTime Research
6. School of Future Technology, UCAS
Recommended Citation (GB/T 7714):
Dong An, Yuankai Qi, Yangguang Li, et al. BEVBert: Multimodal Map Pre-training for Language-guided Navigation[C]. In: Proceedings of the IEEE International Conference on Computer Vision. Paris, France, 2023-10-02.

Deposit Method: OAI Harvesting

Source: Institute of Automation

Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.