Chinese Academy of Sciences Institutional Repositories Grid
Landmark-RxR: Solving Vision-and-Language Navigation with Fine-Grained Alignment Supervision

Document type: Conference paper

Authors: Keji He (5,6,7); Yan Huang (6,7); Qi Wu (5); Jianhua Yang (1); Dong An (4,7); Shuanglin Sima (6,7); Liang Wang (2,3,6,7)
Publication date: 2021-12
Conference dates: 2021-12-7 to 2021-12-10
Conference location: Online
Abstract (English)

In the Vision-and-Language Navigation (VLN) task, an agent is asked to navigate inside 3D indoor environments by following given instructions. Cross-modal alignment is one of the most critical challenges in VLN because the predicted trajectory needs to match the given instruction accurately. In this paper, we address the cross-modal alignment challenge from a fine-grained perspective. Firstly, to alleviate the weak cross-modal alignment supervision provided by coarse-grained data, we introduce a human-annotated fine-grained VLN dataset, namely Landmark-RxR. Secondly, to further enhance local cross-modal alignment under fine-grained supervision, we investigate focal-oriented rewards in soft and hard forms, which focus on the critical points sampled from the fine-grained Landmark-RxR. Moreover, to fully evaluate the navigation process, we also propose a re-initialization mechanism that makes metrics insensitive to difficult points, that is, points that can cause the agent to deviate from the correct trajectory. Experimental results show that our agent has superior navigation performance on Landmark-RxR, en-RxR and R2R. Our dataset and code are available at https://github.com/hekj/Landmark-RxR.

Source URL: http://ir.ia.ac.cn/handle/173211/57626
Collection: Institute of Automation, Center for Research on Intelligent Perception and Computing
Author affiliations:
1. School of Artificial Intelligence, Beijing University of Posts and Telecommunications
2. Center for Excellence in Brain Science and Intelligence Technology
3. Chinese Academy of Sciences, Artificial Intelligence Research
4. School of Future Technology, University of Chinese Academy of Sciences
5. School of Computer Science, University of Adelaide
6. School of Artificial Intelligence, University of Chinese Academy of Sciences
7. Center for Research on Intelligent Perception and Computing, National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences
Recommended citation
GB/T 7714
Keji He, Yan Huang, Qi Wu, et al. Landmark-RxR: Solving Vision-and-Language Navigation with Fine-Grained Alignment Supervision[C]. In: . Online. 2021-12-7 to 2021-12-10.

Ingestion method: OAI harvesting

Source: Institute of Automation

