Memory-Adaptive Vision-and-Language Navigation
Document type: Journal article
Author | Keji He4,5 |
Journal | Pattern Recognition |
Publication date | 2024 |
Volume | 153, Pages: 110511 |
Keywords | Vision-and-Language Navigation; Memory bank; History noises; Memory-Adaptive Model |
Abstract (English) | Vision-and-Language Navigation (VLN) requires an agent to navigate 3D environments by following given instructions, where history is critical for decision-making in the dynamic navigation process. In particular, existing methods widely use a memory bank that stores histories and combines them with multimodal representations of the current scene for better decision-making. However, by weighting each history with a simple scalar, those methods cannot isolate the informative cues that co-exist with detrimental content in each history, and so inevitably introduce noise into decision-making. To address this, we propose a novel Memory-Adaptive Model (MAM) that dynamically restrains the detrimental content in histories and retains only the content that benefits navigation. Specifically, two key modules, the Visual and Textual Adaptive Modules, are designed to restrain history noise based on scene-related vision and text, respectively. A Reliability Estimator Module is further introduced to refine the above adaptation operations. Our experiments on the widely used RxR and R2R datasets show that MAM outperforms its baseline method by 4.0%/2.5% and 2%/1% on the validation unseen/test splits, respectively, with respect to the SR metric. |
Source URL | [http://ir.ia.ac.cn/handle/173211/57628] |
Collection | Institute of Automation_Center for Research on Intelligent Perception and Computing
Author affiliations | 1. National University of Singapore 2. School of Future Technology, University of Chinese Academy of Sciences 3. ByteDance AI Lab 4. Center for Research on Intelligent Perception and Computing, State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences 5. School of Artificial Intelligence, University of Chinese Academy of Sciences |
Recommended citation GB/T 7714 | Keji He, Ya Jing, Yan Huang, et al. Memory-Adaptive Vision-and-Language Navigation[J]. Pattern Recognition, 2024, 153: 110511. |
APA | Keji He, Ya Jing, Yan Huang, Zhihe Lu, Dong An, & Liang Wang. (2024). Memory-Adaptive Vision-and-Language Navigation. Pattern Recognition, 153, 110511. |
MLA | Keji He, et al. "Memory-Adaptive Vision-and-Language Navigation". Pattern Recognition 153 (2024): 110511. |
Ingestion method: OAI harvesting
Source: Institute of Automation
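
Illustrative note on the abstract: the contrast it draws between weighting each history with a single scalar and restraining detrimental content inside a history can be pictured with a toy example. The sketch below is not the paper's Visual or Textual Adaptive Module. The function names, the sigmoid gate, and the `sharpness` parameter are all hypothetical, chosen only to show why a per-history scalar cannot separate a useful channel from a noisy one, while a per-channel gate conditioned on the current scene can.

```python
import math


def scalar_weighted(histories, weights):
    """Aggregate history vectors using one scalar weight per history."""
    dim = len(histories[0])
    return [sum(w * h[d] for h, w in zip(histories, weights)) for d in range(dim)]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gated(histories, scene, sharpness=10.0):
    """Average history vectors after a per-channel gate in (0, 1).

    Hypothetical gate: sigmoid(sharpness * h[d] * scene[d]). A channel that
    agrees in sign with the scene feature is kept, one that disagrees is
    suppressed. This stands in for "restraining history noise" only as a toy.
    """
    dim = len(histories[0])
    out = [0.0] * dim
    for h in histories:
        for d in range(dim):
            gate = sigmoid(sharpness * h[d] * scene[d])
            out[d] += gate * h[d]
    return [v / len(histories) for v in out]


# Two histories: channel 0 matches the scene, channel 1 conflicts with it.
histories = [[1.0, -1.0], [0.8, -1.2]]
scene = [1.0, 1.0]

print(scalar_weighted(histories, [0.5, 0.5]))  # both channels pass through
print(gated(histories, scene))                 # channel 1 is driven near zero
```

With scalar weights, the conflicting channel is scaled by the same factor as the useful one, so it cannot be removed without also discarding the useful signal. The per-channel gate keeps channel 0 almost unchanged and shrinks channel 1 toward zero.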