Chinese Academy of Sciences Institutional Repositories Grid System: Dynamic-horizon model-based value estimation with latent imagination

Chinese Academy of Sciences Institutional Repositories Grid

Dynamic-horizon model-based value estimation with latent imagination

Document type: Journal article


Authors	Wang JJ (王俊杰)1,2; Zhang QC (张启超)1,2; Zhao DB (赵冬斌)1,2
Journal	IEEE Transactions on Neural Networks and Learning Systems
Publication date	2022-10
Pages	1-14
Keywords	Latent world model; model-based value expansion (MVE); reinforcement learning
Abstract	Existing model-based value expansion (MVE) methods typically leverage a world model for value estimation with a fixed rollout horizon to assist policy learning. However, a proper horizon setting is essential to world-model-based policy learning. Meanwhile, choosing an appropriate horizon value is time-consuming, especially for visual control tasks. In this article, we investigate the idea of adaptively using the model knowledge for value expansion. We propose a novel world-model-based method called dynamic-horizon MVE (DMVE) to adjust the use of the world model with adaptive rollout horizon selection. Using a reconstruction-based technique, both the raw and the reconstructed images are used to obtain multihorizon rollouts through latent imagination. Then, a horizon reliability degree detection approach selects appropriate horizons based on the reconstructed value expansion errors, yielding more accurate value estimation. Experimental results on mainstream benchmark visual control tasks show that DMVE outperforms all baselines in sample efficiency and final performance. In addition, experiments on an autonomous driving lane-changing task further demonstrate the scalability of our method. The code for DMVE is available at https://github.com/JunjieWang95/dmve.
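The abstract does not state DMVE's equations, so the following is only a rough illustration under stated assumptions. It uses the standard H-step value expansion target, sum_{k<H} gamma^k r_k + gamma^H V(s_H), computed at several horizons from two imagined latent rollouts (one started from the raw image, one from the reconstructed image). As a hypothetical stand-in for the "horizon reliability degree detection" step, it treats the absolute disagreement between the two rollouts' targets as each horizon's error, keeps the `num_keep` horizons with the smallest error, and averages their targets. The function names, the error definition, `num_keep`, and the averaging rule are all assumptions for illustration, not the authors' implementation; the linked repository contains the actual method.

```python
import numpy as np


def value_expansion_targets(rewards, values, gamma, horizons):
    """H-step value expansion targets: sum_{k<H} gamma^k * r_k + gamma^H * V(s_H).

    rewards: shape (T,), imagined rewards r_0 .. r_{T-1}
    values:  shape (T+1,), critic estimates V(s_0) .. V(s_T)
    horizons: list of ints, each between 1 and T
    """
    targets = []
    for h in horizons:
        discounts = gamma ** np.arange(h)  # [1, gamma, ..., gamma^(h-1)]
        targets.append(float(np.sum(discounts * rewards[:h]) + gamma ** h * values[h]))
    return np.array(targets)


def select_horizons(raw_targets, recon_targets, num_keep):
    """Hypothetical reliability rule: keep the horizons whose raw-image and
    reconstructed-image targets disagree the least."""
    errors = np.abs(raw_targets - recon_targets)
    keep = np.argsort(errors, kind="stable")[:num_keep]
    return np.sort(keep)  # return indices in increasing-horizon order


def dynamic_horizon_target(raw_rewards, raw_values, recon_rewards, recon_values,
                           gamma, horizons, num_keep):
    """Average the raw-rollout targets over the selected (assumed reliable) horizons."""
    raw = value_expansion_targets(raw_rewards, raw_values, gamma, horizons)
    recon = value_expansion_targets(recon_rewards, recon_values, gamma, horizons)
    keep = select_horizons(raw, recon, num_keep)
    return float(raw[keep].mean()), [horizons[i] for i in keep]


if __name__ == "__main__":
    # Toy rollouts: the reconstructed rollout diverges after step 3.
    result = dynamic_horizon_target(
        raw_rewards=np.ones(5), raw_values=np.zeros(6),
        recon_rewards=np.array([1.0, 1.0, 1.0, 0.0, 0.0]), recon_values=np.zeros(6),
        gamma=1.0, horizons=[1, 2, 3, 4, 5], num_keep=3,
    )
    print(result)  # → (2.0, [1, 2, 3])
```

In this toy run, horizons 4 and 5 are discarded because the two rollouts disagree there, so the target relies only on the shorter, consistent expansions. Averaging the kept horizons, rather than picking a single best one, is a design choice made for this sketch; the paper's actual weighting may differ.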
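Language	English
Source URL	http://ir.ia.ac.cn/handle/173211/51719
Collection	State Key Laboratory of Management and Control for Complex Systems / Deep Reinforcement Learning
Corresponding author	Zhang QC (张启超)
Affiliations	1. Institute of Automation, Chinese Academy of Sciences; 2. University of Chinese Academy of Sciences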
Recommended citation (GB/T 7714)	Wang JJ, Zhang QC, Zhao DB. Dynamic-horizon model-based value estimation with latent imagination[J]. IEEE Transactions on Neural Networks and Learning Systems, 2022: 1-14.
APA	Wang JJ, Zhang QC, & Zhao DB. (2022). Dynamic-horizon model-based value estimation with latent imagination. IEEE Transactions on Neural Networks and Learning Systems, 1-14.
MLA	Wang JJ, et al. "Dynamic-horizon model-based value estimation with latent imagination." IEEE Transactions on Neural Networks and Learning Systems (2022): 1-14.

Ingest method: OAI harvesting

Source: Institute of Automation

Unless otherwise stated, all content in this system is protected by copyright, and all rights are reserved.