中国科学院机构知识库网格系统: Balancing Exploration and Exploitation in Hierarchical Reinforcement Learning via Latent Landmark Graphs

中国科学院机构知识库网格

Chinese Academy of Sciences Institutional Repositories Grid

Balancing Exploration and Exploitation in Hierarchical Reinforcement Learning via Latent Landmark Graphs

文献类型：会议论文


作者	Zhang Qingyang2,3 ; Yang Yiming2 ; Ruan Jingqing 2,3; Xiong Xuantang 1,2; Xing Dengpeng1,2 ; Xu Bo1,2,3
出版日期	2023-06
会议日期	2023-6
会议地点	澳大利亚
关键词	强化学习，分层强化学习
英文摘要	Goal-Conditioned Hierarchical Reinforcement Learning (GCHRL) is a promising paradigm to address the exploration-exploitation dilemma in reinforcement learning. It decomposes the source task into subgoal conditional subtasks and conducts exploration and exploitation in the subgoal space. The effectiveness of GCHRL heavily relies on subgoal representation functions and subgoal selection strategy. However, existing works often overlook the temporal coherence in GCHRL when learning latent subgoal representations and lack an efficient subgoal selection strategy that balances exploration and exploitation. This paper proposes HIerarchical reinforcement learning via dynamically building Latent Landmark graphs (HILL) to overcome these limitations. HILL learns latent subgoal representations that satisfy temporal coherence using a contrastive representation learning objective. Based on these representations, HILL dynamically builds latent landmark graphs and employs a novelty measure on nodes and a utility measure on edges. Finally, HILL develops a subgoal selection strategy that balances exploration and exploitation by jointly considering both measures. Experimental results demonstrate that HILL outperforms state-of-the-art baselines on continuous control tasks with sparse rewards in sample efficiency and asymptotic performance
会议录出版者	IEEE
语种	英语
源URL	[http://ir.ia.ac.cn/handle/173211/57587]
专题	数字内容技术与服务研究中心_听觉模型与认知计算
作者单位	1.中国科学院大学人工智能学院 2.中国科学院自动化研究所 3.中国科学院大学未来技术学院
推荐引用方式 GB/T 7714	Zhang Qingyang,Yang Yiming,Ruan Jingqing,et al. Balancing Exploration and Exploitation in Hierarchical Reinforcement Learning via Latent Landmark Graphs[C]. 见:. 澳大利亚. 2023-6.

入库方式： OAI收割

来源：自动化研究所

浏览0

下载0

收藏0

其他版本

除非特别说明，本系统中所有内容都受版权保护，并保留所有权利。