Chinese Academy of Sciences Institutional Repositories Grid
Stable Training of Bellman Error in Reinforcement Learning

Document type: Conference paper

Authors: Gong C (龚晨)²; Bai YP (白云鹏)²; Hou XW (侯新文)²; Ji XH (季晓慧)¹
Publication date: 2020-11
Conference date: November 18–22
Conference location: Thailand
Abstract (English)
In principle, optimizing the Bellman error is the key to value function learning. However, this optimization often suffers from unstable training and slow convergence. In this paper, we investigate the problem of optimizing the Bellman error distribution, aiming to stabilize the process of Bellman error training. We propose a framework in which the Bellman error distribution at the current time step approximates the distribution at the previous step, under the hypothesis that the Bellman error follows a stationary random process once training converges; this stabilizes value function learning. We then minimize the distance between the two distributions with the Stein Variational Gradient Descent (SVGD) method, which benefits the balance between exploration and exploitation in parameter space, and we incorporate this framework into the advantage actor-critic (A2C) algorithm. Experimental results on discrete control problems show that our algorithm achieves better average returns and smaller Bellman errors than both the A2C algorithm and the anchor method, and that it stabilizes the training process.
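
The abstract names Stein Variational Gradient Descent as the tool for matching the two Bellman error distributions but gives no further detail. Below is a minimal NumPy sketch of the generic SVGD particle update only, not the paper's implementation: the grad_log_p callback, the RBF kernel with a median-heuristic bandwidth, and the toy Gaussian target are illustrative assumptions; in the paper the particles would correspond to value-function parameters and the target density would be derived from the previous Bellman error distribution.

    import numpy as np

    def svgd_step(particles, grad_log_p, step_size=1e-2):
        # One Stein Variational Gradient Descent update. `particles` is an (n, d)
        # array; `grad_log_p` maps the (n, d) particles to the (n, d) gradient of
        # the target log-density evaluated at each particle.
        n = particles.shape[0]
        sq_dists = np.sum((particles[:, None, :] - particles[None, :, :]) ** 2, axis=-1)
        # RBF kernel bandwidth from the median heuristic (an assumed, common choice).
        h = np.sqrt(0.5 * np.median(sq_dists) / np.log(n + 1) + 1e-8)
        kernel = np.exp(-sq_dists / (2.0 * h ** 2))      # (n, n) kernel matrix
        grads = grad_log_p(particles)                    # (n, d) score at each particle
        # Attractive term pulls particles toward high target density; the
        # kernel-gradient term repels them from each other, preserving diversity.
        attract = kernel @ grads
        repel = (kernel.sum(axis=1, keepdims=True) * particles - kernel @ particles) / h ** 2
        return particles + step_size * (attract + repel) / n

    # Toy usage: 50 one-dimensional particles drift toward N(3, 1).
    rng = np.random.default_rng(0)
    x = rng.normal(size=(50, 1))
    for _ in range(500):
        x = svgd_step(x, lambda p: -(p - 3.0), step_size=0.05)
    print(x.mean(), x.std())  # approximately 3.0 and 1.0
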
Language: English
Source URL: http://ir.ia.ac.cn/handle/173211/52196
Collection: Institute of Automation, State Key Laboratory of Management and Control for Complex Systems, Robot Application and Theory Group
Corresponding author: Hou XW (侯新文)
Author affiliations:
1. School of Information Engineering, China University of Geosciences (Beijing)
2. Institute of Automation, Chinese Academy of Sciences
Recommended citation (GB/T 7714):
Gong C, Bai YP, Hou XW, et al. Stable Training of Bellman Error in Reinforcement Learning[C]. Thailand, November 18–22, 2020.

Deposit method: OAI harvesting

Source: Institute of Automation

