Stable Training of Bellman Error in Reinforcement Learning
Document Type | Conference Paper
Authors | Gong C (龚晨) 2
Publication Date | 2020-11
Conference Date | November 18–22
Conference Venue | Thailand
Abstract (English) | The optimization of the Bellman error is, in principle, the key to value function learning. In practice, however, it suffers from unstable training and slow convergence. In this paper, we investigate the problem of optimizing the Bellman error distribution, aiming to stabilize Bellman error training. We propose a framework in which the Bellman error distribution at the current step approximates the distribution at the previous step, under the hypothesis that the Bellman error follows a stationary random process once training converges; this stabilizes value function learning. We then minimize the distance between the two distributions with the Stein Variational Gradient Descent (SVGD) method, which helps balance exploration and exploitation in parameter space. Finally, we incorporate this framework into the advantage actor-critic (A2C) algorithm. Experimental results on discrete control problems show that our algorithm achieves higher average returns and smaller Bellman errors than both the A2C algorithm and the anchor method, and it also stabilizes the training process.
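To make the distribution-matching step in the abstract concrete, here is a minimal illustrative sketch, not the authors' implementation: it treats one minibatch of Bellman errors (the TD residuals δ = r + γV(s') − V(s)) as particles and applies a single SVGD step that moves them toward a density estimated from the previous batch's errors. The function names, the Gaussian KDE target, and the fixed RBF bandwidth are assumptions made for illustration only.

```python
# Minimal, illustrative sketch only -- not the paper's implementation.
# Assumptions: Bellman errors from the current and previous update are 1-D
# particle sets; the previous batch defines a target density via a Gaussian
# kernel density estimate; an RBF kernel with fixed bandwidth drives SVGD.
import numpy as np

def grad_log_kde(x, ref, h):
    """Gradient of the log of a Gaussian KDE built from reference particles."""
    diff = x[:, None] - ref[None, :]                  # (n, m)
    w = np.exp(-diff ** 2 / (2.0 * h ** 2))
    w /= w.sum(axis=1, keepdims=True) + 1e-12
    return (w * (-diff / h ** 2)).sum(axis=1)         # (n,)

def svgd_step(current_errors, previous_errors, step_size=0.1, h=0.5):
    """One SVGD step pushing the current Bellman-error particles toward
    the distribution of the previous iteration's Bellman errors."""
    x = current_errors
    score = grad_log_kde(x, previous_errors, h)       # grad log p_prev at x
    diff = x[:, None] - x[None, :]
    k = np.exp(-diff ** 2 / (2.0 * h ** 2))           # RBF kernel matrix
    repulsion = (diff / h ** 2 * k).sum(axis=1)       # keeps particles spread out
    phi = (k @ score + repulsion) / len(x)            # SVGD update direction
    return x + step_size * phi

# Toy usage: errors from two consecutive updates of a value function.
prev = 0.5 * np.random.randn(64)            # stand-in for last batch's errors
curr = 1.5 * np.random.randn(64) + 0.3      # current, more spread-out errors
curr = svgd_step(curr, prev)
```

In the full method this kind of step would sit inside the critic update of A2C, with the step size and kernel bandwidth treated as hyperparameters.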
Language | English
Source URL | [http://ir.ia.ac.cn/handle/173211/52196]
Collection | Institute of Automation, State Key Laboratory of Management and Control for Complex Systems, Robot Application and Theory Group
Corresponding Author | Hou XW (侯新文)
Author Affiliations | 1. School of Information Engineering, China University of Geosciences, Beijing; 2. Institute of Automation, Chinese Academy of Sciences
Recommended Citation (GB/T 7714) | Gong C, Bai YP, Hou XW, et al. Stable Training of Bellman Error in Reinforcement Learning[C]. Thailand, November 18–22, 2020.
Deposit Method: OAI harvesting
Source: Institute of Automation