Chinese Academy of Sciences Institutional Repositories Grid
Stable Training of Bellman Error in Reinforcement Learning

Document type: Conference paper

Authors: Gong C (龚晨)²; Bai YP (白云鹏)²; Hou XW (侯新文)²; Ji XH (季晓慧)¹
Publication date: 2020-11
Conference date: November 18–22
Conference location: Thailand
Abstract (English)
In principle, optimizing the Bellman error is the key to value function learning. However, this optimization often suffers from unstable training and slow convergence. In this paper, we investigate the problem of optimizing the Bellman error distribution, aiming to stabilize the process of Bellman error training. We propose a framework in which the Bellman error distribution at the current time step approximates the distribution at the previous step, under the hypothesis that the Bellman error follows a stationary random process once training converges; this stabilizes value function learning. We then minimize the distance between the two distributions with the Stein Variational Gradient Descent (SVGD) method, which benefits the balance between exploration and exploitation in parameter space, and we incorporate this framework into the advantage actor-critic (A2C) algorithm. Experimental results on discrete control problems show that our algorithm achieves better average returns and smaller Bellman errors than both the A2C algorithm and the anchor method, and that it stabilizes the training process.
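
The abstract names Stein Variational Gradient Descent as the tool for matching the two Bellman error distributions but gives no further detail. Below is a minimal NumPy sketch of the generic SVGD particle update only, not the paper's implementation: the grad_log_p callback, the RBF kernel with a median-heuristic bandwidth, and the toy Gaussian target are illustrative assumptions; in the paper the particles would correspond to value-function parameters and the target density would be derived from the previous Bellman error distribution.

    import numpy as np

    def svgd_step(particles, grad_log_p, step_size=1e-2):
        # One Stein Variational Gradient Descent update. `particles` is an (n, d)
        # array; `grad_log_p` maps the (n, d) particles to the (n, d) gradient of
        # the target log-density evaluated at each particle.
        n = particles.shape[0]
        sq_dists = np.sum((particles[:, None, :] - particles[None, :, :]) ** 2, axis=-1)
        # RBF kernel bandwidth from the median heuristic (an assumed, common choice).
        h = np.sqrt(0.5 * np.median(sq_dists) / np.log(n + 1) + 1e-8)
        kernel = np.exp(-sq_dists / (2.0 * h ** 2))      # (n, n) kernel matrix
        grads = grad_log_p(particles)                    # (n, d) score at each particle
        # Attractive term pulls particles toward high target density; the
        # kernel-gradient term repels them from each other, preserving diversity.
        attract = kernel @ grads
        repel = (kernel.sum(axis=1, keepdims=True) * particles - kernel @ particles) / h ** 2
        return particles + step_size * (attract + repel) / n

    # Toy usage: 50 one-dimensional particles drift toward N(3, 1).
    rng = np.random.default_rng(0)
    x = rng.normal(size=(50, 1))
    for _ in range(500):
        x = svgd_step(x, lambda p: -(p - 3.0), step_size=0.05)
    print(x.mean(), x.std())  # approximately 3.0 and 1.0
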
Language: English
Source URL: http://ir.ia.ac.cn/handle/173211/52196
Collection: Institute of Automation, State Key Laboratory of Management and Control for Complex Systems, Robot Application and Theory Group
Corresponding author: Hou XW (侯新文)
Author affiliations:
1. School of Information Engineering, China University of Geosciences (Beijing)
2. Institute of Automation, Chinese Academy of Sciences
Recommended citation (GB/T 7714):
Gong C, Bai YP, Hou XW, et al. Stable Training of Bellman Error in Reinforcement Learning[C]. Thailand, November 18–22, 2020.

Deposit method: OAI harvesting

Source: Institute of Automation

