Chinese Academy of Sciences Institutional Repositories Grid (CAS IR Grid): Adaptive bias-variance trade-off in advantage estimator for actor-critic algorithms

Adaptive bias-variance trade-off in advantage estimator for actor-critic algorithms

Document type: Journal article


Authors	Chen, Yurou (1,2); Zhang, Fengyi (1,2); Liu, Zhiyong (1,2,3)
Journal	NEURAL NETWORKS
Publication date	2024
Volume	169; Pages: 764-777
Keywords	Reinforcement learning; Policy gradient; Actor-critic; Value function; Bias-variance trade-off
ISSN	0893-6080
DOI	10.1016/j.neunet.2023.10.023
Corresponding author	Liu, Zhiyong (zhiyong.liu@ia.ac.cn)
Abstract	Actor-critic methods are leading in many challenging continuous control tasks. Advantage estimators, the most common critics in the actor-critic framework, combine state values from bootstrapping value functions and sample returns. Different combinations balance the bias introduced by state values and the variance returned by samples to reduce estimation errors. The bias and variance constantly fluctuate throughout training, leading to different optimal combinations. However, existing advantage estimators usually use fixed combinations that fail to account for the trade-off between minimizing bias and variance to find the optimal estimate. Our previous work on adaptive advantage estimation (AAE) analyzed the sources of bias and variance and offered two indicators. This paper further explores the relationship between the indicators and their optimal combination through typical numerical experiments. These analyses develop a general form of adaptive combinations of state values and sample returns to achieve low estimation errors. Empirical results on simulated robotic locomotion tasks show that our proposed estimators achieve similar or superior performance compared to previous generalized advantage estimators (GAE).
Funding project	National Key Research and Development Plan of China [2020AAA0108902]
WOS research area	Computer Science; Neurosciences & Neurology
Language	English
WOS accession number	WOS:001118772900001
Publisher	PERGAMON-ELSEVIER SCIENCE LTD
Funding organization	National Key Research and Development Plan of China
Source URL	[http://ir.ia.ac.cn/handle/173211/55051]
Collection	State Key Laboratory of Multimodal Artificial Intelligence Systems
Corresponding author	Liu, Zhiyong
Author affiliations	1. Chinese Acad Sci, Inst Automat, State Key Lab Management & Control Complex Syst, Beijing, Peoples R China; 2. Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing, Peoples R China; 3. Chinese Acad Sci, Ctr Excellence Brain Sci & Intelligence Technol, Shanghai, Peoples R China
Recommended citation (GB/T 7714)	Chen, Yurou, Zhang, Fengyi, Liu, Zhiyong. Adaptive bias-variance trade-off in advantage estimator for actor-critic algorithms[J]. NEURAL NETWORKS, 2024, 169: 764-777.
APA	Chen, Yurou, Zhang, Fengyi, & Liu, Zhiyong. (2024). Adaptive bias-variance trade-off in advantage estimator for actor-critic algorithms. NEURAL NETWORKS, 169, 764-777.
MLA	Chen, Yurou, et al. "Adaptive bias-variance trade-off in advantage estimator for actor-critic algorithms". NEURAL NETWORKS 169 (2024): 764-777.
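
Illustrative note: the abstract describes advantage estimators as combinations of bootstrapped state values and sample returns, with GAE as the fixed-combination baseline. The following is a minimal sketch of that fixed-combination baseline only, assuming a single finite trajectory. It does not reproduce the paper's adaptive estimator, whose formula is not given in this record. The function name `gae_advantages` and the parameters `gamma` and `lam` are hypothetical names chosen for illustration.

```python
def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """Fixed-lambda generalized advantage estimation for one trajectory.

    rewards: list of T rewards r_0 .. r_{T-1}
    values:  list of T + 1 state-value estimates V(s_0) .. V(s_T);
             the last entry bootstraps the tail (use 0.0 at a terminal state)
    gamma:   discount factor (assumed default)
    lam:     mixing weight; 0 relies on the value function (more bias),
             1 relies on sample returns (more variance)
    """
    horizon = len(rewards)
    if len(values) != horizon + 1:
        raise ValueError("values must have exactly one more entry than rewards")

    advantages = [0.0] * horizon
    running = 0.0
    # Walk backwards so each step reuses the discounted sum of later TD errors.
    for t in reversed(range(horizon)):
        td_error = rewards[t] + gamma * values[t + 1] - values[t]
        running = td_error + gamma * lam * running
        advantages[t] = running
    return advantages


if __name__ == "__main__":
    rewards = [1.0, 1.0, 1.0]
    values = [0.5, 0.5, 0.5, 0.0]
    # lam = 0 gives one-step TD errors; lam = 1 gives return minus value.
    print(gae_advantages(rewards, values, gamma=1.0, lam=0.0))
    print(gae_advantages(rewards, values, gamma=1.0, lam=1.0))
```

In this sketch `lam` is a constant for the whole run, which is the kind of fixed combination the abstract contrasts with adaptive combinations chosen during training.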

Ingestion method: OAI harvesting

Source: Institute of Automation
