Adaptive bias-variance trade-off in advantage estimator for actor-critic algorithms
Document type: Journal article
Authors | Chen, Yurou1,2; Zhang, Fengyi1,2; Liu, Zhiyong1,2,3 |
Journal | NEURAL NETWORKS |
Publication date | 2024 |
Volume | 169; Pages: 764-777 |
ISSN | 0893-6080 |
Keywords | Reinforcement Learning; Policy gradient; Actor-critic; Value function; Bias-variance trade-off |
DOI | 10.1016/j.neunet.2023.10.023 |
Corresponding author | Liu, Zhiyong (zhiyong.liu@ia.ac.cn) |
Abstract | Actor-critic methods are leading in many challenging continuous control tasks. Advantage estimators, the most common critics in the actor-critic framework, combine state values from bootstrapping value functions and sample returns. Different combinations balance the bias introduced by state values and the variance returned by samples to reduce estimation errors. The bias and variance constantly fluctuate throughout training, leading to different optimal combinations. However, existing advantage estimators usually use fixed combinations that fail to account for the trade-off between minimizing bias and variance to find the optimal estimate. Our previous work on adaptive advantage estimation (AAE) analyzed the sources of bias and variance and offered two indicators. This paper further explores the relationship between the indicators and their optimal combination through typical numerical experiments. These analyses develop a general form of adaptive combinations of state values and sample returns to achieve low estimation errors. Empirical results on simulated robotic locomotion tasks show that our proposed estimators achieve similar or superior performance compared to previous generalized advantage estimators (GAE). |
Funding project | National Key Research and Development Plan of China [2020AAA0108902] |
WOS research area | Computer Science; Neurosciences & Neurology |
Language | English |
Publisher | PERGAMON-ELSEVIER SCIENCE LTD |
WOS accession number | WOS:001118772900001 |
Funding organization | National Key Research and Development Plan of China |
Source URL | [http://ir.ia.ac.cn/handle/173211/55051] |
Collection | State Key Laboratory of Multimodal Artificial Intelligence Systems |
Corresponding author | Liu, Zhiyong |
Author affiliations | 1. Chinese Acad Sci, Inst Automat, State Key Lab Management & Control Complex Syst, Beijing, Peoples R China; 2. Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing, Peoples R China; 3. Chinese Acad Sci, Ctr Excellence Brain Sci & Intelligence Technol, Shanghai, Peoples R China |
Recommended citation (GB/T 7714) | Chen, Yurou,Zhang, Fengyi,Liu, Zhiyong. Adaptive bias-variance trade-off in advantage estimator for actor-critic algorithms[J]. NEURAL NETWORKS,2024,169:764-777. |
APA | Chen, Yurou,Zhang, Fengyi,&Liu, Zhiyong.(2024).Adaptive bias-variance trade-off in advantage estimator for actor-critic algorithms.NEURAL NETWORKS,169,764-777. |
MLA | Chen, Yurou,et al."Adaptive bias-variance trade-off in advantage estimator for actor-critic algorithms".NEURAL NETWORKS 169(2024):764-777. |
Ingest method: OAI harvesting
Source: Institute of Automation
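
The abstract refers to advantage estimators that combine bootstrapped state values with sample returns, and names generalized advantage estimation (GAE) as the fixed-combination baseline. As a rough illustration only, the sketch below implements the standard fixed-λ GAE recursion, not the adaptive estimator proposed in the paper. The function name `gae_advantages`, its parameters, and the toy numbers are assumptions made for this example and do not come from the record.

```python
def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """Fixed-lambda GAE for a single trajectory (illustrative sketch).

    rewards: list of T rewards r_0 .. r_{T-1}
    values:  list of T+1 value estimates V(s_0) .. V(s_T);
             use 0.0 for V(s_T) when s_T is terminal.
    """
    if len(values) != len(rewards) + 1:
        raise ValueError("values must have exactly one more entry than rewards")

    advantages = [0.0] * len(rewards)
    running = 0.0
    # Work backwards so each step reuses the discounted sum of later TD errors.
    for t in reversed(range(len(rewards))):
        # One-step TD error: relies on the value function (biased, low variance).
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        # lam weights how much of the longer sampled return is mixed in.
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages


# Toy trajectory (hypothetical numbers): three steps, terminal value 0.
rewards = [1.0, 1.0, 1.0]
values = [0.5, 0.5, 0.5, 0.0]

# lam = 0 keeps only the one-step TD error at each step.
td_only = gae_advantages(rewards, values, gamma=1.0, lam=0.0)
# lam = 1 recovers the sampled return minus the value baseline.
mc_like = gae_advantages(rewards, values, gamma=1.0, lam=1.0)
print(td_only, mc_like)
```

With λ fixed in advance, the mix between value estimates and sampled returns stays the same throughout training; the abstract's point is that the best mix changes as bias and variance fluctuate, which is what an adaptive combination would aim to track.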