Chinese Academy of Sciences Institutional Repositories Grid
Adaptive bias-variance trade-off in advantage estimator for actor-critic algorithms

Document type: Journal article

Authors: Chen, Yurou1,2; Zhang, Fengyi1,2; Liu, Zhiyong1,2,3
Journal: NEURAL NETWORKS
Publication date: 2024
Volume: 169  Pages: 764-777
ISSN: 0893-6080
Keywords: Reinforcement learning; Policy gradient; Actor-critic; Value function; Bias-variance trade-off
DOI: 10.1016/j.neunet.2023.10.023
Corresponding author: Liu, Zhiyong (zhiyong.liu@ia.ac.cn)
Abstract: Actor-critic methods achieve leading performance on many challenging continuous control tasks. Advantage estimators, the most common critics in the actor-critic framework, combine state values from bootstrapped value functions with sample returns. Different combinations balance the bias introduced by state values against the variance introduced by sample returns to reduce estimation error. Because bias and variance fluctuate throughout training, the optimal combination changes as well. However, existing advantage estimators usually use fixed combinations, which cannot adapt the trade-off between bias and variance to find the optimal estimate. Our previous work on adaptive advantage estimation (AAE) analyzed the sources of bias and variance and proposed two indicators. This paper further explores the relationship between these indicators and the optimal combination through representative numerical experiments. From these analyses, we develop a general form of adaptive combination of state values and sample returns that achieves low estimation error. Empirical results on simulated robotic locomotion tasks show that our proposed estimators achieve similar or superior performance compared with the generalized advantage estimator (GAE).
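For reference, the fixed combination that the abstract contrasts against can be illustrated with the standard GAE recursion. This is a minimal sketch of well-known GAE, not the adaptive estimator proposed in this paper; the function name `gae_advantages` and the toy inputs are hypothetical. A single fixed parameter `lam` sets the mix: `lam = 0` gives the one-step TD error (relying entirely on bootstrapped state values, with lower variance but bias from value error), while `lam = 1` gives the discounted sample return minus the baseline (no bias from bootstrapping, but higher variance).

```python
from typing import List


def gae_advantages(
    rewards: List[float],
    values: List[float],
    gamma: float = 0.99,
    lam: float = 0.95,
) -> List[float]:
    """Generalized advantage estimation over one trajectory segment.

    `values` holds V(s_0), ..., V(s_T); the last entry is the bootstrap
    value of the state after the final reward (use 0.0 if terminal).
    `lam` is the fixed bias-variance mixing parameter.
    """
    if len(values) != len(rewards) + 1:
        raise ValueError("values must have exactly one more entry than rewards")
    advantages = [0.0] * len(rewards)
    running = 0.0
    # Sweep backwards: A_t = delta_t + gamma * lam * A_{t+1}
    for t in reversed(range(len(rewards))):
        # One-step TD error: sampled reward plus bootstrapped next value
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages


if __name__ == "__main__":
    rewards = [1.0, 1.0, 1.0]
    values = [0.5, 0.5, 0.5, 0.0]  # terminal state after last reward
    print(gae_advantages(rewards, values, gamma=0.9, lam=0.0))  # ≈ [0.95, 0.95, 0.5]
    print(gae_advantages(rewards, values, gamma=0.9, lam=1.0))  # ≈ [2.21, 1.4, 0.5]
```

Intermediate values of `lam` interpolate between these two extremes; the point made in the abstract is that any single fixed value cannot track how the bias and variance change during training.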
Funding project: National Key Research and Development Plan of China [2020AAA0108902]
WOS research areas: Computer Science; Neurosciences & Neurology
Language: English
Publisher: PERGAMON-ELSEVIER SCIENCE LTD
WOS accession number: WOS:001118772900001
Funding agency: National Key Research and Development Plan of China
Source URL: [http://ir.ia.ac.cn/handle/173211/55051]
Collection: State Key Laboratory of Multimodal Artificial Intelligence Systems
Affiliations:
1. Chinese Acad Sci, Inst Automat, State Key Lab Management & Control Complex Syst, Beijing, Peoples R China
2. Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing, Peoples R China
3. Chinese Acad Sci, Ctr Excellence Brain Sci & Intelligence Technol, Shanghai, Peoples R China
Recommended citation
GB/T 7714: Chen, Yurou, Zhang, Fengyi, Liu, Zhiyong. Adaptive bias-variance trade-off in advantage estimator for actor-critic algorithms[J]. NEURAL NETWORKS, 2024, 169: 764-777.
APA: Chen, Yurou, Zhang, Fengyi, & Liu, Zhiyong. (2024). Adaptive bias-variance trade-off in advantage estimator for actor-critic algorithms. NEURAL NETWORKS, 169, 764-777.
MLA: Chen, Yurou, et al. "Adaptive bias-variance trade-off in advantage estimator for actor-critic algorithms." NEURAL NETWORKS 169 (2024): 764-777.

Ingest method: OAI harvesting

Source: Institute of Automation

Unless otherwise stated, all content in this system is protected by copyright, and all rights are reserved.