Chinese Academy of Sciences Institutional Repositories Grid
Convergence Proof of Approximate Policy Iteration for Undiscounted Optimal Control of Discrete-Time Systems

Document type: Journal article

Authors: Zhu, Yuanheng [1]; Zhao, Dongbin [1]; He, Haibo [2]; Ji, Junhong [3]
Journal: COGNITIVE COMPUTATION
Publication date: 2015-12-01
Volume: 7; Issue: 6; Pages: 763-771
Keywords: Approximate policy iteration; Approximation error; Optimal control; Fuzzy approximator
Abstract: This paper studies approximate policy iteration (API) for solving undiscounted optimal control problems. A discrete-time system with a continuous state space and a finite action set is considered. Because an approximation technique is used for the continuous state space, approximation errors arise in the calculation and disturb the convergence of the original policy iteration. We analyze and prove the convergence of API for undiscounted optimal control. We use an iterative method to implement approximate policy evaluation and demonstrate that the error between the approximate and exact value functions is bounded. Then, because the action set is finite, the greedy policy in policy improvement is generated directly. Our main theorem proves that if a sufficiently accurate approximator is used, API converges to the optimal policy. For implementation, we introduce a fuzzy approximator and verify its performance on the puddle world problem.
WOS headings: Science & Technology; Technology; Life Sciences & Biomedicine
WOS categories: Computer Science, Artificial Intelligence; Neurosciences
WOS research areas: Computer Science; Neurosciences & Neurology
WOS keywords: NONLINEAR-SYSTEMS; FEEDBACK-CONTROL; MOBILE ROBOTS; ALGORITHM
Indexed in: SCI
Language: English
WOS accession number: WOS:000366329200012
Date made public: 2016-02-26
Source URL: [http://ir.ia.ac.cn/handle/173211/10525]
Collection: State Key Laboratory of Management and Control for Complex Systems_Deep Reinforcement Learning
Author affiliations:
1. Chinese Acad Sci, Inst Automat, State Key Lab Management & Control Complex Syst, Beijing 100190, Peoples R China
2. Univ Rhode Isl, Dept Elect Comp & Biomed Engn, Kingston, RI 02881 USA
3. Harbin Inst Technol, State Key Lab Robot & Syst, Harbin 150001, Peoples R China
Recommended citation
GB/T 7714
Zhu, Yuanheng,Zhao, Dongbin,He, Haibo,et al. Convergence Proof of Approximate Policy Iteration for Undiscounted Optimal Control of Discrete-Time Systems[J]. COGNITIVE COMPUTATION,2015,7(6):763-771.
APA Zhu, Yuanheng,Zhao, Dongbin,He, Haibo,&Ji, Junhong.(2015).Convergence Proof of Approximate Policy Iteration for Undiscounted Optimal Control of Discrete-Time Systems.COGNITIVE COMPUTATION,7(6),763-771.
MLA Zhu, Yuanheng,et al."Convergence Proof of Approximate Policy Iteration for Undiscounted Optimal Control of Discrete-Time Systems".COGNITIVE COMPUTATION 7.6(2015):763-771.
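
The loop described in the abstract (iterative approximate policy evaluation, then greedy improvement over a finite action set) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation and not the puddle world benchmark: the one-dimensional dynamics, unit stage cost, action set, goal state, and grid of triangular membership functions are all assumptions made for the example. With triangular memberships that sum to one, the fuzzy approximator reduces to linear interpolation between the membership cores, which is what `np.interp` computes here.

```python
import numpy as np

# All constants below are hypothetical choices for this toy example.
GOAL = 1.0                               # absorbing target state with zero cost
ACTIONS = np.array([0.10, 0.05, -0.10])  # finite action set
CENTERS = np.linspace(0.0, 1.0, 21)      # cores of the triangular membership functions


def step(x, u):
    """Toy deterministic dynamics: move by u, saturating at the interval ends."""
    return np.clip(x + u, 0.0, GOAL)


def is_goal(x):
    """Goal test with a small tolerance for floating-point round-off."""
    return x >= GOAL - 1e-9


def v_hat(theta, x):
    """Approximate value: interpolation of theta between cores, zero at the goal."""
    return np.where(is_goal(x), 0.0, np.interp(x, CENTERS, theta))


def q_hat(theta, x, u):
    """Unit stage cost plus the undiscounted approximate value of the successor."""
    return 1.0 + v_hat(theta, step(x, u))


def evaluate(policy, sweeps=500, tol=1e-9):
    """Iterative approximate policy evaluation at the membership cores."""
    theta = np.zeros_like(CENTERS)
    for _ in range(sweeps):
        new = np.where(is_goal(CENTERS), 0.0,
                       q_hat(theta, CENTERS, ACTIONS[policy]))
        if np.max(np.abs(new - theta)) < tol:
            return new
        theta = new
    return theta


def improve(theta):
    """Greedy improvement by enumerating the finite action set at every core."""
    q = np.stack([q_hat(theta, CENTERS, u) for u in ACTIONS])
    # Rounding lets near-ties resolve deterministically to the first listed action.
    return np.argmin(np.round(q, 9), axis=0)


def approximate_policy_iteration(policy, max_iters=50):
    """Alternate evaluation and improvement until the policy stops changing."""
    for _ in range(max_iters):
        theta = evaluate(policy)
        new_policy = improve(theta)
        if np.array_equal(new_policy, policy):
            break
        policy = new_policy
    return policy, theta


# Start from an admissible policy (always +0.05), which reaches the goal from every core.
initial_policy = np.full(CENTERS.size, 1)
final_policy, final_theta = approximate_policy_iteration(initial_policy)
print(ACTIONS[final_policy[0]], final_theta[0])
```

The initial policy is chosen to be admissible because, without discounting, the evaluation sweep only settles when the policy actually drives the state to the zero-cost goal. In this toy problem every successor state lands on a membership core, so the approximation error is negligible; the paper's error bound matters precisely when that is not the case.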

Ingest method: OAI harvesting

Source: Institute of Automation

