Convergence Proof of Approximate Policy Iteration for Undiscounted Optimal Control of Discrete-Time Systems
Document type: Journal article
Authors | Zhu, Yuanheng1 |
Journal | COGNITIVE COMPUTATION |
Publication date | 2015-12-01 |
Volume | 7; Issue: 6; Pages: 763-771 |
Keywords | Approximate policy iteration ; Approximation error ; Optimal control ; Fuzzy approximator |
Abstract | This paper studies approximate policy iteration (API) for solving undiscounted optimal control problems. A discrete-time system with a continuous state space and a finite action set is considered. Because an approximation technique is used for the continuous state space, approximation errors arise in the calculation and disturb the convergence of the original policy iteration. We analyze and prove the convergence of API for undiscounted optimal control. We use an iterative method to implement approximate policy evaluation and demonstrate that the error between the approximate and exact value functions is bounded. Then, because the action set is finite, the greedy policy in policy improvement is generated directly. Our main theorem proves that if a sufficiently accurate approximator is used, API converges to the optimal policy. For implementation, we introduce a fuzzy approximator and verify its performance on the puddle world problem. |
WOS headings | Science & Technology ; Technology ; Life Sciences & Biomedicine |
Category [WOS] | Computer Science, Artificial Intelligence ; Neurosciences |
Research area [WOS] | Computer Science ; Neurosciences & Neurology |
Keywords [WOS] | NONLINEAR-SYSTEMS ; FEEDBACK-CONTROL ; MOBILE ROBOTS ; ALGORITHM |
Indexed in | SCI |
Language | English |
WOS accession number | WOS:000366329200012 |
Date made public | 2016-02-26 |
Source URL | [http://ir.ia.ac.cn/handle/173211/10525] |
Collection | State Key Laboratory of Management and Control for Complex Systems_Deep Reinforcement Learning |
Author affiliations | 1. Chinese Acad Sci, Inst Automat, State Key Lab Management & Control Complex Syst, Beijing 100190, Peoples R China; 2. Univ Rhode Isl, Dept Elect Comp & Biomed Engn, Kingston, RI 02881 USA; 3. Harbin Inst Technol, State Key Lab Robot & Syst, Harbin 150001, Peoples R China |
Recommended citation (GB/T 7714) | Zhu, Yuanheng, Zhao, Dongbin, He, Haibo, et al. Convergence Proof of Approximate Policy Iteration for Undiscounted Optimal Control of Discrete-Time Systems[J]. COGNITIVE COMPUTATION, 2015, 7(6): 763-771. |
APA | Zhu, Yuanheng, Zhao, Dongbin, He, Haibo, & Ji, Junhong. (2015). Convergence Proof of Approximate Policy Iteration for Undiscounted Optimal Control of Discrete-Time Systems. COGNITIVE COMPUTATION, 7(6), 763-771. |
MLA | Zhu, Yuanheng, et al. "Convergence Proof of Approximate Policy Iteration for Undiscounted Optimal Control of Discrete-Time Systems". COGNITIVE COMPUTATION 7.6 (2015): 763-771. |
Ingest method: OAI harvesting
Source: Institute of Automation
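
The loop described in the abstract (iterative approximate policy evaluation, then greedy improvement by enumerating a finite action set) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the one-dimensional state in [0, 1], the dynamics x' = x + u, the unit stage cost, the two step sizes, and the use of triangular membership functions (equivalent to linear interpolation) as the fuzzy approximator. It is not the puddle world experiment.

```python
import numpy as np

# All names and numbers below are illustrative assumptions, not taken from the paper.
CENTERS = np.linspace(0.0, 1.0, 21)   # cores of the triangular membership functions
ACTIONS = np.array([0.10, 0.05])      # finite action set: step sizes to the right
GOAL = len(CENTERS) - 1               # the center at x = 1 is the terminal state


def v_hat(theta, x):
    """Triangular fuzzy approximator, equivalent to linear interpolation of theta."""
    return np.interp(x, CENTERS, theta)


def evaluate(policy, theta, sweeps=100):
    """Iterative approximate policy evaluation for the undiscounted cost."""
    for _ in range(sweeps):
        nxt = CENTERS + ACTIONS[policy]      # deterministic dynamics x' = x + u
        theta = 1.0 + v_hat(theta, nxt)      # unit stage cost, no discount factor
        theta[GOAL] = 0.0                    # the terminal state costs nothing
    return theta


def improve(theta):
    """Greedy policy: enumerate the finite action set at every center."""
    q = 1.0 + v_hat(theta, CENTERS[:, None] + ACTIONS[None, :])
    # Rounding suppresses floating-point noise so ties break deterministically.
    return np.argmin(np.round(q, 9), axis=1)


def approximate_policy_iteration(max_iters=20):
    policy = np.ones(len(CENTERS), dtype=int)   # admissible start: always the slow step
    theta = np.zeros(len(CENTERS))
    history = []                                # estimated cost-to-go from x = 0
    for _ in range(max_iters):
        theta = evaluate(policy, theta)
        history.append(theta[0])
        new_policy = improve(theta)
        if np.array_equal(new_policy, policy):  # policy is stable: stop
            break
        policy = new_policy
    return policy, theta, history


if __name__ == "__main__":
    policy, theta, history = approximate_policy_iteration()
    print("cost-to-go from x = 0 per iteration:", history)
```

The starting policy reaches the goal from every state, which matters in the undiscounted setting: without a discount factor, the evaluation sweeps only converge when the evaluated policy has finite cost.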