Chinese Academy of Sciences Institutional Repositories Grid (CAS IR Grid): Convergence Proof of Approximate Policy Iteration for Undiscounted Optimal Control of Discrete-Time Systems

Convergence Proof of Approximate Policy Iteration for Undiscounted Optimal Control of Discrete-Time Systems

Document type: Journal article


Authors	Zhu, Yuanheng1; Zhao, Dongbin1; He, Haibo2; Ji, Junhong3
Journal	COGNITIVE COMPUTATION
Publication date	2015-12-01
Volume	7 Issue: 6 Pages: 763-771
Keywords	Approximate policy iteration; Approximation error; Optimal control; Fuzzy approximator
Abstract	This paper studies approximate policy iteration (API) for solving undiscounted optimal control problems. A discrete-time system with a continuous state space and a finite action set is considered. Because an approximation technique is used for the continuous state space, approximation errors arise in the calculation and disturb the convergence of the original policy iteration. We analyze and prove the convergence of API for undiscounted optimal control. An iterative method is used to implement approximate policy evaluation, and the error between the approximate and exact value functions is shown to be bounded. Then, because the action set is finite, the greedy policy in policy improvement is generated directly. The main theorem proves that API converges to the optimal policy if a sufficiently accurate approximator is used. For implementation, we introduce a fuzzy approximator and verify its performance on the puddle world problem.
WOS headings	Science & Technology ; Technology ; Life Sciences & Biomedicine
WOS categories	Computer Science, Artificial Intelligence ; Neurosciences
WOS research areas	Computer Science ; Neurosciences & Neurology
WOS keywords	NONLINEAR-SYSTEMS ; FEEDBACK-CONTROL ; MOBILE ROBOTS ; ALGORITHM
Indexed in	SCI
Language	English
WOS accession number	WOS:000366329200012
Public release date	2016-02-26
Source URL	[http://ir.ia.ac.cn/handle/173211/10525]
Collection	State Key Laboratory of Management and Control for Complex Systems_Deep Reinforcement Learning
Author affiliations	1. Chinese Acad Sci, Inst Automat, State Key Lab Management & Control Complex Syst, Beijing 100190, Peoples R China; 2. Univ Rhode Isl, Dept Elect Comp & Biomed Engn, Kingston, RI 02881 USA; 3. Harbin Inst Technol, State Key Lab Robot & Syst, Harbin 150001, Peoples R China
Recommended citation (GB/T 7714)	Zhu, Yuanheng,Zhao, Dongbin,He, Haibo,et al. Convergence Proof of Approximate Policy Iteration for Undiscounted Optimal Control of Discrete-Time Systems[J]. COGNITIVE COMPUTATION,2015,7(6):763-771.
APA	Zhu, Yuanheng,Zhao, Dongbin,He, Haibo,&Ji, Junhong.(2015).Convergence Proof of Approximate Policy Iteration for Undiscounted Optimal Control of Discrete-Time Systems.COGNITIVE COMPUTATION,7(6),763-771.
MLA	Zhu, Yuanheng,et al."Convergence Proof of Approximate Policy Iteration for Undiscounted Optimal Control of Discrete-Time Systems".COGNITIVE COMPUTATION 7.6(2015):763-771.
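
The two steps named in the abstract, iterative approximate policy evaluation followed by greedy improvement over a finite action set, can be illustrated with a minimal sketch. It is not the authors' implementation: the scalar system `x' = 0.9 x + a`, the quadratic stage cost, the 21-node grid, the action values, and all function names are assumptions chosen only for illustration, and linear interpolation between nodes (equivalent in one dimension to normalized triangular membership functions) stands in for the fuzzy approximator. The puddle world problem from the paper is not reproduced here.

```python
import numpy as np

# Toy setup; every value here is a hypothetical choice, not taken from the paper.
nodes = np.linspace(-1.0, 1.0, 21)               # centers of triangular membership functions
actions = np.array([-0.2, -0.1, 0.0, 0.1, 0.2])  # finite action set

def step(x, a):
    """Deterministic dynamics, clipped so the state stays on the grid."""
    return np.clip(0.9 * x + a, -1.0, 1.0)

def cost(x, a):
    """Nonnegative stage cost that vanishes only at the origin with zero action."""
    return x**2 + a**2

def value(theta, x):
    """Approximate value function: linear interpolation of the node values."""
    return np.interp(x, nodes, theta)

def evaluate(policy, sweeps=5000, tol=1e-10):
    """Iterative approximate policy evaluation with no discount factor."""
    theta = np.zeros_like(nodes)
    a = actions[policy]
    for _ in range(sweeps):
        new = cost(nodes, a) + value(theta, step(nodes, a))
        if np.max(np.abs(new - theta)) < tol:
            return new
        theta = new
    return theta

def improve(theta):
    """Greedy policy: enumerate the finite action set at every node."""
    q = np.stack([cost(nodes, a) + value(theta, step(nodes, a)) for a in actions])
    return np.argmin(q, axis=0)

# Start from a = 0 (index 2), which drives the state to the origin on its own.
policy = np.full(nodes.size, 2)
v_init = evaluate(policy)
for _ in range(20):
    theta = evaluate(policy)
    new_policy = improve(theta)
    if np.array_equal(new_policy, policy):
        break
    policy = new_policy
v_final = evaluate(policy)
```

Because the problem is undiscounted, the evaluation sweep only settles when the policy being evaluated steers the state to the zero-cost origin, which is why the sketch starts from the zero action rather than an arbitrary policy. Comparing `v_final` with `v_init` node by node shows how much the greedy updates reduced the approximate cost.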

Ingest method: OAI harvesting

Source: Institute of Automation
