A partial policy iteration ADP algorithm for nonlinear neuro-optimal control with discounted total reward
Document type: Journal article
Authors | Liang, Mingming1,2 |
Journal | NEUROCOMPUTING
Publication date | 2021-02-01 |
Volume | 424, pages 23-34 |
Keywords | Adaptive critic designs ; Adaptive dynamic programming ; Policy iteration ; Neural networks ; Neuro-dynamic programming ; Nonlinear systems ; Optimal control |
ISSN | 0925-2312 |
DOI | 10.1016/j.neucom.2020.11.014 |
Corresponding author | Liang, Mingming(liangmingming@gdut.edu.cn) |
Abstract | This paper constructs a partial policy iteration adaptive dynamic programming (ADP) algorithm to solve the optimal control problem of nonlinear systems with discounted total reward. Compared with the traditional policy iteration ADP algorithm, the approach updates the iterative control law only in a local region of the global system state space. As a result, the overall computational burden on the processing units at each iteration can be significantly reduced. Hence, this feature enables our algorithm to be successfully executed on low-performance devices such as smartphones, smartwatches and Internet of Things (IoT) objects. We provide a convergence analysis to show that the generated sequence of value functions is monotonically nonincreasing and can finally reach a local optimum. In addition, the corresponding local policy space is developed theoretically for the first time. Furthermore, when the sequence of local system state spaces is chosen properly, we prove that the developed algorithm is capable of finding the global optimal performance index function for the nonlinear systems. Finally, we present a numerical simulation to demonstrate the effectiveness of the proposed algorithm. (c) 2020 Elsevier B.V. All rights reserved. |
WOS keywords | LINEAR-SYSTEMS ; ROBUST-CONTROL ; GAMES |
WOS research area | Computer Science |
Language | English |
WOS accession number | WOS:000611084200003 |
Publisher | ELSEVIER |
Source URL | [http://ir.ia.ac.cn/handle/173211/43115] |
Collection | Institute of Automation_State Key Laboratory of Management and Control for Complex Systems_Intelligent Systems Team |
Corresponding author | Liang, Mingming |
Author affiliations | 1. Chinese Acad Sci, Inst Automat, State Key Lab Management & Control Complex Syst, Beijing 100190, Peoples R China; 2. Guangdong Univ Technol, Sch Automat, Guangzhou 510006, Peoples R China |
Recommended citation (GB/T 7714) | Liang, Mingming,Wei, Qinglai. A partial policy iteration ADP algorithm for nonlinear neuro-optimal control with discounted total reward[J]. NEUROCOMPUTING,2021,424:23-34. |
APA | Liang, Mingming,&Wei, Qinglai.(2021).A partial policy iteration ADP algorithm for nonlinear neuro-optimal control with discounted total reward.NEUROCOMPUTING,424,23-34. |
MLA | Liang, Mingming,et al."A partial policy iteration ADP algorithm for nonlinear neuro-optimal control with discounted total reward".NEUROCOMPUTING 424(2021):23-34. |
Ingest method: OAI harvesting
Source: Institute of Automation
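The core idea in the abstract, improving the control law only on a local region of the state space at each iteration, can be illustrated with a minimal tabular sketch. This is not the paper's algorithm: it assumes a small finite-state deterministic system with a discounted cost and exact policy evaluation, in place of the nonlinear system and neural network approximators the paper addresses. All names (`evaluate_policy`, `partial_policy_iteration`, `chain_example`) and the toy chain system are hypothetical.

```python
import numpy as np

def evaluate_policy(policy, next_state, cost, gamma):
    """Exact discounted cost of a fixed policy: solve V = c_pi + gamma * P_pi V."""
    n = len(policy)
    idx = np.arange(n)
    P = np.zeros((n, n))
    P[idx, next_state[idx, policy]] = 1.0   # deterministic transitions under the policy
    c = cost[idx, policy]
    return np.linalg.solve(np.eye(n) - gamma * P, c)

def partial_policy_iteration(next_state, cost, gamma, regions, policy0):
    """Policy iteration that improves the policy only on one local region per iteration."""
    policy = policy0.copy()
    values = [evaluate_policy(policy, next_state, cost, gamma)]
    for region in regions:
        V = values[-1]
        # One-step lookahead restricted to the states in this region;
        # the policy on all other states is left unchanged.
        q = cost[region] + gamma * V[next_state[region]]
        policy[region] = np.argmin(q, axis=1)
        values.append(evaluate_policy(policy, next_state, cost, gamma))
    return policy, values

def chain_example(n=6):
    """Hypothetical toy system: a chain where action 0 moves left, action 1 moves right."""
    s = np.arange(n)
    next_state = np.stack([np.maximum(s - 1, 0), np.minimum(s + 1, n - 1)], axis=1)
    cost = np.stack([s, s], axis=1).astype(float)   # stage cost equals the state index
    return next_state, cost

next_state, cost = chain_example()
regions = [[0, 1, 2], [3, 4, 5]] * 2            # alternate between two local regions
policy0 = np.ones(6, dtype=int)                 # start from "always move right"
policy, values = partial_policy_iteration(next_state, cost, 0.9, regions, policy0)
print(policy)
print(values[-1])
```

In this toy setting each iteration only computes the lookahead for the states in the current region, which is the source of the per-iteration savings, and the evaluated cost never increases from one iteration to the next. Because the regions together cover the whole state space, the sketch ends at the policy that always moves left.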