Online Off-Policy Reinforcement Learning for Optimal Control of Unknown Nonlinear Systems Using Neural Networks
Document Type: Journal Article
Authors | Zhu, Liao1,2; Wei, Qinglai; Guo, Ping |
Journal | IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS |
Publication Date | 2024-05-10 |
Pages | 11 |
Keywords | Adaptive dynamic programming; nonlinear systems; online learning; optimal control; reinforcement learning (RL) |
ISSN | 2168-2216 |
DOI | 10.1109/TSMC.2024.3392756 |
Corresponding Author | Guo, Ping (pguo@bnu.edu.cn) |
Abstract | In this article, a real-time online off-policy reinforcement learning (RL) method is developed for the optimal control problem of unknown continuous-time nonlinear systems. First, by applying the temporal-difference technique to the iterative procedure of off-policy RL, the iterative value function and the iterative policy input can be learned online in real time. It is proven that the fitting error of the neural network (NN) weights converges exponentially in each iteration. Second, a model-free Hamilton-Jacobi-Bellman equation (MF-HJBE) is deduced by taking the limit of the iterative procedure of off-policy RL; the MF-HJBE not only eliminates the system dynamics that appear in the classical HJBE but also removes the iteration index. By applying the temporal-difference technique to the MF-HJBE, a real-time online tuning rule is designed to learn the optimal value function and the optimal policy input, and the fitting error of the NN weights produced by this rule is likewise proven to be exponentially convergent. Note that both online tuning rules, the iterative one and the real-time one, use only current and previous state data extracted from system trajectories. Meanwhile, it is proven via Lyapunov's direct method that the system solution is uniformly ultimately bounded. Finally, simulation results demonstrate the validity of the proposed method. |
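The abstract describes a temporal-difference tuning rule that fits value-function NN weights using only current and previous state data. The following is a minimal sketch of that general idea, not the paper's algorithm: it uses a toy linear system with a fixed zero policy, a quadratic basis, and a normalized-gradient TD update, all of which are assumptions for illustration (the paper treats general unknown nonlinear systems and proves exponential convergence of the fitting error).

```python
import numpy as np

# Toy setup (assumed, not from the paper): x' = A x with fixed policy u = 0,
# stage cost r(x) = x^T Q x, value approximated as V(x) = W^T phi(x).
# The TD residual V(x_k) - V(x_{k-1}) + integral of r over the step should
# vanish at the ideal weights, so we descend on that residual online.
A = np.array([[0.0, 1.0], [-1.0, -1.0]])  # stable: eigenvalues -0.5 +/- 0.866j
Q = np.eye(2)

def phi(x):
    # Quadratic basis [x1^2, x1*x2, x2^2] (an assumed choice of NN features)
    return np.array([x[0] ** 2, x[0] * x[1], x[1] ** 2])

def rk4_step(x, dt):
    # One RK4 step of x' = A x, used only to generate state-trajectory data
    k1 = A @ x
    k2 = A @ (x + 0.5 * dt * k1)
    k3 = A @ (x + 0.5 * dt * k2)
    k4 = A @ (x + dt * k3)
    return x + (dt / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)

def td_samples(x0, T=6.0, dt=0.01):
    # Pairs (phi(x_k) - phi(x_{k-1}), trapezoidal integral of r) along one trajectory
    out, x = [], np.asarray(x0, float)
    for _ in range(int(T / dt)):
        x_next = rk4_step(x, dt)
        dphi = phi(x_next) - phi(x)
        r_int = 0.5 * (x_next @ Q @ x_next + x @ Q @ x) * dt
        out.append((dphi, r_int))
        x = x_next
    return out

# Data from several initial states so the basis signals are sufficiently rich
samples = [s for x0 in ([1.0, 0.0], [0.0, 1.0], [1.0, 1.0]) for s in td_samples(x0)]

# Normalized-gradient TD rule: uses only current and previous state data.
# Step size and epoch count are tuning assumptions, not the paper's values.
W = np.zeros(3)
alpha, epochs = 150.0, 300
for _ in range(epochs):
    for dphi, r_int in samples:
        e = W @ dphi + r_int                          # TD residual
        W = W - alpha * e * dphi / (1.0 + dphi @ dphi) ** 2

# For this system the Lyapunov equation A^T P + P A = -Q gives
# P = [[1.5, 0.5], [0.5, 1.0]], i.e. ideal weights [1.5, 1.0, 1.0].
print(W)
```

The update touches only `(phi(x_k) - phi(x_{k-1}), r_int)` pairs, mirroring the abstract's point that the tuning rules need no model of the dynamics, only state data extracted from system trajectories.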
Funding Project | National Key Research and Development Program of China |
WOS Research Areas | Automation & Control Systems; Computer Science |
Language | English |
WOS Accession Number | WOS:001218600800001 |
Publisher | IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC |
Funding Organization | National Key Research and Development Program of China |
Source URL | [http://ir.ia.ac.cn/handle/173211/58374] |
Collection | Institute of Automation, State Key Laboratory of Management and Control for Complex Systems, Intelligence Team |
Author Affiliations | 1. Beijing Normal Univ, Int Acad Ctr Complex Syst, Zhuhai 519087, Peoples R China; 2. Beijing Normal Univ, Sch Syst Sci, Beijing 100875, Peoples R China; 3. Chinese Acad Sci, Inst Automat, State Key Lab Multimodal Artificial Intelligence S, Beijing 100190, Peoples R China; 4. Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing 100049, Peoples R China; 5. Macau Univ Sci & Technol, Inst Syst Engn, Macau, Peoples R China |
Recommended Citation (GB/T 7714) | Zhu, Liao, Wei, Qinglai, Guo, Ping. Online Off-Policy Reinforcement Learning for Optimal Control of Unknown Nonlinear Systems Using Neural Networks[J]. IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS, 2024: 11. |
APA | Zhu, Liao, Wei, Qinglai, & Guo, Ping. (2024). Online Off-Policy Reinforcement Learning for Optimal Control of Unknown Nonlinear Systems Using Neural Networks. IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS, 11. |
MLA | Zhu, Liao, et al. "Online Off-Policy Reinforcement Learning for Optimal Control of Unknown Nonlinear Systems Using Neural Networks". IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS (2024): 11. |
Ingest Method: OAI Harvest
Source: Institute of Automation
Unless otherwise noted, all content in this system is protected by copyright, and all rights are reserved.