Online Off-Policy Reinforcement Learning for Optimal Control of Unknown Nonlinear Systems Using Neural Networks
Document Type: Journal Article
Authors | Zhu, Liao1,2; Wei, Qinglai; Guo, Ping |
Journal | IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS |
Publication Date | 2024-05-10 |
Pages | 11 |
Keywords | Adaptive dynamic programming; nonlinear systems; online learning; optimal control; reinforcement learning (RL) |
ISSN | 2168-2216 |
DOI | 10.1109/TSMC.2024.3392756 |
Corresponding Author | Guo, Ping (pguo@bnu.edu.cn) |
Abstract | In this article, a real-time online off-policy reinforcement learning (RL) method is developed for the optimal control problem of unknown continuous-time nonlinear systems. First, by applying the temporal-difference technique to the iterative procedure of off-policy RL, the iterative value function and the iterative policy input can be learned online in real time. It is proven that the fitting error of the neural network (NN) weights converges exponentially in each iteration. Second, a model-free Hamilton-Jacobi-Bellman equation (MF-HJBE) is deduced by taking the limit of the iterative procedure of off-policy RL; the MF-HJBE not only eliminates the system dynamics that appear in the classical HJBE but also removes the iteration index. By applying the temporal-difference technique to the MF-HJBE, a real-time online tuning rule is designed to learn the optimal value function and the optimal policy input, and the fitting error of the NN weights produced by this rule is likewise proven to be exponentially convergent. Note that both online tuning rules, the iterative one and the real-time one, use only current and previous state data extracted from system trajectories. Meanwhile, it is proven via Lyapunov's direct method that the system solution is uniformly ultimately bounded. Finally, simulation results demonstrate the validity of the proposed method. |
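The abstract describes a temporal-difference tuning rule that fits value-function NN weights using only current and previous state data. The following is a minimal sketch of that general idea, not the paper's algorithm: it uses a toy linear system with a fixed zero policy, a quadratic basis, and a normalized-gradient TD update, all of which are assumptions for illustration (the paper treats general unknown nonlinear systems and proves exponential convergence of the fitting error).

```python
import numpy as np

# Toy setup (assumed, not from the paper): x' = A x with fixed policy u = 0,
# stage cost r(x) = x^T Q x, value approximated as V(x) = W^T phi(x).
# The TD residual V(x_k) - V(x_{k-1}) + integral of r over the step should
# vanish at the ideal weights, so we descend on that residual online.
A = np.array([[0.0, 1.0], [-1.0, -1.0]])  # stable: eigenvalues -0.5 +/- 0.866j
Q = np.eye(2)

def phi(x):
    # Quadratic basis [x1^2, x1*x2, x2^2] (an assumed choice of NN features)
    return np.array([x[0] ** 2, x[0] * x[1], x[1] ** 2])

def rk4_step(x, dt):
    # One RK4 step of x' = A x, used only to generate state-trajectory data
    k1 = A @ x
    k2 = A @ (x + 0.5 * dt * k1)
    k3 = A @ (x + 0.5 * dt * k2)
    k4 = A @ (x + dt * k3)
    return x + (dt / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)

def td_samples(x0, T=6.0, dt=0.01):
    # Pairs (phi(x_k) - phi(x_{k-1}), trapezoidal integral of r) along one trajectory
    out, x = [], np.asarray(x0, float)
    for _ in range(int(T / dt)):
        x_next = rk4_step(x, dt)
        dphi = phi(x_next) - phi(x)
        r_int = 0.5 * (x_next @ Q @ x_next + x @ Q @ x) * dt
        out.append((dphi, r_int))
        x = x_next
    return out

# Data from several initial states so the basis signals are sufficiently rich
samples = [s for x0 in ([1.0, 0.0], [0.0, 1.0], [1.0, 1.0]) for s in td_samples(x0)]

# Normalized-gradient TD rule: uses only current and previous state data.
# Step size and epoch count are tuning assumptions, not the paper's values.
W = np.zeros(3)
alpha, epochs = 150.0, 300
for _ in range(epochs):
    for dphi, r_int in samples:
        e = W @ dphi + r_int                          # TD residual
        W = W - alpha * e * dphi / (1.0 + dphi @ dphi) ** 2

# For this system the Lyapunov equation A^T P + P A = -Q gives
# P = [[1.5, 0.5], [0.5, 1.0]], i.e. ideal weights [1.5, 1.0, 1.0].
print(W)
```

The update touches only `(phi(x_k) - phi(x_{k-1}), r_int)` pairs, mirroring the abstract's point that the tuning rules need no model of the dynamics, only state data extracted from system trajectories.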
Funding Project | National Key Research and Development Program of China |
WOS Research Areas | Automation & Control Systems; Computer Science |
Language | English |
WOS Accession Number | WOS:001218600800001 |
Publisher | IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC |
Funding Organization | National Key Research and Development Program of China |
Source URL | [http://ir.ia.ac.cn/handle/173211/58374] |
Collection | Institute of Automation, State Key Laboratory of Management and Control for Complex Systems, Intelligence Team |
Author Affiliations | 1. Beijing Normal Univ, Int Acad Ctr Complex Syst, Zhuhai 519087, Peoples R China; 2. Beijing Normal Univ, Sch Syst Sci, Beijing 100875, Peoples R China; 3. Chinese Acad Sci, Inst Automat, State Key Lab Multimodal Artificial Intelligence S, Beijing 100190, Peoples R China; 4. Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing 100049, Peoples R China; 5. Macau Univ Sci & Technol, Inst Syst Engn, Macau, Peoples R China |
Recommended Citation (GB/T 7714) | Zhu, Liao, Wei, Qinglai, Guo, Ping. Online Off-Policy Reinforcement Learning for Optimal Control of Unknown Nonlinear Systems Using Neural Networks[J]. IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS, 2024: 11. |
APA | Zhu, Liao, Wei, Qinglai, & Guo, Ping. (2024). Online Off-Policy Reinforcement Learning for Optimal Control of Unknown Nonlinear Systems Using Neural Networks. IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS, 11. |
MLA | Zhu, Liao, et al. "Online Off-Policy Reinforcement Learning for Optimal Control of Unknown Nonlinear Systems Using Neural Networks". IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS (2024): 11. |
Ingest Method: OAI Harvest
Source: Institute of Automation
Unless otherwise noted, all content in this system is protected by copyright, and all rights are reserved.