Chinese Academy of Sciences Institutional Repositories Grid
Online Off-Policy Reinforcement Learning for Optimal Control of Unknown Nonlinear Systems Using Neural Networks

Document Type: Journal Article

Authors: Zhu, Liao (1,2); Wei, Qinglai (3,4,5); Guo, Ping (1,2)
Journal: IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS
Publication Date: 2024-05-10
Pages: 11
Keywords: Adaptive dynamic programming; nonlinear systems; online learning; optimal control; reinforcement learning (RL)
ISSN: 2168-2216
DOI: 10.1109/TSMC.2024.3392756
Corresponding Author: Guo, Ping (pguo@bnu.edu.cn)
Abstract: In this article, a real-time online off-policy reinforcement learning (RL) method is developed for the optimal control problem of unknown continuous-time nonlinear systems. First, by applying the temporal difference technique to the iterative procedure of off-policy RL, the iterative value function and the iterative policy input can be learned online in real time. It is proven that the fitting error of the neural network (NN) weights is exponentially convergent in each iteration. Second, a model-free Hamilton-Jacobi-Bellman equation (MF-HJBE) is deduced by taking the limit of the iterative procedure of off-policy RL. In this manner, the system dynamics appearing in the classical HJBE are eliminated, and the iteration index is removed as well. By applying temporal difference to the MF-HJBE, a real-time online tuning rule is designed to learn the optimal value function and the optimal policy input. It is proven that the fitting error of the NN weights caused by the real-time online tuning rule is exponentially convergent. Note that the two online tuning rules, the iterative one and the real-time one, use only current and previous state data extracted from system trajectories. Meanwhile, it is proven using Lyapunov's direct method that the system solution is uniformly ultimately bounded. Finally, simulation results demonstrate the validity of the proposed method.
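For context only (this sketch is not part of the repository record; the dynamics f, g, utility r, and value function V* are generic textbook notation, not necessarily the paper's own), the classical continuous-time HJBE that the abstract's model-free variant replaces can be written, for dynamics \dot{x} = f(x) + g(x)u and cost J = \int_0^\infty r(x(t), u(t)) \, dt with r(x, u) = Q(x) + u^\top R u, as

    0 = \min_u \big[ r(x, u) + \nabla V^*(x)^\top ( f(x) + g(x)u ) \big],    with    u^*(x) = -\tfrac{1}{2} R^{-1} g(x)^\top \nabla V^*(x).

Both f and g appear explicitly here, which is what the MF-HJBE described above is derived to avoid: only state data measured along system trajectories are required.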
Funding Project: National Key Research and Development Program of China
WOS Research Areas: Automation & Control Systems; Computer Science
Language: English
WOS Record Number: WOS:001218600800001
Publisher: IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
Funding Organization: National Key Research and Development Program of China
Source URL: [http://ir.ia.ac.cn/handle/173211/58374]
Collection: Institute of Automation, State Key Laboratory of Management and Control for Complex Systems, Intelligence Team
Author Affiliations:
1. Beijing Normal Univ, Int Acad Ctr Complex Syst, Zhuhai 519087, Peoples R China
2. Beijing Normal Univ, Sch Syst Sci, Beijing 100875, Peoples R China
3. Chinese Acad Sci, Inst Automat, State Key Lab Multimodal Artificial Intelligence S, Beijing 100190, Peoples R China
4. Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing 100049, Peoples R China
5. Macau Univ Sci & Technol, Inst Syst Engn, Macau, Peoples R China
Recommended Citation Formats
GB/T 7714
Zhu, Liao, Wei, Qinglai, Guo, Ping. Online Off-Policy Reinforcement Learning for Optimal Control of Unknown Nonlinear Systems Using Neural Networks[J]. IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS, 2024: 11.
APA Zhu, Liao, Wei, Qinglai, & Guo, Ping. (2024). Online Off-Policy Reinforcement Learning for Optimal Control of Unknown Nonlinear Systems Using Neural Networks. IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS, 11.
MLA Zhu, Liao, et al. "Online Off-Policy Reinforcement Learning for Optimal Control of Unknown Nonlinear Systems Using Neural Networks". IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS (2024): 11.

Deposit Method: OAI Harvesting

Source: Institute of Automation


Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.