Discrete-Time Deterministic Q-Learning: A Novel Convergence Analysis
Document type: Journal article
Author | Wei, Qinglai1 |
Journal | IEEE TRANSACTIONS ON CYBERNETICS |
Publication date | 2017-05-01 |
Volume | 47 Issue: 5 Pages: 1224-1237 |
Keywords | Adaptive Critic Designs ; Adaptive Dynamic Programming (ADP) ; Approximate Dynamic Programming ; Neural Networks (NNs) ; Neuro-dynamic Programming ; Optimal Control ; Q-learning |
DOI | 10.1109/TCYB.2016.2542923 |
Document subtype | Article |
Abstract (English) | In this paper, a novel discrete-time deterministic Q-learning algorithm is developed. In each iteration of the developed Q-learning algorithm, the iterative Q function is updated over the whole state and control spaces, instead of being updated for a single state and a single control as in traditional Q-learning algorithms. A new convergence criterion is established to guarantee that the iterative Q function converges to the optimum, which simplifies the convergence criterion on the learning rates used in traditional Q-learning algorithms. In the convergence analysis, the upper and lower bounds of the iterative Q function are analyzed to obtain the convergence criterion, instead of analyzing the iterative Q function itself. For convenience of analysis, the convergence properties for the undiscounted case of the deterministic Q-learning algorithm are developed first. Then, taking the discount factor into account, the convergence criterion for the discounted case is established. Neural networks are used to approximate the iterative Q function and to compute the iterative control law, respectively, to facilitate the implementation of the deterministic Q-learning algorithm. Finally, simulation results and comparisons are given to illustrate the performance of the developed algorithm. |
WOS keywords | OPTIMAL TRACKING CONTROL ; ZERO-SUM GAMES ; H-INFINITY CONTROL ; INPUT-OUTPUT DATA ; DEAD-ZONE INPUT ; NONLINEAR-SYSTEMS ; ALGORITHM ; DESIGN ; REPRESENTATION ; APPROXIMATION |
WOS research area | Computer Science |
Language | English |
WOS accession number | WOS:000399797000009 |
Funding agencies | National Natural Science Foundation (NNSF) of China (61374105 ; 61304079 ; 61273140) ; Fundamental Research Funds for the Central Universities (FRF-TP-15-056A3) ; Open Research Project from SKLMCCS (20150104) ; National Science Foundation (ECCS-1405173 ; IIS-1208623) ; Office of Naval Research, Arlington, VA, USA (N00014-13-1-0562 ; N000141410718) ; U.S. Army Research Office (W911NF-11-D-0001) ; China NNSF (61120106011) ; China Education Ministry Project 111 (B08015) |
Source URL | http://ir.ia.ac.cn/handle/173211/13630 |
Collection | Institute of Automation_State Key Laboratory of Management and Control for Complex Systems_Intelligentization Team |
Author affiliations | 1. Chinese Acad Sci, Inst Automat, State Key Lab Management & Control Complex Syst, Beijing 100190, Peoples R China ; 2. Univ Texas Arlington, UTA Res Inst, Arlington, TX 76118 USA ; 3. Northeastern Univ, Shenyang 110036, Peoples R China ; 4. Northeastern Univ, Sch Informat Sci & Engn, Shenyang 110036, Peoples R China ; 5. Univ Sci & Technol Beijing, Sch Automat & Elect Engn, Beijing 100083, Peoples R China |
Recommended citation (GB/T 7714) | Wei, Qinglai, Lewis, Frank L., Sun, Qiuye, et al. Discrete-Time Deterministic Q-Learning: A Novel Convergence Analysis[J]. IEEE TRANSACTIONS ON CYBERNETICS, 2017, 47(5): 1224-1237. |
APA | Wei, Qinglai, Lewis, Frank L., Sun, Qiuye, Yan, Pengfei, & Song, Ruizhuo. (2017). Discrete-Time Deterministic Q-Learning: A Novel Convergence Analysis. IEEE TRANSACTIONS ON CYBERNETICS, 47(5), 1224-1237. |
MLA | Wei, Qinglai, et al. "Discrete-Time Deterministic Q-Learning: A Novel Convergence Analysis". IEEE TRANSACTIONS ON CYBERNETICS 47.5 (2017): 1224-1237. |
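
The abstract describes an iteration in which the Q function is updated over the whole state and control spaces at once, rather than for one state-control pair at a time. The following is a minimal sketch of that idea only, not the paper's algorithm: it assumes a tabular Q function instead of neural networks, a constant learning rate, and a standard learning-rate update of the form Q_{i+1} = (1 - alpha) Q_i + alpha (U + gamma min Q_i), which is not taken from this record. The toy system (`step`, `utility`, the state grid 0..4, the controls -1, 0, +1) is hypothetical.

```python
# Hypothetical toy problem (an assumption, not from the paper):
# states 0..4, controls -1/0/+1, quadratic cost to be minimized.
STATES = range(5)
CONTROLS = (-1, 0, 1)


def step(x, u):
    """Deterministic dynamics x_{k+1} = F(x_k, u_k), clipped to the state grid."""
    return min(max(x + u, 0), 4)


def utility(x, u):
    """Stage cost U(x, u); zero only at x = 0, u = 0."""
    return x * x + u * u


def sweep(q, alpha, gamma):
    """One iteration: update Q for every (state, control) pair from the old table."""
    new_q = {}
    for x in STATES:
        for u in CONTROLS:
            x_next = step(x, u)
            target = utility(x, u) + gamma * min(q[(x_next, v)] for v in CONTROLS)
            new_q[(x, u)] = (1 - alpha) * q[(x, u)] + alpha * target
    return new_q


def run(iterations=200, alpha=0.5, gamma=1.0):
    """Iterate full sweeps from Q_0 = 0; gamma = 1.0 is the undiscounted case."""
    q = {(x, u): 0.0 for x in STATES for u in CONTROLS}
    for _ in range(iterations):
        q = sweep(q, alpha, gamma)
    return q


def greedy_control(q, x):
    """Control law read off the table: the control minimizing Q(x, u)."""
    return min(CONTROLS, key=lambda u: q[(x, u)])


if __name__ == "__main__":
    q = run()
    for x in STATES:
        print(x, greedy_control(q, x), round(min(q[(x, u)] for u in CONTROLS), 3))
```

In this toy problem the greedy control moves every nonzero state toward 0 one step at a time, so the minimal Q value at state 4 approaches 17 + 10 + 5 + 2 = 34. Passing `gamma` below 1.0 to `run` gives the discounted variant of the same sweep.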
Ingest method: OAI harvesting
Source: Institute of Automation