Chinese Academy of Sciences Institutional Repositories Grid (CAS IR Grid)

Discrete-Time Deterministic Q-Learning: A Novel Convergence Analysis

Document type: Journal article


Authors	Wei, Qinglai [1]; Lewis, Frank L. [2,3]; Sun, Qiuye [4]; Yan, Pengfei [1]; Song, Ruizhuo [5]
Journal	IEEE TRANSACTIONS ON CYBERNETICS
Publication date	2017-05-01
Volume	47; Issue: 5; Pages: 1224-1237
Keywords	Adaptive Critic Designs; Adaptive Dynamic Programming (ADP); Approximate Dynamic Programming; Neural Networks (NNs); Neuro-dynamic Programming; Optimal Control; Q-learning
DOI	10.1109/TCYB.2016.2542923
Document subtype	Article
Abstract	In this paper, a novel discrete-time deterministic Q-learning algorithm is developed. In each iteration of the developed Q-learning algorithm, the iterative Q function is updated over the entire state and control spaces, rather than for a single state and a single control as in the traditional Q-learning algorithm. A new convergence criterion is established to guarantee that the iterative Q function converges to the optimum, where the convergence criterion of the learning rates for traditional Q-learning algorithms is simplified. During the convergence analysis, the upper and lower bounds of the iterative Q function are analyzed to obtain the convergence criterion, instead of analyzing the iterative Q function itself. For convenience of analysis, the convergence properties for the undiscounted case of the deterministic Q-learning algorithm are developed first. Then, considering the discount factor, the convergence criterion for the discounted case is established. Neural networks are used to approximate the iterative Q function and to compute the iterative control law, respectively, to facilitate the implementation of the deterministic Q-learning algorithm. Finally, simulation results and comparisons are given to illustrate the performance of the developed algorithm.
WoS keywords	OPTIMAL TRACKING CONTROL ; ZERO-SUM GAMES ; H-INFINITY CONTROL ; INPUT-OUTPUT DATA ; DEAD-ZONE INPUT ; NONLINEAR-SYSTEMS ; ALGORITHM ; DESIGN ; REPRESENTATION ; APPROXIMATION
WoS research area	Computer Science
Language	English
WoS accession number	WOS:000399797000009
Funding agencies	National Natural Science Foundation (NNSF) of China (61374105, 61304079, 61273140); Fundamental Research Funds for the Central Universities (FRF-TP-15-056A3); Open Research Project from SKLMCCS (20150104); National Science Foundation (ECCS-1405173, IIS-1208623); Office of Naval Research, Arlington, VA, USA (N00014-13-1-0562, N000141410718); U.S. Army Research Office (W911NF-11-D-0001); China NNSF (61120106011); China Education Ministry Project 111 (B08015)
Source URL	http://ir.ia.ac.cn/handle/173211/13630
Collection	Institute of Automation / State Key Laboratory of Management and Control for Complex Systems / Intelligence Team
Author affiliations	1. Chinese Acad Sci, Inst Automat, State Key Lab Management & Control Complex Syst, Beijing 100190, Peoples R China; 2. Univ Texas Arlington, UTA Res Inst, Arlington, TX 76118 USA; 3. Northeastern Univ, Shenyang 110036, Peoples R China; 4. Northeastern Univ, Sch Informat Sci & Engn, Shenyang 110036, Peoples R China; 5. Univ Sci & Technol Beijing, Sch Automat & Elect Engn, Beijing 100083, Peoples R China
Recommended citation (GB/T 7714)	Wei, Qinglai, Lewis, Frank L., Sun, Qiuye, et al. Discrete-Time Deterministic Q-Learning: A Novel Convergence Analysis[J]. IEEE TRANSACTIONS ON CYBERNETICS, 2017, 47(5): 1224-1237.
APA	Wei, Qinglai, Lewis, Frank L., Sun, Qiuye, Yan, Pengfei, & Song, Ruizhuo. (2017). Discrete-Time Deterministic Q-Learning: A Novel Convergence Analysis. IEEE TRANSACTIONS ON CYBERNETICS, 47(5), 1224-1237.
MLA	Wei, Qinglai, et al. "Discrete-Time Deterministic Q-Learning: A Novel Convergence Analysis". IEEE TRANSACTIONS ON CYBERNETICS 47.5 (2017): 1224-1237.

Ingest method: OAI harvesting

Source: Institute of Automation
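
The abstract describes an update that sweeps the whole state and control spaces in each iteration, rather than one state-control pair at a time. The sketch below illustrates that idea on a tiny finite problem and is not taken from the paper: the five-state chain, the quadratic cost, the constant learning rate `alpha`, the zero initial Q function, and all function names are assumptions chosen for illustration. A lookup table stands in for the neural-network approximators mentioned in the abstract.

```python
def deterministic_q_learning(states, controls, step, cost,
                             gamma=1.0, alpha=0.5, iterations=200):
    """Illustrative sweep-style Q iteration for a known deterministic system.

    Every (state, control) pair is updated in each iteration from the
    previous iterate. gamma=1.0 corresponds to the undiscounted case.
    """
    q = {(x, u): 0.0 for x in states for u in controls}  # assumed Q_0 = 0
    for _ in range(iterations):
        new_q = {}
        for x in states:
            for u in controls:
                x_next = step(x, u)  # deterministic successor state
                best_next = min(q[(x_next, v)] for v in controls)
                target = cost(x, u) + gamma * best_next
                # Learning-rate-weighted blend of old value and new target
                new_q[(x, u)] = (1.0 - alpha) * q[(x, u)] + alpha * target
        q = new_q
    return q


# Hypothetical toy system: integer states 0..4, controls move left/stay/right.
STATES = range(5)
CONTROLS = (-1, 0, 1)


def step(x, u):
    """Move along the chain, saturating at both ends."""
    return min(max(x + u, 0), 4)


def cost(x, u):
    """Quadratic utility that is zero only at state 0 with control 0."""
    return x * x + u * u


q = deterministic_q_learning(STATES, CONTROLS, step, cost)
value = [min(q[(x, u)] for u in CONTROLS) for x in STATES]
policy = [min(CONTROLS, key=lambda u: q[(x, u)]) for x in STATES]
print("value:", value)
print("greedy control per state:", policy)
```

In this toy setting the greedy control extracted from the final table drives every state toward state 0, where the cost vanishes. Passing a `gamma` below 1 gives the discounted variant of the same sweep.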
