Chinese Academy of Sciences Institutional Repositories Grid
Hierarchical Reinforcement Learning and Sensitivity Analysis for Markov Decision Processes

Document type: Doctoral dissertation

Author: 王利存
Degree: Doctor of Engineering
Defense date: 2001-02-01
Degree-granting institution: Graduate School of the Chinese Academy of Sciences
Place of conferral: Institute of Automation, Chinese Academy of Sciences
Supervisor: 郑应平
Keywords: Reinforcement Learning; Markov Decision Processes; Semi-Markov Decision Processes; Sensitivity Analysis; Hierarchical; Algorithm; Simulation
Degree discipline: Control Theory and Control Engineering
Chinese abstract (translated): Reinforcement learning integrates the basic ideas of artificial intelligence and optimal control, providing an effective method for solving large-scale stochastic decision, optimization, and control problems, and is attracting growing research interest in artificial intelligence, automatic control, operations research, and economic management. Building on existing results in Markov decision theory and reinforcement learning, this dissertation studies hierarchical reinforcement learning for average-reward Markov decision problems; based on sensitivity analysis of semi-Markov processes, it studies Actor-Critic algorithms for semi-Markov decision processes and investigates sensitivity analysis of re-entrant queueing networks. The main contents and contributions are as follows:
1. Building on an analysis of reinforcement learning algorithms for Markov and semi-Markov decision processes, a hierarchical reinforcement learning algorithm for average-reward Markov decision processes is studied. The algorithm is applied to the scheduling of a closed re-entrant queueing system; computer simulation results show that it outperforms existing heuristic scheduling methods.
2. Exploiting the structure of event-driven Markov decision processes, hierarchical reinforcement learning is studied for event-driven average-reward Markov decision problems, and for a special but widely occurring class of event-driven problems a simplified reinforcement learning algorithm is further given. The two algorithms are applied to the admission control problem of an M/M/1 queue.
3. From properties of the solution of the Poisson equation for semi-Markov processes, derivative formulas of the average performance of semi-Markov processes with respect to model parameters are derived, and a sensitivity-analysis algorithm based on value-function learning is given. Numerical simulation examples illustrate the effectiveness of the algorithm.
4. For finite-state semi-Markov decision processes with parameterized stochastic policies, gradient formulas of the average performance with respect to the policy parameters and a corresponding estimation algorithm are given; an Actor-Critic algorithm for semi-Markov decision processes is studied, and numerical simulation results demonstrate its effectiveness.
5. For countable-state Markov chains, derivative formulas of the average performance with respect to model parameters and an estimation algorithm are given. A Markov model of an open re-entrant queueing network is established, and the algorithm is applied to study the sensitivity of the performance of re-entrant manufacturing systems to model parameters under a given scheduling policy and system structure; computer simulation results confirm the practicality and effectiveness of the algorithm.
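Items 3 and 5 both rest on a derivative formula for the average performance obtained through the Poisson equation. A minimal finite-state sketch of that idea (the performance-potential construction; the chain, cost vector, and parameter dependence below are invented for illustration and are not taken from the dissertation) computes dη/dθ = π (dP/dθ) g, where g solves the Poisson equation:

```python
import numpy as np

def stationary(P):
    """Stationary distribution of an ergodic finite Markov chain."""
    n = P.shape[0]
    A = np.vstack([P.T - np.eye(n), np.ones(n)])   # pi P = pi, sum(pi) = 1
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi

def performance_gradient(P, dP, f):
    """Average performance eta = pi @ f and its derivative w.r.t. a scalar
    model parameter theta, using dEta/dTheta = pi (dP/dTheta) g, where g
    solves the Poisson equation (I - P) g = f - eta * 1 with pi @ g = 0."""
    pi = stationary(P)
    eta = pi @ f
    n = P.shape[0]
    # Fundamental-matrix trick: (I - P + 1 pi^T) is invertible, and its
    # solution g automatically satisfies pi @ g = 0.
    g = np.linalg.solve(np.eye(n) - P + np.outer(np.ones(n), pi), f - eta)
    return eta, pi @ dP @ g
```

For a two-state chain P(θ) = [[1-θ, θ], [0.5, 0.5]] with f = [0, 1], the stationary distribution gives η(θ) = 2θ/(1+2θ), so the formula can be checked against the analytic derivative 2/(1+2θ)².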
English abstract: Reinforcement learning integrates the basic ideas of artificial intelligence and optimal control, and it provides an efficient method for solving large-scale stochastic decision, optimization, and control problems. Researchers in artificial intelligence, automatic control, operations research, and economic management are paying increasing attention to these topics. Building on existing results on Markov decision processes and reinforcement learning, hierarchical reinforcement learning algorithms are studied for Markov decision processes and event-driven Markov decision processes with average performance. After analyzing the sensitivity of the average performance of semi-Markov processes with respect to model parameters, this dissertation studies Actor-Critic algorithms for semi-Markov decision processes. Sensitivity analysis for a class of re-entrant queueing networks is also given. The main work and contributions of this dissertation are as follows:
1. Based on an analysis of reinforcement learning algorithms for Markov and semi-Markov decision processes, a hierarchical reinforcement learning algorithm for Markov decision processes with average performance is studied. The algorithm is applied to a closed re-entrant queueing system scheduling problem; computer simulation results demonstrate that the new algorithm is superior to existing heuristic scheduling methods.
2. A hierarchical reinforcement learning algorithm is designed for event-driven Markov decision processes with average performance, and for a special but widely occurring class of event-driven Markov decision processes a simplified reinforcement learning algorithm is given. Using these two new algorithms, the dissertation investigates the admission control problem of an M/M/1 queueing system.
3. Based on properties of the Poisson equation of semi-Markov processes, gradient formulas of the average performance of semi-Markov processes with respect to model parameters are derived, and a sensitivity-analysis algorithm based on value-function learning is studied. A numerical simulation example shows the efficiency of this algorithm.
4. For finite-state semi-Markov decision processes with parameterized stochastic policies, gradient formulas of the average performance with respect to the stochastic policy parameters are derived, and a gradient-estimation algorithm is studied. An Actor-Critic algorithm for semi-Markov decision processes is then given; numerical simulation results show the efficiency of the algorithm.
5. For countable-state Markov chains, gradient formulas of the average performance with respect to model parameters and a gradient-estimation algorithm are studied. A Markov model for a class of open re-entrant queueing networks is established, and with the given algorithm the performance sensitivity of re-entrant manufacturing systems with respect to system parameters is analyzed under a given scheduling policy. Computer simulation results demonstrate the practicality and effectiveness of the algorithm.
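The M/M/1 admission-control study in item 2 can be loosely illustrated with a tabular average-reward update (an R-learning-style rule on a uniformized queue). All rates, rewards, step sizes, and the truncation level below are invented for this sketch; it is not the dissertation's algorithm:

```python
import random

def r_learning_admission(lam=0.5, mu=0.7, reward=5.0, hold_cost=1.0,
                         max_q=10, steps=50_000, alpha=0.01, beta=0.001,
                         eps=0.1, seed=0):
    """Average-reward Q-learning sketch for admission control of a
    uniformized M/M/1 queue. State = queue length; action 1 admits the
    next arrival, action 0 rejects it."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(max_q + 1)]  # Q[state][action]
    rho = 0.0                                   # average-reward estimate
    p_arrival = lam / (lam + mu)                # uniformized event probability
    s = 0
    for _ in range(steps):
        # epsilon-greedy action (only consequential at arrival events)
        if rng.random() < eps:
            a = rng.randrange(2)
        else:
            a = 0 if q[s][0] >= q[s][1] else 1
        was_greedy = q[s][a] >= max(q[s])
        # sample the next event under uniformization
        if rng.random() < p_arrival:            # arrival: admit or reject
            s2 = min(s + a, max_q)
            r = reward * a - hold_cost * s
        else:                                   # (potential) service completion
            s2 = max(s - 1, 0)
            r = -hold_cost * s
        best_next = max(q[s2])
        cur_best = max(q[s])
        q[s][a] += alpha * (r - rho + best_next - q[s][a])
        if was_greedy:                          # update rho on greedy steps only
            rho += beta * (r - rho + best_next - cur_best)
        s = s2
    policy = [int(q[n][1] > q[n][0]) for n in range(max_q + 1)]
    return policy, rho
```

With an admission reward that outweighs the holding cost at short queues, the learned policy typically takes a threshold form (admit below some queue length, reject above it), which is the qualitative behavior such admission-control studies examine.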
Language: Chinese
Other identifier: 651
Source URL: http://ir.ia.ac.cn/handle/173211/5716
Collection: Graduates_Doctoral Dissertations
Recommended citation (GB/T 7714):
王利存. 马氏决策过程的递阶强化学习与灵敏度分析[D]. 中国科学院自动化研究所. 中国科学院研究生院. 2001.

Deposit method: OAI harvesting

Source: Institute of Automation


Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.