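Chinese Academy of Sciences Institutional Repositories Grid: Data-Based Adaptive Dynamic Programming for Optimal Control and Differential Games (基于数据的自适应动态规划最优控制与微分博弈研究)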

Document type: Dissertation


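Author	Hongliang Li (李宏亮)
Degree type	Doctor of Engineering
Defense date	2015-05-21
Degree-granting institution	University of Chinese Academy of Sciences
Place of conferral	Institute of Automation, Chinese Academy of Sciences
Advisor	Derong Liu (刘德荣)
Keywords	intelligent control; adaptive dynamic programming; neural networks; optimal control; differential games
Alternative title	Data-Based Adaptive Dynamic Programming for Optimal Control and Differential Games
Degree major	Control Theory and Control Engineering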
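Chinese abstract (translated)	Adaptive dynamic programming (ADP) can overcome the "curse of dimensionality" of traditional dynamic programming and has recently become a research focus in intelligent control and computational intelligence. ADP is an intelligent control method with self-learning and optimization capabilities, and it has great potential for solving optimal control problems of complex nonlinear systems. Many practical systems, however, exhibit strong nonlinearity, unknown dynamics, and model uncertainty, which make it difficult to establish an accurate mathematical model. Data-based control and optimization methods are therefore of considerable value in both theory and practice. After reviewing the current state of research, this thesis uses modern control theory, machine learning, and game theory as its main tools to study data-based ADP theory and methods, with the aim of solving optimal control and differential game (zero-sum and non-zero-sum) problems for nonlinear systems with unknown models. The main contributions fall into four parts. 1. For the optimal control of nonlinear discrete-time systems, the convergence of ADP methods is analyzed theoretically for the case in which approximation errors are present during the iteration, and error bounds are established for three ADP algorithms: approximate value iteration, approximate policy iteration, and approximate optimistic policy iteration. The theoretical results show that, despite the approximation error at each iteration, the approximate value function still converges to a finite neighborhood of the optimal value function. The results are then extended to Q-function-based ADP methods, which yields data-based iterative ADP methods. Finally, an implementation based on multilayer feedforward neural networks is given together with simulation verification. 2. For the optimal control of systems with unknown models, continuous states, and discrete controls, a model-free approximate policy iteration method based on manifold regularization is proposed. An unsupervised manifold-regularized feature learning method automatically learns, from offline data, the basis functions of the value function approximation structure. The learned basis functions are then used in an L2-regularized least-squares policy iteration algorithm, and a performance analysis of the algorithm is given. The method learns the intrinsic structure of the state space, avoids hand-designed features, and provides a direct extension of the basis functions. Its effectiveness is verified on inverted pendulum balancing control and an energy storage optimization problem. 3. Zero-sum differential games for continuous-time systems with unknown models are studied. A data-based, model-free integral policy iteration ADP algorithm is proposed to learn the Nash equilibrium solution online, its convergence is analyzed, and an implementation based on a linearly parameterized structure is given together with simulation verification. The method does not identify the unknown system. It uses only online measurement data and updates the value function, the control policy, and the disturbance policy simultaneously. The result is then extended to zero-sum games for nonlinear continuous-time systems. 4. Multi-player non-zero-sum differential games for continuous-time systems with unknown models are studied, and an online synchronous approximate optimal learning method based on policy iteration is proposed. Policy iteration for solving non-zero-sum games is proved to be equivalent to quasi-Newton iteration. A model neural network identifies the unknown system online, and the convergence of the neural network weights is proved. For each player, a critic neural network and an action neural network approximate its value function and control policy, respectively, but only the critic network weights need to be tuned, which reduces the computational complexity of learning. The Lyapunov method is used to prove that the closed-loop system is uniformly ultimately bounded.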
English abstract	Adaptive dynamic programming (ADP) can overcome the "curse of dimensionality" of traditional dynamic programming and has recently become an active topic in intelligent control and computational intelligence. As an intelligent control method with self-learning and optimization capabilities, ADP has great potential for solving optimal control problems of complex nonlinear systems. However, many practical systems have strong nonlinearities, unknown dynamics, and model uncertainties, so accurate mathematical models are difficult to establish. The study of data-based control and optimization methods is therefore significant both in theory and in practice. Building on a review of the related research, this thesis uses modern control theory, machine learning, and game theory as its major tools to study data-based ADP theory and methods, aiming to solve optimal control and differential game (zero-sum and non-zero-sum) problems for nonlinear systems with unknown dynamics. The main contributions of this thesis fall into the following four parts. 1. The convergence of ADP algorithms for optimal control problems of nonlinear discrete-time systems is analyzed in the presence of approximation errors during the iteration, and error bounds are established for the approximate value iteration, approximate policy iteration, and approximate optimistic policy iteration algorithms. The theoretical results show that the iterative approximate value function converges to a finite neighborhood of the optimal value function despite the approximation errors. The results are then extended to ADP algorithms based on the Q-function, and data-based ADP methods are thereby developed. Finally, implementation methods using multilayer feedforward neural networks and simulation examples are provided. 2. A model-free approximate policy iteration scheme based on manifold regularization is developed for optimal control of unknown systems with continuous state spaces and discrete action spaces. The proposed algorithm uses an unsupervised manifold-regularized feature learning method to automatically learn basis representations for value function approximation from the collected data. It then applies the learned basis functions to the L2-regularized least-squares policy iteration algorithm. The performance analysis of this algorithm is also...
Language	Chinese
Other identifier	201218014628006
Source URL	[http://ir.ia.ac.cn/handle/173211/6671]
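Collection	Graduates_Doctoral Dissertations
Recommended citation (GB/T 7714)	Li Hongliang (李宏亮). Data-Based Adaptive Dynamic Programming for Optimal Control and Differential Games (基于数据的自适应动态规划最优控制与微分博弈研究) [D]. Institute of Automation, Chinese Academy of Sciences. University of Chinese Academy of Sciences. 2015.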

Ingest method: OAI harvesting

Source: Institute of Automation
