Chinese Academy of Sciences Institutional Repositories Grid
Continuous-Time Adaptive Dynamic Programming and Its Applications in the Control of Complex Systems

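Document type: Thesis

Author: Li Chao (李超)
Degree: Doctor of Engineering
Defense date: 2017-05-24
Degree-granting institution: University of Chinese Academy of Sciences
Place of award: Beijing
Supervisor: Liu Derong (刘德荣)
Keywords: adaptive dynamic programming; intelligent control; optimal control; neural networks; complex systems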
Chinese Abstract

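Adaptive dynamic programming (ADP) combines ideas from optimal control and adaptive control in modern control theory, artificial neural networks in computational intelligence, and reinforcement learning in machine learning. It can overcome the “curse of dimensionality” of traditional dynamic programming and, as an intelligent control method with learning and optimization capabilities, has great potential for solving control problems of continuous-time complex nonlinear systems. Modern society and industry contain a large number of complex systems. These real systems typically have unknown dynamics, strong nonlinearities, and uncertainties, which make first-principles models hard to build, whereas traditional control theory generally relies on accurate mathematical models; this severely limits its application. Therefore, research on continuous-time ADP theory and its applications to the control of complex systems is of considerable value. The main work and contributions of this thesis lie in the following three areas.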

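1. For the finite-horizon optimal output tracking control problem, an augmented system is constructed whose state consists of both the system state and the reference trajectory, and the finite-horizon optimal regulation problem for the augmented system is proved to be equivalent to the original problem. For a partially unknown model, a policy-iteration-based ADP learning algorithm is proposed to solve for the optimal control policy online in real time. A performance analysis of the algorithm is then given, along with an implementation based on linearly parameterized structures and simulation verification.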

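2. The optimal control problem for weakly coupled nonlinear systems with unknown models is studied, and a data-based online learning ADP iterative algorithm is proposed. Based on the principle of optimality, the original system is transformed into three decoupled, reduced-order subsystems; from these, a subsystem-based control policy is derived and its approximate optimality is analyzed theoretically. For each subsystem, a critic neural network and an actor neural network approximate the value function and the control policy, respectively, and their weights are tuned simultaneously. An implementation based on the least-squares method and simulation verification are also given.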

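3. Two control problems for complex systems, robust control of affine nonlinear systems and decentralized control of nonlinear interconnected systems, are studied, and a model-free integral policy iteration ADP algorithm is proposed. The method does not identify the unknown system; it uses only online measured data and updates the value function and the control policy simultaneously. For the robust control problem, theoretical analysis proves that the control law obtained by modifying the feedback gain of the nominal system's optimal control policy is robust. For the decentralized control problem of interconnected systems, theoretical analysis proves that the set of control laws obtained by modifying the feedback gains of the isolated subsystems' optimal control policies is stabilizing. Finally, simulation experiments on the control of a multimachine power system verify the effectiveness of the proposed method.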
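All three contributions build on policy iteration: evaluate the value function of the current policy, then improve the policy from that value function, and repeat. As a minimal sketch, assuming the simplest possible setting (a scalar linear system dx/dt = a*x + b*u with cost ∫(q*x² + r*u²) dt, not the nonlinear systems treated in the thesis), policy evaluation reduces to a scalar Lyapunov equation and policy improvement to a gain update. This model-based form is Kleinman's classical iteration, shown only as the reference point that ADP methods approximate without full model knowledge; the function names are hypothetical.

```python
import math


def kleinman_policy_iteration(a, b, q, r, k0, iters=20, tol=1e-12):
    """Policy iteration for the scalar LQR problem
    dx/dt = a*x + b*u, cost = integral of (q*x^2 + r*u^2) dt, policy u = -k*x.
    Assumes the initial gain k0 is stabilizing, i.e. a - b*k0 < 0."""
    if a - b * k0 >= 0:
        raise ValueError("initial gain k0 must be stabilizing (a - b*k0 < 0)")
    k = k0
    p = float("nan")
    for _ in range(iters):
        # Policy evaluation: V(x) = p*x^2 solves the scalar Lyapunov equation
        # 2*(a - b*k)*p + q + r*k^2 = 0 for the current closed loop.
        p = (q + r * k * k) / (2.0 * (b * k - a))
        # Policy improvement: u = -(1/2) * r^-1 * b * dV/dx = -(b*p/r) * x.
        k_new = b * p / r
        converged = abs(k_new - k) < tol
        k = k_new
        if converged:
            break
    return p, k


def riccati_solution(a, b, q, r):
    """Stabilizing root of the scalar algebraic Riccati equation
    2*a*p - (b^2/r)*p^2 + q = 0."""
    return r * (a + math.sqrt(a * a + b * b * q / r)) / (b * b)


if __name__ == "__main__":
    p, k = kleinman_policy_iteration(a=1.0, b=1.0, q=1.0, r=1.0, k0=2.0)
    print(p, k, riccati_solution(1.0, 1.0, 1.0, 1.0))
```

With a = b = q = r = 1 and the stabilizing initial gain k0 = 2, the evaluated values are p = 2.5, 2.41667, 2.41422, and so on, approaching the Riccati solution 1 + √2 ≈ 2.41421. Each step needs a and b explicitly, which is exactly the model knowledge that the ADP algorithms above aim to reduce or remove.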

English Abstract

By combining optimal control, adaptive control, neural networks, and reinforcement learning, adaptive dynamic programming (ADP) can be used to overcome the “curse of dimensionality” of traditional dynamic programming. As an intelligent control method with learning and optimization capabilities, ADP has great potential for solving control problems of continuous-time complex nonlinear systems. Daily life and industry contain a large number of complex systems. These real physical systems usually have unknown dynamics, strong nonlinearities, and uncertainties, so accurate mathematical models are difficult to establish. Many methods in traditional control theory depend on accurate models, which restricts their application. Therefore, the study of continuous-time ADP and its applications to the control of complex systems is of significant value. The main contributions of this thesis comprise the following three parts.

1. To solve the finite-horizon optimal output tracking control problem, an augmented system is constructed whose state consists of the system state and the reference trajectory. The theoretical results show that the finite-horizon optimal regulation problem for the augmented system is equivalent to the original problem. An online learning ADP algorithm based on policy iteration is developed to solve for the optimal control policy in real time when the system dynamics are partially unknown. A performance analysis of this algorithm is given, and an implementation using linearly parameterized structures and a simulation example are also provided.

2. A data-based online learning ADP algorithm is developed for optimal control of weakly coupled nonlinear systems with completely unknown dynamics. According to the principle of optimality, the original system is reformulated into three decoupled, reduced-order subsystems. The approximate optimality of the control policy derived from the optimal control laws of the subsystems is analyzed. For each subsystem, a critic neural network and an action neural network approximate its value function and control policy, respectively, and the weights of both networks are updated synchronously. The least-squares method is used to implement the algorithm, and simulation examples are provided.

3. A model-free integral policy iteration ADP algorithm is developed to solve the robust control problem for affine nonlinear systems and the decentralized control problem for nonlinear interconnected systems. The proposed method does not require identification of the unknown dynamics and uses only online measured data. The algorithm updates the value function and the control policy simultaneously. For the robust control problem, the robustness of the control policy obtained by increasing the feedback gain of the nominal system's optimal controller is analyzed and proved theoretically. For the decentralized control problem, the stability of the control policy obtained by adding local feedback gains to the optimal control laws of the isolated subsystems is analyzed and proved theoretically. Finally, the effectiveness of the proposed method is demonstrated in simulation on the control of a multimachine power system.
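The model-free integral policy iteration in contribution 3 rests on writing the Bellman equation over a time interval [t, t+T] so that it involves only measured states and inputs. A minimal sketch of that idea, assuming a scalar linear system dx/dt = a*x + b*u, cost ∫(q*x² + r*u²) dt, and a quadratic value function V(x) = p*x² (the thesis itself treats nonlinear systems with neural-network approximators): because 2p(a - bk) = -(q + rk²) and the improved gain is k+ = bp/r, integrating d(p*x²)/dt along any measured trajectory gives p*[x(t+T)² - x(t)²] - 2r*k+*∫x(u + kx)dt = -(q + rk²)*∫x²dt, which is linear in the unknowns (p, k+). Data are collected once under an exploratory behavior policy and each iteration solves this equation by least squares, so the learner never uses a or b. This off-policy formulation, the sinusoidal exploration signal, and the names collect_data and integral_policy_iteration are illustrative assumptions, not the thesis's algorithm.

```python
import math


def collect_data(a, b, k0, x0=1.0, dt=1e-4, steps_per_interval=500, intervals=200):
    """Simulate dx/dt = a*x + b*u with Euler steps under the behavior policy
    u = -k0*x + e(t), where e(t) is a hypothetical sum-of-sines exploration signal.
    a and b are used ONLY to generate data; the learner never sees them.
    Returns one tuple per interval: (x(t+T)^2 - x(t)^2, int x^2 dt, int x*u dt)."""
    data = []
    x, t = x0, 0.0
    for _ in range(intervals):
        x_start = x
        ixx = 0.0
        ixu = 0.0
        for _ in range(steps_per_interval):
            e = 0.5 * (math.sin(1.3 * t) + math.sin(4.7 * t) + math.sin(9.1 * t))
            u = -k0 * x + e
            # Left Riemann sums, consistent with the explicit Euler step below.
            ixx += x * x * dt
            ixu += x * u * dt
            x += dt * (a * x + b * u)
            t += dt
        data.append((x * x - x_start * x_start, ixx, ixu))
    return data


def integral_policy_iteration(data, q, r, k0, iters=10):
    """Off-policy integral policy iteration for V(x) = p*x^2 and u = -k*x.
    Each interval contributes one equation, linear in (p, k_next):
        p*dx2 - 2*r*k_next*(ixu + k*ixx) = -(q + r*k^2)*ixx
    Solved by least squares via the 2x2 normal equations (Cramer's rule)."""
    k, p = k0, float("nan")
    for _ in range(iters):
        s11 = s12 = s22 = t1 = t2 = 0.0
        for dx2, ixx, ixu in data:
            c1 = dx2
            c2 = -2.0 * r * (ixu + k * ixx)
            y = -(q + r * k * k) * ixx
            s11 += c1 * c1
            s12 += c1 * c2
            s22 += c2 * c2
            t1 += c1 * y
            t2 += c2 * y
        det = s11 * s22 - s12 * s12
        p = (t1 * s22 - s12 * t2) / det  # value-function weight
        k = (s11 * t2 - s12 * t1) / det  # improved feedback gain k_next
    return p, k


if __name__ == "__main__":
    samples = collect_data(a=1.0, b=1.0, k0=2.0)
    print(integral_policy_iteration(samples, q=1.0, r=1.0, k0=2.0))
```

Here a and b appear only inside collect_data, standing in for the physical plant. With a = b = q = r = 1, the learned p and k should land close to the stabilizing Riccati solution 1 + √2 ≈ 2.414; the small residual comes from the Euler discretization. The exploration term matters: without it, the two regressor columns can become nearly collinear and the least-squares problem ill-conditioned.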

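Source URL: http://ir.ia.ac.cn/handle/173211/14620
Collection: Graduates / Doctoral Dissertations
Author affiliation: Institute of Automation, Chinese Academy of Sciences
Recommended citation (GB/T 7714):
Li Chao. Continuous-time adaptive dynamic programming and its applications in the control of complex systems[D]. Beijing: University of Chinese Academy of Sciences, 2017.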

Ingestion method: OAI harvesting

Source: Institute of Automation


Unless otherwise stated, all content in this system is protected by copyright, and all rights are reserved.