Chinese Academy of Sciences Institutional Repositories Grid: Multitask Policy Adversarial Learning for Human-Level Control With Large State Spaces

Chinese Academy of Sciences Institutional Repositories Grid

Multitask Policy Adversarial Learning for Human-Level Control With Large State Spaces

Document type: Journal article


Authors	Wang JP (王军平); You Kang Shi; Wen Sheng Zhang; Ian Thomas; Shi Hui Duan
Journal	IEEE Transactions on Industrial Informatics
Publication date	2019
Volume	15 Issue:4 Pages:2395-2404
Abstract	Sequential decision making with large-scale state spaces is an important and challenging topic in multitask reinforcement learning (MTRL). Training near-optimal policies across tasks suffers from a lack of prior knowledge in discrete-time nonlinear environments, especially under continuous task variations, and requires scalable approaches for transferring prior knowledge to new tasks when the number of tasks is large. This paper proposes a multitask policy adversarial learning (MTPAL) method for learning a nonlinear feedback policy that generalizes across multiple tasks, bringing a robot's cognitive ability closer to human-level decision making. The key idea is to construct a parametrized policy model directly from large, high-dimensional observations using deep function approximators, and then to train an optimal sequential decision policy for each new task through an adversarial process in which two models are trained simultaneously: a multitask policy generator transforms samples drawn from a prior distribution into samples from a complex, higher-dimensional data distribution, and a multitask policy discriminator decides whether a given sample comes from the empirically derived human-level prior distribution or from the generator. All related human-level, empirically derived prior knowledge is integrated into the sequential decision policy, transferring human-level policy at every layer of a deep policy network. Extensive experiments on four different WeiChai Power manufacturing data sets show that the approach can surpass human performance on tasks ranging from cart-pole to production assembly control.
Source URL	[http://ir.ia.ac.cn/handle/173211/51634]
Collection	Research Center of Precision Sensing and Control_Artificial Intelligence and Machine Learning
Corresponding author	Wang JP (王军平)
Affiliation	Institute of Automation, Chinese Academy of Sciences
Recommended citation GB/T 7714	Wang JP,You Kang Shi,Wen Sheng Zhang,et al. Multitask Policy Adversarial Learning for Human-Level Control With Large State Spaces[J]. IEEE Transactions on Industrial Informatics,2019,15(4):2395-2404.
APA	Wang JP,You Kang Shi,Wen Sheng Zhang,Ian Thomas,&Shi Hui Duan.(2019).Multitask Policy Adversarial Learning for Human-Level Control With Large State Spaces.IEEE Transactions on Industrial Informatics,15(4),2395-2404.
MLA	Wang JP,et al."Multitask Policy Adversarial Learning for Human-Level Control With Large State Spaces".IEEE Transactions on Industrial Informatics 15.4(2019):2395-2404.
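
The abstract describes an adversarial process in which a policy generator and a policy discriminator are trained at the same time. As a rough illustration only, the sketch below computes the generic generative-adversarial losses for that kind of setup. It is not the MTPAL objective, which this record does not give. The function names and the idea of passing discriminator outputs as plain probability lists are assumptions made for the example.

```python
import math
from typing import Sequence


def bce(prob: float, label: int) -> float:
    """Binary cross-entropy for one predicted probability and a 0/1 label."""
    eps = 1e-12  # clamp to avoid log(0)
    p = min(max(prob, eps), 1.0 - eps)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


def discriminator_loss(d_human: Sequence[float],
                       d_generated: Sequence[float]) -> float:
    """Hypothetical helper: generic GAN discriminator loss.

    d_human     -- discriminator outputs on human-derived samples (target 1)
    d_generated -- discriminator outputs on generator samples (target 0)
    """
    human_term = sum(bce(p, 1) for p in d_human) / len(d_human)
    generated_term = sum(bce(p, 0) for p in d_generated) / len(d_generated)
    return human_term + generated_term


def generator_loss(d_generated: Sequence[float]) -> float:
    """Hypothetical helper: non-saturating generator loss.

    The generator is rewarded when the discriminator scores its
    samples as human-derived (target 1).
    """
    return sum(bce(p, 1) for p in d_generated) / len(d_generated)


if __name__ == "__main__":
    # An undecided discriminator outputs 0.5 for every sample.
    print(discriminator_loss([0.5, 0.5], [0.5, 0.5]))
    print(generator_loss([0.5, 0.5]))
```

In a full training loop the two losses would be minimized alternately with respect to the discriminator and generator parameters; the sketch leaves the models themselves out because the record does not specify their architecture.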

Ingest method: OAI harvesting

Source: Institute of Automation
