Chinese Academy of Sciences Institutional Repositories Grid (CAS IR Grid): SADRL: Merging human experience with machine intelligence via supervised assisted deep reinforcement learning

SADRL: Merging human experience with machine intelligence via supervised assisted deep reinforcement learning

Document type: Journal article


Authors	Li, Xiaoshuang [2,7]; Wang, Xiao [6,7]; Zheng, Xinhu [5]; Jin, Junchen [1]; Huang, Yanhao [4]; Zhang, Jun Jason [3,7]; Wang, Fei-Yue [6,7]
Journal	NEUROCOMPUTING
Publication date	2022-01-07
Volume	467; pages 300-309
Keywords	Deep reinforcement learning; Behavioral cloning; Dynamic demonstration; Double DQN
ISSN	0925-2312
DOI	10.1016/j.neucom.2021.09.064
Corresponding author	Wang, Fei-Yue (feiyue.wang@ia.ac.cn)
Abstract	Deep Reinforcement Learning (DRL) has proven its capability to learn optimal policies in decision-making problems by directly interacting with environments. Meanwhile, supervised learning methods also show great capability for learning from data. However, how to combine DRL with supervised learning and leverage additional knowledge and data to assist the DRL agent remains difficult. This study proposes a novel Supervised Assisted Deep Reinforcement Learning (SADRL) framework integrating deep Q-learning from dynamic demonstrations with a behavioral cloning model (DQfDD-BC). Specifically, the proposed DQfDD-BC method leverages historical demonstrations to pre-train a behavioral cloning (BC) model and consistently updates it by learning from the dynamically updated demonstrations. A supervised expert loss function is designed to compare actions generated by the DRL model with those obtained from the BC model to provide advantageous guidance for policy improvements. Experimental results in several OpenAI Gym environments show that the proposed approach accelerates the learning process while adapting to demonstrations of different performance levels. As illustrated in an ablation study, the dynamic demonstration and expert loss mechanisms using a BC model contribute to improving the learning convergence performance compared with the baseline models. We believe that SADRL provides an elegant framework and that the proposed method can promote the integration of human experience and machine intelligence. © 2021 Elsevier B.V. All rights reserved.
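WOS keywords	LEVEL CONTROL; ROBOT; GAME; GO
Funding projects	National Key R&D Program of China [2018AAA0101500]; National Key R&D Program of China [2018AAA0101502]
WOS research area	Computer Science
Language	English
WOS accession number	WOS:000709984900012
Publisher	ELSEVIER
Funding organization	National Key R&D Program of China
Source URL	[http://ir.ia.ac.cn/handle/173211/46291]
Collection	Institute of Automation_State Key Laboratory of Management and Control for Complex Systems_Advanced Control and Automation Team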
Author affiliations	1. PCITECH, PCI Intelligent Bldg, 2 Xincen Fourth Rd, Guangzhou 510653, Peoples R China; 2. Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing 100049, Peoples R China; 3. Wuhan Univ, Sch Elect Engn & Automat, Wuhan 430072, Peoples R China; 4. China Elect Power Res Inst, State Key Lab Power Grid Safety & Energy Conserva, Beijing 100192, Peoples R China; 5. Univ Minnesota, Dept Elect & Comp Engn, Minneapolis, MN 55455 USA; 6. Qingdao Acad Intelligent Ind, Parallel Intelligence Res Ctr, Qingdao 266109, Peoples R China; 7. Chinese Acad Sci, Inst Automat, State Key Lab Management & Control Complex Syst, Beijing 100190, Peoples R China
Recommended citation (GB/T 7714)	Li, Xiaoshuang, Wang, Xiao, Zheng, Xinhu, et al. SADRL: Merging human experience with machine intelligence via supervised assisted deep reinforcement learning[J]. NEUROCOMPUTING, 2022, 467: 300-309.
APA	Li, Xiaoshuang, Wang, Xiao, Zheng, Xinhu, Jin, Junchen, Huang, Yanhao, ... & Wang, Fei-Yue. (2022). SADRL: Merging human experience with machine intelligence via supervised assisted deep reinforcement learning. NEUROCOMPUTING, 467, 300-309.
MLA	Li, Xiaoshuang, et al. "SADRL: Merging human experience with machine intelligence via supervised assisted deep reinforcement learning." NEUROCOMPUTING 467 (2022): 300-309.
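
The abstract in this record describes a supervised expert loss that compares the action chosen by the DRL model with the action suggested by the BC model. The record does not give the loss formula, so the following is only a minimal sketch under an assumption: it uses a large-margin loss of the kind used in deep Q-learning from demonstrations, which may differ from the loss defined in the paper. The names `greedy_action`, `margin_expert_loss`, and the `margin` parameter are hypothetical and chosen for illustration.

```python
from typing import Sequence


def greedy_action(q_values: Sequence[float]) -> int:
    """Return the index of the largest Q-value (the agent's greedy action)."""
    return max(range(len(q_values)), key=lambda a: q_values[a])


def margin_expert_loss(
    q_values: Sequence[float], bc_action: int, margin: float = 0.8
) -> float:
    """Large-margin loss between the agent's Q-values and a BC-suggested action.

    ASSUMPTION: this DQfD-style form is a stand-in; the paper's exact
    expert loss is not stated in this record.

    The loss is zero when the BC action's Q-value exceeds every other
    action's Q-value by at least `margin`, and positive otherwise.
    """
    if not 0 <= bc_action < len(q_values):
        raise ValueError("bc_action is out of range")
    # Add the margin to every action except the one the BC model suggests.
    augmented = [
        q + (0.0 if a == bc_action else margin) for a, q in enumerate(q_values)
    ]
    return max(augmented) - q_values[bc_action]


if __name__ == "__main__":
    q = [1.0, 2.0, 0.5]  # hypothetical Q-values for three actions
    print("agent action:", greedy_action(q))
    print("loss if BC agrees:", margin_expert_loss(q, bc_action=1))
    print("loss if BC disagrees:", margin_expert_loss(q, bc_action=0))
```

In this sketch the loss vanishes only when the agent already prefers the BC action by a clear margin, so minimizing it alongside the usual temporal-difference loss would pull the greedy policy toward the BC model's suggestions.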

Ingest method: OAI harvesting

Source: Institute of Automation
