Chinese Academy of Sciences Institutional Repositories Grid
SADRL: Merging human experience with machine intelligence via supervised assisted deep reinforcement learning

Document type: Journal article

Authors: Li, Xiaoshuang [2,7]; Wang, Xiao [6,7]; Zheng, Xinhu [5]; Jin, Junchen [1]; Huang, Yanhao [4]; Zhang, Jun Jason [3,7]; Wang, Fei-Yue [6,7]
Journal: NEUROCOMPUTING
Publication date: 2022-01-07
Volume: 467; Pages: 300-309
Keywords: Deep reinforcement learning; Behavioral cloning; Dynamic demonstration; Double DQN
ISSN: 0925-2312
DOI: 10.1016/j.neucom.2021.09.064
Corresponding author: Wang, Fei-Yue (feiyue.wang@ia.ac.cn)
Abstract: Deep Reinforcement Learning (DRL) has proven its capability to learn optimal policies in decision-making problems by interacting directly with environments. Meanwhile, supervised learning methods have shown great capability for learning from data. However, combining DRL with supervised learning so that additional knowledge and data can assist the DRL agent remains difficult. This study proposes a novel Supervised Assisted Deep Reinforcement Learning (SADRL) framework that integrates deep Q-learning from dynamic demonstrations with a behavioral cloning model (DQfDD-BC). Specifically, the proposed DQfDD-BC method uses historical demonstrations to pre-train a behavioral cloning (BC) model and continually updates it by learning from the dynamically updated demonstrations. A supervised expert loss function compares the actions generated by the DRL model with those obtained from the BC model to provide advantageous guidance for policy improvement. Experimental results in several OpenAI Gym environments show that the proposed approach accelerates learning and adapts to demonstrations of different performance levels. As illustrated in an ablation study, the dynamic demonstration and expert loss mechanisms based on a BC model improve learning convergence compared with the baseline models. We believe that SADRL provides an elegant framework and that the proposed method can promote the integration of human experience and machine intelligence. (c) 2021 Elsevier B.V. All rights reserved.
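The abstract does not give the form of the supervised expert loss, so the following is only a minimal sketch under stated assumptions. It borrows the large-margin loss commonly used in deep Q-learning from demonstrations and treats the BC model's action as the expert action; the Double DQN target follows the keyword list. All function names, the `margin` value, and the weight `lam` are hypothetical and are not taken from the paper.

```python
def expert_margin_loss(q_values, bc_actions, margin=0.8):
    """Assumed large-margin loss: penalise Q-values that rank any action
    within `margin` of (or above) the action chosen by the BC model."""
    total = 0.0
    for q_row, a_bc in zip(q_values, bc_actions):
        # Add the margin to every action except the BC model's action.
        augmented = [q + (0.0 if a == a_bc else margin)
                     for a, q in enumerate(q_row)]
        # Non-negative; zero when the BC action leads by at least `margin`.
        total += max(augmented) - q_row[a_bc]
    return total / len(q_values)


def double_dqn_targets(rewards, dones, q_next_online, q_next_target, gamma=0.99):
    """Double DQN target: the online network selects the next action,
    the target network evaluates it."""
    targets = []
    for r, done, q_on, q_tg in zip(rewards, dones, q_next_online, q_next_target):
        best = max(range(len(q_on)), key=q_on.__getitem__)
        targets.append(r + (0.0 if done else gamma * q_tg[best]))
    return targets


def combined_loss(q_values, actions, targets, bc_actions, lam=1.0, margin=0.8):
    """Mean squared TD error plus the weighted expert loss (weighting is assumed)."""
    td = sum((q_row[a] - y) ** 2
             for q_row, a, y in zip(q_values, actions, targets)) / len(q_values)
    return td + lam * expert_margin_loss(q_values, bc_actions, margin)
```

In this sketch the expert term vanishes once the Q-network prefers the BC model's action by at least the margin, so the TD term alone drives learning from then on; how the paper balances the two terms is not stated in the abstract.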
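WOS keywords: LEVEL CONTROL; ROBOT; GAME; GO
Funding projects: National Key R&D Program of China [2018AAA0101500]; National Key R&D Program of China [2018AAA0101502]
WOS research area: Computer Science
Language: English
WOS accession number: WOS:000709984900012
Publisher: ELSEVIER
Funding agency: National Key R&D Program of China
Source URL: http://ir.ia.ac.cn/handle/173211/46291
Collection: Institute of Automation_State Key Laboratory of Management and Control for Complex Systems_Advanced Control and Automation Team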
Author affiliations:
1. PCITECH, PCI Intelligent Bldg, 2 Xincen Fourth Rd, Guangzhou 510653, Peoples R China
2. Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing 100049, Peoples R China
3. Wuhan Univ, Sch Elect Engn & Automat, Wuhan 430072, Peoples R China
4. China Elect Power Res Inst, State Key Lab Power Grid Safety & Energy Conserva, Beijing 100192, Peoples R China
5. Univ Minnesota, Dept Elect & Comp Engn, Minneapolis, MN 55455 USA
6. Qingdao Acad Intelligent Ind, Parallel Intelligence Res Ctr, Qingdao 266109, Peoples R China
7. Chinese Acad Sci, Inst Automat, State Key Lab Management & Control Complex Syst, Beijing 100190, Peoples R China
Recommended citation formats
GB/T 7714: Li, Xiaoshuang, Wang, Xiao, Zheng, Xinhu, et al. SADRL: Merging human experience with machine intelligence via supervised assisted deep reinforcement learning[J]. NEUROCOMPUTING, 2022, 467: 300-309.
APA: Li, Xiaoshuang., Wang, Xiao., Zheng, Xinhu., Jin, Junchen., Huang, Yanhao., ... & Wang, Fei-Yue. (2022). SADRL: Merging human experience with machine intelligence via supervised assisted deep reinforcement learning. NEUROCOMPUTING, 467, 300-309.
MLA: Li, Xiaoshuang, et al. "SADRL: Merging human experience with machine intelligence via supervised assisted deep reinforcement learning". NEUROCOMPUTING 467 (2022): 300-309.

Ingest method: OAI harvesting

Source: Institute of Automation

