SADRL: Merging human experience with machine intelligence via supervised assisted deep reinforcement learning
Document type: Journal article
Authors | Li, Xiaoshuang (2,7) |
Journal | NEUROCOMPUTING |
Publication date | 2022-01-07 |
Volume | 467 ; Pages: 300-309 |
Keywords | Deep reinforcement learning ; Behavioral cloning ; Dynamic demonstration ; Double DQN |
ISSN | 0925-2312 |
DOI | 10.1016/j.neucom.2021.09.064 |
Corresponding author | Wang, Fei-Yue(feiyue.wang@ia.ac.cn) |
Abstract | Deep Reinforcement Learning (DRL) has proven its capability to learn optimal policies in decision-making problems by directly interacting with environments. Meanwhile, supervised learning methods also show great capability of learning from data. However, how to combine DRL with supervised learning and leverage additional knowledge and data to assist the DRL agent remains difficult. This study proposes a novel Supervised Assisted Deep Reinforcement Learning (SADRL) framework integrating deep Q-learning from dynamic demonstrations with a behavioral cloning model (DQfDD-BC). Specifically, the proposed DQfDD-BC method leverages historical demonstrations to pre-train a behavioral cloning (BC) model and continually updates it by learning from the dynamically updated demonstrations. A supervised expert loss function is designed to compare actions generated by the DRL model with those obtained from the BC model to provide advantageous guidance for policy improvements. Experimental results in several OpenAI Gym environments show that the proposed approach accelerates learning while adapting to demonstrations of different performance levels. As illustrated in an ablation study, the dynamic demonstration and expert loss mechanisms using a BC model both contribute to improved learning convergence compared with the baseline models. We believe that SADRL provides an elegant framework and the proposed method can promote the integration of human experience and machine intelligence. (c) 2021 Elsevier B.V. All rights reserved. |
WOS keywords | LEVEL CONTROL ; ROBOT ; GAME ; GO |
Funding projects | National Key R&D Program of China[2018AAA0101500] ; National Key R&D Program of China[2018AAA0101502] |
WOS research area | Computer Science |
Language | English |
WOS accession number | WOS:000709984900012 |
Publisher | ELSEVIER |
Funding organization | National Key R&D Program of China |
Source URL | [http://ir.ia.ac.cn/handle/173211/46291] |
Collection | Institute of Automation_State Key Laboratory of Management and Control for Complex Systems_Advanced Control and Automation Team |
Corresponding author | Wang, Fei-Yue |
Author affiliations | 1. PCITECH, PCI Intelligent Bldg, 2 Xincen Fourth Rd, Guangzhou 510653, Peoples R China ; 2. Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing 100049, Peoples R China ; 3. Wuhan Univ, Sch Elect Engn & Automat, Wuhan 430072, Peoples R China ; 4. China Elect Power Res Inst, State Key Lab Power Grid Safety & Energy Conserva, Beijing 100192, Peoples R China ; 5. Univ Minnesota, Dept Elect & Comp Engn, Minneapolis, MN 55455 USA ; 6. Qingdao Acad Intelligent Ind, Parallel Intelligence Res Ctr, Qingdao 266109, Peoples R China ; 7. Chinese Acad Sci, Inst Automat, State Key Lab Management & Control Complex Syst, Beijing 100190, Peoples R China |
Recommended citation (GB/T 7714) | Li, Xiaoshuang,Wang, Xiao,Zheng, Xinhu,et al. SADRL: Merging human experience with machine intelligence via supervised assisted deep reinforcement learning[J]. NEUROCOMPUTING,2022,467:300-309. |
APA | Li, Xiaoshuang.,Wang, Xiao.,Zheng, Xinhu.,Jin, Junchen.,Huang, Yanhao.,...&Wang, Fei-Yue.(2022).SADRL: Merging human experience with machine intelligence via supervised assisted deep reinforcement learning.NEUROCOMPUTING,467,300-309. |
MLA | Li, Xiaoshuang,et al."SADRL: Merging human experience with machine intelligence via supervised assisted deep reinforcement learning".NEUROCOMPUTING 467(2022):300-309. |
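
The abstract in this record describes a supervised expert loss that compares the actions preferred by the DRL model with those proposed by the BC model, but it does not give the formula. As a rough illustration only, the sketch below uses a large-margin loss of the kind found in the deep Q-learning from demonstrations literature as a stand-in; the function name `expert_margin_loss`, the `margin` value, and the toy Q-values are all assumptions and are not taken from the paper.

```python
import numpy as np


def expert_margin_loss(q_values, bc_actions, margin=0.8):
    """Large-margin loss between Q-network preferences and BC actions.

    q_values:   (batch, n_actions) array of Q(s, a) from the DRL model.
    bc_actions: (batch,) integer array of actions proposed by the BC model.
    margin:     hypothetical penalty added to every non-BC action.
    """
    q_values = np.asarray(q_values, dtype=float)
    bc_actions = np.asarray(bc_actions, dtype=int)
    batch = np.arange(q_values.shape[0])

    # The margin is `margin` for every action except the BC action, where it is 0.
    margins = np.full_like(q_values, margin)
    margins[batch, bc_actions] = 0.0

    # max_a [Q(s, a) + margin(a_bc, a)] - Q(s, a_bc); this is never negative.
    augmented_max = (q_values + margins).max(axis=1)
    q_bc = q_values[batch, bc_actions]
    return float(np.mean(augmented_max - q_bc))


# Toy Q-values for two states and three actions (illustrative numbers only).
q = np.array([[1.0, 2.0, 0.5],
              [3.0, 0.0, 0.0]])
agree = expert_margin_loss(q, bc_actions=[1, 0])     # greedy actions match BC
disagree = expert_margin_loss(q, bc_actions=[0, 1])  # greedy actions differ
```

With these toy values the loss vanishes when the greedy action clears every other action by at least the margin and coincides with the BC action, and it grows when the two disagree. In a full agent a term like this would typically be added to the temporal-difference loss with a weighting coefficient, which is again an assumption rather than something the abstract specifies.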
Ingest method: OAI harvesting
Source: Institute of Automation