Chinese Academy of Sciences Institutional Repositories Grid

Explicitly Learning Policy Under Partial Observability in Multiagent Reinforcement Learning

Document type: Conference paper


Authors	Yang, Chen1,2 ; Yang, Guangkai1,2 ; Chen, Hao2 ; Zhang, Junge1,2
Publication date	2023-06
Conference date	2023-6
Conference location	Queensland, Australia
DOI	10.1109/IJCNN54540.2023.10191476
Abstract	We explore explicit solutions for multiagent reinforcement learning (MARL) under the constraint of partial observability. Within the general framework of centralized training with decentralized execution (CTDE), existing methods implicitly alleviate partial observability by introducing global information during centralized training. However, such implicit solutions cannot adequately address partial observability and show low sample efficiency in many MARL problems. In this paper, we focus on the influence of partial observability on the policy of agents, and formally derive an ideal form of policy that maximizes the MARL objective under partial observability. Furthermore, we develop a new method named Explicitly Learning Policy (ELP), which adopts a novel teacher-student structure and utilizes knowledge distillation to explicitly learn an individual policy under partial observability for each agent. Compared to prior methods, ELP presents a more general and interpretable training process, and the procedure of ELP can be easily extended to existing methods to boost performance. Our empirical experiments on the StarCraft II micromanagement benchmark show that ELP significantly outperforms prevailing state-of-the-art baselines, which demonstrates the advantage of ELP in addressing partial observability and improving sample efficiency.
Proceedings publisher	IEEE
Proceedings place of publication	IEEE
Language	English
URL identifier	View original
Source URL	[http://ir.ia.ac.cn/handle/173211/56653]
Collection	Intelligent Systems and Engineering
Author affiliations	1.Institute of Automation, Chinese Academy of Sciences 2.School of Artificial Intelligence, University of Chinese Academy of Sciences
Recommended citation (GB/T 7714)	Yang, Chen,Yang, Guangkai,Chen, Hao,et al. Explicitly Learning Policy Under Partial Observability in Multiagent Reinforcement Learning[C]. In:. Queensland, Australia. 2023-6.

Ingestion method: OAI harvesting

Source: Institute of Automation
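
The abstract names a teacher-student structure trained with knowledge distillation, but this record gives no implementation details. The sketch below is therefore only a generic illustration of a policy distillation loss, not the ELP procedure itself. It assumes (hypothetically) that a teacher policy and a student policy each output logits over the same discrete action set, and that the student is trained to match the teacher by minimizing KL(teacher || student). The function names and the example logits are invented for illustration.

```python
import math


def softmax(logits):
    """Convert raw action logits into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def distillation_loss(teacher_logits, student_logits):
    """Generic policy distillation loss: KL(teacher || student).

    Assumption: both policies score the same discrete actions. How the
    teacher and student are actually built in ELP is not stated here.
    """
    return kl_divergence(softmax(teacher_logits), softmax(student_logits))


# Hypothetical logits for one agent with three available actions.
teacher_logits = [2.0, 0.5, -1.0]
student_logits = [0.1, 0.2, 0.0]

loss_before = distillation_loss(teacher_logits, student_logits)
loss_matched = distillation_loss(teacher_logits, teacher_logits)
```

In this toy setup the loss is strictly positive while the student disagrees with the teacher and falls to zero once the two action distributions coincide, which is the property a distillation objective relies on.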

Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.