Chinese Academy of Sciences Institutional Repositories Grid: Exploration via Joint Policy Diversity for Sparse-Reward Multi-Agent Tasks



Document type: Conference paper


Authors	Pei Xu (1, 2); Junge Zhang (2); Kaiqi Huang (1, 2, 3)
Publication date	2023-08
Conference date	2023-08
Conference location	Macao, China
Abstract	Exploration under sparse rewards is a key challenge in multi-agent reinforcement learning (MARL). Previous work argues that the complex dynamics between agents and the huge exploration space in MARL scenarios amplify the vulnerability of classical count-based exploration methods when they are combined with agents parameterized by neural networks, resulting in inefficient exploration. In this paper, we show that introducing constrained joint policy diversity into a classical count-based method can significantly improve exploration when agents are parameterized by neural networks. Specifically, we propose a joint policy diversity that measures the difference between the current joint policy and previous joint policies, and then use a filtering-based exploration constraint to further refine this joint policy diversity. Under the sparse-reward setting, we show that the proposed method significantly outperforms state-of-the-art methods in the multi-agent particle environment (MPE), Google Research Football, and StarCraft II micromanagement tasks. To the best of our knowledge, on the hard 3s_vs_5z task, which requires non-trivial strategies to defeat the enemies, our method is the first to learn winning strategies without domain knowledge under the sparse-reward setting.
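The abstract names two ingredients, a classical count-based exploration bonus and a joint policy diversity term refined by a filtering-based exploration constraint, but gives no formulas. The following is a hypothetical sketch of how such ingredients could be combined into an intrinsic reward; it is not the authors' method. Assumptions: the count term uses the common form beta / sqrt(N(s)); diversity is the mean KL divergence between the current joint policy and an archive of previous joint policies at the same state; agents act independently, so the joint KL is the sum of per-agent KLs; and the "filter" is a stand-in rule that drops the diversity term once a state has been visited more than max_count times. The names DiversityCountBonus, beta, lam and max_count are all illustrative.

```python
import math
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence

# One agent's policy at a state: a probability vector over its actions.
AgentDist = Sequence[float]
# A joint policy at a state, ASSUMED factorized: one distribution per agent.
JointDist = Sequence[AgentDist]


def kl(p: AgentDist, q: AgentDist, eps: float = 1e-8) -> float:
    """KL(p || q) for two discrete distributions; zero-probability terms in p are skipped."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def joint_kl(p: JointDist, q: JointDist) -> float:
    """KL between two factorized joint policies equals the sum of per-agent KLs."""
    return sum(kl(pa, qa) for pa, qa in zip(p, q))


class DiversityCountBonus:
    """Hypothetical intrinsic reward: count-based bonus plus a filtered diversity term."""

    def __init__(self, beta: float = 1.0, lam: float = 0.1, max_count: int = 10):
        self.beta = beta            # scale of the count-based bonus (assumed)
        self.lam = lam              # weight of the diversity term (assumed)
        self.max_count = max_count  # visit-count threshold for the filter (assumed)
        self.counts: Dict[Hashable, int] = defaultdict(int)

    def bonus(self, state: Hashable, current: JointDist,
              previous: List[JointDist]) -> float:
        # Record the visit, then compute the classical 1/sqrt(N) count bonus.
        self.counts[state] += 1
        n = self.counts[state]
        count_term = self.beta / math.sqrt(n)
        # Diversity: mean divergence of the current joint policy from archived ones.
        if previous:
            diversity = sum(joint_kl(current, old) for old in previous) / len(previous)
        else:
            diversity = 0.0
        # Stand-in filter: only reward diversity in rarely visited states.
        if n > self.max_count:
            diversity = 0.0
        return count_term + self.lam * diversity


if __name__ == "__main__":
    b = DiversityCountBonus(beta=1.0, lam=0.5, max_count=2)
    current = [[0.9, 0.1], [0.5, 0.5]]  # two agents, two actions each
    archived = [[0.5, 0.5], [0.5, 0.5]]
    for _ in range(3):
        print(b.bonus("s0", current, [archived]))
```

With these settings, repeated calls on the same state shrink the count term as 1/sqrt(n), and from the third visit on (max_count=2) the diversity term is filtered out, leaving only the count bonus. Summing per-agent KLs is exact only when both joint policies factorize across agents, which is an assumption of this sketch rather than something the abstract states.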
Proceedings publisher	International Joint Conference on Artificial Intelligence (IJCAI)
Source URL	http://ir.ia.ac.cn/handle/173211/52051
Collection	Intelligent Systems and Engineering
Affiliations	1. School of Artificial Intelligence, University of Chinese Academy of Sciences; 2. CRISE, Institute of Automation, Chinese Academy of Sciences; 3. CAS Center for Excellence in Brain Science and Intelligence Technology
Recommended citation (GB/T 7714)	Pei Xu, Junge Zhang, Kaiqi Huang. Exploration via Joint Policy Diversity for Sparse-Reward Multi-Agent Tasks[C]. Macao, China. 2023-08.

Ingestion method: OAI harvesting

Source: Institute of Automation


Unless otherwise noted, all content in this system is protected by copyright, and all rights are reserved.