Exploration via Joint Policy Diversity for Sparse-Reward Multi-Agent Tasks
Document type: Conference paper
Authors | Pei Xu1,2; Junge Zhang2; Kaiqi Huang1,2,3 |
Publication date | 2023-08 |
Conference date | 2023-08 |
Conference location | Macao, China |
Abstract (English) | Exploration under sparse rewards is a key challenge for multi-agent reinforcement learning problems. Previous works argue that complex dynamics between agents and the huge exploration space in MARL scenarios amplify the vulnerability of classical count-based exploration methods when combined with agents parameterized by neural networks, resulting in inefficient exploration. In this paper, we show that introducing constrained joint policy diversity into a classical count-based method can significantly improve exploration when agents are parameterized by neural networks. Specifically, we propose a joint policy diversity to measure the difference between current joint policy and previous joint policies, and then use a filtering-based exploration constraint to further refine this joint policy diversity. Under the sparse-reward setting, we show that the proposed method significantly outperforms the state-of-the-art methods in the multiple-particle environment, the Google Research Football, and StarCraft II micromanagement tasks. To the best of our knowledge, on the hard 3s_vs_5z task which needs non-trivial strategies to defeat enemies, our method is the first to learn winning strategies without domain knowledge under the sparse-reward setting. |
Proceedings publisher | International Joint Conference on Artificial Intelligence |
Source URL | [http://ir.ia.ac.cn/handle/173211/52051] |
Collection | Intelligent Systems and Engineering |
Author affiliations | 1. School of Artificial Intelligence, University of Chinese Academy of Sciences; 2. CRISE, Institute of Automation, Chinese Academy of Sciences; 3. CAS Center for Excellence in Brain Science and Intelligence Technology |
Recommended citation (GB/T 7714) | Pei Xu, Junge Zhang, Kaiqi Huang. Exploration via Joint Policy Diversity for Sparse-Reward Multi-Agent Tasks[C]. In:. Macao, China. 2023-8. |
Ingest method: OAI harvesting
Source: Institute of Automation