MMD-MIX: Value Function Factorisation with Maximum Mean Discrepancy for Cooperative Multi-Agent Reinforcement Learning
Document Type | Conference Paper
Authors | Zhiwei Xu 1,2
Publication Date | 2021-09
Conference Dates | 18-22 July 2021
Conference Location | Shenzhen, China
Abstract | In the real world, many tasks require multiple agents to cooperate with one another under local observations. To solve such problems, many multi-agent reinforcement learning methods based on Centralized Training with Decentralized Execution (CTDE) have been proposed. One representative line of work is value decomposition, which factorises the global joint Q-value Q_jt into individual Q-values Q_a to guide each agent's behavior, e.g. VDN (Value-Decomposition Networks) and QMIX. However, these baselines often ignore the randomness inherent in such settings. We propose MMD-MIX, a method that combines distributional reinforcement learning with value decomposition to alleviate this weakness. In addition, to improve data sampling efficiency, we draw on REM (Random Ensemble Mixture), a robust RL algorithm, to explicitly introduce randomness into MMD-MIX. Experiments demonstrate that MMD-MIX outperforms prior baselines in the StarCraft Multi-Agent Challenge (SMAC) environment.
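To make the abstract's two ingredients concrete, here is a minimal sketch, not the authors' implementation (whose mixing function is learned rather than a plain sum): a VDN-style additive factorisation of per-agent values into Q_jt, and a biased empirical estimate of the squared Maximum Mean Discrepancy, MMD²(P, Q) = E[k(x, x')] + E[k(y, y')] − 2 E[k(x, y)], with a Gaussian kernel. The kernel bandwidth, tensor shapes, and all names (`gaussian_kernel`, `mmd_squared`, `vdn_mixer`) are illustrative assumptions.

```python
# Illustrative sketch only; not the MMD-MIX reference implementation.
import torch


def gaussian_kernel(x: torch.Tensor, y: torch.Tensor,
                    bandwidth: float = 1.0) -> torch.Tensor:
    """Pairwise Gaussian kernel k(x, y) = exp(-||x - y||^2 / (2 * bw^2))."""
    diff = x.unsqueeze(1) - y.unsqueeze(0)   # (n, m, d)
    sq_dist = diff.pow(2).sum(-1)            # (n, m)
    return torch.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd_squared(samples_p: torch.Tensor, samples_q: torch.Tensor) -> torch.Tensor:
    """Biased empirical estimate of MMD^2 between two sample sets."""
    k_pp = gaussian_kernel(samples_p, samples_p).mean()
    k_qq = gaussian_kernel(samples_q, samples_q).mean()
    k_pq = gaussian_kernel(samples_p, samples_q).mean()
    return k_pp + k_qq - 2.0 * k_pq


def vdn_mixer(agent_qs: torch.Tensor) -> torch.Tensor:
    """VDN-style factorisation: Q_jt is the sum of the individual Q_a."""
    return agent_qs.sum(dim=-1)


if __name__ == "__main__":
    # Toy usage: batch of 32 transitions, 4 agents, synthetic values.
    agent_qs = torch.randn(32, 4)
    q_jt = vdn_mixer(agent_qs)                        # (32,)
    target_samples = torch.randn(32, 1) + 0.5         # synthetic targets
    loss = mmd_squared(q_jt.unsqueeze(-1), target_samples)
    print(q_jt.shape, float(loss))
```

In a full distributional method, the synthetic tensors above would be replaced by predicted and target return samples, with MMD² minimised as the TD-style loss.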
Language | English
Source URL | http://ir.ia.ac.cn/handle/173211/56518
Collection | Integrated Innovation Center_Decision Command and System Intelligence
Corresponding Author | Guoliang Fan
Affiliations | 1. School of Artificial Intelligence, University of Chinese Academy of Sciences; 2. Institute of Automation, Chinese Academy of Sciences
Recommended Citation (GB/T 7714) | Zhiwei Xu, Dapeng Li, Yunpeng Bai, et al. MMD-MIX: Value Function Factorisation with Maximum Mean Discrepancy for Cooperative Multi-Agent Reinforcement Learning[C]. Shenzhen, China, 18-22 July 2021.
Deposit Method: OAI harvesting
Source: Institute of Automation