SIDE: State Inference for Partially Observable Cooperative Multi-Agent Reinforcement Learning
Document Type | Conference Paper
Authors | Zhiwei Xu (1,2)
Publication Date | 2022
Conference Date | May 9-13, 2022
Conference Venue | Auckland, New Zealand
DOI | 10.5555/3535850.3536006 |
Pages | 1400-1408
Abstract | As one of the solutions to decentralized partially observable Markov decision process (Dec-POMDP) problems, value decomposition methods have recently achieved significant results. However, most value decomposition methods require the fully observable state of the environment during training, which is infeasible in scenarios where only incomplete and noisy observations are available. We therefore propose a novel value decomposition framework, named State Inference for value DEcomposition (SIDE), which eliminates the need for the global state by simultaneously solving the two problems of optimal control and state inference. SIDE can be extended to any value decomposition method to tackle partially observable problems. By comparing the performance of different algorithms on StarCraft II micromanagement tasks, we verify that, even without access to the true state, SIDE can infer from past local observations a current state that benefits the reinforcement learning process, and can even outperform many baselines in some complex scenarios.
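The mechanism the abstract describes (replacing the true global state that value decomposition methods consume during training with a state inferred from the agents' past local observations) can be illustrated with a short sketch. The following PyTorch code is a minimal, hypothetical illustration, not the authors' implementation: the GRU history encoder, the latent size, and the QMIX-style monotonic mixer are all assumptions chosen for clarity.

```python
import torch
import torch.nn as nn


class StateInference(nn.Module):
    """Encode each agent's observation history and infer a latent 'global state'.

    Hypothetical module: SIDE's actual inference network may differ.
    """

    def __init__(self, obs_dim: int, n_agents: int, latent_dim: int, hidden: int = 64):
        super().__init__()
        self.hidden = hidden
        self.gru = nn.GRU(obs_dim, hidden, batch_first=True)       # per-agent history encoder
        self.to_latent = nn.Linear(n_agents * hidden, latent_dim)  # aggregate across agents

    def forward(self, obs_seq: torch.Tensor) -> torch.Tensor:
        # obs_seq: (batch, n_agents, time, obs_dim)
        b, n, t, d = obs_seq.shape
        _, h = self.gru(obs_seq.reshape(b * n, t, d))              # h: (1, batch*n_agents, hidden)
        return self.to_latent(h.squeeze(0).reshape(b, n * self.hidden))


class Mixer(nn.Module):
    """QMIX-style monotonic mixer, conditioned on the inferred state s_hat
    instead of the true global state."""

    def __init__(self, n_agents: int, latent_dim: int, embed: int = 32):
        super().__init__()
        self.n_agents, self.embed = n_agents, embed
        self.w1 = nn.Linear(latent_dim, n_agents * embed)
        self.b1 = nn.Linear(latent_dim, embed)
        self.w2 = nn.Linear(latent_dim, embed)
        self.b2 = nn.Linear(latent_dim, 1)

    def forward(self, q_agents: torch.Tensor, s_hat: torch.Tensor) -> torch.Tensor:
        # q_agents: (batch, n_agents); s_hat: (batch, latent_dim)
        w1 = torch.abs(self.w1(s_hat)).view(-1, self.n_agents, self.embed)  # abs() keeps Q_tot monotonic in each Q_i
        hid = torch.relu(torch.bmm(q_agents.unsqueeze(1), w1) + self.b1(s_hat).unsqueeze(1))
        w2 = torch.abs(self.w2(s_hat)).view(-1, self.embed, 1)
        return (torch.bmm(hid, w2) + self.b2(s_hat).unsqueeze(1)).squeeze(-1)  # (batch, 1)


# Toy usage: 4 episodes, 3 agents, 10-step histories of 16-dim observations.
obs = torch.randn(4, 3, 10, 16)
q_agents = torch.randn(4, 3)                       # per-agent utilities from the agent networks
s_hat = StateInference(16, 3, latent_dim=24)(obs)  # inferred state replaces the true one
q_tot = Mixer(3, latent_dim=24)(q_agents, s_hat)   # joint value, shape (4, 1)
```

Per the abstract, SIDE solves optimal control and state inference simultaneously; in this simplified sketch the two modules would simply be trained end to end on the usual TD loss, with the inferred `s_hat` standing in wherever the mixer would otherwise receive the true global state.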
Language | English
Source URL | http://ir.ia.ac.cn/handle/173211/56522
Collection | Integrated Innovation Center_Decision-Making Command and System Intelligence
Corresponding Author | Guoliang Fan
Affiliations | 1. Institute of Automation, Chinese Academy of Sciences; 2. School of Artificial Intelligence, University of Chinese Academy of Sciences
Recommended Citation (GB/T 7714) | Zhiwei Xu, Yunpeng Bai, Dapeng Li, et al. SIDE: State Inference for Partially Observable Cooperative Multi-Agent Reinforcement Learning[C]//Proceedings of the 21st International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2022). Auckland, New Zealand, May 9-13, 2022: 1400-1408.
Deposit Method | OAI Harvesting
Source | Institute of Automation