Learning Superior Cooperative Policy in Competitive Multi-team Reinforcement Learning
Document Type: Conference Paper
Author | Qingxu Fu1,2
Publication Date | 2023-06
Conference Date | 2023-06
Conference Location | Gold Coast, Australia
Abstract | Multi-agent Reinforcement Learning (MARL) has become a powerful tool for addressing multi-agent challenges. Existing studies have explored numerous models that use MARL to solve single-team cooperation (or competition) problems and adversarial problems against opponents controlled by static knowledge-based policies. However, most studies in the literature ignore adversarial multi-team problems involving dynamically evolving opponents. We investigate adversarial multi-team problems in which all participating teams use MARL learners to learn policies against each other. This study achieves two objectives. First, we design an adversarial team-versus-team learning framework that generates cooperative multi-agent policies to compete against opponents, without preprogrammed opponents or partners and without any supervision. Second, we explore the key factors for achieving win-rate superiority during dynamic competitions. We then put forward a novel FeedBack MARL (FBMARL) algorithm that takes advantage of feedback loops to adjust optimizer hyper-parameters based on real-time game statistics. Finally, the effectiveness of our FBMARL model is tested in a benchmark environment named Multi-Team Decentralized Collective Assault (MT-DCA). The results demonstrate that our feedback MARL model achieves superior performance over baseline competitor MARL learners in 2-team and 3-team dynamic competitions.
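The abstract's central mechanism is a feedback loop that tunes optimizer hyper-parameters from real-time game statistics, but this record does not specify which hyper-parameters or what feedback law FBMARL uses. The Python sketch below is therefore only an illustrative assumption of that idea: it scales a PyTorch optimizer's learning rate from a running win-rate estimate. The class name `WinRateFeedback` and the specific feedback rule are hypothetical, not taken from the paper.

```python
import torch

# Hedged sketch of a feedback loop in the spirit of FBMARL: the abstract says
# optimizer hyper-parameters are adjusted from real-time game statistics; the
# exact rule is not given here, so the one below is an assumed placeholder.

class WinRateFeedback:
    """Scale an optimizer's learning rate from a running win-rate estimate."""

    def __init__(self, optimizer, base_lr=3e-4, smoothing=0.9):
        self.optimizer = optimizer
        self.base_lr = base_lr
        self.smoothing = smoothing
        self.win_rate = 0.5  # neutral prior before any games are observed

    def update(self, won_last_game: bool):
        # Exponential moving average of the team's win rate (a real-time
        # game statistic in the abstract's terms).
        self.win_rate = (self.smoothing * self.win_rate
                         + (1.0 - self.smoothing) * float(won_last_game))
        # Assumed feedback rule: learn faster while losing, slower while
        # winning, so a trailing team can catch up in a dynamic competition.
        scale = 1.0 + (0.5 - self.win_rate)
        for group in self.optimizer.param_groups:
            group["lr"] = self.base_lr * scale


policy = torch.nn.Linear(8, 4)        # stand-in for one team's policy network
opt = torch.optim.Adam(policy.parameters(), lr=3e-4)
feedback = WinRateFeedback(opt)
feedback.update(won_last_game=False)  # feed each game's outcome back in
print(opt.param_groups[0]["lr"])      # lr rises above base while losing
```

In a multi-team setting, each MARL learner would hold its own `WinRateFeedback` instance and call `update` after every game, so the optimizer hyper-parameters of all competing teams evolve with the match statistics rather than staying fixed.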
Source URL | http://ir.ia.ac.cn/handle/173211/57225
Collection | Research Center for Integrated Information Systems_Aircraft Intelligence Technology
Author Affiliations | 1. Institute of Automation, Chinese Academy of Sciences; 2. University of Chinese Academy of Sciences
Recommended Citation (GB/T 7714) | Qingxu Fu, Tenghai Qiu, Zhiqiang Pu, et al. Learning Superior Cooperative Policy in Competitive Multi-team Reinforcement Learning[C]. Gold Coast, Australia, 2023-06.
Deposit Method: OAI Harvesting
Source: Institute of Automation