Chinese Academy of Sciences Institutional Repositories Grid
Learning in Bi-Level Markov Games

Document type: Conference paper

Authors: Meng Linghui1,2; Ruan Jingqing1,2; Xing Dengpeng1; Xu Bo1,2
Publication date: 2022-07
Conference dates: 2022.7.18-2022.7.23
Conference venue: Padua, Italy
Abstract

Although multi-agent reinforcement learning (MARL) has demonstrated remarkable progress in tackling sophisticated cooperative tasks, the assumption that agents take simultaneous actions still limits the applicability of MARL to many real-world problems. In this work, we relax this assumption by proposing the framework of the bi-level Markov game (BMG). BMG breaks simultaneity by assigning the two players a leader-follower relationship, in which the leader considers the policy of the follower, who takes the best response to the leader's actions. We propose two provably convergent algorithms to solve BMG: BMG-1 and BMG-2. The former uses standard Q-learning, while the latter avoids solving the local Stackelberg equilibrium required in BMG-1 by using a further two-step transition to estimate the state value. For both methods, we consider temporal difference learning techniques with both tabular and neural network representations. To verify the effectiveness of our BMG framework, we test on a series of games that existing MARL solvers find challenging: Seeker, Cooperative Navigation, and Football. Experimental results show that our BMG methods achieve competitive advantages in terms of better performance and lower variance.
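The bi-level structure the abstract describes, a leader whose value estimate anticipates the follower's best response, with both levels trained by standard Q-learning as in BMG-1, can be sketched on a toy single-state cooperative game. The payoff matrix, hyperparameters, and epsilon-greedy exploration below are illustrative assumptions for this sketch, not the authors' implementation or environments.

```python
import numpy as np

# Hypothetical 2x2 cooperative game: both players receive the same payoff.
# Payoffs are chosen so that the coordinated joint action (1, 1) is optimal.
PAYOFF = np.array([[1.0, 0.0],
                   [0.0, 2.0]])  # rows: leader action, cols: follower action

rng = np.random.default_rng(0)
alpha, eps, episodes = 0.1, 0.2, 5000

q_leader = np.zeros(2)          # leader's value of its own actions
q_follower = np.zeros((2, 2))   # follower's value, conditioned on leader action

for _ in range(episodes):
    # Leader acts first (epsilon-greedy over its anticipatory Q-values).
    a_l = int(rng.integers(2)) if rng.random() < eps else int(np.argmax(q_leader))
    # Follower observes the leader's action and best-responds (epsilon-greedy).
    a_f = int(rng.integers(2)) if rng.random() < eps else int(np.argmax(q_follower[a_l]))
    r = PAYOFF[a_l, a_f]  # shared reward in this cooperative toy game
    # Standard Q-learning updates; single state, so there is no bootstrap term.
    q_follower[a_l, a_f] += alpha * (r - q_follower[a_l, a_f])
    # Bi-level step: the leader evaluates its action under the follower's
    # current best response, rather than the joint action actually taken.
    best_response_value = np.max(q_follower[a_l])
    q_leader[a_l] += alpha * (best_response_value - q_leader[a_l])

print(int(np.argmax(q_leader)), int(np.argmax(q_follower[int(np.argmax(q_leader))])))
```

The key difference from independent Q-learning is the leader's update target: it backs up the follower's best-response value, so the leader's policy accounts for how the follower will react to each of its actions.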

Source URL: [http://ir.ia.ac.cn/handle/173211/57335]
Collection: Research Center for Digital Content Technology and Services_Auditory Models and Cognitive Computing
Affiliations:
1.Institute of Automation, Chinese Academy of Sciences
2.School of Artificial Intelligence, University of Chinese Academy of Sciences
Recommended citation
GB/T 7714
Meng Linghui, Ruan Jingqing, Xing Dengpeng, et al. Learning in Bi-Level Markov Games[C]. Padua, Italy, 2022.7.18-2022.7.23.

Ingestion method: OAI harvesting

Source: Institute of Automation


Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.