Chinese Academy of Sciences Institutional Repositories Grid
Empirical Policy Optimization for n-Player Markov Games

Document type: Journal article

Authors: Yuanheng Zhu; Weifan Li; Mengchen Zhao; Jianye Hao; Dongbin Zhao
Journal: IEEE Transactions on Cybernetics
Publication date: 2022
DOI: 10.1109/TCYB.2022.3179775
Abstract

In single-agent Markov decision processes, an agent can optimize its policy through interaction with the environment. In multiplayer Markov games (MGs), however, the interaction is nonstationary because of the behavior of the other players, so an agent has no fixed optimization objective. The challenge instead becomes finding equilibrium policies for all players. In this research, we treat the evolution of player policies as a dynamical process and propose a novel learning scheme for Nash equilibrium. The core idea is to evolve each player's policy according not just to its current in-game performance but to an aggregate of its performance over history. We show that, for a variety of MGs, players following our learning scheme provably converge to a point that approximates a Nash equilibrium. Combining this scheme with neural networks, we develop an empirical policy optimization algorithm, which is implemented in a reinforcement-learning framework and runs in a distributed manner, with each player optimizing its policy based on its own observations. We use two numerical examples to validate the convergence property on small-scale MGs, and a Pong example to show the method's potential on large games.
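The core idea, updating a policy from an aggregate of past performance rather than from the latest payoff alone, can be illustrated on a two-player zero-sum matrix game. The sketch below is NOT the authors' empirical policy optimization algorithm, which uses neural networks and reinforcement learning in n-player Markov games. It swaps in the standard exponential-weights (Hedge) rule applied to cumulative payoffs. The names `historical_hedge` and `exploitability`, the step size `eta`, the iteration count, and the initial bias are all hypothetical choices for illustration. Closeness to Nash equilibrium is measured by exploitability, which is zero exactly at an equilibrium.

```python
import numpy as np


def softmax(z):
    # Numerically stable softmax over a 1-D array of scores.
    e = np.exp(z - z.max())
    return e / e.sum()


def exploitability(A, x, y):
    # Total best-response gain of both players in the zero-sum game whose
    # row-player payoff matrix is A; it equals 0 exactly at a Nash equilibrium.
    return float(np.max(A @ y) - np.min(x @ A))


def historical_hedge(A, iters=5000, eta=0.1, bias=10.0):
    # Hypothetical illustration: each policy is a softmax of the CUMULATIVE
    # payoff of every pure action, i.e. an aggregate over the whole history.
    n, m = A.shape
    cum1 = np.zeros(n)
    cum1[0] = bias  # assumed bias so that play starts away from equilibrium
    cum2 = np.zeros(m)
    avg1, avg2 = np.zeros(n), np.zeros(m)
    for _ in range(iters):
        x = softmax(eta * cum1)
        y = softmax(eta * cum2)
        cum1 += A @ y        # row player's payoff of each action against y
        cum2 += -(x @ A)     # column player's payoff of each action against x
        avg1 += x
        avg2 += y
    # Return the last policies and their time averages.
    return x, y, avg1 / iters, avg2 / iters


if __name__ == "__main__":
    # Rock-paper-scissors, payoffs from the row player's perspective.
    A = np.array([[0.0, -1.0, 1.0],
                  [1.0, 0.0, -1.0],
                  [-1.0, 1.0, 0.0]])
    x, y, x_bar, y_bar = historical_hedge(A)
    print("average policies:", x_bar, y_bar)
    print("exploitability of averages:", exploitability(A, x_bar, y_bar))
```

Because each policy depends on the sum of all past payoffs, one round cannot swing it sharply. In rock-paper-scissors the per-round policies keep cycling, but their time averages approach the uniform equilibrium. This loosely mirrors the "approximation to Nash equilibrium" described in the abstract, under the assumptions stated above.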

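Source URL: http://ir.ia.ac.cn/handle/173211/51532
Collection: State Key Laboratory of Management and Control for Complex Systems / Deep Reinforcement Learning
Recommended citation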
GB/T 7714
Yuanheng Zhu, Weifan Li, Mengchen Zhao, et al. Empirical Policy Optimization for n-Player Markov Games[J]. IEEE Transactions on Cybernetics, 2022. DOI: 10.1109/TCYB.2022.3179775.
APA Yuanheng Zhu, Weifan Li, Mengchen Zhao, Jianye Hao, & Dongbin Zhao. (2022). Empirical Policy Optimization for n-Player Markov Games. IEEE Transactions on Cybernetics. DOI: 10.1109/TCYB.2022.3179775.
MLA Yuanheng Zhu, et al. "Empirical Policy Optimization for n-Player Markov Games." IEEE Transactions on Cybernetics (2022). DOI: 10.1109/TCYB.2022.3179775.

Deposit method: OAI harvesting

Source: Institute of Automation


Unless otherwise stated, all content in this system is protected by copyright, and all rights are reserved.