Chinese Academy of Sciences Institutional Repositories Grid: AlphaHoldem: High-Performance Artificial Intelligence for Heads-Up No-Limit Poker via End-to-End Reinforcement Learning

Chinese Academy of Sciences Institutional Repositories Grid

AlphaHoldem: High-Performance Artificial Intelligence for Heads-Up No-Limit Poker via End-to-End Reinforcement Learning

Document type: Conference paper


Authors	Zhao EM (赵恩民)1,3; Yan RY (闫仁业)1,3; Li JQ (李金秋)1,3; Li K (李凯)3; Xing JL (兴军亮)1,2,3
Publication date	2021-02
Conference date	2022-02-22
Conference location	Online
DOI	None
Abstract	Heads-up no-limit Texas hold’em (HUNL) is the quintessential game with imperfect information. Representative prior works like DeepStack and Libratus heavily rely on counterfactual regret minimization (CFR) and its variants to tackle HUNL. However, the prohibitive computation cost of CFR iteration makes it difficult for subsequent researchers to learn the CFR model in HUNL and apply it in other practical applications. In this work, we present AlphaHoldem, a high-performance and lightweight HUNL AI obtained with an end-to-end self-play reinforcement learning framework. The proposed framework adopts a pseudo-siamese architecture to directly learn from the input state information to the output actions by competing the learned model with its different historical versions. The main technical contributions include a novel state representation of card and betting information, a multi-task self-play training loss function, and a new model evaluation and selection metric to generate the final model. In a study involving 100,000 hands of poker, AlphaHoldem defeats Slumbot and DeepStack using only one PC with three days training. At the same time, AlphaHoldem only takes 2.9 milliseconds for each decision-making using only a single GPU, more than 1,000 times faster than DeepStack.
Language	English
Source URL	[http://ir.ia.ac.cn/handle/173211/52251]
Collection	Fusion Innovation Center_Decision, Command, and System-of-Systems Intelligence
Author affiliations	1. School of Artificial Intelligence, University of Chinese Academy of Sciences; 2. Tsinghua University; 3. Institute of Automation, Chinese Academy of Sciences
Recommended citation (GB/T 7714)	Zhao EM, Yan RY, Li JQ, et al. AlphaHoldem: High-Performance Artificial Intelligence for Heads-Up No-Limit Poker via End-to-End Reinforcement Learning[C]. In:. Online. 2022-02-22.

Ingestion method: OAI harvesting

Source: Institute of Automation
