Chinese Academy of Sciences Institutional Repositories Grid
AlphaHoldem: High-Performance Artificial Intelligence for Heads-Up No-Limit Poker via End-to-End Reinforcement Learning

Document type: Conference paper

Authors: Zhao EM (赵恩民)1,3; Yan RY (闫仁业)1,3; Li JQ (李金秋)1,3; Li K (李凯)3; Xing JL (兴军亮)1,2,3
Publication date: 2021-02
Conference date: 2022-02-22
Conference location: Online
DOI
Abstract (English)
Heads-up no-limit Texas hold’em (HUNL) is the quintessential game with imperfect information. Representative prior works like DeepStack and Libratus heavily rely on counterfactual regret minimization (CFR) and its variants to tackle HUNL. However, the prohibitive computation cost of CFR iteration makes it difficult for subsequent researchers to learn the CFR model in HUNL and apply it in other practical applications. In this work, we present AlphaHoldem, a high-performance and lightweight HUNL AI obtained with an end-to-end self-play reinforcement learning framework. The proposed framework adopts a pseudo-siamese architecture to directly learn from the input state information to the output actions by competing the learned model with its different historical versions. The main technical contributions include a novel state representation of card and betting information, a multi-task self-play training loss function, and a new model evaluation and selection metric to generate the final model. In a study involving 100,000 hands of poker, AlphaHoldem defeats Slumbot and DeepStack using only one PC with three days training. At the same time, AlphaHoldem only takes 2.9 milliseconds for each decision-making using only a single GPU, more than 1,000 times faster than DeepStack.
Language: English
Source URL: [http://ir.ia.ac.cn/handle/173211/52251]
Collection: Fusion Innovation Center_Decision Command and System Intelligence (融合创新中心_决策指挥与体系智能)
Author affiliations:
1. School of Artificial Intelligence, University of Chinese Academy of Sciences
2. Tsinghua University
3. Institute of Automation, Chinese Academy of Sciences
Recommended citation
GB/T 7714
Zhao EM, Yan RY, Li JQ, et al. AlphaHoldem: High-Performance Artificial Intelligence for Heads-Up No-Limit Poker via End-to-End Reinforcement Learning[C]. In: . Online. 2022-02-22.

Ingestion method: OAI harvesting

Source: Institute of Automation


Unless otherwise stated, all content in this system is protected by copyright, and all rights are reserved.