gs2.aamas07 - Carnegie Mellon School of Computer Science
Download
Report
Transcript gs2.aamas07 - Carnegie Mellon School of Computer Science
Better automated abstraction techniques
for imperfect information games,
with application to Texas Hold’em poker
Andrew Gilpin and Tuomas Sandholm
Carnegie Mellon University
Computer Science Department
Games and information
• Perfect information games: agents have complete knowledge
about the state of the world
– E.g.: Chess, Go, Checkers
• Imperfect information games: agents face uncertainty about the
state of the world
– E.g.:
• Robot facing adversaries in an uncertain, stochastic environment
• Almost any economic situation in which the other participants have private
information (e.g. valuations, quality information)
• Almost any card game in which the other players’ cards are hidden
– Presents several challenges for AI
•
•
•
•
Imperfect information
Risk assessment and management
Speculation and counter-speculation
Signaling and interpreting signals
– Misrepresentation (bluffing…)
Nash equilibrium
• In multiagent systems, an agent’s outcome
depends on the actions of the other agents
an agent’s optimal strategy depends on the
strategies of the other agents
• Game theory defines how agents should play
• Definition: A Nash equilibrium is a strategy for
each agent such that no agent benefits by
deviating to another strategy
Finding an equilibrium
• In two-person zero-sum games,
– If opponent plays a non-equilibrium strategy, that can only help me
– Nash equilibria are interchangeable, so there is no equilibrium selection problem
– Equilibrium can be found using LP
• Any extensive form game (of perfect recall) can be converted into a matrix game
– Create one pure strategy in the matrix game for every possible pure contingency plan
in the sequential game
– Leads to exponential blowup in #strategies
• Sequence form: More compact representation based on sequences of moves
rather than pure strategies [Romanovskii 62, Koller & Megiddo 92, von Stengel 96]
– Two-person zero-sum games can be solved in time polynomial in size of game tree
– Doesn’t scale to Rhode Island Hold’em (3.1 billion nodes) or Texas Hold’em (1018
nodes)
Automated abstraction [EC-06]
• Automatic method for performing abstractions in a broad class
of sequential games of imperfect information
• Equilibrium-preserving game transformation
– certain information sets are merged, and
– certain nodes within an information set are collapsed
• GameShrink identifies and applies all those transformations
– Õ(n2) time
• n = #nodes in the signal tree. In poker, these are possible card deals
• Run-time tends to be highly sublinear in the size of the game tree
• Solved Rhode Island Hold’em
– Largest poker game solved to date by over four orders of magnitude
• Also developed approximate (lossy) version of GameShrink
– Uses a similarity metric on nodes in the signal tree (e.g., |#wins1 - #wins2|
+ |#losses1 - #losses2|) and a similarity threshold
Texas Hold’em poker
Tackling Texas Hold’em:
High-level view
• Two-person game tree has ~1018 leaves
– Too large to run lossless GameShrink
– Even after that, LP would be too large
• Already too large when we applied this to first two rounds
• We split the 4 betting rounds into two phases
– Phase I (first 3 rounds) solved offline using new
approximate version of GameShrink followed by LP
– Phase II (last 2 rounds):
• abstractions computed offline
• real-time equilibrium computation using updated hand
probabilities and anytime LP
Optimized approximate abstractions
• Original version of GameShrink is “greedy” when used as an
approximation algorithm => lopsided abstractions
• GS2 instead finds an abstraction via clustering & IP
• For round 1 in signal tree, use k-means clustering
– Similarity metric is win probability (ties count as half a win)
• Computed based on rollout
• For each round 2..3 of signal tree:
– For each group i of hands (children of a parent at round – 1):
• use k-means clustering to split group i into ki abstract “states”
• for each value of ki, compute expected error (considering hand probs)
– IP decides how many children different parents (from round – 1) may have:
Decide ki’s to minimize this expected error, subject to Σi ki ≤ Kround
• Kround is set based on acceptable LP size
• Solving this IP is fast in practice
Phase I (first three rounds)
• Optimized abstraction
– Round 1
• There are 1,326 hands, of which 169 are strategically different
• We allowed 15 abstract states
– Round 2
• There are 25,989,600 distinct possible hands
– GameShrink (in lossless mode for Phase I) determined there are ~106 strategically
different hands (still too large to solve)
• Allowed 225 abstract states
– Round 3
• There are 1,221,511,200 distinct possible hands
• Allowed 900 abstract states
• Optimizing the approximate abstraction took 3 days on 4 CPUs
• LP took 7 days and 80 GB using CPLEX’s barrier method
Mitigating effect of round-based abstraction
(i.e., having 2 phases)
• For leaves of Phase I, GS1 & Sparbot assumed
rollout
• Can do better by estimating the amount of
betting that occurs in later rounds
– Incorporate this info in LP for Phase I
• For each possible hand strength and in each
possible betting situation, we store the
probability of each possible action
– Mine history of how betting has gone in later rounds
– from 100,000’s of hands that Sparbot played
Example of betting in 4th round
Player 1 has bet. Player 2 to fold, call, or raise
Phase II (last two rounds)
• Abstractions computed offline
– Betting history doesn’t matter => ( 52
) situations
4
– Simple suit isomorphisms at the root of Phase II halves this
– For each such setting, we use clustering+IP to generate an abstraction with 10
and 100 strategically different hands in the last two rounds, respectively
• Real-time equilibrium computation (using LP)
– So that our strategies are specific to particular hand (too many to precompute)
– Updated hand probabilities from Phase I equilibrium using betting histories
and community card history:
• si is player i’s strategy, h is an information set
– Conditional choice of primal vs. dual simplex
• Achieve anytime capability for the player that is us
– Dealing with running off the equilibrium path
Precompute several databases
• db5: possible wins and losses (for a single player) for every
combination of two hole cards and three community cards
(25,989,600 entries)
– Used by GameShrink for quickly comparing the similarity of two hands
• db223: possible wins and losses (for both players) for every
combination of pairs of two hole cards and three community
cards based on a roll-out of the remaining cards (14,047,378,800
entries)
– Used for computing payoffs of the Phase I game to speed up the LP
creation
• handval: concise encoding of a 7-card hand rank used for fast
comparisons of hands (133,784,560 entries)
– Used in several places, including in the construction of db5 and db223
• Colexicographical ordering used to compute indices into the
databases allowing for very fast lookups
Experimental results
Opponent
GS1
Sparbot
Vexbot
Series won
38 of 50
28 of 50
32 of 50
Win rate
(small bets per 100)
+3.12
+0.43
-0.62
• GS1: Game theory-based player, old version of automated
abstraction, no strategy simulation in later rounds [Gilpin &
Sandholm AAAI-06]
• Sparbot: Game theory-based player, manual abstraction
[Billings et al 2003]
• Vexbot: Opponent modeling, miximax search with statistical
sampling [Billings et al 2004]
Measuring the contribution of new techniques
Opponent
Series won
Win rate
(small bets per 100)
GS2 without improved
abstraction and without
estimated payoffs
GS2 without improved
abstraction
48 of 50
+2.87
35 of 50
+2.73
GS2 without estimated
payoffs
44 of 50
+0.72
Summary
• Main ideas
– Optimized approximate state-space abstraction based on clustering and IP
– Computing leaf payoffs for truncated game based on statistics
• Automatically generated competitive Texas Hold’em bot
– First phase (rounds 1, 2 & 3): automated abstraction & LP solved offline,
using statistical data to compute payoffs at end of round 3
– Second phase (rounds 3 & 4): abstraction precomputed automatically; LP
solved in real-time using updated hand probabilities and anytime
• Techniques are applicable to many sequential games of imperfect
information, not just poker
Ongoing research
• Provable approximation, e.g., ex ante, ex post
• Other types of abstraction
• More scalable equilibrium-finding algorithms
• Tournament poker [e.g. Miltersen & Sørensen 07]
• No-limit poker
• More than two players
– May need to use alternative solution concept
Thank you