Slides - University of Alberta
Download
Report
Transcript Slides - University of Alberta
Solving Probabilistic
Combinatorial Games
Ling Zhao & Martin Mueller
University of Alberta
September 7, 2005
Paper link: http://www.cs.ualberta.ca/~zhao/PCG.pdf
Motivations
Ken Chen’s previous work
to maximize winning
chance in Go.
Maximize points Vs.
maximize winning
probability
How to solve the abstract
game efficiently or play
the abstract game well?
Motivations
Results (black plays first): +15 (80%), -7 (20%)
Combinatorial Games
Probabilistic Combinatorial
Games
A Terminal node which is expressed as a
probability distribution (d={[p1, v1], [p2,
v2], …, [pn, vn] }) is a PCG.
If A1, A2, … An, B1, B2, …, Bn are PCGs, then
{A1, A2, … An | B1, B2, …, Bn} is a PCG.
Left options
Right options
A sum of PCGs is a PCG.
A move in a sum game consists of a move to an option in
exactly one subgame and leaves all other subgames
unchanged.
Simple PCG (SPCG)
Each PCG has exactly one left option and one
right option.
Each option leads immediately to a terminal
node.
Each distribution has exactly 2 values with
associate probabilities.
Problems to Address
How to solve PCGs efficiently?
How to play PCGs well if resources are
limited or fast play is required?
Game Tree Analysis
Very regular game tree: a node at depth k
has exactly n-k children, so n!/(n-k)! nodes
in total at depth k.
Very large number of transpositions:
C(n, k) * C(k, k/2) distinct nodes at depth k.
Terminal Node Evaluation
Terminal node is a
sum of probability
distributions.
Winning probability
Monte-Carlo Terminal
Evaluation (MCTE)
Use Monte-Carlo methods to randomly collect k samples
from 2n data points in the sum of n distributions.
Use the average winning percentage of samples (P’w) to
approximate the overall winning probability (Pw).
Theory from statistics: Pw - P’w is a normal distribution,
with mean=0 and std dev =
<=
Experimental results:
Monte-Carlo Interior
Evaluation (MCIE)
Evaluation of a node is
approximated by averaging the
values of terminal nodes reached
from it through random play.
Proposed by Abramson in 1990.
- Using 4x4 tic-tac-toe and 6x6 Othello for experiments.
Applied to Monte-Carlo Go by Bouzy and Helmstetter
and several other researchers.
SPCG Solver and Player
Solver: alpha-beta search, transposition
tables, move ordering (MCIE & MCTE).
Player: alpha-beta search to a certain
depth and use Monte-Carlo interior
evaluation for frontier nodes.
Experimental Results
100 randomly generated games, and
each game has 14 subgames.
Value distribution: probability from 0 to
1, value from –1000 to 1000.
AMD 2400MHz CPUs
220 cache entries (terminal nodes have
higher priority)
About 8 seconds to solve a game.
Solver Performance
Monte-Carlo move ordering: dm – depth limit
for Monte-Carlo move ordering being used,
otherwise history heuristic is used.
Monte-Carlo interior evaluation: nt –
percentage of all the current node’s
descendant terminal nodes sampled.
Monte-Carlo terminal evaluation: nc – number
of data points sampled.
Accurate terminal evaluation occupies 90% of
the overall running time.
Solver Performance: Results
Monte-Carlo Player
Test against the perfect player.
Each game has two rounds: each side
plays first once.
Winning probability is the average of
the two rounds.
Parameters: search depth and nc.
Monte-Carlo Player: Results
Error in Move Ordering
Average probability error:
Move: A
B
Actual win prob: 0.18 < 0.19
Estimate: 0.32 > 0.30
Win prob lost: 0.01
Average probability error is the average of the
winning probability lost in all move pairs of a
node.
Worst probability error: the probability lost due
to the wrong best move chosen.
Results
Conclusions
Efficient exact and heuristic solvers for SPCG.
Successfully incorporate Monte-Carlo move
ordering into alpha-beta search to SPCG.
A heuristic evaluation techniques based on
Monte-Carlo with performance close to the
prefect player.
Extensive experiments for the two solvers.
Future Work
Better algorithms to accurately evaluate
terminal nodes?
Progressive pruning.
Why does the simple Abramson’s
Expected Outcome model perform so
well in move ordering?
New Directions to Apply
Monte-Carlo to Computer Go
Monte-Carlo Go:
New Direction:
+
Future Work: Transformation
?
?
PCG solver or player