Transcript ppt

Game Playing
ECE457 Applied Artificial Intelligence
Spring 2007
Lecture #5
Outline


Types of games
Playing a perfect game



Playing an imperfect game




Minimax search
Alpha-beta pruning
Real-time
Imperfect information
Chance
Russell & Norvig, chapter 6
ECE457 Applied Artificial Intelligence
R. Khoury (2007)
Page 2
Game Problems

Games are well-defined search problems…





Well-defined board configurations (states)
Limited set of well-defined moves (actions)
Well-defined victory conditions (goal)
Values assigned to pieces, moves, outcomes (cost)
…that are hard to solve by searching



A search tree for chess has an average branching
factor of 35
An average chess game lasts for 50 moves per
player (ply)
The average search tree has 35100 nodes!
ECE457 Applied Artificial Intelligence
R. Khoury (2007)
Page 3
Game Problems

The opponent




He wants to win and make our agent lose
We have no control over his actions
He prevents us from reaching the optimal
solution
Introduces uncertainty in the search


We don’t know what moves the opponent
will do
We will assume “perfect play” behaviour
ECE457 Applied Artificial Intelligence
R. Khoury (2007)
Page 4
Types of Games
Perfect
information
Imperfect
information
Deterministic
Chance
Chess
Checkers
Go
Backgammon
Monopoly
Stratego
Battleship
Bridge
Poker
Scrabble
ECE457 Applied Artificial Intelligence
R. Khoury (2007)
Page 5
Game-Playing Strategy



Our agent and the opponent play
sequentially
We assume the opponent plays
perfectly
Our agent cannot get to the optimal
goal


The opponent won’t allow it
Our agent must find the best achievable
goal
ECE457 Applied Artificial Intelligence
R. Khoury (2007)
Page 6
Minimax Algorithm

Payoff (utility) function assigns a value to
each leaf node in the tree


Two players




Value then propagates up to non-leaf nodes
MAX wants to maximise payoff
MIN wants to minimise payoff
MAX is the player currently looking for a move (i.e.
at root of tree)
Payoff function



Simple 1 = win / 0 = draw / -1 = lose
Complex for different victory conditions
Win/lose for MAX
ECE457 Applied Artificial Intelligence
R. Khoury (2007)
Page 7
Minimax Algorithm
X
X
X
O
X
O
X
…
O
X
O X
X
O
X
X
ECE457 Applied Artificial Intelligence
X
O
X
…
R. Khoury (2007)
Page 8
…
Minimax Algorithm
MAX
3
MIN
3
MAX 3
18
1
5
ECE457 Applied Artificial Intelligence
1
-12
15
42
R. Khoury (2007)
56 -12
Page 9
-5
Minimax Algorithm

Game of Nim




Initial state: 7 matches in a pile
Each player must divide a pile into two nonempty unequal piles
Player who can’t do that, loses
Payoff

+1 win, -1 loss
ECE457 Applied Artificial Intelligence
R. Khoury (2007)
Page 10
Minimax Algorithm
MAX
7
-1
6-1
5-2
-1
4-3
-1
5-1-1
4-2-1
-1
+1
4-1-1-1
+1
3-1-1-1-1
3-2-1-1
-1
2-2-1-1-1
-1
3-2-2
3-3-1
+1
MAX
-1
2-2-2-1
MIN
+1 (max wins)
The value of each node is
-1 (max loses)
the value of the best leaf
2-1-1-1-1-1
the current player (MAX
or MIN) can
reach.
ECE457 Applied Artificial Intelligence
R. Khoury
(2007)
+1 (max wins)
+1
MIN
MAX
MIN
Page 11
Minimax Algorithm



Generate entire game tree
Compute payoff of leaf nodes
For each non-leaf node, from the lowest
in the tree to the root



If MAX level, then assign value of the child
with maximum payoff
If MAX level, then assign value of the child
with minimum payoff
At the root, select action with maximum
payoff
ECE457 Applied Artificial Intelligence
R. Khoury (2007)
Page 12
Minimax Algorithm





Complete, if tree is finite
Optimal against a perfect opponent
Time complexity = O(bm)
Space complexity = O(bm)
But remember, b and m can be huge

For chess, b ≈ 35 and m ≈ 100
ECE457 Applied Artificial Intelligence
R. Khoury (2007)
Page 13
Alpha-Beta Pruning

MAX take the max of its children
MIN gives each child the min of its children
max(min(3,18,5),min(1,15,42),min(56,-12,-5))


We don’t need to compute the values of
all the grandchildren!
Only until we find a value lower than the
highest child’s value
max(min(3,18,5),min(1,?,?),min(56,-12,?))

ECE457 Applied Artificial Intelligence
R. Khoury (2007)
Page 14
Alpha-Beta Pruning

Maintain values  and 




Start with  = - and  = 


 is the maximum value that MAX is assured of at
any point in the search
 is the minimum value that MIN is assured of at
any point in the search
Both computed using payoff propagated through
the tree
As the search goes on, the number of possible
values of  and  decreases
When   

Current path is not the result of best play by both
players, so no need to explore further
ECE457 Applied Artificial Intelligence
R. Khoury (2007)
Page 15
Alpha-Beta Pruning
[, ]
[-,]
3]]
56]
3 [3,
MAX
[-, -12]
]
56]
MIN
3
MAX 3
18
1 [-, ]
1]
[-,3]]
3]
[3,
5
ECE457 Applied Artificial Intelligence
1
X X
R. Khoury (2007)
-12
56 -12
X
Page 16
Alpha-Beta Pruning
Called as “rootvalue = Evaluate(root, -, )”
Evaluate(node, , )

If node is leaf



Return payoff
If node is MAX


v = -
For each child of node





v = max( v, Evaluate(child, , )
Break if v  
 = max(, v)
Return v
If node is MIN


v=
For each child of node




v = min( v, Evaluate(child, , ) )
Break if v  
 = min(, v)
Return v
ECE457 Applied Artificial Intelligence
R. Khoury (2007)
Page 17
Alpha-Beta Pruning

Efficiency dependant on ordering of children



Use heuristics to order the nodes to check



Will check each of MAX’s children until finding one
with a value higher than beta
Will check each of MIN’s children until finding one
with a value lower than alpha
Check the highest-value children first for MAX
Check the lowest-value children first for MIN
Good ordering can reduce time complexity to
O(bd/2)


Random ordering gives roughly O(b3d/4)
Minimax is O(bd)
ECE457 Applied Artificial Intelligence
R. Khoury (2007)
Page 18
Imperfect Play



Real-time or time constraints
Chance
Hidden information
ECE457 Applied Artificial Intelligence
R. Khoury (2007)
Page 19
Real-Time Games

Sometimes we can’t search the entire
tree



Real-time games
Time constraints (playing against a clock)
Tree too big (e.g. chess)
ECE457 Applied Artificial Intelligence
R. Khoury (2007)
Page 20
Real-Time Games

Evaluation function


Estimate value of a non-leaf node in the
tree
Cut off search at a given level
X

X
O

X
<
O
X
Chess: count value of pieces, available
moves, board configurations, …
ECE457 Applied Artificial Intelligence
R. Khoury (2007)
Page 21
Real-Time Minimax Algorithm



Generate entire game tree down to maximum
number of ply
Evaluate lowest nodes
For each non-leaf node, from the lowest in
the tree to the root



If MAX level, then assign value of the child with
maximum payoff
If MAX level, then assign value of the child with
minimum payoff
At the root, select action with maximum
payoff
ECE457 Applied Artificial Intelligence
R. Khoury (2007)
Page 22
Real-Time Alpha-Beta Pruning
Called as “rootvalue = Evaluate(root, -, )”
Evaluate(node, , )

If node is at lowest level



Return evaluation
If node is MAX


v = -
For each child of node





v = max( v, Evaluate(child, , )
Break if v  
 = max(, v)
Return v
If node is MIN


v=
For each child of node




v = min( v, Evaluate(child, , ) )
Break if v  
 = min(, v)
Return v
ECE457 Applied Artificial Intelligence
R. Khoury (2007)
Page 23
Real-Time Games: Problems

Non-quiescent positions


Some board configurations cause value to change
wildly
Solved with quiescence search


Expand non-quiescent boards deeper, until you reach
stable “quiescent” boards
Horizon effect



A “singular” move is considerably better than all
others
But a damaging unavoidable move is (or can be
pushed) just beyond the search depth limit (the
“horizon”)
Solved with singular extension

Expand singular state deeper
ECE457 Applied Artificial Intelligence
R. Khoury (2007)
Page 24
Games of Chance



Minimax requires planning for upcoming
moves
If moves depend on dice rolls, random
draws, etc., planning is impossible
We need to add all possible outcomes in
the tree!
ECE457 Applied Artificial Intelligence
R. Khoury (2007)
Page 25
Recall
3
3
3
18
1
5
1
ECE457 Applied Artificial Intelligence
15
-12
42
56 -12
R. Khoury (2007)
-5
Page 26
Expectiminimax
Then, MIN rolls the dice
MAX has already rolled
the dice and has three
possible moves
4.45
4.45
4.15
0.8
3
0.05
0.15
16
-7
There are three possible
outcomes to the roll
-10.45
0.8
0.05
0.15
1
25
ECE457 Applied Artificial Intelligence
-8
0.8
-12
0.05
0.15
-25
R. Khoury (2007)
And MIN picks an action based
on the roll result
58
Page 27
Problems with Expectiminimax
26.65
4.15
4.45
0.8
3
0.05
0.15
16
-7
26.65
0.8
0.05
0.15
1
25
ECE457 Applied Artificial Intelligence
-8
0.8
-12
R. Khoury (2007)
0.05
0.15
-25 800
Page 28
Problems with Expectiminimax

Time complexity: O(bmnm)



n is the number of possible outcomes of a
chance node
Recall: minimax is O(bm)
Trees can grow very large very quickly


Minimax & pruning limits search to likely
sequences of actions given perfect play
With randomness, there is no likely
sequence of actions
ECE457 Applied Artificial Intelligence
R. Khoury (2007)
Page 29
Imperfect Information


Algorithms so far require knowing everything
about the game
In some games, information about the
opponent is hidden


Cards in poker, pieces in Stratego, etc.
We could approximate hidden information to
random events


The probability that the opponent has a flush, the
probability that a piece is a bomb, etc.
Then use expectiminimax to get best action
ECE457 Applied Artificial Intelligence
R. Khoury (2007)
Page 30
Imperfect Information


1

2
a
List all possible outcomes, then
average best action overall
Can lead to irrational behaviour!
Possible cases:


b

Road 1 leads to money, road 2-a leads
to gold, road 2-b leads to death
(rational action is road 2, then a)
Road 1 leads to money, road 2-a leads
to death, road 2-b leads to gold
(rational action is road 2, then b)
But the real situation is:

Road 1 leads to money, road 2 leads to
gold or death (rational action is road 1)
ECE457 Applied Artificial Intelligence
R. Khoury (2007)
Page 31
Imperfect Information

It’s a useful approximation, but it’s not exact!


Need to handle information



Gather information
Plan based on what information we will have at a
given point in the future
Leads to more rational behaviour




Hidden information not the same as random
events
Acting to gain information
Acting to give information to partners
Acting to conceal information from the opponents
We will learn to do that later in the course
ECE457 Applied Artificial Intelligence
R. Khoury (2007)
Page 32
IBM Deep Blue


First chess computer to defeat a
reigning world champion (Garry
Kasparov) under normal chess
tournament constraints in 1997
Relied on brute hardware search
power


30 processors for the search
480 custom VLSI chess processors
for move generation and ordering,
and leaf node evaluation
ECE457 Applied Artificial Intelligence
R. Khoury (2007)
Page 33
IBM Deep Blue

Searched a minimax tree




Null-window alpha-beta pruning




100-200M states per second, maximum 330M
Average 6 to 16 ply, maximum 40 ply
Decide which moves are worth expanding, giving
priority to singular expansion and chess threats
Alpha-beta pruning but limited to a “window” of
moves rather than the entire tree
Faster and easier to implement on hardware
Approximate, can only returns bounds on the
minimax value
Allows for a highly non-uniform, more
selective and human-like search of the tree
ECE457 Applied Artificial Intelligence
R. Khoury (2007)
Page 34
IBM Deep Blue


Two board evaluation heuristics
Fast evaluation to get a quick approximate
value


Slow evaluation to get an exact value



Considers piece position value
Considers 8,000 features
Includes common chess concepts and specific
Kasparov strategies
Features have programmable weights learned
automatically from 700,000 grandmaster
games and fine-tuned manually by a chess
grandmaster
ECE457 Applied Artificial Intelligence
R. Khoury (2007)
Page 35
Assumptions


Utility-based agent
Environment






Fully observable
Deterministic
Sequential
Static
Discrete / Continuous
Single agent
ECE457 Applied Artificial Intelligence
R. Khoury (2007)
Page 36
Assumptions Updated


Utility-based agent
Environment






Fully observable / Partially observable
(approximation)
Deterministic / Strategic / Stochastic
Sequential
Static / Semi-dynamic
Discrete / Continuous
Single agent / Multi-agent
ECE457 Applied Artificial Intelligence
R. Khoury (2007)
Page 37