Game Playing
ECE457 Applied Artificial Intelligence
Spring 2008
Lecture #5
Outline
Types of games
Playing a perfect game
Playing an imperfect game
Minimax search
Alpha-beta pruning
Real-time
Imperfect information
Chance
Russell & Norvig, chapter 6
Project #2
ECE457 Applied Artificial Intelligence
R. Khoury (2008)
Page 2
Game Problems
Games are well-defined search problems…
Well-defined board configurations (states)
Limited set of well-defined moves (actions)
Well-defined victory conditions (goal)
Values assigned to pieces, moves, outcomes (cost)
…that are hard to solve by searching
A search tree for chess has an average branching
factor of 35
An average chess game lasts for 50 moves per
player (100 ply)
The average search tree has 35^100 nodes!
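To get a feel for that number, here is a minimal sketch that computes it exactly. The constant names are illustrative; the values 35 and 100 are the ones quoted above.

```python
import math

BRANCHING_FACTOR = 35   # average legal moves per position (figure quoted above)
DEPTH_PLY = 100         # 50 moves per player

# Python integers are unbounded, so the exact count can be computed directly.
nodes = BRANCHING_FACTOR ** DEPTH_PLY
digits = len(str(nodes))
exponent = DEPTH_PLY * math.log10(BRANCHING_FACTOR)

print(f"35^100 has {digits} digits, about 10^{exponent:.1f}")
```

The count has 155 digits, so enumerating the tree node by node is out of reach for any hardware.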
Game Problems
The opponent
He wants to win and make our agent lose
We have no control over his actions
He prevents us from reaching the optimal
solution
Introduces uncertainty in the search
We don’t know what moves the opponent
will make
We will assume “perfect play” behaviour
Types of Games
                         Deterministic            Chance
Perfect information      Chess, Checkers, Go      Backgammon, Monopoly
Imperfect information    Stratego, Battleship     Bridge, Poker, Scrabble
Zero-sum games: a player’s gains are exactly
subtracted from another player’s score (chess)
Non-zero-sum games: players can gain or lose
without an exact change on others (prisoners’ dilemma)
Game-Playing Strategy
Our agent and the opponent play
sequentially
We assume the opponent plays
perfectly
Our agent cannot get to the optimal
goal
The opponent won’t allow it
Our agent must find the best achievable
goal
Minimax Algorithm
Two players
MAX wants to maximise payoff
MIN wants to minimise payoff
MAX is the player currently looking for a move (i.e.
at root of tree)
Payoff (utility) function assigns a value to
each leaf node in the tree
Value then propagates up to non-leaf nodes
Payoff function
Win/lose for MAX
Simple: 1 = win / 0 = draw / -1 = lose
Complex for different victory conditions
Minimax Algorithm
[Figure: the first levels of the tic-tac-toe game tree, with X and O moves alternating]
Minimax Algorithm
[Figure: two-ply minimax tree]
MAX (root): 3
MIN: 3, 1, -12
MAX (leaves): (3, 18, 5), (1, 15, 42), (56, -12, -5)
Minimax Algorithm
Game of Nim
Initial state: 7 matches in a pile
Each player must divide a pile into two nonempty unequal piles
The player who cannot do so loses
Payoff
+1 win, -1 loss
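The rules above are small enough to solve exhaustively. The following is a minimal sketch, not code from the course: the function names `successors` and `minimax` and the tuple representation of piles are assumptions made for illustration.

```python
from functools import lru_cache

def successors(piles):
    """All states reachable by splitting one pile into two nonempty unequal piles."""
    result = set()
    for i, pile in enumerate(piles):
        rest = piles[:i] + piles[i + 1:]
        # a <= (pile - 1) // 2 guarantees a < pile - a: nonempty and unequal parts
        for a in range(1, (pile - 1) // 2 + 1):
            result.add(tuple(sorted(rest + (a, pile - a), reverse=True)))
    return sorted(result, reverse=True)

@lru_cache(maxsize=None)
def minimax(piles, max_to_move):
    """Payoff for MAX: +1 if MAX wins, -1 if MAX loses, under perfect play."""
    moves = successors(piles)
    if not moves:
        # The player to move cannot split any pile and therefore loses.
        return -1 if max_to_move else +1
    values = [minimax(child, not max_to_move) for child in moves]
    return max(values) if max_to_move else min(values)

print(successors((7,)))      # [(6, 1), (5, 2), (4, 3)]
print(minimax((7,), True))   # -1
```

With 7 matches and MAX moving first, the value is -1: a perfect opponent always wins.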
Minimax Algorithm
[Figure: complete game tree for Nim starting from 7 matches; MAX moves first and levels alternate]
MAX: 7 (-1)
MIN: 6-1 (-1), 5-2 (-1), 4-3 (-1)
MAX: 5-1-1 (+1), 4-2-1 (-1), 3-2-2 (+1), 3-3-1 (-1)
MIN: 4-1-1-1 (+1), 3-2-1-1 (-1), 2-2-2-1 (+1, max wins)
MAX: 3-1-1-1-1 (+1), 2-2-1-1-1 (-1, max loses)
MIN: 2-1-1-1-1-1 (+1, max wins)
The value of each node is the value of the best leaf the current player (MAX
or MIN) can reach.
Minimax Algorithm
Generate entire game tree
Compute payoff of leaf nodes
For each non-leaf node, from the lowest
in the tree to the root
If MAX level, then assign value of the child
with maximum payoff
If MIN level, then assign value of the child
with minimum payoff
At the root, select action with maximum
payoff
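The steps above can be sketched on a small explicit tree. This is a minimal illustration, assuming a nested-list encoding (a number is a leaf payoff, a list is an internal node); the function names are hypothetical.

```python
def minimax_value(node, is_max):
    """Leaves are numbers (payoffs); internal nodes are lists of children."""
    if not isinstance(node, list):
        return node
    child_values = [minimax_value(child, not is_max) for child in node]
    return max(child_values) if is_max else min(child_values)

def best_action(root):
    """Index of the root child with the highest value (the root is a MAX node)."""
    values = [minimax_value(child, False) for child in root]
    best_index = max(range(len(values)), key=values.__getitem__)
    return best_index, values

# The two-ply example tree used in this lecture.
tree = [[3, 18, 5], [1, 15, 42], [56, -12, -5]]
print(minimax_value(tree, True))  # 3
print(best_action(tree))          # (0, [3, 1, -12])
```

The recursion computes children before parents, which is the "lowest in the tree to the root" order without building the whole tree in memory first.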
Minimax Algorithm
Complete, if tree is finite
Optimal against a perfect opponent
Time complexity = O(b^m)
Space complexity = O(bm)
But remember, b and m can be huge
For chess, b ≈ 35 and m ≈ 100
Alpha-Beta Pruning
MAX takes the max of its children
MIN gives each child the min of its own children
max(min(3,18,5),min(1,15,42),min(56,-12,-5))
We don’t need to compute the values of
all the grandchildren!
Only until we find a value lower than the
highest child’s value
max(min(3,18,5),min(1,?,?),min(56,-12,?))
Alpha-Beta Pruning
Maintain values α and β
Start with α = -∞ and β = ∞
α is the maximum value that MAX is assured of at
any point in the search
β is the minimum value that MIN is assured of at
any point in the search
Both computed using payoff propagated through
the tree
As the search goes on, the number of possible
values of α and β decreases
When α ≥ β
Current path is not the result of best play by both
players, so no need to explore further
Alpha-Beta Pruning
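[Figure: alpha-beta trace on the two-ply example tree; each step shows the [α, β] window at the node being visited]
1. [-∞, ∞] at the MAX root
2. [-∞, ∞] at the first MIN node
3. [-∞, 3] at the first MIN node after leaf 3; leaves 18 and 5 do not change it, so the node returns 3
4. [3, ∞] at the root
5. [3, ∞] at the second MIN node
6. [3, 1] at the second MIN node after leaf 1: α ≥ β, so its remaining leaves are pruned
7. [3, ∞] at the third MIN node
8. [3, 56] at the third MIN node after leaf 56
9. [3, -12] at the third MIN node after leaf -12: α ≥ β, so its last leaf is pruned
The root value is 3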
Alpha-Beta Pruning
Called as “rootvalue = Evaluate(root, -∞, ∞)”
Evaluate(node, α, β)
  If node is leaf
    Return payoff
  If node is MAX
    v = -∞
    For each child of node
      v = max( v, Evaluate(child, α, β) )
      Break if v ≥ β
      α = max(α, v)
    Return v
  If node is MIN
    v = ∞
    For each child of node
      v = min( v, Evaluate(child, α, β) )
      Break if v ≤ α
      β = min(β, v)
    Return v
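A runnable version of the Evaluate procedure can look like the sketch below. The nested-list tree encoding and the `visited` list (used only to show which leaves get evaluated) are assumptions added for illustration.

```python
import math

def evaluate(node, alpha, beta, is_max, visited):
    """Alpha-beta on a nested-list tree; `visited` collects evaluated leaves."""
    if not isinstance(node, list):
        visited.append(node)
        return node
    if is_max:
        v = -math.inf
        for child in node:
            v = max(v, evaluate(child, alpha, beta, False, visited))
            if v >= beta:      # MIN above will never allow this branch
                break
            alpha = max(alpha, v)
        return v
    v = math.inf
    for child in node:
        v = min(v, evaluate(child, alpha, beta, True, visited))
        if v <= alpha:         # MAX above already has something at least as good
            break
        beta = min(beta, v)
    return v

tree = [[3, 18, 5], [1, 15, 42], [56, -12, -5]]
visited = []
root_value = evaluate(tree, -math.inf, math.inf, True, visited)
print(root_value)  # 3
print(visited)     # [3, 18, 5, 1, 56, -12]
```

Only six of the nine leaves are evaluated, matching max(min(3,18,5), min(1,?,?), min(56,-12,?)).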
Alpha-Beta Pruning
Efficiency dependent on ordering of children
Use heuristics to order the nodes to check
Will check each of MAX’s children until finding one
with a value higher than beta
Will check each of MIN’s children until finding one
with a value lower than alpha
Check the highest-value children first for MAX
Check the lowest-value children first for MIN
Good ordering can reduce time complexity to
O(b^(m/2))
Random ordering gives roughly O(b^(3m/4))
Minimax is O(b^m)
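The effect of ordering can be seen by counting evaluated leaves on the same nine payoffs arranged three ways. This is a small sketch under assumptions: the trees are hand-ordered using the known leaf values, which a real program would have to approximate with a heuristic.

```python
import math

def alphabeta(node, alpha, beta, is_max, counter):
    """Alpha-beta on a nested-list tree; counter[0] counts evaluated leaves."""
    if not isinstance(node, list):
        counter[0] += 1
        return node
    v = -math.inf if is_max else math.inf
    for child in node:
        child_value = alphabeta(child, alpha, beta, not is_max, counter)
        if is_max:
            v = max(v, child_value)
            if v >= beta:
                break
            alpha = max(alpha, v)
        else:
            v = min(v, child_value)
            if v <= alpha:
                break
            beta = min(beta, v)
    return v

def leaves_evaluated(tree):
    counter = [0]
    value = alphabeta(tree, -math.inf, math.inf, True, counter)
    return value, counter[0]

# Same payoffs, different child orderings.
best_order = [[3, 5, 18], [1, 15, 42], [-12, -5, 56]]    # MAX: high first, MIN: low first
slide_order = [[3, 18, 5], [1, 15, 42], [56, -12, -5]]   # order used in the lecture example
worst_order = [[-5, -12, 56], [42, 15, 1], [5, 18, 3]]   # MAX: low first, MIN: high first

for name, tree in [("best", best_order), ("slide", slide_order), ("worst", worst_order)]:
    print(name, leaves_evaluated(tree))
```

All three orderings return the value 3, but they evaluate 5, 6 and 9 leaves respectively.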
Minimax Exercise
[Figure: exercise game tree rooted at node A, with nodes labelled A to O and numeric leaf payoffs; compute the minimax value of every node]
Pruning Exercise
[Figure: exercise game tree rooted at node A, annotated with the [α, β] window at each of 14 numbered steps, starting from [-∞, ∞] at the root]
Imperfect Play
Real-time or time constraints
Chance
Hidden information
Real-Time Games
Sometimes we can’t search the entire
tree
Real-time games
Time constraints (playing against a clock)
Tree too big (e.g. chess)
Real-Time Games
Evaluation function
Estimate value of a non-leaf node in the
tree
Cut off search at a given level
[Figure: two tic-tac-toe boards whose evaluations are compared]
Chess: count value of pieces, available
moves, board configurations, …
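As an illustration of the simplest such evaluation function, here is a sketch that only counts piece values. The piece values (pawn 1, knight 3, bishop 3, rook 5, queen 9) are the conventional textbook ones, not figures from this lecture, and the board encoding is an assumption.

```python
# Conventional piece values (an assumption, not taken from the slides).
PIECE_VALUES = {"P": 1, "N": 3, "B": 3, "R": 5, "Q": 9, "K": 0}

def material_score(board):
    """Material balance from White's point of view.

    `board` is a list of 8 strings; uppercase letters are White pieces,
    lowercase letters are Black pieces, '.' is an empty square.
    """
    score = 0
    for row in board:
        for square in row:
            if square == ".":
                continue
            value = PIECE_VALUES[square.upper()]
            score += value if square.isupper() else -value
    return score

# Hypothetical position: White has a rook and a pawn, Black has two pawns.
board = [
    "....k...",
    "pp......",
    "........",
    "........",
    "........",
    "........",
    "P.......",
    "R...K...",
]
print(material_score(board))  # 4
```

A practical evaluation function would add the other terms listed above (mobility, board configuration) as further weighted features.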
Real-Time Minimax Algorithm
Generate the game tree down to a maximum
number of ply
Evaluate lowest nodes
For each non-leaf node, from the lowest in
the tree to the root
If MAX level, then assign value of the child with
maximum payoff
If MIN level, then assign value of the child with
minimum payoff
At the root, select action with maximum
payoff
Real-Time Alpha-Beta Pruning
Called as “rootvalue = Evaluate(root, -∞, ∞)”
Evaluate(node, α, β)
  If node is at lowest level
    Return evaluation
  If node is MAX
    v = -∞
    For each child of node
      v = max( v, Evaluate(child, α, β) )
      Break if v ≥ β
      α = max(α, v)
    Return v
  If node is MIN
    v = ∞
    For each child of node
      v = min( v, Evaluate(child, α, β) )
      Break if v ≤ α
      β = min(β, v)
    Return v
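The depth-limited version can be sketched on the Nim game from earlier in the lecture. The `heuristic` below is a deliberately crude placeholder (it scores every unresolved position as 0) and, like the function names and the `depth` parameter, is an assumption for illustration only.

```python
import math

def successors(piles):
    """Nim moves: split one pile into two nonempty unequal piles."""
    result = set()
    for i, pile in enumerate(piles):
        rest = piles[:i] + piles[i + 1:]
        for a in range(1, (pile - 1) // 2 + 1):
            result.add(tuple(sorted(rest + (a, pile - a), reverse=True)))
    return sorted(result, reverse=True)

def heuristic(piles, max_to_move):
    # Placeholder estimate (assumption): treat every unresolved position as a draw.
    return 0

def evaluate(piles, depth, alpha, beta, max_to_move):
    moves = successors(piles)
    if not moves:              # true leaf: the player to move loses
        return -1 if max_to_move else +1
    if depth == 0:             # lowest level reached: estimate instead of searching
        return heuristic(piles, max_to_move)
    if max_to_move:
        v = -math.inf
        for child in moves:
            v = max(v, evaluate(child, depth - 1, alpha, beta, False))
            if v >= beta:
                break
            alpha = max(alpha, v)
        return v
    v = math.inf
    for child in moves:
        v = min(v, evaluate(child, depth - 1, alpha, beta, True))
        if v <= alpha:
            break
        beta = min(beta, v)
    return v

for depth in (3, 4):
    print(depth, evaluate((7,), depth, -math.inf, math.inf, True))
```

With a 3-ply limit the root looks like a draw (0); with 4 ply the search sees far enough to find the true value of -1. The quality of the answer depends on both the cutoff depth and the evaluation function.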
Real-Time Games: Problems
Non-quiescent positions
Some state configurations cause
value to change wildly
Solved with quiescence search
Expand non-quiescent boards deeper, until you
reach stable “quiescent” boards
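One way to picture quiescence search is a cutoff test that refuses to stop on an unstable node. The sketch below assumes a toy tree of dictionaries in which each node carries a hypothetical `quiet` flag and an `estimate` from the evaluation function; a real program would have to decide quiescence from the board itself and bound the extra depth.

```python
def search(node, depth, is_max, quiescence=True):
    """Depth-limited minimax over dict nodes.

    A node has an "estimate" (evaluation function output), an optional list
    of "children", and a "quiet" flag telling whether the estimate is stable.
    """
    children = node.get("children", [])
    if not children:
        return node["estimate"]
    at_cutoff = depth <= 0
    if at_cutoff and (node["quiet"] or not quiescence):
        return node["estimate"]
    # Either the depth limit is not reached yet, or the node is not quiescent:
    # keep expanding.
    values = [search(c, depth - 1, not is_max, quiescence) for c in children]
    return max(values) if is_max else min(values)

def leaf(estimate):
    return {"estimate": estimate, "quiet": True}

stable = {"estimate": 2, "quiet": True, "children": [leaf(3), leaf(2)]}
volatile = {"estimate": 5, "quiet": False, "children": [leaf(-4), leaf(6)]}
root = {"estimate": 0, "quiet": True, "children": [stable, volatile]}

print(search(root, 1, True, quiescence=False))  # 5
print(search(root, 1, True, quiescence=True))   # 2
```

Without the quiescence test the search trusts the misleading estimate of 5; with it, the volatile node is expanded, found to be worth -4, and the stable move is chosen.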
Real-Time Games: Problems
Horizon effect
A “singular” move is
considerably better than all
others
But a damaging unavoidable
move is (or can be pushed)
just beyond the search depth
limit (the “horizon”)
Solved with singular
extension
Expand singular state deeper
Games of Chance
Minimax requires planning for upcoming
moves
If moves depend on dice rolls, random
draws, etc., planning is impossible
We need to add all possible outcomes in
the tree!
Recall
[Figure: the earlier minimax tree, max(min(3,18,5), min(1,15,42), min(56,-12,-5)) = 3]
Expectiminimax
MAX has already rolled the dice and has three
possible moves
Then, MIN rolls the dice
There are three possible outcomes to the roll
(probabilities 0.8, 0.15 and 0.05)
And MIN picks an action based
on the roll result
[Figure: expectiminimax tree with a MAX root, three chance nodes and three outcomes each]
Move 1: 0.8(3) + 0.15(16) + 0.05(-7) = 4.45
Move 2: 0.8(1) + 0.15(25) + 0.05(-8) = 4.15
Move 3: 0.8(-12) + 0.15(-25) + 0.05(58) = -10.45
MAX picks move 1, with expected value 4.45
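The tree on this slide can be evaluated with a short recursive sketch. The tuple encoding of nodes and the helper `chance` are assumptions for illustration; the probabilities and outcome values are the ones shown in the figure, with each leaf standing for the value MIN's best reply yields after a roll.

```python
def expectiminimax(node):
    """node is ("leaf", value), ("max", children), ("min", children)
    or ("chance", [(probability, child), ...])."""
    kind, payload = node
    if kind == "leaf":
        return payload
    if kind == "max":
        return max(expectiminimax(child) for child in payload)
    if kind == "min":
        return min(expectiminimax(child) for child in payload)
    if kind == "chance":
        # A chance node is worth the probability-weighted sum of its outcomes.
        return sum(p * expectiminimax(child) for p, child in payload)
    raise ValueError(f"unknown node kind: {kind}")

def chance(values, probabilities=(0.8, 0.15, 0.05)):
    return ("chance", [(p, ("leaf", v)) for p, v in zip(probabilities, values)])

moves = [chance((3, 16, -7)), chance((1, 25, -8)), chance((-12, -25, 58))]
move_values = [round(expectiminimax(m), 2) for m in moves]
root_value = round(expectiminimax(("max", moves)), 2)
print(move_values)  # [4.45, 4.15, -10.45]
print(root_value)   # 4.45
```

MAX and MIN nodes behave exactly as in minimax; only the chance nodes are new.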
Expectiminimax
[Figure: the same tree expanded one more level. Each outcome value under a chance node (3, 16, -7; 1, 25, -8; -12, -25, 58) is the minimum over the actions MIN can take after that roll; the chance nodes keep their values 4.45, 4.15 and -10.45]
Problems with Expectiminimax
[Figure: the same tree with the leaf 58 replaced by 800. The third chance node becomes 0.8(-12) + 0.15(-25) + 0.05(800) = 26.65, so the root value is 26.65 and MAX now prefers the third move, even though the 800 payoff has probability 0.05]
Problems with Expectiminimax
Time complexity: O(b^m n^m)
n is the number of possible outcomes of a
chance node
Recall: minimax is O(b^m)
Trees can grow very large very quickly
Minimax & pruning limit search to likely
sequences of actions given perfect play
With randomness, there is no likely
sequence of actions
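A quick calculation shows how fast the extra n^m factor grows. The numbers below (b = 20, n = 21, m = 3) are illustrative assumptions, roughly in the range of a dice game with 21 distinct rolls of two dice; they are not figures from this lecture.

```python
def minimax_nodes(b, m):
    """Order of magnitude of a minimax tree: b^m."""
    return b ** m

def expectiminimax_nodes(b, n, m):
    """Order of magnitude of an expectiminimax tree: b^m * n^m."""
    return (b ** m) * (n ** m)

b, n, m = 20, 21, 3
print(minimax_nodes(b, m))            # 8000
print(expectiminimax_nodes(b, n, m))  # 74088000
```

Looking only three moves ahead already costs about 9,000 times more than the deterministic case.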
Imperfect Information
Algorithms so far require knowing everything
about the game
In some games, information about the
opponent is hidden
Cards in poker, pieces in Stratego, etc.
We could approximate hidden information as
random events
The probability that the opponent has a flush, the
probability that a piece is a bomb, etc.
Then use expectiminimax to get best action
Imperfect Information
[Figure: a fork in the road: road 1, and road 2, which later splits into branches a and b]
List all possible outcomes, then
average best action overall
Can lead to irrational behaviour!
Possible cases:
Road 1 leads to money, road 2-a leads
to gold, road 2-b leads to death
(rational action is road 2, then a)
Road 1 leads to money, road 2-a leads
to death, road 2-b leads to gold
(rational action is road 2, then b)
But the real situation is:
Road 1 leads to money, road 2 leads to
gold or death (rational action is road 1)
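The road example can be put into numbers. The payoffs below are hypothetical (the slide gives none), chosen only so that gold beats money and death is much worse than either; the two cases are assumed equally likely.

```python
# Hypothetical payoffs (not from the slides).
MONEY, GOLD, DEATH = 1, 10, -100

# The two possible cases: what each branch of road 2 leads to.
worlds = [{"a": GOLD, "b": DEATH}, {"a": DEATH, "b": GOLD}]

# Averaging the best action of each case: this silently assumes that we will
# know which case we are in when we reach the fork on road 2.
road2_averaged = sum(max(w.values()) for w in worlds) / len(worlds)

# Real situation: the branch must be picked without knowing the case.
road2_blind = max(sum(w[branch] for w in worlds) / len(worlds) for branch in "ab")

print(road2_averaged)  # 10.0
print(road2_blind)     # -45.0
print("road 2" if road2_averaged > MONEY else "road 1")  # road 2
print("road 2" if road2_blind > MONEY else "road 1")     # road 1
```

Averaging over the cases values road 2 at 10 and takes it, while the honest expected value of road 2 is -45, so the rational action is road 1.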
Imperfect Information
It’s a useful approximation, but it’s not
exact!
Advantages:
Works in many cases
Doesn’t require new techniques to handle
information discovery
Disadvantages:
In reality, hidden information is not the
same as random events
Can lead to irrational behaviour
Imperfect Information
Need to handle information
Leads to more rational behaviour
Gather information
Plan based on what information we will have at a
given point in the future
Acting to gain information
Acting to give information to partners
Acting to conceal information from the opponents
We will learn to do that later in the course
IBM Deep Blue
First chess computer to defeat a
reigning world champion (Garry
Kasparov) under normal chess
tournament constraints in 1997
Relied on brute hardware search
power
30 processors for the search
480 custom VLSI chess processors
for move generation and ordering,
and leaf node evaluation
IBM Deep Blue
Searched a minimax tree
Null-window alpha-beta pruning
100-200M states per second, maximum 330M
Average 6 to 16 ply, maximum 40 ply
Decide which moves are worth expanding, giving
priority to singular expansion and chess threats
Alpha-beta pruning but limited to a “window” of
moves rather than the entire tree
Faster and easier to implement on hardware
Approximate, can only return bounds on the
minimax value
Allows for a highly non-uniform, more
selective and human-like search of the tree
IBM Deep Blue
Two board evaluation heuristics
Fast evaluation to get a quick approximate
value
Slow evaluation to get an exact value
Considers piece position value
Considers 8,000 features
Includes common chess concepts and specific
Kasparov strategies
Features have programmable weights learned
automatically from 700,000 grandmaster
games and fine-tuned manually by a chess
grandmaster
Assumptions
Utility-based agent
Environment
Fully observable
Deterministic
Sequential
Static
Discrete / Continuous
Single agent
Assumptions Updated
Utility-based agent
Environment
Fully observable / Partially observable
(approximation)
Deterministic / Strategic / Stochastic
Sequential
Static / Semi-dynamic
Discrete / Continuous
Single agent / Multi-agent