
Artificial Intelligence in
Game Design
N-Grams and Decision Tree Learning
Predicting Player Actions
Types of games where this works best:
• Player has choice between few possible actions
– Attack left
– Attack right
• Character can take simultaneous counteraction
– Each player action has “correct” counteraction
• Attack left → defend left
• Attack right → defend right
• Goal: Character should learn to “anticipate” current
player action based on past actions
Probabilistic Approach
• Keep track of last n actions taken by player
– Size of n is “window” of character memory
• Compute probability of next player action based on these
– Base decision about counteraction on that probability
• Example: Last 10 player actions
L L R L R L L L R L
• Estimated probabilities of next action:
– Left attack: 70%
– Right attack: 30%
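As a rough illustration (not from the slides), the window can be kept as a fixed-length queue and the probabilities read off from action frequencies; the function names and window size below are illustrative assumptions:

from collections import deque, Counter

WINDOW_SIZE = 10                       # n, the size of the character's "memory"
history = deque(maxlen=WINDOW_SIZE)    # oldest actions fall off automatically

def record_action(action):
    """Add the player's latest action ('L' or 'R') to the window."""
    history.append(action)

def action_probabilities():
    """Estimate P(next player action) from frequencies in the window."""
    counts = Counter(history)
    total = sum(counts.values())
    return {action: count / total for action, count in counts.items()} if total else {}

# Slide example: last 10 actions L L R L R L L L R L
for a in "LLRLRLLLRL":
    record_action(a)
print(action_probabilities())          # {'L': 0.7, 'R': 0.3}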
Probabilistic Actions
• Simple majority approach
– Since left attack has highest probability, defend left
– Problem: Character will take same action for long periods
• Too predictable!
– Example:
• Player’s next 4 attacks are from right
• Character will still choose defend left since left attacks still have
highest probability
• Character looks very stupid!
player:    L L R L R L L L R L R R R R
character:                     L L L L
(left attack is still the majority player action in the last 10 moves, so the character keeps choosing defend left)
Probabilistic Actions
• Probabilistic approach: choose each counteraction with the same probability as the corresponding player action
– Left attack 70% → Left defend 70%
– Right attack 30% → Right defend 30%
• Biased towards recent player actions
• Far less predictable
– Player may notice that character is changing tactics without being aware
of how this is done
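A small sketch of this probability matching, assuming the probabilities come from the window above; the attack-to-defence mapping and function name are illustrative:

import random

COUNTER = {"L": "defend left", "R": "defend right"}   # "correct" counteractions

def choose_counteraction(probabilities):
    """Pick a defence with the same distribution as the player's attacks."""
    attacks = list(probabilities.keys())
    weights = list(probabilities.values())
    attack = random.choices(attacks, weights=weights, k=1)[0]
    return COUNTER[attack]

# With the 70% / 30% estimate, the character defends left about 70% of the time.
print(choose_counteraction({"L": 0.7, "R": 0.3}))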
Window Size
• Key question: What is a good value for n, the size of the “window” used to determine probabilities?
L L R L R L L L R L R R L
• Can be too small (n = 2, for example)
– Character has no “memory” of past player actions
– Only the last two actions count: Left defend 50%, Right defend 50%
• Can be too large (n = 20, for example)
– Too slow to react to changes in player tactics
– L L L L L L L L L L L L R R R R R R R R → Left defend 60%, Right defend 40%
• No best solution
– Will need to experiment
N-Grams
• Conditional probabilities based on sequence of user actions
• Example:
– “Last two player attacks were left and then right”
– “What has the player done next, the last times they attacked left then right?”
• After last 4 left then right attacks:
– Attacked right 3 times
– Attacked left 1 time
• Conclusion: Player has 75% chance of attacking right next
N-Grams Example
• Example:
– Window of memory = last 12 actions
– Base decision on last two actions taken by player (past sequence length = 2)
• Goal: Determine what action player is likely to take next, given last two actions L R
– Previous actions: L L R L R R L L R L L R ?
– Previous cases of L R:
• Followed by L twice
• Followed by R once
– Estimated probabilities: Left attack 67%, Right attack 33%
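A brief sketch of this lookup (a hypothetical helper, not from the slides): scan the window for earlier occurrences of the last two actions and count what followed each one:

from collections import Counter

def predict_after(history, prefix):
    """Estimate P(next action | last actions == prefix) from the window string."""
    followers = Counter(
        history[i + len(prefix)]
        for i in range(len(history) - len(prefix))
        if history[i:i + len(prefix)] == prefix
    )
    total = sum(followers.values())
    return {a: n / total for a, n in followers.items()} if total else {}

# Slide example: window L L R L R R L L R L L R, last two actions were L R
print(predict_after("LLRLRRLLRLLR", "LR"))   # {'L': 0.67, 'R': 0.33} (approximately)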
N-Grams and Sequence Size
• Number of statistics to keep grows exponentially with the length of past sequences
– Number of possible actions = a
– Past sequence length = L
– Number of possible configurations of past sequences = a^L
• L must be small (no more than 2 or 3)
Storing N-Grams
• Algorithm:
– Keep statistics for all possible action strings of length L
• Based on “window” of past actions
• Example window: L L L L R R L L R L L R

Previous action string   Instances of next action   P(next action is L)   P(next action is R)
LL                       L L R R R                  40%                   60%
LR                       L R                        50%                   50%
RL                       L L                        100%                  0%
RR                       L                          100%                  0%
N-Grams and Updating
• When player takes new action
– Add instance for that action and update statistics
– Remove oldest action from list and update statistics
• Move “window”
• Example: L L L L R R L L R L L R L
– New action: the final L
– The first L is now outside the “window”

Previous action string   Instances of next action   P(next action is L)   P(next action is R)
LL                       L R R R                    25%                   75%
LR                       R L L                      67%                   33%
RL                       L L                        100%                  0%
RR                       L                          100%                  0%
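A sketch of this storage-and-update scheme as a class (class and method names are illustrative): the window and the per-string counts are both maintained incrementally as each new player action arrives:

from collections import deque, Counter

class NGramPredictor:
    def __init__(self, window_size=12, prefix_len=2):
        self.window = deque()      # last window_size player actions
        self.window_size = window_size
        self.prefix_len = prefix_len
        self.counts = {}           # action string of length prefix_len -> Counter of next actions

    def add_action(self, action):
        """Record a new player action, sliding the window and updating statistics."""
        self.window.append(action)
        w = list(self.window)
        # Add instance: the previous prefix_len actions are now followed by the new action.
        if len(w) > self.prefix_len:
            prefix = "".join(w[-self.prefix_len - 1:-1])
            self.counts.setdefault(prefix, Counter())[action] += 1
        # Move the window: drop the oldest action and the statistic it contributed.
        if len(w) > self.window_size:
            old_prefix = "".join(w[:self.prefix_len])
            old_next = w[self.prefix_len]
            self.counts[old_prefix][old_next] -= 1
            self.window.popleft()

    def predict(self):
        """P(next action | the last prefix_len actions in the window)."""
        prefix = "".join(list(self.window)[-self.prefix_len:])
        followers = self.counts.get(prefix, Counter())
        total = sum(followers.values())
        return {a: n / total for a, n in followers.items()} if total else {}

# Reproducing the slide: feed the 12-action window, then the new action L.
ngrams = NGramPredictor()
for a in "LLLLRRLLRLLR":
    ngrams.add_action(a)
ngrams.add_action("L")
print(ngrams.counts["LL"])   # Counter({'R': 3, 'L': 1})  -> 25% L, 75% R
print(ngrams.counts["LR"])   # Counter({'L': 2, 'R': 1})  -> 67% L, 33% R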
Decision Trees
• Simple tree traversal
– Node = question
– Branch to follow = answer
– Leaf = final action to take
• Can create dynamically from a set of examples
• Example tree:

Hit points < 5?
  yes → Obstacle between myself and player?
    yes → hide
    no → run
  no → Within one unit of player?
    yes → attack
    no → Path to player clear?
      yes → run towards player
      no → run sideways
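A minimal sketch of that traversal (the world-state keys and the lambda questions are illustrative assumptions, not from the slides):

class Node:
    def __init__(self, question, branches):
        self.question = question   # function: world state -> answer
        self.branches = branches   # answer -> child Node, or an action string (leaf)

def decide(node, state):
    """Walk the tree from the root; a non-Node branch is the final action."""
    while isinstance(node, Node):
        node = node.branches[node.question(state)]
    return node

# The example tree above, written out by hand:
tree = Node(lambda s: s["hit_points"] < 5, {
    True:  Node(lambda s: s["obstacle_between"],
                {True: "hide", False: "run"}),
    False: Node(lambda s: s["within_one_unit"], {
        True:  "attack",
        False: Node(lambda s: s["path_to_player_clear"],
                    {True: "run towards player", False: "run sideways"}),
    }),
})

print(decide(tree, {"hit_points": 8, "within_one_unit": False,
                    "path_to_player_clear": True, "obstacle_between": False}))
# -> run towards player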
Decision Tree Learning
• Based on a set of training examples
– Attribute values about current situation
• From world or from character state
– Target action character should take on basis of those
attribute values
• Example: Estimating driving time
– Attributes:
• Hour of departure: 8, 9, or 10
• Weather: sunny, cloudy, or rainy
• Accident on route: yes or no
• Stalled car on route: yes or no
– Desired action: Commute time: short, medium, or long
Training Examples
Example   Hour   Weather   Accident   Stall   Commute
1         8      sunny     no         no      long
2         8      cloudy    no         yes     long
3         10     sunny     no         no      short
4         9      rainy     yes        no      long
5         9      sunny     yes        yes     long
6         10     sunny     no         no      short
7         10     cloudy    no         no      short
8         9      rainy     no         no      medium
9         9      sunny     yes        no      long
10        10     cloudy    yes        yes     long
11        10     rainy     no         no      short
12        8      cloudy    yes        no      long
13        9      sunny     no         no      medium
Decision Tree Building
node BuildTree(examples[] E) {
    if (all examples in E have same target action T) {
        return new leaf with action T
    }
    else {
        choose “best” attribute A to create branches for examples E
        set question at this node to A
        for (all possible values V for attribute A) {
            create branch with value V
            E_V = all examples in E that have value V for attribute A
            attach BuildTree(E_V) to that branch
        }
    }
}
Decision Tree Building
• Start tree by using BuildTree(all examples) to create root
• Example: Suppose Hour were chosen at root:

Root: Examples 1–13 (4 short, 2 medium, 7 long) → question: Hour
  Hour = 8:  Examples 1, 2, 12 (0 short, 0 medium, 3 long) → leaf: Long
  Hour = 9:  Examples 4, 5, 8, 9, 13 (0 short, 2 medium, 3 long) → another question
  Hour = 10: Examples 3, 6, 7, 10, 11 (4 short, 0 medium, 1 long) → another question
ID3 Decision Tree Learning
• Choose attribute with highest information gain
• Based on definition of entropy from information theory
• Entropy over examples E with target actions T:
Entropy(E) = -Σ_T (|E_T| / |E|) log2(|E_T| / |E|)
where E_T = the examples in E whose target action is T
• Example: entropy for the training set (4 short examples, 2 medium examples, 7 long examples) =
-(4/13) log2(4/13) - (2/13) log2(2/13) - (7/13) log2(7/13) = 1.42
ID3 Decision Tree Learning
• Expected entropy remaining after attribute A is used =
– Sum of entropies of child nodes down branches V
– Weighted by percentage of examples down that branch
Σ_V (|E_V| / |E|) Entropy(E_V)
• Information gain for attribute A = that value subtracted from the original entropy
Attribute   Entropy   Information Gain
hour        0.65      0.77   ← best attribute at root
weather     1.29      0.13
accident    0.92      0.50
stall       1.17      0.25
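The BuildTree pseudocode plus this gain criterion can be combined into a short sketch; the dictionary-based example format (keys such as "Hour" and target "Commute") and the function names are assumptions for illustration:

import math
from collections import Counter

def entropy(examples, target):
    """Entropy of the target-action distribution over a set of examples."""
    counts = Counter(e[target] for e in examples)
    total = len(examples)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def expected_entropy(examples, attribute, target):
    """Entropy remaining after splitting on attribute, weighted by branch size."""
    total = len(examples)
    remaining = 0.0
    for value in {e[attribute] for e in examples}:
        subset = [e for e in examples if e[attribute] == value]
        remaining += (len(subset) / total) * entropy(subset, target)
    return remaining

def build_tree(examples, attributes, target):
    """ID3: leaf if all examples agree, else branch on the highest-gain attribute."""
    actions = {e[target] for e in examples}
    if len(actions) == 1:
        return actions.pop()
    if not attributes:   # no questions left: fall back to the majority action
        return Counter(e[target] for e in examples).most_common(1)[0][0]
    best = max(attributes,
               key=lambda a: entropy(examples, target) - expected_entropy(examples, a, target))
    branches = {
        value: build_tree([e for e in examples if e[best] == value],
                          [a for a in attributes if a != best], target)
        for value in {e[best] for e in examples}
    }
    return (best, branches)   # internal node: (question, answer -> subtree)

Run on the 13 driving-time examples above, this picks Hour at the root (information gain 0.77), matching the table.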
Using ID3 in Games
• ID3 efficient enough for on-line learning
– Execution time proportional to number of examples
• Best applied to games if:
– Few possible actions for NPC to choose from
(attack, retreat, defend)
– Key attributes have few possible values
(or continuous values partitioned into predefined ranges)
– Have an easy way to determine the desired action the NPC should take, based on a significant number of player actions
• May require tweaking rules of game!
Black and White Game
• Player given “creature” at
beginning of game
• Creature “trained” by the player: it observes player actions in different situations
– What other armies to attack in what
circumstances
• Later in game (after creature
grows), creature takes those same
actions
Black and White Game
• Sample training set:
Example   Allegiance   Defense   Tribe    Attack
1         friendly     weak      Celtic   no
2         enemy        weak      Celtic   yes
3         friendly     strong    Norse    no
4         enemy        strong    Norse    no
5         friendly     weak      Greek    no
6         enemy        medium    Greek    yes
7         enemy        strong    Greek    no
8         enemy        medium    Aztec    yes
9         friendly     weak      Aztec    no
Black and White Game
• Tree created from examples:
Allegiance?
  friendly → No attack
  enemy → Defense?
    weak → Attack
    medium → Attack
    strong → No attack
Black and White Game
• Nature of game suited to decision tree learning:
– Large number of examples created by player actions
– Known target actions based on player actions
– Small number of possible actions, attribute values
• Game specifically designed around algorithm!