Transcript Document

Artificial Intelligence in
Game Design
Lecture 20:
Hill Climbing and N-Grams
Hill Climbing
Simple technique for learning optimal parameter values
• Character AI described in terms of configuration of parameter values
V = (v1, v2, … vn)
– Example: Action probabilities for Oswald
– V = (Pleft, Pright, Pdefend)
[Figure: Oswald with action probabilities Attack Left 45%, Attack Right 30%, Defend 25%]
Oswald’s current V = (0.45, 0.30, 0.25)
Hill Climbing
• Each configuration of parameter values V = (v1, v2, … vn) has error
measure E(V )
– Often an estimate based on success of last action(s)
• Example: total damage taken by Oswald minus total damage caused by Oswald’s last 3 actions
• Good enough for hill climbing
• Goal of learning: Find V such that E(V ) is minimized
– Or at least “good enough”
[Figure: configuration with a low error measure: Attack Left 35%, Attack Right 25%, Defend 40%]
Hill Climbing
• Hill climbing works best for
– Single parameter
– Correctness measure which is easy to compute
• Example: “cannon game”
– Only parameter: Angle Ө of cannon
– Error measure:
Distance between target and actual landing point
Error Space
• Graphical representation of relationship between parameter value
and correctness
• Hill climbing = finding “lowest point” in this space
[Figure: error plotted against Ө; error reaches 0 (maximum correctness) at the optimal Ө]
Hill Climbing Algorithm
• Assumption:
If a small change in one direction increases correctness,
then we will eventually reach the optimal value by continuing to change in that direction
[Figure: error curve with successive angles Ө1, Ө2, Ө3 moving in the direction of decreasing error]
Hill Climbing Algorithm
• Estimate direction of slope in local
area of error space
– Must sample values near E(Ө)
• E(Ө + ε)
• E(Ө - ε)
• Move in direction of decreasing error
– Increase/decrease Ө by some given
step size δ
– If E(Ө + ε) < E(Ө - ε) then Ө = Ө + δ
– Else Ө = Ө – δ
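To make this concrete, here is a minimal Python sketch of a single hill-climbing step. The error() function stands in for whatever error measure the game provides (for the cannon, the distance between the target and the landing point), and epsilon and delta are the ε and δ above; their values are illustrative.

    def hill_climb_step(theta, error, epsilon=0.01, delta=0.05):
        """One hill-climbing step: sample the error on either side of theta,
        then move by delta in the direction of decreasing error."""
        if error(theta + epsilon) < error(theta - epsilon):
            return theta + delta   # error falls as theta increases
        else:
            return theta - delta   # error falls as theta decreases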
Multidimensional Error Space
• Exploring multiple parameters simultaneously
– Probabilities for Attack Left, Attack Right, Defend
– Ability to control “powder charge” C for cannon as well as angle Ө
• Vary parameters slightly in all dimensions
– E(Ө + ε, C + ε)
– E(Ө + ε, C – ε)
– E(Ө – ε, C + ε)
– E(Ө – ε, C – ε)
• Choose combination with lowest error
[Figure: cannon at angle Ө1 with charge C1, thinking “I need to increase both the angle and the charge”]
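A sketch of the same step in two dimensions, assuming the game supplies an error(theta, charge) function; the four sampled offsets correspond to the four combinations listed above.

    from itertools import product

    def hill_climb_step_2d(theta, charge, error, epsilon=0.01, delta=0.05):
        """Sample the error at small offsets in both dimensions and step
        toward the combination with the lowest error."""
        samples = [(error(theta + dt, charge + dc), dt, dc)
                   for dt, dc in product((+epsilon, -epsilon), repeat=2)]
        _, best_dt, best_dc = min(samples)
        theta += delta if best_dt > 0 else -delta
        charge += delta if best_dc > 0 else -delta
        return theta, charge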
Multidimensional Error Space
• Can have too many parameters
– n parameters = n-dimensional error space
– Search will usually “wander” the space, never finding good values
• If using learning, keep the problem simple
– Few parameters (one or two is best)
– Make sure parameters have independent effects on error
• Example: increasing the charge and increasing the angle both increase distance
[Figure: cannon at Ө1 with charge C1, thinking “I could also move up a hill, or check the wind direction…”]
Hill Climbing Step Size
• Choosing a good step size δ
– Too small: learning takes too long
[Figure: tiny steps from Ө1 to Ө2; the player thinks “This guy is an idiot!”]
– Too large: learning will “jump over” optimal value
[Figure: a large step jumps from Ө1 past the target to Ө2; the player thinks “This guy is an idiot!”]
Hill Climbing Step Size
• Adaptive Resolution
– Keep track of previous error E (ӨT-1 )
• If E (ӨT ) < E (ӨT-1 ) assume moving in correct direction
– Increase step size to get there faster
• δ = δ + κ
[Figure: successive angles Ө1, Ө2, Ө3 with a growing step size]
Hill Climbing Step Size
• If E (ӨT ) > E (ӨT-1 ) assume overshot optimal value
– Decrease step size to avoid overshooting on way back
• δ = δ × ρ, ρ < 1
– Idea: decrease step size fast
• Main goal: Make character actions plausible to player
– Should make large changes if miss badly
– Should make small changes if near target
[Figure: after overshooting from Ө1 to Ө2, a smaller step back reaches Ө3]
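The two adaptive-resolution rules can be combined into one small update. This is a minimal sketch; kappa and rho are the κ and ρ constants above, with illustrative values.

    def adapt_step_size(delta, current_error, previous_error, kappa=0.01, rho=0.5):
        """Grow the step while the error keeps dropping; shrink it quickly
        once the error rises (i.e. we have overshot the optimum)."""
        if current_error < previous_error:
            return delta + kappa   # still improving: speed up
        return delta * rho         # overshot: take smaller steps back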
Local Minima in Error Space
• Major assumption:
Error space monotonically decreases as move towards goal
• Other factors may cause error to increase in local areas
[Figure: shots at Ө1 and Ө2; the second appears to be worse than the first shot, even though Ө2 is closer to the optimal angle]
Local Minima in Error Space
• Local minima in error space
– Places where apparent error increases as get closer to optimum value
– Simple hill climbing can get stuck
[Figure: error curve with a local minimum before the optimal Ө; simple hill climbing will not escape it]
Local Minima in Error Space
Solutions:
• Momentum term:
– Current change based on previous changes as well as current error
– Define momentum term α
• Proportion of previous change to current change
• α<1
– Previous change: ΔӨT-1
– Current change suggested by the error: C, either δ or –δ
– Combined change: ΔӨT = α ΔӨT-1 + (1 – α) C
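A sketch of one step with the momentum term; alpha is the α above (0.7 is just an illustrative value), and the caller remembers the returned change and passes it back in on the next step.

    def momentum_step(theta, prev_change, error, epsilon=0.01, delta=0.05, alpha=0.7):
        """Hill-climbing step where the applied change blends the previous
        change with the change suggested by the current error samples."""
        # Raw change C from the current error: +delta or -delta.
        c = delta if error(theta + epsilon) < error(theta - epsilon) else -delta
        change = alpha * prev_change + (1 - alpha) * c
        return theta + change, change   # new angle, change to remember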
Local Minima in Error Space
• “Speeds up” with multiple changes in same direction
• Will continue to go in the same direction for several steps, even if the error
indicates a change in the other direction
• Idea: Momentum will “run through” local minima
[Figure: momentum builds going downhill; it decreases in the local minimum, but the search still escapes the local minimum and reaches the optimal Ө]
Local Minima in Error Space
• May need to restart with different initial value
– Use randomness
– Something very different from last starting point
– Plausible behavior – if current actions not working, try something new
[Figure: multiple shots with the same result, then a restart from a very different angle gives a very different result]
Memory and Learning
• What if player moves?
– Should not have to restart learning process
– Should keep appearance that character is slowly improving aim
• Should be able to quickly adapt to changes in player strategy
[Figure: the cannon converges through Ө1, Ө2, Ө3; when the player moves, it must find a new angle Ө4]
Memory and Learning
• Remember previous actions and effects
– Store each angle Ө tried and resulting distance D(Ө)
– If player moves to location L, start from Ө whose D(Ө) is closest to L
[Figure: remembered angles Ө1, Ө2, Ө3 with landing distances D(Ө1), D(Ө2), D(Ө3); D(Ө2) is closest to the new player location, so learning restarts from Ө2]
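A sketch of the memory lookup, assuming each previous shot is stored as a (theta, landing distance) pair and the player's new location is measured along the same axis as those distances.

    def best_starting_angle(shot_memory, player_location):
        """Return the remembered angle whose landing distance was closest
        to the player's new location; hill climbing resumes from there."""
        theta, _ = min(shot_memory, key=lambda shot: abs(shot[1] - player_location))
        return theta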
Memory and Learning
• Best solution may be to cheat
– Use the physics formula Ө = f(location)
– Incorporate an “error” term E: Ө = f(location ± E)
– Decrease the error each turn: E = E – ΔE
• Error will still decrease even as the player moves
– May want to increase the error if the player moves: E = E + movePenalty
• Looks more realistic
• Encourages player to keep moving
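A sketch of this “cheating” approach; angle_for stands in for the physics formula Ө = f(location), and error, delta_error and move_penalty correspond to the E, ΔE and movePenalty above.

    import random

    def cheat_aim(angle_for, player_location, error, delta_error,
                  player_moved, move_penalty):
        """Compute the exact firing angle, then blur it by the current error
        term so the shots merely appear to improve over time."""
        theta = angle_for(player_location + random.uniform(-error, error))
        error = max(0.0, error - delta_error)   # aim gets better every turn
        if player_moved:
            error += move_penalty               # moving buys the player some slack
        return theta, error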
Appearance of Learning
• Requires accuracy in error measure
– Otherwise, how do we know what direction to change parameters?
• Not always plausible to have
– How do we know changing Oswald’s probabilities will improve
performance?
• Goal sometimes not optimal performance – just appearance of
adaptation
[Figure: Oswald’s probabilities shift from Attack Left 40% / Attack Right 60% to Attack Left 35% / Attack Right 65% after a successful attack from the right. No guarantee this is best in the long run, but at least Oswald looks like he is learning!]
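A sketch of this kind of “appearance of learning” update for Oswald, using hypothetical action labels; it simply shifts a little probability toward whatever just worked, with no claim of long-run optimality.

    def reinforce_action(probabilities, successful_action, step=0.05):
        """Shift probability toward the action that just succeeded, keeping
        the distribution normalised. With two attacks this turns 40%/60%
        into 35%/65% after a successful attack from the right."""
        probs = dict(probabilities)
        others = [a for a in probs if a != successful_action]
        for a in others:
            probs[a] = max(0.0, probs[a] - step / len(others))
        probs[successful_action] = 1.0 - sum(probs[a] for a in others)
        return probs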
Predicting Player Actions
Type of games where this works best:
• Player has choice between few possible actions
– Attack left
– Attack right
• Character can take simultaneous counteraction
– Each player action has “correct” counteraction
• Attack left  defend left
• Attack right  defend right
• Goal:
Character should learn to “anticipate” current player action
based on past actions
Probabilistic Approach
• Keep track of last n actions taken by player
– Size of n is “window” of character memory
• Compute probability of next player action based on these
– Base decision about counteraction on that probability
• Example: Last 10 player actions
LLRLRLLLRL
• Estimated probabilities of next action: Left attack 70%, Right attack 30%
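A minimal sketch of the window-based estimate, using 'L'/'R' strings for the two attacks:

    from collections import Counter

    def action_probabilities(history, n=10):
        """Estimate the probability of each player action from the last n actions."""
        window = history[-n:]
        counts = Counter(window)
        return {action: count / len(window) for action, count in counts.items()}

    print(action_probabilities(list("LLRLRLLLRL")))   # {'L': 0.7, 'R': 0.3}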
Probabilistic Actions
• Simple majority approach
– Since left attack has highest probability, defend left
– Problem: Character will take same action for long periods
• Too predictable!
– Example:
• Player’s next 4 attacks are from right
• Character will still choose defend left since left attacks still have highest
probability
• Character looks very stupid!
[Figure: player: L L R L R L L L R L R R R R; character keeps choosing defend left for the final four attacks, since left attack is still the majority player action in the last 10 moves]
Probabilistic Actions
• Probabilistic Approach:
Choose action with same probability as corresponding player action
Left attack 70% → Left defend 70%
Right attack 30% → Right defend 30%
• Biased towards recent player actions
• Far less predictable
– Player may notice that character is changing tactics without being aware of how
this is done
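A sketch of the probabilistic choice, reusing the action_probabilities() estimate from the earlier sketch; the 'L'/'R' labels and the counteraction mapping are illustrative.

    import random

    COUNTER = {"L": "defend left", "R": "defend right"}

    def choose_counteraction(probabilities):
        """Defend left/right with the same probabilities as the estimated attacks."""
        actions = list(probabilities)
        weights = [probabilities[a] for a in actions]
        predicted_attack = random.choices(actions, weights=weights)[0]
        return COUNTER[predicted_attack]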
Window Size
• Key question: What is a good value for n?
– n is the size of the “window” used to determine probabilities
• Can be too small (n = 2, for example)
– Character has no “memory” of past player actions
– LLRLRLLLRLRRL: the last two actions alone give Left defend 50%, Right defend 50%
• Can be too large (n = 20, for example)
– Too slow to react to changes in player tactics
– LLLLLLLLLLLLRRRRRRRR: gives Left defend 60%, Right defend 40%, even though the player has switched to attacking from the right
• No best solution
– Will need to experiment
N-Grams
• Conditional probabilities based on sequence of user actions
• Example:
– “Last two player attacks were left and then right”
– “What did the player do next, the last few times they attacked left and then right?”
• After the last 4 left-then-right attacks:
– Attacked right 3 times
– Attacked left 1 time
• Conclusion: Player has 75% chance of attacking right next
N-Grams Example
• Example:
• Window of memory = last 12 actions
• Base decision on last two actions taken by player (past sequence length = 2)
• Goal: Determine what action player is likely to take next
given last two actions L R
– Previous actions:
LLRLRRLLRLLR?
– Previous cases of L R in the window:
• Followed by L twice
• Followed by R once
• Estimated probabilities of next action: Left attack 67%, Right attack 33%
N-Grams and Sequence Size
• Number of statistics to keep grow exponentially with length of past
sequences
– Number of possible actions = a
– Past sequences length = L
– Number of possible configurations of past sequences = a^L
• L must be small (no more than 2 or 3)
– Too many statistics to keep track of
– Very few instances of each case
• How many cases of LLRLLRR will there be?
• All statistics will be unreliable
Storing N-Grams
• Algorithm:
– Keep statistics for all possible action strings of length L
• Based on “window” of past actions
• Example: L L L L R R L L R L L R
Previous action string | Instances of next action | Probability next action is L | Probability next action is R
LL                     | L L R R R                | 40%                          | 60%
LR                     | L R                      | 50%                          | 50%
RL                     | L L                      | 100%                         | 0%
RR                     | L                        | 100%                         | 0%
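A sketch of building these statistics from the window; sequence_length is the past-sequence length L above (2 here), and the action strings use the same 'L'/'R' labels.

    from collections import defaultdict, Counter

    def ngram_table(window, sequence_length=2):
        """For every action string of the given length, count what the player did next."""
        table = defaultdict(Counter)
        for i in range(len(window) - sequence_length):
            prefix = "".join(window[i:i + sequence_length])
            table[prefix][window[i + sequence_length]] += 1
        return table

    def next_action_probabilities(table, prefix):
        counts = table[prefix]
        total = sum(counts.values())
        return {action: count / total for action, count in counts.items()}

    window = list("LLLLRRLLRLLR")                   # the 12-action window above
    table = ngram_table(window)
    print(next_action_probabilities(table, "LL"))   # {'L': 0.4, 'R': 0.6}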
N-Grams and Updating
• When player takes new action
– Add instance for that action and update statistics
– Remove oldest action from list and update statistics
• Move “window”
• Example: L L L L R R L L R L L R L
– The final L is the new action; the oldest action is now outside the “window”
Previous action string | Instances of next action | Probability next action is L | Probability next action is R
LL                     | L R R R                  | 25%                          | 75%
LR                     | L R L                    | 67%                          | 33%
RL                     | L L                      | 100%                         | 0%
RR                     | L                        | 100%                         | 0%
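A sketch of the incremental update, continuing from the ngram_table() sketch above: add an instance for the new action, then remove the instance whose prefix has slid out of the window.

    def slide_window(table, window, new_action, sequence_length=2):
        """Update the statistics in place when the player takes a new action."""
        # New instance: the most recent prefix is now followed by new_action.
        table["".join(window[-sequence_length:])][new_action] += 1
        window.append(new_action)
        # Remove the instance contributed by the oldest prefix, then drop
        # the oldest action so the window keeps the same size.
        table["".join(window[:sequence_length])][window[sequence_length]] -= 1
        window.pop(0)

    slide_window(table, window, "L")                # player attacks left again
    print(next_action_probabilities(table, "LL"))   # {'L': 0.25, 'R': 0.75}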