PowerPoint Transcript

Artificial Intelligence in
Game Design
Introduction to Learning
Learning and Games
• Learning in AI:
– Creating new rules automatically
• Observation of world
• Examples of good/bad actions to take
– Major goal of AI
• Would be very useful in gaming
– Automatic adaptation to player tactics
– Infinite replayability
• Would be impossible for player to create
strategy that would win forever
[Cartoon: “You can defeat me now, but I shall return smarter than ever!”]
Learning and Games
• Fairer and more plausible NPC behavior
• Characters should have same learning curve as players
– Start out inexperienced
– Become more competent over time
• Example: Simple “cannon fire game”
– Could use physics to compute exact angle, but would win first turn!
– Character should miss badly at first
– Should “learn” to get closer over time
Learning in AI
• Basic components:
[Diagram of basic learning components: inputs from the environment feed the Critic, which determines how good or bad an action was, often in terms of some error; the Learning Element determines how to change the rules in order to decrease that error; the resulting actions are those indicated by the Current Rules.]
Learning in AI
• Learning algorithms in AI
– Neural networks
– Probabilistic learning
– Genetic learning algorithms
• Common attributes
– Requires time
• Usually thousands of cycles
– Results are unpredictable
• Will create any rules that decrease error,
not necessarily the ones that make the
most sense in a game
• Still very limited
– No algorithm to automatically generate something as complex as an FSM
[Cartoon: “Create an opponent that can defeat Data”; not what you want to happen in a game!]
Online Learning
• Learning most useful if occurs during game
– Must be as efficient as possible
• Simple methods best
– Hill climbing
– N-Gram prediction
– Decision tree learning
• Most successful methods often specific to game
– Example: Negative influences at character destruction locations (see the sketch after the figure below)
[Figure: influence map; cells around the spot where our unit was destroyed by an unknown enemy unit are marked -2 (close) and -1 (farther out), and other units steer around this area]
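The idea above could be implemented roughly as follows; this is a minimal sketch, and the grid size, penalty values, and function names are illustrative assumptions rather than anything from the slides.

```python
# Sketch: negative influence around destruction locations.
GRID_W, GRID_H = 32, 32
influence = [[0 for _ in range(GRID_W)] for _ in range(GRID_H)]

def record_unit_destroyed(x, y, radius=2):
    """Mark the area around a destroyed unit: -2 close to the spot,
    -1 farther out, so other units steer around it."""
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            cx, cy = x + dx, y + dy
            if 0 <= cx < GRID_W and 0 <= cy < GRID_H:
                ring = max(abs(dx), abs(dy))
                influence[cy][cx] += -2 if ring <= 1 else -1

def movement_cost(base_cost, x, y):
    """Steering/pathfinding adds the negated influence, so dangerous cells cost more."""
    return base_cost - influence[y][x]
```

Pathfinding that uses movement_cost() will route other units around the penalized area.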
Scripted Learning
• Can “fake” appearance of learning
– Player performs action
– Game AI knows best counteraction, but does not perform it
– Game AI allows player certain number of that action before beginning to
perform counteraction
• Like timeout
• Number could be chosen at random
– Gives appearance that character has “learned” to perform counteraction (a counter-based sketch follows the figure below)
[Figure: player allowed to attack from right for certain number of turns; AI begins to defend from right after that point]
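One way to script this timeout, as a hedged sketch (the class and method names are invented for illustration):

```python
import random

class ScriptedLearner:
    """Fakes learning: the AI knows the counter all along, but tolerates a
    player action a random number of times before it starts countering."""
    def __init__(self, min_allowed=3, max_allowed=6):
        self.min_allowed = min_allowed
        self.max_allowed = max_allowed
        self.uses = {}      # action name -> times the player has used it
        self.allowed = {}   # action name -> uses tolerated before countering

    def observe(self, action):
        if action not in self.allowed:
            # Random threshold so the "learning moment" is not predictable
            self.allowed[action] = random.randint(self.min_allowed, self.max_allowed)
        self.uses[action] = self.uses.get(action, 0) + 1

    def should_counter(self, action):
        return self.uses.get(action, 0) > self.allowed.get(action, 0)
```

Each turn the game would call observe() with the player's action and switch to the known counteraction once should_counter() returns True.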
Scripted Learning
• Scripting in cannon game:
– Compute actual best trajectory using physics
– Add error factor E to computation
– Decrease error E over time at rate ΔE
• Test different values of ΔE to make sure the AI learns at the “same rate” as a typical player
• Can also use different values of ΔE to set “difficulty” level (a minimal sketch follows the figure below)
[Figure: correct trajectory, with small E and large E offsets around it]
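A minimal sketch of the scripted cannon, assuming a simple flat-ground physics helper; the class name, the physics model, and the particular values of E and ΔE are illustrative:

```python
import math
import random

def best_angle(target_distance, speed=30.0, g=9.8):
    """Exact firing angle from projectile physics (flat ground, no drag);
    this physics helper is an illustrative assumption."""
    return 0.5 * math.degrees(math.asin(min(1.0, g * target_distance / speed ** 2)))

class ScriptedCannonAI:
    """Computes the correct trajectory, then deliberately misses by up to E
    degrees, shrinking E by delta_e each shot so the AI appears to learn."""
    def __init__(self, initial_error=20.0, delta_e=4.0):
        self.error = initial_error   # E: current aiming error (degrees)
        self.delta_e = delta_e       # ΔE: how fast E shrinks (sets difficulty)

    def choose_angle(self, target_distance):
        angle = best_angle(target_distance)
        offset = random.uniform(-self.error, self.error)
        self.error = max(0.0, self.error - self.delta_e)   # miss less badly next time
        return angle + offset
```

A larger delta_e makes the AI converge faster (a harder setting); a smaller one keeps it missing longer.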
Hill Climbing
• Simple technique for learning optimal parameter values
• Character AI described in terms of configuration of parameter values
V = (v1, v2, … vn)
– Example: Action probabilities for Oswald
– V = (Pleft, Pright, Pdefend)
[Figure: Oswald’s current V = (0.45, 0.30, 0.25): Attack Left 45%, Attack Right 30%, Defend 25%]
Hill Climbing
• Each configuration of parameter values V = (v1, v2, … vn) has error measure E(V)
– Often an estimate based on success of last action(s)
• Example:
Total damage taken by Oswald – Total damage caused by Oswald’s last 3 actions
• Good enough for hill climbing (a small sketch follows the figure below)
• Goal of learning: Find V such that E(V ) is minimized
– Or at least “good enough”
[Figure: configuration with low error measure: Attack Left 35%, Attack Right 25%, Defend 40%]
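In code, the configuration and its rough error estimate might look like this; how damage is tracked per action is an assumption for illustration:

```python
# Oswald's current configuration V = (P_left, P_right, P_defend)
V = {"attack_left": 0.45, "attack_right": 0.30, "defend": 0.25}

def error_estimate(damage_taken, damage_caused_last_3):
    """Rough E(V): total damage Oswald took minus the damage his last 3
    actions caused.  Lower is better; an estimate like this is good
    enough for hill climbing."""
    return damage_taken - sum(damage_caused_last_3)

# Example: error_estimate(40, [10, 5, 0]) -> 25
```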
Hill Climbing
• Hill climbing works best for
– Single parameter
– Correctness measure which is easy to compute
• Example: “cannon game”
– Only parameter: Angle θ of cannon
– Error measure:
Distance between target and actual landing point
Error Space
• Graphical representation of relationship between parameter value
and correctness
• Hill climbing = finding “lowest point” in this space
[Figure: error space; error plotted against θ, with error = 0 (maximum correctness) at the optimal θ]
Hill Climbing Algorithm
• Assumption:
– Small change in one direction increases correctness
– Will eventually reach optimal value if keep changing in that direction
[Figure: successive angles θ1, θ2, θ3 moving in the direction of decreasing error]
Hill Climbing Algorithm
• Estimate direction of slope in local area of error space
– Must sample values near E(θ)
• E(θ + ε)
• E(θ - ε)
• Move in direction of decreasing error (sketched below)
– Increase/decrease θ by some given step size δ
– If E(θ + ε) < E(θ - ε) then θ = θ + δ
– Else θ = θ - δ
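A minimal sketch of one hill-climbing step for the cannon angle; the physics stand-in, the sampling offset ε, and the step size δ are illustrative assumptions:

```python
import math

def land_distance(angle_deg, speed=30.0, g=9.8):
    """Flat-ground projectile range for the shot (illustrative physics stand-in)."""
    return speed ** 2 * math.sin(math.radians(2 * angle_deg)) / g

def error(angle, target):
    """Distance between the target and where the shot actually lands."""
    return abs(target - land_distance(angle))

def hill_climb_step(angle, target, epsilon=0.5, delta=2.0):
    """Sample E just above and below the current angle and step toward
    whichever direction had the lower error."""
    if error(angle + epsilon, target) < error(angle - epsilon, target):
        return angle + delta
    return angle - delta
```

Calling hill_climb_step() once per shot moves the angle one step δ toward lower error each turn.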
Multidimensional Error Space
• Exploring multiple parameters simultaneously
– Probabilities for Attack Left, Attack Right, Defend
– Ability to control “powder charge” C for cannon as well as angle θ
• Vary parameters slightly in all dimensions
– E(θ + ε, C + ε)
– E(θ - ε, C + ε)
– E(θ + ε, C - ε)
– E(θ - ε, C - ε)
• Choose combination with lowest error (sketched after the figure below)
[Cartoon: “I need to increase both the angle and the charge”]
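The same step in two dimensions (angle θ and powder charge C) might look like this; error2 stands in for whatever error measure the game provides:

```python
def hill_climb_step_2d(theta, charge, error2, epsilon=0.5, delta=2.0):
    """Sample all four (+/-ε, +/-ε) neighbors of (θ, C) and take one step δ
    toward the combination with the lowest error."""
    directions = [(st, sc) for st in (+1, -1) for sc in (+1, -1)]
    best_t, best_c = min(
        directions,
        key=lambda d: error2(theta + d[0] * epsilon, charge + d[1] * epsilon))
    return theta + best_t * delta, charge + best_c * delta
```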
Multidimensional Error Space
• Can have too many parameters
– n parameters = n dimensional error space
– Will usually “wander” space, never finding good values
• If using learning, keep problem simple
– Few parameters (one or two best)
– Make sure parameters have independent effect on error
• Increased charge, angle both increase distance
[Cartoon: “I could also move up a hill, or check the wind direction…”]
Hill Climbing Step Size
• Choosing a good step size δ
– Too small: learning takes too long
[Cartoon: tiny steps θ1, θ2; “This guy is an idiot!”]
– Too large: learning will “jump over” optimal value
[Cartoon: large steps θ1, θ2 jumping over the target; “This guy is an idiot!”]
Hill Climbing Step Size
• Adaptive Resolution
– Keep track of previous error E(θT-1)
• If E(θT) < E(θT-1) assume moving in correct direction
– Increase step size to get there faster
• δ = δ + κ
[Figure: successive angles θ1, θ2, θ3 with increasing step sizes]
Hill Climbing Step Size
• If E(θT) > E(θT-1) assume overshot optimal value
– Decrease step size to avoid overshooting on way back
• δ = δ × ρ, ρ < 1
– Idea: decrease step size fast (both rules are sketched below)
• Main goal: Make character actions plausible to player
– Should make large changes if miss badly
– Should make small changes if near target
[Figure: successive angles θ1, θ2, θ3 with decreasing step sizes after overshooting]
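Both step-size rules combined, as a small sketch; the values of κ and ρ are illustrative:

```python
def adapt_step_size(delta, error_now, error_prev, kappa=0.5, rho=0.5):
    """Adaptive resolution: grow the step while the error keeps falling,
    shrink it quickly once we overshoot (rho < 1)."""
    if error_now < error_prev:
        return delta + kappa   # still improving: get there faster
    return delta * rho         # overshot the optimum: take smaller steps
```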
Local Minima in Error Space
• Major assumption:
Error space monotonically decreases as move
towards goal
[Figure: multiple shots with same result; no decrease in error]
Local Minima in Error Space
• Local minima in error space
– Places where apparent error does not decrease as get closer to
optimum value
– Simple hill climbing can get stuck
[Figure: error space with a local minimum away from the optimal θ; hill climbing will not escape!]
Local Minima in Error Space
• May need to restart with different initial value
– Use randomness
– Something very different from last starting point
– Plausible behavior: if current actions not working, try something new (sketched below)
[Figure: multiple shots with same result, then a very different result after restarting]
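A sketch of the random restart; the “stuck” test and the angle range are assumptions for illustration:

```python
import random

def maybe_restart(angle, recent_errors, min_angle=10.0, max_angle=80.0):
    """If the last few shots show essentially no change in error, hill climbing
    is probably stuck; jump to a very different random angle instead."""
    last = recent_errors[-3:]
    stuck = len(last) == 3 and max(last) - min(last) < 1e-3
    if stuck:
        return random.uniform(min_angle, max_angle)   # "try something new"
    return angle
```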
Memory and Learning
• What if player moves?
– Should not have to restart learning process
– Should keep appearance that character is slowly improving aim
• Should quickly adapt to changes in player strategy
Memory and Learning
• Remember previous actions and effects
– Store each angle θ tried and resulting distance D(θ)
– If player moves to location L, start from the θ whose D(θ) is closest to L (sketched after the figure below)
[Figure: remembered angles θ1, θ2, θ3 with landing distances D(θ1), D(θ2), D(θ3); the one closest to the new player location is θ2]
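A minimal sketch of this memory; storing plain (θ, D(θ)) pairs is one simple choice, not the only one:

```python
shot_memory = []   # list of (angle θ, landing distance D(θ)) pairs

def remember_shot(angle, landing_distance):
    shot_memory.append((angle, landing_distance))

def starting_angle_for(player_location, default_angle=45.0):
    """When the player moves to location L, resume hill climbing from the
    remembered angle whose landing distance D(θ) was closest to L."""
    if not shot_memory:
        return default_angle
    angle, _ = min(shot_memory, key=lambda rec: abs(rec[1] - player_location))
    return angle
```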