Transcript LearningKit

Pattern Recognition
& Limits of Machine
Learning
Game Programming
Kit Ming Chan
About Our Authors
• Section 10.3
– Timo Kaukoranta, Jouni Smed, Harri Hakonen
– Department of I.T., University of Turku, Finland
– Smed: computer games, optimization, and scheduling algorithms,
doctoral degree from university of Turku <http://staff.cs.utu.fi/staff/jouni.smed/>
– Hakonen: Algorithms for Computer Games, String Algorithmic,
Software Construction and Object Integrity <
http://staff.cs.utu.fi/staff/harri.hakonen/>
• Section 10.5
– Neil Kirby
• Lucent Tech, Bell Labs, [email protected], MS in Comp Sci, Ohio
State University.
• currently develops .NET solutions.
• built speech recognition software and teaching at the university
level.
• In his spare time he designs multi-player, tactical combat
computer games. He especially enjoys writing programs for
computer opponents that play well without cheating.
• 10.5 Getting around the Limits of Machine
Learning
Categories of Limitations
• Four Questions
•
•
•
•
Is it cheap to see (recognize) the thing to learn from?
Is it cheap to store the knowledge?
Is it cheap to use the knowledge?
Does learning make the game better?
Divisions of Problem Space
• Implicit vs. explicit learning
– Implicit means the game is self motivated to
learn
– Explicit means the game is told to learn
• Learn in the field vs. in development
– Flexibility vs. risky quality assurance
Examples of Learning
• Simple question:
– Implicit or explicit learning? Learning in field or development?
– Is it cheap to see/store/use knowledge and does it improve the game?
– How to deal with quality assurance?
• Black & White
– Remember in the last class when the human player teaches his
monster to throw stone at the village?
1.
Explicit and In field learning
2.
Cheap to see ( just copy the player), Cheap to store (proven by
example), Cheap to use (game runs well on target machines)
3.
Improve the game (sure! Increase enjoyment of replaying)
4.
To avoid quality assurance problem, creatures are programmed with
innate behaviors that constrain learning
Example of Learning
• Command & Conquer Renegade
– A feature that is turned off in the shipped version is a
code that copies player movement and add new paths to
the pathfinder.
1.
Implicit and in field learning
2.
Cheap to see ( computationally easy)
3.
Cheap to store (free! Pathfinder already stores this type of info)
4.
Cheap to use (certainly! Just add some more paths to pathfinder)
5.
Improve the game (makes AI more intelligent)
6.
No quality assurance concern because it makes use of original, safe
component (pathfinder) of the game
Example of Learning
Re-volt (racing game)
–
A genetic algorithm is used during
development to tune parameters and
reduce lap times
–
Implicit & in development learning
–
Cheap to see
• ( during game has no cost, but
during development it is costly to run
GA)
–
Cheap to store
• ( free! Just change the parameters)
–
Cheap to use
• (during game has no cost, but during
development it takes a long time)
–
Improve the game
• (makes AI more intelligent)
–
No quality assurance concern
Limits to Learning?
• What are some limits you can think of now?
Most limitations fall into these categories
1. Knowledge representation
2. Recognizing the knowledge Is the knowledge cheap to see?
3. Storing the knowledge
 Is the knowledge cheap to store?
4. Using the knowledge
Is the knowledge cheap to use?
The last 3 are simply the first 3 questions we have been
talking about
Knowledge Representation
• Consider music coding
– One channel of CD-quality music
• requires 88,200 bytes / sec regardless of content
– MIDI
• a few events per note,
• most music is roughly a few notes per second.
– Printed sheet music
• one event per note
Limit 1: the higher the level of representation, the
easier the knowledge to work with, but the more
expensive to acquire
Seeing More Clearly
• Limit 2: AI must store information to deal with the
past
• The present
– Easier to learn than past
• B&W ties feedback to behavior temporally
• C&C Renegade reduces mouse-and-keyboard input into paths,
and pathfinder stores new area connectivity
• Re-Volt uses the lap timer to differentiate good parameters from
suboptimal ones.
• The past
– Racing game, Backgammon are not history dependent
– For history dependent games, it is much more difficult to
understand causal chain in history
Seeing More Clearly
• Limit 3: Instance based methods learn
wrong things from data
– Neural nets, Markov models
– Need good training data that resembles testing
– Cannot differentiate similar (but different) data
even if trained with it
Seeing More Clearly
• Examples of cases that are hard to learn in instancedbased learning
1. Neural Networks to train tank recognition
2. Pronunciation of One, Two, and Three in German
http://www.bbc.co.uk/languages/german/lj/language_notes/1_10.shtml
Storing new Knowledge
• Limit 4: AI cannot do sophisticated algorithm
on the fly (thus we store knowledge)
• Avoid over-fitting (learning the wrong thing)
– Neural network, Markov
– Improvement: learn from just the new datum
– Overfitting because of small new dataset!
– Neural network can compensate by temporal
differences
Using new Knowledge
• Limit 5: If games are not designed to use
new knowledge, new storage and capability
will need to be added
• Games should design to exploit new
knowledge in an integrated fashion
– C&C again, where new data is integrated into
pathfinder
– B&W, new data determines behavior of monster
Conclusion
• Machine Learning present in
– In field (player may or may not be aware)
– In development
• Knowledge can be learned
– Explicitly or implicitly
• Limits of Machine Learning
– The cost to see / recognize knowledge
– The cost to store
– The cost to use
• Does it make the game better
Limits in Use of Pattern
Recognition
• Limits of Machine Learning
– The cost to see / recognize knowledge
• Expensive!!
– The cost to store
– The cost to use  Could be expensive
– Does it make the game better  hopefully
• Usually in development
– Black and White is in field
• “Unfortunately, in the development of AIs for CGs the scope
of pattern recognition has not been widely realized…Our
motivation is to do pattern recognition ourselves to discern
where pattern recognition can be applied in CG” [Hakonen]
• Many classic academic games such as Chess (Deep Blue),
Backgammons, Go made use pattern recognition and
achieved good result.
• Many games do use neural networks, but does not
necessarily use pattern recognition.
• 10.3 Understanding Pattern Recognition
Methods
Intro to Pattern Recognition 1
• Wikipedia:
• Pattern recognition … can be defined as "the
act of taking in raw data and taking an action
based on the category of the data" [1].
– image analysis, character recognition, speech
analysis, man and machine diagnostics, person
identification and industrial inspection.
Intro to Pattern Recognition 2
• Textbook Definition (in the context of CG):
– To abstract relevant information from the game
world and, based on the retrieved information,
construct concepts and deduce patterns for the
use of higher level reasoning and decisionmaking systems.
Intro to Pattern Recognition 3
Abstract
relevant info,
compare to old
info in KB,
deduce pattern
Choose from a
list of possible
actions
available in the
current state,
balance it to the
requested state
Questions
1. Why do we need a decision-making system?
Why don’t we just simply map patterns to some
reaction? (Hint: Do we know everything about
the environment to make the decision?)
The world is not deterministic..
-Built-in randomness
-Human players’ action
2. Where do you think PR can be applied for
computer games?
Intro to Pattern Recognition 4
• Some suggested uses of pattern recognition:
– During game:
•
•
•
•
•
Enemy evaluation and prediction
Coaching
Group coordination
Terrain analysis
Learning
– Black and White
– Command and Conquer Renegade (attempted)
– During development
• Re-Volt
• Functional Approach.
• How is the use of PR differ with…
– Levels of decision making ?
– Stance towards the player ?
Functional – Level of Decision Making
• Suitability of pattern recognition depends on the level of
decision making:
– Strategic Level
– Tactical Level
– Operational Level
Functional– Level of Decision Making
Level of
decision
making
Duration
Size of
Plan
Strategic
Long-term All
players
Offline /
extensive
use
terrain is analyzed to
find advantageous
positions to plan the
maneuvers
Tactical
medium
Delivered
real time /
some use
decisions made on the
engaging platoons and
the battleground
A group
Speed/ Use
of Pattern
Recognition
Strategic Game
Example
Operational immediate Individual Made in real To shoot, dodge, or
time / no
charge
use
Functional - Stance
• Enemy
– Provide Challenge
– Demonstrate Intelligent (at least purposeful)
behavior
– PR to aid computer’s decision making
• Prediction and production
Functional - Stance
– Ally,
• Augment user interface
– Hints and guide
• Aiding the human
player
• End result should be
visually accessible and
consistent, but not
necessarily complete
Functional - Stance
Neutral :
-Context dependent
-Commentory
-highlighting events and
providing background
highlighting events and
background information
- soccer referee
-Judging rule violation
Functional– Stance
– Modeled Language?
• A generator labels events and states with symbol
• Modeling recognizes the underlying dependencies
between symbols
• Short term history sufficient
Functional–
Prediction
– Predict next symbol, calculate probability of the
next action of opponent
Functional–
Production
– observe opponent and produce next symbol
(memcpy)
• PR as these problems:
– Optimization
– Adaptation
– Uncertainty (not going to talk about)
• Algorithm used in each problem area
• How the level of decision making decides
what PR algorithm to use
Methodology - Optimization
Optimization
– an objective function (to max or min)
– A set of variables
– A set of constraints
• Pattern recognition as an optimization
problem is to have an objective function to
rank the solution candidates
Methodology - Optimization
Methodology - Optimization
• Project # 6: tweak AI to beat some opponents
• Marc’s presentation: the tweaking could be
done automatically
• This is related to pattern recognition
– Trying to identify strengths and weaknesses of
opponents
Methodology - Optimization
• Age of Empire Example
– Objective: Balance civilizations and units
– A combat comparison simulator is used to test battles
with different troop combinations
– The set of variables: Attributes (armor, hit point, damage,
range)
– The set of constraints: the range of allowed values
– Objective Function: min(difference of the number of
victories of different civilizations in simulator battles)
– Attributes are changed to even out discrepancies
These slides are taken from [Street]
Methodology - Optimization
• How to find more optimal variable values?
• Heuristic functions
– Fastest and simplest
– Get stuck at local optimum before finding global
– Use multiple search traces instead of one
Methodology - Optimization
• How to find more optimal variable values and
avoid local optimum?
• Genetic Algorithm
– Avoid local optimum. A population of candidate
solutions, go thru stages of natural selection,
objective function is to weed out the weak
candidates
– Vulnerable to dependency of variables, CPU
exhaustive
Methodology - Optimization
• Another way to find more optimal variable
values and avoid local optimum?
• Swarm Algorithm
– Avoid local optimum (if min speed is set)
– faster than genetic algorithm
– next 3 slides on Particle Swarm Optimization
Particle Swarm Optimization 1
– PSO is very similar to genetic algorithm, but does not
have genetic operators like crossover and mutation.
Particles update themselves with the internal velocity.
– The idea is similar to bird flocks searching for food, don’t
know where the food is, but know how far from it
Bird = particle, Food = optimal solution
pbest = the best solution (fitness) a particle has
achieved so far.
gbest = the global best solution of all particles
– pbest and gbest are stored in memory
Particle Swarm Optimization 2
• v[] = v0[] + c1 * rand() * (pbest[] - present[]) +
c2 * rand() * (gbest[] - present[]) -----------(a)
• present[] = present0[] + v[] ------------------(b)
v[]
present[]
rand ()
c1, c2
= the particle velocity
= the current particle (solution).
= a random number between (0,1).
= learning factors. usually c1 = c2 = 2.
http://uk.geocities.com/markcsinclair/pso.html
Optimization & the Level of Decision
Making
• Strategic Level:
– Computationally hard to solve
– With relaxed constraint that the variables are not
interdependent, then genetic algorithm
• Tactical Level:
– Must be responsive
– Single trace, heuristic functions
• Operational Level:
– Real-time,
– simple objective function (switch statement)
Methodology - Adaptation
• Adaptation
– The ability to make appropriate responses to
changed or changing circumstances
• Try to model the originator of the modeled
data
Methodology - Adaptation
• Adaptation vs.
Optimization
– Adaptation
• looks for a
function behind
a solution,
– Optimization
• looks for a
solution for a
given function
Methodology - Adaptation
• When is adaptation useful? Why is it hard to
use?
1. Useful when the affecting factors or mechanisms behind
the phenomena is unknown or dynamic
2. Hard to use because it requires sampling the search
space to cover sufficiently. The more complex the cause
is, the sparser the our sample gets (combinatorial
explosion)
Methodology - Adaptation
• Neural Networks
– Can adapt to situation where we do not have background
knowledge of dependencies
– Supervised (predefined categories for results) or
unsupervised learning
– For more information, please read Chp11
• Hidden Markov Model
– System is conditionally independent of the past states
– each state has a probability distribution over the possible
output tokens
– the challenge is to determine the hidden parameters from
the observable parameters
– Can adapt to recurring structure
Adaptation and the level of decision making
• Strategic Level
– Supervised or unsupervised learning
– Can afford great computational demand
• Tactical Level
– Hidden Markov models
– More dynamic environment
– Credibility of results can be evaluated
• Operation Level
– Stochastic interpretation for input data, or
– Use a ready-adapted neural network
Conclusion
• Pattern Recognition
–
–
–
–
PR is not well explored in gaming yet
Biggest limit is computational complexity
A lot of time is domain dependent
Algorithms used depend highly on required
responsiveness
– Functional Approach
• Level of decision making
• Stance
– Methodological Approach
• Optimization
• Adaptation
• Uncertainty ( not discussed )
References
•
Hu, Xiaohui. “Particle Swarm Optimization Tutorial,”
http://www.swarmintelligence.org/tutorials.php
•
Fraser, Neil. “Neural Network Follies.” http://neil.fraser.name/writing/tank/
•
German steps, number 1-10.
http://www.bbc.co.uk/languages/german/lj/language_notes/1_10.shtml
•
Shutton, R. “Learning to Predict by the Method of Temporal Differences,”
ftp://ftp.cs.umass.edu/pub/anw/pub/sutton/sutton-88.ps.gz
•
Kidd, Petersen, Street, “How to Balance a Real-Time Strategy Game: Lessons from the
Age of Empires Series,” http://www.gdconf.com/archives/2001/gstreetprintable3.ppt
•
Timo Kaukoranta, Jouni Smed, Harri Hakonen. “Role of Pattern Recognition in Computer
Games.” http://staff.cs.utu.fi/staff/jouni.smed/papers/PRinCG.pdf