Teacher Document - Computer Science


Tutorial Overview
• Tutorial Introduction
• Commercial Games
• Supervised Learning
• Machine Learning
• Coffee Break
• Reinforcement Learning
• Artificial Intelligence
• Unsupervised Learning
• Pathfinding
• Planning in Games
• Next Steps
• Testbeds
• Future Challenges
Test Beds for Artificial Intelligence
• Perfect instrumentation and measurements
• Perfect control and manipulation
• Reduced cost
• Reduced risk
• Great way to showcase algorithms
Improve User Experience
• Create adaptive, believable game AI
• Compose great multiplayer matches based on skill and social criteria
• Mitigate network latency using prediction
• Create realistic character movement
Partially observable stochastic games
• States are only partially observed
• Multiple agents choose actions
• Stochastic pay-offs and state transitions depend on the state and on all the other agents' actions
• Goal: optimise the long-term pay-off (reward)

Just like life: complex, adversarial, uncertain, and we are in it for the long run!
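In symbols, a minimal formalisation of this setting (the notation below is assumed for illustration, not taken from the slides):

```latex
% A partially observable stochastic game for n agents: states S,
% per-agent actions A_i and observations O_i, stochastic transitions T,
% and per-agent pay-offs R_i depending on everyone's actions.
\[
\mathcal{G} = \big\langle \mathcal{S},\, \{\mathcal{A}_i\},\, \{\mathcal{O}_i\},\, T,\, \{R_i\} \big\rangle,
\qquad
T(s' \mid s, a_1, \dots, a_n)
\]
% Each agent i seeks a policy maximising its expected discounted
% long-term pay-off:
\[
\max_{\pi_i}\; \mathbb{E}\Big[\sum_{t=0}^{\infty} \gamma^{t}\, R_i(s_t, a_{t,1}, \dots, a_{t,n})\Big],
\qquad 0 \le \gamma < 1
\]
```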
Approximations and Methods
• Reinforcement Learning
• Unsupervised Learning
• Supervised Learning
• Planning and Pathfinding

What is the best AI?
• Maximises its chances of winning
• Delivers the best entertainment value
• ???
[Figure: agents in games; a non-player character and a human player, contrasted between a game from 1977 and one from 2001]
Number of games consoles sold (May 2009):
• Wii: 50 million
• Xbox 360: 30 million
• PS3: 23 million

Worldwide video game revenues:
• 2002: $21.9 bln
• 2003: $23.3 bln
• 2004: $26.3 bln
• 2005: $27.7 bln
• 2006: $31.6 bln
• 2007: $41.9 bln
Hardware development
• Graphics (graphics cards, displays)
• Sound (5.1/7.1 surround, speakers, headphones)
• CPU speed, cache
• Memory, hard disks, DVD, HD DVD, Blu-ray
• Networking (broadband)

Software development
• Graphics (rendering etc.)
• Sound (sound rendering etc.)
• Physics simulation (e.g., the Havok engine)
• Networking software (e.g., Xbox Live)
• Artificial Intelligence
Creatures
• Objective is to nurture creatures called Norns
• Model incorporates artificial-life features
• Norns have neural-network brains
• Their development can be influenced by player feedback
Black & White
• Peter Molyneux's famous “God game”
• Player determines the fate of villagers as their “God” (seen as a hand)
• The creature can be taught complex behaviour
• Good and evil: actions have consequences
• Variety of tracks, drivers and road conditions
• Racing line provided by the author; a neural network keeps the car on the racing line
• Multilayer perceptrons trained with RPROP (see the sketch below)
• Simple rules for recovery and overtaking
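As a rough illustration of that recipe, here is a minimal sketch of a small multilayer perceptron trained with a simplified Rprop-style rule (Python/NumPy; the features, network size, and training signal are invented for illustration and are not taken from the game):

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented training data: each row holds (distance to racing line,
# heading error, speed); the target is a steering correction.
X = rng.normal(size=(256, 3))
y = (-1.2 * X[:, :1]) - 0.5 * X[:, 1:2]  # synthetic "expert" signal

# Small MLP: 3 inputs -> 8 tanh units -> 1 linear output.
W1, b1 = rng.normal(scale=0.1, size=(3, 8)), np.zeros(8)
W2, b2 = rng.normal(scale=0.1, size=(8, 1)), np.zeros(1)
params = [W1, b1, W2, b2]

# Rprop state: one step size per weight, plus the previous gradient.
steps = [np.full_like(p, 0.01) for p in params]
prev_grad = [np.zeros_like(p) for p in params]
ETA_PLUS, ETA_MINUS, STEP_MAX, STEP_MIN = 1.2, 0.5, 1.0, 1e-6

for epoch in range(200):
    # Forward pass.
    h = np.tanh(X @ W1 + b1)
    out = h @ W2 + b2
    err = out - y

    # Backward pass (mean squared error).
    d_out = 2 * err / len(X)
    gW2, gb2 = h.T @ d_out, d_out.sum(axis=0)
    d_h = (d_out @ W2.T) * (1 - h**2)
    gW1, gb1 = X.T @ d_h, d_h.sum(axis=0)
    grads = [gW1, gb1, gW2, gb2]

    # Rprop idea: adapt each step size from the gradient's sign alone;
    # grow it while the sign is stable, shrink it when the sign flips.
    for p, g, s, pg in zip(params, grads, steps, prev_grad):
        sign_change = np.sign(g) * np.sign(pg)
        s[sign_change > 0] = np.minimum(s[sign_change > 0] * ETA_PLUS, STEP_MAX)
        s[sign_change < 0] = np.maximum(s[sign_change < 0] * ETA_MINUS, STEP_MIN)
        p -= np.sign(g) * s  # move by the step size, not the magnitude
        pg[...] = g

print("final MSE:", float((err**2).mean()))
```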
Drivatar (Forza Motorsport)
• Adaptive avatar for driving
• Separate game mode
• Basis of all in-game AI
• Basis of the “dynamic” racing line

Xbox Game
• Dynamic Racing Line
• Learning a Drivatar
• Using a Drivatar
Ranking and Matchmaking for Xbox Live and Halo 3

Xbox 360 Live (launched 2005)
• Every game uses TrueSkill™ to match players
• > 20 million players, > 3 million matches per day
• > 2 billion hours of gameplay

Halo 3 (launched 2007)
• Largest entertainment launch in history
• > 200,000 players concurrently (peak: 1 million)

Halo 3 Game
• Matchmaking
• Skill Stats
• Tight Matches
Supervised Learning
• Learning an input-output relationship from examples
• Tasks: regression, classification, ranking
• Applications: skill estimation, behavioural cloning

Reinforcement Learning
• Learning policies from state-action-reward sequences
• Tasks: control, value estimation, policy learning
• Applications: learning to drive, learning to walk, learning to fight

Unsupervised Learning
• Learning the underlying structure from examples
• Tasks: clustering, manifold learning, density estimation
• Applications: modelling motion-capture data, user behaviour
Goal: learn from skilled players how to act in a first-person shooter (FPS) game

Test environment:
• Unreal Tournament FPS game engine
• Gamebots control framework

Idea: a naive Bayes classifier learns under which circumstances to switch behaviour

Variables:
• St: bot's state at time t
• St+1: bot's state at time t+1
• H: health level
• W: weapon
• OW: opponent's weapon
• HN: hear noise
• NE: number of close enemies
• PW: weapon close by?
• PH: health pack close by?
[Figure: graphical model in which the next state St+1 depends on the current state St, and the observations H, W, OW, HN, NE, PW, PH each depend on St+1 (the naive Bayes assumption)]
Naive Bayes (“inverse programming”)
• Specify the probabilities P(H | St+1) and P(St+1 | St)
• Use Bayes' rule to obtain P(St+1 | H, W, ...) etc.
Supervised learning
• Recognition of the state of the human trainer
• Reading out variables from the game engine
• Determine relative frequencies to estimate the probabilities P(H | St+1), stored as a table (see the counting sketch below)
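A minimal sketch of this relative-frequency approach (Python; the state names, observation variables, and traces are invented for illustration):

```python
from collections import Counter, defaultdict

# Hypothetical recorded traces of a skilled player: each entry is
# (previous_state, observations, next_state). Values are invented.
traces = [
    ("ATTACK", {"H": "low", "NE": "many", "PH": "yes"}, "RETREAT"),
    ("ATTACK", {"H": "high", "NE": "one", "PH": "no"}, "ATTACK"),
    ("EXPLORE", {"H": "low", "NE": "none", "PH": "yes"}, "GET_HEALTH"),
    # ... many more recorded (s_t, observations, s_t+1) triples
]

# Relative-frequency (maximum likelihood) estimates, as on the slide:
# P(s_t+1 | s_t) and P(obs_value | s_t+1) for each observation variable.
transition = Counter()
state_count = Counter()
obs_count = defaultdict(Counter)

for s_t, obs, s_next in traces:
    transition[(s_t, s_next)] += 1
    state_count[s_t] += 1
    for var, val in obs.items():
        obs_count[(s_next, var)][val] += 1

def posterior(s_t, obs, candidates):
    """Score each candidate next state with the naive Bayes product
    P(s_t+1 | s_t) * prod_i P(obs_i | s_t+1), then normalise."""
    scores = {}
    for s_next in candidates:
        p = transition[(s_t, s_next)] / max(state_count[s_t], 1)
        for var, val in obs.items():
            counts = obs_count[(s_next, var)]
            total = sum(counts.values())
            p *= (counts[val] + 1) / (total + 2)  # add-one smoothing
        scores[s_next] = p
    z = sum(scores.values()) or 1.0
    return {s: p / z for s, p in scores.items()}

print(posterior("ATTACK", {"H": "low", "NE": "many", "PH": "yes"},
                ["ATTACK", "RETREAT", "GET_HEALTH"]))
```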
[Diagram: Drivatar architecture. Recorded player driving feeds the Drivatar Learning System (a development tool), which produces the Drivatar Racing Line Behaviour Model; in-game, the behaviour model is combined with the “built-in” AI behaviour (vehicle interaction and racing strategy), and a controller turns the result into car behaviour: Drivatar AI driving]
Two-phase process:
1. Pre-generate possible racing lines prior to the race from a (compressed) racing table.
2. Switch between the lines during the race to add variability.

Compression reduces the memory needed per racing-line segment.
Switching makes the racing lines smoother (see the sketch below).
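A toy sketch of the switching step (Python; the segment data and blend window are invented): blend from one pre-generated line to another over a window of track segments so the switch stays smooth.

```python
import numpy as np

# Two hypothetical pre-generated racing lines, stored per track segment
# as lateral offsets from the track centre (invented data).
SEGMENTS = 20
line_a = np.sin(np.linspace(0, 2 * np.pi, SEGMENTS)) * 0.8
line_b = np.cos(np.linspace(0, 2 * np.pi, SEGMENTS)) * 0.6

def switched_line(switch_at, blend_len=5):
    """Follow line_a, then blend linearly into line_b over blend_len
    segments starting at switch_at, keeping the offset continuous."""
    line = line_a.copy()
    for i in range(switch_at, SEGMENTS):
        t = min((i - switch_at) / blend_len, 1.0)  # 0 -> 1 across window
        line[i] = (1 - t) * line_a[i] + t * line_b[i]
    return line

print(np.round(switched_line(8), 2))
```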
[Figure: a racing line divided into segments a1-a4]
Physics (“causal”): the physics simulation system maps the car position and speed at time t, the static car properties, and the car controls to the car position and speed at time t+1.

Control (“inverse”): the controller inverts this mapping; given the car position and speed at time t and the desired position and speed at time t+1, it produces the car controls (sketched below).
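A toy sketch of this causal/inverse pairing on a one-dimensional car model (Python; the physics model and its constants are invented stand-ins, not Forza's):

```python
DT = 0.1  # simulation step in seconds

def physics_step(pos, speed, throttle):
    """'Causal' direction: controls -> next state (toy point-mass model)."""
    accel = 4.0 * throttle - 0.1 * speed  # drive force minus simple drag
    new_speed = speed + accel * DT
    new_pos = pos + new_speed * DT
    return new_pos, new_speed

def controller(pos, speed, desired_speed):
    """'Inverse' direction: desired next state -> controls, found by
    solving the toy model's update equation for the throttle."""
    accel_needed = (desired_speed - speed) / DT
    throttle = (accel_needed + 0.1 * speed) / 4.0
    return max(-1.0, min(1.0, throttle))  # clamp to actuator limits

pos, speed = 0.0, 10.0
u = controller(pos, speed, desired_speed=10.3)
print(physics_step(pos, speed, u))  # hits the target within actuator limits
```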
Competition is central to our lives
• An innate biological trait
• The driving principle of many sports

Chess ratings for fair competition
• ELO: developed in 1960 by Árpád Imre Élő
• Matchmaking system for tournaments

Challenges of online gaming
• Learn efficiently from few match outcomes
• Support multiple teams and multiple players per team
Given:
• Match outcomes: orderings among k teams consisting of n1, n2, ..., nk players, respectively

Questions:
• A skill si for each player, such that we can derive:
  • A global ranking among all players
  • Fair matches between teams of players
Latent Gaussian performance model for fixed skills
Possible outcomes: player 1 wins over player 2 (and vice versa)

[Figure: TrueSkill factor graph with Gaussian prior factors over the skills s1, ..., s4; performance factors linking the skills to team performances t1, t2, t3; and ranking likelihood factors over the outcomes y12 and y23 between adjacent teams]

Fast and efficient approximate message passing using Expectation Propagation
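In symbols, the two-player case of this model reads as follows (the factorising structure follows the TrueSkill paper; the exact notation here is assumed):

```latex
% Gaussian skill prior and latent performance for each player i:
\[
s_i \sim \mathcal{N}(\mu_i, \sigma_i^2), \qquad p_i \sim \mathcal{N}(s_i, \beta^2)
\]
% Player 1 beats player 2 iff its performance is higher:
\[
y_{12} = \operatorname{sign}(p_1 - p_2)
\]
% Predicted probability that player 1 wins (marginalising s and p):
\[
P(y_{12} = 1) = \Phi\!\left( \frac{\mu_1 - \mu_2}{\sqrt{\sigma_1^2 + \sigma_2^2 + 2\beta^2}} \right)
\]
```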
Leaderboard
• Global ranking of all players

Matchmaking
• For gamers: most uncertain outcome
• For inference: most informative
• Both are equivalent!
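One concrete criterion along these lines is the draw-probability match quality from the TrueSkill paper; for two players (notation as above) it takes the form:

```latex
\[
q_{\text{draw}}(\beta, \mu_1, \mu_2, \sigma_1, \sigma_2) =
\sqrt{\frac{2\beta^2}{2\beta^2 + \sigma_1^2 + \sigma_2^2}}
\;
\exp\!\left(-\frac{(\mu_1 - \mu_2)^2}{2\,(2\beta^2 + \sigma_1^2 + \sigma_2^2)}\right)
\]
```

It is largest for players with similar means and low uncertainty, i.e. exactly the matches whose outcome is most uncertain and most informative.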
Data set: Halo 2 Beta
• 3 game modes: Free-for-All, Two Teams, 1 vs. 1
• > 60,000 match outcomes
• ≈ 6,000 players
• 6 weeks of game play
• Publicly available
[Figure: skill level (0-40) vs. number of games (0-400) for the players “char” and “SQLWildman”, each tracked with TrueSkill™ and with the Halo 2 rank]

[Figure: winning probability (0-100%) vs. number of games played (0-500), showing the probabilities that char wins, that SQLWildman wins, and that both players draw; 5/8 of the games were won by char]
• Golf (18 holes): 60 levels
• Car racing (3-4 laps): 40 levels
• UNO (chance game): 10 levels

Model the time-series of skills by smoothing across time (TrueSkill through time)
History of Chess
• 3.5M game outcomes (ChessBase)
• 20 million variables (each of 200,000 players in each year of their lifetime, plus latent variables)
• 40 million factors

[Figure: time-series factor graph in which each player i has a skill st,i and performance pt,i in year t, with smoothing factors linking st,i to st+1,i across years]
[Figure: skill estimates (roughly 1400-3000) over the years 1850-2006 for famous chess players, including Adolf Anderssen, Paul Morphy, Wilhelm Steinitz, Emanuel Lasker, Jose Raul Capablanca, Mikhail Botvinnik, Boris V Spassky, Robert James Fischer, Anatoly Karpov, and Garry Kasparov]
[Diagram: reinforcement learning loop. The Game emits the game state and a reward or punishment (e.g., +10.0); the Agent receives the game state and sends an action back to the Game; the Learning Algorithm observes game state, reward, and action, and applies parameter updates to the Agent]
[Table: example Q-table with one row per game state (separation 1 ft-6 ft, in modes GROUND and KNOCKED) and one column per action (THROW, KICK, STAND); sample entries include 13.2, 10.2, -1.3 and 3.2, 6.0, 4.0; the highlighted current state is at 3 ft separation]
Q(s,a) is the expected reward for taking action a in state s
• α is the learning rate
• a is the action chosen
• r is the reward resulting from a
• s is the current state
• s′ is the state after executing a
• γ is the discount factor for future rewards

Q-learning (off-policy):
\[
Q(s,a) \leftarrow Q(s,a) + \alpha\big[r + \gamma \max_{a'} Q(s',a') - Q(s,a)\big]
\]
SARSA (on-policy), where a′ is the action actually taken in s′:
\[
Q(s,a) \leftarrow Q(s,a) + \alpha\big[r + \gamma\, Q(s',a') - Q(s,a)\big]
\]
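To make the off-policy update concrete, here is a minimal tabular Q-learning sketch (Python; the environment stub, constants, and state names are invented for illustration and only loosely mirror the fighting-game example):

```python
import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.2
ACTIONS = ["THROW", "KICK", "STAND"]
Q = defaultdict(float)  # Q[(state, action)], initialised to 0

def choose_action(state):
    """Epsilon-greedy: explore sometimes, otherwise exploit Q."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def q_update(s, a, r, s_next):
    """Off-policy Q-learning update, as in the equation above."""
    best_next = max(Q[(s_next, a2)] for a2 in ACTIONS)
    Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])

def env_step(state, action):
    """Invented stub environment: returns (reward, next_state)."""
    reward = 10.0 if (state == "3ft/GROUND" and action == "THROW") else -1.0
    next_state = random.choice(["3ft/GROUND", "3ft/KNOCKED"])
    return reward, next_state

state = "3ft/GROUND"
for _ in range(5000):
    action = choose_action(state)
    reward, next_state = env_step(state, action)
    q_update(state, action, reward, next_state)
    state = next_state

print({a: round(Q[("3ft/GROUND", a)], 1) for a in ACTIONS})
```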
Reinforcement Learner

Game state features:
• Separation (5 binned ranges)
• Last action (6 categories)
• Mode (ground, air, knocked)
• Proximity to obstacle

Available actions:
• 19 aggressive (kick, punch)
• 10 defensive (block, lunge)
• 8 neutral (run)

Q-function representation:
• One-layer neural net (tanh)
In-Game AI Code

Reward for decrease in Wulong Goth's health:
• Early in the learning process …
• … after 15 minutes of learning

Punishment for decrease in either player's health:
• Early in the learning process …
• … after 15 minutes of learning
1. Collect experience
2. Learn transition probabilities and rewards
3. Revise value function and policy
4. Revise state-action abstraction
5. Return to 1 and collect more experience
(steps 2 and 3 are sketched below)
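A minimal sketch of steps 2 and 3 (Python/NumPy; the tiny tabular model and all numbers are invented): estimate a model from experience by relative frequencies, then run value iteration on it.

```python
import numpy as np

N_STATES, N_ACTIONS, GAMMA = 4, 2, 0.9

# Experience tuples (s, a, r, s') collected in step 1 (invented data):
experience = [(0, 1, 0.0, 1), (1, 1, 0.0, 2), (2, 0, 1.0, 3),
              (3, 0, 0.0, 0), (0, 0, -1.0, 0), (1, 0, 0.0, 1)]

# Step 2: relative frequencies -> transition probabilities, mean rewards.
counts = np.zeros((N_STATES, N_ACTIONS, N_STATES))
reward_sum = np.zeros((N_STATES, N_ACTIONS))
for s, a, r, s2 in experience:
    counts[s, a, s2] += 1
    reward_sum[s, a] += r
visits = counts.sum(axis=2)
P = counts / np.maximum(visits, 1)[:, :, None]   # P[s, a, s']
R = reward_sum / np.maximum(visits, 1)           # mean reward R[s, a]

# Step 3: value iteration on the learned model.
V = np.zeros(N_STATES)
for _ in range(100):
    Q = R + GAMMA * (P @ V)   # Q[s, a] = R[s, a] + gamma * sum_s' P * V
    V = Q.max(axis=1)
policy = Q.argmax(axis=1)
print("V:", np.round(V, 2), "policy:", policy)
```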
[Figure: discretising the state space over the features Speed and Left Distance, shown too coarse, just right, and too fine; the representational-complexity trade-off]
[Figure: revising the state abstraction by splitting and merging abstract states]
Real-time racing simulation.
Goal: lap times as fast as possible.
• Laser range finder measurements as features
• Progress along the track as reward

Available actions (see the stub below):
• Coast
• Accelerate
• Brake
• Hard-Left
• Hard-Right
• Soft-Left
• Soft-Right
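A sketch of how an agent might see this task (Python; the class and field names are invented for illustration, not the actual Xbox 360 implementation):

```python
from dataclasses import dataclass
import random

ACTIONS = ["Coast", "Accelerate", "Brake",
           "Hard-Left", "Hard-Right", "Soft-Left", "Soft-Right"]

@dataclass
class Observation:
    ranges: list      # laser range finder measurements (the features)
    progress: float   # progress along the track (drives the reward)

class RacingEnv:
    """Stub environment: replace step() with calls into the simulation."""
    def __init__(self):
        self.progress = 0.0

    def step(self, action):
        assert action in ACTIONS
        advance = random.uniform(0.0, 2.0 if action == "Accelerate" else 1.0)
        reward = advance              # reward = progress gained this step
        self.progress += advance
        ranges = [random.uniform(1.0, 30.0) for _ in range(16)]
        return Observation(ranges, self.progress), reward

env = RacingEnv()
obs, r = env.step("Accelerate")
print(len(obs.ranges), round(r, 2))
```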
2D Demo
• Principles explained

Xbox 360 Integration
• Efficient implementation
• Runs at 60 fps
Motion Capture
• Fix markers at key body positions
• Record their positions in 3D during motion
• Fundamental technology in animation today
• Free download of mocap files: www.bvhfiles.com
Gaussian Process Latent Variable Model (GPLVM)
• Generative model for dimensionality reduction
• Probabilistic equivalent of PCA which defines a probability distribution over data
• Non-linear manifolds based on kernels
• Visualisation of high-dimensional data
• Back-projection from latent space to data space
• Can deal with missing data
Both models share a linear mapping from latent variables x through a weight matrix W to the data y:
• Principal Component Analysis: marginalise over x and optimise W
• Gaussian Process Latent Variable Model: marginalise over W and optimise x
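In formulas (a standard presentation of this duality; the notation is assumed):

```latex
% Shared linear-Gaussian model: latent x, weights W, observed y.
\[
\mathbf{y} = W\mathbf{x} + \boldsymbol{\epsilon}, \qquad
\boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0}, \sigma^2 I)
\]
% Probabilistic PCA: place a prior on x, marginalise it, optimise W:
\[
\mathbf{x} \sim \mathcal{N}(\mathbf{0}, I)
\;\Rightarrow\;
\mathbf{y} \sim \mathcal{N}\big(\mathbf{0},\, W W^{\top} + \sigma^2 I\big)
\]
% GPLVM: place a prior on W instead, marginalise it, and optimise the
% latent positions X; each data dimension becomes a Gaussian process
% over X with a linear kernel, which the kernel trick generalises to
% non-linear manifolds:
\[
p(Y \mid X) = \prod_{d=1}^{D}
\mathcal{N}\big(\mathbf{y}_{:,d} \mid \mathbf{0},\, X X^{\top} + \sigma^2 I\big)
\]
```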