Nature Genetic Algorithm
Machine Learning
Robin Burke
GAM 376
Outline
Terminology
Parameter learning
Hill-climbing
Simulated annealing
Decision learning
Decision trees
Neural networks
Genetic algorithms
Terminology
On-line learning
learning that takes place while the system is operational
Off-line learning
a separate learning step that occurs using stored data
Over-fitting
arises when a learner adapts too completely to off-line data
can't cope with novel situations
Fitness landscape
A way to visualize the learning process
Fitness = good outcomes
n dimensions along which to adjust behavior
which direction gets us to the "peak"?
Parameter learning
A variable or set of variables
Different performance at different points in the parameter space
Example
Parameters
• max. kick velocity
• max. running velocity
• max. steering force
Performance metric
• time of possession
Hill Climbing
We want to find the best combination of values
cannot simply try them all
there may be many, many possibilities
there may be many dimensions
each test may be expensive
Simple idea
pick a point
test its fitness
test neighbors in each dimension
• or in some nearby region
move to the point that is most fit
repeat
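A minimal sketch of this loop in Python (the fitness function, fixed step size, and iteration cap are illustrative assumptions, not from the slides; step-size issues are discussed next):

    import random

    def hill_climb(fitness, point, step=0.5, iterations=100):
        # Greedy hill climbing: repeatedly move to the fittest neighbor.
        best = fitness(point)
        for _ in range(iterations):
            # test neighbors one step away in each dimension
            neighbors = []
            for i in range(len(point)):
                for delta in (-step, step):
                    candidate = list(point)
                    candidate[i] += delta
                    neighbors.append(candidate)
            score, top = max((fitness(p), p) for p in neighbors)
            if score <= best:
                break  # no neighbor is fitter: we are on a peak (maybe only a local one)
            best, point = score, top
        return point, best

    # example: maximize f(x, y) = -(x - 3)^2 - (y + 1)^2, whose peak is at (3, -1)
    print(hill_climb(lambda p: -(p[0] - 3)**2 - (p[1] + 1)**2, [0.0, 0.0]))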
Step size
How near do I look?
step size too small = lots of tests
step size too big = might miss a narrow peak
Fix: adaptive resolution
• at the beginning of the search, large step size
• at the end of the search, small step size
Local optima
Can get stuck on a "foothill" (a local peak) even if there is a better peak elsewhere
Fixes
Momentum
• prefer the direction of past movement
• sometimes overshooting will carry past a local optimum
Random restart
• repeat many times with different random starting points
Simulated annealing
Annealing
process of cooling metal slowly
creates better crystalline structure
Why is this?
heat is essentially random motion of molecules
the lowest-energy state of the system is the most stable
• no energy to be gained by changing things
if heat is removed suddenly
• the system may be "frozen" in an unstable state
Learning analogy
Introduce "heat"
randomness in state change
add a random term to fitness assessment
we don't always go "up" the real fitness gradient
After k steps of climbing
reduce the temperature
the process converges to regular hill-climbing
• when temp = 0
but the randomness makes sure that more of the parameter space is explored
• harder to miss big optima
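A minimal sketch in Python (the Gaussian move size, cooling rate, and step count are illustrative assumptions):

    import math, random

    def simulated_annealing(fitness, point, temp=1.0, cooling=0.99, steps=2000):
        current = fitness(point)
        for _ in range(steps):
            # random neighbor: perturb each parameter a little
            candidate = [x + random.gauss(0, 0.1) for x in point]
            score = fitness(candidate)
            # always accept improvements; accept worse moves with a
            # probability that shrinks as the temperature drops
            if score > current or random.random() < math.exp((score - current) / temp):
                point, current = candidate, score
            temp *= cooling  # cool down: at temp near 0 this is plain hill climbing
        return point, current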
Decision learning
Game AI makes many decisions
when to select which weapon
when to pass, shoot or dribble
when to switch to attack or retreat state
We can think of these choices as rule-governed
if a, b, c have particular values, then do x
if a, b, c have other values, then do y
Or we can think of them as a divided space
a, b and c are dimensions
x and y are regions of the space
• where different behaviors are appropriate
Example
We want to learn a boundary between the regions
Then we will know, for any given parameter combination, what decision to make
Classification learning
Feedback
Most learning methods are "supervised"
the learning algorithm gets examples
• sets of parameters
and is told what the right decision is
• in each case
The goal is to be able to produce the right classification
for examples not seen originally
Dimensionality
The more dimensions, the harder the decision boundary is to learn
Example
learn how to play Counter-Strike
• too many dimensions of input
• too many kinds of decisions
learn where to expect an enemy ambush
• input = map features
• classification is simple
Better to learn bits and pieces than full-scale behavior
not very much like human learning
Decision trees
The idea
produce a tree where each node tests a feature value
the tests discriminate your concept
Learn the decision tree by examining examples
looking for features that divide up the space of examples
Example
Data

     #  class  body covering  habitat  flies?  breathes air?
     1  m      hair           land     no      yes
     2  m      hair           land     yes     yes
     3  m      other          water    no      yes
     4  b      feathers       land     yes     yes
     5  b      feathers       land     no      yes
     6  f      scales         water    no      no
     7  f      scales         water    yes     no
     8  f      other          water    no      no
     9  f      scales         water    no      yes
    10  r      scales         land     no      yes

(class: m = mammal, b = bird, f = fish, r = reptile)
ID3 Algorithm
Start with an initial node containing all the examples
Choose the feature value that splits the examples most nearly in half
For each subnode, do this again
Continue until all of the examples in a node have the same class
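A compact recursive sketch in Python (simplified; it assumes each example is a dict with a "class" key, and it picks splits by the entropy-based gain defined on the next slides rather than a literal "split in half"):

    import math
    from collections import Counter

    def entropy(examples):
        counts = Counter(e["class"] for e in examples)
        return -sum(c / len(examples) * math.log2(c / len(examples))
                    for c in counts.values())

    def id3(examples, attributes):
        # pure node (or nothing left to split on): return the majority class
        if len({e["class"] for e in examples}) == 1 or not attributes:
            return Counter(e["class"] for e in examples).most_common(1)[0][0]
        def split_entropy(attr):
            # weighted entropy of the subsets produced by splitting on attr
            total = 0.0
            for v in {e[attr] for e in examples}:
                subset = [e for e in examples if e[attr] == v]
                total += len(subset) / len(examples) * entropy(subset)
            return total
        best = min(attributes, key=split_entropy)   # lowest remaining entropy = highest gain
        return {best: {v: id3([e for e in examples if e[best] == v],
                              [a for a in attributes if a != best])
                       for v in {e[best] for e in examples}}}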
Information theory
Information as a measurable quantity
"a difference that makes a difference"
Measured by the number of bits needed to send a message
Example
a language with the symbols A, B, C, D
might need two bits per symbol: A=01, B=10, C=00, D=11
What if you know something about the distribution?
A is common (1/2), B less common (1/4), C and D are rare (1/8 each)
More efficient coding
A=0, B=10, C=110, D=111
average length = (1/2)(1) + (1/4)(2) + (1/8)(3) + (1/8)(3) = 1.75 bits per symbol
Entropy
What is the smallest number of bits needed to send a message drawn from a given distribution?
depends on the possible symbols and their frequencies
If the symbols all have the same frequency
more information is needed to distinguish them
If some symbols are rare and some high-frequency
less information is needed
This measure is called entropy
in some sense, it measures the "disorder" of the distribution
Information gain
The difference between entropy values is called information gain
If one distribution has a lower entropy than another, we say it has gained information
In ID3
we want each node to eventually have 0 entropy
all of the examples agree on the action
Example

    #  Health   Position  Ammo       Action
    1  Healthy  In Cover  With Ammo  Attack
    2  Hurt     In Cover  With Ammo  Attack
    3  Healthy  In Cover  Empty      Defend
    4  Hurt     In Cover  Empty      Defend
    5  Hurt     Exposed   With Ammo  Defend
Entropy
E = -pA log2 pA - pD log2 pD
where pA and pD are the proportions of Attack and Defend examples
Entropy of the whole set: E = -(2/5) log2(2/5) - (3/5) log2(3/5) = 0.971
Question: what are the entropies of the subsets?
E(healthy) = 1, E(hurt) = 0.918
E(cover) = 1, E(exposed) = 0.0
E(ammo) = 0.918, E(empty) = 0.0
Information gain
Which attribute divides the set into the lowest-entropy pieces?
Current entropy minus the entropy of each subset, weighted by the fraction of the examples it contains:
G = E - (|T|/|S|) ET - (|~T|/|S|) E~T
where S is the whole set and T and ~T are the subsets produced by the split
Example
G(health) = 0.020
G(cover) = 0.171
G(ammo) = 0.420
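A few lines of Python reproduce these numbers from the table above (names are illustrative):

    import math

    def entropy(labels):
        return -sum(labels.count(c) / len(labels) * math.log2(labels.count(c) / len(labels))
                    for c in set(labels))

    # (health, position, ammo, action), one tuple per example above
    data = [("Healthy", "In Cover", "With Ammo", "Attack"),
            ("Hurt",    "In Cover", "With Ammo", "Attack"),
            ("Healthy", "In Cover", "Empty",     "Defend"),
            ("Hurt",    "In Cover", "Empty",     "Defend"),
            ("Hurt",    "Exposed",  "With Ammo", "Defend")]

    def gain(col):
        g = entropy([row[3] for row in data])               # entropy of the whole set
        for value in {row[col] for row in data}:
            subset = [row[3] for row in data if row[col] == value]
            g -= len(subset) / len(data) * entropy(subset)  # weighted subset entropy
        return g

    print([round(gain(c), 3) for c in (0, 1, 2)])   # [0.02, 0.171, 0.42]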
Result
Ammo?
  No  → Defend
  Yes → Cover?
    No  → Defend
    Yes → Attack
Extensions
There are extensions to deal with
multiple classes
discrete attributes
continuous attributes
• with some caveats
Neural Networks
Biological inspiration
(Figure: a neuron, with its axon, dendrites, and synapses labeled.)
Information transmission happens at the synapses.
Biological inspiration
• Tiny voltage spikes travel along the axon of the presynaptic neuron.
• Trigger the release of neurotransmitter substances at the synapse.
• Cause excitation or inhibition in the dendrite of the postsynaptic neuron.
• Integration of the signals may produce spikes in the next neuron.
• Depends on the strength of the synaptic connection.
Artificial neurons
Neurons work by processing information. They receive and provide information in the form of voltage spikes.
The McCulloch-Pitts model: inputs x1, ..., xn arrive with synaptic weights w1, ..., wn; the neuron computes
z = Σ (i = 1..n) wi xi ;  y = H(z)
and emits the output y.
Artificial neurons
The McCulloch-Pitts model:
• spikes are interpreted as spike rates;
• synaptic strengths are translated into synaptic weights;
• excitation means a positive product between the incoming spike rate and the corresponding synaptic weight;
• inhibition means a negative product between the incoming spike rate and the corresponding synaptic weight.
Artificial neurons
Nonlinear generalization of the McCulloch-Pitts neuron:
y = f(x, w)
where y is the neuron's output, x is the vector of inputs, and w is the vector of synaptic weights.
Examples:
sigmoidal neuron: y = 1 / (1 + e^(-w^T x - a))
Gaussian neuron:  y = e^(-||x - w||^2 / (2a^2))
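These two neuron types are a few lines each in Python (a sketch, using plain lists as vectors):

    import math

    def sigmoidal_neuron(x, w, a):
        # y = 1 / (1 + e^(-w.x - a))
        z = sum(wi * xi for wi, xi in zip(w, x))
        return 1.0 / (1.0 + math.exp(-z - a))

    def gaussian_neuron(x, w, a):
        # y = e^(-||x - w||^2 / (2 a^2))
        d2 = sum((xi - wi) ** 2 for xi, wi in zip(x, w))
        return math.exp(-d2 / (2 * a * a))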
Artificial neural networks
(Figure: inputs feeding a network of linked neurons that produce an output.)
An artificial neural network is composed of many artificial neurons that are linked together according to a specific network architecture. The objective of the neural network is to transform the inputs into meaningful outputs.
Artificial neural networks
Tasks to be solved by artificial neural networks:
• controlling the movements of an agent based on self-perception and other information (e.g., environment information);
• deciding the category of situations (e.g., dangerous or safe) in the game world;
• recognizing a pattern of behavior;
• predicting what an enemy will do next.
Learning in biological systems
• Learning = learning by adaptation
• green fruits are sour
• yellowish/reddish ones are sweet
• adapt picking behavior
• Neural level
• changing of the synaptic strengths, eliminating some synapses, and building new ones
• Form of optimization
• best outcome for least resource usage
Learning principle for artificial neural networks
ENERGY MINIMIZATION
We need an appropriate definition of energy for artificial neural networks; having that, we can use mathematical optimization techniques to find how to change the weights of the synaptic connections between neurons.
ENERGY = measure of task performance error
Neural network mathematics
The network's output is built up layer by layer by composing the neuron functions. For a network with inputs x1, ..., x4:
First layer:  y1_i = f(x_i, w1_i), i = 1, ..., 4
Second layer: y2_i = f(y1, w2_i), i = 1, 2, 3, where y1 is the vector of first-layer outputs
Output:       yOut = f(y2, wOut), where y2 is the vector of second-layer outputs
MLP neural networks
MLP = multi-layer perceptron
Perceptron: yout = w^T x
MLP neural network:
First hidden layer:  y1_k = 1 / (1 + e^(-w1_k^T x - a1_k)), k = 1, 2, 3;  y1 = (y1_1, y1_2, y1_3)^T
Second hidden layer: y2_k = 1 / (1 + e^(-w2_k^T y1 - a2_k)), k = 1, 2;  y2 = (y2_1, y2_2)^T
Output: yout = Σ (k = 1..2) w3_k y2_k = w3^T y2
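The same 3-unit, 2-unit, linear-output stack written out in Python (a sketch; the weights and biases would normally come from training):

    import math

    def sigmoid(z):
        return 1.0 / (1.0 + math.exp(-z))

    def layer(inputs, weights, biases):
        # one sigmoidal layer: y_k = 1 / (1 + e^(-w_k . inputs - a_k))
        return [sigmoid(sum(w * v for w, v in zip(wk, inputs)) + ak)
                for wk, ak in zip(weights, biases)]

    def mlp(x, w1, a1, w2, a2, w3):
        y1 = layer(x, w1, a1)                       # first hidden layer, 3 units
        y2 = layer(y1, w2, a2)                      # second hidden layer, 2 units
        return sum(w * y for w, y in zip(w3, y2))   # linear output: w3 . y2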
Neural network tasks
• control
• classification
• prediction
• approximation
These can all be reformulated in general as FUNCTION APPROXIMATION tasks.
Approximation: given a set of values of a function g(x), build a neural network that approximates the g(x) values for any input x.
Neural network approximation
Task specification:
Data: a set of value pairs (xt, yt), where yt = g(xt) + zt and zt is random measurement noise.
Objective: find a neural network that represents the input/output transformation (a function) F(x, W) such that F(x, W) approximates g(x) for every x.
Learning to approximate
Error measure:
E = (1/N) Σ (t = 1..N) (F(x_t; W) - y_t)^2
Rule for changing the synaptic weights (gradient descent):
Δw_i^j = -c · ∂E(W)/∂w_i^j
w_i^j,new = w_i^j + Δw_i^j
c is the learning parameter (usually a constant)
Learning with backpropagation
A simplification of the learning problem:
• calculate first the changes for the synaptic weights of the output neuron;
• calculate the changes backward starting from layer p-1, and propagate the local error terms backward.
The method is still relatively complicated, but it is much simpler than the original optimization problem.
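A minimal sketch of the weight-update rule in Python. For brevity this estimates ∂E/∂w by finite differences instead of propagating error terms backward; real backpropagation computes the same gradients analytically, layer by layer:

    def train(F, W, data, c=0.1, epochs=100, h=1e-5):
        # gradient descent on E(W) = (1/N) * sum over (x, y) of (F(x, W) - y)^2
        def E(weights):
            return sum((F(x, weights) - y) ** 2 for x, y in data) / len(data)
        for _ in range(epochs):
            grad = []
            for i in range(len(W)):
                Wp = list(W)
                Wp[i] += h
                grad.append((E(Wp) - E(W)) / h)       # numerical dE/dw_i
            W = [w - c * g for w, g in zip(W, grad)]  # w_new = w - c * dE/dw
        return W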
Issues with ANN
Often many iterations are needed
1000s or even millions
Overfitting can be a serious problem
No way to diagnose or debug the network
must relearn
Designing the network is an art
input and output coding
layering
often learning simply fails
Stability vs. plasticity
learning is usually one-shot
cannot easily restart learning with new data
Genetic Algorithms
A class of probabilistic optimization algorithms
Inspired by the biological evolution process
Uses concepts of "Natural Selection" and "Genetic Inheritance" (Darwin 1859)
Originally developed by John Holland (1975)
GA overview (cont)
Particularly well suited for hard problems where little is known about the underlying search space
Widely used in business, science, and engineering
starting to be used in games and digital cinema
Definition
A genetic algorithm maintains a population of candidate solutions for the problem at hand, and makes it evolve by iteratively applying a set of stochastic operators.
Stochastic operators
Selection replicates the most successful solutions found in a population at a rate proportional to their relative quality
Recombination decomposes two distinct solutions and then randomly mixes their parts to form novel solutions
Mutation randomly perturbs a candidate solution
The Metaphor
Genetic Algorithm ↔ Nature
Optimization problem ↔ Environment
Feasible solutions ↔ Individuals living in that environment
Solution quality (fitness function) ↔ Individual's degree of adaptation to its surrounding environment

The Metaphor (cont)
A set of feasible solutions ↔ A population of organisms (species)
Stochastic operators ↔ Selection, recombination and mutation in nature's evolutionary process
Iteratively applying a set of stochastic operators on a set of feasible solutions ↔ Evolution of populations to suit their environment
The Metaphor (cont)
The computer model introduces simplifications (relative to the real biological mechanisms), BUT surprisingly complex and interesting structures have emerged out of evolutionary algorithms.
Simple Genetic Algorithm
produce an initial population of individuals
evaluate the fitness of all individuals
while termination condition not met do
    select fitter individuals for reproduction
    recombine between individuals
    mutate individuals
    evaluate the fitness of the modified individuals
    generate a new population
end while
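That loop, as a minimal Python sketch for bit-string individuals (the population size, rates, and the assumption of non-negative fitness are illustrative):

    import random

    def genetic_algorithm(fitness, length=16, pop_size=30,
                          generations=100, mutation_rate=0.01):
        # produce an initial population of individuals
        pop = [[random.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
        for _ in range(generations):
            scores = [fitness(ind) for ind in pop]   # evaluate fitness
            total = sum(scores)
            def select():
                # roulette-wheel selection: chance proportional to fitness
                r = random.uniform(0, total)
                for ind, s in zip(pop, scores):
                    r -= s
                    if r <= 0:
                        return ind
                return pop[-1]
            new_pop = []
            while len(new_pop) < pop_size:
                cut = random.randrange(1, length)            # one-point crossover
                child = select()[:cut] + select()[cut:]
                child = [b ^ 1 if random.random() < mutation_rate else b
                         for b in child]                     # bit-flip mutation
                new_pop.append(child)
            pop = new_pop                                    # generate a new population
        return max(pop, key=fitness)

    # example: evolve toward the all-ones string
    best = genetic_algorithm(fitness=sum)
    print(best, sum(best))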
The Evolutionary Cycle
(Figure: the population is initiated and evaluated; selection picks parents; modification produces offspring; the evaluated offspring re-enter the population; deleted members are discarded.)
A “Population”
Ranking by Fitness
Mate Selection: the fittest are copied and replace the less fit
Mate Selection Roulette: increases the likelihood of, but does not guarantee, reproduction by the fittest
(Figure: an example roulette wheel with shares 38%, 25%, 16%, 11%, 7%, 3%, 0%)
Crossover: exchanging information through some part of the representation
Mutation: random change of binary digits from 0 to 1 and vice versa (to avoid local optima)
(Figure: the GA cycle converging on the best design)
Example: Karl Sims's evolved creatures (MIT Media Lab)
(Figure: a typical "chromosome" encoding)
Game Applications
Generate behavioral rules
not all the same
reasonably effective
plausible in the environment
Also used in movies
battle scenes in "Lord of the Rings"
don't want every orc to move the same
don't want thousands of extras in costume
Issues
Creating the representation is an art
difficult to get right the first time
also tricky to choose the right combination of evolutionary operators
Computationally intensive
especially when determining "fitness" is complex
Many, many evolutionary steps required
Very problem-dependent
Rarely optimal
often odd artifacts develop and stay
• just like real evolution
usually in the region of optimality
Bot Project
Due Wednesday 5 pm
Tournament Thursday 11:45 in the Game Lab