Nature Genetic Algorithm


Machine Learning
Robin Burke
GAM 376
Outline

- Terminology
- Parameter learning
  • Hill-climbing
  • Simulated annealing
- Decision learning
  • Decision trees
  • Neural networks
  • Genetic algorithms
Terminology

- On-line learning
  • learning that takes place while the system is operational
- Off-line learning
  • a separate learning step that occurs using stored data
- Over-fitting
  • arises when a learner adapts too completely to its off-line data
  • can't cope with novel situations
Fitness landscape

- A way to visualize the learning process
- Fitness = good outcomes
- n dimensions along which to adjust behavior
- Question: which direction gets us to the "peak"?
Parameter learning

- A variable or set of variables
- Different performance at different points in the parameter space

Example

- Parameters
  • max. kick velocity
  • max. running velocity
  • max. steering force
- Performance metric
  • time of possession
Hill Climbing

- We want to find the best combination of values
  • cannot simply try them all
  • there may be many, many possibilities
  • there may be many dimensions
  • each test may be expensive
- Simple idea
  • pick a point
  • test its fitness
  • test neighbors in each dimension (or in some nearby region)
  • move to the neighbor that is most fit
  • repeat
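The loop above can be sketched in a few lines of Python. This is a minimal illustration; the two-parameter fitness function is invented for the example, with a single peak at (1, 2).

```python
def hill_climb(fitness, start, step=0.25, iters=200):
    """Greedy hill climbing: test neighbors along each dimension,
    move to the fittest one, stop when no neighbor improves."""
    point = list(start)
    for _ in range(iters):
        best, best_fit = None, fitness(point)
        for i in range(len(point)):
            for delta in (-step, step):
                cand = list(point)
                cand[i] += delta
                f = fitness(cand)
                if f > best_fit:
                    best, best_fit = cand, f
        if best is None:          # no neighbor is fitter: we are at a peak
            break
        point = best
    return point

# Toy landscape (an assumption for the example) with its peak at (1, 2)
fit = lambda p: -((p[0] - 1) ** 2 + (p[1] - 2) ** 2)
peak = hill_climb(fit, [5.0, 5.0])
```

Note that each test of `fitness` here is cheap; when tests are expensive (as the slide warns), the inner loop over every dimension is exactly what makes hill climbing costly.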
Step size

- How near do I look?
  • step size too small = lots of tests
  • step size too big = might miss a narrow peak
- Fix: adaptive resolution
  • at the beginning of the search, large step size
  • at the end of the search, small step size
Local optima

- Can get stuck in a "foothill"
  • even if there is a better peak elsewhere
- Fixes
  • Momentum: prefer the direction of past movement; sometimes overshooting will carry past a local optimum
  • Random restart: repeat many times with different random starting points
Simulated annealing

- Annealing
  • process of cooling metal slowly
  • creates better crystalline structure
- Why is this?
  • heat is essentially random motion of molecules
  • the lowest-energy state of the system is the most stable (no energy to be gained by changing things)
  • if heat is removed suddenly, the system may be "frozen" in an unstable state
Learning analogy

- Introduce "heat"
  • randomness in state change
  • add a random term to fitness assessment
  • we don't always go "up" the real fitness gradient
- After k steps of climbing
  • reduce the temperature
  • the process converges to regular hill-climbing when temp = 0
- But the randomness makes sure that more of the parameter space is explored
  • harder to miss big optima
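A minimal one-dimensional sketch of this scheme: downhill moves are accepted with probability exp(delta / temp), and the temperature decays each step. The landscape, cooling rate, and other constants are invented for illustration; it has a foothill at x = -1 and a higher peak at x = 2.

```python
import math
import random

def anneal(fitness, start, step=1.0, temp=2.0, cooling=0.95, iters=500):
    """Simulated annealing: accept worse moves with probability
    exp(delta / temp); as temp -> 0 this converges to hill climbing."""
    random.seed(0)                       # deterministic for the example
    point, fit = start, fitness(start)
    best, best_fit = point, fit
    for _ in range(iters):
        cand = point + random.uniform(-step, step)
        delta = fitness(cand) - fit
        if delta > 0 or random.random() < math.exp(delta / temp):
            point, fit = cand, fitness(cand)
            if fit > best_fit:
                best, best_fit = point, fit
        temp = max(temp * cooling, 1e-9)  # cool down toward pure climbing
    return best

# Toy landscape: foothill at x = -1 (height ~1), true peak at x = 2 (height ~2)
def f(x):
    return math.exp(-(x + 1) ** 2) + 2 * math.exp(-(x - 2) ** 2)

best = anneal(f, start=-1.0)
```

Starting on the foothill, plain hill climbing would stay at x = -1; the early high-temperature phase gives the search a chance to cross the valley.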
Decision learning

- Game AI makes many decisions
  • when to select which weapon
  • when to pass, shoot or dribble
  • when to switch to attack or retreat state
- We can think of these choices as rule-governed
  • if a, b, c have particular values, then do x
  • if a, b, c have other values, then do y
- Or we can think of them as a divided space
  • a, b and c are dimensions
  • x and y are regions of the space where different behaviors are appropriate
Example

- We want to learn: a boundary between the regions
- Then we will know: for any given parameter combination, what decision to make
- This is classification learning
Feedback

- Most learning methods are "supervised"
  • the learning algorithm gets examples (sets of parameters)
  • and is told what the right decision is in each case
- The goal is to be able to produce the right classification
  • for examples not seen originally
Dimensionality

- The more dimensions, the harder the decision boundary is to learn
- Example
  • learn how to play Counter-Strike: too many dimensions of input, too many kinds of decisions
  • learn where to expect an enemy ambush: input = map features, classification simple
- Better to learn bits and pieces than full-scale behavior
  • not very much like human learning
Decision trees

- The idea: produce a tree
  • where each node tests a feature value
  • so that the tree discriminates your concept
- Learn the decision tree by examining examples
  • looking for features that divide up the space of examples
Example

Data:

 #   class  body covering  habitat  flies?  breathes air?
 1   m      hair           land     no      yes
 2   m      hair           land     yes     yes
 3   m      other          water    no      yes
 4   b      feathers       land     yes     yes
 5   b      feathers       land     no      yes
 6   f      scales         water    no      no
 7   f      scales         water    yes     no
 8   f      other          water    no      no
 9   f      scales         water    no      yes
10   r      scales         land     no      yes
ID3 Algorithm

- Start with an initial node (all examples)
- Choose a feature value that splits the examples most nearly in half
- For each subnode, do this again
- Continue until all of the examples in the node have the same class
Information theory

- Information as a measurable quantity
  • "a difference that makes a difference"
  • measured by the number of bits needed to send a message
- Example: a language with symbols A, B, C, D
  • might need two bits per symbol: A=01, B=10, C=00, D=11
- What if you know something about the distribution?
  • A is common (1/2)
  • B less common (1/4)
  • C and D are rare (1/8 each)
- More efficient coding
  • A=0, B=10, C=110, D=111
  • 1.75 bits per symbol on average
Entropy

- What is the smallest number of bits needed to send a message drawn from a given distribution?
  • depends on the possible symbols
  • and their frequencies
- If the symbols all have the same frequency
  • more information needed to distinguish them
- If some symbols are rare and some high frequency
  • less information needed
- This measure is called entropy
  • in some sense, it measures the "disorder" of the distribution
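The A/B/C/D example above can be checked numerically. This short sketch computes the entropy of the uniform distribution (2 bits per symbol) and of the skewed one (1.75 bits per symbol, matching the efficient coding):

```python
import math

def entropy(probs):
    """Shannon entropy in bits: H = -sum p_i * log2(p_i)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

uniform = entropy([0.25] * 4)                # A, B, C, D equally likely
skewed = entropy([0.5, 0.25, 0.125, 0.125])  # A common, C and D rare
```

The skewed distribution needs fewer bits on average, which is exactly why the variable-length code A=0, B=10, C=110, D=111 pays off.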
Information gain

- The difference between entropy values is called information gain
- If one distribution has a lower entropy than another, we say it has gained information
- In ID3
  • we want each node to eventually have 0 entropy
  • all of the examples agree on the action
Example

#   Health   Position  Ammo       Action
1   Healthy  In Cover  With Ammo  Attack
2   Hurt     In Cover  With Ammo  Attack
3   Healthy  In Cover  Empty      Defend
4   Hurt     In Cover  Empty      Defend
5   Hurt     Exposed   With Ammo  Defend
Entropy

- E = -pA log2 pA - pD log2 pD
- The entropy of the whole set: 0.971
- Question: what attribute divides the set into the lowest-entropy pieces?
  • E(healthy) = 1, E(hurt) = 0.918
  • E(cover) = 1, E(exposed) = 0.0
  • E(ammo) = 0.918, E(empty) = 0.0

Information gain

- Current entropy minus the entropy of each subset, weighted by the fraction of examples it classifies
- G = E - (|T|/|S|) * E_T - (|~T|/|S|) * E_~T
- Example
  • G(health) = 0.020
  • G(cover) = 0.171
  • G(ammo) = 0.420
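The gain values above can be reproduced with a short script. The attribute and value names below are a shortened, hypothetical encoding of the Health/Position/Ammo table:

```python
import math

def entropy(labels):
    """Entropy of a list of class labels, in bits."""
    n = len(labels)
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def gain(rows, attr, label="action"):
    """Information gain: whole-set entropy minus the size-weighted
    entropy of each subset produced by splitting on attr."""
    total = entropy([r[label] for r in rows])
    for value in set(r[attr] for r in rows):
        subset = [r[label] for r in rows if r[attr] == value]
        total -= (len(subset) / len(rows)) * entropy(subset)
    return total

# The five examples from the table above
rows = [
    {"health": "healthy", "cover": "in", "ammo": "with", "action": "attack"},
    {"health": "hurt", "cover": "in", "ammo": "with", "action": "attack"},
    {"health": "healthy", "cover": "in", "ammo": "empty", "action": "defend"},
    {"health": "hurt", "cover": "in", "ammo": "empty", "action": "defend"},
    {"health": "hurt", "cover": "exposed", "ammo": "with", "action": "defend"},
]
```

Ammo has the highest gain, which is why ID3 puts it at the root of the resulting tree.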
Result

- Ammo? No → Defend
- Ammo? Yes → Cover?
  • Cover? No → Defend
  • Cover? Yes → Attack
Extensions

- There are extensions to deal with
  • multiple classes
  • discrete attributes
  • continuous attributes (with some caveats)
Neural Networks

Biological inspiration

- A neuron has an axon, dendrites, and synapses; information transmission happens at the synapses.
- Tiny voltage spikes travel along the axon of the presynaptic neuron.
- They trigger the release of neurotransmitter substances at the synapse.
- These cause excitation or inhibition in the dendrite of the postsynaptic neuron.
- Integration of the signals may produce spikes in the next neuron.
- The effect depends on the strength of the synaptic connection.
Artificial neurons

Neurons work by processing information. They receive and provide information in the form of voltage spikes.

[Diagram: inputs x1 … xn with weights w1 … wn feeding a single output y]

The McCulloch-Pitts model:

z = Σ_{i=1}^{n} w_i x_i ;  y = H(z)
Artificial neurons

The McCulloch-Pitts model:
• spikes are interpreted as spike rates;
• synaptic strengths are translated as synaptic weights;
• excitation means a positive product between the incoming spike rate and the corresponding synaptic weight;
• inhibition means a negative product between the incoming spike rate and the corresponding synaptic weight.

Artificial neurons

Nonlinear generalization of the McCulloch-Pitts neuron:

y = f(x, w)

where y is the neuron's output, x is the vector of inputs, and w is the vector of synaptic weights.

Examples:

y = 1 / (1 + e^(-(w^T x + a)))        (sigmoidal neuron)
y = e^(-||x - w||^2 / (2 a^2))        (Gaussian neuron)
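Both neuron types can be written in a few lines. This is a sketch following the formulas above; the offset parameter `a` and the weight vectors are example inputs:

```python
import math

def sigmoid_neuron(x, w, a=0.0):
    """Sigmoidal neuron: y = 1 / (1 + exp(-(w . x + a)))."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + a
    return 1.0 / (1.0 + math.exp(-z))

def gaussian_neuron(x, w, a=1.0):
    """Gaussian neuron: y = exp(-||x - w||^2 / (2 a^2))."""
    d2 = sum((xi - wi) ** 2 for xi, wi in zip(x, w))
    return math.exp(-d2 / (2 * a ** 2))
```

The sigmoidal neuron responds to which side of a hyperplane the input lies on; the Gaussian neuron responds to how close the input is to its weight vector.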
Artificial neural networks
An artificial neural network is composed of many artificial
neurons that are linked together according to a specific
network architecture. The objective of the neural network
is to transform the inputs into meaningful outputs.
Artificial neural networks

Tasks to be solved by artificial neural networks:
• controlling the movements of an agent based on self-perception and other information (e.g., environment information);
• deciding the category of situations (e.g., dangerous or safe) in the game world;
• recognizing a pattern of behavior;
• predicting what an enemy will do next.
Learning in biological systems

- Learning = learning by adaptation
  • green fruits are sour; yellowish/reddish ones are sweet
  • adapt picking behavior accordingly
- Neural level
  • changing of the synaptic strengths, eliminating some synapses, and building new ones
- A form of optimization
  • best outcome for least resource usage
Learning principle for artificial neural networks

ENERGY MINIMIZATION

We need an appropriate definition of energy for artificial neural networks; having that, we can use mathematical optimisation techniques to find how to change the weights of the synaptic connections between neurons.

ENERGY = measure of task performance error
Neural network mathematics

[Diagram: a feed-forward network with four inputs, a hidden layer of three neurons, and one output]

First layer:   y^1_i = f(x_i, w^1_i),  i = 1, …, 4;   y^1 = (y^1_1, y^1_2, y^1_3, y^1_4)
Second layer:  y^2_j = f(y^1, w^2_j),  j = 1, 2, 3;   y^2 = (y^2_1, y^2_2, y^2_3)
Output:        y_out = f(y^2, w^3)
MLP neural networks

MLP = multi-layer perceptron

Perceptron:

y_out = w^T x

MLP neural network (two sigmoidal hidden layers, linear output):

y^1_k = 1 / (1 + e^(-(w^1_k · x + a^1_k))),   k = 1, 2, 3;   y^1 = (y^1_1, y^1_2, y^1_3)^T
y^2_k = 1 / (1 + e^(-(w^2_k · y^1 + a^2_k))), k = 1, 2;      y^2 = (y^2_1, y^2_2)^T
y_out = Σ_{k=1}^{2} w^3_k y^2_k = (w^3)^T y^2
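A forward pass through a network with this 3-then-2 hidden-layer structure might look like the sketch below. All the weights and biases are made-up placeholders, chosen only to show the shapes involved:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def mlp_forward(x, W1, a1, W2, a2, w3):
    """Forward pass: two sigmoidal hidden layers, then a linear output."""
    y1 = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + a)
          for row, a in zip(W1, a1)]
    y2 = [sigmoid(sum(w * yi for w, yi in zip(row, y1)) + a)
          for row, a in zip(W2, a2)]
    return sum(w * yi for w, yi in zip(w3, y2))

# Hypothetical weights: 2 inputs -> 3 hidden -> 2 hidden -> 1 output
W1 = [[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]]
a1 = [0.0, 0.1, -0.1]
W2 = [[0.7, -0.5, 0.2], [0.3, 0.6, -0.4]]
a2 = [0.0, 0.0]
w3 = [1.0, -1.0]
out = mlp_forward([1.0, 2.0], W1, a1, W2, a2, w3)
```

Because each hidden activation lies strictly between 0 and 1, this particular output is bounded by the output weights.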
Neural network tasks

- control
- classification
- prediction
- approximation

These can all be reformulated in general as FUNCTION APPROXIMATION tasks.

Approximation: given a set of values of a function g(x), build a neural network that approximates the g(x) values for any input x.
Neural network approximation

Task specification:

Data: a set of value pairs (x_t, y_t), y_t = g(x_t) + z_t, where z_t is random measurement noise.

Objective: find a neural network that represents the input/output transformation (a function) F(x, W) such that F(x, W) approximates g(x) for every x.
Learning to approximate

Error measure:

E = (1/N) Σ_{t=1}^{N} (F(x_t; W) - y_t)^2

Rule for changing the synaptic weights:

Δw^j_i = -c · ∂E(W)/∂w^j_i
w^j_i (new) = w^j_i + Δw^j_i

where c is the learning parameter (usually a constant).
Learning with backpropagation

A simplification of the learning problem:
• calculate first the changes for the synaptic weights of the output neuron;
• calculate the changes backward starting from layer p-1, and propagate backward the local error terms.

The method is still relatively complicated, but it is much simpler than the original optimization problem.
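The weight-update rule Δw = -c · ∂E/∂w can be shown in the simplest possible case: a one-weight linear model F(x; w) = w·x fitted to noise-free samples of a hypothetical target g(x) = 2x. This is a toy illustration of gradient descent on the squared error, not the full backpropagation procedure:

```python
def train_linear(data, c=0.1, epochs=100):
    """Gradient descent with dw = -c * dE/dw for F(x; w) = w * x,
    E = (1/N) * sum((w*x_t - y_t)^2)."""
    w = 0.0
    n = len(data)
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in data) / n  # dE/dw
        w -= c * grad                                         # w_new = w + dw
    return w

# Noise-free samples of the assumed target g(x) = 2x
data = [(x, 2 * x) for x in (-1.0, 0.5, 1.0, 2.0)]
w = train_linear(data)
```

With this learning rate the update is a contraction toward w = 2, so the weight converges; too large a c would make it oscillate or diverge, which is the same tuning issue real backpropagation faces.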
Issues with ANN

- Often many iterations are needed
  • 1000s or even millions
- Overfitting can be a serious problem
- No way to diagnose or debug the network
  • must relearn
- Designing the network is an art
  • input and output coding
  • layering
  • often learning simply fails
- Stability vs. plasticity
  • learning is usually one-shot
  • cannot easily restart learning with new data
Genetic Algorithms

- A class of probabilistic optimization algorithms
- Inspired by the biological evolution process
- Uses concepts of "Natural Selection" and "Genetic Inheritance" (Darwin 1859)
- Originally developed by John Holland (1975)
GA overview (cont)

- Particularly well suited for hard problems where little is known about the underlying search space
- Widely used in business, science, engineering
  • starting to be used in games and digital cinema
Definition

- A genetic algorithm maintains a population of candidate solutions for the problem at hand, and makes it evolve by iteratively applying a set of stochastic operators

Stochastic operators

- Selection replicates the most successful solutions found in a population at a rate proportional to their relative quality
- Recombination decomposes two distinct solutions and then randomly mixes their parts to form novel solutions
- Mutation randomly perturbs a candidate solution
The Metaphor

Genetic Algorithm                      | Nature
---------------------------------------|---------------------------------------
Optimization problem                   | Environment
Feasible solutions                     | Individuals living in that environment
Solution quality (fitness function)    | Individual's degree of adaptation to its surrounding environment

The Metaphor (cont)

Genetic Algorithm                      | Nature
---------------------------------------|---------------------------------------
A set of feasible solutions            | A population of organisms (species)
Stochastic operators                   | Selection, recombination and mutation in nature's evolutionary process
Iteratively applying a set of stochastic operators on a set of feasible solutions | Evolution of populations to suit their environment

The Metaphor (cont)

The computer model introduces simplifications (relative to the real biological mechanisms), BUT surprisingly complex and interesting structures have emerged out of evolutionary algorithms.
Simple Genetic Algorithm

produce an initial population of individuals
evaluate the fitness of all individuals
while termination condition not met do
    select fitter individuals for reproduction
    recombine between individuals
    mutate individuals
    evaluate the fitness of the modified individuals
    generate a new population
end while
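The pseudocode above maps to a short runnable sketch. Here it is applied to the toy "OneMax" problem (maximize the number of 1 bits in a bitstring), with tournament selection standing in for the selection step; all the parameter values are arbitrary choices for the example:

```python
import random

def run_ga(length=20, pop_size=30, generations=60, p_mut=0.05):
    """Toy GA for OneMax: tournament selection, one-point crossover,
    bit-flip mutation."""
    random.seed(1)                          # deterministic for the example
    fitness = lambda ind: sum(ind)          # fitness = number of 1 bits
    pop = [[random.randint(0, 1) for _ in range(length)]
           for _ in range(pop_size)]
    for _ in range(generations):
        new_pop = []
        while len(new_pop) < pop_size:
            # selection: the fitter of two randomly sampled individuals
            p1 = max(random.sample(pop, 2), key=fitness)
            p2 = max(random.sample(pop, 2), key=fitness)
            cut = random.randrange(1, length)       # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [b ^ 1 if random.random() < p_mut else b
                     for b in child]                # bit-flip mutation
            new_pop.append(child)
        pop = new_pop
    return max(pop, key=fitness)

best = run_ga()
```

Even on this trivial fitness function the main cost is clear: every child must be re-evaluated every generation, which is why GAs get expensive when fitness evaluation is complex.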
The Evolutionary Cycle

[Diagram: initiate & evaluate the population → selection picks parents → modification produces modified offspring → evaluation yields evaluated offspring → offspring re-enter the population; deleted members are discarded]
The GA Cycle

- A "population" is ranked by fitness
- Mate selection: the fittest are copied and replace the less fit
- Mate selection roulette: increases the likelihood of, but does not guarantee, reproduction by the fittest
  • [Pie chart of selection shares: 38%, 25%, 16%, 11%, 7%, 3%, 0%]
- Crossover: exchanging information through some part of the representation
- Mutation: random change of binary digits from 0 to 1 and vice versa (to avoid local minima)
- The cycle repeats until a best design emerges
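Roulette-wheel (fitness-proportionate) selection can be sketched as follows. The population and the fitness shares are invented to echo the pie-chart percentages above:

```python
import random

def roulette_select(population, fitnesses):
    """Fitness-proportionate selection: each individual is chosen with
    probability fitness / total fitness, so the fittest is likeliest
    but not guaranteed to reproduce."""
    total = sum(fitnesses)
    spin = random.uniform(0, total)
    running = 0.0
    for ind, fit in zip(population, fitnesses):
        running += fit
        if running >= spin:
            return ind
    return population[-1]       # guard against rounding at the end

random.seed(0)                  # deterministic for the example
pop = ["a", "b", "c", "d", "e", "f"]
fits = [38, 25, 16, 11, 7, 3]   # shares like the pie chart above
counts = {p: 0 for p in pop}
for _ in range(2000):
    counts[roulette_select(pop, fits)] += 1
```

Over many spins the selection counts track the fitness shares, so weaker individuals still reproduce occasionally, which preserves diversity.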
Example: Karl Sims's creatures

- MIT Media Lab
- [Figure: a typical "chromosome"]
Game Applications

- Generate behavioral rules
  • not all the same
  • reasonably effective
  • plausible in the environment
- Also used in movies
  • battle scenes in "Lord of the Rings"
  • don't want every orc to move the same
  • don't want thousands of extras in costume
Issues

- Creating the representation is an art
  • difficult to get right the first time
  • also tricky to choose the right combination of evolutionary operators
- Computationally intensive
  • especially when determining "fitness" is complex
  • many, many evolutionary steps required
  • very problem-dependent
- Rarely optimal
  • often odd artifacts develop and stay (just like real evolution)
  • usually in the region of optimality
Bot Project

- Due Wednesday 5 pm
- Tournament Thursday 11:45 in the Game Lab