PPT Transcript
Mutation Operator
Evolution for EA-Based
Neural Networks
By Ryan Meuth
Reinforcement Learning
[Diagram: the agent-environment loop. The Environment sends State and Reward to the Agent; the Agent, holding a State Value Estimate and an Action Policy, sends an Action back to the Environment.]
Reinforcement Learning
Good for On-Line Learning where little is known about the environment
Easy to Implement in Discrete Environments
Value estimate can be stored for each state
In infinite time, optimal policy guaranteed.
Hard to Implement in Continuous Environments
Infinite States! Must estimate Value Function.
Neural Networks can be used for function approximation (the discrete, tabular case is sketched below)
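A minimal sketch of the discrete case described above, with one stored value estimate per state; the temporal-difference update rule and the constants here are illustrative assumptions, not taken from the talk:

```python
# Tabular value estimation for a discrete environment: one stored
# estimate per state, nudged toward observed reward plus the
# discounted value of the next state (a standard TD(0)-style update).
ALPHA = 0.1   # learning rate (assumed)
GAMMA = 0.9   # discount factor (assumed)

values = {}   # state -> value estimate, one entry per discrete state

def td_update(state, reward, next_state):
    """One temporal-difference update of the stored value table."""
    v = values.get(state, 0.0)
    v_next = values.get(next_state, 0.0)
    values[state] = v + ALPHA * (reward + GAMMA * v_next - v)
```

With continuous states this table would need infinitely many entries, which is exactly where the talk brings in neural networks as function approximators.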
Neural Network Overview
Based on biological theories of neuron operation
[Diagrams: a Feed-Forward Neural Network and a Recurrent Neural Network.]
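To make the feed-forward topology concrete, a minimal forward pass; the sigmoid activation, layer sizes, and weights are illustrative assumptions:

```python
import math

def forward(x, layers):
    """Feed-forward pass. Each layer is (weights, biases), where
    weights[j][i] connects input i to neuron j; sigmoid activation."""
    for weights, biases in layers:
        x = [
            1.0 / (1.0 + math.exp(-(sum(w * xi for w, xi in zip(row, x)) + b)))
            for row, b in zip(weights, biases)
        ]
    return x

# A 2-input, 2-hidden, 1-output network with arbitrary weights:
net = [
    ([[0.5, -0.3], [0.8, 0.2]], [0.1, -0.1]),  # hidden layer
    ([[1.0, -1.0]], [0.0]),                    # output layer
]
print(forward([1.0, 0.0], net))
```

A recurrent network additionally feeds some outputs back in as inputs on the next step, giving it memory for time-series work.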
Neural Network Overview
Traditionally used with Error Backpropagation
BP uses samples to generalize to the problem (one-neuron sketch below)
Few “Unsupervised” Learning Methods
Problems with No Samples: On-Line Learning
Conjugate Reinforcement Backpropagation
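As a minimal illustration of how BP is driven by samples (plain gradient descent on one linear neuron, not the Conjugate Reinforcement variant named above; the model and learning rate are assumptions):

```python
def backprop_step(w, b, samples, lr=0.01):
    """One pass of per-sample gradient steps for a single linear neuron
    under squared error -- the sample-driven update that backpropagation
    extends through whole networks (factor of 2 folded into lr)."""
    for x, y in samples:
        err = (w * x + b) - y   # prediction error on this sample
        w -= lr * err * x       # step along the error gradient w.r.t. w
        b -= lr * err           # step along the error gradient w.r.t. b
    return w, b

# Fit y = 2x + 1 from samples; with no samples, this update has
# nothing to work from -- the on-line learning problem above.
w, b = 0.0, 0.0
data = [(x, 2.0 * x + 1.0) for x in range(-5, 6)]
for _ in range(500):
    w, b = backprop_step(w, b, data)
```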
EA-NN
Both a Supervised and an Unsupervised Learning Method
Uses the weight set as the genome of an individual
Fitness Function is Mean-Squared Error over the target function
Mutation Operator is a sample from a Gaussian Distribution (sketched below)
Possible that this mutation operator might not be the best
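A minimal sketch of the EA-NN loop this slide describes: the genome is the weight vector, fitness is MSE against the target function, and the baseline operator perturbs each weight with a Gaussian sample. The toy linear "network", sigma, and population sizes are illustrative assumptions:

```python
import random

def net_output(weights, x):
    # Stand-in forward pass so the sketch is self-contained (toy linear model).
    return weights[0] * x + weights[1]

def fitness(weights, samples):
    """Mean-Squared Error over samples of the target function (minimized)."""
    return sum((net_output(weights, x) - y) ** 2 for x, y in samples) / len(samples)

def gaussian_mutation(weights, sigma=0.1):
    """The baseline operator: each weight gets a Gaussian perturbation
    (sigma is an assumed setting)."""
    return [w + random.gauss(0.0, sigma) for w in weights]

# One evolution step on a tiny population (sizes are illustrative):
samples = [(x, 2.0 * x + 1.0) for x in range(-5, 6)]
pop = [[random.uniform(-1, 1), random.uniform(-1, 1)] for _ in range(10)]
children = [gaussian_mutation(p) for p in pop]
pop = sorted(pop + children, key=lambda w: fitness(w, samples))[:10]
```

The open question on this slide is whether that Gaussian draw is the right perturbation at all; the rest of the talk evolves the operator with GP.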
Uh… Why?
Could improve EA-NN efficiency
Faster Online Learning
Revamped tool for Reinforcement Learning
Smarter Robots.
Why Use an EA?
Knowledge-Independent
Experimental Implementation
First Tier – Genetic Programming
Individual is a Parse Tree representing the Mutation Operator
Fitness is the Inverse of the sum of MSEs from the EA Testbed
Second Tier – EA Testbed
4 EAs, spanning 2 classes of problems
2 Feed-Forward Non-Linear Approximations
1 High-Order, 1 Low-Order
2 Recurrent Time Series Predictions
1 Time-Delayed, 1 Not Time-Delayed
GP Implementation
Function Set: {+, -, *, /} (tree encoding sketched after this slide)
Terminal Set:
Weight to be Modified
Random Constant
Uniform Random Variable
Over-Selection: 80% of Parents from top 32%
Rank-Based Survival
Initialized by Grow Method (Max Depth of 8)
Fitness: 1000/AvgMSE - num_nodes
P(Recomb) = 0.5; P(Mutation) = 0.5;
Repair Function
5 runs, 100 generations each.
Steady State: Population of 1000 individuals, 20 children per generation
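A sketch of how a GP individual on this slide might be represented and applied: a parse tree over the function set {+, -, *, /} with terminals for the weight being modified, a random constant, and a uniform random variable. The tuple encoding, the variable's range, and the protected division are assumptions; the fitness shown mirrors the slide's 1000/AvgMSE - num_nodes:

```python
import random

def evaluate(node, w):
    """Apply a parse-tree mutation operator to weight w.
    Nodes: 'w' (weight being modified), 'u' (uniform random variable),
    a float (random constant fixed at creation), or (op, left, right)."""
    if node == 'w':
        return w
    if node == 'u':
        return random.uniform(-1.0, 1.0)   # assumed range
    if isinstance(node, float):
        return node
    op, left, right = node
    a, b = evaluate(left, w), evaluate(right, w)
    if op == '+': return a + b
    if op == '-': return a - b
    if op == '*': return a * b
    return a / b if b != 0 else 1.0        # protected division (assumption)

def gp_fitness(avg_mse, num_nodes):
    """The slide's fitness: 1000/AvgMSE - num_nodes (parsimony pressure)."""
    return 1000.0 / avg_mse - num_nodes

# Example individual: w + (u * 0.5), i.e. uniform noise scaled by a constant.
tree = ('+', 'w', ('*', 'u', 0.5))
mutated = evaluate(tree, 0.3)
```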
EA-NN Implementation
Recombination: Multi-Point Crossover (sketched below)
Mutation: Provided by GP
Fitness: MSE over test function (minimize)
P(Recomb) = 0.5; P(Mutation) = 0.5;
Non-Generational: Population of 10 individuals, 10 children per generation
50 Runs of 50 Generations.
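A sketch of the multi-point crossover named above, applied to two parent weight vectors; the number of cut points is an assumed parameter:

```python
import random

def multipoint_crossover(parent_a, parent_b, n_points=2):
    """Swap segments between two equal-length weight vectors at n_points
    random cut positions (n_points=2 is an assumed setting)."""
    points = sorted(random.sample(range(1, len(parent_a)), n_points))
    child, take_a, prev = [], True, 0
    for cut in points + [len(parent_a)]:
        child.extend((parent_a if take_a else parent_b)[prev:cut])
        take_a = not take_a   # alternate source parent at each cut
        prev = cut
    return child

# Example: cross two 6-weight parents.
child = multipoint_crossover([0.1] * 6, [0.9] * 6)
```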
Results
This is where results would go.
Single Uniform Random Variable: ~380
Observed Individuals: ~600
Improvement! Just have to Wait and See…
Conclusions
I don’t know anything yet.
Questions?
Thank You!