Mutation Operator Evolution for EA-Based Neural Networks
By Ryan Meuth
Reinforcement Learning
[Diagram: agent-environment loop. Labels: Environment, Reward, State, Agent, State Value Estimate, Action Policy, Action]
Reinforcement Learning
Good for On-Line learning where little is known about environment
Easy to Implement in Discrete Environments
  Value estimate can be stored for each state
  In infinite time, optimal policy guaranteed.
Hard to Implement in Continuous Environments
  Infinite States! Must estimate Value Function.
  Neural Networks Can be used for function approximation.
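To make "value estimate can be stored for each state" concrete, here is a minimal sketch of a lookup-table value estimate learned with a TD(0) update. The five-state random-walk chain, the reward of 1 at the right exit, and the names td0_chain, alpha and gamma are all assumptions for illustration; none of them come from the slides. It only estimates values under a random policy and does not search for an optimal policy.

```python
import random

def td0_chain(num_states=5, episodes=2000, alpha=0.1, gamma=1.0, seed=0):
    """Estimate state values for a random walk on a chain using a lookup table."""
    rng = random.Random(seed)
    values = [0.0] * num_states              # one stored estimate per discrete state
    for _ in range(episodes):
        state = num_states // 2              # every episode starts in the middle
        while True:
            next_state = state + rng.choice((-1, 1))
            if next_state < 0:               # left exit: reward 0, terminal
                reward, next_value, done = 0.0, 0.0, True
            elif next_state >= num_states:   # right exit: reward 1, terminal
                reward, next_value, done = 1.0, 0.0, True
            else:
                reward, next_value, done = 0.0, values[next_state], False
            # TD(0): move the stored estimate toward reward + discounted next value
            values[state] += alpha * (reward + gamma * next_value - values[state])
            if done:
                break
            state = next_state
    return values

if __name__ == "__main__":
    print(td0_chain())
```

A table like this stops working once the state space is continuous, which is the point where the slides bring in a neural network as a function approximator.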
Neural Network Overview
Feed-Forward Neural Network
  Based on biological theories of neuron operation
[Figures: Feed-Forward Neural Network; Recurrent Neural Network]
Neural Network Overview
Traditionally used with Error Backpropagation (BP)
  BP uses Samples to Generalize to Problem
  Few “Unsupervised” Learning Methods
Problems with No Samples: On-Line Learning
Conjugate Reinforcement Back Propagation
EA-NN
Both Supervised and Unsupervised Learning Method.
Uses weight set as genome of individual
Fitness Function is Mean-Squared Error over target function.
Mutation Operator is a sample from a Gaussian Distribution.
  Possible that mutation operator might not be best.
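The EA-NN idea on this slide (weight set as genome, MSE as fitness, Gaussian mutation) can be sketched as follows. The network size (HIDDEN = 3), the tanh activation, sigma = 0.1, and the keep-the-best-half survivor selection are assumptions made to keep the example small. Recombination is left out, so this is not the author's full EA-NN.

```python
import math
import random

HIDDEN = 3                     # number of hidden units (assumed)
GENOME_LEN = 3 * HIDDEN + 1    # per unit: input weight, bias, output weight; plus output bias

def forward(weights, x):
    """One-input, one-output feed-forward net; `weights` is the flat genome."""
    out = weights[-1]
    for h in range(HIDDEN):
        w_in, bias, w_out = weights[3 * h:3 * h + 3]
        out += w_out * math.tanh(w_in * x + bias)
    return out

def mse(weights, samples):
    """Fitness to minimize: mean-squared error over (x, y) samples."""
    return sum((forward(weights, x) - y) ** 2 for x, y in samples) / len(samples)

def gaussian_mutation(weights, rng, sigma=0.1):
    """Add an independent Gaussian sample to every weight."""
    return [w + rng.gauss(0.0, sigma) for w in weights]

def evolve(samples, generations=50, pop_size=10, seed=0):
    rng = random.Random(seed)
    population = [[rng.uniform(-1.0, 1.0) for _ in range(GENOME_LEN)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        children = [gaussian_mutation(parent, rng) for parent in population]
        # Assumed survivor selection: keep the best pop_size of parents + children
        population = sorted(population + children,
                            key=lambda w: mse(w, samples))[:pop_size]
    return min(population, key=lambda w: mse(w, samples))

if __name__ == "__main__":
    data = [(x / 10.0, math.sin(x / 10.0)) for x in range(-10, 11)]
    print(mse(evolve(data), data))
```

The gaussian_mutation function is the piece the rest of the talk proposes to replace with an evolved operator.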
Uh… Why?
Could improve EA-NN efficiency
  Faster Online Learning
  Revamped tool for Reinforcement Learning
  Smarter Robots.
Why Use an EA?
Knowledge-Independent
Experimental Implementation
First Tier – Genetic Programming
  Individual is a Parse Tree representing a Mutation Operator
  Fitness is Inverse of sum of MSEs from EA Testbed
Second Tier – EA Testbed
  4 EAs, spanning 2 classes of problems
  2 Feed-Forward Non-Linear Approximations
    1 High-Order, 1 Low-Order
  2 Recurrent Time Series Predictions
    1 Time-Delayed, 1 Not Time-Delayed
GP Implementation
Functional Set: {+,-,*,/}
Terminal Set:
  Weight to be Modified
  Random Constant
  Uniform Random Variable
Over-Selection: 80% of Parents from top 32%
Rank-Based Survival
Initialized by Grow Method (Max Depth of 8)
Fitness: 1000/(AvgMSE) - num_nodes
P(Recomb) = 0.5; P(Mutation) = 0.5;
Repair Function
5 runs, 100 generations each.
Steady State: Population of 1000 individuals, 20 children per generation.
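A rough sketch of how the GP pieces listed above might fit together: a parse tree over {+, -, *, /} with the three terminals, grow initialization, the stated fitness formula, and over-selection. Several details are assumptions and not from the slides: trees are nested tuples, the tree output is taken as the new weight value, division by near-zero returns 1.0 (a common "protected division", which is not necessarily the slide's repair function), and the 20% of parents not drawn from the top 32% come from the rest of the ranked population.

```python
import random

FUNCTIONS = ("+", "-", "*", "/")

def grow(rng, max_depth=8, p_terminal=0.3):
    """Grow method: each node is a terminal or a function until max_depth is hit."""
    if max_depth == 0 or rng.random() < p_terminal:
        kind = rng.choice(("w", "u", "const"))
        # "w" = weight to be modified, "u" = uniform random variable, float = constant
        return rng.uniform(-1.0, 1.0) if kind == "const" else kind
    return (rng.choice(FUNCTIONS),
            grow(rng, max_depth - 1, p_terminal),
            grow(rng, max_depth - 1, p_terminal))

def evaluate(tree, weight, rng):
    """Apply the evolved mutation operator to one weight (assumed to return the new weight)."""
    if isinstance(tree, float):
        return tree
    if tree == "w":
        return weight
    if tree == "u":
        return rng.random()
    op, left, right = tree
    a, b = evaluate(left, weight, rng), evaluate(right, weight, rng)
    if op == "+":
        return a + b
    if op == "-":
        return a - b
    if op == "*":
        return a * b
    return a / b if abs(b) > 1e-9 else 1.0   # protected division (assumption)

def count_nodes(tree):
    return 1 if not isinstance(tree, tuple) else 1 + count_nodes(tree[1]) + count_nodes(tree[2])

def fitness(avg_mse, num_nodes):
    """Fitness from the slide: 1000 / AvgMSE - num_nodes (avg_mse must be nonzero)."""
    return 1000.0 / avg_mse - num_nodes

def over_select(ranked, rng, top_fraction=0.32, p_top=0.8):
    """Pick a parent from a best-first list: 80% of picks from the top 32%."""
    cut = max(1, int(len(ranked) * top_fraction))
    if cut >= len(ranked) or rng.random() < p_top:
        return rng.choice(ranked[:cut])
    return rng.choice(ranked[cut:])

if __name__ == "__main__":
    rng = random.Random(0)
    operator = grow(rng)
    print(count_nodes(operator), evaluate(operator, 0.5, rng))
```

The num_nodes penalty in the fitness acts as parsimony pressure, so between two operators with the same average MSE the smaller tree scores higher.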
EA-NN Implementation
Recombination: Multi-Point Crossover
Mutation: Provided by GP
Fitness: MSE over test function (minimize)
P(Recomb) = 0.5; P(Mutation) = 0.5;
Non-Generational: Population of 10 individuals, 10 children per generation
50 Runs of 50 Generations.
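For the recombination step, a minimal sketch of multi-point crossover on two weight-set genomes is shown here. The slides do not give the number of crossover points, so num_points = 2 is an assumption, as is the choice to return both children.

```python
import random

def multi_point_crossover(parent_a, parent_b, rng, num_points=2):
    """Swap alternating segments between two equal-length genomes."""
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must have the same length")
    # Distinct cut positions strictly inside the genome, in ascending order
    points = sorted(rng.sample(range(1, len(parent_a)), num_points))
    child_a, child_b = [], []
    swap, prev = False, 0
    for point in points + [len(parent_a)]:
        src_a, src_b = (parent_b, parent_a) if swap else (parent_a, parent_b)
        child_a.extend(src_a[prev:point])
        child_b.extend(src_b[prev:point])
        swap = not swap          # alternate the source parent after every cut
        prev = point
    return child_a, child_b

if __name__ == "__main__":
    rng = random.Random(0)
    print(multi_point_crossover([0.0] * 10, [1.0] * 10, rng))
```

With P(Recomb) = 0.5 and P(Mutation) = 0.5 as listed above, a child would go through this step about half the time before the GP-provided mutation operator is applied.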
Results
This is where results would go.
Single Uniform Random Variable: ~380
Observed Individuals: ~600
Improvement! Just have to Wait and See…
Conclusions
I don’t know anything yet.
Questions?
Thank You!