Transcript - University of Sussex

The Neuronal
Replicator
Hypothesis
Chrisantha Fernando & Eörs Szathmáry
CUNY, December 2009
1 Collegium Budapest (Institute for Advanced Study), Budapest, Hungary
2 Centre for Computational Neuroscience and Robotics, Sussex University, UK
3 MRC National Institute for Medical Research, Mill Hill, London, UK
4 Parmenides Foundation, Kardinal-Faulhaber-Straße 14a, D-80333 Munich, Germany
5 Institute of Biology, Eötvös University, Pázmány Péter sétány 1/c, H-1117 Budapest, Hungary
• Visiting Fellow, MRC National Institute for Medical Research, London
• Post-Doc, Center for Computational Neuroscience and Robotics, Sussex University
• Marie Curie Fellow, Collegium Budapest (Institute for Advanced Study), Hungary
The Hypothesis
• Evolution by natural selection takes
place in the brain at rapid timescales
and contributes to solving
cognitive/behavioural search problems.
• Our background is in evolutionary biology, the origin of non-enzymatic template replication, evolutionary robotics, and computational neuroscience.
Outline
• Limitations of some proposed search
algorithms, e.g.
• Reward-biased stochastic search
• Reinforcement Learning
• How copying/replication of neuronal
data structures can alleviate these
limitations.
• Mechanisms of neuronal replication
• Applications and future work
Simple Search Tasks
• Behavioural and neuropsychological
learning tasks can be solved by
stochastic hill climbing
• Stroop Task
• Wisconsin Card Sorting Task (WCST)
• Instrumental Conditioning in Spiking
Neural Networks
• Simple inverse kinematics problem
Stochastic Hill Climbing
• Initially P(x_i = 1) = 0.5, initial reward = 0
  [0.5 0.5 0.5 0.5 0.5]
• Make a random change to P
• Generate M examples of binary strings
• Calculate the reward
• If r(t) > r(t-1), keep the changes to P, else revert to the previous P values
  [0.8 0.5 0.5 0.4 0.5]
• One solution at a time: change the solution, keep good changes, lose bad changes
• Can get stuck on local optima
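The procedure above can be sketched in a few lines of Python. The bit-length, step size, and sample count below are illustrative assumptions, not values from the talk.

```python
import random

def stochastic_hill_climb(reward_fn, n_bits=5, m_samples=20, steps=200, seed=0):
    """Stochastic hill climbing over a vector of bit probabilities P.

    Each step: perturb P, sample M binary strings from it, score them,
    and keep the perturbation only if the mean reward improved.
    """
    rng = random.Random(seed)
    p = [0.5] * n_bits                      # initially P(x_i = 1) = 0.5
    best_reward = float("-inf")
    for _ in range(steps):
        # Make a random change to P (perturb one probability)
        i = rng.randrange(n_bits)
        old = p[i]
        p[i] = min(1.0, max(0.0, p[i] + rng.uniform(-0.2, 0.2)))
        # Generate M example binary strings and calculate the reward
        samples = [[1 if rng.random() < q else 0 for q in p]
                   for _ in range(m_samples)]
        reward = sum(reward_fn(s) for s in samples) / m_samples
        if reward > best_reward:            # keep good changes
            best_reward = reward
        else:                               # lose bad changes: revert P
            p[i] = old
    return p, best_reward

# Toy reward: count of 1s in the string (optimum: all probabilities -> 1)
p, r = stochastic_hill_climb(lambda s: sum(s))
```

Because only one candidate distribution is maintained and reverted, this is exactly the single-solution search that can get trapped on local optima.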
Stroop Task
Green Red Blue Purple Blue Purple
Blue Purple Red Green Purple Green
Name the colour of the words.
dW = Reward x pre x post
Decreased reward -> Instability in workspace
Dehaene et al, 1998
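One minimal reading of the rule dW = Reward x pre x post is a reward-modulated Hebbian update: weights between co-active units grow only when reward arrives, and negative reward destabilises them. The learning rate and activity values below are illustrative assumptions.

```python
# Reward-modulated Hebbian weight update: dW = Reward x pre x post,
# scaled by an illustrative learning rate.
def dw(reward, pre, post, lr=0.1):
    return lr * reward * pre * post

w = 0.5
w += dw(reward=1.0, pre=1.0, post=1.0)    # rewarded co-activity: strengthen
w += dw(reward=-1.0, pre=1.0, post=1.0)   # punished co-activity: weaken
```

The sign of the reward flips the direction of plasticity, which is one way to read "decreased reward -> instability in workspace".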
WCST
• Each card has several “features”.
Subjects must sort cards according to a
feature (color, number, shape, size).
• Rougier et al 2005. PFC weights stabilised if
expected reward obtained, destabilised if
expected reward not obtained, i.e. TD
learning
Instrumental Conditioning in a Spiking Neural Net
• Simple spiking model
• Random connections
• STDP
• Delayed reward
• Eligibility traces
• Synapse selected
Izhikevich 2007
• Simple spiking model
STDP
(Figure: spike pairs in both orderings, pre before post and post before pre; in each case the weight change depends on the interval = t_post - t_pre)
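A hedged sketch of the ingredients listed above (an STDP window, an eligibility trace, and a delayed reward), loosely after Izhikevich 2007; all constants and function names are illustrative assumptions, not the paper's parameters.

```python
import math

# Minimal reward-modulated STDP sketch: an STDP event tags a synapse
# with an eligibility trace, which decays; a delayed reward signal d
# converts the remaining trace into a lasting weight change.
TAU_C = 1.0          # eligibility-trace time constant (s), illustrative
TAU_PLUS = 0.020     # potentiation window (s)
TAU_MINUS = 0.020    # depression window (s)
A_PLUS, A_MINUS = 0.1, 0.12

def stdp(interval):
    """STDP curve over interval = t_post - t_pre."""
    if interval > 0:     # pre before post: potentiation
        return A_PLUS * math.exp(-interval / TAU_PLUS)
    else:                # post before pre: depression
        return -A_MINUS * math.exp(interval / TAU_MINUS)

def weight_change(t_pre, t_post, t_reward, d=1.0):
    """Eligibility trace set by the spike pair, read out at reward time."""
    c = stdp(t_post - t_pre)
    c *= math.exp(-(t_reward - max(t_pre, t_post)) / TAU_C)  # trace decay
    return c * d

# A causal pre->post pairing followed by reward 0.5 s later is potentiated:
dw = weight_change(t_pre=0.000, t_post=0.010, t_reward=0.510)
```

The delayed reward selects synapses whose recent spike timing was causal, which is how the "synapse selected" step on the slide can work despite the delay.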
A simple 2D inverse
kinematics problem
Reinforcement Learning
• For large problems a tabular representation of state-action pairs is not possible.
• How does compression of the state representation occur? Function approximation.
• Domain-specific knowledge is provided by the designer, e.g. "TD-Gammon was dependent on Tesauro's skillful design of a non-linear multilayered neural network, used for value function approximation in the Backgammon domain consisting of approximately 10^20 states" p20 [51].
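As a contrast with the tabular case, here is a minimal sketch of TD(0) with linear value-function approximation, V(s) ≈ w · φ(s): the TD error adjusts weights instead of table entries. The toy two-state chain and the one-hot features are assumptions for illustration; choosing φ is exactly the designer-supplied domain knowledge the quote refers to.

```python
# TD(0) with linear function approximation: V(s) ~ w . phi(s).
def td0_linear(episodes, phi, n_features, alpha=0.1, gamma=0.9):
    w = [0.0] * n_features
    for episode in episodes:                     # each step: (s, r, s_next)
        for s, r, s_next in episode:
            v = sum(wi * xi for wi, xi in zip(w, phi(s)))
            v_next = 0.0 if s_next is None else sum(
                wi * xi for wi, xi in zip(w, phi(s_next)))
            delta = r + gamma * v_next - v       # TD error
            for i, xi in enumerate(phi(s)):      # gradient step on w
                w[i] += alpha * delta * xi
    return w

# Toy chain 0 -> 1 -> terminal, reward 1 at the end; phi(s) is one-hot,
# so this reduces to the tabular case -- richer phi would compress states.
phi = lambda s: [1.0 if i == s else 0.0 for i in range(2)]
w = td0_linear([[(0, 0.0, 1), (1, 1.0, None)]] * 100, phi, 2)
```

With gamma = 0.9 the learned values converge toward V(1) = 1 and V(0) = 0.9, as expected for this chain.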
So far…
• SHC works on simple problems
• RL is a sophisticated kind of SHC
• In order for RL/SHC to work,
action/value representations must fit the
problem domain.
• RL doesn’t explain how appropriate
data-structures/representations arise.
• Large search space, so random or exhaustive search is not possible.
• Representation is critical; local optima.
• Requires internal sub-goals, no explicit reward.
What neural mechanisms underlie complex search?
What is natural
selection?
1. multiplication
2. heredity
3. variability
Some hereditary traits
affect survival and/or
fertility
Natural selection
reinvented itself
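The three ingredients above can be illustrated with a toy replicator simulation; the population size, mutation range, and fitness rule are illustrative assumptions.

```python
import random

# Multiplication (copying), heredity (offspring resemble parents), and
# variability (copying errors), with a hereditary trait that affects
# "fertility": natural selection drives the mean trait upward.
def evolve(pop_size=100, generations=50, mut=0.05, seed=1):
    rng = random.Random(seed)
    pop = [rng.random() for _ in range(pop_size)]   # trait in [0, 1]
    for _ in range(generations):
        # multiplication: fertility proportional to the trait value
        parents = rng.choices(pop, weights=pop, k=pop_size)
        # heredity with variability: copy each parent with a small error
        pop = [min(1.0, max(0.0, t + rng.uniform(-mut, mut)))
               for t in parents]
    return sum(pop) / pop_size

mean_trait = evolve()
```

Remove any one ingredient (no copying, no resemblance, or no variation) and the mean trait stops climbing, which is the content of the slide's three-part definition.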
Evolutionary
Computation
• Solving problems by EC also requires
decisions about genetic representations
• And about fitness functions
• For example, we use EC to solve the 10
coins problem
Fitness
function
• Convolution of desired inverted
triangle over grid
• Instant fitness = number of coins occupying the inverted triangle template
• An important question is how such
fitness functions (subgoals/goals)
could themselves be bootstrapped in
cognition.
Michael Ollinger, Parmenides Foundation, Munich
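A hedged sketch of such a fitness function: candidate solutions place 10 coins on a grid, and fitness counts the coins lying inside an inverted-triangle template. The grid coordinates and template geometry below are assumptions for illustration, not the exact setup used in the talk.

```python
# 10-coin triangle on an integer grid: row r of the triangle holds
# r + 1 coins, columns spaced by 2 so the shape is symmetric.
def triangle(apex_row, apex_col, inverted=False):
    """Cells of a 4-row, 10-coin triangle; inverted points downward."""
    cells = set()
    for r in range(4):
        row = apex_row - r if inverted else apex_row + r
        for c in range(r + 1):
            cells.add((row, apex_col - r + 2 * c))
    return cells

def fitness(coins, template):
    """Instant fitness = number of coins occupying the template."""
    return len(set(coins) & template)

target = triangle(6, 0, inverted=True)      # desired inverted triangle
upright = triangle(3, 0)                    # a starting configuration
```

An evolutionary search would mutate coin positions and select on this overlap count; the open question raised above is where such a template (the sub-goal) comes from in cognition.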
Structuring Phenotypic
Variation
• Natural Selection can act on
• genetic representations
• variability properties (genetic
operators, e.g. mutation rates)
Variation in Variability
(Figure: panels A and B)
Improvement of representations for free…
Non-trivial Neutrality
(Figure: two genotypes g1 and g2 map to the same phenotype p)
Adapted from Toussaint 2003
Population Search
• Natural selection allows redistribution of
search resources between multiple
solutions.
• We propose that multiple (possibly
interacting) solutions to a search
problem exist at the same time in the
neuronal substrate.
(Figure: populations of candidate solutions A-D; without multiplication of good solutions, search effort is wasted, whereas selection lets variants D′, D″, D‴ of a good solution D spread through the population)
Can units of selection
exist in the brain?
• We propose 3 possible mechanisms
• Copying of connectivity patterns
• Copying of bistable activity patterns
• Copying of spatio-temporal spike
patterns & explicit rules
Copying of
connectivity patterns
How to copy small neuronal circuits
DNA
neuronal network
STDP and causal inference
With error correction and sparse
activation
(1+1) Evolution Strategy
Copying of bistable
activity patterns
1 bit copy
Hebbian Learning can Structure Exploration Distributions
- Search is biased towards previous local optima
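One way to sketch this: a Hebbian-style update nudges the sampling distribution toward previously rewarded configurations, so exploration is biased toward previous local optima. The update rule, learning rate, and reward function are illustrative assumptions.

```python
import random

# Search with a Hebbian-shaped proposal distribution: whenever a sample
# matches or beats the best reward so far, the per-bit sampling
# probabilities are moved toward that sample.
def hebbian_search(reward_fn, n_bits=6, steps=300, lr=0.2, seed=0):
    rng = random.Random(seed)
    bias = [0.5] * n_bits                   # P(x_i = 1)
    best, best_r = None, float("-inf")
    for _ in range(steps):
        x = [1 if rng.random() < b else 0 for b in bias]
        r = reward_fn(x)
        if r >= best_r:
            best, best_r = x, r
            # Hebbian-style update: move the distribution toward x
            bias = [(1 - lr) * b + lr * xi for b, xi in zip(bias, x)]
    return best, best_r

best, best_r = hebbian_search(lambda x: sum(x))   # optimum: all ones
```

The bias acts like an attractor around past optima: helpful when the landscape is smooth, but it is also how search gets anchored to a previous local optimum.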
The Origin of Heredity in Neuronal Networks
(Figure: genotype 1 produces phenotype 1 via the map M1, and genotype 2 produces phenotype 2 via M2; the copying map C satisfies M2 C = M1, i.e. C = M2^-1 M1)
Non-local, e.g. requires A^T A
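The heredity condition can be checked numerically. Below is a 2x2 sketch under one consistent reading of the slide's equations (M2 C = M1, hence C = M2^-1 M1); the matrices are arbitrary examples, and the point is that computing C needs a matrix inverse, a non-local operation (cf. the A^T A remark).

```python
# If phenotypes are produced from genotypes by linear maps M1 and M2,
# a copying map C that makes genotype 2 reproduce genotype 1's
# phenotype must satisfy M2 C = M1, i.e. C = M2^-1 M1.
def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def inv2(m):
    """Closed-form inverse of a 2x2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

M1 = [[2.0, 0.0], [0.0, 1.0]]
M2 = [[1.0, 1.0], [0.0, 1.0]]
C = matmul(inv2(M2), M1)          # C = M2^-1 M1
# Check the heredity condition M2 C = M1
assert matmul(M2, C) == M1
```

The inverse couples every entry of C to every entry of M2, which is why a purely local synaptic rule cannot compute it directly.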
Stochastic hill climbing can select for neuronal
template replication
(Figure: genotype 1 is copied to genotype 2 by the copying map C with error E; M1 and M2 map genotypes to phenotypes)
Copying of
Spatiotemporal Spike
Patterns & Explicit
Rules
Spatiotemporal spike patterns
ABA vs ABB
DD vs DS
Visual shift-invariance
mechanisms applied
to linguistics.
APPLICATIONS
• Evolution of Predictors (Feed-forward
Models/Emulators/Bayesian Causal
Networks).
• First derivative of predictability
• Evolution of Linguistic Construction
• Evolution of controllers for robot hand manipulation
• Evolution of Productions in ACT-R/Copycat
• Evolution of representations and
search for insight problem solving.
Larrañaga et al, 1996. Structure Learning of Bayesian Networks by Genetic Algorithms.
Kemp & Tenenbaum, 2008. The discovery of structural form.
Operations to
construct a BN
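A hedged sketch of elementary structure-construction operations in the spirit of Larrañaga et al (1996): add, delete, or reverse a directed edge, rejecting any move that creates a cycle. The graph representation and helper names are illustrative assumptions.

```python
# Bayesian-network structure search operates on a DAG over the
# variables; the elementary moves below are the usual operators.
def has_cycle(nodes, edges):
    """Detect a directed cycle by repeatedly peeling off sink nodes."""
    remaining = dict((n, set()) for n in nodes)
    for u, v in edges:
        remaining[u].add(v)
    changed = True
    while changed:
        changed = False
        for n in list(remaining):
            if not remaining[n]:            # no outgoing edges: remove n
                del remaining[n]
                for outs in remaining.values():
                    outs.discard(n)
                changed = True
    return bool(remaining)                  # leftover nodes lie on a cycle

def apply_op(nodes, edges, op, u, v):
    """Return the new edge set, or None if the move breaks acyclicity."""
    e = set(edges)
    if op == "add":
        e.add((u, v))
    elif op == "delete":
        e.discard((u, v))
    elif op == "reverse":
        e.discard((u, v)); e.add((v, u))
    return None if has_cycle(nodes, e) else e

nodes = ["A", "B", "C"]
edges = {("A", "B"), ("B", "C")}
assert apply_op(nodes, edges, "add", "C", "A") is None   # would cycle
assert ("C", "B") in apply_op(nodes, edges, "reverse", "B", "C")
```

A genetic algorithm over structures would apply such operators as mutations and score each resulting DAG with a network-quality measure.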
Luc Steels et al, Sony Labs
Istvan Zachar
Collegium Budapest (Institute for Advanced Study)
(Figure sequence: a rule set of linguistic units K, S, C with slots K(v), S(p), C(p) is built up step by step, combining into larger constructions such as KC and KC S as the rule set grows)
Helge Ritter, Bielefeld, Germany
Thanks to
Richard Goldstein
Richard Watson
Dan Bush
Eugene Izhikevich
Phil Husbands
Luc Steels
K.K. Karishma
Anna Fedor, Zoltan Szatmary, Szabolcs Szamado,
Istvan Zachar
Anil Seth