
CSC2535:
Advanced Machine Learning
Lecture 11b
Adaptation at multiple
time-scales
Geoffrey Hinton
An overview of how biology solves
search problems
• Searching for good combinations can be very slow
if it’s done in a naive way.
• Evolution has found many ways to speed up
searches.
– Evolution works too well to be blind. It is being
guided.
– It has discovered much better methods than the
dumb trial-and-error method that many biologists
seem to believe in.
Some search problems in Biology
• Searching for good genes and good policies for when to
express them.
– To understand how evolution is so efficient, we need to
understand forms of search that work much better than
random trial and error.
• Searching for good policies for when to contract muscles.
– Motor control works much too well for a system with a 30
millisecond feedback loop.
• Searching for the right synapse strengths to represent how
the world works
– Learning works much too well to be blind trial and error. It
must be doing something smarter than just randomly
perturbing synapse strengths.
A way to make searches work better
• In high-dimensional spaces, it is a very bad
idea to try making multiple random changes.
– It’s impossible to learn a billion synapse
strengths by randomly changing synapses.
– Once the system is significantly better than
random, almost all combinations of random
changes will make it worse.
• It is much more effective to compute a
gradient and change things in the direction
that makes them better (a toy comparison
follows below).
– That’s what brains are for: they are
devices for computing gradients. Gradients of what?
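A toy comparison to make this concrete (my own numpy illustration, not part of the lecture): on a simple high-dimensional quadratic "fitness", almost no random perturbation of a given size improves things, while a gradient step of the same size reliably does.

    # Toy comparison (illustration only): random perturbation vs. a gradient
    # step on a simple quadratic objective in a high-dimensional space.
    import numpy as np

    rng = np.random.default_rng(0)
    d = 10_000                           # number of "synapses"
    w = rng.normal(size=d)               # current weights

    def loss(w):
        return 0.5 * np.sum(w ** 2)      # the best possible setting is w = 0

    # 1) Random perturbations: count how often a random change of this size helps.
    eps, trials, improved = 0.1, 1000, 0
    for _ in range(trials):
        delta = eps * rng.normal(size=d)
        if loss(w + delta) < loss(w):
            improved += 1
    print(f"random perturbations that helped: {improved}/{trials}")

    # 2) A gradient step of the same size almost always helps.
    grad = w                             # gradient of 0.5*||w||^2 is w itself
    step = eps * np.sqrt(d) * grad / np.linalg.norm(grad)
    print("gradient step helped:", loss(w - step) < loss(w))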
A different way to make searches
work better
• It is much easier to search a fitness
landscape that has smooth hills rather than
sharp spikes.
– Fast adaptive processes can change the
fitness landscape to make search much
easier for slow adaptive processes.
An example of a fast adaptive process changing
the fitness landscape for a slower one
• Consider the task of drawing on a blackboard.
– It is very hard to do with a dumb robot arm:
• If the robot positions the tip of the chalk just
beyond the board, the chalk breaks.
• If the robot positions the chalk just in front of the
board, the chalk doesn’t leave any marks.
• We need a very fast feedback loop that uses the
force exerted by the board on the chalk to stop
the chalk.
– Neural feedback is much too slow for this.
A biological solution
• Set the relative stiffnesses of opposing muscles so that
the equilibrium point has the tip of the chalk just beyond
the board.
• Set the absolute stiffnesses so that small perturbations
from equilibrium only cause small forces (this is called
“compliance”).
• The feedback loop is now in the physical system so it
works at the speed of shockwaves in the arm.
– The feedback in the physics makes a much nicer
fitness landscape for learning how to set the muscle
stiffnesses.
The energy landscape created by two
opposing muscles
[Figure: physical energy in the opposing springs plotted against the location of the endpoint, with the start position and the location of the board marked.]
The difference of the two muscle stiffnesses determines
where the minimum is. The sum of the stiffnesses
determines how sharp the minimum is.
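A minimal worked example of that caption (my own sketch, assuming each muscle acts like a linear spring with stiffness k and rest position a): the energy is E(x) = 0.5*k1*(x-a1)^2 + 0.5*k2*(x-a2)^2, so the minimum sits at the stiffness-weighted average of the rest positions, set by the relative stiffnesses, and its sharpness (curvature) is the sum k1 + k2.

    # Sketch: two opposing muscles modelled as linear springs
    # (illustrative numbers, not from the lecture).
    def equilibrium_and_sharpness(k1, a1, k2, a2):
        # E(x) = 0.5*k1*(x - a1)**2 + 0.5*k2*(x - a2)**2
        x_star = (k1 * a1 + k2 * a2) / (k1 + k2)   # where the minimum is
        sharpness = k1 + k2                        # how sharp the minimum is
        return x_star, sharpness

    # Same equilibrium point in both cases, but the second setting is
    # compliant: small perturbations from equilibrium cause only small forces.
    print(equilibrium_and_sharpness(k1=3.0, a1=0.0, k2=1.0, a2=1.0))    # (0.25, 4.0)
    print(equilibrium_and_sharpness(k1=0.75, a1=0.0, k2=0.25, a2=1.0))  # (0.25, 1.0)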
Two fitness landscapes
• System that directly
specifies joint angles
• System that specifies
spring stiffnesses
[Figure: fitness plotted against neural signals for each of the two systems.]
Objective functions versus programs
• By setting the muscle stiffnesses, the brain creates an
energy function.
– Minimizing this energy function is left to the physics.
– This allows the brain to explore the space of objective
functions (i.e. energy landscapes) without worrying
about how to minimize the objective function.
• Slow adaptive processes should interact with fast ones
by creating objective functions for them to optimize.
– Think how a general interacts with soldiers. He
specifies their goals.
– This avoids micro-management.
Generating the parts of an object
[Figure: “square” plus pose parameters give sloppy top-down activation of parts; the parts with top-down support are then cleaned up using lateral interactions specified by the layer above. It’s like soldiers on a parade ground.]
Another example of the same principle
• The principle: Use fast adaptive processes to
make the search easier for slow ones.
• An application: Make evolution go a lot faster by
using a learning algorithm to create a much
nicer fitness landscape (the Baldwin effect).
• Almost all of the search is done by the learning
algorithm, but the results get hard-wired into the
DNA.
– It’s strictly Darwinian even though it achieves
most of what Lamarck wanted.
A toy example to explain the idea
• Consider an organism that has a mating circuit
containing 20 binary switches. If exactly the right
subset of the switches is closed, it mates very
successfully. Otherwise not.
– Suppose each switch is governed by a separate
gene that has two alleles.
– The search landscape for unguided evolution is a
one-in-a-million spike.
• Blind evolution has to build about a million organisms
to get one good one.
– Even if it finds a good one, that combination of
genes will almost certainly be destroyed in the
next generation by crossover.
Guiding evolution with a fast adaptive process
(godless intelligent design :-)
• Suppose that each gene has three alleles: ON,
OFF, and “leave it to learning”.
– ON and OFF are decisions hard-wired into the
DNA
– “leave it to learning” means that on each learning
trial, the switch is set randomly.
• Now consider organisms that have 10 switches
hard-wired and 10 left to learning.
– One in a thousand organisms will have all of the correct
hard-wired decisions, and with only about a thousand
learning trials, all 20 switches will be correct
(a simulation sketch follows below).
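A hedged simulation sketch of this set-up (population size, fitness shaping and other details are illustrative choices of mine, not the exact settings of Hinton and Nowlan 1987): each gene takes the allele ON, OFF, or "leave it to learning", fitness rewards finding the correct switch setting in as few random learning trials as possible, and selection plus crossover gradually hard-wires the correct decisions.

    # Sketch of the Baldwin-effect toy problem (illustrative parameters).
    import numpy as np

    rng = np.random.default_rng(0)
    N_GENES, N_TRIALS, POP, GENERATIONS = 20, 1000, 1000, 50
    TARGET = np.ones(N_GENES, dtype=int)      # the single good switch setting
    # alleles: 1 = ON, 0 = OFF, 2 = "leave it to learning"

    def fitness(genome):
        fixed = genome != 2
        if not np.all(genome[fixed] == TARGET[fixed]):
            return 1.0                        # a wrong hard-wired switch: learning can never fix it
        p = 2.0 ** (-np.sum(genome == 2))     # chance one random trial sets every learnable switch right
        t = rng.geometric(p)                  # trial on which learning first succeeds
        if t > N_TRIALS:
            return 1.0
        return 1.0 + 19.0 * (N_TRIALS - t) / N_TRIALS   # more reward for learning it sooner

    pop = rng.choice([0, 1, 2], size=(POP, N_GENES), p=[0.25, 0.25, 0.5])
    for gen in range(GENERATIONS):
        fit = np.array([fitness(g) for g in pop])
        # sample parents in proportion to fitness, then single-point crossover
        parents = pop[rng.choice(POP, size=(POP, 2), p=fit / fit.sum())]
        cuts = rng.integers(1, N_GENES, size=POP)
        pop = np.array([np.concatenate([a[:c], b[c:]])
                        for (a, b), c in zip(parents, cuts)])
        print(f"gen {gen:2d}: fraction of alleles hard-wired correctly = {np.mean(pop == 1):.2f}")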
The search tree
Evolution can ask learning:
“Am I correct so far?”
Evolution: 1000 nodes
Learning: 999,000 nodes
99.9% of the work required to find a
good combination is done by learning.
A learning trial is MUCH cheaper than
building a new organism.
The results of a simulation
(Hinton and Nowlan 1987)
• After building about 30,000 organisms, each of
which runs 1000 learning trials, the population
has nearly all of the correct decisions hard-wired
into the DNA.
– The pressure towards hard-wiring comes from
the fact that with more of the correct decisions
hard-wired, an organism learns the remaining
correct decisions faster.
• This suggests that learning performed almost all
of the search required to create brain structures
that are currently hard-wired.
Using the dynamics of neural activity to
speed up learning
• A Boltzmann machine has an inner-loop iterative search
to find a locally optimal interpretation of the current
visible vector.
– Then it updates the weights to lower the energy of the
locally optimal interpretation.
• An autoencoder can be made to use the same trick: It
can do an inner loop search for a code vector that is
better at reconstructing the input than the code vector
produced by its feedforward encoder.
– This speeds up the learning if we measure learning
time in the number of input vectors presented to the
autoencoder (Ranzato, PhD thesis, 2009). A rough
sketch of the idea follows below.
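A hedged numpy sketch of that inner-loop idea (a minimal linear autoencoder of my own construction on stand-in data, not the model from Ranzato's thesis): for each input, the code produced by the feedforward encoder is refined by a few gradient steps that lower the reconstruction error, and the weights are then updated against this improved code.

    # Sketch: autoencoder with an inner-loop search over the code vector
    # (illustrative linear model and random stand-in data).
    import numpy as np

    rng = np.random.default_rng(0)
    d_in, d_code = 20, 5
    W_enc = 0.1 * rng.normal(size=(d_code, d_in))
    W_dec = 0.1 * rng.normal(size=(d_in, d_code))
    lr_w, lr_code, inner_steps = 0.01, 0.1, 10

    for step in range(2000):
        x = rng.normal(size=d_in)            # stand-in for a training input
        z = W_enc @ x                        # code produced by the feedforward encoder
        # Inner loop: search for a code that reconstructs x better than the
        # feedforward code, by gradient descent on the reconstruction error.
        for _ in range(inner_steps):
            z = z + lr_code * (W_dec.T @ (x - W_dec @ z))
        # Outer loop: update the decoder to reconstruct x from the improved
        # code, and the encoder to predict the improved code in one pass.
        W_dec += lr_w * np.outer(x - W_dec @ z, z)
        W_enc += lr_w * np.outer(z - W_enc @ x, x)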
Major Stages of Biological Adaptation
• Evolution keeps inventing faster inner loops to make
the search easier for slower outer loops:
– Pure evolution: each iteration takes a lifetime.
– Development: each iteration of gene expression
takes about 20 minutes. The developmental
process may be optimizing objective functions
specified by evolution (see next slide)
– Learning: each iteration takes about a second.
– Inference: In one second, a neural network can
perform many iterations to find a good
explanation of the sensory input.
The three-eyed frog
• The two retinas of a frog connect to its tectum in a way
that tries to satisfy two conflicting goals:
– 1. Each point on the tectum should receive inputs
from corresponding points on the two retinas.
– 2. Nearby points on one retina should go to nearby
points on the tectum.
• A good compromise is to have interleaved stripes on the
tectum.
– Within each stripe all cells receive inputs from the
same retina.
– Neighboring stripes come from corresponding places
on the two retinas.
What happens if you give a frog embryo
three eyes?
• The tectum develops interleaved stripes of the
form: LMRLMRLMR…
– This suggests that in the normal frog, the
interleaved stripes are not hard-wired.
– They are the result of running an optimization
process during development (or learning).
• The advantage of this is that it generalizes much
better to unforeseen circumstances.
– It may also be easier for the genes to specify
goals than the details of how to achieve them.
The next great leap?
• Suppose that we let each biological learning trial
consist of specifying a new objective function.
• Then we use computer simulation to evaluate the
objective function in about one second.
– This creates a new inner loop that is millions of
times faster than a biological learning trial.
• Maybe we are on the brink of a major new stage
in the evolution of biological adaptation methods.
We are in the process of adding a new inner loop:
– Evolution, development, learning, simulation
THE END