The Boltzmann Machine
Psych 419/719
March 1, 2001
Recall Constraint Satisfaction...
• We have a network of units and
connections…
• Finding an optimal state involves
relaxation: letting the network settle into a
configuration that maximizes a goodness
function
• This is done by annealing
Simulated Annealing
• Update unit states according to a probability
distribution, which is based on:
– The input to the unit. Higher input = greater
odds of being on
– The temperature. High temperature = more
random. Low temperature = deterministic
function of input
• Start with high temperature, and gradually
reduce it
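To make the update concrete, here is a minimal Python sketch (not from the course) of simulated annealing over binary units, assuming a symmetric weight matrix w, a bias vector b, and an illustrative temperature schedule:

```python
import numpy as np

def anneal(w, b, state, schedule=(10.0, 5.0, 2.0, 1.0, 0.5),
           steps_per_temp=50, rng=None):
    """Settle a network of binary (0/1) units by simulated annealing.
    w: symmetric weight matrix (n x n); b: bias vector (n,);
    state: initial 0/1 state vector (n,). Returns the settled state."""
    if rng is None:
        rng = np.random.default_rng(0)
    state = np.array(state, dtype=float)
    for T in schedule:                        # start hot, then gradually cool
        for _ in range(steps_per_temp):
            i = rng.integers(len(state))      # pick a unit at random
            net = w[i] @ state + b[i]         # net input to unit i
            # Higher input -> greater odds of being on; high T -> nearly
            # random, low T -> nearly deterministic function of the input.
            p_on = 1.0 / (1.0 + np.exp(-net / T))
            state[i] = 1.0 if rng.random() < p_on else 0.0
    return state
```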
Constraint Satisfaction Networks
Have Nice Properties
• Can settle into stable configurations based
on partial or noisy information
• Can do pattern completion
• Have well formed attractors corresponding
to stable states
• BUT: How can we make a network learn?
What about Backprop?
• Two problems:
– Tends to split the difference between probability
distributions: if the input is ambiguous (say, the word
LEAD), the output reflects that blended distribution,
unlike the Necker cube, which settles on one
interpretation at a time.
– Also: not very biologically plausible. Error gradients
travel backwards along connections, and neurons don't
seem to do this.
We Need Hidden Units
• Hidden units are
needed to solve XOR-style problems
• In these networks, we
have a set of
symmetric connections
between units.
• Some units are visible
and others are hidden
The Boltzmann Machine:
Memorizing Patterns
• Here, we want to train the network on a set
of patterns.
• We want the network to learn about the
statistics and relationships between the parts
of the patterns.
• Not really performing an explicit mapping
(like backprop is good for)
How it Works
• Step 1. Pick an example
• Step 2. Run network in positive phase
• Step 3. Run network in negative phase
• Step 4. Compare the statistics of the two phases
• Step 5. Update the weights based on the statistics
• Step 6. Go to step 1 and repeat.
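As a rough outline (not from the slides), the six steps can be written as a training loop like the one below. Here patterns is a list of visible-unit value vectors, visible_mask marks the visible units, k is the learning rate from Step 5, and coactivation() and update_weights() are illustrative helpers sketched under the step-by-step slides that follow:

```python
def train(w, b, patterns, visible_mask, epochs=100, k=0.01, rng=None):
    """Outline of the six training steps; helper functions are illustrative."""
    if rng is None:
        rng = np.random.default_rng(0)
    no_clamp = np.zeros(w.shape[0], dtype=bool)
    for _ in range(epochs):
        pattern = patterns[rng.integers(len(patterns))]           # Step 1: pick an example at random
        p_plus = coactivation(w, b, visible_mask, pattern, rng)   # Step 2: positive (clamped) phase
        p_minus = coactivation(w, b, no_clamp, np.empty(0), rng)  # Step 3: negative (free-running) phase
        w = update_weights(w, p_plus, p_minus, k)                 # Steps 4-5: compare stats, update weights
    return w                                                      # Step 6: back to Step 1
```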
Step 1: Pick Example
• Pretty simple. Just select an example at
random.
Step 2. The Positive Phase
• Clamp our visible units with the pattern
specified by our current example
• Let network settle using the simulated
annealing method
• Record the outputs of the units
• Repeat this several times with the same example,
settling and recording the unit outputs each time.
Step 3. The Negative Phase
• Here, we don’t clamp the network units. We
just let it settle to some state as before.
• Do this several times, again recording the
unit outputs.
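Both phases can share one settling routine. Here is a hedged sketch (building on the annealing code above) in which clamp_mask marks the units held fixed: the positive phase clamps the visible units to the current pattern, and the negative phase clamps nothing.

```python
def settle(w, b, clamp_mask, clamp_values, rng,
           schedule=(10.0, 5.0, 2.0, 1.0, 0.5), steps_per_temp=50):
    """Anneal to a settled state, holding clamped units fixed."""
    n = w.shape[0]
    state = rng.integers(0, 2, size=n).astype(float)  # random starting state
    state[clamp_mask] = clamp_values                  # e.g. the current example's pattern
    for T in schedule:
        for _ in range(steps_per_temp):
            i = rng.integers(n)
            if clamp_mask[i]:                         # clamped units never change
                continue
            net = w[i] @ state + b[i]
            p_on = 1.0 / (1.0 + np.exp(-net / T))
            state[i] = 1.0 if rng.random() < p_on else 0.0
    return state
```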
Step 4. Compare Statistics
• For each pair of units, we compute the odds
that both units are coactive (both on) for the
positive phase. Do it also for the negative
phase.
• If we have n units, this gives us two n x n
matrices of probabilities
• $p^{+}_{i,j}$ is the probability that units i and j are both
on in the positive phase; $p^{-}_{i,j}$ is the same probability
for the negative phase.
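One way (a sketch, using the settle() routine above) to estimate each n x n matrix is to settle repeatedly and average the outer product of the recorded states:

```python
def coactivation(w, b, clamp_mask, clamp_values, rng, samples=20):
    """Estimate p[i, j], the probability that units i and j are both on,
    by averaging over repeated settling runs."""
    n = w.shape[0]
    p = np.zeros((n, n))
    for _ in range(samples):
        s = settle(w, b, clamp_mask, clamp_values, rng)
        p += np.outer(s, s)      # s[i] * s[j] is 1 only when both units are on
    return p / samples
```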
Step 5: Update Weights
$$\Delta w_{i,j} = k\,(p^{+}_{i,j} - p^{-}_{i,j})$$
• Change each weight according to the
difference of the probabilities for the
positive and negative phase
• Here, k is like a learning rate
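In code, the update is just the scaled difference of the two matrices. A sketch (the zeroed diagonal, ruling out self-connections, is an assumption the slides don't spell out):

```python
import numpy as np

def update_weights(w, p_plus, p_minus, k=0.01):
    """Boltzmann learning rule: dw[i, j] = k * (p_plus[i, j] - p_minus[i, j])."""
    dw = k * (p_plus - p_minus)
    np.fill_diagonal(dw, 0.0)    # assumption: no self-connections
    return w + dw
```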
Why it Works
• This reduces the difference between what the
network settles to when the inputs are clamped
and what it settles to when it's allowed to free-run.
• So, the weights learn about what kinds of
visible units go together.
• Recruits hidden units to help learn higher
order relationships
Can Be Used For Mappings Too
• Here, the positive phase involves clamping
both the input and output units and letting
the network settle.
• The negative phase involves clamping just
the input units
• Network learns that given the input, it
should settle to a state where the output
units are what they should be
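In terms of the earlier sketches, only the clamp masks change between the two phases; input_idx and output_idx below are illustrative index arrays for the input and output units:

```python
import numpy as np

def mapping_clamp_masks(n_units, input_idx, output_idx):
    """Clamp masks for learning an input -> output mapping.
    Positive phase: clamp both the input and the output units.
    Negative phase: clamp only the input units."""
    positive = np.zeros(n_units, dtype=bool)
    positive[input_idx] = True
    positive[output_idx] = True
    negative = np.zeros(n_units, dtype=bool)
    negative[input_idx] = True
    return positive, negative
```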
Contrastive Hebbian Learning
• Very similar to a normal Boltzmann
machine, except we can have units whose
outputs are a deterministic function of their
input (like the logistic).
• As before, we have two phases: positive and
negative.
Contrastive Hebbian
Learning Rule
$$\Delta w_{i,j} = k\,(a^{+}_{i}\,a^{+}_{j} - a^{-}_{i}\,a^{-}_{j})$$
• Weight updates based on actual unit
outputs, not probabilities that they’re both
on.
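A minimal sketch of the contrastive Hebbian update, assuming a_plus and a_minus are the vectors of unit outputs (e.g. logistic) recorded at the end of the positive and negative phases:

```python
import numpy as np

def chl_update(w, a_plus, a_minus, k=0.01):
    """Contrastive Hebbian rule:
    dw[i, j] = k * (a_plus[i] * a_plus[j] - a_minus[i] * a_minus[j])."""
    dw = k * (np.outer(a_plus, a_plus) - np.outer(a_minus, a_minus))
    np.fill_diagonal(dw, 0.0)    # assumption: no self-connections
    return w + dw
```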
Problems
• Weight explosion. If weights get too big too
early, network will get stuck in one
goodness optimum.
– Can be alleviated with weight decay
• Settling time. Time to process an example is
long, due to settling process.
• Learning time. Takes a lot of presentations
to learn.
• Symmetric weights? Phases?
Sleep?
• It has been suggested that something like the
negative (minus) phase might be happening during
sleep:
• Spontaneous correlations between hidden
units (not those driven by external input) get
subtracted off. These will vanish unless driven by
external input while awake.
• Not a lot of evidence to support this
conjecture.
• We can learn while awake!
For Next Time
• Optional reading handed out.
• Ends section on learning internal
representations. Next: biologically plausible
learning.
• Remember:
– No class next Thursday
– Homework 3 due March 13
– Project proposal due March 15. See web page.