Network States as Perceptual Inferences
PDP Class
Winter, 2017
January 17, 2017
Overview
• Network state as a perceptual inference
• Goodness of a network state
• How networks maximize goodness
– The Hopfield network and Rumelhart’s continuous version
– The Boltzmann Machine, and the relationship between goodness
and probability
– Sampling from the probability distribution over states
• The IA model:
– Evidence and issues
– The IA model
– Problem and solution – the MIA model
• Mutual constraint satisfaction in the brain
Network Goodness and How to Increase it
The Hopfield Network
• Assume symmetric weights.
• Units have binary states [+1, -1].
• Units are set into initial states.
• Choose a unit to update at random.
• If net > 0, set the state to +1; else set the state to -1.
• Goodness always increases… until it stops changing (sketched in code below).
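A minimal sketch of this update loop and the goodness measure it climbs, assuming the usual definition G = Σ_{i<j} w_ij s_i s_j; the 3-unit weight matrix and starting state are made-up toys, not a network from the lecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def goodness(W, s):
    # G = sum over pairs i<j of w_ij s_i s_j; the 1/2 undoes the double
    # counting in s @ W @ s (assumes a zero diagonal).
    return 0.5 * s @ W @ s

def hopfield_settle(W, s, n_updates=200):
    # Pick a unit at random; set it to +1 if its net input is positive,
    # else to -1. With symmetric weights, goodness never decreases.
    s = s.copy()
    for _ in range(n_updates):
        i = rng.integers(len(s))
        net = W[i] @ s
        s[i] = 1.0 if net > 0 else -1.0
    return s

# Toy 3-unit network (made-up weights, not from the lecture):
W = np.array([[ 0.,  1., -1.],
              [ 1.,  0., -1.],
              [-1., -1.,  0.]])
s0 = np.array([1., -1., 1.])
s1 = hopfield_settle(W, s0)
print(goodness(W, s0), "->", goodness(W, s1))  # e.g. -1.0 -> 3.0
```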
Rumelhart’s Continuous Version
Unit states have values between 0 and 1.
Units are updated asynchronously. Update is gradual, according to the rule (restated in the “Input and activation of units in PDP models” slide below):
if net_i > 0: Δa_i = net_i (max − a_i); else: Δa_i = net_i (a_i − min)
The Cube Network
Positive weights have value +1
Negative weights have value -1
‘External input’ is implemented as a positive bias of 0.5 to all units.
Goodness Landscape of the Cube Network
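The landscape itself is not reproduced in this transcript, but it can be explored numerically. A sketch under stated assumptions: a hypothetical 4-unit stand-in (two mutually supportive pairs that inhibit each other, with the 0.5 bias from the slide) rather than the actual cube network, evaluating goodness at every corner of the unit hypercube:

```python
import itertools
import numpy as np

def goodness(W, bias, a):
    # G(a) = sum_{i<j} w_ij a_i a_j + sum_i bias_i a_i
    return 0.5 * a @ W @ a + bias @ a

# Hypothetical 4-unit stand-in: units 0-1 support each other, units 2-3
# support each other, and the two pairs inhibit each other.
W = np.array([[ 0.,  1., -1., -1.],
              [ 1.,  0., -1., -1.],
              [-1., -1.,  0.,  1.],
              [-1., -1.,  1.,  0.]])
bias = np.full(4, 0.5)  # the 'external input' bias from the slide

# Evaluate goodness at every corner of the hypercube; the two peaks
# correspond to the two internally consistent 'interpretations'.
for corner in itertools.product([0.0, 1.0], repeat=4):
    a = np.array(corner)
    print(corner, goodness(W, bias, a))
```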
The Boltzmann Machine: The Stochastic Hopfield Network
Units have binary states [0, 1]; update is asynchronous.
The activation function is stochastic: the chosen unit is set to 1 with probability
p(a_i = 1) = 1 / (1 + e^{−net_i / T})
Assuming processing is ergodic (that is, it is possible to get from any state to any other state), then when the state of the network reaches equilibrium, the relative probability and relative goodness of two states A and B are related as follows:
P(A) / P(B) = e^{(G_A − G_B) / T}, or log[P(A) / P(B)] = (G_A − G_B) / T
More generally, at equilibrium we have the Probability-Goodness Equation:
P(S) = e^{G(S)/T} / Σ_{S'} e^{G(S')/T}
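For example, at T = 1 a state whose goodness exceeds another’s by 2 is e^2 ≈ 7.4 times as probable as that other state.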
Simulated Annealing
• Start with high temperature. This means it
is easy to jump from state to state.
• Gradually reduce temperature.
• In the limit of infinitely slow annealing, we
can guarantee that the network will be in
the best possible state (or in one of them,
if two or more are equally good).
• Thus, the best possible interpretation can
always be found (if you are patient)!
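A minimal sketch of the stochastic update with an annealing schedule, assuming the logistic activation function above; the 2-unit weights, biases, and geometric schedule are made-up illustrations, not the lecture’s:

```python
import numpy as np

rng = np.random.default_rng(1)

def settle(W, bias, schedule):
    # Boltzmann update: a randomly chosen unit turns on with
    # probability 1 / (1 + exp(-net / T)); T follows the schedule.
    a = rng.integers(0, 2, size=len(bias)).astype(float)
    for T in schedule:
        i = rng.integers(len(a))
        net = W[i] @ a + bias[i]
        a[i] = float(rng.random() < 1.0 / (1.0 + np.exp(-net / T)))
    return a

# Made-up 2-unit network; geometric schedule from T = 10 down toward 0.
W = np.array([[0., 2.],
              [2., 0.]])
bias = np.array([-1., -1.])
schedule = 10.0 * 0.999 ** np.arange(5000)
print(settle(W, bias, schedule))  # ends in one of the best states
```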
Exploring Probability Distributions
over States
• Imagine settling at a fixed non-zero temperature, such as
T = 1.
• At this temperature, there’s still some probability of being
in or switching to a state that is less good than one of the
optimal states.
• Consider an ensemble of networks.
– At equilibrium (i.e., after enough cycles, possibly with annealing), the relative frequencies of being in the different states will approximate the relative probabilities given by the Probability-Goodness Equation.
• You will have an opportunity to explore this situation in
the homework assignment.
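A sketch of that ensemble exercise, comparing empirical state frequencies at T = 1 against the Probability-Goodness Equation; the tiny 2-unit network and run lengths are arbitrary choices, not the homework’s:

```python
import itertools
import numpy as np

rng = np.random.default_rng(2)

W = np.array([[0., 2.],
              [2., 0.]])
bias = np.array([-1., -1.])

def goodness(a):
    return 0.5 * a @ W @ a + bias @ a

def sample_state(T=1.0, n_steps=500):
    # Settle one network at fixed T = 1 and report its final state.
    a = rng.integers(0, 2, size=2).astype(float)
    for _ in range(n_steps):
        i = rng.integers(2)
        net = W[i] @ a + bias[i]
        a[i] = float(rng.random() < 1.0 / (1.0 + np.exp(-net / T)))
    return tuple(a)

# Relative frequencies across an ensemble of 1000 networks...
counts = {}
for _ in range(1000):
    s = sample_state()
    counts[s] = counts.get(s, 0) + 1

# ...compared with the Probability-Goodness Equation at T = 1:
# P(S) = exp(G(S)) / sum_S' exp(G(S'))
states = [np.array(c) for c in itertools.product([0.0, 1.0], repeat=2)]
Z = sum(np.exp(goodness(s)) for s in states)
for s in states:
    key = tuple(s)
    print(key, counts.get(key, 0) / 1000, float(np.exp(goodness(s)) / Z))
```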
Findings Motivating the IA Model
• The word superiority effect (Reicher, 1969)
– Subjects identify letters in words better than single letters or letters in scrambled strings.
• The pseudoword advantage
– The advantage over single letters and scrambled strings extends to pronounceable nonwords (e.g., LEAT, LOAT, …).
• The contextual enhancement effect
– Increasing the duration of the context or of the target letter facilitates correct identification.
• Reicher’s experiment:
– Used pairs of 4-letter words differing by one letter (READ, ROAD).
– The ‘critical letter’ is the letter that differs.
– Critical letters occur in all four positions.
– The same critical letters occur alone or in scrambled strings (_E__, _O__, EADR, EODR).
[Figure: percent correct for critical letters in words (W), pseudowords (PW), scrambled strings (Scr), and single letters (L).]
The Contextual Enhancement Effect
[Figure: contextual enhancement effect, plotted as a ratio.]
Questions
• Can we explain the Word Superiority Effect and the Contextual Enhancement Effect as a consequence of a synergistic combination of ‘top-down’ and ‘bottom-up’ influences?
• Can the same processes also explain the pseudoword advantage?
• What specific assumptions are necessary to capture the data?
• What can we learn about these assumptions from the study of model variants and the effects of parameter changes?
• Can we derive novel predictions?
• What do we learn about the limitations as well as the strengths of the model?
Approach
• Draw on ideas from the way neurons work
• Keep it simple
The Interactive Activation Model
• Feature, letter, and word units.
• Activation is the system’s only ‘currency’.
• Mutually consistent items on adjacent levels excite each other.
• Mutually exclusive alternatives inhibit each other.
• The response is selected from the letter units in the cued location according to the Luce choice rule (a sketch follows below):
p(r_i) = s_i / Σ_j s_j, where s_i = e^{μ a_i}
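A minimal sketch of the Luce choice rule as just stated; the activation values and μ here are arbitrary examples (in the 1981 model the rule is applied to time-averaged activations):

```python
import numpy as np

def luce_choice(a, mu=10.0):
    # Luce choice rule: strength s_i = exp(mu * a_i);
    # p(r_i) = s_i / sum_j s_j over letter units in the cued position.
    s = np.exp(mu * np.asarray(a))
    return s / s.sum()

# Hypothetical activations of three competing letter units:
print(luce_choice([0.6, 0.2, -0.1]))  # response heavily favors the first
```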
IAC Activation Function
Calculate the net input to each unit i from the outputs of the other units j:
net_i = Σ_j o_j w_ij
Set outputs by positively rectifying activations: o_j = [a_j]_+
[Figure: unit i receiving output from unit j via weight w_ij; the activation scale runs from min through rest and 0 up to max.]
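A sketch of those two steps, assuming w_ij is stored as W[i, j]; the 2-unit weights and activations are made up:

```python
import numpy as np

def net_inputs(W, a):
    # o_j = [a_j]_+ : only positive activations are transmitted.
    o = np.maximum(a, 0.0)
    # net_i = sum_j o_j * w_ij, with w_ij stored as W[i, j].
    return W @ o

# Hypothetical 2-unit example: unit 0 excites unit 1 (W[1, 0] = 1);
# unit 1 inhibits unit 0 (W[0, 1] = -1).
W = np.array([[0., -1.],
              [1.,  0.]])
print(net_inputs(W, np.array([0.5, -0.3])))  # -> [0., 0.5]
```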
Interactive Activation
How the Model Works: Words vs. Single Letters
Word and Letter Level Activations for Words and Pseudowords
The idea of a ‘conspiracy effect’, rather than consistency with rules, as the basis of performance on ‘regular’ items.
The Problem with the 1981 Model
The model did not show the empirically observed pattern of ‘logistic additivity’ when context and stimulus information were separately manipulated.
Massaro & Cohen (1991) presented different /l/-to-/r/-like segments in four contexts: “p_ee”, “t_ee”, “s_ee”, “v_ee”.
[Figure: idealization of the empirical pattern vs. the model’s simulation.]
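(‘Logistic additivity’, stated in its standard form for clarity: in logit coordinates the two sources of support combine additively, log[P(r) / (1 − P(r))] = f(stimulus) + g(context), so varying the context should shift the identification curves without changing their shape.)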
The Multinomial IA Model
• Very similar to Rumelhart’s 1977 formulation.
• Based on a simple generative model of displays in letter perception experiments:
– The experimenter selects a word,
– selects letters based on the word, but with possible random errors,
– selects features based on the letters, again with possible random error, AND/OR
– the visual system registers features with some possibility of error;
– some features may be missing, as in the WOR? example above.
• Units without parents have biases equal to the log of the prior.
• Weights are defined ‘top down’: they correspond to log p(C|P), where C = child, P = parent.
• Units within a layer take on probabilistic activations based on the softmax function (see the sketch after this list):
r_i = e^{net_i} / Σ_{i'} e^{net_{i'}}
– Only one unit is allowed to be active within each set of mutually exclusive hypotheses, with probability r_i.
• A state corresponds to one active word unit and one active letter unit in each position, together with the provided set of feature activations.
• If the priors and weights correspond to those underlying the generative model, then states are ‘sampled’ in proportion to their posterior probability:
– the state of the entire system = a sample from the joint posterior;
– the state of the word or letter units in a given position = a sample from the marginal posterior.
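A minimal sketch of that within-pool sampling step, assuming a single pool of mutually exclusive hypotheses with given net inputs; the numbers are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(3)

def sample_pool(net):
    # Softmax over one pool of mutually exclusive hypotheses:
    # r_i = exp(net_i) / sum_i' exp(net_i'); exactly one unit in the
    # pool is set active, chosen with probability r_i.
    net = np.asarray(net, dtype=float)
    rho = np.exp(net - net.max())   # subtract the max for numerical stability
    rho /= rho.sum()
    a = np.zeros(len(rho))
    a[rng.choice(len(rho), p=rho)] = 1.0
    return a

# Hypothetical net inputs to three competing letter hypotheses in one position:
print(sample_pool([2.0, 1.0, -0.5]))
```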
Input and activation of units in PDP models
• General form of the unit update (with max = 1, min = −0.2, and a resting level rest):
net_i = Σ_j w_ij a_j + bias_i + input_i + noise
if net_i > 0: Δa_i = net_i (1 − a_i) − d (a_i − rest)
else: Δa_i = net_i (a_i − min) − d (a_i − rest)
• Simple version used in the cube simulation (no decay or noise; here min = 0, so states stay between 0 and 1):
net_i = Σ_j w_ij a_j + bias_i + input_i
if net_i > 0: Δa_i = net_i (1 − a_i)
else: Δa_i = net_i a_i
• An activation function that links PDP models to Bayesian ideas:
a_i = e^{net_i} / (e^{net_i} + 1)
Or set the activation to 1 probabilistically, with probability
p_i = e^{net_i} / (e^{net_i} + 1)
(A sketch of the general update appears below.)
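A sketch of the general update rule, assuming the parameter values shown above (max = 1, min = −0.2) and made-up weights; for brevity it updates all units in one synchronous sweep rather than asynchronously, and omits the noise term:

```python
import numpy as np

def iac_step(W, a, bias, ext, rest=0.0, d=0.1, a_min=-0.2, a_max=1.0):
    # One sweep of the general rule above (noise omitted):
    #   net_i = sum_j w_ij a_j + bias_i + input_i
    #   if net_i > 0: da_i = net_i (max - a_i) - d (a_i - rest)
    #   else:         da_i = net_i (a_i - min) - d (a_i - rest)
    net = W @ a + bias + ext
    da = np.where(net > 0,
                  net * (a_max - a) - d * (a - rest),
                  net * (a - a_min) - d * (a - rest))
    return np.clip(a + da, a_min, a_max)

# Made-up demo: two mutually inhibiting units, external input to unit 0.
W = np.array([[ 0., -1.],
              [-1.,  0.]])
a = np.zeros(2)
for _ in range(50):
    a = iac_step(W, a, bias=np.zeros(2), ext=np.array([0.4, 0.0]))
print(a)  # unit 0 settles well above rest; unit 1 is pushed below rest
```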
Interactivity in the Brain
• Bidirectional Connectivity (Maunsell & van Essen)
• Interactions between V5 (MT) and V1/V2 (Hupé et al.)
• Subjective Contours in V1 (Lee & Nguyen)
• Binocular Rivalry (Leopold & Logothetis)
Hupé, James, Payne, Lomber, Girard & Bullier (Nature, 1998, 394, 784-787)
• Investigated the effects of cooling V5 (MT) on neuronal responses in V1, V2, and V3 to a bar on a background grid of lower contrast.
• MT cooling typically produces a reversible reduction in firing rate to V1/V2/V3 cells’ optimal stimulus (figure).
• The top-down effect is greatest for stimuli of low contrast: if the stimulus is easy to see when it is not moving, top-down influences from MT have little effect.
• The concept of ‘inverse effectiveness’ arises here and in many other related cases.
Lee & Nguyen (PNAS, 2001, 98, 1907-1911)
• They asked the question:
Do V1 neurons participate in the
formation of a representation of the
illusory contour seen in the upper
panel (but not in the lower panel)?
• They recorded from neurons in V1
tuned to the illusory line segment, and
varied the position of the illusory
segment with respect to the most
responsive position of the neuron.
Response to the illusory contour is found at
precisely the expected location.
Temporal Response to Real and Illusory Contours
The neuron’s receptive field falls right over the middle of the real or illusory line defining the bottom edge of the square.
The figure shows a V1/V2 neuron whose firing was strongly modulated around epochs in which the monkey perceived the cell’s preferred stimulus. From Leopold and Logothetis, 1996.
Top: PSTHs show a strong orientation preference.
Bottom: When both stimuli are presented simultaneously, the neuron is silent just before a response indicating perception of the null direction, but quite active just before a response (t < 0) indicating perception of the preferred direction.
Leopold and Logothetis (Nature, 1996, 379, 549-553) found that some neurons in V1/V2 as well as V4 modulate their responses in concert with the monkey’s percept, as if participating in a massively distributed constraint-satisfaction process. However, some neurons in all areas do not modulate their responses. Thus the conscious percept appears to be correlated with the activity of only a subset of neurons. The fraction of neurons that covary with perception is greater in higher areas.