5. Representations and the neural code
Fundamentals of Computational Neuroscience, T. P. Trappenberg, 2002.
Lecture Notes on Brain and Computation
Byoung-Tak Zhang
Biointelligence Laboratory
School of Computer Science and Engineering
Graduate Programs in Cognitive Science, Brain Science and Bioinformatics
Brain-Mind-Behavior Concentration Program
Seoul National University
E-mail: [email protected]
This material is available online at http://bi.snu.ac.kr/
Outline
5.1 How neurons talk
5.2 Information theory
5.3 Information in spike trains
5.4 Population coding and decoding
5.5 Distributed representation
5.1 How neurons talk
5.1.1 Cortical chatter
Fig. 5.1 Schematic view of an extracellular recording of
neuronal activity using a microelectrode
The signal can be amplified and monitored with an
oscilloscope and loudspeaker
Neuroscientists search for functional correlates in the firing
patterns of neurons
5.1.2 Morse code and neural code
Neuronal spiking patterns are different from Morse code in
many respects.
Length of a neuronal spike is approximately constant
Neuronal spike patterns do not simply represent an alphabet
Deciphering the neural code
Table 5.1 International Morse code
5.1.3 The firing rate hypothesis
The firing rate of sensory neurons can often be correlated to
the sensory awareness of a stimulus
Fig. 5.2 (A) Data from Adrian’s original work showing the firing rate of a
frog’s stretch receptor on a muscle as a function of the weight load applied
to the muscle. (B) Response (tuning curve) of a neuron in the primary visual
cortex of the cat as a function of the orientation of a light stimulus in form
of a moving bar.
5.1.4 Correlation code
The firing rate does not always carry the relevant information, as shown in Fig. 5.3
Co-occurrence of the spikes of the two neurons can carry the information instead
Fig. 5.3 An example of the response of
some neurons in the primary auditory
cortex that do not show significant
variations in response to the onset of a 4
kHz tone with the amplitude envelope
shown in (A).
(B) Average firing rates in 5 ms bins of
two different neurons.
(C) Spike-triggered average rate that
indicates some correlation between the
firing of the two neurons that is
significantly correlated to the presentation
of the stimulus.
5.1.5 Integrator and coincidence detector
Perfect integrator
A leaky integrator with a small time constant can be used as a
coincidence detector
Fig. 5.4 Schematic illustration of (A) a perfect integrator and (B) a leaky
integrator that can be utilized as a coincidence detector. In this example the
membrane potential u(t) integrates short current pulses of the two spike
trains shown at the bottom.
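As an illustration of this distinction, here is a minimal simulation sketch (not from the text; simple Euler integration, with illustrative parameters such as `tau` and the firing threshold): a perfect integrator sums all input pulses, while a leaky integrator with a small time constant only reaches threshold when input spikes nearly coincide.

```python
import numpy as np

THETA = 1.5  # illustrative firing threshold

def integrate(spike_times, tau, T=100.0, dt=0.1, w=1.0):
    """Euler integration of du/dt = -u/tau + input pulses.
    tau = np.inf gives a perfect integrator; a small tau gives a
    leaky integrator that acts as a coincidence detector."""
    t = np.arange(0.0, T, dt)
    u = np.zeros_like(t)
    for i in range(1, len(t)):
        leak = 0.0 if np.isinf(tau) else -u[i - 1] * dt / tau
        u[i] = u[i - 1] + leak
        # add a weight-w pulse for every spike falling in this time step
        u[i] += w * sum(1 for ts in spike_times if t[i - 1] <= ts < t[i])
    return u

# merged input: only the spikes at t=50.0 and t=50.2 ms nearly coincide
spikes = [10.0, 30.0, 50.0, 50.2, 70.0, 90.0]
for tau in (np.inf, 2.0):
    u_max = integrate(spikes, tau).max()
    print(f"tau={tau}: peak u={u_max:.2f}",
          "-> reaches threshold" if u_max > THETA else "-> stays subthreshold")
```

With an infinite time constant the potential climbs past threshold on any sufficiently long train, whereas with tau = 2 ms only the coincident pair at t = 50 ms drives the potential above threshold.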
5.1.6 How accurate is the spike timing?
The transmission of spikes is not very reliable
Fig. 5.5 Spike trains (top) and average response over trials (middle) of an
MT neuron to a visual stimulus with either constant velocity (A) or
alternating velocity (B), as indicated in the bottom graph.
5.2 Information Theory
5.2.1 Communication channel
Quantify the amount of information
Claude Shannon
Fig. 5.6 The communication channel as studied by Shannon. A message x is
converted into a signal s = f(x) by a transmitter that sends this signal
subsequently to the receiver. The receiver generally receives a distorted signal r
consisting of the sent signal s convolved with noise η. The received signal is then
converted into a message y = g(r).
5.2.2 Information gain
The amount of information
The probability of independent messages is the product of the probabilities of the individual messages,
$f(xy) = f(x) f(y)$ (5.1)
The information gain,
$I(x) = -\log_2 P(x)$ (5.2)
Ex) $I = -\log_2(0.5) = 1$, $I = -\log_2(1/3) \approx 1.585$, $I = -\log_2(1/4) = 2$ (5.3)
The information gain when receiving a message $y_i$,
$I(y_i) = -\log_2(p_i)$, with $p_i = P(y_i)$ (5.4)
The information gain relative to prior knowledge,
$I(x) = \log_2 \frac{P_{\mathrm{posterior}}(x)}{P_{\mathrm{prior}}(x)}$ (5.5)
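A quick numerical check of eqn (5.2) and the examples in (5.3) (plain Python, purely illustrative):

```python
from math import log2

# information gain I(x) = -log2 P(x), eqn (5.2)
for p in (0.5, 1 / 3, 0.25):
    print(f"P(x) = {p:.3f}  ->  I(x) = {-log2(p):.3f} bits")
# P(x) = 0.500 -> 1.000, P(x) = 0.333 -> 1.585, P(x) = 0.250 -> 2.000
```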
5.2.3 Entropy, the average information gain
Entropy, the average amount of information in the message set of the information source, is
$S(X) = -\sum_i p_i \log_2(p_i)$ (5.6)
The entropy of a message set with N possible messages that are all equally likely is
$S(X) = -\sum_{i=1}^{N} \frac{1}{N} \log_2\left(\frac{1}{N}\right) = \log_2(N)$ (5.7)
The entropy for a continuous distribution,
$S(X) = -\int_X \mathrm{d}x\, p(x) \log_2 p(x)$ (5.8)
Ex) The entropy of a Gaussian-distributed set,
$p(x) = \frac{1}{\sqrt{2\pi}\sigma}\, e^{-(x-\mu)^2 / 2\sigma^2}$ (5.9)
$S = \frac{1}{2} \log_2(2\pi e \sigma^2)$ (5.10)
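A short sketch (assuming numpy; the grid integral is a crude numerical check, not part of the text) that evaluates eqn (5.6) for equally likely messages and compares the Gaussian entropy (5.10) with a direct integration of (5.8):

```python
import numpy as np

def entropy_bits(p):
    """Discrete entropy, eqn (5.6); zero-probability events contribute nothing."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

print(entropy_bits([0.25] * 4))   # N=4 equally likely messages -> log2(4) = 2 bits

# Gaussian: closed form (5.10) vs. grid integration of (5.8)
sigma = 1.5
x = np.linspace(-10 * sigma, 10 * sigma, 100001)
p = np.exp(-x**2 / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma)
S_grid = -np.sum(p * np.log2(p)) * (x[1] - x[0])
S_exact = 0.5 * np.log2(2 * np.pi * np.e * sigma**2)
print(S_grid, S_exact)            # both ~2.63 bits
```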
5.2.4 Mutual information
Noise condition
The received signal y from the sent signal x
The average information gained about X from the received signals,
$\langle I \rangle = \int_Y \mathrm{d}y\, P(y) \int_X \mathrm{d}x\, p(x|y) \log_2 \frac{p(x|y)}{p(x)}$ (5.11)
Mutual information or the cross-entropy,
$I_{\mathrm{mutual}} = \int_X \int_Y \mathrm{d}x\,\mathrm{d}y\, p(y)\, p(x|y) \log_2 \frac{p(x|y)}{p(x)} = \int_X \int_Y \mathrm{d}x\,\mathrm{d}y\, p(x,y) \log_2 \frac{p(x,y)}{p(x)\, p(y)}$ (5.13)
The mutual information as the difference between the sum of the individual entropies S(X) and S(Y) and the joint entropy S(X,Y),
$I_{\mathrm{mutual}}(X,Y) = S(X) + S(Y) - S(X,Y)$ (5.14)
$S(X,Y) \le S(X) + S(Y)$ (5.15)
$I_{\mathrm{mutual}}(\text{all } y \in Y \text{ independent of all } x \in X) = 0$
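A toy verification of eqns (5.13)-(5.14) on a made-up 2x2 joint distribution (numpy):

```python
import numpy as np

def H(p):
    """Entropy in bits of any (flattened) probability array."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

Pxy = np.array([[0.4, 0.1],      # joint distribution P(x, y); x and y are
                [0.1, 0.4]])     # correlated but not perfectly so
Px, Py = Pxy.sum(axis=1), Pxy.sum(axis=0)

# direct definition, eqn (5.13)
I_direct = sum(Pxy[i, j] * np.log2(Pxy[i, j] / (Px[i] * Py[j]))
               for i in range(2) for j in range(2))
# entropy form, eqn (5.14)
I_entropy = H(Px) + H(Py) - H(Pxy)
print(I_direct, I_entropy)       # both ~0.278 bits
```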
5.2.5 Channel capacity
An amount of information that can be transmitted by a change
in the signal is proportional to the number of states the signal
can have.
Ex) Fig. 5.8 Example of a hand signal using the thumb in three different states (0°, 45°, 90°) to transmit messages
The amount of information that can be transmitted through a Gaussian channel is called the channel capacity
The information we can transmit through a noisy channel is therefore always less than the channel capacity,
$I = \frac{1}{2} \log_2\left(1 + \frac{\sigma_x^2}{\sigma_{\mathrm{eff}}^2}\right)$ (5.16)
where $\sigma_x^2$ is the variance of the input signal and $\sigma_{\mathrm{eff}}^2$ is the effective variance of the channel noise
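Eqn (5.16) as a two-line function (numbers purely illustrative):

```python
from math import log2

def gaussian_channel_info(var_signal, var_noise):
    """Transmittable information per sample, eqn (5.16)."""
    return 0.5 * log2(1 + var_signal / var_noise)

print(gaussian_channel_info(4.0, 1.0))   # signal-to-noise ratio 4 -> ~1.16 bits
```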
5.3 Information in spike trains
5.3.1 Entropy of a spike train with temporal coding (1)
Fig. 5.9 Representation of spike train as binary string with time resolution Δt.
The firing rate r, the spike train of length T
The entropy of the spike train,
$S = -\frac{T}{\Delta t \ln 2}\, [\, r\Delta t \ln(r\Delta t) + (1 - r\Delta t) \ln(1 - r\Delta t)\, ]$ (5.17)
For time bins much smaller than the inverse firing rate,
$S \approx T r \log_2\left(\frac{e}{r\Delta t}\right)$ (5.18)
The average entropy per spike, with $N = Tr$,
$\frac{S}{N} = \log_2\left(\frac{e}{r\Delta t}\right)$ (5.19)
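A sketch evaluating eqns (5.17)-(5.19) (numpy; the rate and bin sizes are illustrative), showing how the entropy per spike grows as the time resolution gets finer:

```python
import numpy as np

def spike_train_entropy(r, T, dt):
    """Entropy in bits of a binary spike train, eqn (5.17); r in Hz, T and dt in s."""
    p = r * dt                    # probability of a spike in one bin
    return -(T / dt) * (p * np.log2(p) + (1 - p) * np.log2(1 - p))

r, T = 40.0, 1.0                  # 40 Hz over one second, N = rT = 40 spikes
for dt in (0.01, 0.001):
    S = spike_train_entropy(r, T, dt)
    S_small_bins = r * T * np.log2(np.e / (r * dt))   # approximation (5.18)
    print(f"dt = {dt*1e3:4.0f} ms: S = {S:6.1f} bits, "
          f"approx = {S_small_bins:6.1f}, per spike = {S / (r * T):.2f}")
```

The approximation (5.18) is only accurate when rΔt is small, which is why the two numbers agree at 1 ms bins but not at 10 ms bins.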
5.3.1 Entropy of a spike train with temporal coding (2)
Fig. 5.10 (A) Maximum entropy per spike for spike trains with different firing rates as a
function of the size of the time bins, which sets the resolution of the firing times.
(B) Maximum entropy per spike for spike trains from which the information is transmitted
with a rate code. The two different curves correspond to two different spike statistics of the
spike train, a Poisson and an exponential probability of spike counts. Spike trains with
exponential spike distributions can convey the maximum information with a rate code for
fixed variance.
5.3.2 Entropy of a rate code
The entropy of observing N spikes in a time interval T can be calculated by applying the definition of entropy,
$S(N;T) = -\sum_n P(n) \log_2 P(n) = -\frac{1}{\ln 2} \sum_n P(n) \ln P(n)$ (5.20)
Ex) Poisson spike train,
$P(n) = \frac{N^n e^{-N}}{n!}$ (5.21)
$S(N;T) = -\frac{1}{\ln 2} \sum_n P(n)\, (n \ln N - N - \ln(n!))$ (5.22)
Stirling's formula,
$\ln(n!) \approx \left(n + \frac{1}{2}\right) \ln n - n + \frac{1}{2} \ln(2\pi)$ (5.23)
The expectation value E(x) of a quantity x,
$E(x) = \sum_x P(x)\, x$ (5.24)
With the expectation value of n being E(n) = N,
$S(N;T) \approx \frac{1}{2} \log_2 N + \frac{1}{2} \log_2(2\pi e)$ (5.25)
An exponentially distributed spike train,
$P(n) \propto e^{-\lambda n}$ (5.26)
$S(N;T) = \log_2(1 + N) + N \log_2\left(1 + \frac{1}{N}\right)$ (5.27)
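A numerical check (assuming scipy for the Poisson probabilities) of the Poisson approximation (5.25) and of the exponential spike-count result (5.27):

```python
import numpy as np
from scipy.stats import poisson

N = 20.0                                      # mean spike count in the window T
n = np.arange(0, 200)
p = poisson.pmf(n, N)
S_poisson = -np.sum(p[p > 0] * np.log2(p[p > 0]))                # eqn (5.20)
S_approx = 0.5 * np.log2(N) + 0.5 * np.log2(2 * np.pi * np.e)    # eqn (5.25)
S_exponential = np.log2(1 + N) + N * np.log2(1 + 1 / N)          # eqn (5.27)
print(S_poisson, S_approx, S_exponential)     # ~4.20, ~4.21, ~5.80 bits
```

The exponential spike-count distribution carries more information per interval than a Poisson distribution with the same mean, which is the point made in Fig. 5.10(B).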
5.3.3 Difficulties in measuring entropy
Measuring entropies is a data hungry task
To estimate probability distribution
But, we do not know a priori the form of the distribution
function
Information depends on the logarithm of probabilities
Events with small probabilities have a large factor in the
entropy
Reliable measurements of rare events demand a large amount of representative data
Normalization procedure, $\int \mathrm{d}x\, P(x) = 1$ (5.28)
This can overestimate the information content of the events measured
(C) 2012 SNU CSE Biointelligence Lab, http://bi.snu.ac.kr
17
5.3.4 Entropy measured from single neuron
It is simpler to measure only one neuron and the probability distribution of the responses of this neuron to sensory stimuli s,
$I_S(s) = \int_Y \mathrm{d}y\, P(y|s) \log_2 \frac{P(y|s)}{P(y)}$ (5.29)
Fig. 5.11 Stimulus-dependent
surprise for two cells in the inferior
temporal cortex that respond to
faces. The curve is 5 times the
universal expectation for small
time windows.
For very small time intervals, so that at most one spike can occur during this time, there is not much freedom for a coding scheme, so we can expect a universal curve,
$I_S(\Delta t \ll 1) = r_s \log_2 \frac{r_s}{\bar{r}} + \frac{\bar{r} - r_s}{\ln 2}$ (5.30)
5.3.5 Spike-dependent network processing in the brain
Spike patterns are utilized for information processing in the
brain.
Rate code
This does not mean that the timing of a spike does not matter
The emission of spikes can stimulate waves of neural activity
Oscillation
Synchronous presynaptic events
The timing of pre- and postsynaptic spikes is essential for
some forms of synaptic plasticity
Spike-timing-dependent plasticity
5.3.6 Rapid data transmission in the brain
How quickly information can be transmitted through the brain
Fig 5.12 (A) Recognition performance of 15 subjects who had to identify the
presence of an animal in a visual scene (presented for only 20 ms) versus their
mean reaction time. (B) Event-related potential averaged over frontal electrodes
across the 15 subjects.
Noisy populations of neurons can respond very quickly to
changes in the input currents
5.4 Population coding and decoding
5.4.1 Encoding
We expect that information about the external world and the processing of information is distributed in the brain
Encoding, the specific way in which a stimulus is coded in a
neuron or a population of neurons
Express the encoding of a stimulus pattern in terms of the response probability of neurons in a population with a conditional probability distribution of neural responses,
$P(\mathbf{r}|s) = P(r_1^s, r_2^s, r_3^s, \ldots \,|\, s)$ (5.31)
for a certain stimulus s, where $r_i^s$ is the stimulus-specific response of neuron i in the population
5.4.2 Bayesian decoding
The probability that a stimulus was present, given a certain response pattern of the neurons in the population,
$P(s|\mathbf{r}) = P(s \,|\, r_1^s, r_2^s, r_3^s, \ldots)$ (5.32)
Choose the most likely stimulus as the answer to the decoding problem,
$\hat{s} = \arg\max_s P(s|\mathbf{r})$ (5.33)
$P(s|\mathbf{r}) = \frac{P(\mathbf{r}|s)\, P(s)}{P(\mathbf{r})}$ (5.34)
A maximum likelihood estimate,
$s^{\mathrm{ML}} = \arg\max_s P(\mathbf{r}|s)$ (5.35)
$E((s^{\mathrm{ML}} - s)^2) \ge \frac{1}{I_F}$ (5.36)
Fisher information,
$I_F = -\int \mathrm{d}\mathbf{r}\, p(\mathbf{r}|s)\, \frac{\mathrm{d}^2}{\mathrm{d}s^2} \ln p(\mathbf{r}|s)$ (5.37)
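A toy discrete decoder (the stimuli, prior, and likelihood values are invented for illustration) for eqns (5.33)-(5.35); note that the Bayesian and maximum likelihood answers can differ when the prior is not flat:

```python
import numpy as np

stimuli = ["left", "right", "up"]             # hypothetical stimulus set
prior = np.array([0.7, 0.1, 0.2])             # P(s)
likelihood = np.array([0.2, 0.6, 0.2])        # P(r | s) for one observed r

posterior = likelihood * prior                # numerator of eqn (5.34)
posterior /= posterior.sum()                  # dividing by P(r) normalizes it
print(dict(zip(stimuli, posterior.round(3))))
print("Bayesian (MAP) estimate:", stimuli[int(np.argmax(posterior))])  # eqn (5.33)
print("Maximum likelihood:     ", stimuli[int(np.argmax(likelihood))]) # eqn (5.35)
```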
5.4.3 Decoding with response tuning curves
If we do not know the conditional probability distribution
The best we can then do is to make appropriate assumptions and constrain them with experimental data
Tuning curves (often well approximated with Gaussian functions)
The response fluctuations of the neurons around this average response profile are assumed to be independent,
$P(\mathbf{r}|s) = \prod_i P(r_i|s)$ (5.38)
Assuming Gaussian distributions,
$P(r_i|s) = \frac{1}{\sqrt{2\pi}\sigma_i}\, e^{-(r_i - f_i(s))^2 / 2\sigma_i^2}$ (5.39)
The maximum likelihood estimator,
$\hat{s} = \arg\min_s \sum_i \left(\frac{r_i - f_i(s)}{\sigma_i}\right)^2$ (5.40)
Fig. 5.13 Gaussian tuning curves representing the firing rate of a neuron as a function of a stimulus feature. (A) A single neuron cannot unambiguously decode the stimulus feature from the firing rate. (B) A second neuron with a shifted tuning curve can resolve the ambiguity.
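A minimal sketch of the maximum likelihood decoder of eqn (5.40) with Gaussian tuning curves (numpy; the eight tuning centres, noise level, and grid search are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
s_pref = np.linspace(0.0, 180.0, 8)              # preferred stimulus values
sigma_rf, sigma_noise = 30.0, 0.1

def tuning(s):
    """Gaussian tuning curves f_i(s), cf. eqn (5.39)."""
    return np.exp(-(s - s_pref) ** 2 / (2 * sigma_rf ** 2))

s_true = 130.0
r = tuning(s_true) + sigma_noise * rng.standard_normal(s_pref.size)

# eqn (5.40): minimize the summed squared mismatch over a stimulus grid
grid = np.linspace(0.0, 180.0, 1801)
cost = [np.sum(((r - tuning(s)) / sigma_noise) ** 2) for s in grid]
print("ML estimate:", grid[int(np.argmin(cost))])  # close to the true 130
```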
5.4.4 Population vector decoding (1)
Population vector decoding
Fig. 5.14 (A) Gaussian
tuning curves representing
the firing rate of a neuron
as a function of a stimulus
feature. (B) Example of
the firing rates of the 8
neurons in response to a
stimulus with direction
130 deg. (C) Decoding
error when the stimulus is
estimated with a
population code.
(C) 2012 SNU CSE Biointelligence Lab, http://bi.snu.ac.kr
24
5.4.4 Population vector decoding (2)
To decode the stimulus value from the firing pattern of the population, we multiply the firing rate of each neuron by its preferred direction and sum the contributions of all the neurons,
$\mathbf{s}^{\mathrm{dir}} = \sum_i r_i\, \mathbf{s}_i^{\mathrm{pref}}$ (5.41)
The neurons have Gaussian tuning curves,
$r_i = f(s) = e^{-(s - s_i^{\mathrm{pref}})^2 / 2\sigma_{RF}^2}$ (5.42)
To normalize the firing rates to relative values,
$\tilde{r}_i = \frac{r_i - r_i^{\min}}{r_i^{\max}}$ (5.43)
Normalized population vector,
$\mathbf{s}^{\mathrm{pop}} = \sum_i \frac{\tilde{r}_i}{\sum_j \tilde{r}_j}\, \mathbf{s}_i^{\mathrm{pref}}$ (5.44)
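A compact sketch of population vector decoding, eqns (5.41)-(5.44) (numpy; the eight direction-tuned neurons, the wrapped Gaussian tuning width, and the test direction are illustrative choices):

```python
import numpy as np

pref_deg = np.arange(0, 360, 45)                      # preferred directions
pref_vec = np.stack([np.cos(np.deg2rad(pref_deg)),    # s_i^pref as unit vectors
                     np.sin(np.deg2rad(pref_deg))], axis=1)

def rates(theta_deg, sigma_rf=45.0):
    """Gaussian tuning, eqn (5.42), with angular distance wrapped to ±180 deg."""
    d = (pref_deg - theta_deg + 180.0) % 360.0 - 180.0
    return np.exp(-d ** 2 / (2 * sigma_rf ** 2))

r = rates(130.0)
w = r / r.sum()                  # relative rates, cf. eqns (5.43)-(5.44)
s_pop = w @ pref_vec             # weighted sum of preferred directions
print("decoded direction:",
      np.degrees(np.arctan2(s_pop[1], s_pop[0])) % 360.0)   # ~130 deg
```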
5.4.5 Decoding in the brain
Many other estimation techniques are known from statistics
Different estimation techniques often have different
characteristics.
To ask which decoding schemes are used in the brain
An unsolved question
How sophisticated estimation methods with performances
similar to those of maximum likelihood estimations could be
implemented in the brain
5.5 Distributed representation
5.5.1 Distributed versus localized coding
Distinguish roughly three classes of representations in the brain:
1. Local representation
Only one node represents a stimulus, in that only one node is active when a particular stimulus is presented
2. Fully distributed representation
A stimulus is encoded by the combination of the activities of all the components in the vector representing the stimulus
3. Sparsely distributed representation
Only a fraction of the components (nodes) of a vector is involved in representing a certain stimulus
5.5.2 Sparseness
How many neurons are involved in the neural processing of individual stimuli?
The sparseness of a representation
For neurons that either fire (r=1) or not (r=0), the average fraction of neurons that are firing is
$a = \frac{1}{S} \sum_s \frac{1}{N} \sum_i r_i^s$ (5.45)
The sparseness of the representation with binary nodes,
$a = \langle r_i^s \rangle_{i,s}$ (5.46)
Averaged over a set of stimuli,
$a = \frac{\langle r_i^s \rangle_{i,s}^2}{\langle (r_i^s)^2 \rangle_{i,s}}$ (5.47)
Measure the firing rates of N neurons in response to a set of S stimuli and estimate the sparseness as
$a = \frac{\left(\frac{1}{S} \sum_s \frac{1}{N} \sum_i r_i^s\right)^2}{\frac{1}{S} \sum_s \frac{1}{N} \sum_i (r_i^s)^2}$ (5.48)
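The sparseness estimate of eqn (5.48) in a few lines (numpy; the two random response matrices are invented to contrast a dense and a sparse code):

```python
import numpy as np

def sparseness(R):
    """Eqn (5.48); R holds firing rates, shape (S stimuli, N neurons)."""
    return R.mean() ** 2 / (R ** 2).mean()

rng = np.random.default_rng(1)
dense = rng.random((50, 100))                    # every neuron responds to everything
sparse = dense * (rng.random((50, 100)) < 0.1)   # only ~10% of responses survive
print(sparseness(dense), sparseness(sparse))     # ~0.75 vs ~0.075
```

Small values of a indicate a sparse code; a fully distributed uniform code gives a value near 0.75 here, while the sparse matrix scores roughly ten times lower.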
5.5.3 What is a feature?
The response of a population of nodes to a stimulus
An alternative definition of a feature would be to view the
population representation of a stimulus as a feature decomposition
of the stimulus
The population vector as a feature vector
Even a very specific feature is distributed among neurons, and representations of natural stimuli with many features are likely to be based on a distributed code in the brain
5.5.4 Population coding
Information is represented and processed in a distributed
fashion in the brain
Estimates of how coarse the typical distributed representation in the brain is
5.5.5 Using decoding schemes
To study the nature of representations in the brain we want to
estimate mutual information between populations of neurons
or between a set of external stimuli and a population of
neurons
The mutual information,
$I_{\mathrm{mutual}} = \int_S \int_Y \mathrm{d}s\, \mathrm{d}y\, P(s,y) \log_2 \frac{P(s,y)}{P(s)\, P(y)}$ (5.49)
The average information between the actual stimulus s and the decoded stimulus s′,
$I_{\mathrm{mutual}} = \int_S \int_S \mathrm{d}s\, \mathrm{d}s'\, P(s,s') \log_2 \frac{P(s,s')}{P(s)\, P(s')}$ (5.50)
is equivalent to the mutual information
5.5.6 Population information in the inferior-temporal cortex
Fig. 5.15 (A) Estimate of mutual information between face stimuli and firing rate responses of C cells in the inferior-temporal cortex. The set of stimuli consisted of 20 faces (stars), 8 faces (crosses), and 4 faces (squares). (B) The information in the population of cells relative to the number of stimuli in the stimulus set. The solid lines are fits of the cell data to the ceiling model. The dashed line illustrates the values of the ceiling model for a stimulus set of 10,000 items and y = 0.01. (C) Estimate of mutual information with 20 faces when the neuronal response is derived from the spike count in 500 ms (stars) and 50 ms (crosses).
A ceiling effect,
$I(C) = S_{\max}\, (1 - (1 - y)^C)$ (5.51)
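Eqn (5.51) evaluated for a few cell counts (plain Python; taking S_max = log2 of the number of stimuli and y = 0.1 as illustrative values):

```python
from math import log2

def ceiling_info(C, n_stimuli, y):
    """Ceiling model, eqn (5.51): information of C cells saturates at S_max."""
    S_max = log2(n_stimuli)
    return S_max * (1.0 - (1.0 - y) ** C)

for C in (1, 5, 20, 100):
    print(C, round(ceiling_info(C, 20, 0.1), 2))   # approaches log2(20) ~ 4.32 bits
```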
5.5.7 Sparseness of object representations in the inferior-temporal cortex
Fig. 5.16 (A) Average firing rate of a neuron in the IT cortex in response to each individual stimulus in a set of 70 stimuli (faces and objects). The responses are sorted in descending order. The horizontal dashed line indicates the spontaneous firing rate of this neuron. (B) The sparseness derived from five IT neurons, calculated from eqn (5.47), in which responses lower than a firing rate threshold above the spontaneous firing rate are ignored. The solid line corresponds to the data shown in (A).
Conclusion
How information may be coded within the brain
Encoding
Decoding
An elaborate code based on spike timings
Firing of neurons with specific patterns
Carries a lot of information
Information is represented and processed in the brain in a
distributed fashion
Population coding
Sparseness