Information Theory and Neural Coding


Information Theory in Neuroscience
Noise, probability and information theory
MSc Neuroscience
Prof. Jan Schnupp
[email protected]
Neural Responses are Noisy
[Figure: spike raster plots for repeated presentations of the same stimulus; time axis 0–800 msec]
Recordings from cat A1 in response to recorded sheep and frog sounds.
Seventeen identical repetitions of a stimulus do not produce 17 identical spike patterns.
How much information does an individual response convey about the
stimulus?
Joint and Marginal Probabilities
                     Neuron responds   Neuron does not respond   (marginal p(s))
Stimulus on          0.35              0.05                      0.4
Stimulus off         0.15              0.45                      0.6
(marginal p(r))      0.5               0.5
A plausible hypothetical example
Joint Probabilities and
Independence
Let s be stimulus present, r be neuron responds.
p(s,r)=p(r,s) is the probability that stimulus is present
and that neuron responds. (joint probability)
p(s|r) is the probability that a stimulus was present given
that the neuron responded (conditional probability)
Note: p(s|r) =p(s,r)/p(r)
If r and s are independent, then p(s,r)=p(s) • p(r)
Therefore, if r,s independent, then p(s|r)=p(s), so
knowing that the neuron responded does not change
my view on how likely it is that there was a stimulus,
i.e. the response does not carry information about the
stimulus.
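As a quick illustration, here is a minimal Python sketch (not part of the lecture; it uses the hypothetical joint table above) that recovers the marginals and checks p(s|r) = p(s,r)/p(r):

```python
# Hypothetical joint probabilities from the table above:
# first index = stimulus (on/off), second index = response (spike / silent)
p_joint = {
    ("on",  "spike"): 0.35, ("on",  "silent"): 0.05,
    ("off", "spike"): 0.15, ("off", "silent"): 0.45,
}

# Marginals p(s) and p(r) are obtained by summing over the other variable
p_s = {s: sum(p for (si, _), p in p_joint.items() if si == s) for s in ("on", "off")}
p_r = {r: sum(p for (_, ri), p in p_joint.items() if ri == r) for r in ("spike", "silent")}

# Conditional probability p(s|r) = p(s,r) / p(r)
p_on_given_spike = p_joint[("on", "spike")] / p_r["spike"]
print(p_s["on"], p_on_given_spike)   # 0.4 vs 0.7: observing a spike changes the odds,
                                     # so response and stimulus are not independent
```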
What is Information?
If I tell you something you already know I don’t give
you any (new) information.
If I tell you something that you could have easily
guessed I give you only little information.
The less likely a message, the more “surprising” it is:
Surprise=1/p.
The information content of a message is
proportional to the order of magnitude of the
message’s “surprise”: I=log2(1/p) = -log2 (p)
Examples:
“A is the first letter of the alphabet”: p=1, I=-log2(1)=0
“I flipped a coin, it came up heads”: p=0.5, I=-log2(0.5)=1
“His phone number is 928 399”: p=1/10^7, I=log2(10^7)=23.25
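These numbers are easy to verify (a one-line Python check):

```python
import math

for p in (1.0, 0.5, 1e-7):
    print(p, -math.log2(p))   # 0 bits, 1 bit, ~23.25 bits
```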
“Entropy” S(s) or H(s)
S(s) = -\sum_s p(s) \log_2 p(s)
Measures “uncertainty” about a message s.
Equal to the “average” information content of messages
from a particular source.
Note that, to estimate entropy, the statistical properties of
the source must be known, i.e. one must know what values
s can take and how likely (p(s)) they are.
Entropy of flipping a fair coin:
S= - (½ • log2(½) + ½ • log2(½)) = -2 • ½ • -1 = 1
Convention: 0 • log(0) = 0;
Entropy of flipping a trick coin with “heads” on both sides:
S= - (1 • log2(1) + 0 • log2(0)) = - (0+0) = 0
Entropy of rolling a die:
S= -6 • 1/6 • log2(1/6) = -1 • log2(1/6) = log2(6) = 2.585
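The same calculations as a short Python sketch (illustrative only; the `entropy` helper is not part of the lecture material):

```python
import math

def entropy(probs):
    """Entropy in bits, using the convention 0 * log(0) = 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))    # fair coin        -> 1.0
print(entropy([1.0, 0.0]))    # two-headed coin  -> 0.0
print(entropy([1/6] * 6))     # fair die         -> 2.585
```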
If two random processes are statistically
independent, their entropies add
Outcome of 2 coin flips    Probability
HH                         1/4
HT                         1/4
TH                         1/4
TT                         1/4

S(s) = -\sum_s p(s) \log_2 p(s)
In this example:
S(coin1,coin2)= -4 • 1/4 • log2(1/4) = 2 =
S(coin1)+S(coin2)
If two processes are not independent, their joint
entropy is less than the sum of the individual
entropies
S(s,r) \le S(s) + S(r)

Outcome of 2 coin flips    Probability
HH                         1/2
HT                         0
TH                         0
TT                         1/2
In this example, the two coins are linked so that their outcome
is 100% correlated.
S(s)=S(r)=1 => S(s)+S(r) = 2
S(s,r)= -2 • 1/2 • log2(1/2) = 1
“Mutual Information” I(r,s)
I(r,s) = S(r) + S(s) - S(r,s)

I(r,s) = \sum_{r,s} p(r,s) \log_2 \frac{p(r,s)}{p(r) p(s)}
Also sometimes called the “transmitted information” T(r;s).
Equal to the difference between the sum of the individual entropies and the
joint entropy.
Measures how much uncertainty about one random variable is reduced if
the value of another random variable is known.
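As a minimal sketch (Python; the function name and data structure are assumptions, not part of the lecture), mutual information can be computed directly from a table of joint probabilities:

```python
import math

def mutual_information(p_joint):
    """I(r,s) in bits from a dict mapping (r, s) -> p(r, s)."""
    # marginals p(r) and p(s)
    p_r, p_s = {}, {}
    for (r, s), p in p_joint.items():
        p_r[r] = p_r.get(r, 0.0) + p
        p_s[s] = p_s.get(s, 0.0) + p
    # sum over all cells with non-zero probability
    return sum(p * math.log2(p / (p_r[r] * p_s[s]))
               for (r, s), p in p_joint.items() if p > 0)
```

Applied to the linked-coins table above this returns 1 bit; applied to the independent-coins table it returns 0 bits.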
Traffic Light Example
Swiss Drivers
Relative freq (estimated prob):
            Red     Green
Stop        1/2     0
Go          0       1/2
I(r,s) = \sum_{r,s} p(r,s) \log_2 \frac{p(r,s)}{p(r) p(s)}
Here:
I(Red,Stop)= ½ • log2(½ / (½ • ½)) + 0 + ½ • log2(½ / (½• ½))
+ 0 = log2(2) = 1
Traffic Light Example
Egyptian Drivers
Relative freq:
            Red     Green
Stop        0.2     0.05
Go          0.3     0.45
I(r,s) = \sum_{r,s} p(r,s) \log_2 \frac{p(r,s)}{p(r) p(s)}
Note: In this case p(Stop)=0.25, hence the entropy of the Stop/Go decision is 0.8113 < 1
Here:
I(Red,Stop)= 0.2 • log2(0.2 / (0.25 • 0.5)) + 0.3 • log2(0.3 / (0.75 • 0.5)) +
0.05 • log2(0.05 / (0.25 • 0.5)) + 0.45 • log2(0.45 / (0.75 • 0.5)) ≈ 0.0913
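A quick plug-in check of this sum (Python sketch using the table above):

```python
import math

p_joint = {("Red", "Stop"): 0.20, ("Green", "Stop"): 0.05,
           ("Red", "Go"):   0.30, ("Green", "Go"):   0.45}
p_colour = {"Red": 0.5, "Green": 0.5}     # marginal over light colour
p_action = {"Stop": 0.25, "Go": 0.75}     # marginal over driver action

I = sum(p * math.log2(p / (p_colour[c] * p_action[a]))
        for (c, a), p in p_joint.items())
print(round(I, 4))   # about 0.0913 bits, far less than the Swiss drivers' 1 bit
```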
Hypothetical Example
[Figure: scatter plot of response (0–4) against stimulus intensity (0–2.5)]
Non-monotonic (quadratic) relationship between stimulus and response.
No (linear or first-order) correlation between stimulus and response.
Nevertheless, the response is informative about the stimulus: e.g. a large response implies a mid-level stimulus.
Correlation is zero, but mutual information is large.
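The same point can be demonstrated numerically. The sketch below uses made-up data with a quadratic tuning curve and a simple histogram-based (plug-in) MI estimate, so the exact numbers are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(0)
stim = rng.uniform(0, 2.5, 10_000)
# hypothetical inverted-U ("quadratic") tuning curve plus a little noise
resp = 4 - 2.5 * (stim - 1.25) ** 2 + rng.normal(0, 0.1, stim.size)

# linear (first-order) correlation is essentially zero
print(np.corrcoef(stim, resp)[0, 1])

# plug-in mutual information estimate from a 2-D histogram of the discretised variables
counts, _, _ = np.histogram2d(stim, resp, bins=10)
p = counts / counts.sum()
p_s = p.sum(axis=1, keepdims=True)
p_r = p.sum(axis=0, keepdims=True)
nz = p > 0
print((p[nz] * np.log2(p[nz] / (p_s @ p_r)[nz])).sum())   # clearly greater than zero bits
```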
Estimating Information in Spike Counts. Example:
[Figure: left panel, spike counts mapped over azimuth (−180° to +180°) and elevation (−90° to +90°); right panel, the same space divided into 24 sectors]
24 “Sectors”, p(s)=1/24, S(s) = 4.585
Data from Mrsic-Flogel et al. Nature Neurosci (2003)
Spatial receptive fields of A1 neurons were mapped out using “virtual
acoustic space stimuli”. Left panel: the diameter of the dots is
proportional to the spike count.
Space was carved up into 24 “sectors” (right panel). The question is:
what is the mutual information between spike count and sector of
space?
Estimating Information in Spike Counts
- continued.
[Figure: joint probability matrix p(sector, count)]

I(r,s) = \sum_{r,s} p(r,s) \log_2 \frac{p(r,s)}{p(r) p(s)}
We use the relative frequencies (how often did we observe 0, 1, 2, … spikes when the stimulus was in sector 1, 2, 3, …) as estimates for p(r,s).
p(s) is fixed by the
experimenter and p(r) is
estimated from the pooled
responses.
These values are then plugged into the formula above, giving I(s,r) = 0.7019 bits.
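A sketch of the plug-in procedure (Python/NumPy; the function name and the toy contingency table are assumptions, not the published data):

```python
import numpy as np

def plugin_mutual_information(counts):
    """Plug-in MI estimate (bits) from a table of observed
    spike-count x stimulus-sector frequencies."""
    p = counts / counts.sum()                  # joint probabilities p(r, s)
    p_r = p.sum(axis=1, keepdims=True)         # marginal over sectors -> p(r)
    p_s = p.sum(axis=0, keepdims=True)         # marginal over counts  -> p(s)
    nz = p > 0                                 # 0 * log(0) = 0 convention
    return float((p[nz] * np.log2(p[nz] / (p_r @ p_s)[nz])).sum())

# toy example: rows = spike count (0..3), columns = 4 stimulus sectors
toy = np.array([[10,  2,  1,  0],
                [ 5,  8,  3,  1],
                [ 1,  6,  9,  4],
                [ 0,  1,  5, 12]], dtype=float)
print(plugin_mutual_information(toy))
```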
Difficulties with Estimating Mutual
Information: Bias!
[Figure: joint probability matrix p(sector, count) for randomly re-assigned responses]

To calculate transmitted information, we use observed frequencies as estimates for the true underlying probabilities. However, to estimate probabilities (particularly of rare events) accurately, one needs a lot of data. Inaccuracies in the estimates of p(s,r) tend to lead to overestimates of the information content.

Example: here on the right, responses were randomly re-assigned to stimulus classes. The randomisation should have led to statistical independence and hence zero information. Nevertheless, a value of I(s,r) = 0.1281 bits was obtained.
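The bias is easy to reproduce by simulation. The sketch below (with made-up, stimulus-independent Poisson counts) applies the same plug-in estimator and still returns a clearly positive value:

```python
import numpy as np

rng = np.random.default_rng(1)

def plugin_mi(counts):
    p = counts / counts.sum()
    p_r, p_s = p.sum(axis=1, keepdims=True), p.sum(axis=0, keepdims=True)
    nz = p > 0
    return float((p[nz] * np.log2(p[nz] / (p_r @ p_s)[nz])).sum())

# Simulate responses that are completely independent of the stimulus sector:
# 24 sectors x 20 trials, Poisson spike counts with the same mean everywhere.
n_sectors, n_trials = 24, 20
spike_counts = rng.poisson(3, size=(n_sectors, n_trials))

# Build a (spike count x sector) contingency table and estimate MI.
table = np.zeros((spike_counts.max() + 1, n_sectors))
for s in range(n_sectors):
    for c in spike_counts[s]:
        table[c, s] += 1

print(plugin_mi(table))   # true MI is 0, but the plug-in estimate comes out > 0 (bias)
```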
Estimating Information in Spike Patterns: The
Eskandar, Richmond & Optican (1992) Experiment
Monkeys were trained to perform delayed non-match to target
tasks with a set of Walsh patterns
Neural responses in area TE of inferotemporal (IT) cortex were
recorded while the monkeys performed the task.
[Figure: example IT responses recorded by Eskandar et al.]
Different Walsh patterns produced different response patterns as
well as different spike counts.
Principal Component Analysis of Response Patterns
[Figure: IT neuron response patterns, principal components, and PCA coefficients]
PCA makes it possible to summarize complex response shapes with relatively few numbers (the coefficients of the first few principal components).
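A minimal sketch of this idea (Python/NumPy; the response matrix here is random stand-in data, not the recorded IT responses):

```python
import numpy as np

rng = np.random.default_rng(2)
psths = rng.poisson(5, size=(64, 100)).astype(float)   # 64 responses x 100 time bins

# principal components via SVD of the mean-subtracted response matrix
centred = psths - psths.mean(axis=0)
_, _, components = np.linalg.svd(centred, full_matrices=False)

# each response is summarised by its coefficients on the first few components
coeffs = centred @ components[:3].T        # 64 responses x 3 numbers
print(coeffs.shape)
```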
Eskandar et al. Results
Spike count plus the first 3 PCA coefficients (T3, gray bars) transmit 30% more information about stimulus identity (“Pattern”) than spike count alone (TS, white bars).
Most of the IT response is
attributable to stimulus
identity (which Walsh
pattern?), only little to task
“context” (sample, match
or non-match stimulus).
Rat “Barrel” Cortex
Rat S1 has a large “barrel field” in which the vibrissae are represented.
Spike Latency Coding in Rat Somatosensory Cortex
Panzeri et al. (2001, Neuron 29, 769–777) recorded from the D2 barrel and stimulated the D2 whisker as well as surrounding whiskers. Response PSTHs are shown on the right.
While spike counts were not very informative about which whisker was
stimulated, response latency carried large amounts of information.
Applications of Information Theory in
Neuroscience – Some Further Examples
Tovee et al (J Neurophysiol. 1993) found that the first 50 ms or so
of the response of “face cells” in monkey inferotemporal cortex
contained most of the information contained in the entire
response pattern
Machens et al (J Neurosci 2001) found that grasshopper auditory
neurons transmit information about sound stimuli with highest
efficiency if the properties of these stimuli match the time scales
and amplitude distributions of natural songs.
Mrsic-Flogel et al (Nature Neurosci 2003) found that responses of
A1 neurons in adult ferrets carry more information about the
spatial location of a sound stimulus than do responses of infant
neurons.
Li et al (Nature Neurosci 2004) found that the mutual information
between visual stimuli and V1 responses can depend on the task
an animal is performing (attention?).
Information Theory in Neuroscience: a
Summary
Transmitted Information measures how much the uncertainty
about one random variable can be reduced by observing another.
Two random variables are “mutually informative” if they are not
statistically independent (p(x,y) ≠ p(x) p(y))
However, information measures are agnostic about how the
information should best be decoded, or indeed about how much
(if any) of the information contained in a spike train can be
decoded and used by the brain.
Information theory thinks about neurons merely as “transmission
channels” and assumes that the receiver (i.e. “higher” brain
structures) knows about possible states and their entropies.
Real neurons have to be encoders and decoders as much as they
are transmission channels.
The information content of a spike train is hard to measure
accurately, but at least rough (and potentially useful) estimates
can sometimes be obtained.
Further Reading
Trappenberg, T. P. (2002). "Fundamentals of computational
neuroscience," (Oxford University Press, Oxford).
Rolls, E. T., and Treves, A. (1998). "Neural networks and brain
function," (Oxford University Press, Oxford), appendix 2.
Rieke, F. (1997). "Spikes: exploring the neural code," (MIT Press,
Cambridge, Mass.; London).
Eskandar EN, Richmond BJ, and Optican LM. Role of inferior
temporal neurons in visual memory. I. Temporal encoding of
information about visual images, recalled images, and behavioral
context. J Neurophysiol 68: 1277-1295, 1992.
Furukawa, S., and Middlebrooks, J. C. (2002). "Cortical
representation of auditory space: information-bearing features of
spike patterns," J Neurophysiol 87, 1749-62.
Panzeri S, Petersen RS, Schultz SR, Lebedev M, and Diamond ME.
The role of spike timing in the coding of stimulus location in rat
somatosensory cortex. Neuron 29: 769-777, 2001.