Population Codes and Inference in Neurons



Population Codes &
Inference in Neurons
Richard Zemel
Department of Computer Science
University of Toronto
Basic questions of neural representation
Fundamental issue in computational neuroscience:
How is information represented in the brain?

What are the units of computation?

How is information processed at neural level?
Important part of answer: information not processed
by single cells, but by populations
Population Codes
Coding first thought to be localist: neurons as binary
units, encode unique value
Alternative: more distributed, graded response;
neuron’s level of activity conveys information
Population code: group of units tuned to common
variable
Good computational strategy: efficient and robust
Population codes all the way down
Examples: visual features;
motor commands; other sensory
properties; place fields
Outline
1) Information processing in population codes
a) reading the neural code
b) computation in populations
2) Extending the information in population codes
a) representing probability distributions
b) methods for encoding/decoding
distributions in neurons
3) Maintaining and updating distributions
through time: dynamic distributions
a) optimal analytic formulation
b) network approximation
Reading the Neural Code
Neurophysiologists collect neural recordings:
sequences of action potentials (spikes) from one or
several cells during controlled experiment
Task: reconstruct the identity or value of the stimulus parameter(s)
Why play the homunculus?
• Assess degree to which that parameter is encoded (establish sufficiency, not necessity)
• Limits on reliability and accuracy of neuronal encoding (estimate optimal parameters)
• Characterize information processing: nervous system faced with this decoding problem
Rate representation of response
Spikes convey information through timing
Typically converted into scalar rate value,
summarized in ri: firing rate of cell i (#spikes in
interval/interval size)
Interval size determines amount of information about
spike timing lost in firing rate representation
Can also consider firing rate of cell as the probability
that the cell will fire within specified time interval
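As a concrete illustration, a minimal Python sketch (spike times and bin size made up, not from the talk) of turning a spike train into the rate ri: count spikes in an interval and divide by the interval size; smaller intervals keep more timing information but give noisier rates.

import numpy as np

spike_times = np.array([0.012, 0.045, 0.051, 0.130, 0.180, 0.220])  # seconds (made-up example)
t_start, t_stop = 0.0, 0.25
interval = 0.05                                    # bin size in seconds

edges = np.arange(t_start, t_stop + interval, interval)
counts, _ = np.histogram(spike_times, bins=edges)
rates = counts / interval                          # r_i per bin: #spikes in interval / interval size
mean_rate = len(spike_times) / (t_stop - t_start)  # single scalar rate over the whole trial
print(rates, mean_rate)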
Example: Reconstructing movement direction
Task: given rates in population of direction-selective
cells (r = r1,…,rN) compute arm direction
Cells in motor cortex (M1) tuned to movement angle
Tuning function (curve) fi(x):
fi(x) = A + B cos(x − xi)
A = ½ (ri^max + ri^min),  B = ½ (ri^max − ri^min)
Population vector method
Consider each cell as vector pointing in preferred
direction xi
Length of vector represents relative response strength
for particular movement direction
Sum of vectors is the estimated movement direction:
the population vector
Simple, robust accurate method if N large, and {xi}
randomly, uniformly span the space of directions
Can also view as reconstruction with cosine basis:
x̂ = argmax over x of Σi ri cos(x − xi)  (peak of the summed cosine bases, which is the population vector direction)
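A minimal Python sketch of this readout (cosine tuning as above; cell count, gains, and noise model are illustrative assumptions):

import numpy as np

rng = np.random.default_rng(0)
N = 100
preferred = rng.uniform(0, 2 * np.pi, N)           # preferred directions x_i
A, B = 30.0, 20.0                                  # A = (r_max + r_min)/2, B = (r_max - r_min)/2, in Hz

def tuning(x):
    return A + B * np.cos(x - preferred)           # f_i(x) = A + B cos(x - x_i)

x_true = 1.2
window = 0.5                                       # seconds of spike counting
rates = rng.poisson(tuning(x_true) * window) / window

# each cell votes with a unit vector in its preferred direction, weighted by its
# (baseline-subtracted) rate; the angle of the vector sum is the population vector estimate
units = np.column_stack([np.cos(preferred), np.sin(preferred)])
vec = np.sum((rates - A)[:, None] * units, axis=0)
x_hat = np.arctan2(vec[1], vec[0]) % (2 * np.pi)
print(x_true, x_hat)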
Bayesian reconstruction
Basis function methods perform well, but other class
of methods in some sense optimal
Set up statistical model: signal x produces response r,
need to invert the model to find likely x for given r
Begin with encoding model
ri(x) = fi(x) + η
• rate ri(x) is random variable: response of cell i in population to stimulus x
• tuning function fi(x) describes expected rate
• noise η typically assumed Poisson or Gaussian
Goal: decode responses to form posterior distribution
P(x|r)= P(r|x) P(x) / P(r)
Standard Bayesian reconstruction
likelihood P(r|x) based on encoding model
assumptions in standard model:
• spikes have Poisson distribution (natural if rate defined as spike count, spikes distributed independently, randomly)
• noise uncorrelated between different cells: all variability captured in P(ri|x)
intuition: gain precision through multiplying rather than
adding basis functions (tuning curves here)
obtain single value estimate through MAP or ML
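A numerical sketch of this reconstruction, assuming Gaussian tuning curves, independent Poisson spike counts, and a uniform prior (all parameter values illustrative):

import numpy as np

rng = np.random.default_rng(1)
grid = np.linspace(-10, 10, 401)                 # candidate stimulus values x
centers = np.linspace(-10, 10, 40)               # preferred stimuli x_i
sigma, gain = 1.5, 10.0

def f(x):                                        # tuning curves f_i(x), one column per cell
    return gain * np.exp(-(np.atleast_1d(x)[:, None] - centers) ** 2 / (2 * sigma ** 2))

x_true = 2.3
r = rng.poisson(f(x_true)[0])                    # observed spike counts for each cell

F = f(grid)                                      # (grid points) x (cells)
loglik = (r * np.log(F + 1e-12) - F).sum(axis=1) # sum_i [ r_i log f_i(x) - f_i(x) ]
logpost = loglik                                 # uniform prior: P(x|r) proportional to P(r|x)
post = np.exp(logpost - logpost.max())
post /= post.sum()
x_map = grid[np.argmax(post)]                    # MAP (= ML here) estimate
print(x_true, x_map)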
Application: hippocampal population codes
P(x) based on spatial occupancy; P(ri|x) are place fields
Zhang et al.
ML reconstruction
under simplifying assumptions, ML reconstruction has simple intuitive form
implement ML by maximizing the log likelihood (for Poisson noise, Σi [ri log fi(x) − fi(x)])
if tuning curves evenly distributed (Σi fi(x) ≈ constant), for Gaussian tuning curves:
xML = Σi ri xi / Σi ri
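A quick numerical check of this closed form (illustrative parameters): with evenly spaced Gaussian tuning curves, the weighted mean Σi ri xi / Σi ri lands close to a brute-force grid ML estimate.

import numpy as np

rng = np.random.default_rng(2)
centers = np.linspace(-10, 10, 80)               # evenly spaced preferred values x_i
sigma, gain = 1.5, 8.0
grid = np.linspace(-8, 8, 801)

def f(x):
    return gain * np.exp(-(np.atleast_1d(x)[:, None] - centers) ** 2 / (2 * sigma ** 2))

x_true = 0.7
r = rng.poisson(f(x_true)[0])

x_com = np.sum(r * centers) / np.sum(r)          # closed form: x_ML = sum_i r_i x_i / sum_i r_i
F = f(grid)
loglik = (r * np.log(F + 1e-12) - F).sum(axis=1)
x_grid = grid[np.argmax(loglik)]                 # brute-force ML on a grid
print(x_com, x_grid)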
Computation in population codes
most of computational focus on population codes
based on observation that they offer compromise:
• localist codes have problems with noise, robustness, number of neurons required
• fully distributed codes can make decoding complicated, cannot handle multiple values
other properties of population codes studied recently,
key focus (driven partly by biological studies) on
recurrent connections between units in population
Line attractor
simple network model, with recurrent connections Tij, governed by dynamic equation:
τ dui/dt = −ui + Σj Tij rj + hi
• ui is net input into unit i; rate ri its output; Σj Tij rj is its recurrent input; hi its feedforward input
• if rate linear above threshold input: ri = [ui]+
• Tij rj is the recurrent contribution of j on i; hi the feedforward contribution
in general, set of N linear equations in N unknowns has unique solution, but can tune connections so fixed points (attractors) lie along a line
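A minimal simulation sketch of such a network, assuming the standard threshold-linear dynamics above; the cosine lateral weights here are chosen for stability and illustration rather than tuned to give an exact line of attractors.

import numpy as np

rng = np.random.default_rng(3)
N = 64
prefs = np.linspace(0, 2 * np.pi, N, endpoint=False)
T = (1.8 / N) * np.cos(prefs[:, None] - prefs[None, :])     # translation-invariant lateral weights

def settle(h, tau=0.1, dt=0.005, steps=4000):
    u = np.zeros(N)
    for _ in range(steps):
        r = np.maximum(u, 0.0)                              # threshold-linear rate r_i = [u_i]_+
        u += (dt / tau) * (-u + T @ r + h)                  # tau du_i/dt = -u_i + sum_j T_ij r_j + h_i
    return np.maximum(u, 0.0)

x_true = np.pi / 3
h = np.exp(np.cos(prefs - x_true) - 1.0) + 0.3 * rng.standard_normal(N)   # noisy input hill
r = settle(h)

def popvec(a):                                              # simple population-vector readout
    return np.angle(np.sum(a * np.exp(1j * prefs))) % (2 * np.pi)

# the settled activity is a smooth hill; its peak (read out here with the population
# vector) stays close to the position of the underlying input hill
print(x_true, popvec(h), popvec(r))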
Line attractor model
applied to number of problems:
• short-term memory: remembering facing direction after closing eyes, rotating head
• noise removal: used to clean up noisy population responses
  • set up lateral connections so smooth hill centered on any point is stable
  • transient noisy input, network settles into hill of activity
  • peak position close approximation to xML: process allows simple decoder (e.g., population vector method) to approximate ML
other recurrent connection schemes produce stimulus selection (nonlinear, winner-take-all); gain modulation (linear, scale responses by background amplitude)
Outline
1) Information processing in population codes
a) reading the neural code
b) computation in populations
2) Extending the information in population codes
a) representing probability distributions
b) methods for encoding/decoding
distributions in neurons
3) Maintaining and updating distributions
through time: dynamic distributions
a) optimal analytic formulation
b) network approximation
Extending information in population codes
Standard model focuses on encoding single value of x in
face of noisy r
Alternative: populations represent more than single value;
motivated by computational efficiency, also necessity –
handle important natural situations
(1). Multiple values
Extending information in population codes
(2). uncertainty (noise at all levels; inherent in image –
insufficient information, e.g., low-contrast images)
Aperture Problem
Adelson & Movshon
Inherent Ambiguity
All possible motion vectors lie along a line in the 2D (vx, vy) 'velocity space'
Human behavior: Bayesian judgements
Posterior ∝ Likelihood × Prior
Weiss, Simoncelli, Adelson
Bayesian cue combination
Ernst & Banks
(A). Gain Encoding
Simple extension of standard population code interpretation:
activity is noisy response of units to single underlying value
Aim: given unit activities r, tuning curves fi(θ), find P(θ|r)
Encoding: P(ri|θ), for example, bell-shaped (Gaussian) tuning
Decoding: via log P(ri|θ), e.g., assume independent Poisson noise
(A). Gain Encoding (cont).
Gaussian, homogeneous fi(θ), uniform prior:
log P(θ|{ri}) → Gaussian
Solve for mean μ and variance σ² by completing the square
Simple mechanism for encoding uncertainty:
change overall population activity (gain);
but limited to Gaussian posterior
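A numerical sketch of this mechanism, assuming homogeneous Gaussian tuning, independent Poisson noise, and a uniform prior (values illustrative): scaling the overall gain changes the width of the near-Gaussian posterior over θ.

import numpy as np

rng = np.random.default_rng(4)
grid = np.linspace(-10, 10, 1001)
centers = np.linspace(-10, 10, 60)
sigma_w = 2.0
theta_true = 1.0

def posterior(gain):
    F = gain * np.exp(-(grid[:, None] - centers) ** 2 / (2 * sigma_w ** 2))   # f_i(theta) on a grid
    r = rng.poisson(gain * np.exp(-(theta_true - centers) ** 2 / (2 * sigma_w ** 2)))
    logp = (r * np.log(F + 1e-12) - F).sum(axis=1)                            # Poisson log-likelihood
    p = np.exp(logp - logp.max())
    return p / p.sum()

for g in (2.0, 20.0):
    p = posterior(g)
    mean = np.sum(grid * p)
    std = np.sqrt(np.sum(p * (grid - mean) ** 2))
    print(g, round(mean, 2), round(std, 3))      # higher gain (more spikes) gives a narrower posterior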
(A). Gain encoding: Transparent motion
Solve for mean μ and variance σ² by completing the square
convolves responses w/ unimodal kernels
1. unimodal response pattern produces unimodal distn.
2. surprisingly, also fails on bimodal response patterns
 only extracts single motion component from responses to
transparent motion
(B). Direct Encoding
Activity corresponds directly to probability
Simple case: binary (A vs. B):
probability neuron 1 spikes ∝ P(A),
or can wait to compute rates r1 ∝ P(A)
Note: r1 can also represent log P(A); log P(A)/P(B)
Shadlen et al; Rao; Deneve; Hoyer & Hyvarinen
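A toy sketch of the binary case (the proportionality constant and numbers are made up): if a neuron's per-bin spiking probability is proportional to P(A), the observed rate, or its log-odds transform, reads the probability back out.

import numpy as np

rng = np.random.default_rng(5)
p_A = 0.7                                 # probability that the stimulus is A
k = 0.8                                   # assumed proportionality: P(spike per bin) = k * P(A)
n_bins = 10000

spikes = rng.random(n_bins) < k * p_A
p_hat = spikes.mean() / k                 # recover P(A) from the observed rate
log_odds = np.log(p_hat / (1.0 - p_hat))  # rate could equally represent log P(A) or log P(A)/P(B)
print(p_A, round(p_hat, 3), round(log_odds, 3))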
(B). Direct Encoding: Example
Discrete alternatives θi for explaining input s
ri ∝ log P(s|θi), the (log) likelihood for θi
Standard model for neural motion analysis:
motion energy filter
Filter response gi(s) is energy of video s(y,t)
convolved with oriented filter, tuned to velocity θi
Probabilistic model predicts ideal video
is formed by image s(y) translating at velocity θi
(B). Direct Encoding: Example
[figure: space-time (y, t) plots of the predicted ideal video for velocities θ = 0, 1, 2]
Weiss & Fleet, 02
(C). Convolution Codes
Characterize population response in terms of P(θ|r); standard model restricted to Gaussian posterior
Convolution codes can represent more general density functions, introduce level of indirection to direct method
Two forms of convolution codes:
1. Decoding kernels
2. Encoding kernels
(C). Convolution Codes
Decoding kernels (bases): P(θ|r) ∝ Σi ri φi(θ)
• bases can be distributions: P(θ|r) normalized
• bases can have simple form: φi(θ) = δ(θ − θi)
• multimodal P(θ|r) if active neurons have different θi
Anderson
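A sketch of decoding with assumed Gaussian kernels φi: the decoded density is the normalized, activity-weighted sum of the kernels, so activity concentrated around two different θi yields a bimodal P(θ|r).

import numpy as np

grid = np.linspace(-10, 10, 1001)
centers = np.linspace(-10, 10, 40)                  # theta_i of each neuron
kernels = np.exp(-(grid[:, None] - centers) ** 2 / (2 * 1.0 ** 2))   # phi_i(theta)

# activity concentrated around two groups of preferred values (e.g., transparent motion)
r = np.exp(-(centers + 4.0) ** 2 / 2.0) + np.exp(-(centers - 4.0) ** 2 / 2.0)

p = kernels @ r                                     # P(theta|r) proportional to sum_i r_i phi_i(theta)
p /= p.sum() * (grid[1] - grid[0])                  # normalize to a density

peaks = np.where((p[1:-1] > p[:-2]) & (p[1:-1] > p[2:]))[0] + 1
print(grid[peaks])                                  # two modes, near theta = -4 and theta = +4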
(C). Convolution Codes: DPC
Encoding kernels (bases): <ri> = ∫ φi(θ) P(θ) dθ
if P(θ) = δ(θ − θ*) then <ri> = φi(θ*), so could choose tuning functions fi(θ) as kernels
Decoding:
• deconvolution (cannot recover high frequencies)
• probabilistic approach: nonlinear regression to optimize P(θ|r) under encoding model
Zemel, Dayan, & Pouget
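A rough sketch of the probabilistic route, under assumed Gaussian tuning functions used as encoding kernels and Poisson counts: parameterize the decoded density on a grid with a softmax, predict expected counts as its projections onto the tuning curves, and climb the Poisson likelihood by gradient ascent. An illustration of the idea, not the authors' exact procedure.

import numpy as np

rng = np.random.default_rng(6)
grid = np.linspace(-10, 10, 101)
dtheta = grid[1] - grid[0]
centers = np.linspace(-10, 10, 50)
F = 20.0 * np.exp(-(centers[:, None] - grid[None, :]) ** 2 / (2 * 1.5 ** 2))   # f_i(theta_k)

# a bimodal encoded density and the resulting expected counts <r_i> = sum_k f_i(theta_k) P(theta_k) dtheta
p_true = np.exp(-(grid + 3.0) ** 2 / 2.0) + np.exp(-(grid - 3.0) ** 2 / 2.0)
p_true /= p_true.sum() * dtheta
r = rng.poisson(F @ (p_true * dtheta))

a = np.zeros_like(grid)                        # softmax parameters of the decoded density
for _ in range(5000):
    p = np.exp(a - a.max())
    p /= p.sum()                               # decoded probabilities over grid points
    e = F @ p + 1e-9                           # expected counts under the current estimate
    g_p = F.T @ (r / e - 1.0)                  # gradient of Poisson log-likelihood w.r.t. p
    a += 0.005 * p * (g_p - np.dot(p, g_p))    # chain rule through the softmax, gradient ascent

mass_left, mass_right = p[grid < 0].sum(), p[grid > 0].sum()
print(round(mass_left, 2), round(mass_right, 2))   # roughly half the mass recovered around each component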
Sums or Products?
kernel decoder: P(θ|r) ∝ Σi ri φi(θ)  (a sum over kernels)
kernel encoder: Poisson decoding gives P(θ|r) ∝ Πi fi(θ)^ri  (a product over kernels)
(C). Convolution codes: Transparent motion
Bimodal response patterns: recovers generating distribution
Unimodal patterns fit, until
(matches subject’s uncertainty)
(C). Convolution Codes: Extension
handle situation with multiple values and uncertainty
library of functions φ(θ) that describe combinations of values of θ
Sahani & Dayan
Outline
1) Information processing in population codes
a) reading the neural code
b) computation in populations
2) Extending the information in population codes
a) representing probability distributions
b) methods for encoding/decoding
distributions in neurons
3) Maintaining and updating distributions
through time: dynamic distributions
a) optimal analytic formulation
b) network approximation
Dynamic distributions: motivation
Dynamic cue combination
Kording & Wolpert
information constantly changing over time: extend
framework to encode/decode dynamic distributions
Dynamic Distributions: decoding
Spike train R(t) → what is P(X(t)|R(t))?
• Markov: dynamics determined by Tij = P(Xi(t)|Xj(t-1))
• More general form: continuous time; R(t), X(t) are spike and position histories from 0 to t: R(0)…R(t−Δ); X(0)…X(t−Δ)
GP spikes: Encoding model & prior
• instantaneous, independent, inhomogeneous Poisson process:
P(R(t)|X(t)) = Πj=1..N Πm=0..M P(Rj(tm)|X(tm)) ∝ Π{j,tm} fj(X(tm))
• and a Gaussian Process prior: α defines the smoothness of the prior, and τ defines the speed of movement
Huys, Zemel, Natarajan, Dayan
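A generative sketch of this encoding model, using the first-order (OU, α = 1) prior for simplicity (all parameters illustrative): draw a trajectory X(t) from the prior, then emit spikes from each cell as an inhomogeneous Poisson process with rate fj(X(t)).

import numpy as np

rng = np.random.default_rng(7)
dt = 0.001
t = np.arange(0.0, 2.0, dt)
tau, sig = 0.3, 2.0                        # movement timescale and scale of the prior

# OU (first-order Markov) sample path as the prior over trajectories X(t)
X = np.zeros(len(t))
a = np.exp(-dt / tau)
for k in range(1, len(t)):
    X[k] = a * X[k - 1] + sig * np.sqrt(1 - a ** 2) * rng.standard_normal()

# instantaneous, independent, inhomogeneous Poisson spikes with Gaussian tuning f_j(X)
centers = np.linspace(-6, 6, 20)           # preferred positions theta_j
gain, width = 30.0, 1.0                    # peak rate (Hz) and tuning width
rates = gain * np.exp(-(X[:, None] - centers) ** 2 / (2 * width ** 2))   # f_j(X(t)), shape (T, N)
spikes = rng.random(rates.shape) < rates * dt                            # P(spike in bin) ~= f_j(X(t)) dt
print(spikes.sum(axis=0))                  # spike count per cell over the 2 s trajectory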
GP spikes decoding: Dynamics prior is key
μ(t) = Σ{j, tm < t} θj κ(tm) Rj(tm)
• static stimulus prior (α = 0)
• dynamic stimulus prior (α > 0): spikes not eternally informative
• 1st-order Markov (α = 1): OU process
• high-order (α = 2): smooth process
Trajectories & kernels
OU (α = 1)
Smooth (α = 2)
Optimal Dynamic Distributions
Analytically tractable formulation
Prior important for rapidly changing stimuli – fewer
spikes than temporal variations in stimulus
For smooth (natural) stimuli: no recursive
formulation, recompute kernel per spike
Decoding: must maintain spike history
Hypothesis: Recoding spikes
Recode input spikes into a new set of spikes to facilitate
downstream processing; obviate need to store spike history
Train network to produce new spikes so that simple
decoder can approximate optimal decoding of input spikes
Natarajan, Huys, Dayan, Zemel
Log-linear spike decoding
effect of spike on postsynaptic neuron: produces smoothly
decaying postsynaptic potential
[figure: decaying postsynaptic potential kernel as a function of time since the spike; the decoded distribution is log-linear in these decaying spike traces]
Hinton & Brown
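A schematic sketch of the log-linear idea (not the exact Hinton & Brown formulation; spike times, kernels, and weights are made up): each spike leaves an exponentially decaying trace, and the decoded distribution at time t is log-linear in those traces.

import numpy as np

grid = np.linspace(-6, 6, 121)                 # candidate values of X
centers = np.linspace(-6, 6, 20)               # preferred value of each spiking cell
tau_psp = 0.1                                  # decay constant of the postsynaptic kernel

# toy spike history: (cell index, spike time) pairs
spikes = [(4, 0.30), (5, 0.32), (5, 0.41), (12, 0.05)]
t_now = 0.45

# decaying trace per cell: sum over that cell's spikes of exp(-(t_now - t_spike)/tau)
trace = np.zeros(len(centers))
for j, t_m in spikes:
    trace[j] += np.exp(-(t_now - t_m) / tau_psp)

# log-linear readout: log P(X) = sum_j trace_j * w_j(X) - log Z, here with Gaussian weight profiles
W = np.exp(-(grid[:, None] - centers) ** 2 / (2 * 1.0 ** 2))
logp = W @ trace
p = np.exp(logp - logp.max())
p /= p.sum()
print(grid[np.argmax(p)])                      # dominated by the recent spikes from cells 4 and 5 (preferred X near -3)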
Dynamic Distributions: recoding network
Aim: map spikes R(t) to S(t), so that simple
decoding of S(t) approximates optimal P(X(t)|R(t))
1. Convolution kernel decoder for S(t)
2. Processing dynamics: standard recurrent net
3. Learn weights W, V to minimize mismatch between the decoded distribution and the optimal P(X(t)|R(t))
Recoding network: example
Recoding network: analyzing kernels
Recoding network: results summary
Discussion
Current directions:
• Apply scheme recursively, hierarchically
• Relate model to experimental results, e.g., Kording & Wolpert
Open issues:
• High-dimensional spaces: curse of dimensionality doubled?
• Experimental validation or refutation of proposed distributional schemes?