
Improved characterization of
neural and behavioral response
properties using point-process
state-space framework
Anna Alexandra Dreyer
Harvard-MIT Division of Health Sciences and Technology
Speech and Hearing Bioscience and Technology Program
Neurostatistics Research Laboratory, MIT
PI: Emery Brown, M.D., Ph.D.
September 27, 2007
Action potentials as binary events
• Action potentials (spikes) are binary events
• Cells use the timing and frequency of action potentials to communicate with neighboring cells
• Most cells emit action potentials spontaneously, in the absence of stimulation
• Models should begin with spikes to most accurately describe the response
Figure from the laboratory of Mark Ungless
Point Process Framework: Definition
of Conditional Intensity Function
• Given a recording interval [0, T), the counting process N(t) represents the number of spikes that have occurred on the interval [0, t).
• A model can be completely characterized by the conditional intensity function (CIF), which defines the instantaneous firing rate at every point in time as:

$$\lambda(t \mid H(t)) = \lim_{\Delta \to 0} \frac{\Pr[N(t + \Delta) - N(t) = 1 \mid H(t)]}{\Delta}$$

• where H(t) represents the autoregressive (spiking) history up to time t.
Brown et al., 2003; Daley and Vere-Jones, 2003; Brown, 2005
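As a concrete illustration of this definition, the sketch below discretizes time into small bins so that λ(t | H)Δ approximates the probability of a spike in each bin. The toy CIF here, a baseline rate with a refractory-like history term, is an assumption for illustration, not a model from the presentation.

import numpy as np

# Toy CIF: baseline rate suppressed just after each spike
# (a hypothetical refractory-like history effect).
def toy_cif(t, spike_times, baseline_hz=20.0, tau=0.005):
    """lambda(t | H(t)): baseline modulated by time since the last spike."""
    past = spike_times[spike_times < t]
    if past.size == 0:
        return baseline_hz
    return baseline_hz * (1.0 - np.exp(-(t - past[-1]) / tau))

# For small Delta, Pr[spike in bin] ~ lambda(t | H) * Delta, so we can
# simulate the process bin by bin on the recording interval [0, T).
rng = np.random.default_rng(0)
delta, T = 1e-3, 2.0
spikes = np.array([])
for b in range(int(T / delta)):
    t = b * delta
    if rng.random() < toy_cif(t, spikes) * delta:
        spikes = np.append(spikes, t)
print(f"{spikes.size} spikes in {T} s")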
Joint Probability of Spiking
(Likelihood function)
• Discretize the recording interval [0, T) into B bins of width Δ.
• As Δ becomes increasingly small, λ(t_b | Ψ, H_b)Δ, where Ψ are the model parameters and H_b is the autoregressive history up to bin b, approaches the probability of seeing one event in a bin of width Δ.
• If we select a sufficiently small binwidth Δ, such that the probability of seeing more than one event in a bin approaches 0, the joint probability can be written as the product of independent Bernoulli events (Truccolo et al., 2005):
$$\Pr[N_{1:B} \mid \Psi] = \prod_{b=1}^{B} \left[\lambda(t_b \mid H_b, \Psi)\Delta\right]^{N_b} \left[1 - \lambda(t_b \mid H_b, \Psi)\Delta\right]^{1 - N_b} + o(\Delta^J)$$

where o(Δ^J) represents the probability of seeing two or more events on the interval (t_{b-1}, t_b].
Truccolo et al., 2005
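The Bernoulli product above translates directly into a log-likelihood computation. A minimal sketch, assuming a binned 0/1 spike train and a precomputed per-bin intensity (both fabricated here):

import numpy as np

def bernoulli_log_likelihood(spike_counts, intensity_hz, delta=1e-3):
    """log Pr[N_{1:B} | Psi] under the discretized Bernoulli approximation.

    spike_counts : 0/1 array with one entry per bin (N_b)
    intensity_hz : lambda(t_b | H_b, Psi) evaluated in each bin
    delta        : binwidth in seconds, small enough that >1 spike/bin is rare
    """
    p = np.clip(intensity_hz * delta, 1e-12, 1 - 1e-12)  # lambda * Delta as a probability
    return np.sum(spike_counts * np.log(p) + (1 - spike_counts) * np.log(1 - p))

# Example: a flat 20 Hz intensity over 1000 one-millisecond bins.
rng = np.random.default_rng(1)
lam = np.full(1000, 20.0)
n = (rng.random(1000) < lam * 1e-3).astype(int)
print(bernoulli_log_likelihood(n, lam))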
An example of using PP models to analyze auditory data: Experimental Paradigm
• Recordings of action potentials to 19 stimulus levels for multiple repetitions of the stimulus
• Need to develop an encoding model to characterize the responses to each stimulus level as well as the noise in the system
• Inference: find the lowest stimulus level for which the response is more than system noise
• Given new responses from the same cell, need to decode the stimulus from which the response originated
Data from Lim and Anderson (2006)
Modeling example of cortical response across stimulus levels
• Response characteristics include
– Autoregressive components
– Temporal and rate-dependent elements
[Figure: raw responses showing typical autoregressive components, which a rate-based model does NOT capture]
• To have adequate goodness-of-fit and predictive power, a model must capture these elements of the raw data
Point process state space
framework
• Instantaneous Firing Intensity Model:
– The firing intensity in each Δ = 1 ms bin b, for stimulus level l and trial k, is modeled as a function of the past spiking history H_{l,k,bΔ} and of the effect of the stimulus
– Observation equation (the left-hand side is the conditional firing intensity):

$$\lambda_{l,k}(b\Delta \mid \theta_l, \gamma, H_{l,k,b\Delta}) = \underbrace{\exp\!\Big(\sum_{r=1}^{R} \theta_{l,r}\, g_r(b\Delta)\Big)}_{\text{stimulus effect}} \; \underbrace{\exp\!\Big(\sum_{j=1}^{J} \gamma_j\, n_{l,k,b-j}\Big)}_{\text{past spiking history effect}}$$

– State equation:

$$\theta_{l+1,r} = \theta_{l,r} + \varepsilon_{l+1,r}$$

where ε_{l+1,r} is a Gaussian random vector.
Computational methods developed with G Czanner, U Eden, E Brown
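A minimal sketch of the observation equation, with assumed Gaussian-bump basis functions g_r and made-up parameter values (the presentation does not specify either):

import numpy as np

def conditional_intensity(b, theta_l, gamma, past_spikes, T_ms=100):
    """lambda_{l,k}(b*Delta | theta_l, gamma, H) for one bin b.

    theta_l     : stimulus-effect coefficients for level l (length R)
    gamma       : history coefficients (length J)
    past_spikes : 0/1 spike indicators for bins b-1, ..., b-J
    """
    R = len(theta_l)
    centers = np.linspace(0, T_ms, R)            # assumed basis: Gaussian bumps
    g = np.exp(-0.5 * ((b - centers) / (T_ms / R)) ** 2)
    stimulus_effect = np.exp(theta_l @ g)        # exp(sum_r theta_{l,r} g_r(b))
    history_effect = np.exp(gamma @ past_spikes) # exp(sum_j gamma_j n_{b-j})
    return stimulus_effect * history_effect      # spikes/s

# Example with fabricated parameters: R = 5 basis functions, J = 3 history bins.
theta_l = np.array([1.0, 1.5, 1.2, 0.8, 0.5])
gamma = np.array([-2.0, -1.0, -0.5])             # refractory-like suppression
lam = conditional_intensity(b=40, theta_l=theta_l, gamma=gamma,
                            past_spikes=np.array([0, 1, 0]))
print(f"intensity = {lam:.1f} spikes/s")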
Encoding and Decoding Methodology
• Estimation/Encoding/Inference
– The Expectation-Maximization algorithm is used
– Monte Carlo techniques estimate confidence bounds for the stimulus effect
• Goodness-of-fit
– KS plots and autocorrelation of rescaled spike times
• Decoding and response property inference
Expectation-Maximization algorithm
• Used to compute maximum likelihood (ML) parameter estimates in statistical models with hidden variables or missing data.
• The algorithm consists of two steps:
– an expectation (E) step, in which the expectation of the complete-data likelihood is computed
– a maximization (M) step, in which that expected likelihood is maximized with respect to the parameters
• As the algorithm progresses, the initial parameter estimates are improved by iterating until the estimate converges to the maximum likelihood estimator (a generic sketch follows below).
Dempster et al., 1977; McLachlan and Krishnan, 1997; Pawitan, 2001
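To make the E and M steps concrete, here is a generic EM iteration for a two-component Gaussian mixture; this is an illustrative stand-in, not the presentation's actual EM for the state-space GLM:

import numpy as np

def em_gaussian_mixture(x, n_iter=50):
    """Generic EM: the E step computes expected component memberships
    (the hidden variables); the M step re-estimates parameters by ML."""
    mu = np.array([x.min(), x.max()])        # initial estimates
    sigma = np.array([x.std(), x.std()])
    pi = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E step: posterior responsibility of each component for each point
        dens = pi * np.exp(-0.5 * ((x[:, None] - mu) / sigma) ** 2) / sigma
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M step: maximize the expected complete-data log-likelihood
        nk = resp.sum(axis=0)
        mu = (resp * x[:, None]).sum(axis=0) / nk
        sigma = np.sqrt((resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk)
        pi = nk / len(x)
    return mu, sigma, pi

rng = np.random.default_rng(2)
x = np.concatenate([rng.normal(0, 1, 300), rng.normal(5, 1, 200)])
print(em_gaussian_mixture(x))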
SS-GLM model of stimulus effect
• The level-dependent stimulus effect captures many phenomena seen in the data:
– Increase of spiking with level
– Spread of excitation in time
[Figure: estimated stimulus effect (spikes/s) as a function of time since stimulus onset (ms) and level number]
• Removes the effect of autoregressive history, which is a property of the system (not stimulus dependent)
Threshold inference based on all trials and all levels
• Define threshold as the first stimulus level for which we can be reasonably (>0.95) certain that the response at that level differs from the noise and continues to differ for all higher stimulus levels (see the sketch after this slide)
• For this example, we define threshold as level 8
• Compare to the common rate-level function methodology, which places threshold at level 11
Dreyer et al., 2007; Czanner et al., 2007
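A sketch of this decision rule under stated assumptions: given Monte Carlo samples of the stimulus effect at each level (hypothetical inputs; the presentation's samples come from the fitted SS-GLM), take the first level at which the probability of exceeding the noise stays above 0.95 for all higher levels:

import numpy as np

def threshold_level(effect_samples, noise_samples, criterion=0.95):
    """effect_samples: (L levels, S Monte Carlo samples) of the stimulus effect
    noise_samples:  (S,) samples of the system-noise response
    Returns the first level l such that Pr[effect_l > noise] > criterion
    for l and every higher level, or None if no such level exists."""
    p_exceed = (effect_samples > noise_samples).mean(axis=1)
    for l in range(effect_samples.shape[0]):
        if np.all(p_exceed[l:] > criterion):
            return l + 1          # 1-indexed stimulus level
    return None

# Example with fabricated samples: effects ramp up across 19 levels.
rng = np.random.default_rng(3)
effects = rng.normal(np.arange(19)[:, None] * 0.5, 1.0, size=(19, 1000))
noise = rng.normal(2.0, 1.0, size=1000)
print("threshold level:", threshold_level(effects, noise))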
Goodness-of-fit assessment
• The KS plot lies close to the 45-degree line, indicating uniformity of the rescaled spike times
• The autocorrelation plot shows that the Gaussian-transformed rescaled spike times are relatively uncorrelated, consistent with independence
• In contrast, the KS plots for the underlying rate-based models provide a very poor fit to the data
Johnson & Kotz, 1970; Brown et al, 2002; Box et al., 1994
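A minimal sketch of the time-rescaling goodness-of-fit check (Brown et al., 2002): integrate the fitted intensity between spikes, transform the rescaled intervals to uniforms, and measure their distance from the 45-degree line. The binned inputs here are assumptions for illustration.

import numpy as np

def ks_statistic_rescaled(spike_bins, intensity_hz, delta=1e-3):
    """Time-rescaling check: under a correct model, the rescaled
    inter-spike intervals tau_k = integral of lambda between spikes
    are Exponential(1), so z_k = 1 - exp(-tau_k) are Uniform(0, 1)."""
    cum = np.cumsum(intensity_hz * delta)        # integrated intensity
    spike_idx = np.flatnonzero(spike_bins)
    tau = np.diff(cum[spike_idx])                # rescaled ISIs
    z = np.sort(1.0 - np.exp(-tau))
    k = len(z)
    model_quantiles = (np.arange(k) + 0.5) / k   # 45-degree reference line
    return np.max(np.abs(z - model_quantiles))  # KS distance

# Example: spikes simulated from a flat 20 Hz intensity, checked against it.
rng = np.random.default_rng(4)
lam = np.full(20000, 20.0)
n = (rng.random(20000) < lam * 1e-3).astype(int)
print("KS distance:", ks_statistic_rescaled(n, lam))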
Decoding based on a single trial
• Decoding of new data is based on the encoding parameter estimates $\hat{\Psi} = (\hat{\theta}_0, \hat{\theta}, \hat{\gamma})$
• Given a spike train n_{l*}, estimate the likelihood that it came from any stimulus s_{l'} in our encoding model:

$$\mathrm{Lik}(s_{l'} \mid n_{l^*}) = \prod_{b=1}^{B} \left[\hat{\lambda}(b \mid H_b, \hat{\theta}_{l'})\Delta\right]^{n_b} \left[1 - \hat{\lambda}(b \mid H_b, \hat{\theta}_{l'})\Delta\right]^{1 - n_b}$$

• Calculate the likelihood for all stimuli s_{1:L}:

$$\mathrm{Lik}^{T}(s_{1:L} \mid n_{l^*}) = \left[\mathrm{Lik}(s_1 \mid n_{l^*}) \;\; \mathrm{Lik}(s_2 \mid n_{l^*}) \;\; \dots \;\; \mathrm{Lik}(s_L \mid n_{l^*})\right]$$

• Take the most likely level as the decoded stimulus:

$$\mathrm{Lik}^{T,\mathrm{MAX}}(s_{1:L} \mid n_{l^*}) = \max\left(\mathrm{Lik}^{T}(s_{1:L} \mid n_{l^*})\right)$$
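A sketch of this ML decoder, reusing the Bernoulli log-likelihood from above; the per-level intensity estimates passed in are assumed to come from a fitted encoding model:

import numpy as np

def decode_level(spike_bins, intensity_by_level, delta=1e-3):
    """ML decoding: evaluate the Bernoulli likelihood of one spike train
    under each level's fitted intensity and return the argmax level.

    spike_bins         : 0/1 spike train, shape (B,)
    intensity_by_level : fitted lambda-hat per level, shape (L, B)
    """
    p = np.clip(intensity_by_level * delta, 1e-12, 1 - 1e-12)
    loglik = (spike_bins * np.log(p) +
              (1 - spike_bins) * np.log(1 - p)).sum(axis=1)  # shape (L,)
    return int(np.argmax(loglik)) + 1, loglik                # 1-indexed level

# Example: fabricated fitted intensities ramping with level; the test train
# is simulated from level 12's intensity, so it should decode near level 12.
rng = np.random.default_rng(5)
lam_hat = 5.0 + 3.0 * np.arange(1, 20)[:, None] * np.ones((19, 1000))
n = (rng.random(1000) < lam_hat[11] * 1e-3).astype(int)
level, _ = decode_level(n, lam_hat)
print("decoded level:", level)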
Single-trial threshold inference using ML-based decoding is more sensitive around threshold than ROC analysis based on the number of spikes
The area under the ROC curve specifies the probability that, when two responses are drawn, one from a lower level and one from a higher level, the algorithm assigns the larger value to the draw from the higher level.
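That pairwise definition of the ROC area translates directly into code; a small sketch with fabricated spike counts:

import numpy as np

def auc_pairwise(lower_level_values, higher_level_values):
    """Area under the ROC curve via its pairwise definition:
    Pr[value drawn from the higher level > value from the lower level],
    counting ties as one half."""
    lo = np.asarray(lower_level_values)[:, None]
    hi = np.asarray(higher_level_values)[None, :]
    return (hi > lo).mean() + 0.5 * (hi == lo).mean()

# Example: spike counts per trial at a sub- and a supra-threshold level.
rng = np.random.default_rng(6)
counts_low = rng.poisson(5, size=200)
counts_high = rng.poisson(9, size=200)
print("AUC:", auc_pairwise(counts_low, counts_high))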
Decoding across multiple trials
improves performance
Neural Model Conclusions
• This methodology has potential for characterizing the behavior of any noisy system where separating signal from noise is important in predicting responses to future stimuli
Bayesian techniques – the alternative to frequentist estimation
• Use Bayesian sampling techniques to:
– Estimate behavioral responses to auditory stimuli
– Apply the methodology used for auditory encoding models to learning experiments, to discover the neural mechanisms that encode behavioral learning in the basal ganglia
In collaboration with B. Pfingst, A. Smith, A. Graybiel, E. Brown
Bayesian sampling methodology
• The goal is to compute the posterior probability density of the parameters and the hidden state given the data:

$$\underbrace{p(\Psi, x \mid N)}_{\text{posterior}} = \frac{\overbrace{p(\Psi)}^{\text{prior}} \; \overbrace{p(x \mid \Psi)\, p(N \mid \Psi, x)}^{\text{complete-data likelihood}}}{\underbrace{p(N)}_{\text{normalization}}}$$

• Use Markov chain Monte Carlo (MCMC) methods to compute the posterior probability by simulating stationary Markov chains.
• MCMC methods provide an approximate posterior probability density for the parameters
• Can compute credible intervals (analogous to confidence intervals for unknown parameter estimates) for the parameter estimates
Gilks et al., 1996; Congdon, 2003
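As a generic illustration of MCMC posterior sampling, here is a random-walk Metropolis sketch on a toy one-parameter model; it is not the presentation's actual sampler:

import numpy as np

def metropolis(log_posterior, x0, n_samples=5000, step=0.5, seed=0):
    """Random-walk Metropolis: propose a perturbation, accept with
    probability min(1, posterior ratio); the chain's stationary
    distribution is the target posterior."""
    rng = np.random.default_rng(seed)
    x, lp = x0, log_posterior(x0)
    samples = np.empty(n_samples)
    for i in range(n_samples):
        prop = x + step * rng.standard_normal()
        lp_prop = log_posterior(prop)
        if np.log(rng.random()) < lp_prop - lp:
            x, lp = prop, lp_prop
        samples[i] = x
    return samples

# Toy model: Gaussian prior on theta, Gaussian likelihood for observed data.
data = np.array([1.2, 0.8, 1.5, 1.1])
log_post = lambda th: -0.5 * th**2 - 0.5 * np.sum((data - th) ** 2)
draws = metropolis(log_post, x0=0.0)[1000:]      # drop burn-in
lo, hi = np.percentile(draws, [2.5, 97.5])
print(f"posterior mean {draws.mean():.2f}, 95% credible interval [{lo:.2f}, {hi:.2f}]")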
References
• Box GEP, Jenkins GM, Reinsel GC. Time Series Analysis: Forecasting and Control. 3rd ed. Englewood Cliffs, NJ: Prentice-Hall, 1994.
• Brown EN. Theory of point processes for neural systems. In: Chow CC, Gutkin B, Hansel D, Meunier C, Dalibard J, eds. Methods and Models in Neurophysics. Paris: Elsevier, 2005, Chapter 14: 691-726.
• Brown EN, Barbieri R, Eden UT, Frank LM. Likelihood methods for neural data analysis. In: Feng J, ed. Computational Neuroscience: A Comprehensive Approach. London: CRC, 2003, Chapter 9: 253-286.
• Brown EN, Barbieri R, Ventura V, Kass RE, Frank LM. The time-rescaling theorem and its application to neural spike train data analysis. Neural Comput 2002; 14: 325-346.
• Congdon P. Applied Bayesian Modelling. Chichester, UK: John Wiley and Sons, 2003.
• Daley D, Vere-Jones D. An Introduction to the Theory of Point Processes. 2nd ed. New York: Springer-Verlag, 2003.
• Czanner G, Dreyer AA, Eden UT, Wirth S, Lim HH, Suzuki W, Brown EN. Dynamic models of neural spiking activity. IEEE Conference on Decision and Control, December 12, 2007.
• Dempster A, Laird N, Rubin D. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B 1977; 39(1): 1-38.
• Dreyer AA, Czanner G, Eden UT, Lim HH, Anderson DJ, Brown EN. Enhanced auditory neural threshold detection using a point process state-space model analysis. Computational Systems Neuroscience Conference (COSYNE), February 2007.
• Gilks WR, Richardson S, Spiegelhalter DJ. Markov Chain Monte Carlo in Practice. New York: Chapman and Hall/CRC, 1996.
• Johnson A, Kotz S. Distributions in Statistics: Continuous Univariate Distributions. New York: Wiley, 1970.
• Lim HH, Anderson DJ. Auditory cortical responses to electrical stimulation of the inferior colliculus: implications for an auditory midbrain implant. J Neurophysiol 2006; 96(3): 975-988.
• McLachlan GJ, Krishnan T. The EM Algorithm and Extensions. New York: John Wiley & Sons, 1997.
• Pawitan Y. In All Likelihood: Statistical Modeling and Inference Using Likelihood. New York: Oxford University Press, 2001.
• Truccolo W, Eden UT, Fellows MR, Donoghue JP, Brown EN. A point process framework for relating neural spiking activity to spiking history, neural ensemble and extrinsic covariate effects. J Neurophysiol 2005; 93: 1074-1089.