Estimating the Transfer Function from Neuronal Activity to BOLD
Maria Joao Rosa
SPM Homecoming 2008
Wellcome Trust Centre for Neuroimaging
Statistical formulations
• P(A): probability of event A occurring
• P(A|B): probability of A occurring given B occurred
• P(B|A): probability of B occurring given A occurred
• P(A,B): probability of A and B occurring simultaneously (joint probability of A and B)

Joint probability of A and B:
P(A,B) = P(A|B)*P(B) = P(B|A)*P(A)
⇒ P(B|A) = P(A|B)*P(B)/P(A)
which is Bayes' Rule.
Bayes' Rule is very often referred to as Bayes' Theorem, but it is not really a theorem and should more properly be called Bayes' Rule (Hacking, 2001).
Reverend Thomas Bayes
(1702 – 1761)
• Reverend Thomas Bayes was a minister interested in probability and stated a form of his famous rule in the context of solving a somewhat complex problem involving billiard balls.
• It was first stated by Bayes in his 'Essay towards solving a problem in the doctrine of chances', published in the Philosophical Transactions of the Royal Society of London in 1764.
Conditional probability
P(A|B): conditional probability of A given B
Q: When are we considering conditional probabilities?
A: Almost always!
Examples:
• Lottery chances
• Dice tossing
Conditional probability
Examples (cont'd):
• P(Brown eyes|Male): (P(A|B) with A := Brown eyes, B := Male)
1. What is the probability that a person has brown eyes, ignoring everyone who is not a male?
2. Ratio: (being a male with brown eyes)/(being a male)
3. Probability ratio: the probability that a person is both male and has brown eyes to the probability that a person is male

P(Male) = P(B) = 0.52
P(Brown eyes) = P(A) = 0.78
P(Male with brown eyes) = P(A,B) = 0.38
P(A|B) = P(B|A)*P(A)/P(B) = P(A,B)/P(B) = 0.38/0.52 = 0.73…
Flipping it around (Bayes' idea): we can now also calculate the probability of being male given brown eyes:
P(B|A) = P(A|B)*P(B)/P(A) = 0.73*0.52/0.78 = 0.4871…
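A quick numerical check of the example above, as plain Python (the probabilities are the ones quoted on the slide):

```python
p_male = 0.52             # P(B)
p_brown = 0.78            # P(A)
p_male_and_brown = 0.38   # P(A, B)

p_brown_given_male = p_male_and_brown / p_male               # P(A|B) = P(A,B) / P(B)
p_male_given_brown = p_brown_given_male * p_male / p_brown   # Bayes' rule: P(B|A)

print(f"P(brown eyes | male) = {p_brown_given_male:.3f}")    # ~0.731
print(f"P(male | brown eyes) = {p_male_given_brown:.3f}")    # ~0.487
```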
Statistical terminology
• P(A) is called the marginal or prior probability of A (since it is the probability of A prior to having any information about B)
• Similarly, P(B) is the marginal or prior probability of B
• P(A|B), viewed as a function of B with A fixed, is called the likelihood function for B
• P(B|A) is called the posterior probability of B given A (since it depends on having information about A)
Bayes' Rule
P(B|A) = P(A|B)*P(B)/P(A)
where P(B|A) is the "posterior" probability of B given A, P(A|B) is the "likelihood" function for B (for fixed A), and P(B), P(A) are the prior probabilities of B and A ("priors").
Bayes' Theorem for a given parameter θ
p(θ | data) = p(data | θ) * p(θ) / p(data)
1/p(data) is basically a normalizing constant
Posterior ∝ likelihood × prior
The prior is the probability of the parameter and represents what was
thought before seeing the data.
The likelihood is the probability of the data given the parameter and
represents the data now available.
The posterior represents what is thought given both prior information
and the data just seen.
Bayes' theorem thus relates the conditional density of a parameter (the posterior) to its unconditional density (the prior, which depends only on information present before the experiment).
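A minimal sketch of "posterior ∝ likelihood × prior" on a grid of parameter values, assuming NumPy; the Gaussian prior, likelihood and observed value below are invented purely for illustration:

```python
import numpy as np

theta = np.linspace(-5, 5, 1001)          # grid of candidate parameter values
dtheta = theta[1] - theta[0]

prior = np.exp(-theta**2 / 2)             # p(theta): N(0, 1), unnormalised
y = 1.5                                   # a single observed data point (invented)
likelihood = np.exp(-(y - theta)**2 / 2)  # p(y | theta): N(theta, 1), unnormalised

posterior = likelihood * prior            # posterior is proportional to likelihood x prior
posterior /= posterior.sum() * dtheta     # dividing by p(y): the normalising constant

posterior_mean = (theta * posterior).sum() * dtheta
print(f"posterior mean = {posterior_mean:.2f}")   # ~0.75 for this prior/likelihood pair
```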
Data and hypotheses…
– We have two hypotheses: H0 (null) and H1
– We have data (Y)
– We want to check whether the model we have (H1) fits our data (accept H1 / reject H0) or not (H0)

Inferential statistics:
what is the probability that we can reject H0 and accept H1 at some level of significance (α, p)?
These are a priori decisions, made before we know what the data will be and how it will behave.

Bayes:
we get some evidence for the model (the "likelihood") and can then even compare the "likelihoods" of different models.
Where does Bayes' Rule come in handy?
• In diagnostic cases, where we are trying to calculate P(Disease | Symptom), we often know P(Symptom | Disease), the probability that you have the symptom given the disease, because this data has been collected from previous confirmed cases.
• In scientific cases, where we want to know P(Hypothesis | Result), the probability that a hypothesis is true given some relevant result, we may know P(Result | Hypothesis), the probability that we would obtain that result given that the hypothesis is true – this is often statistically calculable, as when we have a p-value.
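A small illustration of the diagnostic case: the sensitivity, false-positive rate and prevalence below are hypothetical numbers (not from the slides), and simply show how Bayes' rule turns P(Symptom | Disease) into P(Disease | Symptom):

```python
# Hypothetical numbers, purely for illustration (not from the slides)
p_symptom_given_disease = 0.90    # P(Symptom | Disease), known from confirmed cases
p_symptom_given_healthy = 0.10    # P(Symptom | no Disease)
p_disease = 0.01                  # prevalence, P(Disease)

# Marginal probability of the symptom, P(Symptom), by the law of total probability
p_symptom = (p_symptom_given_disease * p_disease
             + p_symptom_given_healthy * (1 - p_disease))

# Bayes' rule: P(Disease | Symptom)
p_disease_given_symptom = p_symptom_given_disease * p_disease / p_symptom
print(f"P(Disease | Symptom) = {p_disease_given_symptom:.3f}")   # ~0.083
```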
Applicability to (f)MRI
• Let's take fMRI as a relevant example:
Y = X*β + ε
• We have:
– Measured data: Y
– Model: X
– Model estimates: β, and the error/variance ε
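A toy simulation of this GLM for a single voxel, assuming NumPy (illustrative only, not SPM code): the design, the "true" betas and the noise level are invented, and a classical t-statistic is computed for the contrast β1 - β2:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100                                           # number of scans
X = np.column_stack([rng.binomial(1, 0.5, n),     # regressor for condition 1
                     rng.binomial(1, 0.5, n),     # regressor for condition 2
                     np.ones(n)])                 # constant term
beta_true = np.array([1.0, 0.4, 5.0])             # invented "true" effects
Y = X @ beta_true + rng.normal(0, 1.0, n)         # simulated voxel time series

beta_hat, _, _, _ = np.linalg.lstsq(X, Y, rcond=None)   # OLS estimate of beta
resid = Y - X @ beta_hat
sigma2 = resid @ resid / (n - X.shape[1])               # residual error variance

c = np.array([1.0, -1.0, 0.0])                          # contrast: beta1 - beta2
t = (c @ beta_hat) / np.sqrt(sigma2 * c @ np.linalg.inv(X.T @ X) @ c)
print(f"beta_hat = {np.round(beta_hat, 2)}, t(beta1 - beta2) = {t:.2f}")
```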
What do we get with inferential statistics?
• T-statistics on the betas (β = (β1, β2, …)), taking error into account, for a specific voxel ⇒ we would ONLY get the probability (e.g. < 5%) of observing such data if there were NO effect (e.g. of β1 > β2)
• But what about the likelihood of the model?
• What are the chances/likelihood that β1 > β2 at some voxel or region?
• Could we get some quantitative measure of that?
What do we get with Bayes statistics?
Here, the idea (Bayes) is to use our post-hoc knowledge (our data) to estimate the model (also allowing us to compare hypotheses/models and see which fits our data best):

P(X|Y) = P(Y|X)*P(X)/P(Y)
where P(X|Y) is the "posterior" distribution for X given Y, P(Y|X) is the "likelihood" of Y given X, and P(X), P(Y) are the prior probabilities ("priors").

i.e. for the GLM parameters: P(β|Y) = P(Y|β)*P(β)/P(Y)
Now over to Steve for the practical side in SPM…
Bayes for Beginners: Applications
Bayes in SPM
SPM uses priors for estimation in…
• spatial normalization
• segmentation
• EEG source localisation
…and Bayesian inference in…
• Posterior Probability Maps (PPMs)
• Dynamic Causal Modelling (DCM)
Null hypothesis significance testing
p(D | H0)
• The standard approach in science is the null hypothesis significance test (NHST)
• A low p value suggests "there is not nothing"
• The assumption is that H0 = noise; randomness

[Figure: H0 = molecules are randomly arranged in space… looking unlikely]

Krueger (2001) American Psychologist
Something vs nothing
t = D / (s/√n), where D is the observed mean effect
…if there is any effect at all (D ≠ 0), then E(t) = (D/s)·√n, which keeps growing as n increases.
Our interpretations ultimately depend on p(H0)
"Risky" vs "safe" research…
Better to be explicit – incorporate subjectivity when specifying hypotheses.
Belief change = p(H0) – p(H0 | D)
If the underlying effect δ ≠ 0, no matter how small, the test statistic grows in size – is this physiological?
The case for the defence
• The law of large numbers means that the test statistic will identify a consistent trend (δ ≠ 0) given a sufficient sample size
• In SPM, we look at images of statistics, not effect sizes
• A highly significant statistic may reflect a small, non-physiological difference when N is large
• BUT… as long as we are aware of this, classical inference works well for common sample sizes
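A small numerical illustration of this point, assuming NumPy: for a fixed, tiny true effect δ ≠ 0, the expected t-statistic grows roughly as √n (the effect size and noise SD below are illustrative values):

```python
import numpy as np

delta, s = 0.05, 1.0              # tiny true effect and noise SD (illustrative values)
for n in (20, 200, 2000, 20000):
    expected_t = delta / (s / np.sqrt(n))   # E(t) ~ (delta / s) * sqrt(n)
    print(f"n = {n:6d}   expected t = {expected_t:5.2f}")
```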
Posterior Probability Distribution
precision λ = 1/σ²
λ_post = λ_d + λ_p
M_post = (λ_d·M_d + λ_p·M_p) / λ_post

[Figure: Gaussian prior (mean M_p, variance λ_p⁻¹), data likelihood (mean M_d, variance λ_d⁻¹) and the resulting posterior (mean M_post, variance λ_post⁻¹)]
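A minimal sketch of this precision-weighted combination in Python; the data and prior means and precisions are invented for illustration:

```python
def posterior_from_precisions(m_d, lam_d, m_p, lam_p):
    """Combine a Gaussian likelihood (mean m_d, precision lam_d) with a
    Gaussian prior (mean m_p, precision lam_p)."""
    lam_post = lam_d + lam_p                          # posterior precision
    m_post = (lam_d * m_d + lam_p * m_p) / lam_post   # precision-weighted mean
    return m_post, lam_post

m_post, lam_post = posterior_from_precisions(m_d=2.0, lam_d=4.0,    # data: mean 2, variance 0.25
                                              m_p=0.0, lam_p=1.0)   # prior: mean 0, variance 1
print(f"posterior mean = {m_post:.2f}, posterior variance = {1 / lam_post:.2f}")
```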
(1) Bayesian model comparison
BUT!!! What is p(H0) for randomness?!
Reframe the question – compare alternative hypotheses/models:
H1 vs H2 vs H3 vs H4 vs etc…

Bayes:
p(θ | y) = p(y | θ) p(θ) / p(y)
If there is only one model, then p(y) is just a normalising constant…

For model Hi:
p(θ | y, Hi) = p(y | θ, Hi) p(θ | Hi) / p(y | Hi)
where p(y | Hi) is the model evidence for Hi.
Practical example (1)
Dynamic causal modelling (DCM)

[Figure: three candidate DCMs (H=1, H=2, H=3), each connecting regions V1, V5 and SPC, with Photic, Motion and Attention inputs; the estimated connection strengths are shown on the arrows]

Model evidence:
p(y | Hi) = ∫ p(y | θ, Hi) p(θ | Hi) dθ

Bayes factor:
B12 = p(y | H1) / p(y | H2)
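A toy sketch of the model-evidence integral and Bayes factor above, assuming NumPy. The two models here are deliberately simple Gaussian models of a made-up data vector, not DCMs; the point is only to show the evidence being obtained by integrating the likelihood over the prior:

```python
import numpy as np

def gauss(x, mean, var):
    """Gaussian density N(mean, var) evaluated at x."""
    return np.exp(-(x - mean) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

y = np.array([0.8, 1.3, 0.4, 1.1, 0.9])           # made-up data, for illustration only

# H1: y ~ N(0, 1), no free parameter, so the evidence is just the likelihood
evidence_h1 = np.prod(gauss(y, 0.0, 1.0))

# H2: y ~ N(theta, 1) with prior theta ~ N(0, 1); integrate theta out numerically
theta = np.linspace(-6, 6, 2001)
dtheta = theta[1] - theta[0]
likelihood = np.prod(gauss(y[:, None], theta[None, :], 1.0), axis=0)   # p(y | theta)
evidence_h2 = np.sum(likelihood * gauss(theta, 0.0, 1.0)) * dtheta     # p(y | H2)

print(f"Bayes factor B21 = {evidence_h2 / evidence_h1:.2f}")
```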
(2) Priors about the null hypothesis
General Linear Model:
Y = Xβ + ε   with   ε ~ N(0, Cε)

What are the priors?
• In "classical" SPM, no (flat) priors
• In "full" Bayes, priors might come from theoretical arguments or from independent data
• In "empirical" Bayes, priors derive from the same data, assuming a hierarchical model for the generation of the data
– Parameters of one level can be made priors on the distribution of parameters at the lower level
– Parameters and hyperparameters at each level can be estimated using the EM algorithm
Shrinkage prior
General Linear Model (two-level):
Y = Xβ + ε(1)   with   p(ε(1)) = N(0, Cε)
Shrinkage prior:
β = 0 + ε(2)   i.e.   p(β) = N(0, Cβ)

In the absence of evidence to the contrary, parameters will shrink to zero.

[Figure: the prior density p(β), centred on 0]
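A minimal sketch of the shrinkage effect in Python: with a zero-mean prior of fixed precision, the posterior mean is pulled from the data estimate towards zero, more strongly when the data are imprecise. All numbers are illustrative:

```python
def shrunk_estimate(beta_ols, data_precision, prior_precision):
    """Posterior mean of beta under a zero-mean Gaussian shrinkage prior."""
    return data_precision * beta_ols / (data_precision + prior_precision)

beta_ols = 2.0                                   # classical (flat-prior) estimate
for data_precision in (0.1, 1.0, 10.0):          # weak -> strong evidence in the data
    post_mean = shrunk_estimate(beta_ols, data_precision, prior_precision=1.0)
    print(f"data precision {data_precision:5.1f}  ->  posterior mean {post_mean:.2f}")
```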
Bayesian Inference
Posterior ∝ Likelihood × Prior:
p(β | y) ∝ p(y | β) p(β)

PPMs (Bayesian test):
threshold γ applied to the posterior p(β | y)

SPMs (classical t-test):
t = f(y), assessed via p(t | β = 0); the threshold u changes with the search volume
Practical example (2)
SPM5 Interface
(2) Posterior Probability Maps
For an activation threshold γ and a probability threshold α, a voxel enters the PPM if p(β > γ | y) ≥ α
• Posterior probability distribution p(β | Y)
• Mean (Cbeta_*.img)
• Std dev (SDbeta_*.img)
• PPM (spmP_*.img)
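A toy sketch of the thresholding step behind a PPM (not SPM's actual implementation): given a Gaussian posterior at one voxel, conceptually summarised by Cbeta_*.img and SDbeta_*.img, compute p(β > γ | y) and compare it with the probability threshold α. The numeric values are invented:

```python
import math

def ppm_value(post_mean, post_sd, gamma):
    """p(beta > gamma | y) for a Gaussian posterior with the given mean and SD."""
    z = (gamma - post_mean) / post_sd
    return 1.0 - 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))   # 1 - Phi(z)

post_mean, post_sd = 1.2, 0.5    # posterior mean and SD at one voxel (invented)
gamma, alpha = 0.5, 0.95         # activation threshold and probability threshold

p = ppm_value(post_mean, post_sd, gamma)
print(f"p(beta > {gamma} | y) = {p:.3f} -> "
      f"{'included in PPM' if p >= alpha else 'sub-threshold'}")
```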
(3) Use informative priors (cutting edge!)
• Spatial constraints on fMRI activity (e.g. grey matter)
• Spatial constraints on EEG sources, e.g. using fMRI blobs
(4) Tasters – The Bayesian Brain
(4a) Taster: Modelling behaviour…
Ernst & Banks (2002) Nature
(4b) Taster: Modelling the brain…
Friston (2005) Phil Trans R Soc B
Acknowledgements and further reading
• Previous MFD talks
• Jean & Guillaume's SPM course slides
• Krueger (2001) Null hypothesis significance testing. Am Psychol 56: 16-26
• Penny et al. (2004) Comparing dynamic causal models. Neuroimage 22: 1157-1172
• Friston & Penny (2003) Posterior probability maps and SPMs. Neuroimage 19: 1240-1249
• Friston (2005) A theory of cortical responses. Phil Trans R Soc B
• www.ualberta.ca/~chrisw/BayesForBeginners.pdf
• www.fil.ion.ucl.ac.uk/spm/doc/books/hbf2/pdfs/Ch17.pdf
Bayes’ ending
Bunhill Fields Burial Ground
off City Road, EC1