
Jody Culham
Brain and Mind Institute
Department of Psychology
Western University
http://www.fmri4newbies.com/
fMRI Analysis
with emphasis on the General Linear Model
Last Update: February 11, 2013
Last Course: Psychology 9223, W2013, Western University
Statistical Foundations
What data do we start with?
• Each voxel is a "big box of neurons"
• 30 slices x 64 voxels x 64 voxels of (3 mm)³ = 122,880 voxels
• Each voxel has a time course
What data do we start with?
• We know the paradigm
• Measured BOLD Signal = Neural Activation in Response to Stimulus/Task, passed through "Mother Nature's Convolution" (the vasculature), + Error
What data do we start with?
• We know the paradigm and predicted neural activity
• We can model the HRF and assume it's linear(ish)
• Thus we can predict an expected time course
Choice of HRFs
• 20 subjects' empirical HRFs (Handwerker et al., 2004, NeuroImage)
• Two-Gamma (preferred)
• Boynton
• We could even derive subject-specific HRFs
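For concreteness, here is a minimal Python sketch of a two-gamma HRF. The parameter values (peak ~6 s, undershoot ~16 s, ratio 1/6) are common defaults, not necessarily the exact ones behind the figures above.

```python
import numpy as np
from scipy.stats import gamma

def two_gamma_hrf(t):
    """Two-gamma HRF sampled at times t (seconds): a positive gamma peaking
    around 6 s minus a smaller gamma (the undershoot) peaking around 16 s.
    Parameter values are common defaults, assumed here for illustration."""
    hrf = gamma.pdf(t, a=6) - gamma.pdf(t, a=16) / 6.0
    return hrf / hrf.max()   # scale so the peak equals 1

t = np.arange(0, 32, 0.1)    # 32 s of HRF at 0.1 s resolution
hrf = two_gamma_hrf(t)
```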
What data do we start with?
• We know the paradigm and predicted neural activity
• We can model the HRF and assume it's linear(ish)
• Thus we can predict an expected time course
• Now we can see how closely our predicted time course matches the real voxel time course
A Simple Correlation Will Do
• r (df = 135) = 0.528, p < .000001
• [Scatterplot: amplitude of the data time point in 1 voxel vs. amplitude of the predictor time point; each dot is one time point in our subject's data (except that I got too bored to draw all 136 time points for our run)]
• Now we just have to repeat this 122,879 more times!
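As a sketch of that per-voxel test (with toy stand-ins for the predictor and the measured time course, since the real arrays aren't reproduced here):

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(1)

# Toy stand-ins: an expected (HRF-convolved) time course and one voxel's data
predicted = np.sin(np.linspace(0, 8 * np.pi, 136))
measured = predicted + rng.normal(0, 1.0, size=136)

r, p = pearsonr(predicted, measured)
print(f"r(df={len(measured) - 2}) = {r:.3f}, p = {p:.2g}")
```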
Effect of Thresholds
r = .80   64% of variance   p < 10⁻³³
r = .50   25% of variance   p < .000001
r = .40   16% of variance   p < .000001
r = .24    6% of variance   p < .05
r = 0      0% of variance   p < 1
The General Linear Model (GLM)
GLM definition from Huettel et al.:
• a class of statistical tests that assume that the
experimental data are composed of the linear
combination of different model factors, along with
uncorrelated noise
• Model
– statistical model
• Linear
– things add up sensibly (1+1 = 2)
• note that linearity refers to the predictors in the
model and not necessarily the BOLD signal
• General
– many simpler statistical procedures such as
correlations, t-tests and ANOVAs are subsumed by
the GLM
Benefits of the GLM
• GLM is an overarching tool that can do anything that the
simpler tests do
• allows any combination of contrasts (e.g., intact - scrambled, scrambled - baseline), unlike simpler
methods (correlations, t-tests, Fourier analyses)
• allows more complex designs (e.g., factorial designs)
• allows much greater flexibility for combining data within
subjects and between subjects
• allows comparisons between groups
• allows counterbalancing orders within and between
subjects
• allows modelling of known sources of noise in the data
(e.g., error trials, head motion)
Composition of a Voxel Time Course
A Simple Experiment
• Lateral Occipital Complex responds when subject views objects
• Conditions: Intact Objects, Scrambled Objects, Blank Screen
• One volume (12 slices) every 2 seconds for 272 seconds (4 minutes, 32 seconds)
• Condition changes every 16 seconds (8 volumes)
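Here is a sketch of how the expected time course for this design could be built: boxcar predictors for the intact and scrambled blocks, convolved with a two-gamma HRF. The exact block order is my assumption; the slides only say that the condition changes every 16 s.

```python
import numpy as np
from scipy.stats import gamma

TR = 2.0          # one volume every 2 s
n_vols = 136      # 272 s / 2 s
block_vols = 8    # 16 s blocks

# Assumed block order (not given explicitly above): blank, intact, blank, scrambled, ...
order = ["blank", "intact", "blank", "scrambled"] * (n_vols // (4 * block_vols))

boxcars = {"intact": np.zeros(n_vols), "scrambled": np.zeros(n_vols)}
for i, cond in enumerate(order):
    if cond in boxcars:
        boxcars[cond][i * block_vols:(i + 1) * block_vols] = 1.0

# Two-gamma HRF sampled at the TR (same default parameters as the earlier sketch)
t_hrf = np.arange(0, 32, TR)
hrf = gamma.pdf(t_hrf, a=6) - gamma.pdf(t_hrf, a=16) / 6.0

intact_pred = np.convolve(boxcars["intact"], hrf)[:n_vols]
scrambled_pred = np.convolve(boxcars["scrambled"], hrf)[:n_vols]
```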
What's real?
• [Figure: four candidate time courses, A-D]
What's real?
• I created each of those time courses by taking the predictor function and adding a variable amount of random noise
• [Figure: data = signal + noise]
What's real?
• Which of the data sets below is more convincing?
Formal Statistics
• Formal statistics are just doing what your eyeball test of significance did
– Estimate how likely it is that the signal is real given how noisy the data are
• confidence: how likely is it that the results could occur purely due to chance?
• "p value" = probability value
– If "p = .03", that means there is a .03/1 or 3% chance that the results are bogus
• By convention, if the probability that a result could be due to chance is less than 5% (p < .05), we say that result is statistically significant
• Significance depends on
– signal (differences between conditions)
– noise (other variability)
– sample size (more time points are more convincing)
Let's create a time course for one LO voxel
• We'll begin with activation
• Response to Intact Objects is 4X greater than to Scrambled Objects
• Then we'll assume that our modelled activation is off because of a transient component
• Our modelled activation could be off for other reasons; all of the following could lead to inaccurate models:
– different shape of function
– different width of function
– different latency of function
Reminder: Variability of HRF
Intersubject variability of HRF in M1
Handwerker et al., 2004, NeuroImage
Now let's add some variability due to head motion
…though really motion is more complex
• Head motion can be quantified with 6 parameters given in any motion correction algorithm:
– x translation
– y translation
– z translation
– xy rotation
– xz rotation
– yz rotation
• For simplicity, I've only included parameter one in our model
• Head motion can lead to other problems not predictable by these parameters
Now let's throw in a pinch of linear drift
• linear drift could arise from magnet noise (e.g., parts warm up) or physiological noise (e.g., subject's head sinks)
and then we'll add a dash of low frequency noise
• low frequency noise can arise from magnet noise or physiological noise (e.g., subject's cycles of alertness/drowsiness)
• low frequency noise would occur over a range of frequencies but for simplicity, I've only included one frequency (1 cycle per run) here
– Linear drift is really just very low frequency noise
and our last ingredient… some high frequency noise
• high frequency noise can arise from magnet noise or physiological noise (e.g., subject's breathing rate and heart rate)
When we add these all together, we get a realistic time course
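The same recipe as a sketch, with arbitrary illustrative amplitudes and toy stand-ins for the convolved predictors and the motion parameter:

```python
import numpy as np

rng = np.random.default_rng(0)
TR, n_vols = 2.0, 136
t = np.arange(n_vols) * TR

# Toy stand-ins for the HRF-convolved intact/scrambled predictors
intact_pred = (np.sin(2 * np.pi * t / 64.0) > 0).astype(float)
scrambled_pred = 1.0 - intact_pred

activation = 4.0 * intact_pred + 1.0 * scrambled_pred     # intact response 4x scrambled
motion = 0.5 * (t > 150)                                   # a step-like head movement (illustrative)
drift = 0.005 * t                                          # slow linear drift
low_freq = 0.5 * np.sin(2 * np.pi * t / (n_vols * TR))     # 1 cycle per run
high_freq = rng.normal(0, 0.3, n_vols)                     # breathing/heart-rate-like noise

voxel_time_course = activation + motion + drift + low_freq + high_freq
```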
General Linear Model
Now let's be the experimenter
• First, we take our time course and normalize it using z scores: z = (x - mean)/SD
– normalization leads to data where mean = zero and SD = 1
• Alternative: You can transform the data into % BOLD signal change. This is usually a better approach because it's not dependent on variance
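The two normalizations described above, as a sketch for a 1-D voxel time course:

```python
import numpy as np

def zscore(ts):
    """Normalize so the time course has mean = 0 and SD = 1."""
    return (ts - ts.mean()) / ts.std()

def percent_signal_change(ts):
    """Express each time point as % BOLD signal change from the voxel's mean."""
    return 100.0 * (ts - ts.mean()) / ts.mean()
```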
If you only pay
attention to one slide in
this lecture, it should
be the next one!!!
We create a GLM with 2 predictors
fMRI Signal = (Predictor 1 × β1) + (Predictor 2 × β2) + Residuals

fMRI Signal = Design Matrix x Betas + Residuals
• fMRI Signal: "our data"
• Design Matrix: "what we CAN explain"
• Betas: "how much of it we CAN explain"
• Residuals: "what we CANNOT explain"
Statistical significance is basically a ratio of explained to unexplained variance
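A minimal sketch of that two-predictor GLM fit by ordinary least squares; `y` is one voxel's (normalized) time course and `predictors` is a list of HRF-convolved predictor arrays:

```python
import numpy as np

def fit_glm(y, predictors):
    """Ordinary least squares: y = X @ betas + residuals,
    where X holds the predictors plus a constant column."""
    X = np.column_stack(predictors + [np.ones(len(y))])
    betas, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ betas
    explained = 1.0 - residuals.var() / y.var()   # proportion of variance explained
    return betas, residuals, explained

# e.g., betas, residuals, r2 = fit_glm(voxel_time_course, [intact_pred, scrambled_pred])
```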
Implementation of GLM in SPM
[Figure: SPM design matrix with Intact and Scrambled predictors, time running down the page; many thanks to Øystein Bech Gadmar for creating this figure in SPM]
• SPM represents time as going down
• SPM represents predictors within the design matrix as grayscale plots (where black = low, white = high) over time
• GLM includes a constant to take care of the average activation level throughout each run
– SPM shows this explicitly (BV may not)
Effect of Beta Weights
• Adjustments to the beta weights have the effect of raising or lowering the height of the predictor while keeping the shape constant
Dynamic Example
The beta weight is NOT a correlation
• correlations measure goodness of fit regardless of scale
• beta weights are a measure of scale
• [Figure: four example fits — small β, large r; small β, small r; large β, large r; large β, small r]
We create a GLM with 2 predictors
• [Figure: the same model with example beta values, e.g., when β1 = 2 and when β2 = 0.5, each predictor is scaled accordingly]

fMRI Signal = Design Matrix x Betas + Residuals
• fMRI Signal: "our data"
• Design Matrix: "what we CAN explain"
• Betas: "how much of it we CAN explain"
• Residuals: "what we CANNOT explain"
Statistical significance is basically a ratio of explained to unexplained variance
The "Linear" in GLM
The GLM assumes that activation adds linearly
Much more on this next lecture
Poldrack, Mumford & Nichols, 2011, fMRI Data Analysis
Correlated Predictors
• Where possible, avoid predictors that are highly correlated with one another
• This is why we NEVER include a baseline predictor
– baseline predictor is almost completely correlated with the sum of existing predictors
• [Figure: the two stimulus predictors each correlate r = -.53 with the baseline predictor; their sum correlates r = -.95 with it]
Which model accounts for this data?
• [Figure: the same data fit equally well by two different beta combinations — e.g., the two stimulus predictors with betas of 1 and 0, or a combination that also includes the baseline predictor (with β = -1)]
• Because the predictors are highly correlated, the model is overdetermined and you can't tell which beta combo is best
Maximizing Your Power
• As we saw earlier, the GLM is basically comparing the amount of signal to the amount of noise (data = signal + noise)
• How can we improve our stats?
– increase signal
– decrease noise
– increase sample size (keep subject in longer)
How to Reduce Noise
• If you can't get rid of an artifact, you can include it as a "predictor of no interest" to soak up variance
• Example: Some people include predictors from the outcome of motion correction algorithms
• This works best when the motion is uncorrelated with your paradigm (predictors of interest)
• Corollary: Never leave out predictors for conditions that will affect your data (e.g., error trials)
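A sketch of what a "predictor of no interest" looks like in the design matrix; `motion_params` stands in for the (n_vols x 6) realignment output and is an assumed name, not a specific package's variable:

```python
import numpy as np

def design_with_nuisance(task_predictors, motion_params):
    """Stack task predictors, motion parameters (nuisance regressors),
    and a constant column into one design matrix."""
    n_vols = motion_params.shape[0]
    return np.column_stack(task_predictors + [motion_params, np.ones(n_vols)])
```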
Including First Derivative
• Some recommend including the first derivative of the
HRF-convolved predictor
– can soak up some of the variance due to misestimations of
the HRF
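As a sketch, the derivative column is just the sample-to-sample slope of the convolved predictor:

```python
import numpy as np

def predictor_with_derivative(predictor):
    """Return the HRF-convolved predictor and its temporal derivative
    as two design-matrix columns (the derivative absorbs small latency errors)."""
    return np.column_stack([predictor, np.gradient(predictor)])
```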
Now do you understand why we did temporal filtering?
[Figure: raw data and its highpass-, lowpass-, and bandpass-filtered versions]
Poldrack, Mumford & Nichols, 2011, fMRI Data Analysis
Reducing Residuals
Alternative to Filtering
• Rather than filtering low frequencies from our raw data, we can include a "discrete cosine basis set" that soaks up variance due to low frequency noise
Poldrack, Mumford & Nichols, 2011, fMRI Data Analysis
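A sketch of such a discrete cosine basis set; the 128 s cutoff is a conventional default assumed here, not a value from the lecture:

```python
import numpy as np

def dct_basis(n_vols, TR, cutoff_s=128.0):
    """Discrete cosine regressors covering frequencies slower than 1/cutoff_s;
    added to the design matrix, they soak up low-frequency drift."""
    n_basis = int(np.floor(2.0 * n_vols * TR / cutoff_s))
    n = np.arange(n_vols)
    cols = [np.cos(np.pi * k * (2 * n + 1) / (2.0 * n_vols)) for k in range(1, n_basis + 1)]
    return np.column_stack(cols) if cols else np.empty((n_vols, 0))
```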
Contrasts:
Examples with Real Data
Sam’s Paradigm:
Localizer for Ventral-Stream Visual Areas
Fusiform Face Area
Contrasts in the GLM
• We can examine whether a single predictor is significant (compared to the baseline)
– [Figure: axial slice (z = -20, R/L labelled) showing the resulting activation]
• We can also examine whether a single predictor is significantly greater than another predictor
Contrast Vectors
                    Houses  Faces  Objects  Bodies  Scram
Faces - Baseline       0     +1      0        0       0
Faces - Houses        -1     +1      0        0       0
Faces - Objects        0     +1     -1        0       0
Faces - Bodies         0     +1      0       -1       0
Faces - Scrambled      0     +1      0        0      -1
Balanced Contrasts
Unbalanced:
Contrast       -1   +1   -1   -1   -1    Σ = -3
β               1    2    1    1    1
Contrast x β   -1    2   -1   -1   -1    Σ = -2
If you do not balance the contrast, you are comparing one condition vs. the sum of all the others
Balanced:
Contrast       -1   +4   -1   -1   -1    Σ = 0
β               1    2    1    1    1
Contrast x β   -1    8   -1   -1   -1    Σ = 4
If you balance the contrast, you are comparing one condition vs. the average of all the others
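The same arithmetic as a sketch, using the betas from the slide:

```python
import numpy as np

betas = np.array([1, 2, 1, 1, 1])            # Houses, Faces, Objects, Bodies, Scrambled
unbalanced = np.array([-1, +1, -1, -1, -1])  # weights sum to -3
balanced = np.array([-1, +4, -1, -1, -1])    # weights sum to 0

print(unbalanced @ betas)   # -2: Faces vs. the SUM of the other conditions
print(balanced @ betas)     #  4: Faces vs. the AVERAGE of the other conditions
```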
Problems with Bulk Contrasts
Balanced contrast: Faces vs. Other
Case 1 (Faces differs from all other conditions): β = 1, 2, 1, 1, 1
Contrast       -1   +4   -1   -1   -1     Σ = 0
β               1    2    1    1    1
Contrast x β   -1    8   -1   -1   -1     Σ = 4
Case 2 (only Scrambled differs): β = 2, 2, 2, 2, 0.5
Contrast       -1   +4   -1   -1   -1     Σ = 0
β               2    2    2    2    0.5
Contrast x β   -2    8   -2   -2   -0.5   Σ = 1.5
• Bulk contrasts can be significant if only a subset of conditions differ
Conjunctions
(sometimes called Masking)
                    Houses  Faces  Objects  Bodies  Scram
Faces - Baseline       0     +1      0        0       0     AND
Faces - Houses        -1     +1      0        0       0     AND
Faces - Objects        0     +1     -1        0       0     AND
Faces - Bodies         0     +1      0       -1       0     AND
Faces - Scrambled      0     +1      0        0      -1
To describe this in text:
• [(Faces > Baseline) AND (Faces > Houses) AND (Faces > Objects) AND (Faces > Bodies) AND (Faces > Scrambled)]
Conjunction Example
[Figure: superimposed maps for Faces – Baseline, Faces – Houses, Faces – Objects, Faces – Bodies, and Faces – Scrambled, and the resulting conjunction]
P Values for Conjunctions
• If the contrasts are independent:
– e.g., [(Faces > Houses) AND (Scrambled > Baseline)]
– p_combined = (p_single contrast)^(number of contrasts)
– e.g., p_combined = (0.05)² = 0.0025
• If the contrasts are non-independent:
– e.g., [(Faces > Houses) AND (Faces > Baseline)]
– p_combined is less straightforward to compute
Real Voxel: GLM
• Here's the time course from a voxel in right FFA (defined by conjunction)
GLM Data, Model, and Residuals
• df_predictors = # of predictors
• df_total = # volumes - 1
• df_residual = df_total - df_predictors
• 262 volumes (time points)
• GLM predictors account for (0.784)² = 61% of variance
Real Voxel: Betas
• t = β / se
• e.g., t_Face = β_Face / se_Face
• t_Face = 1.371 / 0.076 = 18.145
• t(5, 261) = 18.145 → p < .000001
Real Voxel: Contrasts
Σ[Contrast x β] = (0 x 0.964) + (+1 x 1.371) + (0 x 0.979) + (0 x 1.000) + (-1 x 0.687)
                = 1.371 – 0.687
                = 0.684
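The same computation as a sketch:

```python
import numpy as np

# Betas for Houses, Faces, Objects, Bodies, Scrambled in this FFA voxel
betas = np.array([0.964, 1.371, 0.979, 1.000, 0.687])
faces_vs_scrambled = np.array([0, +1, 0, 0, -1])

print(faces_vs_scrambled @ betas)   # 1.371 - 0.687 = 0.684
```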
Dealing with Faulty Assumptions
What's this #*%&ing reviewer complaining about?!
1. Correction for multiple comparisons
2. Correction for serial correlations
– only necessary for data from single subjects
– not necessary for group data
Types of Errors
                                     Is the region truly active?
                                       Yes              No
Does our stat test indicate   Yes      HIT              Type I Error
that the region is active?    No       Type II Error    Correct Rejection
Slide modified from Duke course
p value: probability of a Type I error
e.g., p < .05
"There is less than a 5% probability that a voxel our stats have declared as 'active' is in reality NOT active"
Dead Salmon
poster at Human Brain Mapping conference, 2009
• 130,000 voxels
• no correction for
multiple
comparisons
Fishy Headlines
Mega-Multiple Comparisons Problem
Typical 3T Data Set
30 slices x 64 x 64 = 122,880 voxels of (3 mm)³
If we choose p < 0.05…
122,880 voxels x 0.05 = approx. 6144 voxels should be significant due to chance alone
We can reduce this number by only examining voxels inside the brain
~64,000 voxels (of (3 mm)³) x 0.05 = 3200 voxels significant by chance
Possible Solutions to Multiple Comparisons Problem
• Bonferroni Correction
– small volume correction
• Cluster Correction
• False Discovery Rate
• Gaussian Random Field Theory
• Test-Retest Reliability
Bonferroni Correction
• divide desired p value by number of comparisons
• Example:
– desired p value: p < .05
– number of voxels in brain: 64,000
– required p value: p < .05 / 64,000 → p < .00000078
• Variant: small-volume correction
– only search within a limited space: the brain, the cortical surface, or a region of interest
– reduces the number of voxels and thus the severity of Bonferroni
• Drawback: overly conservative
– assumes that each voxel is independent of others
– not true – adjacent voxels are more likely to be significant in fMRI data than non-adjacent voxels
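The arithmetic behind those numbers, as a sketch:

```python
# Expected false positives at an uncorrected threshold
n_voxels_in_brain, alpha = 64_000, 0.05
print(n_voxels_in_brain * alpha)   # ~3200 voxels significant by chance

# Bonferroni-corrected threshold
print(alpha / n_voxels_in_brain)   # ~7.8e-07, i.e., p < .00000078
```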
Cluster Correction
• falsely activated voxels should be randomly dispersed
• set minimum cluster size (k) to be large enough to make it unlikely that a cluster of that size would occur by chance
• some algorithms assume that data from adjacent voxels are uncorrelated (not true)
• some algorithms (e.g., Brain Voyager) estimate and factor in spatial smoothness of maps
– cluster threshold may differ for different contrasts
• Drawbacks:
– handicaps small regions (e.g., subcortical foci) more than large regions
– researcher can test many combinations of p values and k values and publish the one that looks the best
False Discovery Rate
• "controls the proportion of rejected hypotheses that are falsely rejected" (i.e., false positives)
• standard p value (e.g., p < .01) means that a certain proportion of all voxels will be significant by chance (1%)
• FDR uses q value (e.g., q < .01), meaning that a certain proportion of the "activated" (colored) voxels will be significant by chance (1%)
• Drawbacks
– very conservative when there is little activation; less conservative when there is a lot of activation
Gaussian Random Field Theory
• Fundamental to SPM
• If data are very smooth, then the chance of noise points passing threshold is reduced
• Can correct for the number of "resolvable elements" ("resels") rather than number of voxels
• Drawback: Requires smoothing
Slide modified from Duke course
Test-Retest Reliability
• Perform statistical tests on each half of the data
• The probability of a given voxel appearing in both purely by chance is the square of the p value used in each half
– e.g., .001 x .001 = .000001
• Alternatively, use the first half to select an ROI and the second half to test your hypothesis
• Drawback: By splitting your data in half, you're reducing your statistical power to see effects
Sanity Checks: "Poor Man's Bonferroni"
• For casual data exploration, not publication
• Jack up the threshold till you get rid of the schmutz (especially in air, ventricles, white matter – may be real)
• If you have a comparison where one condition is expected to produce much more activity than the other, turn on both tails of the comparison
• If two areas are symmetrically active, they're less likely to be due to chance (only works for bilateral areas)
• Jody's rule of thumb: "If ya can't trust the negatives, can ya trust the positives?"
• Too subjective for serious use
Example: MT localizer data
• Moving rings > stationary rings (orange)
• Stationary rings > moving rings (blue)
Have We Been So Obsessed with Limiting Type I Error that Type II Error is Out of Control?
                                     Is the region truly active?
                                       Yes              No
Does our stat test indicate   Yes      HIT              Type I Error
that the region is active?    No       Type II Error    Correct Rejection
Slide modified from Duke course
Comparison of Methods
[Figure: simulated data thresholded three ways]
• uncorrected: high Type I, low Type II
• Bonferroni: low Type I, high Type II
• FDR: low Type I, low Type II
Poldrack, Mumford & Nichols, 2011, fMRI Data Analysis
Strategies for Exploration vs. Publication
• Deductive approach
– Have a specific hypothesis/contrast planned
– Run all your subjects
– Run the stats as planned
– Publish
• Inductive approach
– Run a few subjects to see if you're on the right track
– Spend a lot of time exploring the pilot data for interesting patterns
– "Find the story" in the data
– You may even change the experiment, run additional subjects, or run a follow-up experiment to chase the story
• While you need to use rigorous corrections for publication, do not be overly conservative when exploring pilot data or you might miss interesting trends
• Random effects analyses can be quite conservative so you may want to do exploratory analyses with fixed effects (and then run more subjects if needed so you can publish random effects)
What's this #*%&ing reviewer complaining about?!
1. Correction for multiple comparisons
2. Correction for serial correlations
– only necessary for data from single subjects
– not necessary for group data
• stay tuned to find out why: Group Data lecture
Correction for Temporal Correlations
• When analyzing a single subject, degrees of freedom = number of volumes – 1
– e.g., if our run has 200 volumes (400 s long if TR = 2), then df = 199
• Statistical methods assume that each of our time points is independent
• In the case of fMRI, this assumption is false: even in a "screen saver scan", activation in a voxel at one time is correlated with its activation within ~6 sec
• This artificially inflates your statistical significance
Autocorrelation function
• To calculate the magnitude of the problem, we can compute the autocorrelation function on the residuals
• For a voxel or ROI, correlate its time course with itself shifted in time (by 1 volume, by 2 volumes, …)
• Plot these correlations by the degree of shift
• If there's no autocorrelation, the function should drop from 1 to 0 abruptly (pink line)
• The points circled in yellow suggest there is some autocorrelation, especially at a shift of 1, called AR(1)
• BV can correct for the autocorrelation to yield revised (usually lower) p values
[Figure: autocorrelation functions BEFORE and AFTER correction]
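A sketch of that autocorrelation function, computed on a residual time course (the array name is mine):

```python
import numpy as np

def autocorrelation(residuals, max_lag=10):
    """Correlate a residual time course with itself shifted by 1, 2, ... volumes.
    With no autocorrelation the values should drop to ~0 by lag 1; a clearly
    positive value at lag 1 is the AR(1) component described above."""
    x = residuals - residuals.mean()
    return np.array([np.corrcoef(x[:-lag], x[lag:])[0, 1]
                     for lag in range(1, max_lag + 1)])
```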
BV Preprocessing Options
Temporal Smoothing of Data
• We have the option in our software to temporally
smooth our data (i.e., remove high temporal
frequencies or “low-pass filter”)
• However, I recommended that you not use this option
• Now do you understand why?
To Localize or Not to Localise?
To Localize or Not to Localise?
Neuroimagers can’t even
agree how to SPELL
localiser/localizer!
Methodological Fundamentalism
The latest review I received…
Approach #1:
Voxelwise Statistics
Run a statistical contrast for every voxel in your search volume.
Correct for multiple comparisons.
Find a bunch of blobs.
Voxelwise Approach: Example
Malach et al., 1995, PNAS
• Question: Are there areas of the human brain that are more responsive to objects than scrambled objects?
• You will recognize this as what we now call an LO localizer, but Malach was the first to identify LO
• LO activation is shown in red, behind MT+ activation in green
• LO (red) responds more to objects, abstract sculptures and faces than to textures, unlike visual cortex (blue) which responds well to all stimuli
Approach #2:
Region of Interest (ROI) Analysis
• Identify a region of interest: functional ROI, anatomical ROI, or functional-anatomical ROI (images from O'Reilly et al., 2012, SCAN)
• Perform statistical contrasts for the ROI data in an INDEPENDENT data set
– Because the runs that are used to generate the area are independent from those used to test the hypothesis, liberal statistical thresholds (e.g., p < .05) can be used
Localizer Scan
• A separate scan conducted to identify functional regions of interest
Example of ROI Approach
Culham et al., 2003, Experimental Brain Research
Does the Lateral Occipital Complex compute object shape for grasping?
Step 1: Localize LOC (Intact Objects vs. Scrambled Objects)
Example of ROI Approach
Culham et al., 2003, Experimental Brain Research
Does the Lateral Occipital Complex compute object shape for grasping?
Step 2: Extract LOC data from experimental runs (Grasping vs. Reaching)
[Figure: no significant differences — NS, p = .35; NS, p = .31]
Example of ROI Approach
Very Simple Stats
• Extract the average peak % BOLD Signal Change from each subject for each condition (Left Hem. LOC)

Subject   Grasping   Reaching
   1        0.02       0.03
   2        0.19       0.08
   3        0.04       0.01
   4        0.10       0.32
   5        1.01      -0.27
   6        0.16       0.09
   7        0.19       0.12

• Then simply do a paired t-test to see whether the peaks are significantly different between conditions
• Neither comparison was significant (NS, p = .35; NS, p = .31)
• Instead of using % BOLD Signal Change, you can use beta weights
• You can also do a planned contrast in Brain Voyager using a module called the ROI GLM
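The "very simple stats" from the table above, as a sketch (values as listed for the left-hemisphere LOC):

```python
import numpy as np
from scipy.stats import ttest_rel

grasping = np.array([0.02, 0.19, 0.04, 0.10, 1.01, 0.16, 0.19])
reaching = np.array([0.03, 0.08, 0.01, 0.32, -0.27, 0.09, 0.12])

t, p = ttest_rel(grasping, reaching)    # paired t-test across the 7 subjects
print(f"t({len(grasping) - 1}) = {t:.2f}, p = {p:.2f}")   # not significant (p ~ .35)
```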
Examples: The Danger of ROI Approaches
• Example 1: LOC may be a heterogeneous area with subdivisions; ROI analyses gloss over this
• Example 2: Some experiments miss important areas (e.g., Kanwisher et al., 1997 identified one important face processing area -- the fusiform face area, FFA -- but did not report a second area that is a very important part of the face processing network -- the occipital face area, OFA -- because it was less robust and consistent than the FFA)
Pros and Cons: Voxelwise Approach
Benefits
• Require no prior hypotheses about areas involved
• Include entire brain
• May identify subregions of known areas that are implicated in a
function
• Doesn’t require independent data set
Drawbacks
• Requires conservative corrections for multiple comparisons
• vulnerable to Type II errors
• Neglects individual differences in brain regions
• poor for some types of studies (e.g., topographic areas)
• Can lose spatial resolution with intersubject averaging
• Requires speculation about areas involved
Pros and Cons: ROI Approach
Benefits
• Extraction of ROI data can be subjected to simple stats
• Elimination of mega multiple comparisons problem greatly improves statistical power (e.g., p < .05)
• Hypothesis-driven
• Useful when hypotheses are motivated by other techniques (e.g., electrophysiology) in specific brain regions
• ROI is not smeared due to intersubject averaging
• Important for discriminating abutting areas (e.g., V1/V2)
• Easy to analyze and interpret
• Can be useful for dissecting factorial design data in an unbiased manner
Drawbacks
• Neglects other areas that may play a fundamental role
• If multiple ROIs need to be considered, you can spend a lot of scan time collecting localizer data (thus limiting the time available for experimental runs)
• Works best for reliable and robust areas with unambiguous definitions
• Sometimes you can't find an ROI in some subjects
• Selection of ROIs can be highly subjective and error-prone
A Proposed Resolution
• There is no reason not to do BOTH ROI analyses and
voxelwise analyses
– ROI analyses for well-defined key regions
– Voxelwise analyses to see if other regions are also involved
• Ideally, the conclusions will not differ
• If the conclusions do differ, there may be sensible reasons
– Effect in ROI but not voxelwise
• perhaps region is highly variable in stereotaxic location between subjects
• perhaps voxelwise approach is not powerful enough
– Effect in voxelwise but not ROI
• perhaps ROI is not homogenous or is context-specific
The War of Non-Independence
Finding the Obvious
• A priori probability of getting a JQKA sequence = (1/13)⁴ = 1/28,561
• A posteriori probability of getting a JQKA sequence = 1/1 = 100%
Non-independence error
• occurs when statistical tests performed are not independent from the means used to select the brain region
Arguments from Vul & Kanwisher, book chapter in press
Non-independence Error
Egregious example
• Identify Area X with contrast of A > B
• Do post hoc stats showing that A is statistically higher than B
• Act surprised!!!
More subtle example of selection bias
• Identify Area X with contrast of A > B
• Do post hoc stats showing that A is statistically higher than C and C is statistically greater than B
Arguments from Vul & Kanwisher, book chapter in press
Figure from Kriegeskorte et al., 2009, Nature Neuroscience
Double Dipping & How to Avoid It
• Kriegeskorte et al., 2009, Nature Neuroscience
• surveyed 134 papers in prestigious journals
• 42% showed at least one example of the non-independence error
Correlations Between Individual Subjects' Brain Activity and Behavioral Measures
Sample of Critiqued Papers:
Eisenberger, Lieberman & Williams, 2003, Science
• measured fMRI activity during social rejection (social exclusion > inclusion)
• correlated self-reported distress with brain activity
• found r = .88 in anterior cingulate cortex, an area implicated in physical pain perception
• concluded "rejection hurts"
"Voodoo Correlations"
• The original title of the paper (Vul et al., 2009) was not well-received by reviewers so it was changed, even though some people still use the term
• reliability of personality and emotion measures: r ~ .7
• reliability of activation in a given voxel: r ~ .7
• highest expected behavior–fMRI correlation is ~ .74
• so how can we have behavior–fMRI correlations of r ~ .9?!
“Voodoo Correlations”
"Notably, 53% of the surveyed studies selected voxels based on a correlation with the
behavioral individual-differences measure and then used those same data to compute a
correlation within that subset of voxels."
Vul et al., 2009, Perspectives on Psychological Science
Avoiding “Voodoo”
• Use independent means to select
region and then evaluate
correlation
• Do split-half reliability test
– WARNING: This is reassuring that the
result can be replicated in your sample
but does not demonstrate that result
generalizes to the population
Is the "voodoo" problem all that bad?
• High correlations can occur in legitimately analyzed data
• Did voxelwise analyses use appropriate correction for multiple comparisons?
– if so, then the result is statistically significant regardless of the specific correlation
• Is additional data being used for
1. inference purposes?
– if they pretend to provide independent support, that's bad
2. presentation purposes?
– alternative formats can be useful in demonstrating that data are clean (e.g., time courses look sensible; correlations are not driven by outliers)