GLM Analysis

Download Report

Transcript GLM Analysis

fMRI Statistics with Emphasis on the
General Linear Model
What data do we start with
…
•
•
•
•
12 slices * 64 voxels x 64 voxels = 49,152 voxels
Each voxel has 136 time points
Therefore, for each run, we have 6.7 million data points
We often have several runs for each experiment
Why do we need stats?
• We could, in principle, analyze data by voxel surfing: move the cursor
over different areas and see if any of the time courses look interesting
Slice 9, Voxel 0, 0
Slice 9, Voxel 1, 0
Slice 9, Voxel 22, 7
The signal is much
higher where
there is brain,
but there’s still
noise
Even where there’s no brain, there’s
noise
Slice 9, Voxel 18, 36
Slice 9, Voxel 9, 27
Here’s a voxel that responds well
whenever there’s visual stimulation
Slice 9, Voxel 13, 41
Here’s one that responds well
whenever there’s intact objects
Here’s a couple that sort of show the
right pattern but is it “real”?
Slice 9, Voxel 14, 42
Why do we need stats?
• Clearly voxel surfing isn’t a viable option. We’d have
to do it 49,152 times in this data set and it would
require a lot of subjective decisions about whether
activation was real
• This is why we need statistics
The lies and damned lies
come in when you write the
manuscript
• Statistics:
– tell us where to look for activation that is related to our
paradigm
– help us decide how likely it is that activation is “real”
Predicted Responses
•
•
•
fMRI is based on the Blood Oxygenation Level Dependent (BOLD) response
It takes about 5 sec for the blood to catch up with the brain
We can model the predicted activation in one of two ways:
1. shift the boxcar by approximately 5 seconds (2 images x 2 seconds/image = 4 sec,
close enough)
2. convolve the boxcar with the hemodynamic response to model the shape of the true
function as well as the delay
PREDICTED ACTIVATION IN VISUAL AREA
BOXCAR
SHIFTED
CONVOLVED
WITH HRF
fMRI for Dummies
PREDICTED ACTIVATION IN OBJECT AREA
Types of Errors
Is the region truly active?
Yes
No
Does our stat test indicate
that the region is active?
Yes
HIT
Type II
Error
No
Type I
Error
Correct
Rejection
Slide modified from Duke course
p value:
probability of a Type I error
e.g., p <.05
“There is less than a 5%
probability that a voxel our
stats have declared as
“active” is in reality NOT
active
Statistical Approaches in a Nutshell
t-tests
• compare activation levels between two conditions
• use a time-shift to account for hemodynamic lag
correlations
• model activation and see whether any areas show a similar pattern
Fourier analysis
• Do a Fourier analysis to see if there is energy at your paradigm frequency
Fourier analysis images
from Huettel, Song &
McCarthy, 2004,
Functional Magnetic
Resonance Imaging
Effect of Thresholds
r = .80
64% of variance
p < 10-33
r = .50
25% of variance
p < .000001
r = .40
16% of variance
p < .000001
r = .24
6% of variance
p < .05
r=0
0% of variance
p<1
Complications
• Not only is it hard to determine what’s real, but there are all
sorts of statistical problems
Potential problems
What’s wrong with these data?
r = .24
6% of variance
p < .05
1.
data may be
contaminated by artifacts
(e.g., head motion,
breathing artifacts)
2.
.05 * 49,152 = 2457
“significant” voxels by
chance alone
3.
many assumptions of
statistics (adjacent voxels
uncorrelated with each
other; adjacent time
points uncorrelated with
one another) are false
The General Linear Model
• T-tests, correlations and Fourier analysis work for simple designs
and were common in the early days of imaging
• The General Linear Model (GLM) is now available in many software
packages and tends to be the analysis of choice
Why is the GLM so great?
• the GLM is an overarching tool that can do anything that the simpler
tests do
• you can examine any combination of contrasts (e.g., intact scrambled, scrambled - baseline) with one GLM rather than multiple
correlations
• the GLM allows much greater flexibility for combining data within
subjects and between subjects
• it also makes it much easier to counterbalance orders and discard
bad sections of data
• the GLM allows you to model things that may account for variability
in the data even though they aren’t interesting in and of themselves
(e.g., head motion)
• as we will see later in the course, the GLM also allows you to use
more complex designs (e.g., factorial designs)
A Simple Experiment
Lateral Occipital Complex
• responds when subject
views objects
Intact
Objects
Blank
Screen
TIME
One volume (12 slices) every 2 seconds for 272
seconds (4 minutes, 32 seconds)
Condition changes every 16 seconds (8 volumes)
Scrambled
Objects
What’s real?
A.
C.
B.
D.
What’s real?
• I created each of those time courses based by taking
the predictor function and adding a variable amount
of random noise
signal
=
+
noise
What’s real?
Which of the data sets below is more convincing?
Statistical Significance
• Significance depends on
– signal (differences between conditions)
– noise (other variability)
– sample size (more time points are more convincing)
Let’s create a time course for one LO voxel
We’ll begin with activation
Response to Intact Objects is 4X greater than Scrambled Objects
Then we’ll assume that our modelled activation is off
because a transient component
Our modelled activation could be off for other reasons
All of the following could lead to inaccurate models
• different shape of function
• different width of function
• different latency of function
Variability of HRF
Aguirre, Zarahn & D’Esposito, 1998
• HRF shows considerable variability between subjects
different subjects
• Within subjects, responses are more consistent, although there is still
some variability between sessions
same subject, same session
same subject, different session
Now let’s add some variability due to head motion
…though really motion is more complex
•
Head motion can be quantified with 6 parameters given in any motion correction
algorithm
–
–
–
–
–
–
x translation
y translation
z translation
xy rotation
xz rotation
yz rotation
•
For simplicity, I’ve only included parameter one in our model
•
Head motion can lead to other problems not predictable by these parameters
Now let’s throw in a pinch of linear drift
• linear drift could arise from magnet noise (e.g., parts warm up)
or physiological noise (e.g., subject’s head sinks)
and then we’ll add a dash of low frequency noise
•
•
low frequency noise can arise from magnet noise or physiological noise (e.g.,
subject’s cycles of alertness/drowsiness)
low frequency noise would occur over a range of frequencies but for simplicity,
I’ve only included one frequency (1 cycle per run) here
– Linear drift is really just very low frequency noise
and our last ingredient… some high frequency noise
•
high frequency noise can arise from magnet noise or physiological noise (e.g.,
subject’s breathing rate and heartrate)
When we add these all together, we get a realistic time
course
Now let’s be the experimenter
•
•
•
First, we take our time course and normalize it using z scores
z = (x - mean)/SD
normalization leads to data where
– mean = zero
– SD = 1
We create a GLM with 2 predictors
× 1
=
+
+
× 2
fMRI Signal
“our data”
=
=
Design Matrix x Betas
“what we
CAN
explain”
x
“how much of
it we CAN
explain”
+
Residuals
+
“what we
CANNOT
explain”
Statistical significance is basically a ratio of
explained to unexplained variance
Implementation of GLM in SPM
 Time
Many thanks to Øystein Bech Gadmar for
creating this figure in SPM
Intact
Predictor
•
•
•
Scrambled
Predictor
SPM represents time as going down
SPM represents predictors within the design matrix as grayscale plots (where black = low,
white = high) over time
SPM includes a constant to take care of the average activation level throughout each run
Effect of Beta Weights
• Adjustments to the beta weights have the effect of
raising or lowering the height of the predictor while
keeping the shape constant
The beta weight is not a correlation
• correlations measure goodness of fit regardless of
scale
• beta weights are a measure of scale
We create a GLM with 2 predictors
when 1=2
=
+
+
when 2=0.5
fMRI Signal
“our data”
=
=
Design Matrix x Betas
“what we
CAN
explain”
x
“how much of
it we CAN
explain”
+
Residuals
+
“what we
CANNOT
explain”
Statistical significance is basically a ratio of
explained to unexplained variance
Correlated Predictors
• Where possible, avoid predictors that are highly correlated with
one another
• This is why we NEVER include a baseline predictor
– baseline predictor is almost completely correlated with the sum of
existing predictors
+
=
r = -.53
r = -.53
r = -.95
Two stimulus predictors
Baseline predictor
Which model accounts for this data?
xβ=1
+
xβ=0
OR
+
xβ=1
+
xβ=0
+
xβ=0
•
Because the predictors are highly correlated, you can’t tell which
model is best
x β = -1
Contrasts in the GLM
• We can examine whether a single predictor is significant
(compared to the baseline)
• We can also examine whether a single predictor is
significantly greater than another predictor
A Real Voxel
•
Here’s the time course from a voxel that was significant in the +Intact Scrambled comparison
Maximizing Your Power
signal
=
As we saw earlier, the GLM is
basically comparing the amount of
signal to the amount of noise
How can we improve our stats?
• increase signal
• decrease noise
• increase sample size (keep subject in longer)
+
noise
How to Reduce Noise
• If you can’t get rid of an artifact, you can include it as a
“predictor of no interest” to soak up variance
Example: Some people
include predictors from the
outcome of motion correction
algorithms
Corollary: Never leave out
predictors for conditions
that will affect your data
Reducing Residuals
What’s this #*%&ing reviewer
complaining about?!
•
Particularly if you do voxelwise stats, you have to be careful
to follow the accepted standards of the field. In the past few
years the following approaches have been recommended
by the stats mavens:
1. Correction for multiple comparisons
2. Correction for serial correlations
3. Random effects analyses
Correction for Multiple Comparisons
With conventional probability levels (e.g., p < .05) and a huge number of
comparisons (e.g., 64 x 64 x 12 = 49,152), a lot of voxels will be
significant purely by chance
e.g., .05 * 49,152 = 2458 voxels significant due to chance
How can we avoid this?
1) Bonferroni correction
•
divide desired p value by number of comparisons
Example:
desired p value: p < .05
number of voxels: 50,000
required p value: p < .05 / 50,000  p < .000001
•
•
quite conservative
can use less stringent values
•
•
e.g., Brain Voyager can use the number of voxels in the cortical surface
small volume correction: use more liberal thresholds in areas of the brain which you
expected to be active
Correction for Multiple Comparisons
2) Gaussian field theory
• Fundamental to SPM
• If data are very smooth, then the chance of noise points passing
threshold is reduced
• Can correct for the number of “resolvable elements” (“resels”)
rather than number of voxels
Slide modified from Duke course
3) Cluster correction
•
•
•
falsely activated voxels should be randomly dispersed
set minimum cluster size to be large enough to make it unlikely that a cluster of that size
would occur by chance
assumes that data from adjacent voxels are uncorrelated (not true)
4) Test-retest reliability
•
•
•
Perform statistical tests on each half of the data
The probability of a given voxel appearing in both purely by chance is the square of the p
value used in each half
e.g., .001 x .001 = .000001
Alternatively, use the first half to select an ROI and evaluate your hypothesis in the
second half.
5) Poor man’s Bonferroni
•
•
•
Jack up the threshold till you get rid of the schmutz (especially in air, ventricles, white
matter)
If you have a comparison where one condition is expected to produce much more activity
than the other, turn on both tails of the comparison
Jody’s rule of thumb: “If ya can’t trust the negatives, can ya trust the positives?”
Example: MT localizer data
Moving rings > stationary rings (orange)
Stationary rings > moving rings (blue)
6) False discovery rate
Is the region truly active?
Yes
No
Yes
•
HIT
Type I
Error
No
•
Genovese et al, 2002, NeuroImage
also “controls the proportion of rejected hypotheses that are falsely rejected”
standard p value (e.g., p < .01) means that a certain proportion of all voxels will be
significant by chance (1%)
FDR uses q value (e.g., q < .01), meaning that a certain proportion of the “activated”
(colored) voxels will be significant by chance (1%)
works in theory, though in practice, my lab hasn’t been that satisfied
Does our stat test indicate
that the region is active?
•
•
•
Type II
Error
Correct
Rejection
Correction for Temporal Correlations
Statistical methods assume that each of our time points is independent.
In the case of fMRI, this assumption is false.
Even in a screen saver scan, activation in a voxel at one time is correlated
with it’s activation within ~6 sec
This fact can artificially inflate your statistical significance.
Autocorrelation function
original
To calculate the magnitude of the
problem, we can compute the
autocorrelation function
shift by 1
volume
For a voxel or ROI, correlate its time
course with itself shifted in time
shift by 2
volumes
time
If there’s no autocorrelation, function should
drop from 1 to 0 abruptly – pink line
the points circled in yellow suggest there is
some autocorrelation, especially at a shift of
1, called AR(1)
Plot these correlations by the degree of
shift
BV can correct for the autocorrelation
to yield revised (usually lower) p values
BEFORE
AFTER
Brains are Heterogeneous
Slide from Duke course
Talairach Coordinate System
Talairach & Tournoux, 1988
• made an atlas based on one brain and
mapped it out
Note: That’s TalAIRach, not TAILarach!
•a ny brain can be squished or stretched to
fit hers and locations can be described
using a 3D coordinate system (x, y, z)
Deform brain into Talairach space
Mark 8 points in the brain:
• anterior commisure
• posterior commisure
• front
• back
• top
• bottom (of temporal lobe)
• left
• right
Squish or stretch brain to fit in “shoebox”
of Tal system
y<0
AC=0
y
y>0
z
y>0
ACPC=0
y<0
x
Extract 3 coordinates relative to AC
Collapsed Fixed Effects Models
• assume that the experimental manipulation has same effect in each
subject
• treats all data as one concatenated set with one beta per predictor
(collapsed across all subjects)
• e.g.,
Intact = 2
Scrambled = .5
• strong effect in one subject can lead to significance even when others
show weak or no effects
• you can say that effect was significant in your group of subjects but
cannot generalize to other subjects that you didn’t test
Separate Subjects Models
•
one beta per predictor per subject
• e.g.,
•
•
JC: Intact = 2.1
JC: Scrambled = 0.2
DQ: Intact = 1.5
DQ: Scrambled = 1.0
KV: Intact = 1.2
KV: Scrambled = 1.3
weights each subject equally
makes data less susceptible to effects of one rogue subject
Random Effects Analysis
•
Typical fMRI stats test whether the differences between conditions are significant
in the sample of subjects we have tested
•
Often, we want to be able to generalize to the population as a whole including all
potential subjects, not just the ones we tested
•
Random effects analyses allow you to generalize to the population you tested
underpaid graduate students in need of a few bucks!
•
Brain Voyager recommends you don’t even toy with random effects unless you’ve
got 10 or more subjects (and 50+ is best)
•
Random effects analyses can really squash your data, especially if you don’t have
many subjects. Sometimes we refer to the random effects button as the “make
my activation go away” button.
•
Reviewers are now requesting random effects analyses more frequently
•
You don’t have to worry about it if you’re using the ROI approach because (1)
presumably the ROI has already been well-established across multiple labs; and
(2) posthoc analyses of results in an ROI approach allow you to generalize to the
population (assuming you include individual variance)
Fixed vs. Random Effects GLM
Sample Data #1
Sample Data #2
Subject
Intact
beta
Scram
beta
Diff
Subject
Intact
beta
Scram
beta
Diff
1
4
3
1
1
4
3
1
2
2
1
1
2
2
3
-1
3
4
3
1
3
4
1
3
SUM
10
7
3
SUM
10
7
3
•
Fixed Effects GLM cannot tell the difference between these data sets because
(Intact sum - Scram sum) is the same in both cases
•
In Random Effects GLM, Data set #1 would be more likely to be significant
because all 3 subjects show a trend in the same direction (intact > scrambled),
whereas in data set #2, only 2 of 3 subjects show a difference in that direction
To Localize or Not to Localize?
Neuroimagers can’t even
agree how to SPELL
localiser/localizer!
Methodological Fundamentalism
The latest review I received…
Approach #1:
Voxelwise Statistics
1. You don’t necessarily need a priori hypotheses (though
sometimes you can use less conservative stats if you have them)
2. Average all of your data together in Talairach space
3. Compare two (or more) conditions using precise statistical
procedures within every voxel of the brain. Any area that passes
a carefully determined threshold is considered real.
4. Make a list of these areas and publish it.
This is the tricky part!
Voxelwise Approach: Example
•
•
•
Malach et al., 1995, PNAS
Question: Are there areas of the human brain that are more responsive to
objects than scrambled objects
You will recognize this as what we now call an LO localizer, but Malach was the
first to identify LO
LO activation is shown in red, behind MT+
activation in green
LO (red) responds more to objects, abstract sculptures
and faces than to textures, unlike visual cortex (blue)
which responds well to all stimuli
The Danger of Voxelwise Approaches
•
•
•
This is one of two tables from a paper
Some papers publish tables of activation two pages long
How can anyone make sense of so many areas?
Source: Decety et al., 1994, Nature
Approach #2:
Region of interest (ROI) analysis
•
If you are looking at a well-established area (such as visual cortex,
motor cortex, or the lateral occipital complex), it’s fairly easy to
activate and identify the area
1.
Do the stats and play with the threshold till you get something
believable in the right vicinity based on anatomical location (e.g.,
sulcal landmarks) or functional location (e.g., Talairach coordinates
from prior studies)
2.
Once you have found the ROI, do independent experiments, extract
the time course information and determine whether activation
differences between conditions are significant
–
Because the runs that are used to generate the area are independent from
those used to test the hypothesis, liberal statistics (p < .05) can be used
Example of ROI Approach
Culham et al., 2003, Experimental Brain Research
Does the Lateral Occipital Complex compute object shape for grasping?
Step 1: Localize LOC
Intact
Objects
Scrambled Objects
Example of ROI Approach
Culham et al., 2003, Experimental Brain Research
Does the Lateral Occipital Complex compute object shape for grasping?
Step 2: Extract LOC data from experimental runs
Grasping
Reaching
NS
p = .35
NS
p = .31
Example of ROI Approach
Very Simple Stats
% BOLD Signal
Change
Left Hem. LOC
Subject
Extract average peak
from each subject for
each condition
Grasping
1
0.02
0.03
2
0.19
0.08
3
0.04
0.01
4
5
•
•
Reaching
0.10
NS
p = .35
1.01
Then simply do a
paired t-test to see
whether the peaks are
significantly different
between conditions
0.32
NS
p = .31
-0.27
6
0.16
0.09
7
0.19
0.12
Instead of using % BOLD Signal Change, you can use beta weights
You can also do a planned contrast in Brain Voyager using a module
called the ROI GLM
Utility of Doing Both Approaches
• We also verified the result with a voxelwise approach
– Why might ROI alone not suffice?
– Why might voxelwise alone not suffice?
Verification of no LOC
activation for grasping
> reaching even at
moderate threshold (p
< .001)
Example: The Danger of ROI Approaches
•
•
Example 1: LOC may be a heterogeneous area with subdivisions; ROI
analyses gloss over this
Example 2: We looked at another example of an ROI analysis last
week: Kanwisher et al.’s paper on the fusiform face area (FFA). Since
then, growing evidence suggests another area, the occipital face area
(OFA) may also be critical to face perception. The OFA was overlooked
in the original study because it was less robust and reliable than the
FFA.
Comparing the two approaches
Voxelwise Analyses
•
•
•
•
•
•
•
Requires no prior hypotheses about areas involved
Includes entire brain
Often neglects individual differences
Can lose spatial resolution with intersubject averaging
Can produce meaningless “laundry lists of areas” that are difficult to interpret
You have to be fairly stats-savvy and include all the appropriate statistical corrections to be
certain your activation is really significant
Popular in Europe
Region of Interest (ROI) Analyses
•
•
•
•
•
•
•
•
Extraction of ROI data can be subjected to simple stats (no need for multiple comparisons,
autocorrelation or random effects corrections)
Gives you more statistical power (e.g., p < .05)
Hypothesis-driven
•
Useful when hypotheses are motivated by other techniques (e.g., electrophysiology) in
specific brain regions
ROI is not smeared due to intersubject averaging
Easy to analyze and interpret
Neglects other areas which may play a fundamental role
If multiple ROIs need to be considered, you can spend a lot of scan time collecting localizer
data (thus limiting the time available for experimental runs)
Popular in North America
A Proposed Resolution
• There is no reason not to do BOTH ROI analyses and
voxelwise analyses
– ROI analyses for well-defined key regions
– Voxelwise analyses to see if other regions are also involved
• Ideally, the conclusions will not differ
• If the conclusions do differ, there may be sensible reasons
– Effect in ROI but not voxelwise
• perhaps region is highly variable in stereotaxic location between subjects
• perhaps voxelwise approach is not powerful enough
– Effect in voxelwise but not ROI
• perhaps ROI is not homogenous or is context-specific
Hypothesis- vs. Data-Driven Approaches
Hypothesis-driven
Examples: t-tests, correlations, general linear model (GLM)
• a priori model of activation is suggested
• data is checked to see how closely it matches components of the model
• most commonly used approach
Data-driven
Example: Independent Component Analysis (ICA)
• no prior hypotheses are necessary
• multivariate techniques determine the patterns in the data that account for the most
variance across all voxels
• can be used to validate a model (see if the math comes up with the components you
would’ve predicted)
• can be inspected to see if there are things happening in your data that you didn’t
predict
• can be used to identify confounds (e.g., head motion)
• need a way to organize the many possible components
JODY – THE FOLLOWING SLIDES ARE
FROM THE OLD 4 DUMMMIES PAGE
GLM
Express
Advantages of General Linear Model (GLM)
• can perform data analysis within and between subjects without needing
to average the data itself
• allows you to counterbalance orders
• can perform more sophisticated analyses (e.g., 2 factor ANOVA with
interactions)
• easier to work with (do one GLM vs. many t-tests and correlations)
Formal Statistics
•
Formal statistics are just doing what your eyeball test of significance did
– Estimate how likely it is that the signal is real given how noisy the data is
•
confidence: how likely is it that the results could occur purely due to
chance?
•
“p value” = probability value
– If “p = .03”, that means there is a .03/1 or 3% chance that the results are
bogus
•
By convention, if the probability that a result could be due to chance is
less than 5% (p < .05), we say that result is statistically significant
•
Significance depends on
– signal (differences between conditions)
– noise (other variability)
– sample size (more time points are more convincing)
Effect of Beta Weights
• Adjustments to the beta weights have the effect of raising or
lowering the height of the predictor while keeping the shape
constant
Correlated Predictors
• Where possible, avoid predictors that are correlated with
one another
• This is why we NEVER include a baseline predictor
– baseline predictor is almost completely correlated (r = -1) with the
sum of our two existing predictors
– if we included a baseline predictor, the model would have problems
assigning variance to stimulus predictors vs. baseline predictors
• for example, the model could not distinguish between two possibilities
(e.g., Beta1=1, Beta2=0 vs. Beta1=0, Beta2=-1)
Stimulus predictor
Baseline predictor
Extending the GLM
• So far, we have considered the GLM for one run in
one subject
• The same logic can be applied to multiple runs and
multiple subjects
GLM Stats
For any given region, we can evaluate the GLM stats
blue: original time course
=
green: best fitting model
+
red: residuals
total length of sequence = 4 runs * 155 volumes = 620 volumes
Fixed Effects Models
• assume that the experimental manipulation has same effect in each
subject
• treats all data as one concatenated set with one beta per predictor
(collapsed across all subjects)
• e.g.,
Intact = 2
Scrambled = .5
• strong effect in one subject can lead to significance even when others
show weak or no effects
• you can say that effect was significant in your group of subjects but
cannot generalize to other subjects that you didn’t test
Random Effects Models
Separate subjects models
• one beta per predictor per subject
• e.g.,
JC: Intact = 2.1
JC: Scrambled = 0.2
DQ: Intact = 1.5
DQ: Scrambled = 1.0
KV: Intact = 1.2
KV: Scrambled = 1.3
Random effects models (extension of separate subjects, but how?)
• each subject is treated as if they were randomly sampled from the population
• uses separate subjects model to estimate variability between subjects and
determine whether the mean effect within the population differs from zero
• can allow you to generalize to population as a whole
– well, actually, it only allows you to generalize to the population you tested, likely
underpaid graduate students in need of a few bucks and the labmates that you talked
into helping you
•
•
don’t even think of doing it unless you have at least 10 subjects
becoming much more frequently required by reviewers
What’s this #*%&ing reviewer complaining
about?!
•
Particularly if you do whole brain stats, you have to be
careful to follow the accepted standards of the field. In the
past few years the following approaches have been
recommended by the stats mavens:
1. Correction for multiple comparisons
2. Correction for serial correlations
3. Random effects analyses
How to Increase Signal
• Go with a stronger magnet (4T rules!)
• Go with a better coil (surface coil)
– Tradeoff: lose other areas
• Take bigger voxels
– Tradeoff: lose spatial resolution, if voxels are too big, you
lose signal
• Make sure your subject is paying attention to the
stimuli/task
– e.g., “1 back” task: hit button after every stimulus repetition
– Tradeoff: Attention itself can distort the activation
– Caveat: need to be sure attention is comparable across
conditions
How to Reduce Noise
• Reduce artifacts (especially head motion)
– Test experienced subjects
– Give your subjects clear instructions (e.g., try not to swallow)
– Consider motion correction algorithms (Jody’s opinion: garbage in, garbage
out, when in doubt throw it out; exception: rare subjects such as patients)
• Spatially smooth the data
– Tradeoff: lose resolution
• Choose your signal power in a frequency with low noise
– high noise at low frequencies (so don’t use long epochs: optimum epoch
duration is ~16 sec)
– high noise at breathing frequency (once every 4-10 seconds)
– try to have multiple repetitions of conditions within each run
How to Reduce Noise
You wanna be here!
~16 sec cycles
Linear Trend
Probable
Respiration
Artifact
low freq
noise
and
drift
breathing
every
4-10 sec
heartbeat
every
~sec
Source: Smith chapter in Functional MRI: An Introduction to Methods
Low frequency
noise
Smaller residuals, better data
• The more variance we can explain (i.e., the less variance that
goes into the residuals), the greater our statistical significance
How can we explain more variance?
• Filter unwanted noise
– filter low frequencies (especially linear drift)
• be careful that you know what you’re doing at don’t filter out
frequencies in your paradigm
– filtering high frequencies is “cheating” because it violates statistical
assumptions
• Use accurate models of HRF
• Include predictors of no interest
– even if you don’t care about which voxels are correlated with a
predictor (e.g., stimulus events that are irrelevant to your
hypothesis, head motion), it may be worth including the predictor to
“soak up” variability and reduce your residuals
JODY – THE FOLLOWING SLIDES ARE
FROM THE NORWAY PAGE
fMRI Statistics with Emphasis on the
General Linear Model
Hypothesis- vs. Data-Driven Approaches
Hypothesis-driven
Examples: t-tests, correlations, general linear model (GLM)
• a priori model of activation is suggested
• data is checked to see how closely it matches components of the model
• most commonly used approach
Data-driven
Example: Independent Component Analysis (ICA)
• no prior hypotheses are necessary
• multivariate techniques determine the patterns in the data that account for the most
variance across all voxels
• can be used to validate a model (see if the math comes up with the components you
would’ve predicted)
• can be inspected to see if there are things happening in your data that you didn’t
predict
• can be used to identify confounds (e.g., head motion)
• need a way to organize the many possible components
• new and upcoming
Approach #1:
Voxelwise Statistics
1. You don’t necessarily need a priori hypotheses (though
sometimes you can use less conservative stats if you have them)
2. Average all of your data together in Talairach space
3. Compare two (or more) conditions using precise statistical
procedures within every voxel of the brain. Any area that passes
a carefully determined threshold is considered real.
4. Make a list of these areas and publish it.
This is the tricky part!
Voxelwise Approach: Example
•
•
•
Malach et al., 1995, PNAS
Question: Are there areas of the human brain that are more responsive to
objects than scrambled objects
You will recognize this as what we now call an LO localizer, but Malach was the
first to identify LO
LO activation is shown in red, behind MT+
activation in green
LO (red) responds more to objects, abstract sculptures
and faces than to textures, unlike visual cortex (blue)
which responds well to all stimuli
The Danger of Voxelwise Approaches
•
•
•
This is one of two tables from a paper
Some papers publish tables of activation two pages long
How can anyone make sense of so many areas?
Source: Decety et al., 1994, Nature
Approach #2:
Region of interest (ROI) analysis
•
If you are looking at a well-established area (such as visual cortex,
motor cortex, or the visual motion area MT+), it’s fairly easy to activate
and identify the area
1.
Do the stats and play with the threshold till you get something
believable in the right vicinity based on anatomical location (e.g.,
sulcal landmarks) or functional location (e.g., Talairach coordinates
from prior studies)
2.
Once you have found the ROI, do independent experiments, extract
the time course information and determine whether activation
differences between conditions are significant
–
Because the runs that are used to generate the area are independent from
those used to test the hypothesis, liberal statistics (p < .05) can be used
Example of ROI approach
Tootell et al, 1995, Nature, Motion Aftereffect
Are motion-responsive areas of the brain active during illusory motion?
Run #1: Low contrast moving rings vs.
stationary rings
Runs #2, 3 and 4: Alternate between two new conditions,
unidirectional motion (which causes a motion aftereffect)
and bidirectional motion (which does not cause an
aftereffect). Look at activation during subsequent
stationary pattern.
MT
We can be fairly sure this is MT+ because:
• nothing else in that vicinity is activated by
low contrast motion
• the Talairach coordinates match previous
studies
• the activation is at the junction of the
inferior temporal sulcus and lateral
occipital sulcus
Source: Tootell et al., 1995
illusion
no
illusion
Example: The Danger of ROI Approaches
Source: Sunaert et al., 1999, Exp. Brain Res.
• At least 17 areas of the brain respond to visual motion!
• Why is MT+ considered THE motion area? Why do ~80% of
fMRI experiments on motion only look at MT+? What are these
other areas doing?
Comparing the two approaches
Voxelwise Analyses
•
•
•
•
•
•
•
Requires no prior hypotheses about areas involved
Includes entire brain
Often neglects individual differences
Can lose spatial resolution with intersubject averaging
Can produce meaningless “laundry lists of areas” that are difficult to interpret
You have to be fairly stats-savvy and include all the appropriate statistical corrections to be
certain your activation is really significant
Popular in Europe
Region of Interest (ROI) Analyses
•
•
•
•
•
•
•
•
Extraction of ROI data can be subjected to simple stats (no need for multiple comparisons,
autocorrelation or random effects corrections)
Gives you more statistical power (e.g., p < .05)
Hypothesis-driven
•
Useful when hypotheses are motivated by other techniques (e.g., electrophysiology) in
specific brain regions
ROI is not smeared due to intersubject averaging
Easy to analyze and interpret
Neglects other areas which may play a fundamental role
If multiple ROIs need to be considered, you can spend a lot of scan time collecting localizer
data (thus limiting the time available for experimental runs)
Popular in North America
NOTE: Though different experimenters tend to prefer one method over the other, they are NOT
mutually exclusive. You can check ROIs you predicted and then check the data for other areas.
What data do we have to work with?
Typical experiment:
• 64 voxels x 64
voxels x 12 slices x
136 time points
• Can think of as 136
volumes
• Or can think of as
64x64x12 = 49,152
voxels, each with a
time course
Correlated Predictors
• Where possible, avoid predictors that are correlated with one
another
• This is why we NEVER include a baseline predictor
– baseline predictor is almost completely correlated (r = -1) with the
sum of our two existing predictors
– if we included a baseline predictor, the model would have problems
assigning variance to stimulus predictors vs. baseline predictors
Sum of our two stimulus predictors
Baseline predictor
Which model accounts for this data?
xβ=1
xβ=0
Stimulus predictors
Stimulus predictors
OR
+
+
xβ=0
Baseline predictor
•
xβ=1
Baseline predictor
Because the predictors are highly correlated, you can’t tell which model is best
Multisubject Analyses
If we had additional subjects, we could compute a mutlisubject GLM
one predictor/condition
one predictor/condition/subject
one predictor/condition/run
Analysis across multiple subjects requires averaging brains in a common
space:
-most common: Talairach coordinate system
-collapse brains into a constant size shoebox so they are all the same
size
-will be covered later
You can also compare data between populations (e.g., schizophrenics vs.
normals, young vs. elderly) but that won’t be covered in this course.
Correction for Multiple Comparisons
• If we use a nominal p value of .05 but do ~50,000 times,
there’s a high probability we’ll get some bogus activation.
How can we deal with this?
3) Cluster correction
•
•
•
falsely activated voxels should be randomly dispersed
set minimum cluster size to be large enough to make it unlikely that a cluster of that size
would occur by chance
assumes that data from adjacent voxels are uncorrelated (not true)
4) Test-retest reliability
•
•
•
Perform statistical tests on each half of the data
The probability of a given voxel appearing in both purely by chance is the square of the p
value used in each half
e.g., .001 x .001 = .000001
Alternatively, use the first half to select an ROI and evaluate your hypothesis in the
second half.
Correction for Temporal Correlations
Statistical methods assume that each of our time points is independent.
In the case of fMRI, this assumption is false.
Even in a screen saver scan, activation in a voxel at one time is correlated
with it’s activation within ~6 sec
This fact can artificially inflate your statistical significance.