Statistical Analysis
fMRI Graduate Course
November 2, 2005
When do we not need statistical analysis?
[Figure: time course plot; y-axis 0.8 to 1.2, x-axis 1 to 251]
Inter-ocular Trauma Test (Lockhead, personal communication)
Why use statistical analyses?
• Replaces simple subtractive methods
– Signal highly corrupted by noise
• Typical SNRs: 0.2 – 0.5
– Sources of noise
• Thermal variation (unstructured)
• Physiological, task variability (structured)
• Assesses quality of data
– How reliable is an effect?
– Allows distinction of weak, true effects from strong,
noisy effects
Statistical Parametric Maps
• 1. Brain maps of statistical quality of
measurement
– Examples: correlation, regression approaches
– Displays likelihood that the effect observed is due to
chance factors
– Typically expressed in probability (e.g., p < 0.001)
• 2. Effect size
– Determined by comparing task-related variability and
non-task-related variability
– Signal change divided by noise (SNR)
– Typically expressed as t or z statistics
What are our statistics for?
Which is more important to avoid:
Type I or Type II errors?
Simple Hypothesis-Driven Analyses
• Common
– t-test across conditions
– Fourier
– t-test at time points
– Correlation
• General Linear Model
• Other tests
– Kolmogorov-Smirnov
– Iterative Connectivity Mapping
t-Tests across Conditions
• Compares difference between means to population variability
– Uses t distribution
– Defined as the likely distribution due to chance between samples drawn from a single population
• Commonly used across conditions in blocked designs
• Subset of general linear model
[Figure: time course plot; y-axis 0.8 to 1.15, x-axis 1 to 151; annotation: 5%]
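The comparison of condition means against variability can be sketched as a pooled-variance two-sample t-test. This is only an illustration: the simulated voxel, the block length, the 2% effect, and the helper name `two_sample_t` are all assumptions, not part of the course materials.

```python
import math
import random
from statistics import mean, variance

def two_sample_t(a, b):
    """Pooled-variance t statistic and degrees of freedom for two samples."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2

# Hypothetical voxel: 10 blocks alternating off/on, 10 images per block.
random.seed(0)
block_len, n_blocks, effect = 10, 10, 0.02   # 2% signal change (assumed)
signal = []
for block in range(n_blocks):
    on = block % 2 == 1
    for _ in range(block_len):
        signal.append(1.0 + (effect if on else 0.0) + random.gauss(0, 0.02))

# Split the images by condition and compare the two means.
task = [x for i, x in enumerate(signal) if (i // block_len) % 2 == 1]
rest = [x for i, x in enumerate(signal) if (i // block_len) % 2 == 0]
t, df = two_sample_t(task, rest)
print(f"t({df}) = {t:.2f}")
```

Because the test only compares the two condition means, a slow drift across the run inflates the variability estimate and can hide or mimic an effect.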
Drift Artifact and t-Test
Fourier Analysis
• Fourier transform: converts information in time domain to
frequency domain
– Used to change a raw time course to a power spectrum
– Hypothesis: any repetitive/blocked task should have power at the
task frequency
• BIAC function: FFTMR
– Calculates frequency and phase plots for time series data.
• Equivalent to correlation in frequency domain
• Subset of general linear model
– Same as using sine and cosine functions as regressors
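A minimal sketch of the Fourier idea, assuming a noise-free blocked time course: the series is projected onto a sine and a cosine at one frequency, and the squared projections give the power there. The TR, run length, and the function name `power_at` are assumptions for illustration; this is not the FFTMR implementation.

```python
import math

def power_at(series, freq_hz, tr_s):
    """Spectral power and phase of a mean-removed series at one frequency.

    Projects the series onto a cosine and a sine at freq_hz, i.e. the same
    two regressors a GLM would use, and sums the squared projections.
    """
    m = sum(series) / len(series)
    c = sum((x - m) * math.cos(2 * math.pi * freq_hz * i * tr_s)
            for i, x in enumerate(series))
    s = sum((x - m) * math.sin(2 * math.pi * freq_hz * i * tr_s)
            for i, x in enumerate(series))
    phase_deg = math.degrees(math.atan2(s, c)) % 360
    return c * c + s * s, phase_deg

# Hypothetical run: TR = 1.5 s, 12 s on / 12 s off, so the task
# frequency is 1 / 24 s, about 0.0417 Hz.
tr_s, n_images = 1.5, 128
task_freq = 1 / 24.0
series = [1.0 if (i * tr_s) % 24 < 12 else 0.0 for i in range(n_images)]

p_task, phase = power_at(series, task_freq, tr_s)
p_other, _ = power_at(series, 0.1, tr_s)   # a non-task frequency
print(f"power at task frequency: {p_task:.1f}, at 0.1 Hz: {p_other:.2f}")
print(f"phase at task frequency: {phase:.0f} degrees")
```

The phase value indicates where in the cycle the response peaks, which is why opposite block orders would be expected to show similar power but different phase.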
[Figure: power spectrum for a task with 12 s on, 12 s off; Power vs. Frequency (Hz)]
[Figure: Left-Right and Right-Left time courses (-3% to 3% vs. Image 1 to 122) and their power spectra (0 to 1600 vs. Frequency 0.005 to 0.297)]
[Figure: Spectral Power at 0.058 Hz vs. Phase Angle (Degrees), two panels]
t / z – Tests across Time Points
• Determines whether a single data point in
an epoch is significantly different from
baseline
• BIAC Tool: tstatprofile
– Creates:
• Avg_V*.img
• StdDev_V*.img
• ZScore_V*.img
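One way to test individual epoch time points against baseline is sketched below. It is an assumption about how such a test could be run, not a description of what tstatprofile computes: for each trial the pre-stimulus mean is subtracted, and a one-sample t statistic is formed across trials at each time point. The trial count, response shape, and function name are hypothetical.

```python
import math
import random
from statistics import mean, stdev

def timepoint_t(epochs, n_baseline):
    """One-sample t statistic at each epoch time point versus baseline.

    epochs: list of trials, each a list of samples of equal length.
    n_baseline: number of leading pre-stimulus samples in each trial.
    """
    n = len(epochs)
    stats = []
    for i in range(len(epochs[0])):
        # Per-trial difference between this point and the trial's baseline.
        diffs = [ep[i] - mean(ep[:n_baseline]) for ep in epochs]
        stats.append(mean(diffs) / (stdev(diffs) / math.sqrt(n)))
    return stats

# Hypothetical data: 20 trials, 4 baseline points, response peaking later.
random.seed(1)
shape = [0, 0, 0, 0, 0, 1, 3, 4, 3, 1, 0, 0]   # assumed response shape
epochs = [[s + random.gauss(0, 1.0) for s in shape] for _ in range(20)]
t_values = timepoint_t(epochs, n_baseline=4)
for i, t in enumerate(t_values):
    print(f"point {i - 4:+d}: t = {t:5.2f}")
```

Each time point is a separate test, so a full epoch profile is itself a small multiple-comparisons problem.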
[Figure: epoch plot; y-axis -20 to 35, x-axis -4 to 13]
Correlation
• Special case of General Linear Model
– Blocked t-test is equivalent to correlation with square
wave function
– Allows use of any reference waveform
• Correlation coefficient describes match between
observation and expectation
– Ranges from -1 to 1
– Amplitude of response does not affect correlation
directly
• BIAC tool: tstatprofile
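The correlation approach can be illustrated with a short sketch, assuming a square-wave reference and a simulated voxel. The helper names `pearson_r` and `r_to_t` and all simulation values are assumptions. The example also shows that rescaling the response leaves the correlation unchanged.

```python
import math
import random

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def r_to_t(r, n):
    """Convert a correlation over n points to a t statistic with n - 2 df."""
    return r * math.sqrt((n - 2) / (1 - r * r))

# Hypothetical blocked run: square-wave reference, noisy voxel response.
random.seed(2)
reference = [1.0 if (i // 10) % 2 else 0.0 for i in range(100)]
series = [0.5 * ref + random.gauss(0, 0.5) for ref in reference]

r = pearson_r(reference, series)
# Scaling and offsetting the measured series does not change r.
r_scaled = pearson_r(reference, [3.0 * x + 10.0 for x in series])
print(f"r = {r:.3f}, rescaled r = {r_scaled:.3f}, t(98) = {r_to_t(r, 100):.2f}")
```

Swapping the square wave for any other reference waveform, such as an assumed hemodynamic response, requires no change to the functions.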
Problems with Correlation Approaches
• Limited by choice of HDR
– Poorly chosen HDR can significantly impair power
• Examples from previous weeks
– May require different correlations across subjects
• Assume that correlation template is Gaussian
• Assume random variation around HDR
– Do not model variability contributing to noise (e.g.,
scanner drift)
• Such variability is usually removed in preprocessing steps
– Do not model interactions between successive events
Kolmogorov – Smirnov (KS) Test
• Statistical evaluation of differences in cumulative
distribution function
– Cf. t-test evaluates differences in mean
• Useful if distributions have same mean but
different shape
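A sketch of the two-sample KS statistic, assuming two simulated samples with the same mean but different spread. The statistic is the largest vertical gap between the two empirical cumulative distribution functions. The sample sizes and the function name `ks_statistic` are assumptions; the critical value uses the usual large-sample approximation.

```python
import math
import random
from statistics import mean

def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the empirical CDFs."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Step both empirical CDFs past every value <= x.
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d

# Hypothetical samples: same mean, different shape (narrow vs. wide).
random.seed(3)
narrow = [random.gauss(0, 1.0) for _ in range(200)]
wide = [random.gauss(0, 3.0) for _ in range(200)]

d = ks_statistic(narrow, wide)
# Large-sample critical value at alpha = 0.05 (approximation).
d_crit = 1.36 * math.sqrt((len(narrow) + len(wide)) / (len(narrow) * len(wide)))
print(f"difference in means: {mean(narrow) - mean(wide):.3f}")
print(f"KS D = {d:.3f}, approximate critical D = {d_crit:.3f}")
```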
[Figure: panels A, B, C]
Iterative Connectivity Mapping
• Acquire two data sets
– 1: Defines regions of interest
and hypothetical connections
– 2: Evaluates connectivity
based on low frequency
correlations
• Use of Continuous Data
Sets
– Null Data
– Task Data
– Can see connections
between functional areas
(e.g., between Broca’s and
Wernicke’s Areas)
Hampson et al., Hum. Brain Map., 2002
Use of Continuous Tasks to
Evaluate Functional Connectivity
Hampson et al., Hum. Brain Map., 2002
The General Linear Model
Basic Concepts of the GLM
• GLM treats the data as a linear combination of
model functions plus noise
– Model functions have known shapes
– Amplitudes of functions are unknown
– Assumes linearity of HDR; nonlinearities can be
modeled explicitly
• GLM analysis determines set of amplitude
values that best account for data
– Usual cost function: least-squares deviance of
residual after modeling (noise)
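The least-squares fit can be sketched in a few lines, assuming a design with three model functions (task boxcar, linear drift, constant). The solver below uses the normal equations with Gauss-Jordan elimination purely for illustration; the function name `fit_glm`, the design, and the simulated amplitudes are assumptions and do not reflect how any particular package implements the GLM.

```python
import random

def fit_glm(design, data):
    """Least-squares amplitudes for data ~ design * amplitudes + noise.

    design: list of N rows, one value per model function.
    data: list of N measurements.
    Solves the normal equations (X'X) b = X'y by Gauss-Jordan elimination.
    """
    k = len(design[0])
    # Augmented matrix [X'X | X'y].
    aug = [[sum(row[i] * row[j] for row in design) for j in range(k)]
           + [sum(row[i] * y for row, y in zip(design, data))]
           for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]

# Hypothetical design: task boxcar, linear drift, constant baseline.
random.seed(3)
n = 120
design = [[1.0 if (i // 10) % 2 else 0.0, i / n, 1.0] for i in range(n)]
true_amps = [2.0, 1.5, 100.0]                # assumed, for simulation only
data = [sum(a * x for a, x in zip(true_amps, row)) + random.gauss(0, 0.5)
        for row in design]

amps = fit_glm(design, data)
residual = [y - sum(a * x for a, x in zip(amps, row))
            for row, y in zip(design, data)]
print("estimated amplitudes:", [round(a, 3) for a in amps])
print("residual sum of squares:", round(sum(e * e for e in residual), 2))
```

Including the drift term as a model function is one way to keep slow scanner drift out of the residual rather than removing it in preprocessing.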
Signal, noise, and the General
Linear Model
[Equation: Measured Data (Y) = Design Model (M) × Amplitude (solve for) + Noise]
Cf. Boynton et al., 1996
Form of the GLM
[Figure: Data (N time points) = Model (N time points × model functions) * Amplitudes + Noise]
Implementation of GLM in SPM
[Figure: GLM as implemented in SPM; axes labeled Images and Model Parameters]
The Problem of Multiple
Comparisons
P < 0.05 (1682 voxels)
P < 0.01 (364 voxels)
P < 0.001 (32 voxels)
A: t = 2.10, p < 0.05 (uncorrected)
B: t = 3.60, p < 0.001 (uncorrected)
C: t = 7.15, p < 0.05, Bonferroni corrected
Options for Multiple Comparisons
• Statistical Correction (e.g., Bonferroni)
– Gaussian Field Theory
– False discovery rate
• Cluster Analyses
• ROI Approaches
Statistical Corrections
• If more than one test is made, then the collective
alpha value is greater than the single-test alpha
– That is, overall Type I error increases
• One option is to adjust the alpha value of the
individual tests to maintain an overall alpha
value at an acceptable level
– This procedure controls for overall Type I error
– Known as Bonferroni Correction
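The arithmetic behind these two points is short enough to sketch. The familywise formula below assumes the tests are independent, and the function names are illustrative only.

```python
def familywise_alpha(alpha, n_tests):
    """Chance of at least one Type I error across independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests

def bonferroni_alpha(overall_alpha, n_tests):
    """Per-test alpha that keeps the overall alpha at the requested level."""
    return overall_alpha / n_tests

for n in (1, 10, 100, 16000):
    print(f"{n:>6} tests: P(any Type I) = {familywise_alpha(0.05, n):.4f}, "
          f"Bonferroni alpha = {bonferroni_alpha(0.05, n):.7f}")
```

With 100 tests at alpha = 0.05 the chance of at least one false positive is already above 0.99, and the corrected per-test alpha is 0.0005.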
[Figure: Probability of Type I Error and Corrected Alpha Value (Adjusted Alpha, log scale) vs. Number of Comparisons]
Bonferroni Correction
• Very severe correction
– Results in very strict significance values for even
medium data sets
– Typical brain may have about 15,000-20,000
functional voxels
• P(Type I) ~ 1.0; corrected alpha ~ 0.000003
• Greatly increases Type II error rate
• Is not appropriate for correlated data
– If data set contains correlated data points, then the
effective number of statistical tests may be greatly
reduced
– Most fMRI data have significant correlation
Gaussian Field Theory
• Approach developed by Worsley and
colleagues to account for multiple
comparisons
– Forms basis for much of SPM
• Provides false positive rate for fMRI data
based upon the smoothness of the data
– If data are very smooth, then the chance of
noise points passing threshold is reduced
Cluster Analyses
• Assumptions
– Assumption I: Areas of true fMRI activity will typically
extend over multiple voxels
– Assumption II: The probability of observing an
activation of a given voxel extent can be calculated
• Cluster size thresholds can be used to reject
false positive activity
– Forman et al., Mag. Res. Med. (1995)
– Xiong et al., Hum. Brain Map. (1995)
How many foci of activation?
Data from motor/visual event-related task (used in laboratory)
How large should clusters be?
• At typical alpha values, even small cluster sizes
provide good correction
– Spatially Uncorrelated Voxels
• At alpha = 0.001, cluster size 3 reduces Type I rate to <<
0.00001 per voxel
– Highly correlated Voxels
• Smoothing (FW = 0.5 voxels) increases needed cluster size
to 7 or more voxels
• Efficacy of cluster analysis depends upon shape
and size of fMRI activity
– Not as effective for non-convex regions
– Power drops off rapidly if cluster size > activation size
Data from Forman et al., 1995
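The effect of a cluster-size threshold on pure noise can be explored with a small simulation. This sketch is not the Forman et al. procedure: it assumes a 2D map of spatially uncorrelated voxels, 4-neighbor connectivity, and a lenient alpha of 0.01 so that a modest grid produces enough false positives to count. All names and values are assumptions.

```python
import random

def surviving_voxels(active, min_size):
    """Count active voxels that belong to clusters of at least min_size.

    active: 2D list of booleans; clusters use 4-neighbor connectivity.
    """
    rows, cols = len(active), len(active[0])
    seen = [[False] * cols for _ in range(rows)]
    kept = 0
    for r0 in range(rows):
        for c0 in range(cols):
            if not active[r0][c0] or seen[r0][c0]:
                continue
            # Flood fill one cluster with an explicit stack.
            stack, size = [(r0, c0)], 0
            seen[r0][c0] = True
            while stack:
                r, c = stack.pop()
                size += 1
                for rr, cc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                    if (0 <= rr < rows and 0 <= cc < cols
                            and active[rr][cc] and not seen[rr][cc]):
                        seen[rr][cc] = True
                        stack.append((rr, cc))
            if size >= min_size:
                kept += size
    return kept

# Hypothetical noise-only map: every suprathreshold voxel is a false positive.
random.seed(4)
alpha, side = 0.01, 200
noise_map = [[random.random() < alpha for _ in range(side)]
             for _ in range(side)]
for k in (1, 2, 3):
    kept = surviving_voxels(noise_map, k)
    print(f"cluster size >= {k}: {kept} false positives "
          f"({kept / side ** 2:.6f} per voxel)")
```

Smoothing the noise before thresholding would make neighboring voxels correlated, which is why a larger cluster size is needed for smoothed data.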
False Discovery Rate
• Controls the expected proportion of false
positive values among suprathreshold values
– Genovese, Lazar, and Nichols (2002, NeuroImage)
– Does not control for chance of any false positives
• FDR threshold determined based upon
observed distribution of activity
– So, sensitivity increases because metric becomes
more lenient as voxels become significant
– Provides only weak control of the familywise Type I error rate
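A sketch of how an FDR threshold can be derived from the observed p-values, using the Benjamini-Hochberg step-up rule, a standard procedure for this purpose. The function name `fdr_threshold` and the ten example p-values are made up for illustration.

```python
def fdr_threshold(p_values, q):
    """Benjamini-Hochberg p-value threshold at FDR level q.

    Returns the largest sorted p(i) with p(i) <= (i / n) * q, or 0.0 if
    no value qualifies (nothing is declared significant).
    """
    n = len(p_values)
    threshold = 0.0
    for i, p in enumerate(sorted(p_values), start=1):
        if p <= i / n * q:
            threshold = p
    return threshold

# Hypothetical p-values from ten voxels.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042,
            0.060, 0.074, 0.205, 0.212, 0.216]
q = 0.05
thr = fdr_threshold(p_values, q)
n_fdr = sum(p <= thr for p in p_values)
n_bonferroni = sum(p <= q / len(p_values) for p in p_values)
print(f"FDR threshold = {thr}, voxels passing: {n_fdr}")
print(f"Bonferroni threshold = {q / len(p_values)}, voxels passing: {n_bonferroni}")
```

The threshold adapts to the data: the more small p-values there are, the further down the sorted list the cutoff can move.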
ROI Comparisons
• Changes basis of statistical tests
– Voxels: ~16,000
– ROIs : ~ 1 – 100
• Each ROI can be thought of as a very large
volume element (e.g., voxel)
– Anatomically-based ROIs do not introduce bias
• Potential problems with using functional ROIs
– Functional ROIs result from statistical tests
– Therefore, they cannot be used (in themselves) to
reduce the number of comparisons
Are there differences between
voxel-wise and ROI analyses?
Summary of Multiple Comparison
Correction
• Basic statistical corrections are often too severe
for fMRI data
• What are the relative consequences of different
error types?
– Correction decreases Type I rate: false positives
– Correction increases Type II rate: misses
• Alternate approaches may be more appropriate
for fMRI
– Cluster analyses
– Region of interest approaches
– Smoothing and Gaussian Field Theory
– False Discovery Rate
Fixed and Random Effects
Comparisons
How do we compare across subjects?
• Fixed-effects Model
– Assumes that effect is constant (“fixed”) in the population
– Uses data from all subjects to construct statistical test
– Examples
• Averaging across subjects before a t-test
• Taking all subjects’ data and then doing an ANOVA
– Allows inference to subject sample
• Random-effects Model
– Assumes that effect varies across the population
– Accounts for inter-subject variance in analyses
– Allows inferences to population from which subjects are drawn
– Especially important for group comparisons
– Required by many reviewers/journals
How are random-effects models run?
• Assumes that activation parameters may vary across
subjects
– Since subjects are randomly chosen, activation parameters may
vary within group
– Fixed-effects models assume that parameters are constant
across individuals
• Calculates descriptive statistic for each subject
– i.e., t-test for each subject based on correlation
• Uses all subjects’ statistics in a one-sample t-test
– i.e., another t-test based only on significance maps
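The two-stage procedure can be sketched as follows, assuming each subject is summarized by a mean effect rather than a t statistic. The subject count, trial count, and effect distribution are assumptions for the simulation. A pooled analysis over all trials is included for contrast with the fixed-effects approach.

```python
import math
import random
from statistics import mean, stdev

def one_sample_t(values):
    """One-sample t statistic against zero, with degrees of freedom."""
    n = len(values)
    return mean(values) / (stdev(values) / math.sqrt(n)), n - 1

# Hypothetical group: each subject has their own true effect size.
random.seed(5)
n_subjects, n_trials = 12, 40
subject_means, all_trials = [], []
for _ in range(n_subjects):
    true_effect = random.gauss(0.5, 0.4)   # varies across subjects (assumed)
    trials = [true_effect + random.gauss(0, 1.0) for _ in range(n_trials)]
    subject_means.append(mean(trials))     # stage 1: one value per subject
    all_trials.extend(trials)

# Stage 2 (random effects): t-test across the per-subject summaries.
rfx_t, rfx_df = one_sample_t(subject_means)
# Fixed effects, for contrast: pool every trial from every subject.
ffx_t, ffx_df = one_sample_t(all_trials)
print(f"random effects: t({rfx_df}) = {rfx_t:.2f}")
print(f"fixed effects:  t({ffx_df}) = {ffx_t:.2f}")
```

The random-effects test has only one degree of freedom per subject (minus one), because its error term is the variability between subjects rather than between trials.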
Summary of Hypothesis Tests
• Simple experimental designs
– Blocked: t-test, Fourier analysis
– Event-related: correlation, t-test at time points
• Complex experimental designs
– Regression approaches (GLM)
• Critical problem: Minimization of Type I Error
– Strict Bonferroni correction is too severe
– Cluster analyses improve on Bonferroni correction
– Accounting for smoothness of data also helps
• Use random-effects analyses to allow
generalization to the population
Data Driven Analyses
Independent Components Analysis
Partial Least Squares
[Figure: values for IPS, FMC, pMFG, aINS under Prediction and Violation; axis ticks -0.6 to 0.8 and 20.0% to 50.0%]
Why conduct data-driven analyses?
• Powerful tools for exploring data
– PCA, ICA: Intrinsic, spatially stationary patterns of
activity in dataset
– Clustering: Collections of voxels with similar time
courses of activity
– PLS: How those patterns of activity maximally
differentiate experimental conditions
• Allows segmentation of nuisance factors
• Provides check on hypothesis-driven analyses