
Medical Imaging Informatics
Lecture #10:
Clinical Perspective: Single Subject Classification
Susanne Mueller M.D.
Center for Imaging of Neurodegenerative Diseases
Dept. of Radiology and Biomedical Imaging
[email protected]
Overview
1. Single Subject Classification/Characterization: Motivation and Problems
2. Bayesian Networks for Single Subject Classification/Characterization
1. Single Subject Classification/Characterization
Quantitative Neuroimaging: Group Comparisons
[Figure: group-comparison maps in temporal lobe epilepsy, posttraumatic stress disorder, and major depression]
Implicit assumptions of group comparisons:
1. Abnormal regions are relevant/characteristic for the disease process.
2. Abnormalities are present in all patients, i.e., a subject showing abnormalities with a disease-specific distribution is likely to have the disease.
Quantitative Neuroimaging: Do the Assumptions Hold Up?
[Figure: temporal lobe epilepsy, posttraumatic stress disorder, and major depression]
Motivation
1. Identification of different variants and/or degrees of the disease process.
2. Translation into clinical application.
Requirements
1. Identification and extraction of discriminating feature(s):
- Single region.
- Combination of regions.
2. Definition of a threshold for "abnormality".
Goal: High sensitivity and specificity.
Sensitivity and Specificity: Definitions I
Sensitivity: Probability that the test is positive if the patient indeed has the disease.
P(Test positive | Patient has disease)
The test ideally always detects the disease.
Sensitivity and Specificity: Definitions II
Specificity: Probability that the test is negative if the patient does not have the disease.
P(Test negative | Patient does not have disease)
The test ideally detects only this disease and not some other non-disease-related state or other disease.
Sensitivity and Specificity
Sensitivity and specificity provide information about a test result given that the patient's disease state is known.
In the clinic, however, the patient's disease state is unknown; that is why the test was done in the first place.
=> positive and negative predictive value of the test
Positive and Negative Predictive Value: Definition
Positive predictive value (PPV):
P(Patient has disease | Test positive)
Negative predictive value (NPV):
P(Patient does not have disease | Test negative)
Example

            Disease: pos         Disease: neg        Total
Test: pos   200 (true pos)        20 (false pos)     220 (all pos)
Test: neg    50 (false neg)      300 (true neg)      350 (all neg)
Total       250 (all disease)    320 (no disease)    570 (total patients)

Sensitivity: 200/250 = 0.80
Specificity: 300/320 = 0.94
PPV: 200/220 = 0.91
NPV: 300/350 = 0.86
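The four measures follow directly from the counts in the table; a minimal plain-Python sketch (no external dependencies) reproducing them:

```python
# Counts from the 2 x 2 table above
true_pos, false_pos = 200, 20    # test positive row
false_neg, true_neg = 50, 300    # test negative row

sensitivity = true_pos / (true_pos + false_neg)   # P(test+ | disease+) = 200/250
specificity = true_neg / (true_neg + false_pos)   # P(test- | disease-) = 300/320
ppv         = true_pos / (true_pos + false_pos)   # P(disease+ | test+) = 200/220
npv         = true_neg / (true_neg + false_neg)   # P(disease- | test-) = 300/350

print(f"Sensitivity: {sensitivity:.2f}")  # 0.80
print(f"Specificity: {specificity:.2f}")  # 0.94
print(f"PPV:         {ppv:.2f}")          # 0.91
print(f"NPV:         {npv:.2f}")          # 0.86
```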
Receiver Operating Characteristic (ROC) Curve
Sensitivity and specificity are good candidates to assess test accuracy.
However, they vary with the threshold (test pos/test neg) used.
The ROC is a means to compare the accuracy of diagnostic tests over a range of thresholds.
The ROC plots sensitivity vs. 1 - specificity of the test.
Example: ROC
[ROC plot, sensitivity vs. 1 - specificity, annotated at four thresholds:]
- High threshold: good specificity (0.92), medium sensitivity (0.52)
- Medium threshold: medium specificity (0.7), medium sensitivity (0.83)
- Low threshold: low specificity (0.45), good sensitivity (0.95)
- Extremely low threshold: no specificity (0), perfect sensitivity (1)
Example: ROC II
[ROC plot with chance line; the optimal threshold is indicated by an arrow; the ROC of a good test approaches the upper left corner of the plot]
Feature
Definition: Information extracted from the image.
The usefulness of a feature to detect the disease is determined by
1. Convenience of measurement.
2. Accuracy of measurement.
3. Specificity for the disease (e.g., CK-MB).
4. Number of features (single < several/feature map).
Features and Thresholds used in Imaging for Single-Subject Analyses I
A. Single feature = region of interest (ROI) analysis
Previous knowledge that the ROI is affected by the disease comes either from previous imaging studies or from other sources, e.g., histopathology.
Approaches used to detect abnormality for ROI analyses:
z-scores:
z = (x_s - mean(x_c)) / SD_c
t-scores*:
t = (x_s - mean(x_c)) / (SD_c * sqrt((n + 1)/n))
Bayesian estimate**:
z* = (x_s - mean(x_c)) / sqrt(q)
Crawford and Howell 1998*; Crawford and Garthwaite 2007**
Example: ROI Analyses and Thresholds
Hippocampal volumes corrected for intracranial volume obtained from T1 images of 49 age-matched healthy controls (mean: 3.92 ± 0.60) and the hippocampal volume of a patient with medial temporal lobe epilepsy: 3.29.
z-score: -1.05, corresponds to one-tailed p = 0.147
t-score: -1.04, corresponds to one-tailed p = 0.152
Bayesian one-tailed probability: 0.152, i.e., about 15% of control hippocampal volumes fall below the patient's volume.
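A short scipy sketch reproducing the numbers on this slide (49 controls, mean 3.92, SD 0.60, patient value 3.29); the Crawford-Howell t treats the control mean and SD as sample estimates rather than population values:

```python
from math import sqrt
from scipy import stats

n_controls, mean_c, sd_c = 49, 3.92, 0.60   # control sample (slide values)
x_patient = 3.29                            # patient's corrected hippocampal volume

# z-score: control mean/SD treated as population parameters
z = (x_patient - mean_c) / sd_c
p_z = stats.norm.cdf(z)                     # one-tailed probability

# Crawford & Howell (1998) t-score: accounts for the control sample size
t = (x_patient - mean_c) / (sd_c * sqrt((n_controls + 1) / n_controls))
p_t = stats.t.cdf(t, df=n_controls - 1)     # one-tailed, n - 1 degrees of freedom

print(f"z = {z:.2f}, one-tailed p = {p_z:.3f}")   # matches the slide: z = -1.05, p = 0.147
print(f"t = {t:.2f}, one-tailed p = {p_t:.3f}")   # matches the slide: t = -1.04, p = 0.152
```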
Features and Thresholds used in Imaging for Single-Subject Analyses II
B. Multiple features from the same source = a map that encodes severity and distribution of the disease-associated abnormality.
Previous knowledge about the distribution/severity of the abnormalities is not mandatory to generate the "abnormality" map, i.e., typically a whole-brain search strategy is employed. However, previous knowledge can be helpful for the correct interpretation.
Approaches used to generate abnormality maps:
z-score maps (continuous or thresholded)
Single-case modification of the General Linear Model used for group analyses.
Features and Thresholds used in Imaging for Single-Subject Analyses III
Problems:
1. A difference may reflect normal individual variability rather than disease effects.
2. Assumption that the single subject represents the mean of a hypothetical population with the same variance as observed in the control group.
3. A higher number of comparisons (multiple ROIs/voxel-wise) requires:
a. Correction for multiple comparisons.
b. Adjustment of the result at the ROI/voxel level for results in the immediate neighborhood, e.g., correction at the cluster level.
4. Interpretation of the resulting maps.
Influence of Correction for Multiple Comparisons
[Figure: single-subject maps of regions of increase and decrease at FWE p < 0.05, p < 0.01, and p < 0.001]
Scarpazza et al. Neuroimage 2013; 70: 175-188
Interpretation of Single Subject Maps
Potential strategies for map interpretation:
1. Visual inspection using knowledge about the typical distribution of abnormalities in group comparisons.
2. Quantitative comparison with known abnormalities from group comparisons, e.g., calculation of Dice's coefficient for the whole map (see the sketch below).
Problems:
1. Requires the existence of a "disease-typical pattern".
2. Requires selection of a "threshold" indicating whether or not the map matches the typical pattern.
3. Difficulty interpreting severe abnormalities that do not match the typical pattern. Atypical representation? Different disease?
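As an illustration of strategy 2, a minimal numpy sketch of Dice's coefficient between a single-subject abnormality map and a group-level "disease-typical" map; the two small binary maps are hypothetical:

```python
import numpy as np

def dice(map_a, map_b):
    """Dice coefficient between two binary maps: 2|A and B| / (|A| + |B|)."""
    a = map_a.astype(bool)
    b = map_b.astype(bool)
    denom = a.sum() + b.sum()
    return 2.0 * np.logical_and(a, b).sum() / denom if denom else 1.0

# Hypothetical thresholded maps (1 = abnormal voxel)
subject_map = np.array([[1, 1, 0], [0, 1, 0], [0, 0, 0]])
typical_map = np.array([[1, 0, 0], [0, 1, 1], [0, 0, 0]])

print(f"Dice = {dice(subject_map, typical_map):.2f}")  # 2*2 / (3+3) = 0.67
```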
Examples
[Figure: gray matter loss in TLE compared to controls]
2. Bayesian Networks for
Single Subject
Classification/Characterization
Characteristics of an Ideal Classification System
1. Uses non-parametric, non-linear statistics.
2. Identifies characteristic severe and mild brain abnormalities distinguishing between two groups based on their spatial proximity and the strength of their association with a clinical variable (e.g., group membership).
3. Weights abnormal regions according to their ability to discriminate between the two groups.
4. Provides a probability of group membership and an objective threshold based on the congruence of individual abnormalities with group-specific abnormalities.
5. Uses expert a priori knowledge to combine information from different sources (other imaging modalities, clinical information) for the determination of the final group membership.
Bayesian Networks: Basics
Definition: Probabilistic graphical model defined as:
B = (G, Q)
G is a directed acyclic graph (DAG) defined as G = (n, e), where n represents the set of nodes in the network and e the set of directed edges that describe the probabilistic associations among the nodes.
Q is the set of all conditional probability states q that the nodes in the network can assume.
Bayesian Networks: Basics: Simple Network
[DAG G with two nodes, Event A and Event B]
Joint Probability Distribution Q:

A       B       Prob(A, B)
true    true    0.10
false   true    0.40
true    false   0.35
false   false   0.15
Bayesian Networks: Basics: Slightly more complex Network
[DAG with three nodes: Event A, Event B, Event C]
Bayesian Networks: Basics: It is getting more complicated
[DAG with nodes A, B, C, D, E, F]
Markovian assumptions of the DAG: I(V, Parents(V), Non-Descendants(V)) for every variable V in the DAG, i.e., each variable is conditionally independent of its non-descendants given its parents.
Bayesian Networks: Inference I: Probability of Evidence Query
[DAG with nodes A, B, C, D, E; with evidence instantiating two of the variables to True, the probability of the evidence is 0.30]
Bayesian Networks: Inference II: Prior and Posterior Marginal Query
Definition: Marginal: projection of the joint distribution onto a smaller set of variables.
If the joint probability distribution is Pr(x_1, ..., x_n), then the marginal distribution Pr(x_1, ..., x_m), m <= n, is defined as:
Pr(x_1, ..., x_m) = Σ over x_{m+1}, ..., x_n of Pr(x_1, ..., x_n)
[Figure: DAG with nodes A-E showing the prior marginal and the posterior marginal of each node given the evidence E = True; e.g., node A: prior True = 0.60, False = 0.40, posterior True = 0.92, False = 0.08; node E: posterior True = 1.0, False = 0.00]
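A sketch of a prior vs. posterior marginal query by brute-force enumeration on a small hypothetical chain A -> B -> C (the CPT numbers are made up); real networks use the inference algorithms listed two slides below, but the definitions are the same:

```python
from itertools import product

# Hypothetical CPTs for a chain A -> B -> C (all variables binary)
p_a = {True: 0.6, False: 0.4}
p_b_given_a = {True: {True: 0.8, False: 0.2}, False: {True: 0.3, False: 0.7}}   # P(B | A)
p_c_given_b = {True: {True: 0.9, False: 0.1}, False: {True: 0.2, False: 0.8}}   # P(C | B)

def joint(a, b, c):
    """Chain rule for the DAG: P(a, b, c) = P(a) P(b | a) P(c | b)."""
    return p_a[a] * p_b_given_a[a][b] * p_c_given_b[b][c]

# Prior marginal of A: sum the joint over all other variables
prior_a = sum(joint(True, b, c) for b, c in product([True, False], repeat=2))

# Posterior marginal of A given evidence C = True: condition and renormalize
p_evidence = sum(joint(a, b, True) for a, b in product([True, False], repeat=2))
posterior_a = sum(joint(True, b, True) for b in [True, False]) / p_evidence

print(f"Prior     P(A=true)          = {prior_a:.2f}")
print(f"Posterior P(A=true | C=true) = {posterior_a:.2f}")
```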
Bayesian Networks: Inference III: Most Probable Explanation (MPE) and Maximum a Posteriori Hypothesis (MAP)
Definition:
MPE = Given evidence for one network variable, the instantiation of all other network variables that is most probable given that evidence.
MAP = Given evidence for one network variable, the instantiation of a subset of the network variables that is most probable given that evidence.
[DAG with nodes A, B, C, D, E; example evidence: D = true]
Bayesian Networks: Inference IV:
Different algorithms have been developed to update the remaining network variables after evidence has been observed for other network variables.
Examples for exact inference algorithms:
Variable or factor elimination
Recursive conditioning
Clique tree propagation
Belief propagation
Examples for approximate inference algorithms:
Generalized belief propagation
Loopy belief propagation
Importance sampling
Mini-bucket elimination
Bayesian Networks: Learning I: Parameter/Structure
[DAG with nodes A, B, C, D, E]
Bayesian Networks: Learning II:
Parameter Learning
1. Expert knowledge
2. Data driven:
a. Maximum likelihood (complete data) - a counting sketch follows below
b. Expectation maximization (incomplete data)
c. Bayesian approach
Structure Learning
1. Expert knowledge
2. Data driven:
a. Local (score-based) search: greedy search (K2, K3), optimal search
b. Constraint-based approach
c. Bayesian approach
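A minimal sketch of maximum-likelihood parameter learning from complete data (toy records, a single parent node): the CPT entries are simply the observed conditional relative frequencies:

```python
from collections import Counter

# Complete toy data: each record gives the observed states of parent A and child B
records = [
    {"A": True,  "B": True}, {"A": True,  "B": True}, {"A": True,  "B": False},
    {"A": False, "B": True}, {"A": False, "B": False}, {"A": False, "B": False},
    {"A": True,  "B": True}, {"A": False, "B": False},
]

# Maximum-likelihood estimate of the CPT P(B | A): conditional relative frequencies
pair_counts   = Counter((r["A"], r["B"]) for r in records)
parent_counts = Counter(r["A"] for r in records)

cpt_b_given_a = {
    a: {b: pair_counts[(a, b)] / parent_counts[a] for b in (True, False)}
    for a in (True, False)
}

print(cpt_b_given_a)
# {True: {True: 0.75, False: 0.25}, False: {True: 0.25, False: 0.75}}
```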
Bayesian Networks: Application to Image Analysis?
YES
1. Identification of features distinguishing between groups.
2. Combination of different distinguishing imaging features, e.g., volumetric and functional imaging.
Bayesian Network: Basics: Feature Identification I
Characterization of the problem
1. Parameter and structure learning.
Preparatory steps
a. Representative training data set
b. Information reduction
Structure learning
c. Definition of network nodes
d. Definition of possible node states
Parameter learning
e. Calculation of the strength of association between image feature and variable of interest
2. Network query.
a. Calculation of group affiliation based on concordance with the feature set identified during the learning process.
Bayesian Network: Basics: Feature
Identification II
GAMMA: Graphical Model-Based
Morphometric Analysis*
Chen R, Herskovits E. IEEE Transactions on Medical Imaging 2005; 24: 1237-1248
GAMMA: Preparatory Steps I
1. Identification of the training set:
Images of patients and controls, or of subjects with and without the functional variable of interest for the Bayesian network.
Representative of the population, i.e., it encompasses the variability typically found in each of the populations.
GAMMA: Preparatory Steps II
2. Data Reduction
Use of prior knowledge regarding the nature of the feature, e.g., reduction of the information in the image to regions with relative volume loss if the disease is associated with atrophy.
Creation of binary images: each individual image is compared to a mean image, and voxels with intensities below a predefined threshold, e.g., 1 SD below the control mean, are set to 1; all other voxels are set to 0.
GAMMA: Preparatory Steps II
Data Reduction
[Figure: control mean and SD images; original and binarized (1 SD below mean) control image; original and binarized (1 SD below mean) patient image]
Each binary map can be represented as {F, V_1, V_2, V_3, ..., V_m}, where F represents the state, i.e., patient or control, and V_i represents the voxel at location i. Given this definition, a voxel V_i with the value 1 means that there is volume loss.
The choice of images used to generate the mean/SD image and the threshold for binarization are crucial for performance.
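A numpy sketch of the binarization step, with synthetic arrays standing in for spatially normalized images: each image is compared voxel-wise with the control mean/SD images, and voxels more than 1 SD below the control mean are set to 1:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for registered gray matter maps (4 x 4 x 4 voxels)
controls = rng.normal(loc=1.0, scale=0.1, size=(20, 4, 4, 4))   # 20 control images
patient  = rng.normal(loc=1.0, scale=0.1, size=(4, 4, 4))
patient[0, :2, :2] -= 0.4                                        # simulate focal volume loss

# Voxel-wise mean and SD images from the control group
mean_img = controls.mean(axis=0)
sd_img   = controls.std(axis=0, ddof=1)

def binarize(image, mean_img, sd_img, n_sd=1.0):
    """Set voxels more than n_sd SDs below the control mean to 1, all others to 0."""
    return (image < mean_img - n_sd * sd_img).astype(np.uint8)

binary_patient = binarize(patient, mean_img, sd_img)
print("Number of 'volume loss' voxels:", int(binary_patient.sum()))
```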
GAMMA: Structure Learning Theoretical
Steps:
1. Generate a Bayesian network that identifies the probabilistic relationship among {V_i} and F.
2. Generate cluster(s) of representative voxels R (output: label map) such that all voxels in a cluster have similar probabilistic associations with F (output: belief map). All clusters are independent of each other, and each cluster corresponds to a node.
GAMMA: Structure Learning Practical I
Step 1
a. Definition of the search space V, e.g., all voxels where at least one subject has a value that differs from every other subject's value for that voxel.
b. Identification of the first search-space voxel(s) that provide optimal distinction between the states F, e.g., all controls 0, all patients 1. Assign the voxel to the putative group of representative voxels A.
GAMMA: Structure Learning Practical I
[Figure: toy example with Group A (n = 10, "controls") and Group B (n = 10, "patients"); the disease is characterized by atrophy, i.e., "1" voxels, compared to controls. Each voxel of the search space is annotated with the number of controls/patients in which it equals 1; the voxel(s) that best separate the two groups form the representative voxels of the 1st iteration.]
GAMMA: Structure Learning Practical II
Step 1 cont.
c. Identification of voxel(s) whose addition to A increases the ability of A to correctly distinguish between the states F. The process is repeated until no voxel is left that fulfills that condition (a schematic sketch of this greedy selection follows below).
d. Identification of all those voxels R_n in A that maximize the distinction between the states F. The R_n of the first iteration corresponds to R (the R_n of later iterations are added to R). Voxels belonging to R_n are removed from the search space V.
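A schematic numpy sketch of the greedy selection idea in steps 1b-1d, on toy binary data; this illustrates the principle only and is not GAMMA's actual implementation or scoring function:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy binarized maps: 10 controls and 10 patients, 30 candidate voxels (1 = volume loss)
controls = (rng.random((10, 30)) < 0.1).astype(int)
patients = (rng.random((10, 30)) < 0.1).astype(int)
patients[:, :3] = (rng.random((10, 3)) < 0.9).astype(int)   # 3 truly discriminating voxels

X = np.vstack([controls, patients])
y = np.array([0] * 10 + [1] * 10)          # 0 = control, 1 = patient (state F)

def accuracy(voxel_set):
    """Accuracy of a simple rule: call 'patient' if at least half of the selected voxels are 1."""
    votes = X[:, voxel_set].mean(axis=1) >= 0.5
    return (votes.astype(int) == y).mean()

# Greedy forward selection: keep adding the voxel that most improves the discrimination
selected, remaining = [], list(range(X.shape[1]))
best_acc = 0.0
while remaining:
    gains = [(accuracy(selected + [v]), v) for v in remaining]
    acc, v = max(gains)
    if acc <= best_acc:                     # no voxel improves the discrimination: stop
        break
    selected.append(v)
    remaining.remove(v)
    best_acc = acc

print("Selected voxels:", selected, "accuracy:", best_acc)
```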
GAMMA: Structure Learning Practical II
[Figure: the toy example after the first iteration; the representative voxels of the 1st iteration have been removed from the search space]
GAMMA: Structure Learning Practical III
Step 2 (iteration 2 and higher)
a. Calculation of the similarity s between the voxels in A and the voxels in R_{n-1}. The similarity s for a voxel V_i in A is defined as
s(V_i, R_{n-1}) = P(V_i = 1, R_{n-1} = 1) + P(V_i = 0, R_{n-1} = 0)
The similarity of all n voxels in A is expressed as a similarity map S:
S = {s(V_1, R_{n-1}), s(V_2, R_{n-1}), ..., s(V_n, R_{n-1})}
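A small numpy sketch of the similarity measure, estimated as joint relative frequencies across the subjects of a toy training set:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy binarized data: rows = subjects, columns = voxels in the search space A
A_voxels = (rng.random((20, 5)) < 0.4).astype(int)
# Representative voxel R from the previous iteration (one binary value per subject)
R = (rng.random(20) < 0.5).astype(int)

def similarity(v, r):
    """s(V, R) = P(V = 1, R = 1) + P(V = 0, R = 0), estimated over subjects."""
    p11 = np.mean((v == 1) & (r == 1))
    p00 = np.mean((v == 0) & (r == 0))
    return p11 + p00

# Similarity map S: one similarity value per voxel in A
S = np.array([similarity(A_voxels[:, i], R) for i in range(A_voxels.shape[1])])
print("Similarity map S:", np.round(S, 2))
```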
GAMMA: Structure Learning Practical III
[Figure: similarity values s(V_i, R) computed for the remaining search-space voxels in the toy example]
GAMMA: Structure Learning Practical IV
Step 2 (iteration 2 and higher) cont.
b. Initial random assignment of a label L (patient or control) to each voxel in A. Voxels with the same label are in the same cluster. Initially there are at most 2 clusters: a cluster of voxels with 100% probability of being patient voxels and a cluster of voxels with 100% probability of being control voxels. During the optimization the probabilities are adjusted, and so is the number of clusters. The global variance criterion is used to determine the optimal number of clusters.
The L of all n voxels is defined as the label map L.
GAMMA: Structure Learning Practical IV
[Figure: the toy example (Group A controls, Group B patients) after the initial random label assignment; each remaining search-space voxel carries a cluster label]
GAMMA: Structure Learning Practical IV
c. Using the similarity map S and the initial label map L as input, the problem can be reduced to finding the posterior MAP estimate of L given the information in S.
[Diagram: the MRF prior ("Bayesian component") sets penalties for different label patterns, with a low penalty for combining voxels with similar probabilistic associations and spatial closeness; the likelihood term describes the relationship between the similarity map and the label map.]
Loopy belief propagation is used for label map/belief map inference.
GAMMA: Structure/Parameter Learning Practical V
Step 3. Update of the cluster(s) of representative voxels R from the previous iterations (n-1) by adding L_n and B_n to generate L_all and B_all. Voxels belonging to R_n are removed from the search space V.
A new iteration (steps 1-3) is started until no voxel is left in the remaining search space V. L_all and B_all of the last iteration are defined as L_final and B_final.
GAMMA: Structure/Parameter Learning Practical VI
Step 4. Validation of L_final and B_final using the jackknife method. The resulting sampling distribution is used to generate a p-map on which each p-value indicates the likelihood of an outcome as extreme as or more extreme than the one observed.
Step 5. Regional state inference: group assignment of each subject in the training set based on the correspondence of the individual abnormalities with L_final. The observed group membership of the training set and the inferred group membership based on RSL are used as the parameter set for the DAG.
GAMMA: Structure/Parameter Learning: Outputs
[Figure: label map L_final and belief map B_final; each node (Event A, Event B) of the resulting network carries its beliefs, e.g., C: 0.729, P: 0.270]
GAMMA vs GLM
[Figure: GAMMA label map compared with the SPM GLM result (FDR 0.05)]
GAMMA vs. GLM I

GLM                                        GAMMA
normal distribution                        normal/non-normal distribution
parametric statistic                       non-parametric statistic
linear state-image feature association     probabilistic state-image feature association
Detects:                                   Detects:
- Segregation                              - Segregation
- Degeneration                             - Integration
GAMMA vs. GLM II
[Figure: examples of segregation, degeneration, and integration (Group A vs. Group B), each with the corresponding GLM and GAMMA detection results]
Bayesian Networks: Combination of Features
GAMMA uses a Bayesian network approach to identify features of a single imaging modality that distinguish between two groups, e.g., patients vs. controls.
However, this scenario does not really reflect the questions that need to be answered in clinical practice.
A. The question is often not only whether a subject is a patient or not but also what type of patient the subject is.
B. There is often information from several sources (imaging, other exams) that can be confirmatory but also conflicting.
=> Classical problem for a "conventional" Bayesian network approach
Bayesian Networks: Multi-Level Application I
Example:
Three types of focal non-lesional epilepsy with similar clinical manifestation, and controls with a matching imaging protocol:
A. Temporal lobe epilepsy with mesial temporal sclerosis
B. Temporal lobe epilepsy with normal MRI
C. Frontal lobe epilepsy with normal MRI and different semiology
Two MR imaging modalities:
A. Structural whole-brain T1 for volumetry = gray matter loss
B. Whole-brain DTI = white matter abnormalities
GOAL: A Bayesian network classifier that calculates the probability that a patient belongs to one of the 3 types based on imaging features.
Bayesian Networks: Multi-Level Application II
Strategy:
1. First level: Full characterization of the GM and WM imaging features in each group using GAMMA. Each group is compared against each other group, i.e., a total of 12 whole-brain comparisons and 1 region-of-interest (hippocampus) comparison.
2. Second level: Combination of the imaging information, including one clinical variable (seizures yes/no), into a Bayesian network that allows the probability that a patient belongs to one of the three epilepsy types to be calculated (simple evidence query); a toy sketch of this kind of second-level combination follows below.
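A heavily simplified toy sketch of the second-level idea, not the network used in the study: two binary first-level findings are combined by Bayes' rule under an assumed conditional-independence structure with made-up priors and likelihoods, yielding a posterior over the three epilepsy types:

```python
# Hypothetical priors over the three epilepsy types (made-up numbers)
prior = {"TLE-MTS": 1/3, "TLE-normal MRI": 1/3, "FLE-normal MRI": 1/3}

# Hypothetical likelihoods of the two first-level findings given each type,
# assumed conditionally independent given the type (naive combination)
p_gm_pattern = {"TLE-MTS": 0.85, "TLE-normal MRI": 0.40, "FLE-normal MRI": 0.15}  # GM loss pattern present
p_wm_pattern = {"TLE-MTS": 0.70, "TLE-normal MRI": 0.55, "FLE-normal MRI": 0.30}  # WM abnormality pattern present

# Evidence for one patient: both patterns present -> Bayes' rule with renormalization
unnorm = {t: prior[t] * p_gm_pattern[t] * p_wm_pattern[t] for t in prior}
z = sum(unnorm.values())
posterior = {t: p / z for t, p in unnorm.items()}

for t, p in posterior.items():
    print(f"P({t} | evidence) = {p:.2f}")
```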
Bayesian Networks: First Level
Characterization of GM Loss
Bayesian Networks: First Level
Characterization of WM Integrity Loss
Bayesian Networks: Second Level Results I
TLE with sclerosis: 84.5% correctly classified, 15.8% incorrectly classified, 0% not classified
TLE with normal MRI: 59.1% correctly classified, 22.7% incorrectly classified, 18.2% not classified
FLE with normal MRI: 50% correctly identified, 28.6% incorrectly identified, 21.4% not identified
Not classified: abnormalities in both modalities not exceeding those found in controls.
Results II
Summary: Classifier using Bayesian Networks
Bayesian networks can be used at several stages of image processing and analysis.
Bayesian networks are ideal for combining information from different imaging modalities but also from other sources, e.g., clinical, metabolomic, genetic, etc.
Bayesian networks do not depend on the assumptions of classical parametric statistics.
Bayesian networks provide the probability of belonging to a certain group, i.e., they are threshold-free.
Bayesian networks show some promise of being useful for "subtype" identification.
References
Crawford JR, Howell DC. Comparing an individual's test score against norms derived from small samples. The Clinical Neuropsychologist 1998; 12: 482-486
Crawford JR, Garthwaite PH. Comparison of a single case to a control or normative sample in neuropsychology: Development of a Bayesian approach. Cognitive Neuropsychology 2007; 24: 343-372
Scarpazza C, Sartori G, De Simone MS, Mechelli A. When the single matters more than the group: Very high false positive rates in single case voxel-based morphometry. Neuroimage 2013; 70: 175-188
Darwiche A. Modeling and Reasoning with Bayesian Networks. Cambridge University Press 2009
Chen R, Herskovits EH. Graphical-Model-Based Morphometric Analysis. IEEE Transactions on Medical Imaging 2005; 24: 1237-1248
Chen R, Herskovits EH. Graphical-Model-Based multivariate analysis of functional magnetic resonance data. Neuroimage 2007; 35: 635-647
Chen R, Herskovits EH. Graphical-Model-Based multivariate analysis (GAMMA): An open-source, cross-platform neuroimaging data analysis software package. Neuroinformatics, DOI 10.1007/s12021-011-9129-7
Mueller SG, Young K, Hartig M, Barakos J, Garcia P, Laxer KD. A two-level multimodality imaging Bayesian network approach for classification of partial epilepsy: Preliminary data. Neuroimage 2013; 71: 224-232
Software
http://homepages.abdn.ac.uk/j.crawford/pages/dept/SingleCaseMethodsComputerPrograms.HTM
http://genie.sis.pitt.edu/
http://reasoning.cs.ucla.edu/samiam/
GAMMA: http://www.nitrc.org