Personalized Risk Assessment: Are We There Yet?


Computer-Based Models for Medical Diagnosis and Prognostication
Lucila Ohno-Machado, MD, PhD
Division of Biomedical Informatics
University of California San Diego
• Clinical pattern recognition and predictive models
• Evaluation of binary classifiers (calibration)
• Ethical implications for clinical practice
Risk Assessment
• Popular risk calculators
– Gail Model (Breast cancer)
– Framingham Risk Calculator (CVD)
– APACHE (ICU mortality)
• Use of individual estimates
– Prophylaxis for breast cancer
– Cholesterol management guidelines
– Continuation of life support
APACHE II
• Mortality in intensive care units (ICUs)
• 12 physiologic predictors
Individualized Genome
• How many individual genotypes are needed to predict disease?
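Logistic Regression
$$p_i = \frac{1}{1 + e^{-(\beta_0 + \sum_i \beta_i x_i)}}$$
$$\log\left(\frac{p_i}{1 - p_i}\right) = \beta_0 + \sum_i \beta_i x_i$$
[Figure: plots of p against x, with p = 1 marked]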
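The two forms of the logistic regression model on this slide (probability and log-odds) can be sketched in a few lines of Python. This is only an illustration: the coefficient values and the two-predictor patient below are hypothetical and do not come from any model in the talk.

```python
import math

def logistic_probability(beta0, betas, xs):
    """p = 1 / (1 + exp(-(beta0 + sum(beta_i * x_i))))."""
    z = beta0 + sum(b * x for b, x in zip(betas, xs))
    return 1.0 / (1.0 + math.exp(-z))

def logit(p):
    """Log-odds: log(p / (1 - p)), the inverse of the logistic function."""
    return math.log(p / (1.0 - p))

# Hypothetical intercept and coefficients, chosen for illustration only.
beta0 = -4.0
betas = [0.9, 2.0]
patient = [1, 0]  # first predictor present, second absent

p = logistic_probability(beta0, betas, patient)
print(f"estimated probability: {p:.4f}")
print(f"log-odds recovered from p: {logit(p):.4f}")  # equals beta0 + 0.9
```

Taking the logit of the estimate returns the linear predictor, which is why each coefficient can be read as a change in log-odds.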
Coronary Angioplasty and Stenting
Risk of death in angioplasty
The national average mortality after angioplasty is 2%, the figure stated in the informed consent.
“Informed consent and good clinical practice require a discussion of risks and benefits…”
Alexander et al, 52nd ACC meeting
Fewer than 10% of patients have an estimated risk of death around 2%.
Are we lying to the other 90%?
Dataset: Attributes Collected
History: age, gender, diabetes, iddm, history CABG, baseline creatinine, CRI, ESRD, hyperlipidemia
Presentation: acute MI, primary, rescue, CHF class, angina class, cardiogenic shock, failed CABG
Angiographic: occluded, lesion type (A, B1, B2, C), graft lesion, vessel treated, ostial
Procedural: number lesions, multivessel, number stents, stent types (8), closure device, gp 2b3a antagonists, dissection post, rotablator, atherectomy, angiojet, max pre stenosis, max post stenosis, no reflow
Operator/Lab: annual volume, device experience, daily volume, lab device experience, unscheduled case
Data Source: Medical Record, Clinician Derived, Other
Resnic et al, J Am Col Card 2001; Matheny et al, J Biomed Inf 2005
Study Population: Descriptive Statistics
                          Development Set   Validation Set   p-value
Cases                     2,804             1,460
Women                     909 (32.4%)       433 (29.7%)      p=.066
Age > 74yrs               595 (21.2%)       308 (22.5%)      p=.340
Acute MI                  250 (8.9%)        144 (9.9%)       p=.311
Primary                   156 (5.6%)        95 (6.5%)        p=.214
Shock                     62 (2.2%)         20 (1.4%)        p=.058
Class 3/4 CHF             176 (6.3%)        80 (5.5%)        p=.298
gp IIb/IIIa antagonist    1,005 (35.8%)     777 (53.2%)      p<.001
Death                     67 (2.4%)         24 (1.6%)        p=.110
Death, MI, CABG (MACE)    177 (6.3%)        96 (6.6%)        p=.739
Multivariate Models
Models: Logistic Regression Model, Prognostic Risk Score Model, Artificial Neural Network
                        Logistic Regression Model    Prognostic Risk Score Model
Variable                Odds Ratio    p-value        beta coefficient    Risk Value
Age > 74yrs             2.51          0.02           0.921               2
B2/C Lesion             2.12          0.05           0.752               1
Acute MI                2.06          0.13           0.724               1
Class 3/4 CHF           8.41          0.00           2.129               4
Left main PCI           5.93          0.03           1.779               3
IIb/IIIa Use            0.57          0.20           -0.554              -1
Stent Use               0.53          0.12           -0.626              -1
Cardiogenic Shock       7.53          0.00           2.019               4
Unstable Angina         1.70          0.17           0.531               1
Tachycardic             2.78          0.04           1.022               2
Chronic Renal Insuf.    2.58          0.06           0.948               2
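The relationship between the columns of the multivariate model table can be sketched as follows. The beta coefficients and risk values are the ones listed on the slide. The rounding rule (dividing each coefficient by the smallest positive coefficient and rounding to the nearest integer) is an assumption: it happens to reproduce the listed risk values, but the talk does not state how the points were assigned. The example patient is hypothetical.

```python
import math

# (variable, beta coefficient, risk value) as listed in the slide's table.
MODEL = [
    ("Age > 74yrs", 0.921, 2),
    ("B2/C Lesion", 0.752, 1),
    ("Acute MI", 0.724, 1),
    ("Class 3/4 CHF", 2.129, 4),
    ("Left main PCI", 1.779, 3),
    ("IIb/IIIa Use", -0.554, -1),
    ("Stent Use", -0.626, -1),
    ("Cardiogenic Shock", 2.019, 4),
    ("Unstable Angina", 0.531, 1),
    ("Tachycardic", 1.022, 2),
    ("Chronic Renal Insuf.", 0.948, 2),
]

def odds_ratio(beta):
    """Odds ratio implied by a logistic regression coefficient."""
    return math.exp(beta)

# ASSUMPTION: points = beta / (smallest positive beta), rounded to an integer.
unit = min(beta for _, beta, _ in MODEL if beta > 0)
derived_points = [round(beta / unit) for _, beta, _ in MODEL]

def risk_score(findings):
    """Sum the listed risk values for the findings present in a patient."""
    return sum(value for name, _, value in MODEL if name in findings)

# Hypothetical patient: older than 74 and in cardiogenic shock.
example_score = risk_score({"Age > 74yrs", "Cardiogenic Shock"})
print(derived_points)
print("example score:", example_score)
```

The integer score trades a little precision for something that can be added up at the bedside, which is the usual motivation for point systems of this kind.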
Logistic Regression, Score, and Neural Networks
Validation Set: 1460 Cases
[Figure: ROC curves (sensitivity against 1 - specificity) for LR, Score, and aNN, with the reference diagonal ROC = 0.50]
ROC Area
LR: 0.840
Score: 0.855
aNN: 0.835
Risk Score of Death
Unadjusted Overall Mortality Rate = 2.1%
[Figure: number of cases and mortality risk by risk score category (0 to 2, 3 to 4, 5 to 6, 7 to 8, 9 to 10, >10)]
External Validations for CVD Models
Model/Cohort                                                                    AKA           Year Published            Total External Validations
Framingham Risk Score                                                           FRS           Dawber et al, 1951        1
Framingham Risk Score                                                           FRS           Kannel et al, 1976        4
Framingham Risk Score                                                           FRS           Anderson et al, 1991      29
Glostrup                                                                        Glostrup      Schroll et al, 1992       1
European Society of Cardiology                                                  ESC           Pyorala et al, 1994       1
Framingham Risk Score                                                           FRS           Wilson et al, 1998        32
Framingham Risk Score for ATP III                                               FRS ATP III   ATP III, 2001             5
Framingham Risk Score                                                           FRS           D’Agostino et al, 2001    9
UK Prospective Diabetes Study                                                   UKPDS         Stevens et al, 2001       1
Framingham Point System                                                         FPS           ATP III, 2002             2
Prospective Cardiovascular Munster Study                                        PROCAM        Assman et al, 2002        6
Finnish Diabetes Risk Score                                                     FINDRISC      Lindstrom et al, 2003     6
Systematic Coronary Risk Evaluation                                             SCORE         Conroy et al, 2003        8
Diabetes Epidemiology: Collaborative Analysis of Diagnostic Criteria in Europe  DECODE        Balkau et al, 2004        1
ASSessing cardiovascular risk using SIGN guidelines                             ASSIGN        SIGN, 2007                2
Total                                                                                                                   108
[Figure: predicted / observed ratios for Framingham models tested on European populations and for European models tested on North American populations]
Questions
• Which model is right?
• “True” probability would be the gold standard
– What is the true probability?
• Are the models adequate in discrimination and calibration?
Your Risk
“this program shows the estimated health risks of people with your same age, gender, and risk factor levels”
“this means that 5 of 100 people with this level of risk will have a heart attack or die”
Patients “like you”
Input space: “people with your same age, gender, and risk factor levels”
Output space: “people with this level of risk”
[Figures: “me” located among other patients in an input space with axes gender and height, then with risk added as a further axis]
Evaluation of Predictive Models
• Error
• Discrimination
– Area under ROC
• Calibration
– Plot of groups: observed vs expected
– Hosmer-Lemeshow statistic
Discrimination of Binary Outcomes
• Estimate and observed outcome (“gold standard”, “true”)
Estimate   True
0.3        0
0.2        0
0.5        1
0.1        0
• Classification into category 0 or 1 is based on thresholded estimates (e.g., if estimate > 0.5 then consider “positive”)
[Figure: distributions of estimates for normal and diseased cases on a 0 to 1.0 axis, with a threshold (e.g. 0.5) defining true negatives (TN), false negatives (FN), false positives (FP), and true positives (TP)]
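Columns give the true state (nl = normal, D = disease); rows give the classification (“nl”, “D”).
          nl    D
“nl”      TN    FN
“D”       FP    TP
Sens = TP / (TP + FN)
Spec = TN / (TN + FP)
PPV = TP / (TP + FP)
NPV = TN / (TN + FN)
Accuracy = (TN + TP) / (TN + TP + FN + FP)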
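The definitions above translate directly into code. The sketch below uses the counts from the slides' first threshold example (TN = 40, FN = 0, FP = 10, TP = 50); the function and variable names are illustrative choices, not part of the talk.

```python
def classification_metrics(tp, tn, fp, fn):
    """Standard 2x2 table metrics from raw counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
    }

# Counts from the threshold example with sensitivity 50/50 and specificity 40/50.
metrics = classification_metrics(tp=50, tn=40, fp=10, fn=0)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Note that PPV and NPV depend on the prevalence in the sample (here 50%), while sensitivity and specificity do not.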
Threshold 0.4
Sensitivity = 50/50 = 1
Specificity = 40/50 = 0.8
          nl    D     Total
“nl”      40    0     40
“D”       10    50    60
Total     50    50
[Figure: normal and disease distributions on a 0.0 to 1.0 axis with the threshold at 0.4; regions TN, FP, TP]
Threshold 0.6
Sensitivity = 40/50 = 0.8
Specificity = 45/50 = 0.9
          nl    D     Total
“nl”      45    10    55
“D”       5     40    45
Total     50    50
[Figure: normal and disease distributions on a 0.0 to 1.0 axis with the threshold at 0.6; regions TN, FN, FP, TP]
Threshold 0.7
Sensitivity = 30/50 = 0.6
Specificity = 1
          nl    D     Total
“nl”      50    20    70
“D”       0     30    30
Total     50    50
[Figure: normal and disease distributions on a 0.0 to 1.0 axis with the threshold at 0.7; regions TN, FN, TP]
[Figure: ROC curve, sensitivity against 1 - specificity. The three threshold tables are repeated; each threshold, starting with threshold 0.4, contributes one point, and all thresholds together trace the ROC curve]
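Each threshold table yields one point on the ROC curve. As a rough sketch, the code below turns the three tables from the slides (thresholds 0.4, 0.6, and 0.7) into (1 - specificity, sensitivity) points and applies the trapezoidal rule. Adding the corner points (0, 0) and (1, 1) and using only three thresholds are simplifications made here, so the resulting area is a coarse approximation rather than a value reported in the talk.

```python
# (threshold, TP, FN, TN, FP) read from the three threshold examples.
TABLES = [
    (0.4, 50, 0, 40, 10),
    (0.6, 40, 10, 45, 5),
    (0.7, 30, 20, 50, 0),
]

def roc_point(tp, fn, tn, fp):
    """Return (1 - specificity, sensitivity) for one 2x2 table."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (1.0 - specificity, sensitivity)

# Sorting the tuples orders points by false positive rate, then sensitivity.
points = sorted([(0.0, 0.0), (1.0, 1.0)] +
                [roc_point(tp, fn, tn, fp) for _, tp, fn, tn, fp in TABLES])

def trapezoid_area(pts):
    """Area under a piecewise-linear curve through sorted (x, y) points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

approx_auc = trapezoid_area(points)
print(points)
print(f"approximate area: {approx_auc:.3f}")
```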
Areas Under the ROC Curve
• equivalent to the concordance index: (#concordant pairs + ½ #ties) / all pairs
• measure adequacy of risk ranking
• do not measure adequacy of risk estimates (neither collective nor individual)
[Figure: discordant pairs]
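The concordance index formula above can be computed by brute force over all pairs formed by one case with the event and one without. The estimates and outcomes below are made up for illustration.

```python
def concordance_index(estimates, outcomes):
    """(#concordant pairs + 0.5 * #ties) / all (event, non-event) pairs."""
    events = [p for p, y in zip(estimates, outcomes) if y == 1]
    non_events = [p for p, y in zip(estimates, outcomes) if y == 0]
    concordant = ties = 0
    for p_event in events:
        for p_non in non_events:
            if p_event > p_non:
                concordant += 1
            elif p_event == p_non:
                ties += 1
    total_pairs = len(events) * len(non_events)
    return (concordant + 0.5 * ties) / total_pairs

# Hypothetical data: 2 events and 3 non-events give 6 pairs.
estimates = [0.9, 0.4, 0.4, 0.2, 0.7]
outcomes = [1, 1, 0, 0, 0]
c_index = concordance_index(estimates, outcomes)
print(f"c-index: {c_index:.3f}")

# Halving every estimate preserves the ranking, so the index is unchanged.
c_index_halved = concordance_index([p / 2 for p in estimates], outcomes)
```

The last line makes the slide's second point concrete: a model whose estimates are all half of what they should be ranks patients identically and gets the same index, even though its risk estimates are badly off.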
[Figure: predicted / observed ratios for Framingham models tested on European populations]
Calibration
• Measures how close the average estimate is to the observed proportion
• Goodness-of-fit
– Hosmer-Lemeshow statistics
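Calibration
$$HL = \sum_{D=0}^{1} \sum_{l=1}^{3} \frac{(o_{Dl} - \pi_{Dl})^2}{\pi_{Dl}}$$
where $o_{Dl}$ is the observed and $\pi_{Dl}$ the expected number of cases with outcome $D$ in group $l$.
[Figure: cases grouped by estimate and split by outcome, D=0 and D=1]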
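A minimal sketch of a Hosmer-Lemeshow style statistic follows. It sorts cases by estimate, splits them into equal-size groups, and sums (observed - expected)² / expected over both outcomes in every group. The grouping rule, the default of 10 groups, and the choice to skip cells with zero expected count are assumptions of this sketch. It returns only the statistic, since a p-value would need a chi-square distribution function that the standard library does not provide.

```python
def hosmer_lemeshow(estimates, outcomes, groups=10):
    """Sum over groups and outcomes of (observed - expected)^2 / expected."""
    pairs = sorted(zip(estimates, outcomes))
    n = len(pairs)
    statistic = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        expected_events = sum(p for p, _ in chunk)
        observed_events = sum(y for _, y in chunk)
        expected_non = len(chunk) - expected_events
        observed_non = len(chunk) - observed_events
        cells = ((observed_non, expected_non), (observed_events, expected_events))
        for observed, expected in cells:
            if expected > 0:  # ASSUMPTION: skip cells with no expected cases
                statistic += (observed - expected) ** 2 / expected
    return statistic

# Toy check: four cases all estimated at 0.5, treated as a single group.
well_calibrated = hosmer_lemeshow([0.5] * 4, [0, 1, 0, 1], groups=1)
poorly_calibrated = hosmer_lemeshow([0.5] * 4, [1, 1, 1, 1], groups=1)
print(well_calibrated, poorly_calibrated)
```

Larger values indicate a wider gap between expected and observed counts, which is the sense in which the statistic measures calibration rather than ranking.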
Risk Score of Death
Unadjusted Overall Mortality Rate = 2.1%
[Figure: number of cases and mortality risk by risk score category (0 to 2, 3 to 4, 5 to 6, 7 to 8, 9 to 10, >10)]
Interventional Cardiology Models
Validation
• 5278 patients from BWH (2001-2004) (external validation set)
• Comparisons use areas under the ROC curve (AUC) and the Hosmer-Lemeshow goodness-of-fit statistic (deciles)
Calibration
Are predictions obtained from external models good for individual counseling?
APACHE II
• Mortality in intensive care units (ICUs)
• 12 physiologic predictors
Summary of all comparison studies in terms of discrimination (AUC)
Summary of HL-GOF H and C statistics. χ² values and degrees of freedom are listed where available; p values are listed otherwise.
Standardized mortality ratio in different study comparisons
[Figure: scatter plot of the simulated data in the (x1, x2) plane, with points marked by cluster (1 to 4) and outcome (Y=0 or Y=1)]
Simulated data set
Clusters 1 to 4 centered on (0,0), (0,1), (1,0), and (1,1), with Gaussian noise
True probability for clusters 1 to 4: 0.01, 0.40, 0.60, and 0.99
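A data set of this shape can be generated with a few lines of Python. The cluster centers and true probabilities are the ones stated above. The number of points per cluster (100), the noise standard deviation (0.15), and the seed are assumptions made for this sketch, since the slide does not give them.

```python
import random

CENTERS = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
TRUE_P = [0.01, 0.40, 0.60, 0.99]

def simulate(n_per_cluster=100, noise_sd=0.15, seed=0):
    """Return rows of (cluster, x1, x2, y) for four Gaussian clusters."""
    rng = random.Random(seed)
    rows = []
    for cluster, ((cx, cy), p) in enumerate(zip(CENTERS, TRUE_P), start=1):
        for _ in range(n_per_cluster):
            x1 = rng.gauss(cx, noise_sd)
            x2 = rng.gauss(cy, noise_sd)
            # The outcome depends only on the cluster's true probability.
            y = 1 if rng.random() < p else 0
            rows.append((cluster, x1, x2, y))
    return rows

data = simulate()
for cluster in range(1, 5):
    ys = [y for c, _, _, y in data if c == cluster]
    print(f"cluster {cluster}: observed proportion {sum(ys) / len(ys):.2f}")
```

Because the true probability is known for every point, this kind of simulation lets the model estimates be compared against a gold standard that real clinical data never provide.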
[Figure: the same scatter plot of the simulated data, panel labeled LR]
[Figure: the same scatter plot of the simulated data, panel labeled ANN]
Calibration Plot
[Figure: expected against observed counts (axes 0 to 40) for the neural network and logistic regression]
                            LR         Neural Network
Sum of squared errors       52.363     50.894
Mean squared error          0.130      0.127
Cross-entropy error         154.543    150.838
Mean cross-entropy error    0.386      0.377
Sum of residuals            103.226    100.412
Mean residual               0.2580     0.251
AUC                         0.889      0.895
HL-C                        6.437      11.773
p                           0.598      0.161
[Figure: proportion of events, average neural network estimate, and average logistic regression estimate for clusters 1 to 4]
                            LR         Neural Network
Cluster 2 min (GS: 0.4)     .20        .43
Cluster 2 max               .80        .58
Cluster 3 min (GS: 0.6)     .29        .65
Cluster 3 max               .85        .73
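The error measures tabulated for the two models can be computed as sketched below. Reading "residual" as the absolute difference between estimate and outcome is an assumption, and the two-case input is a toy example rather than data from the talk.

```python
import math

def error_measures(estimates, outcomes):
    """Sum and mean of squared error, cross-entropy, and absolute residuals."""
    n = len(estimates)
    pairs = list(zip(estimates, outcomes))
    squared = sum((p - y) ** 2 for p, y in pairs)
    # Cross-entropy requires estimates strictly between 0 and 1.
    cross_entropy = -sum(
        y * math.log(p) + (1 - y) * math.log(1 - p) for p, y in pairs
    )
    # ASSUMPTION: a residual is the absolute difference |estimate - outcome|.
    residuals = sum(abs(p - y) for p, y in pairs)
    return {
        "sum_squared_error": squared,
        "mean_squared_error": squared / n,
        "cross_entropy": cross_entropy,
        "mean_cross_entropy": cross_entropy / n,
        "sum_residuals": residuals,
        "mean_residual": residuals / n,
    }

# Toy example: one event estimated at 0.8 and one non-event estimated at 0.2.
measures = error_measures([0.8, 0.2], [1, 0])
for name, value in measures.items():
    print(f"{name}: {value:.4f}")
```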
[Figure: from genotype to phenotype: genome, transcription, RNA (transcriptome), translation, protein (proteome), metabolites, physiology (laboratory tests), phenotype (physical exam, imaging). Source: DOE]
Will we ever achieve “individualized” risk assessment?
If so, how can we evaluate it?
Acknowledgments
• NIH, Komen Foundation
• Fred Resnic, Michael Matheny