Added Predictive Ability

Download Report

Transcript Added Predictive Ability

A SAS Macro to Compute Added
Predictive Ability of New Markers in
Logistic Regression
Kevin Kennedy, MS
Saint Luke’s Hospital, Kansas City, MO
Kansas City Area SAS User Group Meeting
September 3, 2009
Acknowledgment
• A special thanks to Michael Pencina (PhD,
Biostatistics Dept, Boston University,
Harvard Clinical Research Institute) for his
valuable input, ideas, and additional
formulas to strengthen output.
Motivation
• Predicting dichotomous outcomes
are important
– Will a patient develop a disease?
– Will an applicant default on a loan?
– Will KSU win a game in the NCAA
tournament?
• Improving an already usable models
should be a continued goal
Motivation
• Many published models exist
– Risk of Bleeding after PCI
• Predicts patient risk of bleeding after a PCI
(Percutaneous Coronary Intervention) based on 10
patient characteristics: Age, gender, shock, prior
intervention, kidney function, etc..
• Example: a 78 year old female with Prior Shock and
Congestive Heart Failure has a ~8% chance of
bleeding after procedure (national avg=2-3%)
– Why important? We can treat those at high
risk appropriately
Mehta et al. Circ Intervention, June 2009
However…..
• These models aren’t etched in stone
• New markers (variables) should be
investigated to improve model
performance
• Important Question: How do we
determine if we should add these
new markers to the model?
Project Goal
• Compare:
model1: logit( y)  1 var1  .....  n varn
With
model2: logit( y)  1 var1  .....  n varn   n1 varn1
• How Much Does the new variable add to model
performance?
• Output of Interest: Predicted Probability of
Event (one for each model) computed with the
logistic regression model
Outline
• Traditional Comparisions
– Receiver Operating Characteristic
Curve
• New Measures
– IDI, NRI, Vickers Decision Curve
• SAS macro to obtain output
Traditional Approach-AUCs
• Common to plot the Receiver
Operating Characteristic Curve (ROC)
and report the area underneath (AUC)
or c-statistic
• Measures model discrimination
• Equivalent to the probability that the
predicted risk is higher for an event
than a non-even[4]
AUC/ROC Background
• ROC/AUC plot depicts trade-off between benefit (true
positives) and costs (false positives) by defining a “cut off”
to determine positive and negative individuals
Cut Off
Events
Non-Events
False Neg
Predicted Probabilities
True Positive Rate (Sensitivity)=.9
False Positive Rate (1-Specificity)=.4
ROC curve
100%
AUC  .9
True Positive Rate
(sensitivity)
AUC
1
P erfect Discrimination
~ .9
~ .7
 .65
.5
Yippy!
Ehh...
Ugh...
No Discrimination
0%
0%
100%
False Positive Rate
(1-specificity)
AUC Computation
• All possible pairs are made between events
and nonevents
– A dataset with 100 events and 1000 non-events
would have 100*1000=100000 pairs
– If the predicted probability for the event is higher
for the subject actually experiencing the event
give a ‘1’ (concordant) otherwise ‘0’ (discordant)
– C-statistic is the average of the 1’s and 0’s (.5 for
ties)
• Now use methods by DeLong to compare
AUCs of Model1 and Model2
Advantages
• Used all the time
• Recommended guidelines exist for
Excellent/Good/Poor discrimination
• Default output in proc logistic, with
new extensions in version 9.2 (roc
and roccontrast statements) along
with other computing packages
Disadvantages
• Rank based
– A comparison of .51 to .50 is treated the same
as .7 to .1
• Doesn’t imply a useful model
– Example: all events have probability=.51 and
non-event have probability=.5---Perfect
discrimination (c=1) but not useful
• Extremely hard to find markers that result
in high AUC
– Pepe [5] claims an odds ratio of 3 doesn’t yield
good discrimination
Alternatives
• Pencina and D’Agostino in 2008
(Statistics in Medicine) suggest 2
additional statistics:
– IDI (Integrated Discrimination Improvement)
– NRI (Net Reclassification Improvement)
• Vickers [2,3] developed graphical
techniques of comparing models
IDI
• Measure of improved sensitivity without sacrificing specificity
• Formula measures how much increase in ‘p’ for events, and how
much decrease for the non-event
Absolute_ IDI  ( p _ event _ 2  p _ event _ 1)  ( p _ nonevent_ 1  p _ nonevent_ 2)
Re lative_ IDI 
p _ event _ 2  p _ nonevent_ 2
1
p _ event _ 1  p _ nonevent_ 1
where:
p _ event _ 2 is the mean predictedprobability of the eventsin the2nd model
IDI
Absolute_IDI=(.15-.12)+(.25-.2)=.08
Model1
Model2
Mean Predicted Probability in %
50
Relative_IDI 
Mean ‘p’
decreased by 3%
for non-events
.25  .12
 1  1.6
.2  .15
160% Relative
improvement
Mean ‘p’
increased by
5% for events
40
30
25
20
20
15
12
10
0
Non-Events
Events
NRI
• Measures how well the new model
reclassifies event and non-event
• Dependant on how we decide to classify
observations
– Example: 0-10% Low, 10-20% Moderate, >20%
High
• Questions: do the patients experiencing
an event go up in risk? Mod to High
between model1 and 2
• Do Patients not experiencing an event go
down in risk? Moderate to Low?
NRI
• Formula:
 # Eventsmoving up   # Eventsmovingdown
NRI  

 
# of Events
# of Events
 


 # Non - eventsmovingdown  # Non - eventsmoving up 



# of Non - events
# of Non - events
 


NRI Computation Example
• 3 groups defined
– <10% (low)
– 10-20% (moderate)
– >20% (high)
• Each individual will have a group for
model 1 and 2
• 2 cross-tabulation tables (events and
non-events)
NRI Computation Example
• Crosstab1: Events (100 Events)
Model 2
Low Mod High
20 Events moving up
Model 1
10 Events moving down
Low 10
8
2
Mod 3
30
10
High 2
5
30
70 Events not moving
Net of 10/100 (10%) of
events getting
reclassified correctly
NRI Computation Example
• Crosstab2: Non-Events (200 Non-Events)
Model 2
Model 1
Low Mod High
Low
50
5
0
Mod
20
40
10
High
5
10
60
15 Non-events moving up
35 Non-events moving down
150 Non-events not moving
Net of 20/200 (10%) of nonevents getting reclassified
correctly
10   35
15 
 20
NRI  



 
  .2
 100 100   200 200 
NRI Caveats
• Dependant on Groups
– Would we reach similar conclusions with
groups: <15, 15-30, >30?????
• Alternative ways to define groups
– Any up/down movement. A change in p from
.15 to .151 would be an ‘up’ movement
– A threshold: ex. a change of 3% would
constitute an up/down. i.e. .33 to .37 would be
an ‘up’ but .33 to .34 would be no movement
• Good News: Macro handles all these
cases and you can request all at once
Vickers Decision Curve
• A graphical comparison of model 1 and 2 based off of ‘net’
benefit (first attributed to Peirce-1884)
• Useful if a threshold is important.
• Example: If a persons predicted probability of an outcome
is greater than 10% we treat with strategy A
– Here we’d want to compare the models at this threshold
True P ositives  FalseP ositives pt 

NetBenefit
 

1  pt  
n
n

Vickers Decision Curve
• Example:
– N=1000, dichotomous event, 10% as threshold
Predicted
Probability
Outcome
≥10%
<10%
True
Positive
count
Yes
No
80
200
20
False Positive
count
700
80  200
.1 
  .0578
NetBenefit
 

1000  1000 1  .1 
5.78 net true positive
results per 100
compared with treating
all as negative
Vickers Decision Curve
No difference
between 1 2 and
treat all
0.06
0.05
Net Benefit
0.04
Model 2 seems to outperform 1
0.03
0.02
0.01
0.00
-0.01
0
10
20
30
40
Threshold Probability in %
PLOT
Treat All
Model1
Model2
50
SAS Macro to Obtain Output
• %added_pred(data= ,id= , y= , model1cov= ,model2cov= ,
nripoints=ALL, nriallcut=%str(), vickersplot=FALSE,
vickerpoints=%str());
• Model1cov=initial model
• Model2cov=new model
• Nripoints=nri levels (eg: <10, 10-20, >20) insert nripoints=.1
.2
• Nriallcut=if you want to test amount of increase or decrease
instead of levels, i.e. if you want to know if a person
increase/decreases by 5% insert nriallcut=.05
• Vickerpoints=thresholds to test (eg, 10%)
AUC section of Macro
• AUC Comparisions are easy with
SAS 9.2
PROC LOGISTIC DATA=&DATA;
MODEL &y=&model1cov &model2cov;
ROC ‘FIRST’ &model1cov;
ROC ‘SECOND’ &model2cov;
ROCCONTRAST REFERENCE=‘FIRST’;
ODS OUTPUT ROCASSOCIATION=ROCASS
ROCCONTRASTESTIMATE=ROCDIFF;
RUN;
• If working from an earlier version the %roc macro
will be called from the SAS website[6]
AUC Section of Macro
• Sample output
• %added_pred(data=data, id=id, y=event,
model1cov=x1 x2 x3 x4, model2cov=x1 x2 x3 x4 x5,
…….);
Model1
AUC
Model2
AUC
Difference
in AUC
0.77907 0.79375 0.0147
Std Error for
Difference
P-value for
difference
95% CI for
Difference
.0042
0.0005
(0.00646,0.0229)
IDI Section of Macro
• Use proc logistic output dataset for
model1 and model2
PROC LOGISTIC DATA=&DATA;
MODEL &y=&model1cov;
OUTPUT OUT=OLD PRED=P_OLD;
RUN;
PROC LOGISTIC DATA=&DATA;
MODEL &y=&model2cov;
OUTPUT OUT=NEW PRED=P_NEW;
RUN;
proc sql noprint;
create table probs as select *, (p_new-p_old) as pdiff
from old(keep=&id &y &model1cov &model2cov p_old ) as a join new(keep=&id &y
&model1cov &model2cov p_new ) as b on a.&id=b.&id
order by &y;
quit;
• Now use proc means or sql to obtain:
p_event_new, p_event_old, p_nonevent_new,
p_nonevent_old
IDI Section of Macro
• Sample Output
IDI
IDI Std
Err
Z-value
Pvalue
95% CI
Probability
change for
events
Probability
change for
non-events
Relative
IDI
.0207
.0064
6.3
<.0001
(0.0143, 0.0272)
.0186
-.002125
.167
NRI Section of Macro
• In 3 parts:
– All groups (any up/down movement)
– User defined (eg. <10,10-20,>20)
– Threshold (eg. a change of 3%)
• Coding is more involved here
containing
– Counts for number of groups
– Do-loops for various # of thresholds
and user groups
NRI Section of Macro
User Defined Groups(eg. <10, 10-20,>20
nripoints=.1 .2)
%if &nripoints^=ALL %then %do;
%let numgroups=%eval(%words(&nripoints)+1); /*figure out how many ordinal groups*/
data nriprobs;
set probs;
/*define first ordinal group for pre and post*/
if 0<=p_old<=%scan(&nripoints,1,' ') then group_pre=1;
if 0<=p_new<=%scan(&nripoints,1,' ') then group_post=1;
%let i=1;
%do %until(&i>%eval(&numgroups-1));
if %scan(&nripoints,&i,' ')<p_old then do;group_pre=&i+1;end;
if %scan(&nripoints,&i,' ')<p_new then do;group_post=&i+1;end;
%let i=%eval(&i+1);
%end;
if &y=0 then do;
if group_post>group_pre then up_nonevent=1;
if group_post<group_pre then down_nonevent=1;
end;
if &y=1 then do;
if group_post>group_pre then up_event=1;
if group_post<group_pre then down_event=1
end;
if up_nonevent=. then up_nonevent=0;
if down_nonevent=. then down_nonevent=0;
if up_event=. then up_event=0;
if down_event=. then down_event=0;
run;
NRI Section of Macro
• Sample Output
• %added_pred(data=,…..,nripoints=.1 .2,
nriallcut=.03,…..);
Group
NRI
STD
ERR
Zvalue
PVALUE
95% CI
% of events
correctly
reclassified
% of non-event
correctly
reclassified
ALL
.454
.09
9.7
<.0001
(0.3649,0.5447)
(10%)
56%
CUT_.03
.101
.08
2.46
.014
(0.0205,0.1817)
4%
6%
USER
.127
.05
4.95
<.0001
(0.0769,0.177)
5%
8%
Vickers’ Section of Macro
• Default is no analysis
• Can test multiple thresholds
• Uses bootstrapping techniques to
create 95% CI’s
• If testing thresholds run time will
increase due to Bootstrapping
Vickers Section of Macro
%added_pred(data=,…..,vickersplot=TRUE,…..)
Vickers Decision Curve
0.11
0.10
0.09
Net Benefit
0.08
0.07
0.06
0.05
0.04
0.03
0.02
0.01
0.00
-0.01
0
10
20
30
40
50
60
70
Threshold Probability in %
PLOT
Treat All
Model1
Model2
80
90
100
Conclusions
• Don’t rely only on AUC and statistical
significance to access added marker
predictive ability, but a combination
of methods
• Future: extend to time-to-event
analysis
Q’s or Comments
• If you want to use the macro or obtain
literature contact me at:
• Email: [email protected] or
[email protected]
References
1) Pencina MJ, D'Agostino RB Sr, D’Agostino RB Jr. Evaluating the added
predictive ability of a new marker: From area under the ROC curve to
reclassification and beyond. Stat Med 2008; 27:157-72.
2) Vickers AJ, Elkin EB. Decision curve analysis: a novel method for
evaluating prediction models. Medical Decision Making. 2006 NovDec;26(6):565-74
3) Vickers AJ, Cronin AM, Elkin EB, Gonen M. Extensions to decision curve
analysis, A novel method for evaluating diagnostic tests, prediction
models and molecular markers. BMC Medical Informatics and
Decision Making. 2008 Nov 26;8(1):53.
4) Cook NR. Use and Misuse of the receiver operating characteristics curve
in risk prediction. Circulation 2007; 115:928-935.
5) Pepe MS, Janes H, Longton G, Leisenring W, Newcomb P. Limitations of
the odds ratio in gauging the performance of a diagnostic, prognostic, or
screening marker. Am J Epidemiol. 2004;159:882-890.
6) http://support.sas.com/kb/25/017.html