
Lecture 16:
Logistic Regression:
Goodness of Fit
Information Criteria
ROC analysis
BMTRY 701
Biostatistical Methods II
Goodness of Fit
 A test of how well the model explains the data
 Applies to linear models and generalized linear models
 How to do it?
 It is simply a comparison of the “current” model to a perfect model
 • What would the estimated likelihood function be in a perfect model?
 • What would the estimated log-likelihood function be in a perfect model?
Set up as a hypothesis test
 H0: current model
 H1: perfect model
 Recall the G² statistic comparing models (a small R sketch follows below):
   G² = Dev(0) − Dev(1)
 How many parameters are there in the null model?
 How many parameters are there in the perfect model?
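A minimal sketch of this comparison in R, assuming two nested glm fits named fit0 (null) and fit1 (larger); the names are placeholders:

   # G² test: difference in deviances referred to a chi-square
   # with df equal to the difference in numbers of parameters
   G2 <- deviance(fit0) - deviance(fit1)
   df <- df.residual(fit0) - df.residual(fit1)
   pchisq(G2, df, lower.tail = FALSE)   # p-value
   anova(fit0, fit1, test = "Chisq")    # built-in equivalent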
Goodness of Fit test
 Perfect model: assumed to be ‘saturated’ in most cases
 That is, there is a parameter for each combination of predictors
 In our model, that is likely to be close to N due to the number of continuous variables
 Define c = number of parameters in the saturated model
 Deviance goodness of fit: Dev(0)
 If Dev(H0) < χ²(c−p, 1−α), conclude H0
 If Dev(H0) > χ²(c−p, 1−α), conclude H1
 Why aren’t we subtracting deviances?
GoF test for Prostate Cancer Model
> mreg1 <- glm(cap.inv ~ gleason + log(psa) + vol + factor(dpros),
+              family=binomial)
> mreg0 <- glm(cap.inv ~ gleason + log(psa) + vol, family=binomial)
> mreg1

Coefficients:
   (Intercept)         gleason        log(psa)             vol
      -8.31383         0.93147         0.53422        -0.01507
factor(dpros)2  factor(dpros)3  factor(dpros)4
       0.76840         1.55109         1.44743

Degrees of Freedom: 378 Total (i.e. Null);  372 Residual
  (1 observation deleted due to missingness)
Null Deviance:     511.3
Residual Deviance: 377.1    AIC: 391.1
Test statistic: 377.1 ~ χ²(380 − 7) = χ²(373)
Threshold: χ²(373, 1−α) = 419.0339 (for α = 0.05)
p-value = 0.43
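A minimal sketch of this calculation in R, reusing mreg1 from above:

   # deviance goodness of fit: refer the residual deviance to a
   # chi-square with c - p = 380 - 7 = 373 degrees of freedom
   dev <- deviance(mreg1)               # 377.1
   df  <- 380 - 7                       # 373
   qchisq(0.95, df)                     # threshold: 419.0339
   pchisq(dev, df, lower.tail = FALSE)  # p-value: 0.43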
More Goodness of Fit
 There are a lot of options!
 Deviance GoF is just one
 • Pearson Chi-square
 • Hosmer-Lemeshow
 • etc.
 Principles, however, are essentially the same
 GoF is not that commonly seen in medical research because it is rarely very important
Information Criteria
 An information criterion is a measure of the goodness of fit of an estimated statistical model.
 It is grounded in the concept of entropy:
 • offers a relative measure of the information lost
 • describes the tradeoff between precision and complexity of the model
 An IC is not a test on the model in the sense of hypothesis testing; it is a tool for model selection.
 Given a data set, several competing models may be ranked according to their IC
 The model with the lowest IC is chosen as the “best”
Information Criteria
 IC rewards goodness of fit, but also includes a penalty that is an increasing function of the number of estimated parameters.
 This penalty discourages overfitting.
 The IC methodology attempts to find the model that best explains the data with a minimum of free parameters.
 More traditional approaches, such as the likelihood ratio test (LRT), start from a null hypothesis.
 IC judges a model by how close its fitted values tend to be to the true values.
 The AIC value assigned to a model is only meant to rank competing models and tell you which is the best among the given alternatives.
Akaike Information Criterion (AIC)
   AIC = −2 log(Lik) + 2p
Akaike, Hirotugu (1974). "A new look at the statistical model identification". IEEE Transactions on Automatic Control 19 (6): 716–723.
Bayesian Information Criterion (BIC)
   BIC = −2 log(Lik) + p ln(N)
Schwarz, Gideon E. (1978). "Estimating the dimension of a model". Annals of Statistics 6 (2): 461–464.
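A minimal sketch in R, reusing mreg1 from earlier (AIC() and BIC() are built into the stats package):

   ll <- logLik(mreg1)                          # log-likelihood of the fit
   p  <- attr(ll, "df")                         # number of estimated parameters
   -2 * as.numeric(ll) + 2 * p                  # AIC by hand
   -2 * as.numeric(ll) + p * log(nobs(mreg1))   # BIC by hand
   AIC(mreg1)                                   # built-in equivalents
   BIC(mreg1)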
AIC versus BIC
Penalty terms: 2p vs. p ln(N)
 BIC and AIC are similar
 Different penalty for number of parameters
 The BIC penalizes free parameters more strongly than does the AIC.
 Implications: BIC tends to choose smaller models
 The larger the N, the more likely that AIC and BIC will disagree on model selection
Prostate cancer models
 We looked at different forms for volume:
A: volume as continuous
B: volume as binary (detectable vs. undetectable)
C: 4 categories of volume
D: 3 categories of volume
E: linear + squared term for volume
AIC vs. BIC (N=380)

                    p   -2logLik     AIC     BIC
A: continuous       8      376.0   392.0   423.5
B: binary           8      375.2   391.2   422.7
C: 4 categories    10      373.6   393.6   433.0
D: 3 categories     9      375.2   393.2   428.6
E: quadratic        9      376.0   394.0   429.4
AIC vs. BIC if N is multiplied by 10 (N=3800)

                    p   -2logLik      AIC      BIC
A: continuous       8     3760.0   3776.0   3825.9
B: binary           8     3752.0   3768.0   3817.9
C: 4 categories    10     3736.0   3756.0   3818.4
D: 3 categories     9     3751.9   3769.9   3826.1
E: quadratic        9     3760.0   3778.0   3834.2

Note: at N=380 both criteria favor model B; at N=3800 AIC favors model C while BIC still favors model B, illustrating the disagreement at large N.
ROC curve analysis
 Receiver Operating Characteristic Curve Analysis
 Traditionally, looks at the sensitivity and specificity of a ‘model’ for predicting an outcome
 Question: based on our model, can we accurately predict if a prostate cancer patient has capsular penetration?
ROC curve analysis
 Association between predictors and outcomes is not enough
 Need a ‘stronger’ relationship
 Classic interpretation of sens and spec:
 • a binary test and a binary outcome
 • sensitivity = P(test + | true disease)
 • specificity = P(test − | truly no disease)
 What is test + in our dataset?
 What does the model provide for us?
ROC curve analysis
[Figure: sensitivity and specificity plotted against the probability cutoff; both axes run from 0.00 to 1.00]
Fitted probabilities
 The fitted probabilities are the probability that a NEW patient with the same ‘covariate profile’ will be a “case” (e.g., capsular penetration, disease, etc.)
 We select a probability ‘threshold’ to determine whether a patient is defined as a case or not
 Some options (see the sketch after this list):
 • high sensitivity (e.g., cancer screens)
 • high specificity (e.g., PPD skin test for TB)
 • maximize the sum of sens and spec
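A minimal sketch of the sensitivity/specificity tradeoff in R, assuming y is the 0/1 outcome and phat the fitted probabilities (e.g., phat <- fitted(mreg1)); both names are placeholders:

   # sensitivity and specificity over a grid of probability cutoffs
   cutoffs <- seq(0, 1, by = 0.05)
   sens <- sapply(cutoffs, function(cc) mean(phat[y == 1] >= cc))  # P(test + | D)
   spec <- sapply(cutoffs, function(cc) mean(phat[y == 0] <  cc))  # P(test - | ~D)
   plot(cutoffs, sens, type = "l", xlab = "Probability cutoff", ylab = "Probability")
   lines(cutoffs, spec, lty = 2)   # dashed line: specificity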
ROC curve
. xi: logit capsule i.dpros detected gleason logpsa
i.dpros           _Idpros_1-4         (naturally coded; _Idpros_1 omitted)

Iteration 0:  log likelihood = -255.62831
Iteration 1:  log likelihood = -193.51543
Iteration 2:  log likelihood = -188.23598
Iteration 3:  log likelihood = -188.04747
Iteration 4:  log likelihood =  -188.0471

Logistic regression                               Number of obs   =        379
                                                  LR chi2(6)      =     135.16
                                                  Prob > chi2     =     0.0000
Log likelihood = -188.0471                        Pseudo R2       =     0.2644

------------------------------------------------------------------------------
     capsule |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   _Idpros_2 |   .7801903   .3573241     2.18   0.029      .079848    1.480533
   _Idpros_3 |   1.606646   .3744828     4.29   0.000     .8726729    2.340618
   _Idpros_4 |   1.504732   .4495287     3.35   0.001     .6236723    2.385793
    detected |  -.5719155   .2570359    -2.23   0.026    -1.075697   -.0681344
     gleason |   .9418179   .1648245     5.71   0.000     .6187677    1.264868
      logpsa |   .5152153   .1547649     3.33   0.001     .2118817    .8185488
       _cons |  -8.275811   1.056036    -7.84   0.000     -10.3456   -6.206018
------------------------------------------------------------------------------
ROC curve
[Figure: ROC curve, sensitivity vs. 1 − specificity; both axes run from 0.00 to 1.00. Area under ROC curve = 0.8295]
How to interpret?
 Every point represents a patient (or patients) in the dataset
 Question: if we use that person’s fitted probability as the threshold, what are the sens and spec values?
 Empirically driven based on the fitted probabilities
 Choosing the threshold (see the sketch below):
 • high sens or spec
 • maximize both? the point on the ROC curve closest to the upper left corner
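A minimal sketch using the pROC package in R (an assumption on my part; the lecture’s curves were drawn in Stata). Assumes y and phat as before:

   library(pROC)                # install.packages("pROC") if needed
   r <- roc(y, phat)            # empirical ROC from outcome and fitted probs
   auc(r)                       # area under the curve
   plot(r)                      # the ROC curve itself
   # threshold closest to the upper-left corner of the ROC curve:
   coords(r, "best", best.method = "closest.topleft")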
AUC of ROC curve
 AUC = Area Under the Curve
 0.5 ≤ AUC ≤ 1
 AUC = 1 if the model is perfect
 AUC = 0.50 if the model is no better than chance
 “Good” AUC?
 • context specific
 • for some outcomes, there are already good diagnostic measures, so the AUC would need to be very high
 • for others, where there is very little available, even an AUC of 0.70 would be useful
Utility in model selection
 If the goal of the modeling is prediction, AUC can be used to determine the ‘best’ model
 A variable may be associated with the outcome, but not add much in terms of prediction
 Example (compared in the figure and sketch below):
 • Model 1: gleason + logPSA + detectable + dpros
 • Model 2: gleason + logPSA + detectable
 • Model 3: gleason + logPSA
ROC curve of models 1, 2, and 3
[Figure: ROC curves, true positive rate vs. false positive rate; 1: AUC=0.83, 2: AUC=0.80, 3: AUC=0.79]
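A minimal sketch of that comparison, assuming the variables live in a data frame dat (a hypothetical name) and pROC is loaded as above:

   # fit the three nested models and compute each AUC
   m1 <- glm(capsule ~ gleason + logpsa + detected + factor(dpros),
             family = binomial, data = dat)
   m2 <- glm(capsule ~ gleason + logpsa + detected, family = binomial, data = dat)
   m3 <- glm(capsule ~ gleason + logpsa, family = binomial, data = dat)
   sapply(list(m1, m2, m3),
          function(m) as.numeric(auc(roc(m$y, fitted(m)))))  # ~0.83, 0.80, 0.79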
Sensitivity and Specificity
 For ‘true’ use, you need to choose a cutoff.
 The AUC of the ROC curve tells you about the predictive ability of the model
 But it is not directly translatable into the ‘accuracy’ of a given threshold
phat = 0.50 cutoff

Logistic model for capsule
              -------- True --------
Classified |         D          ~D  |      Total
-----------+------------------------+----------
     +     |       100          39  |        139
     -     |        53         187  |        240
-----------+------------------------+----------
     Total |       153         226  |        379

Classified + if predicted Pr(D) >= .5
True D defined as capsule != 0
--------------------------------------------------
Sensitivity                     Pr( +| D)   65.36%
Specificity                     Pr( -|~D)   82.74%
Positive predictive value       Pr( D| +)   71.94%
Negative predictive value       Pr(~D| -)   77.92%
--------------------------------------------------
False + rate for true ~D        Pr( +|~D)   17.26%
False - rate for true D         Pr( -| D)   34.64%
False + rate for classified +   Pr(~D| +)   28.06%
False - rate for classified -   Pr( D| -)   22.08%
--------------------------------------------------
Correctly classified                        75.73%
--------------------------------------------------
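These tables come from Stata’s classification output; a minimal R sketch of the same bookkeeping, assuming y and phat as before:

   cutoff <- 0.50                            # rerun with 0.25 for the next table
   pred <- as.numeric(phat >= cutoff)        # classified + if Pr(D) >= cutoff
   tab  <- table(Classified = pred, True = y)
   sens <- tab["1", "1"] / sum(tab[, "1"])   # P(+ | D)
   spec <- tab["0", "0"] / sum(tab[, "0"])   # P(- | ~D)
   ppv  <- tab["1", "1"] / sum(tab["1", ])   # P(D | +)
   npv  <- tab["0", "0"] / sum(tab["0", ])   # P(~D | -)
   c(sens = sens, spec = spec, ppv = ppv, npv = npv)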
phat = 0.25 cutoff

Logistic model for capsule
              -------- True --------
Classified |         D          ~D  |      Total
-----------+------------------------+----------
     +     |       137          96  |        233
     -     |        16         130  |        146
-----------+------------------------+----------
     Total |       153         226  |        379

Classified + if predicted Pr(D) >= .25
True D defined as capsule != 0
--------------------------------------------------
Sensitivity                     Pr( +| D)   89.54%
Specificity                     Pr( -|~D)   57.52%
Positive predictive value       Pr( D| +)   58.80%
Negative predictive value       Pr(~D| -)   89.04%
--------------------------------------------------
False + rate for true ~D        Pr( +|~D)   42.48%
False - rate for true D         Pr( -| D)   10.46%
False + rate for classified +   Pr(~D| +)   41.20%
False - rate for classified -   Pr( D| -)   10.96%
--------------------------------------------------
Correctly classified                        70.45%
--------------------------------------------------