Lecture 16: Logistic Regression
Goodness of Fit, Information Criteria, ROC Analysis
BMTRY 701: Biostatistical Methods II
Goodness of Fit
A test of how well the model explains the data
Applies to linear models and generalized linear models
How to do it?
It is simply a comparison of the “current” model to a perfect model
• What would the estimated likelihood function be in a perfect model?
• What would the estimated log-likelihood function be in a perfect model?
Set up as a hypothesis test:
H0: current model
H1: perfect model
Recall the G² statistic comparing models:
G² = Dev(0) - Dev(1)
How many parameters are there in the null model?
How many parameters are there in the perfect model?
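As a sketch in R, the G² comparison of two nested fits is a likelihood ratio test (using the mreg0 and mreg1 fits shown later in this lecture):

# G^2 = Dev(0) - Dev(1): chi-square test comparing the nested models
# mreg0 (reduced) and mreg1 (full) are the glm fits from the next slides
anova(mreg0, mreg1, test = "Chisq")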
Goodness of Fit test
Perfect model: assumed to be ‘saturated’ in most cases
That is, there is a parameter for each combination of predictors
In our model, that is likely to be close to N due to the number of continuous variables
Define c = number of parameters in the saturated model
Deviance goodness of fit: Dev(H0)
Goodness of Fit test
Deviance goodness of fit: Dev(H0)
If Dev(H0) < χ²(c-p, 1-α), conclude H0
If Dev(H0) > χ²(c-p, 1-α), conclude H1
Why aren’t we subtracting deviances? (Because the deviance of the saturated model is 0, the difference reduces to Dev(H0) itself.)
GoF test for Prostate Cancer Model
> mreg1 <- glm(cap.inv ~ gleason + log(psa) + vol + factor(dpros),
+              family=binomial)
> mreg0 <- glm(cap.inv ~ gleason + log(psa) + vol, family=binomial)
> mreg1

Coefficients:
   (Intercept)         gleason        log(psa)             vol
      -8.31383         0.93147         0.53422        -0.01507
factor(dpros)2  factor(dpros)3  factor(dpros)4
       0.76840         1.55109         1.44743

Degrees of Freedom: 378 Total (i.e. Null);  372 Residual
  (1 observation deleted due to missingness)
Null Deviance:     511.3
Residual Deviance: 377.1    AIC: 391.1
Test statistic: Dev(H0) = 377.1 ~ χ²(380 - 7) = χ²(373)
Threshold: χ²(373, 1-α) = 419.0339 for α = 0.05
p-value = 0.43
Since 377.1 < 419.03, we conclude H0: no evidence of lack of fit
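These numbers can be reproduced in R (a sketch, using the slide’s c - p = 373 degrees of freedom):

# deviance GoF: compare the residual deviance to chi-square with c - p df
pchisq(377.1, df = 373, lower.tail = FALSE)   # p-value, ~0.43
qchisq(0.95, df = 373)                        # threshold, ~419.03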
More Goodness of Fit
There are a lot of options!
Deviance GoF is just one:
• Pearson chi-square
• Hosmer-Lemeshow
• etc.
The principles, however, are essentially the same
GoF is not that commonly seen in medical research because it is rarely very important
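As one example, a sketch of the Hosmer-Lemeshow test in R, assuming the ResourceSelection package is available and mreg1 is the fit from earlier:

# Hosmer-Lemeshow: bin patients into g groups by fitted probability
# and compare observed vs. expected case counts in each group
library(ResourceSelection)
hoslem.test(mreg1$y, fitted(mreg1), g = 10)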
Information Criteria
An information criterion (IC) is a measure of the goodness of fit of an estimated statistical model
It is grounded in the concept of entropy:
• offers a relative measure of the information lost
• describes the tradeoff between precision and complexity of the model
An IC is not a test on the model in the sense of hypothesis testing
It is a tool for model selection
Given a data set, several competing models may be ranked according to their IC
The model with the lowest IC is chosen as the “best”
Information Criteria
An IC rewards goodness of fit, but also includes a penalty that is an increasing function of the number of estimated parameters
This penalty discourages overfitting
The IC methodology attempts to find the model that best explains the data with a minimum of free parameters
More traditional approaches, such as the LRT, start from a null hypothesis
An IC judges a model by how close its fitted values tend to be to the true values
The AIC value assigned to a model is only meant to rank competing models and tell you which is best among the given alternatives
Akaike Information Criterion (AIC)
AIC = -2 log(Lik) + 2p
Akaike, Hirotugu (1974). "A new look at the statistical model identification". IEEE Transactions on Automatic Control 19 (6): 716–723.
Bayesian Information Criterion (BIC)
BIC = -2 log(Lik) + p ln(N)
Schwarz, Gideon E. (1978). "Estimating the dimension of a model". Annals of Statistics 6 (2): 461–464.
AIC versus BIC
Penalty terms: 2p vs. p ln(N)
BIC and AIC are similar
Different penalty for the number of parameters
The BIC penalizes free parameters more strongly than does the AIC
Implication: BIC tends to choose smaller models
The larger the N, the more likely that AIC and BIC will disagree on model selection
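A minimal R sketch (assuming the mreg1 fit from earlier) showing both criteria and their computation by hand:

# AIC and BIC for a fitted glm; BIC's p*log(N) penalty exceeds
# AIC's 2p penalty whenever log(N) > 2, i.e., N > ~7.4
AIC(mreg1)
BIC(mreg1)
# by hand from the log-likelihood:
ll <- logLik(mreg1)
p  <- attr(ll, "df")                          # number of estimated parameters
-2 * as.numeric(ll) + 2 * p                   # AIC
-2 * as.numeric(ll) + p * log(nobs(mreg1))    # BIC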
Prostate cancer models
We looked at different forms for volume:
A: volume as continuous
B: volume as binary (detectable vs. undetectable)
C: 4 categories of volume
D: 3 categories of volume
E: linear + squared term for volume
AIC vs. BIC (N = 380)

Model             p    -2logLik   AIC     BIC
A: continuous     8    376.0      392.0   423.5
B: binary         8    375.2      391.2   422.7
C: 4 categories   10   373.6      393.6   433.0
D: 3 categories   9    375.2      393.2   428.6
E: quadratic      9    376.0      394.0   429.4
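As a quick check, the AIC and BIC columns follow directly from p and -2logLik (small rounding differences are expected):

# reproduce the AIC and BIC columns of the table above
m2ll <- c(A = 376.0, B = 375.2, C = 373.6, D = 375.2, E = 376.0)
p    <- c(A = 8, B = 8, C = 10, D = 9, E = 9)
m2ll + 2 * p          # AIC column
m2ll + p * log(380)   # BIC column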
AIC vs. BIC if N is multiplied by 10 (N = 3800)

Model             p    -2logLik   AIC      BIC
A: continuous     8    3760.0     3776.0   3825.9
B: binary         8    3752.0     3768.0   3817.9
C: 4 categories   10   3736.0     3756.0   3818.4
D: 3 categories   9    3751.9     3769.9   3826.1
E: quadratic      9    3760.0     3778.0   3834.2

Note: at N = 380, AIC and BIC both favor model B; at N = 3800, AIC favors model C while BIC still favors model B
ROC curve analysis
Receiver Operating Characteristic (ROC) Curve Analysis
Traditionally, looks at the sensitivity and specificity of a ‘model’ for predicting an outcome
Question: based on our model, can we accurately predict if a prostate cancer patient has capsular penetration?
ROC curve analysis
Association between predictors and outcome is not enough
Need a ‘stronger’ relationship
Classic interpretation of sens and spec:
• a binary test and a binary outcome
• sensitivity = P(test + | true disease)
• specificity = P(test - | true no disease)
What is “test +” in our dataset?
What does the model provide for us?
ROC curve analysis
[Figure: sensitivity and specificity plotted against the probability cutoff; both axes run from 0.00 to 1.00.]
Fitted probabilities
The fitted probabilities are the probability that a NEW patient with the same ‘covariate profile’ will be a “case” (e.g., capsular penetration, disease, etc.)
We select a probability ‘threshold’ to determine whether a patient is defined as a case or not
Some options:
• high sensitivity (e.g., cancer screens)
• high specificity (e.g., PPD skin test for TB)
• maximize the sum of sens and spec
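A minimal R sketch of how sensitivity and specificity change with the threshold, assuming the mreg1 fit from earlier (sens_spec is a hypothetical helper):

# sensitivity and specificity of the fitted model at a given cutoff
sens_spec <- function(fit, cutoff) {
  y    <- fit$y                   # observed 0/1 outcome
  pred <- fitted(fit) >= cutoff   # classify + if fitted prob >= cutoff
  c(sensitivity = mean(pred[y == 1]),
    specificity = mean(!pred[y == 0]))
}
sens_spec(mreg1, 0.50)   # favors specificity
sens_spec(mreg1, 0.25)   # favors sensitivity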
ROC curve
. xi: logit capsule i.dpros detected gleason logpsa
i.dpros           _Idpros_1-4         (naturally coded; _Idpros_1 omitted)

Iteration 0:  log likelihood = -255.62831
Iteration 1:  log likelihood = -193.51543
Iteration 2:  log likelihood = -188.23598
Iteration 3:  log likelihood = -188.04747
Iteration 4:  log likelihood =  -188.0471

Logistic regression                               Number of obs   =        379
                                                  LR chi2(6)      =     135.16
                                                  Prob > chi2     =     0.0000
Log likelihood = -188.0471                        Pseudo R2       =     0.2644

------------------------------------------------------------------------------
     capsule |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   _Idpros_2 |   .7801903   .3573241     2.18   0.029      .079848    1.480533
   _Idpros_3 |   1.606646   .3744828     4.29   0.000     .8726729    2.340618
   _Idpros_4 |   1.504732   .4495287     3.35   0.001     .6236723    2.385793
    detected |  -.5719155   .2570359    -2.23   0.026    -1.075697   -.0681344
     gleason |   .9418179   .1648245     5.71   0.000     .6187677    1.264868
      logpsa |   .5152153   .1547649     3.33   0.001     .2118817    .8185488
       _cons |  -8.275811   1.056036    -7.84   0.000     -10.3456   -6.206018
------------------------------------------------------------------------------
ROC curve
[Figure: ROC curve, sensitivity vs. 1 - specificity; area under ROC curve = 0.8295.]
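The output above is from Stata; a comparable sketch in R, assuming the pROC package and the mreg1 fit from earlier:

# ROC curve and AUC from the model's fitted probabilities
library(pROC)
roc1 <- roc(mreg1$y, fitted(mreg1))
plot(roc1)   # ROC curve
auc(roc1)    # area under the curve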
How to interpret?
Every point represents a patient (or patients) in the dataset
Question: if we use that person’s fitted probability as the threshold, what are the sens and spec values?
Empirically driven based on the fitted probabilities
Choosing the threshold:
• high sens or spec
• maximize both? the point on the ROC curve closest to the upper left corner
AUC of ROC curve
AUC = Area Under the Curve
0.5 ≤ AUC ≤ 1
AUC = 1 if the model is perfect
AUC = 0.50 if the model is no better than chance
“Good” AUC?
• context specific
• for some outcomes, there are already good diagnostic measures, so the AUC would need to be very high
• for others, where little else is available, even an AUC of 0.70 would be useful
Utility in model selection
If the goal of the modeling is prediction, AUC can be used to determine the ‘best’ model
A variable may be associated with the outcome, but not add much in terms of prediction, as the sketch after this example illustrates
Example:
• Model 1: gleason + logPSA + detectable + dpros
• Model 2: gleason + logPSA + detectable
• Model 3: gleason + logPSA
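A sketch of this comparison in R with pROC, where m1, m2, and m3 are hypothetical glm fits of Models 1–3:

# a variable can be 'significant' yet barely move the AUC;
# m1, m2, m3 are assumed fits of Models 1-3 above
library(pROC)
for (m in list(m1, m2, m3)) {
  print(auc(roc(m$y, fitted(m))))
}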
[Figure: ROC curves of models 1, 2, and 3 (true positive rate vs. false positive rate); AUC = 0.83, 0.80, and 0.79, respectively.]
Sensitivity and Specificity
For ‘true’ use, you need to choose a cutoff
The AUC of the ROC curve tells you about the predictive ability of the model
But it is not directly translatable into the ‘accuracy’ of a given threshold
phat = 0.50 cutoff
Logistic model for capsule

              -------- True --------
Classified |         D            ~D |      Total
-----------+--------------------------+----------
     +     |       100            39 |        139
     -     |        53           187 |        240
-----------+--------------------------+----------
   Total   |       153           226 |        379

Classified + if predicted Pr(D) >= .5
True D defined as capsule != 0
--------------------------------------------------
Sensitivity                     Pr( +| D)   65.36%
Specificity                     Pr( -|~D)   82.74%
Positive predictive value       Pr( D| +)   71.94%
Negative predictive value       Pr(~D| -)   77.92%
--------------------------------------------------
False + rate for true ~D        Pr( +|~D)   17.26%
False - rate for true D         Pr( -| D)   34.64%
False + rate for classified +   Pr(~D| +)   28.06%
False - rate for classified -   Pr( D| -)   22.08%
--------------------------------------------------
Correctly classified                        75.73%
--------------------------------------------------
phat = 0.25 cutoff
Logistic model for capsule

              -------- True --------
Classified |         D            ~D |      Total
-----------+--------------------------+----------
     +     |       137            96 |        233
     -     |        16           130 |        146
-----------+--------------------------+----------
   Total   |       153           226 |        379

Classified + if predicted Pr(D) >= .25
True D defined as capsule != 0
--------------------------------------------------
Sensitivity                     Pr( +| D)   89.54%
Specificity                     Pr( -|~D)   57.52%
Positive predictive value       Pr( D| +)   58.80%
Negative predictive value       Pr(~D| -)   89.04%
--------------------------------------------------
False + rate for true ~D        Pr( +|~D)   42.48%
False - rate for true D         Pr( -| D)   10.46%
False + rate for classified +   Pr(~D| +)   41.20%
False - rate for classified -   Pr( D| -)   10.96%
--------------------------------------------------
Correctly classified                        70.45%
--------------------------------------------------
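A minimal R sketch reproducing this kind of cross-classification, assuming the mreg1 fit from earlier:

# classify each patient at a chosen cutoff, then cross-tabulate
# against the observed outcome and summarize accuracy
pred <- ifelse(fitted(mreg1) >= 0.25, "+", "-")
true <- ifelse(mreg1$y == 1, "D", "~D")
addmargins(table(Classified = pred, True = true))
mean(pred[true == "D"] == "+")         # sensitivity
mean(pred[true == "~D"] == "-")        # specificity
mean((pred == "+") == (true == "D"))   # correctly classified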