Logistic Regression

LOGISTIC REGRESSION
A statistical procedure to relate the
probability of an event to explanatory
variables
Used in epidemiology to describe and
evaluate the effect of a risk on the
occurrence of a disease event.
Example: Framingham Heart Study
Coronary heart disease and blood
pressure
LOGISTIC REGRESSION: AN EXAMPLE
Event: Coronary Heart Disease
Occurrence is the dependent variable,
which takes 2 values: Yes or No.
Risk factor: Blood pressure
Systolic blood pressure is the independent
variable X, a continuous measurement.
The probability of getting coronary heart
disease depends on blood pressure.
DATA
MAN        SYSTOLIC BP (mmHg)   DEVELOPED CHD   CODED (1 = YES, 0 = NO)
John       130                  NO              0
Steven     140                  NO              0
Sean       145                  NO              0
Brian      150                  NO              0
Michael    155                  YES             1
Terry      160                  NO              0
Joseph     165                  YES             1
Patrick    170                  YES             1
Teddy      175                  YES             1
Ryan       180                  YES             1
...        ...                  ...             ...
SCATTER PLOT
[Figure: scatter plot of CHD (vertical axis, 0.0 to 1.0) against systolic blood pressure (horizontal axis, 120 to 200 mmHg)]
LINEAR REGRESSION FOR Prob.(CHD):
NOT A GOOD IDEA!
[Figure: straight-line fit of Prob(CHD) against systolic blood pressure (120 to 200 mmHg); the vertical axis runs from -0.4 to 1.2, beyond the valid range for a probability]
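To see why the straight line is a poor model here, the following minimal sketch fits ordinary least squares to the ten rows shown on the data slide (using their 0/1 CHD codes) and evaluates the line at both ends of the plotted SBP range. The function name `fit_line` is illustrative and not from the slides.

```python
# Ten illustrative rows from the data slide: SBP (mmHg) and CHD coded 0/1.
sbp = [130, 140, 145, 150, 155, 160, 165, 170, 175, 180]
chd = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

intercept, slope = fit_line(sbp, chd)

# Evaluate the fitted line at the ends of the plotted range.
for x in (120, 200):
    print(x, round(intercept + slope * x, 2))
```

On these rows the fitted "probability" is negative at 120 mmHg and greater than 1 at 200 mmHg, which is the problem the slide title points to.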
PROPORTION WITH CHD
BY SBP GROUP
Systolic BP Range   CHD / Total   Proportion
130-149 mmHg        0/3           0.00
150-169 mmHg        2/4           0.50
170-189 mmHg        3/3           1.00
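The grouped proportions can be reproduced from the ten rows on the data slide. This is a small sketch using the 0/1 CHD codes; the helper name `proportion_by_group` is an assumption made for illustration.

```python
# Ten illustrative rows from the data slide: SBP (mmHg) and CHD coded 0/1.
sbp = [130, 140, 145, 150, 155, 160, 165, 170, 175, 180]
chd = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

# SBP ranges used on the slide (inclusive bounds, mmHg).
groups = [(130, 149), (150, 169), (170, 189)]

def proportion_by_group(xs, ys, bounds):
    """Return (low, high, events, total) for each SBP range."""
    rows = []
    for low, high in bounds:
        outcomes = [y for x, y in zip(xs, ys) if low <= x <= high]
        rows.append((low, high, sum(outcomes), len(outcomes)))
    return rows

for low, high, events, total in proportion_by_group(sbp, chd, groups):
    print(f"{low}-{high} mmHg: {events}/{total} = {events / total:.2f}")
```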
LOGISTIC REGRESSION
PROBABILITY MODEL
$$p(X) = \frac{1}{1 + \exp(-b_0 - b_1 X)}$$
The probability of the event varies
as an S-shaped function of the
risk factor X: the logistic curve.
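A short sketch of the logistic curve as a function. The coefficients below are hypothetical values chosen only to show the S shape; they are not the fitted values from the study.

```python
import math

def logistic_probability(x, b0, b1):
    """p(X) = 1 / (1 + exp(-b0 - b1 * X))."""
    return 1.0 / (1.0 + math.exp(-b0 - b1 * x))

# Hypothetical coefficients, for illustration only.
b0, b1 = -4.0, 0.5

# The curve rises slowly, then steeply around X = -b0 / b1, then levels off.
for x in (0, 4, 8, 12, 16):
    print(x, round(logistic_probability(x, b0, b1), 3))
```

The probability is exactly 0.5 where b0 + b1X = 0 (here X = 8), and it stays strictly between 0 and 1 for every X.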
LOGISTIC CURVE MODEL:
OCCURRENCE OF CHD AS A FUNCTION OF SBP
[Figure: logistic curve of the probability of CHD (vertical axis, 0 to 1) against systolic blood pressure (horizontal axis, 0 to 300)]
prob. = 1/{1 + exp(6.08 - 0.0243(SBP))}
LOGISTIC MODEL: LOG ODDS
$$\log \frac{p(X)}{1 - p(X)} = b_0 + b_1 X$$
The log of the odds of the event is
a linear function of X.
Log(odds of CHD) = - 6.08 + 0.0243(SBP)
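The step from the probability model to the log-odds form can be written out as follows, using the same symbols b0 and b1:

```latex
\begin{aligned}
p(X) &= \frac{1}{1 + \exp(-b_0 - b_1 X)} \\
1 - p(X) &= \frac{\exp(-b_0 - b_1 X)}{1 + \exp(-b_0 - b_1 X)} \\
\frac{p(X)}{1 - p(X)} &= \frac{1}{\exp(-b_0 - b_1 X)} = \exp(b_0 + b_1 X) \\
\log \frac{p(X)}{1 - p(X)} &= b_0 + b_1 X
\end{aligned}
```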
ODDS
The odds of an event is the
chance that the event occurs
divided by the chance of its not
occurring:
Odds = p/(1 - p) = p/q
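A minimal sketch of the conversion between probability and odds in both directions. The function names are illustrative.

```python
def odds_from_probability(p):
    """Odds = p / (1 - p), for 0 <= p < 1."""
    return p / (1.0 - p)

def probability_from_odds(odds):
    """Inverse conversion: p = odds / (1 + odds)."""
    return odds / (1.0 + odds)

# A probability of 0.5 is odds of 1 (even odds); 0.8 is odds of about 4.
for p in (0.2, 0.5, 0.8):
    print(p, round(odds_from_probability(p), 2))
```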
b1: KEY PARAMETER
OF THE LOGISTIC MODEL
$$\log \frac{p(X)}{1 - p(X)} = b_0 + b_1 X$$
The parameter b1 is like the slope of a
linear regression model.
b1 = 0 indicates that X has no effect on
the probability, e.g., a man’s chance
of CHD does not depend on his SBP.
b1: KEY PARAMETER
$$\log \frac{p(X)}{1 - p(X)} = b_0 + b_1 X$$
The coefficient b1 measures the amount
of change in the log of the odds per
unit change in X.
b1: KEY PARAMETER
log odds(X+1) = b0 + b1(X+1)
              = b0 + b1X + b1
log odds(X)   = b0 + b1X
Difference in log odds = b1
E.g., the log of the odds of getting CHD
increases by 0.0243 for an increase of 1
mmHg of systolic blood pressure.
(Hard to explain to a patient!)
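One reason the log-odds scale is hard to communicate is that a constant step in log odds does not correspond to a constant step in probability. The sketch below uses the coefficients quoted on the slides (-6.08 and 0.0243); the function names are illustrative.

```python
import math

B0, B1 = -6.08, 0.0243  # coefficients quoted on the slides

def log_odds(sbp):
    return B0 + B1 * sbp

def probability(sbp):
    return 1.0 / (1.0 + math.exp(-log_odds(sbp)))

# For each starting SBP, compare the 1 mmHg step on both scales.
for sbp in (120, 160, 200):
    step_log_odds = log_odds(sbp + 1) - log_odds(sbp)
    step_probability = probability(sbp + 1) - probability(sbp)
    print(sbp, round(step_log_odds, 4), round(step_probability, 5))
```

The log-odds step is b1 at every starting pressure, while the change in probability depends on where on the curve the patient starts.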
THE COEFFICIENT b1
AND THE ODDS RATIO
Difference in log odds given by b1
translates into the odds ratio (OR).
exp(b1) = OR =
ratio of odds at risk level of X+1
to the odds when risk level is X
b1 = 0 is equivalent to OR = 1.
THE COEFFICIENT b1
AND THE ODDS RATIO
For example, the odds of CHD are
multiplied by the factor
exp(0.0243) = 1.025 for every
increase of 1 mmHg in SBP.
A difference of 10 mmHg multiplies
the odds of CHD by exp(0.0243)^10 = exp(0.243), or
1.275.
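The odds-ratio arithmetic can be checked directly from the coefficient 0.0243 quoted on the slides. The variable names are illustrative.

```python
import math

b1 = 0.0243  # change in log odds per 1 mmHg, from the slides

or_per_1_mmhg = math.exp(b1)        # odds ratio for a 1 mmHg increase
or_per_10_mmhg = math.exp(10 * b1)  # odds ratio for a 10 mmHg increase

print(round(or_per_1_mmhg, 3))
print(round(or_per_10_mmhg, 3))

# Raising the already-rounded factor to the 10th power drifts slightly.
print(round(round(or_per_1_mmhg, 3) ** 10, 3))
```

Working from the unrounded coefficient gives 1.275 for a 10 mmHg difference. Raising the rounded factor 1.025 to the tenth power gives about 1.280, so it is safer to exponentiate 10 × b1 than to compound a rounded odds ratio.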
ESTIMATION
OF THE PARAMETERS
Technique:
Maximum likelihood estimation
For large sample sizes, the normal
distribution is used to put a
confidence interval around the
estimate of the coefficient b1.
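As a rough illustration of maximum likelihood estimation, the sketch below fits the model to the ten rows on the data slide using Newton-Raphson iteration, then forms a normal-theory interval for b1. Several things here are assumptions: Newton-Raphson is one common way to maximise the likelihood, not necessarily what the original analysis used; the 1.96 multiplier assumes a 95% interval; and ten rows are far too few for the large-sample approximation to be reliable. The estimates will not match the study coefficients quoted on the slides, which come from the full data set.

```python
import math

# Ten illustrative rows from the data slide: SBP (mmHg) and CHD coded 0/1.
sbp = [130, 140, 145, 150, 155, 160, 165, 170, 175, 180]
chd = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, iterations=50, tol=1e-10):
    """Newton-Raphson maximum likelihood fit of log odds = b0 + b1 * x."""
    mean_x = sum(xs) / len(xs)
    cs = [x - mean_x for x in xs]  # centre x for numerical stability
    a, b = 0.0, 0.0                # intercept (centred scale) and slope
    for _ in range(iterations):
        ps = [sigmoid(a + b * c) for c in cs]
        # Score: first derivatives of the log-likelihood.
        g_a = sum(y - p for y, p in zip(ys, ps))
        g_b = sum(c * (y - p) for c, y, p in zip(cs, ys, ps))
        # Information matrix: negative second derivatives.
        ws = [p * (1.0 - p) for p in ps]
        i_aa = sum(ws)
        i_ab = sum(w * c for w, c in zip(ws, cs))
        i_bb = sum(w * c * c for w, c in zip(ws, cs))
        det = i_aa * i_bb - i_ab * i_ab
        # Newton step: solve the 2x2 system (information) * delta = score.
        d_a = (i_bb * g_a - i_ab * g_b) / det
        d_b = (i_aa * g_b - i_ab * g_a) / det
        a, b = a + d_a, b + d_b
        if abs(d_a) < tol and abs(d_b) < tol:
            break
    se_b1 = math.sqrt(i_aa / det)  # large-sample standard error of b1
    b0 = a - b * mean_x            # undo the centring
    return b0, b, se_b1

b0, b1, se_b1 = fit_logistic(sbp, chd)

# Normal-theory interval for b1 (1.96 assumes a 95% level).
ci_low, ci_high = b1 - 1.96 * se_b1, b1 + 1.96 * se_b1
print("b0 =", round(b0, 3), "b1 =", round(b1, 4))
print("interval for b1:", round(ci_low, 4), "to", round(ci_high, 4))
```

In practice a statistics package would do this fitting; the loop is written out only to show what "maximum likelihood" means operationally.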
HYPOTHESIS TESTING
H0: b1 = 0
No difference in risk at different
levels of the risk factor X.
No association between risk
factor X and probability of
occurrence.
HYPOTHESIS TESTING
Ha: b1 ≠ 0, or
    b1 > 0 (risk increases with X), or
    b1 < 0 (risk goes down as X increases)
HYPOTHESIS TESTING
H0: OR = 1
Ha: OR ≠ 1, or
    OR > 1 (risk increases with X), or
    OR < 1 (X is protective)
RESULTS OF LOGISTIC REGRESSION
OR with confidence interval and
p value indicate whether there is
a significant association
between level of the risk factor
and chance of occurrence
OR = 1.025 (1.015, 1.034), p < 0.001
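The reported p value can be checked approximately from the odds ratio and its interval. This sketch assumes the interval is a 95% normal-theory interval computed on the log-odds scale, which the slides do not state explicitly, and it works from the rounded figures shown.

```python
import math

# Values from the results slide.
or_hat, or_low, or_high = 1.025, 1.015, 1.034

# Assumption: 95% interval on the log scale, so its width is 2 * 1.96 * SE.
b1 = math.log(or_hat)
se_b1 = (math.log(or_high) - math.log(or_low)) / (2 * 1.96)

# Test statistic for H0: b1 = 0 and its two-sided normal p value.
z = b1 / se_b1
p_two_sided = math.erfc(abs(z) / math.sqrt(2))

print(round(z, 2), p_two_sided < 0.001)
```

Under these assumptions the test statistic is a little above 5, which agrees with the reported p < 0.001.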
RESULTS OF LOGISTIC REGRESSION
Can be used to predict an
individual’s risk:
prob. of CHD when SBP = 180:
p/q = exp{-6.082 + 0.0243(180)} = exp(-1.708) = 0.181
Solve for p = odds/(1 + odds):
prob. of CHD = 0.181/1.181 = 0.153
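A minimal sketch of the prediction step, using the coefficients quoted on this slide. The function name `chd_probability` is illustrative.

```python
import math

def chd_probability(sbp, b0=-6.082, b1=0.0243):
    """Predicted probability of CHD from the fitted log-odds equation."""
    odds = math.exp(b0 + b1 * sbp)  # p/q at this blood pressure
    return odds / (1.0 + odds)      # solve p/(1 - p) = odds for p

print(round(chd_probability(180), 3))
```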
MULTIVARIATE LOGISTIC
REGRESSION
Model with additional risk factors X1, X2, ..., Xk:
$$\log \frac{p(X)}{1 - p(X)} = b_0 + b_1 X_1 + b_2 X_2 + \cdots + b_k X_k$$
Log(odds of CHD) =
b0 + b1(SBP) + b2(CHOL) + b3(smoker)
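The sketch below shows how a model with several risk factors would be evaluated. The slides give no fitted values for this model, so every coefficient here is hypothetical and chosen only for illustration, as are the function name and the coding of smoker as 1 = yes, 0 = no.

```python
import math

# Hypothetical coefficients, for illustration only (not from the study).
COEFS = {"intercept": -9.0, "sbp": 0.02, "chol": 0.01, "smoker": 0.7}

def chd_probability_multi(sbp, chol, smoker, coefs=COEFS):
    """Probability of CHD from log odds = b0 + b1*SBP + b2*CHOL + b3*smoker."""
    log_odds = (coefs["intercept"]
                + coefs["sbp"] * sbp
                + coefs["chol"] * chol
                + coefs["smoker"] * smoker)
    return 1.0 / (1.0 + math.exp(-log_odds))

# Two otherwise identical men who differ only in smoking status.
p_non_smoker = chd_probability_multi(sbp=150, chol=220, smoker=0)
p_smoker = chd_probability_multi(sbp=150, chol=220, smoker=1)
print(round(p_non_smoker, 3), round(p_smoker, 3))

# The odds ratio for smoking, holding SBP and CHOL fixed, is exp(b3).
print(round(math.exp(COEFS["smoker"]), 2))
```

Each coefficient is interpreted as in the single-factor model, but with the other risk factors held fixed.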