Logistic regression
Statistics for Health Research
Assessing Binary Outcomes:
Logistic Regression
Peter T. Donnan
Professor of Epidemiology and Biostatistics
Objectives of Session
• Understand what is meant by a binary outcome
• Understand how analyses of binary outcomes are implemented in a logistic regression model
• Understand when a logistic model is appropriate
• Be able to implement a logistic model in SPSS
• Interpret logistic model output
Binary Outcome
Extremely common in health research:
• Dead / Alive
• Hospitalisation (Yes / No)
• Diagnosis of diabetes (Yes / No)
• Met target e.g. total cholesterol < 5.0 mmol/l (Yes / No)
n.b. Can use any code such as 1 / 2 but mathematically easier to
use 0 / 1
How is relationship
formulated?
For linear regression the simplest equation is:
$y = a + bx + e_i$
y is the outcome; a is the intercept;
b is the slope for x, the
explanatory variable; and
e is the error term or random ‘noise’
Can we fit y as a
probability range 0 to 1?
$y = a + bx + e_i$
Not quite!
y is continuous: any value from -∞ to +∞
The outcome is the probability of an event, Π (or p), on
a scale of 0 to 1
Certain transformations of p can give the
required scale
Probit is a transformation of p based on the normal distribution,
but its results are not easy to interpret
The logit transformation works!
We can now fit p as a probability range 0 to 1
And y in range -∞ to + ∞
$y = \operatorname{logit}(p) = a + bx + e_i$
$\log\left(\frac{p}{1-p}\right) = a + bx + e_i$
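The logit transformation and its inverse can be sketched in a few lines of Python. This is only an illustration of the mapping between the two scales; the function names logit and inverse_logit are chosen here and are not part of the original slides.

```python
import math


def logit(p):
    """Map a probability in (0, 1) to the log-odds scale (-inf, +inf)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    return math.log(p / (1.0 - p))


def inverse_logit(y):
    """Map any real value y back to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-y))


# Probabilities near 0 or 1 are stretched towards -inf or +inf;
# p = 0.5 sits at 0 on the logit scale.
for p in (0.01, 0.2, 0.5, 0.8, 0.99):
    y = logit(p)
    print(f"p = {p:.2f}   logit(p) = {y:+.3f}   back to p = {inverse_logit(y):.2f}")
```

Whatever value a + bx takes, the inverse logit returns a valid probability, which is why the model can be fitted on the unbounded scale.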
Logistic Regression Model
$\log\left(\frac{p}{1-p}\right) = a + bx + e_i$
This has very useful properties
The term p/(1-p) is called the ‘Odds’ of an event
Note: not the same as the probability of an event p
If x is binary coded 0/1 then -
exp (b) = ODDS RATIO
for the outcome in those coded 1 relative to code 0
e.g. Odds of death in men (1) vs. women (0)
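A short derivation of why exp(b) is the odds ratio when x is coded 0/1. The error term is left out for clarity, and p_0 and p_1 are labels introduced here for the probability of the event when x = 0 and x = 1.

```latex
\begin{align*}
\log\left(\frac{p_0}{1-p_0}\right) &= a && (x = 0) \\
\log\left(\frac{p_1}{1-p_1}\right) &= a + b && (x = 1) \\
b &= \log\left(\frac{p_1}{1-p_1}\right) - \log\left(\frac{p_0}{1-p_0}\right)
   = \log\left(\frac{p_1/(1-p_1)}{p_0/(1-p_0)}\right) \\
\exp(b) &= \frac{p_1/(1-p_1)}{p_0/(1-p_0)} = \text{odds ratio (code 1 vs. code 0)}
\end{align*}
```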
Logistic Regression Model
Consider the LDL data.
It has two binary outcomes –
1) LDL target achieved
2) Chol target achieved
For example consider gender as a
predictor – Male = 1 & Female = 2
For a binary x we can express results as
odds ratios (available in crosstabs)
LDL target achieved, by gender

Gender    No     Yes    Odds of yes
Male      140    563    563/140
Female    149    531    531/149

Odds ratio = (531/149) / (563/140) = 3.56 / 4.02
OR = 0.886 Female cf Male
LDL target achieved, by gender

Gender    No     Yes    Odds of yes
Male      140    563    563/140 = 4.02
Female    149    531    531/149 = 3.56
N.b. Odds is different to prob – Men p = 563/(140+563) = 0.80 or 80%
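The arithmetic for the 2 x 2 table can be checked with a minimal Python sketch. The counts are those given for the LDL data; the helper names odds and probability are introduced here for illustration.

```python
# Counts from the LDL table: (target not met, target met)
counts = {"Male": (140, 563), "Female": (149, 531)}


def odds(no, yes):
    """Odds of meeting the target: yes relative to no."""
    return yes / no


def probability(no, yes):
    """Probability of meeting the target: yes relative to everyone."""
    return yes / (no + yes)


odds_male = odds(*counts["Male"])
odds_female = odds(*counts["Female"])
odds_ratio = odds_female / odds_male  # Female relative to Male

print(f"Odds, male:     {odds_male:.2f}")
print(f"Odds, female:   {odds_female:.2f}")
print(f"OR (F vs. M):   {odds_ratio:.3f}")
print(f"Prob, male:     {probability(*counts['Male']):.2f}")
```

The odds for men (about 4) and the probability for men (about 0.8) describe the same data on different scales, which is the distinction the note above draws.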
Odds ratio from Crosstabs
Obtain odds ratios for 2 x 2 tables
from crosstabs and select option ‘risk’
Results from Crosstabs
Odds ratios for achieving LDL target
in females vs. males
n.b. OR given for Female vs
male = 0.886
Fit Logistic Regression Model
Dependent is binary outcome –
LDL target met (Yes = 1, No = 0)
Independent – Gender 1 = M, 2 = F
Should get same as the crosstabs result
Select Analyze / Regression / Binary Logistic
Select option of 95% CI for exp (b)
Odds ratio from logistic model
results for a binary predictor
EXP (B) = Odds ratio F vs. M
Note that OR for Men vs Women
= 1/0.886 = 1.13
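To see why the model reproduces the crosstabs result, the fit can be sketched outside SPSS. The Python below is an illustrative Newton-Raphson maximum-likelihood fit written for this example; it is not the SPSS implementation, and fit_logistic is a hypothetical helper name. It uses the grouped LDL counts with gender coded 1 = M, 2 = F as on the slides, and leaves out the error term.

```python
import math

# Grouped LDL data: (gender code, number meeting target, group size).
# Coding follows the slides: 1 = Male, 2 = Female.
groups = [(1, 563, 563 + 140), (2, 531, 531 + 149)]


def fit_logistic(groups, iterations=25):
    """Fit logit(p) = a + b*x by Newton-Raphson on grouped binary data."""
    a, b = 0.0, 0.0
    for _ in range(iterations):
        g_a = g_b = 0.0            # gradient of the log-likelihood
        h_aa = h_ab = h_bb = 0.0   # information matrix (negative Hessian)
        for x, yes, n in groups:
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            resid = yes - n * p    # observed minus expected events
            w = n * p * (1.0 - p)  # binomial variance weight
            g_a += resid
            g_b += resid * x
            h_aa += w
            h_ab += w * x
            h_bb += w * x * x
        # Solve the 2 x 2 linear system for the Newton step
        det = h_aa * h_bb - h_ab * h_ab
        a += (h_bb * g_a - h_ab * g_b) / det
        b += (h_aa * g_b - h_ab * g_a) / det
    return a, b


a, b = fit_logistic(groups)
odds_ratio = math.exp(b)  # Female vs. Male, a one-unit step in the code
print(f"b = {b:.4f}, exp(b) = {odds_ratio:.3f}")
print(f"OR Male vs. Female = {1 / odds_ratio:.2f}")
```

Because gender moves by one unit between the two codes, exp(b) is the same whether the coding is 0/1 or 1/2; only the intercept a changes.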
Fit Logistic Regression Model
– continuous predictor
Dependent is binary outcome –
LDL target met
Independent – Continuous predictor –
Adherence
B represents the change in the log odds
for a 1 unit increase in adherence, so exp(B) is the ODDS RATIO
B x 10 represents the change in the log odds for a
10 unit increase in adherence, so exp(10 x B) is the ODDS RATIO
Odds ratio from logistic model
results for a continuous predictor
EXP (B) = Odds ratio for 1% increase in Adherence
OR for 10% increase is exp(10 x 0.010) = 1.105
i.e. a 10.5% increase in odds of meeting
LDL target for each 10% increase in
adherence
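The rescaling from a 1% to a 10% increase can be checked with a short Python sketch. The coefficient 0.010 is the value used in the calculation above; the function name odds_ratio is introduced here for illustration.

```python
import math

b = 0.010  # change in log odds per 1% increase in adherence


def odds_ratio(b, units=1):
    """Odds ratio for an increase of `units` in the predictor."""
    return math.exp(b * units)


or_1 = odds_ratio(b)
or_10 = odds_ratio(b, 10)
percent_increase = (or_10 - 1.0) * 100.0

print(f"OR per 1% adherence:  {or_1:.3f}")
print(f"OR per 10% adherence: {or_10:.3f}")
print(f"Increase in odds per 10% adherence: {percent_increase:.1f}%")
```

Multiplying B by 10 before exponentiating is the same as raising the 1% odds ratio to the power 10; it is not the same as multiplying the odds ratio by 10.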
Fit Logistic Regression Model
– categorical predictor
Dependent is binary outcome –
LDL target met
Independent – APOE genotype (1 – 6)
Choose a reference category; in this case the worst
outcome is in genotype 6, so choose 6 as the reference to give ORs
> 1
Exp(B) represents the OR for each category relative
to the reference category
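What choosing a reference category amounts to can be sketched as indicator (dummy) coding. The Python below is an assumption about the general technique, not a description of SPSS internals, and indicator_code is a hypothetical helper. Each non-reference genotype gets its own 0/1 variable, so each coefficient compares that genotype with genotype 6.

```python
LEVELS = (1, 2, 3, 4, 5, 6)  # APOE genotype codes as on the slides
REFERENCE = 6                # reference category chosen above


def indicator_code(genotype, levels=LEVELS, reference=REFERENCE):
    """Return one 0/1 indicator per non-reference category."""
    if genotype not in levels:
        raise ValueError(f"unknown genotype code: {genotype}")
    return [1 if genotype == level else 0
            for level in levels if level != reference]


for genotype in LEVELS:
    print(genotype, indicator_code(genotype))
```

The reference category is coded as all zeros, so its log odds are carried by the intercept and every exp(B) is read relative to it.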
In Regression / Binary Logistic, choose Categorical
Odds ratios from logistic model
results for a categorical predictor
EXP (B) = Odds ratio for
APOE (2) vs APOE (6)
OR = 4.381
(95% CI 1.742, 11.021)
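How a confidence interval for an odds ratio is built on the log scale can be illustrated with the reported figures. This Python sketch assumes the interval is a Wald interval, exp(B ± 1.96 x SE); the standard error is not given on the slide, so it is back-calculated here from the reported limits and is an assumption.

```python
import math

# Reported on the slide for APOE (2) vs. APOE (6)
or_hat, lower, upper = 4.381, 1.742, 11.021
z = 1.96  # normal quantile for a 95% interval

b = math.log(or_hat)  # coefficient on the log-odds scale

# Assumption: SE back-calculated from the reported limits,
# which are symmetric around b on the log scale.
se = (math.log(upper) - math.log(lower)) / (2 * z)

ci = (math.exp(b - z * se), math.exp(b + z * se))
print(f"B = {b:.3f}, SE = {se:.3f}")
print(f"95% CI for OR: {ci[0]:.2f} to {ci[1]:.2f}")
```

The interval is symmetric around B on the log scale but not around the odds ratio itself, which is why the upper limit lies much further from 4.381 than the lower limit does.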
Other binary models
The logistic model is only applicable
when the length of follow-up is the same
for each individual, e.g. 5-yr follow-up of
a cohort
For binary outcomes where censoring
occurs, i.e. people leave the cohort through
death or migration, the length of follow-up
varies and survival models such as the
Cox Proportional Hazards model are needed
Summary
• Logistic model easily fitted in SPSS
• Clear link with ODDS RATIOS
• Common model for case-control, cohort
studies as well as development of
clinical prediction models