LOGISTIC REGRESSION MODEL
Probability and odds
Suppose we have a frequency distribution for the variable “TB status”:

                      Have TB   Don’t have TB   Total
Frequency                73          607         680
Relative frequency      0.107       0.893
The probability of an individual having TB is 0.107
Definition of odds:

odds = (probability of event occurring) / (probability of event not occurring)

The odds of having TB are

73/607 = 0.107/0.893 = 0.12
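The arithmetic above can be checked with a short Python sketch. It uses only the counts from the frequency table; the variable names are illustrative and not part of the original slides.

```python
# Counts taken from the "TB status" frequency table.
have_tb = 73
no_tb = 607
total = have_tb + no_tb

# Probability: share of all individuals who have the event.
p_tb = have_tb / total

# Odds: probability of the event divided by probability of no event.
odds_tb = p_tb / (1 - p_tb)

# The totals cancel, so the odds also equal the ratio of the two counts.
odds_from_counts = have_tb / no_tb

print(f"probability of TB = {p_tb:.3f}")
print(f"odds of TB        = {odds_tb:.2f}")
```

Both routes to the odds agree because dividing the two relative frequencies cancels the common total.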
Interpreting odds
The odds of 0.12 (roughly 1 to 8) can be interpreted as:
For the community under consideration, we expect about one
individual with TB for every eight individuals without TB.
Or, inverting the odds, an individual is about eight times as likely not to
have TB as to have TB.
Association between a dependent variable and
an independent variable
If an independent variable has a relationship with the
dependent variable, it will change the odds of being in the key
dependent variable group (the group with the event of interest).
For example, suppose we have information on the HIV status of the
individuals whose “TB status” we had earlier:
            Have TB   Don’t have TB   Total
HIV+ve         57           32          89
HIV-ve        106          505         611
Total         163          537         700
The impact of HIV on TB status can be measured using the
odds ratio.
The odds ratio is equal to

odds ratio (OR) = (odds of having TB if HIV+ve) / (odds of having TB if HIV-ve)

The odds of having TB for HIV+ve individuals are
57/32 = 1.7813
The odds of having TB for HIV-ve individuals are
106/505 = 0.21
Therefore the odds ratio is 1.7813/0.21= 8.48
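The odds ratio can also be calculated directly as

OR = (57 × 505) / (32 × 106) = 8.49

(the small difference from 8.48 comes from rounding the odds to 0.21 before dividing).
Interpretation:
The odds of having TB are about 8.5 times higher among HIV+ve
individuals than among HIV-ve individuals.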
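As a check on the two routes to the odds ratio, here is a minimal Python sketch using the HIV/TB cell counts. The helper name `odds` and the cell labels `a`, `b`, `c`, `d` are illustrative choices, not notation from the slides.

```python
def odds(events, non_events):
    """Odds of the event within one group, from its two cell counts."""
    return events / non_events

# 2x2 table cells: rows are HIV status, columns are TB status.
a, b = 57, 32      # HIV+ve: have TB, don't have TB
c, d = 106, 505    # HIV-ve: have TB, don't have TB

odds_pos = odds(a, b)                 # odds of TB if HIV+ve
odds_neg = odds(c, d)                 # odds of TB if HIV-ve

or_from_odds = odds_pos / odds_neg    # ratio of the two odds
or_cross = (a * d) / (b * c)          # cross-product formula

print(f"odds (HIV+ve) = {odds_pos:.4f}")
print(f"odds (HIV-ve) = {odds_neg:.4f}")
print(f"odds ratio    = {or_cross:.2f}")
```

Without intermediate rounding the two calculations give the same value.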
Another example
The table below is the contingency table of the number of women
in a study according to use of the contraceptive pill and
presence/absence of myocardial infarction:

                     Myocardial infarction
Contraceptive          Yes      No     Total
Using pill              23      49       72
Not using pill          35     132      167
Total                   58     181      239
The odds of infarction among women using the pill are
23/49 = 0.469
and the odds of infarction among women not using the pill are
35/132 = 0.265
Thus the odds ratio of infarction for women using the pill
compared to those not using the pill is
0.469/0.265 = 1.77
Interpreting the odds ratio
The odds of myocardial infarction are 1.77 times as high among
women using the pill as among women not using the pill.
Equivalently, the odds of myocardial infarction are 77%
[(1.77 - 1.00) x 100%] higher among women using the pill than
among women not using the pill.
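The conversion from an odds ratio to a percentage change in odds can be sketched in Python as follows. The function name `percent_change_in_odds` is a hypothetical helper introduced only for this illustration.

```python
def percent_change_in_odds(odds_ratio):
    """Percentage by which the odds differ from the reference group.

    Positive values mean higher odds, negative values mean lower odds.
    """
    return (odds_ratio - 1.0) * 100.0

# Odds ratio for the contraceptive pill example, from the table counts.
pill_or = (23 / 49) / (35 / 132)

print(f"odds ratio        = {pill_or:.2f}")
print(f"percentage change = {percent_change_in_odds(pill_or):.0f}%")
```

The same formula gives a negative percentage when the odds ratio is below one; for instance, an odds ratio of 0.5 corresponds to odds that are 50% lower.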
Logistic model
It is a mathematical expression used to determine if a relationship
exists between a binary dependent variable and a set of independent
variables
Logistic regression combines the independent variables to estimate
the probability that a particular event will occur, i.e. an individual
will be a member of one of the groups defined by the binary
dependent variable
If we have only one independent variable, the model is
log(odds of event) = a + b × predictor
If we have two or more predictors, the model is
log(odds of event) = a + b1 × predictor1 + b2 × predictor2 + … + bk × predictork
Here log is the natural logarithm, a is the constant (intercept) and
b1, b2, …, bk are known as regression coefficients.
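To connect the one-predictor model with the 2×2 tables seen earlier, the following Python sketch treats HIV status as a single predictor coded 0/1. In this special case the fitted values have a closed form: a is the log odds in the reference group and b is the log of the odds ratio. The counts are those of the HIV/TB table; the function names are illustrative assumptions.

```python
import math

# HIV-ve is the reference group (predictor = 0); HIV+ve has predictor = 1.
tb_neg, no_tb_neg = 106, 505
tb_pos, no_tb_pos = 57, 32

a = math.log(tb_neg / no_tb_neg)        # log odds of TB in the reference group
b = math.log(tb_pos / no_tb_pos) - a    # log odds ratio, HIV+ve vs HIV-ve

def log_odds(predictor):
    """log(odds of TB) = a + b * predictor."""
    return a + b * predictor

def probability(predictor):
    """Convert the log odds back to a probability."""
    return 1 / (1 + math.exp(-log_odds(predictor)))

print(f"a = {a:.4f}, b = {b:.4f}, exp(b) = {math.exp(b):.2f}")
print(f"P(TB | HIV-ve) = {probability(0):.3f}")
print(f"P(TB | HIV+ve) = {probability(1):.3f}")
```

Exponentiating b recovers the odds ratio from the table, and the predicted probabilities equal the observed proportions with TB in each HIV group.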
Measurements of independent variables:
The independent variables can be either qualitative (categorical)
or quantitative (continuous).
The independent variables usually include exposure variables,
potential confounders and potential effect modifiers
Interpreting output of logistic regression
If a coefficient b is positive, its exponentiated value, exp(b), will be
greater than one, meaning that the odds of the modeled event
increase.
If a coefficient is negative, exp(b) will be less
than one, and the odds of the event occurring decrease.
A coefficient of zero (0) gives exp(b) = 1.0,
meaning that this variable does not change the odds of the
event one way or the other.
The exponentiated coefficient, exp(b), is an odds ratio.
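The three cases can be illustrated with a small Python sketch. The coefficient values 0.5, -0.5 and 0 are hypothetical and chosen only to show the direction of the effect.

```python
import math

def describe_coefficient(b):
    """Return the odds ratio exp(b) and the direction of its effect."""
    odds_ratio = math.exp(b)
    if odds_ratio > 1:
        direction = "odds increase"
    elif odds_ratio < 1:
        direction = "odds decrease"
    else:
        direction = "no change in odds"
    return odds_ratio, direction

# Hypothetical coefficients: positive, negative and zero.
for b in (0.5, -0.5, 0.0):
    odds_ratio, direction = describe_coefficient(b)
    print(f"b = {b:+.1f}  exp(b) = {odds_ratio:.3f}  {direction}")
```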
For a qualitative independent variable, one level of the variable
is selected as the reference and the other levels are compared to it.
For example, if gender is a variable and female is
chosen as the reference, then the coefficient corresponding to this
variable is interpreted using

OR = (odds of event for males) / (odds of event for females)
Another example: suppose age is categorized into five groups:
20–29, 30–39, 40–49, 50–59, 60–69,
and the 20–29 group is chosen as the reference group. We then
have four odds ratios (OR) for this variable:
OR = (odds of having disease for age group 30–39) / (odds of having disease for age group 20–29)

OR = (odds of having disease for age group 40–49) / (odds of having disease for age group 20–29)

OR = (odds of having disease for age group 50–59) / (odds of having disease for age group 20–29)

OR = (odds of having disease for age group 60–69) / (odds of having disease for age group 20–29)
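One common way to obtain these four odds ratios is to enter the age variable as four 0/1 indicator variables, one per non-reference group. The slides do not say which coding was used, so the Python sketch below is an assumption, and the function names are hypothetical.

```python
AGE_GROUPS = ["20-29", "30-39", "40-49", "50-59", "60-69"]
REFERENCE = "20-29"

def age_group(age):
    """Map an age in whole years to its ten-year group label."""
    if not 20 <= age <= 69:
        raise ValueError("age must be between 20 and 69")
    lower = (age // 10) * 10
    return f"{lower}-{lower + 9}"

def indicator_variables(age):
    """Return one 0/1 indicator per non-reference age group.

    The reference group is represented by all indicators being 0.
    """
    group = age_group(age)
    return {g: int(g == group) for g in AGE_GROUPS if g != REFERENCE}

print(indicator_variables(25))   # reference group: all indicators are 0
print(indicator_variables(45))   # only the 40-49 indicator is 1
```

With this coding, each indicator has its own coefficient b, and exp(b) is the odds ratio comparing that age group with the 20–29 reference group.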
For a quantitative independent variable, we compare two
groups that differ by one unit of measurement of the
variable.
For example, if blood pressure is an independent variable,
then the odds ratio is

OR = (odds of having disease for those with (x + 1) mmHg) / (odds of having disease for those with x mmHg)

where x is, say, 120.
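The Python sketch below illustrates why the choice of x does not matter: under the logistic model the one-unit odds ratio is exp(b) at any starting value, and a difference of several units multiplies the odds by exp(b) once per unit. The constant `a` and the coefficient `b` are hypothetical values chosen for illustration, not estimates from any study.

```python
import math

a = -5.0    # hypothetical constant
b = 0.02    # hypothetical coefficient per 1 mmHg of blood pressure

def disease_odds(x):
    """Odds of disease at blood pressure x under log(odds) = a + b * x."""
    return math.exp(a + b * x)

or_at_120 = disease_odds(121) / disease_odds(120)   # one unit, starting at 120
or_at_150 = disease_odds(151) / disease_odds(150)   # one unit, starting at 150
or_10_units = disease_odds(130) / disease_odds(120) # ten-unit difference

print(f"OR per 1 mmHg at x = 120: {or_at_120:.4f}")
print(f"OR per 1 mmHg at x = 150: {or_at_150:.4f}")
print(f"OR per 10 mmHg:           {or_10_units:.4f}")
```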
The odds ratio for each independent variable is interpreted
adjusting for the other variables in the model.
When reporting the results, it is advisable to report both
the unadjusted and adjusted odds ratios.
Example:
A study is designed to assess the association between obesity
(defined as BMI > 30) and incident cardiovascular disease.
Data were collected from participants who were between the
ages of 35 and 65, and free of cardiovascular disease (CVD)
at baseline. Each participant was followed for 10 years for the
development of cardiovascular disease.
A logistic regression model is fitted to assess the association
between obesity (independent variable) and CVD
(present = 1, absent = 0).
For the independent variable, non-obese persons are the reference
group.
The results of the fit were:

Independent   Regression                            Exp(b)         95% CI for
variable      coefficient (b)   z-value   p-value   (odds ratio)   odds ratio
constant         -2.367
obesity           0.658         3.1423    0.0017    1.93           1.281 to 2.911
exp(0.658) = 1.93 is the unadjusted odds ratio.
The odds of developing CVD are 1.93 times higher among obese
persons as compared to non-obese persons.
Equivalently, the odds of developing CVD are 93% higher among obese persons.
The association between obesity and incident CVD is statistically
significant (p = 0.0017).
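The reported odds ratio and its 95% limits can be reproduced from the coefficient and z-value with a short Python sketch. It assumes the interval is a Wald interval, exp(b ± 1.96 × SE) with SE = b / z; the slides do not state the method, although this assumption reproduces the reported limits.

```python
import math

b = 0.658      # regression coefficient for obesity
z = 3.1423     # reported z-value for obesity

se = b / z     # standard error, since z = b / SE
z_crit = 1.96  # normal critical value for a 95% interval

odds_ratio = math.exp(b)
ci_low = math.exp(b - z_crit * se)
ci_high = math.exp(b + z_crit * se)

print(f"odds ratio = {odds_ratio:.2f}")
print(f"95% limits = {ci_low:.3f} to {ci_high:.3f}")
```

The interval excludes 1, which agrees with the small p-value.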
When examining the association between obesity and CVD, age
was identified as a confounder. To adjust for it, a logistic
regression model is fitted with obesity and age as independent
variables and CVD as the dependent variable.
Age is categorized as less than 50 years of age and 50 years of age
and older.
For the analysis, the age group of less than 50 years is the reference
group.
The fitted model was:
log(odds of developing CVD) = -2.592 + 0.415 × obesity + 0.655 × age
exp(0.415) = 1.51; the odds of developing CVD are 1.51
times higher among obese persons as compared to non-obese
persons, adjusting for age. This is the adjusted odds ratio.
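What "adjusting for age" means in this fitted model can be seen in a brief Python sketch: within each age group, the odds ratio for obesity is the same and equals the exponentiated obesity coefficient. The coefficients are those of the fitted model; the 0/1 coding (1 = obese, 1 = aged 50 or older) follows the reference groups stated above, and the function names are illustrative.

```python
import math

def log_odds_cvd(obese, age_50_plus):
    """Fitted log odds of CVD; both arguments are coded 0 or 1."""
    return -2.592 + 0.415 * obese + 0.655 * age_50_plus

def obesity_odds_ratio(age_50_plus):
    """Odds ratio for obese vs non-obese within one age group."""
    odds_obese = math.exp(log_odds_cvd(1, age_50_plus))
    odds_non_obese = math.exp(log_odds_cvd(0, age_50_plus))
    return odds_obese / odds_non_obese

for age_50_plus in (0, 1):
    label = "50 and older" if age_50_plus else "under 50"
    print(f"{label}: OR for obesity = {obesity_odds_ratio(age_50_plus):.2f}")
```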
Example:
Researchers examined the relationship between coronary heart
disease (CHD) risk and the following risk factors: age (in years),
cholesterol (mg/dL), systolic blood pressure (mmHg), body
mass index (BMI) and smoking status.
Using a logistic model, they obtained the results below:
Independent   Regression                            Exp(b)         95% CI for
variable      coefficient (b)   z-value   p-value   (odds ratio)   odds ratio
constant        -12.311
age               0.0644         5.41     < 0.001   1.06           1.042 to 1.092
cholesterol       0.0107         7.08     < 0.001   1.02           1.01 to 1.04
sbp               0.0193         4.72     < 0.001   1.03           1.02 to 1.05
bmi               0.0574         2.18       0.029   1.06           1.01 to 1.12
smokes            0.6345         4.53     < 0.001   1.89           1.432 to 2.482
Adjusting for cholesterol level, systolic blood pressure,
body mass index and smoking status, for every additional
year of age the odds of having CHD are multiplied by 1.06.
Alternatively, we can say that the odds of CHD are 6% higher
for every additional year of age, adjusting for the other variables.
The unadjusted odds ratio for age was 1.08.
Age is a significant predictor since the p-value is small (< 0.001).
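The same model can also compare two individuals who differ in more than one predictor: the odds ratio between them is the exponential of the coefficient-weighted differences. The Python sketch below uses the regression coefficients from the table; the two profiles are hypothetical individuals invented for illustration, and the function name is an assumption.

```python
import math

# Regression coefficients from the fitted CHD model (constant not needed,
# because it cancels when two individuals are compared).
COEFFICIENTS = {
    "age": 0.0644,
    "cholesterol": 0.0107,
    "sbp": 0.0193,
    "bmi": 0.0574,
    "smokes": 0.6345,
}

def odds_ratio_between(profile_a, profile_b):
    """Odds ratio of CHD for profile_a relative to profile_b."""
    difference = sum(
        COEFFICIENTS[name] * (profile_a[name] - profile_b[name])
        for name in COEFFICIENTS
    )
    return math.exp(difference)

# Hypothetical individuals: same cholesterol, sbp and bmi, but one is
# ten years older and smokes.
older_smoker = {"age": 55, "cholesterol": 220, "sbp": 130, "bmi": 26, "smokes": 1}
younger_non_smoker = {"age": 45, "cholesterol": 220, "sbp": 130, "bmi": 26, "smokes": 0}

print(f"{odds_ratio_between(older_smoker, younger_non_smoker):.2f}")
```

Predictors that take the same value in both profiles drop out of the comparison, which is the sense in which each coefficient is interpreted with the others held fixed.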