Transcript or “1”
Statistics for clinicians
Biostatistics course by Kevin E. Kip, Ph.D., FAHA
Professor and Executive Director, Research Center
University of South Florida, College of Nursing
Professor, College of Public Health
Department of Epidemiology and Biostatistics
Associate Member, Byrd Alzheimer’s Institute
Morsani College of Medicine
Tampa, FL, USA
SECTION 6.6
Introduction to
survival analysis
Learning Outcome:
Recognize concepts and methods used in
survival analysis
Survival Analysis
• A technique to estimate the probability of
“survival” (and also risk of disease) that
takes into account incomplete subject follow-up.
• Calculates risks over a time period with
changing incidence rates.
• Wide application in a variety of disciplines,
such as engineering.
Survival Analysis
• With the Kaplan-Meier method (“product-limit
method”), survival probabilities are calculated at
each time interval in which an event occurs.
• The cumulative survival over the entire follow-up
period is derived from the product of all interval
survival probabilities.
• Cumulative incidence (risk) is the complement of
cumulative survival.
K-M formula:
$$S = \prod_{k=1}^{K} \frac{N_k - A_k}{N_k}$$
Where:
K = number of time intervals
k = sequence of time interval
N_k = number of subjects at risk
A_k = number of outcome events
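The product in the K-M formula can be sketched in a few lines of Python. This is only an illustration: the function name and the counts passed to it are hypothetical, and each pair is (Nk, Ak) for one time interval.

```python
def kaplan_meier(intervals):
    """Return cumulative survival after each time interval.

    intervals: list of (n_at_risk, n_events) pairs, i.e. (Nk, Ak).
    """
    survival = 1.0
    curve = []
    for n_at_risk, n_events in intervals:
        # Interval survival probability, multiplied into the running product
        survival *= (n_at_risk - n_events) / n_at_risk
        curve.append(survival)
    return curve


# Hypothetical counts: three event times with 20, 15 and 9 subjects at risk
curve = kaplan_meier([(20, 2), (15, 1), (9, 3)])
# Cumulative incidence (risk) is the complement of cumulative survival
cumulative_incidence = [1 - s for s in curve]
print([round(s, 3) for s in curve])
```

Because each interval contributes one factor, the cumulative survival can only stay level or fall as more event times are added.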
Survival Analysis
• With the Kaplan-Meier method, subjects
with incomplete follow-up (FU) are
“censored” at their last known time of FU.
• An important assumption (often not upheld)
is that censoring is “non-informative”
(the survival experience of censored subjects is
the same as that of subjects with complete FU).
• Non-fatal outcomes can also be studied.
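To make censoring concrete, the sketch below builds the number at risk from per-subject follow-up records. The records are hypothetical (10 subjects, 4 deaths, 2 lost to FU), and the helper name is an assumption made for illustration. A censored subject stays in the risk set until the last known FU time and then drops out without counting as an event.

```python
# Hypothetical follow-up records: (months of follow-up, died?)
subjects = [(5, True), (6, False), (7, True), (20, True), (21, False),
            (22, True), (24, False), (24, False), (24, False), (24, False)]


def km_table(subjects):
    """Return (time, at_risk, deaths, cumulative_survival) per event time."""
    event_times = sorted({t for t, died in subjects if died})
    survival = 1.0
    table = []
    for t in event_times:
        # Everyone still under observation at time t is at risk,
        # including subjects who will be censored later.
        at_risk = sum(1 for fu, _ in subjects if fu >= t)
        deaths = sum(1 for fu, died in subjects if fu == t and died)
        survival *= (at_risk - deaths) / at_risk
        table.append((t, at_risk, deaths, survival))
    return table


table = km_table(subjects)
for time, at_risk, deaths, survival in table:
    print(time, at_risk, deaths, round(survival, 3))
```

The subject censored at month 6 is counted at risk at month 5 but not at month 7, which is how incomplete follow-up enters the calculation.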
Survival Analysis
• The Life-Table method is conceptually
similar to the Kaplan-Meier method.
• The primary difference is that
survival probabilities are calculated
at pre-determined intervals (e.g.,
years), rather than when events occur.
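The slides do not give a life-table formula, so the following is only a sketch under a stated assumption: the common actuarial convention that subjects withdrawn during an interval count as at risk for half of that interval. The function name and the yearly counts are hypothetical.

```python
def life_table(rows):
    """rows: list of (n_start, deaths, withdrawn), one per fixed interval.

    Returns (effective_n, interval_survival, cumulative_survival) per row.
    """
    cumulative = 1.0
    out = []
    for n_start, deaths, withdrawn in rows:
        # Assumption: withdrawals are at risk for half of the interval
        effective_n = n_start - withdrawn / 2
        interval_survival = 1 - deaths / effective_n
        cumulative *= interval_survival
        out.append((effective_n, interval_survival, cumulative))
    return out


# Hypothetical yearly counts: (alive at start, deaths, withdrawn)
yearly = life_table([(10, 2, 1), (7, 2, 1)])
for effective_n, interval_survival, cumulative in yearly:
    print(effective_n, round(interval_survival, 3), round(cumulative, 3))
```

Unlike the Kaplan-Meier method, the interval boundaries here are fixed in advance, so several events share one interval survival probability.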
SECTION 6.7
Calculation and
Interpretation of
Survival Analysis
Estimates
Learning Outcome:
Calculate and interpret survival analysis
estimates of incidence
Survival Analysis
Example:
• Assume a study of 10 subjects conducted
over a 2-year period.
• A total of 4 subjects die.
• Another 2 subjects have incomplete follow-up (study withdrawal or late study entry).
What is the probability of 2-year survival, and
the corresponding risk of 2-year death?
(1)        (2)       (3)       (4)        (5)        (6)        (7)        (8)
Time to    No.       No. who   No. lost   Prop.      Prop.      Cumul.     Cumul.
death      alive at  died at   to FU      died at    survive    survival   risk to
from       each      each      prior to   that       at that    to that    that
entry      time      time      next       time       time       time       time
(mo)                           time       (3) / (2)  1 – (5)               1 – (7)
5          10        1         1          0.10       0.90       0.90       0.10
7          8         1         0          0.125      0.875      0.788      0.212
20         ?         1         1          ?          ?          ?          ?
(1)        (2)       (3)       (4)        (5)        (6)        (7)        (8)
Time to    No.       No. who   No. lost   Prop.      Prop.      Cumul.     Cumul.
death      alive at  died at   to FU      died at    survive    survival   risk to
from       each      each      prior to   that       at that    to that    that
entry      time      time      next       time       time       time       time
(mo)                           time       (3) / (2)  1 – (5)               1 – (7)
5          10        1         1          0.10       0.90       0.90       0.10
7          8         1         0          0.125      0.875      0.788      0.212
20         7         1         1          ?          ?          ?          ?
(1)        (2)       (3)       (4)        (5)        (6)        (7)        (8)
Time to    No.       No. who   No. lost   Prop.      Prop.      Cumul.     Cumul.
death      alive at  died at   to FU      died at    survive    survival   risk to
from       each      each      prior to   that       at that    to that    that
entry      time      time      next       time       time       time       time
(mo)                           time       (3) / (2)  1 – (5)               1 – (7)
5          10        1         1          0.10       0.90       0.90       0.10
7          8         1         0          0.125      0.875      0.788      0.212
20         7         1         1          0.143      0.857      0.675      0.325
22         5         1         0          ?          ?          ?          ?
(1)        (2)       (3)       (4)        (5)        (6)        (7)        (8)
Time to    No.       No. who   No. lost   Prop.      Prop.      Cumul.     Cumul.
death      alive at  died at   to FU      died at    survive    survival   risk to
from       each      each      prior to   that       at that    to that    that
entry      time      time      next       time       time       time       time
(mo)                           time       (3) / (2)  1 – (5)               1 – (7)
5          10        1         1          0.10       0.90       0.90       0.10
7          8         1         0          0.125      0.875      0.788      0.212
20         7         1         1          0.143      0.857      0.675      0.325
22         5         1         0          0.20       0.80       0.54       0.46
(1)        (2)       (3)       (4)        (5)        (6)        (7)        (8)
Time to    No.       No. who   No. lost   Prop.      Prop.      Cumul.     Cumul.
death      alive at  died at   to FU      died at    survive    survival   risk to
from       each      each      prior to   that       at that    to that    that
entry      time      time      next       time       time       time       time
(mo)                           time       (3) / (2)  1 – (5)               1 – (7)
5          10        1         1          0.10       0.90       0.90       0.10
7          8         1         0          0.125      0.875      0.788      0.212
20         7         1         1          0.143      0.857      0.675      0.325
22         5         1         0          0.20       0.80       0.54       0.46
24         4         0         0          0.0        1.0        0.54       0.46
Interpretation: What is the 2-year risk of death?
Interpretation: What is the 1-year risk of death?
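Both interpretation questions can be answered by reading the cumulative risk at the last event time on or before the horizon. A minimal sketch using the rows of the completed table (the function name is hypothetical):

```python
# (time in months, number alive at that time, number who died) from the table
rows = [(5, 10, 1), (7, 8, 1), (20, 7, 1), (22, 5, 1), (24, 4, 0)]


def cumulative_risk(rows, horizon_months):
    """Cumulative risk of death up to and including horizon_months."""
    survival = 1.0
    for time, at_risk, deaths in rows:
        if time > horizon_months:
            break  # later event times do not affect risk at this horizon
        survival *= (at_risk - deaths) / at_risk
    return 1 - survival


risk_1yr = cumulative_risk(rows, 12)
risk_2yr = cumulative_risk(rows, 24)
print(f"1-year risk: {risk_1yr:.4f}")
print(f"2-year risk: {risk_2yr:.4f}")
```

The last death within the first 12 months occurs at month 7, so the 1-year risk is the cumulative risk in that row (about 0.21). The 2-year risk is the cumulative risk in the final row (0.46).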
Survival Analysis (Practice)
Example:
• Assume a study of 12 subjects conducted
over a 3-year period.
• A total of 5 subjects die.
• Another 2 subjects have incomplete follow-up (study withdrawal or late study entry).
What is the probability of 3-year survival, and
the corresponding risk of 3-year death?
Complete the worksheet below
(1)        (2)       (3)       (4)        (5)        (6)        (7)        (8)
Time to    No.       No. who   No. lost   Prop.      Prop.      Cumul.     Cumul.
death      alive at  died at   to FU      died at    survive    survival   risk to
from       each      each      prior to   that       at that    to that    that
entry      time      time      next       time       time       time       time
(mo)                           time       (3) / (2)  1 – (5)               1 – (7)
7          12        1         1          0.0833     0.9167     0.9167     0.0833
11         10        1         0          0.10       0.90       0.8250     0.1750
16                   1         0
24                   1         1
30                   1         0
36                   0         0
What is the probability of 3-year survival, and the corresponding
risk of 3-year death? Survival _______ Death _________
Complete the worksheet below
(1)        (2)       (3)       (4)        (5)        (6)        (7)        (8)
Time to    No.       No. who   No. lost   Prop.      Prop.      Cumul.     Cumul.
death      alive at  died at   to FU      died at    survive    survival   risk to
from       each      each      prior to   that       at that    to that    that
entry      time      time      next       time       time       time       time
(mo)                           time       (3) / (2)  1 – (5)               1 – (7)
7          12        1         1          0.0833     0.9167     0.9167     0.0833
11         10        1         0          0.10       0.90       0.8250     0.1750
16         9         1         0          0.1111     0.8889     0.7333     0.2667
24         8         1         1          0.125      0.875      0.6416     0.3584
30         6         1         0          0.1667     0.8333     0.5346     0.4654
36         5         0         0          0.0        1.0        0.5346     0.4654
What is the probability of 3-year survival, and the corresponding
risk of 3-year death? Survival _0.5346_ Death _0.4654_
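As a check on the worksheet, the same product can be carried with exact fractions rather than rounding at every row. This is a sketch using the numbers alive and the deaths from the completed worksheet. Without intermediate rounding, the 3-year survival comes out as 0.5347 (risk 0.4653), so the worksheet's 0.5346 differs only in the fourth decimal because each row was rounded before multiplying.

```python
from fractions import Fraction

# (number alive at that time, number who died) from the completed worksheet
rows = [(12, 1), (10, 1), (9, 1), (8, 1), (6, 1), (5, 0)]

survival = Fraction(1)
for at_risk, deaths in rows:
    # Exact interval survival probability, no rounding between rows
    survival *= Fraction(at_risk - deaths, at_risk)

risk = 1 - survival
print(survival, round(float(survival), 4), round(float(risk), 4))
```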
SECTION 6.8
Logistic Regression
Model
Learning Outcome:
Recognize components and interpret
parameters from the logistic regression
model
Logistic Regression Analysis
Conceptually similar to linear regression, but with a
dichotomous outcome.
Outcome is usually coded as “0” or “1”, with
“1” referring to presence of the outcome of
interest (although SAS by default models the lower value, 0, as the event).
p represents the probability that the outcome
is present (i.e., value of 1), given the particular
covariate values of an individual.
Logistic Regression Analysis
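Multiple logistic regression model can be
written in different ways:
$$\ln\left\{\frac{p}{1-p}\right\} = b_0 + b_1x_1 + b_2x_2 + \dots + b_px_p$$
$$p = \frac{\exp(b_0 + b_1x_1 + b_2x_2 + \dots + b_px_p)}{1 + \exp(b_0 + b_1x_1 + b_2x_2 + \dots + b_px_p)}$$
where:
p = expected probability that outcome is present
x1 through xp = independent variables
b0 through bp = regression coefficients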
Logistic Regression Analysis
bi = change in the expected log odds of the outcome per
1-unit change in xi, holding other predictors constant
Anti-log of regression coefficient, exp(bi), produces the odds ratio
Logistic Regression Analysis
Example: Estimate the risk of incident CVD among persons
defined as obese.
Variable               b        χ2       p-value
Intercept              -2.367   307.38   0.0001
Obesity (yes vs. no)   0.658    9.87     0.0017

$$\ln\left\{\frac{p}{1-p}\right\} = b_0 + b_1x_1 + b_2x_2 + \dots + b_px_p$$
$$\ln\left\{\frac{p}{1-p}\right\} = -2.367 + 0.658(\text{Obesity}) = \text{log odds}$$
exp(0.658) = 1.93 (odds ratio)
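The fitted equation can also be turned into predicted probabilities for obese and non-obese persons. A minimal sketch using the two coefficients from the table (the function name is hypothetical):

```python
import math

b0, b1 = -2.367, 0.658  # intercept and obesity coefficient from the table


def predicted_probability(obese):
    """Predicted probability of CVD for obese (1) or non-obese (0) persons."""
    log_odds = b0 + b1 * obese
    odds = math.exp(log_odds)
    return odds / (1 + odds)


odds_ratio = math.exp(b1)
p_obese = predicted_probability(1)
p_not_obese = predicted_probability(0)
print(round(odds_ratio, 2), round(p_obese, 3), round(p_not_obese, 3))
```

The odds ratio compares the odds in the two groups, while the two predicted probabilities show the corresponding estimated risks on the probability scale.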
Example: Estimate the log odds of being on a statin drug in relation
to the predictors listed below.
Variable                        b        Wald χ2   p-value
Intercept                       -3.065   8.015     0.027
Age (per year)                  0.036    5.334     0.021
Gender (female = 1)             -0.530   5.082     0.024
Body mass index (per unit)      0.029    2.187     0.139
Physical activity (per unit)    -0.001   0.000     0.996
History of diabetes (1 = yes)   1.067    9.250     0.002

$$\ln\left\{\frac{p}{1-p}\right\} = b_0 + b_1x_1 + b_2x_2 + \dots + b_px_p$$
Write out the logistic regression equation below. (Practice)
$$\ln\left\{\frac{p}{1-p}\right\} = \ ?$$
Write out the logistic regression equation below.
$$\ln\left\{\frac{p}{1-p}\right\} = -3.065 + 0.036(\text{age}) - 0.53(\text{female}) + 0.029(\text{BMI}) - 0.001(\text{physical activity}) + 1.067(\text{diabetes})$$
So, the predicted odds of an individual being on a statin drug
= EXP[-3.065 + 0.036(age) – 0.53(female) + 0.029(BMI)
– 0.001(physical activity) + 1.067(diabetes)]
AND
Predicted probability = predicted odds / (1 + predicted odds).
Estimate the predicted odds and probability of an individual being on
a statin drug with the following characteristics:
Age=55; male; BMI=31.4; physical activity level=2; diabetic
Predicted odds
= EXP[-3.065 + 0.036(55) – 0.53(0) + 0.029(31.4) – 0.001(2) + 1.067(1)]
= exp(0.891) = 2.437
Predicted probability
= predicted odds / (1 + predicted odds)
= 2.437 / 3.437 = 0.71
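The same calculation can be sketched as a small function, which makes it easy to recheck the arithmetic. The coefficients are those in the table. The dictionary keys and the function name are hypothetical. Evaluating the sum directly gives a log odds of about 0.891.

```python
import math

# Regression coefficients from the statin table
COEF = {"intercept": -3.065, "age": 0.036, "female": -0.530,
        "bmi": 0.029, "activity": -0.001, "diabetes": 1.067}


def predict(age, female, bmi, activity, diabetes):
    """Return (log odds, odds, probability) of being on a statin drug."""
    log_odds = (COEF["intercept"] + COEF["age"] * age
                + COEF["female"] * female + COEF["bmi"] * bmi
                + COEF["activity"] * activity + COEF["diabetes"] * diabetes)
    odds = math.exp(log_odds)
    return log_odds, odds, odds / (1 + odds)


# Age 55, male, BMI 31.4, physical activity level 2, diabetic
log_odds, odds, prob = predict(age=55, female=0, bmi=31.4, activity=2, diabetes=1)
print(f"log odds={log_odds:.3f}, odds={odds:.3f}, probability={prob:.2f}")
```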
Estimate the predicted odds and probability of an individual being on
a statin drug with the following characteristics: PRACTICE
Age=52; female; BMI=29.5; physical activity level=3; non-diabetic
=
Predicted Probability
= odds / (1 + predicted odds)
=
Estimate the predicted odds and probability of an individual being on
a statin drug with the following characteristics:
Age=52; female; BMI=29.5; physical activity level=3; non-diabetic
Predicted odds
= EXP[-3.065 + 0.036(52) – 0.53(1) + 0.029(29.5) – 0.001(3) + 1.067(0)]
= exp(-0.8705) = 0.42
Predicted probability
= predicted odds / (1 + predicted odds)
= 0.42 / 1.42 = 0.296
Produce odds ratio estimates of statin use for the following
(Practice):
Age (per year) =
Age (per 10 years) =
Female gender =
History of diabetes =
Produce odds ratio estimates of statin use for the following:
Age (per year) = exp(0.036) = 1.04
Age (per 10 years) = exp(10 x 0.036) = 1.43
Female gender = exp(-0.530) = 0.59
History of diabetes = exp(1.067) = 2.91
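These odds ratios all follow one rule: multiply the coefficient by the size of the change in the predictor, then take the anti-log. A minimal sketch (the function name is hypothetical):

```python
import math


def odds_ratio(b, units=1):
    """Odds ratio for a `units`-unit increase in a predictor with coefficient b."""
    return math.exp(units * b)


or_age_1 = odds_ratio(0.036)        # age, per year
or_age_10 = odds_ratio(0.036, 10)   # age, per 10 years
or_female = odds_ratio(-0.530)      # female vs. male
or_diabetes = odds_ratio(1.067)     # diabetes vs. no diabetes
print(round(or_age_1, 2), round(or_age_10, 2),
      round(or_female, 2), round(or_diabetes, 2))
```

Note that the 10-year odds ratio is the 1-year odds ratio raised to the 10th power, not 10 times the 1-year odds ratio.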
Interpret odds ratio estimates of statin use for the following:
Age (per 10 years) = exp(10 x 0.036) = 1.43
History of diabetes = exp(1.067) = 2.91
Interpret odds ratio estimates of statin use for the following:
Age (per 10 years) = exp(10 x 0.036) = 1.43
For every 10-year increase in age, the adjusted odds of
being on a statin drug increase 1.43-fold.
History of diabetes = exp(1.067) = 2.91
Persons with diabetes have 2.91 times the adjusted odds of
being on a statin drug compared to persons without diabetes.
SECTION 6.9
SPSS for Logistic
Regression Analysis
Learning Outcome:
Use SPSS to fit and interpret a logistic
regression model
SPSS
Analyze > Regression > Binary Logistic
Dependent Variable
Covariates
SPSS
Analyze > Descriptive Statistics > Crosstabs
Row = Hx diabetes
Col = Statin use
Odds ratio = (odds of exposure in cases) / (odds of exposure in controls)
= (17 / 88) / (24 / 372)
= 0.193 / 0.0645 = 2.99
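The crosstab calculation can be checked with a short sketch. The function name and argument order are assumptions made for illustration. The four counts are taken from the expression above, with a / b and c / d as the two odds being compared.

```python
def odds_ratio_2x2(a, b, c, d):
    """Odds ratio (a / b) / (c / d), equivalently (a * d) / (b * c)."""
    return (a * d) / (b * c)


crude_or = odds_ratio_2x2(17, 88, 24, 372)
print(round(17 / 88, 3), round(24 / 372, 4), round(crude_or, 2))
```

This unadjusted odds ratio (2.99) is close to the adjusted estimate for history of diabetes from the logistic model, exp(1.067) = 2.91, which also accounts for the other predictors.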