Transcript lect19

Analysis of Survival Data
Time to Event outcomes
Censoring
Survival Function
Point estimation
Kaplan-Meier
Introduction to survival analysis


What makes it different?
Three main variable types




Continuous
Categorical
Time-to-event
Examples of each
Example: Death Times of Psychiatric
Patients (K&M 1.15)




Dataset reported on by Woolson (1981)
26 psychiatric inpatients admitted
to the University of Iowa between 1935 and 1948.
Part of larger study
Variables included:



Age at first admission to hospital
Gender
Time from first admission to death (years)
gender   age   deathtime   death
     1    51           1       1
     1    58           1       1
     1    55           2       1
     1    28          22       1
     0    21          30       0
     0    19          28       1
     1    25          32       1
     1    48          11       1
     1    47          14       1
     1    25          36       0
     1    31          31       0
     0    24          33       0
     0    25          33       0
     1    30          37       0
     1    33          35       0
     0    36          25       1
     0    30          31       0
     0    41          22       1
     1    43          26       1
     1    45          24       1
     1    35          35       0
     0    29          34       0
     0    35          30       0
     0    32          35       1
     1    36          40       1
     0    32          39       0

Data summary
. tab gender

     gender |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |         11       42.31       42.31
          1 |         15       57.69      100.00
------------+-----------------------------------
      Total |         26      100.00

. sum age

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         age |        26    35.15385    10.47928         19         58

[Figure: histogram of age; x-axis: age (20 to 60), y-axis: Density (0 to .04)]
Death time?
. sum deathtime

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
   deathtime |        26    26.42308    11.55915          1         40

[Figure: histogram of deathtime; x-axis: deathtime (0 to 40), y-axis: Density (0 to .05)]
Does that make sense?
. tab death

      death |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |         12       46.15       46.15
          1 |         14       53.85      100.00
------------+-----------------------------------
      Total |         26      100.00






Only 14 patients died
The rest were still alive at the end of the study
Does it make sense to estimate mean? Median?
How can we interpret the histogram?
What if all had died?
What if none had died?
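A minimal sketch in R of why the naive summaries are hard to interpret. The vectors are re-entered from the patient listing (assuming death = 1 means the death was observed and death = 0 means the patient was still alive at the end of the study); the object names are illustrative only.

```r
# Psychiatric patient data (26 patients), re-entered from the listing.
# death = 1: death observed; death = 0: alive at end of study (censored).
deathtime <- c(1, 1, 2, 22, 30, 28, 32, 11, 14, 36, 31, 33, 33,
               37, 35, 25, 31, 22, 26, 24, 35, 34, 30, 35, 40, 39)
death     <- c(1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0,
               0, 0, 1, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0)

# Naive mean over all 26 follow-up times: treats censored times as if
# they were death times, so it understates the mean time to death.
naive.all <- mean(deathtime)

# Mean over the 14 observed deaths only: throws away the patients who
# survived longest, so it is also biased toward short times.
naive.deaths <- mean(deathtime[death == 1])

print(c(all = naive.all, deaths.only = naive.deaths))
print(table(death))
```

Neither number estimates the mean time to death, which is the motivation for treating censoring explicitly.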
CENSORING

Different types





Right
Left
Interval
Each leads to a different likelihood
function
Most common is right censored
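As a sketch of why each type leads to a different likelihood, these are the standard contributions of a single observation, written with f for the density of T and S for its survival function (S is defined later in this lecture; the symbols c, l and r are notation introduced here for the censoring time and the interval endpoints):

```latex
\begin{aligned}
\text{exact event time } t: &\quad f(t) \\
\text{right censored at } c: &\quad P(T > c) = S(c) \\
\text{left censored at } c: &\quad P(T \le c) = 1 - S(c) \\
\text{interval censored in } (l, r]: &\quad P(l < T \le r) = S(l) - S(r)
\end{aligned}
```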
Right censored data






“Type I censoring”
Event is observed if it occurs before
some prespecified time
Mouse study
Clock starts: at first day of
treatment
Clock ends: at death
Always be thinking about ‘the clock’
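A small simulated illustration of Type I censoring in R, in the spirit of the mouse study. All numbers (sample size, study length, event rate, seed) are hypothetical choices made for this sketch.

```r
set.seed(1)                       # arbitrary seed, for reproducibility only
n     <- 8                        # hypothetical number of mice
c.end <- 12                       # hypothetical prespecified study length
true.time <- rexp(n, rate = 0.1)  # hypothetical true times to death

# Under Type I censoring the death is seen only if it occurs before c.end.
obs.time <- pmin(true.time, c.end)          # observed follow-up time
status   <- as.numeric(true.time <= c.end)  # 1 = death observed, 0 = censored

print(data.frame(true.time = round(true.time, 1),
                 obs.time  = round(obs.time, 1),
                 status))
```

The pair (obs.time, status) is what the analyst actually gets to work with; true.time is never fully observed.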
Simple example: Type I censoring
[Figure: patient timelines starting at Time 0]
Introduce “administrative” censoring
[Figure: patient timelines from Time 0 to STUDY END]
More realistic: clinical trial
“Generalized Type I censoring”
[Figure: patient timelines between Time 0 and STUDY END]
Additional issues


Patient drop-out
Loss to follow-up
Drop-out or LTFU
[Figure: patient timelines between Time 0 and STUDY END]
How do we ‘treat’ the data?
Shift everything so each patient time represents time on study
[Figure: patient timelines aligned at time of enrollment]
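The shift to time on study can be sketched in R as follows. The calendar times, the study end and the variable names are all hypothetical, chosen only to show one event, one administrative censoring, and one drop-out.

```r
# Hypothetical calendar times, in months since the study opened.
enroll    <- c(0, 3, 5, 10, 14)     # time of enrollment
event     <- c(9, 30, 12, NA, 20)   # calendar time of event (NA = none seen)
last.seen <- c(NA, NA, NA, 16, NA)  # calendar time of drop-out / LTFU
study.end <- 24

# Follow-up stops at the event, the drop-out, or the study end,
# whichever comes first.
stop.time <- pmin(event, last.seen, study.end, na.rm = TRUE)

# Shift so that each patient's clock starts at his or her own enrollment.
time.on.study <- stop.time - enroll
status <- as.numeric(!is.na(event) & event <= stop.time)  # 1 = event observed

print(data.frame(enroll, stop.time, time.on.study, status))
```

Patient 2 would have had the event at month 30, after the study ended, so that patient is administratively censored at 21 months on study.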
Another type of censoring:
Competing Risks



Patient can have either event of interest
or another event prior to it
Event types ‘compete’ with one another
Example of competers:



Death from lung cancer
Death from heart disease
Common issue not commonly addressed,
but gaining more recognition
Left Censoring





The event has occurred prior to the start of the
study
OR the true survival time is less than the person’s
observed survival time
We know the event occurred, but unsure when
prior to observation
In this kind of study, exact time would be known
if it occurred after the study started
Example:



Survey question: when did you first smoke?
Alzheimer's disease: onset is generally hard to
determine
HPV: infection time
Interval censoring


Due to discrete observation times, actual
times not observed
Example: progression-free survival





Progression of cancer defined by change in
tumor size
Measure in 3-6 month intervals
If increase occurs, it is known to be within
interval, but not exactly when.
If the event is dated to the visit at which it is detected, times are biased to longer values
Challenging issue when intervals are long
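A minimal R sketch of the interval-censoring mechanism, assuming progression is detected at the first scheduled scan at or after it occurs. The true times and the 3-month scan schedule are hypothetical.

```r
true.time <- c(1.2, 4.0, 7.5, 10.1)  # hypothetical true progression times (months)
visit.gap <- 3                       # hypothetical schedule: a scan every 3 months

# Progression is first detected at the first scan at or after it happens.
detected <- ceiling(true.time / visit.gap) * visit.gap
lower    <- detected - visit.gap     # last scan that showed no progression

# All we know is that the true time lies in (lower, detected].
print(data.frame(true.time, lower, detected))
```

Using the detection time as if it were the event time overstates every time by up to one full interval, which is why long intervals are a problem.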
Key components

Event: must have clear definition of what
constitutes the ‘event’
  Death
  Disease
  Recurrence
  Response
Need to know when the clock starts
  Age at event?
  Time from study initiation?
  Time from randomization?
  Time since response?
Can event occur more than once?
Time to event outcomes


Modeled using “survival analysis”
Define T = time to event




T is a random variable
Realizations of T are denoted t
T ≥ 0
Key characterizing functions:


Survival function
Hazard rate (or function)
Survival Function


S(t) = The probability of an
individual surviving to time t
Basic properties



Monotonic non-increasing
S(0)=1
S(∞)=0*
* debatable: cure-rate distributions allow plateau at some
other value
Example: exponential
[Figure: Survival Function (0 to 1) versus time (months), 0 to 60; curves for lambda=0.1, lambda=0.05, lambda=0.01]
Weibull example
[Figure: Survival Function (0 to 1) versus time (months), 0 to 60; curves for lam=0.05,a=0.5; lam=0.05,a=1; lam=0.01,a=0.5; lam=0.01,a=1]
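The two parametric examples can be evaluated with a short R sketch. The slides do not state the formulas, so this assumes the parameterizations S(t) = exp(-lambda t) for the exponential and S(t) = exp(-lambda t^a) for the Weibull; the function names are made up for illustration.

```r
# Assumed parameterizations (not stated on the slides):
#   exponential: S(t) = exp(-lambda * t)
#   Weibull:     S(t) = exp(-lambda * t^a)
S.exp  <- function(t, lambda)    exp(-lambda * t)
S.weib <- function(t, lambda, a) exp(-lambda * t^a)

t <- c(0, 12, 24, 60)  # months

# One column per lambda value used in the exponential example.
print(round(sapply(c(0.1, 0.05, 0.01), function(l) S.exp(t, l)), 3))

# Weibull with a = 1 reduces to the exponential with the same lambda.
print(round(S.weib(t, lambda = 0.05, a = 1), 3))
print(round(S.weib(t, lambda = 0.05, a = 0.5), 3))
```

Both functions satisfy the basic properties listed earlier: they start at 1 and never increase.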
Applied example
Van Spall, H. G. C., A. Chong, et al. (2007). "Inpatient smoking-cessation counseling and all-cause mortality in patients with
acute myocardial infarction." American Heart Journal 154(2):
213-220.
Background Smoking cessation is associated with improved health
outcomes, but the prevalence, predictors, and mortality benefit of
inpatient smoking-cessation counseling after acute myocardial
infarction (AMI) have not been described in detail.
Methods The study was a retrospective, cohort analysis of a population-based clinical AMI database involving 9041 inpatients discharged
from 83 hospital corporations in Ontario, Canada. The prevalence
and predictors of inpatient smoking-cessation counseling were
determined.
Results…..
Conclusions Post-MI inpatient smoking-cessation counseling is an
underused intervention, but is independently associated with a
significant mortality benefit. Given the minimal cost and potential
benefit of inpatient counseling, we recommend that it receive
greater emphasis as a routine part of post-MI management.
Applied example
[Figure] Adjusted 1-year survival curves of counseled smokers, noncounseled smokers, and never-smokers admitted with AMI (N = 3511). Survival curves have been adjusted for age, income quintile, Killip class, systolic blood pressure, heart rate, creatinine level, cardiac arrest, ST-segment deviation or elevated cardiac biomarkers, history of CHF; specialty of admitting physician; size of hospital of admission; hospital clustering; in-hospital administration of aspirin and β-blockers; reperfusion during index hospitalization; and discharge medications.
Hazard Function


A little harder to conceptualize
Instantaneous failure rate or conditional failure rate
\[ h(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t} \]



Interpretation: approximate probability that a person
at time t experiences the event in the next instant.
Only constraint: h(t) ≥ 0
For continuous time,
\[ h(t) = \frac{f(t)}{S(t)} = -\frac{d}{dt} \ln S(t) \]
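The continuous-time identity follows in two steps from the definition of the hazard; a short derivation sketch, using f(t) = -dS(t)/dt and S(0) = 1:

```latex
\begin{aligned}
h(t) &= \lim_{\Delta t \to 0}
        \frac{P(t \le T < t + \Delta t)}{\Delta t \, P(T \ge t)}
      = \frac{f(t)}{S(t)}, \\
f(t) &= -\frac{d}{dt} S(t)
  \;\Rightarrow\;
  h(t) = -\frac{S'(t)}{S(t)} = -\frac{d}{dt} \ln S(t), \\
S(t) &= \exp\left( -\int_0^t h(u)\, du \right).
\end{aligned}
```

For example, a constant hazard h(t) = λ gives S(t) = exp(-λt), the exponential survival function.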
Hazard Function



Useful for conceptualizing how chance of
event changes over time
That is, consider hazard ‘relative’ over time
Examples:


Treatment related mortality
  Early on, high risk of death
  Later on, risk of death decreases
Aging
  Early on, low risk of death
  Later on, higher risk of death
Shapes of hazard functions

Increasing
  Natural aging and wear
Decreasing
  Early failures due to device or transplant
  failures
Bathtub
  Populations followed from birth
Hump-shaped
  Initial risk of event, followed by decreasing
  chance of event
Examples
[Figure: examples of hazard functions; Hazard Function (0 to 1) versus Time (0 to 6)]
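Increasing, decreasing and constant hazards can all be produced by one family. A minimal R sketch, assuming a Weibull model with S(t) = exp(-lambda t^a), for which the hazard is h(t) = lambda a t^(a-1); the function name and parameter values are illustrative, not taken from the plotted examples.

```r
# Hazard of the assumed Weibull model S(t) = exp(-lambda * t^a).
h.weib <- function(t, lambda, a) lambda * a * t^(a - 1)

t <- 1:6
h.inc   <- h.weib(t, lambda = 0.05, a = 2)    # a > 1: increasing hazard
h.dec   <- h.weib(t, lambda = 0.05, a = 0.5)  # a < 1: decreasing hazard
h.const <- h.weib(t, lambda = 0.05, a = 1)    # a = 1: constant (exponential)

print(round(rbind(h.inc, h.dec, h.const), 4))
```

A Weibull hazard is always monotone, so bathtub and hump-shaped hazards need a different model.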
Median



Very/most common way to express the
‘center’ of the distribution
Rarely see another quantile expressed
Find t such that
S(t) = 0.5



Complication: in some applications, median is
not reached empirically
Reported median based on model seems like
an extrapolation
Often just state ‘median not reached’ and give
alternative point estimate.
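A worked example of solving S(t) = 0.5, assuming an exponential model S(t) = exp(-λt) with the three rates used in the exponential example (time in months):

```latex
\begin{aligned}
e^{-\lambda t} = 0.5
  \;&\Rightarrow\; t_{\text{med}} = \frac{\ln 2}{\lambda}, \\
\lambda = 0.1:  &\quad t_{\text{med}} \approx 6.9 \text{ months}, \\
\lambda = 0.05: &\quad t_{\text{med}} \approx 13.9 \text{ months}, \\
\lambda = 0.01: &\quad t_{\text{med}} \approx 69.3 \text{ months}.
\end{aligned}
```

With 60 months of follow-up, the λ = 0.01 curve never drops to 0.5, which is the kind of situation where the median is not reached.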
X-year survival rate


Many applications have ‘landmark’ times
that have historically been used to quantify survival
Examples:




Breast cancer: 5 year relapse-free survival
Pancreatic cancer: 6 month survival
Acute myeloid leukemia (AML): 12 month
relapse-free survival
Solve for S(t) given t
Competing Risks



Used to be somewhat ignored.
Not so much anymore
Idea:




Each subject can fail due to one of K causes
(K>1)
Occurrence of one event precludes us from
observing the other event.
Usually, quantity of interest is the cause-specific hazard
Overall hazard equals sum of each
hazard:
\[ h_T(t) = \sum_{k=1}^{K} h_k(t) \]
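One consequence of the additive hazards, sketched here using the relation S(t) = exp(-∫ h): the overall survival function factors into one term per cause. The numerical rates in the last line are hypothetical.

```latex
\begin{aligned}
S_T(t) &= \exp\left( -\int_0^t h_T(u)\, du \right)
        = \prod_{k=1}^{K} \exp\left( -\int_0^t h_k(u)\, du \right), \\
\text{e.g. } K = 2,\; h_1(t) = 0.02,\; h_2(t) = 0.01
  &\;\Rightarrow\; h_T(t) = 0.03,\quad S_T(t) = e^{-0.03 t}.
\end{aligned}
```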
Example
Myeloablative Allogeneic Bone Marrow
Transplant Using T Cell Depleted
Allografts Followed by Post-Transplant
GM-CSF in High Risk Myelodysplastic
Syndromes
Interest is in RELAPSE
Need to account for treatment
related mortality (TRM)?
Should we censor TRM?
  No. That would make things look
  more optimistic
Should we exclude them?
  No. That would also bias the
  results
Solution:
  Treat it as a competing risk
  Estimate the incidence of both
[Figure: Cumulative Incidence (0 to 1) of Relapse and TRM versus Time from BMT (Months), 0 to 20]
Estimating the Survival Function


Most common approach abandons
parametric assumptions
Why?


Not one ‘catch-all’ distribution
No central limit theorem for large
samples
Censoring

Assumption:




Potential censoring time is unrelated to the
potential event time
Reasonable?
Estimation approaches are biased when
this is violated
Violation examples


Sick patients tend to miss clinical visits more
often
High school drop-out. Kids who move may be
more likely to drop out.
Terminology





D distinct event times
t_1 < t_2 < t_3 < ... < t_D
ties allowed
at time t_i, there are d_i deaths
Y_i is the number of individuals at risk at t_i
  Y_i is all the people whose observed (event or
  censoring) times are ≥ t_i
  d_i/Y_i is an estimate of the conditional
  probability of an event at t_i, given
  survival to just before t_i
Kaplan-Meier estimation

AKA ‘product-limit’ estimator
1

ˆ
S (t )   [1  di ]

Yi
 ti t


if t  t1
if t  t1
Step-function
Size of steps depends on


Number of events at t
Pattern of censoring before t
Kaplan-Meier estimation

Greenwood’s formula


Most common variance estimator
Point-wise
\[ \hat{V}[\hat{S}(t)] = \hat{S}(t)^2 \sum_{t_i \le t} \frac{d_i}{Y_i (Y_i - d_i)} \]
Example:



Kim paper
Event = time to relapse
Data:

10, 20+, 35, 40+, 50+, 55, 70+, 71+, 80, 90+
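The product-limit estimate and Greenwood's variance can be computed by hand for these ten observations. This base-R sketch reads a trailing "+" as a censored time (the same coding the lecture's R code uses for its d vector); the object names are illustrative.

```r
# Times to relapse (months); d = 1 if relapse observed, 0 if censored ("+").
t <- c(10, 20, 35, 40, 50, 55, 70, 71, 80, 90)
d <- c(1,  0,  1,  0,  0,  1,  0,  0,  1,  0)

# Distinct event times t_1 < ... < t_D.
event.times <- sort(unique(t[d == 1]))

# d_i = number of events at t_i; Y_i = number still at risk at t_i.
d.i <- sapply(event.times, function(u) sum(t == u & d == 1))
Y.i <- sapply(event.times, function(u) sum(t >= u))

# Kaplan-Meier: running product of (1 - d_i / Y_i).
S.hat <- cumprod(1 - d.i / Y.i)

# Greenwood: S.hat^2 times the running sum of d_i / (Y_i * (Y_i - d_i)).
V.hat <- S.hat^2 * cumsum(d.i / (Y.i * (Y.i - d.i)))

print(data.frame(time = event.times, d = d.i, Y = Y.i,
                 S = S.hat, se = round(sqrt(V.hat), 4)))
```

The estimate steps down to 0.9, 0.7875, 0.63 and 0.315 at months 10, 35, 55 and 80; the steps get larger as censoring shrinks the risk set from 10 to 2.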
Plot it:
[Figure: Survival Function (0 to 1) versus Time to relapse (months), 0 to 100]
Interpreting S(t)


General philosophy: bad to
extrapolate
In survival: bad to put a lot of
stock in estimates at late time
points
Fernandes et al: “A Prospective Follow Up of Alcohol Septal Ablation
For Symptomatic Hypertrophic Obstructive Cardiomyopathy The
Ten-Year Baylor and MUSC Experience (1996-2007)”
R for KM
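library(survival)          # provides Surv() and survfit()
library(help=survival)     # list the contents of the package
# times to relapse (months); d = 1 if relapse observed, 0 if censored
t <- c(10,20,35,40,50,55,70,71,80,90)
d <- c(1,0,1,0,0,1,0,0,1,0)
cbind(t,d)
st <- Surv(t,d)            # survival object; censored times print with "+"
st
help(survfit)
fit.km <- survfit(st ~ 1)  # survfit needs a formula; "~ 1" fits one curve
fit.km
summary(fit.km)
attributes(fit.km)
plot(fit.km, conf.int=FALSE, xlab="time to relapse (months)",
     ylab="Survival Function", lwd=2)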
Kaplan-Meier Curve
[Figure: Survival Function (0 to 1) versus time to relapse (months), 0 to 80]