Statistics 262: Intermediate Biostatistics
April 20, 2004: Introduction to Survival Analysis
Jonathan Taylor and Kristin Cobb
What is survival analysis?
Statistical methods for analyzing longitudinal data on the occurrence of events.
Events may include death, injury, onset of illness, recovery from illness (binary variables), or transition above or below the clinical threshold of a meaningful continuous variable (e.g., CD4 counts).
Accommodates data from randomized clinical trials or cohort study designs.
Randomized Clinical Trial (RCT)
[Figure: a disease-free, at-risk cohort is drawn from the target population and randomly assigned to intervention or control; each arm is followed over time and classified as disease or disease-free.]
Randomized Clinical Trial (RCT)
[Figure: a patient population is drawn from the target population and randomly assigned to treatment or control; each arm is followed over time and classified as cured or not cured.]
Randomized Clinical Trial (RCT)
[Figure: a patient population is drawn from the target population and randomly assigned to treatment or control; each arm is followed over time and classified as dead or alive.]
Cohort study (prospective/retrospective)
[Figure: a disease-free cohort is drawn from the target population, classified as exposed or unexposed, and followed over time for disease vs. disease-free.]
Objectives of survival analysis
To estimate time-to-event for a group of individuals, such as time until a second heart attack for a group of MI patients.
To compare time-to-event between two or more groups, such as treated vs. placebo MI patients in a randomized controlled trial.
To assess the relationship of covariates to time-to-event, such as: do weight, insulin resistance, or cholesterol influence the survival time of MI patients?
Note: when the incidence (hazard) rate is constant, expected time-to-event = 1/incidence rate
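Under a constant hazard $h$, the waiting time until the event is exponential with mean $1/h$. A minimal simulation sketch, assuming a hypothetical rate of 5 cases/1000 person-years, checks that note numerically:

```python
import random

def simulate_mean_waiting_time(rate: float, n: int, seed: int = 1) -> float:
    """Average of n exponential waiting times drawn with constant hazard `rate`."""
    rng = random.Random(seed)
    # random.expovariate(lambd) samples an exponential distribution with rate lambd
    draws = [rng.expovariate(rate) for _ in range(n)]
    return sum(draws) / n

h = 0.005  # hypothetical: 5 cases per 1000 person-years
mean_wait = simulate_mean_waiting_time(h, n=200_000)
print(f"simulated mean: {mean_wait:.1f} years; 1/h = {1 / h:.1f} years")
```

If the hazard changes over time, the mean waiting time is no longer simply 1/h.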
Examples of survival analysis in medicine
RCT: Women’s Health Initiative (JAMA, 2001)
[Figure: cumulative incidence over time, on hormones vs. on placebo.]
Prospective cohort study:
From April 15, 2004 NEJM:
Use of Gene-Expression Profiling to Identify Prognostic
Subclasses in Adult Acute Myeloid Leukemia
Retrospective cohort study:
From December 2003 BMJ:
Aspirin, ibuprofen, and mortality after myocardial
infarction: retrospective cohort study
Why use survival analysis?
1. Why not compare mean time-to-event between your groups using a t-test or linear regression?
-- ignores censoring
2. Why not compare the proportion of events in your groups using logistic regression?
-- ignores time
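A small simulation makes both objections concrete. It assumes hypothetical exponential event times with a true mean of 10 months and a study that stops at 12 months. Averaging the observed times as if everyone had the event, or averaging only the subjects with events, underestimates the true mean; the proportion with an event by month 12 throws away when the events happened.

```python
import random

rng = random.Random(2)
h = 0.1            # hypothetical constant hazard: true mean time-to-event = 1/h = 10 months
study_end = 12.0   # hypothetical administrative censoring at 12 months
n = 100_000

event_times = [rng.expovariate(h) for _ in range(n)]
observed = [min(t, study_end) for t in event_times]   # follow-up time actually seen
had_event = [t <= study_end for t in event_times]     # event indicator

# Naive approach 1: average observed times as if every subject had the event.
naive_mean = sum(observed) / n
# Naive approach 2: average only the subjects who had the event.
events_only_mean = sum(t for t, e in zip(observed, had_event) if e) / sum(had_event)
# Logistic-style summary: proportion with an event by month 12 (timing discarded).
prop_events = sum(had_event) / n

print(f"true mean {1 / h:.2f}, naive {naive_mean:.2f}, events-only {events_only_mean:.2f}")
print(f"proportion with event by month 12: {prop_events:.3f}")
```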
Cox regression vs. logistic regression
Distinction between rate and proportion:
Incidence (hazard) rate: number of new cases of disease per population at-risk per unit time (or mortality rate, if the outcome is death)
Cumulative incidence: proportion of an initially disease-free population that develops disease (new cases) within a given time period
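The two quantities can be computed side by side from the same follow-up data. This sketch assumes a hypothetical cohort of 8 disease-free people followed for up to 2 years with no one lost to follow-up; a case stops contributing person-time at disease onset.

```python
# Hypothetical cohort: each entry is (years at risk, developed disease?).
cohort = [
    (2.0, False), (2.0, False), (0.5, True), (2.0, False),
    (1.5, True), (2.0, False), (2.0, False), (1.0, True),
]

cases = sum(1 for _, sick in cohort if sick)
person_years = sum(years for years, _ in cohort)

incidence_rate = cases / person_years        # new cases per person-year at risk
cumulative_incidence = cases / len(cohort)   # proportion developing disease within 2 years

print(f"incidence rate: {incidence_rate:.3f} per person-year")
print(f"2-year cumulative incidence: {cumulative_incidence:.3f}")
```

The rate has units (per person-year); the cumulative incidence is a unitless proportion tied to a stated time period.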
Cox regression vs. logistic regression
Distinction between hazard/rate ratio and
odds ratio/risk ratio:
Hazard/rate ratio: ratio of incidence
rates
Odds/risk ratio: ratio of proportions
Logistic regression aims to estimate the odds ratio; Cox regression aims to estimate the hazard ratio.
By taking time into account, you are taking into account more information than just binary yes/no.
Gain power/precision.
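To see that the hazard ratio and the risk/odds ratios are different quantities, this sketch assumes two hypothetical groups with constant hazards of 0.05 and 0.025 per person-year and converts each rate to a 10-year risk with $R(t) = 1 - e^{-ht}$ (the relationship given on the "Rates vs. risks" slide):

```python
import math

def risk(h: float, t: float) -> float:
    """Cumulative incidence over time t under a constant hazard h: R(t) = 1 - exp(-h t)."""
    return 1.0 - math.exp(-h * t)

h_treated, h_control = 0.05, 0.025   # hypothetical hazards (per person-year)
t = 10.0                              # hypothetical 10 years of follow-up

r1, r0 = risk(h_treated, t), risk(h_control, t)
hazard_ratio = h_treated / h_control                    # ratio of incidence rates
risk_ratio = r1 / r0                                    # ratio of proportions
odds_ratio = (r1 / (1 - r1)) / (r0 / (1 - r0))          # ratio of odds
print(f"HR = {hazard_ratio:.2f}, RR = {risk_ratio:.2f}, OR = {odds_ratio:.2f}")
```

Here the risk ratio falls below the hazard ratio and the odds ratio above it; the three converge when events are rare.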
Rates vs. risks
Relationship between risk and rates:
$R(t) = 1 - e^{-ht}$
$h$ = constant hazard rate
$R(t)$ = probability of disease in time $t$
Rates vs. risks
For example, if the rate is 5 cases/1000 person-years, then the chance of developing disease over 10 years is:
$R(t) = 1 - e^{-(.005)(10)} = 1 - e^{-.05} = 1 - .951 = .0488$
Compare to .005(10) = 5%
The loss of persons at risk because they have developed disease within the period of observation is small relative to the size of the total group.
Rates vs. risks
If the rate is 50 cases/1000 person-years, then the chance of developing disease over 10 years is:
$R(t) = 1 - e^{-(.05)(10)} = 1 - e^{-.5} = 1 - .61 = .39$
Compare to .05(10) = 50%
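Both worked examples can be reproduced in a few lines. The sketch compares the exact risk $1 - e^{-ht}$ with the rate × time shortcut for the two rates used above:

```python
import math

def risk_from_rate(h: float, t: float) -> float:
    """Exact risk over time t under constant hazard h: R(t) = 1 - exp(-h t)."""
    return 1.0 - math.exp(-h * t)

for per_1000 in (5, 50):           # the two rates used in the examples
    h = per_1000 / 1000            # cases per person-year
    t = 10                         # years of follow-up
    exact = risk_from_rate(h, t)
    approx = h * t                 # rate x time shortcut
    print(f"{per_1000}/1000 PY: exact {exact:.4f}, rate x time {approx:.4f}")
```

The shortcut is close only when the risk is small, i.e. when few people leave the at-risk group during follow-up.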
Rates vs. risks
Relationship between risk and rates (derivation):
$r(t) = h e^{-ht}$ (exponential density function for the waiting time until the event, constant hazard rate)
$R(t) = \int_0^t h e^{-hu}\,du = \left[-e^{-hu}\right]_0^t = -e^{-ht} - (-e^{0}) = 1 - e^{-ht}$
Preview: the waiting-time distribution will change if the hazard rate changes as a function of time, h(t).
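As a check on the derivation, this sketch integrates the exponential density numerically (the midpoint rule is an illustrative choice) and compares the result with the closed form $1 - e^{-ht}$:

```python
import math

def integrate_midpoint(func, a: float, b: float, steps: int = 10_000) -> float:
    """Approximate the integral of func over [a, b] with the midpoint rule."""
    width = (b - a) / steps
    return width * sum(func(a + (i + 0.5) * width) for i in range(steps))

h, t = 0.05, 10.0                              # same rate and horizon as the second example
density = lambda u: h * math.exp(-h * u)       # exponential density r(u) = h e^{-hu}
numeric = integrate_midpoint(density, 0.0, t)
closed_form = 1.0 - math.exp(-h * t)
print(f"numeric {numeric:.6f} vs closed form {closed_form:.6f}")
```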
Survival Analysis: Terms
Time-to-event: The time from entry into a
study until a subject has a particular outcome
Censoring: Subjects are said to be censored if they are lost to follow-up or drop out of the study, or if the study ends before they die or have an outcome of interest.
They are counted as alive or disease-free for the time they were enrolled in the study.
If dropout is related to both outcome and treatment, dropouts may bias the results.
Right censoring (T > t)
Common examples:
Termination of the study
Death due to a cause that is not the event of interest
Loss to follow-up
We know only that the subject survived at least to time t.
Left censoring (T<t)
The origin time, not the event time, is known
only to be less than some value.
For example, if you are studying menarche
and you begin following girls at age 12, you
may find that some of them have already
begun menstruating. Unless you can obtain
information about the start date for those
girls, the age of menarche is left-censored at
age 12.
*From: Allison, Paul. Survival Analysis. SAS Institute, 1995.
Interval censoring (a<T<b)
When we know the event has occurred
between two time points, but don’t
know the exact dates.
For example, if you’re screening
subjects for HIV infection yearly, you
may not be able to determine the exact
date of infection.*
*From: Allison, Paul. Survival Analysis. SAS Institute, 1995.
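The three censoring types differ only in which bounds on $T$ are known. One way to make that explicit, sketched here with a hypothetical `CensoredTime` class (not taken from any particular package), stores each observation as an interval known to contain $T$:

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class CensoredTime:
    """Event time T known only to lie in [lower, upper] (hypothetical encoding)."""
    lower: float
    upper: float

    def kind(self) -> str:
        if self.lower == self.upper:
            return "exact"               # event time observed
        if math.isinf(self.upper):
            return "right-censored"      # T > lower
        if self.lower == 0.0:
            return "left-censored"       # T < upper
        return "interval-censored"       # lower < T < upper

observations = [
    CensoredTime(7.0, 7.0),        # event observed at 7 months
    CensoredTime(6.0, math.inf),   # dropped out at 6 months: T > 6
    CensoredTime(0.0, 12.0),       # menarche already begun at age 12: T < 12
    CensoredTime(2.0, 3.0),        # HIV infection between yearly screens at years 2 and 3
]
for obs in observations:
    print(obs, obs.kind())
```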
Data Structure: survival analysis
Time variable: $t_i$ = time at last disease-free observation or time at event
Censoring variable: $c_i$ = 1 if had the event; $c_i$ = 0 if no event by time $t_i$
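A minimal sketch of this $(t_i, c_i)$ layout, using a hypothetical `make_observation` helper and the five subjects from the Kaplan-Meier example later in these notes (a 12-month study is assumed for subjects who finish it):

```python
from typing import NamedTuple, Optional

class Observation(NamedTuple):
    subject: str
    t: float  # time at event, or time at last disease-free observation (months)
    c: int    # 1 if the subject had the event at t; 0 if no event by t (censored)

def make_observation(subject: str, event_time: Optional[float], censor_time: float) -> Observation:
    """Hypothetical helper: censor_time is when follow-up stops (dropout or study end)."""
    if event_time is not None and event_time <= censor_time:
        return Observation(subject, event_time, 1)
    return Observation(subject, censor_time, 0)

data = [
    make_observation("A", None, 6.0),    # drops out after 6 months
    make_observation("B", None, 12.0),   # event-free for the whole study
    make_observation("C", 7.0, 12.0),    # dies at 7 months
    make_observation("D", None, 12.0),   # event-free for the whole study
    make_observation("E", 4.0, 12.0),    # dies at 4 months
]
for obs in data:
    print(obs)
```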
Choice of origin
Describing survival distributions
$T_i$, the event time for an individual, is a random variable having a probability distribution.
Different models for survival data are distinguished by different choices of distribution for $T_i$.
Survivor function (complement of the cumulative distribution function)
Cumulative failure function (cumulative distribution function): $F(t) = P(T \le t)$
Survival analysis typically uses its complement, the survivor function:
$S(t) = P(T > t) = 1 - P(T \le t) = 1 - F(t)$
Example: if t = 100 years, S(100) = probability of surviving beyond 100 years.
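When no one is censored, both functions can be estimated directly as fractions of the observed event times. The sketch below uses hypothetical ages at death; with censored data this simple fraction no longer works, which is the problem the Kaplan-Meier estimator addresses.

```python
def empirical_cdf(times, t):
    """F(t) = P(T <= t), estimated as the fraction of event times at or before t."""
    return sum(x <= t for x in times) / len(times)

def empirical_survivor(times, t):
    """S(t) = P(T > t), estimated as the fraction of event times after t."""
    return sum(x > t for x in times) / len(times)

ages_at_death = [67, 72, 80, 85, 91, 99, 101, 104]  # hypothetical, fully observed
t = 100
print(empirical_cdf(ages_at_death, t), empirical_survivor(ages_at_death, t))
```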
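Corresponding density function
$f(t) = \frac{dF(t)}{dt} = -\frac{dS(t)}{dt}$
Loosely, $f(t)\,\Delta t$ is the probability that the failure time falls in a short interval just after time t (out of the whole range of possible t's); for continuous time, the probability of failing at exactly t is zero.
Also written:
$f(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t)}{\Delta t}$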
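Hazard function
$h(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t}$
In words: the instantaneous rate of the event at time t among those who have survived to t; $h(t)\,\Delta t$ is approximately the probability that, if you survive to t, you will succumb to the event in the next short interval $\Delta t$.
Hazard from density and survival: $h(t) = \frac{f(t)}{S(t)}$
Derivation (for continuous T, $P(T \ge t) = S(t)$):
$h(t)\,dt = P(t \le T < t + dt \mid T \ge t) = \frac{P(t \le T < t + dt \;\&\; T \ge t)}{P(T \ge t)} = \frac{P(t \le T < t + dt)}{P(T \ge t)} = \frac{f(t)\,dt}{S(t)}$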
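The conditional-probability definition can be checked by simulation. This sketch assumes a hypothetical Weibull distribution (scale 10, shape 2, so the hazard increases with time), estimates $h(5)$ as the fraction of subjects still at risk at $t = 5$ who fail within the next 0.1 time units, divided by 0.1, and compares it with $f(t)/S(t)$:

```python
import math
import random

alpha, beta = 10.0, 2.0   # hypothetical Weibull scale and shape

def survivor(t):  # S(t) = exp(-(t/alpha)^beta)
    return math.exp(-((t / alpha) ** beta))

def density(t):   # f(t) = -dS/dt for this Weibull
    return (beta / alpha) * (t / alpha) ** (beta - 1) * survivor(t)

rng = random.Random(3)
times = [rng.weibullvariate(alpha, beta) for _ in range(500_000)]

t, dt = 5.0, 0.1
at_risk = [x for x in times if x >= t]             # survived to t
failed = sum(1 for x in at_risk if x < t + dt)     # fail in [t, t + dt)
hazard_mc = failed / len(at_risk) / dt             # conditional probability per unit time
hazard_formula = density(t) / survivor(t)          # h(t) = f(t) / S(t)
print(f"Monte Carlo {hazard_mc:.3f} vs f/S {hazard_formula:.3f}")
```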
Relating these functions:
Hazard from density and survival: $h(t) = \frac{f(t)}{S(t)}$
Survival from density: $S(t) = \int_t^{\infty} f(u)\,du$
Density from survival: $f(t) = -\frac{dS(t)}{dt}$
Density from hazard: $f(t) = h(t)\,e^{-\int_0^t h(u)\,du}$
Survival from hazard: $S(t) = e^{-\int_0^t h(u)\,du}$
Hazard from survival: $h(t) = -\frac{d}{dt}\ln S(t)$
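A numerical sketch of these identities, assuming a hypothetical hazard $h(t) = 0.02 + 0.01t$: it builds $S(t)$ and $f(t)$ from the hazard, then recovers $f$ and $h$ by central finite differences.

```python
import math

def hazard(t):
    """Hypothetical linearly increasing hazard h(t) = 0.02 + 0.01 t."""
    return 0.02 + 0.01 * t

def cumulative_hazard(t, steps=10_000):
    """Integral of h(u) du from 0 to t, by the midpoint rule."""
    width = t / steps
    return width * sum(hazard((i + 0.5) * width) for i in range(steps))

def survivor(t):   # S(t) = exp(-integral of h)
    return math.exp(-cumulative_hazard(t))

def density(t):    # f(t) = h(t) exp(-integral of h)
    return hazard(t) * survivor(t)

t, eps = 5.0, 1e-4
# Central differences approximate the derivative-based relations.
dS_dt = (survivor(t + eps) - survivor(t - eps)) / (2 * eps)
dlogS_dt = (math.log(survivor(t + eps)) - math.log(survivor(t - eps))) / (2 * eps)

print(f"f(t) = {density(t):.6f}, -dS/dt = {-dS_dt:.6f}")
print(f"h(t) = {hazard(t):.6f}, -d ln S/dt = {-dlogS_dt:.6f}, f/S = {density(t) / survivor(t):.6f}")
```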
Introduction to Kaplan-Meier
Non-parametric estimate of survivor
function.
Commonly used to describe survivorship of study population(s).
Commonly used to compare two study
populations.
Intuitive graphical presentation.
Survival Data (right-censored)
[Figure: follow-up timelines for Subjects A to E, from beginning to end of study; time in months.]
1. Subject E dies at 4 months (X).
Corresponding Kaplan-Meier Curve
[Figure: Kaplan-Meier curve starting at 100%; time in months.]
Probability of surviving to just before 4 months is 100% = 5/5.
Subject E dies at 4 months: fraction surviving this death = 4/5.
Survival Data
[Figure: follow-up timelines for Subjects A to E, from beginning to end of study; time in months.]
1. Subject E dies at 4 months (X).
2. Subject A drops out after 6 months.
3. Subject C dies at 7 months (X).
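Corresponding Kaplan-Meier Curve
[Figure: Kaplan-Meier curve starting at 100%; time in months.]
Subject C dies at 7 months: with Subjects B, C, and D still at risk (A has dropped out), fraction surviving this death = 2/3.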
Survival Data
[Figure: follow-up timelines for Subjects A to E, from beginning to end of study; time in months.]
1. Subject E dies at 4 months (X).
2. Subject A drops out after 6 months.
3. Subject C dies at 7 months (X).
4. Subjects B and D survive for the whole year-long study period.
Corresponding Kaplan-Meier Curve
[Figure: Kaplan-Meier curve starting at 100%; time in months.]
Product limit estimate of survival = P(surviving/at-risk through failure 1) * P(surviving/at-risk through failure 2) = 4/5 * 2/3 = .5333
The product limit estimate
The probability of surviving the entire year, taking into account censoring = (4/5)(2/3) = 53%
NOTE: > 40% (2/5) because the one drop-out survived at least a portion of the year,
AND < 60% (3/5) because we don’t know if the one drop-out would have survived until the end of the year.
KM estimator, formally
k distinct event times $t_1 < \dots < t_j < \dots < t_k$
At each time $t_j$, there are $n_j$ individuals at risk.
$d_j$ is the number who have the event at time $t_j$.
$\hat{S}(t) = \prod_{j:\, t_j \le t} \left[1 - \frac{d_j}{n_j}\right]$
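A minimal Python sketch of the product-limit formula, applied to Subjects A to E from the example (times in months; B and D are assumed censored at 12 months). Subjects censored exactly at an event time are counted as at risk at that time, the usual convention:

```python
from collections import Counter

def kaplan_meier(times, events):
    """Product-limit estimate S_hat(t_j) at each distinct event time t_j.

    times:  t_i, time of the event or of the last disease-free observation
    events: c_i, 1 if the event occurred at t_i, 0 if censored at t_i
    Returns a list of (t_j, n_j, d_j, S_hat(t_j)).
    """
    deaths = Counter(t for t, c in zip(times, events) if c == 1)
    estimate, s_hat = [], 1.0
    for t_j in sorted(deaths):
        n_j = sum(1 for t in times if t >= t_j)   # at risk just before t_j
        d_j = deaths[t_j]                         # events at t_j
        s_hat *= 1 - d_j / n_j
        estimate.append((t_j, n_j, d_j, s_hat))
    return estimate

times = [6, 12, 7, 12, 4]    # subjects A, B, C, D, E
events = [0, 0, 1, 0, 1]
result = kaplan_meier(times, events)
for row in result:
    print(row)
```

It reproduces the hand calculation: the estimate drops to 4/5 at 4 months (5 at risk) and to (4/5)(2/3) ≈ .5333 at 7 months (3 at risk).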
Comparing 2 groups
Caveats
Survival estimates can be unreliable
toward the end of a study when there
are small numbers of subjects at risk of
having an event.
[Figure: WHI and breast cancer; annotation: "Small numbers left".]
Overview of SAS PROCs
LIFETEST - Produces life tables and Kaplan-Meier survival curves. It is primarily for univariate analysis of the timing of events.
LIFEREG - Estimates regression models with censored, continuous-time data under several alternative distributional assumptions. Does not allow for time-dependent covariates.
PHREG - Uses Cox’s partial likelihood method to estimate regression models with censored data. Handles both continuous-time and discrete-time data and allows for time-dependent covariates.