ALDA, Chapter Three - Harvard Graduate School of Education
Introducing discrete-time survival analysis
ALDA, Chapter Eleven
“To exist is to change, to change is to mature”
Henri Bergson
John B. Willett & Judith D. Singer
Harvard Graduate School of Education
Chapter 11: Fitting basic discrete-time hazard models
Review basic descriptive statistics for discrete-time survival data (Ch 10)
Life table
Hazard function
Survivor function
Median lifetime
Specifying a suitable discrete-time hazard model (§11.1 & 11.2)—both
heuristic and formal representations
Fitting the discrete-time hazard model to data (§11.3)—it turns out that
it’s very easy to fit the model
Interpreting parameter estimates (§11.4)—very different from growth
modeling, but more similar to logistic regression
Displaying fitted hazard and survivor functions (§11.5)—as in growth
modeling, we’ll display fitted functions at prototypical predictor values
Comparing (nested) discrete-time hazard models using goodness-of-fit
statistics (§11.6): methods for data analysis and model comparison
Illustrative example: Grade at first heterosexual intercourse
Data source: Deborah Capaldi & colleagues (1996) Child Development
Sample: 180 middle school boys (all considered “at risk”)
Research design:
Large panel study in which each boy was tracked from 7th through 12th
grades
By the end of data collection (at the end of 12th grade), n=126 (70.0%) had
had sex
The remaining n=54 (30%) were still virgins. These censored observations
pose a challenge for data analysis.
Question predictor: PT, for parenting transition, a dichotomy indicating
whether the boy lived with his biological parents during his early
formative years (before 7th grade when data collection began)
72 boys (40%) lived with both biological parents (PT=0)
108 boys (60%) experienced at least one parenting transition before 7th
grade (PT=1)
Ultimately, we’ll also examine a continuous predictor, PAS, which
assesses the parents’ level of antisocial behavior during the child’s
formative years (also time-invariant—behavior before the study started).
Because the original scale is totally arbitrary, scores have been standardized
to a mean of 0 and sd of 1
(ALDA, Section 11.1, pp 358-360)
The life table: Summarizing the distribution of event occurrence over time
[Life table, with callouts: J intervals (T = 7, 8, …, 12); risk set; n experiencing the target event in interval j; n censored in interval j]
How might we summarize the
distribution of event occurrence?
(ALDA, Section 10.1, pp 326-329)
Assessing the conditional risk of event occurrence: The discrete-time hazard function
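$$\hat{h}(t_j) = \frac{n\ \text{events}_j}{n\ \text{at risk}_j}$$
$$\hat{h}(t_7) = \frac{15}{180} = 0.0833, \qquad \hat{h}(t_9) = \frac{24}{158} = 0.1519, \qquad \hat{h}(t_{12}) = \frac{26}{80} = 0.3250$$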
Discrete-time hazard
Conditional probability that individual i will experience the
target event in time period j ($T_i = j$) given that s/he didn’t
experience it in any earlier time period ($T_i \geq j$)
$h(t_{ij}) = \Pr\{T_i = j \mid T_i \geq j\}$
As a probability (only in discrete time), hazard is bounded
by 0 and 1. This is an issue for modeling that we’ll need
to address
Estimation is easy because each value of hazard is based
on that interval’s risk set.
[Figure: estimated hazard h(t), from 0.00 to 0.30, plotted against grade (6 to 12)]
(ALDA, Section 10.2.1, pp 330-339)
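As a minimal sketch of the estimator above, the following Python computes the hazard for the three grades whose counts appear on this slide. The names `life_table` and `hazard` are illustrative choices, not taken from ALDA.

```python
# Events and risk-set sizes for the three grades shown on the slide.
life_table = {
    7: {"events": 15, "at_risk": 180},
    9: {"events": 24, "at_risk": 158},
    12: {"events": 26, "at_risk": 80},
}


def hazard(events: int, at_risk: int) -> float:
    """Estimated discrete-time hazard: events divided by the risk set."""
    if at_risk <= 0:
        raise ValueError("risk set must be positive")
    return events / at_risk


hazards = {grade: hazard(**row) for grade, row in life_table.items()}

for grade, h in hazards.items():
    print(f"grade {grade}: estimated hazard = {h:.4f}")
```

Because each estimate uses only that interval's risk set, censored boys contribute to every interval in which they were still at risk.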
Cumulating risk over time: The survivor function (and median lifetime)
$$\hat{S}(t_j) = \hat{S}(t_{j-1})\,[1 - \hat{h}(t_j)]$$
$$\hat{S}(t_7) = 1.0 \times [1 - 0.0833] = 0.9167$$
$$\hat{S}(t_9) = 0.8778 \times [1 - 0.1519] = 0.7444$$
Discrete-time survival probability
Probability that individual i will “survive”
beyond time period j (Ti > j)
(i.e., will not experience the event until after time period j).
S(tij)=Pr{Ti > j}
Also a probability bounded by 0 and 1.
At the beginning of time, S(ti0)=1.0
Strategy for estimation: Since h(tij) tells us about the
probability of event occurrence, 1-h(tij) tells us about the
probability of non-occurrence (i.e., about survival)
Estimated median lifetime: ML = 10.6
[Figure: estimated survivor function S(t), from 0.00 to 1.00, plotted against grade (6 to 12), with the median lifetime marked where S(t) = 0.50]
(ALDA, Section 10.2, pp 330-339)
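The cumulation and the median lifetime can be sketched in Python as follows. The counts for grades 7, 9 and 12 come from the slides. The counts for grades 8, 10 and 11 are not shown here and are assumptions chosen to reproduce the slides' risk sets, survival estimates and ML = 10.6. The median is found by linear interpolation between the two grades that bracket S(t) = 0.5, and the function names are illustrative.

```python
# Life table for grades 7-12 as (events, risk set).
# Grades 8, 10 and 11 are assumptions consistent with the slides, not quoted values.
life_table = {
    7: (15, 180),
    8: (7, 165),
    9: (24, 158),
    10: (29, 134),
    11: (25, 105),
    12: (26, 80),
}


def survivor_function(table):
    """Cumulate S(t_j) = S(t_{j-1}) * [1 - h(t_j)], starting from S = 1.0."""
    survival = {}
    s_prev = 1.0
    for grade in sorted(table):
        events, at_risk = table[grade]
        s_prev *= 1.0 - events / at_risk
        survival[grade] = s_prev
    return survival


def median_lifetime(survival):
    """Linearly interpolate the time at which S(t) crosses 0.5.

    Returns None when no crossing occurs between observed periods.
    """
    grades = sorted(survival)
    for earlier, later in zip(grades, grades[1:]):
        s_m, s_next = survival[earlier], survival[later]
        if s_m > 0.5 >= s_next:
            return earlier + (s_m - 0.5) / (s_m - s_next) * (later - earlier)
    return None


survival = survivor_function(life_table)
ml = median_lifetime(survival)

for grade, s in survival.items():
    print(f"grade {grade}: S = {s:.4f}")
print(f"estimated median lifetime = {ml:.1f}")
```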
Converting a person-level data set into a person-period data set
Person-level data set: one row per person

ID    T    CENSOR    PT
193   9    0         1
126   12   0         1
407   12   1         0

• ID 193 had sex in the 9th grade
• ID 126 had sex in the 12th grade
• ID 407 was censored, remaining a virgin through 12th grade

Person-period data set: one row for every person-period until event occurrence or censoring (different from growth modeling)
• EVENT indicates event occurrence (1 in the period when the event occurs) or censoring (0 in every period for a censored boy)
(ALDA, Section 10.5.1, pp 351-354)
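A minimal sketch of the conversion, using the three boys shown on this slide. The function name `to_person_period` and the constant `FIRST_PERIOD` are illustrative. The coding assumes EVENT is 1 only in the final period of a boy who was not censored.

```python
from typing import Dict, List

# Person-level records from the slide: one row per boy.
person_level = [
    {"ID": 193, "T": 9, "CENSOR": 0, "PT": 1},
    {"ID": 126, "T": 12, "CENSOR": 0, "PT": 1},
    {"ID": 407, "T": 12, "CENSOR": 1, "PT": 0},
]

FIRST_PERIOD = 7  # data collection began in 7th grade


def to_person_period(people: List[Dict[str, int]]) -> List[Dict[str, int]]:
    """Expand each person into one row per period at risk (grade 7 through T)."""
    rows = []
    for person in people:
        for period in range(FIRST_PERIOD, person["T"] + 1):
            # EVENT is 1 only in the last period of an uncensored person.
            event = int(period == person["T"] and person["CENSOR"] == 0)
            rows.append(
                {
                    "ID": person["ID"],
                    "PERIOD": period,
                    "EVENT": event,
                    "PT": person["PT"],
                }
            )
    return rows


person_period = to_person_period(person_level)
for row in person_period:
    print(row)
```

ID 193 contributes three rows (grades 7 to 9), while IDs 126 and 407 contribute six rows each. Only the last rows of 193 and 126 carry EVENT = 1.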
Contemplating a DTSA model:
Inspecting sample plots of within-group hazard and survivor functions
Questions to ask when examining sample hazard functions:
• What is the shape of each hazard function?—here,
their shape is similar—both beginning low and
climbing steadily over time.
• Does the relative level of hazard differ across
groups?—here, hazard for boys with a parenting
transition is consistently higher
• Suggests partitioning variation in risk into:
• A baseline profile of risk
• A shift in risk corresponding to variation in the
predictor
Questions to ask when examining sample survivor functions:
• They tend to be less useful because they assess the
predictor’s cumulative effect—here, telling us that the
ML for boys with a PT is 10.0 vs. 11.7 when PT=0.
• Note: reversal of relative rankings
We’re almost ready to go, but back
to the bounded nature of hazard
(ALDA, Section 11.1.1, pp 358-361)
As in regular regression, we use transformation to deal with hazard’s bounds:
Understanding the effects of taking odds and logits
$\text{odds} = \dfrac{\text{hazard}}{1 - \text{hazard}}$
$\text{logit} = \log(\text{odds}) = \log\left(\dfrac{\text{hazard}}{1 - \text{hazard}}\right)$
[Figure, three panels plotted against grade (6 to 12) for boys with one or more early parenting transitions and boys with no early parenting transitions: estimated hazard, estimated odds, and estimated logit(hazard)]
Facts about odds scale
• Symmetric about 1 (50/50)
• Effect most prominent when hazard is larger
• Easy to get back to raw hazard: $\text{hazard} = \dfrac{\text{odds}}{1 + \text{odds}}$
• But it’s still bounded below by 0 and it’s
asymmetric (raw differences have different
meanings depending upon value of odds)
(ALDA, Section 11.1.2, pp 362-365)
Facts about logit scale
• Not bounded at all, although you need to get used to negative values (whenever hazard < .50)
• Usually regularizes the distance between hazard functions
• Stretches distance between small values
• Compresses distance between large values
• It’s easy to get back to raw hazard: $\text{hazard} = \dfrac{1}{1 + e^{-\text{logit}}}$
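The three scales can be moved between with a few one-line functions. This is a small sketch with illustrative function names, applied to the sample hazards for grades 7, 9 and 12 reported earlier.

```python
import math


def hazard_to_odds(hazard: float) -> float:
    """odds = hazard / (1 - hazard)"""
    return hazard / (1.0 - hazard)


def hazard_to_logit(hazard: float) -> float:
    """logit = log(odds)"""
    return math.log(hazard_to_odds(hazard))


def odds_to_hazard(odds: float) -> float:
    """hazard = odds / (1 + odds)"""
    return odds / (1.0 + odds)


def logit_to_hazard(logit: float) -> float:
    """hazard = 1 / (1 + exp(-logit))"""
    return 1.0 / (1.0 + math.exp(-logit))


for h in (0.0833, 0.1519, 0.3250):
    odds = hazard_to_odds(h)
    logit = hazard_to_logit(h)
    # Round-tripping through either scale recovers the original hazard.
    print(f"hazard={h:.4f}  odds={odds:.4f}  logit={logit:.4f}  "
          f"back={logit_to_hazard(logit):.4f}")
```

All three logits are negative because every hazard is below .50.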
What population model might have generated these sample data?
Plotting sample hazard estimates and overlaying alternative hypothesized models
[Figure: sample logit(hazard) estimates plotted against grade for PT=0 and PT=1, overlaid in turn with three hypothesized population models]
• Flat population logit hazard, shifted when PT switches from 0 to 1
• Linear population logit hazard, shifted when PT switches from 0 to 1
• General population logit hazard, shifted when PT switches from 0 to 1
Three reasonable features of a population discrete-time hazard model
1. For each predictor value, there is a population logit-hazard function.
• When the predictor(s)=0, we call it the “baseline” logit-hazard function.
2. Each population logit-hazard function is constrained to have the identical shape, regardless of predictor value.
• This is an assumption, and it can, and will, be relaxed later.
3. The distance between each of these logit hazard functions is identical in every time period.
• Differences in predictor value only “shift” the logit-hazard function “vertically.”
• This assumption can, and will, be relaxed later
• In the meantime, the magnitude of this shift is the magnitude of the predictor’s effect
(ALDA, Section 11.1.1, pp 366-369)
How do we specify a discrete-time hazard model that has these 3 features?
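Recode PERIOD into a set of TIME indicators (D7, ..., D12)
$\operatorname{logit} h(t_{ij}) = [\alpha_7 D_7 + \cdots + \alpha_j D_j + \cdots + \alpha_{12} D_{12}] + \beta_1 PT_i$
The $\beta_1 PT_i$ term is a constant vertical shift in logit hazard associated with variation in PT
(ALDA, Section 11.2, pp 369-372)
How does this model relate to the previous graph?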
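To make the recoding concrete, here is a minimal sketch that builds the TIME indicators for one person-period row and evaluates the model's right-hand side. The alpha values are hypothetical placeholders for illustration only. The PT coefficient of 0.8736 is the estimate quoted elsewhere in these slides. Function names are illustrative.

```python
GRADES = range(7, 13)  # grades 7 through 12

# Hypothetical baseline parameters, for illustration only.
ALPHAS = {"D7": -3.0, "D8": -3.7, "D9": -2.3, "D10": -1.8, "D11": -1.7, "D12": -1.2}
BETA_PT = 0.8736  # PT estimate quoted elsewhere in these slides


def time_indicators(period: int) -> dict:
    """Return D7..D12 with a 1 for the row's period and 0 elsewhere."""
    if period not in GRADES:
        raise ValueError(f"period {period} is outside grades 7-12")
    return {f"D{g}": int(g == period) for g in GRADES}


def logit_hazard(period: int, pt: int) -> float:
    """Evaluate [alpha_7*D7 + ... + alpha_12*D12] + beta_1*PT for one row."""
    dummies = time_indicators(period)
    baseline = sum(ALPHAS[name] * value for name, value in dummies.items())
    return baseline + BETA_PT * pt


print(time_indicators(9))
print(logit_hazard(9, pt=0), logit_hazard(9, pt=1))
```

Exactly one indicator equals 1 in any row, so the bracketed sum simply selects that period's alpha.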
Carefully unpacking the discrete-time hazard model
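[Figure: logit(hazard) plotted against grade (6 to 12) for PT = 0 and PT = 1. Each grade's TIME indicator (D7=1, D8=1, ..., D12=1) picks out one baseline value, $\alpha_7, \alpha_8, \ldots, \alpha_{12}$, and the two curves are separated vertically by $\beta_1$ in every grade]
When PT=0, you get the baseline logit hazard function
When PT=1, you shift this entire baseline vertically by $\beta_1$
$\operatorname{logit} h(t_{ij}) = [\alpha_7 D_7 + \cdots + \alpha_j D_j + \cdots + \alpha_{12} D_{12}] + \beta_1 PT_i$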
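As a worked instance of the unpacking, take grade 9, where $D_9 = 1$ and every other indicator is 0. The choice of grade 9 is arbitrary, and the same steps hold in each period:

```latex
\begin{aligned}
\text{Grade 9:}\quad & \operatorname{logit} h(t_{i9}) = \alpha_9 + \beta_1 PT_i \\
PT_i = 0:\quad & \operatorname{logit} h(t_{i9}) = \alpha_9 \\
PT_i = 1:\quad & \operatorname{logit} h(t_{i9}) = \alpha_9 + \beta_1 \\
\text{Difference:}\quad & (\alpha_9 + \beta_1) - \alpha_9 = \beta_1
\end{aligned}
```

The difference does not depend on the grade, which is the constant vertical shift built into the model.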
And we can add predictors just as in regular (logistic) regression
$\operatorname{logit} h(t_{ij}) = [\alpha_7 D_7 + \cdots + \alpha_j D_j + \cdots + \alpha_{12} D_{12}] + \beta_1 PT_i + \beta_2 PAS_i$
(ALDA, Section 11.2.1, pp 372-376)
How does this model behave when hazard is
expressed in the other scales?
What does the DT hazard model look like when expressed on the other scales?
On the logit scale, the distance between functions is identical in every time period (assumption built into our model): it equals $\beta_1$
[Figure, three panels plotted against grade (6 to 12) for PT = 0 and PT = 1: logit(hazard), odds, and hazard]
$\text{odds} = e^{\text{logit}}$
$\text{hazard} = \dfrac{1}{1 + e^{-\text{logit}}}$
On the odds scale, one function is a constant magnification (or diminution) of the other, by a factor of $\exp(\beta_1)$: they are proportional
On the hazard scale, the functions have no constant relationship
(Would need to use a complementary log-log transformation to get a proportional hazards model)
The “standard” DTSA model is a proportional odds model!
(ALDA, Section 11.2.2, pp 376-379)
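A short numerical check of the proportional odds property. The baseline logits below are hypothetical values chosen for illustration only. The shift of 0.8736 is the PT estimate quoted elsewhere in these slides.

```python
import math

# Hypothetical baseline logit hazards for grades 7-12 (illustration only).
BASELINE_LOGIT = {7: -3.0, 8: -3.7, 9: -2.3, 10: -1.8, 11: -1.7, 12: -1.2}
BETA_PT = 0.8736  # PT estimate quoted elsewhere in these slides


def logit_to_odds(logit: float) -> float:
    return math.exp(logit)


def logit_to_hazard(logit: float) -> float:
    return 1.0 / (1.0 + math.exp(-logit))


comparison = []
for grade, logit_pt0 in BASELINE_LOGIT.items():
    logit_pt1 = logit_pt0 + BETA_PT  # constant vertical shift on the logit scale
    comparison.append(
        {
            "grade": grade,
            "logit_gap": logit_pt1 - logit_pt0,
            "odds_ratio": logit_to_odds(logit_pt1) / logit_to_odds(logit_pt0),
            "hazard_gap": logit_to_hazard(logit_pt1) - logit_to_hazard(logit_pt0),
        }
    )

for row in comparison:
    print(f"grade {row['grade']}: logit gap={row['logit_gap']:.4f}  "
          f"odds ratio={row['odds_ratio']:.4f}  hazard gap={row['hazard_gap']:.4f}")
```

The logit gap and the odds ratio are the same in every grade, while the hazard gap widens as the baseline hazard rises.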
Fitting the model to data: Use logistic regression in the person-period data set
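[Person-period data display with callouts: Outcome; TIME indicators; Substantive predictors]
All parameter estimates, standard errors, t- and z-statistics, goodness-of-fit statistics, and tests will be correct for the discrete-time hazard model
Model A: $\operatorname{logit} h(t_j) = \alpha_7 D_7 + \alpha_8 D_8 + \cdots + \alpha_{12} D_{12}$
Model B: $\operatorname{logit} h(t_j) = \alpha_7 D_7 + \alpha_8 D_8 + \cdots + \alpha_{12} D_{12} + \beta_1 PT$
Model C: $\operatorname{logit} h(t_j) = \alpha_7 D_7 + \alpha_8 D_8 + \cdots + \alpha_{12} D_{12} + \beta_2 PAS$
Model D: $\operatorname{logit} h(t_j) = \alpha_7 D_7 + \alpha_8 D_8 + \cdots + \alpha_{12} D_{12} + \beta_1 PT + \beta_2 PAS$
The $\alpha$’s estimate the baseline logit hazard function
The $\beta$’s assess the effects of substantive predictors
(ALDA, Section 11.3, pp 378-386)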
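As a hedged illustration of why logistic regression works here, the sketch below fits Model A by Newton-Raphson on grouped person-period counts and compares the result with the logit of the life-table hazards. It is not the software routine used in ALDA. The counts for grades 8, 10 and 11 are assumptions consistent with the slides' life table, and the function names are illustrative. With only TIME indicators in the model the parameters separate, so each alpha is updated on its own.

```python
import math

# Grouped person-period data: grade -> (events, person-periods at risk).
# Grades 8, 10 and 11 are assumptions consistent with the slides' life table.
GROUPED = {7: (15, 180), 8: (7, 165), 9: (24, 158),
           10: (29, 134), 11: (25, 105), 12: (26, 80)}


def fit_model_a(data, iterations=25):
    """Newton-Raphson ML estimates of alpha_j when the model has only TIME indicators."""
    alphas = {grade: 0.0 for grade in data}
    for _ in range(iterations):
        for grade, (events, at_risk) in data.items():
            p = 1.0 / (1.0 + math.exp(-alphas[grade]))
            score = events - at_risk * p            # first derivative of log-likelihood
            information = at_risk * p * (1.0 - p)   # negative second derivative
            alphas[grade] += score / information
    return alphas


def deviance(data, alphas):
    """-2 times the log-likelihood of the person-period logistic regression."""
    log_lik = 0.0
    for grade, (events, at_risk) in data.items():
        p = 1.0 / (1.0 + math.exp(-alphas[grade]))
        log_lik += events * math.log(p) + (at_risk - events) * math.log(1.0 - p)
    return -2.0 * log_lik


alphas = fit_model_a(GROUPED)
for grade, alpha in alphas.items():
    events, at_risk = GROUPED[grade]
    sample_logit = math.log(events / (at_risk - events))
    print(f"grade {grade}: alpha_hat={alpha:.4f}  logit(sample hazard)={sample_logit:.4f}")
print(f"deviance = {deviance(GROUPED, alphas):.2f}")
```

The fitted alphas coincide with the logits of the sample hazards, which is why Model A's baseline reproduces the life table. Models B to D need person-level predictor values and a general-purpose logistic regression routine.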
Strategies for interpreting the $\hat{\alpha}$’s: ML estimates of the baseline hazard function
Simplifying interpretation by transforming back to odds and hazard
Because there are no substantive predictors, Model
A’s estimates are the full sample estimates
Because there are no
predictors in Model A,
this baseline is for the
entire sample
• If estimates are approximately equal, baseline is flat
• If estimates decline, hazard declines
• If estimates increase (as they do here), hazard increases
(ALDA, Section 11.4.1, pp 386-388)
Strategies for interpreting the $\hat{\beta}$’s: ML estimates of the substantive predictors’ effects
Dichotomous predictors
As in regular logistic regression, antilogging a parameter estimate yields the estimated odds ratio associated with a 1-unit difference in the predictor:
$e^{\hat{\beta}_{PT}} = e^{0.8736} = 2.4$
The estimated odds of first intercourse for boys who have experienced a parenting transition are 2.4 times the odds for boys who did not experience such a transition.
Continuous predictors
Antilogging still yields an estimated odds ratio associated with a 1-unit difference in the predictor:
$e^{\hat{\beta}_{PAS}} = e^{0.4428} = 1.56$
The estimated odds of first intercourse for boys whose parents exhibited “1 unit more” of antisocial behavior are 1.56 times the odds for boys whose parental antisocial behavior was one unit lower.
Because odds ratios are symmetric about 1, you can also invert the odds ratios and change the reference group
• Estimated odds of first intercourse for boys who did not experience a parenting transition are 1/(2.40)=.42 or approximately 40% of the odds for boys who did
• Estimated odds of first intercourse for boys whose parents have “1 unit less” of antisocial behavior are 1/(1.56)=.641 or approximately 2/3rds the odds for boys whose parents were 1 unit higher
(ALDA, Section 11.4.2 & 11.4.3, pp 388-390)
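The antilogging step is a one-liner. The sketch below uses the two estimates quoted on this slide. The `units` argument is an illustrative addition for differences other than 1 unit, with `units=-1` giving the inverted comparison.

```python
import math

# Parameter estimates quoted on the slide.
ESTIMATES = {"PT": 0.8736, "PAS": 0.4428}


def odds_ratio(beta: float, units: float = 1.0) -> float:
    """Estimated odds ratio for a difference of `units` in the predictor."""
    return math.exp(beta * units)


for name, beta in ESTIMATES.items():
    forward = odds_ratio(beta)
    inverse = odds_ratio(beta, units=-1.0)  # same as 1 / forward
    print(f"{name}: odds ratio = {forward:.2f}, inverted = {inverse:.2f}")
```

Inverting simply swaps which group serves as the reference. Carrying full precision through the inversion gives values slightly different from inverting the rounded ratios.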
Displaying fitted hazard and survivor functions
Illustrating the general idea using Model B for a single dichotomous predictor
With a single dichotomous predictor, there are only 2 possible prototypical functions:
PT=0 (for boys from stable homes with no parenting transitions before 7th grade)
PT=1 (for boys who experienced one or more early parenting transitions)
$\operatorname{logit} \hat{h}(t_j) = \hat{\alpha}_j + \hat{\beta}_1 PT$
$\hat{h}(t_j) = \dfrac{1}{1 + e^{-\operatorname{logit} \hat{h}(t_j)}}$
$\hat{S}(t_j) = \hat{S}(t_{j-1})\,[1 - \hat{h}(t_j)]$
(ALDA, Section 11.5.1, pp 392-394)
Displaying fitted hazard and survivor functions
Fitted logit hazard: constant vertical separation of 0.8736 (the parameter estimate for PT). Easy to see the effect of PT
Fitted hazard: non-constant vertical separation (no simple interpretation because the model is proportional in odds, not hazard)
Fitted survivor function: effect of PT cumulates into a large difference in estimated median lifetimes (9.9 vs. 11.8, about 2 years)
(ALDA, Section 11.5.1, pp 392-394)
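The three steps (fitted logit, fitted hazard, fitted survivor function) can be chained as in the sketch below. The PT estimate of 0.8736 is from the slides. The six alpha values are not shown on these slides. They are assumptions intended to correspond to Model B in ALDA's results table and should be checked against the book before being relied on. Function names are illustrative, and the median lifetime uses linear interpolation.

```python
import math

# Assumed Model B baseline estimates (NOT shown on these slides; verify against ALDA).
ALPHA = {7: -2.9943, 8: -3.7001, 9: -2.2811, 10: -1.8226, 11: -1.6542, 12: -1.1791}
BETA_PT = 0.8736  # from the slides


def fitted_hazard(pt: int) -> dict:
    """Invert the fitted logit, alpha_j + beta_1 * PT, in each grade."""
    return {g: 1.0 / (1.0 + math.exp(-(a + BETA_PT * pt))) for g, a in ALPHA.items()}


def fitted_survival(hazard: dict) -> dict:
    """Cumulate S(t_j) = S(t_{j-1}) * [1 - h(t_j)], starting from 1.0."""
    survival, s_prev = {}, 1.0
    for grade in sorted(hazard):
        s_prev *= 1.0 - hazard[grade]
        survival[grade] = s_prev
    return survival


def median_lifetime(survival: dict):
    """Interpolate where S(t) crosses 0.5; None if it never does between periods."""
    grades = sorted(survival)
    for earlier, later in zip(grades, grades[1:]):
        s_m, s_next = survival[earlier], survival[later]
        if s_m > 0.5 >= s_next:
            return earlier + (s_m - 0.5) / (s_m - s_next) * (later - earlier)
    return None


median = {pt: median_lifetime(fitted_survival(fitted_hazard(pt))) for pt in (0, 1)}
for pt, ml in median.items():
    print(f"PT={pt}: estimated median lifetime = {ml:.1f}")
```

Under these assumed values the interpolated medians agree with the 9.9 and 11.8 reported on the slide.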
Displaying fitted hazard and survivor functions when some predictors are continuous
As in growth modeling, select substantively interesting prototypical values and proceed just as you did for dichotomous predictors.
Here, we’ll choose +/- 1 sd PAS (low = -1, medium = 0, and high = +1)
[Figure: fitted hazard (0.0 to 0.5) plotted against grade (6 to 12) at PAS = -1, 0, and +1, shown separately for boys with one or more early parenting transitions and boys with no early parenting transitions]
Estimated Median Lifetimes

PAS          PT=0     PT=1
Low (-1)     >12.0    10.7
Medium (0)   11.5     10.1
High (+1)    10.9     9.6
[Figure: fitted survival probability (0.0 to 1.0) plotted against grade (6 to 12) at PAS = -1, 0, and +1, shown separately for boys with no early parenting transitions and boys with one or more early parenting transitions]
(ALDA, Section 11.5.1, pp 392-394)
Comparing goodness of fit using deviance statistics and information criteria:
The strategies are generally the same as in growth modeling
[Table of fitted models, including the TIME dummies]
Deviance: smaller value, better fit, $\chi^2$ dist., compare nested models
AIC, BIC: smaller value, better fit, compare non-nested models
Model B vs. Model A provides an uncontrolled test of $H_0: \beta_{PT} = 0$
$\Delta$Deviance = 17.30 (1 df), p<.001
Model C vs. Model A provides an uncontrolled test of $H_0: \beta_{PAS} = 0$
$\Delta$Deviance = 14.79 (1 df), p<.001
Model D vs. Models B and C provides controlled tests
[Both rejected as well]
(ALDA, Section 11.6, pp 397-402)
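The p-values for the two 1-df deviance comparisons can be checked with the standard library alone. The sketch relies on the fact that a chi-square variable with 1 degree of freedom is a squared standard normal, so its upper tail is erfc(sqrt(x/2)). The function name is illustrative, and this shortcut does not extend to other degrees of freedom.

```python
import math


def chi2_upper_tail_df1(x: float) -> float:
    """P(chi-square with 1 df > x), valid for 1 degree of freedom only."""
    if x < 0:
        raise ValueError("a deviance difference cannot be negative")
    return math.erfc(math.sqrt(x / 2.0))


# Deviance differences reported on the slide, each on 1 df.
DEVIANCE_DIFFERENCES = {
    "Model B vs. Model A (PT)": 17.30,
    "Model C vs. Model A (PAS)": 14.79,
}

p_values = {label: chi2_upper_tail_df1(d) for label, d in DEVIANCE_DIFFERENCES.items()}
for label, p in p_values.items():
    print(f"{label}: p = {p:.6f}")
```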