Event History Analysis
PS 791
Advanced Topics in Data Analysis
Event History Analysis
… and its cousins
Event History Analysis is a general
term comprising a set of time duration
models
Survival Analysis
Duration analysis
Hazard Modeling
Event Duration
When we look at processes that occur
over time, we are often interested in
two aspects of the process:
the duration of the events,
How long a regime or alliance lasts
the transition event or state
The occurrence of a coup
Survival in broader terms
Survival analysis is often used to examine
the length of time that an entity survives
after exposure to a disease or toxin.
In toxicity studies the measure of interest might be the LC50:
The concentration of the toxin that will kill 50%
of the species during the time of exposure – say
24 hours
Used for determining acute toxicity of a chemical
compound
Survival in a non-fatal sense
Other senses of survival
Length of time a regime lasts or stays in
power
Length of a military intervention
Duration of wars; or alliances
The Mathematics of Survival
Some definitions:
T is a positive random variable for survival time
– the length of time before a change of state
T is continuous
Until we assume it isn’t – for later
The actual measure of the survival time, or instance of
it, is t.
The possible values of T have a probability density function,
f(t), and a cumulative distribution function, F(t).
The distribution function of T
The distribution function of T is expressed as:
$$F(t) = \int_0^t f(u)\,du = \Pr(T \le t)$$
This expresses the probability that a survival
time T is less than or equal to some value t
The Unconditional Failure Rate
If we differentiate F(t), we get the density
function
$$f(t) = \frac{dF(t)}{dt} = F'(t)$$
We can characterize the distribution of
failures by either distribution or density
function
The Survivor Function
The survivor function denotes the probability that a survival
time T is equal to or greater than some time t.
$$S(t) = 1 - F(t) = \Pr(T \ge t)$$
This is also the proportion of units surviving beyond t.
S(t) is a non-increasing function, since as time
passes there are fewer and fewer individuals surviving
The Hazard Rate
Given the survivor function and the density
of failures, we have a way to account for “survival”
and “death” in EHA
(Event History Analysis)
We obtain another important component in
EHA when we look at the relationship
between the two in the hazard rate.
$$h(t) = \frac{f(t)}{S(t)}$$
A Conditional Failure Rate
The hazard rate is the rate at which
units fail (or durations end) at time t, given
that the unit has survived until t.
Thus the hazard rate is a conditional
failure rate.
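The conditional reading of the hazard rate can be checked numerically. The sketch below is only an illustration: it assumes a log-logistic duration distribution with made-up parameters (`LAM`, `P`), and the function names are hypothetical. It compares h(t) = f(t)/S(t) with the probability of failing in a short interval after t, given survival to t, divided by the interval length.

```python
import math

# Hypothetical log-logistic duration distribution (parameters are illustrative).
LAM, P = 0.5, 2.0

def survivor(t):
    """S(t) = Pr(T >= t) for the assumed log-logistic distribution."""
    return 1.0 / (1.0 + (LAM * t) ** P)

def cdf(t):
    """F(t) = Pr(T <= t) = 1 - S(t)."""
    return 1.0 - survivor(t)

def density(t):
    """f(t) = dF(t)/dt, written out in closed form."""
    return LAM * P * (LAM * t) ** (P - 1) / (1.0 + (LAM * t) ** P) ** 2

def hazard(t):
    """h(t) = f(t) / S(t)."""
    return density(t) / survivor(t)

def conditional_failure_rate(t, dt=1e-6):
    """Pr(t <= T < t + dt | T >= t) / dt, which approaches h(t) as dt shrinks."""
    return (cdf(t + dt) - cdf(t)) / (survivor(t) * dt)

for t in (0.5, 2.0, 5.0):
    print(f"t={t}: h(t)={hazard(t):.4f}, "
          f"conditional rate={conditional_failure_rate(t):.4f}")
```

The two columns should agree closely at every t, which is what "conditional failure rate" means in practice.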
The Interrelationships
The hazard rate, survivor function, and
distribution and density functions are all
interrelated.
$$f(t) = -\frac{dS(t)}{dt}$$
Thus the hazard rate can be represented by
$$h(t) = \frac{-dS(t)/dt}{S(t)} = -\frac{d \log S(t)}{dt}$$
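A short derivation, using only the definitions on these slides, shows how the survivor function and the density can be recovered from the hazard rate. The cumulative hazard symbol H(t) is a label added here for convenience; it does not appear in the original slides.

```latex
\begin{align*}
h(t) &= -\frac{d \log S(t)}{dt} \\
\int_0^t h(u)\,du &= -\log S(t) + \log S(0) = -\log S(t)
  \quad \text{since } S(0) = 1 \\
S(t) &= \exp\left(-\int_0^t h(u)\,du\right) = \exp\{-H(t)\} \\
f(t) &= h(t)\,S(t)
\end{align*}
```

So specifying any one of h(t), S(t), F(t), or f(t) determines the other three.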
Using OLS on Durations
If we model the duration of an event using
OLS
Like the number of years a regime lasts
We regress the duration length on a set of
characteristics or exogenous variables
Often we will log the duration time because
of some extremely durable cases that make
the distribution asymmetric.
This will cause problems
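One such problem can be illustrated with a small simulation. This is a sketch under stated assumptions, not a result from the course: the data-generating process, the sample size, and the observation window are all invented, and `ols_slope` is a hypothetical helper. Durations that outlast the window are recorded at the cutoff, and the OLS slope on the logged durations is compared with the slope on the full, uncut durations.

```python
import math
import random

random.seed(791)

def ols_slope(x, y):
    """Slope from a bivariate OLS regression of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

N, WINDOW = 5000, 8.0   # hypothetical sample size and observation window
x = [random.random() for _ in range(N)]

# Assumed data-generating process: log duration = 1 + 1.0 * x + noise
log_t = [1.0 + 1.0 * xi + random.gauss(0.0, 0.5) for xi in x]

# Durations still running at the end of the window are recorded at the cutoff.
log_obs = [min(lt, math.log(WINDOW)) for lt in log_t]

slope_full = ols_slope(x, log_t)    # uses the true durations
slope_cut = ols_slope(x, log_obs)   # uses what the analyst actually observes

print(f"slope on true durations:     {slope_full:.3f}")
print(f"slope on cut-off durations:  {slope_cut:.3f}")
```

In this setup the slope estimated from the cut-off durations is pulled toward zero, because the longest-lived cases are exactly the ones whose durations are understated.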
Censoring
In some cases, a case may not have
failed by the end of the observation
period.
We refer to this as right-censoring.
Model adoption of state lottery
If a state has not adopted it by the end of
the sample time frame, it is right
censored
Left-censoring
Left censoring occurs when the history
of the event begins prior to the start of
the observed period
A regime that began before the time
frame
A dispute already underway
Censoring (cont)
Note that both right- and left-censoring
are common in many time-series data
sets and are not dealt with in standard regression
designs at all.
EHA can incorporate censoring in the
models.
Based on calculating likelihoods
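A minimal sketch of how a likelihood can incorporate right-censoring, assuming an exponential duration model (constant hazard). Cases that fail contribute the density f(t) to the likelihood; right-censored cases contribute only the survivor function S(t). The rate, window length, sample size, and variable names are all hypothetical.

```python
import math
import random

random.seed(791)

TRUE_RATE = 0.2   # hypothetical constant hazard
WINDOW = 5.0      # hypothetical length of the observation period
N = 5000

# Simulate true durations, then cut them off at the end of the window.
true_durations = [random.expovariate(TRUE_RATE) for _ in range(N)]
observed = [min(t, WINDOW) for t in true_durations]
failed = [1 if t <= WINDOW else 0 for t in true_durations]  # 0 = right-censored

def log_likelihood(rate, times, events):
    """Failures contribute log f(t); censored cases contribute log S(t)."""
    total = 0.0
    for t, d in zip(times, events):
        log_s = -rate * t                      # log S(t) = -rate * t
        total += d * (math.log(rate) + log_s) + (1 - d) * log_s
    return total

# For the exponential model the maximum has a closed form:
# number of failures divided by total observed time at risk.
rate_mle = sum(failed) / sum(observed)

# Naive alternative: treat every censored duration as if it were a failure.
rate_naive = len(observed) / sum(observed)

print(f"true rate:             {TRUE_RATE}")
print(f"censoring-aware MLE:   {rate_mle:.3f}")
print(f"naive estimate:        {rate_naive:.3f}")
```

The censoring-aware estimate should land near the true rate, while the naive estimate overstates the hazard because it counts failures that never happened.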
Selection Bias
Duration Models can give us a tool to look at
Selection Bias
When we study something like the determinants
of regime failure, and we have a data set
comprised of regimes, their failure dates, and
the exogenous variables we think led to the
failure, we have omitted cases that didn’t fail
Because the cases that did not fail may have been exposed to the same
factors as those that did fail, we have biased
our sample.
Duration models can account for this bias.
Somehow!
Time Varying Covariates
Regression assumes constant relationships
(covariates)
$$Y_t = B_0 + B_1 X_t + e_t$$
What if the slope changes over the course of the
study?
$$Y_t = B_0 + B_t X_t + e_t$$
Regression can handle this through Stochastic or
Time-Varying Parameter models, but they are
usually ignored
Distribution of failure times
If we can correctly specify the type and
shape of the distribution of the failure
rate, we can estimate the impact of the
covariates on the failure rate.
The shape of that failure rate is a
function of its parameterization
The model’s covariates are used to
assess that parameterization
The exponential model
The exponential model implies a baseline
hazard rate that is flat
The likelihood of a failure is the same at any
given time
This implies a constant hazard rate
$$h(t) = \lambda, \quad \lambda > 0$$
Other distributions
Weibull
Used if the hazard rate is increasing or
decreasing
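A small sketch of how the Weibull hazard changes with its shape parameter. It assumes one common parameterization, h(t) = λp(λt)^(p−1); the names `lam` and `p` and the values used are illustrative and not taken from the slides.

```python
def weibull_hazard(t, lam, p):
    """Weibull hazard h(t) = lam * p * (lam * t) ** (p - 1), for t > 0."""
    return lam * p * (lam * t) ** (p - 1)

times = [0.5, 1.0, 2.0, 4.0]
for p in (0.5, 1.0, 1.5):
    rates = [round(weibull_hazard(t, lam=0.1, p=p), 4) for t in times]
    print(f"p={p}: {rates}")
```

With p < 1 the hazard falls over time, with p > 1 it rises, and with p = 1 it is flat, which is the exponential model as a special case.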
Log-logistic or Log-normal
Gompertz
How to choose?
Theory?
Generalized Gamma
Proportional Hazard Models
Cox Proportional Hazard
Similar to Weibull
Discrete Time Data
An example
Events
Action-reaction Models