Predictive Methods and Statistical Modeling of Crash Data II

Download Report

Transcript Predictive Methods and Statistical Modeling of Crash Data II

Fall 2015
Statistical Models For Crash Data
Modeling Process
Determine Modeling Objectives
•Definition (Intersections, Pedestrians, etc.)
•Data availability
•Unit Scales (Crashes/year; Severity; etc.)
Establish Appropriate Process
•Sampling Models
•Observational Models
•Process/System State Models
•Parameter Models (Bayesian Models Only)
Statistical Models For Crash Data
Modeling Process
Determine Inferential Goals
•Point estimate (Value + Standard Error)
•Distribution (Bayesian Models)
•Percentiles (2.5%, 85%, etc.; Bayesian Models)
Select Computation Techniques
•Frequentist (MLE)
•Bayesian (via simulation)
•Empirical Bayes
Evaluate Models
•Goodness-of-Fit
•Prediction
•Confidence Intervals
Statistical Models For Crash Data
Data and Methodological Issues Associated with Crash-Frequency Data
Data/Methodological Issue
Overdispersion
Associated Problems
Can violate some the basic count-data modeling assumptions of some modeling
approaches
Underdispersion
As with overdispersion, can violate some the basic count-data modeling
assumptions of some modeling approaches
Time-varying explanatory variables
Averaging of variables over studied time intervals ignores potentially important
variations within time intervals – which can result in erroneous parameter
estimates
Correlation over time and space causes losses in estimation efficiency
Causes an excess number of observations where zero crashes are observed
which can cause errors in parameter estimates
Temporal and spatial correlation
Low sample mean and small sample size
Injury severity and crash type correlation Correlation between severities and crash types causes losses in estimation
efficiency when separate severity-count models are estimated
Under reporting
Under reporting can distort model predictions and lead to erroneous inferences
with regard to the influence of explanatory variables
Omitted variables bias
If significant variables are omitted from the model, parameter estimates will be
biased and possibly erroneous inferences with regard to the influence of
explanatory variables will result
Endogenous variables
If endogenous variables are included without appropriate statistical corrections
parameter estimates will be biased and erroneous inferences with regard to the
influence of explanatory variables may be drawn
Functional form
If incorrect functional for is used, the result will be biased parameter estimates
and possibly erroneous inferences with regard to the influence of explanatory
variables
If parameters are estimated as fixed when they actually vary across
observations, the result will be biased parameter estimates and possibly
erroneous inferences with regard to the influence of explanatory variables
Fixed parameters
Statistical Models For Crash Data
Summary of Existing Models for Analyzing Crash-Frequency Data
Model Type
Poisson
Advantages
Most basic model; easy to estimate
Negative binomial/Poissongamma
Easy to estimate can account for
overdispersion
Poisson-lognormal
More flexible than the Poisson-gamma to
handle over-dispersion
Zero-inflated Poisson and
negative binomial
Handles datasets that have a large number of
zero-crash observations
Conway-Maxwell-Poisson
Gamma
Generalized estimating
equation models
Generalized additive
models
Disadvantages
Cannot handle over- and under-dispersion;
negatively influenced by the low sample
mean and small sample size bias
Cannot handle under-dispersion; can be
adversely influenced by the low sample
mean and small sample size bias
Cannot handle under-dispersion; can be
adversely influenced by the low sample
mean and small sample size bias (less than
the Poisson-gamma); cannot estimate a
varying dispersion parameter
Can create theoretical inconsistencies; zeroinflated negative binomial can be adversely
influenced by the low sample mean and
small sample size bias
Can handle under- and over-dispersion or
Could be negatively influenced by the low
combination of both using a variable dispersion sample mean and small sample size bias; no
(scaling) parameter
multivariate extensions available to date
Can handle under-dispersed data
Truncated distribution (full gamma
function); independence of data (incomplete
gamma function)
Can handle temporal correlation
May need to determine or evaluate the type
of temporal correlation a priori; results
sensitive to missing values
More flexible than the traditional generalized
Relatively complex to implement; may not
estimating equation models; allows non-linear be easily transferable to other datasets
variable interactions
Statistical Models For Crash Data
Summary of Existing Models for Analyzing Crash-Frequency Data
Model Type
Advantages
Random-effects models
Handles temporal and spatial correlation
Negative multinomial
Can account for overdispersion and serial
correlation; panel count data.
Random-parameters
models
More flexible than the traditional fixed
parameter models in accounting for
unobserved heterogeneity
Can model different crash types
simultaneously; more flexible functional form
than the generalized estimating equation
models (can use non-linear functions)
Bivariate/multivariate
models
Finite mixture/Markov
Switching
Duration models
Can be used for analyzing sources of
dispersion in the data
By considering the time between crashes (as
opposed to crash frequency directly), allows
for a very in-depth analysis of data and
duration effects
Hierarchical/Multilevel
Models
Can handle temporal, spatial and other
correlations among groups of observations
Neural Network, Bayesian
Neural Network, and
support vector machine
Disadvantages
May not be easily transferable to other
datasets
Cannot handle under-dispersion; can be
adversely influenced by the low sample
mean and small sample size bias
Complex estimation process; may not be
easily transferable to other datasets
Complex estimation process; requires
formulation of correlation matrix
Complex estimation process; may not be
easily transferable to other datasets
Requires more detailed data than
traditional crash frequency models; timevarying explanatory variables are difficult
to handle
May not be easily transferable to other
datasets; correlation results can be difficult
to interpret
Non parametric approach does not require an Complex estimation process; may not be
assumption about distribution of data; flexible transferable to other datasets; work as
functional form; usually provides better
black-boxes; may not have interpretable
statistical fit than traditional parametric
parameters
models
Review of Multivariate Linear Models
Ordinary Least Square Method:
This is an estimation technique that is used for estimating
unknown coefficients. It consists of solving p = k + 1
simultaneously linear equations and by minimizing the sum
of square errors.
Let
yi   0  1 xi1   2 xi 2 
  k xik   i
k
yi   0    j xij   i
j 1
Note: E(ε) = 0 and var(ε) = σ2
Review of Multivariate Linear Models
The least square function S is given by
n
S   2
i 1



S    yi  b0   b j xij  


i 1 
j 1


n
k
2
The S function is to be minimized with respect to β1, β2, …, βk.
The least square estimators, say b0, b1, …, bk, must satisfy
S
|b0 ,b1,
 0
S
|b0 ,b1,
 j
k



,bk  2  yi  b0    j xij    0
i 1 
i 1


n 
k

 
( yi  b0   b j xij )  xij  0
,bk  2 


i 1 
j 1

 
n
j = 1, 2, …, k
Review of Multivariate Linear Models
It is easier to solve the equations by using a matrix
format. The equations can be written the following way:
y  Xβ  
where
 y1 
1 x11 x12

 
1
x
x
y
21
22
2



X
y

 

 
1 xn1 xn 2
 yn 
x1k 
 1 
 1 
 

 
2 

x2 k 
2 


ε
β
 
 

 
 

xnk 
n 
 n 
Review of Multivariate Linear Models
Need to find the least square estimator b that minimizes
n
S (  )    i2     (y  Xβ)(y  Xβ)
i 1
It can be shown that S(β) can be expressed this way
S ( )  yy  2βXy  βXXβ
The least square estimator* must satisfy
S
|b  2Xy  2XXb  0

which simplifies to
XXb  Xy
b  (XX) Xy
1
* b is called the ordinary least squares estimator of β.
Review of Multivariate Linear Models
Maximum Likelihood Method:
The likelihood function is found from the joint probability
distribution of the observations. Given the assumption that
the distribution of errors is normally distributed and the
variance σ2 is constant, the likelihood function is the
following (normal distribution)
(y, β,  ) 

1
2
(2 )
Same model as before:
2
e
n/2
1
2
Y  Xβ
2
( y  Xβ )( y  Xβ )
Review of Multivariate Linear Models
The maximum likelihood estimators are the values of the
parameters β and σ2 that maximize the likelihood function.
Maximizing the likelihood is equivalent to maximizing the
log-likelihood, ln( ). The log-likelihood is:
n
n
1
2
ln[ (y, β,  )]   ln(2 )  ln( )  2 ( y  Xβ)( y  Xβ)
2
2
2
2
The derivative of the log-likelihood function is called the
score function. Taking the derivatives with respect to the
coefficients β and equating to zero yields
 ln( )
1
  2 (2bXy  bXXβ)  0
β
2
1
2
2
X(y  Xb)  0
b =  XX Xy
Review of Multivariate Linear Models
Taking the partial derivative with respect to

2
gives
 ln( )
1
1
  2  4 (y  Xb)(y  Xb)  0
2

2
2
Which is
1
  (y  Xb)(y  Xb)
n
2
Generalized Linear Models
In the previous overheads, it was obvious how the normal
distribution played an important role in estimating the
coefficients and inferences of probabilistic models. Unfortunately,
there are many practical situations where the normal assumption
is not valid. Count data, binary response (0 or 1) or other
continuous variables with positive and high-skewed distribution
cannot be modeled with a normally distributed errors.
The generalized linear model (GLM) was developed to allow
fitting regression models for univariate response data that
follows a very general distribution called exponential family. This
family includes the normal, binomial, negative binomial,
geometric, gamma, etc.
Statistical Models For Crash Data
Poisson-gamma Model (NB)
The crash count (or any count) follows a Poisson distribution:
e  i  i 
f ( yi | i ) 
yi !
The mean of yi, conditional on μi, is Poisson with the
conditional mean and variance given by
f ( yi | i )  

0
e
 i


 i     1e d 
i
i
yi !
  
i
i
Statistical Models For Crash Data
Poisson-gamma Model (NB)
The PDF of the Poisson-gamma regression for yi is

( yi   )     ui 
f ( yi ) 

 

( yi  1)( )    ui     i 
yi
The mean and variance are given by

E ( yi )  ui Var ( yi )  ui 

2
i
The mean function is given by
or
Var ( yi )  ui  i2
E ( yi )  ui  exp(xi β)
Statistical Models For Crash Data
Poisson-gamma Model
Example – Crash Data at 3-legged signalized intersections:
Functional form:   e 
0  1 Fmaj   2 Fmaj
Functional form needed to model crash data:


  0 Fmaj
Fmin
1
2
Where,
  Expected number of crashes
Fmaj  Major traffic flow
Fmin  Minor traffic flow
Need to take the
natural log of the
flow variables
Statistical Models For Crash Data
Poisson-gamma Model
The GENMOD Procedure
Model Information
Data Set
WORK.C
Distribution
Negative Binomial
Link Function
Log
Dependent Variable
Total Total
 e
10.113
0.740
maj
F
0.505
min
Number of Observations Read
Number of Observations Used
F
  4.05E  05  F
0.740
maj
255
255
Criteria For Assessing Goodness Of Fit
0.505
min
F
Criterion
DF
Value
Value/DF
Deviance
252
288.8580
1.1463
Scaled Deviance
252
288.8580
1.1463
Pearson Chi-Square
252
312.6975
1.2409
Scaled Pearson X2
252
312.6975
1.2409
Log Likelihood
836.0686
Full Log Likelihood
-606.7989
AIC (smaller is better)
1221.5978
AICC (smaller is better)
1221.7578
BIC (smaller is better)
1235.7628
Algorithm converged.
Var ( y)    0.313 2
Analysis Of Maximum Likelihood Parameter Estimates
Standard Wald 95% Confidence
Wald
Parameter DF Estimate
Error
Limits
Chi-Square Pr > ChiSq
Intercept
1 -10.0648
1.3659 -12.7420 -7.3876
54.29
<.0001
logf_maj
1
0.7517
0.1320
0.4929
1.0105
32.41
<.0001
logf_min
1
0.4837
0.0562
0.3735
0.5939
74.01
<.0001
Dispersion 1
0.3153
0.0519
0.2135
0.4170
NOTE: The negative binomial dispersion parameter was estimated by maximum
likelihood.
Statistical Models For Crash Data
Statistical fit (Goodness of fit)
There are various methods for estimating the statistical fit of models.
The methods cane be divided into two categories:
Likelihood Statistics
• Log-Likelihood
• Deviance
• Pearson Chi-Square
• Akaike’s Information Criterion (AIC)
• Bayesian Information Criterion (BIC)
Model Errors
• Mean Absolute Deviance
• Mean Squared Prediction Errors
Statistical Models For Crash Data
Log-likelihood
Poisson:
n
ln L       yi ln  i   i  ln  yi !
i 1
NB:

  1i

ln L  ,      yi ln 
1
1


i
i 1 


n
Where:
i  expxiβ


1


ln
1




ln

y



ln

y

1

ln


 i 
 i 
 

i




Statistical Models For Crash Data
Log-likelihood
Example – Crash Data at 3-legged signalized intersections:
Poisson: -685.34
NB:
-606.80
Statistical Models For Crash Data
Statistical fit (Goodness of fit)
The deviance statistic is defined as twice the difference
between the maximum log-likelihood achievable (y=μ)
and the log-likelihood of the fitted model:
D(y | μˆ )  2{ (y )  (μˆ )}
When competitive models are compared, the model with
the lowest deviance offers the best statistical fit. A note
of caution: this is only valid when the dispersion
parameter Φ is the same for each competitive model.
Statistical Models For Crash Data
Statistical fit (Goodness of fit)
The deviance statistic for the Poisson model is the
following:

 yi
DP  2  yi ln 
i 1 
 ˆ i

n


  ( yi  ˆ i ) 


The deviance statistic for the Poisson-gamma model is
the following:

 yi
 2  yi ln 
i 1 
 ˆ i

n
DNB
1


yi    
1
  ( yi   ) ln 
1  

 ˆ i    
Statistical Models For Crash Data
Statistical fit (Goodness of fit)
The deviance statistic for the Poisson model is the
following:
DP  644.4
The deviance statistic for the Poisson-gamma model is
the following:
DNB  288.9
Statistical Models For Crash Data
Statistical fit (Goodness of fit)
AIC and BIC penalize the fit when additional variables are
added to the model.
AIC:
AIC  2ln L  2P
BIC:
BIC  2 ln L  P  ln(n)
P = estimated coefficients + 1
n = number of observations
Statistical Models For Crash Data
AIC and BIC
AIC and BIC penalize the fit when additional variables are
added to the model.
AIC:
AICP  2  685.3  2  3  1,376.7
AICNB  2  606.8  2  4  1, 221.6
BIC:
BICP  2  685.3  3  ln(255)  1,387.2
BICNB  2  606.8  4  ln(255)  1, 235.8
Statistical Models For Crash Data
Statistical fit (Model Errors)
Mean Absolute Deviation (MAD)
This criterion has been proposed by Oh et al. (2003) to evaluate the fit of
models. The Mean Absolute Deviance (MAD) calculates the absolute difference
between the estimated and observed values
1 n
MAD   ˆ i  yi
n i 1
Mean Squared Prediction Error (MSPE)
The Man Squared Prediction Error (MSPE) is a traditional indicator of error and
calculates the difference between the estimated and observed values squared.
1 n
2
MPSE    ˆ i  yi 
n i 1

Recent Models for Over-dispersion:
◦ Poisson-lognormal
 Poisson mean follows a lognormal distribution
◦ Poisson-Weibull
 Poisson mean follows a Weibull distribution
◦ Random-Parameters (investigation of the variance)
◦ Negative Binomial-Lindley (highly dispersed data)
 Overcome problems with zero-inflated models.
◦ Generalized Sichel (highly dispersed data)
◦ Generalized Waring (highly dispersed data – investigation of
variance)
◦ Finite mixture (Poisson and Poisson-gamma – investigation of
variance and structure of data)
◦ Bayesian Model Averaging (automatically compare different
models)
◦ See AA&P and Safety Science for info on some of these models.

Recent Models for Under-dispersion:
◦ Not very common; usually with low sample mean and often based
on model output (conditional on the mean).
◦ All the models below can be also used for over-dispersion
◦ Gamma time-dependent
 Observations not independent.
◦ Conway-Maxwell-Poisson
 Has become increasingly popular
◦ Double-Poisson
 Work published
◦ Hyper-Poisson
 Work published



Crash data have often the characteristics that
the mean μ can be very low (below 1.0)
Create problems with goodness-of-fit and
prediction
Read papers by
◦ Wood, G.R. (2004) Generalised Linear Models and
Goodness of Fit Testing. Accident Analysis &
Prevention, Vol. 34, pp. 417-427.
◦ Lord, D. (2006) Modeling Motor Vehicle Crashes using
Poisson-gamma Models: Examining the Effects of Low
Sample Mean Values and Small Sample Size on the
Estimation of the Fixed Dispersion Parameter.
Accident Analysis & Prevention, Vol. 38, No. 4, pp.
751-766.
Statistical Models For Crash Data
Low Mean Issue
Statistical Models For Crash Data
Time Trend Effects
2.5
Mean (crashes per year)
2
1.5
1
0.5
0
0
1
2
3
4
Year
5
6
7
Statistical Models For Crash Data
Time Trend Effects
Goal: capture changes that vary from year to year directly
into the model.
The model structure is given by the following:
yit  0,it   j 1  ji x j
p
Time Trend captured with the intercept (i.e.,
one intercept for each year)
Characteristic: each year is defined as a different
observation.
Issues: Since each site is observed at a different point in
time, a temporal serial correlation exits and affects the
statistical inferences of statistical models. Therefore, you
need to account for this correlation into the model.
Modeling approach: Generalized Estimating Equations
(GEE); Random-Effects models, etc.





The Bayes method approaches the
analysis of data differently than the
classical method (frequentist)
Subjective judgment more easily
incorporated with the observed data and
models
Treat unknown coefficients of regression
models as random variables
Data analysis less limited by the number
of observations (can be supplemented
with subjective judgment)
Computationally intensive (no longer an
issue)


The Bayes method makes inferences from data using
probability models for quantities that are observed
and for quantities one is interested to learn about
Bayesian data analysis can be divided into three
steps:
◦ Setting up a full probability model: provide a joint probability
distribution for all observable and unobservable quantities
◦ Conditioning on observed data: calculating and interpreting
the appropriate posterior distribution (conditional probability
distribution)
◦ Evaluating the fit of the model and implication of the
posterior distribution

Emphasis placed on interval estimation
(confidence interval) rather than hypothesis
testing




For the EB method, a different weight is
assigned to the prior distribution and
standard estimate respectively
In safety analyses, the weights are
estimated with the assumption that the
mean () for each site follows a Gamma
distribution
The EB estimates has been found to
outperform other estimates, such as the
MLE
The EB framework is presented on next
overhead
Formulation:
ˆˆ  ˆ  (1   ) y
where

1
ˆ
1

Mean of a Poisson-gamma regression

1

Dispersion parameter of NB regression
Using the same example shown earlier:
F1 = 24,164; F2 = 3,392; y=10
The values are estimated as follows
0.816
0.3732
ˆ
u  5.5E  5  24,164
 2,560
ˆ  3.9 Crashes per year
1

 0.39
3.90
1
2.46
ˆˆ  0.39  3.9  (1  0.39)10  7.63
Crashes per year
Crashes per Year
Observed value 10
EB estimate 7.63
MLE estimate 3.9
1
t
2
Year