Transcript Slide 1

Bayesian Travel Time Reliability
Feng Guo, Associate Professor
Department of Statistics, Virginia Tech
Virginia Tech Transportation Institute
Dengfeng Zhang, Department of Statistics, Virginia Tech
Travel Time Reliability
• Travel time is random in nature.
• Efforts to quantify the uncertainty
– Percentage Variation
– Misery index
– Buffer time index
– Distribution
• Normal
• Log-normal distribution
•…
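The indices listed above can be computed directly from a sample of travel times. The sketch below assumes commonly used definitions that are not stated on the slide: percent variation = standard deviation / mean × 100, buffer time index = (95th percentile − mean) / mean, and misery index = (mean of the slowest 20% of trips − mean) / mean. The function names and the sample travel times are illustrative only.

```python
import statistics

def percentile(values, q):
    """Percentile with linear interpolation, q in [0, 100]."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)

def reliability_metrics(times):
    """Return three reliability indices under the assumed definitions."""
    mean = statistics.mean(times)
    sd = statistics.stdev(times)
    p95 = percentile(times, 95)
    ordered = sorted(times)
    n_worst = max(1, round(0.2 * len(ordered)))  # slowest 20% of trips
    worst_mean = statistics.mean(ordered[-n_worst:])
    return {
        "percent_variation": 100.0 * sd / mean,
        "buffer_time_index": (p95 - mean) / mean,
        "misery_index": (worst_mean - mean) / mean,
    }

# Illustrative travel times in seconds (not from the study data).
travel_times = [560, 570, 575, 580, 585, 590, 600, 620, 900, 1000]
metrics = reliability_metrics(travel_times)
print(metrics)
```

Each index reduces the travel time distribution to a single number, which is why the distribution-based approach in the following slides carries more information.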
Multi-State Travel Time Reliability Models
• Better fit to the data
• Easy to interpret and use for prediction, similar to weather forecasting:
– The probability of encountering congestion
– The estimated travel time if congestion occurs
Guo et al., 2010
Multi-State Travel Time Reliability Models
• Direct link with the underlying traffic condition and the fundamental diagram
• Can be extended to skewed component distributions such as the lognormal
Park et al., 2010; Guo et al., 2012
Model Specification
• $f(y) = \lambda f_1(y \mid \mu_1, \sigma_1) + (1 - \lambda) f_2(y \mid \mu_2, \sigma_2)$
– $f_1$: distribution under the free-flow condition
– $f_2$: distribution under the congested condition
– $\lambda$ is the proportion of the free-flow component.
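As a rough illustration of the two-state specification, the sketch below evaluates the mixture density with normal components and its overall mean. The parameter values are made up for the example and are not estimates from the study.

```python
import math

def normal_pdf(y, mu, sigma):
    """Density of N(mu, sigma^2) at y."""
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def mixture_pdf(y, lam, mu1, sigma1, mu2, sigma2):
    """Two-state density: lam * f1 (free-flow) + (1 - lam) * f2 (congested)."""
    return lam * normal_pdf(y, mu1, sigma1) + (1.0 - lam) * normal_pdf(y, mu2, sigma2)

def mixture_mean(lam, mu1, mu2):
    """Overall mean travel time implied by the mixture."""
    return lam * mu1 + (1.0 - lam) * mu2

# Illustrative parameters (seconds); assumptions, not fitted values.
params = dict(lam=0.85, mu1=580.0, sigma1=40.0, mu2=970.0, sigma2=150.0)
density_at_600 = mixture_pdf(600.0, **params)
overall_mean = mixture_mean(params["lam"], params["mu1"], params["mu2"])

# Crude numerical check that the density integrates to about 1.
total_mass = sum(mixture_pdf(float(y), **params) for y in range(0, 2500))
print(density_at_600, overall_mean, total_mass)
```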
Model Parameters Vary by Time of Day
[Figure: four panels showing parameters by time of day: mean, probability in congested state, variance, 90th percentile]
What is the root cause of this fluctuation?
Bayesian Multi-State Travel Time Models
• The fluctuation by time of day is most likely due to traffic volume
• How to incorporate this into the model?
Model Specification: Model 1
• $f(y) = \lambda f_1(y \mid \mu_1, \sigma_1) + (1 - \lambda) f_2(y \mid \mu_2, \sigma_2)$
– $f_1$: distribution under the free-flow condition
– $f_2$: distribution under the congested condition
– $\lambda$ is the proportion of the free-flow component.
• Link the mean travel time of the congested state with covariates: $\mu_2 = \theta_0 + \theta_1 x$
• Link the probability of the travel time state with covariates: $\Phi^{-1}(1 - \lambda) = \beta_0 + \beta_1 x$
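The two link equations of Model 1 can be sketched as a small function that maps a covariate value x to the congested-state mean and the free-flow proportion. Here Φ is the standard normal CDF, written with `math.erf`. The coefficient values are hypothetical and chosen only so the arithmetic is easy to follow.

```python
import math

def std_normal_cdf(z):
    """Standard normal CDF, Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def model1_parameters(x, theta0, theta1, beta0, beta1):
    """Return (mu2, lam) for covariate x under the Model 1 links."""
    mu2 = theta0 + theta1 * x                        # mean of congested state
    p_congested = std_normal_cdf(beta0 + beta1 * x)  # 1 - lambda (probit link)
    lam = 1.0 - p_congested                          # free-flow proportion
    return mu2, lam

# Hypothetical coefficients; x plays the role of traffic volume.
coef = dict(theta0=600.0, theta1=0.2, beta0=-3.0, beta1=0.002)
mu2_low, lam_low = model1_parameters(500.0, **coef)
mu2_high, lam_high = model1_parameters(1500.0, **coef)
print(mu2_low, lam_low, mu2_high, lam_high)
```

With a positive `beta1`, a larger x lowers the free-flow proportion, which matches the idea that congestion becomes more likely as volume grows.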
Bayesian Model Setup
• Inference based on posterior distribution
• Using non-informative priors: let data
dominate results.
• Developed a Markov chain Monte Carlo (MCMC) algorithm to simulate the posterior distributions
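To make the MCMC step concrete, here is a minimal Gibbs sampler for a two-component normal mixture. It is a generic textbook sketch, not the presenters' sampler: it assumes known component standard deviations, a flat prior on each mean, a Beta(1, 1) prior on λ, and no covariates. All names and the simulated data are illustrative.

```python
import math
import random

def gibbs_mixture(y, sigma1, sigma2, n_iter=600, burn_in=200, seed=0):
    """Gibbs sampler for (lam, mu1, mu2) with known sigma1 and sigma2."""
    rng = random.Random(seed)
    lam, mu1, mu2 = 0.5, min(y), max(y)  # crude starting values
    draws = []
    for it in range(n_iter):
        # Step 1: sample the latent state of each observation.
        n1, sum1, n2, sum2 = 0, 0.0, 0, 0.0
        for yi in y:
            a = lam * math.exp(-0.5 * ((yi - mu1) / sigma1) ** 2) / sigma1
            b = (1.0 - lam) * math.exp(-0.5 * ((yi - mu2) / sigma2) ** 2) / sigma2
            if rng.random() * (a + b) < a:
                n1 += 1
                sum1 += yi
            else:
                n2 += 1
                sum2 += yi
        # Step 2: lam | states ~ Beta(1 + n1, 1 + n2).
        lam = rng.betavariate(1 + n1, 1 + n2)
        # Step 3: each mean | states ~ N(group mean, sigma^2 / group size).
        if n1 > 0:
            mu1 = rng.gauss(sum1 / n1, sigma1 / math.sqrt(n1))
        if n2 > 0:
            mu2 = rng.gauss(sum2 / n2, sigma2 / math.sqrt(n2))
        if it >= burn_in:
            draws.append((lam, mu1, mu2))
    return draws

# Simulated data with known truth: lam = 0.7, mu1 = 580, mu2 = 970.
data_rng = random.Random(42)
y = [data_rng.gauss(580.0, 40.0) if data_rng.random() < 0.7
     else data_rng.gauss(970.0, 150.0) for _ in range(600)]
draws = gibbs_mixture(y, sigma1=40.0, sigma2=150.0)
post_lam = sum(d[0] for d in draws) / len(draws)
post_mu1 = sum(d[1] for d in draws) / len(draws)
post_mu2 = sum(d[2] for d in draws) / len(draws)
print(post_lam, post_mu1, post_mu2)
```

The posterior means of the retained draws serve as point estimates, and quantiles of the same draws give credible intervals.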
Issues with Model 1
• When traffic volume is low, the two component distributions can be very similar to each other
• The estimate of the mixture proportion is then unstable
Model Specification: Model 2
• $f(y) = \lambda f_1(y \mid \mu_1, \sigma_1) + (1 - \lambda) f_2(y \mid \mu_2, \sigma_2)$
– $f_1$: distribution under the free-flow condition
– $f_2$: distribution under the congested condition
– $\lambda$ is the proportion of the free-flow component.
• $\mu_2 = \theta_s \mu_1 + \theta_1 x$
• $\Phi^{-1}(1 - \lambda) = \beta_0 + \beta_1 x$
where $\theta_s$ is a predefined scale parameter: how large the minimum value of $\mu_2$ is compared to $\mu_1$
Comparing Model 1 and 2
• $\mu_2 = \theta_s \mu_1 + \theta_1 x$
• $\theta_s = 1$: the minimum value of the congested state is the same as free flow
• $\theta_s = 1.5$: the congested state is at least 50% higher than free flow
[Figure label: $\theta_s - 1$]
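A short worked example of how the scale parameter sets a floor on the congested-state mean in Model 2. It assumes x ≥ 0 and θ1 ≥ 0, so the minimum of μ2 occurs at x = 0. The numeric values are illustrative.

```python
def mu2_model1(x, theta0, theta1):
    """Model 1: congested-state mean with a free intercept theta0."""
    return theta0 + theta1 * x

def mu2_model2(x, mu1, theta_s, theta1):
    """Model 2: intercept tied to the free-flow mean via theta_s."""
    return theta_s * mu1 + theta1 * x

mu1 = 580.0  # illustrative free-flow mean in seconds
floor_equal = mu2_model2(0.0, mu1, theta_s=1.0, theta1=0.2)  # same as free flow
floor_50pct = mu2_model2(0.0, mu1, theta_s=1.5, theta1=0.2)  # 50% above free flow
print(floor_equal, floor_50pct)
```

In Model 1 nothing prevents `theta0` from drifting toward `mu1`, which is the situation in which the two components become hard to tell apart.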
Simulation Study
• To evaluate the performance of models
• Based on two metrics
– Average of posterior mean
– Coverage probability
1. Set n = number of simulations to run.
2. For (i in 1:n)
{
Generate data
Do
{
Run Markov chain Monte Carlo
} While (not converged)
Record whether the 95% credible intervals cover the true values
}
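The simulation loop above can be sketched in runnable form. To keep the example short, the MCMC step is swapped for a model whose posterior is known exactly: a normal mean with known standard deviation and a flat prior, so the posterior is N(ȳ, σ²/n) and the 95% credible interval is ȳ ± 1.96σ/√n. This shows how the two metrics are computed, not how the mixture model itself is fitted. All names and values are assumptions.

```python
import math
import random

def coverage_study(n_sims=500, n_obs=50, true_mu=600.0, sigma=40.0, seed=1):
    """Return (average posterior mean, coverage probability of 95% intervals)."""
    rng = random.Random(seed)
    half_width = 1.96 * sigma / math.sqrt(n_obs)
    posterior_means = []
    covered = 0
    for _ in range(n_sims):
        # Generate one data set from the known truth.
        data = [rng.gauss(true_mu, sigma) for _ in range(n_obs)]
        # Exact posterior mean under the flat prior (stands in for MCMC).
        post_mean = sum(data) / n_obs
        posterior_means.append(post_mean)
        # Record whether the 95% credible interval covers the true value.
        if post_mean - half_width <= true_mu <= post_mean + half_width:
            covered += 1
    return sum(posterior_means) / n_sims, covered / n_sims

avg_post_mean, coverage = coverage_study()
print(avg_post_mean, coverage)
```

A well-behaved procedure gives an average posterior mean close to the true value and a coverage probability close to the nominal 95%.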
Simulation Study: Data Generation
• $X_{ij}$: observed traffic volume at time interval $i$ on day $j$
• $\mu_i$: average traffic volume at time interval $i$ (e.g., 8:00-9:00)
$\mu_i = \frac{\sum_k X_{ik}}{\text{number of days}}$
• $Y_{ij}$: simulated traffic volume at time interval $i$ on day $j$
$Y_{ij} = (c \mu_i + \epsilon_{ij})^{+}, \quad \epsilon_{ij} \sim N(0, d^2)$
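The data-generation step can be sketched as follows. The code assumes that the stray "+" in the formula denotes the positive part, meaning simulated volumes are truncated at zero; that reading is an assumption. The observed volumes, `c`, and `d` below are made up for illustration.

```python
import random

def average_volume(observed):
    """observed[i][j] is the volume at interval i on day j; returns mu_i per interval."""
    return [sum(row) / len(row) for row in observed]

def simulate_volume(mu, c, d, n_days, rng):
    """Y_ij = max(0, c * mu_i + eps_ij) with eps_ij ~ N(0, d^2)."""
    return [[max(0.0, c * m + rng.gauss(0.0, d)) for _ in range(n_days)]
            for m in mu]

# Two time intervals observed on two days (illustrative numbers).
observed = [[100.0, 200.0], [300.0, 500.0]]
mu = average_volume(observed)
simulated = simulate_volume(mu, c=1.2, d=50.0, n_days=5, rng=random.Random(0))
print(mu, simulated)
```

Here `c` scales the overall demand level and `d` controls the day-to-day variability around the interval average.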
Model 1 VS Model 2: Posterior Means
Model 1 VS Model 2: Coverage Probability
Setting 3: when both 𝜃0 and 𝜃1 are small
Robustness
• What if…
– the true value of $\theta_s$ is unknown
– the two components are too close
• We showed that the overall estimates are quite stable, even if the tuning parameter is misspecified
• When the two components are too close, selecting a misspecified tuning parameter could improve the coverage probabilities of some parameters
Robustness
Robustness
Robustness: Coverage Probabilities
Data
• The data set contains 4306 observations
• A section of the I-35 freeway in San Antonio,
Texas.
• Vehicles were tagged by a radio frequency
device
• High precision
Study Corridor
Modeling Results
Real Data Analysis
Probability of Congested State as a Function of Traffic Volume
Next Step…
• Apply the model to a large dataset
Any available data are welcome!
• Hidden Markov Model
HMM
• The models discussed are based on the
assumption that all the observations are
independent. Is it realistic?
[Figure: autocorrelation function (ACF) versus lag for two panels: Simulated Travel Time and Observed Travel Time]
Hidden Markov Model
• A hidden Markov model can incorporate the dependency structure of the data.
• A Markov chain is a sequence that satisfies:
$P(x_{t+1} = x \mid x_1, x_2, \ldots, x_t) = P(x_{t+1} = x \mid x_t)$
• In a hidden Markov model, the state $x_t$ is not visible (i.e., latent), but the distribution of the output $y_t$ is determined by $x_t$
Hidden Markov Model
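• Latent state:
$w_t = \begin{cases} 1, & \text{free-flow} \\ 2, & \text{congested} \end{cases}$
• Distribution of travel time:
$y_t \mid w_t \sim \begin{cases} f_1, & \text{if } w_t = 1 \\ f_2, & \text{if } w_t = 2 \end{cases}$
• The sequence $\{w_t\}$ satisfies the Markov property:
$P(w_{t+1} = w \mid w_1, w_2, \ldots, w_t) = P(w_{t+1} = w \mid w_t)$
• If the $\{w_t\}$ are independent, this is exactly the traditional Gaussian mixture model we have discussed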
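A small simulation may help show how the latent chain produces dependent travel times. The sketch assumes normal component distributions and a fixed transition matrix with no covariates. The transition probabilities, means, and standard deviations are illustrative values, not estimates.

```python
import random

def simulate_hmm(n, trans, means, sds, rng, start=1):
    """Simulate latent states w_t in {1, 2} and travel times y_t | w_t."""
    states, times = [], []
    w = start
    for _ in range(n):
        states.append(w)
        times.append(rng.gauss(means[w], sds[w]))
        # Next state depends only on the current state (Markov property).
        w = 1 if rng.random() < trans[w][1] else 2
    return states, times

# trans[i][j] = P(w_{t+1} = j | w_t = i); illustrative values.
trans = {1: {1: 0.95, 2: 0.05}, 2: {1: 0.40, 2: 0.60}}
means = {1: 580.0, 2: 970.0}
sds = {1: 40.0, 2: 150.0}
states, times = simulate_hmm(20000, trans, means, sds, random.Random(7))
share_congested = states.count(2) / len(states)
print(share_congested)
```

If both rows of `trans` were identical, the next state would not depend on the current one and the simulation would reduce to independent draws from the mixture.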
Model Specification
• Transition probability: $P_{ij} = P(w_{t+1} = j \mid w_t = i)$
• For example, $P_{12}$ is the probability that the travel time jumps from the free-flow state to the congested state.
• We use a logit link function to model the transition probabilities as functions of traffic volume:
$\log \frac{P_{12}}{P_{11}} = \beta_{0,1} + \beta_{1,1} x$
$\log \frac{P_{22}}{P_{21}} = \beta_{0,2} + \beta_{1,2} x$
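Because each row of the transition matrix sums to one, the two logit equations determine the whole matrix: P12 and P22 are logistic functions of their linear predictors, and P11 = 1 − P12, P21 = 1 − P22. The sketch below uses hypothetical coefficients to show the mapping from traffic volume x to the matrix.

```python
import math

def logistic(eta):
    """Inverse of the logit link."""
    return 1.0 / (1.0 + math.exp(-eta))

def transition_matrix(x, beta01, beta11, beta02, beta12):
    """Return [[P11, P12], [P21, P22]] for covariate x."""
    p12 = logistic(beta01 + beta11 * x)  # log(P12 / P11) = beta01 + beta11 * x
    p22 = logistic(beta02 + beta12 * x)  # log(P22 / P21) = beta02 + beta12 * x
    return [[1.0 - p12, p12], [1.0 - p22, p22]]

# Hypothetical coefficients with positive slopes on volume.
coef = dict(beta01=-4.0, beta11=0.001, beta02=-1.0, beta12=0.001)
P_low = transition_matrix(1000.0, **coef)
P_high = transition_matrix(2000.0, **coef)
print(P_low, P_high)
```

With positive slopes, higher volume raises both P12 (free-flow jumps to congested) and P22 (congestion persists).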
Preliminary Results
• When traffic volume is higher, the congested state is more likely to persist and the free-flow state is more likely to jump to the congested state.
• The mean travel times of the two states are 578.8 and 972.6 seconds.
• Based on the stationary distribution, the proportion of the congested state is around 11.3%.
• AIC indicates that the hidden Markov model is superior to the traditional Gaussian mixture model.
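For a two-state chain with fixed transition probabilities, the stationary share of the congested state has the closed form P12 / (P12 + P21). The sketch below checks the formula against repeated application of the transition matrix. The probabilities are illustrative and are not the fitted values behind the 11.3% figure.

```python
def stationary_congested(p12, p21):
    """Long-run proportion of time in the congested state."""
    return p12 / (p12 + p21)

def iterate_distribution(p12, p21, n_steps=200):
    """Push a starting distribution through the chain n_steps times."""
    pi1, pi2 = 1.0, 0.0  # start in free flow
    for _ in range(n_steps):
        pi1, pi2 = pi1 * (1.0 - p12) + pi2 * p21, pi1 * p12 + pi2 * (1.0 - p21)
    return pi1, pi2

p12, p21 = 0.05, 0.40  # illustrative transition probabilities
closed_form = stationary_congested(p12, p21)
_, iterated = iterate_distribution(p12, p21)
print(closed_form, iterated)
```

When the transition probabilities depend on traffic volume, the stationary share would have to be evaluated at a chosen volume level.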
Simulation Study
[Figure: log-likelihood (about -22000 to -18000) versus data set ID (0 to 1000). Dots: hidden Markov model; lines: traditional mixture model.]
• Questions?
• …
• Thanks!