USING DATA MINING TO PREDICT ROAD CRASH COUNT WITH A

USING DATA MINING TO PREDICT ROAD
CRASH COUNT WITH A FOCUS ON SKID
RESISTANCE VALUES
Authors
Daniel Emerson; Richi Nayak; QUT
Justin Weligamage: QDTMR
Presenter
Daniel Emerson
Computer Science Discipline
Queensland University of Technology (QUT)
Project Details
• The work for this presentation was conducted as part of a larger
skid resistance and crash analysis, carried out at QUT as the
CIEAM I and CIEAM II projects from 2009 to 2011.
• Project initiators & organizers: Justin Weligamage,
Richi Nayak.
• Data mining supervisor: Richi Nayak.
• Data preparation, data mining and data mining strategist: Daniel
Emerson
• Road engineering advisor: Nappadol Piyatrapoomi
Motivation
(why the work was done)
• Applied data mining as a new approach for analysis with
Queensland road & crashes data.
• Had previously found a relationship between road crash risk
(roads having crashes) and road attributes, with skid resistance
being significant.
• Sought a higher resolution measure of road crash risk
through the crash count method.
• Application of crash count data mining models in
decision support systems to identify potential roads for
investigation and treatment.
Introduction
This paper presents a data mining case study in which
predictive data mining is applied to model the relationship
between crashes and skid resistance and other road attributes,
with the purpose of:
• developing models (algorithms) on sample data,
• applying the models to other data to predict high-risk
roads.
Data and Data Preprocessing
• Several data sources were obtained from QDTMR for the four-year
period 2004 to 2007, including:
– annual 1 km (or less) road segment snapshots with a list of road
variables,
  • road surface texture depth test readings, seal type and seal
  age, roadway features, traffic flow, features such as
  intersections, and many others,
– dated skid resistance values at 100 metre (or less) intervals,
representing F60 skid resistance tests,
– crash instances, crash details and their road location.
Examination of road segment crash count
• Meeting our need for a more precise crash measure:
crashes per 1 km per year.
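As a rough illustration of this measure, the sketch below counts crashes per 1 km road segment per year. The record layout (road identifier, chainage in km, year) and all values are hypothetical and are not the QDTMR data format.

```python
from collections import Counter

# Hypothetical crash records: (road_id, chainage_km, year).
# These values are invented for illustration only.
crashes = [
    ("R1", 0.35, 2004),
    ("R1", 0.80, 2004),
    ("R1", 1.20, 2004),
    ("R1", 0.10, 2005),
    ("R2", 3.95, 2004),
]


def segment_key(road_id, chainage_km, year):
    """Map a crash to its 1 km road segment (by whole kilometre) and year."""
    return (road_id, int(chainage_km), year)


def crash_counts(records):
    """Count crashes per (road, 1 km segment, year)."""
    return Counter(segment_key(*record) for record in records)


counts = crash_counts(crashes)
for key in sorted(counts):
    print(key, counts[key])
```

Segments with no recorded crash simply do not appear in the counter, so a full analysis would also need the list of all segments to assign them a count of zero.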
Crash count characteristics
[Figure: Scatterplot of 2004, 2005, 2006, 2007 vs Year Crash Count. Axes: Year Crash Count (0 to 35) and Crash Instance Count (0 to 1400); one series per year, 2004 to 2007; 1 yr time scale.]
• Road segment crash count showed stability from year
to year, indicating its value in crash risk analysis.
Clusters: crash count ranges (4yr)
• Road segment data mining clusters based on road properties
showed characteristic crash counts, thus relating road crash
proneness with road properties.
Method: Applying predictive data mining
Reasons:
• To demonstrate that road segment crash count can be
modelled, thus establishing a relationship between crash
count and roadway features.
• Use the rules obtained from the model output in the
analytical process to further contribute to understanding
of how the roadway features contribute to crash count.
• Later apply successful models in decision support.
Method: Applying predictive data mining
… using a subset of quality data
• Select the target variable to be predicted (crash count).
• Select the input variables (road segment attributes).
• Select a modelling method (regression tree algorithm).
• Run a range of models with varying configurations
(regression tree).
• Evaluate and understand the results.
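The steps above can be sketched with a very small regression tree. This is not the SAS or WEKA algorithm used in the project; it is a minimal variance-reduction tree written for illustration, and the field names (`f60`, `aadt`, `crashes`) and data values are hypothetical.

```python
from statistics import mean


def sse(values):
    """Sum of squared errors of the values around their mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def best_split(rows, target, features, min_leaf):
    """Return (score, feature, threshold) with the lowest child SSE, or None."""
    best = None
    for feature in features:
        for threshold in sorted({row[feature] for row in rows}):
            left = [r[target] for r in rows if r[feature] < threshold]
            right = [r[target] for r in rows if r[feature] >= threshold]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            score = sse(left) + sse(right)
            if best is None or score < best[0]:
                best = (score, feature, threshold)
    return best


def build_tree(rows, target, features, min_leaf=2, max_depth=3):
    """Grow a tree; a leaf stores the group size and mean target value."""
    values = [r[target] for r in rows]
    split = best_split(rows, target, features, min_leaf) if max_depth > 0 else None
    if split is None or split[0] >= sse(values):
        return {"n": len(rows), "ave": mean(values)}
    _, feature, threshold = split
    left = [r for r in rows if r[feature] < threshold]
    right = [r for r in rows if r[feature] >= threshold]
    return {
        "feature": feature,
        "threshold": threshold,
        "left": build_tree(left, target, features, min_leaf, max_depth - 1),
        "right": build_tree(right, target, features, min_leaf, max_depth - 1),
    }


def predict(tree, row):
    """Follow the splits down to a leaf and return its mean crash count."""
    while "feature" in tree:
        branch = "left" if row[tree["feature"]] < tree["threshold"] else "right"
        tree = tree[branch]
    return tree["ave"]


# Hypothetical road segments (invented values, not project data).
segments = [
    {"f60": 0.30, "aadt": 5000, "crashes": 5},
    {"f60": 0.35, "aadt": 5500, "crashes": 4},
    {"f60": 0.38, "aadt": 4200, "crashes": 4},
    {"f60": 0.36, "aadt": 6000, "crashes": 5},
    {"f60": 0.55, "aadt": 5000, "crashes": 1},
    {"f60": 0.60, "aadt": 5200, "crashes": 0},
    {"f60": 0.58, "aadt": 4100, "crashes": 1},
    {"f60": 0.62, "aadt": 6100, "crashes": 0},
]

tree = build_tree(segments, target="crashes", features=["f60", "aadt"])
print(predict(tree, {"f60": 0.33, "aadt": 5000}))
```

Each path from the root to a leaf corresponds to one IF ... THEN rule, which is why the number of leaves and the number of rules are reported together.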
Model variables
Target variable: road segment crash count
Road attribute input variables (in order of significance):
AVG_FRICTION_AT_60_Ikm (F60 skid resistance)
AADT (traffic rates)
traffic_percent_heavy
lane_count
Texture Depth
roughness_average
rutting_average
seal_age
seal_type
CRASH_SPEED_LIMIT
CWAY_TYPE (single, double)
CRAS_DIVIDED_ROAD
ROAD_TYPE (highway, urban arterial etc)
Roadway Feature (roundabouts, bridges, intersections etc)
• These road segment attributes were relevant to predicting
road segment crash count and became model input variables.
Model results
Model   Leaves & rules   Correlation (R-squared)
1       143              0.93
2       159              0.93
3       161              0.93
4       163              0.92
5       119              0.91
6       88               0.86
• All models show a high correlation between
actual crash count and predicted crash count
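The slides do not say how the R-squared figures were computed. The sketch below assumes they are the squared Pearson correlation between actual and predicted crash counts; the sample values are hypothetical.

```python
from math import sqrt


def pearson_r(actual, predicted):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / sqrt(var_a * var_p)


def r_squared(actual, predicted):
    """Squared correlation (assumed meaning of R-squared here)."""
    return pearson_r(actual, predicted) ** 2


# Hypothetical actual and predicted crash counts for five road segments.
actual = [0, 1, 2, 3, 4]
predicted = [0.5, 1.0, 2.5, 2.5, 4.0]
print(round(r_squared(actual, predicted), 2))
```

If the figures were instead a coefficient of determination (one minus residual over total sum of squares), the values could differ slightly from the squared correlation.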
Charts of actual value vs. predicted value
[Charts: predicted value plotted against actual value]
• Comparing models with 143 leaves and 83 leaves
A sample output rule
Sample Rule 1.
IF
• AVG_FRICTION_AT_60 < 0.4095
• AND CRASH_SPEED_LIMIT IS ONE OF: 90 100 110
• AND 3987 <= AADT < 6105
• AND CWAY_TYPE EQUALS SINGLE
THEN
• NODE : 48
• N : 315 ... number of road segments in the group
• AVE : 4.04444 ... average crashes for the group
• SD : 2.5357 ... standard deviation of the predicted crash values
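To show how such a rule could be applied in decision support, the sketch below encodes Sample Rule 1 as a predicate over a road segment record. The thresholds and leaf statistics come from the rule above; the dictionary layout and function names are assumptions made for illustration.

```python
# Leaf statistics reported for Sample Rule 1 (node 48).
RULE_1_LEAF = {"node": 48, "n": 315, "ave": 4.04444, "sd": 2.5357}


def matches_sample_rule_1(segment):
    """Return True if the segment satisfies every condition of Sample Rule 1."""
    return (
        segment["AVG_FRICTION_AT_60"] < 0.4095
        and segment["CRASH_SPEED_LIMIT"] in (90, 100, 110)
        and 3987 <= segment["AADT"] < 6105
        and segment["CWAY_TYPE"] == "SINGLE"
    )


def predicted_crash_count(segment):
    """Return the leaf's average crash count, or None if the rule does not apply."""
    return RULE_1_LEAF["ave"] if matches_sample_rule_1(segment) else None


# Hypothetical road segment (invented values).
segment = {
    "AVG_FRICTION_AT_60": 0.38,
    "CRASH_SPEED_LIMIT": 100,
    "AADT": 4500,
    "CWAY_TYPE": "SINGLE",
}
print(predicted_crash_count(segment))
```

A full model would hold one such predicate per leaf, with exactly one rule matching any given segment.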
Conclusion
• Road segment crash count can be successfully
modelled with road attributes using data mining.
• A strong relationship exists between road crash count
and road attributes.
• Skid resistance plays an important role in determining
the crash characteristics of the road segment.
• The models may be of sufficient quality to use in
decision support.
• While the models are specific to Queensland roads,
the method can be trialled and evaluated elsewhere.
Future Work
• Work with road asset domain experts to analyse the
rules, draw conclusions and improve the models.
• Apply models for analysis of data subsets, such as
crashes with severe human outcomes.
• Apply the models to the whole-of-network dataset with
the goal of identifying road segments that are skid
resistance sensitive, i.e. where surface intervention to
improve skid resistance will result in reduced crash risk.
Acknowledgement
This study is an ongoing investigation into road crashes,
supported by CIEAM (CRC Asset Management), QDTMR
and the Faculty of Science and Technology, QUT.
Data mining tools used include:
• SAS (Statistical Analysis Software)
• WEKA (Data Mining Software)
Thanks and Questions
Project Publications
[1] Nayak, R., Piyatrapoomi, N. and Weligamage, J. (2009). Application of text mining in analysing road
crashes for road asset management. Proceedings of the Third World Congress on Engineering Asset
Management, WCEAM 2009 (Athens, Greece, 28-30 September 2009).
[2] Nayak, R., Emerson, D., Weligamage, J. and Piyatrapoomi, N. (2010). Using Data Mining on Road
Asset Management Data in Analysing Road Crashes. Proceedings of the 16th Annual TMR
Engineering & Technology Forum (Brisbane, July 20, 2010).
[3] Emerson, D., Nayak, R., Weligamage, J. and Piyatrapoomi, N. (2011). Identifying differences in wet
and dry road crashes using data mining. Proceedings of the Fifth World Congress on
Engineering Asset Management, WCEAM 2010 (Brisbane, October 26, 2010).
[4] Nayak, R., Emerson, D., Weligamage, J. and Piyatrapoomi, N. (2011). Road Crash Proneness
Prediction using Data Mining. Proceedings of the EDBT 2011 (Uppsala, Sweden, 2011).
