#### Transcript 3.Data mining

```
Parametric calibration of speed–density relationships in mesoscopic traffic simulator with data mining
Speaker: Gung-Shian Lin
Date: 2009/10/20
Information Sciences, vol. 179, no. 12, pp. 2002-2013, 2009

Outline
1. Introduction
2. Literature review
3. Data mining
4. Experiments and results
5. Conclusions
1. Introduction
The paper calibrates speed–density relationship parameters using data
mining techniques and proposes a novel hierarchical clustering
algorithm based on K-means clustering.
Mesoscopic simulators model either single vehicles or groups of
vehicles in order to depict how different vehicles respond to
traffic information.
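2. Literature review
In the mesoscopic models used in DTA (dynamic traffic assignment)
systems, the speed–density relationship is:
v_u = \begin{cases} v_0, & k \le k_0, \\ v_0 \left[ 1 - \left( \frac{k - k_0}{k_{jam}} \right)^{\beta} \right]^{\alpha}, & \text{otherwise,} \end{cases}
[Figure: speed v_u versus density k, with v_0, k_0 and k_jam marked on the axes]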
3. Data mining
LWR (locally weighted regression)
Step 1: Take x (densities, or both densities and flows) as the center of a
neighborhood. The width of the neighborhood is described by q = fn
(f is the fraction of the n observations used in each local fit).
Step 2: Define the weights of all points in the neighborhood. The weight of
any point is the height of a weight function. The common choice is the
tricube function:
W(u) = \begin{cases} (1 - u^3)^3, & 0 \le u < 1, \\ 0, & \text{otherwise.} \end{cases}
The weight for the observation (xi, yi) is:
w_i = W(p(x, x_i) / d(x)),
where p(x, xi) is the distance from x to xi and d(x) is the distance from x
to its q-th nearest observation.
Step 3: Fit a polynomial at each point of the independent-variable space
by using the weighted least squares algorithm.
Step 4: Obtain the fitted value of yi.
[Figure: neighborhood of q = fn points centered at x, with the weight function W(u)]
For u = p(x, xi) / d(x):
p(x, xi) < d(x) → W(u) = (1 - u^3)^3
p(x, xi) ≥ d(x) → W(u) = 0
Agglomerative hierarchical clustering algorithm based on K-means
The proposed algorithm can be summarized as follows:
Step 1: Use K-means to cluster the sensor data, which are taken as training
instances; the resulting k clusters are called constraint-clusters.
Densities, flows and speeds contain abundant information about the
traffic status, so they are chosen as the clustering variables.
K-means is executed in the following steps:
1. Randomly select k clustering centers from the n training cases.
2. Find the nearest clustering center to each xi (density, or both density and
flow), then assign xi to that cluster.
3. Compute the objective function E. If the value of E is unchanged, the
clustering results are considered unchanged as well, and the iteration
stops. Otherwise, update the centers and repeat from step 2.
[Figure: K-means clustering example with k = 3, panels (a) to (f); horizontal axis from 0 to 60, vertical axis from 0 to 50]
Step 2: For each constraint-cluster, use agglomerative
hierarchical clustering to build a clustering tree.
The basic steps of the complete-link algorithm are:
1. Place each instance in its own cluster. Then compute the distances between
these points and sort them.
2. Step through the sorted list of distances, forming for each distinct threshold
value dk a graph of the samples in which pairs of samples closer than dk are
connected into a new cluster by a graph edge. If all the samples are members of
a connected graph, stop. Otherwise, repeat this step.
3. The output of the algorithm is a nested hierarchy of graphs, which can be
cut at the desired dissimilarity level, forming a partition (clusters) identified by
simple connected components in the corresponding subgraph.
Step 3: These k clustering trees are combined into one integrated clustering tree
by using the complete-link algorithm. After all samples are clustered, a
separate local regression is run on the observations in each cluster.
Step 4: New densities and flows are assigned to the most appropriate
cluster by using k-nearest neighbors. The k-nearest neighbor classifier
uses Euclidean distance to find the k nearest density and flow samples
among those already clustered.
4. Experiments and results
The sensor data are preprocessed to eliminate erroneous data
and repair missing ones.
Step 1: Group the data from a number of cycles into a phase, and scan the
sending times of the data one by one to find the missing records. Check
whether each record is erroneous according to the criteria in the table.
Step 2: Repair the missing data and the erroneous data. The average value
in the neighboring phase is used to repair these data.
[Figure: Estimated speed by the classical speed–density relationship]
The table presents the RMSPE obtained by each approach.
5. Conclusions
The proposed method overcomes the limitations of classic
models of speed–density relationships.
KHCA, the proposed K-means-based hierarchical clustering algorithm,
obtained the highest precision in capturing traffic dynamics among the
clustering algorithms compared.

```
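The locally weighted regression steps in the transcript (neighborhood of q = fn points, tricube weights, weighted least squares) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the paper's implementation: the function names `tricube` and `lwr_predict` are hypothetical, the local polynomial is taken to be a straight line, x is one-dimensional (density only), and the sample densities and speeds are made up.

```python
def tricube(u):
    """Tricube weight: (1 - u^3)^3 for 0 <= u < 1, and 0 otherwise."""
    return (1.0 - u ** 3) ** 3 if 0.0 <= u < 1.0 else 0.0


def lwr_predict(x0, xs, ys, f=0.5):
    """Locally weighted linear estimate of y at x0 (hypothetical helper).

    f is the fraction of the n observations that define the neighborhood,
    so the bandwidth d(x0) is the distance to the q-th nearest point, q = f * n.
    """
    n = len(xs)
    q = min(n, max(2, round(f * n)))
    dists = [abs(x - x0) for x in xs]      # p(x0, x_i) in one dimension
    d = sorted(dists)[q - 1]               # d(x0): distance to the q-th nearest x_i
    if d == 0.0:
        # The q nearest observations coincide with x0: average their y values.
        same = [y for y, p in zip(ys, dists) if p == 0.0]
        return sum(same) / len(same)
    w = [tricube(p / d) for p in dists]    # w_i = W(p(x0, x_i) / d(x0))
    sw = sum(w)
    if sw == 0.0:
        # Every neighbor sits exactly on the bandwidth edge: plain average.
        edge = [y for y, p in zip(ys, dists) if p <= d]
        return sum(edge) / len(edge)
    # Weighted least squares for a straight line, in closed form.
    xbar = sum(wi * x for wi, x in zip(w, xs)) / sw
    ybar = sum(wi * y for wi, y in zip(w, ys)) / sw
    sxx = sum(wi * (x - xbar) ** 2 for wi, x in zip(w, xs))
    if sxx == 0.0:
        return ybar                        # only one distinct x carries weight
    sxy = sum(wi * (x - xbar) * (y - ybar) for wi, x, y in zip(w, xs, ys))
    return ybar + (sxy / sxx) * (x0 - xbar)


# Made-up densities (veh/km) and speeds (km/h) lying on a straight line.
densities = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
speeds = [60 - 0.5 * k for k in densities]
estimate = lwr_predict(35, densities, speeds, f=0.5)
```

Because the made-up data lie exactly on a line, the local fit reproduces that line at the query point, which makes the sketch easy to check. Note that the q-th nearest point itself receives weight zero, since u = 1 there.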
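The K-means step that produces the constraint-clusters can likewise be sketched in plain Python. This is a minimal sketch under assumptions, not the authors' code: the names `sq_dist` and `kmeans` are hypothetical, the objective E is taken to be the sum of squared Euclidean distances to the assigned centers, the center update is the usual cluster mean, and the optional `init_centers` argument replaces the random selection of step 1 so that the example is repeatable. The (density, speed) pairs are made up.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def kmeans(points, k, init_centers=None, seed=0, max_iter=100):
    """Basic K-means; stops when the objective E no longer changes."""
    rng = random.Random(seed)
    # Step 1: pick k centers (randomly unless fixed centers are supplied).
    centers = list(init_centers) if init_centers else rng.sample(points, k)
    prev_e = None
    labels, e = [], 0.0
    for _ in range(max_iter):
        # Step 2: assign every point to its nearest center.
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                  for p in points]
        # Step 3: objective E = sum of squared distances to assigned centers.
        e = sum(sq_dist(p, centers[l]) for p, l in zip(points, labels))
        if prev_e is not None and e == prev_e:
            break                          # E unchanged: clustering is stable
        prev_e = e
        # Update each center to the mean of its members (assumed update rule).
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centers, e


# Made-up (density, speed) pairs forming two obvious groups.
samples = [(10, 40), (12, 42), (11, 41), (50, 10), (52, 12), (51, 11)]
labels, centers, objective = kmeans(samples, 2, init_centers=[(10, 40), (50, 10)])
```

Each resulting cluster would then be handed to the complete-link step described in the transcript to build its clustering tree; that part is not shown here.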