Transcript Document
Time-focused density-based
clustering of trajectories
of moving objects
Margherita D’Auria
Mirco Nanni
Dino Pedreschi
Plan of the talk
Introduction
Density-based clustering on trajectories
Trajectory data model distance measure
Results
Temporal Focusing
Motivations
Problem & context
Density-based Clustering (OPTICS)
A clustering quality measure
Heuristics for optimal temporal interval
Conclusions & future work
2
Motivations
Plenty of actual and future data sources for
spatio-temporal data
Sophisticated analysis method are required, in
order to fully exploit them
Data
mining methods
Which kind of patterns/models?
Main objectives
A better understanding of
the application domain
An improvement for private and public services
3
Problem & context
A distinguishing case: Mobile devices
PDAs
Mobile phones
LBS-enabled devices (may include the two above)
They (can) yield traces of their movement
An important problem:
Discovering groups of individuals that (approx.) move together in some
period of time
E.g.: detection of traffic jams during rush hours
A candidate Data Mining reformulation of the problem
Clustering of individuals’ trajectories
4
Which kind of clustering?
Several alternatives are available
General requirements:
Non-spherical
clusters should be allowed
E.g.: A traffic jam along a road
It should be represented as a cluster which individuals form a
“snake-shaped” cluster
Tolerance
to noise
Low computational cost
Applicability to complex, possibly non-vectorial data
A suitable candidate: Density-based clustering
In
particular, we adopt OPTICS
5
A crushed intro to OPTICS
A density threshold is defined through two parameters:
ε: A neighborhood radius
MinPts: Minimum number of points
Key concepts:
Core objects
Reachability-distance reach-d( p, q )
Objects with a ε-Neighborhood that contains at least MinPts objects
(simplified definition:) Distance between objects p and q
Example:
Object “q” is a core object if MinPts=2
Object “p” is not
Their reach-d() is shown
ε
q
reach-d(p,q)
p
ε –neighborhood of q
6
A crushed intro to OPTICS
The algorithm:
1.
Repeatedly choose a non-visited random object, until a core object
is selected
2.
Select the core object having the smallest reachability distance
from all the visited core objects. If none can be found, go to step 1
Order of visit
Output: reach-d() of all visited points
(reachability plot)
“jump” from left-hand group (0-9)
to right-hand one (10-18)
Reachability
threshold
Cluster 1
Cluster 2
7
Applying OPTICS to trajectories
Two key issues have to be solved
A suitable representation for
Which data model for trajectories?
A mean
trajectories is needed
for comparing trajectories has to be provided
Which distance between objects?
OPTICS needs to define one to perform range queries
8
A trajectory data model
Raw input data:
Each trajectory is represented as a set of time-stamped coordinates
T=(t1,x1,y1), …, (tn, xn, yn) => Object position at time ti was (xi,yi)
Data model
Parametric-spaghetti: linear interpolation between consecutive points
9
A distance between trajectories
Adopted distance = average distance
D( 1 , 2 ) |T
d ( (t ),
T
1
2
(t )) dt
|T |
It is a metric => efficient indexing methos allowed
10
A sample dataset
Set of trajectories forming 4 clusters + noise
Generated by the CENTRE system (KDDLab software)
11
OPTICS vs.
HAC & K-means
K-means
HAC-average
OPTICS
12
Temporal focusing
Different time intervals can show different
behaviours
E.g.:
objects that are close to each other within a time
interval can be much distant in other periods of time
The time interval becomes a parameter
E.g.:
rush hours vs. low traffic times
Problem: significant time intervals are not always
known a priori
An
automated mechanism is needed to find them
13
Temporal focusing
The proposed method
1.
Provide a notion of interestingness to be
associated with time intervals
2.
We define it in terms of estimated quality of the clustering
extracted on the given time interval
Formalize the Temporal focusing task as an
optimization problem
Discover the time interval
interestingness measure
that
maximizes
the
14
A quality measure for
density-based clustering
General principle
High-density clusters separated by
low-density noise are preferred
The method
High-density clusters correspond to
low dents in the reachability plot
=> Evaluate the global quality Q of the
clustering output as the average
reachability within clusters (noise
is discarded)
HIGH
DENSITY
MEDIUM
DENSITY
LOW
DENSITY
Definition: given ε and dataset D, compute QD, ε as:
QD, ε = - R (D, ε’) = - AVGo in D’ reach-d(o)
D’ = D – {noise objects}
15
FAQs
How Q() is computed for a given time interval I ?
How is the reachability threshold set for each interval?
Step 1: trajectory segments out of I are clipped away
Step 2: OPTICS is run on the clipped trajectories
Step 3: Q(I) is computed on the output reachability plot
A reachability threshold is needed in order to locate clusters (and noise)
The threshold for the largest I is manually set by the user
Thresholds for other intervals I’ I are computed from the first one by
proportionally rescaling w.r.t. average reachability
Is the optimal Q(I) biased towards tiny intervals?
Yes. The problem has been fixed by defining Q’(I) = Q(I) / log |I|
=> A small decrease in Q(I) is accepted when it yields a much larger I
16
Esperiments
A more complex sample dataset (generated by CENTRE)
Clear clusters in the central time interval vs. dispersion on the borders
17
Optimizing Q()
Find the optimal Q() by plotting values for all time intervals
The optimum corresponds to the central time interval
18
Heuristics for optimum search
Each Q() value computation requires a run of the OPTICS algorithm
Computing all O(N2) values is too expensive
Alternative approaches are needed
Preliminary tests with hill-climbing (i.e., greedy) approach:
starting
points
local
optima
global
optimum
(N=|{sub-intervals}|)
Test on the same dataset
Global optimum found in the
70,7% of runs
Avg. number of steps: 17
Avg. OPTICS runs: 49
19
Conclusions & Future works
Summary of the work
Extension
of OPTICS to a trajectory data model & distance
Definition of the Temporal Focusing problem
Definition of a clustering quality measure
(Preliminary) Tests with exhaustive & greedy optimization
Future work
Experimental validation over
broader benchmarks
Tighter integration between OPTICS and search strategy
Alternative, domain-specific definition of quality measures
20