Trajectory-Based Clustering
Download
Report
Transcript Trajectory-Based Clustering
Lunch Talk
A Brief Overview of
Trajectory Data Mining
May 26, 2009
Jae-Gil Lee
* This work was done while I was at University of Illinois at UrbanaChampaign for the years from 2006 through 2008.
1
Outline
Introduction to Trajectory Data
Clustering
Classification
Outlier Detection
2
Trajectory Data
A trajectory is a sequence of the location and
timestamp of a moving object
Hurricanes
Vessels
Turtles
Vehicles
3
Importance of Analysis on Trajectory Data
The world becomes more and more mobile
Prevalence of mobile devices such as cell phones,
smart phones, and PDAs
Satellite, sensor, RFID, and wireless technologies
have been improved rapidly
Tremendous amounts of trajectory data of moving
objects
4
Research Impacts
Trajectory data mining has many important, realworld applications driven by the real need
Homeland security (e.g., border monitoring)
Law enforcement (e.g., video surveillance)
Weather forecast
Traffic control
Location-based service
…
5
Significance of Our Research
Previous Frameworks of Trajectory Mining
The whole trajectory is considered as an atomic unit
Our Frameworks of Trajectory Mining
A trajectory is processed after it is partitioned into
a set of line segments (trajectory partitions)
Our frameworks can take advantage of the
knowledge of partial trajectories
Open a new paradigm of trajectory mining
6
Major Categories in Data Mining
Clustering
Classification
Discover common behaviors (i.e.,
motions) from trajectory data of
moving objects
Classify moving objects based on
their trajectories and other
information
SIGMOD’07, SSTD’07
VLDB’08, Submitted to a journal
Pattern Mining
Outlier Detection
Find synchronous (i.e., chase)
patterns from trajectory data of
moving objects
Uncover suspicious behaviors
from trajectory data of moving
objects
Submitted to a journal
ICDE’08, ICDE’09
7
Example Applications
Clustering
Classification
Classify vessel types (e.g., tankers and containers) in radar
images
Most systems use a size estimate for classifying vessels, so
small vessels are very hard to classify
Pattern Mining
Find common behaviors of animal migration from animal
movement data
Find popular sequences and travel times of tourists in sightseeing
places
Outlier Detection
Detect suspicious persons in parking garages from video
surveillance data (trajectories can be extracted from video data)
8
Movebank Project
For Animal Tracking and Photo Monitoring Data
NSF-funded project initiated in the summer of 2007
Aim: Facilitate long-term comparisons of animal movement data
making it possible to address pressing questions such as the
effects of global climate change and human-caused landscape
change
A data repository, a live data pipeline, online mapping and
analysis tools, an educational tool, a community of
collaborators
Homepage: http://www.movebank.org/
Participants: New York State Museum, Princeton University, San
Diego Supercomputer Center, UIUC Data Mining Group
Our clustering algorithm will be open to public via web
services
9
Example: Oil Bird Study
Study animal behavior and movement with new toys: “Franzmitters”
848 MHz radio GPS, 3D acceleration
Optional: Hand tracking, power amplifier, solar power
10
Outline
Introduction to Trajectory Data
Clustering
Classification
Outlier Detection
15
Trajectory Clustering
Group similar trajectories into the same cluster
1. Whole Trajectory Clustering
Probabilistic Clustering
Density-Based Clustering: TF-OPTICS
2. Partial Trajectory Clustering
The Partition-and-Group Framework
16
Partition-and-Group Framework
(Lee et al. 07)
Existing algorithms group trajectories as a whole
They might not be able to find similar
portions of trajectories
e.g.) The common behavior cannot be discovered
since TR1~TR5 move to totally different directions
TR3
TR4
TR5
A common sub-trajectory
TR2
TR1
The partition-and-group framework is
proposed to discover common sub-trajectories
17
Usefulness of Common Sub-Trajectories
Discovering common sub-trajectories is very
useful, especially if we have regions of special
interest
Hurricane Landfall Forecasts
Meteorologists will be interested in the common
behaviors of hurricanes near the coastline or at sea
(i.e., before landing)
Effects of Roads and Traffic on Animal Movements
Zoologists will be interested in the common behaviors
of animals near the road where the traffic rate has
been varied
18
Overall Procedure
Two phases: partitioning and grouping
(1) Partition
TR3
TR4
TR5
A set of trajectories
TR2
TR1
A representative trajectory
(2) Group
A cluster
A set of line segments
Note: a representative trajectory is a common sub-trajectory
19
Partitioning Phase
Identify the points where the behavior of a
trajectory changes rapidly characteristic points
An optimal set of characteristic points is found by using
the minimum description length (MDL) principle
pc
pc
3
p6
p7
p8
p5
pc
p1
p4
2
p3
: characteristic point
4
p2
pc
TRi
1
: trajectory partition
Partition a trajectory at every characteristic point
20
Overview of the MDL Principle
The MDL principle has been widely used in information
theory
The MDL cost consists of two components: L(H) and
L(D|H), where H means the hypothesis, and D the data
L(H) is the length, in bits, of the description of the hypothesis
L(D|H) is the length, in bits, of the description of the data when
encoded with the help of the hypothesis
The best hypothesis H to explain D is the one that
minimizes the sum of L(H) and L(D|H)
21
MDL Formulation
Finding the optimal partitioning translates to finding the
best hypothesis using the MDL principle
H → a set of trajectory partitions, D → a trajectory
L(H) → the sum of the length of all trajectory partitions
L(D|H) → the sum of the difference between a trajectory and a set
of its trajectory partitions
p5
p3
p2
p4
p1
pc
1
pc
2
L( H )
log 2(len ( p1 p 4))
L( D | H ) log 2(d ( p1 p 4, p1 p 2) d ( p1 p 4, p 2 p 3) d ( p1 p 4, p 3 p 4))
log 2(d ( p1 p 4, p1 p 2) d ( p1 p 4, p 2 p 3) d ( p1 p 4, p 3 p 4))
L(H) measures conciseness; L(D|H) preciseness
22
Grouping Phase (1/2)
Find the clusters of trajectory partitions using
density-based clustering (i.e., DBSCAN)
A density-connect component forms a cluster, e.g., { L1,
L2, L3, L4, L5, L6 }
MinLns = 3
L5
L3
L4
L2
L1
L6
L6
L5
L3
L1
L2
L4
23
Grouping Phase (2/2)
Describe the overall movement of the trajectory
partitions that belong to the cluster
A red line: a representative trajectory,
A blue line: an average direction vector,
Pink lines: line segments in a density-connected set
24
Sample Clustering Results
7 Clusters from Hurricane Data
570 Hurricanes (1950~2004)
A red line: a representative trajectory
25
2 Clusters from Deer Data
26
Outline
Introduction to Trajectory Data
Clustering
Classification
Outlier Detection
27
Trajectory Classification
Predict the class labels of moving objects based
on their trajectories and other features
1. Machine learning techniques
Studied mostly in pattern recognition, bioengineering,
and video surveillance
The hidden Markov model (HMM)
2. TraClass: Trajectory classification using
hierarchical region-based and trajectory-based
clustering
28
Classification
Training data
Each data element has its class label
e.g., <deer, a sequence of points>
<elk, a sequence of points>
Extract features from the training data and build a
classification model
Predict a class label of a new data element
e.g., <?, a sequence of points> → class label = deer
29
Common Characteristics of Previous Methods
Use the shapes of whole trajectories to do
classification
Encode a whole trajectory into a feature vector;
Convert a whole trajectory into a string or a sequence
of the momentary speed; or
Model a whole trajectory using the HMM
Note: Although a few methods segment
trajectories, the main purpose is to approximate
or smooth trajectories before using the HMM
30
TraClass: Trajectory Classification
Based on Clustering (Lee et al. 08)
Motivation
Discriminative features are likely to appear at parts of
trajectories, not at whole trajectories
Discriminative features appear not only as common
movement patterns, but also as regions
Solution
Extract features in a top-down fashion, first by regionbased clustering and then by trajectory-based
clustering
31
Intuition and Working Example
Container Port
Refinery
Port A
Port B
Fishery
Container Ships
Tankers
Fishing Boats
Parts of trajectories near the container port and near the refinery
enable us to distinguish between container ships and tankers
even if they share common long paths
Those in the fishery enable us to recognize fishing boats even if
they have no common path there
32
Trajectory Partitions
Region-Based Clustering
Region-Based Clustering
Trajectory-Based Clustering
Features
Trajectory-Based Clustering
Class-Conscious Trajectory Partitioning
1. Trajectories are partitioned based on their shapes as in
the partition-and-group framework
2. Trajectory partitions are further partitioned by the class
labels
The real interest here is to guarantee that trajectory partitions do
not span the class boundaries
Non-discriminative
Discriminative
Class A
Class B
Additional partitioning points
34
Region-Based Clustering
Objective: Discover regions that have trajectories mostly
of one class regardless of their movement patterns
Algorithm: Find a better partitioning alternately for the X
and Y axes as long as the MDL cost decreases
The MDL cost is formulated to achieve both homogeneity and
conciseness
35
Trajectory-Based Clustering
Objective: Discover sub-trajectories that indicate
common movement patterns of each class
Algorithm: Extend the partition-and-group
framework for classification purposes so that the
class labels are incorporated into trajectory
clustering
If an ε-neighborhood contains trajectory partitions
mostly of the same class, it is used for clustering;
otherwise, it is discarded immediately
36
Selection of Trajectory-Based Clusters
After trajectory-based clusters are found, highly
discriminative clusters are selected for effective
classification
If the average distance from a specific cluster to other
clusters of different classes is high, the discriminative
power of the cluster is high
e.g.)
C1
C2
Class A
Class B
C1 is more discriminative than C2
37
Overall Procedure of TraClass
1. Partition trajectories
2. Perform region-based clustering
3. Perform trajectory-based clustering
4. Select discriminative trajectory-based clusters
5. Convert each trajectory into a feature vector
Each feature is either a region-based cluster or a
trajectory-based cluster
The i-th entry of a feature vector is the frequency that
the i-th feature occurs in the trajectory
6. Feed feature vectors to the SVM
38
Classification Results
Datasets
Methods
Animal: Three classes ← three species: elk, deer, and cattle
Vessel: Two classes ← two vessels
Hurricane: Two classes ← category 2 and 3 hurricanes
TB-ONLY: Perform trajectory-based clustering only
RB-TB: Perform both types of clustering
Results
39
Extracted Features
Features:
10 Region-Based Clusters
37 Trajectory-Based Clusters
Data (Three Classes)
Accuracy = 83.3%
40
Outline
Introduction to Trajectory Data
Clustering
Classification
Outlier Detection
41
Trajectory Outlier Detection
Detect trajectory outliers that are grossly different
from or inconsistent with the remaining set of
trajectories
1. Whole Trajectory Outlier Detection
A unsupervised method
A supervised method based on classification
2. Integration with multi-dimensional information
3. Partial Trajectory Outlier Detection
The Partition-and-Detect Framework
42
Sample Trajectory Outliers
Detect outliers from person trajectories in a room
The entire data set
The outliers only
Our framework can detect the person whose behavior is only
partially unusual, whereas previous frameworks cannot
43
Partition-and-Detect Framework
(Lee et al. 08)
Existing algorithms compare trajectories as a
whole They might not be able to detect
outlying portions of trajectories
e.g.) TR3 is not detected as an outlier since its overall
behavior is similar to those of neighboring trajectories
TR5
TR
TR3 4
TR
TR1 2
An outlying sub-trajectory
The partition-and-detect framework is
proposed to detect outlying sub-trajectories
44
Usefulness of Outlying Sub-Trajectories
Example: Sudden changes in hurricane’s path
Usual trajectories
Sudden change
Since Hurricane Charley (Aug. 2004) was expected to
hit the land closer to Tampa, many residents around
Punta Gorda, Fla., were caught unprepared
45
Overall Procedure
Two phases: partitioning and detection
TR5
TR4
TR3
TR
TR1 2
(1) Partition
A set of trajectories
A set of trajectory partitions
(2) Detect
TR3
An outlier
Outlying trajectory partitions
46
Simple Trajectory Partitioning
A trajectory is partitioned at a base unit: the
smallest meaningful unit of a trajectory in a given
application
e.g.) The base unit can be every single point
A trajectory TRout
An outlying t-partition
A t-partition
Neighboring Trajectories
Pros: High detection quality in general
Cons: Poor performance due to a large number of tpartitions Propose a two-level partitioning strategy
47
Two-Level Trajectory Partitioning
Objective
Achieves much higher performance than the simple
strategy
Obtains the same result as that of the simple strategy;
i.e., does not lose the quality of the result
Basic idea
1. Partition a trajectory in coarse granularity first
2. Partition a coarse t-partition in fine granularity only
when necessary
Main benefit
Narrows the search space that needs to be inspected in
fine granularity Many portions of trajectories can be
pruned early on
48
Intuition to Two-Level Trajectory Partitioning
If the distance between coarse t-partitions is very
large (or small), the distances between their fine
t-partitions are also very large (or small)
TRi
Coarse-Granularity Partitioning
Fine-Granularity Partitioning
TRj
The lower and upper bounds for fine t-partitions
are derived in the paper
49
Outlier Detection
Once trajectories are partitioned, trajectory outliers are
detected based on both distance and density
An outlying t-partition is defined as
Not close
Close
TRi
≤ 1‒p
Li
Li is an outlying t-partition
> 1‒p
TRi
Li
Li is not an outlying t-partition
A trajectory is an outlier if it contains a sufficient amount of
outlying t-partitions
50
Incorporation of Density
The number of close trajectories is adjusted by the
density of a t-partition
T-Partitions in dense regions are favored!
Dense region Decreased, Sparse region Increased
51
Sample Detection Results
13 Outliers from Hurricane Data
52
3 Outliers from Elk Data
53
Conclusions
Trajectory mining becomes increasingly important
as the world becomes more and more mobile
We have established a new paradigm of
trajectory mining that enables us to extract
knowledge of partial trajectories
Trajectory Clustering
Trajectory Outlier Detection
Trajectory Classification
Trajectory Pattern Mining
54