Trajectory-Based Clustering

Download Report

Transcript Trajectory-Based Clustering

Lunch Talk
A Brief Overview of
Trajectory Data Mining
May 26, 2009
Jae-Gil Lee
* This work was done while I was at University of Illinois at UrbanaChampaign for the years from 2006 through 2008.
1
Outline

Introduction to Trajectory Data

Clustering

Classification

Outlier Detection
2
Trajectory Data

A trajectory is a sequence of the location and
timestamp of a moving object
Hurricanes
Vessels
Turtles
Vehicles
3
Importance of Analysis on Trajectory Data

The world becomes more and more mobile


Prevalence of mobile devices such as cell phones,
smart phones, and PDAs
Satellite, sensor, RFID, and wireless technologies
have been improved rapidly

Tremendous amounts of trajectory data of moving
objects
4
Research Impacts

Trajectory data mining has many important, realworld applications driven by the real need






Homeland security (e.g., border monitoring)
Law enforcement (e.g., video surveillance)
Weather forecast
Traffic control
Location-based service
…
5
Significance of Our Research
Previous Frameworks of Trajectory Mining
The whole trajectory is considered as an atomic unit
Our Frameworks of Trajectory Mining
A trajectory is processed after it is partitioned into
a set of line segments (trajectory partitions)
Our frameworks can take advantage of the
knowledge of partial trajectories
Open a new paradigm of trajectory mining

6
Major Categories in Data Mining
Clustering
Classification
Discover common behaviors (i.e.,
motions) from trajectory data of
moving objects
Classify moving objects based on
their trajectories and other
information
SIGMOD’07, SSTD’07
VLDB’08, Submitted to a journal
Pattern Mining
Outlier Detection
Find synchronous (i.e., chase)
patterns from trajectory data of
moving objects
Uncover suspicious behaviors
from trajectory data of moving
objects
Submitted to a journal
ICDE’08, ICDE’09
7
Example Applications

Clustering


Classification


Classify vessel types (e.g., tankers and containers) in radar
images
 Most systems use a size estimate for classifying vessels, so
small vessels are very hard to classify
Pattern Mining


Find common behaviors of animal migration from animal
movement data
Find popular sequences and travel times of tourists in sightseeing
places
Outlier Detection

Detect suspicious persons in parking garages from video
surveillance data (trajectories can be extracted from video data)
8
Movebank Project
For Animal Tracking and Photo Monitoring Data

NSF-funded project initiated in the summer of 2007




Aim: Facilitate long-term comparisons of animal movement data
making it possible to address pressing questions such as the
effects of global climate change and human-caused landscape
change
 A data repository, a live data pipeline, online mapping and
analysis tools, an educational tool, a community of
collaborators
Homepage: http://www.movebank.org/
Participants: New York State Museum, Princeton University, San
Diego Supercomputer Center, UIUC Data Mining Group
Our clustering algorithm will be open to public via web
services
9
Example: Oil Bird Study

Study animal behavior and movement with new toys: “Franzmitters”


848 MHz radio GPS, 3D acceleration
Optional: Hand tracking, power amplifier, solar power
10
Outline

Introduction to Trajectory Data

Clustering

Classification

Outlier Detection
15
Trajectory Clustering

Group similar trajectories into the same cluster
1. Whole Trajectory Clustering


Probabilistic Clustering
Density-Based Clustering: TF-OPTICS
2. Partial Trajectory Clustering

The Partition-and-Group Framework
16
Partition-and-Group Framework
(Lee et al. 07)

Existing algorithms group trajectories as a whole
 They might not be able to find similar
portions of trajectories

e.g.) The common behavior cannot be discovered
since TR1~TR5 move to totally different directions
TR3
TR4
TR5
A common sub-trajectory
TR2
TR1

The partition-and-group framework is
proposed to discover common sub-trajectories
17
Usefulness of Common Sub-Trajectories

Discovering common sub-trajectories is very
useful, especially if we have regions of special
interest


Hurricane Landfall Forecasts
Meteorologists will be interested in the common
behaviors of hurricanes near the coastline or at sea
(i.e., before landing)
Effects of Roads and Traffic on Animal Movements
Zoologists will be interested in the common behaviors
of animals near the road where the traffic rate has
been varied
18
Overall Procedure

Two phases: partitioning and grouping
(1) Partition
TR3
TR4
TR5
A set of trajectories
TR2
TR1
A representative trajectory
(2) Group
A cluster
A set of line segments
Note: a representative trajectory is a common sub-trajectory
19
Partitioning Phase

Identify the points where the behavior of a
trajectory changes rapidly  characteristic points

An optimal set of characteristic points is found by using
the minimum description length (MDL) principle
pc
pc
3
p6
p7
p8
p5
pc
p1
p4
2
p3
: characteristic point

4
p2
pc
TRi
1
: trajectory partition
Partition a trajectory at every characteristic point
20
Overview of the MDL Principle

The MDL principle has been widely used in information
theory

The MDL cost consists of two components: L(H) and
L(D|H), where H means the hypothesis, and D the data



L(H) is the length, in bits, of the description of the hypothesis
L(D|H) is the length, in bits, of the description of the data when
encoded with the help of the hypothesis
The best hypothesis H to explain D is the one that
minimizes the sum of L(H) and L(D|H)
21
MDL Formulation

Finding the optimal partitioning translates to finding the
best hypothesis using the MDL principle



H → a set of trajectory partitions, D → a trajectory
L(H) → the sum of the length of all trajectory partitions
L(D|H) → the sum of the difference between a trajectory and a set
of its trajectory partitions
p5
p3
p2
p4
p1
pc
1
pc
2
L( H )
 log 2(len ( p1 p 4))
L( D | H )  log 2(d ( p1 p 4, p1 p 2)  d  ( p1 p 4, p 2 p 3)  d  ( p1 p 4, p 3 p 4)) 
log 2(d ( p1 p 4, p1 p 2)  d ( p1 p 4, p 2 p 3)  d ( p1 p 4, p 3 p 4))

L(H) measures conciseness; L(D|H) preciseness
22
Grouping Phase (1/2)

Find the clusters of trajectory partitions using
density-based clustering (i.e., DBSCAN)

A density-connect component forms a cluster, e.g., { L1,
L2, L3, L4, L5, L6 }
MinLns = 3
L5
L3
L4
L2
L1
L6
L6
L5
L3
L1
L2
L4
23
Grouping Phase (2/2)

Describe the overall movement of the trajectory
partitions that belong to the cluster
A red line: a representative trajectory,
A blue line: an average direction vector,
Pink lines: line segments in a density-connected set
24
Sample Clustering Results
7 Clusters from Hurricane Data
570 Hurricanes (1950~2004)
A red line: a representative trajectory
25
2 Clusters from Deer Data
26
Outline

Introduction to Trajectory Data

Clustering

Classification

Outlier Detection
27
Trajectory Classification

Predict the class labels of moving objects based
on their trajectories and other features
1. Machine learning techniques


Studied mostly in pattern recognition, bioengineering,
and video surveillance
The hidden Markov model (HMM)
2. TraClass: Trajectory classification using
hierarchical region-based and trajectory-based
clustering
28
Classification

Training data


Each data element has its class label
e.g., <deer, a sequence of points>
<elk, a sequence of points>

Extract features from the training data and build a
classification model

Predict a class label of a new data element

e.g., <?, a sequence of points> → class label = deer
29
Common Characteristics of Previous Methods

Use the shapes of whole trajectories to do
classification




Encode a whole trajectory into a feature vector;
Convert a whole trajectory into a string or a sequence
of the momentary speed; or
Model a whole trajectory using the HMM
Note: Although a few methods segment
trajectories, the main purpose is to approximate
or smooth trajectories before using the HMM
30
TraClass: Trajectory Classification
Based on Clustering (Lee et al. 08)

Motivation



Discriminative features are likely to appear at parts of
trajectories, not at whole trajectories
Discriminative features appear not only as common
movement patterns, but also as regions
Solution

Extract features in a top-down fashion, first by regionbased clustering and then by trajectory-based
clustering
31
Intuition and Working Example
Container Port
Refinery
Port A
Port B
Fishery
Container Ships


Tankers
Fishing Boats
Parts of trajectories near the container port and near the refinery
enable us to distinguish between container ships and tankers
even if they share common long paths
Those in the fishery enable us to recognize fishing boats even if
they have no common path there
32
Trajectory Partitions
Region-Based Clustering
Region-Based Clustering
Trajectory-Based Clustering
Features
Trajectory-Based Clustering
Class-Conscious Trajectory Partitioning
1. Trajectories are partitioned based on their shapes as in
the partition-and-group framework
2. Trajectory partitions are further partitioned by the class
labels

The real interest here is to guarantee that trajectory partitions do
not span the class boundaries
Non-discriminative
Discriminative
Class A
Class B
Additional partitioning points
34
Region-Based Clustering

Objective: Discover regions that have trajectories mostly
of one class regardless of their movement patterns

Algorithm: Find a better partitioning alternately for the X
and Y axes as long as the MDL cost decreases

The MDL cost is formulated to achieve both homogeneity and
conciseness
35
Trajectory-Based Clustering

Objective: Discover sub-trajectories that indicate
common movement patterns of each class

Algorithm: Extend the partition-and-group
framework for classification purposes so that the
class labels are incorporated into trajectory
clustering

If an ε-neighborhood contains trajectory partitions
mostly of the same class, it is used for clustering;
otherwise, it is discarded immediately
36
Selection of Trajectory-Based Clusters

After trajectory-based clusters are found, highly
discriminative clusters are selected for effective
classification


If the average distance from a specific cluster to other
clusters of different classes is high, the discriminative
power of the cluster is high
e.g.)
C1
C2
Class A
Class B
C1 is more discriminative than C2
37
Overall Procedure of TraClass
1. Partition trajectories
2. Perform region-based clustering
3. Perform trajectory-based clustering
4. Select discriminative trajectory-based clusters
5. Convert each trajectory into a feature vector


Each feature is either a region-based cluster or a
trajectory-based cluster
The i-th entry of a feature vector is the frequency that
the i-th feature occurs in the trajectory
6. Feed feature vectors to the SVM
38
Classification Results

Datasets




Methods



Animal: Three classes ← three species: elk, deer, and cattle
Vessel: Two classes ← two vessels
Hurricane: Two classes ← category 2 and 3 hurricanes
TB-ONLY: Perform trajectory-based clustering only
RB-TB: Perform both types of clustering
Results
39
Extracted Features
Features:
10 Region-Based Clusters
37 Trajectory-Based Clusters
Data (Three Classes)
Accuracy = 83.3%
40
Outline

Introduction to Trajectory Data

Clustering

Classification

Outlier Detection
41
Trajectory Outlier Detection

Detect trajectory outliers that are grossly different
from or inconsistent with the remaining set of
trajectories
1. Whole Trajectory Outlier Detection


A unsupervised method
A supervised method based on classification
2. Integration with multi-dimensional information
3. Partial Trajectory Outlier Detection

The Partition-and-Detect Framework
42
Sample Trajectory Outliers

Detect outliers from person trajectories in a room
The entire data set
The outliers only
Our framework can detect the person whose behavior is only
partially unusual, whereas previous frameworks cannot
43
Partition-and-Detect Framework
(Lee et al. 08)

Existing algorithms compare trajectories as a
whole  They might not be able to detect
outlying portions of trajectories

e.g.) TR3 is not detected as an outlier since its overall
behavior is similar to those of neighboring trajectories
TR5
TR
TR3 4
TR
TR1 2

An outlying sub-trajectory
The partition-and-detect framework is
proposed to detect outlying sub-trajectories
44
Usefulness of Outlying Sub-Trajectories

Example: Sudden changes in hurricane’s path
Usual trajectories
Sudden change

Since Hurricane Charley (Aug. 2004) was expected to
hit the land closer to Tampa, many residents around
Punta Gorda, Fla., were caught unprepared
45
Overall Procedure

Two phases: partitioning and detection
TR5
TR4
TR3
TR
TR1 2
(1) Partition
A set of trajectories
A set of trajectory partitions
(2) Detect
TR3
An outlier
Outlying trajectory partitions
46
Simple Trajectory Partitioning

A trajectory is partitioned at a base unit: the
smallest meaningful unit of a trajectory in a given
application

e.g.) The base unit can be every single point
A trajectory TRout
An outlying t-partition
A t-partition
Neighboring Trajectories
Pros: High detection quality in general
Cons: Poor performance due to a large number of tpartitions  Propose a two-level partitioning strategy
47
Two-Level Trajectory Partitioning

Objective



Achieves much higher performance than the simple
strategy
Obtains the same result as that of the simple strategy;
i.e., does not lose the quality of the result
Basic idea
1. Partition a trajectory in coarse granularity first
2. Partition a coarse t-partition in fine granularity only
when necessary

Main benefit

Narrows the search space that needs to be inspected in
fine granularity  Many portions of trajectories can be
pruned early on
48
Intuition to Two-Level Trajectory Partitioning

If the distance between coarse t-partitions is very
large (or small), the distances between their fine
t-partitions are also very large (or small)
TRi
Coarse-Granularity Partitioning
Fine-Granularity Partitioning
TRj

The lower and upper bounds for fine t-partitions
are derived in the paper
49
Outlier Detection

Once trajectories are partitioned, trajectory outliers are
detected based on both distance and density

An outlying t-partition is defined as
Not close
Close
TRi
≤ 1‒p
Li
Li is an outlying t-partition

> 1‒p
TRi
Li
Li is not an outlying t-partition
A trajectory is an outlier if it contains a sufficient amount of
outlying t-partitions
50
Incorporation of Density

The number of close trajectories is adjusted by the
density of a t-partition
T-Partitions in dense regions are favored!

Dense region  Decreased, Sparse region  Increased
51
Sample Detection Results
13 Outliers from Hurricane Data
52
3 Outliers from Elk Data
53
Conclusions

Trajectory mining becomes increasingly important
as the world becomes more and more mobile

We have established a new paradigm of
trajectory mining that enables us to extract
knowledge of partial trajectories




Trajectory Clustering
Trajectory Outlier Detection
Trajectory Classification
Trajectory Pattern Mining
54