TrajectoryPatternMining - Georgia Institute of Technology

Download Report

Transcript TrajectoryPatternMining - Georgia Institute of Technology

Trajectory Pattern Mining
Fosca Giannotti
Dino Pedreschi
Mirco Nanni
Fabio Pinelli
Chris Andrews
Georgia Institute of Technology
B.S. Computer Science
5th Year Undergraduate
Concepts
 Analyze trajectory of moving objects
A
3mins
B
5mins
C
10mins
D
 Trajectory Patterns – description of frequent behavior relating
to space and time
 Frequent Sequence Pattern (FSP)
 Determine if trajectory sequence matches any trajectory patterns
in a given set
 Study different methods of preparing a Temporally Annotated
Sequence (TAS) for data mining
Trajectory Patterns (T-Patterns)
 Trajectory Pattern
 sequence of time-stamped locations
 S = { ( x0, y0, t0 ) , … , ( xn, yn, tn ) }
 Temporal Annotation
 set of times relating to trajectories
 A = { a1 , a2, … an }
 Temporally Annotated Sequence
 (S,A) = (x0,y0) a1 (x1,y1) a2 … an (xn,yn)
Neighborhood Function
 Neighborhood Function N : R2 -> P (R2)
 Calculates spatial containment of regions
 Input point to find enclosing Region of Interest
 Defines the necessary proximity to fall into a region
 Parameters:

e – radius or necessary proximity of points
Regions of Interest (RoI)
 Performing these comparisons on points is costly
 A simple preprocessing step can alleviate this
 Utilize the Neighborhood Function NR()
 Translate each set of points into regions
 Timestamp is selected from when the trajectory first entered
the region
 Now compare sequence of regions and timestamps using the
TAS mining algorithm presented in [2].
Static RoI
 Neighborhood Function NR()
 Initially receives set of R disjoint spatial regions
 R regions are predefined based on prior knowledge
 Each represents relevant place for processing
 Static NR() simplifies problem of mining patterns
 Sequence of points become grouped
 Result: sequence of regions
 (x,y) a1 (x’,y’) becomes X a1 Y
Dynamic RoI
 Data sets often do not possess predetermined regions
 Instead need to formulate regions based on criteria of
density of the trajectories
 Preprocessing now must determine set R of popular
regions from the data set
 R is now the set of Region of Interests from used by the
Neighborhood Function NR() to translate points into
Regions of Interest
Popular Regions
Grid G of n x m cells
Each cell with density G(i,j)
Density Threshold d
Set R of popular regions
 Each region in R forms rectangular region
 Sets in R are pair wise distinct
 Dense cells always contained in some region in R
 All regions in R have average density above d
 All regions in R cannot expand without their average
density decreasing below d
Grid Density Preparation
 Split space into n x m grid with small cells
 Increment cells where trajectory passes
 Neighborhood Function NR() determines which surrounding cells
 Regression - increment continuously along trajectory
Popular Regions Algorithm
 Algorithm:
PopularRegions( G, d )
 Complexity: O ( |G| log |G| )
 Iteratively consider each dense cell
 For each:
 Expands in all four directions
 Select expansion that maximizes density
 Repeat until expansion would decrease below density
threshold
Results
Evaluating the T-Patterns
 Compute density of each cell of grid
 Compute set of RoI’s by determining Popular Regions
 Translate the input trajectories into sequence of RoI’s
and timestamps for the transitions
 Input the trajectories and times into TAS mining
algorithm[2]
Experiments
 GPS Data
 Fleet of 273 trucks in Athens, Greece
 112,203 total points recorded
 Running both static & dynamic pattern algorithms
 Various parameter settings
 Performance Analysis
 Synthetic Data by CENTRE synthesizer
 50% random & 50% predetermined
Pattern Mining Results
Static found:
Dynamic found:
A t1 B t2
A t1 B’ t2
B
B’’
Execution Time Results
• Increase linearly with increasing
number of input trajectories (both
algorithms)
• Grow when density threshold
decreases
• Static performs better with extreme
threshold
• Static does not perform with middle
threshold
Additional Results
 Increasing radius of spatial neighborhood obtains irregular
performance and large values lead to poor execution times
 Changing time tolerance (t) obtains results similar to
TAS’s
 Increasing the number of points in each trajectory causes
linear growth of execution times
Works Cited
 [1] Trajectory pattern mining, Fosca Giannotti, Mirco
Nanni, Fabio Pinelli, Dino Pedreschi, Proceedings of the
13th ACM SIGKDD international conference on
Knowledge discovery and data mining KDD. ACM,
2007.
 [2] Efficient Mining of Sequences with Temporal
Annotations. F. Giannotti, M. Nanni, and D. Pedreschi. In
Proc. SIAM Conference on Data Mining, pages 346–357.
SIAM, 2006.