
Predicting the Location and Time of Mobile Phone Users by Using Sequential Pattern Mining Techniques
The Computer Journal 2015
Mert Özer, Ilkcan Keles, Ismail Hakki Toroslu,
Pinar Karagoz, Hasan Davulcu
CONTENTS





Introduction
Data & Problem Definition
Proposed Methods
Evaluation & Experimental Results
Conclusion & Discussion
INTRODUCTION



Location Prediction
Sequential Pattern Mining
Motivation
Motivation

Mobile phone operators are eager to know how their users move between locations, in order:
• to build better-targeted advertisement strategies.
• to make more reasonable base station installation plans.
The same location flow data can also be used by city administrators to identify mass movement patterns around the city.
PROBLEM DEFINITIONS

Three sub-problems of the broader location prediction problem are defined:
• Next Location and Time Prediction Using Spatio-Temporal Data
• Next Location Change Prediction Using Spatial Data
• Next Location Change and Time Prediction Using Spatio-Temporal Data
Problem Definition

Next Location and Time Prediction Using Spatio-Temporal Data
• Goal: predict the location and time of the user's next action in the next time interval.
• Divide a day into time intervals.
• Cluster base stations into regions according to their locations.
Training Data Definition (Call Detail Data)





Each record has 11 attributes:
base station id#1, phone number#1, city plate#1,
base station id#2, phone number#2, city plate#2,
call time, cdr type, url, duration, call date.
The real data was obtained from one of the largest mobile phone operators in Turkey.
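The 11-attribute record layout above can be sketched as a Python dataclass. This is only an illustration: the field names are transliterations of the slide's attribute list, and all field types (and the unit of `duration`) are assumptions, since the slides give names only.

```python
from dataclasses import dataclass, fields
from datetime import date, time


@dataclass
class CallDetailRecord:
    """One call detail record with the 11 attributes listed on the slide.

    All field types are assumptions; the source lists attribute names only.
    """
    base_station_id_1: int
    phone_number_1: str
    city_plate_1: int
    base_station_id_2: int
    phone_number_2: str
    city_plate_2: int
    call_time: time
    cdr_type: str
    url: str
    duration: int  # assumed to be seconds
    call_date: date


# A made-up record, for illustration only.
example = CallDetailRecord(1042, "A", 6, 2077, "B", 6,
                           time(10, 15), "voice", "", 93, date(2000, 1, 1))
print(len(fields(CallDetailRecord)))  # 11
```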
Training Data




The data covers an area of roughly 25,000 km² with a population of around 5 million.
Almost 70% of the population is concentrated in a large urban area covering approximately one third of the region.
The data contains the log records of roughly 1 million users over a period of one month.
The whole area contains 13,281 base stations.
Method 1 - Next Location and Time Prediction Using Spatio-Temporal Data




Preprocessing
Extracting Regions
Extracting Frequent Patterns
Prediction
Method 1 - Preprocessing



Unnecessary attributes are filtered out.
The daily call records of each user are merged into one row in temporal order.
Daily sequences of <base station id, time of day> pairs are created.
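The preprocessing steps above can be sketched as follows. This is a minimal sketch, assuming each filtered row has already been reduced to a hypothetical `(phone number, call date, call time, base station id)` tuple and that only one side of the call (one phone number and its base station) is used per row; the slides do not say how the two parties of a call are handled.

```python
from collections import defaultdict
from datetime import date, time

# Hypothetical filtered row: (phone number, call date, call time, base station id)
Row = tuple[str, date, time, int]


def build_daily_sequences(rows: list[Row]) -> dict[tuple[str, date], list[tuple[int, time]]]:
    """Merge each user's records of one day into a single time-ordered
    sequence of (base station id, time of day) pairs."""
    grouped: dict[tuple[str, date], list[tuple[int, time]]] = defaultdict(list)
    for phone, day, call_time, station in rows:
        grouped[(phone, day)].append((station, call_time))
    # Put every daily sequence in temporal order.
    return {key: sorted(seq, key=lambda pair: pair[1]) for key, seq in grouped.items()}


rows = [
    ("u1", date(2000, 1, 1), time(12, 30), 95),
    ("u1", date(2000, 1, 1), time(10, 15), 91),
    ("u2", date(2000, 1, 1), time(9, 0), 45),
]
seqs = build_daily_sequences(rows)
print(seqs[("u1", date(2000, 1, 1))])
```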
Method 1 – Extracting Regions



With such a large number of base stations, it is not practical to treat each one as a separate center of movement and predict accordingly.
The 13,281 base stations are therefore clustered into 100 regions using the K-Means algorithm.
Base station ids in the preprocessed data are replaced
with the corresponding region ids in the daily
sequences.
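The region extraction step above can be sketched with a plain Lloyd's K-Means followed by the id replacement. The sketch assumes each base station has planar (projected) 2-D coordinates and uses Euclidean distance; the slides do not state which coordinates or distance the paper used. `kmeans` and `to_region_sequences` are hypothetical helper names.

```python
import math
import random


def kmeans(points, k, iterations=50, seed=0):
    """Plain Lloyd's K-Means on 2-D points; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen input points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # empty cluster keeps its centroid
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return centroids, labels


def to_region_sequences(daily_sequences, station_ids, labels):
    """Replace every base station id in the daily sequences with its region id."""
    region_of = dict(zip(station_ids, labels))
    return {key: [(region_of[s], t) for s, t in seq] for key, seq in daily_sequences.items()}


# Made-up coordinates for four stations forming two obvious groups.
stations = {101: (0.0, 0.0), 102: (0.1, 0.2), 201: (10.0, 10.0), 202: (10.2, 9.9)}
ids = list(stations)
centroids, labels = kmeans([stations[i] for i in ids], k=2)
region_seqs = to_region_sequences({"u1-day1": [(101, 41), (201, 50)]}, ids, labels)
print(labels, region_seqs)
```

Pure Python keeps the sketch dependency-free; for the real 13,281 stations and 100 clusters, a library implementation such as scikit-learn's `KMeans` would be the more practical choice.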
Extracted Regions
Method 1 – Extracting Frequent Patterns

Frequent pattern extraction works with four parameters:
• preprocessed training data
• pattern length (the length of the desired frequent patterns)
• minimum support (the minimum occurrence ratio a pattern needs in order to be identified as frequent)
• time interval length (used to discretize the time of day; defines the length of each interval)
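The time interval length parameter above can be illustrated by mapping a time of day to the index of the interval that contains it. `time_to_interval` is a hypothetical helper; intervals are assumed to start at midnight and have equal length.

```python
from datetime import time


def time_to_interval(t: time, interval_minutes: int) -> int:
    """Index of the equal-length interval (counted from midnight) containing t."""
    if not 0 < interval_minutes <= 24 * 60:
        raise ValueError("interval length must be between 1 and 1440 minutes")
    return (t.hour * 60 + t.minute) // interval_minutes


print(time_to_interval(time(10, 15), 15))  # 41
```

With a 15-minute interval length, a day is split into 24 * 60 / 15 = 96 intervals, indexed 0 to 95.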


The method is very similar to the AprioriAll algorithm.
Frequent pattern generation:
• The data is traversed to extract all candidate patterns of the desired length.
• Candidates that fall below the minimum support threshold are eliminated.
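The two generation steps above can be sketched as brute-force enumeration followed by support filtering. Assumptions: a pattern is an order-preserving subsequence (gaps allowed, as in AprioriAll) of `(region id, time interval)` items, and support is the fraction of daily sequences containing the pattern at least once. Unlike AprioriAll, this sketch does not grow candidates level by level with pruning, so it is only suitable for short sequences. `frequent_patterns` is a hypothetical name.

```python
from collections import Counter
from itertools import combinations

Item = tuple[int, int]  # (region id, time interval index)


def frequent_patterns(sequences: list[list[Item]], length: int,
                      min_support: float) -> dict[tuple[Item, ...], float]:
    """Return every pattern of the given length whose support >= min_support."""
    if not sequences:
        return {}
    counts: Counter = Counter()
    for seq in sequences:
        # Enumerate all order-preserving candidates of the desired length;
        # the set makes each pattern count at most once per daily sequence.
        counts.update(set(combinations(seq, length)))
    total = len(sequences)
    return {p: c / total for p, c in counts.items() if c / total >= min_support}


daily = [
    [(91, 40), (95, 50), (45, 65), (52, 68)],
    [(91, 40), (95, 50), (45, 65)],
    [(10, 30), (95, 50), (45, 65)],
]
result = frequent_patterns(daily, length=2, min_support=0.5)
print(result[((95, 50), (45, 65))])  # 1.0
```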
Method 1 – Sample Frequent Patterns

Three sample frequent patterns of length 4 are presented in the figure on this slide.
Method 1 - Prediction



The test sequence has length k-1, and the goal is to predict the k-th element.
This (k-1)-length sequence is searched for in the frequent pattern set.
If patterns starting with the test sequence are found, the last element of the matching pattern with the maximum support is returned as the prediction.
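The prediction rule above can be sketched as a prefix lookup over a dictionary of frequent patterns and their supports (for example, the output of a frequent pattern extraction step). `predict_next` is a hypothetical name; ties in support are broken by whichever matching pattern is encountered first, which the slides do not specify.

```python
def predict_next(test_seq, patterns):
    """Predict the k-th element from a (k-1)-length test sequence.

    `patterns` maps pattern tuples to their support. Among patterns of length k
    whose first k-1 elements equal test_seq, return the last element of the one
    with the highest support, or None when nothing matches.
    """
    prefix = tuple(test_seq)
    matches = [(support, p) for p, support in patterns.items()
               if len(p) == len(prefix) + 1 and p[:-1] == prefix]
    if not matches:
        return None
    _, best_pattern = max(matches, key=lambda m: m[0])
    return best_pattern[-1]


patterns = {
    ((1, 41), (2, 50), (3, 60)): 0.2,
    ((1, 41), (2, 50), (4, 66)): 0.5,
    ((9, 30), (2, 50), (5, 70)): 0.9,
}
print(predict_next([(1, 41), (2, 50)], patterns))  # (4, 66)
```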
Method 1 – Prediction – Time Tolerance




It is difficult to find exact matches between the current user's navigation sequence and the existing frequent sequences.
Therefore, the base station id and time interval pairs can be shifted forward or backward in time within a tolerance value.
Test instance: <(91,1015),(95,1230),(45,1630)>
Frequent pattern set:
{...,<(91,1000),(95,1245),(45,1630),(52,1700)>,...}
Time tolerance value: 15 minutes
Prediction: (52,1700)
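The tolerance example above can be reproduced with a small extension of prefix matching: ids must match exactly and in order, while each time may differ by at most the tolerance. Times are written as HHMM integers, following the slide example. The helper names and the support value attached to the pattern are hypothetical.

```python
def hhmm_to_minutes(hhmm: int) -> int:
    """Convert an HHMM integer such as 1015 into minutes after midnight."""
    return (hhmm // 100) * 60 + hhmm % 100


def matches_with_tolerance(test_seq, prefix, tolerance_min):
    """True if ids agree position by position and each pair of times
    differs by at most tolerance_min minutes."""
    return len(test_seq) == len(prefix) and all(
        a_id == b_id and abs(hhmm_to_minutes(a_t) - hhmm_to_minutes(b_t)) <= tolerance_min
        for (a_id, a_t), (b_id, b_t) in zip(test_seq, prefix))


def predict_with_tolerance(test_seq, patterns, tolerance_min):
    """Return the last element of the best-supported tolerant match, or None."""
    k = len(test_seq) + 1
    matches = [(s, p) for p, s in patterns.items()
               if len(p) == k and matches_with_tolerance(test_seq, p[:-1], tolerance_min)]
    if not matches:
        return None
    return max(matches, key=lambda m: m[0])[1][-1]


test_instance = [(91, 1015), (95, 1230), (45, 1630)]
patterns = {((91, 1000), (95, 1245), (45, 1630), (52, 1700)): 0.01}  # made-up support
print(predict_with_tolerance(test_instance, patterns, tolerance_min=15))  # (52, 1700)
```

With a 10-minute tolerance the same instance finds no match, since 10:15 and 10:00 are 15 minutes apart.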
EVALUATION & RESULTS


The results were validated with real data obtained from one of the largest mobile phone operators in Turkey.
The results are very encouraging: very high accuracy was obtained in predicting users' next location change and its time.
Evaluation Metrics
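Two metrics are introduced to evaluate the methods:

• g-accuracy:
$\text{g-accuracy} = \frac{\text{number of correct predictions}}{\text{number of all test instances}}$
• p-accuracy:
$\text{p-accuracy} = \frac{\text{number of correct predictions}}{\text{number of test instances for which a prediction is made}}$

Two different accuracy measures are needed because a queried instance may have no matching frequent pattern, in which case no prediction is made for it.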
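The two metrics can be computed side by side. This sketch assumes g-accuracy divides the number of correct predictions by all test instances, while p-accuracy divides it only by the instances for which a prediction was made (a prediction of `None` marks a query with no matching pattern). `accuracies` is a hypothetical name.

```python
def accuracies(predictions, actuals):
    """Return (g-accuracy, p-accuracy); predictions[i] is None when no
    frequent pattern matched query i, so no prediction was made."""
    if len(predictions) != len(actuals):
        raise ValueError("predictions and actuals must have the same length")
    made = [(p, a) for p, a in zip(predictions, actuals) if p is not None]
    correct = sum(1 for p, a in made if p == a)
    g_acc = correct / len(actuals) if actuals else 0.0
    p_acc = correct / len(made) if made else 0.0
    return g_acc, p_acc


preds = [(52, 68), None, (45, 65), (1, 1)]
truth = [(52, 68), (3, 3), (45, 65), (2, 2)]
g_acc, p_acc = accuracies(preds, truth)
print(round(g_acc, 3), round(p_acc, 3))  # 0.5 0.667
```

Because unanswered queries count against g-accuracy but not against p-accuracy, p-accuracy is always at least as high as g-accuracy.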
Results of Method 1
The effects of frequent pattern length and minimum support threshold are analyzed using the following parameter values:

• Pattern Length is 6
• Minimum Support is 1.00E-6
• Cluster Count is 100
• Time Interval Length is 15 min
• Time Tolerance is 75 min
Results – Pattern Length


As the pattern length increases, prediction g-accuracy decreases.
This is because there are far fewer long frequent patterns than short ones, so fewer test sequences find a matching pattern.
Results – Minimum Support


As the minimum support threshold increases, prediction g-accuracy drops.
This is because a higher minimum support threshold yields fewer frequent patterns.
CONCLUSION & DISCUSSION


This work shows that the potential location changes of mobile phone users can be predicted with quite high accuracy using sequential pattern mining techniques.
The paper also examined the effect of several factors, such as pattern length, tolerance, and the multi-prediction limit, and further improved the prediction performance.
Thank you!