A Data Mining Approach for Location Prediction in Mobile
Download
Report
Transcript A Data Mining Approach for Location Prediction in Mobile
A DATA MINING APPROACH FOR
LOCATION PREDICTION IN
MOBILE ENVIRONMENTS*
by
Gökhan Yavaş
Feb 22, 2005
*: To appear in Data and Knowledge Engineering, Elsevier
1
Outline
Introduction
Background Work
Mobility Prediction Based On Mobility Rules
Experimental Results
Conclusion
Future Work
2
Introduction
Personal Communication Systems are becoming more
popular
Dynamic relocation of users gives rise to the problem of
Mobility Management
Methods for storing and updating the location information
of users
Mobility Prediction: the prediction of a user’s next intercell movement
3
Motivation
Predicted movement can be used for effectively allocating resources
instead of blindly allocating excessive resources
Benefit to the broadcast program generation [1], data items can be
broadcast to the predicted cell
Location prediction is crucial in processing of location dependent
queries [2], since answer depends on the location of user
Queries depending on future positions can be answered by effective
location prediction
[1] Y. Saygin and O. Ulusoy. Exploiting Data Mining Techniques for Broadcasting Data in Mobile Computing Environments. IEEE Transactions on
Knowledge and Data Engineering, 14(6): 1387-1399, 2002. [2] R. Agrawal and R. Srikant. Mining sequential patterns. In Proceedings of the IEEE
Conference on Data Engineering (ICDE’95), pages 3–14, 1995.
[2] G. Gok and O. Ulusoy. Transmission of Continuous Query Results in Mobile Computing Systems. Information Sciences, 125(1-4): 37-63, 2000
4
Network Model
PCS network partitioned into smaller areas called cells
Each cell has a Base Station (BS), used for broadcasting and
receiving information
Home Location Register (HLR): database which keeps the inter-cell
movement history of user
Visitor Location Register (VLR): each BS has a database which
keeps the profiles of the users located in this cell.
5
Problem Definition
It is possible for us to get the movement history of a mobile user
from HLR of a user
Movement trajectories in the form of T=<(id1, t1) ... (idk, tk)>
Partitioned into subsequences, named user actual paths, UAPs
UAPs have the form of U=<c1, c2, ..., cn>
We mine UAPs to find user mobility patterns, UMPs
6
Related Work
The roots of our method go back to the Apriori algorithm [3]
Association rule mining
Sequential pattern mining problem [4]
Ordering of the items in an itemset must be taken into consideration
Not appropriate for our domain, because does not take into account the
network topology
[3] R. Agrawal, R. Srikant, Fast Algorithms for mining association rules. In Proceedings of Very Large Databases Conference
(VLDB’94), pages 487-499, 1994.
[4] R. Agrawal and R. Srikant. Mining sequential patterns. In Proceedings of the IEEE Conference on Data Engineering (ICDE’95),
pages 3–14, 1995.
7
Mobility Prediction Based On Mobility Rules
1.
Mining UMPs from Graph Traversals: Movement data mined for
discovering regularities (UMP) in inter-cell movements
2.
Generation of Mobility Rules: Mobility rules are extracted from
UMPs
3.
Mobility Prediction: Prediction of next inter-cell movement based
on mobility rules
8
Mining UMPs from Graph Traversals
An example coverage region and corresponding graph G
Vertices of G: the cells in the coverage region
Edges of G: if two cells, A and B, are neighbors in the coverage
region, then there are two edges in G, A B and B A
9
Mining UMPs from Graph Traversals
Subsequence definition:
Assume we have two UAPs,
A = <a1, a2, ... , an> and
B = <b1, b2, ... , bm>. B is a
subsequence of A, iff all cells
in B also exist in A while
keeping their order in B
Example: A=<c3, c4, c0, c1, c6,
c5>, then B=<c4, c5> is a
length-2 subsequence of A. In
other words, B is contained by
A
10
Mining UMPs from Graph Traversals
Every candidate has a count value that keeps the support given to
this candidate by UAPs
This is the point our work extends algorithm in [5, 6]
Method in [5, 6] increments the count value of a candidate by 1 if
this candidate is contained by a UAP
Unfair !!!
Treats in the same way
a highly corrupted candidate pattern
a slightly corrupted (or even not corrupted at all) candidate pattern
[5] A. Nanopoulos, D. Katsaros, Y. Manolopoulos, A Data Mining Algorithm for Generalized Web Prefetching, IEEE Transactions on
Knowledge and Data Engineering, 15(5): 1155-1169, 2003.
[6] A. Nanopoulos, D. Katsaros, Y. Manolopoulos, Effective Prediction of Web User Accesses: A Data Mining Approach, In
Proceedings of the WebKDD Workshop (WebKDD’01), 2001.
11
Mining UMPs from Graph Traversals
Should consider the degree of corruption for the mobile motion
prediction context
Support assigned to a candidate pattern B by a UAP A (i.e.,
suppInc)
12
Mining UMPs from Graph Traversals
Define totDist value by means of the notion of string alignment
Definition 2.1: If x and y are each single character or space, then
(x, y) denotes the score of aligning x and y. In our case, the scoring
function is defined as follows:
13
Mining UMPs from Graph Traversals
Definition 2.3: Let A be a UAP and B be a pattern. A containment
alignment X' maps A and B into strings A‘ and B‘ where:
|A'| = |B'|
B is contained by A, and
Removal of all spaces from A' and B' leaves A and B
Total score of the alignment X':
14
Mining UMPs from Graph Traversals
For any two patterns, there may be more than one alignment
Ex: Consider A=<c3, c4, c0, c1, c6, c5, c8, c5>, B=<c4, c5>
15
Mining UMPs from Graph Traversals
Definition 2.4: An optimal containment alignment of UAP A and
pattern B is one that has the minimum possible value for these two
patterns
Total score of an alignment: sum of penalties
An optimal alignment should have the minimum number of mismatches,
which means the minimum score of alignment
totDist(A, B) = Score of the optimal alignment for the UAP A and
pattern B
16
Mining UMPs from Graph Traversals
Example: Given UAP A=<c3, c4, c0, c1, c6, c5, c8> and pattern B=<c4,
c5 , c8 > , optimal containment alignment for these:
Score of the alignment = totDist (A, B) = 3
Support assigned to the candidate pattern B by the UAP A:
17
Mining UMPs from Graph Traversals
The quality of the patterns will improve since this method is a more
accurate way of support counting
Degree of corruption taken into account
This will give rise to more accurate mobility rules
Resulting in the prediction accuracy improved compared to the
accuracy by using the rules that are generated with the former way
of support counting
Application of different methods for totDist will affect the quality of
rules
18
Mining UMPs from Graph Traversals
Candidate Generation:
Example: C = <c1, c2, ..., ck>
N+(ck) : the set of all nodes in G, which have an incoming edge from
the cell ck
A cell from N+(ck) is attached to the end of C to generate C'
Add C' to the set of Candidates
19
Mining UMPs from Graph Traversals
Apriori Pruning can be used?
NO due to the nature of our new support counting method
Support is no longer monotonically decreasing with the increasing
size of the pattern
A length-(k-1) subpattern S of a length-k pattern P doesn’t need to
be large even if P is large
Ex: UAP <1, 6, 0, 3, 2>, P1 = <1, 0, 2> and its subpattern P2 = <1, 2>
UAP assigns a support
1 to P and
1
(1 2 )
1 to P2
(1 3)
20
Mining UMPs from Graph Traversals
Example: Use suppmin= 1.33
Database of UAPs
Set of all large Patterns (UMPs)
21
Generation of Mobility Rules
Extract rules from the UMPs
For a rule:
R: < c1, c2, …, ci-1 > < ci, ci+1, ... ck >
Head
Tail
A confidence value is calculated:
22
Generation of Mobility Rules
The rules which have confidence higher than confmin are selected
All possible mobility rules for the UMPs given in previous example
are:
23
Mobility Prediction
User has followed a path
P=< c1, c2, …, ci-1 > up to now
Find the rules whose head parts are contained in P and the last cell
in their head is ci-1
Store the first cell of tail along with the (confidence + support) of rule
as a tuple
Sort these tuples w.r.t. the (confidence + support) values in
descending order
Select the first m tuples
24
Mobility Prediction
Example: Assume that the current trajectory of the user is P=<2, 3,
0, 4>
Matching Rules:
<4> <0>
<4> < 5>
<3, 4> <0>
< 3, 4 > <5>
Sorted tuple array is: TupleArray = [(5, 85.83), (0, 76.5)]
If m=1, then Predicted Cells Set = {5}
If m=2, then Predicted Cells Set = {5, 0}
25
Simulation Design
Mobile users travel on a 15 by 15 hexagonal shaped network
To generate UAPs, first UMPs are generated
UMPs are taken as a random walk over the network
Two types of UAPs:
Outliers: a random walk over the network
Non-outliers: those which follow a UMP
o (outlier percentage): ratio of the number of outliers to the number
of non-outliers
26
Simulation Design
Corruption mechanism: insert random cells between the consecutive
cells of an UMP
c (corruption ratio): denotes the ratio of the number of such random
cells to the number of cells in the corresponding UMP
Three possible outcomes of a prediction
Correct prediction
Incorrect prediction
No prediction
Two performance measures:
27
Algorithms Used for Comparison
Mobility Prediction Based on Transition Matrix (TM)
A cell-to-cell transition matrix formed
Select the m most probable cells from the transition matrix
Ignorant Prediction
Randomly select the m neighboring cells of the current cell
28
Impact of m on Precision and Recall
Decreasing precision for both
0.7
0.6
our algorithm and TM
Precision
Increasing probability of
0.5
UMP-Based
Ignorant
TM
0.4
0.3
making some incorrect
0.2
0.1
predictions as m increases
1
2
3
4
5
6
m
1
Increasing recall for all
0.9
0.8
algorithms, but more significant
Recall
increase for TM and Ignorant
0.7
UMP-Based
Ignorant
TM
0.6
0.5
0.4
prediction
0.3
0.2
1
2
3
4
5
6
m
29
Impact of m on Precision and Recall
0.7
Setting m as small as possible
0.6
is convenient for our method
Precision
The increase rate in the recall
0.5
0.3
value from m values 1 to 2 is
0.2
0.1
maximum for TM
1
2
3
4
5
6
m
m ≥ 3 would cause excessive
1
0.9
0.8
network resource waste
0.7
Recall
Thus choose m = 2
UMP-Based
Ignorant
TM
0.4
UMP-Based
Ignorant
TM
0.6
0.5
0.4
0.3
0.2
1
2
3
4
5
6
m
30
Impact of Suppmin
Reduced recall and precision
value leads to a decrease in
0.6375
0.625
0.6125
Precision
The increase in the suppmin
0.65
0.6
0.5875
0.575
the number of mined mobility
0.5625
0.55
rules
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
Min Support
0.51
Number of correct predictions
0.505
0.5
0.495
is reduced
Recall
Choose suppmin=0.1
0.49
0.485
0.48
0.475
0.47
0.465
0.46
0.455
0.45
0.1
0.2
0.3
0.4
0.5
0.6
Min Support
0.7
0.8
0.9
1
31
Impact of Confmin
Increasing precision
1
Higher quality rules with the
increasing confmin
Leading to a higher decrease
rate in number of predictions
when compared to the
decrease rate in number of
correct predictions
Precision
0.9
0.8
0.7
0.6
0.5
50
60
70
80
90
100
Min Conf
Decreasing recall
Choose confmin=80
0.5
0.4
Recall
The number of mined rules is
reduced leading to a decrease
in the number of correct
predictions
0.6
0.3
0.2
0.1
0
50
60
70
80
Min Conf
90
100
32
Impact of Corruption Factor
Decreasing precision and
recall for our method and TM
1
0.9
0.8
For all c, better precision than
Precision
0.7
UMP-Based
Ignorant
TM
0.6
0.5
0.4
TM but worse recall than TM
0.3
0.2
For our method, as c
0.1
0
0.4
0.6
0.8
corruption factor
increases:
0.9
0.85
0.8
0.75
0.7
Recall
The number of mined mobility
rules decreases
No prediction in some cases
because no matching rules due to
the corrupted UAPs
0.2
0.65
UMP-Based
Ignorant
TM
0.6
0.55
0.5
0.45
0.4
0.35
0.3
0
0.2
0.4
corruption factor
0.6
0.8
33
Impact of Outlier Percentage
0.8
Both performance measures
methods
0.6
Precision
not affected significantly for all
0.7
0.5
UMP-Based
Ignorant
TM
0.4
0.3
Rules extracted from outlier
0.2
0.1
20
30
40
50
60
outlier percentage
UAPs not used commonly,
0.8
0.7
thus not reducing recall and
Recall
precision significantly
0.6
0.5
UMP-Based
Ignorant
TM
0.4
0.3
0.2
0.1
20
30
40
outlier percentage
50
60
34
Conclusion
A data mining algorithm for the prediction of user movements in a
mobile computing system
Algorithm is based on
Mining the mobility patterns of users
Then forming mobility rules from these patterns
Finally predicting a mobile user’s next movements by using the mobility
rules
A good performance when compared to the performance of Ignorant
Method
35
Conclusion
Performance when compared to the TM
Better Precision:
More accurate predictions
Most of its predictions made at each request are correct
Worse Recall:
Our method may not make prediction in response to some of the prediction
requests
Because there may not be any matching rule for the current trajectory of the
user when a prediction request is made
36
Future Work
For calculating the totDist value, our method:
Decrease the support given to pattern by a UAP as the number of
corrupted cells increases in pattern
Other methods may be employed for calculating totDist value
No time domain of the mobility patterns and mobility rules
considered
In real life, mobility patterns might be related to time
Some specific rules valid for a specific time interval
Extend our algorithm to include the time domain of mobility rules
A candidate pruning criterion suitable for our support counting
method may be employed
37
Questions & Comments
38