A Data Mining Approach for Location Prediction in Mobile

Download Report

Transcript A Data Mining Approach for Location Prediction in Mobile

A DATA MINING APPROACH FOR
LOCATION PREDICTION IN
MOBILE ENVIRONMENTS*
by
Gökhan Yavaş
Feb 22, 2005
*: To appear in Data and Knowledge Engineering, Elsevier
1
Outline
 Introduction
 Background Work
 Mobility Prediction Based On Mobility Rules
 Experimental Results
 Conclusion
 Future Work
2
Introduction
 Personal Communication Systems are becoming more
popular
 Dynamic relocation of users gives rise to the problem of
Mobility Management
 Methods for storing and updating the location information
of users
 Mobility Prediction: the prediction of a user’s next intercell movement
3
Motivation
 Predicted movement can be used for effectively allocating resources
instead of blindly allocating excessive resources
 Benefit to the broadcast program generation [1], data items can be
broadcast to the predicted cell
 Location prediction is crucial in processing of location dependent
queries [2], since answer depends on the location of user
 Queries depending on future positions can be answered by effective
location prediction
[1] Y. Saygin and O. Ulusoy. Exploiting Data Mining Techniques for Broadcasting Data in Mobile Computing Environments. IEEE Transactions on
Knowledge and Data Engineering, 14(6): 1387-1399, 2002. [2] R. Agrawal and R. Srikant. Mining sequential patterns. In Proceedings of the IEEE
Conference on Data Engineering (ICDE’95), pages 3–14, 1995.
[2] G. Gok and O. Ulusoy. Transmission of Continuous Query Results in Mobile Computing Systems. Information Sciences, 125(1-4): 37-63, 2000
4
Network Model
 PCS network partitioned into smaller areas called cells
 Each cell has a Base Station (BS), used for broadcasting and
receiving information
 Home Location Register (HLR): database which keeps the inter-cell
movement history of user
 Visitor Location Register (VLR): each BS has a database which
keeps the profiles of the users located in this cell.
5
Problem Definition
 It is possible for us to get the movement history of a mobile user
from HLR of a user
 Movement trajectories in the form of T=<(id1, t1) ... (idk, tk)>
 Partitioned into subsequences, named user actual paths, UAPs
 UAPs have the form of U=<c1, c2, ..., cn>
 We mine UAPs to find user mobility patterns, UMPs
6
Related Work
 The roots of our method go back to the Apriori algorithm [3]
 Association rule mining
 Sequential pattern mining problem [4]
 Ordering of the items in an itemset must be taken into consideration
 Not appropriate for our domain, because does not take into account the
network topology
[3] R. Agrawal, R. Srikant, Fast Algorithms for mining association rules. In Proceedings of Very Large Databases Conference
(VLDB’94), pages 487-499, 1994.
[4] R. Agrawal and R. Srikant. Mining sequential patterns. In Proceedings of the IEEE Conference on Data Engineering (ICDE’95),
pages 3–14, 1995.
7
Mobility Prediction Based On Mobility Rules
1.
Mining UMPs from Graph Traversals: Movement data mined for
discovering regularities (UMP) in inter-cell movements
2.
Generation of Mobility Rules: Mobility rules are extracted from
UMPs
3.
Mobility Prediction: Prediction of next inter-cell movement based
on mobility rules
8
Mining UMPs from Graph Traversals
An example coverage region and corresponding graph G
 Vertices of G: the cells in the coverage region
 Edges of G: if two cells, A and B, are neighbors in the coverage
region, then there are two edges in G, A  B and B  A
9
Mining UMPs from Graph Traversals
 Subsequence definition:
Assume we have two UAPs,
A = <a1, a2, ... , an> and
B = <b1, b2, ... , bm>. B is a
subsequence of A, iff all cells
in B also exist in A while
keeping their order in B
 Example: A=<c3, c4, c0, c1, c6,
c5>, then B=<c4, c5> is a
length-2 subsequence of A. In
other words, B is contained by
A
10
Mining UMPs from Graph Traversals
 Every candidate has a count value that keeps the support given to
this candidate by UAPs
 This is the point our work extends algorithm in [5, 6]
 Method in [5, 6] increments the count value of a candidate by 1 if
this candidate is contained by a UAP
 Unfair !!!
 Treats in the same way
 a highly corrupted candidate pattern
 a slightly corrupted (or even not corrupted at all) candidate pattern
[5] A. Nanopoulos, D. Katsaros, Y. Manolopoulos, A Data Mining Algorithm for Generalized Web Prefetching, IEEE Transactions on
Knowledge and Data Engineering, 15(5): 1155-1169, 2003.
[6] A. Nanopoulos, D. Katsaros, Y. Manolopoulos, Effective Prediction of Web User Accesses: A Data Mining Approach, In
Proceedings of the WebKDD Workshop (WebKDD’01), 2001.
11
Mining UMPs from Graph Traversals
 Should consider the degree of corruption for the mobile motion
prediction context
 Support assigned to a candidate pattern B by a UAP A (i.e.,
suppInc)
12
Mining UMPs from Graph Traversals
 Define totDist value by means of the notion of string alignment
 Definition 2.1: If x and y are each single character or space, then
(x, y) denotes the score of aligning x and y. In our case, the scoring
function is defined as follows:
13
Mining UMPs from Graph Traversals
 Definition 2.3: Let A be a UAP and B be a pattern. A containment
alignment X' maps A and B into strings A‘ and B‘ where:
 |A'| = |B'|
 B is contained by A, and
 Removal of all spaces from A' and B' leaves A and B
 Total score of the alignment X':
14
Mining UMPs from Graph Traversals
 For any two patterns, there may be more than one alignment
 Ex: Consider A=<c3, c4, c0, c1, c6, c5, c8, c5>, B=<c4, c5>
15
Mining UMPs from Graph Traversals
 Definition 2.4: An optimal containment alignment of UAP A and
pattern B is one that has the minimum possible value for these two
patterns
 Total score of an alignment: sum of penalties
 An optimal alignment should have the minimum number of mismatches,
which means the minimum score of alignment
 totDist(A, B) = Score of the optimal alignment for the UAP A and
pattern B
16
Mining UMPs from Graph Traversals
 Example: Given UAP A=<c3, c4, c0, c1, c6, c5, c8> and pattern B=<c4,
c5 , c8 > , optimal containment alignment for these:
 Score of the alignment = totDist (A, B) = 3
 Support assigned to the candidate pattern B by the UAP A:
17
Mining UMPs from Graph Traversals
 The quality of the patterns will improve since this method is a more
accurate way of support counting
 Degree of corruption taken into account
 This will give rise to more accurate mobility rules
 Resulting in the prediction accuracy improved compared to the
accuracy by using the rules that are generated with the former way
of support counting
 Application of different methods for totDist will affect the quality of
rules
18
Mining UMPs from Graph Traversals
 Candidate Generation:
 Example: C = <c1, c2, ..., ck>
 N+(ck) : the set of all nodes in G, which have an incoming edge from
the cell ck
 A cell from N+(ck) is attached to the end of C to generate C'
 Add C' to the set of Candidates
19
Mining UMPs from Graph Traversals
 Apriori Pruning can be used?
 NO due to the nature of our new support counting method
 Support is no longer monotonically decreasing with the increasing
size of the pattern
 A length-(k-1) subpattern S of a length-k pattern P doesn’t need to
be large even if P is large
 Ex: UAP <1, 6, 0, 3, 2>, P1 = <1, 0, 2> and its subpattern P2 = <1, 2>
 UAP assigns a support

1 to P and
1
(1  2 )
1 to P2
(1  3)
20
Mining UMPs from Graph Traversals
Example: Use suppmin= 1.33
Database of UAPs
Set of all large Patterns (UMPs)
21
Generation of Mobility Rules
 Extract rules from the UMPs
 For a rule:
R: < c1, c2, …, ci-1 >  < ci, ci+1, ... ck >
Head
Tail
 A confidence value is calculated:
22
Generation of Mobility Rules
 The rules which have confidence higher than confmin are selected
 All possible mobility rules for the UMPs given in previous example
are:
23
Mobility Prediction
 User has followed a path
P=< c1, c2, …, ci-1 > up to now
 Find the rules whose head parts are contained in P and the last cell
in their head is ci-1
 Store the first cell of tail along with the (confidence + support) of rule
as a tuple
 Sort these tuples w.r.t. the (confidence + support) values in
descending order
 Select the first m tuples
24
Mobility Prediction
 Example: Assume that the current trajectory of the user is P=<2, 3,
0, 4>
 Matching Rules:
 <4>  <0>
 <4>  < 5>
 <3, 4>  <0>
 < 3, 4 >  <5>
 Sorted tuple array is: TupleArray = [(5, 85.83), (0, 76.5)]
 If m=1, then Predicted Cells Set = {5}
 If m=2, then Predicted Cells Set = {5, 0}
25
Simulation Design
 Mobile users travel on a 15 by 15 hexagonal shaped network
 To generate UAPs, first UMPs are generated
 UMPs are taken as a random walk over the network
 Two types of UAPs:
 Outliers: a random walk over the network
 Non-outliers: those which follow a UMP
 o (outlier percentage): ratio of the number of outliers to the number
of non-outliers
26
Simulation Design
 Corruption mechanism: insert random cells between the consecutive
cells of an UMP
 c (corruption ratio): denotes the ratio of the number of such random
cells to the number of cells in the corresponding UMP
 Three possible outcomes of a prediction
 Correct prediction
 Incorrect prediction
 No prediction
 Two performance measures:
27
Algorithms Used for Comparison
 Mobility Prediction Based on Transition Matrix (TM)
 A cell-to-cell transition matrix formed
 Select the m most probable cells from the transition matrix
 Ignorant Prediction
 Randomly select the m neighboring cells of the current cell
28
Impact of m on Precision and Recall
 Decreasing precision for both
0.7
0.6
our algorithm and TM
Precision
 Increasing probability of
0.5
UMP-Based
Ignorant
TM
0.4
0.3
making some incorrect
0.2
0.1
predictions as m increases
1
2
3
4
5
6
m
1
 Increasing recall for all
0.9
0.8
algorithms, but more significant
Recall
increase for TM and Ignorant
0.7
UMP-Based
Ignorant
TM
0.6
0.5
0.4
prediction
0.3
0.2
1
2
3
4
5
6
m
29
Impact of m on Precision and Recall
0.7
 Setting m as small as possible
0.6
is convenient for our method
Precision
 The increase rate in the recall
0.5
0.3
value from m values 1 to 2 is
0.2
0.1
maximum for TM
1
2
3
4
5
6
m
 m ≥ 3 would cause excessive
1
0.9
0.8
network resource waste
0.7
Recall
 Thus choose m = 2
UMP-Based
Ignorant
TM
0.4
UMP-Based
Ignorant
TM
0.6
0.5
0.4
0.3
0.2
1
2
3
4
5
6
m
30
Impact of Suppmin
 Reduced recall and precision
value leads to a decrease in
0.6375
0.625
0.6125
Precision
 The increase in the suppmin
0.65
0.6
0.5875
0.575
the number of mined mobility
0.5625
0.55
rules
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
Min Support
0.51
 Number of correct predictions
0.505
0.5
0.495
is reduced
Recall
 Choose suppmin=0.1
0.49
0.485
0.48
0.475
0.47
0.465
0.46
0.455
0.45
0.1
0.2
0.3
0.4
0.5
0.6
Min Support
0.7
0.8
0.9
1
31
Impact of Confmin
 Increasing precision
1
 Higher quality rules with the
increasing confmin
 Leading to a higher decrease
rate in number of predictions
when compared to the
decrease rate in number of
correct predictions
Precision
0.9
0.8
0.7
0.6
0.5
50
60
70
80
90
100
Min Conf
 Decreasing recall
 Choose confmin=80
0.5
0.4
Recall
 The number of mined rules is
reduced leading to a decrease
in the number of correct
predictions
0.6
0.3
0.2
0.1
0
50
60
70
80
Min Conf
90
100
32
Impact of Corruption Factor
 Decreasing precision and
recall for our method and TM
1
0.9
0.8
 For all c, better precision than
Precision
0.7
UMP-Based
Ignorant
TM
0.6
0.5
0.4
TM but worse recall than TM
0.3
0.2
 For our method, as c
0.1
0
0.4
0.6
0.8
corruption factor
increases:
0.9
0.85
0.8
0.75
0.7
Recall
 The number of mined mobility
rules decreases
 No prediction in some cases
because no matching rules due to
the corrupted UAPs
0.2
0.65
UMP-Based
Ignorant
TM
0.6
0.55
0.5
0.45
0.4
0.35
0.3
0
0.2
0.4
corruption factor
0.6
0.8
33
Impact of Outlier Percentage
0.8
 Both performance measures
methods
0.6
Precision
not affected significantly for all
0.7
0.5
UMP-Based
Ignorant
TM
0.4
0.3
 Rules extracted from outlier
0.2
0.1
20
30
40
50
60
outlier percentage
UAPs not used commonly,
0.8
0.7
thus not reducing recall and
Recall
precision significantly
0.6
0.5
UMP-Based
Ignorant
TM
0.4
0.3
0.2
0.1
20
30
40
outlier percentage
50
60
34
Conclusion
 A data mining algorithm for the prediction of user movements in a
mobile computing system
 Algorithm is based on
 Mining the mobility patterns of users
 Then forming mobility rules from these patterns
 Finally predicting a mobile user’s next movements by using the mobility
rules
 A good performance when compared to the performance of Ignorant
Method
35
Conclusion
 Performance when compared to the TM
 Better Precision:
 More accurate predictions
 Most of its predictions made at each request are correct
 Worse Recall:
 Our method may not make prediction in response to some of the prediction
requests
 Because there may not be any matching rule for the current trajectory of the
user when a prediction request is made
36
Future Work
 For calculating the totDist value, our method:
 Decrease the support given to pattern by a UAP as the number of
corrupted cells increases in pattern
 Other methods may be employed for calculating totDist value
 No time domain of the mobility patterns and mobility rules
considered
 In real life, mobility patterns might be related to time
 Some specific rules valid for a specific time interval
 Extend our algorithm to include the time domain of mobility rules
 A candidate pruning criterion suitable for our support counting
method may be employed
37
Questions & Comments
38