Transcript s946-xia
On Computing Top-t Most
Influential Spatial Sites
Tian Xia, Donghui Zhang, Evangelos Kanoulas, Yang Du
Northeastern University
Boston, USA
9/2/2005
VLDB 2005, Trondheim, Norway
1
Outline
Problem Definition
Related Work
The New Metric: minExistDNN
Data Structures and Algorithm
Experimental Results
Conclusions
9/2/2005
VLDB 2005, Trondheim, Norway
2
Problem Definition
Given:
Top-t most influential sites query:
a set of sites S
a set of weighted objects O
a spatial region Q
an integer t.
find t sites in Q with the largest influences.
influence of a site s = total weight of
objects that consider s as the nearest site.
9/2/2005
VLDB 2005, Trondheim, Norway
3
Motivation
Which supermarket in Boston is the most
influential among residential buildings?
Sites: supermarkets;
Objects: residential buildings;
Weight: # people in a building;
Query region: Boston;
Which wireless station in Boston is the
most influential among mobile users?
9/2/2005
VLDB 2005, Trondheim, Norway
4
Example
o2
o1
s1
s2
o4
s3
o5
s4
o3
o6
Suppose all objects have weight = 1, Q is
the whole space, and t = 1.
The most influential site is s1, with
influence = 3.
9/2/2005
VLDB 2005, Trondheim, Norway
5
Example
o2
o1
s1
s2
o4
s3
o5
s4
o3
o6
Now that Q is the shadowed rectangle and
t = 2.
Top-2 most influential sites: s4 and s2.
9/2/2005
VLDB 2005, Trondheim, Norway
6
Outline
Problem Definition
Related Work
The New Metric: minExistDNN
Data Structures and Algorithm
Experimental Results
Conclusions
9/2/2005
VLDB 2005, Trondheim, Norway
7
Related Work
Bi-chromatic RNN query: considers two
datasets, sites and objects.
The RNNs of a site s S are the objects
that consider s as the nearest site.
o2
o1
s1
9/2/2005
s2
o4
s3
o5
s4
o3
VLDB 2005, Trondheim, Norway
o6
8
Related Work
Solutions to the RNN query based on precomputation [KM00, YL01].
o2
o1
s1
9/2/2005
s2
o4
s3
o5
s4
o3
VLDB 2005, Trondheim, Norway
o6
9
Related Work
Solution to RNN query based on Voronoi
diagram [SRAE01].
9/2/2005
Compute the Voronoi cell of s: a region
enclosing the locations closer to s than to any
other sites.
Querying the object R-tree using the Voronoi
cell.
VLDB 2005, Trondheim, Norway
10
Related Work [SRAE01]
o2
o1
s1
9/2/2005
s2
o4
s3
o5
s4
o3
VLDB 2005, Trondheim, Norway
o6
11
Our Problem vs. RNN Query
RNN query:
A single site as an input.
Interested in the actual set of the RNNs.
Top-t most influential sites query:
9/2/2005
A spatial region as an input.
Interested in the aggregate weight of RNNs.
VLDB 2005, Trondheim, Norway
12
Straightforward Solution 1
For each site, pre-compute its influence.
At query time, find the sites in Q and
return the t sites with max influences.
Drawback 1: Costly maintenance upon
updates.
Drawback 2: binding a set of sites closely
with a set of objects.
9/2/2005
VLDB 2005, Trondheim, Norway
13
Straightforward Solution 2
An extension of the Voronoi diagram
based solution to the RNN query.
1.
2.
3.
9/2/2005
Find all sites in Q.
For each such site, find its RNNs by using the
Voronoi cell, and compute its influence.
Return the t sites with max influences.
VLDB 2005, Trondheim, Norway
14
Straightforward Solution 2
Drawback 1: All sites in Q need to be
retrieved from the leaf nodes.
Drawback 2: The object R-tree and the
site R-tree are browsed multiple times.
9/2/2005
For each site in Q, browse the site R-tree to
compute the Voronoi Cell.
For each such Voronoi Cell, browse the object
R-tree to compute the influence.
VLDB 2005, Trondheim, Norway
15
Features of Our Solution
Systematically browse both trees once.
Pruning techniques are provided based on
a new metric, minExistDNN.
No need to compute the influences for all
sites in Q, or even to locate all sites in Q.
9/2/2005
VLDB 2005, Trondheim, Norway
16
Outline
Problem Definition
Related Work
The New Metric: minExistDNN
Data Structures and Algorithm
Experimental Results
Conclusions
9/2/2005
VLDB 2005, Trondheim, Norway
17
Motivation
Intuitively, if some object in Oi may consider
some site in Sj as an NN, Oi affects Sj.
To estimate the influences of all sites in a site
MBR Sj, we need to know whether an object MBR
Oi will affect Sj.
O1
O2
S1
S2
O1 only affects S1, while O2 affects both S1 and S2.
9/2/2005
VLDB 2005, Trondheim, Norway
18
maxDist – A Loose Estimation
If maxDist(O1, S1) < minDist(O1, S2), O1
does not affect S2.
Why not good enough?
minDist(O1,S2)=8
S2
O1
S1
9/2/2005
maxDist(O1,S1)=10
VLDB 2005, Trondheim, Norway
19
minMaxDist – A Tight Estimation?
minDist(o1,S2) = 6
S1
o1
S2
minMaxDist(o1, S1) = 5
An object o does not affect S2, if there
exists S1 such that
minMaxDist(o1, S1) < minDist(o1, S2)
9/2/2005
VLDB 2005, Trondheim, Norway
20
minMaxDist – A Tight Estimation?
minDist(O1,S2) = 6
s1
S1
S2
O1
7
6
o1
s2
minMaxDist(O1, S1) = 5
Not true for an object MBR O1.
9/2/2005
VLDB 2005, Trondheim, Norway
21
A Tight Estimation?
A metric m(O1, S1) should:
1)
2)
9/2/2005
guarantee that, each location in O1 is within
m(O1, S1) of a site in S1,
and be the smallest distance with this
property.
VLDB 2005, Trondheim, Norway
22
New Metric – minExistDNNS1(O1)
Definition: minExistDNNS1(O1) =
max {minMaxDist(l, S1) | location l O1}
O1 does not affect S2, if there exists S1, s.t.
minExistDNNS1(O1) < minDist(O1, S2).
9/2/2005
VLDB 2005, Trondheim, Norway
23
Examples of minExistDNNS1(O1)
O1
O1
S1
S1
How to calculate it?
9/2/2005
VLDB 2005, Trondheim, Norway
24
Calculating minExistDNNS1(O1)
Step 1: Space partitioning
P1:b
P2:c P3:a
a
P4:d
c
S1
b
P8:a
9/2/2005
d
P7:d P6:b
P5:c
Every location l in
the same partition is
associated with the
second closest
corner of S1 – the
distance is
minMaxDist(l, S1)!
VLDB 2005, Trondheim, Norway
25
Space Partitioning
O1 is divided into multiple sub-regions,
one in each partition.
P1:b
P2:c
O1
a
c
S1
b
9/2/2005
d
VLDB 2005, Trondheim, Norway
26
Calculating minExistDNNS1(O1)
Step 2: Choose up-to 8 locations on O1’ border
and compute the minMaxDist’s to S1.
minExistDNN is the largest one!
P1:b
P2:c
O1
minExistDNNS1(O1)
a
c
S1
b
9/2/2005
VLDB 2005, Trondheim, Norway
d
27
Outline
Problem Definition
Related Work
The New Metric: minExistDNN
Data Structures and Algorithm
Experimental Results
Conclusions
9/2/2005
VLDB 2005, Trondheim, Norway
28
Data Structure
Two R-trees: S of sites, O of objects.
Three queues:
9/2/2005
queueSIN: entries of S inside Q.
queueSOUT: entries of S outside Q.
queueO: entries of O.
VLDB 2005, Trondheim, Norway
29
Data Structure
S3
O2
O1
S1
Q
S4
S2
O3
O4
queueSIN: S1 S2
queueO: O1
queueSOUT: S3
9/2/2005
VLDB 2005, Trondheim, Norway
30
maxInfluence and minInfluence
For each entry Sj in queueSIN,
maxInfluence: total weight of entries in queueO
that affect Sj.
minInfluence: total weight of entries in queueO
that ONLY affect Sj, divided by the number of
objects in Sj.
queueSIN is sorted in decreasing order of
maxInfluence.
9/2/2005
VLDB 2005, Trondheim, Norway
31
Algorithm Overview
Expand an entry from one of the three
queues.
Remove the entry from the queue.
Retrieve the referenced node, and insert the
(unpruned) entries into the same queue.
Update maxInfluence and minInfluence if
necessary.
If top-t entries in queueSIN are sites, with
minInfluences ≥ maxInfluences of all
remaining entries, return.
9/2/2005
VLDB 2005, Trondheim, Norway
32
Example
S3
O5
S8
S9
O6
O1
S5
S1
Q
S6
S7
queueSIN: S1
queueO: O1
queueSOUT: S3
queueSIN: S5, S7
queueO: O6
queueSOUT: S9
S6 is not affected by O1, prune S6.
O5 does not affect S5 and S7, prune O5.
9/2/2005
VLDB 2005, Trondheim, Norway
33
A Pruning Case
minExistDNNS3(O1)=4 minDist(S2, O1)=5
Expand S1
S1
S4
S3
S2
O1
minExistDNNS1(O1)=7
S2 is pruned because of minExistDNNS3(O1)
< minDist(S2, O1)
9/2/2005
VLDB 2005, Trondheim, Norway
34
Choosing an Entry to Expand
Expand top entries in queueSIN.
Expand the most important Oi.
Importance: |Oi| * #affected entries * area(Oi)
Expand Sj that contains the most
important Oi.
9/2/2005
VLDB 2005, Trondheim, Norway
35
Choosing an Entry to Expand
Estimate the probability of pruning Oi using
some Sj in queueSOUT.
Q
Q
S1
minDist(S1, O1)=5
minExistDNNS2(O1)=6
S1
O1
minDist(S1, O1)=5
O1
S2
minExistDNNS2(O1)=6
S’2
After expanding S2, O1 is likely not to affect S1.
9/2/2005
VLDB 2005, Trondheim, Norway
36
Outline
Problem Definition
Related Work
The New Metric: minExistDNN
Data Structures and Algorithm
Experimental Results
Conclusions
9/2/2005
VLDB 2005, Trondheim, Norway
37
Experimental Setup
Data sets:
24,493 populated places in North America
9,203 cultural landmarks in North America
R-tree page size: 1 KB
LRU buffer: 128 disk pages.
t = 4.
Comparing to the solution using Voronoi
diagram.
9/2/2005
VLDB 2005, Trondheim, Norway
38
Selected Experimental Results
#sites : #objects
= 1 : 2.5
9/2/2005
VLDB 2005, Trondheim, Norway
39
Selected Experimental Results
#sites : #objects
= 2.5 : 1
9/2/2005
VLDB 2005, Trondheim, Norway
40
Outline
Problem Definition
Related Work
The New Metric: minExistDNN
Data Structures and Algorithm
Experimental Results
Conclusions
9/2/2005
VLDB 2005, Trondheim, Norway
41
Conclusions
We addressed a new problem: Top-t most
influential sites query.
We proposed a new metric: minExistDNN.
It can be used to prune search space in
NN/RNN related problems.
We carefully designed an algorithm which
systematically browses both R-trees once.
Experiments showed more than an order
of magnitude improvement.
9/2/2005
VLDB 2005, Trondheim, Norway
42
Thank you!
Q&A
9/2/2005
VLDB 2005, Trondheim, Norway
43