Transcript s946-xia

On Computing Top-t Most
Influential Spatial Sites
Tian Xia, Donghui Zhang, Evangelos Kanoulas, Yang Du
Northeastern University
Boston, USA
9/2/2005
VLDB 2005, Trondheim, Norway
1
Outline
Problem Definition
 Related Work
 The New Metric: minExistDNN
 Data Structures and Algorithm
 Experimental Results
 Conclusions

9/2/2005
VLDB 2005, Trondheim, Norway
2
Problem Definition

Given:





Top-t most influential sites query:


a set of sites S
a set of weighted objects O
a spatial region Q
an integer t.
find t sites in Q with the largest influences.
influence of a site s = total weight of
objects that consider s as the nearest site.
9/2/2005
VLDB 2005, Trondheim, Norway
3
Motivation

Which supermarket in Boston is the most
influential among residential buildings?





Sites: supermarkets;
Objects: residential buildings;
Weight: # people in a building;
Query region: Boston;
Which wireless station in Boston is the
most influential among mobile users?
9/2/2005
VLDB 2005, Trondheim, Norway
4
Example
o2
o1
s1
s2
o4
s3
o5
s4
o3
o6
Suppose all objects have weight = 1, Q is
the whole space, and t = 1.
 The most influential site is s1, with
influence = 3.

9/2/2005
VLDB 2005, Trondheim, Norway
5
Example
o2
o1
s1
s2
o4
s3
o5
s4
o3
o6
Now that Q is the shadowed rectangle and
t = 2.
 Top-2 most influential sites: s4 and s2.

9/2/2005
VLDB 2005, Trondheim, Norway
6
Outline
Problem Definition
 Related Work
 The New Metric: minExistDNN
 Data Structures and Algorithm
 Experimental Results
 Conclusions

9/2/2005
VLDB 2005, Trondheim, Norway
7
Related Work
Bi-chromatic RNN query: considers two
datasets, sites and objects.
 The RNNs of a site s  S are the objects
that consider s as the nearest site.

o2
o1
s1
9/2/2005
s2
o4
s3
o5
s4
o3
VLDB 2005, Trondheim, Norway
o6
8
Related Work

Solutions to the RNN query based on precomputation [KM00, YL01].
o2
o1
s1
9/2/2005
s2
o4
s3
o5
s4
o3
VLDB 2005, Trondheim, Norway
o6
9
Related Work

Solution to RNN query based on Voronoi
diagram [SRAE01].


9/2/2005
Compute the Voronoi cell of s: a region
enclosing the locations closer to s than to any
other sites.
Querying the object R-tree using the Voronoi
cell.
VLDB 2005, Trondheim, Norway
10
Related Work [SRAE01]
o2
o1
s1
9/2/2005
s2
o4
s3
o5
s4
o3
VLDB 2005, Trondheim, Norway
o6
11
Our Problem vs. RNN Query

RNN query:



A single site as an input.
Interested in the actual set of the RNNs.
Top-t most influential sites query:


9/2/2005
A spatial region as an input.
Interested in the aggregate weight of RNNs.
VLDB 2005, Trondheim, Norway
12
Straightforward Solution 1
For each site, pre-compute its influence.
 At query time, find the sites in Q and
return the t sites with max influences.

Drawback 1: Costly maintenance upon
updates.
 Drawback 2: binding a set of sites closely
with a set of objects.

9/2/2005
VLDB 2005, Trondheim, Norway
13
Straightforward Solution 2

An extension of the Voronoi diagram
based solution to the RNN query.
1.
2.
3.
9/2/2005
Find all sites in Q.
For each such site, find its RNNs by using the
Voronoi cell, and compute its influence.
Return the t sites with max influences.
VLDB 2005, Trondheim, Norway
14
Straightforward Solution 2

Drawback 1: All sites in Q need to be
retrieved from the leaf nodes.

Drawback 2: The object R-tree and the
site R-tree are browsed multiple times.


9/2/2005
For each site in Q, browse the site R-tree to
compute the Voronoi Cell.
For each such Voronoi Cell, browse the object
R-tree to compute the influence.
VLDB 2005, Trondheim, Norway
15
Features of Our Solution
Systematically browse both trees once.
 Pruning techniques are provided based on
a new metric, minExistDNN.
 No need to compute the influences for all
sites in Q, or even to locate all sites in Q.

9/2/2005
VLDB 2005, Trondheim, Norway
16
Outline
Problem Definition
 Related Work
 The New Metric: minExistDNN
 Data Structures and Algorithm
 Experimental Results
 Conclusions

9/2/2005
VLDB 2005, Trondheim, Norway
17
Motivation


Intuitively, if some object in Oi may consider
some site in Sj as an NN, Oi affects Sj.
To estimate the influences of all sites in a site
MBR Sj, we need to know whether an object MBR
Oi will affect Sj.
O1
O2
S1
S2
O1 only affects S1, while O2 affects both S1 and S2.
9/2/2005
VLDB 2005, Trondheim, Norway
18
maxDist – A Loose Estimation
If maxDist(O1, S1) < minDist(O1, S2), O1
does not affect S2.
 Why not good enough?

minDist(O1,S2)=8
S2
O1
S1
9/2/2005
maxDist(O1,S1)=10
VLDB 2005, Trondheim, Norway
19
minMaxDist – A Tight Estimation?
minDist(o1,S2) = 6
S1
o1
S2
minMaxDist(o1, S1) = 5

An object o does not affect S2, if there
exists S1 such that
minMaxDist(o1, S1) < minDist(o1, S2)
9/2/2005
VLDB 2005, Trondheim, Norway
20
minMaxDist – A Tight Estimation?
minDist(O1,S2) = 6
s1
S1
S2
O1
7
6
o1
s2
minMaxDist(O1, S1) = 5

Not true for an object MBR O1.
9/2/2005
VLDB 2005, Trondheim, Norway
21
A Tight Estimation?

A metric m(O1, S1) should:
1)
2)
9/2/2005
guarantee that, each location in O1 is within
m(O1, S1) of a site in S1,
and be the smallest distance with this
property.
VLDB 2005, Trondheim, Norway
22
New Metric – minExistDNNS1(O1)
Definition: minExistDNNS1(O1) =
max {minMaxDist(l, S1) |  location l O1}


O1 does not affect S2, if there exists S1, s.t.
minExistDNNS1(O1) < minDist(O1, S2).
9/2/2005
VLDB 2005, Trondheim, Norway
23
Examples of minExistDNNS1(O1)
O1
O1
S1

S1
How to calculate it?
9/2/2005
VLDB 2005, Trondheim, Norway
24
Calculating minExistDNNS1(O1)

Step 1: Space partitioning
P1:b
P2:c P3:a
a
P4:d
c
S1
b
P8:a
9/2/2005
d
P7:d P6:b
P5:c
Every location l in
the same partition is
associated with the
second closest
corner of S1 – the
distance is
minMaxDist(l, S1)!
VLDB 2005, Trondheim, Norway
25
Space Partitioning

O1 is divided into multiple sub-regions,
one in each partition.
P1:b
P2:c
O1
a
c
S1
b
9/2/2005
d
VLDB 2005, Trondheim, Norway
26
Calculating minExistDNNS1(O1)


Step 2: Choose up-to 8 locations on O1’ border
and compute the minMaxDist’s to S1.
minExistDNN is the largest one!
P1:b
P2:c
O1
minExistDNNS1(O1)
a
c
S1
b
9/2/2005
VLDB 2005, Trondheim, Norway
d
27
Outline
Problem Definition
 Related Work
 The New Metric: minExistDNN
 Data Structures and Algorithm
 Experimental Results
 Conclusions

9/2/2005
VLDB 2005, Trondheim, Norway
28
Data Structure
Two R-trees: S of sites, O of objects.
 Three queues:




9/2/2005
queueSIN: entries of S inside Q.
queueSOUT: entries of S outside Q.
queueO: entries of O.
VLDB 2005, Trondheim, Norway
29
Data Structure
S3
O2
O1
S1
Q
S4
S2



O3
O4
queueSIN: S1 S2
queueO: O1
queueSOUT: S3
9/2/2005
VLDB 2005, Trondheim, Norway
30
maxInfluence and minInfluence

For each entry Sj in queueSIN,



maxInfluence: total weight of entries in queueO
that affect Sj.
minInfluence: total weight of entries in queueO
that ONLY affect Sj, divided by the number of
objects in Sj.
queueSIN is sorted in decreasing order of
maxInfluence.
9/2/2005
VLDB 2005, Trondheim, Norway
31
Algorithm Overview

Expand an entry from one of the three
queues.




Remove the entry from the queue.
Retrieve the referenced node, and insert the
(unpruned) entries into the same queue.
Update maxInfluence and minInfluence if
necessary.
If top-t entries in queueSIN are sites, with
minInfluences ≥ maxInfluences of all
remaining entries, return.
9/2/2005
VLDB 2005, Trondheim, Norway
32
Example
S3
O5
S8

S9


O6
O1
S5
S1
Q
S6
S7





queueSIN: S1
queueO: O1
queueSOUT: S3
queueSIN: S5, S7
queueO: O6
queueSOUT: S9
S6 is not affected by O1, prune S6.
O5 does not affect S5 and S7, prune O5.
9/2/2005
VLDB 2005, Trondheim, Norway
33
A Pruning Case
minExistDNNS3(O1)=4 minDist(S2, O1)=5
Expand S1
S1
S4
S3
S2
O1
minExistDNNS1(O1)=7

S2 is pruned because of minExistDNNS3(O1)
< minDist(S2, O1)
9/2/2005
VLDB 2005, Trondheim, Norway
34
Choosing an Entry to Expand

Expand top entries in queueSIN.

Expand the most important Oi.


Importance: |Oi| * #affected entries * area(Oi)
Expand Sj that contains the most
important Oi.
9/2/2005
VLDB 2005, Trondheim, Norway
35
Choosing an Entry to Expand

Estimate the probability of pruning Oi using
some Sj in queueSOUT.
Q
Q
S1
minDist(S1, O1)=5
minExistDNNS2(O1)=6

S1
O1
minDist(S1, O1)=5
O1
S2
minExistDNNS2(O1)=6
S’2
After expanding S2, O1 is likely not to affect S1.
9/2/2005
VLDB 2005, Trondheim, Norway
36
Outline
Problem Definition
 Related Work
 The New Metric: minExistDNN
 Data Structures and Algorithm
 Experimental Results
 Conclusions

9/2/2005
VLDB 2005, Trondheim, Norway
37
Experimental Setup

Data sets:


24,493 populated places in North America
9,203 cultural landmarks in North America
R-tree page size: 1 KB
 LRU buffer: 128 disk pages.
 t = 4.


Comparing to the solution using Voronoi
diagram.
9/2/2005
VLDB 2005, Trondheim, Norway
38
Selected Experimental Results
#sites : #objects
= 1 : 2.5
9/2/2005
VLDB 2005, Trondheim, Norway
39
Selected Experimental Results
#sites : #objects
= 2.5 : 1
9/2/2005
VLDB 2005, Trondheim, Norway
40
Outline
Problem Definition
 Related Work
 The New Metric: minExistDNN
 Data Structures and Algorithm
 Experimental Results
 Conclusions

9/2/2005
VLDB 2005, Trondheim, Norway
41
Conclusions
We addressed a new problem: Top-t most
influential sites query.
 We proposed a new metric: minExistDNN.
It can be used to prune search space in
NN/RNN related problems.
 We carefully designed an algorithm which
systematically browses both R-trees once.
 Experiments showed more than an order
of magnitude improvement.

9/2/2005
VLDB 2005, Trondheim, Norway
42
Thank you!
Q&A
9/2/2005
VLDB 2005, Trondheim, Norway
43