children of korn
Download
Report
Transcript children of korn
SIGMOD 2000
Reverse Nearest Neighbor Queries
for Dynamic Databases
SHOU Yu Tao
Jan. 10th, 2003
1
Outline of the Presentation
Background
Nearest neighbor (NN) search algorithm
[RKV95]
Reverse nearest neighbor (RNN) search
algorithm [SAA00]
Other NN related problems – CNN, RNNa,
etc.
Conclusions
References
Q &A
2
Background
RNN(q) – returns a set of data points that
have the query point q as the nearest
neighbor.
Receives much interests during recent years
due to its increased importance in advanced
database applications:
• fixed wireless telephone access application –
“load” detection problem:
count how many users are currently using a specific base
station q if q’s load is too heavy activating an inactive
base station to lighten the load of that over loaded base
station
3
Nonsymmetrical property of RNN queries
•
NN(q) = p
•
If p is the nearest neighbor of q, then q need not be the nearest
neighbor of p (in this case the nearest neighbor of p is r).
•
those efficient NN algorithms cannot directly applied to solve the RNN
problems. Algorithms for RNN problems are needed.
NN(p) = q
p
r
q
A straight forward solution:
-- check for each point whether it has q as its nearest neighbor
-- not suitable for large data set!
4
Two versions of RNN problem
monochromatic version:
-- the data points are of two categories, say red and blue. The RNN
query point q is in one of the categories, say blue. So RNN(q)
must determine the red points which have the query point q as
the closest blue point.
-- e.g. fixed wireless telephone access application:
clients/red (e.g. call initiation or termination)
servers/blue (e.g. fixed wireless base stations)
bichromatic version:
-- all points are of the same color is the monochromatic version.
Static vs Dynamic:
-- whether insertions or deletions of the data points are allowed.
5
RNN problem this paper concerns
Monochromatic case
Dynamic case
Whole Algorithm is based on:
(1). Geometric observations enable a reduction of
the RNN problem to the NN problem.
(2). NN search algorithm [RKV95].
* Both RNN(q) and NN(q) are sets of points in the
databases, while query point q may or may not
correspond to an actual data point in the data base.
6
Geometric Observations
Let the space around a query point q be divided into six equal
regions Si (1<=i<=6) by straight lines intersecting q. Si therefore is
the space between two space dividing lines.
L3
L2
s2
s3
s1
q
s4
L1
s6
s5
Proposition 1: For a given 2-dimensional dataset, RNN(q) will
return at most six data points. And they are must be on the same
circle centered at q.
7
Geometric Observations
Proposition 2:
In each region Si:
(1). There exists at most two RNN
points
(2). If there exist exactly two RNN
points in a region Si, then each
point must be on one of the space
dividing lines through q delimiting
Si.
L3
L2
s2
s3
q
s4
p
s1
L1
s6
s5
Proposition 3:
In each region Si, let p = NN(q) in Si,
if p is not on a space dividing line,
then either NN(p) = q (and then
RNN(q) = p) or RNN(q) = null.
8
Important result from Observations
Implications:
In a region Si, if the number of results of NN(q) is:
(1) one point only:
If NN(q) is not on the space dividing lines: either the
nearest neighbor is also the reverse nearest neighbor, or there
is no RNN(q) in Si.
(2) more than one point, (but the NN(q) of each region will return
at most two points for each region):
These two points must be on the two dividing lines and on
the same circle centered at q.
Allow us to have a criterion for limiting the choice of RNN(q) to
one or two points in each of the six regions Si.
The RNN query has been reduced to the NN query
9
Basic NN Search Algorithm
y axis
E7
10
8
E1
e
6
b
0
i
E9
query point contents
omitted
E4
a
c
E3
2
• return single NN(q) result only
g
h
E6
4
2
E8
E5
d
• This is based on MINDIST metric only
E2
f
4
6
Root
E
1
1
x axis
8
E
1
E
4
5
10
a
5
E
4
b
13
c
18
E
2
2
E
3
8
E
2
E
5
5
E
6
9
d
13
E
5
e
13
E
7
13
f
10
E
8
2
E
9
17
h
2
g
13
i
10
E
8
10
Algorithms in [RKV95]
Two metrics introduced – effectively directing
and pruning the NN search
• MINDIST (optimistic)
• MINMAXDIST (pessimistic)
DFS Search
11
MINDIST(Optimistic)
MINDIST(RECT,q): the shortest distance from
RECT to query point q
This provides a lower bound for distance from
q to objects in RECT
MINDIST guarantees that all points in the
RECT have at least “MINDIST” distance from
the query point q.
12
MINMAXDIST(Pessimistic)
MBR property: Every face (edge in 2D, rectangle in
3D, hyper-face in high D) of any MBR contains at
least one point of some spatial object in the DB.
MINMAXDIST: Calculate the maximum dist to each
face, and choose the minimal.
Upper bound of minimal distance
MINMAXDIST guarantees that at least 1 object with
distance less or equal to MINMAXDIST in the MBR
13
Illustration of MINMAXDIST
(t1,t2)
MINDIST
(t1,p2)
(q1,q2)
MINMAXDIST
y
(s1,s2)
(t1,s2)
x
14
Pruning
Downward Pruning – during the descending phase
• MINDIST(q, M) > MINMAXDIST(q, M’) :
M can be pruned
• Distance(q, O) > MINMAXDIST(q, M’) :
O can be discarded
Upward Pruning – when return from the recursion
• MINDIST(q, M) > Distance(q, O)
• M can be pruned
15
DFS Search on R-Tree
Traversal: DFS
• Expanding non-leaf node during the descending phase: Order
all its children by the metrics (MINDIST or MINMAXDIST) and
sort them into an Active Branch List (ABL). Apply downward
pruning techniques to the ABL to remove unnecessary
branches.
• Expanding leaf node: Compare objects to the nearest
neighbor found so far. Replace it if the new object is closer.
• At the return from the recursion phase: Using upward pruning
tech.
16
RNN Algorithm
Algorithm Outline for RNN(q) query:
1. Construct the space dividing lines so that space has
been divided into 6 regions based on the query point q.
2. (a) * Traverse R-tree and find one or two points in
each region Si that satisfy the nearest neighbor condition
NN(q). -- this part is also called “conditional NN queries”
(b) The candidate points are tested for the condition
whether their nearest neighbor is q and add to answer
list if condition is fulfilled.
3. Eliminate duplicates in RNN(q)
17
How to find NN(q) in Si
p2
Si
p1
q
p3
p4
p7
p6
p5
Brute-force Algorithm:
finds all the nearest neighbors until there is one in the queried region Si.
inefficient! (as shown in the figure)
18
How to find NN(q) in Si
The only difference between the NN algorithm
proposed by [RKV95] and conditional NN
algorithm resides only in the metric used to sort
and prune the list of candidate nodes.
19
New MINMAXDIST definition
Mindist(q, M)
Mindist(q, M)
Minmaxdist(q, M)
Minmaxdist(q, M)
queried region S
queried region S
MINMAXDIST(q, M, Si) = distance to furthest vertex on closest face IN Si
MINDIST(q, M, Si) = MINDIST(q, M)
20
New metric definition
Number of vertices in Si
Mindist(q, M, Si) Minmaxdist(q, M, Si)
0 (no intersection of M with Si)
Infinite
Infinite
0 (M intersects Si, Case E)
1 (case D)
Mindist(q, M)
Infinite because cannot
guarantee there are data
points in both M and S
2 (case C), 3 (case B) , 4 (case A)
Mindist(q, M)
Distance to furthest vertex
on closest face IN Si
Mindist(q, M, Si) = Mindist(q, M)
Because mindist(q, M) is valid for all case,
since it provides a definite lower bound on
the location of data points inside an MBR,
although a little bit looser.
21
CNN/NN algorithm difference
When expanding non-leaf node during the descending
phase:
• NN Search:
Order all its children by the metrics (MINDIST or MINMAXDIST) and
sort them into an Active Branch List (ABL). Apply downward pruning
techniques to the ABL to remove unnecessary branches.
• CNN Search:
-- build a set of lists branchList[i][nodecard]
0<=i<=num_section-1 the list whose pointer points
to the children of that node and overlaps with
region (i+1)
i = num_section the list contains the counter (for each child)
the total number of sections overlaps with this child
child with higher counter is visited first for I/O
optimization.
22
Other NN related researches
1. NN and RNN for moving objects [BJKS02]
2. CNN [PTS02]
3. RNNA over data streams [KMS02]
23
Conclusions
The RNN algorithm proposed is based on using the underling
indexing data structure (R-tree), also necessary to answer NN
queries.
By integrating RNN queries in the framework of already existing
access structures, the approach developed in this paper is
therefore algorithmic and independent of data structures
constructed especially for a set of such queries.
No additional data structures are necessary, therefore the space
requirement does not increase.
24
References
•
[RKV95] N. Roussopoulos, S. Kelley, and F. Vincent. Nearest neighbor
queries. In SIGMOD, 1995.
•
[SAA00] I. Stanoi, D. Agrawal, and A. El Abbadi. Reverse nearest neighbor
queries for dynamic databases. In Proceedings of the ACM SIGMOD
Workshop on Data Mining and Knowledge Discovery (DMKD), 2000.
•
[KM00] Korn, F. and Muthukrishnan, S., Influence Set Based on Reverse
Nearest Neighbor Queries. SIGMOD, 2000.
•
[BJKS02] Benetis, R., Jensen, C., Karciauskas, G., Saltenis, S. Nearest
Neighbor and Reverse Nearest Neighbor Queries for Moving Objects. IDEAS,
2002
•
[PTS02] Papadias, Tao, Y. and Shen, D., Continuous Nearest Neighbor Search.
VLDB, 2002.
•
[KMS02] Korn, F., Muthukrishnan, S. and Srivastava, D., Reverse nearest
neighbor aggregates over data streams. VLDB, 2002.
25
Questions and Answers
Any Questions?
26
Thank you for attending
this presentation!
27