Transcript R-Trees

Spatial Indexing
Many slides taken from George
Kollios, Boston University
Spatial Indexing

Point Access Methods can index only
points. What about regions?



Use the transformation technique
Z-ordering and quadtrees
New methods: Spatial Access Methods
SAMs

R-tree and variations
Problem


Given a collection of geometric objects
(points, lines, polygons, ...)
organize them on disk, to answer
spatial queries (range, nn, etc)
R-tree




z-ordering: cuts regions to pieces ->
dup. elim.
how could we avoid that?
Idea: try to extend/merge B-trees and
k-d trees
R-tree: a generalization of the B+-tree
for multidimensional spaces
Kd-Btrees

[Robinson, 81]: if f is the fanout, split
point-set in f parts; and so on, recursively
Kd-Btrees


But: insertions/deletions are tricky
(splits may propagate downwards and
upwards)
no guarantee on space utilization
R-trees

[Guttman 84] Main idea: allow parents to
overlap!



=> guaranteed 50% utilization
=> easier insertion/split algorithms.
(only deal with Minimum Bounding Rectangles
- MBRs)
R-trees





A multi-way external memory tree
Index nodes and data (leaf) nodes
All leaf nodes appear on the same level
Every node contains between m and M
entries
The root node has at least 2 entries
(children)
Example

eg., w/ fanout 4: group nearby rectangles
to parent MBRs; each group -> disk page
I
AC
G
F
B
E
D
H
J
Example

F=4
P1
P3
AC
G
F
B
E
P2 D
I
H
P4 J
A B C
D E
H I
F G
J
Example

F=4
P1
P3
AC
P1 P2 P3 P4
G
F
B
E
P2 D
I
H
P4 J
A B C
D E
H I
F G
J
R-trees - format of nodes

{(MBR; obj_ptr)} for leaf nodes
P1 P2 P3 P4
x-low; x-high
obj
y-low; y-high
ptr ...
...
A B C
R-trees - format of nodes

{(MBR; node_ptr)} for non-leaf nodes
x-low; x-high
y-low; y-high node
ptr
...
P1 P2 P3 P4
...
A B C
R-trees:Search
P1
P3
AC
P1 P2 P3 P4
G
F
B
E
P2 D
I
H
P4 J
A B C
D E
H I
F G
J
R-trees:Search
P1
P3
AC
P1 P2 P3 P4
G
F
B
E
P2 D
I
H
P4 J
A B C
D E
H I
F G
J
R-trees:Search

Main points:




every parent node completely covers its ‘children’
a child MBR may be covered by more than one
parent - it is stored under ONLY ONE of them. (ie.,
no need for dup. elim.)
a point query may follow multiple branches.
everything works for any(?) dimensionality
R-trees:Insertion
Insert X
P1
P3
AC
P1 P2 P3 P4
G
F
B
X
P2 D
I
E
H
P4 J
A B C
D E X
H I
F G
J
R-trees:Insertion
Insert Y
P1
P3
AC
P1 P2 P3 P4
G
F
B
Y
P2 D
I
E
H
P4 J
A B C
D E
H I
F G
J
R-trees:Insertion

Extend the parent MBR
P1
P3
AC
P1 P2 P3 P4
G
F
B
Y
P2 D
I
E
H
P4 J
A B C
D E Y
H I
F G
J
R-trees:Insertion

How to find the next node to insert the
new object?


Using ChooseLeaf: Find the entry that
needs the least enlargement to include Y.
Resolve ties using the area (smallest)
Other methods (later)
R-trees:Insertion

P1
If node is full then Split : ex. Insert w
P3
K
AC
W
B
E
P2 D
I
P1 P2 P3 P4
G
F
H
P4 J
A B C K
H I
D E
F G
J
R-trees:Split

Split node P1: partition the MBRs into two groups.
• (A1: plane sweep,
P1
K
AC
B
until 50% of rectangles)
W
• A2: ‘linear’ split
• A3: quadratic split
• A4: exponential split:
2M-1 choices
R-trees:Split


pick two rectangles as ‘seeds’;
assign each rectangle ‘R’ to the ‘closest’ ‘seed’
seed2
R
seed1
R-trees:Split



pick two rectangles as ‘seeds’;
assign each rectangle ‘R’ to the ‘closest’ ‘seed’:
‘closest’: the smallest increase in area
seed2
R
seed1
R-trees:Split

How to pick Seeds:
 Linear:Find the highest and lowest side in each
dimension, normalize the separations, choose the
pair with the greatest normalized separation
 Quadratic: For each pair E1 and E2, calculate the
rectangle J=MBR(E1, E2) and d= J-E1-E2. Choose
the pair with the largest d
R-trees:Insertion


Use the ChooseLeaf to find the leaf
node to insert an entry E
If leaf node is full, then Split, otherwise
insert there


Propagate the split upwards, if necessary
Adjust parent nodes
R-Trees:Deletion




Find the leaf node that contains the entry E
Remove E from this node
If underflow:
 Eliminate the node by removing the node entries
and the parent entry
 Reinsert the orphaned (other entries) into the tree
using Insert
Other method (later)
R-trees: Variations



R+-tree: DO not allow overlapping, so
split the objects (similar to z-values)
R*-tree: change the insertion, deletion
algorithms (minimize not only area but
also perimeter, forced re-insertion )
Hilbert R-tree: use the Hilbert values to
insert objects into the tree
R-trees - variations


what about static datasets (no
ins/del/upd)?
Q: Best way to pack points?
R-trees - variations



what about static datasets (no
ins/del/upd)?
Q: Best way to pack points?
A1: plane-sweep
great for queries on ‘x’;
terrible for ‘y’
R-trees - variations



what about static datasets (no
ins/del/upd)?
Q: Best way to pack points?
A1: plane-sweep
great for queries on ‘x’;
bad for ‘y’
R-trees - variations




what about static datasets (no
ins/del/upd)?
Q: Best way to pack points?
A1: plane-sweep
great for queries on ‘x’;
terrible for ‘y’
Q: how to improve?
R-trees - variations

A: plane-sweep on HILBERT curve!
R-trees - variations


A: plane-sweep on HILBERT curve!
In fact, it can be made dynamic (how?),
as well as to handle regions (how?)
R-trees - variations

Dynamic (‘Hilbert Rtree):



each point has an ‘h’value (hilbert value)
insertions: like a B-tree
on the h-value
but also store MBR, for
searches
R-trees - variations



what about other bounding shapes? (and
why?)
A1: arbitrary-orientation lines (cell-tree,
[Guenther]
A2: P-trees (polygon trees) (MB polygon:
0, 90, 45, 135 degree lines)
R-trees - variations



A3: L-shapes; holes (hB-tree)
A4: TV-trees [Lin+, VLDB-Journal 1994]
A5: SR-trees [Katayama+, SIGMOD97]
(used in Informedia)
R-trees - conclusions




Popular method; like multi-d B-trees
guaranteed utilization
good search times (for low-dim. at
least)
Informix ships DataBlade with R-trees
Spatial Queries
Spatial Queries


Given a collection of geometric objects
(points, lines, polygons, ...)
organize them on disk, to answer




point queries
range queries
k-nn queries
spatial joins (‘all pairs’ queries)
Spatial Queries


Given a collection of geometric objects
(points, lines, polygons, ...)
organize them on disk, to answer




point queries
range queries
k-nn queries
spatial joins (‘all pairs’ queries)
Spatial Queries


Given a collection of geometric objects
(points, lines, polygons, ...)
organize them on disk, to answer




point queries
range queries
k-nn queries
spatial joins (‘all pairs’ queries)
Spatial Queries


Given a collection of geometric objects
(points, lines, polygons, ...)
organize them on disk, to answer




point queries
range queries
k-nn queries
spatial joins (‘all pairs’ queries)
Spatial Queries


Given a collection of geometric objects
(points, lines, polygons, ...)
organize them on disk, to answer




point queries
range queries
k-nn queries
spatial joins (‘all pairs’ queries)
R-trees - Range search
pseudocode:
check the root
for each branch,
if its MBR intersects the query rectangle
apply range-search (or print out, if this
is a leaf)
R-trees - NN search
P1
P3
AC
G
F
B
q
E
P2 D
I
H
P4 J
R-trees - NN search

Q: How? (find near neighbor; refine...)
P1
P3
AC
G
F
B
q
E
P2 D
I
H
P4 J
R-trees - NN search

A1: depth-first search; then, range
query
P1
AC
G
F
B
q
E
P2 D
I
P3
H
P4 J
R-trees - NN search

A1: depth-first search; then, range
query P1
P3
I
AC
G
F
B
q
E
P2 D
H
P4 J
R-trees - NN search

A1: depth-first search; then, range
query P1
P3
I
AC
G
F
B
q
E
P2 D
H
P4 J
R-trees - NN search

A2: [Roussopoulos+, sigmod95]:


priority queue, with promising MBRs, and
their best and worst-case distance
main idea: Every face of any MBR
contains at least one point of an actual
spatial object!
R-trees - NN search
consider only P2 and P4, for illustration
P1
P3
AC
G
F
B
q
E
P2 D
I
H
P4 J
R-trees - NN search
best of P4
=> P4 is useless
for 1-nn
worst of P2
H
q
E
P2 D
P4 J
R-trees - NN search

what is really the worst of, say, P2?
worst of P2
q
E
P2 D
R-trees - NN search


what is really the worst of, say, P2?
A: the smallest of the two red
segments!
q
P2
Nearest-neighbor searching



Branch and bound strategy
Compute MINDIST and MINMAXDIST [RKV95]
MINDIST(p,R) is the minimum distance between
p and R with corner points l and u

the closest point in R is at least this distance away
R
u
Sqrt sum (pi-ri)2 where
p
MINDIST = 0
l
p
p
ri = li if pi < li
= ui if pi > ui
= pi otherwise
Nearest-neighbor searching

MINMAXDIST(p,R) is the minimum of the maximum distance
to each pair of faces of R
 MaxDistanceToFace(p,R,k) = distance between p and
Maxk= (M1,M2,..,Mk-1mk,MK+1,..,Mn)
 mi = closer of the two boundary points along i axis
 Mi = farther of the two boundary points along i axis
Max2
u
p
l
Max1
Nearest-neighbor searching

MINMAXDIST(p,R) is the minimum of the maximum distance
to each pair of faces of R
 MaxDistanceToFace(p,R,k) = distance between p and
Maxk= (M1,M2,..,Mk-1mk,MK+1,..,Mn)
 mi = closer of the two boundary points along i axis
 Mi = farther of the two boundary points along i axis
Max1
u
l
p
Max2
Pruning



ESTIMATE := smallest MINMAXDIST(p,R)
Prune an MBR R’ for which MINDIST(p,R’) is
greater than ESTIMATE.
Generalize to k-nearest neighbor searching



Maintain kth largest MINMAXDIST
Prune an MBR if MINDIST to it is larger than the
current estimate of kth MINMAXDIST
Can use objects to refine estimate
Order of searching

Depth first order




Inspect children in MINDIST order
For each node in the tree keep a list of
nodes to be visited
Prune some of these nodes in the list
Continue until the lists are empty
Another NN search

Global order [HS99]



Maintain distance to all entries in a
common list
Order the list by MINDIST
Repeat



Inspect the next MBR in the list
Add the children to the list and reorder
Until all remaining MBRs can be pruned
Spatial Join



Find all parks in a city
Find all trails that go through a forest
Basic operation


Single-scan queries


find all pairs of objects that overlap
nearest neighbor queries, range queries
Multiple-scan queries

spatial join
Algorithms

No existing index structures

Transform data into 1-d space [O89]


Partition-based spatial-merge join [PW96]





z-transform; sensitive to size of pixel
partition into tiles that can fit into memory
plane sweep algorithm on tiles
Spatial hash joins [LR96, KS97]
Sort data [BBKK01]
With index structures [BKS93, HJR97]


k-d trees and grid files
R-trees
R-tree based Join [BKS93]
S
R
Join1(R,S)



Repeat
 Find a pair of intersecting entries E in R and F in
S
 If R and S are leaf pages then add (E,F) to
result-set
 Else Join1(E,F)
Until all pairs are examined
CPU and I/O bottleneck
Reducing CPU bottleneck
S
R
Join2(R,S,IntersectedVol)

Repeat






Until all pairs are examined
14+6 comparisons instead of 49
In general, number of comparisons equals


Find a pair of intersecting entries E in R and F in S that
overlap with IntersectedVol
If R and S are leaf pages then add (E,F) to result-set
Else Join2(E,F,CommonEF)
size(R) + size(S) + relevant(R)*relevant(S)
Reduce the product term
Using Plane Sweep
S
R
s1
s2
r1
r2
r3
Consider the extents along x-axis
Start with the first entry r1
sweep a vertical line
Using Plane Sweep
S
R
s1
s2
r1
r2
r3
Check if (r1,s1) intersect along y-dimension
Add (r1,s1) to result set
Using Plane Sweep
S
R
s1
s2
r1
r2
r3
Check if (r1,s2) intersect along y-dimension
Add (r1,s2) to result set
Using Plane Sweep
S
R
s1
s2
r1
r2
r3
Reached the end of r1
Start with next entry r2
Using Plane Sweep
S
R
s1
s2
r1
r2
r3
Reposition sweep line
Using Plane Sweep
S
R
s1
s2
r1
r2
r3
Check if r2 and s1 intersect along y
Do not add (r2,s1) to result
Using Plane Sweep
S
R
s1
s2
r1
r2
r3
Reached the end of r2
Start with next entry s1
Using Plane Sweep
S
R
s1
s2
r1
r2
r3
Total of 2(r1) + 1(r2) + 0 (s1)+ 1(s2)+ 0(r3) = 4 comparisons
Reducing I/O



Read schedule r1, s1, s2, r3
Every subtree examined only once
Consider a slightly different layout
Reducing I/O
S
R
r2
s1
r1
s2
r3
Read schedule is r1, s2, r2, s1, s2, r3
Subtree s2 is examined twice
Pinning of nodes

After examining a pair (E,F), compute the degree
of intersection of each entry




degree(E) is the number of intersections between E and
unprocessed rectangles of the other dataset
If the degrees are non-zero, pin the pages of the
entry with maximum degree
Perform spatial joins for this page
Continue with plane sweep
Reducing I/O
R
r2
S
s1
r1
s2
r3
After computing join(r1,s2),
degree(r1) = 0
degree(s2) = 1
So, examine s2 next
Read schedule = r1, s2, r3, r2, s1
Subtree s2 examined only once
References



[SK98] Optimal multi-step k-nearest neighbor
search, T. Seidl and H. Kriegel, SIGMOD 1998: 154-165.
[BBKK01] Epsilon Grid Order: An Algorithm for the
Similarity Join on Massive High-Dimensional Data,
C. Bohm, B. Braunmüller, F. Krebs and H.-P.
Kriegel, SIGMOD 2001.
[RKV95] Roussopoulos N., Kelley S., Vincent F.
Nearest Neighbor Queries. Proceedings of the
ACM-SIGMOD International Conference on
Management of Data, pages 71-79, 1995.
References




[HS99] G. R. Hjaltason and H. Samet, Distance
browsing in spatial databases, ACM Transactions on
Database Systems 24, 2 (June 1999), 265-318
[O89] Jack A. Orenstein: Redundancy in Spatial
Databases. SIGMOD Conference 1989: 294-305
[PW96] Jignesh M. Patel, David J. DeWitt: Partition
Based Spatial-Merge Join. SIGMOD Conference
1996: 259-270
[LR96] Ming-Ling Lo, Chinya V. Ravishankar:
Spatial Hash-Joins. SIGMOD Conference 1996:
247-258
References



[KS97] Nick Koudas, Kenneth C. Sevcik: Size
Separation Spatial Join. SIGMOD Conference 1997:
324-335
[HJR97] Yun-Wu Huang, Ning Jing, Elke A.
Rundensteiner: Spatial Joins Using R-trees:
Breadth-First Traversal with Global Optimizations.
VLDB 1997, 396-405
[BKS93] Thomas Brinkhoff, Hans-Peter Kriegel,
Bernhard Seeger: Efficient Processing of Spatial
Joins Using R-Trees. SIGMOD Conference 1993:
237-246