Transcript Powerpoint
R-Trees (Rectangle-Trees)
• Extension of B+-trees.
Collection of d-dimensional rectangles.
A point in d-dimensions is a trivial rectangle.
Non-rectangular Data
• Non-rectangular data may be represented by
minimum bounding rectangles (MBRs).
Operations
• Insert
• Delete
• Find all rectangles that intersect a query
rectangle.
• Good for large rectangle collections stored
on disk.
R-Trees—Structure
• Data nodes (leaves) contain rectangles.
• Index nodes (non-leaves) contain MBRs for
data in subtrees.
• MBR for rectangles or MBRs in a non-root
node is stored in parent node.
R-Trees—Structure
• R-tree of order M.
Each node other than the root has between m
<= ceil(M/2) and M rectangles/MBRs.
• Assume m = ceil(M/2) henceforth.
Typically, m = ceil(M/2).
Root has between 2 and M rectangles/MBRs.
Each index node has as many MBRs as
children.
All data nodes are at the same level.
Example
• R-tree of order 4.
Each node may have up to 4 rectangles/MBRs.
Example
• Possible partitioning of our example data
into 12 leaves.
Example
• Possible R-tree of order 4 with 12 leaves.
mn op
a bc d
ef
g h i
j k l
Leaves are data nodes that contain 4 input rectangles each.
a-p are MBRs
mn o p
Example
• Possible corresponding
grouping.
a
m
b
c
d
a b cd
e f
g h i
jk l
mn o p
Example
• Possible corresponding
grouping.
a
m
b
c
d
a b cd
e
e f
n
g h i
f
jk l
mn o p
Example
• Possible corresponding
grouping.
a
c
e f
n
e
m
b
a b cd
g h i
jk l
f
g
o
h
d
i
p
Query
• Report all rectangles that intersect a given
rectangle.
Query
• Start at root and find all MBRs that overlap query.
• Search corresponding subtrees recursively.
mn o p
Query
a b cd
e f
g h i
jk l
n
m
a
a
a
o
x
p
mn o p
Query
a b cd
• Search m.
e f
g h i
jk l
n
a
m
a
b
a
c
x
o
d
x
p
Insert
• Similar to insertion into B+-tree but may
insert into any leaf; leaf splits in case
capacity exceeded.
Which leaf to insert into?
How to split a node?
Insert—Leaf Selection
• Follow a path from root to leaf.
• At each node move into subtree whose MBR area
increases least with addition of new rectangle.
n
m
o
p
Insert—Leaf Selection
• Insert into m.
m
Insert—Leaf Selection
• Insert into n.
n
Insert—Leaf Selection
• Insert into o.
o
Insert—Leaf Selection
• Insert into p.
p
Insert—Split A Node
• Split set of M+1 rectangles/MBRs into 2 sets A
and B.
A and B each have at least m rectangles/MBRs.
Sum of areas of MBRs of A and B is minimum.
M = 8, m = 4
Insert—Split A Node
• Split set of M+1 rectangles/MBRs into 2 sets A
and B.
A and B each have at least m rectangles/MBRs.
Sum of areas of MBRs of A and B is minimum.
M = 8, m = 4
Insert—Split A Node
• Split set of M+1 rectangles/MBRs into 2 sets A
and B.
A and B each have at least m rectangles/MBRs.
Sum of areas of MBRs of A and B is minimum.
M = 8, m = 4
Insert—Split A Node
• Exhaustive search for best A and B.
Compute area(MBR(A)) + area(MBR(B)) for each
possible A.
Note—for each A, the B is unique.
Select partition that minimizes this sum.
• When |A| = m = ceil(M/2), number of choices for
A is
(M+1)!
m!(M+1-m)!
Impractical for large M.
Insert—Split A Node
• Grow A and B using a clustering strategy.
Start with a seed rectangle a for A and b for B.
Grow A and B one rectangle at a time.
Stop when the M+1 rectangles have been
partitioned into A and B.
Insert—Split A Node
• Quadratic Method—seed selection.
Let S be the set of M+1 rectangles to be
partitioned.
Find a and b in S that maximize
area(MBR(a,b)) – area(a) – area(b)
M = 8, m = 4
Insert—Split A Node
• Quadratic Method—seed selection.
Let S be the set of M+1 rectangles to be
partitioned.
Find a and b in S that maximize
area(MBR(a,b)) – area(a) – area(b)
M = 8, m = 4
Insert—Split A Node
• Quadratic Method—assign remaining
rectangles/MBRs.
Find an unassigned rectangle c that maximizes
|area(MBR(A,c)) – area(MBR(A))
- (area(MBR(B,c)) – area(MBR(B)))|
M = 8, m = 4
Insert—Split A Node
• Quadratic Method—assign remaining
rectangles/MBRs.
Find an unassigned rectangle c that maximizes
|area(MBR(A,c)) – area(MBR(A))
- (area(MBR(B,c)) – area(MBR(B)))|
M = 8, m = 4
Insert—Split A Node
• Quadratic Method—assign remaining
rectangles/MBRs.
Assign c to partition whose area increases least.
M = 8, m = 4
Insert—Split A Node
• Quadratic Method—assign remaining
rectangles/MBRs.
Continue assigning in this way until all remaining
rectangles must necessarily be assigned to one of the
two partitions for that partition to have m rectangles.
M = 8, m = 4
Insert—Split A Node
• Linear Method—seed selection.
Choose a and b to have maximum normalized
separation.
M = 8, m = 4
Insert—Split A Node
• Linear Method—seed selection.
Choose a and b to have maximum normalized
separation.
M = 8, m = 4
Separation in xdimension
Insert—Split A Node
• Linear Method—seed selection.
Choose a and b to have maximum normalized
separation.
M = 8, m = 4
Rectangles with max
x-separation
Insert—Split A Node
• Linear Method—seed selection.
Choose a and b to have maximum normalized
separation.
M = 8, m = 4
Divide by x-width
to normalize
Insert—Split A Node
• Linear Method—seed selection.
Choose a and b to have maximum normalized
separation.
M = 8, m = 4
Separation in ydimension
Insert—Split A Node
• Linear Method—seed selection.
Choose a and b to have maximum normalized
separation.
M = 8, m = 4
Rectangles with max
y-separation
Insert—Split A Node
• Linear Method—seed selection.
Choose a and b to have maximum normalized
separation.
M = 8, m = 4
Divide by y-width
to normalize
Insert—Split A Node
• Linear Method—assign remainder.
Assign remaining rectangles in random order.
Rectangle is assigned to partition whose MBR area
increases least.
Stop when all remaining rectangles must be assigned to
one of the partitions so that the partition has its
minimum required m rectangles.
M = 8, m = 4
Delete
• If leaf doesn’t become deficient, simply
readjust MBRs in path from root.
• If leaf becomes deficient, get from nearest
sibling (if possible) and readjust MBRs.
• Combine with sibling as in B+ tree.
• Could instead do a more global
reorganization to get better R-tree.
Variants
• R*-tree
Leaf selection and node overflows in insertion
handled differently.
• Hilbert R-tree
Related Structures
• R+-tree
Index nodes have non-overlapping rectangles.
A data object may be represented in several
data nodes.
No upper bound on size of a data node.
No bounds (lower/upper) on degree of an index
node.
Related Structures
• Cell tree
Combines BSP and R+-tree concepts.
Index nodes have non-overlapping convex
polyhedrons.
No lower/upper bound on size of a data node.
Lower bound (but not upper) on degree of an
index node.