Index Structure for One

Download Report

Transcript Index Structure for One

The Present
Outline
 Index structures for in-memory
 Quad trees
 kd trees
 Index structures for databases
 kdB trees
 Grid files
2301474
II. Index Structure for One-dimensional Data
2
Quad Trees, kD Trees
2301474
II. Index Structure for One-dimensional Data
3
2301474
II. Index Structure for One-dimensional Data
4
Point Quadtrees
 Point quad trees always split regions into 4 parts
 In a 2-d tree, a node N splits a region into two by drawing
one line through the point (N.XVAL,N.YVAL)
 In a point quadtree, a node N splits the region it
represents by drawing both horizontal and vertical line
through the point (N.XVAL,N.YVAL)
 These 4 parts are called the NW, SW, NE, SE quadrants
determined by node N; each of these corresponds to a
child of N.
Insertion into Point Quadtrees
City
(XVAL,YVAL)
Banja Luka
(19,45)
Derventa
(40,50)
Toslic
(38,38)
Tuzla
(54,35)
Sinj
(4,4)
Insertion into Point Quadtrees
City
(XVAL,YVAL)
Banja Luka
(19,45)
Derventa
(40,50)
Toslic
(38,38)
Tuzla
(54,35)
Sinj
(4,4)
Insertion into Point Quadtrees
City
(XVAL,YVAL)
Banja Luka
(19,45)
Derventa
(40,50)
Toslic
(38,38)
Tuzla
(54,35)
Sinj
(4,4)
Insertion into Point Quadtrees
City
(XVAL,YVAL)
Banja Luka
(19,45)
Derventa
(40,50)
Toslic
(38,38)
Tuzla
(54,35)
Sinj
(4,4)
Insertion into Point Quadtrees
City
(XVAL,YVAL)
Banja Luka
(19,45)
Derventa
(40,50)
Toslic
(38,38)
Tuzla
(54,35)
Sinj
(4,4)
Insertion into Point Quadtrees
City
(XVAL,YVAL)
Banja Luka
(19,45)
Derventa
(40,50)
Toslic
(38,38)
Tuzla
(54,35)
Sinj
(4,4)
Deletion in Point Quadtrees
 If the node being deleted is a leaf node, deletion is trivial: we
just set the appropriate link filed of node N’s parent to NIL and
return node to storage
 otherwise, as in the case of 2-d trees, we need to find an
appropriate replacement node
Deletion in Point Quadtrees
 When deleting an interior node N, we must find a replacement
node R in one of the subtrees of N such that
 every other node R1 in N.NW is to the northwest of R
 every other node R2 in N.SW is to the southwest of R
 every other node R3 in N.NE is to the northeast of R
 every other node R4 in N.SE is to the southeast of R
N
R
Deletion in Point Quadtrees
 In general, it may not always be possible to find such a
replacement node
 deletion of an interior node N may require reinsertion of all
nodes in the subtrees of N
 In the worst case, this may require almost all nodes to be
reinserted
Range Searches in Point Quadtree
 Similar to that of 2-d trees
 each node in a point quadtree represents a region
 do not search regions that do not intersect the circle defined
by the query
Region Quadtree
 Representing image
2
3
4
5
13
14
A
1
6
11
7
8
9
10
12
15 16
17 18
NW
1
19
2
3
B
4
SE
C
5
6
7
2301474
SW
NE
F
D
8
II. Index Structure for One-dimensional Data
11 12
9
10
13 14
15
16
E
19
17 18
16
2301474
II. Index Structure for One-dimensional Data
17
Definition
Let k be a positive integer. Let t be a k-d tree, with
a root node p.
Then, for any node n in t with the key Kj as a
discriminator:
 The value of Kj of any node q in the left subtree
of n is smaller than that of node p,
 The value of Kj of any node q in the right
subtree of n is larger than that of node p.
Jaruloj Chongstitvatana
k-d trees
18
Example
20,31
15,15
36,10
6,6
31,40
25,16
Jaruloj Chongstitvatana
k-d trees
40,36
19
Insertion
20,31
15,15
36,10
6,6
31,40
25,16
Jaruloj Chongstitvatana
k-d trees
40,36
20
Insertion Algorithm
insert(subtree T, node N)
/* T is the root node of a subtree, including the dividing axis, and
N is the structure of the node to be inserted including its key values */
{ if (T.axis is ‘X’)
if (T.key.X > N.key.X)
then if (T is leaf)
then addLeftChild(T, N)
else insert(T.leftChild, N)
else if (T is leaf)
then addRightChild(T, N)
else insert(T.rightChild, N)
else if (T.key.Y > N.key.Y)
then if (T is leaf)
then addLeftChild(T, N)
else insert(T.leftChild, N)
else if (T is leaf)
then addRightChild(T, N)
else insert(T.rightChild, N)
}
Jaruloj Chongstitvatana
k-d trees
21
Exact Search
20,31
(40, 36)
15,15
36,10
6,6
31,40
25,16
Jaruloj Chongstitvatana
k-d trees
40,36
22
Search Algorithm
search(subtree T, node N)
//T is the root node of a subtree, including the dividing axis, and
// N is the structure of the node including its key values to be searched
{ if (equal(T, N) then return(T).
if (T.axis is ‘X’)
{
if (T.key.X > N.key.X)
then search(T.leftChild, N)
else search(T.rightChild, N)
}
else if (T.key.Y > N.key.Y)
then search(T.leftChild, N)
else search(T.rightChild, N)
}
Jaruloj Chongstitvatana
k-d trees
23
Range search
20,31
15,15
36,10
6,6
31,40
25,16
Jaruloj Chongstitvatana
k-d trees
40,36
24
Range Search Algorithm
rangeSearch(subtree T, box N)
// T is the root node of a subtree, including the dividing axis, and
// N is the structure of the box including its two opposite corner points
{ if (T is null) then return.
if (in(T, N) then print(T).
if (T.axis is ‘X’)
if (inLeft(T.key.X, N)) then rangeSearch(T.leftChild, N)
if (inRight(T.key.X, N)) then rangeSearch(T.rightChild, N)
else if (T.key.Y > N.key.Y)
if (inLeft(T.key.Y, N)) then rangeSearch(T.leftChild, N)
if (inRight(T.key.Y, N)) then rangeSearch(T.rightChild, N)}
Jaruloj Chongstitvatana
k-d trees
25
Deletion
20,31
15,15
36,10
28,5
38,40
45,8
32,16
40,36
Delete
Copy the
thepink
bluepoint
pointup
Jaruloj Chongstitvatana
k-d trees
26
Deletion
28,5
15,15
36,10
38,40
45,8
32,16
40,36
Delete the old pink point
Jaruloj Chongstitvatana
k-d trees
27
Deletion Algorithm
delete(subtree T, node N)
// T is the root node of a subtree, including the dividing axis, and
// N is the structure of the node including its key values to be deleted
{ D = search(T, N).
if ( D is not null)
{
S = next(T, D)
delete(S, T)
replace(D, S)
}
}
Jaruloj Chongstitvatana
k-d trees
28
kdB trees, Grid Files
2301474
II. Index Structure for One-dimensional Data
29
2301474
II. Index Structure for One-dimensional Data
30
Characteristics
 Multi-way branch
 Height-balanced tree
 Repeatedly divide area of the domain into
disjoint sub-area
 A node in a tree corresponds to a (set of
consecutive) disk page(s)
Jaruloj Chongstitvatana 2006
K-D-B Tree
31
Example of Data Records
Table (stdntID, courseID, grade, year, smstr)
Table (accID, branchID, saving, name, addr)
Table (custID, age, gender, occupation, salary, children,
promotion, since)
Jaruloj Chongstitvatana 2006
K-D-B Tree
32
Nodes = Pages
 Region pages
 Contain a set of <region, ptr. to page>
 Internal nodes
 Point pages
 Contain a set of <point, ptr. to data record>
 Leaf nodes
Jaruloj Chongstitvatana 2006
K-D-B Tree
33
Region Pages
Xmax Xmin Ymax Ymin
Region
…
Xmax Xmin Ymax Ymin
PAGE
PAGE
The branching factor is determined by the
page size and the size of each entry.
Jaruloj Chongstitvatana 2006
K-D-B Tree
34
Point Pages
X
Y
X
DATA RECORD
Y
DATA RECORD
…
X
Y
DATA RECORD
POINT
The branching factor of a point page is
usually larger than that of a region page.
Jaruloj Chongstitvatana 2006
K-D-B Tree
35
Example
Point page
Jaruloj Chongstitvatana 2006
Point page
K-D-B Tree
Point page
Point page
36
Search Point
query
Point page
Jaruloj Chongstitvatana 2006
Point page
K-D-B Tree
Point page
Point page
37
Insert
Insert a point here
and the point page
overflows.
Jaruloj Chongstitvatana 2006
K-D-B Tree
38
Split
 Split a region r with page id p along xi
If r is on the right/left page of xi then put <r, p> in the
right/left page.
Otherwise;
For each children pc of p , split pc along xi
Split r along xi into rleft and rright.
Create 2 new pages with page id pleft and pright.
Move children of p in the left region into pleft and children
in the right region into pright.
Return <rleft, pleft> and <rright, pright> .
Jaruloj Chongstitvatana 2006
K-D-B Tree
39
Split: Example
The page overflows,
This region
and isissplitted.
also splitted.
This region is splitted.
Jaruloj Chongstitvatana 2006
K-D-B Tree
40
Split: Example
The region page is splitted.
The point page is also splitted.
Create a new
Children
pagesregion
are transferred.
page.
Jaruloj Chongstitvatana 2006
K-D-B Tree
41
How to find split axis
 Cyclic: x -> y -> x -> y -> …
 Priority: x -> x -> y -> x -> x -> y -> …
 Possible one
Jaruloj Chongstitvatana 2006
K-D-B Tree
42
Insert
 Insert a record with point a and location l in a tree with
root r
If r is NIL, then create a point page p and insert the record
with <a,l> in p and return p.
Otherwise;
Search for a in the tree with root r until a point page, say
p, is reached.
Insert the record in the point page p.
Jaruloj Chongstitvatana 2006
K-D-B Tree
43
Insert (cont’d)
 Insert a record with point a and location l in a tree with
root r
If the point page p is overflowed, then find an appropriate
axis to split p into pleft and pright.
If p is not the root, then change to p and pleft, and insert
pright into the parent of p.
If p is the root, then create a new root node with two
children of pleft and pright.
Jaruloj Chongstitvatana 2006
K-D-B Tree
44
Insert: Example
Search for the given point
until the point page is found.
Insert here and split
point page if overflows.
Divide region.
Jaruloj Chongstitvatana 2006
K-D-B Tree
45
Insert: Example
Parent page overflows,
then split the page.
This region is splitted.
Jaruloj Chongstitvatana 2006
K-D-B Tree
46
Insert: Example
The point page is splitted.
The region page is splitted.
Jaruloj Chongstitvatana 2006
K-D-B Tree
47
Insert: Example
Insert
The
root
thenode
new isregion
overflowed,
page in and
its parent.
then splitted.
Jaruloj Chongstitvatana 2006
K-D-B Tree
48
Insert: Example
Create the new root node
Jaruloj Chongstitvatana 2006
K-D-B Tree
49
Delete
 Simple, if storage utilization is ignored.
 Otherwise, an underfull page should be merged with another
page.
 When 2 pages are merged, the region of the new page must
be a valid region.
 A number of regions are joinable if their union is also a region.
Jaruloj Chongstitvatana 2006
K-D-B Tree
50
Joinable Regions
Jaruloj Chongstitvatana 2006
K-D-B Tree
51
Unjoinable Regions
Jaruloj Chongstitvatana 2006
K-D-B Tree
52
Delete (cont’d)
 If a page p is underfull, merge sibling pages of p whose regions
are joinable.
 If the newly-created page is overflowed, then split the page.
Jaruloj Chongstitvatana 2006
K-D-B Tree
53
2301474
II. Index Structure for One-dimensional Data
54
Properties of Grid Files
 Support multi-dimensional data, but not high-dimension.
 Every key is treated as primary key.
 The index structure adapts itself dynamically to maintain
storage efficiency.
 Guarantee two disk accesses for point queries
 Values of key must be in lineraly-ordered domain.
Jaruloj Chongstitvatana 2006
Grid Files
55
Structure of Index Structure
 Grid directories: for k-dimensional data
 Grid array


A k-dimensional array
Each element is a pointer to a data page
 Linear scales
 k 1-dimensional array
 Each array defines the partition of values in each dimension.
 Data buckets/ pages
Jaruloj Chongstitvatana 2006
Grid Files
56
Grid Directory
Linear
Pointers
scales
to
data buckets/pages
Grid
array
Jaruloj Chongstitvatana 2006
Grid Files
57
Point Query
A
Y
X
0
6
25
32
F
K
O
Z
Find x=9 and y=“Rat”
Jaruloj Chongstitvatana 2006
Grid Files
58
Range Query
A
Y
X
0
6
25
32
F
K
O
Z
Find 5<x<9 and “Mat”<y<“Rat”
Jaruloj Chongstitvatana 2006
Grid Files
59
Insertion
Also,
Update
splitthe
the
linear
data
page
scale
Overflow,
then
split
the region
Jaruloj Chongstitvatana 2006
Grid Files
60
Insertion
This data page is not split.
Update
Overflow,
Only
thelinear
overflowed
thenscale
split the
data
region
page is split
Jaruloj Chongstitvatana 2006
Grid Files
61
Insertion
The
Therefore,
data page
splitisonly
overflowed,
the databut
page
the directory is not
Jaruloj Chongstitvatana 2006
Grid Files
62
Insertion
Jaruloj Chongstitvatana 2006
Grid Files
63
Merging
A
B
C
D
1
3
E
1
2
F
3
2
Jaruloj Chongstitvatana 2006
Grid Files
A
B
C
E
D
F64
Deletion
A
B
C
A
B
C
D
E D F
Delete point
E
F
Merge does
not occur
data pages
A and B, but directory pages cannot
be merged yet.
Jaruloj Chongstitvatana 2006
Grid Files
65
Deletion
A
B
C
A
B
C
D
E D F
Delete point
E
F
Merge data pages D and F.
Directory pages are also merged.
Jaruloj Chongstitvatana 2006
Grid Files
66
Deletion
A
A
B
CE
DF
B
Delete point
CE
DF
Merge data pages CE and DF.
Directory pages are also merged.
Jaruloj Chongstitvatana 2006
Grid Files
67
THE END
2301474
II. Index Structure for One-dimensional Data
68