Multiway Trees

Download Report

Transcript Multiway Trees

CIS265
Multiway Trees
Searching and B-Trees
Advanced Tree Structures
Multiway Trees

We now look at trees that are not restricted to
one entry per node

Used for internal and external indices and
search trees, among other things
Multiway Trees
2
Multiway search trees
Basic Theory

“A multi-way search tree of order n is a
general tree in which each node refers to n or
few sub-trees and contains one few keys
than it has sub-trees.”
Pointer to
Keys <
Key1
Key 1
Pointer
to keys
between
Key1 and
Key2
Key 2
Multiway Trees
Pointer
to keys
between
Key2 and
Key3
Key 3
Pointer to
Keys >
Key3
3
Multiway search trees
Basic Theory
A node in a serach tree with pointers to subtrees below it
Multiway Trees
4
Say That Again, Please…




Each node has n children and n-1 keys
The keys in each node are in ascending
order
The keys in the first i children are smaller
than the ith key
The keys in the last n-i children are larger
than the ith key
Multiway Trees
5
How about an example…?
This is a 4-way tree. n=4
50 60 80
30 35
Each node has n
children and n-1
keys.
58 59
52 54
63
61 62
57
This node has 4 children,
and 3 keys
70 73
100
After that, it depends
on your data. Of
course, you also have
to follow all the other
rules.
55 56
Multiway Trees
6
50 60 80
30 35
The keys in
each node
are in
ascending
order
58 59
52 54
63
61 62
57
70 73
100
This holds true for
each node.
55 56
Multiway Trees
7
i=2. The first &
second child all have
i=1. The first
keys smaller than this
child has keys all one.
smaller than this
key.
50 60 80
30 35
The keys in the
first i children are
smaller than the ith
key.
58 59
52 54
63
i=3. The first
three children
all have keys
smaller than
this key.
70 73
100
61 62
57
55 56
Multiway Trees
8
i=1. The last
three children
have keys all
larger than this
key.
i=2. The third & fourth
children all have keys
larger than this one.
i=3. The
last child
has keys
larger than
this key.
50 60 80
30 35
The keys in the last
n-i children are
larger than the ith
key.
58 59
52 54
63
70 73
100
61 62
57
55 56
Multiway Trees
9
Implementing a Multiway Tree

When nodes are not full, multiway search
trees waste storage.

Used primarily to store data on external
direct-access device such as a disk.

Also used as indices for non-ordered files
stored on direct access devices.
Multiway Trees
10
Implementing a Multiway Tree
Simple rule to improve operations:

Try & keep as many keys as possible in each
node (faster to search)
Multiway Trees
11
B-Trees

Reduces the time required for accessing
secondary storage

The nodes can become very large - typically
the size of a block
Multiway Trees
12
B-Trees
A B-tree of order n is a multi-way search tree
with the following properties:

The root has at least one key.

Each non-root node holds (k-1) keys and k pointers
to sub-trees where: n/2  k  n (ceiling function)

All leaves (terminal nodes) are at the same level.
Multiway Trees
13
B-Trees
According to these conditions, a B-Tree is
 always at least half full,
 has relatively few levels, and
 ‘perfectly’ balanced.
Multiway Trees
14
B-Trees
B3-Tree (non-root nodes at full and lowest capacity)
ptr1
key1
ptr2
ptr1
key1
ptr2
key2
ptr3
B5-Tree (non-root nodes at full and lowest capacity)
ptr1
key1
ptr2
key2
ptr3
ptr1
key1
ptr2
key2
ptr3
key3
Multiway Trees
ptr4
key4
ptr5
15
B-Trees
Multiway Trees
16
A B-Tree of Order 5
root
Minimum
Entries
16
42
21
58
76
81
93
The 5 Subtrees are not shown..
11
14
17
19
20
21
22
23
24
Maximum
Entries
Multiway Trees
17
What’s inside the node?
42
A very simplistic drawing.
Num
Entries
Key Data
K D
K D
entry
Entry
key
<key type>
data
<data type>
rightPtr <node pointer>
end entry
node
node
firstPtr
<pointer to node>
numEntries <integer>
entries
<array [1 .. M-1] of entry>
end node
Multiway Trees
18
Inserting a key in a B-Tree

Remember that all leaves in a B-Tree have to
be at the same level


This can present several challenges
There are three cases we need to consider
Multiway Trees
19
Case 1: B-tree Insertion
A key is placed in a leaf which still has some room.
12
5
8
13
12
5
7
8
need to preserve order, so move 8
down by 1, and insert the 7.
13 15
15
Case 2: B-tree Insertion
The leaf in which a key should be placed is full.
In this case, the leaf is split, creating a new leaf, and
half the keys are moved from the full leaf to the new
leaf.
Then, the last key of the new left leaf is moved to the
parent, and a pointers to the new left and right leaves is
placed in the parent as well.
Multiway Trees
21
12
2
5
7
8
13 15
we want to insert a 6
in this leaf
12
2 5
6
7
8
13 15
split the leaf, and add the new value
2 5
6
12
7
8
13 15
the last key of the old leaf is moved to the parent and a pointer to
the new leaf is added to the parent as well.
Case 3: B-tree Insertion
A special case arises if the root of the B-tree is full.
In this case, a new root and a new sibling of the existing
overfloding root is created.
This is the only case in which a B-tree increases in height.
Multiway Trees
23
6
2
3 4
5
7
8 10 11
12
20 30
1415 18 19
21 23 25 28
31 33 34 35
we want to insert a 13
6
2
3 4 5
12 20 30
7 8 10 11
18 19
21 23 25 28
31 33 34 35
split the node as case #2 13 14 15
the last key of the old leaf is moved to the parent and a pointer to the new
leaf is added to the parent as well.
Multiway Trees
24
6
2
3 4 5
12 15
20
7 8 1011
30
18 19
21 23 25 28
31 33 34 35
21 23 25 28
31 33 34 35
13 14
split the root - as there will be
more keys than slots
15
6
2
3 4 5
12
7 8 1011
add a new root, increasing 13 14
the tree’s depth
20 30
18 19
Multiway Trees
25
B-Tree Insert
A B-Tree grows from the bottom up
Multiway Trees
26
Deleting a Key from a B-Tree



Deleting a key is similar to inserting a key
More special cases
Merging rather than splitting nodes
Multiway Trees
27
Deleting a node - Simple Case

If, after deleting a key K, the leaf is at least
half full and only keys greater than K are
moved to the left to fill the hole.

Opposite of insertion
Multiway Trees
28
Deleting a node - Underflow
Case

If after deleting K, the number of keys in
the leaf is less than [m/2]-1, causing an
underflow

If there is a left or right sibling with the number
of keys exceeding the minimum ([m/2]-1), then
all keys from this leaf and this sibling are
redistributed between them by moving the
separator key from the parent to the leaf and
moving one key from the sibling to the parent
Multiway Trees
29
Deleting a node - Underflow
Case (Cont.)


If the leaf underflows and the number of
siblings is [m/2]-1, then the leaf and a sibling
are merged; the keys from the leaf, from its
sibling, and the separating key from the parent
are all put in the leaf, and the sibling node is
discarded.
If we merge siblings, and the parent is the root
with only one key, all the keys are combined in
to a new root, and the old root and old nodes
are discarded
Multiway Trees
30
16
3
1
2
8
22
5 6 7
25
18 20
23 24
27 37
13 14 15
Delete 6
16
3
1 2
8
5 7
13 14 15
Remove the 6, move the 7 over
22
18 20
25
23 24
27 37
16
3
1 2
8
22
5 7
25
18 20
23 24
27 37
23 24
27 37
13 14 15
Delete 7
16
3
1 2
13
22
5 8
18 20
25
14 15
Note the re-arrangement! Nodes can not be less than 50% full
16
3
1 2
13
22 25
5 8
18 20
23 24
27 37
14 15
Delete 8
16
3
1 2
5 13 1415
22
18 20
25
23 24
part 1. We merge 2 cells - deleting the empty one
27 37
16
3
1 2
22 25
5 13 1415
18 20
23 24
27 37
23 24
27 37
continuing...
3
1 2
5 1314 15
16 22 25
18 20
part 2. We merge 3 cells - deleting the empty ones
Multiway Trees
34
3
1 2
5 13 14 15
16 22 25
18 20
23 24
27 37
23 24
27 37
Delete 16
3
1 2
5 13 14
15 22 25
18 20
We adjust keys so that the root remains full
Multiway Trees
35
Another look at Deleting
Nodes from a B-Tree
We will look at 5 more examples to make
sure this is all clear.
Multiway Trees
36
21
78
21
42
11
14
42
45
63
74
85
97
Delete 11: We need to borrow from the right, and move the 21
down, and the 42 up. This insures all the B-Tree rules are still being
followed.
42
14
21
45
78
63
Multiway Trees
74
85
97
37
42
78
74
14
21
45
63
74
78
85
97
Delete 97: We need to borrow from the left, and move the 78 down,
and the 74 up. This insures all the B-Tree rules are still being
followed.
14
21
42
74
45
63
Multiway Trees
78
85
38
14
21
42
74
45
63
78
85
Delete 45: We need to “combine” 63, 74, 78, 85. This insures all
the B-Tree rules are still being followed.
42
14
21
63
Multiway Trees
74
78
85
39
42
14
21
63
74
78
85
Delete 63 & 78: Shift 74 and 85 down to positions 1 & 2 in the
node. This insures all the B-Tree rules are still being followed.
42
14
21
74
Multiway Trees
85
40
42
14
21
74
85
Finally, we delete 42: We need to remove the original root, and
combine all the remaining entries in to one node. This insures all
the B-Tree rules are still being followed.
42
14
21
21
74
85
14
74
85
Step 1: Copy 21 to the parent and delete from the leaf
Multiway Trees
41
21
14
74
85
14
21
74
85
Step 2: Combine 14, 21, 74, 85 and delete original root.
Multiway Trees
42
B* Trees




Variant of the B-Tree
Introduced by Donald Knuth, named by
Douglas Comer
All nodes required to bit at least 2/3 full
Two nodes are split in to three, rather than
one into two
Multiway Trees
43
B+ Trees





Enhancement of B-Trees
Allows all data to be accessed sequentially
References to data are made only from
leaves
Internal nodes are indexed for faster access
Leaves are indexed sequentially
Multiway Trees
44
B+ Trees

Internal Nodes store keys, pointers and a key
count

Leaves store keys, references to records in a
data file associated with the keys, and a
pointer to the next leaf
Multiway Trees
45
1 CD244
2 BF90
index set
BQ322
2 CF04
2 DR300 DR305
2 BQ322 CD123
2 BF90
3 AB203 AS09
BC26
DR300
BF130
3 CF04
DP102
CF05
2 CD244 CF03
leaves
A B+ tree, order of 4
Multiway Trees
46
Inserting in to a B+ tree

Case 1: There is still room in the leaf:



The keys still must be in order
No changes are made to the index set
Case 2: The leaf is full




The leaf is split
The new leaf node is included in the sequence
set
keys are distributed evenly
first key from the new node is copied (not
moved) to the parent
Multiway Trees
47
29
11
1
2
8
10
19
11
13
15
19
26
part of a B+ tree
29
1
2
6
11
19
6
8
10
11
13
15
19
26
after inserting a 6 - note the copy of the new key in the index set
Multiway Trees
48
Deleting from a B+Tree

Case 1: No underflow after removal



Insure keys are still in order
No changes made to index set
Case 2: Removal causes underflow
Keys from leaf & keys from sibling are redistributed
-or leaf is deleted and its keys are added to sibling



Need to update parent
May require changes to index set
Multiway Trees
49
29
1
2
6
11
19
6
8
10
11
13
15
19
26
Delete the 6 - note there is no change made to the index
set!
29
1
2
6
11
8
10
19
11
Multiway Trees
13
15
19
26
50
29
1
2
6
11
8
10
19
11
13
15
19
26
Delete the 2 - note we need to merge 2 leaves, and
update the index set
29
11
1
8
10
11
19
13
15
19
Multiway Trees
26
51
Multiway Trees
53
Prefix B+ Trees



A prefix B+ tree is a B+ tree in which the
chosen separators are the shortest
prefixes that allow us to distinguish two
neighboring index keys
Operations are generally the same as
“standard” B+ trees
Not much difference in execution times
between B+ and Prefix B+ trees

Largely of theoretical interest
Multiway Trees
54
a B+ tree
1 CD244
2 BF90
BQ322
2 CF04
2 DR300 DR305
2 BQ322 CD123
2 BF90
BF130
3 CF04
3 AB203 AS09 BC26
1 CD2
BQ
2 CF04
2 BF90
BF130
DR
2 DR300 DR305
2 BQ322 CD123
3 AB203 AS09 BC26
CF05 DP102
2 CD244 CF03
a Prefix B+ tree
2 BF
DR300
3 CF04
2 CD244 CF03
CF05 DP102
Bit Trees


Bit Trees take Prefix B+ Trees to the extreme
Rather than using bytes of data as
separators, we use bits of data
Multiway Trees
56
R-Trees



R-Trees are used to store spatial data
The “R” is for rectangle
Based on B-trees


A leaf in an R-Tree contains entries of the form
(rect, id) where rect = ([x0, y0], … [xn-1, yn-1])
and is an n-dimensional rectangle and id is a
pointer to the data file
A non-leaf has the form (rect, id) where rect is
the smallest rectangle encompassing all the
rectangles found in child
Multiway Trees
57
2-4 Trees




A special case of B Trees
Rather than storing an entire block of data,
we store 1, 2, or at most 3 elements.
Wastes space
Transform to binary trees

Uses two types of links


links between nodes representing keys belong to
same node
links representing regular parent-child links
Multiway Trees
58
Digital Search Trees

Forms a general tree based on the
symbols of which the keys are composed



Example: If keys are integers, each digit
position determines one of ten possible sons of
a given node
If the keys are alphabetic, each letter of the
alphabet determines a branch in the tree
Every leaf contains the special symbol
“eok” - signifying end of key
Multiway Trees
59
Digital Search Trees
Keys in this
tree:
180
185
186
195
1
8
9
0
5
6
5
eok
eok
eok
eok
Multiway Trees
60