Transcript 25-btrees

CSE 373
Data Structures and Algorithms
Lecture 25: B-Trees
Assumptions of Algorithm Analysis
Big-Oh analysis assumes all operations take the same amount of time.
Is this really true?
Disk-Based Data Structures
All data structures we have examined are limited to main memory; we have been assuming the data fits into main memory.
Counter-example: the transaction data of a bank is more than 1 GB per day.
It uses secondary storage (e.g., hard disks).
Operations: insert, delete, search.
Cycles* to access:
CPU registers: 1
Cache: tens
Main memory: hundreds
Disk: millions
*cycle: the time it takes to execute an instruction
Hard Disks
Large amount of storage, but slow access: identifying a particular block takes a long time.
Data is read or written in chunks ("pages") of 0.5 – 8 KB in size.
(Newer technology) Solid-state drives are 50 – 100 times faster, but still "slow".
Algorithm Analysis
The running time of disk-based data structures is measured in terms of:
Computing time (CPU)
Number of disk accesses
Regular main-memory algorithms that work one data element at a time cannot be "ported" to secondary storage in a straightforward way.
Principles
Almost all of our data structure is on disk.
Every time we access a node in the tree, it amounts to a random disk access.
How can we address this problem?
M-ary Search Tree
Suppose we devised a search tree with branching factor M:
M − 1 keys are needed to decide which branch to take.
A complete tree has height Θ(log_M n).
# of nodes accessed for a search: Θ(log_M n)
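A quick justification of that height bound (my own sketch, not on the slide): a complete M-ary tree of height h has roughly M^h leaves, so

```latex
% Informal height derivation for a complete M-ary tree holding n items:
M^h \approx n
\quad\Longrightarrow\quad
h \approx \log_M n = \frac{\log_2 n}{\log_2 M}
```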
B-Trees
Internal nodes store (up to) M − 1 keys.
Order property: the subtree between two keys x and y contains leaves with values v such that x ≤ v < y. Note the "≤".
Leaf nodes contain up to L sorted values/data items ("records").
Example (M = 7): an internal node with keys 3, 7, 12, 21 has children covering the ranges x < 3, 3 ≤ x < 7, 7 ≤ x < 12, 12 ≤ x < 21, and 21 ≤ x.
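To make the order property concrete, here is a small hypothetical Java helper (the name findChildIndex and the array layout are my own, not from the slides) that picks the subtree to descend into:

```java
// Given the sorted keys of an internal node, return the index of the child
// subtree that should contain 'value': child 0 covers v < keys[0], child i
// covers keys[i-1] <= v < keys[i], and the last child covers v >= last key.
static int findChildIndex(int[] keys, int numKeys, int value) {
    int i = 0;
    while (i < numKeys && value >= keys[i]) {
        i++;
    }
    return i;
}
```

For the node above with keys 3, 7, 12, 21, searching for 9 returns child index 2, matching the 7 ≤ x < 12 branch.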
B-Tree Structure Properties
Root (special case): has between 2 and M children (or could be a leaf).
Internal nodes: store up to M − 1 keys; have between ceiling(M/2) and M children, i.e., nodes are at least ½ full.
Leaf nodes: where the data is stored; contain between ceiling(L/2) and L data items, i.e., leaves are at least ½ full.
The tree is perfectly balanced!
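As a rough picture of what these properties mean for an implementation, here is a minimal Java sketch of the two node types; all class and field names are my own invention (a real implementation would pack keys and pointers into the bytes of one disk page), so treat this as illustrative only:

```java
// Illustrative B-Tree node layouts (not from the slides).
class InternalNode {
    int numChildren;     // between ceiling(M/2) and M (root: between 2 and M)
    int[] keys;          // up to M - 1 sorted keys
    Object[] children;   // numChildren pointers to InternalNode or LeafNode
}

class LeafNode {
    int numItems;        // between ceiling(L/2) and L
    int[] keys;          // up to L sorted keys
    Object[] records;    // the actual data items ("records")
}
```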
B-Tree: Example
B-Tree with M = 4 (# pointers in an internal node) and L = 5 (# data items in a leaf).
[Figure: a root with keys 12 and 44; internal children with keys 6; 20 27 34; and 50; and leaves holding the sorted data items from 1 up through 70 (with data objects such as "1, AB..", "2, GH..", "4, XY.." — which we'll ignore in slides). All leaves are at the same depth.]
Definition for later: "neighbor" is the next sibling to the left or right.
Disk Friendliness
Many keys are stored in each node:
Each node is one disk page/block.
All are brought into memory/cache in one disk access.
Internal nodes contain only keys; only leaf nodes contain actual data.
What is limiting you from increasing the number of keys stored in each node? The page size.
Exercise: Computing M and L
Exercise: If a disk block is 4000 bytes, key size is 20 bytes, pointer size is 4 bytes, and data/value size is 200 bytes, what should M (# of branches) and L (# of data items per leaf) be for our B-Tree?
Solve for M: an internal node holds M − 1 keys and M pointers, so it needs 20(M − 1) + 4M = 24M − 20 bytes.
24M − 20 ≤ 4000, so M ≤ 167.5, giving M = 167.
Solve for L: L = 4000 / 200 = 20.
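The same arithmetic as a tiny Java check (class and variable names are mine):

```java
// Recompute M and L for the exercise: block = 4000 B, key = 20 B,
// pointer = 4 B, record = 200 B.
public class ComputeML {
    public static void main(String[] args) {
        int block = 4000, key = 20, pointer = 4, record = 200;
        // Internal node: key*(M - 1) + pointer*M <= block
        //   =>  M <= (block + key) / (key + pointer)
        int m = (block + key) / (key + pointer);  // 4020 / 24 = 167
        // Leaf: record*L <= block
        int l = block / record;                   // 4000 / 200 = 20
        System.out.println("M = " + m + ", L = " + l);
    }
}
```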
B-trees vs. AVL trees
Suppose we have n = 10^9 (a billion) data items:
Depth of an AVL tree: log_2(10^9) ≈ 30
Depth of a B-Tree with M = 256, L = 256: ≈ log_{M/2}(10^9) = log_128(10^9) ≈ 4.3
(M/2 because nodes may be only half full)
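Spelling out the arithmetic behind those two depths (my own expansion of the slide's numbers):

```latex
\log_2 10^9 = 9\log_2 10 \approx 9 \times 3.32 \approx 30
\qquad
\log_{128} 10^9 = \frac{9\log_2 10}{\log_2 128} \approx \frac{29.9}{7} \approx 4.3
```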
Building a B-Tree with Insertions
[Figure sequence (M = 3, L = 3): starting from the empty B-Tree, the keys 3, 18, 14, 30, 32, 36, 15, 16, 12, 40, 45, and 38 are inserted one at a time. Each insertion that overfills a leaf splits the leaf in two and adds a new key to the parent, which can split in turn.]
Insertion Algorithm: The Overflow Step
[Figure (M = 5): a node that has grown too big is split into two nodes, each taking half of the keys K0 … K6, and the new node is added to the parent.]
Insertion Algorithm
1. Insert the key in its leaf in sorted order.
2. If the leaf ends up with L+1 items, overflow!
   Split the leaf into two nodes, each with half of the data.
   Add the new leaf to the parent.
   If the parent ends up with M+1 children, overflow!
3. If an internal node ends up with M+1 children, overflow!
   Split the internal node into two nodes, each with half of the keys.
   Add the new child to the parent.
   If the parent ends up with M+1 children, overflow!
4. If the root ends up with M+1 children, split it in two, and create a new root with two children.
   This makes the tree deeper!
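A rough Java sketch of steps 1–2 (inserting into a leaf and splitting on overflow), assuming the illustrative LeafNode layout sketched earlier with arrays sized to hold L + 1 entries; the method name and the handling of the parent are my own simplifications, and the recursive splitting of internal nodes is omitted:

```java
// Sketch of the leaf-level part of insertion.
// Returns null if no split was needed; otherwise returns the new right sibling,
// which the caller must add to the parent (the parent may overflow in turn).
static LeafNode insertIntoLeaf(LeafNode leaf, int key, Object record, int L) {
    // Step 1: insert the key into the leaf in sorted order.
    int pos = leaf.numItems;
    while (pos > 0 && leaf.keys[pos - 1] > key) {
        leaf.keys[pos] = leaf.keys[pos - 1];      // shift larger keys right
        leaf.records[pos] = leaf.records[pos - 1];
        pos--;
    }
    leaf.keys[pos] = key;
    leaf.records[pos] = record;
    leaf.numItems++;

    if (leaf.numItems <= L) {
        return null;                              // no overflow
    }
    // Step 2: overflow! Split the leaf, moving the upper half into a new node.
    LeafNode right = new LeafNode();
    right.keys = new int[L + 1];
    right.records = new Object[L + 1];
    int keep = (leaf.numItems + 1) / 2;           // left half keeps ceil((L+1)/2)
    for (int i = keep; i < leaf.numItems; i++) {
        right.keys[right.numItems] = leaf.keys[i];
        right.records[right.numItems] = leaf.records[i];
        right.numItems++;
    }
    leaf.numItems = keep;
    return right;                                 // caller adds this to the parent
}
```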
And Now for Deletion…
Delete(32) (M = 3, L = 3):
[Figure: 32 is removed from the leaf 32/36/38. The leaf still has at least ceiling(L/2) = 2 items, so there is no underflow; the separating key 32 in the parent is replaced by 36.]
Delete(15):
[Figure: removing 15 leaves its leaf with a single item — leaf not half full! Are we okay? "Are you using that 14? Can I borrow it?" The leaf borrows 14 from its neighbor 3/12/14 (a rotation), and the separating key in the parent is updated from 15 to 14.]
Delete(16):
[Figure: "Are you using that 14?" This time the neighboring leaf 3/12 has nothing to spare, so the underflowing leaf is merged with it into 3/12/14. The parent internal node is now left with too few children — "Are you using the node 18/30?" — so it borrows a child from its neighbor (a rotation at the internal level), and the key in the root changes from 18 to 36.]
Delete(14):
[Figure: 14 is removed from its leaf; the leaf still has enough items, so nothing else changes.]
Delete(18):
[Figure: removing 18 leaves its leaf under-full, and no neighbor can lend an item, so the leaf is merged with its neighbor. The merge leaves an internal node with too few children; its neighbor cannot lend either, so the internal nodes are merged as well. The root is then left with only one child, so that child becomes the new root and the tree gets one level shorter. Final tree: root 36/40 with leaves 3/12/30, 36/38, and 40/45.]
Deletion Algorithm: Rotation Step
[Figure (M = 5): a node that has become too small borrows a key/child from its neighbor through the parent, and the separating key in the parent is updated. This is a left rotation; a right rotation works symmetrically.]
Deletion Algorithm: Merging Step
[Figure (M = 5): the node is too small and its neighbor can't get any smaller, so the two nodes are merged into one; the parent loses a child and may underflow in turn.]
Deletion Algorithm
1. Remove the key from its leaf.
2. If the leaf ends up with fewer than ceiling(L/2) items, underflow!
   Try a left rotation.
   If not, try a right rotation.
   If not, merge, then check the parent node for underflow.
Deletion Slide Two
3. If an internal node ends up with fewer than ceiling(M/2) children, underflow!
   Try a left rotation.
   If not, try a right rotation.
   If not, merge, then check the parent node for underflow.
4. If the root ends up with only one child, make the child the new root of the tree.
   This reduces the height of the tree!
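A rough Java sketch of the leaf-level underflow handling in steps 1–2, again using the illustrative LeafNode layout from earlier; for brevity it only tries the left neighbor (the right-neighbor case is symmetric), and the helper name and the parent bookkeeping are my own:

```java
// Sketch of leaf underflow handling: borrow from the left neighbor if possible,
// otherwise merge into it. Returns true if a merge happened, which means the
// parent lost a child and must itself be checked for underflow.
static boolean fixLeafUnderflow(LeafNode leaf, LeafNode leftNeighbor, int L) {
    int min = (L + 1) / 2;                      // ceiling(L/2): minimum occupancy
    if (leaf.numItems >= min) {
        return false;                           // no underflow, nothing to do
    }
    if (leftNeighbor.numItems > min) {
        // Rotation: borrow the largest item from the left neighbor.
        leftNeighbor.numItems--;
        int borrowedKey = leftNeighbor.keys[leftNeighbor.numItems];
        Object borrowedRec = leftNeighbor.records[leftNeighbor.numItems];
        // Shift the leaf's items right and put the borrowed item first.
        for (int i = leaf.numItems; i > 0; i--) {
            leaf.keys[i] = leaf.keys[i - 1];
            leaf.records[i] = leaf.records[i - 1];
        }
        leaf.keys[0] = borrowedKey;
        leaf.records[0] = borrowedRec;
        leaf.numItems++;
        // The separating key in the parent must be updated to borrowedKey.
        return false;
    }
    // Merge: move everything from 'leaf' into the left neighbor.
    for (int i = 0; i < leaf.numItems; i++) {
        leftNeighbor.keys[leftNeighbor.numItems] = leaf.keys[i];
        leftNeighbor.records[leftNeighbor.numItems] = leaf.records[i];
        leftNeighbor.numItems++;
    }
    leaf.numItems = 0;
    // The parent has lost a child; the caller must check it for underflow.
    return true;
}
```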