Chap9. Multilevel Indexing and B-Trees

Download Report

Transcript Chap9. Multilevel Indexing and B-Trees

File Structures by Folk, Zoellick, and Ricarrdi
Chap 9. Multilevel Indexing
and B-Trees
서울대학교 컴퓨터공학부
객체지향시스템연구실
SNU-OOPSLA-LAB
김 형 주 교수
File Structure
SNU-OOPSLA Lab
1
Chapter Objectives(1)




Place the development of B-trees in the historical context of the
problems they were designed to solve
Look briefly at other tree structures that might be used on
secondary storage, such as paged AVL trees
Introduce multirecord and multilevel indexes and evaluate the
speed of the search operation
Provide an understanding of the important properties possessed
by B-trees, and show how these properties are especially well
suited to secondary storage applications
File Structure
SNU-OOPSLA Lab
2
Chapter Objectives(2)




Present the object-oriented design of B-trees
 define class BTreeNode and Btree
Explain the implementation of the fundamental operations on Btrees
Introduce the notion of page buffering and virtual B-trees
Describe variations of the fundamental B-trees algorithms, such
as those used to build B* trees and B-trees with variable-length
records
File Structure
SNU-OOPSLA Lab
3
Contents(1)
9.1 Introduction
9.2 Statement of the Problem
9.3 Indexing with Binary Search Trees
: AVL Trees, Paged Binary Trees, Problems with Paged Tress
9.4 Multilevel Indexing
9.5 B-Trees
9.6 Example of Creating a B-Tree
9.7 An Object-Oriented Representation of B-Trees
: Class BTreeNode , Class BTree
File Structure
SNU-OOPSLA Lab
4
Contents(2)
9.8
B-Tree Methods Search, Insert, and Others
9.9
B-Tree Nomenclature
9.10 Formal Definition of B-Tree Properties
9.11 Worst-case Search Depth
9.12 Deletion, Merging, and Redistribution
9.13 Redistribution During Insertion
9.14 B* Trees
9.15 Buffering of Pages : Virtual B-Trees
9.16 Variable-Length Records and Keys
File Structure
SNU-OOPSLA Lab
5
9.1 Introduction : Invention of the B-Tree
Introduction: The Invention of the B-tree

1972 Acta Infomatica : R. Bayer and E. McCreight (at Boeing
Corporation) “Organization and Maintenance of Large Ordered Indexes”

1979 : ‘de facto’ standard for database index

D.Comer “The Ubiquitous B-tree” ACM Computing Survey
Why the name B-tree?
 Balanced, Bushy, Broad, Boeing, Bayer

Retrieval, Insertion, Deletion time

= log K I ( I : no of indexes in file, K : no of indexes in a page)

Excellent for dynamically changing random access files
File Structure
SNU-OOPSLA Lab
6
9.2 Statement of the Problem
Statement of the Problem

Problems in an index on secondary storage

Searching the index must be faster than binary searching

In binary search:
15 items - 4 seeks, 1,000 items - 9.5 seeks

Insertion and deletion must be as fast as search
 inserting
a key may involve moving many other
keys in some file structures
File Structure
SNU-OOPSLA Lab
7
9.3 Indexing with Binary Search Trees
Binary Search Tree(1)
 Advantages
 Data
may not be physically sorted
 Good
performance on balanced tree
 Insert
cost = search cost
 Disadvantages
 In
out-of-balance binary tree, more seeks
are required
File Structure
SNU-OOPSLA Lab
8
9.3 Indexing with Binary Search Trees
Binary Search Tree(2)

Sorted list of keys

AX, CL, DE, FB, FT, HN, JD, KF, NR, PA, RF, SD, TK, YJ
KF At most 4 seeks/one record
Binary search tree
representation
FB
HN
CL
AX
File Structure
DE
SD
FT
WS
PA
JD
NR
SNU-OOPSLA Lab
RF
TK
YJ
9
9.3 Indexing with Binary Search Trees
Internal Representation of Binary Tree
 With
RRN(fixed length record) or pointer
key left right
ROOT
9
0 FB 10 8
1 JD
2 RF
3 SD 6 13
4 AX
5 YJ
6 PA 11 2
File Structure
7 FT
SNU-OOPSLA Lab
key left right
8 HN 7 1
9 KF 0 3
10 CL 4 12
11 NR
12 DE
13 WS 14 5
14 TK
10
9.3 Indexing with Binary Search Trees
Unbalanced Binary Tree
KF
FB
CL
AX
DE
SD
HN
FT
WS
PA
JD
NR
RF
TK
YJ
LV
LA NP
- At most 9 seeks/one record
MB
- Worst case : sequential search
ND
NK
File Structure
SNU-OOPSLA Lab
11
9.3 Indexing with Binary Search Trees
AVL Tree(1)

A height-balanced k tree ( HB(k) tree)


Allowable difference in the height of any two sub-tree
is k
AVL Tree : HB(1) Tree

G.M. Adel’son, Vel’skii, E.M. Landis

Maintenance overhead is needed
Performance


Given N keys, worst-case search => 1.44 log2(N+2)
cf. Completely balanced AVL tree : worst-case search => log2(N+1)
File Structure
SNU-OOPSLA Lab
12
9.3 Indexing with Binary Search Trees
AVL Tree(2)
(a) AVL Trees
(b) Non - AVL Trees
X
X
X
File Structure
SNU-OOPSLA Lab
X
13
9.3 Indexing with Binary Search Trees
AVL Tree(3)

Binary tree structure that is balanced nature with
respect to the height of subtree

Definition

An empty tree is height balanced

If T is a nonempty binary tree with TL and TR as its
left and right subtrees, then T is height balanced iff
(1) TL and TR are height balanced and (2) |hL-hR|<1
where hL and hR are the heights of TL and TR,
respectively
File Structure
SNU-OOPSLA Lab
14
9.3 Indexing with Binary Search Trees
AVL Tree(4)

BalanceFactor, BF(T), of a node T in a binary tree
is hL-hR where hL and hR are the height of the left
and right subtree of T

For any node in tree T in AVL tree,
BF(T) should be one of “ -1,

0,
1”
If BF(T) is -2 or 2, then proper rotation is carried
out in order to get balance
File Structure
SNU-OOPSLA Lab
15
9.3 Indexing with Binary Search Trees
AVL Tree(5)
New Identifier
After Insertion
MARCH
0
MAR
New Identifier
After Insertion
MAY
No Rebalancing needed
No Rebalancing needed
-1
MAR
0
MAY
New Identifier
NOVEMBER
After Rebalancing
After Insertion
-2
MAR
-1
MAY
File Structure
0
MAY
RR
0
MAR
0
NOV
SNU-OOPSLA Lab
0
NOV
16
9.3 Indexing with Binary Search Trees
AVL Tree(6)
New Identifier
After Insertion
AUGUST
+1
MAY
+1
MAR
No Rebalancing needed
0
NOV
0
AUG
File Structure
SNU-OOPSLA Lab
17
9.3 Indexing with Binary Search Trees
AVL Tree(7)
New Identifier
After Insertion
APRIL
+2
MAY
+2
MAR
After Rebalancing
LL
+1
MAY
0
NOV
+1
AUG
0
AUG
0
APR
0
NOV
0
MAR
0
APR
File Structure
SNU-OOPSLA Lab
18
9.3 Indexing with Binary Search Trees
AVL Tree(8)
New Identifier
After Rebalancing
After Insertion
+2
MAY
JANUARY
-1
AUG
0
APR
0
MAR
LR
0
NOV
+1
MAR
0
AUG
0
APR
-1
MAY
0
JAN
0
NOV
0
JAN
File Structure
SNU-OOPSLA Lab
19
9.3 Indexing with Binary Search Trees
AVL Tree(9)
New Identifier
No Rebalancing needed
After Insertion
+1
MAR
DECEMBER
-1
AUG
-1
MAY
+1
JAN
0
APR
0
NOV
0
DEC
File Structure
SNU-OOPSLA Lab
20
9.3 Indexing with Binary Search Trees
AVL Tree(10)
New Identifier
+1
MAR
JULY
-1
AUG
-1
MAY
0
JAN
0
APR
0
DEC
File Structure
No Rebalancing needed
After Insertion
0
NOV
0
JUL
SNU-OOPSLA Lab
21
9.3 Indexing with Binary Search Trees
AVL Tree(11)
New Identifier
FEBRUARY
RL
+2
MAR
-2
AUG
+1
MAR
-1
MAY
+1
JAN
0
APR
After Rebalancing
After Insertion
0
NOV
0
JUL
-1
DEC
0
DEC
0
JAN
+1
AUG
0
APR
-1
MAY
0
FEB
0
NOV
0
JUL
0
FEB
File Structure
SNU-OOPSLA Lab
22
9.3 Indexing with Binary Search Trees
AVL Tree(12)
JUNE
-1
DEC
0
APR
LR
+2
MAR
0
FEB
0
NOV
-1
JUL
0
JAN
0
MAR
+1
DEC
-1
MAY
-1
JAN
+1
AUG
After Rebalancing
After Insertion
New Identifier
+1
AUG
0
APR
0
FEB
-1
MAY
-1
JUL
0
JUN
-1
NOV
0
JUN
File Structure
SNU-OOPSLA Lab
23
9.3 Indexing with Binary Search Trees
AVL Tree(13)
OCTOBER
-1
JAN
0
APR
RR
-1
MAR
+1
DEC
+1
AUG
After Rebalancing
After Insertion
New Identifier
0
FEB
-1
JUL
-2
MAY
0
JUN
-1
NOV
0
OCT
File Structure
SNU-OOPSLA Lab
24
9.3 Indexing with Binary Search Trees
AVL Tree(14)
0
JAN
+1
DEC
+1
AUG
0
APR
File Structure
0
MAR
0
FEB
0
NOV
-1
JUL
0
JUN
SNU-OOPSLA Lab
0
MAY
0
OCT
25
9.3 Indexing with Binary Search Trees
AVL Tree(15)
New Identifier
No Rebalancing needed
After Insertion
-1
JAN
SEPTEMBER
+1
DEC
+1
AUG
0
APR
-1
MAR
0
FEB
-1
NOV
-1
JUL
0
JUN
0
MAY
-1
OCT
0
SEP
File Structure
SNU-OOPSLA Lab
26
9.3 Indexing with Binary Search Trees
AVL Tree : Rebalancing(1)

Rebalancing is carried out using four different kinds of
rotations
 LL when new node Y is inserted in the left subtree of
the left subtree of A
 LR when new node Y is inserted in the right subtree
of the left subtree of A
 RR when new node Y is inserted in the right subtree
of the right subtree of A
 RL when new node Y is inserted in the left subtree of
the right subtree of A
File Structure
SNU-OOPSLA Lab
27
9.3 Indexing with Binary Search Trees
AVL Tree : Rebalancing(2)
A
Insert Y
LL
File Structure
LR
SNU-OOPSLA Lab
RL
RR
28
9.3 Indexing with Binary Search Trees
AVL Tree : Rebalancing(LL)
Balanced Subtree
Unbalanced following
insertion
+1
A
+2
A
0
B
AR
0
B
h+2
rotation type
LL
BL
BL
0
B
BL
AR
0
A
h+2
h
BR
Balanced Subtree
BR
AR
BR
Height of BL increase to h+1
(BL < B < BR < A < AR)
File Structure
SNU-OOPSLA Lab
29
9.3 Indexing with Binary Search Trees
AVL Tree : Rebalancing(RR)
Unbalanced following
insertion
Balanced Subtree
0
B
AL
BL
rotation type
RR
-2
A
-1
A
0
B
BR
0
A
0
B
AL
h+2
BR
Balanced Subtree
h+2
BL
BR
Al
BL
Height of BR increase to h+1
(AL < A < BL < B < BR)
File Structure
SNU-OOPSLA Lab
30
9.3 Indexing with Binary Search Trees
AVL Tree : Rebalancing(LR)
Balanced Subtree
+1
A
0
B
Unbalanced following
insertion
rotation type
Balanced Subtree
0
C
LR(a)
+1
A
0
B
-1
B
0
A
0
C
(B < C < A)
File Structure
SNU-OOPSLA Lab
31
9.3 Indexing with Binary Search Trees
AVL Tree : Rebalancing(LR)
Unbalanced following
insertion
Balanced Subtree
0
B
AR
h+2
BL
CR
h+2
-1
A
AR
+1
C
h
h-1 CL
0
C
0
B
-1
B
h
0
C
BL
rotation type
LR(b)
+2
A
+1
A
Balanced Subtree
CL
BL
CL
CR
AR
h
CR
(BL < B < CL < C < CR < A < AR)
File Structure
SNU-OOPSLA Lab
32
9.3 Indexing with Binary Search Trees
AVL Tree : Rebalancing(LR)
Unbalanced following
insertion
Balanced Subtree
0
B
AR
h-1 CL
h+2
BL
CR
0
C
h+2
0
A
+1
B
-1
B
h
0
C
BL
rotation type
LR(c)
+2
A
+1
A
Balanced Subtree
AR
-1
C
CL
BL
CL
CR
AR
h
CR
RL a, b and c are symmetric to LR a, b and c
File Structure
SNU-OOPSLA Lab
33
9.3 Indexing with Binary Search Trees
Paged Binary Tree(1)



Page

A unit of disk I/O for handling seek and transfer of disk data

Typically, 4k, 8k, 16k ...
Paged Binary Tree

Divide a binary tree into pages and then store each page in a
block of contiguous locations on disk.

If every page holds 7 keys, 511 nodes(keys) in only three seeks
Performance : # of seeks

A completely full balanced tree : log2 (N+1)

A completely full paged tree : log(k+1) (N+1)
(k : # of keys hold in a single page)
File Structure
SNU-OOPSLA Lab
34
9.3 Indexing with Binary Search Trees
Paged Binary Tree(2)
File Structure
SNU-OOPSLA Lab
35
9.3 Indexing with Binary Search Trees
The Problem with Paged Trees

Only valid when we have the entire set of keys in
hand before the tree is built

Problems due to out of balance


How to select a good separator

How to group keys

How to guarantee the maximum loading
B-tree provides a solution for above problems!
File Structure
SNU-OOPSLA Lab
36
9.3 Indexing with Binary Search Trees
Paged Binary Tree (Out of balance)
random input sequence : C S D T A M P I B W N G U R K E H O L J Y Q Z F X V
D
C
S
A
M
U
I
B
G
E
T
P
K
H J
N
L
R
O Q
W
V
Y
X Z
F
File Structure
SNU-OOPSLA Lab
37
9.4 Multilevel Indexing : A Better Approach to Tree Indexes
Multilevel Indexing



Approach as simple index record
 limited on the number of keys allowed
Approach as multirecord index
 consists of a sequence of simple index records
 binary search is too expensive
Approach as multilevel index
 reduced the number of records to be searched
 speed up the search
<example>
80Mbytes file of 8,000,000 records
10-byte keys
File Structure
SNU-OOPSLA Lab
38
9.4 Multilevel Indexing : A Better Approach to Tree Indexes
Example of Multilevel Indexing
4th level index
1
1 2
8
a single index record with 8 keys
3rd level index
1
2
1 2 ...
:
:
8
100
:
:
801
8 index records to index the largest keys
in the 800 second-level records
800
2nd level index
1
2
:
:
1 2 ...
100
:
:
9 901
:
:
:
:
:
800 7901
800 index records with 80,000 keys
choose one of the keys in each index record as the
key of that whole record
1000
8000
Lowest level index is an index to data file and its reference fields are
record addresses in the data file
File Structure
SNU-OOPSLA Lab
39
Multi-level Indexing(3)

How can we insert new keys into the multilevel index?
 The index records in some level might be full
 The several levels of indexes might be rebuilt
 Overflow chain may be helpful, but still ugly

Multi-level index structure is not strong in dynamic data
processing applications

B-tree will give you the right solution!
File Structure
SNU-OOPSLA Lab
40
9.5 B-Trees:Working up from the bottom
B-Trees: Working up from the bottom

Bayer and McCreight, 1972, Acta Infomatica

Build trees upward from the bottom instead of
downward from the top

Each node of B-tree is an index record which
consists of “key-reference” pairs

The order of B-tree: the max number of key-reference pairs

Every index record should have at least half of the order
File Structure
SNU-OOPSLA Lab
41
Sample B-Tree
D P
A C D
A
C
File Structure
D
D
T
I
M P
I
M
T
S
D
P
SNU-OOPSLA Lab
S
T
D
42
9.6 Example of Creating a B-Tree
Splitting & Promoting(1)


Splitting
 Creation of two nodes out of one because the
original node becomes overfull
 Result in the need to promote a key to a higherlevel node to provide an index separating the two
new nodes
Promotion of a key
 Movement of a key from one node into a higherlevel node when split occurs
File Structure
SNU-OOPSLA Lab
43
9.6 Example of Creating a B-Tree
Splitting & Promoting(2)
*A *B * C *D *E * F *G *
Initial leaf of a B-tree with a page size of seven
Insert J key
*A *B * C *D *
*
*
*
*E *F *G*J *
*
*
Splitting the leaf to accommodate the new J key
(continued....)
File Structure
SNU-OOPSLA Lab
44
*
9.6 Example of Creating a B-Tree
Splitting & Promoting(3)
J *
D
*A *B * C *D *
*
*
*
*
*
*E *F *G* J *
*
*
*
*
*
Promotion of the E key into a root node
File Structure
SNU-OOPSLA Lab
45
*
9.6 Example of Creating a B-Tree
Insertion in B-tree(1)
 Input Sequence
: CSDTAMPIBWNGURKEHOLJYQZFXV
C D S
D
T
T
Insertion of C, S, D, T
into the initial page
A C D
S
T
Insertion of A causes node to split and the
largest key in each leaf node(D and T)to be
placed in the root node
File Structure
SNU-OOPSLA Lab
46
9.6 Example of Creating a B-Tree
Insertion in B-tree(2)
D P
A C D
I
T
M P
S
T
M and P are inserted into the rightmost leaf node,
then insertion of I causes it to split
File Structure
SNU-OOPSLA Lab
47
9.6 Example of Creating a B-Tree
Insertion in B-tree(3)
D M P W
A B C D
G
I
M
N P
S
T W
Insertions of B,W,N, and G into leaf nodes causes
another split and the root is now full
File Structure
SNU-OOPSLA Lab
48
9.6 Example of Creating a B-Tree
Insertion in B-tree(4)
D M P W
A B C D
G
I
M
N P
S
T U W
Insertion of U proceeds without incident, but R would have
to be inserted into the rightmost leaf, which is full
File Structure
SNU-OOPSLA Lab
49
9.6 Example of Creating a B-Tree
Insertion in B-tree(5)
P W
D M P
A B C D
G I M
T W
N P
R S T
U W
Insertion of causes the rightmost leaf node to split, insertion into
the root to split and the tree grows to level three
File Structure
SNU-OOPSLA Lab
50
9.6 Example of Creating a B-Tree
Insertion in B-tree(6)
P Z
D
I M P
T Z
Q R S T
A B C D
E G M I
J K L M
U W Y Z
N O P
Insertions of K,E,H,O,L,J,Y,Q, and Z, continue with another node split
File Structure
SNU-OOPSLA Lab
51
9.6 Example of Creating a B-Tree
Insertion in B-tree(7)
I
D G I
A B C D
E F G
P Z
M P
J K L M
H
I
T X Z
Q R S T
N O P
Y Z
U V W X
Insertions of F, X, and V finish the insertion of the alphabet
File Structure
SNU-OOPSLA Lab
52
Insertion in B-trees

Major components of insertion
 Split the node
 Promote the middle key
 Increase the height of the B-tree

Insertion may touch no more than 2 nodes per level

Insertion cost is strictly linear in the height of the tree
File Structure
SNU-OOPSLA Lab
53
9.7 An Object-Oriented Representation of B-Trees
Class BTreeNode(1)



Represent B-Tree nodes in memory
 B-tree is an index file associated with a data file
Specified in btnode.h of Appendix I
The template BTreeNode class based on the
SimpleIndex template class
Public methods
SimpleIndex Class
BTreeNode Class
File Structure
Insert, Remove, Clear, Search
Print, NumKeys
Public methods
Insert, Remove, LargestKey
Split, Pack, Unpack
SNU-OOPSLA Lab
54
9.7 An Object-Oriented Representation of B-Trees
Class BTreeNode(2)

Members

Public methods:





insert : simply calls SimpleIndex::Insert and then check for
overflow
remove a key, split and merge nodes
search : inherited from SimpleIndex class(works perfectly well)
pack/unpack : manage the difference between the memory
and the disk representation of BTreeNode objects
Protected member

store the file address of the node and the minimum and
maximum number of keys
File Structure
SNU-OOPSLA Lab
55
Template <class keyType>
class BTreeNode: public SimpleIndex <keyType>
{public
BTreeNode(int maxKeys, int unique = 1);
int Insert (const keyType key, int recAddr);
int Remove(const keyType key, int recAddr = -1);
int LargestKey ();
int Split (BTreeNode<ketType>*newNode);
int Pack (IOBuffer& buffer);
int Unpack(IOBuffer& buffer);
protected
int MaxBKeys;
int Init();
friend class Btree<keyType>;
}
File Structure
SNU-OOPSLA Lab
56
9.7 An Object-Oriented Representation of B-Trees
Class BTree






Uses in-memory BTreeNode objects
adds the file access portion
enforces the consistent size of the nodes
specified in btree.h of Appendix I
Methods
 Create, Open, Close a B-Tree
 Search, Insert, Remove key-reference pairs
Protected area
 Fetch(transfer nodes from disk to memory)
 Store(transfer nodes back to disk)
 root node, height of the tree, file of index records
 BTNode **Node:used to keep a collection of tree nodes in
memory and reduce disk access
File Structure
SNU-OOPSLA Lab
57
Template <class keyType>
class Btree {public:
Btree(int order, int keySize=sizeof(keyType), int unique=1);
int Open (char * name, int mode);
int Create (char * name, int mode);
int Close ();
int Insert (const keyType key, const int recAddr);
int Remove (const ketType key, const int recAddr = -1);
int Search (const keyType key, const int recAddr = -1);
protected typedef BTreeNode<keyType> BTNode;
BTNode * FindLeaf (const ketType key);
BTNode * Fetch(const int recaddr);
int Store (BTNode *); BTNode Root; int Height; int Order;
BTNode ** Nodes;
RecordFile<BTNode> BtreeFile;
}|
File Structure
SNU-OOPSLA Lab
58
9.8 B-Tree Methods Search, Insert, and Others
Page Structure
2
D M P W
0
3
A B C D
8
G
I
M
KEYCOUNT
N
5
S
P
KEY array
T U W
CHILD array
Page 2
4 D M P W 0
Page 3
3 G I M
3
8
5
content of PAGE 2, 3
File Structure
SNU-OOPSLA Lab
Nil Nil Nil Nil
59
9.8 B-Tree Methods Search, Insert, and Others
Algorithm for Search

Searching procedure

iterative

work in two stages
operating alternatively on entire pages (Class BTree)
and then within pages (Class BTreeNode)

Step1: Loading a page into memeory

Step 2: Searching through a page, looking for the key
along the tree until it reaches the leaf level
File Structure
SNU-OOPSLA Lab
60
9.8 B-Tree Methods Search, Insert, and Others
Search and FindLeaf method
• Specifications of Search and FindLeaf methods(Fig 9.18)
Template <class keyType>
int BTree<keyType>::Search(const keyType key, const int recAddr)
template <class keyType>
BTreeNode<keyType>* BTree<keyType>::FindLeaf(const keyType key)
Search method
recAddr = btree.Search(‘L’)
call FindLeaf(‘L’);
Search key in the leaf node, and then
if key exists, return the data file address of record with key ‘L’
otherwise, return -1
FindLeaf method
Search down to leafNode, beginning of the root
return the address of leafNode
File Structure
SNU-OOPSLA Lab
61
9.8 B-Tree Methods Search, Insert, and Others
Algorithm for Insertion(1)


Observations of Insertion, Splitting, and Promotion

proceed all the way down to the leaf level

after finding the insertion location at the leaf level, the work
proceeds upward from the bottom
Iterative procedure as having three phases

Search to the leaf level, using FindLeaf method

Insertion, overflow detection, and splitting on the
upward path

Creation of a new root node, if the current root was split
File Structure
SNU-OOPSLA Lab
62
9.8 B-Tree Methods Search, Insert, and Others
Algorithm for Insertion(2)


With no redistribution
(Step 1) Locate node on bottom most level in which to insert
record. Location is determined by key search.

(Step 2) If vacant record slot is available, insert the record so that
key sequencing is maintained. Then, update the pointer
associated with the record (Pointer is null for level 0 records).
Then Stop!

(Step 3) If no vacant record slot exists, identify median record. All
records and pointers to the left of the median records are stored
in one node (the original) and those to the right are stored in
another node(the new node).
File Structure
SNU-OOPSLA Lab
63
9.8 B-Tree Methods Search, Insert, and Others
Algorithm for Insertion(3)

(Step 4) If the topmost node was split, create a new topmost node
which contains the median record identified in Step 3, filled with
pointers to the original and split nodes. Update the root node to point
to the new topmost node. Then Stop!

(Step 5) If topmost node was not split, prepare to insert median record
identified in Step 3 and a pointer to the new node (created in Step 3).
Then Goto Step 2.

Note : Step 4 makes B-tree increase in height by 1 level
B-trees have 70% occupancy(like B+-trees) on an average

File Structure
SNU-OOPSLA Lab
64
9.8 B-Tree Methods Search, Insert, and Others
Insertion Example
Insert 3
Insert 1
Insert 19,4,20
3
split
3 4 19 20
4 20
2
0
0
1 3 4
0
Insert 9
Insert 13,16
4 20
0
split
2
13 16 19 20
1
File Structure
1
4 16 20
2
1 3 4
19 20
1 3 4
0
SNU-OOPSLA Lab
9 13 16
1
19 20
3
65
9.8 B-Tree Methods Search, Insert, and Others
Create, Open, and Close


Specified in btree.tc of Appendix I
Method Create


Method Open


writes the empty root node into the file BTreeFile so that its
first record is reserved for that root node
opens BTreeFile and load the root node into memory from
the first record in the file
Method Close

simply stores the node into BTreeFile and close it
File Structure
SNU-OOPSLA Lab
66
9.9 B-Tree Nomenclature
B-Tree Nomenclature

Be aware that terms are not uniform in the literature

Definitions are also quite different

In fact, there are a number of B-tree variations

This text book uses “B tree” for B+ tree by other
books

In this book, “B+ tree” is B+ tree with a linked list of
sorted data blocks
File Structure
SNU-OOPSLA Lab
67
Root
A
B
Data Block
File Structure
C
G
E
F
Data Block
Other Book
Our Book
B-Tree
N/A
H
Data Block
SNU-OOPSLA Lab
I
Data Block
68
Root
A
B
Data Block
File Structure
C
C
G
I
E
F
G
Data Block
Other Book
Our Book
B+-Tree
B-Tree
H
Data Block
SNU-OOPSLA Lab
I
Data Block
69
Other Book
B+-Tree
with
Linked List
Root
A
B
Data Block
File Structure
C
C
G
I
E
F
G
Data Block
Data Block
SNU-OOPSLA Lab
H
Our Book
B+-Tree
I
Data Block
70
Another aspect (node structures)
Homogeneous Trees :B-Tree in other text

Homogeneous trees - leaf nodes and interior nodes have
same structures; Each contains both data pointers and tree
pointers

Average search length less for homogeneous trees,
because some searches may conclude before
reaching a leaf node
File Structure
SNU-OOPSLA Lab.
71
B-Tree in other text
23 pointers to 23 records in data file
37 64
45 53
8 23
1 7
1420
27 36
70 80
38 40
File Structure
85 91
50 52
SNU-OOPSLA Lab.
88
95
60
72
Another Aspect (node structures)
Heterogeneous Trees :B+-Tree in other text
 Heterogeneous
trees - leaf nodes and
interior nodes have different structures
File Structure
SNU-OOPSLA Lab.
73
B+-Tree in other text
23 pointers to 23 records in data file
37 64
45 53
14 23
1 7 8 1420
232736
85 91
6470
9195
808588
373840 455052 5360
File Structure
SNU-OOPSLA Lab.
74
Comparison of B-Tree and B+-Tree in
other text
Topic
B-Tree
Algorithm Complexity Rather complexity
for insertion
Retrieval
efficiency
Storage
efficiency
1-pass structure
creation algorithms
File Structure
less efficiency
(B-tree is tall &
spindle)
slightly more
efficient
(is less space)
rather complex
SNU-OOPSLA Lab.
B+-Tree
more simple
more efficient
B+-tree is short
& bushy
less efficient
(is more space)
simple
75
Comparison of B-Tree and B+-Tree in
other text

Historical Note
 B-tree
: Bayer & McCreight
 B+-tree: Comer
 B*-tree : Knuth, B-trees with 67% minimum occupancy
÷-trees : B+-trees with 67% minimum occupancy
B
File Structure
SNU-OOPSLA Lab.
76
9.10 Formal Definition of B-Tree Properties
Formal Definition of B-Tree Properties
** The properties of a B-tree of order m
1. Every page has a maximum of m descendants
2. Every page, except for the root and the leaves, has
at least ceiling of (m/2) descendants
3. The root has at least two descendants (unless it is a leaf)
4. All the leaves appear on the same level
5. The leaf level forms a complete, ordered index of the
associated data file
File Structure
SNU-OOPSLA Lab
77
9.11 Worst-Case Search Depth
Worst-case Search Depth(1)

Search depth : depth of the tree

Worst case

When every page of the tree has only the minimum #
of descendants

A maximal height with a minimum breadth
File Structure
SNU-OOPSLA Lab
78
9.11 Worst-Case Search Depth
Worst-case Search Depth(2)
B-TREE WITH ORDER m
level
minimum # of
descendants
1(root)
2
3
...
...
d
2
2 x [m/2]
2 x [m/2]2
2 x [m/2]d-1

For a tree with N keys in
its leaves,
N >= 2 x [m/2]d-1
 Upper bound for the depth
of a B-tree ---> d
d <= 1 + log[m/2](N/2)
e.g.. Btree order = 512 keys, given 1,000,000 keys
d <= 3.37
at most 3 depth ( 3 disk I/O )
File Structure
SNU-OOPSLA Lab
79
9.12 Deletion, Merging, and Redistribution
Deletion, Redistribution, and Concatenation

Ensure that the B-tree properties are maintained after
a deletion

Algorithm (with redistribution and cocatenation)
 1. If the key to be deleted is not in a leaf,
swap it with its immediate successor, which is in a leaf
(might be redistributed or concatenated!)

2. Delete the key
File Structure
SNU-OOPSLA Lab
80
9.12 Deletion, Merging, and Redistribution
Deletion Algorithm(Cont’d)

3. If underflow occurs
(the leaf now contains one too few keys),

3.1 If the left or right sibling has more than the
minimum number of keys , redistribute

3.2 Otherwise, concatenate the two leaves and
the median key from the parent into one leaf

3.3 Apply above step 3 to the parent as if it
were deleted
File Structure
SNU-OOPSLA Lab
81
9.12 Deletion, Merging, and Redistribution
Redistribution





Occur when a sibling has more than the minimum # of keys
Idea: Move keys between siblings
Result in a change in the key in the parent page
Does not propagate : strictly local effects
How many keys should be moved?
 Not necessarily fixed

Even distribution is desired
File Structure
SNU-OOPSLA Lab
82
9.12 Deletion, Merging, and Redistribution
Concatenation(merge)





Occur in case of underflow
Combining the two pages and the key from the
parent page ==> make a single full page
Reverse the splitting
Concatenation must involve demotion of keys : may
cause underflow in the parent page
The effects propagate upward
File Structure
SNU-OOPSLA Lab
83
9.12 Deletion, Merging, and Redistribution
e.g. Deletion(1)
Figure A
I
P Z
M P
D G I
A B C D
E F G
File Structure
J K L M
H
I
T X Z
Q R S T
N O P
SNU-OOPSLA Lab
Y Z
U V W X
84
9.12 Deletion, Merging, and Redistribution
e.g. Deletion(2)
I
A B C D
P Z
M P
D G I
A B D
E F G
File Structure
Removal of key C from figure A:
Change occurs only in leaf node
J K L M
H
I
T X Z
Q R S T
N O P
SNU-OOPSLA Lab
Y Z
U V W X
85
9.12 Deletion, Merging, and Redistribution
e.g. Deletion(3)
I
D F
A B C D
E F G
File Structure
Result of deleting P from figure A
: P changes to O in the second
level and the root
O Z
M O
I
J K L M
H
I
T X Z
Q R S T
N O
SNU-OOPSLA Lab
Y Z
U V W X
86
9.12 Deletion, Merging, and Redistribution
e.g. Deletion(4)
I
A B C D
E F G
P Z
M P
I
D
I
File Structure
Result of deleting H from figure A :
Removal of H caused an underflow,
and two leaf nodes were merged
J K L M
T X Z
Q R S T
N O P
SNU-OOPSLA Lab
Y Z
U V W X
87
9.13 Redistribution During Insertion
Redistribution during Insertion

A way to improve storage utilization

A way of avoiding the creation of new pages

Tend to make an efficient B-tree in terms of space
utilization
 Worst case : around 50%
 Average case : 67 ~ 69%
 With redistribution during insertion : over 85%
File Structure
SNU-OOPSLA Lab
88
9.13 Redistribution During Insertion
0
M
DELETE J
(No change) 1
QU
DH
AC
E F
I J K
DELETE M
(Swap with N)
NOP
RS
VWXY Z
0
MN
1
QU
DH
AC
E F
File Structure
I K
MN O P
SNU-OOPSLA Lab
RS
VWXY Z
89
9.13 Redistribution During Insertion
0
DELETE R
(Redistribution)
N
1
QUW
DH
AC
I K
E F
OP RS UV V WXY Z
DELETE A
(Concatenation)
0
N
1
QW
D H
underflow
File Structure
AC
E F
CD EF
I K
SNU-OOPSLA Lab
O P
S U V X Y Z
90
9.13 Redistribution During Insertion
NOW UNDERFLOW
PROPAGATE UPWARD!
0
N
1
underflow
QW
H
CDEF
I K
HEIGHT OF THE TREE
DECREASED
CDEF
File Structure
I K
O P
S U V X Y Z
HN QW
O P
SNU-OOPSLA Lab
S U V
X Y Z
91
9.14 B* Trees
B* Trees



Knuth, 1973, Addison-Wesley
Use redistribution operation during insertion
Perform two-to-three split
 When split, the page has at least one sibling that is also
full
 After split, the pages are about 2/3 full
 The page with at least (ceiling of (2m -1)/3) keys
c.f. remember (ceiling of (m/2)) -1 keys
File Structure
SNU-OOPSLA Lab
92
9.14 B* Trees
B* Tree(Cont’d)
Original tree:
A
A CD F H K
P RS T V X
Two-to-three-split:
FR
Insert B
A BC D
File Structure
H KMP
SNU-OOPSLA Lab
S TV X
93
9.15 Buffering of Pages:Virtual B-Trees
Buffering of B-tree pages: Virtual B-Trees

B-tree size >> main memory (in practice)
 Need buffering pages of B-tree
 Better to keep the root page in the main memory

Buffer replacement algorithm:
LRU + page height weighting factor
 Keep pages of top some levels all the time in main
memory

File Structure
SNU-OOPSLA Lab
94
9.15 Buffering of Pages:Virtual B-Trees
Placement of Information associated with the Key

How to store associated information

In a data and index mingled file
 Once
the key is found, no more disk access
required

In a separate file
 Larger
number of keys per a page
Higher order, shallower tree
File Structure
SNU-OOPSLA Lab
95
9.16 Variable-Length Records and Keys
Variable Length Records and Keys

A B-tree with variable length keys
 No single, fixed order
 A different criterion for over/underflow condition
 Using
max/min number of bytes
(c.f. max/min number of keys)

Key promotion mechanism
 Shortest
variable-length keys are promoted in
preference to longer ones
 Pages with the largest numbers of descendants up
high in the tree
File Structure
SNU-OOPSLA Lab
96
Let’s Review !!!
9.1 Introduction
9.2 Statement of the Problem
9.3 Indexing with Binary Search Trees
: AVL Trees, Paged Binary Trees, Problems with Paged Tress
9.4 Multilevel Indexing
9.5 B-Trees
9.6 Example of Creating a B-Tree
9.7 An Object-Oriented Representation of B-Trees
: Class BTreeNode , Class BTree
File Structure
SNU-OOPSLA Lab
97
Let’s Review !!!
9.8
B-Tree Methods Search, Insert, and Others
9.9
B-Tree Nomenclature
9.10 Formal Definition of B-Tree Properties
9.11 Worst-case Search Depth
9.12 Deletion, Merging, and Redistribution
9.13 Redistribution During Insertion
9.14 B* Trees
9.15 Buffering of Pages : Virtual B-Trees
9.16 Variable-Length Records and Keys
File Structure
SNU-OOPSLA Lab
98