Chap9. The B+ Tree Family and Indexed Sequential File Access

Download Report

Transcript Chap9. The B+ Tree Family and Indexed Sequential File Access

File Structures by Folk, Zoellick, and Ricarrdi
Chap.10 Indexed Sequential File
Access and Prefix B+ Trees
서울대학교 컴퓨터공학부
객체지향시스템연구실
SNU-OOPSLA-LAB
교수 김 형 주
File Structures
SNU-OOPSLA Lab.
1
Chapter Objectives






Introduce indexed sequential files
Describe operations on a sequence set of blocks that maintains records
in order by key
Show how an index set can be built on top of the sequence set to
produce an indexed sequential file structure
Introduce the use of a B-tree to maintain the index set, thereby
introducing B+ trees and simple prefix B+ trees
Illustrate how the B-tree index in a simple prefix B+ tree can be
of variable order, holding a variable number of separators
Compare the strengths and weakness of B+ trees, simple prefix
B+ trees, and B-trees
File Structures
SNU-OOPSLA Lab.
2
Contents










10.1
10.2
10.3
10.5
10.6
10.7
10.8
10.9
10.10
10.11
Indexed Sequential Access
Maintaining a Sequence Set
Adding a Simple Index to the Sequence Set
The Contents of the Index: Separators Instead of Keys
The Simple Prefix B+ Tree Maintenance
Index Set Block size
Internal Structure of the Index Set Blocks: A variable-order B-Tree
Loading a Simple Prefix B+ Tree
B+ Trees
B-Trees, B+ Trees, and Simple Prefix B+ Trees in Perspective
File Structures
SNU-OOPSLA Lab.
3
10.1 Indexed Sequential Access

Two alternative views


indexed : records are indexed by keys
 no good for sequential processing
sequential : records can be accessed sequentially
 not good for access, insert, delete records in random order

In chap 9, we see B tree and now we want derive Indexed
+ Sequential ==> B+ tree with help of the idea of the
sequence set

Sequential file ==> Indexed Sequential file ==> B+ tree

Indexed-Sequential file = Indexed Sequential Access Method (ISAM)
File Structures
SNU-OOPSLA Lab.
4
Overview : ISAM File
R
a
61 
b
e
11 20
C D
secondary memory
101 
10 20 50 61
d
1 3 10
A B A
main memory
f
30 40 45
D C A
part description records
primary key
PART #
PART-Type
g
51 55 57
A D B
50
D
c
h
65 70 101
E B C
60
B
i
120150
A D
61
A
Example : Indexed sequential structure (when using overflow chain)
File Structures
SNU-OOPSLA Lab.
5
Overview : ISAM File (2)

Compared with ordered relative file
 Ordered
on a key, like ordered relative file
 Can
be accessed by an index, structure that
contains information on where a record with a
given key is located (usually intermingled with
blocks of records)
 Tree
search of an index replaces binary search of
ordered relative files
File Structures
SNU-OOPSLA Lab.
6
Indexed Sequential Files

Block types



Index
Block
Index Block
Primary Data Block
Overflow Data Block
Data
Block
File Structures
Data
Block
Data
Block
...
SNU-OOPSLA Lab.
Overflow Overflow Overflow
Data
Data
Data
Block
Block
Block
7
Indexed Sequential Files :Retrieval

Retrieve parts_file where part# = 60



Primary Key search : nodes R,a,b,g accessed
3 primary block access, 1 overflow block accessed
Retrieve parts_file where part# = 101 and part_type =
C (overqualified)



Primary Key search : nodes R,a,c,h accessed
3 primary block accesses
Block “access”es are really block fetches. The blocks may
be in main memory buffers so that actual block accesses
aren’t performed
File Structures
SNU-OOPSLA Lab.
8
Indexed Sequential Files : Retrieval(2)

Retrieve part_file where part#= 101 or
part_type = C
 Scan
6
: node R,d,e,f,g,h,I accessed
primary block “accesses”
 overflow
File Structures
block “accesses”
SNU-OOPSLA Lab.
9
1
2
3
R
a
61 
b
c
101 
10 20 50 61
d
1 3 10
A B A
e
11 20
C D
f
30 40 45
D C A
g
51 55 57
A D B
50
D
h
65 70 101
E B C
60
B
i
120150
A D
61
A
Retrieval of Indexed sequential structure
File Structures
SNU-OOPSLA Lab.
10
Indexed Sequential Files : Insertion



(Step 1) Locate data level node via key search in
which to insert record
(Step 2) Determine if record is to be inserted into
primary block or overflow in order to maintain primary
key order sequence of records
(Step 3a) If record is to be placed in primary block
and block is not full, shift all records with highervalued primary keys to the right and place new record
into vacated slot. STOP.
File Structures
SNU-OOPSLA Lab.
11
Indexed Sequential Files : Insertion(2)

(Step 3b) If record is to be placed in primary block
and block is full, place record of the block with
highest valued primary key so that it is the first record
on the overflow chain (move one record to the
overflow chain) . Primary block is now not full. Go to
Step 3a.

(Step 4) If record is to be placed in overflow chain,
place record in appropriate position on overflow chain
so that primary key sequencing is maintained. STOP.
File Structures
SNU-OOPSLA Lab.
12
insert i 120 150
A
insert 130
D
Yields(step 3a)
E
insert 180
A
Yields(step 4)
C
insert 110
120 130 150
F
E
D
110 120 130
F
insert 170 Yields(step 4)
G
D
120 130 150
A
Yields(step 3b)
E
A
E
110 120 130
F
A
E
180
C
150
180
D
C
150
170
180
D
G
C
Example : Insertion
File Structures
SNU-OOPSLA Lab.
13
Indexed Sequential Files : Deletion

(Step 1) Locate record to delete by primary key
search

(Step 2) If record is in primary block, free its slot and
shift all records in the block with higher-valued
primary keys to the left. STOP

(Step 3) If record is in overflow, remove it from
overflow chain. STOP
File Structures
SNU-OOPSLA Lab.
14
remove 120
yields
A
F
remove 150 yields
D
110 130
E
110 130
F
E
150
170
180
D
G
C
170
180
G
C
Example : Deletion
File Structures
SNU-OOPSLA Lab.
15
Indexed Sequential Files : Update

(Step 1) Locate record to update by primary key search

(Step 2) If primary key was not altered, simply replace stored
copy of record with the updated copy. STOP.

(Step 3) If primary key was altered, delete(remove) the located
record. Insert updated record just as if were a new record. STOP.
File Structures
SNU-OOPSLA Lab.
16
Indexed Sequential Files :
Reorganization

Reading records out of old file in the primary key
order

Building new indexed sequential structure with no
records in overflow. (file creation)

Reorganization is really hectic !!!

Definitions

Loading Factor = average number of records per node

Initial Loading Factor = Loading Factor when file is created
File Structures
SNU-OOPSLA Lab.
17
main memory
secondary memory
45 70 
3 11 30 45
1 3
A B
120

10 11
20 30
40 45
10 120
150
A C
D D
C A
C A
D
51 57 61 70
50 51
D A
55 57
D B
60 61
45
B A
65 70
E B
Example : Reorganization
File Structures
SNU-OOPSLA Lab.
18
Indexed Sequential Files : Creation

(Step 1) Using a specified initial loading factor LF, pack LF
records per node and create the data level of the new indexed
sequential file structure. (Last node on data level will have from
1 to LF records in it)

(Step 2) Build consecutive levels of index nodes until a level is
reached where there is only a single node. The root node is
created and is placed on the next higher level blocks of index
are to be packed as full as possible. Stop.
File Structures
SNU-OOPSLA Lab.
19
10.2 Maintaining a Sequence Set
10.2 Maintaining a Sequence Set

A sequence set (similar terms: ordered file, sequential set)

a set of records in physical order by key
Sequence set + Simple Index ===> Simple Prefix B+ Tree

The Use of Blocks


We want to rule out sorting and resorting of the sequence set
 insertion of records into block : overflow -> split
 deletion of records : underflow -> redistribution, concatenation
 costs for avoidance of sorting
 more space overhead (internal fragmentation in a block)
-> redistribution in place of splitting, two-to-three splitting
 the maximum guaranteed extent of physical sequentiality is
within a block
-> choice of block size
File Structures
SNU-OOPSLA Lab.
20
10.2 Maintaining a Sequence Set
Block splitting & concatenation(1)
Block1
ADAMS...BAIRD...BIXBY...BOONE...
Block2
BYNUM...CARSON...COLE...DAVIS...
Block3
DENVER...ELLIS...
(a)Initial blocked sequence set
Block1
ADAMS...BAIRD...BIXBY...BOONE...
Block2
BYNUM...CARSON...CARTER...
Block3
DENVER...ELLIS...
Block4
COLE...DAVIS...
File Structures
(b)Sequence set after insertion of CARTER record
- block 2 splits, and the contents are divided
between blocks 2 and 4
SNU-OOPSLA Lab.
21
(continued....)
10.2 Maintaining a Sequence Set
Block splitting & concatenation(2)
Block1
ADAMS...BAIRD...BIXBY...BOONE...
Block2
BYNUM...CARSON...CARTER...
Block3
Block4
Available
for use
COLE...DENVER...ELLIS...
(c)Sequence set after deletion of DAVIS record
- block 4 is less than half full, so it is concatenated
with block3
File Structures
SNU-OOPSLA Lab.
22
10.2 Maintaining a Sequence Set
Issue: Choice of Block Size



Block : basic unit for I/O
The maximum guaranteed extent of physical sequentiality
Two considerations

several blocks should be in RAM at once

e.g. for split or concatenation, at least two blocks in RAM
reading/writing a block should not be very long



Cluster :- the minimum number of sectors allocated at a time
- the minimum size of a file
Reasonable suggestion: block size == cluster size

can access a block without seeking within a cluster
File Structures
SNU-OOPSLA Lab.
23
10.3 Adding a Simple Index to the Sequence Set(1)

An efficient way to locate some specific block containing a
particular record, given the record’s key


build index records containing the key for the last record in a block
Possible Index Structures

simple index
binary search of the index
 works well while the entire index is in RAM


B+ tree

B-tree index + a sequence set with actual records
File Structures
SNU-OOPSLA Lab.
24
Sequence of blocks
ADAMS
-BERNE
BOLEN
-CAGE
1
2
CAMP
-DUTTON
3
EMBRY
-EVANS
FABER
-FOLK
4
5
FOLKS
-GADDIS
6
Simple index
Key
BERNE
CAGE
DUTTON
EVANS
FOLK
GADDIS
File Structures
Block Number
1
2
3
4
5
6
SNU-OOPSLA Lab.
25
10.4 The Content of the Index :Separators Instead of Keys

Need not to have actual keys in the index set

Our real need is separators

Separator - distinguishes between 2 blocks

among many candidates, shortest separator is preferable

there is not always a unique shortest separator
File Structures
SNU-OOPSLA Lab.
26
Separators between blocks in the sequence set
Separators:
BO
ADAMS
-BERNE
1
CAM
BOLEN
-CAGE
2
E
CAMP
-DUTTON
3
F
FOLKS
EMBRY
-EVANS
FABER
-FOLK
4
5
FOLKS
-GADDIS
6
A list of potential separators
CAMP
-DUTTON
File Structures
DUTU
DVXGHSJF
DZ
E
EBQX
ELEEMOSYNARY
SNU-OOPSLA Lab.
EMBRY
-EVANS
27
10.5 The Simple Prefix B+ Tree

Index like B-tree + blocks of sequential sets

The use of simple prefixes


prefixes of the keys rather than actual keys

contains shortest separators

N separators -> N+1 children
Properties of B+ tree

B-tree like Index

Sequential data set

Indexed-sequential file
File Structures
SNU-OOPSLA Lab.
28
A B-tree index set for the sequence set,
forming a simple prefix B+ tree
E
Index
set
BO
ADAMS
-BERNE
1
File Structures
CAM
BOLEN
-CAGE
2
F
CAMP
-DUTTON
3
FOLKS
EMBRY
-EVANS
FABER
-FOLK
4
5
SNU-OOPSLA Lab.
FOLKS
-GADDIS
6
29
10.6 Simple Prefix B+ Tree Maintenance (1)

Changes localized to single blocks in the sequence set

deletion without concatenation, redistribution


e.g. delete EMBRY, FOLKS
insertion without splitting

e.g. insert EATON
File Structures
SNU-OOPSLA Lab.
30
Deletion of the EMBRY and FOLKS
from the sequence set
E
BO
ADAMS
-BERNE
1
File Structures
CAM
BOLEN
-CAGE
2
F
CAMP
-DUTTON
3
FOLKS
ERVIN
-EVANS
FABER
-FOLK
4
5
SNU-OOPSLA Lab.
FROST
-GADDIS
6
31
10.6 Simple Prefix B+ Tree Maintenance(2)

Changes involving multiple blocks in the sequence set



split, concatenation : propagate to index set
change the number of blocks in the sequence set
 change the number of separators
 change the index set
insertion with splitting


e.g. overflow in block1
split
block1, block7 with separator AY
deletion with concatenation/redistribution

e.g. underflow in block2 concatenation
File Structures
SNU-OOPSLA Lab.
block2, block3
32
An insertion into block 1 causes a split and
the consequent addition of block 7
BO
AY
ADAMS
-AVERY
1
AYERS
-BERNE
7
File Structures
E
CAM
BOLEN
-CAGE
2
F
CAMP
-DUTTON
FOLKS
ERVIN
-EVANS
FABER
-FOLK
4
5
3
SNU-OOPSLA Lab.
FROST
-GADDIS
6
33
A deletion from block 2 causes underflow and
the consequent concatenation of blocks 2 and 3
E
AY
ADAMS
-AVERY
1
File Structures
BO
AYERS
-BERNE
7
F
BOLEN
-DUTTON
2
FOLKS
ERVIN
-EVANS
FABER
-FOLK
4
5
SNU-OOPSLA Lab.
FROST
-GADDIS
6
34
Bottom up procedure to handle changes
** insert/delete in the sequence set as if there is no B-tree
index set
if blocks are split
a new separator must be inserted into the index set
if blocks are concatenated
a separator must be removed from the index set
if records are redistributed between blocks
the value of a separator in the index set must be changed
else
no propagation to index set
File Structures
SNU-OOPSLA Lab.
35
10.7 Index Set Block Size


size of an index node for the index set
== size of a data block in the sequence set
Reasons for using a common block size



the best size for sequence set is usually the best for the
index set
a common block size makes it easier to implement a
buffering scheme
the index set blocks and sequence set blocks are often
mingled within the same file

to avoid seeking between separate files while accessing the
simple prefix B+ tree
File Structures
SNU-OOPSLA Lab.
36
10.8 Internal Structure of Index Set Blocks:
A variable-order B-tree


Variable-length shortest separator

possibility of packing them into a node

separator index (fixed length) : means of performing binary
searches on a list of variable-length entities
A simple prefix B+ tree with a variable order

not maximum order -> not minimum depth

decisions about when to split, concatenate, or redistribute become
more complicated
File Structures
SNU-OOPSLA Lab.
37
separators
As, Ba, Bro, C, Ch, Cra, Dele, Edi, Err, Fa, File
AsBaBroCChCraDeleEdiErrFaFile
00 02 04 07 08 10 13 17 20 23 25
Variable-length separators and corresponding index
Separator count
Total length of separators
11 28
AsBaBroCChCraDeleEdiErrFaFile
Separators
00 02 04 07 08 10 13 17 20 23 25
Index to separators
B00 B01 ..... B10 B11
Relative block
numbers
Structure of an index set block
File Structures
SNU-OOPSLA Lab.
38
10.9 Loading a Simple Prefix B+ Tree(1)

One way is successive insertions and splits

The other way is using separate loading process
 working from a sorted file and then
 place the records into sequence set block
 if one block is full
determine the separator and insert it into the index set
block
 place the records into new sequence set block

File Structures
SNU-OOPSLA Lab.
39
10.9 Loading Simple Prefix B+ Tree(2)

Advantages to using a separate loading process
 the output can be written sequentially
 simple than succcessive insert & split
 performance during loading
can load 100% utilization (c.f. insert & split produces
blocks between 67~80% full)
 creating a degree of spatial locality

File Structures
SNU-OOPSLA Lab.
40
10.10 B+ Trees

Contains copies of actual keys

cf. simple prefix B+ tree : separator
ALWAYS/ASPECT/BETTER
00 06 12
Next separator: CATCH
Next sequence
set block:
ACCESS
-ALSO
File Structures
ALWAYS
-ASK
ASPECT
-BEST
SNU-OOPSLA Lab.
BETTER
-CAST
CATCH
-CHECK
41
10.11 B-Tree, B+ Tree and Simple Prefix B+Tree
in Perspective

Shared characteristics






Paged index structures : broad and shallow
Height-balanced
Growing from bottom-up
Possible to obtain greater storage efficiency through twothree block splitting, concatenation, redistribution
Can be implemented as virtual tree structures
Can be adapted for variable-length records
File Structures
SNU-OOPSLA Lab.
42
B-Trees


General Characteristics

Information can be found at any level of the B-tree

B-tree take up less space than B+ tree (B+ tree
additional space)
has
Ordered sequential access

Through in-order traversal of the tree(virtual tree is
necessary)

Separated record files(B-tree has only pointers) are not
workable
File Structures
SNU-OOPSLA Lab.
43
B+ Trees


General Characteristics

Separation of index set and sequence set

Separators : copies of keys

Shallower tree than B-tree
Ordered sequential access

Sequence set is truly linear

efficient access to records in order by key
File Structures
SNU-OOPSLA Lab.
44
Simple Prefix B+ Trees

General Characteristics

Separators : smaller than actual keys

Shallower than B+ Trees

Separator compression, variable-length field management
overhead

Ordered sequential access

Sequence set is truly linear (same as B+ Tree)
File Structures
SNU-OOPSLA Lab.
45
Let’s Review !!!










10.1 Indexed Sequential Access
10.2 Maintaining a Sequence Set
10.3 Adding a Simple Index to the Sequence Set
10.5 The Contents of the Index: Separators Instead of Keys
10.6 The Simple Prefix B+ Tree Maintenance
10.7 Index Set Block size
10.8 Internal Structure of the Index Set Blocks: A variable-order B-Tree
10.9 Loading a Simple Prefix B+ Tree
10.10 B+ Trees
10.11 B-Trees, B+ Trees, and Simple Prefix B+ Trees in Perspective
File Structures
SNU-OOPSLA Lab.
46