Transcript CS2007Ch14

COSC 2007
Data Structures II
Chapter 15
External Methods
Topics


Indexing
B tree



2
Insertion
deletion
B+ tree
External Data Structure

Data structures are not always stored in
the computer memory




Sometimes, we need to store, maintain
and perform operations on our data
structures entirely on disk

3
Volatile
Has a limited capacity
Fast, which makes it relatively expensive
Called external data structures
External Data Structure

Problems: disks are much slower than
memory



4
Disk access time usually measured in milliseconds
Memory access time measured in nanoseconds
So the same data structures that work
well in memory may be really awful on
disk
External Data Structure

Two types of files

Sequential files
Access to records done in a strictly sequential
manner
 Searching a file using sequential access takes
O(n) where n is the number of records in file to
be read


Random files

5
Access to records done strictly by a key look up
mechanism
A Look At External Storage
Indexing

In order to gain fast random access to
records in block, maintain index
structure

7
Index on largest/smallest key value
Indexing

An index is much like an index in a book



Thus, an indexed file consists of two parts


8
In a book, an index provides a way to quickly look up info
on a particular topic by giving you a page number which
you then use to go directly to the info you need
In an Indexed file, the index accepts a key value and gives
you back the disk address of a block of data containing the
data record with that key
The index
The actual file data
Indexing An External File

An index (or index file)


Used to locate items in an external data file
Contains an index record for each record in the data file
Indexing An External File

An index record has two parts



A key contains the same value as the search key of its
corresponding record in the data file
A pointer shows the number of the block in the data file
that contains the data record
Advantages of an index file



An index file can often be manipulated with fewer block
accesses than would be needed to manipulate the data
file
Data records do not need to be shifted during insertions
and deletions
Allows multiple indexing
Indexing An External File

A simple scheme for organizing the index file

Store index records sequentially
B-Trees




12
Almost all file systems on almost all computers use
B-Trees to keep track of which portions of which files
are in which disk sectors.
B-Trees are an example of multiway trees.
In multiway trees, nodes can have multiple data
elements (in contrast to one for a binary tree node).
Each node in a B-Tree can represent possibly many
subtrees.
s
2-3 Trees

A 2-node, which has two children


Must contain a single data item whose search key si
greater than the left child’s and less than the right
child’s
Must contain two data items whose search keys
satisfy certain condition
A leaf node contain either one of two data items
S
L
13
>s
A 3-node, which has three children


<S
<S
>S, <L
>L
Nodes
Figure 15-11
a) A node with two children; b) a node with three children; c) a node with m
children
m-Way Trees



An m-way tree is a search tree in which each node
can have from zero to m subtrees.
m is defined as the order of the tree.
In a non-empty m-way tree:






Each node has 0 to m subtrees.
Given a node with k<m subtrees, the node contains k subtrees
(some of which may be null) and k-1 data entries.
The keys are ordered, key1<=key2<=key3<=….<=keyk-1.
The key values in the first subtree are less than the key values in
the first entry.
A binary search tree is an m-way tree of order ?.
A 2-3 tree is an m-way tree of order ?
15
An m-way tree
K1
K2
K3
Keys
Subtrees
Keys < K1
K1 <=Keys < K2
K2 <=Keys < K3
A 4-way Tree
16
Keys >= K3
B-Trees

A B-Tree is an m-way tree with the following additional
properties:





There are four basic operations for B-Trees:




17
The root is either a leaf or it has 2….m subtrees.
All internal nodes have at least m/2 non-null subtrees and at most
m nonnull subtrees.
All leaf nodes are at the same level; that is, the tree is perfectly
balanced.
A leaf node has at least m/2 -1 and at the most m-1 entries.
insert (add)
delete (remove)
traverse
search
A B-tree of Order 5* (m=5)
42
Node with
minimum
entries (2) 16 21
Root
Four keys, five
subtrees
58 76 81 93
Node with
maximum
entries (4)
11 14
17 19 20
21 22 23 24
45 52
63 65 74
78 79
*Min # of subtrees is 3 and max is 5;
*Min # of entries is 2 and max is 4
18
85 87
94 97
B-Tree Search



19
Search in a B-tree is a generalization of search
in a 2-3 tree.
Perform a binary search on the keys in the
current node. If the search key is found, then
return the record. If the current node is a leaf
node and the key is not found, then report an
unsuccessful search.
Otherwise, follow the proper branch and
repeat the process.
Insertion


B-tree insertion takes place at a leaf node.
Steps: locate the leaf node for the data being
inserted.


if node is not full (max no. of entries) then insert data in
sequence in the node.
When leaf node is full, we have an overflow
condition.




Insert the element anyway (temporary violate tree conditions)
Split node into two nodes
Each new node contains half the data
middle entry is promoted to the parent (which may in turn
become full!)
 B-trees grow in a balanced fashion from the bottom
20
Follow Through An Example



Given a B-Tree structure of order m=5.
Insert 11, 21, 14, 78, and 97.
Suppose I now add the following keys to the tree: 85, 74, 63,
42, 45, 57.
21
B-tree Deletion

Deletion is done similarly



If the number of items in a leaf falls below the minimum,
adopt an item from a neighboring leaf
If the number of items in the neighboring leaves are also
minimum, combine two leaves. Their parent will then
lose a child and it may need to be combined with its
neighbor
This combination process should be recursively
executed up the tree until:


22
Getting to the root
A parent has more than the minimum number of children
B-Trees
The steps for deleting 73
B-Trees
The steps for deleting 73
B-Trees
The steps for deleting 73
B-Trees
The steps for deleting 73
B+ Trees


B-tree only (effectively) gives you random
access to data
B+ tree gives you the ability to access data
sequentially as well




27
Internal nodes do not store records, only key values to guide the
search.
Leaf nodes store records or pointers to the records.
A leaf node has a pointer to the next sibling node. This allows
for sequential processing.
An internal node with 3 keys has 4 pointers. The 3 keys are the
smallest values in the last 3 nodes pointed to by the 4 pointers.
The first pointer points to nodes with values less than the first
key.
Sample B+-Tree
Los Angeles
Detroit
Baltimore


28
Redwood City
Chicago
Detroit
Los Angeles
Redwood City
B+-tree with n=3
interior nodes: no more than 3 pointers,
but at least 2
SF