File Structures
Download
Report
Transcript File Structures
File Structures
Dale-Marie Wilson, Ph.D.
Basic Concepts
Primary storage
Main memory
Inappropriate for storing database
Volatile
Secondary storage
Physical storage e.g. magnetic disks
Nonvolatile
Cheaper
Basic Concepts
2° storage organized into files
Each file has one or more records
Each record has one or more fields Process
User requests tuple e.g. SG37
DBMS maps logical record to physical record
Physical record moved to DBMS buffers
N.B. Physical record is unit of transfer
between disk and primary storage
Basic Concepts
Physical record typically consists of more than 1 logical
record
Logical record can correspond to more than 1 physical
record
Refer to physical record as blocks and pages
staffNo
lName
position
branchNo Page
SL21
White
manager
B005
SG37
Beech
Assistant
B003
SG14
Ford
Supervisor
B003
SA9
Howe
Assistant
B007
SG5
Brand
Manager
B003
SL41
Lee
Assistant
B005
1
2
Basic Concepts
File organization
Physical arrangement of data in file into records and
pages in 2° storage
Determines order records stored and accessed
Types
Heap (unordered)
• Records place on disk in no specific order
Sequential (ordered)
• Records ordered by value of specific field
Hash
• Records placement determined by hash function
Basic Concepts
Access method
Steps involved in storing and retrieving
records from file
Heap Files
Unordered files
Aka heap files
Simplest organization
Records placed in same order inserted
Linear search for retrieval
Insertion efficient; retrieval not efficient
Deletion process
Relevant page identified
Record marked as deleted
Page rewritten to disk
N.B. deleted record space not reused → performance
deterioration
Best suited for bulk loading data
Ordered Files
Ordered files
Aka sequential files
Sorted on field – ordering field
If ordering field = key → ordering key
Binary search for retrieval
Insertion and deletion problematic
• Need to maintain order of records
Rarely used unless 1° index exists
Hash Files
Hash files
Aka random/direct files
Hash function used to det. page address for storing record
• Chosen to provide most even distribution of records – min. collisions
• Examples:
• Folding – applying arithmetic function to hash field e.g. + 7
• Division-remainder – uses mod function to det. field value
• Each address corresponds to a page/bucket
• Each bucket has slots for multiple records – placed in order of arrival
Base field – hash field
If hash field = key → hash key
Collision
Hash function does not calculate unique address for 2 or more
records
Hash Files
Collision management techniques
Open addressing
Unchained overflow
Chained overflow
Multiple hashing
Collision Management
Open addressing
search performed to locate 1st
available slot
Same procedure for searching for record
Linear
• Record doesn’t exist if empty slot found before
record located
Collision Management
Unchained overflow
Overflow area maintained for collisions
Improves over open addressing by minimizing collisions
Bucket
Staff SA9 record
Staff SL21 record
0
Staff SG37 record
1
Staff SG5 record
Staff SG14 record
2
Bucket
Staff SL41 record
3
4
Collision Management
Chained overflow
Overflow area maintained for collisions
Uses synonym pointer
• Additional field that indicates whether collision occurred
• If collision, contains bucket address of overflow area
Bucket
Staff SA9 record
Staff SL21 record
0
0
Staff SG37 record
0
1
3
2
Staff SG5 record
Staff SG14 record
Bucket
Staff SG7 record
3
4
Collision Management
Multiple hashing
If
collision occurs, new hash function
performed
2nd hash function typically used to place
record in overflow area
Indexes
Index
Data structure that allows DBMS to locate particular
records in file more quickly
Similar to index in book
Main types of indices:
• Primary index
• Index a key field
• Clustering index
• File sequentially ordered on non-key field i.e. more than
record can correspond with index
• Secondary index
• Index defined on non-ordering field of data file
Indexes
File can have:
At
most 1 primary or 1 clustering index
Several secondary indices
Index may be:
Dense
• Index record for every search key value
Sparse
• Index record for some key search values
Indexed Sequential Files
Indexed sequential file
Sorted
data file with primary index
Has:
• Primary storage area
• Separate index
• Overflow area
Multilevel Index
Multilevel index
Index
treated as file and split into smaller
indices
Overcomes problems with large indices that
span several pages
B+ Trees
Search Tree
Used to guide search for a record, given the value of
one of its fields
Two types of Nodes
• Internal Nodes contain Key values and node pointers
• Leaf Nodes contain Key, Record-Pointer pairs
Degree/order
Max # children allowed
B-tree – balanced tree
Depth from root to leaf same for every leaf
B+ Trees
The structure of internal nodes in a B+ tree of order p:
Each internal node is of the form
<P1, K1, P2, K2, ..., Pq-1, Kq-1, Pq > , where q <= p , each Pi is a
tree pointer
Within each internal node, K1 < K2 < ... < Kq-1
For all values of X in the subtree pointed at by Pi , we have
Ki-1 < X < Ki for 1 < i < q , X < Ki for i=q, and Ki-1 < X for i=q
Each internal node has at most p tree pointers
Each internal node, except the root, has at least (p/2) tree
pointers. The root node has at least two tree pointers if it is
an internal node.
An internal node with q pointers, q <= p, has q-1 search
field values.
B+ Trees
B+ Trees
The structure of leaf nodes in a B+ tree of order
p:
Each leaf node is of the form
< <K1,Pr1>, <K2,Pr2>, ..., <Kq-1,Prq-1>, Pnext > ,
where q <= p , each Pri is a data pointer that
points to a record or block of records
Within each internal node, K1 < K2 < ... < Kq-1
Each leaf node, has at least (p/2) values
All leaf nodes are at the same level
The Pnext pointer points to the next leaf node in
the tree
• This give efficient sequential access to data
B+ Trees
B+ Trees
Insertion example for B+ Tree:
When you insert into a leaf node that is
full, you split and pass the rightmost
value up to the parent
When you insert into a full root, the root
splits and a new root is created with the
middle value from the child nodes
Otherwise, values are inserted into
openings at the lowest level
Appendix F
Assignment #7