Overview of Storage and Indexing

Transcript Overview of Storage and Indexing

Overview of Storage and Indexing
Yanlei Diao
UMass Amherst
Feb 13, 2007
Slides Courtesy of R. Ramakrishnan and J. Gehrke
1
DBMS Architecture
Query Parser
Query Rewriter
Query Optimizer
Query Executor
Lock Manager
Access Methods
Log Manager
Buffer Manager
Disk Space Manager
DB
2
Data on External Storage

Disks: Can retrieve random pages at (almost) fixed cost
 But reading several consecutive pages is much faster than reading
them in random order.

Tapes: Can only read pages in sequence
 Slower but cheaper than disks; used for archival storage

Page: Unit of information read from or written to disk
 Size of page: DBMS parameter, 4KB, 8KB

Disk space manager:
 Abstraction: a collection of pages.
 Allocate/de-allocate a page.
 Read/write a page.

Page I/O:
 Pages read from disk and pages written to disk
 Dominant cost of database operations
3
Access Methods

Access methods: routines to manage various disk-based
data structures.
 Files of records
 Various kinds of indexes

File of records:
 Important abstraction of external storage in a DBMS!
 Record id (rid) is sufficient to physically locate a record

Indexes:
 Auxiliary data structures
 Given a value in the index search key field(s), find the record
ids of records with this value
4
Buffer Management

Architecture:
 Data is read into memory for processing
 Data is written to disk for persistent storage
Buffer manager stages pages between external
storage and main memory buffer pool.
 Access method layer makes calls to the buffer
manager.

5
Access Methods
Many alternatives exist, each ideal for some
situations, and not so good for others:



Heap (unorder) files: Suitable when typical access
is a file scan retrieving all records.
Sorted Files: Best if records are retrieved in order
of the sorting attribute(s), or only a `range’ of
records is needed.
Indexes: Data structures to organize records via
trees or hashing.
•
•
Like sorted files, they speed up searches for a subset of
records, based on values in certain (“search key”) fields
Updates are much faster than in sorted files.
6
Indexes

An index on a file speeds up selections on the search key
field(s) for the index.



Any subset of the fields of a relation can be the search key for an
index on the relation.
Search key is not the same as key (minimal set of fields that
uniquely identify a record in a relation).
An index contains a collection of data entries, and
supports efficient retrieval of all data entries k* with a
given key value k.
 Data entry (in an index) versus data record (in a file).
 Given a data entry k*, we can find record(s) with the key k in at
most one disk I/O. (Details soon …)
7
B+ Tree Indexes
Non-leaf
Pages
Leaf
Pages
(Sorted by search key)

Leaf pages contain data entries:
• Leaf pages are chained using prev & next pointers
• Data entries are sorted by the search key value
8
B+ Tree Indexes
Non-leaf
Pages
Leaf
Pages
(Sorted by search key)

Non-leaf pages have index entries; only used to direct searches
index entry
P0
K 1
P1
K 2
P 2
K m Pm
9
Example B+ Tree
Data entries in leaf
nodes are sorted.
Root
17
Entries <= 17
5
2*



3*
Entries > 17
27
13
5*
7* 8*
14* 16*
22* 24*
30
27* 29*
33* 34* 38* 39*
Equality selection: find 28*? 29*?
Range selection: find all > 15* and < 30*
Insert/delete: Find the data entry in a leaf, then change it.
 Need to adjust the parent sometimes.
 And the change sometimes bubbles up the tree.
10
Hash-Based Indexes
Good for equality selections.
 A hash index is a collection of
buckets.

 Bucket = primary page plus zero
or more overflow pages.
Search key
value k
h
1 2 3 … … … … … … N-1
 Buckets contain data entries.

Hashing function h:
h(k) = bucket of data entries that
may contain the search key value k.
 No need for “index entries” in this scheme.
11
Alternatives for Data Entry k* in Index

In a data entry k* we can store:
 Alt1: Data record with key value k
 Alt2: <k, rid of data record with search key value k>
 Alt3: <k, list of rids of data records with search key k>

Choice of an alternative for data entries is
orthogonal to an indexing technique used.
 Indexing techniques: B+ tree, hashing
 Every indexing technique can be combined with any
alternative.
12
Alternative 1 for Data Entries

Alternative 1:

Index structure itself is a file organization for data
records (instead of a Heap or sorted file).

At most one index on a given collection of data records
can use Alternative 1.
•

Otherwise, data records are duplicated, leading to redundant
storage and potential inconsistency.
If data records are very large, # of pages containing
data entries is high.
•
Size of the index structure to direct searches (e.g., non-leaf
nodes in B+ tree) can also be large.
13
Alternatives 2, 3 for Data Entries

Alternatives 2 and 3:

Data entries, <k, rid(s)>, typically much smaller than
data records.
•
•

Alternative 3 more compact than Alternative 2
•

Index structure used to direct search is much smaller than
with Alternative 1.
So, better than Alternative 1 with large data records,
especially if search keys are small.
In the presence of multiple K* entries!
Alternative 3 leads to variable sized data entries,
even if search keys are of fixed length.
•
Variable sizes records/data entries are hard to manage.
14
Index Classification

Recall: Key? Primary key? Candidate key?

Primary index vs. secondary index:
 If search key contains the primary key, then called
primary index.
 Other indexes are called secondary indexes.

Unique index: Search key contains a candidate
key.

No data entries can have the same search key value.
15
Index Classification (Contd.)

Clustered index: order of data records is the same
as, or `close to’, order of (sorted) data entries.





Alternative 1 implies clustered
Alternatives 2 and 3 are clustered only if data records
are sorted on the search key field.
A data file can be clustered on at most one search key.
Retrieving data records: fast, with one or a few I/Os.
Why?
Unclustered index:


Multiple unclustered indexes on a data file.
Retrieving data records: can be slow, toughing many
pages.
16
Clustered vs. Unclustered Indexes
CLUSTERED
Index entries
direct search for
data entries
Data entries
UNCLUSTERED
Data entries
(Index File)
(Data file)
Data Records

Data Records
Suppose that Alternative (2) is used for data entries, and data
records are stored in a Heap file.


To build a clustered index, first sort the Heap file (with some free
space on each page for future inserts).
Overflow pages may be needed for inserts. (Thus, order of data
records is `close to’, but not identical to, the sort order.)
17
Index Classification (Contd.)

Dense index: contains (at least) one data entry for
every search key value that appears in a record in the
indexed file
 Alternatives 1, 2, 3

Sparse index: contains one entry for each page of
records in the indexed file.




Sparse index has to be clustered!
Alternative 1?
Alternative 2?
Alternative 3?
18
Cost Model for Our Analysis
We ignore CPU costs, for simplicity:

B: The number of data pages

R: Number of records per page

D: (Average) time to read or write disk page

I/O cost: Measuring # of page I/O’s ignores pre-fetching
of a sequence of pages; thus, approximate I/O cost.

Average-case analysis, based on simplistic assumptions.
 Good enough to show the overall trends!
19
Comparing File Organizations





Heap files (random order; insert at eof)
Sorted files, sorted on <age, sal>
Clustered B+ tree file, Alternative (1),
search key <age, sal>
Unclustered B + tree index on a heap file,
search key <age, sal>
Unclustered hash index on a heap file,
search key <age, sal>
20
Operations to Compare


Scan: Fetch all records from disk
Equality search:
e.g. (age, sal) = (24, 50K)

Range selection:
e.g. (age, sal)  [(22, 20K), (30, 200K)]


Insert a record
Delete a record
21
Assumptions in Our Analysis

Heap Files:


Sorted Files:


Equality selection on key; exactly one match.
Files compacted after deletions.
Indexes:
 Alt (2), (3): size of data entry = 10% size of data record
 Hash: No overflow chains.
•

80% page occupancy => File size = 1.25 data size
B+Tree:
•
•
67% occupancy (typical): implies file size = 1.5 data size
Balanced with fanout F (133 typical) at each non-leaf level
22
Assumptions (contd.)

Scans:
 Leaf nodes of a tree-index are chained.
 Index data entries plus actual file scanned for
unclustered indexes.

Range searches:
 We use tree indexes to restrict the set of data
records fetched, but ignore hash indexes.
23
Cost of Operations
(a) Scan
(b) Equality
(c ) Range
(d) Insert (e) Delete
(1) Heap
BD
0.5BD
BD
2D
(2) Sorted
BD
Dlog 2B
D(log 2 B +
# pgs with
match recs)
(3)
1.5BD
Dlog F 1.5B D(log F 1.5B
Clustered
+ # pages w.
match recs)
(4) Unclust. BD(R+0.15)
D(1 +
D(log F 0.15B
Tree index
log F 0.15B) + # match
recs)
(5) Unclust. BD(R+0.125) 2D
BD
Hash index
Search
+ BD
Search
+D
Search
+BD
Search
+D
Search
+D
Search
+ 2D
Search
+ 2D
Search
+ 2D
Search
+ 2D
 Several assumptions underlie these (rough) estimates!
24
Cost of Operations
(a) Scan
(b) Equality
(c ) Range
(d) Insert (e) Delete
(1) Heap
BD
0.5BD
BD
2D
(2) Sorted
BD
Dlog 2B
Search
+D
Search
+BD
D(log 2 B +
Search
# pgs with
+ BD
match recs)
(3)
1.5BD
Dlog F 1.5B D(log F 1.5B Search Search
Clustered
+ # pages w. + D
+D
match recs)
(4) Unclust. BD(R+0.15)
D(1 +
D(log F 0.15B Search Search
Tree index
log F 0.15B) + # match
+ 2D
+ 2D
recs)
(5)
BD(R+0.125)
2D
BD
Search
# ofUnclust.
leaf pages:
B*0.1/67%=0.15B;
Cost of index I/O:
0.15BD Search
Hash
index
# of data
entries on a leaf page: R*10*0.67=6.7R; Cost of data I/O:+0.15B*6.7R*D=BDR
2D
+ 2D
Key:
unclustered
B+tree,
one I/O per data entry!
#
of data
entry pages:
B*0.1/80%=0.125B;
Cost of index I/O: 0.125BD
Several
assumptions
underlie
these
(rough)
estimates!
# of data
entries
on a hash
page: R*10*0.80=8R;
Cost
of data
I/O: 0.125B*8R*D=BDR
Key: unclustered hash index, one I/O per data entry!
25
Summary
Many alternative access methods exist, each
appropriate in some situation.
 Index is a collection of data entries plus a way
to quickly find entries with given key values.
 If selection queries are frequent, building an
index is important.



Hash-based indexes only good for equality search.
Tree-based indexes best for range search; also good
for equality search.
26
Summary (Contd.)

Data entries can be actual data records, <key,
rid> pairs, or <key, rid-list> pairs.

Choice orthogonal to indexing technique used to
locate data entries with a given key value.
Can have several indexes on a given file of
data records, each with a different search key.
 Indexes can be classified as clustered vs.
unclustered, primary vs. secondary, etc.
Differences have important consequences for
utility/performance.

27
Questions
28

Overview of Storage and Indexing

Transcript Overview of Storage and Indexing

Directory