Indexing Techniques
Download
Report
Transcript Indexing Techniques
Indexing Techniques
Storage Technology: Topic 4
Introduction to Database Systems
1
Indexes
An index on a collection of records speeds up
selections on the search key fields.
– Any subset of the fields of a record can be the
search key for an index on the collection.
An index is a collection of index entries.
– Retrieve all entries k* with key value k
– Retrieve all entries k* between two key values
– Retrieve entries in search key order
Introduction to Database Systems
2
Alternatives for Data Entry k* in Index
Three alternatives:
Data record with search key value k
issue : how much data repetition?
Issue: is this simply a fancy file format?
<k, rid of data record with search key value k>
<k, list of rids of data records with search key k>
Our focus: alternative 2.
– Examples of indexing techniques: B+ trees, hashbased structures
Introduction to Database Systems
3
Index Classification: Clustering
Clustered vs. unclustered: If order of data records
is the same as, or ``close to’’, order of data
entries, then called clustered index.
– At most one independent clustered index.
– Cost of retrieving data through index varies greatly
based on whether index is clustered or not! Why?
– Usually, clustering desired for sorted access.
Introduction to Database Systems
4
Clustered vs. Unclustered Index
Title: es_f52.fig
Creator: /s/transfig-3.1.1/exe/fig2dev Version 3.1 Patchlevel 1
CreationDate: Wed Oct 11 19:11:29 1995
Introduction to Database Systems
5
Sparse Clustering
Dense vs. Sparse: If
there is at least one data
entry per search key
value (in some data
record), then dense.
Title: l3_f1.fig
Creator: /s/transfig-3.1.1/exe/fig2dev Version 3.1
CreationDate: Wed Sep 6 17:49:58 1995
– Every sparse index is
clustered!
Introduction to Database Systems
6
Primary/Secondary Indexes
Definition 1: Primary == Clustered
Definition 2: Primary == search key contains
primary key of the relation
We will use Definition 2
Introduction to Database Systems
7
Tree-Structured Indexing
Tree-structured indexing techniques support
both range searches and equality searches
``Find all students with gpa > 3.0’’
– If data is in sorted file, use binary search.
Simple idea: Create an `index’ file.
Can do binary search on (smaller) index file!
Introduction to Database Systems
8
ISAM
Index file may still be quite large. But we can
apply the idea repeatedly!
Leaf pages contain data entries.
Introduction to Database Systems
9
Comments on ISAM
File creation: Leaf pages allocated sequentially,
sorted by search key; then index pages
allocated, then space for overflow pages.
Index entries: <search key value, page id>; they
`direct’ search for data entries, which are in leaf pages.
Search: Start at root; use key comparisons to go to leaf.
Cost log F N ; F = # entries/index pg, N = # leaf pgs
Insert: Find leaf data entry belongs to, and put it there.
Delete: Find and remove from leaf; if empty overflow
page, de-allocate.
Static tree structure: inserts/deletes affect only leaf pages.
Introduction to Database Systems
10
Example ISAM Tree
Each node can hold 2 entries; no need for
`next-leaf-page’ pointers. (Why?)
Introduction to Database Systems
11
After Inserting 23*, 48*, 41*, 42* ...
Introduction to Database Systems
12
... Then Deleting 42*, 51*, 97*
Note that 51* appears in index levels, but not in leaf!
Introduction to Database Systems
13