indexed file
Download
Report
Transcript indexed file
Unit 12
Index in Database
大量資料存取方法之研究
Approaches to Access/Store Large Data
楊維邦 博士
國立東華大學
資訊管理系教授
6.1 Introduction
The Role of Access Method/Index in DBMS
Query in SQL:
SELECT CUSTOMER. NAME
FROM CUSTOMER, INVOICE
WHERE REGION = 'N.Y.' AND
AMOUNT > 10000 AND
CUSTOMER.C#=INVOICE.C#
DBMS
Language Processor
Internal Form:
P(s (S
SP)
Optimizer
Operator:
SCAN C using region index, create C
SCAN I using amount index, create I
SORT C?and I?on C#
JOIN C?and I?on C#
EXTRACT name field
Calls to Access Method:
OPEN SCAN on C with region index
GET next tuple
.
.
.
Calls to file system:
GET10th to 25th bytes from
block #6 of file #5
Operator Processor
Access Method/Index
File System
database
Wei-Pang Yang, Information Management, NDHU
12-3
The Internal Level
Objectives:
- concern the way the data is actually stored.
- store data on direct access media. e.g. disk.
- minimize the number of disk access (disk I/O).
- disk access is much slower than main storage access time.
Main
Disk
index
CPU
Buffer I/O
index
Wei-Pang Yang, Information Management, NDHU
12-4
The Internal Level (cont.)
Physical database design:
• Process of choosing an appropriate storage representation for a
given database (by DBA). E.g. designing B-tree index or
hashing
• Nontrivial task
• require a good understanding of how the database will be
used.
Logical database design:
S
P
S
P-SP
SP
data
Wei-Pang Yang, Information Management, NDHU
12-5
6.2 Indexing
Indexing: Introduction
Consider the Supplier table, S.
Suppose "Find all suppliers in city xxx" is an important query.
i.e. it is frequency executed.
=> DBA might choose the stored representation as Fig. 6.2.
City-Index (index)
S (indexed file)
Athens
S1
Smith
20
London
London
S2
Jones
10
Paris
London
S3
Blake
30
Paris
Paris
S4
Clark
20
London
Paris
S5
Adams
30
Athens
Fig. 6.2: Indexing the supplier file on CITY.
Wei-Pang Yang, Information Management, NDHU
12-7
Indexing: Introduction (cont.)
Now the DBMS has two possible strategies:
<1> Search S, looking for all records with city = 'xxx'.
<2> Search City-Index for the desired entry.
Advantage:
• speed up retrieval.
• index file is sorted.
• fewer I/O's because index file is smaller.
Disadvantages:
• slow down updates.
• both index and indexed file should be updated when we insert new
tuple.
Wei-Pang Yang, Information Management, NDHU
12-8
Indexing: Multiple Fields
Primary index : index on primary key. <e.g> s#
Secondary index: index on other field. <e.g> city
A given table may have any number of indexes.
CITY-index
Status-index
S
Athens
S1
Smith
20
London
10
London
S2
Jones
10
Paris
20
London
S3
Blake
30
Paris
20
Paris
S4
Clark
20
London
30
Paris
S5
Adams
30
Athens
30
Fig. 6.3: Indexing the supplier file on both CITY and STATUS.
Wei-Pang Yang, Information Management, NDHU
12-9
How Index are used?
Consider:
London
<1> Sequential access :
S (indexed file)
accessed in the sequence defined by
values of the indexed field.
S1 ... ... London
<e.g> Range query : "Find the suppliers whose
S2 ... ... Paris
city begins with a letter in the range L-R."
London
S3 ... ... Paris
Paris
S4 ... ... London
Paris
S5 ... ... Athens
City-Index
Athens
<2> Direct Access :
<e.g> "Find suppliers in London."
<e.g> list query:"Find suppliers whose city
is in London, Paris, and N.Y."
<3> Existence test :
<e.g> "Is there any supplier in London ?"
Note: It can be done from the index alone.
Wei-Pang Yang, Information Management, NDHU
12-10
Indexing on Field Combinations
To construct an index on the basis of values of two or more fields.
<e.g.> City/Status-Index
S
Athens/30
S1
Smith
20
London
London /20
S2
Jones
10
Paris
London /20
S3
Blake
30
Paris
Paris /10
S4
Clark
20
London
Paris/30
S5
Adams
30 Athens
Query: “Find suppliers in Paris with status 30.”
- on city/status index: a single scan of a single index.
- on two separate indexes: two index scan => still difficult. (Fig. 6.3)
Wei-Pang Yang, Information Management, NDHU
12-11
Dense V.S. Nondense Indexing
Assume the Supplier file (S) is clustered on S# .
Index (dense)
S
page1
S1...
S1 S2 S3 S4 S5 S6
page3
page2
S2...
S4...
S3...
S5...
S6...
City_index
S1
S3
S5
A L L P P
Index (nondense)
Wei-Pang Yang, Information Management, NDHU
12-12
Dense V.S. Nondense Indexing (cont.)
Nondense index: not contain an entry for every record in the
indexed file.
• retrieval steps:
<1> scan the index (nondense) to get page # , say p.
<2> retrieve page p and scan it in main storage.
• advantages:
<1> occupy less storage than a corresponding dense index
<2> quicker to scan.
• disadvantages: can not perform existence test via index alone.
Note: At most only one nondense index can be constructed.
(why?)
Clustering: logical sequence = physical sequence
Wei-Pang Yang, Information Management, NDHU
12-13
B-tree
Introduction:
• is a particular type of multi-level (or tree structured) index.
• proposed by Bayer and McCreight in 1972.
• the commonest storage structure of all in modern DBMS.
Definition: (from Horowitz "Data Structure")
A B-tree T of order m is an m-way search tree, such that
<1> the root node has at least 2 children.
<2> non-leaf nodes have at least [m/2] children.
<3> all leave nodes are at the same level.
Goal: maintain balance of index tree by dynamically
restructuring the tree as updates proceed.
Wei-Pang Yang, Information Management, NDHU
12-14
+
B+-tree (Knuth's variation)
50
82
index set
12
32
58
70
89
94
(nondense)
6
8 12
15 18 32
35 40 50
51 52 58
60 62 70
71 78 82
83 85 89
91 93 94
96 97 99
Sequence set
- index set: provides fast direct access to the sequential set
and thus to the data too.
- sequence set: provides fast sequential access to the indexed data.
(with pointers
to data records)
(dense or nondense)
Other variations: B*-tree, B'-tree,...
Wei-Pang Yang, Information Management, NDHU
12-15