Indexing and Index Tuning
Download
Report
Transcript Indexing and Index Tuning
CS5226
Indexing and Index Tuning
When Change is the Only
Constant
CPU
Memory Speed and Size
Harddisk Speed and Size
Bandwidth
Moore’s Law being
proved...
main memory
cache
cache pages
disk size
disk/memory size
transfer rate
random access
scanning full disk
1969
200 KB
20 KB
20
7.5 MB
40
150 KB/s
50 ms
130 s
2001
200 MB
20 MB
5000
20 GB
100
15 MB/s
5 ms
1300 s
Factor
103
103
<103
3*103
-2.5
102
10
-10
Improvement in
Performance
CPU
(60%/yr)
10000
1000
100
DRAM
(10%/yr)
10
Disk
(5%/yr)
1
1980
2000
Indexing
Single dimensional Indexing
Multi-dimensional Indexing
High-dimensional Indexing
Indexing for advanced applications
Single Record and Range Searches
Single record retrievals
``Find student name whose matric# = 921000Y13’’
Range queries
``Find all students with cap > 3.0’’
Sequentially scanning the file is costly
If data is in sorted file, do binary search to find
first such student, then scan to find others.
cost of binary search can still be quite high.
Indexes
An index on a file speeds up selections on the
search key fields for the index.
Any subset of the fields of a relation can be the
search key for an index on the relation.
Search key is not the same as key (minimal set of
fields that uniquely identify a record in a relation).
e.g., consider Student(matric#, name, addr, cap),
the key is matric#, but the search key can be
matric#, name, addr, cap or any combination of
them.
Simple Index File (Data File Sorted)
Dense Index
10
20
30
40
50
60
70
80
90
100
110
120
Sequential File
10
20
30
40
50
60
70
80
90
100
Simple Index File (Cont)
Sparse Index
10
30
50
70
90
110
130
150
170
190
210
230
Sequential File
10
20
30
40
50
60
70
80
90
100
Simple Index File (Cont)
Sequential File
Sparse 2nd level
10
90
170
250
330
410
490
570
10
30
50
70
90
110
130
150
170
190
210
230
10
20
30
40
50
60
70
80
90
100
Secondary indexes
Sequence
field
• Sparse index
30
20
80
100
30
50
20
70
90
...
80
40
does not make sense!
100
10
90
60
Secondary indexes
Sequence
field
• Dense index
10
50
90
...
sparse
high
level
10
20
30
40
30
50
50
60
70
...
80
40
20
70
100
10
90
60
Conventional indexes
Advantages:
- Simple
- Index is sequential file
good for scans
Disadvantages:
- Inserts expensive, and/or
- Lose sequentiality & balance
Example
Index(sequential)
continuous
10
20
30
33
40
50
60
free space
70
80
90
39
31
35
36
32
38
34
overflow area
(not sequential)
Tree-Structured Indexing
Tree-structured indexing techniques support
both range searches and equality searches
index file may still be quite large. But we
can apply the idea repeatedly!
Data
pages
B+ Tree: The Most Widely Used Index
Height-balanced.
Insert/delete at log F N cost (F = fanout, N = #
leaf pages);
Grow and shrink dynamically.
Minimum 50% occupancy (except for root).
Each node contains d <= m <= 2d entries. The
parameter d is called the order of the tree.
`next-leaf-pointer’ to chain up the leaf nodes.
Data entries at leaf are sorted.
Example B+ Tree
Each node can hold 4 entries (order = 2)
Root
17
5
2
3
24
13
5
7
8
14 16
19 20
22
30
24 27 29
33 34 38 39
Node structure
Non-leaf nodes
index entry
P
0
K
1
P
1
K 2
P
K 2
P
2
K m
Pm
2
K m
Pm
Leaf nodes
P
0
K
1
P
1
Next leaf
node
Searching in B+ Tree
Search begins at root, and key comparisons
direct it to a leaf (as in ISAM).
Search for 5, 15, all data entries >= 24 ...
Root
13
2
3
5
14
16
17
19
20
24
22
30
24
27
29
33
34
38
39
Based on the search for 15*, we know it is not in the tree!
B+-Tree Scalability
Typical order: 100. Typical fill-factor: 67%.
average fanout = 133
Typical capacities (root at Level 1, and has 133
entries):
Level 5: 1334 = 312,900,700 records
Level 4: 1333 =
2,352,637 records
Can often hold top levels in buffer pool:
Level 1 =
1 page =
8 Kbytes
Level 2 =
133 pages =
1 Mbyte
Level 3 = 17,689 pages = 133 MBytes
A Note on `Order’
Order (d) concept replaced by physical space
criterion in practice (`at least half-full’).
Index pages can typically hold many more entries
than leaf pages.
Variable sized records and search keys mean different
nodes will contain different numbers of entries.
Even with fixed length fields, multiple records with the
same search key value (duplicates) can lead to
variable-sized data entries
Inserting a Data Entry into a B+ Tree
Find correct leaf L.
Put data entry onto L.
If L has enough space, done!
Else, must split L (into L and a new node L2)
Redistribute entries evenly, copy up middle key.
Insert index entry pointing to L2 into parent of L.
This can happen recursively
To split index node, redistribute entries evenly, but
push up middle key. (Contrast with leaf splits.)
Splits “grow” tree; root split increases height.
Tree growth: gets wider or one level taller at top.
Inserting 7 & 8 into Example B+ Tree
Root
13
2
3
5
7
14
16
17
19
20
24
22
30
24
27 29
33
34
38
39
(Note that 5 is copied up and
continues to appear in the leaf.)
Observe how minimum
occupancy is
guaranteed in both leaf
and index pg splits.
13
5
2
3
5
7
17
8
24
30
Insertion (Cont)
Note
difference
between
copy-up and
push-up; be
sure you
understand
the reasons
for this.
(Note that 17 is pushed up and only
appears once in the index. Contrast
this with a leaf split.)
17
5
13
24
13
5
2
3
30
5
7
17
8
24
30
Example B+ Tree After Inserting 8
Root
17
5
2
3
24
13
5
7
8
14 16
19 20 22
30
24 27 29
33 34 38 39
Notice that root was split, leading to increase in height.
In this example, we can avoid splitting by re-distributing
entries; however, this is usually not done in practice. Why?
Deleting a Data Entry from a B+ Tree
Start at root, find leaf L where entry belongs.
Remove the entry.
If L is at least half-full, done!
If L has only d-1 entries,
Try to re-distribute, borrowing from sibling (adjacent
node with same parent as L).
If re-distribution fails, merge L and sibling.
If merge occurred, must delete entry (pointing to
L or sibling) from parent of L.
Merge could propagate to root, decreasing height.
Example Tree After (Inserting 8, Then)
Deleting 19
Root
17
5
2
3
24
13
5
7
8
14 16
Deleting 19 is easy.
20 22
30
24 27 29
33 34 38 39
Example Tree After Deleting 20 ...
Root
17
5
2
3
27
13
5
7
8
14 16
22 24
30
27 29
33 34 38 39
Deleting 20 is done with re-distribution.
Notice how middle key is copied up.
... And Then Deleting 24
Must merge.
Observe `toss’ of
index entry (on right),
and `pull down’ of
index entry (below).
30
22
27
29
33
38
34
39
Root
5
2
3
5
7
8
13
14
16
17
30
22
27
29
33
34
38
39
Example of Non-leaf Re-distribution
(Delete 24)
22
5
13
17
Root
27
20
22 24
2
3
5
7
8
In contrast to previous
example, can redistribute entry from left
child of root to right
child.
5
2
3
5
7
8
17 18
14 16
13
14 16
Root
30
20
17 18
33 34 38 39
20 21
22
17
27 29
30
20 21
22 27 29
33 34 38 39
After Re-distribution
Intuitively, entries are re-distributed by `pushing
through’ the splitting entry in the parent node.
It suffices to re-distribute index entry with key 20;
we’ve re-distributed 17 as well for illustration.
Root
17
5
2
3
5
7
8
13
14 16
20
17 18
20 21
22
30
22 27 29
33 34 38 39
Index Classification
Primary vs. secondary: If search key contains
primary key, then called primary index.
Unique index: Search key contains a candidate key.
Clustered vs. unclustered: If order of data
records is the same as, or `close to’, order of
data entries, then called clustered index.
A file can be clustered on at most one search key.
Cost of retrieving data records through index varies
greatly based on whether index is clustered or not!
Clustered vs. Unclustered Index
Suppose the data file is unsorted.
To build clustered index, first sort the data file (with some
free space on each page for future inserts).
Overflow pages may be needed for inserts. (Thus, order of
data recs is `close to’, but not identical to, the sort order.)
CLUSTERED
Index entries
direct search for
data entries
Data entries
UNCLUSTERED
Data entries
(Index File)
(Data file)
Data Records
Data Records
Index Classification (Cont.)
Dense vs. Sparse:
If there is at least
one data entry per
search key value (in
some data record),
then dense.
Every sparse index is
clustered!
Sparse indexes are
smaller.
Ashby, 25, 3000
22
Basu, 33, 4003
Bristow, 30, 2007
25
30
Ashby
33
Cass
Cass, 50, 5004
Smith
Daniels, 22, 6003
Jones, 40, 6003
40
44
Smith, 44, 3000
44
50
Tracy, 44, 5004
Sparse Index
on
Name
Data File
Dense Index
on
Age
Index Classification (Cont.)
Composite Search Keys:
Search on a combination of
fields.
Equality query: Every field value
is equal to a constant value. E.g.
wrt <sal,age> index:
age=20 & sal =75
Range query: Some field value is
not a constant. E.g.:
age =20; or age=20 & sal > 10
Data entries in index sorted by
search key to support range
queries.
Lexicographic order, or
Spatial order.
Examples of composite key
indexes using lexicographic order.
11,80
11
12,10
12
12,20
13,75
<age, sal>
10,12
20,12
75,13
name age sal
bob 12
10
cal 11
80
joe 12
20
sue 13
75
12
13
<age>
10
Data records
sorted by name
80,11
<sal, age>
Data entries in index
sorted by <sal,age>
20
75
80
<sal>
Data entries
sorted by <sal>
Summary
Tree-structured indexes are ideal for rangesearches, also good for equality searches.
B+ tree is a dynamic structure.
Inserts/deletes leave tree height-balanced;
log F N cost.
High fanout (F) means depth rarely more than
3 or 4.
Almost always better than maintaining a
sorted file.
Summary (Cont.)
Typically, 67% occupancy on average.
Usually preferable to ISAM, modulo locking
considerations; adjusts to growth gracefully.
If data entries are data records, splits can change rids!
Indexes can be classified as clustered vs.
unclustered, primary vs. secondary, and dense vs.
sparse, simple vs composite
New Database Challenges
More Complex applications. Eg. GIS,
OLAP, Mobile
Hardware Advances: Big and Small
Web and Internet. Eg XML
New Index?
A most effective mechanism to prune the
search
Order of magnitude of difference between
I/O and CPU cost
Increasing data size
Increasing complexity of data and search
Something that
Transcends Time…B+-tree
Success Factors
Robustness
Concurrency
Performance
Scalability
Fundamentals of
Building DBMS
B+-trees forever?
Btree
Can the B+-tree being a single-dimensional
index be used for emerging applications such
as:
Spatial databases
High-dimensional databases
Temporal databases
Main memory databases
String databases
Genomic/sequence databases
...
CS5226
Multi-Dimensional Indexing
What is a Spatial
Database?
A Spatial DBMS is a DBMS
It offers spatial data types/data models/ query
language
Support spatial properties/operations
It supports spatial data types in its
implementation
Support spatial indexing, algorithms for spatial
selection and join based on spatial relationships
Applications
Geographical Information Systems (GIS):
dealing extensively with spatial data. Eg.
Map system, resource management
systems
Computer-aided design and
manufacturing (CAD/CAM): dealing mainly
with surface data. Eg. design systems.
Multimedia databases: storing and
manipulating characteristics of MM
objects.
Spatial Data
Examples of non-spatial
data
Names, zip-codes …
Examples of Spatial data
Census Data
NASA satellites imagery
Weather and climate Data
Rivers, farms, ecological
impact
Medical Imaging
Spatial Databases
Spatial Objects:
Points: spatial location: eg. feature vectors
Lines: set of points: eg. roads, coastal line
Polygons: set of points: eg. Buildings, lakes
Data Types:
Point: a spatial data object with no extension
no size or volume
Region:a spatial object with a location and a
boundary that defines the extension
Spatial Relationships
Topological relationships:
adjacent, within/contain, intersect, disjoint,
etc
Direction relationships:
Above, below, north-of, south-of,etc
Metric relationships:
“distance < 100 km”
And operations to express the
relationships
Spatial Queries
Range queries: “Find all cities within 50
km of Madras?”
Nearest neighbor queries: “Find the 5
cities that are nearest to Madras?”
“Find the 10 images most similar to this
image?”
Spatial join queries: “Find pairs of cities
within 200 km of each other?’
More Examples
Window Range Query: “Find me data
points that satisfy the conditions x1 <A1
< y1, x2 <A2 <y2…?”
Spatial Query: “Find me buildings that are
adjacent to the Railway Stations?”
Nearest Neighbour Query: “Find me the
nearest fire station to Clementi Ave. 3?”
Spatial Representation
Raster model:
Vector model:
Representation of Spatial
Objects
Testing on real objects is expensive
Minimum Bounding Box/Rectangle
How to test if 2-d rectangles intersect?
y2
y1
x1
representation
x2
testing
Query Operation & Spatial
Index
Filter Step:
Select the objects whose mbr satisfies the
spatial predicate
Traverse the index and apply the spatial test on
the mbrs indexed by the index
Output: set of oids (including negatives)
Refinement Step:
Spatial test is done on the actual geometries of
objects whose mbr satisfied the filter step
(output)
Costly operation
Executed only on a limited number of objects
Why spatial index methods
(SAMs)?
B-tree & hash tables
Guarantee the number of I/O operations is
respectively logarithmic and constant with respect
to the collection’s size
Index a collection on a key
Rely on a total order on the key domain, the order
of natural numbers, or the lexicographic order on
strings
There is no such total order for geometric
objects with spatial extent
SAMs were designed to try as much as
possible to preserve spatial object proximity
Approaches to the Design of
SAMs
Space-Based structures:
Partition the embedding Space into rectangular cells
Independent from the distribution of the objects
Objects are mapped to the cells based on some
geometric criterion
Eg: Grid file, Buddy-tree, KDB-tree
Data-Based structures:
Organize by partitioning the set of objects based on
spatial proximity such that each group can be fit into a
page, as opposed to the embedding space
Adapt to the objects’ distribution in the embedding
space
Eg. R-tree, R* tree, R+ tree
Mapping
The R-tree
A leaf entry is a pair (mbr, oid)
A non-leaf node contains an array of node entries
The number of entries is between m (fill-factor)
and M
For each entry (mbr, nodeid) in a non-leaf node
N, mbr is the directory rectangle of a child node
of N, whose page address is nodeid
All leaves are at the same level
An object appears in one, and only one of the
tree leaves
R-trees
A
B
Height balanced tree
A
B
Problem:
Overlap of
covering
rectangles.
Insertion in the R-Tree
Algorithm ChooseSubtree
CS1
[Initialize] Set N to be the root node
CS2
[Leaf check]
If N is a leaf,
return N
else
[Choose subtree]
Choose the entry in N whose rectangle needs least area
enlargement to include the new data. Resolve ties by
choosing the entry with the rectangle of smallest area
end
CS3
[Descend until a leaf is reached]
Set N to be the childnode pointed to by the childpointer of
the chosen entry. Repeat from CS2
Splitting Strategies in the
R-Tree
Three versions
all are designed to minimize area covered by two
covering rectangles resulting from split
Exponential
find the area with global minimum
CPU cost is too high
Quadratic and Linear
find approximation
Quadratic performs much better than linear
Splitting Strategies in the
R-Tree
Algorithm QuadraticSplit
[Divide a set of M+1 index entries into two groups]
QS1
[Pick first entry for each group ]
Invoke PickSeeds to choose two entries, each be first
entry of each group
QS2
[Check if done]
Repeat
DistributeEntry
until
all entries are distributed or one of the two groups has
Mm+1 entries (so that the other group has m entries)
QS3
[Select entry to assign ]
If entries remain, assign them to the other group so that it
has the minimum number m required
Splitting in the R-Tree
Algorithm PickSeeds
[Choose two entries to be the first entries of the groups]
PS1 [Calculate inefficiency of grouping entries
together]
For each pair of entries E1 and E2, compose a
rectangle R including E1 rectangle and E2 rectangle
Calculate d = area(R) - area(E1 rectangle) area(E2 rectangle)
PS2 [Choose the most wasteful pair ]
Choose the pair with the largest d
[the seeds will tend to be small, if the rectangles are of
very different size (and) or the overlap between them
is high]
Splitting in the R-Tree
Algorithm DistributeEntry
[Assign the remaining entries by the criterion of minimum area]
DE1
Invoke PickNext to choose the next entry to be assigned
DE2
Add It to the group whose covering rectangle will have to
be enlarged least to accommodate It. Resolve ties by
adding the entry to the group with the smallest area, then
to the one with the fewer entries, then to either
Algorithm PickNext
[chooses the entry with best area-goodness-value in every
situation]
DE1
For each entry E not yet in a group, calculate d1 = the area
increase required in the covering rectangle of Group 1 to
include E Rectangle. Calculate d2 analogously for Group 2
DE2
Choose the entry with the maximum difference between d1
and d2
Node Splitting R-trees
Node Splitting R-trees
R-trees
Range Query
Insert
Node splitting
Optimization
Coverage
Overlap
Delete
Variants: R+-tree
R*-tree, buddy-tree
The R*-Tree
A variant of R-Tree
Several improvements to the insertion
algorithm
Aim at optimizing
Node overlapping
Area covered by a node
Perimeter of a node’s directory rectangle
Given a fixed area, the shape that minimizes the rectangles
perimeter is the square
Two variants that bring the most significant
improvement
Split Algorithm
Forced Reinsertion Strategy
The R+ Tree
The directory rectangles at a given level do not
overlap
For a point query, a single path is followed from
the root to a leaf; for a region query, subtrees
whose covering mbr intersecting the query region
is traversed
The I/O complexity is bounded by the depth of
the tree
Dead space problem
Index Tuning
Index issues
Indexes may be better or worse than scans
Multi-table joins that run on for hours,
because the wrong indexes are defined
Concurrency control bottlenecks
Indexes that are maintained and never used
3 - Index Tuning
68
Information about
indexes...
Application codes
V$SQLAREA -- look for the one with high
# of executions
INDEX_STATS: meta information about
indexes
HASH_AREA_SIZE
HASH_MULTIBLOCK_IO_COUNT
…home work
Clustered / Non clustered
index
Clustered index
(primary index)
A clustered index on
attribute X co-locates
records whose X values are
near to one another.
Records
Non-clustered index
(secondary index)
A non clustered index
does not constrain table
organization.
There might be several
non-clustered indexes
per table.
Records
Dense / Sparse Index
Sparse index
Dense index
Pointers are associated to
pages
P1
3 - Index Tuning
P2
Pi
Pointers are associated to
records
Non clustered indexes are
dense
record
record
record
71
Index Implementations in
some major DBMS
SQL Server
B+-Tree data structure
Clustered indexes are
sparse
Indexes maintained as
updates/insertions/deletes
are performed
DB2
B+-Tree data structure,
spatial extender for R-tree
Clustered indexes are
dense
Explicit command for index
reorganization
3 - Index Tuning
Oracle
B+-tree, hash, bitmap,
spatial extender for R-Tree
clustered index
Index organized table
(unique/clustered)
Clusters used when
creating tables.
TimesTen (Main-memory
DBMS)
T-tree
72
Types of Queries
Point Query
SELECT balance
FROM accounts
WHERE number = 1023;
Multipoint Query
SELECT balance
FROM accounts
WHERE branchnum = 100;
3 - Index Tuning
Range Query
SELECT number
FROM accounts
WHERE balance > 10000
and balance <= 20000;
Prefix Match Query
SELECT *
FROM employees
WHERE name = ‘J*’ ;
73
More Types of Queries
Extremal Query
Grouping Query
SELECT *
FROM accounts
WHERE balance =
max(select balance from accounts)
Ordering Query
SELECT *
FROM accounts
ORDER BY balance;
3 - Index Tuning
SELECT branchnum, avg(balance)
FROM accounts
GROUP BY branchnum;
Join Query
SELECT distinct branch.adresse
FROM accounts, branch
WHERE
accounts.branchnum =
branch.number
and accounts.balance > 10000;
74
Index Tuning -- data
Settings:
employees(ssnum, name, lat, long, hundreds1,
hundreds2);
clustered index c on employees(hundreds1)
with fillfactor = 100;
nonclustered index nc on employees (hundreds2);
index nc3 on employees (ssnum, name, hundreds2);
index nc4 on employees (lat, ssnum, name);
1000000 rows ; Cold buffer
Dual Xeon (550MHz,512Kb), 1Gb RAM, Internal RAID controller from Adaptec
(80Mb), 4x18Gb drives (10000RPM), Windows 2000.
Index Tuning -- operations
Operations:
Update:
update employees set name = ‘XXX’ where ssnum = ?;
Insert:
insert into employees values
(1003505,'polo94064',97.48,84.03,4700.55,3987.2);
Multipoint query:
select * from employees where hundreds1= ?;
select * from employees where hundreds2= ?;
Covered query:
select ssnum, name, lat from employees;
Range Query:
select * from employees where long between ? and ?;
Point Query:
select * from employees where ssnum = ?
Clustered Index
clustered
nonclustered
no index
Throughput ratio
1
0.8
0.6
0.4
0.2
0
SQLServer
3 - Index Tuning
Oracle
DB2
Multipoint query that
returns 100 records
out of 1000000.
Cold buffer
Clustered index is
twice as fast as nonclustered index and
orders of magnitude
faster than a scan.
77
Positive Points of
Clustering indexes
If the index is sparse, it has less points -less I/Os
Good for multipoint queries
eg. Looking up names in telephone dir.
Good for equijoin. Why?
Good for range, prefix match, and
ordering queries
Index “Face Lifts”
Throughput
(queries/sec)
SQLServer
100
80
60
40
20
0
No maintenance
Maintenance
0
20
40
60
80
% Increase in Table Size
3 - Index Tuning
100
Index is created with
fillfactor = 100.
Insertions cause page
splits and extra I/O for
each query
Maintenance consists in
dropping and recreating
the index
With maintenance
performance is constant
while performance
degrades significantly if
no maintenance is
79
performed.
Index Maintenance
Throughput
(queries/sec)
Oracle
20
15
10
No
maintenance
5
0
0
20
40
60
80
% Increase in Table Size
3 - Index Tuning
100
In Oracle, clustered index
are approximated by an
index defined on a clustered
table
No automatic physical
reorganization
Index defined with pctfree
=0
Overflow pages cause
performance degradation
80
Covering Index - defined
Select name from employee where
department = “marketing”
Good covering index would be on
(department, name)
Index on (name, department) less useful.
Index on department alone moderately
useful.
3 - Index Tuning
81
Throughput (queries/sec)
Covering Index - impact
70
60
covering
50
covering - not
ordered
non clustering
40
30
20
clustering
10
0
SQLServer
3 - Index Tuning
Covering index performs
better than clustering
index when first
attributes of index are in
the where clause and last
attributes in the select.
When attributes are not
in order then
performance is much
worse.
82
Positive/negative points of
non-clustering indexes
Eliminate the need to access the
underlying table
eg. Index on (A, B, C)
Select B,C From R Where A=5.
Good if each query retrieves significantly
fewer records than there are pages in DB
May not be good for multipoint queries
Examples:
Table T has 50-bytes records and attribute
A has 20 different values which are
uniformly distributed. Page size=4K. Is a
nonclustering index on A any good?
Now the record size is 2Kbytes.
Throughput (queries/sec)
Scan Can Sometimes Win
scan
non clustering
0
5
10
15
% of selected records
3 - Index Tuning
20
25
IBM DB2 v7.1 on
Windows 2000
Range Query
If a query retrieves 10%
of the records or more,
scanning is often better
than using a nonclustering non-covering
index. Crossover > 10%
when records are large or
table is fragmented on
disk – scan cost
increases.
85
Throughput (updates/sec)
Index on Small Tables
18
16
14
12
10
8
6
4
2
0
no index
3 - Index Tuning
index
Small table: 100 records,
i.e., a few pages.
Two concurrent
processes perform
updates (each process
works for 10ms before it
commits)
No index: the table is
scanned for each update.
No concurrent updates.
A clustered index allows
to take advantage of row
locking.
86
Bitmap vs. Hash vs. B+Tree
Settings:
employees(ssnum, name, lat, long, hundreds1,
hundreds2);
create cluster c_hundreds (hundreds2 number(8)) PCTFREE 0;
create cluster c_ssnum(ssnum integer) PCTFREE 0 size 60;
create cluster c_hundreds(hundreds2 number(8)) PCTFREE 0
HASHKEYS 1000 size 600;
create cluster c_ssnum(ssnum integer) PCTFREE 0 HASHKEYS
1000000 SIZE 60;
create bitmap index b on employees (hundreds2);
create bitmap index b2 on employees (ssnum);
1000000 rows ; Cold buffer
Dual Xeon (550MHz,512Kb), 1Gb RAM, Internal RAID controller from Adaptec
(80Mb), 4x18Gb drives (10000RPM), Windows 2000.
Multipoint query: B-Tree,
Hash Tree, Bitmap
Throughput (queries/sec)
Multipoint Queries
25
20
15
10
5
0
B-Tree
3 - Index Tuning
Hash index
Bitmap index
There is an overflow
chain in a hash index
In a clustered B-Tree
index records are on
contiguous pages.
Bitmap is proportional
to size of table and
non-clustered for
record access.
88
B-Tree, Hash Tree, Bitmap
Throughput (queries/sec)
Range Queries
Hash indexes don’t
help when evaluating
range queries
0.5
0.4
0.3
0.2
0.1
0
B-Tree
Hash index
Bitmap index
Throughput(queries/sec)
Point Queries
60
50
40
Hash index
outperforms B-tree
on point queries
30
20
10
0
B-Tree
3 - Index Tuning
hash index
89
Summary
Primary means to reduce search costs
(I/O and CPU)
Properties: robust, concurrent, scalable,
efficient
Most supported indexes: Hash, B+-trees,
bitmap index, and R-trees
Tuning: Usage, Maintenance,
Drop/Rebuild, index locking in buffer...