Transcript slides

Overview of Query Evaluation
Chapter 12
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke
1
Review

We studied Relational Algebra
 Many equivalent queries, produce same result
 Which expression is most efficient?

We studied file organizations
 Hash files, Sorted files,
 Clustered & Unclustered Indexes
 Sorting, searches, insert, delete
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke
2
Problem Definition

SQL declarative language
 It describes the query result, but not how to get it

Relational Algebra describes how to get results
 Many relational algebra queries are equivalent

Question:
 How to choose the right one for an SQL query?

How does a DBMS proceed to execute a query?
 It generates a variety of possible plans (relational
algebra queries), and finds the cheapest one to
execute.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke
3
Review: Relational Algebra
Relational Algebra

Selection ( s ) Selects a subset of rows from relation (horizontal).

Projection ( p ) Retains only wanted columns from relation (vertical).

Cross-product (  ) Allows us to combine two relations.





Set-difference ( — ) Tuples in r1, but not in r2.
Union (  ) Tuples in r1 and/or in r2.
Intersection ()
Join (
)
Division ( / )
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke
4
System Catalogs

For each index:


For each relation:





name, file name, file structure (e.g., Heap file)
attribute name and type, for each attribute
index name, for each index
integrity constraints
For each view:


structure (e.g., B+ tree) and search key fields
view name and definition
Plus statistics, authorization, buffer pool size, etc.
*
Catalogs are themselves stored as relations!
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke
5
Statistics and Catalogs
Need : Information about relations and indexes.
 Catalogs typically contain at least:





# tuples (NTuples)
# pages (NPages) for each relation.
# distinct key values (NKeys) and NPages for each index.
Index height, low/high key values (Low/High) for each
tree index.
Catalogs updated periodically.


Updating whenever many data changes occurred;
Lots of approximation anyway, so slight inconsistency ok.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke
6
Overview of Query Evaluation
 SQL
queries are translated into extended
relational algebra:
 Query Plan Reasoning:
• Tree of operators
• With a processing algorithm per operator
• Several algorithms for each operator
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke
7
Query Plan Evaluation
sname
SELECT sname
FROM Reserves R, Sailor S
WHERE R.sid = S.sid and
R.bid = 100 and S.rating > 5
bid=100
sid=sid
Reserves
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke
rating > 5
Sailors
8
Overview of Query Evaluation

Plan: Tree of R.A. ops, with choice of alg for each operator

Two main issues in query optimization:

For a given query, what plans are considered?
• Algorithm to search plan space for cheapest (estimated) plan.

How is the cost of a plan estimated?
•
Cost models based on I/O estimates
Ideally: Want to find best plan.
 Practically: Avoid worst plans!

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke
9
Some Common Techniques

Algorithms for evaluating relational operators
use some simple ideas extensively:
 Indexing: Can use WHERE conditions to retrieve
small set of tuples (selections, joins)
 Iteration: Scan all tuple.
• Sometimes, faster to scan all tuples even if there is an index.
• Sometimes, we can scan the data entries in an index instead of
the table itself.
 Partitioning: By using sorting or hashing, we can
partition the input tuples and replace an expensive
operation by similar operations on smaller inputs.
* Watch for these techniques as we discuss query evaluation!
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke
10
Access Paths

An access path


A method of retrieving tuples from a table
Method:


File scan,
Index that matches a selection (in the query)

The choice of access path contributes significantly
to cost of relational operator.

Most selective access path: An index or file scan that we estimate
will require the fewest page I/Os.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke
11
Access Path Method: Matching

A tree index matches (a conjunction of) terms that
involve only attributes in a prefix of search key.

Example : Given tree index on <a, b, c>
 selection a=5 AND b=3 ?
 selection a=5 AND b>6 ?
 selection b=3 ?
 selection a=5 AND c>2 ?
 selection b=6 AND c=2 ?
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke
12
Access Path Method: Matching

A hash index matches (a conjunction of) terms that
has a term attribute = value for every attribute in
search key of index.

Example : Given hash index on <a, b, c>
 selection a=5 AND b=3 AND c =5 ?
 selection c=5 AND b=3 AND a=2 ?
 selection a > 5 AND b=3 and c =5 ?
 selection c = 5 AND b = 6 ?
 selection a = 5 ?
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke
13
Query Evaluation – In a Nutshell

Choose an Access Path to get at each table

Evaluate different algorithms for each
relational operator

Choose the order to apply the relational
operators

Interrelate the above
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke
14
Evaluation of Relational
Operators
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke
15
Selection

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke
16
Selection

Case 2: No Index, Sorted Data on R.attr

What is the most selective path?
 Binary search for first tuple meeting the criterion.
 Scan R for all satisfied tuples.
• Left? Right? Both?
 Cost: O(log2M)
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke
17
Selection Using B+ tree index
Case 3: B+ tree Index
 What is the most selective path?

 Cost I (find qualifying data entries)
+ Cost II (retrieve records) :
• Cost I: depth of B+ tree, usually 2-3 I/Os.
• Cost II:
• clustered index: 1 I/O on average (depends on distribution)
• unclustered index: up to one I/O per qualifying tuple.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke
18
Example : B+-Tree Index for Selection
SELECT *
FROM Reserves R
WHERE R.rname < ‘C%’


Assume uniform distribution of names, about
10% of tuples qualify (100 pages, 10,000 tuples).
Clustered index:
•

little more than 100 I/Os;
Unclustered index :
•
up to 10,000 I/Os!
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke
19
Selection: Improve Unclustered B+-Tree
Key idea: Avoid retrieving the same page multiple
times.
1. Find qualifying data entries in index.
2. Sort rid’s of data records to be retrieved.
3. Fetch rids in order.
However, # of such pages likely to be still higher than
with clustering.

Use of unclustered index for a range selection
could be expensive. Simpler if just scan data file.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke
20
Selection Using Hash Index
 Hash
index is good for equality selection.
 Cost
= Cost I (retrieve index bucket page)
+ Cost II (retrieving qualifying tuples from
R)
• Cost I is 1 I/O (on average)
• Cost II could be up to one I/O per satisfying tuple.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke
21
A Note on Complex Selections
(day<8/9/94 AND rname=‘Paul’) OR bid=5 OR sid=3
Selection conditions are first converted to conjunctive
normal form (CNF):
(day<8/9/94 OR bid=5 OR sid=3 ) AND
(rname=‘Paul’ OR bid=5 OR sid=3)
 We only discuss case with no ORs;

 See text if you are curious about the general case.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke
22
Conjunction

A condition with several predicates combined by
conjunction (AND):

Example : day<8/9/94 AND bid=5 AND sid=3.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke
23
Selection with Conjunction
First approach: (utilizing single index)
 Find the most selective access path, retrieve tuples using it.


Apply any remaining terms that don’t match the index:



To reduce the number of tuples retrieved
To discard some retrieved tuples
This does not affect number of tuples/pages fetched.
Example : Consider day<8/9/94 AND bid=5 AND sid=3.




A B+ tree index on day can be used;
then bid=5 and sid=3 must be checked for each retrieved tuple.
Hash index on <bid, sid> could be used
day<8/9/94 must then be checked on fly.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke
24
Selections with Conjunction
Second approach (utilizing multiple index)
 Assuming 2 or more matching indexes that use Alternatives
(2) or (3) for data entries.




Get sets of rids of data records using each matching index.
Then intersect these sets of rids
Retrieve records and apply any remaining terms.
Example : Consider day<8/9/94 AND bid=5 AND sid=3.





A B+ tree index I on day and an index II on sid, both Alternative (2).
Retrieve rids of records satisfying day<8/9/94 using index I,
Retrieve rids of recs satisfying sid=3 using Index II
Intersect rids
Retrieve records and check bid=5.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke
25
Evaluation of Projection
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke
29
Projection

SELECT DISTINCT
FROM
R.sid, R.bid
Reserves R
The expensive part is removing duplicates.
 DBMSes don’t remove duplicates unless the keyword
DISTINCT is specified in a query.

Sorting: Sort on <sid, bid> and remove duplicates.
 Can optimize this by dropping unwanted information
while sorting.

Hashing: Hash on <sid, bid> to create partitions.
 Load partitions into memory one at a time, build inmemory hash structure, and eliminate duplicates.

Index: with both R.sid and R.bid in the search key,
may be cheaper to sort data entries!
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke
30
Projection: Sorting Based
SELECT DISTINCT
FROM

R.sid, R.bid
Reserves R
Modify Pass 0 of external sort to eliminate unwanted
fields.
 Runs of about 2B pages are produced,
 Benefit: Tuples in runs are smaller than input tuples.
• Size ratio depends on # and size of fields that are dropped.

Modify merging passes to eliminate duplicates.
 Number of result tuples smaller than input.
• Difference depends on # of duplicates.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke
31
Projection: Sorting Based

Cost:
SELECT DISTINCT
FROM
R.sid, R.bid
Reserves R
 In Pass 0, read original relation: size M
 Write out same number of smaller tuples.
• But # of page << M.
 In merging passes, fewer tuples written out in each pass.

Example:
 Using Reserves, 1000 input pages reduced to 250 in Pass 0 if
size ratio is 0.25
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke
32
Projection Based on Hashing

Partitioning phase (with B buffer pages)
 Read R using one input buffer.
 For each tuple, discard unwanted fields, apply hash
function h1 to choose one of B-1 output buffers.
 Result is B-1 partitions of tuples with no unwanted fields
•

2 tuples from different partitions guaranteed to be distinct.
Duplicate elimination phase
 For each partition
• read it and build an in-memory hash table, using hash function h2
(<> h1) on all fields, while discarding duplicates.

If partition does not fit in memory, can apply hash-based
projection algorithm recursively to this partition.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke
33
Projection Based on Hashing

Cost:
 M + 2T, T is the pages of all partitions
• T << M
 Why?
• For partitioning,
• We read entire R, hence M pages
• Write out each tuple new tuple, but with fewer fields.
Hence, T pages.
• Read all partitions in next phase, additional T pages.
• This assumes that each partition fits in memory.
• Not always true!
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke
34
Discussion of Projection

Sort-based approach is the standard;
 Better handling of skew and result is sorted.

If an index on the relation contains all wanted
attributes in its search key, can do index-only scan.


Apply projection techniques to data entries (much smaller!)
If an ordered (i.e., tree) index contains all wanted
attributes as prefix of search key, can do even better:

Retrieve data entries in order (index-only scan), discard
unwanted fields, compare adjacent tuples to check for
duplicates.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke
35
Evaluation of Joins
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke
36
Equality Joins With One Join Column
SELECT *
FROM Reserves R1, Sailors S1
WHERE R1.sid=S1.sid
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke
37
Typical Choices for Evaluating Joins

Nested Loops Join
 Simple Nested Loops Join: Tuple-oriented
 Simple Nested Loops Join: Page-oriented
 Block Nested Loops Join
• Not covered!
 Index Nested Loops Join
Sort Merge Join
 Hash Join

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke
38
Simple Nested Loops Join

R
S
foreach tuple r in R do
foreach tuple s in S do
if ri == sj then add <r, s> to result
Tuple-oriented Algorithm :
For each tuple in outer relation R, we scan inner
relation S.
 Cost :

 Scan of outer + for each tuple of outer, scan of inner
relation.
 Cost = M + (pR * M) * N
 Cost = 1000 + 100*1000*500 I/Os = 50,001,000 I/Os.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke
39

Simple Nested Loops Join: Cost
R
S

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke
40
Index Nested Loops Join
foreach tuple r in R do
foreach tuple s in S where ri == sj do
add <r, s> to result
An index on join column of one relation
 Which is the inner relation?

 Use S as inner and exploit the index.

Cost:
 Scan the outer relation R
 For each R tuple, sum cost of finding matching S tuples
 Cost: M + ( (M*pR) * cost of finding matching S tuples)
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke
45
Index Nested Loops Join: Cost

For each R tuple, cost of probing S index is on
average:
 about 1.2 pages for hash index,
 2 - 4 pages for B+ tree.

Cost of retrieving S tuples (assuming Alt. (2) or (3)
for data entries) depends on clustering:


Clustered : 1 I/O (typical),
Unclustered: up to 1 I/O per matching S tuple.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke
46
Examples of Index Nested Loops

Hash-index (Alt. 2) on sid of Sailors (as inner):

Scan Reserves:
•
•

For each Reserves tuple:
•
•
•

1000 page I/Os,
There are 100*1000 tuples = 100,000 tuples.
1.2 I/Os to get data entry in index (this is page index),
plus 1 I/O to get (the exactly one) matching Sailors tuple.
Total: 100,000 * (1.2 + 1 ) = 220,000 I/Os.
In total, we have:
•
1000 I/Os + 220,000 I/Os = 221,000 I/Os
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke
47
Examples of Index Nested Loops

Hash-index (Alt. 2) on sid of Reserves (as inner):

Scan Sailors:
•
•

500 page I/Os,
80*500 tuples = 40,000 tuples.
For each Sailors tuple:
•
1.2 I/Os to find index page with data entries for Reserves, plus,
•
Assuming uniform distribution:
2.5 reservations per sailor (100,000 / 40,000).
Cost of retrieving them is
•
•
•

Clustered index: 1 page
Unclustered index: up to 2.5 I/Os
Total:
•
Between 500 + 40,000 * (1.2 + 1) and 500 + 40,000* (1.2 + 2.5).
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke
48
Summary Index Nested Loops Join


Assume:
M Pages in R, pR tuples per page,
N Pages in S, pS tuples per page,
B Buffer Pages.
Nested Loops Join
 Simple Nested Loops Join
• Tuple-oriented: M + pR * M * N
• Page-oriented: M + M * N
• Smaller relation as outer can give some improvement.
 Block Nested Loops Join
• M + N*M/(B-2) 
• Dividing buffer evenly between R and S helps.
 Index Nested Loops Join
•
•

M + ( (M*pR) * cost of finding matching S tuples)
cost of finding matching S tuples = cost of Index Probe + cost of
retrieving the tuples
Discussion.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke
49
Sort-Merge Join (R i=j S)
Recall External Sort!
(1). Sort R and S on the join column.
(2). Scan R and S to do a “merge” on join column
(3). Output result tuples.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke
50
Example of Sort-Merge Join
Recall External Sort!
sid
22
28
31
44
58
sname rating age
dustin
7
45.0
yuppy
9
35.0
lubber
8
55.5
guppy
5
35.0
rusty
10 35.0
sid
28
28
31
31
31
58
bid
103
103
101
102
101
103
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke
day
12/4/96
11/3/96
10/10/96
10/12/96
10/11/96
11/12/96
rname
guppy
yuppy
dustin
lubber
lubber
dustin
51
Sort-Merge Join (R i=j S)
(1). Sort R and S on the join column.
(2). Scan R and S to do a “merge” on join col.
(3). Output result tuples.

Merge on Join Column:
•
•
•
Advance scan of R until current R-tuple >= current S tuple,
then advance scan of S until current S-tuple >= current R tuple;
do this until current R tuple = current S tuple.
•
At this point, all R tuples with same value in Ri (current R group) and
all S tuples with same value in Sj (current S group) match;
So output <r, s> for all pairs of such tuples.
•
Then resume scanning R and S (as above)
•
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke
52
Join: Sort-Merge (R i=j S)
 Note:
 R is scanned once;
 Each S group is scanned once per
matching R tuple.
 Multiple scans of an S group are likely to
find needed pages in buffer.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke
53
Cost of Sort-Merge Join
sid
22
28
31
44
58
sname rating age
dustin
7
45.0
yuppy
9
35.0
lubber
8
55.5
guppy
5
35.0
rusty
10 35.0
sid
28
28
31
31
31
58
bid
103
103
101
102
101
103
day
12/4/96
11/3/96
10/10/96
10/12/96
10/11/96
11/12/96
rname
guppy
yuppy
dustin
lubber
lubber
dustin
Cost of sort-merge :
 Sort R
 Sort S
 Merge R and S
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke
54
Example of Sort-Merge Join
sid
22
28
31
44
58
sname rating age
dustin
7
45.0
yuppy
9
35.0
lubber
8
55.5
guppy
5
35.0
rusty
10 35.0
sid
28
28
31
31
31
58
bid
103
103
101
102
101
103
day
12/4/96
11/3/96
10/10/96
10/12/96
10/11/96
11/12/96
rname
guppy
yuppy
dustin
lubber
lubber
dustin
Discussion
 Best case: ?
 Worst case: ?
 Average case ?
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke
55
Cost of Sort-Merge Join

Best Case Cost:


(M+N)
Already sorted.
The cost of scanning, M+N
R
S

Worst Case Cost: M log M + N log N + (M+N)
Many pages in R in same partition. ( Worst, all of them).
The pages for this partition in S don’t fit into RAM.
Re-scan S is needed. Multiple scan S is expensive!
Could the scan cost be M*N?

Note: Guarantee M+N if key-FK join, or no duplicates.


Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke
56
Cost of Sort-Merge Join

Average Cost:
 ~ In practice, roughly linear in M and N
 So, O ( M log M + N log N + (M+N) )
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke
57
Comparison with Sort-Merge Join
Average Cost: O(M log M + N log N + (M+N))
 Assume B in {35, 100, 300}; and
R = 1,000 pages, S = 500 pages


Sort-Merge Join
 both R and S can be sorted in 2 passes (why?),
 log M = log N = 2
 total join cost: 2*2*1000 + 2*2*500 + (1000 + 500) =
7,500.

Block Nested Loops Join: 2,500 ~ 15,000
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke
58
Hash-Join
Original
Relation
Partition both
relations using same
hash fn h:
R tuples in partition i
will only match S
tuples in partition i.
OUTPUT
1
1
2
INPUT
...
Partitions
2
hash
function
h
B-1
B-1
Disk
B main memory buffers
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke
Disk
61
Hash-Join
Partitions
of R & S

Read in a partition
of R, hash it using
h2 (<> h!). Scan the
matching partition
of S, search for
matches.
Join Result
hash
fn
Hash table for partition
Ri (k < B-1 pages)
h2
h2
Input buffer
for Si
Disk
Output
buffer
B main memory buffers
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke
Disk
62
Cost of Hash-Join

In partitioning phase, read+write both relations:
 2(M+N).

In matching phase, read both relations:
 M+N.

Total : 3(M+N)

E.g., total of 4,500 I/Os in our running example.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke
63
Observation on Hash-Join

Memory Requirement: When is total cost 3 (M + N)?
 Partition fit into available memory?
 Assuming B buffer pages. #partitions k <= B-1 (why?), (to
min size of each partition, we choose #partitions = B – 1)
 Assuming uniformly sized partitions, and maximizing k,
we get:
•
•
•
k= B-1, and size of partition = M/(B-1) (M is the number of pages of R)
in-memory hash table to speed up the matching of tuples, a little
more memory is needed: f * M/(B-1) (You can assume f = 1, unless
explicitly specified)
f is fudge factor used to capture the small increase in size between the
partition and a hash table for partition.
 Probing phase, one for S, one for output, B>= f*M/(B-1)+2
for hash join to perform well (i.e., cost of hash join = 3 (M +
N)). In other words, (B – 1) (B – 2) >= f * M
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke
64
Observation on Hash Join

Overflow
 If the hash function does not partition uniformly,
one or more R partitions may not fit in memory.
 Significantly degrade the performance.
 Can apply hash-join technique recursively to do
the join of this overflow R-partition with
corresponding S-partition.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke
65
Hash-Join vs. Sort-Merge Join

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke
69
General Join Conditions

Equalities over several attributes
 (e.g., R.sid=S.sid AND R.rname=S.sname):
 INL-Join : build index on <sid, sname> (if S is inner); or use existing
indexes on sid or sname.
 SM-Join and H-Join : sort/partition on combination of the two join
columns.
could be M*N (very unlikely!)

Inequality conditions
 (e.g., R.rname < S.sname):
 INL-Join: need (clustered!) B+ tree index.
• Range probes on inner; # matches likely to be much higher than for equality
joins.


Hash Join, Sort Merge Join not applicable.
Block NL quite likely to be the best join method here.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke
70
Relational Algebra Equivalences
Allow us to choose different join orders and to
`push’ selections and projections ahead of joins.
 Selections: s c1 ...  cn  R   s c1  . . . s cn  R  
(Cascade)

s c1 s c 2  R   s c 2 s c1  R 
(Commute)
(Cascade)

Joins: R  (S  T)
 (R S)  T
(R  S)  (S  R)
+ Show that:
R  (S  T)
(Associative)
(Commute)
 (T  R)  S
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke
71
More Equivalences

Commute projection and selection:
 p attr (sCond(R))  sCond (p attr (R)),
if attr  all attributes in Cond
Selection between attributes of the two arguments of
a cross-product converts cross-product to a join.
 A selection on just attributes of R commutes with
R  S. (i.e., s (R  S)  s (R)  S )
 Similarly, if a projection follows a join R  S, we can
`push’ it by retaining only attributes of R (and S) that
are needed for the join or are kept by the projection.

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke
72
Summary




There are several alternative evaluation algorithms for each
relational operator.
A query is evaluated by converting it to a tree of operators and
evaluating the operators in the tree.
Must understand query optimization in order to fully
understand the performance impact of a given database design
(relations, indexes) on a workload (set of queries).
Two parts to optimizing a query:

Consider a set of alternative plans.
• Must prune search space; typically, left-deep plans only.

Must estimate cost of each plan that is considered.
• Must estimate size of result and cost for each plan node.
• Key issues: Statistics, indexes, operator implementations.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke
81