Transcript A=a(r)

Chapter 10




Query Optimization
10.1 Overview
10.2 Estimating Statistics of Expression
10.3 Transformation of Relational Expressions
10.4 Choice of Evaluation Plans
10.1
Overview
Query optimization is the process of selecting the most
efficient query-evaluation plan from among the many strategies
usually possible for processing a given query.
Optimization:
① One aspect of optimization occurs at the relational-algebra level,
where the system attempts to find an expression that is equivalent
to the given expression, but more efficient to execute.
② another aspect is selecting a detailed strategy for processing
the query.
10.1
Overview
Find the names of all customers who have an account at any
branch located in Brooklyn.
∏customer-name ( branch-city=“Brooklyn” (branch
(account depositor) ) )

∏customer-name ((branch-city=“Brooklyn” (branch))
∏customer-name

branch-city=Brooklyn
(account
depositor) )
∏customer-name

branch-city=Brooklyn
branch
branch
account
depositor
account
depositor
10.1
Overview
Given a relational-algebra expression, it is the job of the query
optimizer to come up with a query-evaluation plan that computes
the same result as the given expression, and is the least costly way
of generating the result .
To choose among different query-evaluation plans, the optimizer
has to estimate the cost of each evaluation plan.(disk access)
Generation of query-evaluation plans involves two steps:
① generating expressions that are logically equivalent to the given
expression .
② annotating the resultant expressions in alternative ways to
generate alternative query evaluation plans.
10.2.1
Catalog Information
The DBMS catalog stores the following statistical information
about database relations:
nr, the number of tuples in the relation r.
br, the number of blocks containing tuples of relation r.
lr, the size of a tuple of relation r in bytes.
fr, the blocking factor of relation r(that is, the number of tuples of
relation r that fit into one block)
V(A,r), the number of distinct values that appear in the relation r
for attribute A, this value if the same as the size of ∏A(r) . If A is a
key for relation r,V(A,r) is nr.
nr
br=
fr
If the tuples of relation r are stored together physically in a file.
10.2.2
①
Selection Size Estimation
A=a(r) : nr/V(A,r)
a. uniform distribution of values
b. the value a appears in attribute A of some record of r.
②
A<=v(r) :
Example:
(1) v<min(A,r) :0
(2) v>= max(A,r) : nr
(3) min(A,r) <v <= max(A,r) :
v-min(A,r)
nr
max(A,r)-min(A,r)
.
①
R
a. values are uniformly distributed.
A B
C
a
a
4
b
a
4
c
a
4
a
c
1
b
c
1
c
c
1
B=a(r)
nr= 6 V(B,r)=2
nr/V(A,r)=3
①
C<=3(r)
nr= 6 min(A,r)=1
max(A,r)=4
6*(3-1)/(4-1)=4
10.2.2
Selection Size Estimation
③ complex selections:
a. Conjunction:
1∧ 2∧… ∧ n (r)
For each i we estimate the size of the selection i (r), denoted
by si. Thus, the probability that a tuple in the relation satisfies
selection condition is si /nr. (selectivity of the selection i (r))
R
Assuming that the conditions
A B C a.
B=a ∧ C<=3(r)
are independent of each other.



.
.
.
.
n
n
r
S1 S2 … Sn
n
r
a
a
4
b
a
4
c
a
4
a
c
1
b
c
1
c
c
1

B=a (r) : 3 C<=3 (r) : 4
S .S
.
n
r
1
2
2
nr
=
6*(3*4)/6*6=2
10.2.2
Selection Size Estimation

b. Disjunction: 1 ∨ 2 ∨ … ∨ n (r)
let si /nr denote the probability that a tuple satisfies condition i.
nr (1-(1- S1 )(1- S2 )(1- Sn ))
b.
B=a ∨ C<=3(r)
nr
nr
nr
.
c. Negation:
¬ (r)
nr-  (r)

B=a (r) : 3  C<=3 (r) : 4
R
.
A B
C
nr (1-(1- S1 )(1- S2 ))
nr
nr
a
a
4
=6*(1-(1-3/6)(1-4/6))=5
b
a
4
c
a
4
a
c
1
b
c
1
c
c
1
B=a (r)
B=a (r) : 3
n -  (r) =6-3=3
c.
r
10.2.3
Join Size Estimation
The Cartesian product r×s contains nr * ns tuples, each tuple of
r×s occupies lr+ls bytes. Let r(R) and s(S) be relation:
① If R∩S= , then r
s is the same as r×s,and we can use our
estimation technique for Cartesian products.
② If R∩S is a key for R, then we know that a tuple of s will join with
at most tuple from r. therefore, the number of tuples in r
s is no
greater than the number of tuples in s. if R∩S forms a foreign key
of S, referencing R, the number of tuples in r
s is exactly the
same as the number of tuples in s.
10.2.3
Join Size Estimation
③ when R∩S is a key for neither R nor S. we assume that each
value appears with equal probability.
Consider a tuple t of r, and assume R∩S ={A}. We estimate that
ns
tuple t produces
tuples in r
s.
V(A,s)
nr *ns
V(A,s)
tuples in r
s. If we reverse the roles of r and s in the preceding
estimate, we obtain an estimate of nr *ns
V(A,r)
These two estimates differ if V(A,r) = V(A,s). The lower of the two
estimates is probably the more accurate one.
Considering all the tuples in r, we estimate that there are
10.2.3
Join Size Estimation
Example:
Depositor-schema=(customer-name, account-number)
Customer-schema=(customer-name, customer-street, customer-city)
Consider the expression depositor
customer
Assume the following catalog information about the two relations:
ncustomer=10000 fcustomer=25 bcustomer= ncustomer /fcustomer =400
ndepositor=5000 fdepositor=50 bdepositor= ndepositor /fdepositor =10
V(customer_name,depositor)=2500
Assume that customer-name in depositor is a FK on customer
According to ② the size of the result is ndepositor=5000
10.2.3
Join Size Estimation
Without using information about FK. According to ③
5000*10000
nr* ns
ndepositor*ncustomer
=
= 10000 =5000
V(A,s) V(customer-name,customer)
nr* ns
5000*10000
ndepositor*ncustomer
=
=
=20000
V(A,r)
V(customer-name,depositor)
2500
We choose the lower one 5000.
10.2.4 Size Estimation for Other
Operations
① Projection: the estimated size of a projection of the form ∏A(R) is
V(A,r). Since projection eliminates duplicates.
② Aggregation: the size of a projection of AGF(r) is simply V(A,r).
Since there is one tuple in AGF(r) for each distinct value of A.
③ Set operations: if the two inputs to a set operation are selections
on the same relation, we can rewrite the set operation as
disjunctions, conjunctions, or negations.
Example:1(r)∪2 (r) can be rewritten as1∨2 (r)
1(r)∩2 (r) can be rewritten as1∧2 (r)
1(r)-2 (r) can be rewritten as1∧¬2 (r)
10.2.4 Size Estimation for Other
Operations
if the inputs are not selections on the same relation we estimate
the sizes this way:
a. r∪s the sum of the sizes of r and s
b. r∩s the minimum of the sizes of r and s
c. r-s the same size as r
④ Outer join:
a. r
s the size of r
s+ the size of r
b. r
s the size of r
s+ the size of s
c. r
s the size of r
s+ the size of r+the size of s
10.2.5 Estimation of Number of
Distinct Values
For selections, the number of distinct values of an attribute A in the
result of a selection, V(A,
can be estimated in these ways:
), (r)
① If the selection condition  forces A to take on a specified value
(e.g.,A=3), V(A,  (r)) =1

② If  forces A to take on one of a specified set of values (A=1∨A=3
 (r)) is set to the number of specified values.
∨A=4), then V(A,

③ If the selection condition  is of the form A op v, V(A, (r)) is
estimated to be V(A, r)*s, where s is the selectivity of the selection.
④ In all other cases of selections, we assume that the distribution
of A values is independent of the distribution of the values on which
selection conditions are specified, and use an approximate estimate
of min(V(A,r), n (r) ).
10.3.1 Equivalence Rules
An equivalence rule says that expressions of two forms are
equivalent.
1. Conjunctive selection operations can be deconstructed into a
sequence of individual selections. (a cascade of )

1∧2
(E)= 1 (2(E))

2. Selection operations are commutative.
 ( (E))= ( (E))
1
2
2
1
3. Only the final operations in a sequence of projection operations
are needed, the others can be omitted. (a cascade of ∏ )
∏L1(∏L2 (…(∏Ln(E))…)) =∏L1(E)
10.3.1 Equivalence Rules
4. Selections can be combined with Cartesian products and theta
joins. a.
 (E1×E2))= E1
 E2

b. 
1
2 E2))=
(E1
E1
1∧2 E2
5. Theta-join operations are commutative.
E1
E2=
E2
E1
6. a. Natural-join operations are associative.
(E1
E2)
E3 =E1
(E2
E3)
b. Theta joins are associative in the following manner:
(E1
1E2)
2∧3E3
=E1
1∧3 (E2
2E3)
10.3.1 Equivalence Rules
7. The selection operation distributes over the theta-join operation
under the following two conditions:
a. It distributes when all the attributes in selection condition 0
involve only the attributes of one of the expressions(say, E1)
being joined.

0
(E1
E2))=(

0
(E1))
E2
b. It distributes when selection condition 1 involve only the
attributes of E1and 2 involve only the attributes of E2

1∧2
(E1
E2)
= (1 (E1))
(

2
(E2))
10.3.1 Equivalence Rules
8. The projection operation distributes over the theta-join operation
under the following conditions:
a. Let L1 and L2 be attributes of E1 and E2, respectively. Suppose
that the join condition  involves only attribute in L1∪L2. Then,
∏L1∪L2(E1
E2))=
(∏L1(E1))
(∏L2(E2))
b. Consider a join E1
E2. Let L1 and L2 be sets of attributes
from E1 and E2, respectively, let L3 be attributes of E1 that are
involved in join condition , but are not in L1∪L2, and let L4 be
attributes of E2 that are involved in join condition , but are not in
L1∪L2. Then,
∏L1∪L2(E1
E2))=∏L1∪L2((∏L1∪L3(E1))
(∏L2∪L4(E2)))
10.3.1 Equivalence Rules
9. The set operations union and intersection are commutative.
E1∪E2 =E2∪E1
E1∩E2 =E2∩E1
10. Set union and intersection are associative.
(E1∪E2)∪E3 =E1∪(E2∪E3)
(E1∩E2)∩E3 =E1∩(E2∩E3)
11. The selection operation distributes over the union, intersection,
and set-difference operations.
P (E1-E2)=P(E1)-P(E2) ∪∩ P (E1-E2) =P(E1)-E2 ∩
12. The projection operation distributes over the union operation.
∏L (E1∪E2)=(∏L1 (E1))∪(∏L(E2))
10.3.2 Examples of
Transformations