Transcript L - ELTE

Optimization of Relational
Algebra Expressions
Database I.
Moore's law
• Moore's law is the observation that, over
the history of computing hardware, the number
of transistors on integrated circuits doubles
approximately every two years.
• The capabilities of many digital electronic devices
are strongly linked to Moore's law: processing
speed, memory capacity, sensors and even the
number and size of pixels in digital cameras.
Our objective --1
• On the other hand, the development of other
factors is linear.
• An example of such a factor is the speed with
which the disk moves in hard disk drives.
• For large data the amount of data moved
between the primary and the secondary
storage devices should be minimized.
Our objective --2
• Data is moved in blocks between the main
memory and the hard disk.
• Thus, in other words, in RDBMS computations
the number of blocks involved in I/O
operations should be kept as low as possible.
• This can be achieved by working with as small
transitory relations as possible.
Equivalence of relational algebra
expressions
• In order to reduce the size of the transitory
relations the relational algebra expressions are
rewritten.
• Two relational algebra queries q, q’ are
equivalent if for all database instances I q(I) is
the same as q’(I).
• Here:
– a database instance consists of relations
– q(I) denotes the result of applying q on I.
Example
• Consider relations
– Likes(drinker, beer)
– Frequents(drinker, bar)
• The following two expressions are equivalent
to each other:
– bar(σF.drinker=L.drinkerbeer= Bud (F  L))
– bar(σF.drinker=L.drinker(F  (σbeer=Bud (L)))).
• In most cases the second query can be
evaluated faster.
Optimization algorithm -- sketch
• The original relational algebra expression is
rewritten into another one in which
– the selection operations are accomplished as soon
as possible
– the unnecessary columns are removed afterwards.
• Next, the selection and the subsequent cross
product operators are substituted with the
appropriate join operators.
Our running example
Likes(drinker, beer)
Bar(name, city)
Frequents(drinker, bar)
ΠL. drinker(σL.drinker=F.drinkername=barbeer=Budcity=N.Y. (L  B  F))
Splitting the conditions
• σC1C2 (E) is equivalent with σC1(σC2 (E)).
ΠL. drinker (σ L.drinker=F.drinkername=barbeer=Budcity=N.Y.(L  B  F))
is equivalent with
ΠL. drinker(σL.drinker=F.drinker(σname=bar(σbeer=Bud(σcity=N.Y. (LBF)))))
Expression trees
ΠL.drinker
ΠL.drinker
σL.dinker=F.drinker
σ L.drinker=F.drinkername=bar
 beer=Budcity=N.Y.
σbeer=Bud

σcity=N.Y.

L
B
σname=bar

F

L
B
F
Pulling down the conditions
• σC(E1ΘE2) ≡ (σC(E1))ΘE2, where attr(C) 
attr(E1) and Θ Є {, ⋈}.
• Here
– attr(C), attr(E1) respectively denote the attributes
appearing in condition C and relational algebra
expression E1
– while ≡ denotes the equivalence relation.
ΠL. drinker(σL.drinker=F.drinker(σname=bar(σbeer=Bud(σcity=N.Y. (LBF)))))
is equivalent with
ΠL. drinker(σL.drinker=F.drinker(σname=bar ((σbeer=Bud (L)) (σcity=N.Y. (B))F)))
ΠL.drinker
ΠL.drinker
σL.dinker=F.drinker
σL.dinker=F.drinker
σname=bar
σname=bar
σbeer=Bud


L

L
B

σbeer=Bud
σcity=N.Y.
F
σcity=N.Y.
B
F
Removal of the unnecessary columns
• ΠX(E1 Θ E2) ≡ ΠY(E1) Θ ΠZ(E2), where X = Y  Z,
Y  attr(E1), Z  attr(E2) and Θ Є {, ⋈}.
• ΠX(σC (E)) ≡ ΠX(σF (ΠY(E))), where
Y = attr(C)  X.
ΠL. drinker(σL.drinker=F.drinker(σname=bar ((σbeer=Bud (L)) (σcity=N.Y. (B))F)))
is equivalent with
ΠL. drinker(σL.drinker=F.drinker(σname=bar(
(Πdrinker(σbeer=Bud (L))) (Πbar(σcity=N.Y. (B)))F)))
ΠL.drinker
ΠL.drinker
σL.dinker=F.drinker
σL.dinker=F.drinker
σname=bar
σname=bar



σbeer=Bud
L
σcity=N.Y.
B

Πdrinker
F
σbeer=Bud
L
Πbar
F
σcity=N.Y.
B
Note: the application of extra projections increases the time of
the evaluation of the query, hence this rewriting step can be
omitted.
Substitution with joins
• By definition,
– E1 ⋈C E2 ≡ σC(E1  E2)
– E1 ⋈ E2 ≡ ΠL(σC(E1  E2)), where in condition C the
common attributes of E1 and E2 are made equal
and these common attributes occur only once in L.
ΠL. drinker(σL.drinker=F.drinker(σname=bar(
(Πdrinker(σbeer=Bud (L))) (Πbar(σcity=N.Y. (B)))F)))
is equivalent with
ΠL. drinker((Πdrinker(σbeer=Bud (L))) ⋈ (Πbar(σcity=N.Y. (B))⋈
name=bar F)))
ΠL.drinker
σL.dinker=F.drinker
ΠL.drinker
σname=bar
⋈


Πdrinker
σbeer=Bud
L
⋈ name=bar
Πdrinker
Πbar
σcity=N.Y.
B
σbeer=Bud
F
L
Πbar
σcity=N.Y.
B
F
Commutativity and associativity
• Commutativity: E1 Θ E2 ≡ E2 Θ E1, where
Θ Є {, ⋈, ⋈C }.
• Associativity: (E1 Θ E2) Θ E3 ≡ E1 Θ (E2 Θ E3),
where Θ Є {, ⋈}
• Note: in general (E1 ⋈C1 E2) ⋈C2 E3 is not
equivalent with E1 ⋈C1 (E2 ⋈C2 E3). Why??
Disjunctions in the conditions
• Disjunctions in the conditions of selection
operators may complicate the situation.
• As a first attempt one may use equivalence
rule σC1C2 (E) ≡ σC1(E)  σC2(E) and then apply
the previous algorithm on σC1(E) and σC2(E).
• However, in this case the relations appearing
in E may be scanned twice, which is costly.