RelatonalAlgebra
Download
Report
Transcript RelatonalAlgebra
Relational Algebra
Relational Query Languages
Query languages: allow manipulation and retrieval of data
from a database
Relational model supports simple, powerful QLs:
Strong formal foundation
Allows for much optimization
Query languages are NOT programming languages
QLs not expected to be “turing complete”
QLs not intended to be used for complex calculations
QLs support easy, efficient and sophisticated access to large data
sets
421B: Database Systems - Relational Algebra
2
Formal Relational Query Languages
Mathematical query languages form the basis for
“real” languages (e.g. SQL) and for implementation:
Relational Algebra:
Operational - a query is a sequence of operations on data
Very useful for representing execution plans, i.e., to describe
how a SQL query is executed internally
Relational Calculus:
descriptive - a query describes how the data to be retrieved
looks like
Understanding Relational Algebra is key to
understanding SQL.
421B: Database Systems - Relational Algebra
3
Preliminaries
A query is applied to a relation instance, and
the result of a query is also a relation instance
(i.e., input = relation instances, output =
relation instance)
Schemas of input relations for a query are fixed
(but query will run regardless of instance)
The schema for the result of a given query is also
fixed! Determined by the definition of query
language constructs
421B: Database Systems - Relational Algebra
4
Example Instances
We assume that names of fields in query results are
‘inherited’ from names of fields in query input relations
Sailors S1
sid sname rating age
28 yuppy
9
35
31
debby
8
55
Reserves R
Boat B
22
conny
5
35
58
lilly
10
35
sid bid day
bid bname
S2
sid sname rating age
44 robby 7
31 debby 8
58 lilly
10
45
55
35
22 101
10/24/07
color
101 Interlake blue
58 103 09/30/07
103 Snapper
red
58 101 09/01/07
104 Bingo
green
58 104 10/23/07
421B: Database Systems - Relational Algebra
5
Relational Algebra
Basic Operations
Selection : Selects a subset of rows from a relation
Projection ∏: projects to a subset of columns from a relation
Cross Product 5: Combines two relations
Set Difference : Tuples that are in the first but not the
second relation
Union : tuples in any of the relations to be unified
Additional operations
Relational algebra is closed: since each
operation has input relation(s), and returns a
relation, operations can be composed
Intersection (), join ( ), division, renaming
Not essential but very useful
421B: Database Systems - Relational Algebra
6
Projection
Projection:
Notation: ∏L(Rin)
Returns a subset of the columns of the input relation Rin, i.e, it ignores
the attributes that are not in the projection list L
Schema of result relation contains exactly the attributes of the
projection list, with the same attribute names as in Rin
Projection operator has to eliminate duplicates!
Note: real systems typically do NOT eliminate duplicates unless the user
explicitly asks for it; eliminating duplicates is a very costly operation!
∏ sname,rating (S2)
S2
∏ age(S2)
sid sname rating age
sname rating
age
28 yuppy
yuppy
debby
conny
lilly
35
55
9
35
31
debby
8
55
22
conny
5
35
58
lilly
10
35
421B: Database Systems - Relational Algebra
9
8
5
10
7
Selection and Operator
Composition
Selection: C(Rin)
Returns the subset of the rows of the
input relation Rin that fulfill the condition C
Condition C involves the attributes of Rin
Schema of result relation identical to
schema of Rin
No duplicates (obviously)
Operation Composition: result relation of
one operation can be input for another
relational algebra operation
rating
> 8(S2)
S2
sid sname rating age
28 yuppy
debby
8
55
22
conny
5
35
58
lilly
10
35
∏ sname,rating( rating > 8(S2))
sname rating
28 yuppy
58 lilly
yuppy
lilly
35
35
421B: Database Systems - Relational Algebra
35
31
sid sname rating age
9
10
9
9
10
8
Union, Intersection,
Set-Difference
Notation:
Rin1 Rin2 (Union),
Rin1 Rin2 (Intersection),
Rin1 - Rin2 (Difference),
Usual operations on sets
Rin1 and Rin2 must be union-compatible, i.e., they must
have the same number of attributes and corresponding
attributes must have the same type (note that
attribute names need not be the same)
The result schema is the same as the schema of the
input relations (with possible renamed attributes)
421B: Database Systems - Relational Algebra
9
Union, Intersection,
Set-Difference: Examples
S1 S2
S1
sid sname rating age
44 robby 7
31 debby 8
58 lilly
10
45
55
35
S2
sid sname rating age
28 yuppy
9
35
31
debby
8
55
22
conny
5
35
58
lilly
10
35
421B: Database Systems - Relational Algebra
sid sname rating age
44 robby 7
31 debby 8
58 lilly
10
28 yuppy 9
22 conny 5
45
55
35
35
35
S1 S2
sid sname rating age
31
58
debby
lilly
8
10
55
35
S1 - S2
sid sname rating age
44 robby
7
45
10
Cross-Product
Cross-Product: Rin1 X Rin2
Each row of Rin1 is paired
with each row of Rin2
Result schema has one
attribute per attribute of
Rin1 and one attribute per
attribute Rin2 with field
names inherited if possible
Conflict if both relations
have an attribute with the
same name: e.g., attribute
names of result relation
concatenate input relation
name with input attribute
name
421B: Database Systems - Relational Algebra
S2
sid sname rating age
44 robby 7
31 debby 8
58 lilly
10
45
55
35
sid bid day
22 101
10/24/07
58 103 09/30/07
58 101 09/01/07
58 104 10/23/07
S1 X R1
S1. sname rating age R1. bid day
sid
sid
44 robby
7
44 robby
7
45 22 101 07/24/00
45 58 103 07/30/00
31
debby
8
55
31
debby
8
55
58
lilly
10
35
58
lilly
10
35
22 101 07/24/00
58 103 07/30/00
22 101 07/24/00
58 103 07/30/00
11
Joins
Condition Join (Theta-Join): Rout = Rin1
C Rin2 =
Result schema the same as for cross-product
Fewer tuples than cross-product
S2
sid bid day
sid sname rating age
22 101
44 robby 7
31 debby 8
58 lilly
10
45
55
35
C(Rin1 X Rin2)
10/24/07
58 103 09/30/07
58 101 09/01/07
58 104 10/23/07
S2
S2.sid > R.sid R
S2. sname rating age R.
sid
sid bid day
44 robby
31 debby
7
8
45 22 101 10/24/07
55 22 101 10/24/07
58
10
35 22 101 10/24/07
lilly
421B: Database Systems - Relational Algebra
12
Self-Joins
S2
sid sname rating age
44 robby 7
31 debby 8
58 lilly
10
45
55
35
give S2 a second name S2’
S2
S2.
sid
44
31
S2.age > S2’.sid S2’
S2. S2’. S2’. S2’. S2’.
S2.
S2.
sname rating age sid sname rating age
robby 7
45 58 lilly 10
35
debby 8
55 58 lilly 10
35
421B: Database Systems - Relational Algebra
13
Equi-Join
Equi-Join: Rout = Rin1 Rin1.a1 = Rin2.b1, … Rin1.an = Rin2.bn Rin2 =
A special case of condition join where the condition C contains only equalities.
Result schema similar to cross-product, but only one copy of attributes for
which equality is specified
Natural Join: Equijoin on all common attributes, i.e., on all attributes with the
same name
sid bid day
S2
sid sname rating age
44 robby 7
31 debby 8
58 lilly
10
S2
22 101
10/24/07
58 103 09/30/07
45
55
35
58 101 09/01/07
58 104 10/23/07
S2.sid = R.sid
R
S2. sname rating age
sid
bid
day
58 lilly
58 lilly
10
10
35
35
58 lilly
10
35
103
101
104
09/30/07
09/01/07
10/23/07
421B: Database Systems - Relational Algebra
14
Division
Not supported as a primitive operation,
but useful for expressing queries like
Find sailors who have reserved all boats
Let A have 2 fields, x and y; B have only
field y:
A/B = {<x> | <y> B <x,y> A}
I.e., A/B contains all x tuples (sailors) such
that for every y tuple (boat) in B, there is an
xy tuple in A
Or: if the set of y values (boats) associated
with an x value (sailor) in A contains all y
values in B, the x value is in A/B
In general, x and y can be any lists of
fields; y is the list of fields in B, x y is
the list of fields of A.
421B: Database Systems - Relational Algebra
Reserves R1
sid bid
22
58
58
58
101
103
101
104
Boat B1
bid
101
103
104
15
Examples of Division A/b
x
y
x1
y1
x1
y2
x1
y3
x1
y4
x2
y1
x2
y2
x3
y2
x4
y2
x4
y4
A
y
y
y
y2
y2
y4
y1
B1
B2
x2
x3
x4
A/B1
421B: Database Systems - Relational Algebra
y4
B3
x
x1
y2
x
x
x1
x4
x1
A/B2
A/B3
16
Expressing A/B using Basic Operators
Division is not an essential operation; just a
useful shorthand
(Also true of joins, but joins are so common that
systems implement joins specially)
Idea: For A/B, compute all x values that are
not ‘disqualified’ by some y value in B.
x value is disqualified if by attaching y value from B,
we obtain an xy tuple that is not in A
Disqualified x values: ∏X((∏X (A) x B)-A)
A/B: ∏X (A) - all disqualified tuples
421B: Database Systems - Relational Algebra
17
Renaming
Renaming : (Rout(B1,…Bn), Rin(A1, …An))
Produces a relation identical to Rin but named Rout and with attributes
A1, … An of Rin renamed to B1, … Bn
(Temp, S1),
(Temp1(sid1,rating1), S1(sid,rating))
421B: Database Systems - Relational Algebra
18
Examples (discussed in class)
Relations
Queries
Sailors(sid,sname,rating,age)
Reserves(sid,bid,day)
Boats(bid,bname,color)
Find names of sailors who have reserved boat #103 (three
solutions)
Find names of sailors
solutions)
Find names of sailors
(2 solutions)
Find names of sailors
(1 solution)
Find names of sailors
division)
421B: Database Systems - Relational Algebra
who have reserved a red boat (2
who have reserved a red or a green boat
who have reserved a red and a green boat
who have reserved all boats (solution with
19
Summary
The relational model has rigorously defined query
languages that are simple and powerful
Relational algebra is operational; useful as internal
representation for query evaluation plans
Projection, selection, cross-product, difference and
union are the minimal set of operators with which all
operations of the relational algebra can be expressed
Several ways of expressing a given query; a query
optimizer should choose the most efficient version.
Relational Completeness of a query language: A query
language (e.g., SQL) can express every query that is
expressible in relational algebra
421B: Database Systems - Relational Algebra
20
Relational Calculus
Relational Calculus is non-operational but descriptive:
Users define queries in terms of what they want, not in terms of how
to compute it. (Declarativeness).
Comes in two flavors: Tuple relational calculus (TRC) and
Domain relational calculus (DRC)
Both TRC and DRC are simple subsets of first-order logic
Calculus has variables, constants, comparison operations,
logical connectives and quantifiers.
TRC: variables range over tuples
DRC: variables range over domain elements (= field values)
Find the names of all sailors with a rating > 7
TRC:{T | S Sailors(S.rating > 7 S.name =
T.name)}
DRC:{<N> | S,R,A(<S,N,R,A> Sailors R7}
421B: Database Systems - Relational Algebra
21
Domain Relational Calculus
Query has the form:
{<x1,x2,…,xn>} | p(<x1,x2,…xn>)}
The Answer includes all tuples <x1,x2,…,xn>
that make the formula p(<x1,x2,…xn>} be true.
(‘|’ should be read as “such that”).
Formulas are recursively defined, starting with
simple atomic formulas (getting tuples from
relations or making comparisons of values), and
building bigger and better formulas using the
logical connectives.
421B: Database Systems - Relational Algebra
22
DRC Formulas
Variables X,Y, x1,… range over domain elements (=
field values)
Atomic formula: Let op be one of ,,,,,
<x1,x2,…xn> Relation-name, or
X op Y, or
X op constant
Formula
An atomic formula, or
p, pq, pq, where p and q are formulas, or
X(p(X)), where variable X is free in p(X), or
X(p(X)), where variable X is free in p(X)
The use of quantifiers X and X is said to bind X. A variable
that is not bound is free.
421B: Database Systems - Relational Algebra
23
Free and Bound Variables
The use of quantifiers X and X is said to
bind X. A variable that is not bound is free.
Let us revisit the definition of a query:
{<x1,x2,…,xn>} | p(<x1,x2,…xn>)}
There is an important restriction: the variables
x1,…,xn that appear to the left of ‘|’ must be
the only free variables in the formula p(…).
421B: Database Systems - Relational Algebra
24
Example
Find all sailors with a rating above 7:
{<I,N,T,A> | <I,N,T,A> Sailors T7}
The condition <I,N,T,A>Sailors ensures
that the domain variables I,N,T, and A are
bound to fields of the same Sailors tuple.
The condition T7 ensures that all tuples in the
answers have a rating above 7
The term <I,N,T,A> to the left of ‘|’ says
that every tuple of the Sailors relation with a
rating above 7 is in the answer
421B: Database Systems - Relational Algebra
25
Further Examples (discussed in class)
Relations
Queries
Sailors(sid,sname,rating,age)
Reserves(sid,bid,day)
Boats(bid,bname,color)
Find sailors who are older than 18 or have a rating under 9, and
are called ‘debby’’
Find sailors with a rating above 7 that have reserved boat
#103 (use of )
Find sailors rated > 7 who have reserved a red boat (use of )
Find sailors who have reserved all boats (use of )
Transfer to “find all sailors I such that for each 3-tuple (B,BN,C>
either it is not a tuple in Boats or there is a tuple in Reserves
showing that sailor I has reserved the boat.
421B: Database Systems - Relational Algebra
26
Unsafe Queries and Expressive Power
Unsafe queries: It is possible to write syntactically
correct calculus queries that have an infinite number of
answers! Such queries are called unsafe.
E.g., {<I,N,T,A> | (<I,N,T,A> Sailors)}
Expressive Power: It is known that every query that
can be expressed in relational algebra can be expressed
as a safe query in DRC/TRC; the converse is also true.
Relational Completeness of a query language: A query
language (e.g., SQL) can express every query that is
expressible in relational algebra/calculus
421B: Database Systems - Relational Algebra
27
Some rules and definitions
Equivalence: Let R, S, T be relations; C, C1, C2
conditions; L projection lists of the relations R and S
Commutativity:
∏L(C(R)) = C(∏L(R)) if C only considers attributes of L
R1
R2 = R2 R1
Associativity:
R1
(R2 R3) = (R1
R2)
R3
Idempotence:
∏L2 (∏L1(R)) = ∏L2 (R) if L2 L1
C2(C1(R)) = C1C2(R))
421B: Database Systems - Relational Algebra
28