Transcript **** 1 - Ki

More Relation
Operations
2016, Fall
Pusan National University
Ki-Joune Li
Relational Algebra on Bags
• A bag is like a set, but an element may appear more than once.
o
o
Multiset is another name for “bag.”
Example:


{1,2,1,3} is a bag.
{1,2,3} is also a bag that happens to be a set.
• Bags also resemble lists, but order in a bag is unimportant.
o
Example: {1,2,1} = {1,1,2} as bags, but [1,2,1] != [1,1,2] as lists.
2
Why Bags?
• SQL, the most important query language for relational
databases is actually a bag language.
o
SQL will eliminate duplicates, but usually only if you ask it to do so
explicitly.
• Some operations, like projection, are much more efficient
on bags than sets.
o
Projection: No more operation except deleting other attributes
3
Operations on Bags
• Selection:
o
o
Applies to each tuple,
so its effect on bags is like its effect on sets.
• Projection
o
As a bag operator, we do not eliminate duplicates.
• Products and joins
o
done on each pair of tuples, so duplicates in bags have no effect on
how we operate.
4
Examples
• Given two relations R(A,B) and S(B,C)
S
R
RXS
SELECTA+B<5 (R)
A
B
B
C
1
2
3
4
5
6
1
2
7
8
ProjectA(R)
A
B
A
1
2
1
1
2
5
1
R
R.B<S.B
S
A
R.B
S.B
C
1
2
3
4
A
R.B
S.B
C
5
6
3
4
1
2
3
4
1
2
3
4
1
2
3
4
1
2
7
8
1
2
7
8
5
6
7
8
5
6
7
8
1
2
7
8
1
2
7
8
5
Bag Union, Intersection, Difference
• Union, intersection, and difference need new definitions for
bags.
• Union: An element appears in the union of two bags
the sum of the number of times it appears in each bag.
o
Example: {1,2,1} U {1,1,2,3,1} = {1,1,1,1,1,2,2,3}
• Intersection: An element appears in the union of two bags
the minimum of the number of times it appears in either.
o
Example: {1,2,1} ∩ {1,2,3} = {1,2}.
• Difference: An element appears in the difference A – B of
bags as many times as it appears in A, minus the number of
times it appears in B.
o
o
But never less than 0 times.
Example: {1,2,1} – {1,2,3} = {1}.
6
Beware: Bag Laws != Set Laws
• Not all algebraic laws that hold for sets also hold for
bags.
o
Example: Commutative law for union (R UNION S = S
UNION R ) does hold for bags.
 Since addition is commutative, adding the number of times x
appears in R and S doesn’t depend on the order of R and S.
o
Counter Example:
 Set union is idempotent, meaning that S UNION S = S.
 However, for bags, if x appears n times in S, then it appears 2n
times in S UNION S.
 Thus S UNION S != S in general
7
The Extended Algebra
• δ: eliminate duplicates from bags.
• τL: sort tuples.
o
o
L is a list of attributes
Result of Sorting: a List (not set neither bag)
• Extended projection : arithmetic, duplication of columns.
• γ: grouping and aggregation.
• OUTERJOIN ( ◦ ):avoids “dangling tuples” = tuples that do not
join with anything.
8
Extended Projection
• Using the same Π L operator, we allow the list L to contain
arbitrary expressions involving attributes, for example:
Arithmetic on attributes, e.g., A+B.
o Duplicate occurrences of the same attribute.
o
o
Example: ΠA+B,A,A (R)
9
Aggregation Operators
• Aggregation operators are not operators of relational
algebra.
• Rather, they apply to entire columns of a table and produce
a single result.
• The most important examples: SUM, AVG, COUNT, MIN, and
MAX.
10
Grouping Operator
• R1 := γ L (R2).
o
L is a list of elements that are either:


Individual (grouping ) attributes.
AGG(A), where AGG is one of the aggregation operators and A is an attribute.
• Example
o
o
o
StarsIn(title, year, starName)
γ starName, MIN(year)minYear,Count(title)ctTitle (StarsIn)
Find the earliest year in which they appeared in movie for each star who
has appeared at least three times



R1:= γ starName, MIN(year)  minYear, Count(title)ctTitle (StarsIn)
R2 := SELECT ctTitle >= 3 (R1)
R3 := PROJECT minYear (R2)
11
Grouping Operator
Title
Year
StarName
Star Wars
1977
Carrie Fisher
StarName
minYear
ctTitle
Star Wars
1977
Mark Hamill
Carrie Fisher
1977
1
Star Wars
1977
Harrison Ford
Harrison Ford
1977
2
Mighty Ducks
1991
Emilio Estevez
Emilio Estevez
1991
1
Wayne’s World
1992
Dana Carvey
Dana Carvey
1992
1
My world
1977
Mark Hamill
Mark Hamill
1977
3
Raiders of the Lost Ark
1981
Harrison Ford
Tiki's story
1977
Mark Hamill
12
δ (eliminating duplications) and γ (grouping)
• δ is a special case of γ
• Suppose R(A1, A2, ..., An),
o
o
then δ(R)= γA1, A2, ..., An (R )
but γA1, A2, ..., An (R ) = Π A1, A2, ..., An (R ), only if R is a set (not a bag)
13
Outerjoin
• Suppose we join R S.
• A tuple of R that has no tuple of S with which it joins is said to
be dangling.
o
Similarly for a tuple of S.
• Outerjoin preserves dangling tuples by padding them with a
special NULL symbol in the result.
o
R
S
R
S
R
S
A
B
C
B
C
D
A
B
C
D
1
2
3
2
3
10
1
2
3
10
4
5
6
2
3
11
1 2 3 11
7
8
9
6
7
12
4
5
6
Ʇ
7
8
9
Ʇ
Ʇ
6
7
12
14