Relational Algebra

Download Report

Transcript Relational Algebra

Κεφάλαιο 3
ΣΧΕΣΙΑΚΟ ΜΟΝΤΕΛΟ
YV - Relational Model and Algebra
110
DATABASE SYSTEMS: The Relational Model and
Relational Database Systems

OUTLINE
– Informal and Formal Definition of the Model
Structures, Constraints, Operations
– Relational Algebra
– Relational Calculus
– The languages SQL and QBE
– Views - Integrity Constraints using SQL
– Normalization and Relational Database Design
– Relational Database Systems
YV - Relational Model and Algebra
111
Relational Model:




Informal Definition
Proposed in 1970 by E.F. Codd (“A relational model for large
shared data banks”, CACM), as a theory of a database model
Spurred tremendous research in the database field and became the
most popular logical data model - many relational DBMSs are
today available on nearly all platforms
A relational database is a set of relations
RELATION: A table of values. Each column in the table has a
header, called an attribute (field). Each row in the table is called
a tuple (record) and stands for an entity or a relationship.
YV - Relational Model and Algebra
112
Formal Definition

STRUCTURES
– Only one kind: relations (which have a name)
A domain D is a set of values , D= {d1, d2 , ..., dn}
e.g., DOMAIN OF NAMES = the set of all names
DOMAIN of WEIGHT = the set of all weights
CHAR STRINGS from 1 to 10 in length, etc.
An attribute A names a property of interest in a relation and
takes its values from some associated domain D(A).
e.g., EMPLOYEE_NAME, WEIGHT, etc.
Attributes are the column names (headers) in a relation
(Notation: R.A, or R[A] where R is the relation name)
YV - Relational Model and Algebra
113
Structure Definitions (2)
A relation schema R is the name and attributes of a relation, with
the underlying domains for the attributes.
When
obvious, the domains are ignored
Notation: R(A1 , A2 , ... An)
e.g., STUDENT(Name, SSN, BirthDate, Address)
The degreee n of a relation R is the number of attributes in R
A database schema S is a set of relation schemas.
Notation: S = {R1 , R2 , ... Rm }
e.g., COMPANY = { EMPLOYEE, PROJECT, ... }
YV - Relational Model and Algebra
114
Structure Definitions (3)
-- A tuple t of a relation R(A1 , A2 , ... An) is an (ordered) set of
values t = <v1 , v2 , ... vn >, where each vi is an element of the
domain D(Ai).
-- A relation instance r(R), simply, relation, is a set of tuples
r(R) = { t1 , t2 , ... tk }
alternatively, it is a subset of the Cartesian product
r(R)  D(A1) x D(A2) x ... x D(An)
-- The cardinality of R is the number of tuples in r(R), it is
denoted by CARDR
-- A relational database is a set of relations (instances)
YV - Relational Model and Algebra
115
Characteristics of Relations





ORDERING of attributes in a relational schema is essential
ORDERING of tuples in a relation is not important
Every tuple is stored only ONCE in a relation (it is a set)
A value may appear MULTIPLE TIMES in a column and is
considered ATOMIC (indivisible) - at times this is referred as
First Normal Form (1-NF) relation
A special value, called NULL, is used to represent values that
are inapplicable or unknown to the database
– e.g, the PhoneNumber value of someone without a phone,
Address value for someone who did not supply his address

Notation: component value of a tuple t, t[Ai] = vi
YV - Relational Model and Algebra
116
Constraints in the Model

CONSTRAINTS
– Three kinds of inherent to the model constraints: KEY,
ENTITY INTEGRITY, and REFERENTIAL INTEGRITY.
– Three basic explicit constraints: DOMAIN, COLUMN and
USER-DEFINED (some other explicit constraints, like the
Functional Dependencies, will be discussed later.)

KEY CONSTRAINTS: The various keys, as defined for
Entities and Relationships, hold in the Relational Model.
– Note that a key is a property of the relational schema (not a
property of the relation)
YV - Relational Model and Algebra
117
Inherent Constraints (1)
– A set of attributes SK of a relation schema R for which each tuple
in any relation instance r(R) must have unique value(s) is a
superkey. That is, for distinct t1 and t2, t1[SK]  t2[SK]
For instance, SSN of EMPLOYEE, NAME and ADDRESS of
EMPLOYEE, SSN and NAME of EMPLOYEE, etc.
– A candidate key K is a minimal superkey (that is, no subset of the
attributes in K is a superkey). K is also called key.
For instance, SSN is a candidate key for EMPLOYEE, but the
combination {SSN, NAME} is not.
– A primary key PK is one of the candidate keys that is agreed to
serve as an identifier for the relation (primary keys are usually
distinguished by underlining)
For instance, SSN is the primary key of relation EMPLOYEE.
YV - Relational Model and Algebra
118
Inherent Constraints (2)

ENTITY INTEGRITY: The primary key attributes PK
in a relation schema R cannot have NULL values in any
tuple of a relation instance r(R).
t[PK]  NULL, for all t in r(R)
– The reason behind the above constraint is that a primary key
is used to identify a tuple in a relation.
– Note that more attributes in R may be constrained to have
no NULLS by explicit constraints.
YV - Relational Model and Algebra
119
Inherent Constraints (3)

REFERENTIAL INTEGRITY: These constraints
involve TWO relations and are used to specify a
relationship among tuples of the two relations. They are
also called foreign keys.
– A foreign key FK is a set of one or more attributes of a
relation R1 that forms a primary key for another relation R2.
A tuple t1 in r(R1) is said to reference tuple t2 in r(R2), IF
t1[FK] = t2[FK]
For instance, for the relation WORKING-ON the attribute
SSN is a foreign key (it is the primary key of EMPLOYEE).
YV - Relational Model and Algebra
120
Explicit Constraints (1)

DOMAIN CONSTRAINTS: They are the rules defined in the
domain definition and inherited by columns (attributes) based on
that domain.
– A domain can be defined, together with all its integrity rules (e.g.,
the domain of integers having all rules that apply to integers).
They are usually the basic data types.
– The ideal support is through strong data typing (very rare)

COLUMN CONSTRAINTS: They are additional to the
domain constraints and maintain values in a column.
– Column rules go beyond the rules inherited by the domain (e.g.,
the column of small integers or integers between 1 and 10, etc.
that further restrict the domain of integers.)
– In many systems, support is given with a CHECK option
YV - Relational Model and Algebra
121
Explicit Constraints (2)





USER-DEFINED CONSTRAINTS: Any integrity rule,
not among the ones discussed before, is classified as userdefined.
To enforce certain business rules, integrity constraints of
arbitrary complexity are required.
Such constraints are expressed either procedurally or
declaratively (preferred way)
Several mechanisms can be used to implement the
enforcement of such rules: stored procedures, triggers,
methods (for object-oriented systems)
Generally, relational DBMS are weak in enforcing rules
YV - Relational Model and Algebra
122
The COMPANY Database in the
Relational Model - Schema
EMPLOYEE ( SSN, Name, BirthDate, Address, Sex, Salary, SupSSN, DNumber)
DEPARTMENT ( DNumber, DName, MgrSSN, MgrStartDate)
PROJECT ( PNumber, PName, Location, DNumber)
DEPT_LOCATION ( DNumber, DLocation )
WORKS_ON ( SSN, PNumber, HoursPW)
DEPENDENT ( SSN, DependName, Sex, BirthDate, Relationship)
YV - Relational Model and Algebra
123
The COMPANY Database in the
Relational Model - Instance
.
EMPLOYEE
SSN
1234
3344
9998
9876
6668
4534
9879
8886
Name
john
frank
alice
jenny
rama
joyce
jack
james
BDate
9.1.55
8.9.45
7.6.50
2.6.41
5.8.56
3.7.62
2.3.59
1.1.40
Address Sex
kifisia
m
athina
m
ekali
f
patra
f
korinth
m
kiato
f
maroussi
m
psihico
m
Salary
30000
55000
25000
43000
38000
25000
25000
60000
SupSSN
3344
8886
9876
8886
3334
3334
9876
N U LL
DNumber
5
5
4
4
5
5
4
1
DEPT_LOCATION
DEPARTMENT
DNumber
5
4
1
DName MgrSSN
research
3334
admin
9876
headqrtr
8886
YV - Relational Model and Algebra
MgrSD
22.3.78
14.1.85
28.6.73
DNumber
1
4
5
5
5
DLocation
kolonaki
zografou
gkizi
patisia
kolonaki
124
The COMPANY Database in the Relational Model - Instance
PROJECT
WORKS_ON
.
SSN
1234
1234
3344
3344
3344
3344
9998
9998
9876
9876
6668
4534
4534
9879
9879
8886
PNumber
1
2
2
3
10
20
30
10
30
20
3
1
2
10
30
20
HoursPW
32
7
10
10
10
10
30
10
20
15
40
20
20
35
5
N U LL
YV - Relational Model and Algebra
PNumber
1
2
3
10
20
30
PName
prodx
prody
prodz
computer
reorganize
benefits
Location
gkizi
patisia
kolonaki
zografou
kolonaki
zografou
DNumber
5
5
5
4
1
4
DEPENDENT
ESSN
3334
3334
3334
9876
1234
1234
1234
Name
aliki
theo
joy
abner
mike
aliki
elisa
Sex
f
m
f
m
m
f
f
BDate
1976
1973
1948
1932
1978
1979
1957
Relation
daught
son
spouse
spouse
son
daught
spouse
125
Definition of the Model: Operations

OPERATIONS
– We can distinguish them in (a) UPDATE, (b) RETRIEVAL

UPDATE operations on Relations
– INSERT a tuple
– DELETE a tuple
– MODIFY a tuple

Integrity constraints should not be violated by the
execution of any update operation. For this, updates may
propagate to cause other updates automatically.
– e.g., when a tuple of EMPLOYEE is deleted, all tuples in
WORKING_ON which have the same value for SSN are also
deleted (non-existent employees cannot work on projects!)
YV - Relational Model and Algebra
126
Operations: Relational Languages

RETRIEVAL operations on Relations
– There are two flavors:
– (a) RELATIONAL ALGEBRA -- somewhat procedural, tells
how to compute the result
– (b) RELATIONAL CALCULUS -- somewhat declarative, tells
what properties the result should have

No database system supports the two flavors of retrieval
languages in their pure forms. This is because the issues of
“ease of use”, “convenience”, etc., play an essential role in
user interaction. Yet, the languages supported in DBMSs
have their roots in either relational algebra or calculus.
YV - Relational Model and Algebra
127
Relational Languages


Query languages: Allow manipulation and retrieval of
data from a database.
Relational model supports simple, powerful QLs:
– Strong formal foundation based on logic.
– Allows for much optimization.

Query Languages != programming languages!
– QLs not expected to be “Turing complete”.
– QLs not intended to be used for complex calculations.
– QLs support easy, efficient access to large data sets.
YV - Relational Model and Algebra
128
Relational Algebra Operations




RELATIONAL ALGEBRA
A set of operators each of which maps one or more
relations into a new relation (the algebra is CLOSED).
The operators, just as in arithmetic algebra, can be nested,
since the result of each operation is itself a relation.
There are two types of operators:
– traditional (regular) set operators
» union, intersection, difference, ...
– database specific set operators
» projection, selection, join, ...
YV - Relational Model and Algebra
129
Relational Algebra: Set Ops

Traditional Set Operators
– Union, Intersection, Difference, Cartesian Product
– For the first three operators to apply, we must have UNION
COMPATIBILITY between the two operand relations.
That
is:
R1 ( A1 , A2 , ... , An ) and R2 ( B1 , B2 , ... , Bn ) must have the
same number of attributes and the domains of the
corresponding attributes must be compatible, i.e.,
D(Ai) = D(Bi) , for i = 1, 2, ..., n
– By convention, the resulting relation for these operators has
the same attribute names as the first operand relation R1
YV - Relational Model and Algebra
130
Relational Algebra :
Example Database
Database Schema
STUDENT (SName, SAge) ,
INSTRUCTOR (IName, IAge)
relation schema R
relation schema S
Database Instance
R
SName
mary
jack
paul
barb
john
(STUDENT)
SAge
20
22
25
22
24
YV - Relational Model and Algebra
S (INSTRUCTOR)
IName
helen
paul
barb
kris
IAge
22
25
22
24
CARDR = 5
CARDS = 4
131
Relational Algebra: Set Ops (2)

UNION - Put all the tuples of two relations in one relation
– Notation: R  S
CARDR S <= CARDR + CARDS
– Formally: R  S = { t | t is in R or t is in S }
– Example: STUDENT  INSTRUCTOR
SName
mary
jack
paul
barb
john
SAge
20
22
25
22
24
YV - Relational Model and Algebra

IName
helen
paul
barb
kris
IAge
22
25
22
24
=
SName
mary
jack
paul
barb
john
helen
kris
SAge
20
22
25
22
24
22
24
132
Relational Algebra: Set Ops (3)

INTERSECTION - Put the common tuples of two
relations in one relation
– Notation: R  S
CARDR S <= max(CARDR , CARDS )
– Formally: R  S = { t | t is in R and t is in S }
– Example: STUDENT  INSTRUCTOR
SName
mary
jack
paul
barb
john
SAge
20
22
25
22
24
YV - Relational Model and Algebra

IName
helen
paul
barb
kris
IAge
22
25
22
24
=
SName
paul
barb
SAge
25
22
133
Relational Algebra: Set Ops (4)

SET DIFFERENCE - Select the tuples of the first relation
which are not members of the second relation
CARDR -S <= CARDR
– Notation: R - S
– Formally: R - S = { t | t is in R and t is not in S }
– Example: STUDENT - INSTRUCTOR
SName
mary
jack
paul
barb
john
SAge
20
22
25
22
24
YV - Relational Model and Algebra
-
IName
helen
paul
barb
kris
IAge
22
25
22
24
=
SName
mary
jack
john
SAge
20
22
24
134
Relational Algebra: Set Ops (5)

CARTESIAN PRODUCT - Combine each tuple of one
relation with each tuple of the other
CARDR 5 S = CARDR x CARDS
– Notation: R 5 S
– Formally: R 5 S = { t | t is the concatenation of a tuple
in R with a tuple in S }
– Example: STUDENT 5 INSTRUCTOR
SName
mary
mary
mary
mary
jack
jack
jack
jack
paul
SAge
20
20
20
20
22
22
22
22
25
Iname
helen
paul
barb
kris
helen
paul
barb
kris
helen
YV - Relational Model and Algebra
IAge
22
25
22
24
22
25
22
24
22
paul
paul
paul
barb
barb
barb
barb
john
john
john
john
25
25
25
22
22
22
22
24
24
24
24
paul
barb
kris
helen
paul
barb
kris
helen
paul
barb
kris
25
22
24
22
25
22
24
22
25
22
24
135
Relational Algebra Operators:
SELECTION

SELECTION - Selects the subset of tuples of a relation that
satisfy a certain condition (qualification) c , which
is an arbitrary Boolean expression on the attributes of R
(“horizontal” subset of R)
– Notation: sc (R)
or
R[c]
– Formally: sc (R) = { t | t is in r(R) and
condition c holds for t }
– Examples: sDNumber = 4 (EMPLOYEE), sSalary>30000 (EMPLOYEE)
s(Salary>30000 AND DNumber = 4 ) OR DNumber = 5 (EMPLOYEE),
EMPLOYEE [ Dnumber = 4 ], EMPLOYEE [ Salary > 30000 ]
YV - Relational Model and Algebra
136
Relational Algebra Operators:
SELECTION (2)
– Selection is both commutative and associative
(a)
sc1 ( sc2 (R) ) = sc2 ( sc1 (R) )
(b)
sc1 ( sc2 (R) ) = sc1 AND c2 (R) = sc1 , c2 (R)
(c)
sc1 ( sc2 ( sc3 (R) ) ) ) = sc2 ( sc3 ( sc1 (R) ) ) )
– Example Result:
sDNumber = 4 (EMPLOYEE)
All Employees in department number 4
SSN
9998
9876
9879
Name
alice
jenny
jack
YV - Relational Model and Algebra
BDate
7.6.50
2.6.41
2.3.59
Address Sex
ekali
f
patra
f
maroussi
m
Salary
25000
43000
25000
SupSSN
9876
8886
9876
DNumber
4
4
4
137
Relational Algebra Operators: PROJECTION

PROJECTION - Keeps only certain attributes (specified
by a list L) and eliminates the other attributes of a relation
R and also all duplicate tuples (“vertical” subset of R)
– Notation: pL (R) or
R[L]
– Formally: pL (R) = { t[L] | t is in r(R) and L  R }
– Example: pLocation (PROJECT), or PROJECT[Location]
Location
gkizi
patisia
kolonaki
zografou
YV - Relational Model and Algebra
All Locations where projects are
138
Relational Algebra Operators: JOINS

There are several types of JOINs - all combining two relations
to form a new one:
– (theta) join, equality join, natural join, semi-join, outer join

THETA (CONDITION) JOIN: Connect tuples from two
relations that match (satisfy a Boolean condition c) on certain
attributes
– A theta-join is equivalent to a Cartesian product followed by a
selection on the condition c.

– Notation: R
or
R[c]S
c S
– The resulting relation has ALL the attributes of R and of S
R  c S  s c ( R  S)
YV - Relational Model and Algebra
139
Relational Algebra Operators: THETA JOIN
– Example: DEPARTMENT
 MgrSSN > MgrSSN DEPARTMENT
All department-department combinations where the first
department’s number is greater than the second’s
DNumber
4
4
1
DName MgrSSN
admin
9876
admin
9876
headqrtr
8886
YV - Relational Model and Algebra
MgrSD
14.1.85
14.1.85
28.6.73
DNumber
1
5
5
DName MgrSSN
headqrtr
8886
research
3334
research
3334
MgrSD
28.6.73
22.3.78
22.3.78
140
Relational Algebra Operators: EQUALITY JOIN

EQUALITY JOIN: Connect tuples from two relations
that match (have equal values) on certain attributes. This
is exactly like THETA JOIN, except that the condition c is
only allowed to have equalities.
– Notation: R
c S
or
– Example: WORKS_ON
SSN
9879
9879
9879
PNumber
30
30
30
HoursPW
5
5
5
R[c]S
HoursPW = DNumber PROJECT
PNumber
1
2
3
PName
prodx
prody
prodz
Location
gkizi
patisia
kolonaki
DNumber
5
5
5
A totally MEANINGLESS Relation
YV - Relational Model and Algebra
141
Relational Algebra Operators: NATURAL JOIN

NATURAL JOIN: Connect tuples from two relations that
match (have equal values) on all common attributes. In the
result, the common attributes are kept only once
– Notation: R
S

or
– Example: DEPARTMENT
DNumber
5
5
5
4
1
YV - Relational Model and Algebra
DName
research
research
research
admin
headqrtr
R[X=X]S

MgrSSN
3334
3334
3334
9876
8886
DEPT_LOCATION
MgrSD
22.3.78
22.3.78
22.3.78
14.1.85
28.6.73
DLocation
gkizi
patisia
kolonaki
zografou
kolonaki
142
Relational Algebra Operators: SEMI--JOIN

SEMI-JOIN: Select the subset of one relation that joins
with another. A semi-join is equivalent to a join followed
by a projection.
– Notation: R c S
or
R<c]S
– Example: EMPLOYEE SSN=MgrSSN DEPARTMENT
SSN
3344
9876
8886
Name
frank
jenny
james
BDate
8.9.45
2.6.41
1.1.40
Address
athina
patra
psihico
Sex
m
f
m
Salary
55000
43000
60000
SupSSN
8886
8886
N U LL
DNumber
5
4
1
Semi-joins are USEFUL in distributed database operations
YV - Relational Model and Algebra
143
Relational Algebra Operators: OUTER--JOIN

Motivation: In a regular join operation, tuples in relations
R or S that do not have matching tuples in the other
relation do not appear in the result.
In some queries, all tuples in R (or S) must appear in the
result - when no matching tuples are found, NULLs are
placed for the missing attribute values.
– Notation:

RS
OUTER-JOINs are distinguished in:
– Left outer join
– Right outer join
– Full outer join
YV - Relational Model and Algebra
(all tuples in R appear in the result)
(all tuples in S appear)
(all tuples in R and S appear)
144
Relational Algebra Operators: DIVISION

DIVISION: Given relations R(X,Y) and S(Y), where X,
Y are sets of attributes, a tuple t is a member of the
division (denoted: (R / S)[X] ) IF for all tS in S there
exist tR in R, such that: tR [Y] = tS [Y] and tR [X] = t [X]
– Analogy with number arithmetic:
The quotient q of a/b is the largest number s.t. qb <= a
The quotient Q of R  S is the maximal relation s.t. Q X S  R
YV - Relational Model and Algebra
145
Relational Algebra Queries (1)

A series of queries in relational algebra are presented in the
sequel, using an example relational database that involves
SAILORS who RESERVE some BOATS.
SAILORS (Sid, SName, Rating)
BOATS (Bid, BName, Color)
RESERVE (Sid, Bid, Date)
YV - Relational Model and Algebra
146
Relational Algebra Queries (2)

QUERY1: Find the names of sailors who have reserved
boat number 2
( RESERVE [Bid=2] [Sid=Sid] SAILORS ) [SName]
pSName ( sBid=2 RESERVE

 Sid=Sid SAILORS )
QUERY2: Find the names of sailors who have reserved a
red boat
( BOAT [Color=red] [Bid=Bid] RESERVE [Sid=Sid] SAILORS )
[SName]
pSName ( sColor=red BOAT
YV - Relational Model and Algebra

Bid=Bid
RESERVE

Sid=Sid
SAILORS )
147
Relational Algebra Queries (3)

QUERY3: Find the colors of the boats reserved by eleni
(SAILORS [SName=eleni] [Sid=Sid] RESERVE [Bid=Bid] BOATS) [Color]
pColor ( sSName=eleni SAILORS


Sid=Sid
RESERVE

Bid=Bid
BOATS )
QUERY4: Find the names of the sailors who have reserved
at least one boat
( RESERVE [Sid=Sid] SAILORS ) [SName]
pSName ( RESERVE
YV - Relational Model and Algebra

Sid=Sid
SAILORS )
148
Relational Algebra Queries (4)

QUERY5: Find the names of sailors who have reserved a
red or a green boat
pSName ( (sColor=red BOATS  sColor=green BOATS )
pBid=Bid RESERVE
Sid=Sid SAILORS )



QUERY6: Find the names of sailors who have reserved
both a red and a green boat
RESERVE ) 

 RESERVE ) ) 
BOATS 
pSName ( ( pSid ( sColor=red BOATS
pSid ( sColor=green
SAILORS )
YV - Relational Model and Algebra
Bid=Bid
Bid=Bid
Sid=Sid
149
Relational Algebra Queries (5)

QUERY7: Find the names of sailors who have reserved
all boats
pSName ( ( pSid, Bid RESERVE / pBid BOATS )


Sid=Sid
SAILORS )
QUERY8: Find the names and ratings of sailors who have
reserved all red boats
pSName, Rating ( (pSid, Bid RESERVE / pBid (sColor=red BOATS ) )
Sid=Sid SAILORS )

YV - Relational Model and Algebra
150
Relational Algebra: Comments

There are several properties that hold in a relational
algebra expression (commutatitivity, associativity, etc.)
– Examples:
sc1 ( pL (R) ) = pL ( sc1 (R) )

 c2 S
sc1 ( R
c2 S ) = sc1 ( R )
sc1 (R  S) = sc1(R)  sc1(S)
....

Such properties are very useful in query optimization
YV - Relational Model and Algebra
151
Relational Algebra: Comments

COMPLETE SET OF OPERATIONS
– The set of operators {sp-5} is called a complete set of
relational algebra operations. The implication is that ALL
other operators can be described as a sequence of the
above operators.
– For example, the division operator can be described as:
R / S = pX(R) - ( (pX (R) 5 S) - R )
where X are the non-common attributes in R and S
– Equivalently, it is expressed as:
(R / S) [X] = R[X] - ( ( R[X] x S ) - R )[X]
YV - Relational Model and Algebra
152
Relational Algebra Competeness


There are several combinations of relational algebra operators
that define a complete set.
Any Query Language equivalent to a complete set of
operations is called RELATIONALLY COMPLETE
– NOTE: This does not imply that the language is adequate to do
all database operations (e.g., a good language must support
aggregates, many forms of joins, built-in functions,...)

An interesting operator -which goes beyond the expressive
power of the relational set of operators as defined by Codd- is
that of transitive closure. This is a form of recursion in
relational databases and is very useful in many applications.
YV - Relational Model and Algebra
153