week9 - Arms-A

Download Report

Transcript week9 - Arms-A

Normalization

Purpose: process to eliminate redundancy in
relations due to functional or multi-valued
dependencies.

Decompose relation schema into Normal forms:
– Boyce-Codd Normal Form (BCNF)
– Third Normal Form (3NF)
– Fourth Normal Form (4NF)

To obtain the new relations, project the schemas
onto the original relation schema (e.g. Movie)

To recover information (I.e. Movie) from the new
relations: natural join the new relations.
1
© D. Wong 2003
BCNF Decomposition Example 3.24 pp 104

Relation: Movie(title, year, length, filmType, studioName,
starName)

Key: {title, year, starName}


FD’s: title year  length filmType studioName is a BCNF
violation, so Movie not in BCNF
Decomposition:
Schema 1: {title, year, length, filmType, studioName}
Schema 2: {title, year, starName}

To obtain the new relations, project the schemas onto Movie

To recover information (I.e. Movie) from the new relations:
natural join the new relations. Does not lose information.
2
© D. Wong 2003
Functional Dependencies (FD)

Given: relation schema R(A1, …, An), and X and
Y be subsets of (A1, … An).
FD : X  Y means X functionally determines Y
e.g. A1A2…An  B1B2…Bm

A1A2…An  B1B2…Bm is an assertion about R
that two attributes or sets of attributes in R are
dependent of one another.
3
© D. Wong 2003
Mutivalued Dependencies (MVD)
relation schema R, and A1A2…An and B1B2…Bm be
subsets of attributes of R.
 Given:
MVD : A1A2…An  B1B2…Bm holds in R if :
For each pair of tuples t and u of relation R that agree on
all the A’s, we can find in R some tuple v that agrees:
1. With both t and u on the A’s,
2. With t on the B’s, and
3. With u on all attributes of R that are not among the
A’s or B’s
 A1A2…An  B1B2…Bm is an assertion about R that two
attributes or sets of attributes in R are independent of one
another.
 Cause redundancy not related to FD’s in a BCNF schema.
 Most common source: putting 2 or more many-many
relationships in a single relation.
4
© D. Wong 2003
MVD Rules

Trivial dependencies rule
If A1A2…An  B1B2…Bm holds for R, then A1A2…An
 C1C2…Ck holds where the C’s are the B’s + one or
more of the A’s. The converse also hold.

Transitive rule
If A1A2…An  B1B2…Bm and B1B2…Bm  C1C2…Ck
then A1A2…An  C1C2…Ck

Splitting rule does not hold
E.g. name  street city, but not name  street
So, always start with set of attributes on the R.S. because
splitting rule does not hold.
5
© D. Wong 2003
More MVD Rules

Every FD is an MVD
Because If FD A1A2…An  B1B2…Bm, then swapping B’s
between tuples that agree on A’s doesn’t create new tuples.
A’s
B’s
t
u

Complementation rule
If X  Y, then X  Z, where Z is all attributes not in
X or Y
e.g. Star_Star_In {name, street, city, title, year}
name  street city
name  title year
6
© D. Wong 2003
Nontrivial MVD
A1A2…An  B1B2…Bm for a relation R is
nontrivial if:
1.
B1B2…Bm is not a subset of A1A2…An
2.
A1A2…An  B1B2…Bm is not all attributes of R
7
© D. Wong 2003
Fourth Normal Form (4NF)

Decompose relations that has MVD’s into 4NF to
eliminate MVD’s.

Definition:
R is in 4NF if A1A2…An  B1B2…Bm is a
nontrivial MVD, {A1A2…An} is a superkey.

Since every FD is an MVD, so 4NF is more
stringent than BCNF

Only nontrivial MVD’s has the potential to violate
4NF
8
© D. Wong 2003
4NF Decomposition
Given: relation R, and nontrivial MVD X  Y that violate
4NF
1.
Decompose X  Y into XY and X  (R-Y)
R
X
Y
2.
Produce the relations by projecting R onto XY and
 (R-Y)
X
3.
Reconstruct R from the new relations using natural join
e.g. Star_Star_In {name, street, city, title, year} and
name  street city
Decompose Star_Star_In using name  street city into
{name, street, city} and {name, title, year}
9
© D. Wong 2003
Relationships among normal forms

4NF is the most stringent

4NF  BCNF  3NF
10
© D. Wong 2003
Lossless-join decomposition
Given: Relation R, decomposed into schemes R1, R2, …
Rk, and D is a set of dependencies.
Definition: R1, R2, … Rk is a lossless-join (w.r.t. D) if for
every relation r for R satisfying D:
r = R1(r)
R2(r)
…
Rk(r)
i.e. Every relation r for R is the natural join of
its projections onto the Ri’s.
The lossless-join property is necessary if the decomposed
relation is to be recoverable from its
decomposition.
However, joins are expensive. So, don’t over decompose!
11
© D. Wong 2003
Structured Query Language (SQL)

A DDL and DML for relational DBMSs

History: ANSI SQL, , SQL-92 (SQL2), SQL-99 (SQL3)

SQL-99 extends SQL2 with object-relational features and
other new features

Most DBMS vendors implements the core, and then add
bells and whistles and variations

Query capability is close to relational algebra, with lots of
extensions.

Case insensitive except characters inside quoted strings ' '
e.g. 'Smith'  'SMITH'

; as statement delimiter
12
© D. Wong 2003
Example database schema
Movie(title, year, length, inColor, studioName, producerC#)
StartIn(movieTitle, movieYear, starName)
MovieStar(name, address, gender, birthdate)
MovieExec(name, address, cert#, netWorth)
Studio(name, address, presC#)
13
© D. Wong 2003
SQL Quries – basic form
SELECT attribute/s
FROM relations / views /subqury
WHERE conditional expression;
14
© D. Wong 2003
SQL query examples
1.
Example 1:
SELECT *
FROM Movie;
2.
-- * => all attributes of Movie
Example 2:
SELECT *
FROM Movie
WHERE studioName = 'Disney' AND year = 1990;
3.
Example 3:
SELECT title, length
FROM Movie
WHERE studioName = 'Disney' AND year = 1990;
15
© D. Wong 2003
Duplicates

SQL generally operates using bags instead of sets

Exception: UNION, INTERSECT, EXCEPT
operation

To eliminate duplicates, add keyword DISTINCT
to the SELECT clause
e.g. SELECT DISTINCT starName
FROM StarsIn;

Duplicate elimination is costly. Use judiciously.
16
© D. Wong 2003
SQL Correspondence to Relational Algebra
SELECT L
--  R.A. project
FROM R
--  R.A. operands
WHERE C ;
--  R.A. select
R.A. expression: L(C(R))
When reading and writing queries:
1.
FROM
-- what relations are involved
2.
WHERE
-- what's the tuples selection criteria
3.
SELECT
-- what columns to output
17
© D. Wong 2003
Union, Intersection, Difference of Queries

UNION : R1 UNION R2
or (Q1) UNION (Q2)
e.g. (SELECT title, year FROM Movie)
UNION
(SELECT movieTitle AS title, movieYear AS
year FROM StarsIn);

INTERSECT : R1 INTERSECT R2 or
(Q1) INTERSECT (Q2)

EXCEPT: R1 EXCEPT R2
-- difference
(Q1) EXCEPT (Q2)
18
© D. Wong 2003
Union, Intersection, Difference of Queries (continued)

Q1 and Q2 are queries that produce relations

R1 and R2, or results of Q1 and Q2 should have
the same list of attributes and attribute types.
Rename if necessary.

Duplicates are eliminated automatically

Add the keyword ALL after UNION,
INTERSECT, or EXCEPT to prevent duplicates
elimination
19
© D. Wong 2003
SQL and Relational Algebra

The six independent operations are implemented
by SQL

SQL is relational complete
20
© D. Wong 2003
Some data values in SQL
1.
Strings
2.
Dates and Times
3.
Null values
4.
Truth value of Unknown
21
© D. Wong 2003
1. Strings

Comparison operators (according to lexicographical order)
<, >, <=, >= =

LIKE -- pattern matching
% -- matches any sequence of 0 or more characters
_ -- matches any one character
E.g.: title LIKE 'Star _ _ _ _'
E.g.: title LIKE '%''s%'
Can specify escape character
 E.g.
title LIKE 'x%%x%' ESCAPE 'x'
22
© D. Wong 2003
2. Dates and Times

Date constant: DATE '2002-10-01'

Time constant: TIME '15:00:02.5'

Timestamp (combines dates and times):
TIMESTAMP '2002-10-01 15:00:02.5‘
(beware of implementation differences!)

Comparison operators apply
23
© D. Wong 2003
3. Null Values





NULL to represent:
1. Value unknown
2. Value inapplicable
3. Value withheld
Operations involving NULL
1. Arithmetic operation: result is NULL
2. Comparison: result is UNKNOWN
NULL is not a constant, therefore NULL cannot be used
explicitly as an operand.
IS NULL and IS NOT NULL checks
Read "Pitfalls Regarding Nulls" pp. 250
24
© D. Wong 2003
4. UNKNOWN

Consider TRUE = 1, FALSE = 0, UNKNOWN =
0.5
1. AND of 2 truth-value = min. of the 2 values
2. OR of 2 truth-value = max. of the 2 values
3. Negation of v = 1-v

Refer Fig. 6.2 pp. 250 for truth table for 3-valued
logic
25
© D. Wong 2003
The Six Clauses in SQL Queries
1.
SELECT
-- required
2.
FROM
-- required
3.
WHERE
4.
GROUP BY
5.
HAVING
6.
ORDER BY

Subqueries may appear in the FROM clause and the
WHERE clause

Comments begins with ‘--’
-- if used, must follows a group by clause
26
© D. Wong 2003
Table level SQL (ref. 6.6, pp. 292)

Create table – to define the schema of a base table
(Ref. 6.6.1 for data types syntax)
E.g.
create table EMP
(
empno int not null,
lastName varchar(30) not null,
firstName varchar(30) not null,
num_of_children int,
constraint pk_EMP primary key (empno)
);

Drop table – to destroy a base table
e.g. drop table EMP;
27
© D. Wong 2003
Tuple Modification Statements (ref. 6.5, pp. 286)

Insert – to add a row
Syntax: insert into R(A1..An) values (v1…vn)
– E.g. insert into emp(empno, lastName, firstName,
num_of_children) values (12345, ‘Doe’, ‘John’, 1)
– Or insert into emp values (12345, ‘Doe’, ‘John’, 1)


Delete – to remove a row
Syntax: delete from R where <condition>
– E.g. delete from emp where empno = 12345
Update – to modify the contents of a row
Syntax: update R set Ai = value where Aj = targetValue
– E.g. update emp set num_of_children = 2 where empno =
12345
28
© D. Wong 2003
Some JOINS in SQL. (ref. pp. 270)

CROSS JOIN
--  R.A. cartesian product
e.g. Movie CROSS JOIN StarsIn;

JOIN … ON
--  R.A. theta-join
e.g. Movie JOIN StarsIn ON title = movieTitle AND year =
movieYear;

[NATURAL] JOIN
--  R.A. natural join
e.g. MovieStar NATURAL JOIN MovieExec; or
MovieStar JOIN MovieExec;

OUTERJOINS
-- joins that include dangling tuples
29
© D. Wong 2003
OUTERJOINS

An operator to augment the result of a join by the
dangling tuples, padded with null values.

Full outerjoin of R1 and R2 is a join that includes all
rows from R1 and R2 matched or not. Unmatched rows
are padded with NULLs.

LEFT outerjoin of R1 and R2 is a join that includes all
rows from R1, matched or not, plus the matching values
from R2. Unmatched rows are padded with NULLs.

RIGHT outerjoin of R1 and R2 is a join that includes all
rows from R2, matched or not, plus the matching values
from R1. Unmatched rows are padded with NULLs.

The joining may be NATURAL or theta join
30
© D. Wong 2003
Outerjoins Syntax
1.
R1 NATURAL {FULL | LEFT | RIGHT} OUTER
JOIN R2;
E.g. 1. MovieStar NATURAL FULL OUTER
JOIN MovieExec;
E.g. 2. MovieStar NATURAL LEFT OUTER
JOIN MovieExec;
E.g. 3. MovieStar NATURAL RIGHT OUTER
JOIN MovieExec;
31
© D. Wong 2003
Outerjoins Syntax (continued)
1.
R1 {FULL | LEFT | RIGHT} OUTER JOIN R2
ON conditional expression;
E.g. 1. Movie FULL OUTER JOIN StarsIn ON
title = movieTitle AND year = movieYear;
E.g. 2. MovieStar LEFT OUTER JOIN StarsIn
ON title = movieTitle AND year = movieYear;
E.g. 3. MovieStar RIGHT OUTER JOIN StarsIn
ON title = movieTitle AND year = movieYear;
32
© D. Wong 2003
Use result of joins as subqueries in queries

E.g.
SELECT title, year, length, inColor, studioName,
producerC#, starName
FROM Movie JOIN StarsIn ON
title = movieTitle AND year = movieYear;
33
© D. Wong 2003