week9 - Arms-A
Download
Report
Transcript week9 - Arms-A
Normalization
Purpose: process to eliminate redundancy in
relations due to functional or multi-valued
dependencies.
Decompose relation schema into Normal forms:
– Boyce-Codd Normal Form (BCNF)
– Third Normal Form (3NF)
– Fourth Normal Form (4NF)
To obtain the new relations, project the schemas
onto the original relation schema (e.g. Movie)
To recover information (I.e. Movie) from the new
relations: natural join the new relations.
1
© D. Wong 2003
BCNF Decomposition Example 3.24 pp 104
Relation: Movie(title, year, length, filmType, studioName,
starName)
Key: {title, year, starName}
FD’s: title year length filmType studioName is a BCNF
violation, so Movie not in BCNF
Decomposition:
Schema 1: {title, year, length, filmType, studioName}
Schema 2: {title, year, starName}
To obtain the new relations, project the schemas onto Movie
To recover information (I.e. Movie) from the new relations:
natural join the new relations. Does not lose information.
2
© D. Wong 2003
Functional Dependencies (FD)
Given: relation schema R(A1, …, An), and X and
Y be subsets of (A1, … An).
FD : X Y means X functionally determines Y
e.g. A1A2…An B1B2…Bm
A1A2…An B1B2…Bm is an assertion about R
that two attributes or sets of attributes in R are
dependent of one another.
3
© D. Wong 2003
Mutivalued Dependencies (MVD)
relation schema R, and A1A2…An and B1B2…Bm be
subsets of attributes of R.
Given:
MVD : A1A2…An B1B2…Bm holds in R if :
For each pair of tuples t and u of relation R that agree on
all the A’s, we can find in R some tuple v that agrees:
1. With both t and u on the A’s,
2. With t on the B’s, and
3. With u on all attributes of R that are not among the
A’s or B’s
A1A2…An B1B2…Bm is an assertion about R that two
attributes or sets of attributes in R are independent of one
another.
Cause redundancy not related to FD’s in a BCNF schema.
Most common source: putting 2 or more many-many
relationships in a single relation.
4
© D. Wong 2003
MVD Rules
Trivial dependencies rule
If A1A2…An B1B2…Bm holds for R, then A1A2…An
C1C2…Ck holds where the C’s are the B’s + one or
more of the A’s. The converse also hold.
Transitive rule
If A1A2…An B1B2…Bm and B1B2…Bm C1C2…Ck
then A1A2…An C1C2…Ck
Splitting rule does not hold
E.g. name street city, but not name street
So, always start with set of attributes on the R.S. because
splitting rule does not hold.
5
© D. Wong 2003
More MVD Rules
Every FD is an MVD
Because If FD A1A2…An B1B2…Bm, then swapping B’s
between tuples that agree on A’s doesn’t create new tuples.
A’s
B’s
t
u
Complementation rule
If X Y, then X Z, where Z is all attributes not in
X or Y
e.g. Star_Star_In {name, street, city, title, year}
name street city
name title year
6
© D. Wong 2003
Nontrivial MVD
A1A2…An B1B2…Bm for a relation R is
nontrivial if:
1.
B1B2…Bm is not a subset of A1A2…An
2.
A1A2…An B1B2…Bm is not all attributes of R
7
© D. Wong 2003
Fourth Normal Form (4NF)
Decompose relations that has MVD’s into 4NF to
eliminate MVD’s.
Definition:
R is in 4NF if A1A2…An B1B2…Bm is a
nontrivial MVD, {A1A2…An} is a superkey.
Since every FD is an MVD, so 4NF is more
stringent than BCNF
Only nontrivial MVD’s has the potential to violate
4NF
8
© D. Wong 2003
4NF Decomposition
Given: relation R, and nontrivial MVD X Y that violate
4NF
1.
Decompose X Y into XY and X (R-Y)
R
X
Y
2.
Produce the relations by projecting R onto XY and
(R-Y)
X
3.
Reconstruct R from the new relations using natural join
e.g. Star_Star_In {name, street, city, title, year} and
name street city
Decompose Star_Star_In using name street city into
{name, street, city} and {name, title, year}
9
© D. Wong 2003
Relationships among normal forms
4NF is the most stringent
4NF BCNF 3NF
10
© D. Wong 2003
Lossless-join decomposition
Given: Relation R, decomposed into schemes R1, R2, …
Rk, and D is a set of dependencies.
Definition: R1, R2, … Rk is a lossless-join (w.r.t. D) if for
every relation r for R satisfying D:
r = R1(r)
R2(r)
…
Rk(r)
i.e. Every relation r for R is the natural join of
its projections onto the Ri’s.
The lossless-join property is necessary if the decomposed
relation is to be recoverable from its
decomposition.
However, joins are expensive. So, don’t over decompose!
11
© D. Wong 2003
Structured Query Language (SQL)
A DDL and DML for relational DBMSs
History: ANSI SQL, , SQL-92 (SQL2), SQL-99 (SQL3)
SQL-99 extends SQL2 with object-relational features and
other new features
Most DBMS vendors implements the core, and then add
bells and whistles and variations
Query capability is close to relational algebra, with lots of
extensions.
Case insensitive except characters inside quoted strings ' '
e.g. 'Smith' 'SMITH'
; as statement delimiter
12
© D. Wong 2003
Example database schema
Movie(title, year, length, inColor, studioName, producerC#)
StartIn(movieTitle, movieYear, starName)
MovieStar(name, address, gender, birthdate)
MovieExec(name, address, cert#, netWorth)
Studio(name, address, presC#)
13
© D. Wong 2003
SQL Quries – basic form
SELECT attribute/s
FROM relations / views /subqury
WHERE conditional expression;
14
© D. Wong 2003
SQL query examples
1.
Example 1:
SELECT *
FROM Movie;
2.
-- * => all attributes of Movie
Example 2:
SELECT *
FROM Movie
WHERE studioName = 'Disney' AND year = 1990;
3.
Example 3:
SELECT title, length
FROM Movie
WHERE studioName = 'Disney' AND year = 1990;
15
© D. Wong 2003
Duplicates
SQL generally operates using bags instead of sets
Exception: UNION, INTERSECT, EXCEPT
operation
To eliminate duplicates, add keyword DISTINCT
to the SELECT clause
e.g. SELECT DISTINCT starName
FROM StarsIn;
Duplicate elimination is costly. Use judiciously.
16
© D. Wong 2003
SQL Correspondence to Relational Algebra
SELECT L
-- R.A. project
FROM R
-- R.A. operands
WHERE C ;
-- R.A. select
R.A. expression: L(C(R))
When reading and writing queries:
1.
FROM
-- what relations are involved
2.
WHERE
-- what's the tuples selection criteria
3.
SELECT
-- what columns to output
17
© D. Wong 2003
Union, Intersection, Difference of Queries
UNION : R1 UNION R2
or (Q1) UNION (Q2)
e.g. (SELECT title, year FROM Movie)
UNION
(SELECT movieTitle AS title, movieYear AS
year FROM StarsIn);
INTERSECT : R1 INTERSECT R2 or
(Q1) INTERSECT (Q2)
EXCEPT: R1 EXCEPT R2
-- difference
(Q1) EXCEPT (Q2)
18
© D. Wong 2003
Union, Intersection, Difference of Queries (continued)
Q1 and Q2 are queries that produce relations
R1 and R2, or results of Q1 and Q2 should have
the same list of attributes and attribute types.
Rename if necessary.
Duplicates are eliminated automatically
Add the keyword ALL after UNION,
INTERSECT, or EXCEPT to prevent duplicates
elimination
19
© D. Wong 2003
SQL and Relational Algebra
The six independent operations are implemented
by SQL
SQL is relational complete
20
© D. Wong 2003
Some data values in SQL
1.
Strings
2.
Dates and Times
3.
Null values
4.
Truth value of Unknown
21
© D. Wong 2003
1. Strings
Comparison operators (according to lexicographical order)
<, >, <=, >= =
LIKE -- pattern matching
% -- matches any sequence of 0 or more characters
_ -- matches any one character
E.g.: title LIKE 'Star _ _ _ _'
E.g.: title LIKE '%''s%'
Can specify escape character
E.g.
title LIKE 'x%%x%' ESCAPE 'x'
22
© D. Wong 2003
2. Dates and Times
Date constant: DATE '2002-10-01'
Time constant: TIME '15:00:02.5'
Timestamp (combines dates and times):
TIMESTAMP '2002-10-01 15:00:02.5‘
(beware of implementation differences!)
Comparison operators apply
23
© D. Wong 2003
3. Null Values
NULL to represent:
1. Value unknown
2. Value inapplicable
3. Value withheld
Operations involving NULL
1. Arithmetic operation: result is NULL
2. Comparison: result is UNKNOWN
NULL is not a constant, therefore NULL cannot be used
explicitly as an operand.
IS NULL and IS NOT NULL checks
Read "Pitfalls Regarding Nulls" pp. 250
24
© D. Wong 2003
4. UNKNOWN
Consider TRUE = 1, FALSE = 0, UNKNOWN =
0.5
1. AND of 2 truth-value = min. of the 2 values
2. OR of 2 truth-value = max. of the 2 values
3. Negation of v = 1-v
Refer Fig. 6.2 pp. 250 for truth table for 3-valued
logic
25
© D. Wong 2003
The Six Clauses in SQL Queries
1.
SELECT
-- required
2.
FROM
-- required
3.
WHERE
4.
GROUP BY
5.
HAVING
6.
ORDER BY
Subqueries may appear in the FROM clause and the
WHERE clause
Comments begins with ‘--’
-- if used, must follows a group by clause
26
© D. Wong 2003
Table level SQL (ref. 6.6, pp. 292)
Create table – to define the schema of a base table
(Ref. 6.6.1 for data types syntax)
E.g.
create table EMP
(
empno int not null,
lastName varchar(30) not null,
firstName varchar(30) not null,
num_of_children int,
constraint pk_EMP primary key (empno)
);
Drop table – to destroy a base table
e.g. drop table EMP;
27
© D. Wong 2003
Tuple Modification Statements (ref. 6.5, pp. 286)
Insert – to add a row
Syntax: insert into R(A1..An) values (v1…vn)
– E.g. insert into emp(empno, lastName, firstName,
num_of_children) values (12345, ‘Doe’, ‘John’, 1)
– Or insert into emp values (12345, ‘Doe’, ‘John’, 1)
Delete – to remove a row
Syntax: delete from R where <condition>
– E.g. delete from emp where empno = 12345
Update – to modify the contents of a row
Syntax: update R set Ai = value where Aj = targetValue
– E.g. update emp set num_of_children = 2 where empno =
12345
28
© D. Wong 2003
Some JOINS in SQL. (ref. pp. 270)
CROSS JOIN
-- R.A. cartesian product
e.g. Movie CROSS JOIN StarsIn;
JOIN … ON
-- R.A. theta-join
e.g. Movie JOIN StarsIn ON title = movieTitle AND year =
movieYear;
[NATURAL] JOIN
-- R.A. natural join
e.g. MovieStar NATURAL JOIN MovieExec; or
MovieStar JOIN MovieExec;
OUTERJOINS
-- joins that include dangling tuples
29
© D. Wong 2003
OUTERJOINS
An operator to augment the result of a join by the
dangling tuples, padded with null values.
Full outerjoin of R1 and R2 is a join that includes all
rows from R1 and R2 matched or not. Unmatched rows
are padded with NULLs.
LEFT outerjoin of R1 and R2 is a join that includes all
rows from R1, matched or not, plus the matching values
from R2. Unmatched rows are padded with NULLs.
RIGHT outerjoin of R1 and R2 is a join that includes all
rows from R2, matched or not, plus the matching values
from R1. Unmatched rows are padded with NULLs.
The joining may be NATURAL or theta join
30
© D. Wong 2003
Outerjoins Syntax
1.
R1 NATURAL {FULL | LEFT | RIGHT} OUTER
JOIN R2;
E.g. 1. MovieStar NATURAL FULL OUTER
JOIN MovieExec;
E.g. 2. MovieStar NATURAL LEFT OUTER
JOIN MovieExec;
E.g. 3. MovieStar NATURAL RIGHT OUTER
JOIN MovieExec;
31
© D. Wong 2003
Outerjoins Syntax (continued)
1.
R1 {FULL | LEFT | RIGHT} OUTER JOIN R2
ON conditional expression;
E.g. 1. Movie FULL OUTER JOIN StarsIn ON
title = movieTitle AND year = movieYear;
E.g. 2. MovieStar LEFT OUTER JOIN StarsIn
ON title = movieTitle AND year = movieYear;
E.g. 3. MovieStar RIGHT OUTER JOIN StarsIn
ON title = movieTitle AND year = movieYear;
32
© D. Wong 2003
Use result of joins as subqueries in queries
E.g.
SELECT title, year, length, inColor, studioName,
producerC#, starName
FROM Movie JOIN StarsIn ON
title = movieTitle AND year = movieYear;
33
© D. Wong 2003