Insert Title Here

Download Report

Transcript Insert Title Here

Introduction to Databases:
Relational and XML Models
and Languages
Instructors:
Bertram Ludaescher
Kai Lin
Overview
•
•
•
•
•
•
•
•
•
09:15-10:20
10:20-10:30
10:30-11:50
11:50-13:15
13:15-13:45
13:45-15:10
15:10-15:30
15:30-16:30
16:30-17:00
Relational Databases (1h05’)
BREAK (10’)
Relational Databases (1h20’)
LUNCH (1h25’)
Demo & Hands-on (30’)
XML: Basics (1h25’)
BREAK (20’)
XML: Querying (1h)
Demo & Hands-on (30’)
Introduction to Databases, B. Ludaescher & K. Lin
2
Scope
• Today: Introduction to Databases, in particular
–
–
–
–
–
–
Relational database model
Relational Operations and Queries
Constraints
XML “data” model
Querying and transforming XML
Some demos & simple hands-on exercise
• Tomorrow:
– Introduction to Knowledge Representation and
Ontologies
• But first: … déjà vu …
Introduction to Databases, B. Ludaescher & K. Lin
3
Introduction to Databases, B. Ludaescher & K. Lin
4
What is a Database System?
• Database (system) =
– Database Instance (set of tables of rows)
– Database Management System (DBMS)
• Origins in the commercial world:
– to organize, query, and manipulate data more
effectively, efficiently, and independently
• Scientific databases
– often special features:
• spatial, temporal, spatiotemporal, GIS, units,
uncertainty, raw & derived data, …
Introduction to Databases, B. Ludaescher & K. Lin
5
Why not just use files as “databases”?
• For some applications: yeah… why not?
• But in general:
– scanning & ’grep’ing large files can be very inefficient
– no language support for selecting desired data, joining
them, etc.
• cannot express the kinds of questions/queries you’d like to ask
• ‘grep’ is no substitute for a query language
– redundant and/or inconsistent storage of data
– no transaction management and concurrency control
among multiple users
– no security
– no recovery
– no data independence (application data)
– no data modeling support
– …
Introduction
to Databases, B. Ludaescher & K. Lin
6
Features of a Database System
• A data model (relational, object-oriented,
XML) prescribes how data can be organized:
– as relations (tables) of tuples (rows)
– as classes of (linked) objects
– as XML trees
• A (database) schema (stored in the “data
dictionary”) defines the structure of a
specific database instance:
– Relational schema
– OO schema
– XML Schema (or XML DTD)
Introduction to Databases, B. Ludaescher & K. Lin
7
Features of a Database System
• Data is treated uniformly and separately from the
application
• Efficient data access
• Queries and views are expressed over the schema
• Integrity constraints (checking and enforcement)
• Transactions combine sets of operations into
logical units (all-or-nothing)
• Synchronization of concurrent user transactions
• Recovery (after system crash)
– not to be confused w/ backup
– instead: guarantee consistency by “roll-back” of partially
executed transactions (how? Hint: logging)
• …
Introduction to Databases, B. Ludaescher & K. Lin
8
DB features, e.g., Concurrency Control
• Concurrent execution of simultaneous requests
– long before web servers where around...
– transaction management guarantees consistency despite
concurrent/interleaved execution
• Transaction (= sequence of read/write operations)
– Atomicity: a transaction is executed completely or not at
all
– Consistency: a transaction creates a new consistent DB
state, i.e., in which all integrity constraints are
maintained
– Isolation: to the user, a transaction seems to run in
isolation
– Durability: the effect of a successful (“committed”)
transaction remains even after system failure
Introduction to Databases, B. Ludaescher & K. Lin
9
Levels of Abstraction: Architecture Overview
User
Conceptual
…
View 1
View 2
logical data
independence
Level
ER-Model
(Entity-Relationship)
OO Models (Classes…)
View n Export schemas
Logical (“conceptual”) level Tables
part of DB design
 conceptual design
… often lost in the
process…
Introduction to Databases, B. Ludaescher & K. Lin
physical data independence
Physical level Index structures
DB
instances
10
Database Design: Entity-Relationship (ER) Model
Name
Salary
Employee
•
•
•
•
Name
works-for
Entities:
Relationships:
Attributes:
ER Model:
Manager
Department
since
– initial, high-level DB design (conceptual model)
– easy to map to a relational schema (database tables)
– comes with more constraints (cardinalities, aggregation) and
extensions: EER (is-a => class hierarchies)
– related: UML (Unified Modeling Language) class diagrams
Introduction to Databases, B. Ludaescher & K. Lin
11
The Relational Model
• Relation/Table Name:
Employee
Emp
tom
tim
sally
carol
carol
….
Salary
60k
57k
45k
30k
35k
DNo
1
1
3
1
2
FK: foreign key,
pointing to another key
Department
DNo
1
2
3
Name
Toys
Comp.
Shoes
Mgr
carol
carol
sam
Introduction to Databases, B. Ludaescher & K. Lin
– employee, dept
• Attributes = Column Names:
– Emp, Salary, DeptNo, Name, Mgr
• Relational Schema:
– employee(Emp:string, Salary:integer,
DeptNo:integer), ...
• Tuple = Row of the table:
– (“tom”, “60000”, “1”)
• Relation = Set of tuples:
– {(...), (...), ...}
12
Ex: Creating a Relational Database in SQL
CREATE TABLE employee (
ssn
CHAR(11),
name
VARCHAR(30),
deptNo
INTEGER,
PRIMARY KEY (ssn),
FOREIGN KEY (deptNo) REFERENCES department
)
CREATE TABLE department (
deptNo
INTEGER,
name
VARCHAR(20),
manager
CHAR(11),
PRIMARY KEY (deptNo),
FOREIGN KEY (manager) REFERENCES employee(ssn) )
Introduction to Databases, B. Ludaescher & K. Lin
13
What is a Query?
• Intuitively:
– An “executable question” in terms of a database schema
– Evaluating a query Q against a database instance D yields
a set of answer objects:
• Relational tuples or XML elements
• Example:
– Who are the employees in the ‘Toys’ dept.?
– Who is (are) the manager(s) of ‘Tom’?
– Show all pairs (Employee, Mgr)
• Technically:
– A mapping from an input schema (the given table
schemas) to a result schema (the new columns you are
interested in) defined in some query language
Introduction to Databases, B. Ludaescher & K. Lin
14
Why (Declarative) Query Languages?
“If you have a hammer,
everything looks like a nail.”
,,Die Grenzen meiner Sprache bedeuten die Grenzen meiner Welt.”
“The limits of my language mean the limits of my world.”
Ludwig Wittgenstein, Tractatus Logico-Philosophicus
• Things we talk and think about in PLs and QLs …
– Assembly languages:
• registers, memory locations, jumps, ...
– C and the likes:
• if-then-else, for, while, memory (de-)allocation, pointers, ...
– Object-oriented languages:
•
•
•
•
C++: C plus objects, methods, classes, ...
Java: objects, methods, classes, references, ...
Smalltalk: objects, objects, objects, ...
OQL: object-query language
Introduction to Databases, B. Ludaescher & K. Lin
15
Why (Declarative) Query Languages?
• Things we talk and think about in PLs and QLs …
– Functional languages (Haskell, ML):
• (higher-order) functions, fold(l|r), recursion,
patterns, ...
=> Relational languages (SQL, Datalog)
• relations (tables), tuples (rows); conceptual level: ER
• relational operations: , , , , ..., ,,,,,..., , , |X|
=> Semistructured/XML (Tree) & Graph Query
Languages
• trees, graphs, nodes, edges, children nodes, siblings, …
• XPath, XQuery, …
• Also:
– Focus on what, and not how!
Introduction to Databases, B. Ludaescher & K. Lin
16
Example: Querying a Relational Database
input tables
Employee
Department
Emp
anne
john
DeptNoMgr
1
anne
2
anne
Salary DeptNo
62k
2
60k
1
join
SQL query
(or view def.)
SELECT e.Emp, d.Mgr
FROM Employee e, Department d
we don’t say how to
WHERE e.DeptNo = d.DeptNo
evaluate this
expression
result
answer
(or view)
Introduction to Databases, B. Ludaescher & K. Lin
Emp
john
anne
17
Mgr
anne
anne
Example Query: SQL vs DATALOG
• “List all employees and their managers”
• In SQL:
SELECT e.name, d.manager
FROM Employee e, Department d
WHERE e.deptNo = d.deptNo
a “join” operation
• In DATALOG:
q(E, M) :- employee(E, S, D), department(D, N, M).
Introduction to Databases, B. Ludaescher & K. Lin
18
Important Relational Operations
• select(R, Condition)
– filter rows of a table wrt. a condition
• project(R, Attr)
– remove unwanted columns; keep rest
• join(R1, A2, R2, A2, Condition)
– find “matches” in a “related” table
– e.g. match R1.foreign key = R2.primary key
• cartesian product(R1, R2)
• union (“OR”), intersection (“AND”)
• set-difference (“NOT IN”)
Introduction to Databases, B. Ludaescher & K. Lin
19
Relational Operations (in DATALOG)
(query) output :– (query) input
condition
Y1=Y2Y
independent
same
multiple rules
 union results
Introduction to Databases, B. Ludaescher & K. Lin
20
Demo
Relational
Queries in
DATALOG
Introduction to Databases, B. Ludaescher & K. Lin
21
Queries, Views, Integrity Constraints
• … can all be seen as “special queries”
• Query q(…) :- …
ad-hoc queries
• View v(…) :- …
exported views;
• Integrity Constraints
– ic (…) :- …. MgrSal < EmpSal …
– say what shouldn’t happen
– if it does: alert the user (or refuse an update, …)
Introduction to Databases, B. Ludaescher & K. Lin
22
Query Evaluation vs Reasoning
• Query evaluation
– Given a database instance D and a query Q, run Q(D)
– What databases do all the time
• Reasoning (aka “Semantic Query Optimization”)
– Given a query Q and a constraint C, “optimize” Q&C (e.g.,
given C, Q might be unsatisfiable)
– Given Q1 and Q2 decide whether Q1 Q2
– Given Q1,Q2, C decide whether Q1 Q2 | C
– Note: we are NOT given a database instance D here; just
the schema and the query/IC expressions
Introduction to Databases, B. Ludaescher & K. Lin
23
Summary QLs for Relational Databases
Natural Hoin: same attribute name
 add condition that values must match
Introduction to Databases, B. Ludaescher & K. Lin
24
Relational Algebra
Introduction to Databases, B. Ludaescher & K. Lin
25
Relational Algebra
Introduction to Databases, B. Ludaescher & K. Lin
26
Relational Algebra
Introduction to Databases, B. Ludaescher & K. Lin
27
Relational Algebra
Introduction to Databases, B. Ludaescher & K. Lin
28
Relational Algebra
Introduction to Databases, B. Ludaescher & K. Lin
29
Hands-on Part
DBDesigner 4
Introduction to Databases, B. Ludaescher & K. Lin
30
DBDesigner 4
•
1.
2.
3.
4.
5.
6.
7.
8.
An open source (GPL) database design tool
Goto http://www.fabforce.net/dbdesigner4/
Download and install (5 min)
Open the example schema Order (File -> Open -> Order), and
examine the relations between the tables forumtopic and forumpost.
Find the foreign key used in the relation postHasTopic (5 min)
Connect to a sample MySQL database
host: geon07.sdsc.edu
port: 3306
database: summer_institute
username: root
password: [blank]
(5 min)
Select all records in the table forumtopic (2 min)
Select all records in the table forumpost, and sort the result
according to their idforumpost (3min)
Find all forum posts with the topic Cars (5 min)
Insert a record into the table forumpost (5 min)
Introduction to Databases, B. Ludaescher & K. Lin
31
Additional Material
(not presented)
Introduction to Databases, B. Ludaescher & K. Lin
32
Non-Relational Data Models
• Relational model is “flat”: atomic data values
– nesting is modeled via “pointers” (foreign keys) and “Skolem-ids”
– extension: nested relational model (“tables within tables”, cf.
nested HTML tables)
– values can be nested lists {...}, tuples (...), sets [...]
– ISO standard(s): SQL
– identity is value based
• Object-oriented data model:
– complex (structured) objects with object-identity (oid)
– class and type hierarchies (sub-/superclass, sub-/supertype)
– OODB schema may be very close to “world model” (no translation
into tables)
(+) queries fit your OO schema
(-) (new) queries that don’t fit nicely
– ODMG standard, OQL (Object Query Language)
Introduction to Databases, B. Ludaescher & K. Lin
33
Example: Object Query Language (OQL)
SELECT DISTINCT STRUCT(
E: e.name,
C: e.manager.name,
M: ( SELECT c.name
FROM c IN e.children
WHERE FOR ALL d IN e.manager.children: c.age > d.age ) )
FROM e IN Employees;
• Q: what does this OQL query compute?
• Note the use of path expressions like
e.manager.children
=> Semistructured/Graph Databases
Introduction to Databases, B. Ludaescher & K. Lin
34
A Graph Database
Introduction to Databases, B. Ludaescher & K. Lin
35
Querying Graphs with OO-Path Expressions
?- dblp."Inf. Systems".L."Michael E. Senko".
Answer:
L="Volume 1, 1975”;
L="Volume 5, 1980".
?- dblp."Inf. Systems".L.P, substr("Volume",L),
P : person.spouse[lives_in = P.lives_in].
Introduction to Databases, B. Ludaescher & K. Lin
36
Constructs for Querying Graphs
Example: ?- dblp . any* . (if(vldb)| if(sigmod))
Introduction to Databases, B. Ludaescher & K. Lin
37
Keys, Keys, and more Keys
• A key is a minimal set of attributes that:
– uniquely identify a tuple
– determine every other attribute value in a tuple
• There may be many keys for a relation; we
designate one as the primary key
• The phrase candidate key is used in place of “key”
where “the key” denotes the primary key
• A superkey is a superset of a key (i.e., not
necessarily minimal)
Introduction to Databases, B. Ludaescher & K. Lin
38
Normalization of Relations
Example of “good” design
Employee(EName, SSN, BDate, Address, DNumber)
Department(DName, DNumber, DMgrSSN)
Example of “bad” design (why is it bad?)
Emp(EName, SSN, BDate, Address, DNum, DName, DMgrSSN)
Introduction to Databases, B. Ludaescher & K. Lin
39
What’s Wrong?
Emp(EName, SSN, BDate, Address, DNum, DName, DMgrSSN)
The description of the department (DName, DMgrSSN) is
repeated for every employee that works in that
department.
Redundancy!
The department is described redundantly.
This leads to update anomalies! (… and wastes space)
Digression (for experts only):
• redundancy can be used to increase performance; e.g.
“materialized views”)
Introduction to Databases, B. Ludaescher & K. Lin
40
Update Anomalies (caused by redundancy)
Insertion Anomalies
If you insert an employee
Need to know which department he/she works
Need to know the description information for that department
If you want to insert a department, you can’t … until there
is at least one employee
Deletion Anomalies: if you delete an employee, is that dept.
gone? was this the last employee in that dept?
* Modification Anomalies: if you change DName, for example,
it needs to be changed everywhere!
Introduction to Databases, B. Ludaescher & K. Lin
41
Null values also cause problems
Null values might help in special cases, but are not a
general solution to update anomalies
For example, they may:
–
–
–
–
Waste space
Make it hard to specify and understand joins
Make it hard to aggregate (count, sum, etc.)
Have different meanings:
• Attribute does not apply to this tuple
• Attribute value is unkown
• Value is known but absent (not yet recorded)
– Causes problems with queries (can’t interpret query
answers)
Introduction to Databases, B. Ludaescher & K. Lin
42
Why worry about normalization?
• To …
• … reduce redundancy & update anomalies
• … reduce the need for null values
• Last not least: it’s a solved problem:
•all algorithms & proofs have been worked out
Here: Normalization based on FDs (there’s more…)
Introduction to Databases, B. Ludaescher & K. Lin
43
Functional Dependencies
Statement that if two tuples agree on attributes A
they must agree on attributes B
AB
If the value of the first attribute(s), A, is known,
then the value of the second attribute(s), B, is
known (read “A determines B”)
We want to know if it is always true in the application
Introduction to Databases, B. Ludaescher & K. Lin
44
Functional Dependencies
Examples of functional dependencies:
social-security-number  employee-name
course-number  course-title
Examples that are NOT functional
dependencies
course-number - book
course-number - car-color
Introduction to Databases, B. Ludaescher & K. Lin
45
What is a functional dependency?
Remember what it means to be a function:
x f(x)
1 2
1 3
2 5
3 5
x g(x)
1
2
2
2
3
5
x h(x)
1 10
2 20
3 30
f is not a function
for x=1, f(x) is not unique
we are looking for functional relationships
(that must occur in a relation) among attribute values
Introduction to Databases, B. Ludaescher & K. Lin
46
What are the FDs?
EMP(ENAME, SSN, BDATE, ADDRESS, DNUM, DNAME, DMGRSSN)
EMP_PROJ(SSN, PNUM, HOURS, ENAME, PNAME, PLOCATION)
Introduction to Databases, B. Ludaescher & K. Lin
47
What are the FDs?
EMP(ENAME, SSN, BDATE, ADDRESS, DNUM, DNAME, DMGRSSN)
1
2
5
3
6
4
9
10
EMP_PROJ(SSN, PNUM, HOURS, ENAME, PNAME, PLOCATION)
7
8
Introduction to Databases, B. Ludaescher & K. Lin
48
Why do we care?
EMPLOYEE (SSN, NAME, SALARY, JOB_DESC)
If all FDs are “implied by the key”
it means that the DBMS only enforces keys (not FDs) and
the DBMS is going to enforce the keys anyway
otherwise, we have to perform expensive operations to
maintain consistency (code or check statements)
Introduction to Databases, B. Ludaescher & K. Lin
49
Decomposition
Main refinement technique: decomposition
(replacing ABCD with, say, AB and BCD, or
ACD and ABD) based on the projection
operator.
Decomposition should be used judiciously:
– Is there reason to decompose a relation?
(via Normal Forms)
– What problems (if any) does the
decomposition cause? (lost information or
dependencies?)
Introduction to Databases, B. Ludaescher & K. Lin
50
How can we decompose using the
Project Operator?
EMP(ENAME, SSN, BDATE, ADDRESS, DNUM, DNAME, DMGRSSN)
2
1
EMP1(SSN, ENAME, BDATE, ADDRESS, DNUM)
3
4
X(DNUM, DNAME, DMGRSSN)
5
6
EMP_PROJ(SSN, PNUM, HOURS, ENAME, PNAME, PLOCATION)
EMP2(SSN,8ENAME)
9
X(PNUM, PNAME, PLOCATION)
10
Y(SSN, PNUM, HOURS)
7
Introduction to Databases, B. Ludaescher & K. Lin
51
Correct Decompositions
How do we know if a decomposition is correct?
That we haven’t lost anything?
• We have three goals:
lossless-join decomposition
(don’t throw any information away)
(be able to reconstruct the original relation)
dependency preservation
all of the FDs end up in just one relation
(not split across two or more relations)
Boyce-Codd Normal Form (BCNF) - no redundancy
Introduction to Databases, B. Ludaescher & K. Lin
52
Lossless Decompositions
• What is a lossless decomposition?
• What is a lossy decomposition?
When R is decomposed into R1 and R2
Check to see if (R1
R2) = R
if it is a lossy decomposition, then R1
gives you TOO MANY tuples.
Note: we are doing a natural join
Introduction to Databases, B. Ludaescher & K. Lin
53
R2
Example: a lossy decomposition
Original:
Employee SSN
1
2
3
Name
Smith
Jones
Smith
Decomposition:
Employee SSN
1
2
3
Name
Smith
Jones
Smith
Project
p1
p1
p2
Ptitle
accounting
accounting
billing
Project PID
p1
p1
p2
Ptitle
accounting
accounting
billing
Name
Smith
Jones
Smith
Now join them using natural join: we get extra
tuples!!!
1
smith
Introduction to Databases, B. Ludaescher & K. Lin
p2
billing
54
Testing for a Lossless Decomposition
For decomposition into two relations
Let R1 and R2 form a decomposition of R.
R1 and R2 are both sets of attributes from R.
For the decomposition to be lossless ...
The attributes in common must be a key for 1 of the
relations!
Note: You are joining a key and a foreign key
Introduction to Databases, B. Ludaescher & K. Lin
55
Example: test for a lossless decomposition
Employee(SSN, name, project, p-title)
decomposition:
Employee (SSN, name)
Project (project, p-title, name)
Which attribute is in common?
Employee.Name and Project.Name
Is name a key for either of these two tables?
NO! We have a problem.
Introduction to Databases, B. Ludaescher & K. Lin
56
Example: test for a lossless decomposition
Employee(SSN, name, project, p-title)
decomposition:
Employee (SSN, name)
Project (project, p-title)
Which attribute is in common?
None
Is this decomposition lossless?
NO! We have a problem.
Introduction to Databases, B. Ludaescher & K. Lin
57
Example: test for a lossless decomposition
Employee(SSN, name, project, p-title)
decomposition:
Employee (SSN, name, project)
Project (project, p-title)
Which attribute is in common?
Employee.project and Project.project
Is project a key for either of these two tables?
YES! We have a lossless decomposition.
Introduction to Databases, B. Ludaescher & K. Lin
58
Some Preliminary Normal Form Definitions
Prime Attribute - an attribute A is prime if it
is a member of a key, otherwise it is
nonprime
Partial Dependency - given an FD XY, Y is
partially dependent on X if there is a proper
subset X of X such that XY.
Transitive Dependency - A is transitively
dependent upon X if XY, Y-/X, and YA
and AXY.
Introduction to Databases, B. Ludaescher & K. Lin
59
What are the FDs?
EMP(ENAME, SSN, BDATE, ADDRESS, DNUM, DNAME, DMGRSSN)
5
2
1
3
6
4
9
10
EMP_PROJ(SSN, PNUM, HOURS, ENAME, PNAME, PLOCATION)
7
8
Introduction to Databases, B. Ludaescher & K. Lin
60
Normal Forms Based on FDs
1NF - all attribute values (domain values) are atomic
(part of the definition of the relational model)
2NF - 1NF + no key partial dependencies (every nonprime
attribute fully depends on every key of R)
R(A B C D)
BC
(not allowed)
3NF - 2NF + no nonprime attribute is transitively dependent
upon any key
R(A B C D)
AB C, C  D
(not allowed)
BCNF - 3NF + no attribute is transitively dependent upon any
key (all FDs determined by a key)
R(A B C D)
Introduction to Databases, B. Ludaescher & K. Lin
AB C, C  B
61
(not allowed)
What’s the Goal?
BCNF, Lossless, and Dependency-Preserving
(first choice)
3NF, Lossless and Dependency-Preserving
(second choice)
because sometimes you can’t preserve all dependencies
Introduction to Databases, B. Ludaescher & K. Lin
62
Finding all of the FDs
Armstrong’s Axioms
Reflexivity: If X  Y, then X  Y
Trivially, all attributes A  A,
e.g., name  name and gender  gender
Augmentation: If X  Y, then XZ  YZ for any Z
Transitivity: If X  Y and Y  Z, then X  Z
X, Y, and Z are sets of attributes in R
Armstrong’s Axioms are a sound & complete set of inference rules
Union: If X  Y and X  Z, then X  YZ
Decomposition: If X  YZ, then X  Y and X  Z
Introduction to Databases, B. Ludaescher & K. Lin
63
Finding all FDs: the closure of a set of FDs
The closure of a set of FDs:
Let F be a set of FDs and F+ the closure of F.
F+ is the set of all FDs implied (or derivable)
from F using Armstrong’s Axioms
F+ is computed by applying the inference rules
until no new FDs can be found
Introduction to Databases, B. Ludaescher & K. Lin
64
What is “Dependency Preserving”
Suppose F is the original set of FDs and G is
the set of FDs after decomposition
If we compute F+ and G+, and F+ = G+ then the
decomposition is dependency preserving
Introduction to Databases, B. Ludaescher & K. Lin
65
Other Results (textbook)
Algorithms To:
– Compute F+
– Compute the Minimal Cover for F
– Find a dependency-preserving
decomposition into 3NF
– Find a lossless-join decomposition into
BCNF
– Find a lossless-join & dependency
preserving decomposition into 3NF
Introduction to Databases, B. Ludaescher & K. Lin
66
Example: Not dependency preserving
addr(number, street, city, state, zip)
number, street, city, state  zip
zip  state
If we decompose, this FD won’t occur within
one relation. Thus, since the DBMS only
enforces keys (and not FDs directly), this
can’t be enforced.
(If we leave it alone, it is in 3NF)
Introduction to Databases, B. Ludaescher & K. Lin
67
Lossless join decomposition algorithm
1. Set D = {R}
(the current set of relations)
2. While there is a relation in D that’s not in BCNF
Choose a relation scheme Q that is not in BCNF
Find a FD X  Y in Q that violates BCNF
Replace Q in D by (Q – Y) and (X Y)
End While;
3. Identify dependences that are not preserved (X
A).
4. Add XA as a table to the set D
Introduction to Databases, B. Ludaescher & K. Lin
68
Tuning the Conceptual Schema
The choice of conceptual schema should be guided by
the workload, in addition to redundancy issues:
– We may settle for a 3NF schema rather than BCNF.
– E.g., the workload may influence our choice to decompose
a relation into 3NF or BCNF.
– We may further decompose a BCNF schema!
– We might denormalize (i.e., undo a decomposition step),
or add fields to a relation. We must take care to avoid
the problems caused by the redundancy!
Introduction to Databases, B. Ludaescher & K. Lin
69
Decomposition of one table into several
A relation is replaced by a collection of relations that are
projections. Most important case. (Vertical partitioning)
• Sometimes, we might want to replace a relation by a
collection of relations that are selections. (Horizontal
partitioning)
– Each new relation has the same schema as the original, but a subset
of the rows.
– Collectively, new relations contain all rows of the original. Typically,
the new relations are disjoint.
Why might we do this?
Introduction to Databases, B. Ludaescher & K. Lin
70
Tuning the Conceptual Schema
Vertical decomposition using the project
operator
Course
c#
cname
Course1
c# room
instructor
Course3
c#
instructor
days
Course2
c#
cname
Introduction to Databases, B. Ludaescher & K. Lin
room days
71
Tuning the Conceptual Schema
Horizontal partition/decomposition using the
select operator
Course
c#
cname
instructor
room days
Undergraduate-Course
c#
cname
instructor
room days
Graduate-Course
c#
cname
room days
Introduction to Databases, B. Ludaescher & K. Lin
instructor
72
Tuning the Conceptual Schema
Denormalizing……always introduces redundancy!
Course-Offering (offering#, quarter, c#, instructor, time, days)
Course (c#, cname)
Instructor-Room (instructor, room)
Course-Offering
(offering#, quarter, c#, cname, instructor, room, time, days)
This introduces redundancy - which must be managed.
Introduction to Databases, B. Ludaescher & K. Lin
73
Denormalization
Denormalization interferes with dependency
preservation
Course-Offering
(offering#, quarter, c#, cname, instructor, room, time, days)
Note that having the FD instructor  room (and
nothing else) in a single table allows the FD to be
enforced (by enforcing the candidate key of
instructor).
Instructor-Room (instructor, room)
Introduction to Databases, B. Ludaescher & K. Lin
74