Slides - Computer Science - University of Birmingham

Transcript Slides - Computer Science - University of Birmingham

Fundamentals/ICY: Databases
2010/11
REVISION WEEK
John Barnden
Professor of Artificial Intelligence
School of Computer Science
University of Birmingham, UK
EXAM
(May/June;
fine detail for Resit may differ)
See info re past exams on my module webpages.
Structure of Exam
One and a half hours.
Four questions.
DO THREE.
Material on mathematical relations and relational algebra is
only used in one of the four Questions. (That Question may
also involve other material.)
The Remaining Three Questions
They will range from precise technical things to more
general considerations.
SQL query expressions required in several parts of the
three questions. Amount to about 28% of the marks for
those three questions.
Extra credit of 8% is available in one of the questions for
providing SQL create expressions.
Extra credit of 8% is available in one of the questions for
providing creative ERD notation suggestions.
Material Needed for Exam
 Anything in the required textbook reading may be useful in the
exam, except of course that a detailed memory of specific, data-full
examples is not expected, and except for some SQL detail (see next
slide).
 You need to study all Additional Notes, except that:


The exam will not rely on the treatment of functional dependencies and
normalization there (in the 1st of the three parts in the Week 9 batch).
The exam will not rely on material on physical design (in the 3rd of the three
parts in the Week 9 batch).
 You need to study all Exercise Answer Notes.
 The content of Funmi’s lecture on experiences in industry will not be
relied upon (but of course may help you in overall understanding of
some issues).
Textbook Parts (R&C 2009)
See my module website (top page).
On SQL: the exam doesn’t rely on fine detail beyond
what’s in the handouts (and occasional lectures).
MODULE REVIEW
What We Mainly Studied
 The nature of relational databases, the central modern type of
database. Entity types represented as tables, holding relations.
 Some basic mathematical concepts underpinning relational
databases, and useful also in many other branches of CS.
 Key aspects of how to develop the conceptual/logical design of
relational databases.
 In particular, how to achieve certain types of good structuring, to
help achieve certain types of correctness and efficiency.
 How to create and query databases using a particular database
language, PostgreSQL (a version of SQL: very widely used in
various forms).
Initial Considerations
What a database is, and how it relates to other types
of data structure/repository in CS.
Data integrity, data redundancy, data anomalies.
Associative links between parts of a database, as
opposed to pointing.
Ways data is stored/linked in physical human media
such as diaries, address books and timetables.
Various complications in tables in human
documents.
Restricted type of table used in relational DBs.
Entities and Relationships
Any relational DB as consisting of entity types
and relationships between them: Entity
Relationship Model (ERM) in general.
Specific ERMs for specific applications, and
distinction from Entity Relationship Diagrams
(ERDs).
Entity Types as represented by tables.
The question of what types of thing should
correspond to “entity types,” and hence tables,
depends on the application and your design
judgment.
Attribute Determination and Keys
One or more attributes determining another attribute. Can
also be described as that attribute being functionally
dependent on the former attributes.
Various notions of key, especially superkeys, candidate keys
and primary keys …
And foreign keys as the (main) implementation of
relationships between entity types.
Strength and Weakness
Strong and weak relationships. (Also called identifying
and non-identifying relationships respectively.)
Weak entity types, as defined according to the strength of
their relationships to other entity types and existencedependence with respect to those types.
Depiction of strength or weakness in different styles of
ERD.
Connectivity, Cardinality & Participation
Connectivity: uniqueness or multiplicity of entities at
either end of a relationship.
Cardinality: precise numerical info about how many
entities allowed or required at either end of a
relationship.
Participation: optionality or mandatoriness of a
relationship, in either direction.
Overlap between these notions.
Notation in ERDs.
Table Representation of Relationships
of Different Connectivities
 Basic case is 1:M non-recursive. (Recursive is when two or more
entity types in a relationship are the same.)
 1:M recursive—can often be handled within a single table.
 M:N, M:N:P, etc. standardly handled by breaking down into two,
three, etc. 1:M relationships going to a new entity type: a
“bridging” or “linking” type.
 Symmetric relationships (e.g. marriage). Special problems of
redundancy arise. Can be avoided by a special implementation
involving two extra tables, together serving a bridging function.

Symmetric relationships are recursive and are either 1:1 or M:N.
1-1 Relationships
 If you find yourself using a 1-1 relationship, especially when
unchanging, mandatory in both directions, and non-recursive: ask
yourself whether the two entity types could be combined into one
without causing difficulty.

E.g., if there were an unchanging 1-1 relationship, mandatory both ways
round, between employees and phone stations, you could probably combine
into one entity type without difficulty, increasing efficiency of some
operations.

But if the relationship changes frequently, easier to have 2 types/tables.

And if the relationship is optional in at least one direction, using one type
leads to more wasted space.
1-1 Relationships, contd
 Some good, standard uses of 1-1 relationships:

[As just mentioned] Cases where there is a significant amount of change or
optionality.

Subtype/supertype relationships: naturally 1-1; useful to keep the types
separate.

Symmetric 1-1 relationships such as marriage: only one entity type anyway.

Cases where one of the two types is hidden from some users by access
controls.
Other Representation Issues
Multivalued attributes. OK in themselves in early stages
of design, but should eventually be broken down into
single-valued attributes in some way.
A main divergence in ways of doing this is based on
whether the different values are for stably identifiable
sub-attributes.
Generalization hierarchies. Exhaustiveness, disjointness.
Normalization
What normalization is and what role it plays in the
database design process.
The normal forms 1NF, 2NF, 3NF, BCNF.

4NF (two versions) was left as optional material.
How tables can be transformed from lower normal
forms to higher normal forms.
That normalization and ER modeling are used
concurrently to produce a good database design,
helping to eliminate data redundancies & anomalies.
That some situations benefit from non-normalization to
gain efficiency for some operations.
Creating ER Models/Diagrams
Designing an ER model for a database is an iterative
process, because, e.g.:

As you proceed, you think of new ways of conceiving what’s
going on (much as in ordinary programming)

Multivalued attributes need to be re-represented

M:N relationships can be included as such at an early stage,
but generally need to be replaced by bridging entity types at
some point

1:1 relationships raise a red flag, though may be justified.

Special supertype/subtype notation needs eventually to be
converted into more standard diagram notation, to correspond
to the actual tables used.

Conversion to Normal Form (NB: different parts of the DB
may have different needs)
SQL
Mainly, the module only covers basic SQL mechanisms for
querying, creating and updating tables.
See manual and textbook for much more if you want!
MATHEMATICAL VIEW
Tuples, Relations and Tables
A relation on some sets A, B, C, … is simply a set of tuples
all of the same length, where in each tuple the first element
is from A, the second from B, etc.
A relation is therefore a subset of the Cartesian product of
those sets.
A row is a tuple. Hence a table at any given moment
induces a relation over the value domains of the table
(augmented as appropriate with the value NULL).
The table consists of not just the induced relation but also
the attributes themselves, their domains, specification of
primary and foreign keys, etc.
Functionality of Relations
 Functional relation from A, B, …C to some sets:
for each choice of values from A, B, …, the relation contains at most
one tuple starting with those values.

Also called a partial function.
 Functional dependence relationship, i.e determination relationship
X, Y, …, Z  U within a table:
induces a functional relation from the value domains for X, Y, …, Z
to the value domain for the determined attribute U.
Some Operations on Sets in General
Union, intersection, difference, symmetric difference and
Cartesian product of two sets X and Y (of any sort).
When X and Y are relations:
the set of all possible concatenations of a tuple within X
and a tuple within Y.
I have called this the flattened Cartesian product of X and
Y, notated as X  Y as opposed to X  Y.
Often called the relational product in the DB world.
(Shouldn’t be, but often is, called the Cartesian product.)
Relational DB Operators &
Relational Algebra
Defines theoretical way of manipulating tables using
“relational DB operators” that mainly manipulate the
relations in the tables.
• SELECT
• UNION
• PROJECT
• DIFFERENCE
• JOIN (various sorts)
• PRODUCT
• INTERSECT
• (DIVIDE)
Use of relational DB operators on existing tables
produces new tables. Strong connection to SQL
commands/operators.
Relational algebra puts relational DB operators into a
mathematical notation that is more convenient than, e.g.,
SQL operators.
QUESTIONS?

Slides - Computer Science - University of Birmingham

Transcript Slides - Computer Science - University of Birmingham

Directory