Challenges in Natural Language Processing:

Download Report

Transcript Challenges in Natural Language Processing:

Fundamentals/ICY: Databases
2012/13
REVISION WEEK
John Barnden
Professor of Artificial Intelligence
School of Computer Science
University of Birmingham, UK
Structure of Exam
 One and a half hours. Do THREE Questions.
Qn 1: on SQL: obligatory (Fundamentals and ICY). Weight: 32%
 Fundamentals:



Do two out of remaining three Questions. Weight: each 34%.
One is on maths (relations, relational algebra)
No maths in the other two questions.
 ICY:



Do two out of remaining four Questions. Weight: each 34%.
One is on maths (relations, relational algebra).
No maths in the other three questions.
 Hence Fundamentals and ICY students have same DEGREE OF
CHOICE (2/3) over the material they are EXPECTED to know.
Fundamentals students have more pressure to do the Maths question.
Structure of Exam, contd
The Maths question is on the connections of the maths
material to
 SQL (what SQL operators do, translation to and from relational algebra).
 database concepts (nature of tables, functional dependencies, operations on
tables, nature of relationships between entity types)
Material Needed for Exam
 Lecture material except when specified as optional ((incl. by double brackets))
 Required textbook reading:

Anything may be useful in the exam, except of course that a detailed memory of
specific, data-full examples is not expected, and except for some SQL detail (see next
slide).
 All Additional Notes, except that:


The exam will NOT rely on the treatment of functional dependencies and normalization
there (in the 1st of the three parts in the Week 9 batch).
The exam will NOT rely on material on physical design (in the 3rd of the three parts in
the Week 9 batch).
 All Exercise Answer Notes.
 BUT NOT the content of Keerthi’s lecture on his industrial experience (but of
course it may help you in overall understanding of some issues).
Textbook Parts
See my module website (top page).
On SQL: the exam doesn’t rely on fine detail beyond
what’s in the handouts (and occasional lectures).
MODULE REVIEW
What We Mainly Studied
 The nature of relational databases, the central modern type of
database. Entity types represented as tables, holding relations.
 Some basic mathematical concepts underpinning relational
databases, and useful also in many other branches of CS.
 Key aspects of how to develop the conceptual/logical design of
relational databases.
 In particular, how to achieve certain types of good structuring, to
help achieve certain types of correctness and efficiency.
 How to create and query databases using a particular database
language, PostgreSQL (a version of SQL: very widely used in
various forms).
Initial Considerations
What a database is, and how it relates to other types
of data structure/repository in CS.
Data integrity, data redundancy, data anomalies.
Associative links between parts of a database, as
opposed to pointing.
Ways data is stored/linked in physical human media
such as diaries, address books and timetables.
Various complications in tables in human
documents.
Restricted type of table used in relational DBs.
Entities and Relationships
Any relational DB as consisting of entity types
and relationships between them: Entity
Relationship Model (ERM) in general.
Specific ERMs for specific applications, and
distinction from Entity Relationship Diagrams
(ERDs).
Entity Types as represented by tables.
The question of what types of thing should
correspond to “entity types,” and hence tables,
depends on the application and your design
judgment.
Attribute Determination and Keys
One or more attributes determining another attribute. Can
also be described as that attribute being functionally
dependent on the former attributes.
Various notions of key, especially superkeys, candidate keys
and primary keys …
And foreign keys as the (main) implementation of
relationships between entity types.
Strength and Weakness
Strong and weak relationships. (Also called identifying
and non-identifying relationships respectively.)
Weak entity types, as defined according to the strength of
their relationships to other entity types and existencedependence with respect to those types.
Depiction of strength or weakness in different styles of
ERD.
Connectivity, Cardinality & Participation
Connectivity: uniqueness or multiplicity of entities at
either end of a relationship.
Cardinality: precise numerical info about how many
entities allowed or required at either end of a
relationship.
Participation: optionality or mandatoriness of a
relationship, in either direction.
Overlap between these notions.
Notation in ERDs.
Table Representation of Relationships
of Different Connectivities
 Basic case is 1:M non-recursive. (Recursive is when two or more
entity types in a relationship are the same.)
 1:M recursive—can often be handled within a single table.
 M:N, M:N:P, etc. standardly handled by breaking down into two,
three, etc. 1:M relationships going to a new entity type: a
“bridging” or “linking” type.
 Symmetric relationships (e.g. marriage). Special problems of
redundancy arise. Can be avoided by a special implementation
involving two extra tables, together serving a bridging function.

Symmetric relationships are recursive and are either 1:1 or M:N.
1-1 Relationships
 If you find yourself using a 1-1 relationship, especially when
unchanging, mandatory in both directions, and non-recursive: ask
yourself whether the two entity types could be combined into one
without causing difficulty.

E.g., if there were an unchanging 1-1 relationship, mandatory both ways
round, between employees and phone stations, you could probably combine
into one entity type without difficulty, increasing efficiency of some
operations.

But if the relationship changes frequently, easier to have 2 types/tables.

And if the relationship is optional in at least one direction, using one type
leads to more wasted space.
1-1 Relationships, contd
 Some good, standard uses of 1-1 relationships:

[As just mentioned] Cases where there is a significant amount of change or
optionality.

Subtype/supertype relationships: naturally 1-1; useful to keep the types
separate.

Symmetric 1-1 relationships such as marriage: only one entity type anyway.
Other Representation Issues
Multivalued attributes. OK in themselves in early stages
of design, but should eventually be broken down into
single-valued attributes in some way.
A main divergence in ways of doing this is based on
whether the different values are for stably identifiable
sub-attributes.
Generalization hierarchies. Exhaustiveness, disjointness.
Normalization
What normalization is and what role it plays in the
database design process.
The normal forms 1NF, 2NF, 3NF, BCNF.

4NF (two versions) was left as optional material.
How tables can be transformed from lower normal
forms to higher normal forms.
That normalization and ER modeling are used
concurrently to produce a good database design,
helping to eliminate data redundancies & anomalies.
That some situations benefit from non-normalization to
gain efficiency for some operations.
Creating ER Models/Diagrams
Designing an ER model for a database is an iterative
process, because, e.g.:

As you proceed, you think of new ways of conceiving what’s
going on (much as in ordinary programming)

Multivalued attributes need to be re-represented

M:N relationships can be included as such at an early stage,
but generally need to be replaced by bridging entity types at
some point

1:1 relationships raise a red flag, though may be justified.

Special supertype/subtype notation needs eventually to be
converted into more standard diagram notation, to correspond
to the actual tables used.

Conversion to Normal Form (NB: different parts of the DB
may have different needs)
SQL
Mainly, the module only covers basic SQL mechanisms for
querying, creating and updating tables.
See manual and textbook for much more if you want!
MATHEMATICAL VIEW
Tuples, Relations and Tables
A relation on some sets A, B, C, … is simply a set of tuples
all of the same length, where in each tuple the first element
is from A, the second from B, etc.
A relation is therefore a subset of the Cartesian product of
those sets.
A row is a tuple. Hence a table at any given moment
induces a relation over the value domains of the table
(augmented as appropriate with the value NULL).
The table consists of not just the induced relation but also
the attributes themselves, their domains, specification of
primary and foreign keys, etc.
Functionality of Relations
 Functional relation from A, B, …C to some sets:
for each choice of values from A, B, …, the relation contains at most
one tuple starting with those values.

Also called a partial function.
 Functional dependence relationship, i.e determination relationship
X, Y, …, Z  U within a table:
induces a functional relation from the value domains for X, Y, …, Z
to the value domain for the determined attribute U.
Relations from Entity Relationships
 The connection between relationships in ERMs and mathematical
relations.

E.g., the EMPLOYED-BY relationship from the People entity type to the
Organizations entity type says that
the database (at any moment) stores a relation from the People entity set to
the Organizations entity set.
 The connection between connectivity of a relationship between
entity types and the issue of whether the corresponding relation is
one-to-one, functional, etc.
 The connection between mandatoriness of a relationship from
entity set E to entity set F and notion of relation totality. (A
mandatory relation is total on the set of current E things.)
Some Operations on Sets in General
Union, intersection, difference, symmetric difference and
Cartesian product of two sets X and Y (of any sort).
When X and Y are relations:
Note the set of all possible concatenations of a tuple within
X and a tuple within Y.
I have called this the flattened Cartesian product of X and
Y, notated as X  Y as opposed to X  Y.
Often called the relational product in the DB world.
(Shouldn’t be, but often is, called the Cartesian product.)
Relational DB Operators &
Relational Algebra
Defines theoretical way of manipulating tables using
“relational DB operators” that mainly manipulate the
relations in the tables.
• SELECT
• UNION
• PROJECT
• DIFFERENCE
• JOIN (various sorts)
• PRODUCT
• INTERSECT
• (DIVIDE)
Use of relational DB operators on existing tables
produces new tables. Strong connection to SQL
commands/operators.
Relational algebra puts relational DB operators into a
mathematical notation that is more convenient than, e.g.,
SQL operators.
QUESTIONS?