Challenges in Natural Language Processing:
Download
Report
Transcript Challenges in Natural Language Processing:
Fundamentals/ICY: Databases
2012/13
REVISION WEEK
John Barnden
Professor of Artificial Intelligence
School of Computer Science
University of Birmingham, UK
Structure of Exam
One and a half hours. Do THREE Questions.
Qn 1: on SQL: obligatory (Fundamentals and ICY). Weight: 32%
Fundamentals:
Do two out of remaining three Questions. Weight: each 34%.
One is on maths (relations, relational algebra)
No maths in the other two questions.
ICY:
Do two out of remaining four Questions. Weight: each 34%.
One is on maths (relations, relational algebra).
No maths in the other three questions.
Hence Fundamentals and ICY students have same DEGREE OF
CHOICE (2/3) over the material they are EXPECTED to know.
Fundamentals students have more pressure to do the Maths question.
Structure of Exam, contd
The Maths question is on the connections of the maths
material to
SQL (what SQL operators do, translation to and from relational algebra).
database concepts (nature of tables, functional dependencies, operations on
tables, nature of relationships between entity types)
Material Needed for Exam
Lecture material except when specified as optional ((incl. by double brackets))
Required textbook reading:
Anything may be useful in the exam, except of course that a detailed memory of
specific, data-full examples is not expected, and except for some SQL detail (see next
slide).
All Additional Notes, except that:
The exam will NOT rely on the treatment of functional dependencies and normalization
there (in the 1st of the three parts in the Week 9 batch).
The exam will NOT rely on material on physical design (in the 3rd of the three parts in
the Week 9 batch).
All Exercise Answer Notes.
BUT NOT the content of Keerthi’s lecture on his industrial experience (but of
course it may help you in overall understanding of some issues).
Textbook Parts
See my module website (top page).
On SQL: the exam doesn’t rely on fine detail beyond
what’s in the handouts (and occasional lectures).
MODULE REVIEW
What We Mainly Studied
The nature of relational databases, the central modern type of
database. Entity types represented as tables, holding relations.
Some basic mathematical concepts underpinning relational
databases, and useful also in many other branches of CS.
Key aspects of how to develop the conceptual/logical design of
relational databases.
In particular, how to achieve certain types of good structuring, to
help achieve certain types of correctness and efficiency.
How to create and query databases using a particular database
language, PostgreSQL (a version of SQL: very widely used in
various forms).
Initial Considerations
What a database is, and how it relates to other types
of data structure/repository in CS.
Data integrity, data redundancy, data anomalies.
Associative links between parts of a database, as
opposed to pointing.
Ways data is stored/linked in physical human media
such as diaries, address books and timetables.
Various complications in tables in human
documents.
Restricted type of table used in relational DBs.
Entities and Relationships
Any relational DB as consisting of entity types
and relationships between them: Entity
Relationship Model (ERM) in general.
Specific ERMs for specific applications, and
distinction from Entity Relationship Diagrams
(ERDs).
Entity Types as represented by tables.
The question of what types of thing should
correspond to “entity types,” and hence tables,
depends on the application and your design
judgment.
Attribute Determination and Keys
One or more attributes determining another attribute. Can
also be described as that attribute being functionally
dependent on the former attributes.
Various notions of key, especially superkeys, candidate keys
and primary keys …
And foreign keys as the (main) implementation of
relationships between entity types.
Strength and Weakness
Strong and weak relationships. (Also called identifying
and non-identifying relationships respectively.)
Weak entity types, as defined according to the strength of
their relationships to other entity types and existencedependence with respect to those types.
Depiction of strength or weakness in different styles of
ERD.
Connectivity, Cardinality & Participation
Connectivity: uniqueness or multiplicity of entities at
either end of a relationship.
Cardinality: precise numerical info about how many
entities allowed or required at either end of a
relationship.
Participation: optionality or mandatoriness of a
relationship, in either direction.
Overlap between these notions.
Notation in ERDs.
Table Representation of Relationships
of Different Connectivities
Basic case is 1:M non-recursive. (Recursive is when two or more
entity types in a relationship are the same.)
1:M recursive—can often be handled within a single table.
M:N, M:N:P, etc. standardly handled by breaking down into two,
three, etc. 1:M relationships going to a new entity type: a
“bridging” or “linking” type.
Symmetric relationships (e.g. marriage). Special problems of
redundancy arise. Can be avoided by a special implementation
involving two extra tables, together serving a bridging function.
Symmetric relationships are recursive and are either 1:1 or M:N.
1-1 Relationships
If you find yourself using a 1-1 relationship, especially when
unchanging, mandatory in both directions, and non-recursive: ask
yourself whether the two entity types could be combined into one
without causing difficulty.
E.g., if there were an unchanging 1-1 relationship, mandatory both ways
round, between employees and phone stations, you could probably combine
into one entity type without difficulty, increasing efficiency of some
operations.
But if the relationship changes frequently, easier to have 2 types/tables.
And if the relationship is optional in at least one direction, using one type
leads to more wasted space.
1-1 Relationships, contd
Some good, standard uses of 1-1 relationships:
[As just mentioned] Cases where there is a significant amount of change or
optionality.
Subtype/supertype relationships: naturally 1-1; useful to keep the types
separate.
Symmetric 1-1 relationships such as marriage: only one entity type anyway.
Other Representation Issues
Multivalued attributes. OK in themselves in early stages
of design, but should eventually be broken down into
single-valued attributes in some way.
A main divergence in ways of doing this is based on
whether the different values are for stably identifiable
sub-attributes.
Generalization hierarchies. Exhaustiveness, disjointness.
Normalization
What normalization is and what role it plays in the
database design process.
The normal forms 1NF, 2NF, 3NF, BCNF.
4NF (two versions) was left as optional material.
How tables can be transformed from lower normal
forms to higher normal forms.
That normalization and ER modeling are used
concurrently to produce a good database design,
helping to eliminate data redundancies & anomalies.
That some situations benefit from non-normalization to
gain efficiency for some operations.
Creating ER Models/Diagrams
Designing an ER model for a database is an iterative
process, because, e.g.:
As you proceed, you think of new ways of conceiving what’s
going on (much as in ordinary programming)
Multivalued attributes need to be re-represented
M:N relationships can be included as such at an early stage,
but generally need to be replaced by bridging entity types at
some point
1:1 relationships raise a red flag, though may be justified.
Special supertype/subtype notation needs eventually to be
converted into more standard diagram notation, to correspond
to the actual tables used.
Conversion to Normal Form (NB: different parts of the DB
may have different needs)
SQL
Mainly, the module only covers basic SQL mechanisms for
querying, creating and updating tables.
See manual and textbook for much more if you want!
MATHEMATICAL VIEW
Tuples, Relations and Tables
A relation on some sets A, B, C, … is simply a set of tuples
all of the same length, where in each tuple the first element
is from A, the second from B, etc.
A relation is therefore a subset of the Cartesian product of
those sets.
A row is a tuple. Hence a table at any given moment
induces a relation over the value domains of the table
(augmented as appropriate with the value NULL).
The table consists of not just the induced relation but also
the attributes themselves, their domains, specification of
primary and foreign keys, etc.
Functionality of Relations
Functional relation from A, B, …C to some sets:
for each choice of values from A, B, …, the relation contains at most
one tuple starting with those values.
Also called a partial function.
Functional dependence relationship, i.e determination relationship
X, Y, …, Z U within a table:
induces a functional relation from the value domains for X, Y, …, Z
to the value domain for the determined attribute U.
Relations from Entity Relationships
The connection between relationships in ERMs and mathematical
relations.
E.g., the EMPLOYED-BY relationship from the People entity type to the
Organizations entity type says that
the database (at any moment) stores a relation from the People entity set to
the Organizations entity set.
The connection between connectivity of a relationship between
entity types and the issue of whether the corresponding relation is
one-to-one, functional, etc.
The connection between mandatoriness of a relationship from
entity set E to entity set F and notion of relation totality. (A
mandatory relation is total on the set of current E things.)
Some Operations on Sets in General
Union, intersection, difference, symmetric difference and
Cartesian product of two sets X and Y (of any sort).
When X and Y are relations:
Note the set of all possible concatenations of a tuple within
X and a tuple within Y.
I have called this the flattened Cartesian product of X and
Y, notated as X Y as opposed to X Y.
Often called the relational product in the DB world.
(Shouldn’t be, but often is, called the Cartesian product.)
Relational DB Operators &
Relational Algebra
Defines theoretical way of manipulating tables using
“relational DB operators” that mainly manipulate the
relations in the tables.
• SELECT
• UNION
• PROJECT
• DIFFERENCE
• JOIN (various sorts)
• PRODUCT
• INTERSECT
• (DIVIDE)
Use of relational DB operators on existing tables
produces new tables. Strong connection to SQL
commands/operators.
Relational algebra puts relational DB operators into a
mathematical notation that is more convenient than, e.g.,
SQL operators.
QUESTIONS?