Logical Data Modelling

Download Report

Transcript Logical Data Modelling

44271: Database Design & Implementation
Logical Data Modelling
(Avoiding Database Anomalies)
Ian Perry
Room: C49
Tel Ext.: 7287
E-mail: [email protected]
http://itsy.co.uk/ac/0405/sem3/44271_DDI/
What is a Logical Data Model?

A ‘robust’ representation of the initial
decisions made when building our
Conceptual Data Model, which was
composed of:




Entities
Attributes
Relationships
When I say ‘robust’ I mean that this
model MUST ‘perform’ well with respect
to a specific style/type of software.
Ian Perry
44271: Database Design & Implementation: Logical Data Modelling
Slide 2
Database Theories & Software

Hardware independent, the match to
‘type’ of software is only concern, e.g.:




Hierarchical DBMS
Relational DBMS
Object-based DBMS
Each Database Theory addresses:



Ian Perry
Data Structure
Data Integrity
Data Manipulation
44271: Database Design & Implementation: Logical Data Modelling
Slide 3
Database Theory = Relational Model

First proposed by Dr. E. F. Codd in June 1970.



Codd's model is now accepted as the definitive model
for relational database management systems
(RDBMS).
Structured English QUEry Language ("SEQUEL") was
developed by IBM Corporation, Inc., to use Codd's
model.


Codd E F, (1970), A Relational Model of Data for Large
Shared Data Banks, Communications of the ACM, Vol. 13, No.
6, Pgs 377 – 387.
SEQUEL later became SQL.
In 1979, Relational Software, Inc. (now Oracle
Corporation) introduced the first commercial
implementation of SQL.

Ian Perry
SQL is the most widely used RDBMS manipulation language.
44271: Database Design & Implementation: Logical Data Modelling
Slide 4
Relations look like Entities, but …

Entity

Staff(SCode, Name, Address, DoB, …)


Relation


Staff(SCode, Name, Address, DoB, DoE)
Entity

Contract(CCode, Site, Begin, End, …)


May discover requirement for ‘extra’ Attributes, and also
need to ‘complete’ our list of Attributes for each Relation.
Can’t draw relationship lines, so need to ‘add’ extra
attributes to Relations at the ‘M’ end of any ‘1:M’
relationships; e.g. 1 Staff “take part in” M Contract.
Relation

Ian Perry
Contract(CCode, Site, Begin, End, SCode)
44271: Database Design & Implementation: Logical Data Modelling
Slide 5
Use Tables to ‘flesh-out’ your Logical Model

Staff(SCode, Name, Address, DoB, DoE)
SCode
9491
7416
8912
Name
Smith
Day
Jones
Address
6 Shaw St
2 Sale St
15 Ayr Av
DoB
13/02/65
14/01/57
28/12/76
DoE
03/10/98
22/11/02
01/03/04
 Contract (CCode, Site, Begin, End, SCode)
CCode
279
665
183
Site
Hull
York
York
Begin
27/02/05
14/09/04
04/03/05
End
03/03/05
02/12/04
16/06/05
SCode
9491
7416
9491
 NB. Tables ARE NOT Relations!
Ian Perry
44271: Database Design & Implementation: Logical Data Modelling
Slide 6
Primary & Foreign Keys

Most important Attributes in a Relation are
know as ‘Keys’:


Primary Key:


One, or more, Attribute(s) that identify a unique
occurrence of the ‘Entity’ that this ‘Relation’
represents.
Foreign Key:


of which there are two types.
Attributes used (i.e. instead of the lines of an ER
Diagram) to represent the presence of
relationships.
Often referred to as:

Ian Perry
The Primary/Foreign Key Mechanism.
44271: Database Design & Implementation: Logical Data Modelling
Slide 7
Attributes, Domains & Relationships

Attribute Values should be atomic (i.e.
simple/single values only); e.g.:


Set of eligible Attribute Values is known as an
Attribute’s Domain; e.g.:


‘address’ should be separated into ‘street’ & ‘town’
& ‘postcode’.
if we only have 100 members of staff, then the
Domain of the ‘SCode’ Attribute could be “whole
numbers between 1 & 100”.
The Relational Model is weak at explicitly
modelling relationships:

Ian Perry
Attributes in different Relations MUST HAVE
same Attribute Domain for relationship to be
possible.
44271: Database Design & Implementation: Logical Data Modelling
Slide 8
Codd’s Rules

Each Tuple (i.e. row) MUST BE unique, i.e.:


Therefore:


each Relation MUST HAVE a Primary Key.
There may be many Candidates for the job of
Primary Key, so select on basis of:


need a way to discriminate between Tuples.
uniqueness AND/OR minimality.
Keys with more than one Attribute:

Ian Perry
are know as composite keys.
44271: Database Design & Implementation: Logical Data Modelling
Slide 9
Rules for Integrity


No Attribute that is part of the Primary
Key can assume a ‘null’ value, else:
 how could we discriminate between
Tuples?
Foreign Key Attributes must take values
that are either ‘null’, or from same
Domain as the Primary Key Attribute to
which they are logically linked, else:
 we will lose the possibility of making
relationships.
Ian Perry
44271: Database Design & Implementation: Logical Data Modelling
Slide 10
Avoiding Database Anomalies

Most Database books have a section
describing a mathematically-based
technique called Normalisation:


I will show you a much easier way of
achieving the same result.
What we want to achieve is a ‘robust’
Logical Data Model; i.e. by:



Ian Perry
Transforming a Conceptual Data Model into
a set of Relations.
Checking these Relations for any Anomalies.
Documenting them as a Database Schema.
44271: Database Design & Implementation: Logical Data Modelling
Slide 11
What is an Anomaly?


Anything we try to do with a database
that may lead to unexpected and/or
unpredictable results.
Three types of Anomaly; i.e.:




insert
delete
update
Need to check your database design
carefully:

Ian Perry
the only good database is an anomaly free
database.
44271: Database Design & Implementation: Logical Data Modelling
Slide 12
Insert Anomaly

When we want to enter a value into a data cell
but the attempt is prevented, as the primary
key value is not known.
CoNo Tutor Room RSize EnLimit
353 Smith A532
45
40
351 Smith C320
100
60
355
Clark H940
400
300
456 Turner H940
400
45
 e.g. We have built a new Room (e.g. B123), but
it has not yet been timetabled for any courses
(so we don’t have a CoNo value).
Ian Perry
44271: Database Design & Implementation: Logical Data Modelling
Slide 13
Delete Anomaly

When a value we want to delete also means we
will delete values we wish to keep.
CoNo Tutor Room RSize EnLimit
353 Smith A532
45
40
351 Smith C320
100
60
355
Clark H940
400
300
456 Turner H940
400
45
 e.g. CoNo 351 has ended, but Room C320 will
be used elsewhere.
Ian Perry
44271: Database Design & Implementation: Logical Data Modelling
Slide 14
Update Anomaly

When we want to change a single data item
value, but must update multiple entries
CoNo Tutor Room RSize EnLimit
353
Smith A532
45
40
351
Smith C320
100
60
355
Clark H940
400
300
456 Turner H940
400
45
 e.g. Room H940 has been improved, it is now of
RSize = 500.
Ian Perry
44271: Database Design & Implementation: Logical Data Modelling
Slide 15
Conceptual Model & Translation Process

Conceptual Model:
Course
M
M
Staff
1
M
Student
Staff(Staff-ID, Name, ScalePoint, RateOfPay, DOB, ...)
Student(Enrol-No, Name, Address, OLevelPoints, ...)
Course(CourseCode, Name, Duration, ...)
 Translation Process:




Ian Perry
Entities become Relations
Attributes become Attributes(?)
Key Attribute(s) become Primary Key(s)
Relationships are represented by additional Foreign Key
Attributes;
 for those Relations that are at the ‘M’ end of each 1:M
Relationship.
44271: Database Design & Implementation: Logical Data Modelling
Slide 16
The ‘Staff’ & ‘Student’ Relations
Staff(Staff-ID, Name, ScalePoint, RateOfPay, DOB, ...)
becomes:
Staff(Staff-ID, Name, ScalePoint, RateOfPay, DOB)
Student(Enrol-No, Name, Address, OLevelPoints, ...)
becomes:
Student(Enrol-No, Name, Address, OLevelPoints, Tutor)
NB. Foreign Key Tutor references Staff.Staff-ID
Ian Perry
44271: Database Design & Implementation: Logical Data Modelling
Slide 17
The ‘Staff’ & ‘Course’ Relations
Staff(Staff-ID, Name, ScalePoint, RateOfPay, DOB)
Course(CourseCode, Name, Duration, ...)
becomes:
Course(CourseCode, Name, Duration)
NB. Can’t ‘simply’ add extra attributes to act as
Foreign Keys; as BOTH Relations have a ‘M’ end:
 I warned you about leaving M:M relationships
in your Conceptual Data Model.
MUST create an ‘artificial’ linking Relation.
Ian Perry
44271: Database Design & Implementation: Logical Data Modelling
Slide 18
‘Staff’, ‘Course’ & ‘Team’ Relations
Staff(Staff-ID, Name, ScalePoint, RateOfPay, DOB)
Course(CourseCode, Name, Duration)
Team(Staff-ID, CourseCode)
NB. In the ‘artificial’ Team Relation:
Primary Key is a ‘composite’ of CourseCode & Staff-ID
Foreign Key CourseCode references Course.CourseCode
Foreign Key Staff-ID references Staff.Staff-ID
Ian Perry
44271: Database Design & Implementation: Logical Data Modelling
Slide 19
4 Relations from 3 Entities?
Student(Enrol-No, Name, Address, OLevelPoints, Tutor)
Staff(Staff-ID, Name, ScalePoint, RateOfPay, DOB)
Course(CourseCode, Name, Duration)
Team(Staff-ID, CourseCode)
OK, BUT are they anomaly free?
• Is every Tuple unique?
• i.e. is there a Primary Key.
• Are the Attributes Atomic?
• i.e. do they store only ONE item of data.
• Does every Attribute within each Relation
‘depend’ upon the Primary Key?
Ian Perry
44271: Database Design & Implementation: Logical Data Modelling
Slide 20
What if the checks fail?

If any Relation fails ‘checks’:


we MUST split that Relation into
multiple Relations:


especially those checking dependency.
until they pass the tests.
but MUST remember to leave behind a
Foreign Key:

Ian Perry
to ‘point’ forwards to the Primary Key of
the ‘new’ split-off Relation.
44271: Database Design & Implementation: Logical Data Modelling
Slide 21
Are they Anomaly Free?
NOT very Atomic;
Could easily be split into
‘Street’, ‘Town’ & ‘PostCode’.
Student(Enrol-No, Name, Address, OLevelPoints, Tutor)
Staff(Staff-ID, Name, ScalePoint, RateOfPay, DOB)
Course(CourseCode, Name, Duration)
Team(Staff-ID, CourseCode)
Ian Perry
NOT Dependent
upon Staff-ID;
Requires a slightly
more complex ‘solution’.
44271: Database Design & Implementation: Logical Data Modelling
Slide 22
‘Fixing’ the Dependency ‘Problem’
Staff(Staff-ID, Name, ScalePoint, RateOfPay, DOB)

The Attribute ‘RateOfPay’ depends upon
‘ScalePoint’ NOT ‘Staff-ID’.

So, we MUST remove ‘RateOfPay’ from the ‘Staff’
Relation, like this:
Staff(Staff-ID, Name, ScalePoint, DOB)
Pay(ScalePoint, RateOfPay)
NB. In the ‘Staff’ Relation:
Foreign Key ScalePoint references Pay.ScalePoint
Ian Perry
44271: Database Design & Implementation: Logical Data Modelling
Slide 23
5 Relations from 3 Entities
Student(Enrol-No, Name, Street, Town, PostCode,
OLevelPoints, Tutor)
Staff(Staff-ID, Name, ScalePoint, DOB)
Course(CourseCode, Name, Duration)
Team(Staff-ID, CourseCode)
Pay(ScalePoint, RateOfPay)

Now all we need to do:

Ian Perry
Is to document our ‘Anomaly Free’
Relations as a Database Schema.
44271: Database Design & Implementation: Logical Data Modelling
Slide 24
Document Relations as a Database Schema

A Database Schema:


We may/should have ‘discovered’ a number of
constraints during our analysis of the Business
situation, e.g:




defines all Relations, lists all Attributes (with their
Domains), and identifies all Primary & Foreign
Keys.
the College only delivers 10 Courses.
there are only 12 Points on the Pay Scale.
Staff MUST be at least 21 Years Old.
These constraints can/should be expressed as
the ‘Domains’ of the Database Schema.
Ian Perry
44271: Database Design & Implementation: Logical Data Modelling
Slide 25
Logical Schema 1 - Domains


Schema College
Domains









Ian Perry
StudentIdentifiers = 1 - 9999;
StaffIdentifiers = 1001 - 1199;
GeneralNames = TextString (15 Characters);
Addresses = TextString (20 Characters);
PostCodes = TextString (7 or 8 Characters);
CourseIdentifiers = 101 - 110;
OLevelPoints = 0 - 100;
ScalePoints = 1 - 12;
StaffBirthDates = Date (dd/mm/yyyy), >21 Years
before Today;
44271: Database Design & Implementation: Logical Data Modelling
Slide 26
Logical Schema 2 - Relations

Relation Student









Ian Perry
Enrol-No: StudentIdentifiers;
Name: GeneralNames;
Street: Addresses;
Town: Addresses;
PostCode: PostCodes;
OLevelPoints: OLevelPoints;
Tutor: StaffIdentifiers;
Primary Key: Enrol-No
Foreign Key Tutor references Staff.Staff-ID
44271: Database Design & Implementation: Logical Data Modelling
Slide 27
Logical Schema 3 - Relations

Relation Staff

Staff-ID: StaffIdentifiers;
Name: GeneralNames;
ScalePoint: ScalePoints;
DOB: StaffBirthDates;

Primary Key: Staff-ID

Foreign Key ScalePoint references Pay.ScalePoint



Ian Perry
44271: Database Design & Implementation: Logical Data Modelling
Slide 28
Logical Schema ...

Relation Course





CourseCode: CourseIdentifiers;
Name: GeneralNames;
… etc.
Continue to define each of the Relations
in a similar manner.
NB. Make sure that you define ALL of
the Relations, including:


Ian Perry
‘artificial’ ones (e.g. Team)
‘split-off’ ones (e.g. Pay)
44271: Database Design & Implementation: Logical Data Modelling
Slide 29
This Week’s Workshop

The purpose of this week’s Workshop is
to practice developing ‘robust’ logical
data models that conform to the ‘rules’
of Codd’s Relational Model.



Ian Perry
Exploring the ‘definition’ of Relations.
Identifying potential anomalies in a Table
of data, and ‘solving’ these ‘problems’.
Documenting a Database Schema (i.e. a
Logical Model), in the format required by
Part 2 of the Assignment.
44271: Database Design & Implementation: Logical Data Modelling
Slide 30