ER Modelling

Download Report

Transcript ER Modelling

The Relational Model
01/28/2014 – Material from Chapter 4 (Chap2 and Chap3 make an appearance)
Project Proposal - ???’s
 Added some clarification.
 What questions do you have about the project?
 Data sets are available online at
 psql –h db.cs.jmu.edu vlds – to play with the data
 if already inside psql \c databaseName
Homework debrief
 Scripts – If all is working correctly, you can run through psql.
 psql –h hostname –f filename
 -- comment to end of line (inline comment)
 /* C-like comment, possibly multiple lines */ (block comment)
 easier to build a text file and cut and paste commands in. Easier to
change.
 Can also use the –e option in psql to echo queries sent to server to
standard output. > to pipe to a file.
Datatypes
 http://www.postgresql.org/docs/9.3/static/datatype.html
 Of note: varchar (n). Will save a maximum of n characters in
the corresponding field.
 If all of that space is not needed, then it will use less space.
 money
 date and time types
Review
2.2.1
acctNo
type
balance
12345
savings
12000.00
23456
checking
1000.00
34567
savings
25.25
a.
b.
c.
d.
e.
f.
g.
h.
attributes
tuples
one tuple from each relation
relation schema
database schema
a domain for each attribute
an equivalent relation
a possible key to the
relation
firstName
lastName
idNo
account
Robbie
Banks
901-222
12345
Lena
Hand
805-333
12345
Lena
Hand
805-333
23456
Review
Exercise – 2.2.1
Accounts(balance : float, acctNo : integer,
type : string)
acctNo
type
balance
12345
savings
12000.00
23456
checking
1000.00
34567
savings
25.25
a.
b.
c.
d.
e.
f.
g.
h.
attributes
tuples
one tuple from each relation
relation schema
database schema
a domain for each attribute
an equivalent relation
a possible key to the
relation
Customer(firstName : string, lastName : string,
idNo : string, account : integer)
firstName
lastName
idNo
account
Robbie
Banks
901-222
12345
Lena
Hand
805-333
12345
Lena
Hand
805-333
23456
Model (freeonlinedictionary.com)
A schematic description of a system, theory, or phenomenon
that accounts for its known or inferred properties and may be
used for further study of its characteristics: a model of
generative grammar; a model of an atom; an economic model; a
database model.
Gives us a way of understanding the database and the
interaction among its objects.
Design vs Implementation
Ideas
High level
design
Database
Schema
Relational
DBMS
The Relational Model itself
 The database is formed from a series of interconnected
tables.
 Each table has one or more attributes (fields).
 Each table has zero or more rows (tuples).
 Tables can be connected by keys.
How do we express that model in design?
 Various design tools
 ER Diagramming (Peter Chen – 1976) (Remember Codd’s
paper in 1970)
 Databases can be expressed as “entities” who are connected
by “relationships”.
From Chen’s paper
In the ER-Diagram Chen Style
Id
Entity
Teacher
Name
Teaches
Relationship
Number
Attribute
Classes
Class
Multiplicity
 Each connection can have a number associated with it.
 1:1 – one to one - each entity in one relation has at most one entity
in the other and vice versa.
 ex: a faculty member and their office.
 1:M – one to many – each entity in one relation has many related
entities in another.
 ex: one faculty member teaches many courses but each course is
taught by only one faculty member.
 M:M (often shown as M:N) many to many - each entity in one
relation may have many related entities in another and vice versa.
 ex: a supplier supplies many products and each product is supplied
by many suppliers.
In the ER-Diagram Chen Style
Id
Entity
Name
1
Teaches
Relationship
Number
Attribute
Teacher
M
Classes
Class
Instances
 Each ERD describes a database schema.
 If that schema is realized as a database, an instance of the
database is a snapshot of the database at a particular period
of time.
A database instance
Accounts(balance : float, acctNo : integer,
type : string)
acctNo
type
balance
12345
savings
12000.00
23456
checking
1000.00
34567
savings
25.25
Customer(firstName : string, lastName : string,
idNo : string, account : integer)
firstName
lastName
idNo
account
Robbie
Banks
901-222
12345
Lena
Hand
805-333
12345
Lena
Hand
805-333
23456
Keys (Chapter 3.1.2)
 “A key for an entity set E is a set K of one or more attributes
such that, given any two distinct entities e1 and e2 in E, e1 and
e2 cannot have identical values for each of the attributes in
the key.” Ullman
 A key is a set of attributes that uniquely define a row of a
table.
 There may be more than one key. One is selected as the
primary key.
 One additional requirement – No proper subset of the key
attributes can themselve be a key.
Keys
 Person(id, email, last, first, dob, SSN)
 id, email, last-first-dob, SSN may all be considered keys.
 id-email, last-first-dob-SSN would not be keys. In each, there is a
proper subset of elements that itself is a key.
 We would call id-email and last-first-dob-SSN superkeys…a set of
attributes that contain a key.
 Every key is a superkey but not every superkey is a key.
 We sometimes use the term candidate keys for the possible keys to
the relation. From the candidates, a single key is chosen.
Keys in ERD
composite key
title
Id
year
Movies
length
Own
Studios
Runs
Presidents
genre
name
Constraints (Referential Integrity)
Many-one relationship->optional tuple
Referential integrityrequired tuple
Movies
Own
Studios
Runs
Presidents
So what does this ERD tell us?
The curve arrow is a 1 relationship. The filled arrow is a
0..1 relationship. (I may have a studio that does not
currently have a president, but if I have a president, she
must run a studio.)
A studio may own 0..* Movies, but a Movie must be
owned by a single Studio.
Constraints (Referential Integrity)
Movies
0..m
Own
1
Studios
1
Runs
0..1
Presidents
Alternate expression of the same idea
The curve arrow is a 1 relationship. The filled arrow is a
0..1 relationship. (I may have a studio that does not
currently have a president, but if I have a president, she
must run a studio.)
A studio may own 0..* Movies, but a Movie must be
owned by a single Studio.
Other kinds of relationships - self
A movie may have many sequels, but a sequel can
have only one original.
original
Movies
sequel
of
sequel
Other kinds of relationships - multiway
Movies
pay
Stars
Contracts
Producing studio, the arrow
implies that there
is at most one producing studio
Stars’ studios, the arrow implies
a star can have at most one studio
Studios
A contract is between a star, the movie that star is in, the producing
studio and the star’s contracting studio.
Other diagramming tools
 Crow’s Foot
maximum cardinality
Movies
PK title
PK year
length
genre
own
Means, a movie must be owned
by one and only one studio. A
studio may own no or many
movies.
 UML – See Chapter 4.7
Presidents
PK Id
name
Studios
minimum cardinality
Means a studio may have one or
no presidents, but if there is a
president then she must preside
over exactly one studio.
Some other terms
 Weak entity (represented by double lines)
 An entity that depends upon the existence of another. Usually it
will have as one of its key components the key from another
table.
 Ex. A student-class table is a weak entity of student and class as
it depends on both.
 We will see that relationships are often implemented as tables
within a db. They become weak entities.
Design considerations
 While we will be looking more at design through the
semester, a few things of note:
 Faithfulness (or fitfulness) – The data should “faithfully”
represent the underlying real world model and be “fit” for the
application.
 Ex: Stars and Movies – Implies a many to many relationship
since movies have multiple stars and stars appear in multiple
movies.
 It also depends on the enterprise in which the data will be used.
 A school that has no “team” teaching could say one teacher
teaches many courses but a class can only be taught by one teacher
vs
 JMU which has team teaching so we have a many to many
relationship between teachers and courses.
Design
 Avoid redundancy – Redundant data is often inconsistent
data.
 Simplicity – Avoid extra elements or elements that do not
support the enterprise.
 Choosing the right relationships – Look at the relationships
among the entities and determine the best way to represent
them. (See diagram 143)
 What is right for the type of element? Is it an attribute of an
entity or a standalone entity in its own right? Is it an entity or
simply a relationship between entities.
Design
 Choose the right data elements (Atomicity)
 Do we make name an attribute or do we break it up into first
middle last. name can be derived from the individual data.
 Avoid derived data (Staleness)
 Data that can be derived from other data perhaps does not
belong in the db.
 If I have date of birth I can derive age at any time that I need it.
 If I have vendor transactions, I can derive the total amount
purchased.
Design – Referential Integrity
 Where do we want to enforce the notion of required tuples?
 When must we have a corresponding row?
 Sometimes we build tables to help us maintain data integrity.