E-R Modeling - CSCI 6442

Download Report

Transcript E-R Modeling - CSCI 6442

CSCI 6442
Entity-Relation Data Modeling
Copyright 2012, David C. Roberts, all rights reserved
2
Agenda
• Data Modeling
• Natural and Generated Keys
3
Relational Database
Remember that a relational database is a
collection of time-varying, independent relations
Each relation models some entity type in the
problem space (outside the computer)
There is a close correspondence between entity
types and relations
4
Data Modeling
• We can say that the relational database models
the real-world problem
• So construction of a relational database
becomes the selection of the entity types to put
into the data model
• Sometimes entity type selection is simple; other
times entity types are not apparent and selection
can be difficult
5
Notation for Data Models
• Because a relational database is comprised of
independent tables, how do you deal with relationships
during design?
• Chen developed the entity-relation data model, which
considered entity types and their relationships
• He expected that a new type of database system would
evolve, directly implementing the E-R data model
• Instead, the E-R notation was found to be an excellent
tool for relational database design and the relational
approach has taken over
• Since then, popular automated tools have tended to
change the notation for easier computer printing
6
Chen’s Notation
•
We will use Chen’s notation here for two
reasons:
1.
•
It separates the notion of a relationship from the
notion of an entity type
2. Conversion of the design into relational tables
occurs at the end of the design process, so the
difference between the data model and physical
design is clear
In your own work, once you are familiar with
the concepts, you will be able to use the
popular “crow’s foot” notation used by
DBDesigner4
7
Entities
• Strong Entity—has independent existence
Employee
• Weak Entity—exists only when the entity it
depends on exists
Dependent
Question: What are some
Examples of strong entity
types? Weak entity types?
8
Examples of Entity Instances
An instance of an entity is a specific occurrence of
an entity type:
• Bill Gates is an Employee of Microsoft
• Spam is a Product
• Greenpeace is an Organization
• Flour is an ingredient
9
Attributes
• Attributes are indicated inside ovals; identifier
attributes are underlined. For complex
diagrams, attributes are often omitted
EMPNO
Employee
JOB
Question: Which
attributes in this diagram
should be underlined?
SALARY
10
Examples of Attributes
An attribute is a characteristic of an entity type:
EmployeeID
Social Security Number
First Name
Last Name
Street Address
City
State
ZipCode
Date Hired
Health Benefits Plan
11
Multi-Valued Attribute
• A multi-valued attribute is shown as a double
line
EMPNO
Employee
JOB
Question: Which
attribute should have a
double underline?
SALARY
TELEPHONE
12
Relationship
A relationship is an association between two entity
types, for example:
▫
▫
▫
▫
A CUSTOMER places a CUSTOMER ORDER
An EMPLOYEE takes a CUSTOMER ORDER
A STUDENT enrolls in a COURSE
A COURSE is taught by a FACULTY MEMBER
• Some say that entity types should be nouns and
relationships should be verbs
13
Relationship
EMPLOYEE
WORKS FOR
DEPARTMENT
14
Categorizing Relationships
• Number of entity types participating:
▫ Unary: one
▫ Binary: two
▫ Ternary: three
• Existence of related instances(a.k.a. optionality):
▫ Mandatory
▫ Optional
• Cardinality of the relationship:
▫ One-many
▫ Many-many
▫ One-one
15
One-to-One Relationships
• Usually the only time a one-to-one relationship is used is for a
dependent entity. Otherwise, usually if there is a one-to-one
relationship, careful consideration will show that there is just one
entity type.
EMPLOYEE
1
HAS
1
SPOUSE
16
Relationship And Cardinality
• A relationship is an association between two or
more entity types, drawn as a diamond.
Relationships may be one-to-many, many-tomany or one-to-one
DEPARTMENT
1
HAS
N
EMPLOYEE
STUDENT
N
GRADE
N
COURSE
EMPLOYEE
1
HAS
1
COMPANY CAR
17
Recursive Relationship
• Recursive relationship is how a repeated
hierarchy is represented
Employee
WORKS FOR
Question: How else can this hierarchy be represented?
18
Relationship of Higher Degree
• A ternary relationship, also said to have degree 3.
STUDENT
N
TEACHER
N
COURSENO
REG
N
COURSE
GRADE
19
Optionality of Participation
DEPARTMENT
1
HAS
N
EMPLOYEE
STUDENT
N
GRADE
N
COURSE
EMPLOYEE
1
HAS
1
COMPANY CAR
20
Supertypes
• It can be useful to consider a supertype that
includes several entity types as subtypes
• Supertypes can be
▫ Complete: every instance of the supertype is one
of the subtypes
▫ Distinct: no single entity type can be a member of
two subtypes
21
Notation
• Chen’s notation for supertypes and subtypes is
cumbersome, so let’s use “crow’s-foot” notation
• A supertype is shown as a box surrounding the
boxes for all the subtypes
• Relationship lines end on the supertype or the
subtype boundaries as appropriate
22
Subtypes
• Subtypes can be either mutually exclusive
(disjoint) or overlapping (inclusive).
▫ For a mutually exclusive category, an entity
instance may be in only one subtype.
▫ For an overlapping category, an entity instance
may be in two or more subtypes.
• The completeness constraint: all instances of a
subtype must be represented in the supertype.
23
Example Subtypes
PARTY
PERSON
EMPLOYEE
Question: How would
this be implemented as
database tables?
PARTY(PARTYID)
PERSON(PARTYID,FNAME,MI,LNAME,DOB,POB)
EMP(PARTYID,EMPID,DHIRED,SAL,JOB,DEPTNO)
APP(PARTYID,DAPPLIED,STATUS)
APPLICANT ORG(PARTYID,ORGNAME,ORGCITY,ORGSTATE,ORG
ZIP)
CUST(PARTYID,CUSTID,DOFO,DOLO,LYVOL)
SUPP(PARTYID,SUPPID,DOFO,DOLO,LYVOL)
ORGANIZATION
CUSTOMER
SUPPLIER
24
Why Use Subtypes and Supertypes Like This?
• Relationships are simplified
▫ The relationship can take place at the highest
possible level in the hierarchy
▫ Fewer relationship tables are generally required
• Programming can be simplified
▫ All subtypes can be processed in the same way
▫ For example, write a check to an organization or a
check to a supplier with the same code
25
Relationships
DEPT
WORKS
IN
BELONGS
TO
26
Converting an ERD to Relational
• Recall that when we did an ERD we were not
designing tables, we were defining the problem
in terms of entity types and relationships
• Now the design must be translated into
relational tables
Question: What are the symbols on the ERD that will become tables in the
relational database?
27
Converting to Relational
• Every entity type becomes a table
• For one-many relationships, put the identifier of
the one with each of the many
• For one-one relationships, put the identifier of
each with the other
• For many-many relationships, identify an entity
type that connects them, and make it a table,
using the primary key of both partners as a
composite primary key
28
Conversion
• We see that every rectangle in the ERD will
become a table in the relational database
• Some of the diamonds will become tables and
some will not
29
Students and Courses
Grade
Student
Registration
Course
Name
StudentID
Credits
Number
Grad YR
Location
30
What About Subtypes?
• Each subtype and supertype is a separate table
• Common attributes are stored with the highest
level entity that shares them
• The root level supertype may have many
attributes, or it may have only a key
31
OK, How Do I Do It?
You will have your own ideas about how to proceed,
but here’s an idea:
1. Enumerate things in the problem space to keep
track of (these are entity types)
2. Enumerate what to keep track of about each thing
(these are attributes)
3. Enumerate relationships to keep track of
4. Draw a draft ERD and review it for difficulty of
doing the operations that are needed
5. Review it for possible simplifications and
alternative approaches
32
Here’s Another Approach
• List all the facts you want to keep track of and their
subjects
• Group your list by subject
• Consider each subject as an entity type, the facts
about it as attributes
• Be sure to use separate entity types for repeated
groups of facts
• Now draw your draft ERD and review it for difficulty
of programming the operations that are needed
• Review it for simplifications and alternate
approaches
34
Definitions
Natural key—a key that occurs in the data, that
uniquely identifies rows. AKA candidate key.
Generated key—a key that is generated at the time
data is inserted into the database, that did not
occur naturally
35
Postulate:
Question: does 3NF require that there be a singleattribute primary key?
In a normalized data model, every row has a natural
key
Proof: 3NF requires that the value of every attribute
be ffd by the primary key. By construction, primary
keys uniquely identify instances and are unique.
Therefore, each row has a different value for the
primary key and is hence unique.
Result: There is no reason to add a primary key to
normalized data in order to make rows unique.
They already are.
36
Generated Keys
In some organizations, programmers have the idea that a
generated key should be included in every relation, a
misunderstanding of the relational approach. Often they will
say “it’s to make rows unique” or “to provide a unique
identifier.” This is terribly wrong!
That’s a fine idea for an Excel spreadsheet but it shows a
misunderstanding of what a relation is and what a tuple is.
Tuples are naturally unique because each one corresponds to a
distinct instance of an entity type in the real world. You don’t
have to add anything to make them unique.
If your database design is correct, you have unique rows even if
you haven’t added a key to each row.
37
Making the Point Again, Again
• Each row corresponds to an occurrence of a
entity instance
• Each entity instance is unique; that’s why it is
tracked as an instance
• Therefore, every row is different from every
other row without adding anything to it
Question: what if entity instances are not
unique? How do we track them in a database?
38
Claim That’s Made for Generated Keys
• Some think that generated keys allow more flexibility
• For example, in our EMP table, can we allow a person’s
name to change?
• If we use a generated key, then the name can change and
all other associated data will still be associated with the
correct person
• What do you think of this statement?
• Can you use telephone number as the primary key? Why
or why not?
• Can primary keys ever change in value?
• And what if the primary key is a foreign key in another
table?
39
SQL and Primary Keys
• SQL allows primary keys to change; they are just
a value that changes
• What about foreign keys? Can they change?
• In the CREATE TABLE statement, you state ON
UPDATE or ON DELETE CASCADE,
RESTRICT, NO ACTION, SET NULL or SET
DEFAULT. For ON DELETE, DELETE can also
be an action
40
Limitation of SQL Database Systems
• In general, the primary key is not allowed to
have null value—which is fine
• But a composite primary key is not allowed to
have any part that is null
• The composite primary key limitation does not
correspond to reality, can force us to use a
generated key
41
Benefits of Natural Keys:
• Assures proper attention to the data model,
because unique values must be identified
• Tables have fewer columns
• Avoid storage of meaningless data
• The key value itself may have value to an
application, hence will avoid retrieval of the row
itself
42
When to Use Generated Keys
• Sometimes you have a compound key that is very
long
• Lots of queries will be very complicated and tables
with foreign keys will copy most of another table
• As a guideline, the only time to use a generated key
is when the natural key has at least three columns
• A special case of this rule is the case where the whole
row would have to be a composite key without a
generated key
• Example: internal person
43
Another Case for Generated Keys
• Suppose we don’t have any kind of single or
composite identifier that is always present
• Database systems generally don’t allow for part
of a designated composite key to be null
• Even though it’s legitimate, because of database
system limitations, you may have to use
generated keys if you don’t have a common
identifier without any null parts, ever
44
Example
• ID Card for entry to GWU buildings
• Students may not have SSN but all have student
number
• Contractors may not have GWU employee # but
all have SSN
45
The Bottom Line
• Don’t use generated keys. Ever.
• Unless you must!