Transcript primary key

IT For Engineers
Entity-Relationship Diagrams
INFO 203
Dr. Jennifer Booker
INFO 203
Week #4
1
Databases


Storing and manipulating data are the heart
of an information system
Data must be





INFO 203
Available when needed
Accurate
Consistent in its definition
Efficiently stored and retrieved
Presented in some useful form for planning,
making decisions, etc.
Week #4
2
Databases

There are two approaches for database
design – flat file and relational

A flat (or ‘conventional’) file database keeps
separate files for each application



A relational database stores data in a single
structure for use by many applications

INFO 203
COBOL systems, and FileMaker Pro
Like using Microsoft Excel as a database
Every major database system; Oracle, DB2, Access,
MySQL, MS SQL Server, Informix, Sybase, etc.
Week #4
3
Databases

Flat file systems are, by definition, redundant


Data is copied to each report table only when the
report is generated, so later changes make the
table obsolete
Relational databases keep the connections
(relationships) to each set of data


INFO 203
Each record exists once in the system
Is much more flexible to generate custom views
of the data, without changing its structure
Week #4
4
Reality, Data, Metadata

A database models some collection of
data which appears in reality


Reality is the thing being modeled
Data is used to describe the key attributes
of that thing


INFO 203
About what aspects of that thing do we wish to
keep track?
Metadata (data about data) describes the
characteristics of the data
Week #4
5
Data Modeling


Data modeling (or database or information
modeling) is a way of organizing and
describing the data in a system
It is a logical model to describe the specific
data fields (elements) we wish to capture,
and how they are related to each other
INFO 203
Week #4
6
Where to start?



Recall that data modeling starts with thinking
about the things involved in your system
These things are formally called “entities” –
nouns, if you will
Start by identifying all of the places,
people, events, and ideas which are
affected by your system
INFO 203
Week #4
7
Permanent vs. Transient Data


A key for relational data modeling is that we
are primarily concerned with data we need
to keep permanently
Data which is only needed briefly isn’t
modeled in an ERD

INFO 203
That’s a major difference between relational and
object-oriented analysis
Week #4
8
Lots of Types of Files






Master files contain permanent information
Transaction files describe business events
Document files contain copies of historic data
Archival files contain master and
transaction file records offline, which
are no longer needed
Table look-up files contain static data for
reference calculations (tax tables, Zip codes)
Audit files record changes to other files
INFO 203
Week #4
9
Characterize Entities


Then examine each entity and determine the
attributes which you are interested in – what
data do you need to know in order to
describe one such entity meaningfully?
Consider if some attributes can be readily
grouped together, thereby forming compound
attributes (e.g. name)
INFO 203
Week #4
10
Characterize Entities

Entities are generally one of two types:



A set of data you want to keep permanently
(customer orders, product information, etc.), or
A lookup list or table (types of status codes,
shipping rates, tax rates, etc.)
Data which is transient is generally kept in
local variables, and doesn’t appear in an
ERD (e.g. change of address info)
INFO 203
Week #4
11
Keep it or not?



In trying to decide if data needs to be kept,
consider whether someone might want
to analyze that data in the future
For examples, to look for sales patterns,
trace relocation history, keep record of
data changes (who modified what data
and when?)
When in doubt, keep it for now
INFO 203
Week #4
12
Attributive Entities

An entity which depends on the existence
of another entity, but only indicates multiple
examples of that ‘parent’ entity, is an
attributive entity


INFO 203
Hence it’s referred to as describing ‘repeating
groups of data’, or they are ‘parent/child’ entities
Shown using part of an oval inside the entity
box (p. 365)
Week #4
13
Database Notation

In week 1 we covered the basic notation
for cardinality






Only one
Zero or one
One or many
Zero, one, or many
Strictly many
Now we want to put all the two-entity pairs
together into a single diagram
INFO 203
Week #4
14
ERD


The Entity-Relationship Diagram (ERD)
captures all the entities for our system,
and shows the relationships among them
(the lines connecting them) (e.g. p. 366)
The relationships all need to have



A verb phrase in at least one direction
Cardinality at both ends
Now we can add attributes and keys
INFO 203
Week #4
15
Attributes

Each entity is almost certain to have at least
two attributes (often more)


At least one of them is a unique identifier for each
record in the entity, called the primary key
Other attributes describe significant
characteristics of the entity


INFO 203
So, about what aspects of the entity do you want
to retain knowledge? What is relevant for this
system?
Attributes are also called fields, data items, etc.
Week #4
16
Characterize Attributes

For each attribute, define its data type:





Text (“Fred”) [and the character set (Latin)]
Number (real (3.56) or integer (124))
Date and/or time
Yes/No (a.k.a. T/F, binary, or Boolean)
A fixed set of possible values (e.g. grades)


INFO 203
Formally known as an enumerated list
Multimedia: photos, drawings, movies, sounds
Week #4
17
Attributes & Records

Attributes can be fixed or variable length





Length of integer (8-64 bits, in powers of 2),
Lengths of text fields (the number of characters
or bytes it contains)
Length of real number fields and the number of
decimals (8.1 only has six digits, -12345.6)
We won’t worry about attribute lengths
A record is a set of attributes from one entity

INFO 203
Typically corresponding to one row in a data table
Week #4
18
Relevant Data Type Standards

Character sets



ISO/IEC 8859 – a set of ten 8-bit character sets,
each used for a different family of languages
(Latin, Cyrillic, Greek, Arabic, Hebrew, etc.)
Unicode – a 32-bit character set to represent
almost all languages
Representation of dates and times

INFO 203
ISO 8601
Week #4
19
Characterize Attributes

Other traits to look for, beyond the scope
of this course:

Identify the domain of each attribute




What is the range of allowable values?
Determine if there is a default value for
each attribute
Is each attribute mandatory (required) for each
entity? (Avoid many mandatory fields)
Ask yourself: could an attribute be a key?
INFO 203
Week #4
20
Key Attributes

An attribute or group of attributes may be
a unique identifier, or key, for each entity


Examples are Social Security Number,
driver’s license number, ISBN, Student ID
If a group of attributes is used, it is
a concatenated (or composite or
compound) key

INFO 203
{Course number, section number, and
term} could form a concatenated key
Week #4
21
Many Keys Possible




There might be more than one usable key
for an entity
Each possible key is called a candidate key
One candidate key is selected to be the
primary key (PK)
All others are alternate keys (AK)

INFO 203
Example: the electric company may use a
customer ID or account number as a primary key,
and your phone number or email address as an
alternate key
Week #4
22
Primary Key may be Meaningless

A primary key may correspond to some
recognizable attribute


Or it may be completely meaningless


SSN, student ID, ISBN, etc.
A sequential number, called Order_ID, or
customer ID, or account number
As long as the primary key is unique
for every record, either kind is acceptable
INFO 203
Week #4
23
Foreign Keys

A foreign key is an attribute which establishes
the connection between two entities



A Sales entity might have a foreign key attribute
Customer_ID, which points to all of the data in
the Customer entity for the particular customer
who placed a given sale
If that were the case, the primary key in the
Customer entity must be Customer_ID
This type of connection is the heart of
relational database modeling (!)
INFO 203
Week #4
24
Foreign Keys

A foreign key (FK) is an attribute which exists,
in an entity other than where it is
a primary key (PK), to establish the
relationship between the two entities



INFO 203
Primary key must be unique for each record, but
a foreign key value may appear many times
Only one PK-FK connection is used for
the relationship to exist
The entity with FK generally has a PK of its own
Week #4
25
Other Key Traits

A PK attribute may also be a FK


Especially for 1:1 relationships
An associative entity builds a concatenated
primary key from more than one entity

INFO 203
Uses a diamond shape inside the normal box
to show its special nature
Week #4
26
Other Relationships

A many-to-many (non-specific) relationship
implies a lot of one-to-many relationships


Often use an associative entity to bridge
between them
An identifying relationship is when a parent
entity’s PK is used as part of the PK for a
child entity

INFO 203
Child entity is then considered “weak”
because it depends on the parent
Week #4
27
Summary of Key Traits

The bottom line for keys is:

Each entity must have at least one PK






INFO 203
More than one PK implies a concatenated key
Alternate keys are completely optional
Each entity may have from zero to many FK’s
Each FK is a PK in another, related entity
Only one PK-FK relationship is needed to relate
two entities
Some keys are not inherently meaningful data
Week #4
28
Data Normalization

Analysis of a data model for implementation
is done using data normalization


Normalization organizes data attributes to
form simple, non-redundant, flexible,
adaptive entities
There are five levels of data normalization,
of which three are ever used

INFO 203
They build on each other; so to get to second
level you have to comply with first level also
Week #4
29
First Normal Form (1NF)



An entity is in first normal form if there are no
attributes which can have more than one
value for each instance (record) of the entity
Attributes which could have more than one
value for a given entity belong to a different
kind of entity
In other words, every attribute appears only
once for each record
INFO 203
Week #4
30
Second Normal Form (2NF)
Look at concatenated keys only!



Must be first normal form, and:
Each non-primary-key attribute is uniquely
determined by the entire primary key
Non-primary-key attributes may not be
dependent on only part of the primary key


If any are, move them to another table which uses
only that part of the primary key
If there are no concatenated keys, second
normal form comes automatically with 1NF
INFO 203
Week #4
31
Third Normal Form (3NF)


Must be second normal form, and:
The value of each non-primary-key attribute
is not dependent upon any other
non-primary-key attribute


Every attribute depends only on the primary key
The two ways to look for this are derived
attributes and transitive dependencies...
INFO 203
Week #4
32
Third Normal Form (3NF)

Derived attributes (data) are fields calculated
or logically derived from other fields


Exception: OK to keep attribute if multiple entities
are involved in deriving an attribute
Transitive dependencies may exist for
non-concatenated keyed tables; is when
a non-key attribute depends on another
non-key attribute
INFO 203
Week #4
33
Third Normal Form (3NF)

Or in brief, for third normal form…
An entity is in third normal form if every
non-primary key attribute is dependent on
the primary key, the whole primary key,
and nothing but the primary key
(as in, “Do you swear to tell the truth…”)
INFO 203
Week #4
34
Further Normalization



Additional improvement in data structure
is possible through “Simplification by
Inspection” - look for other redundancies
or simplifications possible
Many CASE tools can also inspect for first
level normalization, but generally no further
Just for the record, here are the 4th and 5th
normal forms

INFO 203
I’ve never seen them used in industry
Week #4
35
Fourth and Fifth Normal Forms

Fourth normal form (4NF) involves removing
multivalued dependencies


If a pair of records has two matching attributes,
decompose the data structure to remove that
Fifth normal form (5NF) involves removing
join dependencies (nearly impossible to do)

INFO 203
This is when business rules define a connection
among many entities (e.g. if you replace a tire,
you must also replace the valve stem)
Week #4
36
Other database concepts




Integrity constraints keep the database
consistent when records are changed or
deleted
Anomalies can occur when creating new
database tables
We won’t go into those in detail
You can forget about 4NF and 5NF, too
INFO 203
Week #4
37
Data Warehouses

A data warehouse is a set of databases that
organize data around subjects, generally
drawing from databases that were created
independently



INFO 203
Data is stored in summary form not raw records
Data are accessed via data mining applications
(siftware?) to look for hidden patterns
May answer complex questions using online
analytical processing (OLAP)
Week #4
38
Business Intelligence (BI)

BI is a decision support system (DSS) that
draws from data warehouses or big data
sources


INFO 203
May need to use text analytics to extract
information from unstructured data (not in a
data warehouse)
Much of the challenge for these systems is
designing the user interfaces
Week #4
39
Summary: How to create an ERD


Define the entities about which you’ll want to
keep information
Define the relationships between those entities


Define the attributes for each entity




INFO 203
Must have a verb phrase and both cardinalities
At a minimum, define the name and data type
of each attribute
Identify a primary key for each entity
Identify foreign keys as needed
Normalize to third normal form (and fix keys
as needed)
Week #4
40