Transcript Lecture 2

Conceptual Design with
ER Model
Lecture #2
1
Lecture Outline
• Logistics
• Steps in building a database application
• Conceptual design with ER model
2
Steps in Building a DB Application
• Step 1: Pick an application
– something where you think you will have quite a bit
of data
– need to manage multiple users who process/query the
data
• Step 2: Application analysis and conceptual
design
–
–
–
–
discuss what to model for the application
need a modeling language to express what you want
ER model is the most popular such language
output: an ER diagram of the application
3
name
category
Example ER
Diagram
name
cid
ssn
Takes
Course
Student
quarter
Advises
Teaches
Professor
address
name
field
4
Steps in Building a DB Application
• Step 3: Translate the ER diagram into a relational
schema
– using a set of rules
• Step 4: Refine the relational schema
(normalization)
– use a set of schema refinement rules to transform the
above rel. schema into a good rel. schema
• At this point
– you have a good relational schema on paper
5
Example Relational Schema
Students:
SSN
123-45-6789
234-56-7890
Takes:
Name
Charles
Dan
…
Category
undergrad
grad
…
SSN
123-45-6789
123-45-6789
234-56-7890
CID
CSE444
CSE444
CSE142
…
Courses:
CID
CSE444
CSE541
Name
Databases
Operating systems
Quarter
fall
winter
6
Steps in Building a DB Application
• Step 5: Pick a relational DB management system
(RDBMS)
– MS SQL server, IBM DB2, Oracle
– MySQL, PostreSQL
– SQLite, ...
• Step 6: Implement your DB in that RDBMS
– first, implement the schema
– next, add the tuples
– you can do all these using a language called SQL
7
Steps in Building a DB Application
• Step 7: Query your DB using SQL
– now that your DB is in place, you can query it using
SQL in the “command-line interface”
Find all courses that Mary takes
select C.name
from Students S, Takes T, Courses C
where S.name = “Mary” and
S.ssn = T.ssn and T.cid = C.cid
8
Steps in Building a DB Application
• Subsequent steps
– you may not want to access the DB just via the
command line interface
• ordinary users may not know how to interact with the
database directly & ask SQL queries
• and the database also cannot do everything you want; SQL is
a limited language (not Turing complete, can’t do recursive
stuff)
– hence you often want to write an application program
in C++, Java, Perl, etc to give lay users an easier way
to query the DB and take care of things that the
database cannot do
– many such applications user a Web interface to interact
9
with users
Database management system
(RDBMS)
DB 2
DB 1
Web browser
PhP code
interface
App 2
DB 3
10
11
Database management system
(RDBMS)
DB 2
DB 1
Lay users
PhP code
DB 3
Database developer
Database system administrator (DBA)
12
Steps in Building a DB Application
• Subsequent steps
– When the tables have many tuples, queries may run
very slowly
– so you may need to build indexes to speed up query
execution
– in the long run, need to maintain the DB, expand,
clean, etc.
• This class will discuss the above steps, and will
look into the internals of RDBMSs
13
We Will Start with Step 2
• Step 1: Pick an application
– something where you think you will have quite a bit
of data
– may need to manage multiple users who
process/query the data
• Step 2: Application analysis and conceptual
design
–
–
–
–
discuss what to model for the application
need a modeling language to express what you want
ER model is the most popular such language
output: an ER diagram of the application
14
ER Model
• Gives us a language to specify
– what information the db must hold
– what are the relationships among components of that
information
• Proposed by Peter Chen in 1976
• What we will cover
–
–
–
–
basic stuff: entities, relations
constraints
weak entity sets
design principles
15
Basic Concepts
name
category
name
price
makes
Company
Product
stockprice
buys
employs
Person
address
name
ssn
16
Entities and Attributes
• Entities
– real-world objects distinguishable from other objects
– described using a set of attributes
price
name
Product
name
category
Company
• Attributes
stockprice
– each has an atomic domain: string, integers, reals, etc.
• Entity set: a collection of similar entities
17
Relations
• A mathematical definition:
– if A, B are sets, then a relation R is a subset of A x B
• A={1,2,3}, B={a,b,c,d},
1
a
2
b
R = {(1,a), (1,c), (3,b)}
A=
c
3
B=
d
makes is a subset of Product x Company:
makes
Company
Product
18
ER Diagram
name
category
name
price
makes
Company
Product
stockprice
buys
employs
Person
address
name
ssn
19
More about relationships ...
20
Multiplicity of E/R Relations
• one-one:
• many-one
a
b
c
d
1
2
3
1
2
3
• many-many
1
2
3
a
b
c
d
a
b
c
d
makes
Company
Product
21
Multiplicity of E/R Relations
• one-one:
• many-one
a
b
c
d
1
2
3
1
2
3
• many-many
1
2
3
a
b
c
d
1
c
2
a
3
d
1
b
2
b
3
d
a
b
c
d
makes
Company
Product
22
Important: Difference with the
Cow Book (Raghu Book)
• We will use
Employee
Manages
Department
• Cow book use (see Page 33, 3rd edition)
Employee
Manages
Department
• You should use the notations in the lectures
23
Multiway Relationships
How do we model a purchase relationship between buyers,
products and stores?
Product
Purchase
Store
Person
Can still model as a mathematical set (how?)
24
• When creating an ER diagram, you will create
various relationships
• Then decide on the “cardinality” of those
– should they be 1-1, 1-many, many-1, many-many, etc
• Then specify these cardinalities on the ER
diagram
• You may or may not be able to specify the
*exact* cardinality requirements
– specially for multiway relationships
• May need to approximate
25
Example
Given the following relationship
Invoice
VideoStore
Rental
Movie
Person
Suppose invoice determines the rest
How would we specify this?
26
Example
There is no good way; here’s a possible solution
Invoice
VideoStore
Rental
Movie
Person
Q: Why is this incomplete ?
27
A: Because of the meaning of such an arrow sign
Invoice
VideoStore
Rental
Movie
Person
If I know the store, person, invoice, I know the movie too
28
Roles in Relationships
What if we need an entity set twice in one relationship?
Product
Purchase
Person
salesperson
Store
Person
buyer
29
• This is a bit confusing, because we have no idea
what role each Person entity would play
• So we should label the edges to indicate the roles
Product
Purchase
salesperson
Person
Store
buyer
Person
30
We can condense this further by
collapsing the two Persons into one
Product
Purchase
Store
buyer
salesperson
Person
31
Attributes on Relationships
date
Product
Purchase
Store
Person
32
Attributes of Entities and Relationships
always have atomic values!
33
Consider Attribute Phone of Professors
name
category
name
cid
ssn
Takes
Course
Student
quarter
Advises
Teaches
Professor
address
name
phone
34
Converting Multiway
Relationships to Binary
date
ProductOf
Product
StoreOf
Store
BuyerOf
Person
Purchase
35
Relationships: Summary
• Modeled as a mathematical set
• Binary and multiway relationships
• Converting a multiway one into many binary
ones
• Constraints on the degree of the relationship
– many-one, one-one, many-many
– limitations of arrows
• Attributes of relationships
– not necessary, but useful
36
Subclasses in ER Diagrams
name
category
price
Product
isa
Software Product
platforms
isa
Educational Product
37
Age Group
Subclasses
• Subclass = special case = fewer entities = more
properties.
• Example: Ales are a kind of beer.
– Not every beer is an ale, but some are.
– Let us suppose that in addition to all the properties
(attributes and relationships) of beers, ales also have
the attribute color.
38
Subclasses in ER Diagrams
• Assume subclasses form a tree.
– I.e., no multiple inheritance.
• Isa triangles indicate the subclass relationship.
– Point to the superclass.
39
Example
name
Beers
manf
isa
color
Ales
40
Constraints
• A constraint = an assertion about the database
that must be true at all times
• Part of the database schema
• Very important in database design
• When creating the ER diagram, you need to find
as many constraints as possible
• And specify them on the ER diagram
41
Modeling Constraints
Finding constraints is part of the modeling process.
Commonly used constraints:
Keys: social security number uniquely identifies a person.
Single-value constraints: a person can have only one father.
Referential integrity constraints: if you work for a company, it
must exist in the database.
Domain constraints: peoples’ ages are between 0 and 150.
General constraints: all others (at most 50 students enroll in a class)
42
Why Constraints are Important?
• Give more semantics to the data
– help us better understand it
• Prevent wrong data entry
• Allow us to refer to entities (e.g, using keys)
• Enable efficient storage, data lookup, etc.
43
Keys in E/R Diagrams
name
Underline:
category
price
No formal way
to specify multiple
keys in E/R diagrams
Product
Person
address
name
ssn
44
More about Keys
• Every entity set must have a key
– why?
• A key can consist of more than one attribute
• There can be more than one key for an entity set
– one key will be designated as primary key
• Requirement for key in an isa hierarchy
– not covered in this lecture
45
Single Value Constraint
• At most one value play a particular role
• An attribute of a entity set has a single value
– we can specify if the value must be present or can be
missing (represented with say NULL or -1)
– example in real-estate domain
• price vs. house-style
• A many-one relation implies single value const.
46
Referential Integrity Constraint
• Single value constraint: at most one value exists
in a given role
• Ref. int. constraint: exactly one value exists in a
given role
• An attribute has a non-null, single value
– this can be considered a kind of ref. int. constraint
• However, we more commonly use such
constraints to refer to relationships
47
Referential Integrity Constraints
• In some formalisms we may refer to other object
but get garbage instead
– e.g. a dangling pointer in C/C++
• the Referential Integrity Constraint on
relationships explicitly requires a reference to
exists
48
Referential Integrity Constraints
Product
makes
Company
Product
makes
Company
• This
will be even clearer once
we get to relational databases
49
Other Kinds of Constraints
•
•
•
•
Domain constraints
Constraints on degree of a relationship
Other more general constraints
See the book
50
Weak Entity Sets
Entity sets are weak when their key attributes come from other
classes to which they are related.
This happens if:
- part-of hierarchies
- splitting n-ary relations to binary.
affiliation
Team
sport
number
)
University
name
51
Weak Entity Sets
• Occasionally, entities of an entity set need “help”
to identify them uniquely.
• Entity set E is said to be weak if in order to
identify entities of E uniquely, we need to follow
one or more many-one relationships from E and
include the key of the related entities from the
connected entity sets.
52
Now, about design techniques ...
53
Design Principles 1: Be Faithful
Product
Country
Instructor
Purchase
President
Teaches
Person
Person
Course
54
Design Principles 2:
Avoid Redundancy
date
Product
Purchase
personAddr
Store
personName
55
Design Principles 3: KISS
date
Dates
Product
Purchase
Store
Person
• Read the book for more design principles
56
More on Design Techniques
(from Ullman's slides)
1. Avoid redundancy.
2. Limit the use of weak entity sets.
3. Don’t use an entity set when an attribute will
do.
57
Avoiding Redundancy
• Redundancy occurs when we say the same thing
in two different ways.
• Redundancy wastes space and (more importantly)
encourages inconsistency.
– The two instances of the same fact may become
inconsistent if we change one and forget to change the
other, related version.
58
Example: Good
name
Beers
name
ManfBy
addr
Manfs
This design gives the address of each
manufacturer exactly once.
59
Example: Bad
name
Beers
name
ManfBy
addr
Manfs
manf
This design states the manufacturer of a beer
twice: as an attribute and as a related entity.
60
Example: Bad
name
manf
manfAddr
Beers
This design repeats the manufacturer’s address
once for each beer; loses the address if there are
temporarily no beers for a manufacturer.
61
Entity Sets Versus Attributes
•
An entity set should satisfy at least one of the
following conditions:
–
–
It is more than the name of something; it has at least
one nonkey attribute.
or
It is the “many” in a many-one or many-many
relationship.
62
Example: Good
name
Beers
name
ManfBy
addr
Manfs
•Manfs deserves to be an entity set because of
the nonkey attribute addr.
•Beers deserves to be an entity set because it is
the “many” of the many-one relationship ManfBy.
63
Example: Good
name
manf
Beers
There is no need to make the manufacturer an
entity set, because we record nothing about
manufacturers besides their name.
64
Example: Bad
name
Beers
name
ManfBy
Manfs
Since the manufacturer is nothing but a name,
and is not at the “many” end of any relationship,
it should not be an entity set.
65
Don’t Overuse Weak Entity Sets
• Beginning database designers often doubt that
anything could be a key by itself.
– They make all entity sets weak, supported by all
other entity sets to which they are linked.
• In reality, we usually create unique ID’s for
entity sets.
– Examples include social-security numbers,
automobile VIN’s etc.
66
When Do We Need Weak Entity Sets?
• The usual reason is that there is no global
authority capable of creating unique ID’s.
• Example: it is unlikely that there could be an
agreement to assign unique player numbers
across all football teams in the world.
67
ER Review
• Basic stuff
–
–
–
–
entity, attribute, entity set
relation: binary, multiway, converting from multiway
relationship roles, attributes on relationships
subclasses (is-a)
• Constraints
– on relations
• many-one, one-one, many-many
• limitations of arrows
– keys, single-valued, ref integrity, domain & general
constraints
68
ER Review
• Weak entity set
• Design principles
– be faithful
– avoid redundancy
– KISS
69