The Entity-Relationship Model

Download Report

Transcript The Entity-Relationship Model

The Entity-Relationship Model
2
1
Database Design Process
Requirement collection and analysis

DB requirements and functional requirements
Conceptual DB design using a high-level model

Easier to understand and communicate with others
Logical DB design (data model mapping)

Conceptual schema is transformed from a high-level
data model into implementation data model
Physical DB design

Internal data structures and file organizations for DB
are specified
2
Overview of Database Design
Conceptual design: (ER Model is used at this stage.)





What are the entities and relationships in the
enterprise?
What information about these entities and
relationships should we store in the database?
What are the integrity constraints or business rules that
hold?
A database `schema’ in the ER Model can be
represented pictorially (ER diagrams).
An ER diagram can be mapped into a relational
schema.
3
The Relational Model
Relational Model [Properties]




Each relation (or table) in a database has a
unique name
An entry at the intersection of each row and
column is atomic (or single-valued);
there can be no multi-valued attributes in a
relation
Each row is unique;
no two rows in a relation are identical
Each attribute (or column) within a table has a
unique name
4
The Relational Model
Properties Cont’d
 The sequence of columns (left to right)
is insignificant;
the columns of a relation can be
interchanged without changing the
meaning or use of the relation
 The sequence of rows (top to bottom) is
insignificant;
rows of a relation may be interchanged
or stored in any sequence
5
The Relational Model...
The relational model of data has
three major components:
 Relational database objects
allows to define data structures
 Relational operators
allows manipulation of stored data
 Relational integrity constraints
allows to defines business rules and
ensure data integrity
6
The Relational Objects
Location
Most RDBMS can have multiple
locations, all managed by the
same database engine
Accounting
Accounts
Receivable
Corporate Database
Accounts
Payable
Accounting
Marketing
Sales
Advertising
Marketing
Purchasing
7
The Relational Objects
Location
Client Applications
Database Server
Multi-user
8
The Relational Objects...
Database
 A set of SQL objectsDatabase Server
Update Trigger
Client Application
UPDATE T SET
INSERT INTO T
DELETE FROM T
CALL STPROG
Table T
BEGIN
...
Insert Trigger
BEGIN
...
Table A
Stored
Procedure
BEGIN
...
Delete Trigger
BEGIN
...
Table B
9
The Relational Objects...
Database
 A collection of tables and associated
indexes
Index
Table
Employee
Table
Table
Product
Table
Department Customer
Files
10
The Relational Objects...
Relation
 A named, two dimensional table of
data
Database
 A collection of databases, tables and
related objects organised in a
structured fashion.
 Several database vendors use schema
interchangeably with database
11
Relational Objects...
Data is presented to the user as tables:
 Tables are comprised of rows and a
fixed number of named columns.
Table
Column 1 Column 2 Column 3 Column 4
Row
Row
Row
12
Relational Objects...
Data is presented to the user as tables:
 Columns are attributes describing an entity.
Each column must have an unique name and
a data type.
Employee
Name
Designation
Department
Row
Row
Row
Structure of a relation (e.g. Employee)
Employee(Name, Designation, Department)
13
Relational Objects...
Data is presented to the user as tables:
 Rows are records that present information
about a particular entity occurrence
Employee
Name
Designation
Department
Row
De Silva
Manager
Personnel
Row
Perera
Secretary
Personnel
Row
Dias
Manager
Sales
14
Relational model terminology
 Row is called a ‘tuple’
 Column header is called an ‘attribute’
 Table is called a ‘relation’
 The data type describing the type of values that can
appear in each column is called a ‘domain’
 Eg: Names : the set of names of persons
 Employee_ages : value between 15 & 80 years old
The above is called ‘logical definitions of domains’.
A data type or format can also be specified for each domain.
Eg: The employee age is an integer between 15 and 80
15
Characteristics of relations
 Ordering of tuples
 Tuples in a realtion don’t have any particular order. How
ever in a file they may be physically ordered based on a
criteria, this is not there in relational model
 Ordering of values within tuple
 Ordering of values within a tuple are unnecessary, hence a
tuple can be considered as a ‘set’.
 But when relation is implemented as a file attributes may be
physically ordered
 Values in a tuple are atomic
16
Relational constraints
 Domain constraints
 specifies that the value of each attribute ‘A’ must be an
atomic value. And from the specified domain
 Key constraints
 There is a sub set of attributes of a relational schema with
the property that no two tuples should have the same
combination of values for the attributes.
 Any such subset of attributes is called a ‘superkey’
 A ‘superkey’ can have redundant attributes. A key is a
minimul superkey
 If a realtion has more than one key, they are called candidate
keys
 One of them is chosen as the primary key
17
Relational Objects
Keys
 Primary Key: An attribute (or
combination of attributes) that uniquely
identifies each row in a relation.
Employee(Emp_No, Emp_Name,
Department)
 Composite Key: A primary key that
consists of more than one attribute
Salary(Emp_No, Eff_Date, Amount)
18
Relational Objects
Data is presented to the user as tables:
 Each table has a primary key. The primary key is
a column or combination of columns that
uniquely identify each row of the table.
Salary
Employee
E-No
E-Name
179
857
342
Silva
Perera
Dias
Primary Key
D-No
E-No
Eff-Date
Amt
7
4
7
179
857
179
342
1/1/98
3/7/94
1/6/97
28/1/97
8000
9000
7000
7500
Primary Key
19
Relational Objects
Data is presented to the user as tables:
 The cardinality of a table refers to the
number of rows in the table. The degree of
a table refers to the number of columns.
Salary
Salary Table
Degree
=3
Cardinality = 4
E-No
Eff-Date
Amt
179
857
179
342
1/1/98
3/7/94
1/6/97
28/1/97
8000
9000
7000
7500
20
Entity integrity, referential
integrity/foreign keys
 Entity integrity constraint specifies that no primary
key can be null
 The referential integrity constraint is specified
between two relations and is used to maintain the
consistency among tuples of the two realtions
 Informally what this means is that a tuple in one
relation that refers to another relation must refer to
an existing tuple.
 To define referential integrity we use the concept of
foreign keys.
21
Relational Objects
Relationship
 Foreign Key: An attribute in a relation of a
database that serves as the primary key of
another relation in the same database
Employee(Emp_No, Emp_Name, Department)
Department(Dept_No, Dept_Name, M_No)
22
Relational Objects
Data is presented to the user as tables:
 A foreign key is a set of columns in one table that
serve as the primary key in another table
Department
Employee
E-No
E-Name
179
857
342
Silva
Perera
Dias
Primary Key
D-No
7
4
7
D-No D-Name
4
7
Finance
Sales
M-No
857
179
Primary Key
Foreign Key
Recursive foreign key: A foreign key in a relation that
references the primary key values of that same relation
23
Relational Objects...
Employee
E-No
E-Name
179
857
342
Silva
Perera
Dias
Primary Key
D-No
7
4
7
Foreign Key
Rows in one or more tables are
associated with each other solely
through data values in columns
(no pointers).
Department
D-No D-Name
4
7
M-No
Finance
Sales
Primary Key
857
179
Foreign Key
Salary
E-No
Eff-Date
Amt
179
857
179
342
1/1/98
3/7/94
1/6/97
28/1/97
8000
9000
7000
7500
Foreign Key
Primary Key
24
Relational Objects
Index
 An ordered set of pointers to the data
in the table
Employee
E-Name
De Silva
Dias
Perera
Silva
Pointer
E-No
179
857
342
719
E-Name
Silva
Perera
Dias
De Silva
D-No
7
4
7
5
25
Index: Employee Name
E-Name
Alwis
Bandara
Costa
De Silva
Dias
Opatha
Peiris
Perera
Silva
Vaas
Wickrama
Zoysa
Pointer
Employee
E-No
E-Name
D-No
179
857
342
719
587
432
197
875
324
917
785
234
Silva
Perera
Dias
De Silva
Alwis
Costa
Zoysa
Peiris
Vaas
Bandara
Opatha
Wickrama
7
4
7
5
4
6
2
4
7
3
2
1
26
Search: Employee Dias
Index
Improves
performance.
Access to
data
is faster
E-Name
Pointer
Alwis
Bandara
Costa
De Silva
Dias
Opatha
Peiris
Perera
Silva
Vaas
Wickrama
Zoysa
27
Search: Employee Dias
Index
Opatha
Ensures uniqueness.
A table with unique fields
in the index cannot have
two rows with the same
values in the column or
columns that form the
index key.
Costa
Bandara
Silva
Dias
Perera
Wickrama
28
Search: Employee Dias
. De Silva . Perera .
. Bandara . .
. Opatha . .
. Alwis . . . Costa . . . Dias . .
. Peiris . .
. Vaas . .
. Silva . .
. Wickrama
. Zoysa .
29
Relational Database
STORE
Store Name | City
INVENTORY
Store Name | Part No | Quantity
ORDERS
Store Name | Part No | Vendor No | Order No | Quantity
PART
Part No | Description
VENDOR
Vendor No | Vendor Name
ORDERS
Store 1 | P3 | 3428 | 0052 | 10
Store 2 | P2 | 3428 | 0098 | 7
Store 2 | P3 | 3428 | 0098 | 15
Store 2 | P4 | 5726 | 0099 | 1
PART
P1 | Printer
P2 | Diskette
P3 | Disk Drive
P4 | Modem
STORE
Store 1 | Colombo
Store 2 | Kandy
INVENTORY
Store 1 | P1 | 50
Store 1 | P3 | 20
Store 2 | P2 | 100
Store 2 | P1 | 30
VENDOR
3428 | East West
5726 | DMS
30
ER Model Basics
Entity: Real-world object distinguishable
from other objects. An entity is described
(in DB) using a set of attributes.
Entity Set: A collection of similar entities.
E.g., all employees.



All entities in an entity set have the same set of
attributes. (Until we consider ISA hierarchies,
anyway!)
name
Each entity set has a key. ssn
Each attribute has a domain.
lot
Employees
31
ER Model Basics
ssn
name
lot
Employees
Key and key attributes:
 Key: a unique value for an entity
 Key attributes: a group of one or more attributes that
uniquely identify an entity in the entity set
Super key, candidate key, and primary key



Super key: a set of attributes that allows to identify and
entity uniquely in the entity set
Candidate key: minimal super key
• There can be many candidate keys
Primary key: a candidate key chosen by the designer
• Denoted by underlining in ER attributes
32
name
ER Model Basics (Contd.)
name
dname
lot
Employees
did
Works_In
lot
Employees
since
ssn
ssn
budget
Departments
supervisor
subordinate
Reports_To
Relationship: Association among two or more entities.
e.g., Jack works in Pharmacy department.
Relationship Set: Collection of similar relationships.

An n-ary relationship set R relates n entity sets E1 ... En;
each relationship in R involves entities e1 in E1, ..., en in En
• Same entity set could participate in different
relationship sets, or in different “roles” in same set.
33
Key Constraints
since
name
ssn
 Consider Works_In:
An employee can
work in many
departments; a dept
can have many
employees.
 In contrast, each
dept has at most
one manager,
according to the
key constraint on
Manages.
dname
lot
Employees
1-to-1
1-to Many
did
Manages
Many-to-1
budget
Departments
Many-to-Many
34
Example ER
Department
major
• An ER diagram
represents several
assertions about the real
world. What are they?
• When attributes are
added, more assertions
are made.
• How can we ensure they
are correct?
• A DB is judged correct if
it captures ER diagram
correctly.
faculty
Professor
advisor
offers
Courses
teaches
enrollment
Students
35
Participation Constraints
 Does every department have a manager?

If so, this is a participation constraint: the participation of
Departments in Manages is said to be total (vs. partial).
• Every Departments entity must appear in an instance of the
Manages relationship.
since
name
ssn
dname
did
lot
Employees
Manages
budget
Departments
Works_In
since
36
Weak Entities
 A weak entity can be identified uniquely only by considering
the primary key of another (owner) entity.


Owner entity set and weak entity set must participate in a oneto-many relationship set (one owner, many weak entities).
Weak entity set must have total participation in this identifying
relationship set.
name
ssn
lot
Employees
cost
Policy
pname
age
Dependents
37
name
ssn
ISA (`is a’) Hierarchies
lot
Employees
As in C++, or other PLs, hourly_wages hours_worked
ISA
contractid
attributes are inherited.
 If we declare A ISA B, every A
Contract_Emps
Hourly_Emps
entity is also considered to be a B
entity.
 Overlap constraints: Can Joe be an Hourly_Emps as well as
a Contract_Emps entity? (default: disallowed; A overlaps B)
 Covering constraints: Does every Employees entity also have
to be an Hourly_Emps or a Contract_Emps entity? (default:
no; A AND B COVER C)
 Reasons for using ISA:
 To add descriptive attributes specific to a subclass.
 To identify entities that participate in a relationship.
38

name
ssn
Aggregation
 Used when we have
to model a
relationship
involving (entitity
sets and) a
relationship set.

Aggregation allows us
to treat a relationship
set as an entity set
for purposes of
participation in
(other) relationships.
lot
Employees
Monitors
since
started_on
pid
pbudget
Projects
until
dname
did
Sponsors
budget
Departments
 Aggregation vs. ternary relationship:
 Monitors is a distinct relationship,
with a descriptive attribute.
 Also, can say that each sponsorship
is monitored by at most one employee.
39
Conceptual Design Using the ER Model
Design choices:



Should a concept be modeled as an entity or an
attribute?
Should a concept be modeled as an entity or a
relationship?
Identifying relationships: Binary or ternary?
Aggregation?
Constraints in the ER Model:


A lot of data semantics can (and should) be captured.
But some constraints cannot be captured in ER
diagrams.
40
Entity vs. Attribute
Should address be an attribute of Employees or an
entity (connected to Employees by a relationship)?
Depends upon the use we want to make of address
information, and the semantics of the data:
• If we have several addresses per employee, address
must be an entity (since attributes cannot be setvalued).
• If the structure (city, street, etc.) is important, e.g., we
want to retrieve employees in a given city, address
must be modeled as an entity (since attribute values
are atomic).
41
Entity vs. Attribute (Contd.)
 Works_In4 does not
allow an employee to
work in a department
for two or more periods.
 Similar to the problem
of wanting to record
several addresses for an
employee: We want to
record several values of
the descriptive attributes
for each instance of this
relationship.
Accomplished by
introducing new entity
set, Duration.
from
name
ssn
to
dname
lot
did
Works_In4
Employees
budget
Departments
name
dname
ssn
lot
Employees
from
did
Works_In4
Duration
budget
Departments
to
42
Entity vs. Relationship
 First ER diagram OK if
a manager gets a
separate discretionary
budget for each dept.
 What if a manager gets
a discretionary
budget that covers
all managed depts?


Redundancy: dbudget
stored for each dept
managed by manager.
Misleading: Suggests
dbudget associated with
department-mgr
combination.
since
name
ssn
dbudget
lot
Employees
dname
did
budget
Departments
Manages2
name
ssn
lot
dname
since
did
Employees
ISA
Managers
Manages2
dbudget
budget
Departments
This fixes the
problem!
43
Binary vs. Ternary Relationships
name
ssn
 If each policy is
owned by just 1
employee, and
each dependent
is tied to the
covering policy,
first diagram is
inaccurate.
 What are the
additional
constraints in the
2nd diagram?
pname
lot
Employees
Dependents
Covers
Bad design
age
Policies
policyid
cost
name
pname
ssn
lot
age
Dependents
Employees
Purchaser
Beneficiary
Better design
policyid
Policies
cost
44
Binary vs. Ternary Relationships (Contd.)
Previous example illustrated a case when two
binary relationships were better than one ternary
relationship.
An example in the other direction: a ternary
relation Contracts relates entity sets Parts,
Departments and Suppliers, and has descriptive
attribute qty. No combination of binary
relationships is an adequate substitute:


S “can-supply” P, D “needs” P, and D “deals-with” S
does not imply that D has agreed to buy P from S.
How do we record qty?
45
Summary of Conceptual Design
Conceptual design follows requirements analysis,

Yields a high-level description of data to be stored
ER model popular for conceptual design

Constructs are expressive, close to the way people think
about their applications.
Basic constructs: entities, relationships, and attributes
(of entities and relationships).
Some additional constructs: weak entities, ISA
hierarchies, and aggregation.
Note: There are many variations on ER model.
46
Summary of ER (Contd.)
Several kinds of integrity constraints can be expressed
in the ER model: key constraints, participation
constraints, and overlap/covering constraints for ISA
hierarchies. Some foreign key constraints are also
implicit in the definition of a relationship set.


Some constraints (notably, functional dependencies) cannot be
expressed in the ER model.
Constraints play an important role in determining the best
database design for an enterprise.
47
Summary of ER (Contd.)
ER design is subjective. There are often many ways
to model a given scenario! Analyzing alternatives
can be tricky, especially for a large enterprise.
Common choices include:

Entity vs. attribute, entity vs. relationship, binary or nary relationship, whether or not to use ISA hierarchies,
and whether or not to use aggregation.
Ensuring good database design: resulting
relational schema should be analyzed and refined
further. FD information and normalization
techniques are especially useful.
48