What`s Wrong with ER Modeling

Download Report

Transcript What`s Wrong with ER Modeling

DAMA, 2001 December.
ORMvER
1
What’s Wrong With
ER Modeling?
Gordon C. Everest
Carlson School of Management
University of Minnesota
Problems and Solutions
ORMvER
2
OBJECTIVES FOR THIS PRESENTATION:
• Show several PROBLEMS with ER modeling schemes,
(actually, any “record-based” modeling scheme).
• Identify the ROOT CAUSE of the problem
To stop there would be irresponsible, so…
• Show you a better way – a SOLUTION
using Object Role Modeling (ORM)
• NOT asking you to abandon what you have learned about data
modeling and are doing in practice
• BUT to defer thinking in terms of entity records, and
to begin doing data modeling at a richer, more conceptual level
Data Modeling
DMOD
3
What is the Dominant Data Modeling Scheme today?
What’s Wrong with ER Modeling?
BEFORE WE CAN ANSWER THAT:
Why Do Data Modeling?
How do we do Data Modeling?
Why do we need Normalization?
Database Design
DMOD
4
Objective: (WHAT we are trying to do)
TO ACCURATELY AND COMPLETELY MODEL
SOME PORTION OF THE REAL WORLD
UNIVERSE OF DISCOURSE (UoD)
OF INTEREST TO SOME ORGANIZATION
OR COMMUNITY OF USERS.
Logical Database Design
Objective, Principles, Benefits
DMOD
5
· OBJECTIVE of LOGICAL DATABASE DESIGN:
TO ACCURATELY AND COMPLETELY MODEL
SELECTED PORTIONS OF THE REAL WORLD
OF INTEREST TO A COMMUNITY OF USERS.
• USERS (COLLECTIVELY) WILL ALWAYS KNOW MORE
ABOUT A DATA STRUCTURE THAN THE SYSTEM KNOWS,
OR THAN COULD BE DEFINED TO THE SYSTEM.
• WHAT IS NOT FORMALLY DEFINED TO THE SYSTEM,
THE SYSTEM CANNOT MANAGE . . . THE USERS MUST!
• THEREFORE, NEED TO CAPTURE RICH SEMANTICS
WITH COMPREHENSIVE DATA MODELING and DEFINITION,
INCLUDING INTEGRITY CONSTRAINTS AND OPERATIONS.
FOR ==> GREATER QUALITY & RELIABILITY IN DATA
==> GREATER USER CONFIDENCE.
==> HIGHER USER / DEVELOPER EFFICIENCY
Let the ‘system’ do it!
Purpose of Data Modeling (WHY we do it)
DMOD
DUAL, CONFLICTING PURPOSES DRIVE THE PROCESS:
6
USE
R
• Facilitate Human Communication, Understanding, & Validation
– capture and present meaning, the semantics of a model
– direct representation of only essential model semantics
PRESENTATION CHARACTERISTICS:
– scoping and presenting subparts of a Model
– unfolding presentation at different levels of abstraction or detail
– visual prominence in proportion to semantic importance
SECONDARY:
• Basis for Implementation - defining & creating a Database
– complete in all the necessary details
– construction/generation able to be fully automated
SCHEMA
DATABASE
Modeling
DMOD
7
(Re).present.(ation)
present
Reality
(mental models)
MODELING
PROCESS
MODEL
Re.present
Knowledge
in the head
Knowledge
in the world
Knowledge
externalized,
formalized,
shared.
What drives or guides the process?
The Modeling Process
DMOD
8
METHODOLOGY:
Steps/Tasks + Milestones + Deliverables
Real World
Universe of Discourse
+
MODELING SCHEME
perception
selection/filtering
Context
Constructs
Composition
Constraints
MODELING
PROCESS
REPRESENTATIONAL FORMS:
Narrative, Graphical Diagram,
Formal Language Statements
(the Syntax)
MODEL
A Data Modeling “Scheme”
DMOD
9
DEFINES the:
• Context
• Constructs (ENTITIES, OBJECTS)
• Collections, Compositions, Connections (RELATIONSHIPS)
• Constraints, Characteristics
WE LOOK FOR
IN THE “REAL” WORLD UoD or Domain of Interest
and
USE IN BUILDING A DATA MODEL.
Data Modeling Constructs
DMOD
10
What to look for:
Relative emphasis differentiates Data Modeling approaches
ENTITY
RELATIONSHIP
(OBJECT)
IDENTIFIER
ATTRIBUTE
characteristics
[ FOREIGN KEY ]
characteristics
Student-Course Database - Table Diagram
DMOD
11
Diagram of the Schema:
COURSE
Course#
Title
Description
Credits
INSTRUCTOR
SSN
LastName
FirstName
Address
Phone
Dept
COURSE
OFFERING
Course#
Year
Term
Section
Building
Room
Days
Time Start
Control
Enrollment
Instructor SSN
What if you move the arrow head
to the other end of the arc?
STUDENT
Student ID
Name
Address
Major
GPA
REGISTRATION
Course ID
Student ID
Grade
LEGEND:
ENTITY NAME (upper case)
Identifier (bold face)
Attributes (not bold face)
Foreign Key
Identifier
M:1 relationship
Student-Course Database – Populated
DMOD
12
Actual instances of data values:
COURSE:
Course# Title
Credits
ACC101 Intro Accounting
4
ENG101 English Composition 4
MIS101 Intro MIS
4
MIS103 Intro Database
4
MIS403 Advanced Database 2
…
COURSE OFFERING:
CRSO# Course# Year Term Sect Room
1004 MIS101 2000 Fall 001 1-142
1017 MIS101 2001 Spr 002 2-224
3001 MIS103 2000 Fall 001 2-207
…
INSTRUCTOR:
InstrID Name
33741 Allen, Lillian
85959 Boyd, Don
64578 Carlis, John
11248 Davis, Gordon
77004 Everest, Gordon
55432 Fine, Alan
…
Dept
Eng
ACC
CSci
IDS
IDS
IDS
Secondary (Composite) Key
STUDENT:
StudentID
1111111
2222222
3333333
4444444
5555555
…
InstrID
11248
55432
77004
Name
Major GPA
Able, Emma
MIS 3.4
Bright, Sue
MIS 3.9
Challenger, X
ACC 2.7
Dummie, Noe
ACC 3.2
Everest, Monty MIS 3.8
Enroll
48
60
27
REGISTRATION:
CRSO# StudentID
1004 4444444
1017 3333333
3001 1111111
3001 2222222
3001 5555555
3001 7777777
…
Grade
B+
B
A
A
A-
Data Modeling – Schema Diagram
DMOD
13
THINKING ABOUT ATTRIBUTES:
Record-Based:
ENTITY
IDENTIFIER
ATTRIBUTE
ATTRIBUTE
...
Essentials of ER Modeling / Diagramming
DMOD
14
ENTITY1
1
RELATIONSHIP
M
ENTITY2
Adding Attributes, omitting the Diamond:
identifier
ENTITY1
ENTITY2
============
IDentifier 1
--------------------Attribute 1.1
Attribute 1.2
Attribute 1.3
:
============
IDentifier 2
--------------------Attribute 2.1
Attribute 2.2
Attribute 2.3
ForeignID 1
:
ENTITY1
ENTITY
ATTRIBUTE
ATTRIBUTE
...
Attribute2
Attribute3
ENTITY2
IDENTIFIER
Attribute1
Attribute
ORMvER
15
What’s wrong with
ER Modeling?
________
ER / Record-based Modeling
DMOD
16
VALUE
DOMAIN
VALUE
DOMAIN
VALUE
DOMAIN
VALUE
DOMAIN
... roles
ID
TABLE:
X
ATTRIBUTES . . .
A
B
C
D
CLUSTERING of ATTRIBUTES into RECORDS/RELATIONS
– NOT a necessary or desirable first step
– gets us into trouble: if too much, must decompose to normalize
Record-based Design
ORMvER
17
WHAT SEMANTICS ARE PRESUMED
BY THE FOLLOWING RECORD STRUCTURE?
X
A
B
C
• What does it say about X ?
• What does it say about A ?
• What does it say about the relationship X – A ?
• What does it say about the relationship A – B ?
There are at least 14 distinct semantic statements you can make in answering these questions!
• Do we know it is in Third Normal Form (3NF)? How?
Record-based Design
ORMvER
18
WHAT DOES IT SAY ABOUT
X
A
X?
B
C
Record-based Design
ORMvER
19
WHAT DOES IT SAY ABOUT
X
A
A?
B
C
Record-based Design
ORMvER
20
WHAT DOES IT SAY ABOUT THE RELATIONSHIP
X
A
B
C
X–A ?
Record-based Design
ORMvER
21
REPRESENTING THE RELATIONSHIP
X
A
A
N
B
D ...
C
X–A ?
Record-based Design
ORMvER
22
WHAT DOES IT SAY ABOUT THE RELATIONSHIP
X
A
B
C
A–B ?
Record-based Design
ORMvER
23
REPRESENTING COMPLEX RELATIONSHIPS AMONG
X
A
?
A
...
B
C
B
A? ...
X, A, & B .
Separately consider the relationship
between A and B.
What if it is many-to-many?
What if other information is
functionally dependent on
A–B ?
Record-based Design - Compound Key
ORMvER
24
WHAT IS PRESUMED BY THE FOLLOWING RECORD STRUCTURE?
X
Y
A
B
C
Major Data Modeling Schemes
DMOD
25
Everest-DM-4p.121.
(1) SINGLE FILE (E-A)
- FLAT FILE “TABLE”
- HIERARCHICAL - nested repeating groups
e.g., COBOL
(M) MULTIFILE (E-R → E-A-R)
- NETWORK - hierarchical records
- RELATIONAL (E-A-[R]) - flat records
(O) NO FILE (O-R)
(No Clustering of Data Items into Records)
- NIAM/“Binary” Modeling
- ORM (Object-Role Modeling - Halpin)
RECORDBASED
(Clustered
Data Items)
Data Modeling Schemes
DMOD
26
CLASSIFIED by Degree of Clustering:
• No clustering
– NIAM/ORM - Nijssen, Halpin
• Clustering to One Level => Atomic Data Values
–
–
–
–
–
–
–
Relational Modeling - Codd
ER Modeling - Chen
Extended ER (EER) - Teorey
Information Engineering (IE) – Clive Finkelstein -> James Martin
Oracle (Designer*2000) - Barker
IDEF1X - Appleton, US Gov’t, ERwin (tool), Bruce (book)

• Nested Objects
–
–
–
–
–
–
Hierarchical data structure (single file; COBOL)
CODASYL Network (ANSI NDL)
Nested Relations
Semantic Object Modeling (SOM) – Kroenke, Salsa (tool)
Object Modeling (UML) – Rational Rose (tool)
ANSI SQL:1999

Data Modeling Schemes – Clustered
DMOD
27
HIERARCHIC
special case
NETWORK
ER
-
- single file
- nested repeating groups
- implicit hierarchical relationships
- multifile, hierarchical record
- defined relationships
=> semantic/ OBJECT models
Focus on E & R, hidden record structure
Usually flat records [optionally with attributes]
Defined relationships (general M:N)
Usually restricted to binary relationships
RELATIONAL
- Multifile; flat records only
- Relationships as foreign keys
so no M:N relationships
Taxonomy of
“Clusterered” Data Structures
DMOD
28
Single
File
Multiple
Files
Flat
SINGLE FLAT
FILE (“TABLE”)
RELATIONAL
(“TABLES”)
Nested
HIERARCHICAL
FILE
(CODASYL)
NETWORK
Intra-Record
Structure
Clustered
Stages of Data Modeling
DMOD
29
Start at the highest Conceptual Level!
USE
R
Domain
Knowledge
CONCEPTUAL
ER
CLUSTERED
ORM
“LOGICAL”
Attribs in Records RELATIONAL
• Objects
MultiValued,
• Obj. ID’s
PHYSICAL
Nested - - - - - -> Flat (1NF)
• Roles/Relships
Ternaries - - - - - -> Binary only
• Implementation
• (Fnl. Dep)
in/for a DBMS
M:N - - - - - - - - - -> 1:Many only
NO clustering
• Denormalize
Normalized (2,3,4) Primary Keys
(for performance)
=> NO “attributes” Relationships - - ->
Foreign Keys
+ triggers, stored
w/attributes
procedures
Sub/SupTypes
SCHEMA
DATABASE
Data Modeling - Representation Stages
DMOD
A SECOND CUT:
30
USE
R
NEW
• Conceptual (ORMHALPIN/NIJSSEN SUMMFULTON UDMCDMTG)

– only what the user knows or needs to know
– functional dependencies fully represented
– Elementary Facts - no clustering of “attributes” into “records”
• Clustered (ERCHEN EERTEOREY SDMMcLEOD SOMKROENKE SQL:99ANSI UML)
– identifiers (attributes or dependent relationships)
– keep: M:N, ternary relationships, super/subtypes,
attributed relationships, multi-valued items/rgroups
• “Logical” (RELATIONALCODD SQLANSI )
– flat files/tables; – stored identifiers; – 3NF (decompose)
– resolve: M:N, ternary, super/subtype relationships
– foreign keys to represent relationships
SCHEMA
• Denormalize (Recluster) - for performance
• Physical (IMPLEMENTATION in a DBMS)
– triggers, stored procedures, user code to
DATABASE
represent and enforce semantics beyond the DBMS.
Data Modeling Schemes - ER
DMOD
31
• ENTITIES, that have ATTRIBUTES,
and participate in RELATIONSHIPS.
• Originated with Peter Chen, 1976, TODS (1:1)
• Notation has evolved, many variations
– Drop diamond; attributes inside entity box or suppressed.
•
•
•
•
•
No standard syntax notation (but similar semantics)
Common: attributes clustered into entity records.
Most popular today
Weak entity Association entity Relationship naming: one name, direction unstated,
thus ambiguous; need direction (>) or rule (eg. left to right).
EMPLOYEE
EmpNo
M
EmpName
works in
…
1
UnitNo
DEPT
Name
…
Data Modeling Schemes - Oracle
DMOD
32
•
•
•
•
In Oracle Designer*2000 tool (R. Barker, A-W, 1990)
A flavor of ER modeling
ENTITY in rounded box; optionally ATTRIBUTES inside
ATTRIBUTE flags: # - [part of] identifier
* - mandatory
o – optional
• RELATIONSHIPS: - binary only
- two names at end from which to be read
- optional ---, mandatory —, many
- identifying ———, fixed ———
EMPLOYEE
EmpNo (#)
EmpName (*)
Address (o)
works in
employs
DEPT
Data Modeling Schemes - IE
DMOD
33
•
•
•
•
•
•
Information Engineering (1970’s)
Due to Clive Finkelstein, adapted by James Martin
Used in several tools: IEF, IEW/ADW/Cool, ICES, …
Widely used, many variations, no single standard
ENTITIES: in boxes, optionally with ATTRIBUTES, in or out
RELATIONSHIPS: - usually binary only
- many ——— , at most one ———
- optional ——— (at the “other” end)
- mandatory, at least one ———
EMPLOYEE
DEPT
Data Modeling Schemes – IDEF1X
DMOD
34
• U.S. Air Force/Defense (1970’s), Appleton eXtensions
•
•
•
•
•
NIST (U.S. Govt) standard – 1993; revised in IDEF1X97; IEEE - 1998
Book by T. Bruce, 1992; Used in ERwin (now from CA), Visio, …
Widely used in and for U.S. Govt work, some outside
Some Relational restrictions: Foreign Keys, thus no M:N
“Unnecessarily complex, confusing, and forgettable” - Halpin
• ENTITY: independent , dependent • ATTRIBUTE flags: - Alternate Key - (AKi), Foreign Key - (FK)
- optional (O) – mandatory is default
• RELATIONSHIPS: - binary only, “child” ——— (may be arbitrary)
- First Name always read toward the child
- identifying —— , non identifying ----- “cardinality” on child: P - one or more, Z - zero or one, n - exactly n
----- Parent is optional (some allow many parents)
EMPLOYEE
EmpNo
EmpName
SS# (AK1)
Address (O)
UnitNo (FK)
employs/
works in
DEPT
DeptNo
DeptName
:
Forming a Relational Data Structure
RELSQL
35
Some rules:
• Define a TABLE or “Relation” for each Entity type
– Types of Entities: base/reference, dependent (“weak”),
association/intersection, event/transaction
– Assumes mutually exclusive (non-overlapping) populations
• SINGLE-VALUED ITEMS (“flat” tables)
– If multivalued or nested repeating group of items,
put into a separate table
• IDENTIFIER for every table (entity “integrity”)
• FOREIGN IDENTIFIERS to represent all relationships
1:M - stored in the child / dependent entity
1:1 - should probably merge into one table
M:N - must introduce an association/intersection table
• NORMALIZE to second and third normal form
– important for good design
– but not enforced by RDBMS... WHY?

Functional Dependency in Relationships
RELSQL
36
Basis for Database Normalization.
A  f (X)
is functionally dependent on
X
A
determines
X
A
A is dependent on X, and the Relationship is exclusive on A, multiple on X.
Clustered into a Record/table for entity of X:
X
A …
There can only be one
There can be multiple
There can be different
A for each X .
Xs for a given A .
As for the Xs .
Database Normalization
RELSQL
37
Start with ENTITIES, their IDENTIFIERS (unique keys)
and their ATTRIBUTE FIELDS (facts about each entity).
i.e., start with data items clustered into records/tables.
PROBLEM: we may do it wrong; cluster too much; some items in the
wrong place, which can lead to redundancy & update anomalies.
Any Flat File is a Relation, but… not all Relations are “well-formed.”
• NORMALIZATION is the test
– a set of rules to perform internal validation of a data model
• Record DECOMPOSITION is the remedy.
– Removing attributes from the entity record, and placing them in
a different, often a new entity record
(1) First Normal Form: no multivalued items or rgroups.
(2) Second Normal Form: no partial dependencies.
(3) Third Normal Form: no transitive dependencies.
“Every non-key data item must be single-valued, and dependent upon
the key, the whole key, and nothing but the key… so help me Codd.”
Anomalies
RELSQL
38
Resulting from (clues to) poor database design:
EMPLOYEE#
o
•
•
•
•
•
EMPNAME
SKILL
PROFICIENCY … BOSSNAME
DEPT#
DEPTNAME
DEPTNAME and BOSSNAME stored redundantly
if EMPLOYEE moves to another DEPT#, DEPTNAME
and BOSSNAME would also change, needing update.
If a DEPTNAME (or BOSSNAME) for a DEPT changes,
must update all occurrences, else inconsistency.
To delete a DEPT you must also delete all its
EMPLOYEEs (unless null foreign keys allowed!)
If you delete the last EMPLOYEE in a DEPT, you also
delete that DEPT (unless null keys allowed!…multiple?)
No place to insert a DEPT# and its DEPTNAME, if
there are no EMPLOYEEs there.
Summary of all Normal Forms
RELSQL
39
GIVEN:
– a set of attributes, clustered into tables/records with identifiers
– all functional dependencies on the attributes
• No multi-valued, non-key attributes (1NF)
• No partial dependencies on non-key attributes (2NF)
• No transitive dependencies in non-key attributes (3NF)
• No partial or transitive dependencies within any key (EKNF, BCNF),
i.e., consider all candidate keys.
• No multiple, independent multi-valued attributes in the same table
(4NF)
• No join dependencies, i.e., a relation can be reconstructed without
loss of information by joining some of its projections (5NF).
• No more than one table with the same key (“minimal”).
• No transitive dependencies across tables (“optimal”).
NOTE: number order is artificial, i.e., there is no necessary sequence
to the normal forms.
Normalization – Testing your Understanding
RELSQL
40
Assuming that A is single valued with respect to X (i.e. 1NF).
GIVEN:
Could you have a violation of: (if not, why not?)
X
A
X
A
X
A
X
A
2NF?
3NF?
4NF?
B
2NF?
3NF?
4NF?
B
2NF?
3NF?
4NF?
B
What does this diagram mean?
How does this differ from diagram above, if any?
MUST DISTINGUISH THE PRIMARY KEY .
Representing a M:N Relationship
DMOD
41
Another Pattern:
EMPLOYEE
PROJECT
• If you cannot store multiple Projects (or Project IDs) in an
Employee record, or multiple Employees (or Employee IDs) in a
Project record (as is the case in a Relational Database), then …
you must introduce an “Intersection Entity” between them to
represent the Many-to-Many Relationship.
EMPLOYEE
EMPL-ID
PROJECT
PROJ-ID
• The Intersection Entity also provides the place
to store additional attributes of the relationship
e.g., Hours Worked, Rate of Pay, …
What is the problem with this representation?
Representing a Ternary Relationship
DMOD
42
While we can develop a consistent notation for binary relationships,
ternary relationships are a problem.
EMPLOYEE
SKILL
PROFICIENCY
• If one of the entities is single valued,
is it really ternary? Or “attributed” binary?
• What lends uniqueness to each instance
of the relationship?
• How to verbalize the relationship? Which order?
• How to represent Multiplicity / Exclusivity ?
• How to represent Dependency? Must have all 3?
What’s Wrong with ER Modeling?
ORMvER
43
I will show you still
a more excellent way
– PAUL, I Cor 12.31
N
Record-based Design
B
ORMvER
44
WHAT DOES THIS “RECORD” REPRESENT?
X
X
A
X
B
X
C
A
B
C
Design minimal records
with at most one non-key domain.
Now what do these “records” represent?
Perhaps Codd was right in naming it a _________!
Avoids spurious associations, e.g., A – B …
Could there be any violations of normal forms?
What about the representation of the entity
What if
A
is related to other “entities”?
X?
Transform Record-based (ER) Design
ORMvER
45
TO REALLY REPRESENT THE ENTITY DOMAINS
X
A
B
C
A
X
A
X
B
X
C
Object
Role
Model:
X
B
C
Data Modeling
ORMvER
46
THINKING ABOUT ATTRIBUTES:
Record-Based (ER):
ENTITY
IDENTIFIER
ATTRIBUTE
ATTRIBUTE
Object-Role (ORM):
ENTITY
...
ENTITY
ENTITY
ENTITY
ENTITY
(id)
ENTITY
ENTITY
ENTITIES have ATTRIBUTES / DESCRIPTORS
by playing roles in relationships with other entities.
Record-Based Modeling
ORMvER
47
GIVEN TWO FACTS (conceptually):
• one about the CITY a PERSON lives in
• another about the CITY a PERSON works in
ASSUME:
• every person has to live and work in a city
• each person can live and work in only one city at a time
• not interested in anything more about persons or cities
EXAMPLE:
• Gordon Everest lives in Falcon Heights and * works in Minneapolis
DIAGRAM A CONCEPTUAL DATA MODEL
– to represent this information (a database to contain these facts)
Record-Based Data Model
ORMvER
48
for PERSON lives in / works in a CITY
• What is the entity and what is the attribute?
• Would it make any sense to say (to a novice layperson - a user):
– CITY was an "attribute" of PERSON?
• Doing more than is necessary at the conceptual level
PERSON
PersonID [key]
LiveCity
WorkCity
•
•
•
•
•
cannot have CITY and CITY as attributes of PERSON
column/attribute name reflects " entity + role "
CITY as an entity/object is lost (not its own table)
what if there is a CITY where no one lives or works
some add concept of a DOMAIN
Object-Role Model
ORMvER
49
for PERSON lives in / works in CITY
lives in
CITY
PERSO
N
(id)
(name)
works in
FORML language statements:
• PERSON lives in CITY
• Every PERSON lives in some CITY
• Each PERSON lives in at most one CITY
• ... for works in
FACT
Record-Based Modeling
ORMvER
50
for an additional fact.
• A PERSON makes sales calls in multiple CITIES
DIAGRAM the extended conceptual data model
• can you add an attribute "SalesCallCities" to PERSON?
FLAT Record-Based Modeling is even worse:
• create a new table SALESCALLS with a compound key
– Is this a real entity in the conceptual view?
EXTEND THE OBJECT-ROLE DATA MODEL
Record-Based Data Modeling
ORMvER
51
DISADVANTAGES:
• no way to capture the conceptual view directly
• must mentally map from conceptual view to the
"logical" (record-based) view
– by structural groupings of attributes and relationships
• must choose unique, arbitrary names
– for attributes in a record; for spurious new "entities"
•
•
•
•
cannot reuse attributes in the same table
must do your own normalization
hides or ignores inter-attribute relationships
creates (implies) spurious inter-attribute relationships
Object-Role (ORM) Data Modeling
ORMINTRO
52
THE ESSENTIAL DIFFERENCE:
• Three main constructs ..rolled into.. Two main constructs
Record-based modeling:
ENTITY
NIAM/ORM modeling:
? ? ? ?
What to call it?
ATTRIBUTE
OBJECT
ENTITY
ENTRIBUTE!
RELATIONSHIP
Role in
RELATIONSHIP
Data Modeling Terminology
ORMINTRO
53
O-R
E-R
("conceptual")
("logical")
COBOL/DBTG
("physical" implementation)
ENTITY (TYPE)
RECORD TYPE
ATTRIBUTE
DATA ITEM
OBJECT
(ELEMENT)
FACT
SENTENCE
INSTANCE
RECORD
IDENTIFIER
PREDICATE
RELATIONSHIP
CONSTRAINT CHARACTERISTICS
RELATIONAL
RELATION
TABLE
COLUMN
FIELD
ROW
TUPLE
KEY
"SET"
FOREIGN KEY
CONSTRAINT
Fact Sentence - Verbalize
ORMODLG
54
• A Fact = a Predicate + Object(s) => Sentence
• THINK: Objects playing Roles in a Relationship
• Naming: object instances versus object types
– e.g. “Ann” is an instance of “Person”
• Arity - the number of object “holes” in the Predicate
– UNARY:
- “Ann smiles”
- only 2 states: true/false, present/absent, yes/no
- making the closed world assumption
– BINARY:
- “Ann likes to run”
- most common
- has an inverse- “Running is liked by Ann”
- Inverse name is never the same (else symmetric, handled differently)
– TERNARY:
- “Ann married Bob in 1967”
with types:
- “PERSON married PERSON in YEAR”
- verbalizing can be difficult with more than 2 (sequence problem)
Symbolize: ORM Constructs
ORMODLG
55
• OBJECT (ENTITY, CONCEPT) - NOUN … in an ellipse
• PREDICATE (RELATIONSHIP) - verb = role name …in a box
– unary, binary, ternary, +++
Binary Predicate:
PREDICATE
OBJECT1
role12 role21
OBJECT2
Elementary Binary Fact Sentence:
works in employs
PERSON
Verbalization:
“PERSON works in DEPARTMENT”
“DEPARTMENT employs PERSON”
DEPARTMENT
Adding ORM Constraints
ORMODLG
56
works in employs
PERSON
DEPARTMENT
Verbalization:
“PERSON works in DEPARTMENT”
“DEPARTMENT employs PERSON”
DEPENDENCY (MANDATORY):
“PERSON must work in some DEPARTMENT”
EXCLUSIVITY (UNIQUENESS):
“PERSON works in at most one DEPARTMENT”
Methodology
Steps in OR Modeling
ORMINTRO
57
•
•
•
•
•
Familiarize with real world Universe of discourse
Verbalize sentences of elementary facts
Symbolize build the conceptual ORM model diagram
Constrain the roles in predicates
Validate the conceptual data model
• Map into neutral, record-based, logical tables
• Refine the table definitions
• Generate
physical database definition for target DBMS
VisioModeler Architecture
ORMINTRO
58
FORML fact sentences
Population
Quick Facts
Tables
FACT EDITOR
CONCEPTUAL
DATA MODEL
DIAGRAMMER
correct
VERBALIZER
VALIDATE
BUILD
DICTIONARY
DICTIONARY
"REPOSITORY"
"LOGICAL"
DATA MODEL
(CHECK)
refine
(TABLES)
BROWSER
GENERATE
PHYSICAL DATABASE
STRUCTURE & DEFINITION
for a target DBMS
Levels of Abstraction in NIAM/ORM
DMODPRE
59
REMOVING (generally in order of importance):
1. Lexical Object Types (LOTS); Value Object Types
2. “Terminal” Object types – equivalent to / become “attributes”
IF: – play only functionally dependent roles (often only one role)
i.e. One:Many relationships; (disjunctive) mandatory (implied)
3. Common Object Types - generic value domains / ref. modes
4. “Event” Object Types
5. Dependent (“weak”) Object Types
- Subtypes, Objectified Facts
6. User-defined priority levels on Object Types
7. Constraints and Reference Modes
8. Predicates
Sample, Simple ORM Data Model
DMODPRE
60
earns
EMPLOYEE
(number)
works in
BOSS
SALARY
(dollars)
paid to
DEPT
(number)
employs
supervises is headed by
reports to
superior to
ac
may spend up to of spending for
LIMIT
"EmployeeSkill!"
{ 1000 .. 9999 }
possesses
<=5
possessed by
SKILL
(code)
has
DESCRIPTION
(name)
is of
{ 1 .. 10 }
with proficiency of assigned to
A major criticism of
NIAM / ORM, both
by protagonists and
proponents, is that
it is too detailed, a
bottom-up design,
BUT… ER Diagrams
usually omit the
details of attributes
and most constraints.
So, present the model
using top-down
abstractions.
RATING
Remove "Terminal" (M:1) Objects
ORM Abstractions
DMODPRE
61
• Removing "Terminal" (M:1) Objects
{ 2000 .. 2999 }
EMPLOYEE
DEPT
works in employs
(number)
BOSS
(number)
supervises is headed by
reports to superior to
ac
"EmployeeSkill!"
{ 1000 .. 9999 }
possesses possessed by
<=5
SKILL
(code)
Remove Constraints and Reference Modes
ORM Abstractions
DMODPRE
62
• Removing Constraints and Reference Modes
EMPLOYEE
works in employs
BOSS
DEPT
supervises is headed by
reports to superior to
possesses possessed by
SKILL
Remove Less Important Objects & Predicates
– Subtypes, Objectified Predicates, Reflexive Relationships
ORM Abstractions
DMODPRE
63
• Removing Less Important Objects & Predicates
– Subtypes, Objectified Predicates, Reflexive Relationships
EMPLOYEE
works in employs
supervises is headed by
SKILL
Remove Predicates
DEPT
ORM Abstractions
DMODPRE
64
• Removing Predicates
EMPLOYEE
SKILL
DEPT
... Leaving BASE Entities!
A Top-Level Abstract Conceptual Data Model
an ER Diagram ? ! ! !
Language Design Criteria
ORMQURY
65
See: Halpin, “Conceptual Queries”.
• Semantic Strength, Expressiveness
– Able to model all relevant details in the domain
– The range of queries that can be expressed
– The “100% Principle”
• Semantic Clarity
– Ease of Understanding and Use; intuitive
– Unambiguous, i.e., only one possible meaning
• Semantic Relevance
– Only relevant information need be stated
– Not dependent on artificial or spurious expressions
• Semantic Stability, Independence
– How well the model/query retains its original intent
in the face of changes to the underlying application
Conceptual Query Language
ORMQURY
66
See: Halpin, “Conceptual Queries”.
• ConQuer
– Based on ORM
– Need not be familiar with ORM or its notation
“user can construct a query without any prior
knowledge of the schema” but…
– In the form of a textual outline
- Indentation is significant
– Implemented in Visio ActiveQuery
- Object pick list – drag to the query window
- Roles pick list – drag to the query window
– Projection – items to display marked with a tick ()
– Mapping to SQL
Sample ConQuer Query (1)
ORMQURY
67
See: Halpin, “Conceptual Queries”.
“List Employees who live in
the City that is the
Location of Branch 52”
Employee
(number)
Branch
(number)
lives in /
City
is located in / is location of
CityName
/ has
U
State
 Employee [number]
/ is in
(code)
+– lives in City
+– is location of Branch [number =] 52
NOTE: City acts as a Join object type (the common “attribute”),
i.e. Employee and Branch are joined through City.
Semantic clarity (+), semantic relevance (+), semantic stability (+).
SQL for Sample ConQuer Query (1)
ORMQURY
68
See: Halpin, “Conceptual Queries”.
“List Employees who live in
the City that is the
Location of Branch 52”
Employee
lives in /
(number)
Branch
(number)
is located in / is location of
CityName
In SQL: (Where are the tables?)
City
/ has
U
State
(code)
/ is in
SELECT
EmployeeNumber
FROM
Employee, Branch
WHERE
Employee.CityName = Branch.CityName
and
Employee.StateCode = Branch.StateCode
and
Branch.BranchNumber = 52
Could you do this in Access using the Query Form?
Semantic clarity (-), semantic relevance (-), semantic stability (-)
Suppose an Employee could live in more than one City???
Suppose we now wish to record the Population of Cities???
Problems with ER Modeling - Summary
ORMvER
69
• Too much clustering; attributes in the wrong place
• Ignores (presumes) intra-record structure
(that is, inter-attribute relationships)
• Human modeler is responsible for normalization
remedy is always record decomposition
• Attribute migration… to become an entity
- modeler must distinguish attributes and entities
• Naming columns = domain + role, loses domain objects
• Modeling dilemma:
– Complete representation of an entity object - more clustering
– Full normalization (1NF) – decomposition, less clustering
• Indirect representation of M:N relationships
– Introduces artificial “new” entities
• Difficulty representing Ternary relationships
• Stability of the query language (SQL)
At the Root,
ORMvER
70
What’s wrong with
ER Modeling?
CLUSTERING
Gordon C. Everest
Carlson School of Management
University of Minnesota
Why NIAM/OR Modeling?
ORMINTRO
71
• roots in both LOGIC & LINGUISTICS
• based on one modeling construct: the fact sentence
• more expressive, understandable - diagrams & verbalization
• diagrams can be populated with actual data samples
• abstraction levels equivalent to E-R modeling
• more, richer semantics (than E-R, EER, IDEF1X)
• capture and represent all functional dependencies
• avoids normalization problems with record-based modeling
• better meets criteria for good data modeling
• organizations that switched wouldn’t go back to E-R
• direction of Standards
(SUMM, UDM, ...)
• now supported with a viable PC-based CASE tool
Resources on ORM
ORMvER
72
BOOK:
• Terry Halpin (now from Microsoft),
Information Modeling and Relational Databases:
From Conceptual Analysis to Logical Design,
Morgan Kaufmann Publishers, San Francisco, 2001, 763 pages.
WEB SITE for my course:
• http://webfoot.csom.umn.edu/faculty/everest/idsx431
– with ORM intro and further reading
– InfoModeler software download
– Usage Notes
SPRING CLASSES:
• IDSc 6431 (for MBAs)
• IDSc 4431 (for CSOM Undergrads)
• IDSc 4131 (for CCE and others)
TRAINING and CONSULTING:
• InConcept, Inc., Lake Elmo, MN
www.inconcept.com