Flawless Logi to Phys Modeling

Download Report

Transcript Flawless Logi to Phys Modeling

Flawless Logical to Physical
Data Model Transformations
Bert Scalzo, PhD.
Database Domain Expert
[email protected]
Copyright © 2006 Quest Software
About the Author …
Database Domain Expert & Product Architect for Quest Software
Oracle Background:
•
Worked with Oracle databases for over two decades (starting with version 4)
•
Work history includes time at both “Oracle Education” and “Oracle Consulting”
Academic Background:
•
Several Oracle Masters certifications
•
BS, MS and PhD in Computer Science
•
MBA (general business)
•
Several insurance industry designations
Key Interests:
•
Data Modeling
•
Database Benchmarking
•
Database Tuning & Optimization
•
"Star Schema" Data Warehouses
•
Oracle on Linux – and specifically: RAC on Linux
Articles for:
•
Oracle’s Technology Network (OTN)
•
Oracle Magazine,
•
Oracle Informant
•
PC Week (eWeek)
Articles for:
•
Dell Power Solutions
Magazine
•
The Linux Journal
•
www.linux.com
•
www.orafaq.com
Books by Author …
Coming in 2008 …
Agenda
• Purpose
– Identify issues arising from over reliance on modeling tools
– Illustrate Top 10 most common modeling issues faced when
transforming data models from logical (conceptual) to physical
– Describe how to correctly identify these issues
– Explain why these issues are serious problems
– Provide Best Practices to resolve these issues
• Overview
– Primary, Unique and Foreign Keys
– Inheritance (i.e. super/sub-types)
– Relationship Dependencies
– Normalization/Denormalization
World of Data Modeling …
• Bus. Analyst
• Data Architect
• Data Analyst
Logical Data Modeling
(.TXL file)
• DBA
• DB Developer
• DB Architect
Physical Data Modeling
(.TXP file)
Toad Data Modeler synchronizes data
models from all levels into a single tool
• Identify all data & relationships
- E/R (Entity/Rel’ship) diagrams
- Database independent view
• Business Rules
• Focus=Effectiveness
•
•
•
•
Database platform specific
Reverse engineer existing DB
Create/Update DB from model
Focus=Efficiency
10 Most Common Logical to Physical
Data Modeling Transformation Issues
Here we go…
1. Many to Many Relationships
•
•
•
You can NOT physically implement many to many relationships
You may potentially miss multiple key business requirements
Many modeling tools will attempt to automatically resolve this
1. Resolution
•
Need to accurately model true business requirements yourself in logical…
Intersection or
Bridging Entity
may have its own:
Attributes
Unique ID’s
Relationships
–Lookup
–Parent/Child
2. Avoid Partial Unique Keys
•
•
•
You SHOULD NOT physically implement partial unique keys
You may invalidate or corrupt key business requirements
Some Databases (i.e. Oracle) can surprise you on how works
Logically
Wrong
Only an issue for
composite unique keys
•Modeling tool generates initial physical database design
•Modeler/Architect/DBA often incorrectly modifies design
•Change column to allow Null to reduce constraints
Database will allow
unintended results
(see next slide)
Toad
Data
Modeler
will
prevent
2. Effect of Incorrect Change
Issue 3: Avoid Candidate
Key Loss
All these alternate or
candidate keys exist
due to some business
requirements
•
•
You SHOULD NOT lose candidate or alternate unique ID’s
You may potentially miss multiple key business requirements
3. Effect of Incorrect Change
•
Modeling tool generates initial physical database design:
Index Count = 4
• Modeler/Architect/DBA often incorrectly modifies design
• Remove indexes to increase efficiency, but now allow bad data 
Eliminated indexes to
increase efficiency at
the cost of business
requirements!!!
4. Avoid Surrogate Key Loss
Only an issue
when replacing
primary key with
surrogate/artificial
key
•
•
DO NOT lose unique ID’s when convert to surrogate keys
May potentially miss multiple key business requirements
– This is actually a special (extended) case of prior issue
4. Design Generated by Tool
•
Note that the two FK’s are part of the PK
Issue 4: Effect of Incorrect Change
•
Modeler/Architect/DBA often incorrectly modifies design
Wrong – lost alternate key
Right – has new & old keys
5. Avoid Partial Foreign Keys
Referential integrity requires that for each row of a child table,
the value in the foreign key matches a value in a parent key.
Only an issue for
mandatory, composite
foreign keys
 DO NOT physically implement partial foreign keys
 May invalidate or corrupt key business requirements
 Some Databases (i.e. Oracle) can surprise you on how works
5. Effect of Incorrect Change
 Modeling tool generates initial physical database design
Toad Data
Modeler
will
prevent
 Modeler/Architect/DBA often incorrectly modifies design
Oracle Concepts:
Partially null composite foreign keys are permitted. If any
column of a composite foreign key is null, then the non-null
portions of the key do not have to match any corresponding
portion of a parent key. That mean’s no RI check!!!
(same issue as unique keys)
6. Avoid Indirect Foreign Keys
Note there is no business
requirement to relate
Entity_1 to Entity_3
 You SHOULD NOT physically implement implied FK relationships
 You may potentially enforce invalid business requirements
 You may needlessly add additional performance overhead
6. Design Generated by Tool
6. Effect of Incorrect Change
 Modeler/Architect/DBA often incorrectly modifies design
Superfluous
FK and not a
true business
requirement
7. Avoid Bogus Foreign Keys
Referential integrity requires that for each row of a child table,
the value in the foreign key matches a value in a parent key.
Many conceptual (logical) modeling tools will not permit you
to construct questionable scenarios since the relationship lines
implicitly reflect the association. The physical details are not
really known until implementation, which is exposed during
the physical modeling process. But there the tools generally
permit DBA’s to apply their insight to make things better …
7. Effect of Incorrect Change
 Modeling tool generates initial physical database design
Toad Data
Modeler will
prevent
 Modeler/Architect/DBA often incorrectly modifies design
A foreign key
pointing to a non
primary or unique
key, what does that
mean???
8. Problematic Relationships
 Many relationships CAN be logically modeled and physically
implemented…
– BUT should they be???
 Example 1
I’m a peon, I
manage no one
(recurse no bottom)
 Example 2
I’m the CEO, I have
no boss
(recurse no top)
8. Problematic Relationships
 Example 3
Which came first?
(Circular Logic)
8. Problematic Relationships
 Example 4:
Should you perform “Unification” of FK’s???
Keep Both
or
Combine?
9. Using Normal Forms
•
“Normalization” is often quoted, but generally not very well understood.
Some quick facts:
– Goal is to minimize data redundancy in order to lessen the
likelihood of inconsistent data
– Side effect of reducing data storage needs
– But is this important given today’s cheap disk…
– Useful primarily in OLTP and ODS database designs
– Normal forms are cumulative (e.g. 2NF includes 1NF)
– Easy jingle to remember:
• “Depends on the key, the whole key, and nothing but
the key – so help me Codd”
9. Common NF Violations
1NF – Attributes are all single
valued, there are no repeating
groups or arrays
2NF – All non-identifying
attributes are dependent on
the entity's entire unique
identifier (only applies when
have compound unique ID’s)
3NF – A non-identifying
attribute CAN NOT be
dependent upon another
non-identifying attribute
10: Super and Sub Types
How should you physically implement super and sub types?
There are three valid options …
10. Option 1 - One Big Table
 Generate Parent = Y and Generate Children = N
 Requires Discriminator attribute (e.g. Account Type)
 Violates third normal form (… nothing but the key …)
 PRO: Easy to code against, just one big table …
 CON: All child columns optional, so need table check constraint
10. Option 2 - Table per Sub Type
 Results in N-1 tables
 Gen. Parent = N, Gen. Children = Y, Inherit All Attributes = Y
 PROS: All child columns implemented as expected
 CON: Two tables to code against …
10. Option 3 - Table per Super
and Sub Type
 Results in N tables
 Gen. Parent = Y, Gen. Children = Y,
Inherit Only Primary Attr. = Y
 NOT RECOMMENDED: Just Plain Overkill
Logical/Conceptual to Physical Transformation
•
Check List:
– Verify everything with the business analysts and end users
– Verify everything with the business analysts and end users
– Verify everything with the business analysts and end users
•
Use your software’s model checking utilities and/or reports
– Every entity must have unique identifier (as per Chen)
– Resolve many-to-many relationships (cannot be built)
– Double check isolated entities (i.e. no relationships)
– Look for very common, generic modeling patterns
•
Use your software’s generate physical model utility
•
NOTE – Generated physical model will require DBA review …
Refining the Physical Model
•
Check List:
– Verify that nothing was lost in translation from logical to physical
– Add table(s) required for implementation, but not modeled
• eg. Lookup tables
•
Use your software’s model checking utilities and/or reports
– Every table should have primary key
– Add foreign key relationship meta-data
– Add indexes to support data access needs (lots of work)
•
Use your software’s generate SQL or DDL script utility
•
REVIEW THE SCRIPT!
– Never just run SQL without looking at it
Parting Thoughts ???
•
Data Modeling tools do not automatically = good design
– Must do complete business analysis
– Must do adequate Conceptual -> Physical transformation
– Must add required physical meta-data (tuning & insight)
•
Many of the worst DB’s built result from failure to do the above
•
There are many other modeling issues – this was just a start …
– Breaking models into sub-models
– Round-trip Engineering:
• Conceptual -> Physical Model compare and sync
• Physical Model -> Database compare and sync
– Repository-based collaborative modeling
– Horizontal and Vertical Partitioning
– Data Warehousing (Star Schema design)
– Object-Relational Mapping
– etc, etc, etc …
Thank you
Please offer any questions or comments
Remember:
•Toad Data Modeler – data modeling for the rest of us 
•Robust, yet Inexpensive
•Both easy to learn & use
•www.quest.com/toad_data_modeler
Modeling White Papers
www.quest.com/documents/list.aspx?SearchOff=true&ContentTypeID
=1&prod=306
•Data Modeling: Common Mistakes and Their Impact
•Data Modeling: It's Really All About the Relationships
•Data Modeling: Reality Requires Super and Sub Types