Transcript Slide 1
Overall objective of class: Enhance knowledge
about the management of data resources
Conceptual Objectives:
What you “know” when you finish the
class
Understand what it means to “compete
on analytics”.
Understand the issues that affect the
success and failure of BI
Know the basic components of a
business intelligence/data
warehousing system.
Understand the objectives and design
goals of a transaction database vs. a
data warehouse.
Understand the role of an IT
professional in facilitating the use of BI
within an organization.
Skill Objectives:
What you can “do” when you finish
the class
Analyze an organization’s approach to
business intelligence.
Use ERD modeling tool to document
the design of a transaction database
and data warehouse.
Design a data warehouse.
Design a data mart.
Use SQL Server to develop a
prototype BI solution for an
organization.
Class Objectives – Next 3 Weeks
Better understand the processes in the database design
life cycle.
Enhance personal knowledge of database design.
Enhance personal ability to design databases for
transaction processing.
Learn how to design a database targeted at decision
support (data warehouse and data mart).
Database Design Life Cycle
Requirements
Analysis
Database
Modification
Logical Design
Metadata Repository
Database
Implementation
Physical Design
Data
Distribution
Metadata repository
Database to store metadata.
Repository contains:
Business information: context, business rules, policies, data owner.
Technical information: field names, data types, sizes, security,
currency.
Transformation information: calculations, aggregations.
Application information: Processes used to create and update data,
processes used to access data, processes used to download data, data
quality, data edit/validation.
Best collected automatically rather than through human
responsibility and input.
Administration of the Repository
Requires an automated repository with a
database and tools for creation, update and
access.
Requires an administrator to:
Decide what should be stored and develop the structure for
storage.
Create policies and procedures for metadata maintenance.
Enforce maintenance of metadata.
Requirements Analysis
Identify what data needs to be stored.
New data is needed due to:
Data strategy. Assessment identified current and future
data needs for transaction processing and decision making.
New applications. New technical applications are identified
and data needs follow the new processes.
New decisions. Environmental changes trigger the need for
new decisions which require new, or differently configured,
information.
Methods used to determine requirements
“Give me what I say I want, then I can tell you
what I really want.” (W. H. Inmon)
Give me what I am thinking about but can’t explain, then I
may be able to tell you want I actually want.
Give me something that will help my business but I don’t
know what that is.
Give me something quickly (agile methodologies) so that I
have time to change my mind.
The requirements will change. Change is a fact of life in
database design in general, and BI in particular.
Logical Design
Development of a general data model.
Uses an entity-relationship diagram as the
communication tool.
Identifies:
Entities.
Relationships.
Primary and Foreign Keys.
Attributes.
Tasks during Logical Design
Identify needed data.
Develop user-oriented views of data
requirements
Integrate the views into a global schema.
Normalize the global schema.
Physical Design
Forecast performance requirements.
Change database as necessary to
accommodate performance considerations.
May require de-normalization of database for
performance considerations.
Convert logical design to SQL CREATE TABLE
statements.
Identify and create necessary indexes.
Data Distribution
Identify geographic considerations for data
structures.
Partition data as necessary to accommodate
geographic considerations.
Database Implementation
Extract existing data.
Transform existing data.
Load existing data.
Develop policies and procedures for data
security.
Develop policies and procedures for data
privacy.
Database Modification
Monitor performance.
Monitor data access.
Monitor data usage.
Modify structure as necessary to accommodate
changes from performance, access and usage.
Our databases are
intended to be
implemented on a
relational database
management system.
Rules of a relational database:
A database that calls itself
“relational” must abide by all the rules mandated here.
All data is represented by twodimensional tables – both the
logical and physical design of the
data are depicted in 2 dimensional
tables.
All data is accessible by table
name and column name.
Only one value is stored in an
intersection of any given row and
column, called a “cell”.
Null values are supported as
distinct values.
The data dictionary is available in
the same way as all user data.
There is one comprehensive
language to define, create, update
and access data (SQL).
Data can be updated and
accessed at the table level.
Changes to application programs
do not affect data structures.
Changes to data structures do not
affect application programs.
Data validation and integrity is
done by the database.
Programs cannot bypass data
validation and integrity rules.
Changes to the distribution of the
data do not affect application
programs.
Logical design review
Goals of database design
Components of a database design
Structure and purpose of a model
Purpose of normalization
Goals of database design
Goal 1: Correctly identify data necessary to
store.
Goal 2: Store data without redundancy
Goal 3: Store data so that information can be
accessed effectively.
What is data redundancy? (Micro)
Orderid
Order
Date
Cust
Name
Item
id
ItemName
Item
cost
Sell
price
Quantity
Ordered
1234
01/20/2012
Jones
A72
Martini Glasses – 4 pack
7.50
19.95
12
1234
01/20/2012
Jones
A43
Tumbler, 12 oz – 8 pack
9.20
22.95
16
3900
01/13/2012
Smith
A72
Martini Glasses – 4 pack
7.50
19.95
5
3900
01/13/2012
Smith
B33
Ketchup Dispensers – 12
pack
8.70
25.95
22
3900
01/13/2012
Smith
B97
Salt Shakers – 2 pack
2.20
9.95
3
8911
01/20/2012
Resco
A43
Tumbler, 12 oz – 8 pack
9.20
22.95
35
8911
01/20/2012
Resco
A72
Martini Glasses – 4 pack
7.50
19.95
235
What is data redundancy? (Macro)
SupportRequest
Order
PK
PK
OrderID
SupportID
RequestType
RequestComments
CustomerName
CustomerAddress
CustomerCity
CustomerState
CustomerZip
CustomerPhone
CustomerFax
CustomerMobile
OrderDate
RequiredDeliveryDate
BillingType
CustomerBillName
CustomerBillAddress
CustomerBillCity
CustomerBillState
CustomerBillZip
CustomerPhone
MarketingContact
PK
contactID
Name
Address
City
State
Zip
Phone
Mobile
Fax
PreferredContact
LastContactDate
Comments
Components of a database design (1)
Entity: person, place or thing about which we
want to store data. Synonyms: table, file.
Attribute: descriptive characteristic of an entity.
Synonyms: column, field.
Relationship: association between entities
created by a business rule.
Relationship cardinality: The number of instances of one
entity that can (or must) be related with each instance of
another entity.
Components of a database design (2)
Constraints
Rules or limits.
Constraints depicted on a logical model
Primary key.
Referential integrity.
Order
Customer
PK
customerID
LastName
PK
orderID
OrderDate
DiscountDescription
Item
PK
ItemID
Description
ListPrice
DiscountCode
PK
DiscountCodeID
Description
Order
Customer
PK
PK
customerID
LastName
HomeAddress
FK1
FK2
OrderLine
orderID
OrderDate
DiscountDescription
customerID
DiscountCodeID
BillingAddress
PK
PK,FK1
PK,FK2
shippingaddress
orderID
ItemID
quantity
price
Item
PK
ItemID
Description
ListPrice
Normalization
Normalization is:
The process of decomposing relations with anomalies to produce wellstructured relations.
A process accomplished in stages, or “normal forms”.
Normalization identifies data dependencies.
The purpose of normalization is to:
Provide greater structure to the design process.
Reduce data redundancy.
Simplify the enforcement of referential integrity.
Make it easier to maintain data.
What is an entity?
How is an entity denoted on an ERD?
Strong vs. weak entity?
Parent vs. child entity?
Associative entity?
Intersection entity?
Entity instance?
Examples of all…
What is a relationship?
How is a relationship denoted on an ERD?
What is a binary relationship?
What is the cardinality of a relationship?
How is cardinality denoted on an ERD?
What is a ternary relationship?
What is an n-ary relationship?
All about keys…
What is a primary key?
Surrogate primary key vs. natural primary key
Primary key during logical design vs. physical design
How is it denoted on an ERD?
What is a foreign key?
How is it denoted on an ERD?
What is the difference between a single attribue
key and a composite key?
What kinds of entities use composite keys?
Review about attributes
What is an attribute?
Single-valued vs. multi-valued attribute?
Stored vs. derived attribute?
Atomic vs. composite attribute?