Topics for Today (10/13/99)

Download Report

Transcript Topics for Today (10/13/99)

Topics about Data Warehouses
What is a data warehouse?
How does a data warehouse differ from a transaction
processing database?
What are the characteristics of a data warehouse?
What are the components of a data warehousing
system?
How is a data warehouse created?
How is a data warehouse accessed?
1
TPS vs. DSS
Issue
TPS/MIS
DSS
Definition
Systems to
support day-today operations.
Systems to support
ad-hoc decision
making.
Users
clerks, data entry, managers, analysts,
low-level
support staff,
supervisors.
researchers.
Design goal
Performance.
Flexibility, ease of
use, ease of access.
Transaction
Type
Updates.
Queries.
Query
Activity
low; few joins.
high; many joins.
Transaction vs. DSS databases
Issue
Content
Transaction
database
Internal data,
process-oriented.
DSS database
Internal and
external data.
Subject-oriented.
Data
currency
Real time.
Batch.
Current.
Historical.
Volatile.
Non-volatile.
Summary
level
Details of
transactions; no
(or very little)
derived data.
Summarized;
many
aggregation
levels.
Volume
Megabytes to
gigabytes.
Gigabytes to
terabytes.
Design
Normalized to
prevent
anomalies.
Denormalized to
enhance query
performance.
So, can one database support both transaction
processing and decision support applications?
Yes
No
What is a data warehouse?
A data warehouse is a database designed to support a
decision support system.
A data warehouse is:
Integrated: It is a centralized, consolidated database integrating data
from an entire organization.
Subject-oriented: Data warehouse data are organized around key
subjects. The data are usually arranged by topic, such as customers,
products, suppliers, etc.
Time-variant: Data in the warehouse contain a time dimension so
that they may be used as a historical aggregation.
Non-volatile: Once data enter, they seldom leave. Data are
appended rather than overwritten. Data are updated in batches.
CustomerType
PK
CustomerTypeID
Description
Data warehouse
design example
SalesFact
Employee
PK
EmployeeID
Name
PK
PK
PK
PK,FK4
PK,FK1
PK,FK2
PK,FK3
Day
Month
Year
ProductID
CustomerTypeID
EmployeeID
LocationID
SalesDollars
#ofSales
Product
PK
ProductID
Description
Location
PK
LocationID
Description
Issues in designing a data warehouse
Must have a predefined subject focus.
Has the potential to be very large – must define
the “grain” or granularity level of storage.
Will always have a dimension of time.
Will contain derived data.
Will be a summary of data, rather than each
detailed transaction.
Does not always adhere to standard
normalization rules.
7
Customer
Transaction
Database
Order
Transaction
Database
Product
Transaction
Database
Data
Scrubbing
Data
Scrubbing
Data
Scrubbing
Data
Extraction
Data
Extraction
Data
Extraction
Data
Integration
Creating a
Data
Warehouse
Sales Data
Warehouse
Issues in creating a data warehouse
How to get accurate and complete data?
How to consolidate data?
Differing data meanings.
Differing storage mechanisms.
Differing data formats.
9
Components of a data warehousing system
Data store.
Extraction/filtering/transformation processes.
End user query tools.
End user visualization tools.
10
Two-tier data warehouse architecture
Data warehouse
server
Operational
database
Transformation
process
Operational
database
Summarized
data
External
data s ource
EDM
Data warehous e
User departments
Three-tier data warehouse architecture
Data warehouse
server
Operational
database
Transformation
process
Data mart tier
Extraction
process
Data mart
Operational
database
Summarized
data
External
data source
EDM
Data warehous e
Data mart
User
departments
Accessing a data warehouse
Visualization tools.
Graphical.
Spreadsheet format - usually Excel or Lotus look-andfeel.
Dashboard. Example:
http://tomcat.corda.com/superstore/sr.jsp
Query tools.
OLAP: Online analytical processing.
Data mining: Artificial intelligence based query
methods.
13
Online analytical processing
Provides multi-dimensional data analysis techniques.
Works primarily with data aggregation.
Provides advanced statistical analysis.
Provides advanced graphical output.
Supports access to very large databases.
Provides enhanced query optimization algorithms.
Lots of acronyms: OLAP, ROLAP, MOLAP, HOLAP.
Can be add-ons to existing products, example is Excel.
Can have their own user interfaces.
14
OLAP vs. Data Mining questions
OLAP
Which customers spent
the most with us in the
past year?
Data Mining
Which types of customers
are likely to spend the
most with us in the coming
year?
How much did the bank
What are the
lose from loan defaulters characteristics of the
within the past two years? customers most likely to
default on their loans
before the year is over?
What were the highest
selling fashion items in
our London stores?
What additional products
are most likely to be sold
to customers who buy
shorts?
Which store/location
made the highest sales in
the past year?
In which area whould we
open a new store next
year?
Data mining
Data mining tools:
analyze the data;
uncover patterns hidden in the data;
form computer models based on the findings; and
use the models to predict business behavior.
Proactive tools.
Based on artificial intelligence software such as
decision trees, neural networks, fuzzy logic systems,
inductive nets and classification networking.
16
What are some applications of data warehousing?
Customer relationship management.
Business process management.
Order management.
Strategic decision analysis.
17