Slides of Lecture 1

Data Warehousing
MEC 623 – Data Warehousing and Data Mining
The Need for Data Warehousing
• Traditionally, databases have supported transactions.
• DBs are often optimized for transaction processing.
• Nowadays, we also need DBs for decision support.
• A schema designed for transaction processing may not be amenable to decision support.
Why DW?
Consider indexes
Help speed data retrieval
May slow data writes/updates
Transaction processing
Many writes, comparatively little retrieval
Decision support
Almost all retrieval (few/no writes)
A design that is efficient for transaction processing (TP) hinders decision support (DS), and the reverse also holds
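
The index trade-off can be illustrated with a small sketch using Python's built-in sqlite3 module. The table `sales`, its columns, and the index name `idx_sales_customer` are hypothetical, chosen only for illustration. The sketch prints SQLite's query plan for the same lookup before and after an index is created; it does not measure the write cost, which comes from every later insert or update also having to maintain the index.

```python
import sqlite3

def query_plan(conn, sql, params=()):
    """Return the descriptive text of SQLite's query plan for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each plan row is a human-readable description.
    return " | ".join(row[-1] for row in rows)

conn = sqlite3.connect(":memory:")
# Hypothetical table standing in for transactional data.
conn.execute(
    "CREATE TABLE sales (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales (customer_id, amount) VALUES (?, ?)",
    [(i % 100, float(i)) for i in range(1000)],
)

lookup = "SELECT id, amount FROM sales WHERE customer_id = ?"

# Without an index, the lookup has to scan the whole table.
plan_before = query_plan(conn, lookup, (42,))

# With an index on customer_id, the lookup can search the index instead.
conn.execute("CREATE INDEX idx_sales_customer ON sales (customer_id)")
plan_after = query_plan(conn, lookup, (42,))

print(plan_before)
print(plan_after)
```

The exact wording of the plan text varies between SQLite versions, so it is best read as a description rather than parsed.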
The Need for DW (continued)
Organizations collect huge volumes of data through transactions (and other means)
How can they take advantage of this data?
It can be useful for decision support, planning, etc.
A DB design that supports TP doesn’t work well for DS
Where does the solution lie?
The Answer
Have two databases
Transaction-oriented
Decision support
Transaction databases: generate data for strategic decision making
Decision support DBs: warehouse that data
Thus the term “data warehousing”
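
A minimal sketch of the two-database idea, assuming two in-memory SQLite databases. The table names `orders` and `order_history` and the function `load_warehouse` are hypothetical, and a real warehouse load would normally involve more than a plain copy. The transaction-oriented database takes the writes, and a periodic load copies any new rows into the decision-support database.

```python
import sqlite3

# Transaction-oriented database: receives the day-to-day writes.
oltp = sqlite3.connect(":memory:")
oltp.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, order_date TEXT, amount REAL)"
)
oltp.executemany(
    "INSERT INTO orders (order_date, amount) VALUES (?, ?)",
    [("2024-01-05", 120.0), ("2024-01-20", 80.0), ("2024-02-03", 200.0)],
)
oltp.commit()

# Decision-support database: holds a copy of that data for analysis.
warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE order_history "
    "(order_id INTEGER PRIMARY KEY, order_date TEXT, amount REAL)"
)

def load_warehouse(source, target):
    """Copy orders not yet in the warehouse; return how many were added."""
    last_loaded = target.execute(
        "SELECT COALESCE(MAX(order_id), 0) FROM order_history"
    ).fetchone()[0]
    new_rows = source.execute(
        "SELECT id, order_date, amount FROM orders WHERE id > ? ORDER BY id",
        (last_loaded,),
    ).fetchall()
    target.executemany("INSERT INTO order_history VALUES (?, ?, ?)", new_rows)
    target.commit()
    return len(new_rows)

first_load = load_warehouse(oltp, warehouse)   # copies all existing orders
second_load = load_warehouse(oltp, warehouse)  # nothing new to copy
print(first_load, second_load)
```

Because the load only reads from the transactional side, analysis queries can run against the warehouse copy without competing with ongoing transactions.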
Decision Support Data
Need trends, rather than specific facts
Almost all reads and no writes
Up-to-the-minute accuracy isn’t required
Decision support
Decisions often require analyzing trends in data (over time)
No need for transaction control in a DS database (almost all reads, no writes)
Up-to-the-second accuracy isn’t necessary for DS
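
As an illustration of a trend-oriented, read-only query, the sketch below totals sales by month instead of looking up individual transactions. The table `order_history` and its sample rows are hypothetical, made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical historical data; in practice this would be loaded, not typed in.
conn.execute("CREATE TABLE order_history (order_date TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO order_history VALUES (?, ?)",
    [
        ("2024-01-05", 120.0), ("2024-01-20", 80.0),
        ("2024-02-03", 200.0), ("2024-02-17", 50.0),
        ("2024-03-09", 300.0),
    ],
)

# Aggregate by month: the decision maker sees the trend, not single orders.
monthly_totals = conn.execute(
    """
    SELECT strftime('%Y-%m', order_date) AS month, SUM(amount) AS total
    FROM order_history
    GROUP BY month
    ORDER BY month
    """
).fetchall()

for month, total in monthly_totals:
    print(month, total)
```

A query like this reads many rows and writes none, which is why transaction control and up-to-the-second freshness matter little here.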
Data Warehousing
Data warehousing is a process for benefiting from historical transactional data
Using data warehouses
Data warehouse
Copy of transactional data formatted so that it’s useful for query and analysis (decision support)
Features of a Data Warehouse
Collection of DBs designed for decision support
DBs are subject-oriented
Organized around particular subjects
Data in a DW are integrated from a variety of internal and external sources
Data are usually transformed from their original format
Data are non-volatile
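
Integration and transformation can be sketched as follows. The two sources, their field names, and their formats are assumptions invented for this example: one records dates as DD/MM/YYYY and amounts in cents, the other uses ISO dates and text amounts. Both are converted to one shared format before being stored together.

```python
from datetime import datetime

# Hypothetical records from two sources that describe sales differently.
internal_rows = [{"sale_date": "05/01/2024", "amount_cents": 12000}]
external_rows = [{"date": "2024-01-20", "amount": "80.00"}]

def from_internal(row):
    """Convert an internal record (DD/MM/YYYY, cents) to the shared format."""
    day = datetime.strptime(row["sale_date"], "%d/%m/%Y").date()
    return {
        "date": day.isoformat(),
        "amount": row["amount_cents"] / 100,
        "source": "internal",
    }

def from_external(row):
    """Convert an external record (ISO date, text amount) to the shared format."""
    return {
        "date": row["date"],
        "amount": float(row["amount"]),
        "source": "external",
    }

# Integrated data: one list, one format, regardless of origin.
integrated = [from_internal(r) for r in internal_rows] + [
    from_external(r) for r in external_rows
]

for record in integrated:
    print(record)
```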