Slides of Lecture 1
Data Warehousing
MEC 623 – Data Warehousing and Data Mining
The Need for Data Warehousing
• Traditionally, databases have supported transactions.
• DBs are often optimized for transaction processing.
• Nowadays, we also need DBs for decision support.
• A transaction-processing schema may not be amenable to decision support.
Why DW?
Consider indexes
Help speed up data retrieval
May slow down data writes/updates
Transaction processing
Many writes, comparatively little retrieval
Decision support
Almost all retrieval (few/no writes)
A design that makes the transaction processing system (TPS) efficient hinders decision support (DS), and the reverse also holds
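
The index trade-off above can be sketched with Python's built-in sqlite3 module. This is only an illustration under assumed names: the sales table, its columns, and the index idx_sales_customer are hypothetical and do not come from the lecture. The sketch asks SQLite for its query plan before and after the index is created.

```python
import sqlite3

def query_plan(conn, sql):
    """Return SQLite's query-plan description for `sql` as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)

# Hypothetical table, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales (customer_id, amount) VALUES (?, ?)",
    [(i % 100, float(i)) for i in range(1000)],
)

lookup = "SELECT * FROM sales WHERE customer_id = 7"

# Without an index, the lookup has to scan the whole table.
plan_without_index = query_plan(conn, lookup)

# With an index, the lookup can search the index instead.
conn.execute("CREATE INDEX idx_sales_customer ON sales (customer_id)")
plan_with_index = query_plan(conn, lookup)

print(plan_without_index)
print(plan_with_index)
```

The sketch shows only the retrieval side. The write cost is not measured here: once the index exists, every insert or update on sales must also maintain idx_sales_customer, which is why a write-heavy transaction database tends to keep fewer indexes than a read-heavy decision-support database.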
Need for DW (continued)
Organizations collect huge volumes of data through transactions (and other means)
How can they take advantage of this data?
It can be useful for decision support, planning, etc.
A DB design that supports TP doesn’t work well for DS
Where does the solution lie?
The Answer
Have two databases
Transaction-oriented
Decision support
Transaction databases: generate data for strategic decision making
Decision support DBs: warehouse that data
Thus the term “data warehousing”
Decision Support Data
Need trends, rather than specific facts
Almost all reads and no writes
Up-to-the-minute accuracy isn’t required
Decision support
Decisions often require analyzing trends in data (over time)
No need for transaction control in the DS database (almost all reads, no writes)
Up-to-the-second accuracy isn’t necessary for DS
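
As a rough illustration of "trends, rather than specific facts", the sketch below rolls individual transactions up into monthly totals and month-over-month changes. The records and the function name monthly_totals are invented for this example; they are not taken from the lecture.

```python
from collections import defaultdict

# Hypothetical transaction records: (ISO date, sale amount).
transactions = [
    ("2023-01-05", 120.0),
    ("2023-01-20", 80.0),
    ("2023-02-03", 150.0),
    ("2023-02-17", 100.0),
    ("2023-03-09", 300.0),
]

def monthly_totals(records):
    """Sum amounts per calendar month, returned in chronological order."""
    totals = defaultdict(float)
    for date, amount in records:
        month = date[:7]  # "YYYY-MM" prefix of the ISO date
        totals[month] += amount
    return dict(sorted(totals.items()))

totals = monthly_totals(transactions)

# Month-over-month change: the trend a decision maker would look at.
values = list(totals.values())
changes = [later - earlier for earlier, later in zip(values, values[1:])]

print(totals)
print(changes)
```

The code only reads the data, and a transaction recorded a minute ago would barely move a monthly total, which matches the points above about reads, writes, and accuracy.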
Data Warehousing
Data warehousing is the process of using data warehouses to benefit from historical transactional data
Data warehouse
A copy of transactional data formatted so that it’s useful for query and analysis (decision support)
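
One way to picture "a copy of transactional data formatted for query and analysis" is the minimal sketch below. It assumes two separate SQLite databases, a hypothetical orders table on the transaction side, and a hypothetical daily_sales summary table on the warehouse side. Real warehouse loads are far more involved; this shows only the basic idea of copying and reshaping.

```python
import sqlite3

# Transaction-oriented database (hypothetical schema): one row per order.
oltp = sqlite3.connect(":memory:")
oltp.execute(
    "CREATE TABLE orders ("
    "order_id INTEGER PRIMARY KEY, customer TEXT, order_date TEXT, amount REAL)"
)
oltp.executemany(
    "INSERT INTO orders (customer, order_date, amount) VALUES (?, ?, ?)",
    [
        ("alice", "2023-01-05", 20.0),
        ("bob", "2023-01-05", 30.0),
        ("alice", "2023-01-06", 50.0),
    ],
)

# Decision-support database (hypothetical schema): one row per day.
warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE daily_sales ("
    "order_date TEXT PRIMARY KEY, order_count INTEGER, total_amount REAL)"
)

def load_warehouse(source, target):
    """Copy orders from `source` into `target`, reshaped as daily summaries."""
    rows = source.execute(
        "SELECT order_date, COUNT(*), SUM(amount) "
        "FROM orders GROUP BY order_date ORDER BY order_date"
    ).fetchall()
    with target:  # commit the whole load as a single transaction
        target.executemany("INSERT INTO daily_sales VALUES (?, ?, ?)", rows)
    return len(rows)

loaded = load_warehouse(oltp, warehouse)
summary = warehouse.execute(
    "SELECT order_date, order_count, total_amount "
    "FROM daily_sales ORDER BY order_date"
).fetchall()
print(loaded, summary)
```

Analysts then query daily_sales without touching, or slowing down, the orders table that handles live transactions.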
Features of a Data Warehouse
Collection of DBs designed for decision support
DBs are subject-oriented
Organized around particular subjects
Data in the DW are integrated from a variety of internal and external sources
Data are usually transformed from their original format
Data are non-volatile
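
The integration and transformation features above can be sketched as follows. The two sources, their field names, and their formats are assumptions made up for this example: one internal source with US-style dates and amounts in cents, and one external source with ISO dates and amounts in dollars. Both are converted to a single common record layout before being stored together.

```python
from datetime import datetime

# Hypothetical internal source: MM/DD/YYYY dates, amounts in cents.
internal_rows = [{"date": "01/05/2023", "cents": 12000, "region": "north"}]

# Hypothetical external source: ISO dates, dollar amounts, other field names.
external_rows = [{"day": "2023-01-06", "usd": 80.5, "area": "South"}]

def from_internal(row):
    """Transform an internal record into the common warehouse layout."""
    return {
        "date": datetime.strptime(row["date"], "%m/%d/%Y").date().isoformat(),
        "amount_usd": row["cents"] / 100,
        "region": row["region"].lower(),
        "source": "internal",
    }

def from_external(row):
    """Transform an external record into the common warehouse layout."""
    return {
        "date": row["day"],
        "amount_usd": float(row["usd"]),
        "region": row["area"].lower(),
        "source": "external",
    }

# Integrated, uniformly formatted rows, ready to be stored in the warehouse.
warehouse_rows = [from_internal(r) for r in internal_rows] + [
    from_external(r) for r in external_rows
]
print(warehouse_rows)
```

Keeping a source field on each row is a design choice of this sketch, not something the slides prescribe; it makes it easy to trace an integrated record back to where it came from.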