Data Warehousing (Alex Ostrovsky)

Download Report

Transcript Data Warehousing (Alex Ostrovsky)

Data Warehousing
Alex Ostrovsky
CS157B
Spring 2007
Introduction
► Data
warehouse is a main repository of
corporate data
► Multiple databases are employed per
specific purpose
► Contains raw events and unprocessed data,
although separate tables might exist for
processed information displaying meaningful
data
What is it used for?
► Data
analysis
► Data mining
► Complex queries with multiple table join
► Forecasting
► Historical reporting
► OLAP (Online Analytical Processing)
High level view
Key Concepts and Features
► Data
is not required to be heavily
normalized
► Transaction Processing is done mostly
offline, thus processing time is not very
critical. Although, this might depend on
amount of data, normalization, query
complexity, and application specifications.
Key Concepts and Features (cont.)
► Unlike
regular OLTP real-time databases
data is subject-oriented
► Non-volatile, i.e. data is essentially stored
forever without being pruned or deleted.
► Heavily integrated: contains data from
majority of organization’s applications
► Time-variant: most of the data has some
time reference for the purpose of producing
the reports
Types of data warehousing DBs
► Offline
operational database: similar to
regular data replication. Used to minimize
the impact of queries on a running primary
operational system
► Offline data warehouse: heavily integrated,
reporting-oriented warehouse databases
which are updated with data from
operational databases on regular time
intervals
Types of data warehousing DBs
(cont)
► Real-time
data warehouse: database data is
updated instantaneously as soon as
transaction happens
► Integrated data warehouse: database is
integrated with primary operational system
for immediate decision making and
reporting.
Benefits of Data Warehousing
► No
need to stress operational database with
complex queries
► Separation of processing and business logic
► Very flexible, multiple distinct relations can be
defined from a set of data
► Can be customer or object specific
► Persistent – once result is computed from the raw
events, it doesn’t need to be recomputed again,
giving faster response time on subsequent
queries.
Dangers of Data Warehousing
► Heavy
processing requires physically separate
database machines for warehousing and OLTP
► Must be optimized for novice users, complex
queries might take a very long time
► Much more complex multidimensional design
compared to regular relational databases
► Errors in computational logic can cause serious
financial losses and computational recalculations.
► Data representation
► Relatively difficult to perform data migration
Database Design
► Data
warehousing databases mostly utilize
complex multidimensional design
► Relationships must be meaningful and represent
clear patterns and trends of unprocessed data.
More data and relationships you have more
dimensions database will have.
► Information is viewed along one common
dimensional position. Can be thought of as
intersection of a few planes.
OLAP Market
References
► http://en.wikipedia.org/wiki/Data_warehouse
► http://en.wikipedia.org/wiki/OLAP
► http://dmoz.org/Computers/Software/Databases/D
ata_Warehousing/
► http://dmoz.org/Computers/Software/Databases/D
ata_Warehousing/Articles/
► http://en.wikipedia.org/wiki/Multidimensional_data
base
► http://www.olapreport.com/market.htm