Data Warehouses
Download
Report
Transcript Data Warehouses
9
Adv. DBMS
Data Warehouse
CSC5301
Review
Hachim Haddouti
1
9
Do You Remember?
DSS
OLTP
Data cube
MD
RollUp
drill down
Slice/dice
MD
Star schema MOLAP
Data extraction
ROLAP
Data mining
Fact table
2
9
Data Warehouses
“Subject-oriented,
integrated, time-variant,
nonvolatile collection of data in support of
management’s decision-making process”
Inmon (AP = analytical processing is missing)
Used
for analysis of existing data
Resolves
performance issues suffered by
operational RDBMSs and OLTPs
3
9
Sizing DW?
Mining of mobile phone calls:
(Caller, Callee, Time, Duration, Geogr.
Location) ~ 100 B/tuple
In Germany
107 users * 10 calls/(day*user) * 100 B/call =
= 1010 B/day ~ 3*1012 B/year = 3 TB/year
Scanning data at 107 B/s takes
3*1012/107 = 3*105 s > 3 days
4
Data Warehouse Architecture
9
5
9
Data model
ER Model
a disaster for querying a
huge amount of data (time)
not understandable for users
and they can not be
navigated
usefully
by
DBMS software.
hard to visualize; many
possible
connections
between tables,
To avoid redundancy
MD Model
better performance
Better data
organisation
Better visualization
Business queries
(why, what if)
6
9
Typical DWH Analyses/Queries
What are the consequences of new orders
for production capacity w.r. to investment,
personnel, maintenance, extra hours, ...
Seasonal adaptions, e.g. when to produce
how many skis, bikinis, convertibles, ...
Influence of external financing on profits
7
9
Operations:
• aggregation
• slice
• dice (cube)
• rollup to coarser level
• drill down to more detailed level
• grouping
• sorting
8
9
Data Cube Representation
9
9
Steps to build a DWH
Acquisition
of data
Data cleansing
Storage
Processing: AP
Maintenance, ...
Not possible with classical DB-technology
alone
10
9
On-Line Analytical Processing
OLTP (online transaction processing) for
operational data of enterprise, e.g. in
relational DBMS, IMS, SAP/R3, ...
DSS: Decision Support System to store
data/information for strategic management
decisions: aggregations, summaries, etc.
Optimized to work with data warehouses
Used to answer questions
Allows users to perceive data as a
multidimensional data cube
Data mining
11
9
OLTP versus OLAP
Thematic focus
OLTP: many small transactions (microscopic
view of business processes, individual steps at
lowest level, single order, delivery)
OLAP: finances in general, personnel in
general, ...
OLAP requires integration and unification of
many detailed data into big picture
Time orientation
Durability: data extracted once, no updates
12
9
Technical Comparison OLTP vs OLAP
OLTP: high rate of updates, several thousand t/s
OLAP: read only transactions, very complex, DWH is
loaded at certain time intervals, e.g. after the end of
the month, quarter
Compute intensive
Special systems with new access methods, e.g.
multidimensional data organization and access
methods
Special OLAP systems necessary to offload OLTP
systems
13
9
ROLAP and MOLAP
Solution 1: ROLAP relational online analytical
processing, built on top of relational DBS, additional
middleware or client front end (star schema)
Solution 2: MOLAP: multidimensional online analytical
processing
new model
new data organizations
new algorithms
new query languages
new optimization techniques
14
9
DW Review
• degenerate dimension
• big dimensions
• hierarchies
• snow falcking
• Slowly changing dimensions
• dirty dimensions
•Hetegrogeneous prodcuts (core and custom)
• Factless Fact table
15