Data Warehouses

Download Report

Transcript Data Warehouses

9
Adv. DBMS
Data Warehouse
CSC5301
Review
Hachim Haddouti
1
9
Do You Remember?
DSS
OLTP
Data cube
MD
RollUp
drill down
Slice/dice
MD
Star schema MOLAP
Data extraction
ROLAP
Data mining
Fact table
2
9
Data Warehouses
 “Subject-oriented,
integrated, time-variant,
nonvolatile collection of data in support of
management’s decision-making process”
Inmon (AP = analytical processing is missing)
 Used
for analysis of existing data
 Resolves
performance issues suffered by
operational RDBMSs and OLTPs
3
9
Sizing DW?
Mining of mobile phone calls:
(Caller, Callee, Time, Duration, Geogr.
Location) ~ 100 B/tuple
In Germany
107 users * 10 calls/(day*user) * 100 B/call =
= 1010 B/day ~ 3*1012 B/year = 3 TB/year
Scanning data at 107 B/s takes
3*1012/107 = 3*105 s > 3 days
4
Data Warehouse Architecture
9
5
9
Data model
ER Model
 a disaster for querying a
huge amount of data (time)
 not understandable for users
and they can not be
navigated
usefully
by
DBMS software.
 hard to visualize; many
possible
connections
between tables,
 To avoid redundancy
MD Model
 better performance
 Better data
organisation
 Better visualization
 Business queries
(why, what if)
6
9
Typical DWH Analyses/Queries
What are the consequences of new orders
for production capacity w.r. to investment,
personnel, maintenance, extra hours, ...
 Seasonal adaptions, e.g. when to produce
how many skis, bikinis, convertibles, ...
 Influence of external financing on profits

7
9
Operations:
• aggregation
• slice
• dice (cube)
• rollup to coarser level
• drill down to more detailed level
• grouping
• sorting
8
9
Data Cube Representation
9
9
Steps to build a DWH
 Acquisition
of data
 Data cleansing
 Storage
 Processing: AP
 Maintenance, ...
Not possible with classical DB-technology
alone
10
9
On-Line Analytical Processing
OLTP (online transaction processing) for
operational data of enterprise, e.g. in
relational DBMS, IMS, SAP/R3, ...
 DSS: Decision Support System to store
data/information for strategic management
decisions: aggregations, summaries, etc.
 Optimized to work with data warehouses


Used to answer questions

Allows users to perceive data as a
multidimensional data cube

Data mining
11
9
OLTP versus OLAP
Thematic focus
 OLTP: many small transactions (microscopic
view of business processes, individual steps at
lowest level, single order, delivery)
 OLAP: finances in general, personnel in
general, ...
 OLAP requires integration and unification of
many detailed data into big picture
 Time orientation
 Durability: data extracted once, no updates
12
9
Technical Comparison OLTP vs OLAP


OLTP: high rate of updates, several thousand t/s
OLAP: read only transactions, very complex, DWH is
loaded at certain time intervals, e.g. after the end of
the month, quarter
 Compute intensive
 Special systems with new access methods, e.g.
multidimensional data organization and access
methods
 Special OLAP systems necessary to offload OLTP
systems
13
9
ROLAP and MOLAP
Solution 1: ROLAP relational online analytical
processing, built on top of relational DBS, additional
middleware or client front end (star schema)
Solution 2: MOLAP: multidimensional online analytical
processing
 new model
 new data organizations
 new algorithms
 new query languages
 new optimization techniques
14
9
DW Review
• degenerate dimension
• big dimensions
• hierarchies
• snow falcking
• Slowly changing dimensions
• dirty dimensions
•Hetegrogeneous prodcuts (core and custom)
• Factless Fact table
15