1. DBMS/DW Review

Download Report

Transcript 1. DBMS/DW Review

9
Data Warehouse
CSC5301
Hachim Haddouti
1
9
About Me










Hachim Haddouti, born in 1969, married, one baby 9 weeks
Ph.D. in Computer Science (Database Management Systems) at Technical
University of Munich under Supervision of Prof. Bayer (Inventor of B-Tree)
Master in Computer Science (Knowledge Management Systems) at
Techical University of Berlin
Project Manager at BMW Munich Germany
Senior Consultant and Project Manager at DaimlerChrysler Services (now
called T-Systems, Deutsche Telekom)
Research Scientist with Prof. R. Bayer in Technical University of Munich
UNESCO Consultant
Visiting Scientist at Tsukuba University, Japan, University of Sta. Barbara
University, California; University of Catania, Italy; Beijing Univ China …
Area of Interest: DBMS, Digital Libraries, Document & Content &
Knowledge Management, XML databases and Web technologies,
Multilinguality etc.
More at www.haddouti.de
2
9
Do You Remember?
DSS
OLTP
Data cube
MD
RollUp
drill down
Slice/dice
MD
Star schema MOLAP
Data extraction
ROLAP
Data mining
Fact table
3
9
Why DW?
Mining of mobile phone calls:
(Caller, Callee, Time, Duration, Geogr.
Location) ~ 100 B/tuple
In Germany
107 users * 10 calls/(day*user) * 100 B/call =
= 1010 B/day ~ 3*1012 B/year = 3 TB/year
Scanning data at 107 B/s takes
3*1012/107 = 3*105 s > 3 days
4
9
Data Warehouses
 “Subject-oriented,
integrated, time-variant,
nonvolatile collection of data in support of
management’s decision-making process”
Inmon (AP = analytical processing is missing)
 Used
for analysis of existing data
 Resolves
performance issues suffered by
operational RDBMSs and OLTPs
5
Data Warehouse Architecture
Figure 9.7
9
6
9
Model
• need abstract model with above operations
• suitable datastructures
• very large databases
Relational Model?
• one-dimensional access via primary key
• n*m „relationships“ are 2-dimensional:
(FK1, FK2)
7
9
The Multidimensional Data Model
Requirements: must support typical analyses,
queries like
Sales of a product group digital cameras in
Nov, Dec Jan Feb in Munich area
 sorted by sales of each product in €
 sorted by sales in numbers
 sorted by shops
8
9
Data model
ER Model
 a disaster for querying a
huge amount of data (time)
 not understandable for users
and they can not be
navigated
usefully
by
DBMS software.
 hard to visualize; many
possible
connections
between tables,
 To avoid redundancy
MD Model
 better performance
 Better data
organisation
 Better visualization
 Business queries
(why, what if)
9
9
Typical DWH Analyses/Queries
What are the consequences of new orders
for production capacity w.r. to investment,
personnel, maintenance, extra hours, ...
 Seasonal adaptions, e.g. when to produce
how many skis, bikinis, convertibles, ...
 Influence of external financing on profits

10
9
Operations:
• aggregation
• slice
• dice (cube)
• rollup to coarser level
• drill down to more detailed level
• grouping
• sorting
11
9
Data Cube Representation
12
9
Slicing on Time Dimension
13
9
Dicing on Part Dimension
14
9
Steps to build a DWH
 Acquisition
of data
 Data cleansing
 Storage
 Processing: AP
 Maintenance, ...
Not possible with classical DB-technology
alone
15
9
On-Line Analytical Processing
OLTP (online transaction processing) for
operational data of enterprise, e.g. in
relational DBMS, IMS, SAP/R3, ...
 DSS: Decision Support System to store
data/information for strategic management
decisions: aggregations, summaries, etc.
 Optimized to work with data warehouses


Used to answer questions

Allows users to perceive data as a
multidimensional data cube

Data mining
16
9
OLTP versus OLAP
Thematic focus
 OLTP: many small transactions (microscopic
view of business processes, individual steps at
lowest level, single order, delivery)
 OLAP: finances in general, personnel in
general, ...
 OLAP requires integration and unification of
many detailed data into big picture
 Time orientation
 Durability: data extracted once, no updates
17
9
Technical Comparison OLTP vs OLAP


OLTP: high rate of updates, several thousand t/s
OLAP: read only transactions, very complex, DWH is
loaded at certain time intervals, e.g. after the end of
the month, quarter
 Compute intensive
 Special systems with new access methods, e.g.
multidimensional data organization and access
methods
 Special OLAP systems necessary to offload OLTP
systems
18
9
ROLAP and MOLAP
Solution 1: ROLAP relational online analytical
processing, built on top of relational DBS, additional
middleware or client front end (star schema)
Solution 2: MOLAP: multidimensional online analytical
processing
 new model
 new data organizations
 new algorithms
 new query languages
 new optimization techniques
19
Data Warehouse Structure
9
20
9
Rules for OLAP Systems

Multidimensional conceptual view

Transparency

Accessibility
 Consistent
reporting performance

Client/server architecture

Generic dimensionality
21
9
Rules for OLAP Systems
 Dynamic
sparse matrix handling
 Multiuser
support
 Unrestricted,
cross-dimensional
operations
 Intuitive
data manipulation
 Flexible
reporting
 Unlimited
dimensions and aggregation
levels
22