1. DBMS/DW Review
Download
Report
Transcript 1. DBMS/DW Review
9
Data Warehouse
CSC5301
Hachim Haddouti
1
9
About Me
Hachim Haddouti, born in 1969, married, one baby 9 weeks
Ph.D. in Computer Science (Database Management Systems) at Technical
University of Munich under Supervision of Prof. Bayer (Inventor of B-Tree)
Master in Computer Science (Knowledge Management Systems) at
Techical University of Berlin
Project Manager at BMW Munich Germany
Senior Consultant and Project Manager at DaimlerChrysler Services (now
called T-Systems, Deutsche Telekom)
Research Scientist with Prof. R. Bayer in Technical University of Munich
UNESCO Consultant
Visiting Scientist at Tsukuba University, Japan, University of Sta. Barbara
University, California; University of Catania, Italy; Beijing Univ China …
Area of Interest: DBMS, Digital Libraries, Document & Content &
Knowledge Management, XML databases and Web technologies,
Multilinguality etc.
More at www.haddouti.de
2
9
Do You Remember?
DSS
OLTP
Data cube
MD
RollUp
drill down
Slice/dice
MD
Star schema MOLAP
Data extraction
ROLAP
Data mining
Fact table
3
9
Why DW?
Mining of mobile phone calls:
(Caller, Callee, Time, Duration, Geogr.
Location) ~ 100 B/tuple
In Germany
107 users * 10 calls/(day*user) * 100 B/call =
= 1010 B/day ~ 3*1012 B/year = 3 TB/year
Scanning data at 107 B/s takes
3*1012/107 = 3*105 s > 3 days
4
9
Data Warehouses
“Subject-oriented,
integrated, time-variant,
nonvolatile collection of data in support of
management’s decision-making process”
Inmon (AP = analytical processing is missing)
Used
for analysis of existing data
Resolves
performance issues suffered by
operational RDBMSs and OLTPs
5
Data Warehouse Architecture
Figure 9.7
9
6
9
Model
• need abstract model with above operations
• suitable datastructures
• very large databases
Relational Model?
• one-dimensional access via primary key
• n*m „relationships“ are 2-dimensional:
(FK1, FK2)
7
9
The Multidimensional Data Model
Requirements: must support typical analyses,
queries like
Sales of a product group digital cameras in
Nov, Dec Jan Feb in Munich area
sorted by sales of each product in €
sorted by sales in numbers
sorted by shops
8
9
Data model
ER Model
a disaster for querying a
huge amount of data (time)
not understandable for users
and they can not be
navigated
usefully
by
DBMS software.
hard to visualize; many
possible
connections
between tables,
To avoid redundancy
MD Model
better performance
Better data
organisation
Better visualization
Business queries
(why, what if)
9
9
Typical DWH Analyses/Queries
What are the consequences of new orders
for production capacity w.r. to investment,
personnel, maintenance, extra hours, ...
Seasonal adaptions, e.g. when to produce
how many skis, bikinis, convertibles, ...
Influence of external financing on profits
10
9
Operations:
• aggregation
• slice
• dice (cube)
• rollup to coarser level
• drill down to more detailed level
• grouping
• sorting
11
9
Data Cube Representation
12
9
Slicing on Time Dimension
13
9
Dicing on Part Dimension
14
9
Steps to build a DWH
Acquisition
of data
Data cleansing
Storage
Processing: AP
Maintenance, ...
Not possible with classical DB-technology
alone
15
9
On-Line Analytical Processing
OLTP (online transaction processing) for
operational data of enterprise, e.g. in
relational DBMS, IMS, SAP/R3, ...
DSS: Decision Support System to store
data/information for strategic management
decisions: aggregations, summaries, etc.
Optimized to work with data warehouses
Used to answer questions
Allows users to perceive data as a
multidimensional data cube
Data mining
16
9
OLTP versus OLAP
Thematic focus
OLTP: many small transactions (microscopic
view of business processes, individual steps at
lowest level, single order, delivery)
OLAP: finances in general, personnel in
general, ...
OLAP requires integration and unification of
many detailed data into big picture
Time orientation
Durability: data extracted once, no updates
17
9
Technical Comparison OLTP vs OLAP
OLTP: high rate of updates, several thousand t/s
OLAP: read only transactions, very complex, DWH is
loaded at certain time intervals, e.g. after the end of
the month, quarter
Compute intensive
Special systems with new access methods, e.g.
multidimensional data organization and access
methods
Special OLAP systems necessary to offload OLTP
systems
18
9
ROLAP and MOLAP
Solution 1: ROLAP relational online analytical
processing, built on top of relational DBS, additional
middleware or client front end (star schema)
Solution 2: MOLAP: multidimensional online analytical
processing
new model
new data organizations
new algorithms
new query languages
new optimization techniques
19
Data Warehouse Structure
9
20
9
Rules for OLAP Systems
Multidimensional conceptual view
Transparency
Accessibility
Consistent
reporting performance
Client/server architecture
Generic dimensionality
21
9
Rules for OLAP Systems
Dynamic
sparse matrix handling
Multiuser
support
Unrestricted,
cross-dimensional
operations
Intuitive
data manipulation
Flexible
reporting
Unlimited
dimensions and aggregation
levels
22