Transcript OLAP

Lecture 09:
OLAP
www.cl.cam.ac.uk/Teaching/current/Databases/
1
2+2
/* Microsoft SQL Server 2005 */
/* By the way, it is just VHVyaW5nIG1hY2hpbmU= :-) */
WITH SubQuery(t, s, a, b) AS
(
SELECT 0, 's', CAST ('<' AS VARCHAR(8000)), CAST ('110110' AS VARCHAR(8000))
UNION ALL
David Srbecky
SELECT
t + 1,
newS,
CASE mv
WHEN 'l' THEN SubString(curr.a, 1, Len(curr.a) - 1)
WHEN 's' THEN SubString(curr.a, 1, Len(curr.a) - 1) + newZ
WHEN 'r' THEN SubString(curr.a, 1, Len(curr.a) - 1) + newZ + Left(b + '_', 1)
ELSE '?'
END,
CASE mv
WHEN 'l' THEN newZ + b
WHEN 's' THEN b
WHEN 'r' THEN SubString(b, 2, ((Len(b)-1)+Abs(Len(b)-1))/2)
ELSE '?'
END
FROM
SubQuery AS curr,
(
SELECT 's', '<', '1', '<', 'r' UNION ALL
SELECT '1', '1', '1', '1', 'r' UNION ALL /* find 0 */
SELECT '1', '_', 'a', '0', 's' UNION ALL
SELECT '1', '0', '2', '0', 's' UNION ALL
SELECT '2', '0', '2', '0', 'r' UNION ALL /* find 1; left */
SELECT '2', '_', 'a', '_', 's' UNION ALL
SELECT '2', '1', '3', '1', 'l' UNION ALL
SELECT '3', '0', '4', '1', 's' UNION ALL /* 0 -> 1 */
SELECT '4', '1', '4', '1', 'r' UNION ALL /* find 0 or _; left */
SELECT '4', '_', '5', '_', 'l' UNION ALL
SELECT '4', '0', '5', '0', 'l' UNION ALL
SELECT '5', '1', '6', '0', 's' UNION ALL /* 1 -> 0 */
SELECT '6', '1', '6', '1', 'l' UNION ALL /* rewind */
SELECT '6', '0', '6', '0', 'l' UNION ALL
SELECT '6', '<', 's', '<', 's'
/* restart */
) AS prog(currS, currZ, newS, newZ, mv)
WHERE
curr.s = currS AND
Right(curr.a, 1) = currZ
)
SELECT CharIndex('0', a + b) - 2
FROM
SubQuery
WHERE
s = 'a'
OPTION (MAXRECURSION 0);
/* SELECT t, s, a + '.' + b FROM SubQuery OPTION (MAXRECURSION 0); */
2
Acknowledgments
• DB2/400: Mastering Data Warehousing Functions. (IBM
Redbook) Chapters 1 & 2 only.
http://www.redbooks.ibm.com/abstracts/sg245184.html
• Data Warehousing and OLAP
Hector Garcia-Molina (Stanford University)
http://www.cs.uh.edu/~ceick/6340/dw-olap.ppt
• Data Warehousing and OLAP Technology for Data
Mining Department of Computing
London Metropolitan University
http://learning.unl.ac.uk/csp002n/CSP002N_wk2.ppt
3
uzz Words Buzz Words Buzz Words Buzz W
•
•
•
•
•
•
•
•
•
•
•
•
•
Data Warehouse (DW)
Decision Support (DS)
Data Marts (DM)
Data Mining (DM)
Enterprise Dashboard (ED)
Multi-Dimensional Modeling (MDM)
Online Analytic Processing (OLAP)
Extract, Transform, and Load (ETL)
MOLAP vs. ROLAP
Three Letter Acronym (TLR)
Drill Down, Roll up (DD+RU)
Data vs. Knowledge (DvK)
Data Cube vs. Sugar Cube (DCvSC)
Don’t be surprised to see this
sort of BDB (Blah-Dee-Blah)
in the trade press:
“The ED lets you transform
enterprise data into knowledge
with at-a-glance DS/DM and
MDM, allowing interactive
DD/RU over large DCs.”
4
OLTP vs. OLAP
•
•
•
•
Database is operational
Data is up-to-date
Mostly updates
Need to support high
levels of update
transactions
• Normal form schemas
are important
•
•
•
•
Database is for analysis
Data is historical
Mostly reads
Need to efficiently support
complex queries, and only
bulk loading of data
• Schema optimized for
query processing
5
Decision Support Systems
Information Sources
Data Warehouse
Server
(Tier 1)
OLAP Servers
(Tier 2)
Clients
(Tier 3)
e.g., MOLAP
Semistructured
Sources
Analysis
serve
Extract
Transform
Load
Query/Reporting
Data
Warehouse
serve
e.g., ROLAP
serve
Operational
DB’s
Data Mining
Data Marts
6
From Enrico Franconi CS 636
xOLAP
• Multi-dimensional OLAP (MOLAP)
– ‘A k-dimensional matrix based on a non relational storage
structure.’ [Agrawal et al]
• Relational OLAP (ROLAP)
– ‘A relational back-end wherein operations of the data are
translated to relational queries.’ [Agrawal et al]
• Hybrid OLAP (HOLAP)
– Integration of MOLAP with ROLAP.
• Desktop OLAP (DOLAP)
– Simplified versions of MOLAP or ROLAP.
• ZOLAP
– Speak with your chemist (normally only prescribed for death
march victims)
7
Beware of Data Warehouse
Death March
Death March projects “use a forced march
imposed upon relatively innocent victims, the
outcome of which is usually a high casualty rate.”
Edward Yourdon, 1997, Death March:
The Complete Software Developer’s Guide to
Surviving “Mission Impossible Projects”
Data Warehouses and Decision Support systems are among the
most complex and demanding in the IT world. Failure rates are
very high….
8
Relational data model
• based on a single structure of data values in a two
dimensional table
CUSTOMER
ORDER
Cus_id
Cus_name
…
Ord_no Ord_date
Cus_id
…
001
Robert
…
01
02 Dec 02
002
…
002
Lyn
…
02
03 Dec 02
Lyn
…
…
…
…
…
…
…
…
9
Data warehousing
___Multidimensional Data
Sales volume as a function of product, month, and region
Product
Dimensions:
Product,
Location,
Time
Month
10
A Sample Data Cube
2Qtr
3Qtr
4Qtr
sum
U.S.A
Canada
Mexico
Country
TV
PC
VCR
sum
1Qtr
Date
Total annual sales
of TV in U.S.A.
sum
11
A Concept Hierarchy for Dimension Location
all
all
Europe
region
country
city
office
Germany
Frankfurt
...
...
...
Spain
North_America
Canada
Vancouver ...
L. Chan
...
...
Mexico
Toronto
M. Wind
12
Cuboids Corresponding to
the Cube
all
0-D(apex) cuboid
product
product,date
date
country
product,country
1-D cuboids
date, country
2-D cuboids
3-D(base) cuboid
product, date, country
13
Multidimensional Data:
A University Sample Data Cube
Abraham
Bridget
Caroline
Avg
Business
Computing
Avg
Average Mark
of Abraham in Year 1.
Year 1
Year 2
Year 3
Time
Module
Avg
Students’ marks as a function of student, department,
14
Data Warehousing
• “A data warehouse is a subject-oriented,
integrated, time-variant, and nonvolatile
collection of data in support of
management’s decision-making process.”
—W. H. Inmon
15
OLAP Operations
• Roll up (drill-up): summarize data
– by climbing up hierarchy or by dimension reduction
• Drill down (roll down): reverse of roll-up
– from higher level summary to lower level summary or detailed
data, or introducing new dimensions
• Slice and dice:
– project and select
• Pivot (rotate):
– reorient the cube, visualization, 3D to series of 2D planes.
• Other operations
– drill across: involving (across) more than one fact table
– drill through: through the bottom level of the cube to its backend relational tables (using SQL)
16