24 SQL/OLAP SQL/OLAP

Download Report

Transcript 24 SQL/OLAP SQL/OLAP

SQL/OLAP
Sang-Won Lee
Let’s e-Wha!
Email: [email protected]
URL: http://home.ewha.ac.kr/~swlee
Jul. 12th, 2001
SQL/OLAP
ISO/IEC JTC1 SC32
1
Contents
 Introduction to OLAP and SQL Issues
 Current OLAP Solutions
 SQL/OLAP
 Future OLAP Trends
SQL/OLAP
ISO/IEC JTC1 SC32
2
OLAP
 On-Line Analytical Processing
–
E.F. Codd coined the term “OLAP”([1])
–
Multi-dimensional data model
–
vs. On-Line Transaction Processing
–
vs. Data warehouse
SQL/OLAP
ISO/IEC JTC1 SC32
3
Data Warehouse Architecture
SQL/OLAP
ISO/IEC JTC1 SC32
4
Multi-dimensional Data Model
 Sales(prod-id,store-id,time-id,qty,amt)
Dimension: Product, Store, Time
Hierarchy:
–
–
–
SQL/OLAP
Product -> Category -> Industry
Store->City -> State -> Country
Date -> Month -> Quarter -> Year
ISO/IEC JTC1 SC32
5
Multi-dimensional Data Model(2)
 Operations
–
–
–
–
–
–
–
roll-up/drill-down
slice/dice
pivot
ranking
comparisons
drill-across
etc.
 Example
–
–
–
for each state show me top 10 products based on total sales
what is the percentage growth of Jan-99 total sales
over total Jan-98?
for each product show me the quantity shipped and sold
SQL/OLAP
ISO/IEC JTC1 SC32
6
Database Back in the OLAP Game
- History of SQL Evolutions in 1990s(OLAP Area)  Requirements from industries(‘95 ~ ‘96)
–
R. Kimball, “Why Decision Support Fails and How to Fix it?”
([2]); see also [3], [4]
 Reactions from researchers(‘96)
–
–
Jim Gray et al., “Data Cube: A Relational Aggregation Operator
Generalizing Group-By, Cross-Tab and Sub Totals,” ([7,8])
Chatziantoniou, K. Ross, “Querying Multiple Features in
Relational Databases,”([9])
 Commercial DBMSs and SQL standards(‘98 ~ )
–
–
commercial products: e.g. Oracle, “Analytical Functions for
Oracle8i”, Oct., 1999
SQL standards
 ANSI X3H2-96-205(R3): Super Sets(The Cube and Beyond)
 ANSI NCITS H2-99-154: Introduction to OLAP
- see also [6]
SQL/OLAP
ISO/IEC JTC1 SC32
7
OLAP Operations
 Many business operations was hard or impossible to
express in SQL
–
multiple aggregations
–
comparisons(with aggregation)
–
reporting features
 Be prepared for serious performance penalty
 Client and middle-ware tools provide the necessary
functionality
–
OLAP server: ROLAP vs. MOLAP
SQL/OLAP
ISO/IEC JTC1 SC32
8
Multiple Aggregations
 Create a 2-dimensional spreadsheets that shows sum
of sales by maker as well as model of car
 Each subtotal requires a separate aggregate query
Cross Tab
Chevy
Ford
By Color
RED
WHITE
BLUE
By Make
Sum
SQL/OLAP
SELECT color, make, sum(amt)
FROM sales
GROUP BY color, make
union
SELECT color, sum(amt)
FROM sales
GROUP BY color
union
SELECT make, sum(amt)
FROM sales
GROUP BY make
union
SELECT sum(amt)
FROM sales
ISO/IEC JTC1 SC32
9
Comparisons
 Examples:
–
last year’s sales vs. this year’s sales for each product
 requires a self-join
VIEW:
create or replace view v_sales as
select prod-id, year, sum(qty) as sale_sum
from sales
group by prod-id, year;
QUERY:
select
cur.year cur_year,
cur.sale_cur_sales,
last.sum last_sales
from v_sales curr, v_sales last
where curr.year=(last.year+1)
SQL/OLAP
ISO/IEC JTC1 SC32
10
Reporting Features
 It was too complex to express
–
rank(top 10) and N_tile(“top 30%” of all products)
–
median, mode, …
–
running total, moving average, cumulative totals
SQL/OLAP
ISO/IEC JTC1 SC32
11
Reporting Features(2)
 Examples:
–
a moving average(over 3 day window) of total sales for
each product for 2000
VIEW:
create or replace view v_sales as
select
prod-id, time-id, sum(qty) as
sale_sum
from sales
group by prod-id, time-id;
QUERY:
select end.time, avg(start.sale_sum)
from v_sales start, v_sales end
where
end.time
>= start.time and
end.time
<= start.time+2
group by end.time
SQL/OLAP
ISO/IEC JTC1 SC32
12
OLAP Servers
Processing
MD queries
efficiently
SQL/OLAP
ISO/IEC JTC1 SC32
13
ROLAP
OLAP Client
OLAP Client
OLAP Client
To map warehouse
schema into a
MD model
OLAP Engine
meta-data
Relational Database
(Star or Snowflake Schema)
SQL/OLAP
ISO/IEC JTC1 SC32
14
ROLAP(2)
 Example: Oracle Discoverer 4i leverages Oracle 8i
–
8i - biggest SQL improvements in a decade!
–
more powerful analysis using new analytic functions
–
sharing query redirection(rewrite) using MVs
–
100% automated summary management
SQL/OLAP
ISO/IEC JTC1 SC32
15
MOLAP
 A multidimensional database(MDDB) stores data in
a series of array structures, indexed to provide
optimal access time to any element in the array.
 Example: Oracle Express stores arrays of data
18
9
0
19
10
1
11
2
2
0
0
1
1
2
3
4
5
6
8
9
10 11 12 13
7
14 15
14
5
1
3
4
16 17 18 19 20 21 22 23
26
5
17
24 25 26
2
U
C
0
23
2
O
D
20
11
P
R
20
8
2
6
7
8
T
0
T
0
M
SQL/OLAP
1
O
N
2
T
H
Y
1
I
C
ISO/IEC JTC1 SC32
16
Propose SQL Constructs
 Multiple aggregations
–
Gray et. al., “Cube and Roll-Up”[6,7]
 Comparison
–
Chatziantoniou and Ross, “Group By Column
Variable”[8]
SELECT subscriber, r.login-time
FROM log
GROUP BY subscriber: r
SUCH THAT r.spent-time = max(spent_time)
 Reporting
–
Redbrick provides SQL extensions in RISQL
 rank, tertile, ratio-to-report etc
SQL/OLAP
ISO/IEC JTC1 SC32
17
The Data CUBE Relational Operator
Generalizes Group By and Aggregates
Aggregate
Group By
(with total)
Sum
By Color
RED
WHITE
BLUE
Cross Tab
Chevy Ford
Sum
By Color
RED
WHITE
BLUE
The Data Cube and
The Sub-Space Aggregates
By Make
Sum
By Year
By Make
By Make & Year
RED
WHITE
BLUE
By Color & Year
Sum
By Make & Color
By Color
source:[6]
SQL/OLAP
ISO/IEC JTC1 SC32
18
Getting Sub-totals: ROLLUP Operation
SELECT year, brand, SUM(qty)
FROM sales
GROUP BY ROLLUP (year, brand);
YEAR
1996
1996
1996
1996
1997
…
1997
SQL/OLAP
BRAND
Ford
Honda
Toyota
Ford
SUM(qty)
250
300
450
1000
300
1200
2200
ISO/IEC JTC1 SC32
Getting Cross-tabs: CUBE Operation
SELECT year, brand, SUM(amount)
FROM sales
GROUP BY CUBE (year, brand);
YEAR BRAND SUM(AMOUNT)
1996 Ford
250
...
1996 Toyota
450
1997 Ford
300
...
1997
1200
Ford
550
Honda
650
Toyota
1000
2200
SQL/OLAP
ISO/IEC JTC1 SC32
Flexible Grouping: GROUPING_SETS
Operator
SELECT year, brand, color, SUM(qty)
FROM sales
GROUP BY GROUPING_SETS ((year, brand),
(brand,color),());
YEAR BRAND
COLOR
1996 Ford
1996 Honda
1996 Toyota
1997 Ford
1997 Honda
1997 Toyota
Ford
Blue
Ford
Red
Honda
Blue
Toyota Red
Toyota White
SQL/OLAP
SUM(QTY)
250
300
450
300
350
550
400
150
650
700
300
2200
ISO/IEC JTC1 SC32
Year, Brand
Brand, Color
Grand total
LAG Operator
SQL> SELECT timekey, sales
2 LAG(sales, 12) OVER
3 (ORDER BY timekey) AS sales_last_year,
4 (sales - sales_last_year) AS sales_change
5 FROM sales;
TIMEKEY
98-1
…..
99-1
99-2
99-3
99-4
99-5
99-6
99-7
99-8
SQL/OLAP
SALES
1100
…
1200
1500
1700
1600
1800
1500
1300
1400
SALES_LAST_YEAR
…
1100
1450
1350
1700
1600
1450
1250
1200
ISO/IEC JTC1 SC32
SALES_CHANGE
...
100
50
250
-100
200
50
50
200
22
MOVING Average
SELECT time-id, avg(sum(qty))
over (order by time-id
RANGE INTERVAL ‘2’ DAY PRECEDING )
as mvg_avg_sales
from sales
group by time_id ;
SQL/OLAP
ISO/IEC JTC1 SC32
23
SQL/OLAP
 Why enhance the RDBMS for OLAP calculations?
–
–
–
–
Before
After
Improvement
Performance
Scalability
Simpler SQL development
Productivity
Rollup
Functional Index
8.32
8.62
1.42
0.91
486%
847%
Top 10
4.26
1.02
318%
Moving window Cumulative window
43.62
45.55
4.97
3.36
778%
1256%
Lead and lag
175.01
4.96
3428%
4000%
% Improvement
3500%
3000%
2500%
2000%
1500%
1000%
500%
0%
Rollup
SQL/OLAP
Functional Index
Top 10
Moving window
Cumulative
window
ISO/IEC JTC1 SC32
Lead and lag
24
Database Back in the OLAP Game
 Materialized views
 Index techniques: e.g. bitmap (join) index
 Partitioning: e.g. range/hash/list
 Query optimization: e.g. star query optimization
 ......
SQL/OLAP
ISO/IEC JTC1 SC32
25
Future OLAP Trends
To be or not to be?
OLAP API:
-OLE DB for OLAP
-JOLAP
SQL/OLAP
ISO/IEC JTC1 SC32
26
References
[1] E.F. Codd et al., “Providing OLAP(On-line Analytical Processing) to User-Analysts: An IT
Mandate,” Available from Arborsoft’s Web Site(http://www.arborsoft.com)
[2] R. Kimball, “Why Decision Support Fails and How to Fix it?” SIGMOD Record, Sep.,1995
[3] R. Kimball, “The Problem with Comparisons,” DBMS Magazine, Jan., 1996(also available
from http://www.rkimball.com/html/articles.html)
[4] R. Kimball, “SQL Roadblocks and Pitfalls,” DBMS Magazine, Feb., 1996(also available
from http://www.rkimball.com/html/articles.html
[5] R. Winter, “Database Back in the OLAP Game,” Intelligent Enterprise Magazine, Dec.,
1998,(available from http://www.intelligententerprise.com)
[6] R. Winter, “SQL-99’s New OLAP Functions,” Intelligent Enterprise Magazine, Jan.,
2000,(available from http://www.intelligententerprise.com)
[7] Jim Gray et al., “Data Cube: A Relational Aggregation Operator Generalizing Group-By,
Cross-Tab and Sub Totals,” Proceedings of International Conferences on Data Engineering,
p. 152 - 159, 1996
[8] Jim Gray et al., “Data Cube: A Relational Aggregation Operator Generalizing Group-By,
Cross-Tab and Sub Totals,” Data Mining and Knowledge Discovery Journal, Vol. 1, No. 1,
1997
[9] D. Chatziantoniou, K. Ross, “Querying Multiple Features in Relational Databases,”, Proc.
Of VLDB Conf., 1996
SQL/OLAP
ISO/IEC JTC1 SC32
27