slides - UCLA Computer Science

Download Report

Transcript slides - UCLA Computer Science

OLAP Functions
Order-Dependent Aggregates and Windows in SQL:
SQL:2003---same as SQL:1999
Copyright: Silberschatz, Korth and
Sudarshan
1
Extended Aggregation: OLAP Functions
 Rank, row_number and other functions based on explicit order
 Many new statistical functions: see, e.g., DB2 UDB's High-Function
Business Intelligence in e-business---Red Book
http://www.redbooks.ibm.com/redbooks/SG246546.html
 VAR(X): variance, STDDEV(X): its square root
 COV(X,Y): covariance is themeasure of the linear association between two
variables.
 CORR(X,Y): correlation is normalized covariance. +1 max correlation, 0
none, -1 inverse
 Regression functions: Y = a X + b
 REGR_SLOPE (X) -> a
 REGR_INTERCEPT (X) -> b
More: Chi-squared testing, sampling
2
Database System Concepts 4th Edition
22.2
©Silberschatz, Korth and Sudarshan
OLAP Functions
 Windows on aggregates:
 Physical windows: based on the number of rows in the window
 Logical windows: based on the value span of the window.
3
Database System Concepts 4th Edition
22.3
©Silberschatz, Korth and Sudarshan
Ranking
 Ranking is done in conjunction with an order by specification.
Suppose we are given a relation student-marks(Name, marks)
which stores the marks obtained by each student. The following query
gives the rank of each student.
Select Name,
rank ( ) (order by (marks) desc) as s-rank,
dense_dank ( ) (order by (marks) desc) as s-rank
from student-marks
order by s-rank
Name
Marks
Rank
DenseRank
Tom
8
1
1
Jeff
7
2
2
Mary
7
2
2
Alex
6
4
3
4
Database System Concepts 4th Edition
22.4
©Silberschatz, Korth and Sudarshan
Partition By
 Ranking can be done within partition of the data. The following query then
gives the rank of students within each section:
student-marks(student-id, marks);
student- section(student-id, section)
select student-id, section,
rank( ) over (partition by section order by marks desc)
as sec-rank
from student-marks, student-section
where student-marks.student-id = student-section.student-id
order by section, sec-rank
 dense_rank(): no holes after ties
 row_number(): provide row numbering given a specific partitioning and
ordering of rows.
5
Database System Concepts 4th Edition
22.5
©Silberschatz, Korth and Sudarshan
Ranking (Cont.)
 For a given constant n, the ranking the function ntile(n) takes the tuples
in each partition in the specified order, and divides them into n buckets
with equal numbers of tuples. For instance, we sort employees by
salary, and use ntile(3) to find which range (bottom third, middle third, or
top third) each employee is in, and compute the total salary earned by
employees in each range:
select threetile, sum(salary)
from (
select salary, ntile(3)
over (order by (salary) as threetile
from employee) as s
group by threetile
 SQL permits the user to specify where they should occur by using nulls
first or nulls last, for instance
select student-id,
rank ( ) over (order by marks desc nulls last) as s-rank
from student-marks
6
Database System Concepts 4th Edition
22.6
©Silberschatz, Korth and Sudarshan
Aggregates on Windows

An example of window query is that, given sales values for each date, calculates
for each date the average of the sales on that day, the previous day, and the next
day; such moving average queries are used to smooth out random variations.

In contrast to group by, the same tuple can exist in multiple windows. Suppose
we are given a relation transaction(account-number, date-time, value),
select account-number, date-time,
sum(value) over (partition by account-number
order by date-time
range unbounded preceding)
 This query returns a new sum for each new tuple—cumulative sum!
 An actual window can also be specified: e.g.
range 10 rows preceding
range between 10 rows preceding and 4 following
range 30 minutes preceding
range interval 30 minutes preceding and current row
7
Database System Concepts 4th Edition
22.7
©Silberschatz, Korth and Sudarshan