Transcript item-name

SQL:1999 Advanced Querying
 Decision-Support Systems
 Data Warehousing
 Data Analysis and OLAP
This also apply to SQL: 2003 that is just a minor revision of
SQL:1999
1
Database System Concepts 4th Edition
22.1
©Silberschatz, Korth and Sudarshan
Decision Support Systems
 Decision-Support systems are used to make business
decisions often based on data collected by on-line transactionprocessing systems.
 Examples of business decisions:
 what items to stock?
 What insurance premium to change?
 Who to send advertisements to?
 Examples of data used for making decisions

Retail sales transaction details

Customer profiles (income, age, sex, etc.)
2
Database System Concepts 4th Edition
22.2
©Silberschatz, Korth and Sudarshan
Decision-Support Systems: Overview
 A data warehouse archives information gathered from multiple
sources, and stores it under a unified schema, at a single site.
 Important for large businesses which generate data from multiple
divisions, possibly at multiple sites
 Data may also be purchased externally
 Data analysis tasks are simplified by specialized tools and SQL
extensions
 Example tasks
 For each product category and each region, what were the total sales
in the last quarter and how do they compare with the same quarter
last year
 As above, for each product category and each customer category
 Statistical analysis packages (e.g., : S++) can be interfaced with
databases
 Partial support through OLAP Functions
 Data mining seeks to discover knowledge automatically in the form
of statistical rules and patterns from Large databases.
3
Database System Concepts 4th Edition
22.3
©Silberschatz, Korth and Sudarshan
Data Warehousing
 A data warehouse is a repository of information gathered
from multiple sources.
4
Database System Concepts 4th Edition
22.4
©Silberschatz, Korth and Sudarshan
Data Warehousing (Cont.)
 Provides a single consolidated interface to data
 Data stored for an extended period, providing access to
historical data
 Data/updates are periodically downloaded form online
transaction processing (OLTP) systems.
 Typically, download happens each night.
 Data may not be completely up-to-date, but is recent enough for
analysis.
 Running large queries at the warehouse ensures that OLTP
systems are not affected by the decision-support workload.
5
Database System Concepts 4th Edition
22.5
©Silberschatz, Korth and Sudarshan
Issues in Building a Warehouse
 When and how to gather data.
 Source driven: data source initiates data transfer
 Destination driven: warehouse initiates data transfer
 What schema to use.
 Schema integration
 Cleaning and conversion of incoming data
 What data to summarize.
 Raw data may be too large to store on-line
 Aggregate values (totals/subtotals) often suffice
 Queries on raw data can often be transformed by query optimizer to use
aggregate values
 How to propagate updates.
 Date at warehouse is a view on source data
 Efficient view maintenance techniques required
6
Database System Concepts 4th Edition
22.6
©Silberschatz, Korth and Sudarshan
Data-Warehouse Architecture
7
Database System Concepts 4th Edition
22.7
©Silberschatz, Korth and Sudarshan
Star Schema For A Data Warehouse
8
Database System Concepts 4th Edition
22.8
©Silberschatz, Korth and Sudarshan
Online Analytical Processing
 The operation of changing the dimensions used in a cross-tab is
called pivoting.
 An OLAP system provides other functionality as well. For
instance, the analyst may wish to see a cross-tab on item-name
and color for a fixed value of size, for example, large, instead of
the sum across all sizes. Such an operation is referred to as
slicing. The operation is sometimes called dicing, particularly
when values for multiple dimensions are fixed.
 The operation of moving from finer-granularity data to a coarser
granularity is called a rollup.
 The opposite operation - that of moving from coarser-granularity
data to finer-granularity data – is called a drill down.
9
Database System Concepts 4th Edition
22.9
©Silberschatz, Korth and Sudarshan
Relational Representation of the Data in
Figure 22.1
10
Database System Concepts 4th Edition
22.10
©Silberschatz, Korth and Sudarshan
Cross Tabulation of sales by item-name
and color
 The table above is an example of a cross-tabulation(or cross-
tab) also referred to as a pivot-table. In general, a cross-table is
a table where values for one attribute form the row headers,
values for another attribute form the column headers, and the
values in an individual cell are derived as follows.
 A cross tab with summary rows/columns can be represented by
introducing a special value all to represent subtotals.
Database System Concepts 4th Edition
22.11
11
©Silberschatz, Korth and Sudarshan
Hierarchies on Dimensions
12
Database System Concepts 4th Edition
22.12
©Silberschatz, Korth and Sudarshan
Cross Tabulation of sales With Hierarchy
on item-name
13
Database System Concepts 4th Edition
22.13
©Silberschatz, Korth and Sudarshan
Three-Dimensional Data Cube
14
Database System Concepts 4th Edition
22.14
©Silberschatz, Korth and Sudarshan
OLAP Implementation
 The earliest OLAP systems used multidimensional arrays in
memory to store data cubes, and are referred to as
mutidimensional OLAP (MOLAP) systems.
 Hybrid systems, which store some summaries in memory and
store the base data and other summaries in a relational
database, are called hybrid OLAP (HOLAP) systems.
15
Database System Concepts 4th Edition
22.15
©Silberschatz, Korth and Sudarshan
Data Analysis (Cont.)
 Rollup: Moving from finer-granularity data to a coarser
granularity by means of aggregation.
 Drill down: Moving from coarser-granularity data finer-
granularity data.
 Proposed extensions to SQL, such as the cube operation
help to support generation of summary data
 The following query generates the previous table.
select color, size, sum (number)
from sales
groupby color, size with cube
16
Database System Concepts 4th Edition
22.16
©Silberschatz, Korth and Sudarshan
Data Analysis (Cont.)
 Figure shows the combinations of dimensions size, color, price
 In general computing cube operation with n groupby columns gives
2nd different groupby combinations.
17
Database System Concepts 4th Edition
22.17
©Silberschatz, Korth and Sudarshan
SQL:1999 Extended GroupBy
 also supports generalizations of the group by constructs, using the cube and
rollup constructs. A representative use of the cube construct is;
select item-name, color, size, sum(number)
from sales
group by cube(item-name, color, size)
 This query computes the union of eight different groupings of the
sales relation:
{ (item-name, color, size), (item-name, color), (item-name, size),
(color, size), (item-name), (color), (size), ( ) }
Where ( ) denotes an empty group by list.
 For each grouping, the result contains the null value for attributes not
present in the grouping. For instance, with occurrences of all replaced by
null, can be computed by the query
select item-name, color, sum(number)
from sales
group by cube(item-name, color)
18
Database System Concepts 4th Edition
22.18
©Silberschatz, Korth and Sudarshan
Extended Group By (Cont.)
 A representative rollup construct is
select item-name, color, size, sum(number)
from sales
group by rollup(item-name, color, size)
 Here only four grouping are generated:
{ (item-name, color, size), (item-name, color), (item-name), ( ) }
 Rollup can be used to generate aggregates at multiple levels of a
hierarchy on a column. For instance, we have a table itemcategory(itemname, category) giving the category of each item. Then the query
select category, item-name, sum(number)
from sales, category
where sales.item-name = itemcategory.item-name
group by rollup(category, item-name)
would give a hierarchical summary by item-name and by category.
19
Database System Concepts 4th Edition
22.19
©Silberschatz, Korth and Sudarshan
Extended Group By (Cont.)
 Multiple rollups and cubes can be used in a single group by clause.For
instances, the following query
select item-name, color, size, sum(number)
from sales
group by rollup(item-name), rollup(color, size)
generates the groupings
{ (item-name, color, size), (item-name, color), (item-name), (color, size),
(color), ( ) }
 The function grouping can be applied on an attribute; it returns 1 if the value
is a null value representing all, and returns 0 in all other cases. Consider the
following query:
select item-name, color, size, sum(number),
grouping(item-name) as item-name-flag,
grouping(color) as color-flag,
grouping(size) as size-flag,
from sales
group by cube(item-name, color, size)
20
Database System Concepts 4th Edition
22.20
©Silberschatz, Korth and Sudarshan
Extended Aggregation: OLAP Functions
 Rank, row_number and other functions based on explicit order
 Many new statistical functions: see, e.g., DB2 UDB's High-Function
Business Intelligence in e-business---Red Book
http://www.redbooks.ibm.com/redbooks/SG246546.html
 STDDEV, etc.
 CORRELATION,
 COVARIANCE.
 Regression functions: Y = a X + b
 REGR_SLOPE -> a
 REGR_INTERCEPT -> b
 Windows on aggregates:
 Physical windows: based on the number of rows in the window
 Logical windows: based on the value span of the window.
21
Database System Concepts 4th Edition
22.21
©Silberschatz, Korth and Sudarshan
Ranking
 Ranking is done in conjunction with an order by specification. Suppose we
are given a relation student-marks(student-id, marks) which stores the
marks obtained by each student. The following query gives the rank of each
student.
select student-id,
rank( ) over (order by (marks) desc) as s-rank
from student-marks
 An extra order by clause is needed to get them in sorted order, as shown
below.
select student-id,
rank ( ) (order by (marks) desc) as s-rank
from student-marks
order by s-rank
22
Database System Concepts 4th Edition
22.22
©Silberschatz, Korth and Sudarshan
Partition By
 Ranking can be done within partition of the data. The following query then
gives the rank of students within each section:
student-marks(student-id, marks);
student- section(student-id, section)
select student-id, section,
rank( ) over (partition by section order by marks desc)
as sec-rank
from student-marks, student-section
where student-marks.student-id = student-section.student-id
order by section, sec-rank
 dense_rank(): no holes after ties
 row_number(): provide row numbering given a specific partitioning and
ordering of rows.
23
Database System Concepts 4th Edition
22.23
©Silberschatz, Korth and Sudarshan
Ranking (Cont.)
 For a given constant n, the ranking the function ntile(n) takes the tuples
in each partition in the specified order, and divides them into n buckets
with qual numbers of tuples. For instance, we an sort employees by
salary, and use ntile(3) to find which range (bottom third, middle third, or
top third) each employee is in, and compute the total salary earned by
employees in each range:
select threetile, sum(salary)
from (
select salary, ntile(3)
over (order by (salary) as threetile
from employee) as s
group by threetile
 SQL:1999 permits the user to specify where they should occur by using
nulls first or nulls last, for instance
select student-id,
rank ( ) over (order by marks desc nulls last) as s-rank
from student-marks
24
Database System Concepts 4th Edition
22.24
©Silberschatz, Korth and Sudarshan
Aggregates on Windows

An example of window query is that, given sales values for each date, calculates
for each date the average of the sales on that day, the previous day, and the next
day; such moving average queries are used to smooth out random variations.

In contrast to group by, the same tuple can exist in multiple windows. Suppose
we are given a relation transaction(account-number, date-time, value),
select account-number, date-time,
sum(value) over (partition by account-number
order by date-time
range unbounded preceding)
 This query returns a new sum for each new tuple—cumulative sum!
 An actual window can also be specified: e.g.
range 10 rows preceding
range 30 minutes preceding
25
Database System Concepts 4th Edition
22.25
©Silberschatz, Korth and Sudarshan