Introduction to KDD for Tony's MI Course

Download Report

Transcript Introduction to KDD for Tony's MI Course

1
COMP 3503
Deductive Modeling with OLAP
with
Daniel L. Silver
2
Agenda
 What
is OLAP?
 OLAP, MOLAP and ROLAP
 OLAP Functionality
 Overview of IBM Cognos Insight
 OLAP Pros and Cons
3
What is OLAP?
4
On-Line Analytical Processing
OLAP
Term coined by E.F. Codd in a document
published in 1993 sponsored by Arbor
Software Corp (ESSBASE)
 In contrast to OLTP and traditional RDBMS
 Defined requirements for databases and tools
to implement decision support and business
intelligence systems.
 Has had a significant impact on the database
and business software market.

5
OLAP Definition

Online Analytical Processing = OLAP refers to
technology that allows users of multidimensional
databases to generate on-line descriptive or
comparative summaries ("views") of data and
other analytic queries.

OLAP facilities should be integrated into
enterprise-wide data base systems
•
•
allow analysts and managers to monitor the performance
of the business
e.g. –number of transactions / sales at different locations
by product class by time
Courtesy Anders Stjarne
6
Multidimensional Requirements
 Example:
Sales volume as a function of
product, time, and geography.
More than three dimensional
data cube is referred to as a
hypercube
Product
Dimensions: Product, Geography, Time
Measure: ‘Sales Volume’
Time
Courtesy Anders Stjarne
7
Deductive Modelling and Analysis
Comprehensive Sales Analysis
When?
Time
(1997)
Who?
Customers
(Channels)
What?
Product
(Type)
Where?
Location
(Region)
Result?
Indicator
(Revenue)
Combination 1
Quarter
Month
Type
Customer
Line
Brand
Number
Country
Branch
Sales Rep
Quantity
Cost
Margin
Type
Customer
Line
Brand
Number
Country
Branch
Sales Rep
Quantity
Cost
Margin
Combination 2
Quarter
Month
q
Courtesy Anders Stjarne
8
On-Line Analytical Processing
12 Rules of an OLAP Environment
by E.F. Codd







Multi-dimensional data-cubes or
hypercubes
Transparent access
Navigation aids
Consistent reporting
Client-sever based
Generic dimensionality
Efficient data storage





Multi-user support
Unrestricted crossdimensional operations
Intuitive data
manipulation
Flexible reporting
Unlimited levels of
aggregation
9
On-Line Analytical Processing
Strong connection to multi-dimensional
database (MDBMS) model  MOLAP
 Data-cubes are typically constructed off-line
due to time required to build indices
 Dimensions, values, and aggregations are
limited to that within data-cube
 On-line cube development has allowed
RDBMS vendors to survive as major players
in OLAP market  ROLAP

10
OLAP, MOLAP and ROLAP
11
OLAP Distributed Framework
OLAP functions are independent of:
• Front-end user interface
• Back-end data storage
Courtesy Anders Stjarne
12
MDBMS
 Relational
versus Dimensional Data
• http://www.youtube.com/watch?v=FjKaRU5V1
Rw
 ROLAP
= Representing dimensional
data with RDBMS
• Star Schema
o
http://www.dwreview.com/OLAP/Introduction_OLA
P.html
• More details:
o
o
http://www.youtube.com/watch?v=1Qdf5c_nmtw
http://www.ciobriefings.com/Publications/WhitePapers/Des
igningtheStarSchemaDatabase/tabid/101/Default.aspx
13
MOLAP vs. ROLAP
Multidimensional
•
•
•
•
•
•
•
difficulty handling sparcity
efficiently
direct representation of the data
“cube”
rapid drill down on summary
data
proprietary solutions
better performance response
does not scale well to handle
large amounts of detail
thin client, analytical
processing done on server
REF: White, “MOLAP vs ROLAP,” (B&A-15)
Courtesy Anders Stjarne
Relational
•
multidimensional view built on a
Relational DBMS
•
hampered by the limitations of SQL
•
handles sparcity automatically
•
stores summary and detail data equally
easily
•
easy to share common dimensions across
DWs
•
scales well using well-developed
relational technology
•
depends on efficient processing of STAR
joins and indexes
•
analytical processing done on the client
(or middle server)
14
OLAP Functionality
15
On-Line Analytical Processing
Deductive Modeling with OLAP
Model is developed within the users mind as
data is explored
 Verification or rejection is facilitated by
multi-dimensional functions which display
data numerically and graphically
 Best practices:

•
•
•
•
Determine suspected variable interaction
Verify/reject model through exploration
Drill-down to refine model
Maintain record of exploratory findings
16
On-Line Analytical Processing
Basic OLAP Functionality
Dimension selection - slice & dice
 Rotation - allows change in perspective
 Filtration -value range selection
 Hierarchies of aggregation levels
• drill-downs to lower levels
• roll-ups to higher levels
Tremendous tool for decision support and
executive information delivery and analysis

17
OLAP - Sample Operations

Roll up: summarize data
•

Roll down, drill down, drill through: go from
higher level summary to lower level summary or
detailed data
•

For a particular product category, find the detailed sales data
for each salesperson by date
Slice and dice: select and project
•

total sales volume last year by product category by region
Sales of beverages in the West over the last 6 months
Pivot or rotate: change visual dimensions
Courtesy Anders Stjarne
OLAP and Data Mining
 The
final results from OLAP exploration
can lead to inductive data mining
 Data Mining techniques can be applied
to the data views and summaries
generated by OLAP to provide more indepth and often more multidimensional
knowledge
 Data Mining techniques can be
considered analytic extension of OLAP
18
19
Multi-dimensional Cubes

A cube is a structure that stores data multi-dimensionally
and provides:
•
•

secure data access
fast retrieval of data.
Cubes can be distributed across a network or to
individual computers.
Time
Channels
Products
Customers
q
Sales Reps
Locations
20
Measures

The numeric (continuous) data that is collected
and stored by your organization.

The performance measures used to evaluate your
business.
Examples:
•
•
•
•
•
•
•
Revenue
Cost
Quantity sold
Units on-hand
Hours per Job
Number of calls
Defective units.
Basic
% #
Derived
Revenue - Cost = Profit Margin
q
21
Dimensions and Levels


Dimensions are a broad group of descriptive data
about the major aspects of your business.
Levels represent established hierarchy within
dimensions.
When?
What?
Where?
Date
Products
Locations
Years
Line
Region
Months
Type
Country
Days
Product
Branch
Dimensions
Levels
q
Courtesy Anders Stjarne
22
Levels and Categories
•A category is a data item that populates a level in a
dimension.
Dimension
Locations
Categories
Region
Europe
Country
United Kingdom
Levels
Branch
London, U.K.
Manchester, U.K.
q
Courtesy Anders Stjarne
23
Application Development Process
Sales Management Example
All Years
Years
Quarters
Months
2
8
24
National Sales
Force
All Products
State 4
City 16
Store
72
Technician
Business
Units
Product
Lines
158
Revenue
Units
Discounts
Quota
3
6
Brands
18
Products
125
Plan measures
and dimensions
Create the cube
Obtain the
required data
Develop the
MDBMS model
Explore the cube data
using Insight
q
Courtesy Anders Stjarne
24
Basic OLAP Operations
•
•
•
•
•
•
•
Selection (Filter) – within the range of a dimension
Scope – the range on a dimension
Slice – a two dimensional ‘page’ from the cube
Dice – chopping up along the dimensions
Drill down analysis - to the detail beneath summary data
Rollup/ Consolidate
Rotate (Pivot) – change dimension orientation
o
o
o
Swap rows and columns
Swap on or off
Change nesting order
• Reach Through – to the source data detail
• Calculations / Derivation formulas on the measured facts
o
o
Ratios, Rankings, etc.
E.g., NetSales = GrossSales – Cost; NetSales = GrossSales*(1 - Margin)
REFS: INMON, Building, Ch. 7, p. 243; White, “MOLAP vs ROLAP,” (B&A-15)
Courtesy Anders Stjarne
25
Advanced OLAP Operations

Trend analysis - over broad vistas of time
• handling time series data, time calculations

Key ratio indicator measurement and tracking

Comparisons - present to: past, plan, and others
• competitive market analysis

Problem monitoring - of variables within control
limits
Alerts and Event-Driven Agent Processing

Courtesy Anders Stjarne
26
OLAP Pros and Cons
27
On-Line Analytical Processing
Strengths of OLAP
Powerful visualization ability via GUI
 Fast, interactive response times
 Analysis of time series
 Deductive discovery of clusters/exceptions
 Many OLAP products available and
integrated to DB products

28
On-Line Analytical Processing
Weaknesses of OLAP
 Does
not handle continuous variables
 Does not automatically discover
patterns and models
 Generation of a complex hypercubes
require some training and experience
 Hypercube generation and update MOLAP Vs. ROLAP
29
On-Line Analytical Processing
Products and Suppliers

http://en.wikipedia.org/wiki/Comparison_
of_OLAP_Servers
30
Overview of
IBM Cognos Insight OLAP
Intro:http://www.youtube.com/watch?v=
ugczSGNVXlU
In
depth:http://www.youtube.com/watch
?v=bNw89HUHKEk
31
Tutorial
 IBM
Cognos Insight
32
THE END
[email protected]