Online Analytical Processing (OLAP)

Download Report

Transcript Online Analytical Processing (OLAP)

Online Analytical
Processing (OLAP)
Hweichao Lu
CS157B-02 Spring 2007
What is OLAP
 Basic idea: converting data into
information that decision makers need
 Concept to analyze data by multiple
dimension in a structure called data cube
History
 In 1993, E. F. Codd came up with the
term online analytical processing (OLAP)
and proposed 12 criteria to define an
OLAP database
 the term OLAP seems perfect to describe
databases designed to facilitate decision
making (analysis) in an organization
Purpose of OLAP
 To derive summarized information from
large volume database
 To generate automated reports for
human view
Why need OLAP over
Relational Database I
 Consistently fast response
 OLAP obtains a consistently fast
response is by prestoring calculated
values
Why need OLAP over
Relational Database II
 Metadata-based queries
 provide analysis functions that are
difficult or impossible to express in SQL
 SQL was developed primarily for
transaction systems, not for reporting
applications
Why need OLAP over
Relational Database III
 Spreadsheet-style formulas
 design the data structure with users in
mind.
 Spreadsheets are key components of
business management because they are
intuitive to create
Step I
1. identify multidimensional data

measure attribute
(measure some value, can be
aggregated upon)
 dimension attribute
(define the dimension and summary of
measure attribute)
(Cont.)
 Each dimension is typically expressed as
a “hierarchy”
 Hierarchy: Analyst is interested in
different level of detail of a dimension
Step II
2. Analyze multidimensional data into
cross-tabulation
row header: value for one attribute
column header: value for another attr.
individual cell: value aggregation
Step III
3. Visualize n-dimensional cube - data
cube
the word CUBE describe what in the
relational world would be the integration
of the fact table with dimension tables
Step IV
 After you design the cube, you will use
the cube's structure to build a relational
database (known as a star schema) to
house the data for the cube
Step V
 Once you load data into the relational
database, and then into the cube, you'll
be able to see how attributes, dimensions,
measures, and measure groups fit
together within a cube to create a
powerful analytical tool.
Star Schema
 Cubes are easily stored in relational databases,
using a denormalized data structure called the
star schema, developed by Ralph Kimball
 starts with a central fact table
 Each row in the central fact table contains
some combination of keys that makes it unique.
These keys are called dimensions.
Slicing & Dicing
 Additional Functionality that can be
thought of as viewing a slice of the data
cube, particularly when values for
multiple dimensions are fixed.
 Slicing/Dicing simply consists of selecting
specific values for these attributes, which
are then displayed on top of the cross-tab
Rollup & Drill-down
 OLAP permit users to view data at ay
desired level of granularity.
 Rollup: moving from finer-granularity data
to coarser granularity
 Drill-down: opposite to Rollup
OLAP Inplementation
 Multidimensional OLAP (MOLAP)
 Relational OLAP (ROLAP)
 Hybrid OLAP (HOLAP)
MOLAP
 The database is stored in a special, usually
proprietary, structure that is optimized for
multidimensional analysis.
 + : very fast query response time because data
is mostly pre-calculated
 -: practical limit on the size because the time
taken to calculate the database and the space
required to hold these pre-calculated values
ROLAP
 The database is a standard relational database
and the database model is a multidimensional
model, often referred to as a star or snowflake
model or schema.
 +: more scalable solution
 -: performance of the queries will be largely
governed by the complexity of the SQL and the
number and size of the tables being joined in
the query
HOLAP
 a hybrid of ROLAP and MOLAP
 can be thought of as a virtual database
whereby the higher levels of the
database are implemented as MOLAP
and the lower levels of the database as
ROLAP
DOLAP
 The previous terms are used to refer to
server based OLAP technologies
 DOLAP (Desktop OLAP)
 DOLAP enables users to quickly pull
together small cubes that run on their
desktops or laptops
Conclusion
 OLAP is a significant improvement over
query systems
 OLAP is an interactive system to show
different summaries of multidimensional
data by interactively selecting the
attributes in a multidimensional data cube
References
 IBM Redbooks. DB2 Cube Views: A Primer. Durham, NC,
USA: IBM, 2003. ebrary collections. San Jose State
University.
<http://site.ebrary.com/lib/sjsu/Doc?id=10113016&ppg=43>
 Jacobson, Reed, Microsoft® SQL Server™ 2005 Analysis
Services Step by Step. Microsoft Press.
 Berry, Michael J. A. Data Mining Techniques : For
Marketing, Sales, and Customer Relationship Management.
Hoboken, NJ, USA: John Wiley & Sons, Incorporated, 2004.
ebrary collections. San Jose State University.
<http://site.ebrary.com/lib/sjsu/Doc?id=10114278&ppg=522>
.