2.JiaoDaCube

Download Report

Transcript 2.JiaoDaCube

Cube Intro
Decision Making
Effective decision making


Goal: Choice that moves an organization closer
to an agreed-on set of goals in a timely manner
Needs supports
Goals – needs to be specific

Not to hit an iceberg – for Titanic or other boats
Measures – needs to be concrete

Away from any visible iceberg
Feedback – needs to be timely and accurate


We hit an iceberg, rather than saw one – not timely enough
The boat can sink, actually, again too late to find out
BI – Business Intelligence
Is the delivery of accurate, useful
information to the appropriate decision
makers within necessary timeframe to
support effective decision making
It is a superset of what can be provided by
data mining
SQL Server 2014 is a decent tool for this
and support both Cubes/Data warehouse
and Data mining
OLAP/Cube
Term coined in mid-1990’s when database
people needed a new research direction
Main goal: support ad-hoc but complex
querying performed by business analysts
Interactive process of creating, managing,
analyzing and reporting on data
Extends spreadsheet-like analysis to work
with huge amounts of data in a data
warehouse
Known vs. Unknown Questions
When using SQL statements or even
Cubes, we know the question and can
formulate the SQL statements (most of the
time) to dig out answers to the questions


Layout let answers – SQL
Data-led answers – the drill down in a cube
dimension (if NW sales figure is low, which
state is the main contributor?)
When we do not know what to ask, we use
data mining – book’s definition is less
accurate
What should be the sources
OLTP is the source


On line transaction Process
These are the things we have learned in database
classes
But OLTP, the direct data store is not suitable for
BI because



Not to fight for computing resources of OLTP systems
OLTP may not have all the history readily available
BI may have to use data from several OLTP systems
The solution is generally Data Mart
Data marts
Data marts are related sets of data that
are grouped together and separated out
from the main body of data.
It can be a repository of OLTP data or
portion of data from a data warehouse
Data in a Data Mart are not meant for
supporting real time operations
Data in a Data mart are generally have
gone through the data clearing process
More on OLAP
Place key performance indicators
(measures) into context (dimensions)



measures are pre-aggregated
data retrieval is significantly faster
modeled in a DW
The processed “cube” is made available
to business analysts who can browse the
data using a variety of tools, making ad
hoc interactive and analytical processing
OLAP versus OLTP
OLAP versus OLTP
Measures and dimensions
Measures: key performance indicators, such as
sales amount, that you would like to evaluate



typically numerical, including volume, sales, and costs
a rule of thumb: if a number makes (business/social)
sense when aggregated, then it is a measure
examples:
aggregate daily volume to month, quarter and year
aggregating telephone numbers would not make sense
therefore, telephone numbers are not measures
postcode: not a measure, but can be a dimension (?)
Totals in all the transactions would be a measure
Number of high school students by school, county, state should
be a measure
Measures and dimensions
Dimensions: categories of data analysis
typical dimensions include product, time,
region


a rule of thumb: when a report is requested to
be ordered "by“ something, that something is
usually a dimension
e.g. in sales report: view sales by month, by
region, so the two dimensions needed are
time and region
Star Schema
Dimensions and measures are physically
represented by a star schema (typically)


arrange the dimension tables around a
central fact table that contains the measures
a fact table contains a column for each
measure as well as a column for each
dimension
Example
Dimension tables



Market (Store_ID, City, Region)
Product (Product_ID, Name, Category, Price)
Time (Time_ID, Week, Month, Quarter, Year)
Fact table

Sales (Store_ID, Product_ID, Time_ID, Amount)
Notes



Amount should be the measure
Time_ID is preprocessed to show at week or month level
This can be modified to show Snowflakes
Example
supermarket sales



roll-up: summaries
over all cities, regions
slice the cube to select
sales only in day 3
dice the cube to select
sales only in week 2
(days 8-14), and group
by regions
The TWO
Multi-dimensional model

What we meant mostly when referring a cube
or a data warehouse
Tabular model


Complicated in SQL Server
May be accomplished easily with other tools
such as vertica
SSMS and SSDT
SSDT (SQL Server Data Tool), used to be
the BI studio, is really the Visual Studio
2013
Give a demo