DATC01 - Business Intelligence For The Relational Expert

Download Report

Transcript DATC01 - Business Intelligence For The Relational Expert

DATC01
Business Intelligence For
The Relational Expert
Len Wyatt And Dave Wickert
[email protected]
[email protected]
Program Managers
Microsoft Corporation
Overview
BI Users and Applications
BI Technology Overview
…And a little terminology
Dimensional Modeling
Intro to ETL and Data Transformation Svcs.
Intro to OLAP and Analysis Services
Intro to Reporting Services
Intro to Client Tool Options
Key BI Challenges
Getting more information
BI Users
Business users
Don’t want to formulate queries
Clear, graphical output
Need model that matches business
concepts
Usage models
Reports (in many forms!)
Interactive exploration
Portals
Multidimensional Analysis
Cube concept
Edges are
dimensions
Groceries
Electronics
Product
Clothing
North
South
Garden
East
West
Automotive
Q1
Q2
Q3
Time
Q4
Geog
Dimension Hierarchies
Time dimension
“All” level g
All
Time
Year level g
2003
Quarter level g
Month level g
Day level g
Jan
2004
Q1
Q2
Q3
Q4
Q1
Q2
Q3
Q4
Feb
Mar
Oct
Nov
Dec
Jul
Aug
Sep
1
2
3
4
5
6
7
Product dimension levels:
Group, Subgroup, Family, Brand, Item, UPC
…
Business Analysis with
Drilldown, Slice And Pivot
OLTP Versus Business Intelligence
Who asks what?
OLTP Questions
Analysis Questions
When did that order
What factors affect order
ship?
processing time?
How many units are in
How did each product
inventory?
line (or product)
contribute to profit last
Does this customer
quarter?
have
unpaid bills?
Which products have the
lowest Gross Margin?
Are any of customer
X’s line items on
What is the value of items
backorder?
on backorder, and is it
trending up or down
over time?
What Is A DW Made From?
Data Marts
and cubes
Source
Systems
Relational
Data
Store
Reports
Clients
DW = ETL + Relational Store + OLAP + Reporting + Clients
+ Metadata ?
+ Data Marts ?
+ Data Mining ?
Compare and contrast…
“Relational BI” and “OLAP”
“Data warehouse” and “Data mart”
Subject orientation…
Warehouse == Multi-subject
Mart == Single-subject
Staging area
Warehouse: Staging area (back room)
Most complete, stable data
Mart: Supports user queries
Analysis Services is more than a data mart
Query engine for dimensional data
Platform for analytical applications
Microsoft BI Product Stack
Solution Accelerators
Excel
OWC
Data Analyzer
SharePoint
Portal
Server
SQL Server
Relational Engine and DTS (ETL)
Reporting Services
Analysis Services
OLAP and Data Mining
Management
Tools
Dev Tools
Visual Studio .Net
SSABI, Business Scorecards
and Excel Reporting
Open, integrated, and based on
industry standards
Dimensional Modeling
As implemented in a RDBMS
Data is presented in a standard, intuitive
framework that allows for high-performance
access
The measurement data is organized in a
single table with a multipart key, called the
fact table, and a set of smaller tables called
dimension tables
The other non-FK fields in the fact table are
numeric, additive measures, e.g., sales $,
counts, etc.
Dimensions are then mostly all descriptive,
text fields
Star And Snowflake Schemas
Star
Snowflake
• Above the line, FKs
• Below are measures
Star And
Snowflake Schemas
Star = denormalized hierarchies into a
single dimension table
Snowflake = normalized
(FK constraint) tables
Structures are always one-to-many up
the hierarchy – they can be regular,
ragged or parent-child, but always
one-to-many
Facilitates rollups; drill down, etc.
Why Use A Dimensional Model?
Clear, predictable model which allows
reporting and query tools to make
assumptions around
How data is presented, grouped, and
navigated (i.e., slice ‘n pivot)
Browse dimensions (and hierarchies)
independent of numeric data
Users are not locked into one data access
route – drilldown and pivot
Well-understood semantics for
high-performance, i.e., aggregation
Multidimensional Analysis
with Drilldown, Slice And
Pivot
Slice ‘n Dice Analysis
The underlying
MDX . . .
Slice ‘n Dice Relationally
Equivalent
TSQL if
done
relationally…
Slice ‘n Dice Relationally
To implement declarative MDX in a
relational system becomes:
SUMs over GROUP BYs
Exponential number of
possible combinations
Should be materialized
for performance
Thus we need a specialized
runtime engine . . .
Data Transformation Services
Intro to ETL
ETL = Extract, Transform, Load
Moving data from production systems
to DW
Checking data integrity
Assigning surrogate key values
Collecting data from disparate systems
Reorganizing data
Data Transformation Services
Data Transformation Services
Key Uses
Populating Data Warehouses from production systems
Grocery store chain uploads all store transactions daily, and
summarizes in a data warehouse used by HQ and store
managers
Collecting data from disparate systems
Airport security system gets a regular feed of employee lists
from airline HR systems
Reorganizing data
Web server logs are filtered, surrogate keys are assigned, and
data is aggregated for later analysis
Synthesizing data
Financial positions are valued each night, with suspect records
spooled to a special location for troubleshooting the next day
Analysis Services
Data organized as a dimensional model,
e.g. cubes, dimensions, etc.
Direct support for slice ‘n dice, drilldown
and pivot
Aggregates are pre-calculated for fast query response
DBAs can control what aggregates are calculated
(trade-off of ‘processing time’ verses ‘query time’)
Fact table is indexed for fast retrieval, if needed
Built-in support for calculated measures,
e.g. Profit = ([Sales] – [Cost])
Built-in statistical functions, e.g. std dev, trends,
period comparisons, top count, etc.
Standard query interface allows open client tool
market
Analysis Services
Key Idea: Aggregations
Subtotals at a certain level from every dimension
Highest Level Aggregation
Customers
Product
All Customers
Country
State
City
Name
All Products
Category
Intermediate Aggregation
Brand
countryCode
productID
Units Sold
Item
Can
sd452
9456
SKU
US
yu678
4623
Customer
Product
Units Sold
Sales
All
All
347814123
$345,212,301.30
Sales
$23,914.30
$57,931.45
…
Facts
custID
SKU
Units Sold
Sales
345-23
135123
2
$45.67
563-01
451236
34
$67.32
…
Analysis Services
Contrasting OLAP to relational
Analysis
Services
Aggregation
Designing
aggregations
Optimizing
queries
Query
language
Intrinsic to AS
Relational DW
Materialized view with
GROUP BY clauses
Wizards for simple
Analyze which aggs
design process
best support query
pattern
Run-time query
Query Analyzer +
engine selects
aggregate navigator
nearest aggregations (or very sophisticated
query tools)
MDX
SQL
About MDX…
You need to learn MDX to use AS
The query language
For expressing calculations and security
It looks a little like SQL, but don’t be fooled!
MDX
select
{[Measures].[Store Sales]} on columns,
{[Time].[1997].Children} on rows
from Sales
SQL
select time.quarter, sum(fact.store_sales)
from sales_fact_1997 fact,
time_by_day time
where fact.time_id = time.time_id and
time.the_year = '1997'
group by time.quarter
Reporting Services
Multiple report delivery targets
Common formats: HTML, PDF, .XLS, etc.
Report definition is separated from rendering: can
re-render without re-running report
Scheduling / Subscriptions / Snapshots
Parameterized reports, linked reports,
sub-reports, graphics, charts, bursting
Extensible and embeddable
Report Rendering
HTML
PDF
Excel
Report Authoring
Report developers can create reports to be published to the Report
Server using Microsoft or 3rd party design tools that support the
Reporting Services XML report definition format.
Authoring
Management
Execution & Delivery
End
Users
Report
Definition
Managed
Report
Generated
Report
Delivery
Targets
Report Management
Report definitions, folders, and resources are published and
managed in a SQL Server database. Reporting Services provides
Web Service APIs and graphical tools for report management.
Authoring
Management
Execution & Delivery
End
Users
Report
Definition
Managed
Report
Generated
Report
Delivery
Targets
Report Execution
And Delivery
Execute on-demand or via schedule
Queries are executed and the result sets are merged with the report layout
On-demand “pull” or server-based “push” delivery
View reports from the included web-based front-end or subscribe to
reports for delivery
Authoring
Management
Execution & Delivery
End
Users
Report
Definition
Managed
Report
Generated
Report
Delivery
Targets
Client Tool Options
Reporting Renderers
HTML, PDF, Excel, etc.
Exploration
MS Excel, OWC, MS Excel Accelerator, MS
MapPoint, MS Data Analyzer
Proclarity, Panorama, and other 3rd party OLAP
query tools
Portal / KPI
MS SharePoint Portal Server, MS Balanced
Scorecard Accelerator, and other web-based
tools
Exploration Tool
Proclarity (as an example)
Portal / KPI
SharePoint Portal Server
Office Solution Accelerator for Business Scorecards (beta)
Geospatial Exploration
MapPoint
Tie BI data to map
display
Connect directly to
OLAP data from
MapPoint (via add-in)
http://msdn.microsoft.com/downloads/sample.asp?url=/MSDNFILES/027/001/871/msdncompositedoc.xml
Client Selection Criteria
Web deployment?
Analysis Services functionality used?
Existing BI infrastructure?
Users?
Executives, Analysts, Management, Front
line
Visualization Needs
Data Mining Needs
Reporting Needs
Cost
What We’ve Covered…
Data Marts
and cubes
Source
Systems
Relational
Data
Store
Reports
Clients
Key BI Challenges: Data
Changing data
“Changing dimensions” issue
Surrogate keys
Rewriting history versus maintaining history
Restatements
Try not to alter facts; Use journal approach
What if you have to re-do a month of data?
Data lineage
Where does this number come from?
Data cleansing
Data normalization
Across various source systems
Surrogate keys help here too
Key BI Challenges: Scale
Often there are large data volumes in a DW
Must manage multiple technologies at scale!
Be aware of calculation complexity at scale
 Watch out for these in any large system; Might
work in the POC but not at scale
Complex calculations (“wide queries”)
Median calculations are simple example
Distinct counts
Very large dimensions
CrossJoins on large dims
Airline origins and destinations, for example
Summary
What BI is
Why it is important
What infrastructure you will need and
its technology components
Key challenges
More Information
Technical:
“The Data Warehouse Toolkit” by Ralph Kimball, and
“The Data Warehouse Lifecycle Toolkit”
http://www.ralphkimball.com/html/books.html
Business:
“Business Intelligence” by Elizabeth Vitt, et. al.
http://www.microsoft.com/MSPress/books/5735.asp
SQL Server 2000 Books Online is a great resource
White papers, tools, etc. on Microsoft BI technologies
http://www.microsoft.com/sql/evaluation/bi/
Please fill out a session evaluation on CommNet
Q1: Overall satisfaction with the session
Q2: Usefulness of the information
Q3: Presenter’s knowledge of the subject
Q4: Presenter’s presentation skills
Q5: Effectiveness of the presentation
© 2004 Microsoft Corporation. All rights reserved.
This presentation is for informational purposes only. Microsoft makes no warranties, express or implied, in this summary.