Overview of Business Intelligence and Data Warehousing

Download Report

Transcript Overview of Business Intelligence and Data Warehousing

Overview of Business Intelligence
and OLAP
6/13/2005
M
D
Metadata Solutions
Dan McCreary
President
Dan McCreary & Associates
[email protected]
(952) 931-9198
Objective
• Get an overview of Business Intelligence
(BI) and Data Warehouse (DW) concepts
and terminology
• Understand how BI and DW technologies
help users make better informed decisions
• Understand the constraints of BI and DW
M
D
Copyright 2007 Dan McCreary & Associates
2 of N
BI History and Definition
• The term "BI" coined by Gartner Group in
the late 1980s
• A user-centered process that includes
–
–
–
–
M
accessing
exploring
analyzing data
developing insights and understanding
• Leads to improved and informed decision
making
D
Copyright 2007 Dan McCreary & Associates
3 of N
Overlapping Terminology
Statistical Analysis
Data Mining
Business
Semantics
Data Dictionary
Transaction
Processing
(OLTP)
Data
Warehousing
Pattern
Discovery
Business
Intelligence
Indexing
Aggregates
Data Storage
(RDBMS)
OLAP
Dimensional
Analysis
M
D
Copyright 2007 Dan McCreary & Associates
4 of N
Definition of Data Warehouse
A data warehouse is a copy of transaction
data specifically structured for query and
analysis.
Ralph Kimball
M
D
Copyright 2007 Dan McCreary & Associates
5 of N
Definition of Data Warehouse
Subject-Oriented, Integrated, Time-Variant,
Nonvolatile collection of data in support of
decision making.
Bill Inmon
M
D
Copyright 2007 Dan McCreary & Associates
6 of N
Broad vs Precise
Broad
All of our old data
Precise
An integrated computer system that provides easy
browseable access by subject to non-technical
users to all data records of an organization using
consistent data definitions over time and across
organizations.
M
D
Copyright 2007 Dan McCreary & Associates
7 of N
DW vs OLTP?
• OLAP uses a "subject oriented" dimensional data
model
• OLAP uses consistent data definitions across time
and organizations. These definitions are stored in
a metadata registry
• OLAP uses contains publishable data from
potentially multiple sources
• OLAP works with reporting tools that prevent a
user from ever having to learn data query
languages such as SQL and MDX
M
D
Copyright 2007 Dan McCreary & Associates
8 of N
Data Mining
• Data mining is the semi-automatic
discovery of patterns, associations,
anomalies, structures, and changes in large
data sets
M
D
Copyright 2007 Dan McCreary & Associates
9 of N
BI Focuses on Quantitative Data
Qualitative data
Unstructured documents
Semi-structured
data
Quantitative data
Relational data
• BI focuses on the analysis of quantitative
data:
- highly structured tabular data found in relational
databases
M
D
Copyright 2007 Dan McCreary & Associates
10 of N
The BI Iterative Process
Access
Data Warehouse
Analysis
BI Project
Management
Publishing, Change, Data
Gap Analysis, New Data
Gathered
Insights, Conclusions and
Findings
• The BI process in an on-going iterative process where the
structure of the data warehouse changes based on what
data is critical to an organizations business objectives.
M
D
Copyright 2007 Dan McCreary & Associates
11 of N
BI Evolution
Increasing Responsiveness
Monthly Green Bar Reports
•
•
•
•
Browseable
Graphical Interface
Shorten the time-to-report interval
Allow users to "browse" data sets interactively
Remove programmers with "backlogs" of reports
Users frequently waited days, weeks for months to get a custom report
created
M
D
Copyright 2007 Dan McCreary & Associates
12 of N
Dimensions of BI
Technical Sophistication
Required
Low
(analysts)
Degree of
End User Control
High
(programmers)
Few Dimensions
few parameters, few filters
M
D
Copyright 2007 Dan McCreary & Associates
Many Dimensions
many variables
13 of N
BI Platform
• Complete sets of tools for the creation,
deployment, support and maintenance of BI
applications.
• Data-rich applications with custom end-user
interfaces
• Organized around specific business
problems and with targeted analyses and
models.
M
D
Copyright 2007 Dan McCreary & Associates
14 of N
Multi-Dimensional Databases (MDDB)
• Constructed specifically to support the
analysis of quantitative data, along multiple
dimensions.
• OLAP technology enables users to organize
and view the data in a hierarchical fashion
in multiple hierarchies
M
D
Copyright 2007 Dan McCreary & Associates
15 of N
Two Types of Data
• Categorical Data
• Measurement Data
M
D
Copyright 2007 Dan McCreary & Associates
16 of N
Categorical Variables
• Whenever we decide to break the continuous
observable world into a predefined list of
categories when each category has a label we call
this a categorical value. These will then become
the "dimensions" of our cube.
"red"
"green"
"blue"
Note: NO OVERLAP!
Statisticians call this type of "categorical data" and it requires the categories to be non-overlapping.
M
D
Copyright 2007 Dan McCreary & Associates
17 of N
Measures
• Something that you can do math on.
+
- X
%
/
sum
average
M
D
Copyright 2007 Dan McCreary & Associates
18 of N
The Star Schema
Dim5
PK
Cat1
Cat2
Cat3
Dim4
Dim1
PK
PK
Cat1
Cat2
Cat3
Dim3
M
D
Facts
Primary Key
Foreign Key
Foreign Key
Foreign Key
Foreign Key
Foreign Key
Measure1
Measure1
Cat1
Cat2
Cat3
Dim2
PK
PK
Cat1
Cat1
Cat2
Cat2
Cat3
Copyright 2007 Dan McCreary & Associates
Cat3
19 of N
Measure
Category
Category
M
D
Categories are on the X and Y axis
Measures are the heights of the bars
Copyright 2007 Dan McCreary & Associates
20 of N
Data Aggregation
M
• Data is aggregated rather than detailed (as is
the source data)
• Data Aggregations are sometimes
precompiled in the database and sometimes
on the fly, to produce the desired views and
reports quickly (typically under 1 second)
• Cube designers can use more disk space to
create more aggregations if faster
performance is required
D
Copyright 2007 Dan McCreary & Associates
21 of N
Level
• A layer of "aggregation" within a single
dimension – categorization of properties
All Shapes
Shapes
With Curves
Levels
Circle
Heart
Shapes
Without Curves
Moon
Square Trapezoid
Star
Diamond
M
D
Copyright 2007 Dan McCreary & Associates
22 of N
The Time Dimension
• Most applications involve a time dimension,
so that data can be analyzed over time to
uncover trends.
• The use of the Time Dimension must be
created consistently across many cubes.
M
D
Copyright 2007 Dan McCreary & Associates
23 of N
Dimension Table for Organization
Organization Dimension Table
OrganizationID (PK)
SchoolYearCode
OrganizationType
OrganizationCountyCode
OrganizationCityCode
OrganizationTelephoneAreaCode
OrganizationZipCode
OrganizationRegionCode
OrganizationEconomicRegionCode
OrganizationLegSenateCode
OrganizationLegRepCode
OrganizationCoopCode
OrganizationRegionMgmtCode
OrganizationGradeRangeServed
M
D
Copyright 2007 Dan McCreary & Associates
24 of N
Conformed Dimensions
Facts
Primary Key
Foreign Key
Foreign Key
Foreign Key
Foreign Key
Foreign Key
Measure1
Measure1
Dim1
PK
Cat1
Cat2
Cat3
Facts
Primary Key
Foreign Key
Foreign Key
Foreign Key
Foreign Key
Foreign Key
Measure1
Measure1
• Allows a report to be created that pulls data from
two different data marts
• Requires that both data marts use a common
"conformed dimension"
M
D
Copyright 2007 Dan McCreary & Associates
25 of N
Sample of National Conformed
Dimensions
Process
Student Attendance
Student Assessment
District Financial Reporting
School and District Status
Teacher Licensing
School Food and Nutrition
Student Disciplinary Reporting
Student Safety Reporting
District Technology Planning
M
D
Copyright 2007 Dan McCreary & Associates
26 of N
Restaurant Metaphor
Cube
Cube
Cube
Cube
Cube
Cube
Kitchen
M
where sausages are made
D
Dining Room
where food and drink refresh the soul
Copyright 2007 Dan McCreary & Associates
27 of N
Thank You!
Please contact me for more information:
•
•
•
•
•
•
Metadata Management Services
Web Services
Service Oriented Architectures
Business Intelligence and Data Warehouse
Metadata Registries
Semantic Web
Dan McCreary, President
Dan McCreary & Associates
Metadata Strategy Development
[email protected]
(952) 931-9198
M
D
Copyright 2007 Dan McCreary & Associates
28 of N