Database Management Systems

Download Report

Transcript Database Management Systems

University of Manitoba
Asper School of Business
3500 DBMS
Bob Travica
Business Analytics and
Decision Making
OLTP, OLAP & SAP
Chapter 9 & SAP Materials
Updated 2016
D
B
S
Y
S
T
E
M
S
OLTP vs. OLAP
Online Transaction Processing (OLTP) = relational
database systems
Online Analytical Processing (OLAP)
Category
Data storage
Indexes
Joins
Duplicated data
Updates
Queries
OLTP
3NF tables
Few
Many
Normalized,
limited duplication
Continuous,
small data sets
Specific
OLAP
Multidimensional cubes
Many
Minimal
De-normalized DB
In some intervals,
Large data sets
Ad hoc
2 of 20
D
B
S
Y
S
T
E
M
S
OLAP via Data Warehousing
MIS 3500
Predefined
reports
Interactive
data analysis
Operations’
data
Periodical
transfers
Online Transaction
Processing (OLTP):
Querying Databases
with 3NF tables
Flat files
Online Analytical
Processing (OLAP);
Data warehousing;
Data Mining.
Usually de-normalized data.
3 of 20
D
B
S
Y
S
T
E
M
S
OLTP & OLAP in Enterprise Systems
 Enterprise Systems (Enterprise Resource Planning Systems)
support both.
 Example: An SAP-based system can be a TPS, MIS and DSS
for the entire organization. DSS capability draws on data
warehousing & cubing.
 Process approach to organization with data flowing smoothly
end-to-end. Process link up horizontally (department-todepartment) and vertically (process-sub-process).
 Business process* is for the most part the system process.
 More...
4 of 20
D
B
S
Y
S
T
E
M
S
Date Warehousing Goals
Data warehouse (DW):
 Integrates data from different sources to get a larger picture of
business
 Yields multidimensional view of data by creating data cubes
 Allows for statistical analysis on large data sets (test
hypotheses on relationships between pieces of data)
 Allows for discovering new relationships by querying cubes or
by applying data mining software.
5 of 20
D
B
Extraction, Transformation, and Loading
• Preparations performed on data – ETL process
Transform/Transport
S
Y
S
T
E
M
S
Customers
Extract
Convert “Client” to
“Customer”
Load
Apply standard product
numbers
Convert currencies
Fix region codes
Transaction data
from diverse
systems.
Data warehouse:
All data must be
consistent.
6 of 20
D
B
S
Y
S
T
E
M
S
Three-Dimensional View of Data: Cube
• Created in a datawarehouse
P1
Days in year
P2
P3
P4
P5
Sales at
Location
Logic similar to
crosstab query
And pivot table.
7 of 20
D
B
S
Y
S
T
E
M
S
Data Hierarchy
Year
Levels
Quarter
Roll-up
To get higher-level totals
Month
Week
Drill-down
To get lower-level details
Day
8 of 20
D
B
Datawarehouse Tables: Star Design
Dimension
S
Y
S
T
E
M
S
Product
ProductID
Price
Design is:
- Hierarchical (dimension tables have no
direct association)
- De-normalized (fact table): Price &
Quantity inputted to Fact table;
Calculated
fact
Dimension
Customer
Location
CustomerID
LocationID
Detail
Detail
Dimension
Fact Table *
Sale
Sale
SaleDate
SaleDate
Quantity
Quantity
Discount
Discount
StoreID
Revenue=Price*Quantity
Inputted from Product and Sale
Revenue broken down by product,
sales location, and desired time period
(time column/s – day of year, or even
smaller; basis for rollup).
New keys usually used in the fact table (e.g., SaleTbl#-Row#).
9 of 20
D
B
Datawarehouse Tables: Snowflake Design
Product
S
Y
S
T
E
M
S
ItemID
Description
Price
Category
Design is:
- Network-like (dimension tables
can connect directly)
- Still partly normalized (SaleCustomer-City)
OLAPItems
MerchTblRow
SaleTblRow
Price
Quantity
Fact Table
Sale
SaleID
SaleDate
CustomerID
Discount
SalesTax
Dimension Tables
City
CityID
ZipCode
City
State
Customer
CustomerID
Phone
FirstName
LastName
Address
ZipCode
CityID
• Advantage: Design simplified in the Fact table part.
10 of 20
SAP Datawarehouse
D
B
Datawarehouse Cube
Details
S
Y
S
T
E
M
S
Can also be
Dimensions 
More on SAP Datawarehouse
11 of 20
D
B
S
Y
S
T
E
M
S
Multidimensional View of Data –
Precursors to DW: Excel Pivot Table
Dimension:
Last Name
Facts (Key Figures,
Measures)
Dimensions:
Quarter, Month
Quarter
Month
Quarter 1 Quarter 2 Quarter 3 Quarter 4 Grand Total
LastName EmployeeIDData
Carpenter
8 Sum of Animal
1,668.91
Sum of Merchandise
324.90
Eaton
6 Sum of Animal
522.37
Sum of Merchandise
30.60
Farris
7 Sum of Animal
5,043.36
Sum of Merchandise
826.92
Gibson
2 Sum of Animal
4,983.51
Sum of Merchandise
668.25
Hopkins
4 Sum of Animal
3,747.96
Sum of Merchandise
476.91
James
5 Sum of Animal
3,282.77
Sum of Merchandise
505.89
O'Connor
9 Sum of Animal
2,643.69
Sum of Merchandise
263.70
Reasoner
3 Sum of Animal
4,577.43
Sum of Merchandise
762.30
Reeves
1 Sum of Animal
1,120.93
Sum of Merchandise
263.88
Shields
10 Sum of Animal
1,008.76
Sum of Merchandise
62.10
Total Sum of Animal
28,599.69
Total Sum of Merchandise
4,185.45
606.97
78.30
426.39
99.00
341.85
54.90
1,059.70
188.10
1,549.83
238.50
1,194.88
252.90
2,373.08
693.45
180.91
83.70
625.74
89.10
372.65
121.50
437.88
99.00
510.12
55.80
589.68
116.80
7,591.11
1,624.05
162.15
22.50
2,840.72
569.50
7.20
128.70
562.50
107.10
796.47
306.00
2,556.10
450.90
128.41
7.20
150.11
99.00
2,500.24
396.90
6,701.03
1,495.80
2,709.47
630.90
1,426.72
192.60
6,899.53
1,321.02
9,089.44
1,357.65
5,443.90
858.51
6,243.84
1,397.34
3,334.72
403.20
8,293.09
1,365.10
1,120.93
263.88
1,170.91
84.60
45,732.55
7,874.80
Can place data in rows or columns.
By grouping months, can instantly get quarterly or monthly totals.
12 of 20
D
B
S
Y
S
T
E
M
S
Multidimensional View of Data: CUBE
Option in SQL 99
SELECT Category, Month, Sum, GROUPING (Category) AS Gc,
GROUPING (Month) AS Gm
FROM …
GROUP BY CUBE (Category, Month...)
Category
Bird
Bird
…
Bird
Bird
Cat
Cat
…
Cat
(null)
(null)
(null)
…
(null)
Month
Amount
Gc
Gm
1
2
135.00
45.00
0
0
0
0
(null)
(null)
1
2
32.00
607.50
396.00
113.85
0
1
0
0
0
0
0
0
(null)
1
2
3
1293.30
1358.8
1508.94
2362.68
1
0
0
0
0
1
1
1
(null)
8451.79
1
1
13 of 20
D
B
S
Y
S
T
E
M
S
GROUPING SETS: Hiding Details
SELECT Category, Month, Sum
FROM …
GROUP BY GROUPING SETS
( ROLLUP (Category),
ROLLUP (Month),
() )
Category Month
Bird
(null)
Cat
(null)
…
(null)
1
(null)
2
(null)
3
…
(null) (null)
Amount
607.50
1293.30
1358.8
1508.94
2362.68
8451.79
14 of 20
D
B
S
Y
S
T
E
M
S
SQL RANK Functions
SELECT Employee, SalesValue
RANK() OVER (ORDER BY SalesValue DESC) AS rank
DENSE_RANK() OVER (ORDER BY SalesValue DESC) AS dense
FROM Sales
ORDER BY SalesValue DESC, Employee;
Employee
SalesValue
rank
dense
Jones
18,000
1
1
Smith
16,000
2
2
Black
16,000
2
2
White
14,000
4
3
DENSE_RANK
does not skip
numbers
• Therefore, advances in SQL motivate DBMS vendors to support
OLAP and data warehousing.
15 of 20
D
B
S
Y
S
T
E
M
S
Broader Data Analysis with
Data Mining
 Goal: To discover unknown relationships in the data that can be
used to make better decisions.
 Exploratory analysis.
 A bottom-up approach that scans the data to find relationships
 Some statistical routines, but they are not sufficient
 Statistics relies on averages
 Sometimes the important data lies in more detailed pairs
 Supervised by developer vs. unsupervised (self-organizing
artificial neural networks)
16 of 20
D
B
Common Techniques
 1. Classification/Prediction
S
Y
S
T
E
M
S
 2. Association Rules/Market Basket Analysis
 3. Clustering
17 of 20
D
B
S
Y
S
T
E
M
S
1. Classification
(Prediction)
 Purpose: “Classify” things that are causes and those that are
effects.
 Examples
 Which borrowers/loans are most likely to be successful?
 Which customers are most likely to want a new item?
 Which companies are likely to file bankruptcy?
 Which workers are likely to quit in the next six months?
 Which startup companies are likely to succeed?
 Which tax returns are fraudulent?
18 of 20
D
B
S
Y
S
T
E
M
S
Classification Process




Clearly identify the outcome/dependent variable.
Identify potential variables that might affect the outcome.
Use sample data to test and validate the model.
Regression/correlation analysis, decision tables and trees,
etc.
Income Credit History Job Stability
Credit
Success
50000
Good
Good
Yes
75000
Mixed
Bad
No
19 of 20
D
B
2. Association/Market Basket
 Purpose: Determine what events or items go together/co-occur.
S
Y
S
T
E
M
S
 Examples:
 What items are customers likely to buy together?
(Business use: Consider putting the two together to
increase cross-selling.)
20 of 20
D
B
S
Y
S
T
E
M
S
Association Challenges
 If an item is rarely purchased, any other item bought with it
seems important. So combine items into categories.
 Some relationships are obvious.
 Burger and fries.
 Some relationships are puzzling/meaningless.
 Hardware store found that toilet rings sell well only when a new
store first opened. But what does it mean?
21 of 20
D
B
S
Y
S
T
E
M
S
3. Cluster Analysis
 Purpose: Determine groups of people or some entities.
 Examples
 Are there groups of customers? (If so, we could target
them; market segmentation)
 Do the locations for our stores have elements in common?
(If so, we can search for similar clusters for new locations.)
 Do employees have common characteristics?
(If so, we can hire similar, or dissimilar, people.)
Large intercluster
distance
Small intracluster distance
22 of 20