Loading a Fact Table - Zhangxi Lin
Download
Report
Transcript Loading a Fact Table - Zhangxi Lin
ISQS 3358, Data Management and Business Intelligence
Cubism – Measures and
Dimensions
Zhangxi Lin
Texas Tech University
1
Outline
Measures
Where we’ve been
Populating fact table
Types of dimensions
2
Structure and Components of Business
Intelligence
SSMS
SSIS
SSAS
SSRS
SAS
EG
SAS
EM
3
Snowflake Schema of the Data Mart
10
DimBatch
9
DimMachine
8
Manufacturingfact
3
2
1
DimProduct
5
DimMachineType
DimPlant
7
4
DimMaterial
DimCountry
6
DimProductSubType
DimProductType
4
Where we’ve been and where we are now
Exercise 1: Getting started
Exercise 2: Creating a data mart with SSMS
Exercise 3: Import tables from a database
Exercise 4: Populating dimensions of a data mart
Exercise 5: Loading fact tables
Exercise 6: Create and customize a cube
5
What we need to do with the half-done data
mart?
Populate DimBatch dimenstion table
Populate ManufacturingFact table
Build an OLAP cube (we already did this before)
Check measures
Check dimensions
6
LOADING FACT TABLES
7
Facts
Facts are measurements associated with a specific
business process.
Many facts can be derived from other facts, including
additive and semiadditive facts.
Non-additive facts can be avoided by calculating it
from additive facts.
Measures are clustered together in a group, called
measure group.
8
Types of measures
Three types
◦ Additive measures. Most facts are additive (calculative), such as
sum
◦ Semiadditive measures. The measures that can be added along
some dimensions, but not along others. For example, inventory
level can be added along product dimension but not time
dimension.
◦ Non-additive (such as max, average), or descriptive (e.g. factless
fact table).
Aggregate functions
◦ Additive: Sum
◦ Semiadditive: ByAccount, Count, FirstChild, FirstNonEmpty,
LastChild, LastNonEmpty, Max, Min
◦ Nonadditive: DistinctCount, None.
Measures and dimensions
Dimensions are used to aggregate
measures. Therefore, they must be
somehow related to measures
Granularity
◦ Important for the analysis
◦ There could be missing values in the fact table
EXERCISE 5: LOADING
A FACT TABLE
Exercise 5: Loading Fact Tables
Project name: Use the same project with a new Package
Package name: FactLoad.dtsx
Tasks
◦ Create Inventory Fact table
◦ Load Dim Batch
◦ Load Manufacturing Fact
Deliverable: email a screenshot of the “green”
outcome of the ETL project to
[email protected], with a subject title “ISQS
3358 EX5 - <lastname>”
12
Data Sources for Loading Fact
For loading DimBatch table and ManufacturingFact table
◦ BatchInfo.CSV in network folder ~\Downloaded
data\Batchinfor CSV File\
13
Control Flow for Loading Facts and the
Remaining Dimension
Note: to ease
debugging, you
may use separate
packages and test
them one by one,
instead of doing
everything in one
package
14
Flat File Connection
Data types
◦ BatchNumber, MachinNumber: four-byte
signed integer [DT_I4]
◦ ProductCode, NumberProduced,
NumberRejected: four-byte signed integer
[DT_I4]
◦ TimeStarted, TimeStopped: database
timestamp [DT_DBTimeStamp]
Only check BatchNumber as the input of Dim
Batch
All columns are needed for fact tables
15
Some Frequently Used Nodes
Load DimBatch Data Flow
Note: Because of duplication in the source file, we may insert
An Aggregate item after the Flat File Source item.
17
The Flat File Source
18
Sort Transformation
In the Aggregate item,
Define “Group-by”
BatchNumber.
In Derived column item,
Define BatchName
From BatchNumber
Use the expression
(DT_WSTR, 50)[BatchNumber]
To change the data type
Of BatchName.
19
Load Fact Data Flow
20
Derived Columns for the Fact table
21
Expressions for the Derived Columns
AcceptedProducts
◦ [NumberProduced] – [NumberRejected]
ElapsedTimeForManufacture
◦ DATEDIFF(“mi”, [TimeStarted],[TimeStopped])
DateOfManufacture
◦ (DT_DBTIMESTAMP)SUBSTRING((DT_WSTR,25)[TimeS
tarted],1,10)
This expression converts TimeStarted into a string and
selects the first ten characters of that string. This string is
then converted back into a date time, without the time
portion.
22
OLE DB Destination
For loading the fact
table
23
Debugging Results
Loading DimBatch
Loading ManufacturingFact
24
Exercise 6: Building an
OLAP Cube
25
Exercise 6: Building an OLAP Cube
This is a SSAS project
Project name: ISQS3358-002-EX6-2016-lastname
Tasks
◦ Add in new date items (year, quarter, and month) to two fact tables
◦ Create time dimension using Manufacturing Fact table
◦ Define calculated measures (Total Products, Percent Rejected)
◦ Define hierarchies of attributes in dimension tables
◦ Create a cube from the MaxMinManufacturing data mart with
hierarchical date dimension
Deliverable:
◦ Screenshots: dimension hierarchies, dimensions, relationships of facts
and dimensions, deployment result, format of measures, and browsing
results.
26
Three Steps to Create a Cube from Data
Sources
Defining data source
Defining data source view
◦ Add in three new columns of year, quarter, and month for the
two fact tables
Building a cube.
◦ Define a new dimension Dim Time from Manufacturing Fact
table
Customize the cube:
◦ Link two fact tables in a cube
◦ Define new primary key for Dim Time
◦ Define calculated measures
◦ Relate dimensions to measures
27
T-SQL Expressions for DS View Definition Manufacture
YearOfManufacture
CONVERT(char(4),YEAR(DateOfManufacture))
QuarterOfManufacture
CONVERT(char(4),YEAR(DateOfManufacture)) +
CASE
WHEN MONTH (DateOfManufacture) BETWEEN 1 AND 3 THEN 'Q1'
WHEN MONTH (DateOfManufacture) BETWEEN 4 AND 6 THEN 'Q2'
WHEN MONTH (DateOfManufacture) BETWEEN 7 AND 9 THEN 'Q3'
ELSE 'Q4'
END
MonthOfManufacture
CONVERT(char(4),YEAR(DateOfManufacture)) +
RIGHT('0'+CONVERT(varchar(2), MONTH(DateOfManufacture)),2)
28
Data Source View
We don’t do this
In this semester
New columns
29
Select Measures Page
Uncheck Manufacture
Fact Count
30
The finished cube
31
We don’t do this
table
Defining a format string
32
Calculated measures – made-up facts
The definition of calculated measure is stored in the
OLAP cube itself.
The actual values that result from a calculated measure
are not calculated, however, until a query containing that
calculated measure is executed. The results of that
calculation are then cached in the cube. The cached
value is then delivered to any subsequent users
requesting the same calculation.
The expressions of calculation are created using a
language known as Multidimensional Expression
Language (MDX) script. MDX is different from T-SQL. It
is a special language with features designed to handle
the advanced mathematics and formulas required by
OLAP analysis. This is not found in T-SQL.
33
35
Configure DIMENSIONS
in SQL Server
36
Types of Dimensions
Fact dimensions: the Dimensions created from attributes in a fact
table
Parent-Child dimensions: Built on a table containing a selfreferential relationship, such as a parent attribute.
Role playing dimensions: related to the same measure group
multiple times; each relationship represents a different role the
dimension play; for example, time dimension plays three different roles:
date of sale, data of shipment, and date of payment.
◦ To create a role playing dimension, add the dimension to the Dimension
Usage tab multiple times. Then create a relationship between each instance
of the dimension and the measure group.
Reference dimensions: Not related directly to the measure group
but to another regular dimension which in turn related to the measure
group
Data mining dimensions: the information discovered by data mining
Many-to-many dimensions: e.g. multiple ship to addresses
Slowly changing dimensions
37
Slowly changing dimensions
Type 1 SCD – no track
Type 2 SCD – tracking the entire history, adding four
attributes: SCD Original ID, SCD Start Date, SCD
End Date, SCD Status
Type 3 SCD – Similar to Type 2 SCD but only track
current state and the original state; two additional
attribute: SCD Start Date, SCD Initial Value
Add a time dimension (a fact dimension)
Rename time dimension
Date Hierarchy
Material Hierarchy & Plant Hierarchy
Product Hierarchy
Relating Dimensions in the Cube