Data Warehousing - 2 - San Francisco State University

Download Report

Transcript Data Warehousing - 2 - San Francisco State University

Data Warehousing - 2
ISYS 650
Data Warehouse Design
- Star Schema • Dimension tables
– contain descriptions about the subjects of the
business such as customers, employees, locations,
products, time periods, etc.
• Fact table
– contain detailed business data with links to
dimension tables.
Star schema example
Fact table provides statistics for sales
broken down by product, period and
store dimensions
Dimension tables contain descriptions about the subjects of the business
Note: What is the key of the fact table?
Star schema with sample data
On-Line Analytical Processing (OLAP) Tools
• The use of a set of graphical tools that provides users
with multidimensional views of their data and allows
them to analyze the data using simple windowing
techniques
• OLAP Operations
–
–
–
–
Cube slicing–come up with 2-D view of data
Drill-down–going from summary to more detailed views
Roll-up – the opposite direction of drill-down
Reaggregation – rearrange the order of dimensions
Slicing a data cube
Summary report
Example of drill-down
Starting with summary
data, users can obtain
details for particular
cells
Drill-down with
color added
Excel’s Pivot Table
• Insert/Pivot Table or Pivot Chart
– Drill down, rollup and reaggregation
– Pivot: change the dimensional orientation of a
report or an ad hoc query-page display
– Filter
• Pivot Chart
– Filter
– Drilldown, rollup, reaggregation
Data Warehouse Lifecycle
• Requirement gathering
– Determine the reports that DW is supposed to support.
• Identify data sources and data modeling
– based on user requirements
• Extract data and populate the staging area with the
data extracted from transactional sources.
• Build and populate a dimensional database.
• Build Extraction Transformation and Loading (ETL)
routines to populate the dimensional database
regularly.
• Build reports and analytical views
• Maintain the warehouse by adding/changing
supported features and reports
Example:
Transaction Database
CID
Cname
City
OID
ODate
Rating
SalesPerson
Customer
1
Has
M
Order
M
Qty
Has
M
Product
Price
PID
Pname
Analyze Sales Data
Detailed Business Data
• Total sales:
– by product:
• Qty*Price of each detail line
• Sum (Qty*Price)
• Detailed business data: qty*price
• Total quantity sold:
– By product:
• Sum(Qty)
• Detailed business data: Qty
Dimensions for Data Analysis:
Factors relevant to the business data
• Analyze sales by Product
• Analyze sales related to Customer:
– Location: Sales by City
– Customer type: Sales by Rating
• Analyze sales related to Time:
– Quarterly, monthly, yearly Sales
• Analyze sales related to Employee:
– Sales by SalesPerson
Data Warehouse Design
- Star Schema • Dimension tables
– contain descriptions about the subjects of the
business such as customers, employees, locations,
products, time periods, etc.
• Fact table
– contain detailed business data with links to
dimension tables.
Star Schema
Location
Dimension
LocationCode
State
City
Can group by State, City
FactTable
LocationCode
PeriodCode
Rating
PID
Qty
Amount
Product
Dimension
PID
Pname
Category
CustomerRating
Dimension
Rating
Description
Period
Dimension
PeriodCode
Year
Quarter
Define Location Dimension
• Location:
– In the transaction database: City
– In the data warehouse we define Location to be
State, City
• San Francisco -> California, San Francisco
• Los Angeles -> California, Los Angeles
– Define Location Code:
• California, San Francisco -> L1
• California, Los Angeles -> L2
Define Period Dimension
• Period:
– In the transaction database: Odate
– In the data warehouse we define Period to be:
Year, Quarter
• Odate: 11/2/2003 -> 2003, 4
• Odate: 2/28/2003 -> 2003, 1
– Define Period Code:
• 2003, 4 -> 20034
• 2003, 1 -> 20031
The ETL Process
• Capture/Extract
• Transform
– Scrub(data cleansing),derive
– Example:
• City -> LocationCode, State, City
• OrderDate -> PeriodCode, Year, Quarter
• Load and Index
From SalesDB to MyDataWarehouse
• Extract data from SalesDB:
– Create query to get the fact data
• FactData
– Download to MyDataWareHouse
• Transform:
– Transform City to Location
– Transform Odate to Period
• Query FactDataScrubing
• Load data to FactTable
Performing Analysis
• Analyze sales:
– by Location
– By Location and Customer Type
– By Location and Period
– By Period and Product
• Pivot Table:
– Drill down, roll up, reaggregation
HR Database
• Historical data:
– Job_History
A record in this table keep track the starting date and
ending date of an employee working on a job at a
department.
We may study:
• Average days an employee stays in assigned
jobs.
• Average days employees stay in a specific
job_id.
• Any difference among departments in how
long employees stay in job.
• Will the starting year affect how long
employees stay in job?
• Basic measurement:
– DaysOnJob: End_Date – Start_Date
Star Schema
City
Dimension
City
Country_Name
City
Dimension
City
Country_Name
FactTable
Empliyee_ID
SartedYear
Job_ID
Department_ID
City
DayOnJob
Department
Dimension
Department_ID
Department_Name
Employee
Dimension
Empliyee_ID
FullName
Email
StartYear
Dimension
StartedYear
Define Dimensions
• Employee dimension:
– Employee_ID, FullName, Email
• FullName = First_name || ‘ ‘ || Last_Name
• Job dimension:
– Job_ID, Job_Title
• City dimension:
– City, Country_Name
• Join Locations and Countries
• Department dimension:
– Department_ID, Department_Name
• StartYear dimension
– StartedYear
• extract(year from start_date)
Create DWHR Using Access
• Each dimension is defined as a view in HR
database.
• Communication between Access and Oracle is
using ODBC.
• In Access, we can import Oracle’s view to
create a table.
Create View to Retrieve Fact Data
FactData view is a join of Job_History, Departments and
Locations.
Transform Fact Data
select employee_id, extract(year from start_date) as StartedYear,
Job_id,department_id,city, End_date-Start_date as DaysOnJob from
factdata ;
Reference
• http://msdn.microsoft.com/enus/library/aa902672(SQL.80).aspx#sql_dwdesi
gn_tool