Transcript 슬라이드 제목 없음
Dimensional Model vs. ER Model
• ER(Entity-Relationship) Modeling
–
–
–
–
“to take redundancy out of data”
one-to-many or many-to-many relatioship
normalized table simplifies update and insert operations
many tables make complex query difficult
• Dimensional Model
– designed for easy queries and navigation
– produce few tables and keys compared to ER Model
Star Join Schema (Retail Store)
Schema not Recommended
(Retail Store)
Converted Tables (Retail Store)
• SalesFact(time_key, product_key, store_key,
promotion_key, dollars_sold, units_sold, dollars_cost)
– 4 foreign composite keys
• Time(time_key, day_of_week, month, quarter, year,
holiday_flag)
– time hierarchy
• Product(product_key, description, brand, category)
• Store(store_key, store_name, address, floor_plan_type)
• Promotion(promotion_key, promotion_name)
Data Warehouse Query
• Find all the products that were sold in ‘4th Q 1995’
•
•
•
•
•
•
•
– Brand
Dollar Sales
Unit Sales
– Axon
780
263
– Widget
213
444
– ….
Select p.brand, sum(f.dollars_sold), sum(f.units_sold)
from SaleFact f, Product p, Time t
where f.product_key = p.product_key
and f.time_key = t.time_key
and t.quarter = 4 and t.year = 1995
group by p.brand
order by p.brand
Grocery Store Case (1)
• Time Dimension -> Time Table
–
–
–
–
–
–
–
–
–
–
–
–
–
time_key
day_of_week
day_member_in_month
day_member_overall
week_member_in_year
week_member_overall
quarter
fiscal_period
holiday_flag
weekday_flag
last_day_in_month_flag
season : Christmas, Thanksgiving, Easter, Valentine’s Day, July 4th
event : Super Bowl Sunday, Labor Strike, Hurricane Hugo, etc..
Grocery Case (2)
• Product Dimension -> Product table
–
–
–
–
–
–
–
–
–
–
–
–
–
–
product_key
SKU_description -- Green 3-pack Brawny Paper Towels
SKU_number
-- UPC # xxxxxxxxxxxxxx
package_size
-- 3-pack
brand
-- Brawny
subcategory
-- Paper Towels
category
-- Paper
department
-- Grocery (Bakery, Photo, Drug, etc.)
weight
weight_unit_of_measure
units_per_retail_case
units_per_shipping_case
cases_per_pallet
shelf_width, shelf_height, self_depth, etc...
Grocery Case (3)
• Store Dimension -> Store Table
–
–
–
–
–
–
–
–
–
–
–
–
–
–
store_key
store_name
store_number
store_street_address
store_city
store_county
store_state
store_zip
store_district
store_region
store_manager
store_phone
floor_plan_type, first_opened_date, last_remodeled_date, …
store_sqft, grocery_sqft, frozen_sqft, meat_sqft, etc….
Grocery Case (4)
• Promotion Dimension -> Promotion Table
–
–
–
–
–
–
–
–
–
–
–
promotion_key
promotion_name
price_reduction_type
ad_type
display_type
coupon_type
ad_media_name
display_provider
promotion_cost
promo_begin_date
promo_end_date, etc...
Grocery Case (5)
• Grocery Store Fact --> SaleFact Table
– Units_sold, dollars_sold, dollars_cost,
customer_count
– several formulas
• gross profit
• several ratios like gross margin
• Please take a look at grocery_new.mdb
– unzip “grocery_new.zip” from class web
References
• Ralph Kimbell
– “The Data Warehouse Toolkit: How to Design
Dimensional Data Warehouse”, John Wiley,
1996
• Ralph Kimbell’s DBMS article
– http://www.dbmsmag.com/9510d05.html
– http://www.rkimball.com/
– http://pwp.starnetinc.com/larryg/index.html
OLAP(Online Analytic Processing)
• Applies to all analytic functions
• Steps of OLAP
–
–
–
–
–
Interface : get user’s selection
Query :generate SQL code
Process : data analysis on the result from query
Format :properly label rows and columns
Display : report and chart
OLAP Capabilities
•
•
•
•
Query and Reporting
Multidimensional Analysis
Statistical analysis (user-directed) *extra
Data Mining(agent processing) *extra
– * extra is functionality added recently
– DW/OLAP/DM overlap each other
Multidimensional Analysis
• Let users start from single dimension and
navigate to other dimensions
– slice, dice
• navigating hierarchy of dimension
– drill-down/drill-up/drill-across
• Involves massive calculations
– How have advertising expenses affected sales?
– Which products should be discontinued?
Statistical Analysis
• SAS and SPSS want to play BIG roles
• Calculating average, regression, correlation,
factoring, clustering, etc.
• Example, relationship of “product sales” to
“price”, “distribution”, “weather”
Data Mining
•
•
•
•
Uses same technique as statistical analysis
Adds neural network/learning algorithms
computer program directed
New market
– IBM Intelligent Miner
– Information Discovery
– DataMind
Pivot Table
• MS Excel provides a limited
multidimensional data base capability
– usually works with Excel Chart
– drill down/up is very limited
– users have to prepare one table which is a result
of joining Fact and several dimension tables
– PivotTable = page/Table(row/column)
OLAP using Access & Excel
• Use SQL to prepare tables from Access
– By Joining several tables(dimension + fact)
– Learn SQL enough to join several tables
– Please take a look at handout on SQL or DB
book
• Use Pivot table from Excel for browsing
– Source of Pivot table is joined table
– Pivot table provides limited navigation
Grocery Case
• Use Grocery_new.mdb
• Generate Pivot table which let us do the
following analysis
Want to do analysis on dollars sales by date(day/month),
category, store_name on all the transaction which belongs
to year 1998