Consensus Algorithms - Stellenbosch University

Download Report

Transcript Consensus Algorithms - Stellenbosch University

Data Warehousing
Willem Visser
RW334
Somebody is watching!
• Everybody seems to be recording your every
move
• Loyalty cards
• Cookies
– Facebook, Twitter,…
– Check out Collusion plug-in for Firefox
• They want to know how to market to you
• Same is true in business
– Know your data, know your business
Data Warehousing
• Integrated repository of data to understand
your business
• Separate from the Operational Database
• Supports decision making
• Subject oriented
• Time variant
• Non-volatile
Features
• Only necessary data to allow modeling and
decision making
• Coming from potentially many sources
• Time component
– Even though operationally there might not be
• Data doesn’t change after loading
– No operational updates
– Periodic refresh
Operational vs Warehouse
• Operational
– Optimized for on-line transactional processing
– OLTP
• Warehouse
– Optimized for online analytic processing
– OLAP
– Complex queries
– Very large volumes
Data cube
• Multi dimensional data
– Not just 3D (but mostly shown as such)
Lattice of Cuboids
all
0-D(apex) cuboid
product
product,date
date
country
product,country
1-D cuboids
date, country
2-D cuboids
3-D(base) cuboid
product, date, country
How much of the cube is materialized before the query:
• Full (complete cuboid)
• None (materialized on the fly)
• Partial
Slide by Dr. Hany Saleeb
OLAP
• “Querying and presenting text and numeric
data from data warehouses in a dimensional
cube-style”
– Slicing a dimension:
Per region, per product, per period
– Drill-down:
Country  Region  Town  Suburb
– Drill-up, drill-around, etc.
– Visualization
Slide by Cor Winkler
Multi-Tiered Architecture
other
Metadata
sources
Operational
DBs
Extract
Transform
Load
Refresh
Monitor
&
Integrator
Data
Warehouse
OLAP Server
Serve
Analysis
Query
Reports
Data mining
Data Marts
Data Sources
Data Storage
OLAP Engine Front-End Tools
Slide by Dr. Hany Saleeb
Steps
• Data extraction:
– get data from multiple, heterogeneous, and external
sources
• Data cleaning:
– detect errors in the data and rectify them when possible
• Data transformation:
– convert data from legacy or host format to warehouse
format
• Load:
– sort, summarize, consolidate, compute views, check
integrity, and build indices and partitions
• Refresh
– propagate the updates from the data sources to the
warehouse
Star-Schema Example
SALES FACT TABLE
TIME DIMENSION
time_key (FK)
product_key (FK)
store_key (FK)
promo_key (FK)
dollars
units
cost
time_key (PK)
SQL_date
day_of_week
week_number
month
PRODUCT
STORE DIMENSION
store_key (PK)
store_ID
store_name
address
district
region
District
Atherton
Atherton
Atherton
Belmont
Belmont
Belmont
PRODUCT DIMENSION
product_key (PK)
SKU
description
brand
category
package_type
size
flavor
PROMOTION DIMENSION
promotion_key (PK)
promotion_name
promotion_type
price_treatment
ad_treatment
display_treatment
coupon_type
Brand
Clean Fast
More Power
Zippy
Clean Fast
More Power
Zippy
Total Dollars
$ 1,233
$ 2,239
$ 848
$ 2,097
$ 2,428
$ 633
Total Cost
$ 1,058
$ 2,200
$ 650
$ 1,848
$ 2,350
$ 580
Gross Profit
$ 175
$ 39
$ 198
$ 249
$ 78
$ 53
Slide by Cor Winkler
Data Warehouse Usage
• Three kinds of data warehouse applications
– Information processing
• supports querying, basic statistical analysis, and reporting using
crosstabs, tables, charts and graphs
– Analytical processing
• multidimensional analysis of data warehouse data
• supports basic OLAP operations, slice-dice, drilling, pivoting
– Data mining
• knowledge discovery from hidden patterns
• supports associations, constructing analytical models,
performing classification and prediction, and presenting the
mining results using visualization tools.
• Differences among the three tasks
Slide by Dr. Hany Saleeb
Information Dashboards
Slide by Cor Winkler
Information Exploitation
Business Intelligence (BI)
R Value
of
decision
Reporting
What
happened?
Historical info
← # of users
Analysis
Why it
happened?
Dynamic
slice&dice
Data Mining
What might
happen?
Obscure data
relationships
and trends
Intelligent
agents
Make it
Complexity →
happen!
Automatic
response to
business
triggers
Complexity →
Slide by Cor Winkler