Data Warehousing

Download Report

Transcript Data Warehousing

Virtual University of Pakistan
Data Warehousing
Lecture-4
Introduction and Background
Ahsan Abdullah
Assoc. Prof. & Head
Center for Agro-Informatics Research
www.nu.edu.pk/cairindex.asp
FAST National University of Computers & Emerging Sciences, Islamabad
1
DWH-Ahsan Abdullah
Introduction and Background
2
DWH-Ahsan Abdullah
How is it Different?
 Starts with a 6x12 availability requirement ...
but 7x24 usually becomes the goal.
 Decision makers typically don’t work 24 hrs a day and 7
days a week. An ATM system does.
 Once decision makers start using the DWH, and start
reaping the benefits, they start liking it…
 Start using the DWH more often, till want it available
100% of the time.
3
DWH-Ahsan Abdullah
How is it Different?
 Starts with a 6x12 availability requirement ...
but 7x24 usually becomes the goal.
 For business across the globe, 50% of the world may be
sleeping at any one time, but the businesses are up 100%
of the time.
 100% availability not a trivial task, need to take into
account loading strategies, refresh rates etc.
4
DWH-Ahsan Abdullah
How is it Different?
 Does not follows the traditional development
model
Requirements

Program
Classical SDLC

 Requirements gathering
 Analysis
 Design
 Programming
 Testing
 Integration
 Implementation
5
DWH-Ahsan Abdullah
How is it Different?
 Does not follows the traditional development
model
DWH
Program

Requirements
DWH SDLC (CLDS)
 Implement warehouse
 Integrate data
 Test for biasness
 Program w.r.t data
 Design DSS system
 Analyze results
 Understand requirement
DWH-Ahsan Abdullah
6
Data Warehouse Vs. OLTP
OLTP (On Line Transaction Processing)
Select tx_date, balance from tx_table
Where account_ID = 23876;
7
DWH-Ahsan Abdullah
Data Warehouse Vs. OLTP
DWH
Select balance, age, sal, gender from
customer_table, tx_table
Where age between (30 and 40) and
Education = ‘graduate’ and
CustID.customer_table =
Customer_ID.tx_table;
8
DWH-Ahsan Abdullah
Data Warehouse Vs. OLTP
OLTP
DWH
Primary key used
Primary key NOT used
No concept of Primary Index Primary index used
Few rows returned
Many rows returned
May use a single table
Uses multiple tables
High selectivity of query
Low selectivity of query
Indexing on primary key
(unique)
Indexing on primary index
(non-unique)
9
DWH-Ahsan Abdullah
Data Warehouse Vs. OLTP
OLTP: OnLine Transaction Processing (MIS or Database System)
Data Warehouse
OLTP
Scope
* Application –Neutral
* Single source of “truth”
* Evolves over time
* How to improve business
* Application specific
* Multiple databases with repetition
* Off the shelf application
* Runs the business
Data
Perspective
* Historical, detailed data
* Some summary
* Lightly denormalized
* Operational data
* No summary
* Fully normalized
Queries
* Hardly uses PK
* Number of results
returned in thousands
* Based on PK
* Number of results returned in
hundreds
Time factor
* Minutes to hours
* Typical availability 6x12
* Sub seconds to seconds
* Typical availability 24x7
10
DWH-Ahsan Abdullah
Comparison of Response Times
 On-line analytical processing (OLAP) queries must
be executed in a small number of seconds.
 Often requires denormalization and/or sampling.
 Complex query scripts and large list selections can
generally be executed in a small number of
minutes.
 Sophisticated clustering algorithms (e.g., data
mining) can generally be executed in a small
number of hours (even for hundreds of thousands
of customers).
11
DWH-Ahsan Abdullah
Putting the pieces together
Data
(Tier 0)
Data Warehouse Server
(Tier 1)

Semistructured
Sources
www data











IT
Users
Archived
data
Extract
Transform
Load
(ETL)
Clients
(Tier 3)
MOLAP
Query/Reporting

Meta
Data
Data
Warehouse
Operational
Data Bases
Data sources
OLAP Servers
(Tier 2)
Analysis


Business
Users
Data Mining
ROLAP

Data Marts



Tools
Business Users
12
DWH-Ahsan Abdullah