Data Warehousing/Mining

Introduction
Outline of Lecture
– Brief History of Data Warehousing
– What is a Data Warehouse?
– Need for Strategic Information
– Information Crisis
– Operational and Decision Support Systems
– Difference between a standard DB and a data warehouse
Data Warehouse Evolution
[Figure: timeline of data warehouse evolution, 1960 to 2000. Era labels: “Prehistoric Times”, “Middle Ages”, Data Revolution, Information-Based Management. Milestones shown: relational databases, PCs and spreadsheets, end-user interfaces, company DWs, data replication tools, 1st DW article, “Building the DW” by Inmon (1992), DW conferences, vendor DW frameworks.]
Escalating Need For Strategic Information
Organizations need information to formulate business strategies, establish goals, and set objectives, e.g.
– Increase the customer base by 10% over the next 5 years
– Gain market share by 15% in the next 2 years
– Increase product quality levels in the top five product groups
The Information Crisis
The amount of information is said to double every 18 months
Organizations have tons of data available
Then why is there an information crisis?
Why can’t organizations convert the data into useful information for strategic decision making?

Problem: Heterogeneous Information Sources
“Heterogeneities are everywhere”
[Figure: information sources: personal databases, scientific databases, digital libraries, the World Wide Web]
– Different interfaces
– Different data representations
– Diverse structure of databases
– Duplicate and inconsistent information
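To make the heterogeneity problem concrete, here is a minimal sketch of reconciling two sources that describe the same customers with different field names, key formats, and date representations. The source names (a CRM and a billing system), the field names, and the "first source wins" duplicate rule are all assumptions chosen for illustration, not something the slides prescribe.

```python
from datetime import date, datetime

# Hypothetical records for the same customers held in two source systems.
crm_rows = [
    {"cust_id": "C-001", "name": "Alice Smith", "joined": "2001-03-15"},
    {"cust_id": "C-002", "name": "Bob Jones", "joined": "2002-07-01"},
]
billing_rows = [
    {"CustomerNo": 1, "FullName": "SMITH, ALICE", "Since": "15/03/2001"},
    {"CustomerNo": 3, "FullName": "LEE, CAROL", "Since": "20/11/2003"},
]


def from_crm(row):
    """Map a CRM record onto the common (assumed) warehouse layout."""
    return {
        "customer_key": int(row["cust_id"].split("-")[1]),
        "name": row["name"],
        "joined": date.fromisoformat(row["joined"]),
    }


def from_billing(row):
    """Map a billing record onto the same layout."""
    last, first = [part.strip().title() for part in row["FullName"].split(",")]
    return {
        "customer_key": row["CustomerNo"],
        "name": f"{first} {last}",
        "joined": datetime.strptime(row["Since"], "%d/%m/%Y").date(),
    }


def integrate(crm, billing):
    """Merge both sources; on duplicate keys the first record seen is kept."""
    merged = {}
    for rec in [from_crm(r) for r in crm] + [from_billing(r) for r in billing]:
        merged.setdefault(rec["customer_key"], rec)
    return [merged[key] for key in sorted(merged)]


customers = integrate(crm_rows, billing_rows)
for customer in customers:
    print(customer)
```

Each source gets its own small mapping function, so a new interface or representation only adds one function. Real integration also has to decide which source to trust when duplicates disagree, which this sketch does not attempt.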
Some Definitions
– What is data?
– What is information?
– What is a warehouse?
What is a Data Warehouse?
A Practitioner’s Viewpoint
“A data warehouse is simply a single, complete,
and consistent store of data obtained from a
variety of sources and made available to end
users in a way they can understand and use it
in a business context.”
-- Barry Devlin, IBM Consultant
A Data Warehouse is...

Stored collection of diverse data
– A solution to data integration problem
– Single repository of information

Subject-oriented
– Organized by subject, not by application
– Used for analysis, data mining, etc.

Large volume of data (GB, TB)
Non-volatile
– Historical
– Time attributes are important
A Data Warehouse is... (continued)

Updates infrequent
Examples
– All transactions EVER at WalMart
– Complete client histories at an insurance firm
– Stockbroker financial information and portfolios
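
The "non-volatile, historical, time attributes" properties can be sketched with two tables: an operational table that is overwritten in place, and a warehouse-style table that only ever receives new dated rows. The table names, columns, and values below are hypothetical, and SQLite is used only because it ships with Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Operational table: one row per account, overwritten in place.
    CREATE TABLE account_current (
        account_id INTEGER PRIMARY KEY,
        balance    REAL
    );
    -- Warehouse-style table: one row per account per date, never updated.
    CREATE TABLE account_history (
        account_id INTEGER,
        balance    REAL,
        as_of_date TEXT,
        PRIMARY KEY (account_id, as_of_date)
    );
""")


def record_balance(account_id, balance, as_of_date):
    # Operational side keeps only the latest value (current snapshot).
    conn.execute(
        "INSERT OR REPLACE INTO account_current VALUES (?, ?)",
        (account_id, balance),
    )
    # Warehouse side appends a dated row, so history is preserved.
    conn.execute(
        "INSERT INTO account_history VALUES (?, ?, ?)",
        (account_id, balance, as_of_date),
    )


record_balance(1, 100.0, "2000-01-31")
record_balance(1, 250.0, "2000-02-29")
record_balance(1, 180.0, "2000-03-31")

current = conn.execute(
    "SELECT balance FROM account_current WHERE account_id = 1"
).fetchall()
history = conn.execute(
    "SELECT as_of_date, balance FROM account_history "
    "WHERE account_id = 1 ORDER BY as_of_date"
).fetchall()

print("current snapshot:", current)
print("history:", history)
```

The time attribute (`as_of_date` here) is part of the key, which is what lets the warehouse table answer "what was the balance in February?" after the operational table has moved on.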
Summary
[Figure: layered diagram with the layers Business Information Interface, Data Warehouse, Data Warehouse Population, Operational Systems]
What Are Operational and Decision Support Systems?
Operational Systems
Making the wheels of business turn
– Take an order
– Process a claim
– Make a shipment
– Generate an invoice
– Receive cash
– Reserve an airline seat
What Are Operational and Decision Support Systems? (Contd.)
Decision Support Systems
Watching the wheels of business turn
– Show the top-selling products
– Show the problem regions
– Tell me why (drill down)
– Let me see other data (drill across)
– Alert me when a district sells below target
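
The decision-support requests listed above map naturally onto aggregate queries. The following is a rough sketch only: the `sales` table, its columns, the sample rows, and the target value are invented for illustration, and SQLite stands in for a real warehouse DBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (region TEXT, district TEXT, product TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("North", "N1", "Widget", 500.0),
        ("North", "N2", "Widget", 300.0),
        ("North", "N2", "Gadget", 100.0),
        ("South", "S1", "Widget", 200.0),
        ("South", "S1", "Gadget", 50.0),
    ],
)

# "Show the top-selling products": total sales per product, highest first.
top_products = conn.execute(
    "SELECT product, SUM(amount) AS total FROM sales "
    "GROUP BY product ORDER BY total DESC"
).fetchall()

# "Show the problem regions": start from totals per region.
by_region = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()

# "Tell me why (drill down)": break one region into districts and products.
south_detail = conn.execute(
    "SELECT district, product, SUM(amount) FROM sales WHERE region = ? "
    "GROUP BY district, product ORDER BY district, product",
    ("South",),
).fetchall()

# "Alert me when a district sells below target" (target is a made-up value).
TARGET = 300.0
below_target = conn.execute(
    "SELECT district, SUM(amount) FROM sales "
    "GROUP BY district HAVING SUM(amount) < ? ORDER BY district",
    (TARGET,),
).fetchall()

print(top_products)
print(by_region)
print(south_detail)
print(below_target)
```

Every query here only reads and aggregates; none of them changes a row, which is the contrast with the order-taking and invoicing work of an operational system.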
Difference
                  Operational                 Informational
Data Content      Current values              Archived, derived, optimized
Data Structure    Optimized for transactions  Optimized for complex queries
Access Frequency  High                        Medium to low
Access Type       Read, update, delete        Read
Usage             Predictable, repetitive     Ad hoc, random, heuristic
Response Time     Sub-seconds                 Several seconds to minutes
Users             Large number                Relatively small number
Warehouse is a Specialized DB
Standard DB                                Warehouse
Mostly updates                             Mostly reads
Many small transactions                    Queries are long and complex
MB - GB of data                            GB - TB of data
Current snapshot                           History
Index/hash on p.k.                         Lots of scans
Raw data                                   Summarized, reconciled data
Thousands of users (e.g., clerical users)  Hundreds of users (e.g., decision-makers, analysts)
Warehousing and Industry

Warehousing is big business
– $2 billion in 1995
– $3.5 billion in early 1997
– About $8 billion in 1998 [Metagroup]

WalMart has the largest warehouse
– 900-CPU, 2,700-disk, 23 TB Teradata system
– ~7 TB in warehouse
– 40-50 GB per day
Data Warehousing: Two Distinct Issues
(1) How to get information into the warehouse
“Data warehousing”
(2) What to do with data once it’s in the warehouse
“Warehouse DBMS”
Both are rich research areas
Industry has focused on (2)
Thank You Very Much