Data Warehouse

WINTERCORP
Large Scale Data Warehousing:
Trends and Observations
Richard Winter
President, WinterCorp
[email protected]
Pekka Kostamaa
Director of Advanced Development, Teradata
[email protected]
ICDE 2010
THE LARGE SCALE DATA MANAGEMENT EXPERTS
Emerging Trends
• Big Data
• “Big” Complexity
• Real-Time Analytics
• Data Warehousing for Operational Impact
• In-Database Processing
WINTERCORP: THE LARGE SCALE DATA MANAGEMENT EXPERTS
© Winter Corporation 2007, 2009. All Rights Reserved.
Military Grapples With Information Overload
InformationWeek, July 9, 2009
"As the sensors associated with the various
surveillance missions improve, the data volumes
are increasing with a projection that sensor data
volume could potentially increase to the level of
Yottabytes (10^24 Bytes) by 2015," the report says.
Referring to Data Analysis Challenges, JSR-08-142, JASON, The
MITRE Corp, 12/08
Military Projection of Sensor Data Volume
(later refuted)
Data Analysis Challenges
JSR-08-142, JASON, The MITRE Corp, 12/08
The JASON report actually projects that data volumes will be in the
hundreds of petabytes by 2015 (10^17 bytes, not 10^24)
Size of the Largest Publicly Reported Data Warehouse
(Terabytes of Stored Data)
[Chart: size of the largest publicly reported data warehouse, 1998 to 2008, on a 0 to 1000 TB axis. Series: Survey Data (Winter TopTen Survey) and Customer Reports, with the Moore’s Law growth rate shown for comparison. CAGR = 173%.]
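The growth figure on this chart can be read as a yearly multiplier: a CAGR of 173% means the size is multiplied by 2.73 each year. A minimal sketch of that arithmetic follows. The function name `project_size` and the 1 TB starting value are illustrative assumptions, not figures from the survey.

```python
def project_size(start_size: float, cagr: float, years: int) -> float:
    """Project a size forward under compound annual growth.

    cagr is a fraction: 1.73 means 173% growth per year,
    so the size is multiplied by (1 + 1.73) = 2.73 each year.
    """
    return start_size * (1.0 + cagr) ** years


# Hypothetical starting point of 1 TB, chosen only to show the multiplier.
for year in range(4):
    print(year, round(project_size(1.0, 1.73, year), 2))
```

Starting values read off the chart could be substituted for the placeholder to check a projection by hand.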
Size of the Largest Publicly Reported Data Warehouse
(Petabytes of Stored Data)
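[Chart: size of the largest publicly reported data warehouse, 2002 to 2015, on a 0 to 10 PB axis. Series: Survey Data, Customer Reports (eBay is a labeled data point), and Projections, with the Moore’s Law growth rate shown for comparison. CAGR = 173%. Annotations: projects to ~100 PB in 2015; projects to ~500 PB in 2015.]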
“Big Data”
• Large scale, parallel analysis
• Uses alternatives to data warehouse technology
– MapReduce
• Scale reported to be 10x – 100x data warehousing
• At observed data warehouse growth rates, projects to 5-50 EB (10^19 bytes) by 2015
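The 5-50 EB range appears to follow from applying the reported 10x to 100x multiplier to a projected data warehouse size of about 500 PB in 2015. The sketch below reproduces that arithmetic. Treating ~500 PB as the baseline is an assumption, and the names are illustrative.

```python
PB_PER_EB = 1000  # decimal units: 1 EB = 1000 PB

# Assumed baseline: the ~500 PB data warehouse projection for 2015.
warehouse_pb_2015 = 500


def big_data_range_eb(baseline_pb, low_mult=10, high_mult=100):
    """Return the (low, high) Big Data scale in exabytes for a baseline in petabytes."""
    return (baseline_pb * low_mult / PB_PER_EB,
            baseline_pb * high_mult / PB_PER_EB)


low_eb, high_eb = big_data_range_eb(warehouse_pb_2015)
print(f"{low_eb:g}-{high_eb:g} EB")  # prints "5-50 EB"
```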
Drivers
Information Sources
• Cell phones
• Sensors (in products, buildings, factories, ...)
• Cameras (still and video)
• Scanners (e.g., MRI, CAT, X-ray)
• Emitters: RFID, transponders, ...
• Bokode
• Devices worn by people
• Many other electronic devices
• Device cost declining exponentially
• Device capacity increasing exponentially
Bokode
• Barcodes for the rest of us
• Camera Culture Group, MIT Media Lab
• Bokode: Imperceptible Visual Tags for Camera Based Interaction from a Distance
Bokode
Ideal barcode: invisible to the human eye; easily decodable by a machine.
Big Complexity
• Workload
• Schema
• Query Complexity
• Concurrency
Early Data Warehousing –
Data Marts by Business Function
[Diagram: each business function has its own data mart, and the same subjects recur across the marts.]
Business Functions: General Management, Planning, Manufacturing, Supply Chain, Finance, Marketing, Sales
Data Marts: hold subjects such as Product, Order, Customer, Account, Employee, Component, Inventory, Material, and Supplier ("Just About Every Subject")
Enterprise Processes Touch Multiple Functions
[Diagram: an enterprise process, Customer Profitability, cuts across the data marts of several business functions.]
Business Functions: General Management, Planning, Manufacturing, Supply Chain, Finance, Marketing, Sales
Data Marts: hold subjects such as Product, Order, Customer, Account, Employee, Component, Inventory, Material, and Supplier ("Just About Every Subject")
Requirement – Leverage Data Across Enterprise
[Diagram: business functions share one integrated set of subjects.]
Business Functions: General Management, Planning, Manufacturing, Supply Chain, Finance, Marketing, Sales
Subjects: Order, Inventory, Product, Employee, Account, Material, Supplier, Customer, Component, Delivery
Retail Sales Reporting Data Mart
[Diagram: Sales facts at the center, linked to Store, Product, Customer, and Date dimensions]
• Single purpose: understand what, where, when products are selling
• One set of facts: the sales transactions
• Four analytical dimensions: products, customers, stores, date
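A data mart of this shape can be sketched as one fact table keyed by four dimension tables. The example below uses Python's built-in `sqlite3` module as a stand-in for a warehouse. All table names, column names, and sample rows are hypothetical and chosen only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product  (product_id  INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE store    (store_id    INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE date_dim (date_id     INTEGER PRIMARY KEY, day  TEXT);
-- One set of facts: each row is a sales transaction keyed by the four dimensions.
CREATE TABLE sales (
    product_id  INTEGER REFERENCES product(product_id),
    customer_id INTEGER REFERENCES customer(customer_id),
    store_id    INTEGER REFERENCES store(store_id),
    date_id     INTEGER REFERENCES date_dim(date_id),
    amount      REAL
);
INSERT INTO product  VALUES (1, 'Widget'), (2, 'Gadget');
INSERT INTO customer VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO store    VALUES (1, 'Boston'), (2, 'Atlanta');
INSERT INTO date_dim VALUES (1, '2010-03-01'), (2, '2010-03-02');
INSERT INTO sales VALUES (1, 1, 1, 1, 10.0), (1, 2, 2, 1, 10.0),
                         (2, 1, 1, 2, 25.0), (2, 2, 1, 2, 25.0);
""")

# What is selling, where, and when: join the facts to three dimensions and aggregate.
rows = conn.execute("""
    SELECT p.name, s.city, d.day, SUM(f.amount) AS total
    FROM sales f
    JOIN product  p ON p.product_id = f.product_id
    JOIN store    s ON s.store_id   = f.store_id
    JOIN date_dim d ON d.date_id    = f.date_id
    GROUP BY p.name, s.city, d.day
    ORDER BY p.name, s.city
""").fetchall()
for row in rows:
    print(row)
```

Every query against this mart follows the same pattern: start from the single fact table and join outward to whichever dimensions the question needs.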
Data Warehouse for Analysis & Management of Lifetime Customer Value
[Diagram: interrelated entities: Channel, Date, Product, Order, Order Detail, Inventory, Employee, Payment, Customer, Customer Interaction, Supplier, Account, Service Delivered, Material, Component, Shipment, Location]
• Multiple analytical problems
• Multiple, related sets of facts
• Complex web of relationships
Real data models of higher complexity often have hundreds or thousands of data entities and relationships
Requirement – Complex Query and Analysis
• Involve many tables
• Touch more data
• More query operations
– Many table Joins
– Group by
– Analytic functions (rank, etc.)
• More complexity
[Diagram: query plan tree with an Order By at the top, over Join and Group By steps, over Retrieve operations on the base tables]
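The query operations listed on this slide can be combined in one statement. The sketch below joins three tables, groups by customer, and applies a rank analytic function, using Python's built-in `sqlite3` module as a stand-in for a warehouse. The schema and rows are hypothetical, and `RANK() OVER` assumes the bundled SQLite is version 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer     (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders       (order_id INTEGER PRIMARY KEY, customer_id INTEGER);
CREATE TABLE order_detail (order_id INTEGER, product TEXT, amount REAL);
INSERT INTO customer VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
INSERT INTO orders   VALUES (10, 1), (11, 2), (12, 2), (13, 3);
INSERT INTO order_detail VALUES (10, 'Widget', 30.0), (11, 'Widget', 20.0),
                                (12, 'Gadget', 50.0), (13, 'Gadget', 5.0);
""")

# Inner query: retrieve, join three tables, and group by customer.
# Outer query: rank the per-customer totals and order by that rank.
rows = conn.execute("""
    SELECT name,
           revenue,
           RANK() OVER (ORDER BY revenue DESC) AS revenue_rank
    FROM (
        SELECT c.name AS name, SUM(d.amount) AS revenue
        FROM customer c
        JOIN orders o       ON o.customer_id = c.customer_id
        JOIN order_detail d ON d.order_id    = o.order_id
        GROUP BY c.customer_id, c.name
    )
    ORDER BY revenue_rank
""").fetchall()
for row in rows:
    print(row)
```

Even this small query already has the shape described above: retrieve steps feeding joins, a group by, an analytic function, and a final order by.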
Complexity Measures
[Radar chart: six axes, each running from Simple to Complex. Axis labels as shown on the slide:]
• Data Storage (raw, user data): 5 TB, 10 TB, 15 TB, 20 TB, 25 TBs +
• Schema Sophistication: Simple Star; Multiple, Integrated Stars; Normalized; Multiple, Integrated Stars and Normalized
• Query Complexity: 3-5 Way Joins; 5-10 Way Joins; 15+ way Joins + OLAP operations + Aggregation + Complex “Where” constraints + Views + Parallelism
• # of Concurrent Queries
• Workload Mix: Batch Reporting, Repetitive Queries; “Iterative”, Ad Hoc Queries; Data Analysis/Mining; Near Real Time Data Feeds; Active Data Warehousing
• Query Data Volumes: MB’s, GB’s, TB’s
Market Segmentation
[Chart: market segments positioned by Data Volumes (Low to Extreme, 100x) against Complexity (Low to Extreme). Complexity covers number of users, data integration, mixed workload, and availability.]
Segments: Scalable Data Mart, General Purpose Data Mart, Enterprise Data Warehouse, AEDW
Source: IDC; Gartner; Teradata Analysis
The (Business) Intelligence Imperative
• Competitive advantage resides in the exploitation of:
– More detailed information
– More comprehensive histories
– More in-depth analysis
– Fuller integration
– More insightful plans and strategies
– More rapid response to events
– More precise and appropriate response to events
Real-Time Analytics
Accelerating Operational Decisions
[Chart: Value plotted against Time. Value declines through four steps: Business event, Data captured, Intelligence delivered, Action taken.]
• “Moment of Impact”: Opportunity
• Too late to take action: Missed Opportunity
Evolving to Real-Time Analytics and Action
From STRATEGIC INTELLIGENCE to OPERATIONAL INTELLIGENCE:
• REPORTING: WHAT happened? (Batch)
• ANALYZING: WHY did it happen? (Ad Hoc)
• PREDICTING: WHAT WILL happen? (Analytics)
• OPERATIONALIZING: WHAT IS happening now? (Continuous updates, tactical queries)
• ACTIVATING: MAKE it happen (Event driven)
Data Warehousing For Operational Impact
Active Enterprise Intelligence™ In Banking
“Faster Loans”
Situation
EMEA bank with 3M customers had a costly and slow manual process for loans and for assessing customer creditworthiness. 3,500 people issued 900,000 loans/year. The bank also needed to improve customer servicing.
Problem
Lacked the ability to access a 360° view of customer data (internal and external) and to use this data for real-time credit decision making.
Solution
Used the Data Warehouse to drive real-time internal and external data consolidation and delivery of near-instant credit decisions.
Impact
• Before: Loan took up to 6 days, cost 240 Euros
• After: Loan takes < 15 minutes and costs 32 Euros
• Data Warehouse risk rating portion: 5 seconds
• Cut out human error in credit decision making
• Payback time: 8 months
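As a back-of-envelope reading of these figures, the per-loan saving can be multiplied by the annual loan volume. The sketch below assumes the quoted costs apply to every loan and that volume stays at 900,000 loans per year. The resulting total is a derived illustration, not a number reported by the bank.

```python
loans_per_year = 900_000   # from the Situation description
cost_before_eur = 240      # per loan, before
cost_after_eur = 32        # per loan, after

saving_per_loan = cost_before_eur - cost_after_eur
annual_saving = saving_per_loan * loans_per_year
cost_reduction_pct = 100 * saving_per_loan / cost_before_eur

print(f"Saving per loan: {saving_per_loan} EUR")
print(f"Implied annual saving: {annual_saving:,} EUR")
print(f"Cost reduction per loan: {cost_reduction_pct:.1f}%")
```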
Applying for Credit: Before
[Process diagram: the customer submits a paper application. Customer data gathering follows, then paper decisions by Decision maker 1, Decision maker 2, and the Credit Committee, producing a Decision + Terms. The customer accepts the proposal, a paper contract is prepared and signed, the money is released, and the contract goes to the document archives.]
6 working days, 240 Euros
Applying for Credit: After
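[Process diagram: the customer applies through multiple channels (Call Center, Branch, Internet, ATM). The Data Warehouse drives real-time automation: data gathering (including external data providers), scoring, pricing, and special conditions. The result is returned to the customer, with monitoring.]
• Retail customers: 30 seconds
• Corporate: 120 seconds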
In-Database Processing
Superior In-Database Analytics and Processes
Enabling Better, Faster Insight
[Diagram: the Data Warehouse at the center, surrounded by:]
• OLAP Cubes: “What if” analysis and reporting
• Agile Analytics: “Sand Boxes”, new theories on new data
• Excel/Access: Reporting, individual analysis and models
• Data Mining: Predictive analysis on “what is going to happen?”
• Geospatial: New analytics on geocoded data
20-40+% wasted moving data
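The cost of moving data can be illustrated by comparing two ways of computing the same totals: pulling every detail row out to the client, or aggregating inside the database and moving only the summary. The sketch below uses Python's built-in `sqlite3` module as a stand-in for a warehouse. The `sales` table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 10.0), ("east", 15.0), ("west", 7.5), ("west", 2.5), ("west", 5.0)],
)

# Approach 1: move every detail row to the client, then aggregate there.
detail_rows = conn.execute("SELECT region, amount FROM sales").fetchall()
client_totals = {}
for region, amount in detail_rows:
    client_totals[region] = client_totals.get(region, 0.0) + amount

# Approach 2: aggregate inside the database and move only the summary rows.
summary_rows = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region"
).fetchall()
db_totals = dict(summary_rows)

print(len(detail_rows), "rows moved for client-side aggregation")
print(len(summary_rows), "rows moved for in-database aggregation")
```

Both approaches give the same totals. The difference is how many rows leave the database, and that gap widens as the detail table grows.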
OLAP Architectures
Various Amounts of Data Being Moved
[Diagram: users (Executive, Power Users, Marketing Analyst) work with a Multi-Dimensional View served by one of three architectures on top of the Data Warehouse: Physical Cube (MOLAP), Hybrid Approach (HOLAP), or OLAP Engine (ROLAP). Each moves a different amount of data out of the warehouse. The sample warehouse schema shows ORDER, CUSTOMER, ITEM, ORDER ITEM BACKORDERED, and ORDER ITEM SHIPPED tables.]
OLAP Optimization Results
Landline Communications Provider
• 38 dimensions/24 measures with 5 years of history
> Add 39th dimension: Wire Center
• Maintenance: 13 hours to 3 minutes
• Cube size: 22.4 GB to <10GB
• Detail: Month to Daily
[Bar chart: Response Comparison, OLAP Server vs. In-Database Processing, on a 0 to 300 axis (units not shown), for four query sets: 5 Canned; 5 Canned with Wire Center; 6 Interactive; 6 Interactive with Wire Center]
Summary
• Data size growth is accelerating
• Complexity is increasing
• Time to respond is shrinking
• Operational agility is a business advantage
• In-database processing improves the efficiency and speed of analysis
Questions?