Generic Information Builders` Presentation Template

Download Report

Transcript Generic Information Builders` Presentation Template

Maximize WebFOCUS Performance
with Hyperstage
Louisville User Group Meeting
April 25, 2012
Lori Pieper
Agenda
 The “Big Data” Business Challenge
 Pivoting Your Perspective
 Introducing WebFOCUS Hyperstage
 How does it work?
 So what’s the big deal?
 Demonstration
 Wrap Up and Q&A
The “Big Data”Business
Challenge
Copyright 2007, Information Builders. Slide 3
Data Warehousing Challenges
Limited Resources
and Budget
More Data,
More Data Sources
010101010101010101010101010
01010101010101010101010101
Real time data
01 1
010101010101010101010
0101010101010101010101010
Multiple databases
More Kinds of Output
Needed by More Users,
More Quickly
1010
01010101010101010101
0101010101010101010
External Sources 01
101
1
10
1
1
1
010 1
0
 Labor intensive, heavy
Traditional Data
Warehousing
indexing, aggregations and
partitioning
 Hardware intensive:
massive storage; big servers
Copyright 2007, Information Builders. Slide 4
 Expensive and complex
IT Manager’s try to mitigate these response times …..
How Performance Issues are Typically Addressed – by Pace of Data Growth
Tune or upgrade existing databases
66%
Upgrade server hardware/processors
54%
Upgrade/expand storage systems
30%
Upgrade networking infrastructure
21%
32%
0%
20%
44%
High Growth
Low Growth
4%
7%
Don't Know / Unsure
70%
60%
33%
Archive older data on other systems
75%
40%
60%
80%
100%
When organizations have long running queries that limit the business, the
response is often to spend much more time and money to resolve the
problem
Source: KEEPING UP WITH EVER-EXPANDING ENTERPRISE DATA ( Joseph McKendrick Unisphere Research October 2010)
Limitations of “Traditional” Solutions
Adding indexes:
 Increases disk space requirements
 Sum of index space requirements can even exceed the
source DB
 Index Management
 Increases load times to build the index
 Predefines a fixed access path
 Reports run slow if you haven’t “anticipated” the
reporting needs correctly
Limitations of “Traditional” Solutions
Building OLAP Cubes:
 Cube technology has limited scalability
 Number of dimensions is limited
 Amount of data is limited
 Cube technology is difficult to update (add Dimension)
 Usually requires a complete rebuild
 Cube builds are typically slow
 New design results in a new cube
 Reports run slow if you haven’t “anticipated” the
reporting needs correctly
Pivoting Your Perspective:
Turn Row-based into Column-based
Copyright 2007, Information Builders. Slide 8
Why is Row-based Limiting for Analytics?
The Ubiquity of Rows …
30 columns
Row-based databases are
ubiquitous because so many
of our most important business
systems are transactional.
50
millions
Rows
Row-oriented databases
are well suited for
transactional environments,
such as a call center where a
customer’s entire record is
required when their profile
is retrieved and/or when fields
are frequently updated.
But - Disk I/O becomes a substantial limiting factor since
a row-oriented design forces the database to retrieve all
column data for any query.
Why is Column-based Perfect for Analytics?
Employee Id
Name
Location
Sales
1
Smith
New York
50,000
2
Jones
New York
65,000
3
Fraser
Boston
40,000
4
Fraser
Boston
70,000
Row Oriented
(1, Smith, New York, 50000; 2, Jones, New York, 65000; 3, Fraser, Boston, 40000; 4, Fraser, Boston, 70000)
 Works well if all the columns are needed for every query.
 Efficient for transactional processing if all the data for the row is available
Column Oriented
(1, 2, 3, 4; Smith, Jones, Fraser, Fraser; New York, New York, Boston, Boston, 50000, 65000, 40000, 70000)
 Works well with aggregate results (sum, count, avg. )
 Only columns that are relevant need to be touched
 Consistent performance with any database design
 Allows for very efficient compression
Why is Column-based Perfect for Analytics?
Employee Id
Name
Location
Sales
1
Smith
New York
50,000
2
Jones
New York
65,000
3
Fraser
Boston
40,000
4
Fraser
Boston
70,000
Data stored in rows
Data stored in columns
1
Smith
New York
50,000
1
Smith
New York
50,000
2
Jones
New York
65,000
2
Jones
New York
65,000
3
Fraser
Boston
40,000
3
Fraser
Boston
40,000
4
Fraser
Boston
70,000
4
Fraser
Boston
70,000
Introducing Hyperstage
Copyright 2007, Information Builders. Slide
12
Introducing WebFOCUS Hyperstage ….
Hyperstage is a high performance
analytic data store designed to
handle business-driven queries on
large volumes of data—with minimal
IT intervention—achieving
outstanding query performance,
with less hardware, no database
tuning and easy migration.
Introducing WebFOCUS Hyperstage ….
But really…
What is it?
Easy to implement and manage, Hyperstage provides the
answers to your business users’ needs at a price you can afford.
Introducing WebFOCUS Hyperstage ….
How is it architected?
Hyperstage Engine
Hyperstage combines
a columnar database
with intelligence we
call the Knowledge
Grid to deliver fast
query responses.
Knowledge Grid
Compressor
Bulk
Loader
Unmatched Administrative Simplicity:
• No indexes
• No data partitioning
• No materialized views
Introducing WebFOCUS Hyperstage ….
How is it architected?
Hyperstage Engine
Hyperstage adds data
compression of 10:1
to 40:1 so you can
manage large
amounts of data using
much smaller disk
footprint.
Knowledge Grid
Compressor
Bulk
Loader
Powerful Data compression:
• Store terabytes of data with only
gigabytes of disk space
Introducing WebFOCUS Hyperstage ….
How is it architected?
Hyperstage Engine
Hyperstage adds a
bulk loader plus an
easy to use extraction
and load tool, called
HyperCopy, making
data loading a breeze.
Knowledge Grid
Compressor
Bulk
Loader
Includes embedded ETL:
• Easy and seamless migration of existing
analytical databases
• No change in query or application
required
How Does it Work?
Copyright 2007, Information Builders. Slide
18
WebFOCUS Hyperstage Engine
How does it work?
Column Orientation
Smarter
Architecture
Knowledge Grid – statistics
and metadata “describing”
the super-compressed data
 No maintenance
 No query planning
 No partition schemes
 Easy “load and go”
Data Packs – data
stored
in manageably sized,
highly compressed
data packs
Data compressed
using algorithms
tailored to
data type
Data Organization and the Knowledge Grid ….
Data Packs and Compression
Data Packs
64K
 Each data pack contains 65, 536 data values
 Compression is applied to each individual data pack
 The compression algorithm varies depending on data
64K
type and data distribution
Compression
 Results vary depending on the
64K
64K
Patent Pending
Compression
Algorithms
distribution of data among data
packs
 A typical overall compression
ratio seen in the field is 10:1
 Some customers have seen
results have been as high as 40:1
Data Organization and the Knowledge Grid ….
Data Pack Nodes (DPN)
A separate DPN is created for every data pack created
in the database to store basic statistical information
Character Maps (CMAPs)
Every Data Pack that contains text creates a matrix
that records the occurrence of every possible ASCII
character
Histograms
Histograms are created for every Data Pack that
contains numeric data and creates 1024 MIN-MAX
intervals.
Pack-to-Pack Nodes (PPN)
PPNs track relationships between Data Packs when
tables are joined. Query performance gets better as
the database is used.
This knowledge grid layer = 1% of the compressed volume
WebFOCUS Hyperstage Example: Query and Knowledge Grid
SELECT count(*) FROM employees
WHERE salary > 50000
AND age < 65
AND job = ‘Shipping’
AND city = ‘Louisville’;
salary age
job city
All values match
Completely Irrelevant
Suspect
WebFOCUS Hyperstage Example: salary > 50000
SELECT count(*) FROM employees
WHERE salary > 50000
AND age < 65
AND job = ‘Shipping’
AND city = ‘Louisville’;
salary age
job city
1. Find the Data Packs with salary > 50000
All values match
Completely Irrelevant
Suspect
WebFOCUS Hyperstage Example: age<65
SELECT count(*) FROM employees
WHERE salary > 50000
AND age < 65
AND job = ‘Shipping’
AND city = ‘Louisville’;
salary age
job city
1. Find the Data Packs with salary > 50000
2. Find the Data Packs that contain age < 65
All values match
Completely Irrelevant
Suspect
WebFOCUS Hyperstage Example: job = ‘shipping’
SELECT count(*) FROM employees
WHERE salary > 50000
AND age < 65
AND job = ‘Shipping’
AND city = ‘Louisville’;
salary age
job city
1. Find the Data Packs with salary > 50000
2. Find the Data Packs that contain age < 65
3. Find the Data Packs that have job = ‘shipping’
All values match
Completely Irrelevant
Suspect
WebFOCUS Hyperstage Example: city = ‘Louisville’
SELECT count(*) FROM employees
WHERE salary > 50000
AND age < 65
AND job = ‘Shipping’
AND city = ‘Louisville’;
1.
2.
3.
4.
salary age
job city
Find the Data Packs with salary > 50000
Find the Data Packs that contain age < 65
Find the Data Packs that have job = ‘shipping’
Find the Data Packs that have city = ‘Louisville’
All values match
Completely Irrelevant
Suspect
WebFOCUS Hyperstage Example: Eliminate Pack Rows
SELECT count(*) FROM employees
WHERE salary > 50000
AND age < 65
AND job = ‘Shipping’
AND city = ‘Louisville’;
1.
2.
3.
4.
5.
salary age
job city
All packs
ignored
All packs
ignored
Find the Data Packs with salary > 50000
Find the Data Packs that contain age < 65
Find the Data Packs that have job = ‘shipping’
Find the Data Packs that have city = ‘Louisville’
Eliminate All rows that have been flagged as
irrelevant
All packs
ignored
All values match
Completely Irrelevant
Suspect
WebFOCUS Hyperstage Example: Decompress and scan
SELECT count(*) FROM employees
WHERE salary > 50000
AND age < 65
AND job = ‘Shipping’
AND city = ‘Louisville’;
Find the Data Packs with salary > 50000
Find the Data Packs that contain age < 65
Find the Data Packs that have job = ‘shipping’
Find the Data Packs that have city = ‘Louisville’
Eliminate All rows that have been flagged as
irrelevant
6. Finally we identify the pack that needs to be
decompressed
salary age
job city
All packs
ignored
All packs
ignored
1.
2.
3.
4.
5.
All packs
ignored
Only this pack will
be de-compressed
All values match
Completely Irrelevant
Suspect
Hyperstage –
So what’s the big deal?
Copyright 2007, Information Builders. Slide
29
WebFOCUS Hyperstage
The Big Deal…
 No indexes
 No partitions
 No views
 No materialized aggregates
 Value proposition
Low IT overhead
Reduced I/O = faster response times
Ease of implementation
Fast time to market
Less Hardware
Lower TCO
“Load and Go”
Some Real World Results



Insurance Company
 Query performance issues with SQL Server - Insurance claims
analysis
 Compression achieved 40:1
 Most queries running 3X faster in Hyperstage
Large Bank
 Query performance issues with SQL Server - Web traffic analysis
 Compression achieved 10:1
 Queries that ran for 10 to 15 mins in SQL Server ran in sub-seconds
in Hyperstage
Government Application
 Query performance issues with Oracle – Federal Loan/Grant
Tracking
 Compression achieved 15:1
 Queries that ran for 10 to 15 minutes in Oracle ran in 30 seconds in
Hyperstage
31
Demonstration …
Copyright 2007, Information Builders. Slide
32
Q&A
Copyright 2007, Information Builders. Slide
33