Generic Information Builders` Presentation Template
Download
Report
Transcript Generic Information Builders` Presentation Template
Maximize WebFOCUS Performance
with Hyperstage
Louisville User Group Meeting
April 25, 2012
Lori Pieper
Agenda
The “Big Data” Business Challenge
Pivoting Your Perspective
Introducing WebFOCUS Hyperstage
How does it work?
So what’s the big deal?
Demonstration
Wrap Up and Q&A
The “Big Data”Business
Challenge
Copyright 2007, Information Builders. Slide 3
Data Warehousing Challenges
Limited Resources
and Budget
More Data,
More Data Sources
010101010101010101010101010
01010101010101010101010101
Real time data
01 1
010101010101010101010
0101010101010101010101010
Multiple databases
More Kinds of Output
Needed by More Users,
More Quickly
1010
01010101010101010101
0101010101010101010
External Sources 01
101
1
10
1
1
1
010 1
0
Labor intensive, heavy
Traditional Data
Warehousing
indexing, aggregations and
partitioning
Hardware intensive:
massive storage; big servers
Copyright 2007, Information Builders. Slide 4
Expensive and complex
IT Manager’s try to mitigate these response times …..
How Performance Issues are Typically Addressed – by Pace of Data Growth
Tune or upgrade existing databases
66%
Upgrade server hardware/processors
54%
Upgrade/expand storage systems
30%
Upgrade networking infrastructure
21%
32%
0%
20%
44%
High Growth
Low Growth
4%
7%
Don't Know / Unsure
70%
60%
33%
Archive older data on other systems
75%
40%
60%
80%
100%
When organizations have long running queries that limit the business, the
response is often to spend much more time and money to resolve the
problem
Source: KEEPING UP WITH EVER-EXPANDING ENTERPRISE DATA ( Joseph McKendrick Unisphere Research October 2010)
Limitations of “Traditional” Solutions
Adding indexes:
Increases disk space requirements
Sum of index space requirements can even exceed the
source DB
Index Management
Increases load times to build the index
Predefines a fixed access path
Reports run slow if you haven’t “anticipated” the
reporting needs correctly
Limitations of “Traditional” Solutions
Building OLAP Cubes:
Cube technology has limited scalability
Number of dimensions is limited
Amount of data is limited
Cube technology is difficult to update (add Dimension)
Usually requires a complete rebuild
Cube builds are typically slow
New design results in a new cube
Reports run slow if you haven’t “anticipated” the
reporting needs correctly
Pivoting Your Perspective:
Turn Row-based into Column-based
Copyright 2007, Information Builders. Slide 8
Why is Row-based Limiting for Analytics?
The Ubiquity of Rows …
30 columns
Row-based databases are
ubiquitous because so many
of our most important business
systems are transactional.
50
millions
Rows
Row-oriented databases
are well suited for
transactional environments,
such as a call center where a
customer’s entire record is
required when their profile
is retrieved and/or when fields
are frequently updated.
But - Disk I/O becomes a substantial limiting factor since
a row-oriented design forces the database to retrieve all
column data for any query.
Why is Column-based Perfect for Analytics?
Employee Id
Name
Location
Sales
1
Smith
New York
50,000
2
Jones
New York
65,000
3
Fraser
Boston
40,000
4
Fraser
Boston
70,000
Row Oriented
(1, Smith, New York, 50000; 2, Jones, New York, 65000; 3, Fraser, Boston, 40000; 4, Fraser, Boston, 70000)
Works well if all the columns are needed for every query.
Efficient for transactional processing if all the data for the row is available
Column Oriented
(1, 2, 3, 4; Smith, Jones, Fraser, Fraser; New York, New York, Boston, Boston, 50000, 65000, 40000, 70000)
Works well with aggregate results (sum, count, avg. )
Only columns that are relevant need to be touched
Consistent performance with any database design
Allows for very efficient compression
Why is Column-based Perfect for Analytics?
Employee Id
Name
Location
Sales
1
Smith
New York
50,000
2
Jones
New York
65,000
3
Fraser
Boston
40,000
4
Fraser
Boston
70,000
Data stored in rows
Data stored in columns
1
Smith
New York
50,000
1
Smith
New York
50,000
2
Jones
New York
65,000
2
Jones
New York
65,000
3
Fraser
Boston
40,000
3
Fraser
Boston
40,000
4
Fraser
Boston
70,000
4
Fraser
Boston
70,000
Introducing Hyperstage
Copyright 2007, Information Builders. Slide
12
Introducing WebFOCUS Hyperstage ….
Hyperstage is a high performance
analytic data store designed to
handle business-driven queries on
large volumes of data—with minimal
IT intervention—achieving
outstanding query performance,
with less hardware, no database
tuning and easy migration.
Introducing WebFOCUS Hyperstage ….
But really…
What is it?
Easy to implement and manage, Hyperstage provides the
answers to your business users’ needs at a price you can afford.
Introducing WebFOCUS Hyperstage ….
How is it architected?
Hyperstage Engine
Hyperstage combines
a columnar database
with intelligence we
call the Knowledge
Grid to deliver fast
query responses.
Knowledge Grid
Compressor
Bulk
Loader
Unmatched Administrative Simplicity:
• No indexes
• No data partitioning
• No materialized views
Introducing WebFOCUS Hyperstage ….
How is it architected?
Hyperstage Engine
Hyperstage adds data
compression of 10:1
to 40:1 so you can
manage large
amounts of data using
much smaller disk
footprint.
Knowledge Grid
Compressor
Bulk
Loader
Powerful Data compression:
• Store terabytes of data with only
gigabytes of disk space
Introducing WebFOCUS Hyperstage ….
How is it architected?
Hyperstage Engine
Hyperstage adds a
bulk loader plus an
easy to use extraction
and load tool, called
HyperCopy, making
data loading a breeze.
Knowledge Grid
Compressor
Bulk
Loader
Includes embedded ETL:
• Easy and seamless migration of existing
analytical databases
• No change in query or application
required
How Does it Work?
Copyright 2007, Information Builders. Slide
18
WebFOCUS Hyperstage Engine
How does it work?
Column Orientation
Smarter
Architecture
Knowledge Grid – statistics
and metadata “describing”
the super-compressed data
No maintenance
No query planning
No partition schemes
Easy “load and go”
Data Packs – data
stored
in manageably sized,
highly compressed
data packs
Data compressed
using algorithms
tailored to
data type
Data Organization and the Knowledge Grid ….
Data Packs and Compression
Data Packs
64K
Each data pack contains 65, 536 data values
Compression is applied to each individual data pack
The compression algorithm varies depending on data
64K
type and data distribution
Compression
Results vary depending on the
64K
64K
Patent Pending
Compression
Algorithms
distribution of data among data
packs
A typical overall compression
ratio seen in the field is 10:1
Some customers have seen
results have been as high as 40:1
Data Organization and the Knowledge Grid ….
Data Pack Nodes (DPN)
A separate DPN is created for every data pack created
in the database to store basic statistical information
Character Maps (CMAPs)
Every Data Pack that contains text creates a matrix
that records the occurrence of every possible ASCII
character
Histograms
Histograms are created for every Data Pack that
contains numeric data and creates 1024 MIN-MAX
intervals.
Pack-to-Pack Nodes (PPN)
PPNs track relationships between Data Packs when
tables are joined. Query performance gets better as
the database is used.
This knowledge grid layer = 1% of the compressed volume
WebFOCUS Hyperstage Example: Query and Knowledge Grid
SELECT count(*) FROM employees
WHERE salary > 50000
AND age < 65
AND job = ‘Shipping’
AND city = ‘Louisville’;
salary age
job city
All values match
Completely Irrelevant
Suspect
WebFOCUS Hyperstage Example: salary > 50000
SELECT count(*) FROM employees
WHERE salary > 50000
AND age < 65
AND job = ‘Shipping’
AND city = ‘Louisville’;
salary age
job city
1. Find the Data Packs with salary > 50000
All values match
Completely Irrelevant
Suspect
WebFOCUS Hyperstage Example: age<65
SELECT count(*) FROM employees
WHERE salary > 50000
AND age < 65
AND job = ‘Shipping’
AND city = ‘Louisville’;
salary age
job city
1. Find the Data Packs with salary > 50000
2. Find the Data Packs that contain age < 65
All values match
Completely Irrelevant
Suspect
WebFOCUS Hyperstage Example: job = ‘shipping’
SELECT count(*) FROM employees
WHERE salary > 50000
AND age < 65
AND job = ‘Shipping’
AND city = ‘Louisville’;
salary age
job city
1. Find the Data Packs with salary > 50000
2. Find the Data Packs that contain age < 65
3. Find the Data Packs that have job = ‘shipping’
All values match
Completely Irrelevant
Suspect
WebFOCUS Hyperstage Example: city = ‘Louisville’
SELECT count(*) FROM employees
WHERE salary > 50000
AND age < 65
AND job = ‘Shipping’
AND city = ‘Louisville’;
1.
2.
3.
4.
salary age
job city
Find the Data Packs with salary > 50000
Find the Data Packs that contain age < 65
Find the Data Packs that have job = ‘shipping’
Find the Data Packs that have city = ‘Louisville’
All values match
Completely Irrelevant
Suspect
WebFOCUS Hyperstage Example: Eliminate Pack Rows
SELECT count(*) FROM employees
WHERE salary > 50000
AND age < 65
AND job = ‘Shipping’
AND city = ‘Louisville’;
1.
2.
3.
4.
5.
salary age
job city
All packs
ignored
All packs
ignored
Find the Data Packs with salary > 50000
Find the Data Packs that contain age < 65
Find the Data Packs that have job = ‘shipping’
Find the Data Packs that have city = ‘Louisville’
Eliminate All rows that have been flagged as
irrelevant
All packs
ignored
All values match
Completely Irrelevant
Suspect
WebFOCUS Hyperstage Example: Decompress and scan
SELECT count(*) FROM employees
WHERE salary > 50000
AND age < 65
AND job = ‘Shipping’
AND city = ‘Louisville’;
Find the Data Packs with salary > 50000
Find the Data Packs that contain age < 65
Find the Data Packs that have job = ‘shipping’
Find the Data Packs that have city = ‘Louisville’
Eliminate All rows that have been flagged as
irrelevant
6. Finally we identify the pack that needs to be
decompressed
salary age
job city
All packs
ignored
All packs
ignored
1.
2.
3.
4.
5.
All packs
ignored
Only this pack will
be de-compressed
All values match
Completely Irrelevant
Suspect
Hyperstage –
So what’s the big deal?
Copyright 2007, Information Builders. Slide
29
WebFOCUS Hyperstage
The Big Deal…
No indexes
No partitions
No views
No materialized aggregates
Value proposition
Low IT overhead
Reduced I/O = faster response times
Ease of implementation
Fast time to market
Less Hardware
Lower TCO
“Load and Go”
Some Real World Results
Insurance Company
Query performance issues with SQL Server - Insurance claims
analysis
Compression achieved 40:1
Most queries running 3X faster in Hyperstage
Large Bank
Query performance issues with SQL Server - Web traffic analysis
Compression achieved 10:1
Queries that ran for 10 to 15 mins in SQL Server ran in sub-seconds
in Hyperstage
Government Application
Query performance issues with Oracle – Federal Loan/Grant
Tracking
Compression achieved 15:1
Queries that ran for 10 to 15 minutes in Oracle ran in 30 seconds in
Hyperstage
31
Demonstration …
Copyright 2007, Information Builders. Slide
32
Q&A
Copyright 2007, Information Builders. Slide
33