Generic Information Builders' Presentation Template
WebFOCUS Hyperstage Overview
Peter Azzarello
April 11, 2012
IB Toronto User Forum
Summit 2012
WebFOCUS: Higher Adoption & Reuse with Lower TCO
• Visualization & Mapping
• Mobile Applications
• Data Updating
• Predictive Analytics
• Enterprise Search
• High Performance Data Store
• Performance Management
• Reporting
• Query & Analysis
• MS Office & e-Publishing
• Dashboards
• Information Delivery
• Business to Business
• Data Warehouse & ETL
• Data Profiling & Data Quality
• Master Data Management
• Business Activity Monitoring
Extensions to the WebFOCUS platform allow you to build more application types at a lower cost.
The Business Challenge: Big Data
Copyright 2007, Information Builders. Slide 3
Today’s Top Data-Management Challenge
Big Data and Machine-Generated Data
[Chart: data storage over time for machine-generated data and human-generated data]
IT managers try to mitigate these response times…
How Performance Issues Are Typically Addressed, by Pace of Data Growth
[Chart: percentage of high-growth vs. low-growth organizations using each approach: tune or upgrade existing databases; upgrade server hardware/processors; upgrade/expand storage systems; upgrade networking infrastructure; archive older data on other systems; don't know/unsure]
When organizations have long-running queries that limit the business, the response is often to spend much more time and money to resolve the problem.
Source: Joseph McKendrick, "Keeping Up with Ever-Expanding Enterprise Data," Unisphere Research, October 2010.
Data Warehousing Challenges
• More Data, More Data Sources: real-time data, multiple databases, external sources
• More Kinds of Output Needed by More Users, More Quickly
• Limited Resources and Budget
Traditional Data Warehousing:
• Labor intensive: heavy indexing, aggregations and partitioning
• Hardware intensive: massive storage; big servers
• Expensive and complex
Data Warehousing Challenges
New Demands:
Larger transaction volumes driven by the internet
Impact of Cloud Computing
More -> Faster -> Cheaper
Data Warehousing Matures:
Near real time updates
Integration with master data management
Data mining using discrete business transactions
Provision of data for business critical applications
Early Data Warehouse Characteristics:
Integration of internal systems
Monthly and weekly loads
Heavy use of aggregates
Classic Approaches to deal with Large Data
INDEXES
CUBES/OLAP
Limitations of Indexes
Increased space requirements
The combined space of all indexes can exceed the size of the source database
Index management
Building the index increases load times
Predefines a fixed access path
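To make the space claim concrete, here is a back-of-the-envelope sketch. The row width, the five 16-byte index keys and the 8-byte row pointer are hypothetical figures, and the model ignores B-tree page overhead and fill factor, which in practice make indexes larger still.

```python
def table_bytes(n_rows: int, row_width: int) -> int:
    """Size of the base table, assuming fixed-width rows (hypothetical model)."""
    return n_rows * row_width


def index_bytes(n_rows: int, key_widths: list[int], pointer_width: int = 8) -> int:
    """Each index stores one (key, row pointer) entry per row; page overhead is ignored."""
    return sum(n_rows * (key + pointer_width) for key in key_widths)


n_rows = 1_000_000
base = table_bytes(n_rows, row_width=100)            # hypothetical 100-byte rows
indexes = index_bytes(n_rows, key_widths=[16] * 5)   # hypothetical: five indexes on 16-byte keys

print(f"table:   {base:,} bytes")
print(f"indexes: {indexes:,} bytes")
print("indexes exceed table:", indexes > base)
```

Five modest indexes already need 120 MB against a 100 MB table, before any of the overhead a real index structure adds.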
Limitations of OLAP
Cube technology has limited scalability
Number of dimensions is limited
Amount of data is limited
Cube technology is difficult to update (e.g., adding a dimension)
Usually requires a complete rebuild
Cube builds are typically slow
New design results in a new cube
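The scalability limits follow from simple arithmetic: if a cube precomputes a cell for every combination of dimension members, its potential size is the product of the dimension cardinalities, so each added dimension multiplies it and every existing cell has to be recomputed. The sketch assumes a fully dense cube and hypothetical cardinalities; real OLAP engines store sparse cubes and partial aggregates, so actual sizes are smaller.

```python
from math import prod


def potential_cells(cardinalities):
    """Cells in a fully dense cube: one per combination of dimension members."""
    return prod(cardinalities)


base_dims = [100, 365, 50]  # hypothetical: products, days, stores
print(potential_cells(base_dims))
print(potential_cells(base_dims + [20]))  # hypothetical 20-member region dimension added
```

Adding one small dimension grows the potential cube twentyfold, from 1,825,000 to 36,500,000 cells.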
Easy Migration to Hyperstage
Most cubes are fed from a relational source
That relational source is commonly a star schema
The source star schema can be migrated directly to Hyperstage
WebFOCUS metadata can be used to define hierarchies and drill paths to navigate the star schema
Pivoting Your Perspective: Columnar Technology
The Limitation of Rows
These Solutions Contribute to Operational Limitations
1. Impediments to business agility: Organizations often must wait for DBAs to create indexes or other tuning structures, which delays access to data. In addition, indexes significantly slow data-loading operations and increase the size of the database, sometimes doubling it.
2. Loss of data and time fidelity: IT generally performs ETL operations in batch mode during non-business hours. These batch transformations delay access to data and often result in mismatches between operational and analytic databases.
3. Limited ad hoc capability: Response times for ad hoc queries increase as the volume of data grows. Unanticipated queries (those the DBAs have not tuned the database for in advance) can produce unacceptable response times and may even fail to complete.
4. Unnecessary expenditures: Attempts to improve performance through hardware acceleration and database tuning raise both the capital cost of equipment and the operational cost of database administration. Further, the added complexity of managing a large database diverts operational budgets away from more urgent IT projects.
The Limitation of Rows
The Ubiquity of Rows …
[Figure: a table of 30 columns and 50 million rows]
Row-based databases are ubiquitous because so many of our most important business systems are transactional.
Row-oriented databases are well suited to transactional environments, such as a call center, where a customer’s entire record is required when their profile is retrieved and/or when fields are frequently updated.
However, disk I/O becomes a substantial limiting factor, since a row-oriented design forces the database to retrieve all column data for any query.
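Using the slide’s 30-column, 50-million-row table, a rough estimate shows why this matters. The sketch assumes every value is 8 bytes, no compression, a full scan, and a query that references only 3 of the 30 columns; all of these are illustrative assumptions rather than measurements.

```python
def bytes_scanned(n_rows: int, n_cols: int, cols_needed: int, value_bytes: int = 8):
    """Bytes read by a full scan under a simple fixed-width model (no compression)."""
    row_store = n_rows * n_cols * value_bytes           # whole rows come off disk
    column_store = n_rows * cols_needed * value_bytes   # only the referenced columns
    return row_store, column_store


row_store, column_store = bytes_scanned(50_000_000, 30, cols_needed=3)
print(f"row store:    {row_store / 1e9:.1f} GB")
print(f"column store: {column_store / 1e9:.1f} GB")
print(f"ratio:        {row_store // column_store}x")
```

Under these assumptions the row store reads 12 GB where a column store reads 1.2 GB; the ratio is simply total columns divided by referenced columns.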
Pivoting Your Perspective: Columnar Technology
Employee Id   Name     Location   Sales
1             Smith    New York   50,000
2             Jones    New York   65,000
3             Fraser   Boston     40,000
4             Fraser   Boston     70,000
Row Oriented
(1, Smith, New York, 50000; 2, Jones, New York, 65000; 3, Fraser, Boston, 40000; 4, Fraser, Boston, 70000)
Works well if all the columns are needed for every query.
Efficient for transactional processing, since all the data for a row is stored together
Column Oriented
(1, 2, 3, 4; Smith, Jones, Fraser, Fraser; New York, New York, Boston, Boston; 50000, 65000, 40000, 70000)
Works well with aggregate results (sum, count, avg)
Only columns that are relevant need to be touched
Consistent performance with any database design
Allows for very efficient compression
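The two serializations above can be reproduced in a few lines. In this sketch, plain Python lists stand in for disk blocks and the function names are our own; it computes SUM(Sales) both ways and counts how many stored values each layout must read, assuming a row store always reads whole records.

```python
# The four-row employee table from the slide.
COLUMNS = ("employee_id", "name", "location", "sales")
ROWS = [
    (1, "Smith", "New York", 50000),
    (2, "Jones", "New York", 65000),
    (3, "Fraser", "Boston", 40000),
    (4, "Fraser", "Boston", 70000),
]


def to_row_layout(rows):
    """Row oriented: all values of record 1, then all values of record 2, ..."""
    return [value for row in rows for value in row]


def to_column_layout(rows):
    """Column oriented: each column's values stored contiguously."""
    return {name: [row[i] for row in rows] for i, name in enumerate(COLUMNS)}


def sum_sales_rows(flat):
    """Scan every record; whole rows are read even though only Sales is used."""
    width = len(COLUMNS)
    offset = COLUMNS.index("sales")
    total = sum(flat[i + offset] for i in range(0, len(flat), width))
    return total, len(flat)  # (result, values read)


def sum_sales_columns(columns):
    """Touch only the Sales column."""
    sales = columns["sales"]
    return sum(sales), len(sales)  # (result, values read)


row_layout = to_row_layout(ROWS)
column_layout = to_column_layout(ROWS)
print(sum_sales_rows(row_layout))        # (225000, 16)
print(sum_sales_columns(column_layout))  # (225000, 4)
```

Both layouts return the same total, but the row layout reads all 16 stored values while the column layout reads only the 4 Sales values.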
[Figure: the same employee table shown as data stored in rows (each record kept together) and as data stored in columns (each column kept together)]
Introducing WebFOCUS Hyperstage
The Hyperstage Mission
Improve database performance for WebFOCUS applications with less hardware, no database tuning and easy migration.
Introducing WebFOCUS Hyperstage: What is it?
The WebFOCUS Hyperstage high-performance analytic data store is designed to handle business-driven queries on large volumes of data without IT intervention. Easy to implement and manage, Hyperstage provides the answers your business users need at a price you can afford.
Introducing WebFOCUS Hyperstage: How is it architected?
Hyperstage combines a columnar database with intelligence we call the Knowledge Grid to deliver fast query responses.
[Diagram: the Hyperstage Engine, made up of the Knowledge Grid, Compressor and Bulk Loader]
Unmatched Administrative Simplicity
• No indexes
• No data partitioning
• No manual tuning
Introducing WebFOCUS Hyperstage: What does this mean for customers?
Self-managing: 90% less administrative effort
Low cost: more than 50% less than alternative solutions
Scalable, high performance: up to 50 TB on a single industry-standard server
Fast queries: ad hoc queries are as fast as anticipated queries, so users have total flexibility
Compression: data compression of 10:1 to 40:1 means far less storage is needed, and it may even let the entire database fit in memory!
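The storage effect of those ratios is simple division. The sketch below uses a hypothetical 2 TB of raw data and a hypothetical 128 GB of server RAM; actual ratios depend heavily on the data, and the 10:1 to 40:1 range is the slide’s claim, not a guarantee.

```python
def compressed_gb(raw_gb: float, ratio: float) -> float:
    """Size after compression at ratio:1."""
    return raw_gb / ratio


def fits_in_memory(raw_gb: float, ratio: float, ram_gb: float) -> bool:
    """Crude check that ignores the working memory queries themselves need."""
    return compressed_gb(raw_gb, ratio) <= ram_gb


RAW_GB = 2000  # hypothetical raw data volume
RAM_GB = 128   # hypothetical server memory
for ratio in (10, 40):
    size = compressed_gb(RAW_GB, ratio)
    print(f"{ratio}:1 -> {size:.0f} GB, fits in {RAM_GB} GB RAM: {fits_in_memory(RAW_GB, ratio, RAM_GB)}")
```

At 10:1 the hypothetical data set still needs 200 GB; at 40:1 it shrinks to 50 GB and fits in memory.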
Introducing WebFOCUS Hyperstage: How does it work?
Creates information (metadata) about the data and, upon load, automatically:
o Stores it in the Knowledge Grid (KG)
o Loads the KG into memory (less than 1% of the compressed data size)
Uses the metadata when processing a query to eliminate or reduce the need to access data:
o The less data that needs to be accessed, the faster the response
o Sub-second responses when a query can be answered by the KG
Architecture Benefits
o No need to partition data, create or maintain indexes or projections, or tune for performance
o Ad hoc queries are as fast as static queries, so users have total flexibility
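The slides do not detail the Knowledge Grid’s internals. A common way to get the behaviour described above, and a reasonable mental model here, is to keep small per-pack statistics (minimum, maximum, sum) in memory and classify each pack before touching its data: packs that cannot match are skipped, packs that match entirely are answered from the statistics, and only the remaining "suspect" packs are read. The sketch below is a generic illustration under that assumption; `PackStats`, `build_packs`, `sum_between` and the 4-row pack size are hypothetical, and packs are left uncompressed for clarity.

```python
from dataclasses import dataclass

PACK_ROWS = 4  # hypothetical pack size, tiny for illustration


@dataclass
class PackStats:
    """Per-pack statistics kept in memory (a hypothetical Knowledge Grid entry)."""
    lo: int
    hi: int
    total: int


def build_packs(values, pack_rows=PACK_ROWS):
    """Split a column into fixed-size packs and record min, max and sum for each."""
    packs = [values[i:i + pack_rows] for i in range(0, len(values), pack_rows)]
    stats = [PackStats(min(p), max(p), sum(p)) for p in packs]
    return packs, stats


def sum_between(packs, stats, low, high):
    """SUM(v) WHERE low <= v <= high, consulting the statistics before the data."""
    total = 0
    packs_read = 0
    for pack, s in zip(packs, stats):
        if s.hi < low or s.lo > high:
            continue  # irrelevant: no value in the pack can qualify
        if low <= s.lo and s.hi <= high:
            total += s.total  # fully relevant: answered from the stored sum
            continue
        packs_read += 1  # suspect: only now is the pack itself read
        total += sum(v for v in pack if low <= v <= high)
    return total, packs_read


packs, stats = build_packs(list(range(1, 17)))
print(sum_between(packs, stats, 5, 8))   # (26, 0): answered from statistics alone
print(sum_between(packs, stats, 5, 10))  # (45, 1): one pack must be read
```

The first query never reads a pack, which is the "answered by the KG" case; the second reads one of four packs.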
WebFOCUS Hyperstage Runtime Architecture
[Diagram components: WebFOCUS, WebFOCUS Pro Server, Hyperstage Adapter, WebFOCUS Server, Hyperstage Engine (Knowledge Grid, Compressor, Bulk Loader), Hypercopy, MySQL, Hyperstage Server]
WebFOCUS Hyperstage Engine: How does it work?
Column Orientation
Smarter Architecture
• No maintenance
• No query planning
• No partition schemes
• No DBA
Knowledge Grid: statistics and metadata “describing” the super-compressed data
Data Packs: data stored in manageably sized, highly compressed data packs
Data compressed using algorithms tailored to data type
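The slides do not say which algorithms Hyperstage applies to which types. As a hypothetical illustration of "tailored to data type", the sketch below shows three standard column encodings: dictionary encoding for repetitive strings, run-length encoding for runs of equal values, and delta encoding for sorted integers such as IDs. Applied to the Location and Employee Id columns of the earlier employee table, they turn four strings into a two-entry dictionary plus two runs, and four IDs into a start value plus constant steps.

```python
from itertools import accumulate, groupby


def dictionary_encode(values):
    """Strings -> small integer codes plus a lookup table (suits low-cardinality text)."""
    codes_by_value = {}
    codes = [codes_by_value.setdefault(v, len(codes_by_value)) for v in values]
    return list(codes_by_value), codes


def run_length_encode(values):
    """Repeated neighbours -> (value, run length) pairs."""
    return [(v, len(list(run))) for v, run in groupby(values)]


def delta_encode(values):
    """Sorted or slowly changing integers -> first value followed by small differences."""
    return values[:1] + [b - a for a, b in zip(values, values[1:])]


def delta_decode(deltas):
    """Running sum restores the original integers."""
    return list(accumulate(deltas))


dictionary, codes = dictionary_encode(["New York", "New York", "Boston", "Boston"])
print(dictionary, codes)          # ['New York', 'Boston'] [0, 0, 1, 1]
print(run_length_encode(codes))   # [(0, 2), (1, 2)]
print(delta_encode([1, 2, 3, 4])) # [1, 1, 1, 1]
```

Storing each column separately is what makes this work: values of one type sit next to each other, so the encoder can be chosen per column.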
Summary
Business Intelligence – Meeting Requirements
WebFOCUS Hyperstage
The Big Deal…
No indexes
No partitions
No views
No materialized aggregates
Value proposition
Low IT overhead
Allows for autonomy from IT
Ease of implementation
Fast time to market
Less Hardware
Lower TCO
No DBA Required!
What does it look like?
Pay no attention to that man behind the curtain.
-* Drop and recreate the Hyperstage table
CREATE FILE baseapp/pa_inventory_ind_t DROP
-RUN
-* Bulk-load the listed columns into the new table
BULKLOAD baseapp/pa_inventory_ind_t FOR SQLINLD INV_CODE; TYPE; CATEGORY; NAME; MODEL;
MEASURE1_INV; MEASURE2_INV; MEASURE3_INV;
-* One-to-many join from each symbol to its quote records
JOIN
SYMBOLS.SYMBOLS.SYMBOL IN SYMBOLS TO MULTIPLE QUOTES_2B.QUOTES_2B.SYMBOL
IN QUOTES_2B TAG J0 AS J0
END
-* Quotes for one symbol after &START_DATE and before &END_DATE (prompted parameters)
TABLE FILE SYMBOLS
PRINT
SYMBOL CLOSE_DATE CLOSE_PRICE VOLUME OPEN_PRICE
WHERE ( SYMBOL EQ '&SYMBOL.(<MSFT,MSFT>).SYMBOL.' ) AND ( CLOSE_DATE GT '&START_DATE.(<2000-03-01,2000-03-01>).yyyy-mm-dd.' ) AND ( CLOSE_DATE LT '&END_DATE.(<2000-03-31,2000-03-31>).yyyy-mm-dd.' );
ON TABLE SET PAGE-NUM NOLEAD
ON TABLE NOTOTAL
ON TABLE PCHOLD FORMAT HTML
ON TABLE SET HTMLCSS ON
ON TABLE SET STYLE *
INCLUDE = endeflt,
$
ENDSTYLE
END
Example – FOCUS to Hyperstage Compression (243,639 rows)
Q&A
STAR SCHEMA CONSIDERATIONS
Leverage the Knowledge Grid
• Do constrain the fact table directly
• Do use sub-selects instead of joins
• Do use date-based constraints as much as possible
• Do add additional columns to create useful knowledge nodes
Everyone wants to be a Star
Adding as many WHERE conditions as you can to your SQL increases the chance that Knowledge Grid statistics can be used to speed up your queries.
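The advice to constrain the fact table directly, especially by date, can be illustrated with a generic min/max model of pack statistics. In this hypothetical sketch, rows are loaded in date order, so each pack covers a narrow date range, while a store ID that cycles within every pack gives the statistics nothing to work with. A store filter alone rules out no packs; adding a date range leaves only one pack to read. The names, pack size and data are illustrative assumptions, not Hyperstage internals.

```python
from datetime import date, timedelta

PACK_ROWS = 4  # hypothetical pack size


def pack_ranges(column, pack_rows=PACK_ROWS):
    """(min, max) of each fixed-size pack: the only statistics this sketch keeps."""
    return [(min(column[i:i + pack_rows]), max(column[i:i + pack_rows]))
            for i in range(0, len(column), pack_rows)]


def candidate_packs(ranges, low, high):
    """Packs whose [min, max] overlaps [low, high]; all others are ruled out."""
    return {i for i, (lo, hi) in enumerate(ranges) if hi >= low and lo <= high}


def packs_to_read(predicates):
    """ANDed predicates: a pack is read only if no predicate rules it out."""
    return sorted(set.intersection(*(candidate_packs(r, lo, hi) for r, lo, hi in predicates)))


# Hypothetical fact table: 16 rows loaded in date order, store ids cycling 0-3.
close_date = [date(2000, 3, 1) + timedelta(days=d) for d in range(16)]
store_id = [d % 4 for d in range(16)]

store_only = [(pack_ranges(store_id), 2, 2)]
with_date = store_only + [(pack_ranges(close_date), date(2000, 3, 5), date(2000, 3, 8))]
print(packs_to_read(store_only))  # [0, 1, 2, 3]
print(packs_to_read(with_date))   # [1]
```

The extra condition does not change which rows qualify here, but it lets the statistics eliminate three of the four packs before any data is read.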