The New Analytical Ecosyste... 5205KB Feb 10 2014 12:05:35 PM

The New BI Ecosystem:
How Big Data Merges Top-Down and Bottom-Up Computing
Wayne W. Eckerson
Director of Research, TechTarget
Founder, BI Leadership Forum
Agenda
• Big data platforms
– Relational databases
– Analytical databases
– Hadoop
• New analytical ecosystem
What comes next?
• Kilobyte (KB) – 10^3 bytes
• Megabyte (MB) – 10^6 bytes
• Gigabyte (GB) – 10^9 bytes
• Terabyte (TB) – 10^12 bytes
• Petabyte (PB) – 10^15 bytes
• Exabyte (EB) – 10^18 bytes
• Zettabyte (ZB) – 10^21 bytes
• Yottabyte (YB) – 10^24 bytes
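The unit ladder above can be sketched in a few lines of Python. This is only an illustration: the function name `humanize` is hypothetical, and it assumes the decimal (power-of-1000) prefixes shown on the slide rather than binary ones.

```python
# Decimal prefixes from the slide: each step up is a factor of 10**3.
UNITS = ["bytes", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]

def humanize(num_bytes: int) -> str:
    """Express a byte count in the largest unit that keeps the value >= 1."""
    value = float(num_bytes)
    index = 0
    # Climb the ladder until the value drops below 1000 or we run out of units.
    while value >= 1000 and index < len(UNITS) - 1:
        value /= 1000
        index += 1
    return f"{value:.1f} {UNITS[index]}"

print(humanize(5_205_000))  # a few megabytes
print(humanize(10**21))     # one zettabyte
```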
What is “big data”?
Data
a) Lots of data
b) Different types of data
c) More data than you can handle
Systems
d) Purpose-built analytical systems
e) Distributed file system
f) New staging area and archive
Movement
g) A Java developer’s employment act
h) A replacement for the RDBMS
i) A club for hip data people
Yes!
Information explosion
[Chart, 2005 to 2012: Unstructured & Content Depot; Structured & Replicated]
Every 18 months, non-rich structured and unstructured enterprise data doubles.
Source: IDC Digital Universe 2009; White Paper, Sponsored by EMC, May 2009
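As a rough illustration of what an 18-month doubling period implies, the sketch below computes the compound growth factor over a span of months. The function name `growth_factor` is hypothetical, and it assumes smooth exponential growth, which the slide does not state.

```python
def growth_factor(months: float, doubling_period_months: float = 18.0) -> float:
    """Multiplier on data volume after `months`, assuming steady doubling."""
    return 2.0 ** (months / doubling_period_months)

# The chart axis runs from 2005 to 2012, i.e. 7 years = 84 months.
print(round(growth_factor(84), 1))
```

Under that assumption, data held in 2005 would have grown roughly 25-fold by 2012.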
Data deluge
• Structured data
– Call detail records
– Point of sale records
– Claims data
• Semi-structured data
– Web logs
– Sensor data
– Email, Twitter
• Unstructured data
– Video, Audio,
– Images, Text
“A Sea of Sensors”, The Economist, Nov 4, 2010
From transactions to observations
Structured 
Semi-Structured 
Unstructured
Three big data platforms (systems)
• General purpose relational database
• Analytical database
• Hadoop
1. General purpose RDBMS
- Powers first generation DW
Benefits:
- RDBMS already in-house
- SQL-based
- Trained DBAs
Challenges:
- Cost to deploy and upgrade
- Doesn’t support complex analytics
- Scalability and performance
[Diagram labels: Operational Systems; ETL; Data Warehouse; ETL; Data Mart; BI Server; Reports / Dashboards]
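The first-generation flow named in the diagram (operational systems, ETL, data warehouse, data mart, reports) can be sketched with Python's built-in `sqlite3` module. Everything here is an assumption for illustration: the table names `dw_sales` and `mart_sales_by_store`, the sample rows, and the cleaning rules are hypothetical and not taken from the slides.

```python
import sqlite3

def run_etl(source_rows):
    """Load cleaned operational rows into an in-memory warehouse, then build a mart."""
    dw = sqlite3.connect(":memory:")
    dw.execute("CREATE TABLE dw_sales (store TEXT, sale_date TEXT, amount REAL)")
    # Transform: normalize store codes and drop rows that have no amount.
    cleaned = [
        (store.strip().upper(), sale_date, float(amount))
        for store, sale_date, amount in source_rows
        if amount is not None
    ]
    # Load: insert the cleaned rows into the warehouse table.
    dw.executemany("INSERT INTO dw_sales VALUES (?, ?, ?)", cleaned)
    # Second ETL step: aggregate the warehouse into a departmental data mart.
    dw.execute(
        "CREATE TABLE mart_sales_by_store AS "
        "SELECT store, SUM(amount) AS total FROM dw_sales GROUP BY store"
    )
    return dw

# Hypothetical rows extracted from two operational systems.
rows = [
    (" ny ", "2012-04-01", 10.0),
    ("NY", "2012-04-02", 5.5),
    ("sf", "2012-04-01", None),
    ("sf", "2012-04-03", 7.0),
]
dw = run_etl(rows)
# The "report" a BI server would issue against the mart.
report = dw.execute(
    "SELECT store, total FROM mart_sales_by_store ORDER BY store"
).fetchall()
print(report)
```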
2. Analytical platforms
1010data
Aster Data (Teradata)
Calpont
Datallegro (Microsoft)
Exasol
Greenplum (EMC)
IBM SmartAnalytics
Infobright
Kognitio
Netezza (IBM)
Oracle Exadata
Paraccel
Pervasive
Sand Technology
SAP HANA
Sybase IQ (SAP)
Teradata
Vertica (HP)
Purpose-built database management systems designed explicitly for query
processing and analysis that provide dramatically higher price/performance
and availability compared to general purpose solutions.
Deployment Options
- Software only (Paraccel, Vertica)
- Appliance (SAP, Exadata, Netezza)
- Hosted (1010data, Kognitio)
Game-changing technology
• Quicker to deploy
– Preconfigured and tuned
– Fast ROI
• Faster and more scalable
– Faster query response times
– Linear performance
• Built-in analytics
– Libraries of functions
– Extensible SDK
• Less costly
– Less power, cooling, space
– Fewer people to maintain
Business value of analytic platforms
• Kelley Blue Book – Consolidates millions of auto transactions each week to calculate car valuations
• AT&T Mobility – Tracks purchasing patterns for 80M customers daily to optimize targeted marketing
[Diagram labels: Analytical appliance; Analytical database]
3. Hadoop
• Ecosystem of open source projects
• Hosted by Apache Foundation
• Google developed and shared concepts
• Distributed file system that scales out on commodity servers with direct attached storage and automatic failover.
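One of the Google concepts referred to above is the MapReduce programming model. The sketch below shows its three phases (map, shuffle, reduce) as a single-process word count in plain Python. It is not the Hadoop API: the function names are hypothetical, and a real cluster would run the map and reduce steps in parallel across nodes.

```python
from collections import defaultdict

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1

def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single count."""
    return key, sum(values)

def word_count(lines):
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())

counts = word_count(["big data big systems", "data movement"])
print(counts)
```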
Hadoop distilled: What’s new?
[Diagram label: BIG DATA]
- Unstructured data
- Distributed File System
- Data scientist
- “Schema at Read”
- Open Source $$
- No SQL
- MapReduce
Benefits
- Comprehensive
- Agile
- Expressive
- Affordable
Drawbacks
- Immature
- Batch oriented
- Expertise
- TCO
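“Schema at Read” means raw records are stored as-is and a structure is imposed only when a query reads them. A minimal sketch in plain Python follows; the sample records, the field names, and the helper `read_with_schema` are all hypothetical and chosen only to illustrate the idea.

```python
import json

# Raw events stored exactly as they arrived; no table definition up front.
RAW_EVENTS = [
    '{"user": "a1", "page": "/home", "ms": 120}',
    '{"user": "b2", "page": "/cart"}',
    '{"user": "a1", "device": "mobile", "ms": "95"}',
]

def read_with_schema(raw_lines, schema):
    """Project each raw record onto `schema` ({field: caster}) at read time.

    Fields missing from a record come back as None; extra fields are ignored.
    """
    for line in raw_lines:
        record = json.loads(line)
        yield {
            field: (cast(record[field]) if field in record else None)
            for field, cast in schema.items()
        }

# Two different "schemas" applied to the same stored data.
latency_view = list(read_with_schema(RAW_EVENTS, {"user": str, "ms": int}))
page_view = list(read_with_schema(RAW_EVENTS, {"page": str}))
print(latency_view)
```

The same raw data serves both views, which is the agility the slide credits to this approach; the cost is that type and completeness problems surface at query time rather than load time.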
Hadoop ecosystem
Source: Hortonworks
Hadoop use cases
• Sabre Holdings
– Analyze airline shopping data
• Vestas
– Site wind turbines by modeling
larger volumes of weather data
• CBS Interactive
– Optimize ad placement and pricing
• Nokia
– Identify new data services
Hadoop hype
Overheard
“Hadoop will replace relational databases.”
“Hadoop will replace data warehouses.”
“Hadoop has a superior query engine compared to analytical platforms.”
“Use Hadoop for any application that requires more than one node.”
[Figure: Gartner Group – Hype Cycle]
Hadoop adoption rates
No plans: 38%
Considering: 32%
Experimenting: 20%
Implementing: 5%
In production: 4%
Based on 158 respondents, BI Leadership Forum, April 2012
Hadoop workloads
[Chart: share of respondents per workload, “Today” vs. “In 18 Months”: staging area, online archive, transformation engine, ad hoc queries, scheduled reports, visual exploration, data mining]
Based on respondents that have implemented Hadoop. BI Leadership Forum, April 2012
Which platform do you choose?
Hadoop
Analytic Database
General Purpose
RDBMS
Structured 
Semi-Structured 
Unstructured
Big data platform comparison
                 RDBMS                 Analytical Database   Hadoop
Purpose          OLTP                  Analytics             Anything
Volume           Low                   Moderate              High
Variety          Relational            Relational+           Variable
Access           SQL                   SQL+                  Java+
Latency          Low                   Moderate              High
Concurrency      High                  Moderate              Low
Cost per GB      High                  Moderate              Low
Role             DW hub or data mart   DW or sandbox         Staging area and archive
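For readers who want to query the comparison programmatically, the sketch below encodes it as a dictionary and filters by trait. The values are one reading of the slide's table, and the names `PLATFORMS` and `platforms_where` are hypothetical; treat it as a lookup aid, not a selection method endorsed by the talk.

```python
# One reading of the slide's comparison: platform -> trait -> value.
PLATFORMS = {
    "RDBMS": {
        "purpose": "OLTP", "volume": "Low", "variety": "Relational",
        "access": "SQL", "latency": "Low", "concurrency": "High",
        "cost_per_gb": "High", "role": "DW hub or data mart",
    },
    "Analytical Database": {
        "purpose": "Analytics", "volume": "Moderate", "variety": "Relational+",
        "access": "SQL+", "latency": "Moderate", "concurrency": "Moderate",
        "cost_per_gb": "Moderate", "role": "DW or sandbox",
    },
    "Hadoop": {
        "purpose": "Anything", "volume": "High", "variety": "Variable",
        "access": "Java+", "latency": "High", "concurrency": "Low",
        "cost_per_gb": "Low", "role": "Staging area and archive",
    },
}

def platforms_where(**criteria):
    """Return the platforms whose traits match every given criterion."""
    return [
        name
        for name, traits in PLATFORMS.items()
        if all(traits[key] == value for key, value in criteria.items())
    ]

print(platforms_where(cost_per_gb="Low"))
print(platforms_where(latency="Low", concurrency="High"))
```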
The New BI Ecosystem
BI Framework 2020
Layers: End-User Tools, Design Framework, Architecture
Business Intelligence (Reporting & Analysis)
– Reports and dashboards
– MAD dashboards
– Data warehousing
Continuous Intelligence (Event-driven)
– Event-driven alerts and dashboards
– Event detection and correlation
– CEP, streams
Content Intelligence
– Keyword search, BI tools, XQuery, Hive, Java, etc.
– MapReduce, XML schema, key-value pairs, graph notation, etc.
– HDFS, NoSQL databases
Analytics Intelligence (Exploration, power users)
– Ad hoc query, spreadsheets, OLAP, visual analysis, analytic workbenches, Hadoop
– Excel, Access, OLAP, data mining, visual exploration
– Ad hoc SQL
– Analytic sandboxes
BI Framework
TOP DOWN - “Business Intelligence”
Corporate Objectives and Strategy
Reporting & Monitoring (Casual Users)
Data Warehousing Architecture
Predefined Metrics
Non-volatile Data
Reports Beget Analysis
Pros:
- Alignment
- Consistency
Cons:
- Hard to build
- Politically charged
- Hard to change
- Expensive
- “Schema Heavy”
BOTTOM UP - “Analytics”
Processes and Projects
Analysis and Prediction (Power Users)
Analytics Architecture
Ad hoc queries
Volatile Data
Analysis Begets Reports
Pros:
- Quick to build
- Politically uncharged
- Easy to change
- Low cost
Cons:
- Alignment
- Consistency
- “Schema Light”
The new analytical ecosystem
[Diagram components]
Sources: Operational systems (structured data); machine data; web data; audio/video data; external data; documents & text
Data movement: Extract, Transform, Load (batch, near real-time, or real-time); streaming/CEP engine
Top-down architecture: data warehouse; departmental data mart; BI server; casual user
Bottom-up architecture: Hadoop cluster; analytic platform or non-relational database; virtual sandboxes; in-memory sandbox; free-standing sandbox; power user
Analytical sandboxes
[Diagram: same components as “The new analytical ecosystem”; sandboxes shown: virtual sandboxes, in-memory sandbox, free-standing sandbox]
Workflows
Source Systems
“Capture only what’s needed”
1. Extract, transform, load
Analytical database (DW)
“Capture in case it’s needed”
5. Explore data
6. Parse, aggregate
9. Report and mine data
Analytical tools
Recommendations
• Explore applications for multi-structured data
• Apply the right tool for the job
– RDBMS, Analytical platform, Hadoop, NoSQL
• Make power users full-fledged members of your BI environment
• Reconcile top-down and bottom-up BI environments
→ Create an analytical ecosystem!
Questions?
• Analytical thought leader
• Founder, BI Leadership Forum
• Director of Research, TechTarget
• Former director of research at TDWI
• Author
• Wayne Eckerson
• [email protected]