Using_Big_Data_To_Your_Advantage
Source: http://blog.questionpro.com/2012/12/24/market-research-trends-2013-big-data/
Using Big Data to your Advantage
It’s not just about toy elephants anymore…
March 19, 2013
John Repko – [email protected]
John Repko -- Pikasoft LLC
So How Did We Get to Big Data Anyway?
Source: https://thedailyload.files.wordpress.com/2010/12/william_perry.jpg
Source: http://www.startribune.com/sports/164830346.html
Big Data Is Not Just About “Big” Data … It’s About FAST Data!
(http://www.pikasoft.com/journal/2011/5/13/not-big-data-fast-data.html)
There Are Big Data Breakthroughs Everywhere…
I’ve Heard About Big Data Successes…
• “Watson” Wins on Jeopardy
– Beat the best Jeopardy players ever
• Google Wins the Search Market
– Massively parallel web searches with results back in a tenth of a second
• Progressive’s Instant “Overnight” Rate Quotes
– Progressive creates an insurance quote for every car and truck in the US – every night
How Can I Determine If These Big Data Wins Apply to My Business?
Source: http://www.beingjavaguys.com/2013/01/what-is-big-data-introduction-and.html
• Where do I put the data?
• How do I load the system?
• How do I find the value in the data?
• How do I present it?
• How long is this going to take?
• How much is this going to cost?
You Need A Proven Approach to Finding the Value in Your Data
The Key is to Recognize That There IS a Pattern to Big Data Wins
The Variety Of Big Data Wins In The Press Falls Into Just Two Solution Patterns
• Foresight
– We are presented with a pattern – what has the outcome been when we’ve seen similar patterns in the past?
• Hindsight
– We are presented with an outcome – what pattern of events anticipated that outcome in the past?
We Don’t Need Dozens Of Solution Approaches For Big Data – Just Two
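The two solution patterns can be read as two queries over the same history. The sketch below is only an illustration under stated assumptions: the `HISTORY` records, the pattern and outcome labels, and both function names are hypothetical and do not come from the deck.

```python
from collections import Counter

# Hypothetical history of (pattern, outcome) observations; the values are made up.
HISTORY = [
    ("late_payment", "default"),
    ("late_payment", "default"),
    ("late_payment", "repaid"),
    ("on_time", "repaid"),
    ("on_time", "repaid"),
]


def foresight(history, pattern):
    """Given a pattern, return the share of each outcome that followed it in the past."""
    outcomes = Counter(o for p, o in history if p == pattern)
    total = sum(outcomes.values())
    return {o: n / total for o, n in outcomes.items()} if total else {}


def hindsight(history, outcome):
    """Given an outcome, return the share of each pattern that preceded it in the past."""
    patterns = Counter(p for p, o in history if o == outcome)
    total = sum(patterns.values())
    return {p: n / total for p, n in patterns.items()} if total else {}
```

Both functions scan the same records; only the direction of the question changes, which is the point the slide makes.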
Big Data Wins – Not “10 Problems” But Only 2
In This Light, Let’s Take A Look At The “10 Hadoop-able Problems”
Summary – 10 Common Hadoop-able Problems*
1. Modeling True Risk
• What past patterns led to success or default?
2. Customer Churn Analysis
• What do customer churn patterns predict about our products and markets?
3. Recommendation Engine
• We have search terms – what have the results been from similar searches in the past?
4. Ad Targeting
• We have profile information – what offers have led to sales for similar profiles in the past?
5. PoS Transaction Analysis
• We have your purchase history – what deals might we offer in the future?
* http://info.cloudera.com/TenCommonHadoopableProblemsWhitePaper.html
Legend: Foresight, Hindsight
Big Data Wins – Not “10 Problems” But Only 2
These Two Solution Types Apply Generally To The Hadoop-able Problems
Summary – 10 Common Hadoop-able Problems*
6. Analyzing Data Logs to Forecast Events
• We have your logs – what pattern of events has anticipated failures before?
7. Threat Analysis
• We have a specific event – what results have we seen from similar threats in the past?
8. Trade Surveillance
• Does this parcel raise any alarms, based on our history of past parcel-tracking?
9. Search Quality
• We have a set of search terms – what have similar searches succeeded in finding in the past?
10. Data “Sandbox”
• We have your data, possibly unstructured data. What patterns in that data might we bring to your attention now?
Legend: Foresight, Hindsight
Data Warehouse Advanced Analytics Is Expensive and Generally Restricted To
Structured Data
• According to Gartner, Enterprise Data will grow 650% by 2014. 85% of these data will be “unstructured data”, with a CAGR of 62% per year, far larger than transactional data
• Growth is taking place in areas not well served by RDBMS’s and DW’s
Structured: Managed by RDBMS & DW
Unstructured: Growth Areas Not Managed well by RDBMS or DW
Source: http://www.vertica.com/writable/knowledge_articles/file/bi_vertica.pdf
Source: http://thecloudtutorial.com/hadoop-tutorial.html
The Tremendous Growth Of Data Is In Unstructured Data That Is Best Managed
Outside The RDBMS
Unstructured: Not Managed by RDBMS or DW
Structured: Managed by RDBMS or DW
The New Areas Of Non-RDBMS Managed Data Are Rich In Business Value And
Are Ripe For Analysis
Unstructured: Not Managed by RDBMS or DW
Structured: Managed by RDBMS
Big Data Stores Are Increasingly Architected With Open-Source Tools
• Map Reduce Languages
– Higher-level wrapper languages which simplify Map Reduce development efforts.
• Map Reduce Engine
– Processes (‘Map’ and ‘Reduce’ functions) which analyze very large datasets across distributed systems.
• NoSQL Data Store
– Datasets structured as columnar, key-value, or document-based in order to overcome limitations in traditional relational modeling for ‘Big’ datasets.
• Data Integration
– Tools which extract, transform, and load data between Relational and Non-Relational datasets.
Cloud MapReduce
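To make the ‘Map’ and ‘Reduce’ functions concrete, here is a minimal single-process sketch of the classic word count. It is an illustration only: a real engine such as Hadoop distributes these phases across machines, and the function names `map_phase`, `shuffle`, `reduce_phase` and `word_count` are assumptions made for this example, not Hadoop APIs.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """'Map': emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group emitted values by key, as the engine does between the two phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """'Reduce': collapse all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle and reduce over an iterable of text documents."""
    pairs = chain.from_iterable(map_phase(doc) for doc in documents)
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())
```

Because each `map_phase` call sees one document and each `reduce_phase` call sees one key, both phases can run in parallel, which is what lets the approach scale to very large datasets.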
You Have Data. Here’s What You Need to Unlock It
Needs:
• Load the data in a system equipped with the tools to analyze it
– Via a standard interface, or
– Programmatically
Requirements:
• The system has to live where the data lives (otherwise transmission costs become prohibitive)
• REST or SOAP are the most common interfaces
• Bloom Filters can provide set operations in large data sets
Needs:
• Determine valid relationships in the data
• Analyze the data for these common patterns
• Tune the analytics
Requirements:
• ORM (Object-Relational Mapping) simplifies data access
• Hadoop provides parallelized analysis for unstructured data
• Starfish provides automatic analytics tuning for Hadoop
Needs:
• Visualize the results
• Pursue the patterns that emerge
Requirements:
• Structured data can be analyzed via statistical analysis (for numbers) or free-text search (for text)
• Solution patterns can be applied automatically once the data is sandboxed
• Visualization can help to grasp the key patterns and results
The Right Platform Can Meet All Of These Requirements
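Of the requirements above, the Bloom filter is the easiest to show in a few lines. The sketch below covers the set-membership case only. The class name, the default sizes, and the use of salted SHA-256 as the hash family are all assumptions chosen for readability, not a tuned implementation.

```python
import hashlib


class BloomFilter:
    """Probabilistic set: no false negatives, a small chance of false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # One byte per bit keeps the sketch readable; a real filter would pack bits.
        self.bits = bytearray(num_bits)

    def _positions(self, item):
        # Derive num_hashes positions by salting one hash function with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item):
        return all(self.bits[pos] for pos in self._positions(item))
```

A lookup that returns `False` is definitive, so the filter can cheaply rule out most keys before an expensive read against the full data set.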
Additional Tools: With a Platform for Big Data, We Can Expand Our Analysis
with Rich Analytics Tools
Key Big Data Analytics Solution Patterns
1. Predictive Modeling
2. Data Visualization
3. Cluster Partitioning
4. Outlier Analysis
5. AB Testing
6. Markov Chains
Source: http://www.cognizant.com/InsightsCognizantiarticles/Cognizanti_Sow'sEar_Analytics.pdf
These Patterns Provide a Straightforward Way to Find Big Data Wins – Here’s How
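As one illustration of the patterns listed, a first-order Markov chain can be estimated by counting which state follows which in an observed sequence. This is a minimal sketch, not a production model; the function name and the sample states are hypothetical.

```python
from collections import Counter, defaultdict


def transition_probabilities(sequence):
    """Estimate first-order Markov transition probabilities from one observed sequence."""
    counts = defaultdict(Counter)
    # Count each (current state, next state) pair in the sequence.
    for current, nxt in zip(sequence, sequence[1:]):
        counts[current][nxt] += 1
    # Normalize the counts for each state into probabilities.
    return {
        state: {nxt: n / sum(followers.values()) for nxt, n in followers.items()}
        for state, followers in counts.items()
    }
```

Given a clickstream such as `["browse", "cart", "browse", "cart", "buy"]`, the result says how likely each next step is from each state.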
13
Big Data And Classic Analysis Patterns Are Creating A New Class Of Enterprise
Applications
Data Sources
Data Processing
Data Presentation
Public Data Sets on AWS
Google Chart Tools
These Offerings Emerged In The Consumer Domain And
Enterprise Users Are Coming To Have Similar Expectations
But New Applications Will Remain Just Curiosities, “One-Offs” Unless The
Underlying Patterns Are Drawn Out
• There’s Nothing New Here: Hadoop is Turing-complete, as are most general-purpose processing and analytics packages
• To provide richer insights, tools like Hadoop need more advanced processing patterns:
Basic Patterns
Filtering | Parsing | Counting/Summing | Collating | Sorting | Distributed Tasks | Chained Jobs
Advanced Patterns
Distinct | Group By | Secondary Sorts | Joins | Distributed Sorting
Leading-Edge Work
Classification | Clustering | Regression | Dimension Reduction | Evolutionary Code
To See More Advanced Patterns and Richer Presentation, The Basic
Patterns Must First Become Routine
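To show what one of the advanced patterns involves, here is a single-process sketch of a reduce-side join, one common way to implement "Joins" in a MapReduce setting. The record layouts and the function name are assumptions made for this example; a real job would spread the map, shuffle and reduce steps across a cluster.

```python
from collections import defaultdict


def reduce_side_join(customers, orders):
    """Join two record sets on customer id, MapReduce style.

    customers: iterable of (customer_id, name)
    orders: iterable of (customer_id, amount)
    """
    # Map: tag each record with its source and emit it under the join key.
    tagged = [(cid, ("C", name)) for cid, name in customers]
    tagged += [(cid, ("O", amount)) for cid, amount in orders]

    # Shuffle: group all tagged records by key.
    groups = defaultdict(list)
    for key, record in tagged:
        groups[key].append(record)

    # Reduce: within each key, pair every customer record with every order record.
    joined = []
    for key in sorted(groups):
        names = [value for tag, value in groups[key] if tag == "C"]
        amounts = [value for tag, value in groups[key] if tag == "O"]
        joined.extend((key, name, amount) for name in names for amount in amounts)
    return joined
```

Keys that appear in only one input produce no output, so this sketch behaves like an inner join.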
Software Will Capture the Value of Intellectual Property
• Pure services companies generally yield a company valuation of 0.5 to 1.0x Annual Revenue
• Recurring revenue businesses (hosting, support) typically generate 2.5 – 4.0x Revenue
• Product businesses derive their multiples from growth, product margin, network effects, customer lock-in, and ecosystem effects – with a good product, valuations of > 5x Revenue are possible
2012 Internet Company Valuations as %Revenue
http://abovethecrowd.com/wp-content/uploads/2011/05/pr_mults.png
Capturing Trends – Where Is the IT Industry Headed?
IT Product Breakthroughs Happen When Technology Advances Invalidate “Old”
Product Assumptions. Here Are The Principal Areas Where Old Assumptions
Will Be Obsoleted.
• 5 major trends
– Big Data: Big Data Just Beginning to Explode
– Cloud: Cloud Computing Market Size – Facts and Trends
– In-Memory: The Coming In-Memory Database Tipping Point
– Handheld: Five Emerging Trends in Analytics
– Real-time: Using Analytics to Create a Sense-and-Respond Organization
Capturing Trends – Why Bother? Who Cares?
• Big Data:
– According to Michael Stonebraker and Jeremy Kepner the future of Hadoop is doomed
– According to Mike Miller of Cloudant the days are numbered for Hadoop as we know it
• Cloud:
– Even PCI and HIPAA data is evolving into cloud-hosted models
• In-Memory:
– Spinning disk is "the new tape" (overflow, recovery)
• Handheld:
– Mobile Internet devices will outnumber humans this year, Cisco predicts
• Real-time:
– Future of computing technology belongs to handheld devices
“You can’t just ask customers what they want and then try to give that to them. By the time you get it built,
they’ll want something new. It took us three years to build the NeXT computer. If we’d given customers what
they said they wanted, we’d have built a computer they’d have been happy with a year after we spoke to
them — not something they’d want now.”
~ Steve Jobs
The Cloud Provides a Platform For Do It Yourself Analytics
• Why the cloud matters
– Analytics cannot be “do it yourself” until everyone has access to a platform suitable for holding and processing Big Data.
– Only the cloud has the scale, speed, and availability to process Big Data universally
• What it gives us that is unique and differentiating
– Big Data projects today are 1) expensive, 2) long lead-time, and 3) run on masses of local hardware. With inevitable commoditization this has to change.
– The trend is to “do it yourself” analytics – if we build the ability to give do it yourself analytics, applications will appear that were inconceivable before the environment was created
• What we need to make happen
– Robustness – at least 3-nines of availability and zero data loss
– Security – starting with things like 5 Ways Amazon Web Services Protects Cloud Data
– Privacy – where it begins: Complying to the Higher Standard
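The robustness target of 3-nines of availability translates into a concrete yearly downtime budget. The sketch below works through that arithmetic, assuming a 365-day year; the constant and function name are illustrative only.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def allowed_downtime_hours(nines):
    """Yearly downtime budget for an availability of `nines` nines (3 means 99.9%)."""
    unavailability = 10 ** -nines  # e.g. 0.001 for three nines
    return HOURS_PER_YEAR * unavailability
```

For three nines the budget is 8760 × 0.001, or roughly 8.76 hours of downtime per year; each additional nine cuts that budget by a factor of ten.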
Handhelds Make Analytics Available Everywhere
• Why handheld client delivery matters
– There are now more smartphones than client PCs
– More than 25% of users use smartphones for their primary web access
– The future of internet computing is mobile
• What it gives us that is unique and differentiating
– Hadoop is dreadfully mismatched with handheld access (batch, no standard client or reporting interface)
– Coming in-memory databases (HANA, Vertica, VoltDB) will provide a much-better mesh with handheld
• What we need to make happen
– Make handheld our primary target UI (design for thumbs, not mice … and more)
– Target do-it-yourself analytics use cases
Real-time Makes Previously Unthinkable Apps Possible
• Why real-time matters
– Users increasingly expect real-time analytics
– The first wave of real-time analytics tools is becoming available
• What it gives us that is unique and differentiating
– "Self-service" analytics
– Intuitive and unconstrained data exploration
– Instant visualization of complex datasets
– Viable plays for a variety of asset types
  • Credit card debt, Student loan debt, Properties, Insurance, etc.
• What we need to make happen
– If Hadoop – we must evolve to interactive batch execution (or overnight batch, like Progressive Insurance)
– If In-memory DB – need to select and groom a handheld interface and design for sub-100ms response times
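A sub-100ms design target is only useful if it is measured. Here is a minimal sketch of one way to check a query against that budget, using a nearest-rank percentile over repeated timings. The helper names, the 95th-percentile choice and the run count are assumptions for illustration, not figures from the deck.

```python
import math
import time

BUDGET_MS = 100.0  # the sub-100ms response-time target


def measure_latencies_ms(query, runs=50):
    """Time `query()` repeatedly and return the sorted latencies in milliseconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        query()
        samples.append((time.perf_counter() - start) * 1000.0)
    return sorted(samples)


def percentile(sorted_samples, pct):
    """Nearest-rank percentile of an already sorted, non-empty list."""
    rank = max(1, math.ceil(len(sorted_samples) * pct / 100))
    return sorted_samples[rank - 1]


def within_budget(query, pct=95, budget_ms=BUDGET_MS):
    """True if the chosen percentile of measured latencies fits the budget."""
    return percentile(measure_latencies_ms(query), pct) <= budget_ms
```

Checking a high percentile rather than the average keeps occasional slow responses from hiding behind many fast ones.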
Beyond Big Data – The Emerging Big Data Tech Platform
Here’s Where Our World Is Headed
[Figure: the platform’s evolution toward “Today” and “Tomorrow”, answering the questions: For what? By whom? What? With what? Stored where? Processed where? How? When?]
[Labels shown in the figure: Lumpenprogramming; Report Specialists; Data Scientists; Everyone; Reports; Hindsight; Foresight; What Happened?; Why Did That Happen?; What’s Next?; Data Warehouses; Big Data; DIY Analytics; RDBMS; Distributed DWs; In-Memory RDBMS; On-Premise; Cloud; Structured Data; Big Data; Universal Data; Batch; Hadoop Batch; Always]
The Future: Here’s What The Evolution Looks Like
Trend: Big Data
Development Initiatives:
• APIs. No one is likely to reach a market with Big Data analytics fronted by their own UI. Success will come from API links to
• Level 1: REST Access API
• Level 2: Plug-in API
• Level 3: Runtime environment
Who’s Doing It:
• Open territory! Infochimps has Level 1, Amazon (Elastic Mapreduce) has levels 2 and 3. Who else will play???
Trend: Cloud
Development Initiatives:
• All of the Cloud players are investigating DB-rich offerings
• VoltDB options with AWS High IO option
• “38% of all companies are planning a BI SaaS project before the end of 2013.”
Who’s Doing It:
• Everybody: Amazon, Rackspace, Heroku ... Accenture
Trend: In-Memory
Development Initiatives:
• Move demo to DAHANA architecture (not hand-coded)
• Select non-HANA in-memory DB (probably VoltDB) as secondary platform
• Hadoop evolves from a processing platform to an ETL gateway from unstructured to structured data
Who’s Doing It:
• SAP / Hana
• HP / Vertica
• other NewSQL players
Trend: Handheld
Development Initiatives:
• Evolving UIs with HTML5 + JQuery Mobile
• Reporting platforms increasingly offer mobile interfaces
• Review Big Data interfaces to IPad and Android devices
Who’s Doing It:
• Two principal camps -- Apple IOS and Android
Trend: Real-Time
Development Initiatives:
• Investigate CDN options for Big Data deployment
• Confirm DB performance on buffer pool, locking, latching, recovery
• Design for sub-100ms delivery
Who’s Doing It:
• Just getting started...
Today’s Tools: The Killer Apps
Today’s Killer Apps: Recommendation Engine For Enhanced Retail Marketing
• Vision:
– Target Audience: Product Executives
– Anticipated Benefit: Keep up with
market leader Amazon, build up-sell and
cross-sell revenue
– Delivered Benefit: Better market
segmentation, enhanced revenue through
“customers who bought xxx also bought...”
recommendations.
– Alternatives: CRM recommendations do
not draw on deep sense of customer intent
– Why It Kills: Provable revenue growth
through A-B testing
• How to Implement It:
– Proof of Concept: Small cloud-based recognition
engine, based on readily-available (customer profile,
purchase history) data stores
– Initial Rollout: Still cloud-based, but with broader
streams (e.g. search histories) and dynamic updates
– Test and Customer Acceptance: Pilot program with
configuration from the Initial Rollout, but now tied (on a
limited basis) into retailing process and systems
– Full Rollout: Could be cloud or in-house, but moving to
richer streams and real-time (i.e. in-memory) data access
– Maintenance: Tools updates, streams updates,
transition to real-time data access
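A proof of concept for “customers who bought xxx also bought...” can start from simple co-purchase counts over purchase history. The sketch below is one such starting point under stated assumptions: the basket format and both function names are hypothetical, and a real engine would add weighting, recency and scale-out.

```python
from collections import Counter, defaultdict
from itertools import permutations


def build_co_purchases(baskets):
    """Count, for each item, how often every other item appears in the same basket."""
    co_counts = defaultdict(Counter)
    for basket in baskets:
        # set() ignores duplicate items within one basket.
        for item, other in permutations(set(basket), 2):
            co_counts[item][other] += 1
    return co_counts


def also_bought(co_counts, item, top_n=3):
    """Return up to top_n items most often bought together with `item`."""
    # Sort by descending count, then by name so ties are deterministic.
    ranked = sorted(co_counts[item].items(), key=lambda kv: (-kv[1], kv[0]))
    return [other for other, _ in ranked[:top_n]]
```

The counting step is a natural fit for a batch job over the full purchase history, while `also_bought` is a cheap lookup that can serve recommendations at request time.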
Today’s Tools: The Killer Apps
Today’s Killer Apps: Analysis and Prediction Engine
• Vision:
– Target Audience: High end retailers with profitable service contracts (e.g. computers, cameras, sound systems)
– Anticipated Benefit: Increase penetration rate of service contracts by pre-calculating terms in advance of sale or service renewal
– Delivered Benefit: Reward customer with historically low service costs, and increase penetration of profitable service deals by pre-calculation of ideal rates
– Alternatives: Consumers generally know one-size-fits-all service contracts are overpriced. If you can’t fit the terms to the customer then you can’t complete the service contract
– Why It Kills: Big data approach pre-calculates appropriate terms for all customers in advance of a sales or renewal transaction
• How to Implement It:
– Proof of Concept: Small cloud-based run with limited data sets to confirm data adoption approaches and identify most profitable segments in that sub-population
– Initial Rollout: Still cloud-based, but with larger data sets and dynamic updates
– Test and Customer Acceptance: Pilot program with configuration from the Initial Rollout, but now tied (on a limited basis) into promotions and target marketing
– Full Rollout: Could be cloud or in-house, but moving to larger data stores, real-time (i.e. in-memory) data access and notifications across the full customer set
– Maintenance: Tools updates, stores updates, transition to real-time data access and notifications
Today’s Tools: The Killer Apps
Today’s Killer Apps: Log Analysis Engine
• Vision:
– Target Audience: Utilities executives
– Anticipated Benefit: Sell an energy or utilities package that better fits customer interests and reduces customer costs while increasing energy/utility margins
– Delivered Benefit: Customer gets a package that better fits their specific interests (e.g. “green”) and exec sells higher-margin offerings
– Alternatives: One size plan fits all does not capture customer interests or deliver high-margin offerings well
– Why It Kills: More customized packages better fit customer needs while reducing capital expenses and increasing margins for the utility
• How to Implement It:
– Proof of Concept: Small cloud-based run with limited data sets to capture basic patterns and confirm data adoption approaches
– Initial Rollout: Still cloud-based, but with larger data stores and dynamic updates
– Test and Customer Acceptance: Pilot program with configuration from the Initial Rollout, but now tied (on a limited basis) into production logs with reporting
– Full Rollout: Could be cloud or in-house, but moving to larger data stores, real-time (i.e. in-memory) data access and notifications
– Maintenance: Tools updates, stores updates, transition to real-time data access and notifications
Summary
This Is Only The Beginning. With A Standard Platform We’ll See Richer Big Data Discoveries Become Routine
The Solution Tools (Slide 13) Become Straightforward if We Run Them on a Standard Architecture
“One man’s noise is another man’s data.”
~ Bill Stensrud - InstantEncore
Contacts
•
John Repko:
[email protected] - (720) 624-6025
https://pikasoft.s3.amazonaws.com/Using_Big_Data_To_Your_Advantage.ppt