Technology Strategies for Big Data Analytics


TECHNOLOGY STRATEGIES
FOR BIG DATA ANALYTICS
BERNARD BLAIS
PRINCIPAL, GLOBAL TECHNOLOGY PRACTICE
Copyright © 2012, SAS Institute Inc. All rights reserved.
THE CHALLENGE?
DATA SIZE: VOLUME, VARIETY, VELOCITY
TODAY vs. THE FUTURE
What is Big Data?
Technology Checklist for Big Data Analytics
• A flexible architecture that supports many data types and usage patterns
• Upstream use of analytics to optimize data relevance
• Real-time visualization and advanced analytics to accelerate understanding and action
• Collaborative approaches to align Business and IT executives
HIGH-PERFORMANCE ANALYTICS
KEY COMPONENTS
HIGH-PERFORMANCE ANALYTICS
SAS® GRID COMPUTING
HIGH-PERFORMANCE ANALYTICS
SAS® IN-DATABASE
HOW DO WE MANAGE DATA IN THE PHYSICAL WORLD?
1. Acquire
2. Determine Relevance
3. Store
Diagram labels: Trash, Cache, Storage
HOW DO WE MANAGE INFORMATION IN THE IT WORLD?
Relevance is traditionally determined at query time . . .
“Acquire, Store, Analyze”
Diagram labels: Users, Systems, Queries, Data Acquisition, Data Transformations, Data Normalization, DATA
A Big Data Analytics strategy
requires a new approach . . .
“Stream it, Score it, Store it”
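As a rough illustration of “Stream it, Score it, Store it”, the sketch below scores each record as it arrives and stores only those that pass a relevance threshold, so relevance is decided upstream rather than at query time. The record layout, the `toy_score` rule, and the threshold are hypothetical and stand in for whatever scoring model a real deployment would use.

```python
from typing import Callable, Iterable, Iterator

# Hypothetical record type: a dict with a numeric "amount" field.
Record = dict


def score_stream(
    records: Iterable[Record],
    score: Callable[[Record], float],
) -> Iterator[tuple[Record, float]]:
    """Score each record as it arrives, before anything is stored."""
    for record in records:
        yield record, score(record)


def store_relevant(
    scored: Iterable[tuple[Record, float]],
    threshold: float,
) -> list[Record]:
    """Keep only records whose relevance score meets the threshold."""
    kept = []
    for record, relevance in scored:
        if relevance >= threshold:
            kept.append({**record, "score": relevance})
    return kept


def toy_score(record: Record) -> float:
    # Illustrative rule only: larger amounts count as more relevant.
    return min(record["amount"] / 1000.0, 1.0)


incoming = iter([
    {"id": 1, "amount": 50},
    {"id": 2, "amount": 900},
    {"id": 3, "amount": 5000},
])
stored = store_relevant(score_stream(incoming, toy_score), threshold=0.5)
print([r["id"] for r in stored])  # [2, 3]
```

Because `score_stream` is a generator, records are processed one at a time and the low-relevance record never reaches storage.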
CUSTOMER CASE STUDY
HIGH-PERFORMANCE ANALYTICS PROCESS
Past Approach
• Daily process begins with flat file creation at 6:30am; SLA delivered at ~9:30am.
• File transferred to SQL Server, limited to ~350K customer records based on specific criteria.
• 300 step process to support data mining life cycle.
30 MINUTES TO SCORE ~350K customers
In-Database Approach
• Daily process begins at 4:00am with EDW load.
• All operational data loaded directly to EDW. No flat file or intermediate processing is needed.
• 10 step process
• Scoring and customer selection done in-database against ALL customer rows
4 MINUTES TO SCORE ~40M customers
Business Value
- Scope of customer analysis: 350K vs. 40M
- Monthly collections: $1M-$3M per month
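A minimal sketch of what scoring and selection in-database can look like, assuming a simple linear model. Python's built-in `sqlite3` stands in for the EDW here; the table, columns, coefficients, and threshold are all hypothetical, and this is not the SAS in-database interface. The point is only that the model is expressed as SQL, so every row is scored where it lives and only the selected customers leave the database.

```python
import sqlite3

# Hypothetical linear model: score = b0 + b1 * balance + b2 * days_late.
COEFFS = {"b0": -1.0, "b_balance": 0.002, "b_days_late": 0.05}
THRESHOLD = 0.0  # hypothetical cut-off for customer selection

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers ("
    "id INTEGER PRIMARY KEY, balance REAL, days_late INTEGER)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, 100.0, 0), (2, 400.0, 10), (3, 50.0, 30), (4, 900.0, 2)],
)

# Scoring and selection both happen inside the database engine,
# against all rows; no flat file or extract is created.
query = """
    SELECT id,
           :b0 + :b_balance * balance + :b_days_late * days_late AS score
    FROM customers
    WHERE :b0 + :b_balance * balance + :b_days_late * days_late > :threshold
    ORDER BY score DESC
"""
selected = conn.execute(query, {**COEFFS, "threshold": THRESHOLD}).fetchall()
print([row[0] for row in selected])  # [4, 3, 2]
```

The contrast with the past approach is that nothing is exported before scoring, which is why the full customer table can be in scope rather than a pre-filtered subset.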
12
minutes
HIGH-PERFORMANCE ANALYTICS
SAS® IN-MEMORY ANALYTICS
IN-MEMORY ARCHITECTURE
MODEL DEVELOPMENT & DEPLOYMENT
A PUBLIC SECTOR EXAMPLE…
• Source: NHTSA (USA’s National Highway Traffic Safety Administration)
  - Publicly available information on driving safety and vehicle safety in the USA
• Our Data Extract: 700,000 accident records on:
  - Vehicles: makes and models, manufacturing date, purchase date, failures, mileage, number of cylinders, etc. for each individual vehicle
  - Car components (385): air bags, child seats, electrical system, engine, fuel system, etc.
  - Accidents: vehicle speed, injuries, deaths
  - 45,000+ cities and locations
WHAT THE DATA LOOKS LIKE…
IN-MEMORY
ARCHITECTURE
MODEL DEVELOPMENT & DEPLOYMENT
5½ HRS to 82 SECONDS
IN-MEMORY
ARCHITECTURE
EXPLORATION AND VISUALIZATION
> 1.1 BILLION RECORDS
10 SECONDS
IN-MEMORY
ARCHITECTURE
REPORTING AND MOBILE DISTRIBUTION
CUSTOMER CASE STUDY
TRADITIONAL ANALYTICS PROCESS
DATA EXPLORATION, MODEL DEVELOPMENT, MODEL DEPLOYMENT: 167 Hours
CUSTOMER CASE STUDY
IN-MEMORY ANALYTICS PROCESS
DATA EXPLORATION, MODEL DEVELOPMENT, MODEL DEPLOYMENT: from 167 Hours to 84 SECONDS
Bottom-line Impact: Tens of Millions of Dollars
BEST
PRACTICE
Business Analytics Maturity Assessment
Overview:
Two-day on-site discovery session focused on understanding the client’s business and IT
objectives, key initiatives, existing information management and analytics architecture, top
challenges, and priorities.
Process:
• Review current business requirements, timeframes, critical success factors, and key
business metrics (e.g. customer retention, customer acquisition).
• Review operational data sources to support business priorities.
• Review analytical priorities, strategy, process, and gaps.
Deliverables:
• Technology roadmap to optimize the client’s current and future IT-enabled analytical
process.
• Projected high-level ROI analysis resulting from proposed analytical architecture and
process improvements.
SAS
PROVEN VALUE PROPOSITION ACROSS MULTIPLE INDUSTRIES
INDUSTRY: FINANCIAL SERVICES
USE CASE: Risk Management
VALUE:
• 356X faster risk calculations
• Faster in/out markets
INDUSTRY: PUBLIC SECTOR
USE CASE: Revenue Leakage
VALUE:
• Better able to audit
• Detect issues pre-refund
INDUSTRY: TELCO
USE CASE: Campaign Optimization
VALUE:
• 15% better campaign response rates
INDUSTRY: RETAIL
USE CASE: Inventory Management
VALUE:
• Markdown optimization: from 30 hours to 2 hours
• More precise than competition
INDUSTRY: SERVICES
USE CASE: Promotions Management
VALUE:
• Coupon redemption rate +15%
RETAIL USE CASE
In-database Model Scoring
4½ HRS to 60 SECONDS
Overview:
• The largest customer behavior marketing company in the world, Catalina Marketing analyzes and predicts shoppers’ buying behaviors to generate customized point-of-sale color coupons, advertisements and informational messages for retail stores and pharmacies nationwide.
Process and Deliverables:
• Leveraging in-database scoring, Catalina automated the execution of scoring models against its entire database of 140 million consumers.
Impact:
• Catalina Marketing has reduced its model-scoring times from 4.5 hours to around 60 seconds using SAS Scoring Accelerator. As a result, it is able to use more complex, varied models to obtain analytical results faster for more efficient, reliable decisions, improving brand performance on behalf of its food, drug, and mass advertising and marketing partners.
• Implementation of marketing campaigns in days vs. more than 1 month before.
FINANCIAL SERVICES USE CASE
Credit Risk on Banking Data
3 MINUTES
Overview:
Data Source: Bank loan portfolio covering:
• 3 million loans
• 5,000 stress scenarios
• 40 time horizons
• Transition matrix approach
Process and Deliverables:
• Estimates of credit losses under stress over multiple horizons.
• Completed compute time: under 3 minutes.
Impact:
• Fast estimates of credit losses under stress over multiple horizons enable the bank to make changes to lending practices throughout the day.
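The slide names a transition matrix approach without giving details, so the following is only a toy sketch of the general idea, not the bank's or SAS's actual method. It assumes three rating states (Good, Watch, Default), made-up one-period transition probabilities for a baseline and a stressed scenario, and an assumed exposure and loss-given-default. Repeatedly applying the matrix gives the cumulative default probability at each horizon, from which an expected loss follows.

```python
def step(dist, matrix):
    """Advance a distribution over rating states by one period."""
    n = len(matrix)
    return [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]


def cumulative_default(matrix, start_state, horizons, default_state):
    """Cumulative default probability after each of the first `horizons` periods."""
    dist = [0.0] * len(matrix)
    dist[start_state] = 1.0
    out = []
    for _ in range(horizons):
        dist = step(dist, matrix)
        out.append(dist[default_state])
    return out


# Hypothetical one-period matrices. Rows are "from" states in the order
# Good, Watch, Default; each row sums to 1 and Default is absorbing.
BASELINE = [
    [0.90, 0.08, 0.02],
    [0.10, 0.80, 0.10],
    [0.00, 0.00, 1.00],
]
STRESSED = [
    [0.80, 0.15, 0.05],
    [0.05, 0.75, 0.20],
    [0.00, 0.00, 1.00],
]

EXPOSURE = 1_000_000.0  # assumed exposure at default for one loan
LGD = 0.4               # assumed loss given default

for name, matrix in (("baseline", BASELINE), ("stressed", STRESSED)):
    probs = cumulative_default(matrix, start_state=0, horizons=3, default_state=2)
    losses = [round(EXPOSURE * LGD * p, 2) for p in probs]
    print(name, losses)
```

In the use case the same kind of calculation would be repeated across 3 million loans, 5,000 scenarios, and 40 horizons, which is what makes the compute time the interesting figure.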
PUBLIC SECTOR USE CASE
Text Mining on Unstructured Data
5½ HRS to 82 SECONDS
Overview:
• USA’s National Highway Traffic Safety Administration
• 700,000 accident reports on vehicle makes and models, manufacturing date, purchase date, failures, mileage, number of cylinders, etc.; car components; accident information; etc.
Process and Deliverables:
• Text mining on accident reports. Analyze, understand, validate and predict contents.
• Report on content categorization. The text mining process runs in 1 minute 22 seconds on a High Performance Analytics Server, instead of 5½ hours on a regular server.
Impact:
• A 99% time improvement means the whole process can now be considered an ITERATIVE, DYNAMIC process.
• An analyst can run it 20 times before lunch, each time fine-tuning the model and improving the output, instead of maybe twice during the whole week.
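The time figures quoted for the text mining run (5½ hours versus 1 minute 22 seconds) can be checked with a few lines of arithmetic. This sketch only restates the numbers from the slide; the function names are illustrative.

```python
def time_reduction(before_s: float, after_s: float) -> float:
    """Fraction of the original run time that was eliminated."""
    return 1.0 - after_s / before_s


def speedup(before_s: float, after_s: float) -> float:
    """How many times faster the new run is."""
    return before_s / after_s


before = 5.5 * 3600  # 5½ hours, in seconds
after = 60 + 22      # 1 minute 22 seconds, in seconds

print(f"{time_reduction(before, after):.1%}")  # 99.6%
print(f"{speedup(before, after):.0f}x")        # 241x

# Twenty back-to-back runs, as in "20 times before lunch".
print(f"{20 * after / 60:.1f} minutes")        # 27.3 minutes
```

Twenty iterations at the new speed take under half an hour of compute, which is consistent with the claim that the process becomes iterative.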
UTILITIES USE CASE
Forecasting On Smart Meter Data
12 records to 30,000 records
Overview:
• Oklahoma Gas & Electric Company (OG&E) serves nearly 800,000 customers in Oklahoma and western Arkansas. It was named the 2011 Utility of the Year.
• Forecast energy demand with SAS Analytics, plan for future changes to its energy portfolio and optimize programs that encourage wiser use of energy.
Process and Deliverables:
• Use smart meter data coming from customers every 15 minutes (versus once a month) to create and measure the effectiveness of programs that reduce energy consumption.
Impact:
• What previously took one to three days can now be done in a matter of hours.
• We've gone from receiving 12 records per year for each customer to over 30,000 records per year.
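The jump from 12 to over 30,000 records per customer per year follows directly from the reading interval. The sketch below works it out, assuming a 365-day year; the portfolio-wide total additionally assumes that all of the roughly 800,000 customers have a smart meter, which the slide does not state.

```python
# Source figures: one reading every 15 minutes versus one reading per month.
INTERVAL_MINUTES = 15
MINUTES_PER_DAY = 24 * 60
DAYS_PER_YEAR = 365  # assumption: non-leap year

monthly_records_per_year = 12
interval_records_per_year = (MINUTES_PER_DAY // INTERVAL_MINUTES) * DAYS_PER_YEAR

growth = interval_records_per_year / monthly_records_per_year

# Assumption: every one of the "nearly 800,000" customers has a smart meter.
CUSTOMERS = 800_000
total_per_year = CUSTOMERS * interval_records_per_year

print(interval_records_per_year)  # 35040
print(growth)                     # 2920.0
print(total_per_year)             # 28032000000
```

Under these assumptions each customer produces about 35,000 readings a year, in line with the "over 30,000" quoted on the slide.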
CONCLUSION
What High Performance Analytics Really Means
• It’s not just about incredible speed, it’s also about:
  - Confidence: No more sampling, subsetting, summarizing
  - Accuracy: More complex models, more variables
  - Efficiency: Leverage the Analytical Brain on valuable tasks
  - Agility: Adapt and (re)Act faster
Technology Checklist for Big Data Analytics
• A flexible architecture that supports many data types and usage patterns
• Upstream use of analytics to optimize data relevance
• Real-time visualization and advanced analytics to accelerate understanding and action
• Collaborative approaches to align Business and IT executives