Predictive Analytics - Individual CMG Regions and SIGs
DB2 User Group Meeting
Big Data, Business Analytics
and System z
Shantan Kethireddy
[email protected]
DB2 System z FTSS
Analytics-driven Organizations Can…
Identify Risk
…and immediately control it
Increase system capacity and
availability while keeping IT costs flat
Insights into overlapping policies from
multiple insurance companies
Getting their reports as much as 70
percent faster
There is an Explosion in Data and Real World Events
2 Billion
Internet
users by 2011
1.3 Billion RFID tags in 2005
30 Billion RFID
tags by 2010
Capital market
data volumes grew
1,750%, 2003-06
World Data Centre for Climate
220 Terabytes of Web data
9 Petabytes of additional data
4.6 Billion
Mobile Phones
World Wide
Twitter processes
7 terabytes of
data every day
Facebook processes
10 terabytes of
data every day
eXtreme Analytics
High Volume Data Arriving From Many Sources
Auto Correlation and Cross Correlation Across Sources
Add social networking (reduce the size of online transaction systems)
Search
Online Transaction
Processing
System
Embedded
Analytics
Clickstream, CRM
Dashboards
Claim data (text, picture, video)
Location Tracking (GPS),
iPhone, Vehicle Use Data,
$ Trans tracking (Across borders
& IP providers),
Billions of
mobile devices
Continuous arrival of high volume information (evolving, highly variant) (struct-/semi-/unstructured)
Auto/Cross Correlation
Financial
Planning
Analytics,
Census Bureau Data
Predictive Analytics
Search
Market Data, Weather Data
Web buzz data about products/companies (for reputation analysis)
Scorecards
Feeds:
Sensors data
Mash ups
100’s TBs/
PetaBytes
Deep & Wide
Analytics
Fine grained –
individual product
and customer at
a time and place
IBM’s Big Data Platform Vision
Bringing Big Data to the Enterprise
Client and Partner Solutions
IBM Big Data Solutions
Data
Warehouse
Warehouse
Appliances
Big Data User Environments
Developers
End Users
Netezza
Administrators
Master Data
Mgmt
InfoSphere MDM
INTEGRATION
AGENTS
Big Data Enterprise Engines
Database
DB2
Content
Analytics
ECM
Internet Scale Analytics
Open Source Foundational Components
Hadoop
HBase
Pig
Lucene
Jaql
Information Server
Streaming Analytics
Business
Analytics
Cognos & SPSS
Marketing
Unica
Data Growth
Management
InfoSphere Optim
Insight Analytics
Internet-scale Analytics
Based upon Apache Hadoop and Open Source
– Apache Hadoop (HDFS, MapReduce), Jaql (programming query language), Pig,
Flume, Hive, Lucene (text search), ZooKeeper (process coordination), Avro (data
serialization), HBase (real time read/write), Oozie
Unique Features:
– Improving upon open source – enterprise-scale indexing
– Complex Analytics - Text analytics
– Enterprise Scale - Enterprise class storage and security
– Workflow - orchestration and prioritization
– End-user environments - Enterprise Console – data explorer
Organizations use BigInsights for processing an extreme variety and volume of
data – ranging from weather predictions to social media and multi-channel
customer pattern analysis to IT multi-system log analysis.
Internet-Scale Analytics in Action
Financial Services
Improved risk decisions
Customer sentiment analysis
AML
Transportation
Weather and traffic
impact on logistics and
fuel consumption
Call Centers
Voice-to-text mining for
customer behavior
understanding
Telecommunications
Operations and failure
analysis from device, sensor,
and GPS inputs
Utilities
Weather impact analysis on
power generation
Smart meter data analysis
IT
Transaction log analysis
for multiple
transactional systems
E Commerce
Analyze internet behavior
and buying patterns
Digital Asset Piracy
Multi-channel integration
Integrated customer behavior
modeling
Streaming Analytics
Unique Features:
– Complex Analytics - Analysis of structured and
unstructured (video, audio, geo-spatial, and other
non-relational data) streams
• Mining Toolkit to score data models in real time
against streaming data
– Fast - Clustered runtime for high-performance,
extremely low-latency streaming applications
– Enterprise-scale - High availability via runtime
restart and recovery services
– Scalable
Organizations use streaming analytics for extreme velocity and variety in
various applications – ranging from real-time traffic analysis to predicting
stock fluctuations depending on weather.
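The idea of scoring a model against data as it streams in can be sketched in a few lines. This is only an illustration under stated assumptions: `score_stream`, its `window` and `threshold` parameters, and the readings are hypothetical, the "model" is a trivial sliding-window baseline standing in for a real mined model, and none of it is the API of any IBM streaming product.

```python
from collections import deque

def score_stream(readings, window=3, threshold=2.0):
    """Yield (value, is_anomaly) for each reading as it arrives.

    The stand-in model flags a reading when it differs from the mean of the
    previous `window` readings by more than `threshold`.
    """
    recent = deque(maxlen=window)  # only the last `window` readings are kept
    for value in readings:
        if len(recent) == window:
            baseline = sum(recent) / window
            yield value, abs(value - baseline) > threshold
        else:
            # Not enough history yet to score this reading.
            yield value, False
        recent.append(value)

# Made-up sensor readings, consumed one at a time as a stream would deliver them.
readings = [10, 10, 10, 15, 10]
flags = [is_anomaly for _, is_anomaly in score_stream(readings)]
```

Because the function is a generator, each reading is scored on arrival and only a fixed-size window is held in memory, which is the property that matters for unbounded streams.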
Streaming Analytics in Action
Stock market
Impact of weather on securities prices
Analyze market data at ultra-low latencies
Natural Systems
Wildfire management
Water management
Law Enforcement,
Defense & Cyber Security
Real-time multimodal surveillance
Situational awareness
Cyber security detection
Transportation
Intelligent traffic
management
Fraud prevention
Detecting multi-party fraud
Real time fraud prevention
Manufacturing
Process control for
microchip fabrication
e-Science
Space weather prediction
Detection of transient events
Synchrotron atomic research
Health & Life Sciences
Neonatal ICU monitoring
Epidemic early warning
system
Remote healthcare
monitoring
Other
Telephony
CDR processing
Social analysis
Churn prediction
Geomapping
Smart Grid
Text Analysis
Who’s Talking to Whom?
ERP for Commodities
FPGA Acceleration
Predictive Analytics:
The power of social media to forecast auto sales
Social media serves as a proxy of people’s opinions (past experiences
and current beliefs) about products or services
Social media influences people’s buying behavior and the future sales
Thus, social media is a powerful platform to predict the future
COBRA/Cognos Consumer Insight (CCI) harnesses this unstructured
information for predictive analytics
People’s opinions (experiences and beliefs) --reflected--> Social Media --influence--> People’s buying behavior and the future sales
Use social media as a predictive platform
COBRA (CCI) + SPSS modeler = Social Predictive
Analytics
Backend - Building the System
Social Media Online News
Persistence
Queries
Frontend - Discovery
List of Blogs/
Boards
Information
Extraction
Internal
Customer
Data
News
Feed / Wires
Targeted
Websites
Discovery
Analyzing
Analyzing Influencers
Taxonomies
Analyzing Sentiment
URL’s
COBRA (CCI) ingests
targeted social media contents
analyzing sentiment and taxonomies
SPSS modeler implements
social media-based
prediction models for sales
Significant correlations between social media
measures and auto sales
Social media measures (from COBRA):
– Monthly change of the sentiment (positive/neutral/negative) about a brand
– Monthly change of the frequency (# postings) of topic keywords about a brand
Sales measure (from the company):
– Monthly sales data
Observing significant correlations between the two
Sentiment change correlates with the auto
sales
Finding 1: When the sentiment measure increased (or decreased) from
the previous month, the auto sales tended to go up (or down).
Examples: Postings with positive or negative sentiment
“xyz is one of the best selling sedans in the US market.” (positive sentiment)
“It's hard to know....if the problem is widespread....xyz should fix it. I would probably never buy a new xyz. Today's xyz’s seem over priced, their salesmen act condescending, and well.... truthfully, I want an American car.” (negative sentiment)
“Hi, I have a 2005 xyz, my question is my rear drum brakes are not self adjusting. Take the car to the shop, have it adjusted and soon after it loosens again.” (negative sentiment)
Sentiment-based Correlation Analysis: We investigate the correlation between sales of a target car brand and the sentiment change on the car brand-related social media content.
Results: For the target car brand (Jan.2009 ~ Dec.2010, monthly sample data), there is a significant correlation between the auto sales and the sentiment change:
Pearson’s correlation coefficient = 0.418 (p < .05)
Note that the sentiment measure M(t) for a month t on the target car brand-related social media content is defined as:
M(t) = (P(t) – N(t)) / V(t)
where P(t) = number of postings with positive sentiment for t,
N(t) = number of postings with negative sentiment for t, and
V(t) = number of total postings for t
Sentiment change for a month = the change in the sentiment compared to the previous month
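The definitions above can be sketched in Python. This is a minimal illustration, not the COBRA or SPSS implementation: all counts and sales figures are made up, the helper names are hypothetical, and it assumes the sentiment change is correlated with the month-over-month change in sales (the slide does not say whether sales enter as levels or as changes).

```python
from math import sqrt

def sentiment_measure(positive, negative, total):
    """M(t) = (P(t) - N(t)) / V(t) for one month."""
    return (positive - negative) / total

def monthly_change(series):
    """Change of each month's value compared to the previous month."""
    return [curr - prev for prev, curr in zip(series, series[1:])]

def pearson(xs, ys):
    """Pearson's correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Made-up monthly counts: (positive postings, negative postings, total postings).
counts = [(40, 20, 100), (50, 20, 100), (30, 30, 100), (45, 15, 100), (35, 25, 100)]
# Made-up monthly unit sales for the same five months.
sales = [1000, 1100, 950, 1080, 1000]

m = [sentiment_measure(p, n, v) for p, n, v in counts]
sentiment_change = monthly_change(m)
# Assumption: correlate against the sales change rather than the sales level.
sales_change = monthly_change(sales)
r = pearson(sentiment_change, sales_change)
```

With 24 monthly samples, as on the slide, the change series would have 23 points; a significance test (the p-value) is a separate step not shown here.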
Predictive Analytics: the social media power to predict the auto sales
© Copyright IBM Corporation 2011
Keyword frequency change correlates with
the auto sales
Finding 2: When people mentioned terms such as “safety”, “brakes”, “solid” and “torque”
more (or less) compared to the previous month, xyz Sales tended
to go down (or up).
Keyword-based Correlation Analysis
xyz-related Social Media Content -> COBRA text clustering (automatically discovering topic keywords) -> Compute Pearson’s Correlation Coefficient against xyz Sales (Jan.2009 ~ Dec.2010, monthly sample data)
Topic keywords shown: “safety”, “brakes”, “solid”, “torque”, ...
Correlation coefficients shown: Corr = –0.525, Corr = –0.525, Corr = –0.578, Corr = –0.503 (each p < .05)
Keyword Frequency Change for each month = the change in the number of postings containing the keyword compared to the previous month
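The keyword frequency change can be sketched the same way. The postings, sales figures and helper names below are made up for illustration; the keyword match is a plain substring test, which is an assumption and far cruder than COBRA's text clustering, and sales are again assumed to enter as month-over-month changes.

```python
from math import sqrt

def keyword_counts(postings_by_month, keyword):
    """Number of postings per month that contain the (lowercase) keyword."""
    return [sum(keyword in text.lower() for text in postings)
            for postings in postings_by_month]

def monthly_change(series):
    """Change of each month's value compared to the previous month."""
    return [curr - prev for prev, curr in zip(series, series[1:])]

def pearson(xs, ys):
    """Pearson's correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Made-up postings, grouped by month.
postings_by_month = [
    ["The brakes feel soft", "Great mileage"],
    ["Brakes squeal", "Brakes recall?", "Love the seats"],
    ["Nice ride"],
    ["Brakes again", "Brakes issue", "Brakes noise"],
]
# Made-up monthly unit sales for the same four months.
sales = [1000, 960, 1050, 900]

counts = keyword_counts(postings_by_month, "brakes")
frequency_change = monthly_change(counts)
sales_change = monthly_change(sales)
r = pearson(frequency_change, sales_change)
```

In this toy data, months with more "brakes" postings than the month before are months where sales fell, so `r` comes out strongly negative, mirroring the direction of Finding 2.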
Machine Learning Example: Topic Detection and Evolution
What are people talking about in social media about a product?
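Figure: a documents × words matrix, shown as sparse (document, word, weight) entries such as (1, 1, 0.10), (1, 2, 0.30), (1, 3, 0.22), (1, 4, 1.24), is factorized through K topics into two matrices, W and H: one over documents × K topics and one over K topics × words.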
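The slide names the factor matrices W and H but not the algorithm. One common technique that fits this picture is non-negative matrix factorization (NMF); the sketch below assumes that reading, with W as documents × K topics and H as K topics × words, and uses the standard multiplicative updates for squared error. The data, function names and parameters are hypothetical.

```python
import random

def matmul(a, b):
    """Plain nested-list matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def nmf(v, k, iterations=200, seed=0, eps=1e-9):
    """Factorize non-negative v (documents x words) into w (documents x k)
    and h (k x words) with multiplicative updates for squared error."""
    rng = random.Random(seed)
    n_docs, n_words = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n_docs)]
    h = [[rng.random() + 0.1 for _ in range(n_words)] for _ in range(k)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[a][j] * num[a][j] / (den[a][j] + eps) for j in range(n_words)]
             for a in range(k)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][a] * num[i][a] / (den[i][a] + eps) for a in range(k)]
             for i in range(n_docs)]
    return w, h

def squared_error(v, w, h):
    approx = matmul(w, h)
    return sum((v[i][j] - approx[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))

# Made-up weights: documents 0-1 use words 0-1, documents 2-3 use words 2-3.
v = [
    [2.0, 1.0, 0.0, 0.0],
    [4.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 3.0],
    [0.0, 0.0, 2.0, 6.0],
]
w_early, h_early = nmf(v, k=2, iterations=1)
w, h = nmf(v, k=2, iterations=200)
err_early = squared_error(v, w_early, h_early)
err_late = squared_error(v, w, h)
# Index of the strongest word in each topic row of H.
top_word_per_topic = [max(range(len(row)), key=row.__getitem__) for row in h]
```

Each row of `h` can be read as a topic (a weighting over words) and each row of `w` as how strongly a document expresses each topic. Tracking how these weights shift between time slices is one way to follow topic evolution, though the slide does not say how that step is done.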
We have developed novel technologies, such as IMARS, to automatically
recognize semantic categories for diverse visual content
Traditional object tracking, face
detection, event composition
IMARS is built on foundation of large-scale semantics modeling and
generalized visual feature-based machine learning
Figure: example semantic categories, plotted against # categories (10’s, 100’s, 1K, 10K, 100K) on a scale from specialized to generalized
activities: fireworks, parade, earthquake, abandoned bag, flag burning, combat, shoplifting, launch, fire, flood, wreckage
scenes: bridge, mountains, waterfront, traffic, buildings, cityscape, street scene, monument
people: couple, glasses, face, team photo, person with baby, few, crowd
objects: helicopter, airplane, ferry, police car, vehicle, military ship, soldiers, humvee, bus, truck
DB2 Analytics Accelerator V3
Capitalizing on the best of both worlds – System z and Netezza
What is it?
The IBM DB2 Analytics Accelerator is a workload-optimized appliance add-on that
enables the integration of business insights into operational processes to drive
winning strategies. It accelerates select queries, delivering unprecedented
response times.
How is it different?
Performance: Unprecedented
response times to enable 'train of
thought' analyses frequently blocked by
poor query performance.
Integration: Deep integration with DB2
V9 and V10 provides transparency to all
applications.
Self-managed workloads: queries are
executed in the most efficient location
Transparency: applications connected
to DB2 are entirely unaware of the
Accelerator
Simplified administration: appliance
hands-free operations, eliminating most
database tuning tasks
Breakthrough Technology Enabling New Opportunities
DB2 Analytics Accelerator for z/OS
Netezza appliance connected to System z, accessible only through DB2
Blending System z and Netezza technologies to deliver unparalleled, mixed workload performance for complex analytic business needs.
What is the value?
• Fast, predictable response times for “right-time” analysis
• Accelerate analytic/ad hoc query response times
• Improve price/performance for analytic workloads. Quick ROI
• Ease of deployment
• Minimize the need to create data marts for performance
• Highly secure environment for sensitive data analysis
• Transparent to the application and user
OLTP vs. Analytics – Examples
OLTP (“Transactional”) | Transactional Analytics (Operational BA) | Deep Analytics
Withdrawal from a bank account using an ATM | Approve request to increase credit line based on credit history and customer profile | Regular reporting to central bank – sum of transactions by account
Buying a book at Amazon.com | Propose additional books based on similar purchases by other customers | Which books were bestsellers in Europe over the last 2 months?
Check-In for a flight at the airport | Offer an upgrade based on frequent flyer history of all passengers and available seats | Marketing campaign to sell more tickets in off-peak times
Hand-over manufactured printers to an overseas carrier | Optimize shipping by selecting cheapest and most reliable carrier on demand | Trend of printers sold in emerging countries versus established markets.
Business Intelligence Example
Predictive Analytics Example
Data Integration Example