Using Structured and Unstructured Data as part of an Analytical Process
Managing Future Requirements Now
Copyright © 2006, SAS Institute Inc. All rights reserved.
Agenda
- Why is this stuff important?
  • Trends in analytics and analytical data
- Why can’t I simply use the same approach I’m currently using?
  • Analytics’ unique characteristics
- So what’s the solution?
  • Strategies to plan for and manage information growth for analytics
Key Messages
- Analytics has been and will continue to be a competitive differentiator
- Structured and unstructured data volumes are rapidly increasing
- Traditional reporting-driven information management strategies are not always effective
- Data quality is paramount to effective analytics
- Execution times can be on the order of years, so you need to plan now to succeed in the future
Why is this stuff important?
We’re going through an information revolution …
- WWW: 170 Terabytes
- Emails: 35,000,000,000/day (400,000 Terabytes/yr)
- Telephone: 17.3 Exabytes/yr
Source: University of California, Berkeley
And our information sources keep growing …
- Customer information (ID columns)
- Purchases / Services
- Demographic, Financial Profiling
- Time Series
- Text-based customer interactions
- Non-text-based customer interactions
Our customer knowledge keeps increasing ...
[Figure: terabytes of data (0 to 100) plotted against time (1960 to 2010). Series: Customer Data Availability, Analytical Capacity, Execution Capacity. Annotated gaps: Knowledge Gap, Execution Gap.]
And we’re moving beyond reporting.
“Tedious data mining and static reports
have had their day. The new business
intelligence applies business analytics to
fresh data and puts analysis in the hands of
those who need it.”
Source: InfoWorld
And some companies are specifically competing on analytics…
- “The idea of competing on analytics is not entirely new”
- “What is new is the spreading of analytical competition from individual business units to an enterprise-wide perspective”
-- Thomas H. Davenport (author)
Source: Harvard Business Review
(January 2006)
We’ve entered the “Era of Analytics”.
“Previous bases for competition …
have been eroded … That leaves three
things as the basis for competition:
• Efficient & effective execution
• Smart decision making
• Ability to wring every last drop of value
from business processes
… all of which can be gained through
sophisticated use of analytics.”
“Competing on Analytics” (Davenport & Harris)
Harvard Business School Press
Worldwide Release: March 6, 2007
Why can’t I use the same approach I’m currently using?
Analytics is …
Data-driven insight for better decisions.
A process encompassing a range of techniques for the collection, classification, analysis, and interpretation of data to gain insight and reveal patterns, anomalies, key variables and relationships.
But, more importantly, what’s critical?
- Sufficient historical data
- Sufficient granularity
- Clean, accurate data
- Breadth and representativeness of data
What is a “model”?
- An abstraction of reality
  • Simplifies reality via assumptions
  • Defines constraints and actors
  • Narrows our focus by eliminating everything other than what we’re concerned about
- Why do we use them?
  • They help us gain insight into real-world processes / objects
  • They give us something we can “play” with
  • They’re cheaper than using the real things
No, really - what is a “model”?
- What are some examples?
  • The theory of relativity
  • A 1:100 scale architectural rendition of a proposed building
  • A clay model of a car
  • A catwalk / clothes model
So what does an analytical model look like?
- Example specification: Risk of Default on a Loan
  $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \varepsilon$
- Example implementation: Risk of Default on a Loan
  CreditRisk = 300 + (15 * Income) + (22 * Age) + …
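The implementation line is the specification with estimated values plugged in: an intercept of 300 and weights of 15 and 22 for Income and Age. A minimal Python sketch of scoring one applicant, assuming the inputs are already on whatever scale the model was fitted on (the slide gives no units). It omits the trailing “…” terms, which are unknown, and the error term ε, which describes unexplained variation rather than something a scoring routine computes:

```python
from typing import Dict

# Illustrative coefficients taken from the slide's example. The real model's
# further terms (the "…") are not known and are deliberately omitted.
INTERCEPT = 300.0
COEFFICIENTS: Dict[str, float] = {"Income": 15.0, "Age": 22.0}


def credit_risk_score(applicant: Dict[str, float]) -> float:
    """Linear score: intercept plus the sum of coefficient * input value."""
    return INTERCEPT + sum(
        weight * applicant[name] for name, weight in COEFFICIENTS.items()
    )


# Hypothetical applicant; units are assumed, since the slide does not state them.
score = credit_risk_score({"Income": 4.0, "Age": 35.0})
print(score)  # 1130.0
```

Keeping the coefficients in a dictionary separates the fitted numbers from the scoring code, so a refitted model could in principle be deployed by swapping the values rather than rewriting the function.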
Text Mining is no different.
Reading the text files → Text Preprocessing → Term weighting/rollup → Dimension Reduction (Singular Value Decomposition) → Document analysis
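The text mining flow can be sketched end to end in a few lines of Python with NumPy. This is a toy illustration, not SAS Text Miner’s implementation: the tokenizer (lower-case alphabetic words, no stemming or stop-word removal) and the weighting (log term frequency × inverse document frequency) are assumptions chosen for brevity, and the example documents are made up.

```python
import re
from typing import List, Tuple

import numpy as np


def tokenize(text: str) -> List[str]:
    """Text preprocessing (simplified): lower-case and keep alphabetic runs."""
    return re.findall(r"[a-z]+", text.lower())


def term_document_matrix(docs: List[str]) -> Tuple[List[str], np.ndarray]:
    """Rows are terms, columns are documents, cells are raw counts."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({term for doc in tokenized for term in doc})
    row = {term: i for i, term in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(docs)))
    for col, doc in enumerate(tokenized):
        for term in doc:
            counts[row[term], col] += 1
    return vocab, counts


def weight_terms(counts: np.ndarray) -> np.ndarray:
    """Term weighting: log(1 + tf) * log(N / df), one common scheme among several."""
    n_docs = counts.shape[1]
    doc_freq = np.count_nonzero(counts, axis=1)
    idf = np.log(n_docs / doc_freq)
    return np.log1p(counts) * idf[:, None]


def reduce_dimensions(weighted: np.ndarray, k: int) -> np.ndarray:
    """Dimension reduction via truncated SVD: one k-dimensional vector per document."""
    _, s, vt = np.linalg.svd(weighted, full_matrices=False)
    return (np.diag(s[:k]) @ vt[:k]).T


docs = [
    "The loan was approved quickly",
    "Loan approval was slow",
    "Great service from the call centre",
    "Call centre service was poor",
]
vocab, counts = term_document_matrix(docs)
doc_vectors = reduce_dimensions(weight_terms(counts), k=2)
print(doc_vectors.shape)  # (4, 2)
```

The resulting document vectors could then feed the document analysis step, for example clustering or use as inputs to a predictive model, which is one way to read the claim that text mining is “no different” from other modeling.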
Analytics drives significant value …
[Figure: customer value plotted against time / insight across the customer lifecycle, with analytical insight supporting each stage. Lifecycle labels: Acquisition / Activating; Target/acquire prospect; Welcome programme; Customer development; Up-sell / Cross-sell; Service/advice; Harvest; Win Back; Churn Prevention / Attrition; Cancellation. Proactivity based on “if” events: lifetime, usage/purchase, behaviour, critical.]
- Behaviour Scoring
- Response rates
- Entry Scoring
- Contact Policy
- Fraud Detection
- Segmentation (Value / Needs)
- Tariff Plan Optimisation
- Cross-Sell / Up-Sell
- Credit / Collections
- Churn Propensity
- Churn Segmentation
- Satisfaction score
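Several of these applications (response rates, behaviour and entry scoring, churn propensity) come down to ranking customers by a model score and checking whether the top of the list responds better than average. A minimal Python sketch of that check as a decile lift table, assuming scores and 1/0 response flags are already available; the data is synthetic, and decile lift is one common evaluation choice rather than anything the slide prescribes:

```python
from typing import List, Tuple


def response_rate(responses: List[int]) -> float:
    """Share of contacted customers who responded (1 = responded, 0 = did not)."""
    return sum(responses) / len(responses) if responses else 0.0


def lift_by_decile(scored: List[Tuple[float, int]]) -> List[float]:
    """Sort by score (highest first), cut into ten equal groups, and return each
    group's response rate divided by the overall response rate."""
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    overall = response_rate([r for _, r in ranked])
    n = len(ranked)
    lifts = []
    for d in range(10):
        group = ranked[d * n // 10:(d + 1) * n // 10]
        rate = response_rate([r for _, r in group])
        lifts.append(rate / overall if overall else 0.0)
    return lifts


# Synthetic (score, responded) pairs: the ten highest scores all respond,
# plus one responder per remaining group of ten.
customers = [(float(i), 1 if i >= 90 or i % 10 == 0 else 0) for i in range(100)]
lifts = lift_by_decile(customers)
print(round(lifts[0], 2))  # 5.26
```

A top-decile lift well above 1 suggests the score concentrates responders, which is what makes targeting on it worthwhile.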
But it requires data, which can take many forms …
[Figure: data positioned on two axes, from highly aggregated to highly disaggregated data and from interactive to analytical use.]
Different activities have different requirements …
[Figure: axes from highly aggregated to highly disaggregated data and from interactive to analytical use, with “The reporting path” and “The analytics path” marked as two distinct routes.]
And different approaches are there for good reason …
Reporting-driven data management processes aren’t always appropriate …
- Traditional reporting processes support highly managed activities
- Analytical processes are flexible and iteratively driven
- Successful companies are managing the two processes differently
[Figure: Structured process: source data systems, integration, DW storage, metadata, BI. Unstructured process: hypothesis, integrate, hand-coded extracts, analytical tools, interpret.]
However, the two are closely aligned.
[Figure label: Hypothesis]
- Descriptive: measures the past (what)
- Inferential: brings deep understanding of the past and is predictive of the future (why)
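The difference between the two modes can be made concrete on a single series. In the Python sketch below, the descriptive step summarises what happened; the predictive step fits an ordinary least-squares trend and extrapolates one period ahead. The trend line is an assumed, deliberately minimal stand-in for an inferential model: it predicts, but on its own it does not explain why, which would need explanatory variables such as Income and Age in the credit risk example. The sales figures are hypothetical.

```python
from statistics import mean
from typing import List, Tuple


def describe(history: List[float]) -> Tuple[float, float, float]:
    """Descriptive: summarise what happened (mean, minimum, maximum)."""
    return mean(history), min(history), max(history)


def fit_trend(history: List[float]) -> Tuple[float, float]:
    """Least-squares line y = a + b * t, with t = 0, 1, 2, ... as the period index."""
    if len(history) < 2:
        raise ValueError("need at least two periods to fit a trend")
    t = list(range(len(history)))
    t_bar, y_bar = mean(t), mean(history)
    b = sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, history)) / sum(
        (ti - t_bar) ** 2 for ti in t
    )
    a = y_bar - b * t_bar
    return a, b


def forecast(history: List[float], periods_ahead: int = 1) -> float:
    """Predictive: extend the fitted trend beyond the last observed period."""
    a, b = fit_trend(history)
    return a + b * (len(history) - 1 + periods_ahead)


sales = [10, 12, 14, 16]  # hypothetical quarterly sales
print(describe(sales))
print(forecast(sales))  # 18.0
```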
So what’s the solution?
Predictive Analytics: A Summary
- Going from seeing small bits to understanding the bigger picture
- Integration
  • Data: being able to link the unseen
  • Models: provide the complete picture of the customer
  • Technology: support the integration
  • People: integrate all stakeholders of analytics into the business process
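At its simplest, the data side of integration means getting records from separate systems, structured and unstructured alike, under one shared key so each customer can be seen as a whole. A minimal Python sketch, assuming every source carries a `customer_id` field; the source names and records are hypothetical, and real-world linking usually also needs matching rules for keys that don’t agree exactly:

```python
from collections import defaultdict
from typing import Dict, List


def link_by_customer(
    sources: Dict[str, List[dict]], key: str = "customer_id"
) -> Dict[str, Dict[str, List[dict]]]:
    """Group records from several named sources under a shared customer key.

    Every customer gets an entry for every source, so gaps (a customer with
    purchases but no contact history, say) show up as empty lists.
    """
    linked: Dict[str, Dict[str, List[dict]]] = defaultdict(
        lambda: {name: [] for name in sources}
    )
    for name, records in sources.items():
        for record in records:
            linked[record[key]][name].append(record)
    return dict(linked)


# Hypothetical sources: a structured purchase table and free-text call notes.
purchases = [
    {"customer_id": "C1", "amount": 120.0},
    {"customer_id": "C2", "amount": 35.5},
]
call_notes = [{"customer_id": "C1", "text": "asked about upgrading the contract"}]

view = link_by_customer({"purchases": purchases, "call_notes": call_notes})
print(view["C2"]["call_notes"])  # []
```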
We’re using more and more data …
- We used to work with a couple of dozen variables
- Nowadays, at least a couple of hundred
  • Data from different sources
  • Derived data (differences, ratios, trends, etc.)
  • Data from combined algorithms (e.g. market basket analysis combined with clustering, combined with predictive modeling)
- This can grow to thousands
  • Pharma: micro-array data
  • Interactions
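Derived variables and interactions are two reasons the count climbs so quickly: each raw series can spawn several new columns, and pairwise interactions grow quadratically, n(n − 1)/2 for n variables, so 200 variables already give 19,900 two-way interactions. A minimal Python sketch of both points, assuming a customer’s monthly spend is available oldest-first; the variable names are hypothetical:

```python
from math import comb
from typing import Dict, List


def derive_features(monthly_spend: List[float]) -> Dict[str, float]:
    """Derive a difference, a ratio and a trend from one series (oldest first)."""
    if len(monthly_spend) < 2:
        raise ValueError("need at least two months of history")
    last, prev = monthly_spend[-1], monthly_spend[-2]
    n = len(monthly_spend)
    t_mean = (n - 1) / 2
    y_mean = sum(monthly_spend) / n
    # Least-squares slope of spend against month index 0..n-1.
    slope = sum(
        (t - t_mean) * (y - y_mean) for t, y in enumerate(monthly_spend)
    ) / sum((t - t_mean) ** 2 for t in range(n))
    return {
        "spend_diff_last_month": last - prev,
        "spend_ratio_last_to_prev": last / prev if prev else float("nan"),
        "spend_trend_slope": slope,
    }


def pairwise_interactions(n_variables: int) -> int:
    """Number of two-way interaction terms among n variables: C(n, 2)."""
    return comb(n_variables, 2)


print(derive_features([100.0, 110.0, 121.0]))
print(pairwise_interactions(200))  # 19900
```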
This has some major implications …
- History is key
  • To build a model, you need historical data
  • This history must be collected over time
  • To use analytics effectively now, you must have planned and executed up to two years ago
  • Start collecting data now if you want to remain competitive
- Data quality can be a showstopper
  • All the data in the world is useless if it isn’t accurate
  • Capturing and storing this data is an expensive waste if it isn’t useful
  • Bad-quality data can delay an analytics project by years
  • Solve the data quality problem when you start, not afterwards
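“Solve the data quality problem when you start” can be taken quite literally: check each record against simple rules as it is captured, and track failure counts before the data is relied on for modeling. A minimal Python sketch; the field names and validity rules are hypothetical examples, and real rules would come from the business:

```python
from typing import Callable, Dict, List

# Hypothetical rules: field name -> check that returns True for an acceptable value.
RULES: Dict[str, Callable[[object], bool]] = {
    "customer_id": lambda v: isinstance(v, str) and v != "",
    "age": lambda v: isinstance(v, int) and 18 <= v <= 120,
    "income": lambda v: isinstance(v, (int, float)) and v >= 0,
}


def profile_quality(
    records: List[dict], rules: Dict[str, Callable[[object], bool]] = RULES
) -> Dict[str, int]:
    """Count missing or invalid values per field."""
    failures = {field: 0 for field in rules}
    for record in records:
        for field, check in rules.items():
            value = record.get(field)
            if value is None or not check(value):
                failures[field] += 1
    return failures


records = [
    {"customer_id": "C1", "age": 34, "income": 52000},
    {"customer_id": "", "age": 250, "income": 41000.0},  # blank ID, implausible age
    {"customer_id": "C3", "income": -5},  # missing age, negative income
]
print(profile_quality(records))  # {'customer_id': 1, 'age': 2, 'income': 1}
```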
This has some major implications …
- Granularity is essential
  • Statistical methods work by extracting trends from large amounts of detailed information
  • Pre-summarised information is almost always useless for analytics
  • The enterprise data warehouse may not be the best location for this data
  • Don’t assume everything must be in a single data warehouse
- It’s not just about data, it’s about the right data
  • Knowing what data is important can be a challenge
  • This requires a highly consultative approach with the business
  • It helps to tie data requirements back to strategic business drivers / the business model
  • Understand not only the business and the problem, but also involve the right stakeholders
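A small illustration of why pre-summarised data loses what analytics needs: two hypothetical customers with identical annual totals, one spending steadily and one who spent once and then went quiet. The totals can always be rebuilt from the detail, but not the other way round. The Python sketch assumes transaction records of the form (customer, month, amount); the customers and figures are made up:

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Transaction = Tuple[str, int, float]  # (customer_id, month 1..12, amount)

transactions: List[Transaction] = [
    ("steady", m, 50.0) for m in range(1, 13)
] + [("lapsed", 1, 600.0)]


def annual_totals(records: List[Transaction]) -> Dict[str, float]:
    """Pre-summarised view: one number per customer."""
    totals: Dict[str, float] = defaultdict(float)
    for customer, _, amount in records:
        totals[customer] += amount
    return dict(totals)


def monthly_detail(records: List[Transaction]) -> Dict[str, List[float]]:
    """Granular view: twelve monthly amounts per customer."""
    detail: Dict[str, List[float]] = defaultdict(lambda: [0.0] * 12)
    for customer, month, amount in records:
        detail[customer][month - 1] += amount
    return dict(detail)


totals = annual_totals(transactions)
detail = monthly_detail(transactions)
print(totals)  # {'steady': 600.0, 'lapsed': 600.0}
print(detail["lapsed"][-3:])  # [0.0, 0.0, 0.0]
# The detail rolls up to the totals; the totals cannot be unrolled into the detail.
assert all(sum(detail[c]) == totals[c] for c in totals)
```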