Transcript Data Mining

Open World 2003
Session id: 40332
Data Warehousing for the Communications Industry:
A Data Mining Approach to Customer
Churn Analysis in Wireless Industry
Shyam Varan Nath
Senior Database Engineer
Daleen Technologies
Introduction
 Oracle Data Mining
– JDeveloper
– DM4J
 Wireless Industry and Customer Churn
 Data Modeling for Churn Management
“WLNP Threatens to Significantly Impact Wireless Churn Rates.”
Source: In-Stat 2002
Churn
North American wireless industry monthly churn rate in Q4-02
[Chart: Monthly Churn (%), 4Q-02. Series: U.S. Average, Canadian Average. Data labels: 2.8%, 2.4%]
Source: Company & analyst reports
Wireless Industry: Some Facts
 Wireless Local Number Portability (WLNP)
from Nov 2003
 Average Cost to Acquire a New Wireless
Customer: $400 to $500
 Data Mining as a Solution to the Business
Problem
…facts
Source: Duke Teradata 2002
…facts
Reasons for Churn
 Many companies to choose from
 Similarity of their Offerings
 Cheap prices of the handsets
The biggest current barrier to churn:
the lack of phone number portability!
A Dilemma
 Cross-Selling Through Database Marketing
– Cross-selling is effective for customer retention because it
increases switching costs and enhances customer loyalty.
– On the other hand, cross-selling can also weaken the firm’s
relationship with the customer, because frequent attempts to
cross-sell can render the customer non-responsive or even
motivated to switch to a competitor.
Role of Data Mining
Business Issues in a Wireless Industry
Some Definitions
 Data Warehousing: A data warehouse is a
database or a collection of databases designed to
give business decision-makers instant access to
information
 Data Mining: Data mining is the process of
using raw data to infer important business
relationships that can then be used for business
advantage
“Simply put, data mining is
used to discover [hidden]
patterns and relationships in
your data in order to help you
make better business
decisions.”
Source: Oracle9i Data Mining 2001
Choice of Tools
Justification for Data Mining
 Reporting Tools: Good at drilldowns into the details
 OLAP/Statistical Tools: Used to draw conclusions
from representative samples
 Data Mining: Goes deep into the data. It uses
machine-learning algorithms to automatically sift
through each record and variable to uncover patterns
and information that may have been hidden.
Predictive Modeling
Visual Representation of Predictive Modeling
Benefits Of Data Warehousing
And Predictive Modeling
 Immediate Information Delivery
 Data Integration from across—and even
outside—the Organization
 Future Vision from Historical Trends
 Tools for Looking at Data in New Ways
What is ODM?
Connected to:
Oracle9i Enterprise Edition Release 9.2.0.1.0 - Production
With the Partitioning, OLAP and Oracle Data Mining options
JServer Release 9.2.0.1.0 - Production
SQL>
Oracle9i Data Mining is an option to Oracle9i
Enterprise Edition that allows users to build
advanced business intelligence applications that mine
corporate databases to discover new insights and
integrate those insights into business applications.
Why Oracle?
Integrated Environment of Oracle Relational Database
Supervised vs. Unsupervised Learning
 Supervised learning requires identification of a target field or
dependent variable. The supervised-learning technique then sifts
through the data, trying to find patterns and relationships between the
independent variables and the dependent variable. (ODM
provides the Naïve Bayes data mining algorithm for supervised-learning problems.)
 Unsupervised learning does not require the user to specify an
objective for the data mining algorithm. Association and
clustering algorithms make no assumptions about the target field.
Instead, they try to find associations and clusters in the data
independent of any a priori defined business objective, as in
market-basket analysis. (ODM provides the Association Rules data
mining algorithm for unsupervised-learning problems.)
Naive Bayes algorithm
 The Naive Bayes algorithm uses the mathematics
of Bayes' Theorem to make its predictions. The
algorithm is typically used for:
– Identifying which customers are likely to
purchase a certain product
– Identifying customers who are likely to churn
– Predicting the likelihood that a part will be
defective
 Adaptive Bayes Network
– Human readable rules
IF RELATIONSHIP = "Husband"
AND EDUCATION_NUM = "13-16"
THEN CHURN = "TRUE"
Bayes Theorem
According to Bayes' rule, the probability of an example E with
attribute values a_1, a_2, ..., a_n being in class c is:
\[ P(C = c \mid a_1, a_2, \ldots, a_n) = \frac{p(a_1, a_2, \ldots, a_n \mid C = c)\, p(C = c)}{p(a_1, a_2, \ldots, a_n)} \]
The classification is taken as the value of C
with the largest probability:
\[ c^{*} = \arg\max_{c} P(C = c \mid a_1, a_2, \ldots, a_n) \]
Assume all attributes are independent given the class:
\[ p(a_1, a_2, \ldots, a_n \mid c) = p(a_1 \mid c)\, p(a_2 \mid c) \cdots p(a_n \mid c) \]
The resulting Bayesian classifier is called the
Naïve Bayesian classifier.
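The classifier described above can be sketched in a few lines of Python. This is a minimal illustration, not the ODM implementation: the six training rows are invented toy data, the attribute names are borrowed from the rule example on the earlier slide, and the add-one smoothing is an assumption that the slides do not mention.

```python
from collections import Counter, defaultdict

def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class attribute-value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)   # (attribute, class) -> value -> count
    values_seen = defaultdict(set)        # attribute -> distinct values
    for row, label in zip(rows, labels):
        for attr, value in row.items():
            value_counts[(attr, label)][value] += 1
            values_seen[attr].add(value)
    return class_counts, value_counts, values_seen

def posterior(model, row):
    """Return P(C = c | a1..an) for every class c, normalized to sum to 1."""
    class_counts, value_counts, values_seen = model
    total = sum(class_counts.values())
    scores = {}
    for c, n_c in class_counts.items():
        score = n_c / total               # prior p(C = c)
        for attr, value in row.items():
            k = len(values_seen[attr])
            # p(a_i | c), with add-one smoothing (an assumption of this sketch)
            score *= (value_counts[(attr, c)][value] + 1) / (n_c + k)
        scores[c] = score
    evidence = sum(scores.values())       # p(a1..an), summed over the classes
    return {c: s / evidence for c, s in scores.items()}

def classify(model, row):
    """Pick the class with the largest posterior probability."""
    probs = posterior(model, row)
    return max(probs, key=probs.get)

# Hypothetical toy data, for illustration only.
rows = [
    {"RELATIONSHIP": "Husband", "EDUCATION_NUM": "13-16"},
    {"RELATIONSHIP": "Husband", "EDUCATION_NUM": "13-16"},
    {"RELATIONSHIP": "Wife", "EDUCATION_NUM": "9-12"},
    {"RELATIONSHIP": "Husband", "EDUCATION_NUM": "9-12"},
    {"RELATIONSHIP": "Wife", "EDUCATION_NUM": "13-16"},
    {"RELATIONSHIP": "Wife", "EDUCATION_NUM": "9-12"},
]
labels = ["TRUE", "TRUE", "FALSE", "FALSE", "FALSE", "FALSE"]

model = train_naive_bayes(rows, labels)
likely_churner = {"RELATIONSHIP": "Husband", "EDUCATION_NUM": "13-16"}
print(classify(model, likely_churner), posterior(model, likely_churner))
```

The per-attribute factors are multiplied independently, which is exactly the independence assumption that gives the classifier its "naive" name.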
Major Steps Of Data Mining
 Build Model: Models are built in the data-mining
server
 Test Model: Model testing gives an estimate of model
accuracy
 Compute Lift: ODM supports computing lift for a
binary classification model (confidence of prediction)
 Apply Model: Applying a supervised learning model
to data results in scores or predictions with an
associated probability
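As a rough illustration of the lift step, the sketch below ranks cases by predicted churn probability and compares the churn rate in the top-ranked fraction with the overall churn rate. It assumes this common definition of lift, which is not necessarily the exact computation ODM performs, and the scores and outcomes are invented toy values.

```python
def cumulative_lift(probabilities, actual, fraction):
    """Lift of the top `fraction` of cases ranked by predicted churn probability.

    probabilities: predicted churn probability per customer
    actual:        1 if the customer really churned, else 0
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    overall_rate = sum(actual) / len(actual)
    if overall_rate == 0:
        raise ValueError("lift is undefined when no customer churned")
    # Rank customers from most to least likely to churn.
    ranked = sorted(zip(probabilities, actual), key=lambda pair: pair[0], reverse=True)
    top_n = max(1, round(len(ranked) * fraction))
    top_rate = sum(churned for _, churned in ranked[:top_n]) / top_n
    return top_rate / overall_rate

# Hypothetical scores and outcomes for ten customers.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
churned = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

for fraction in (0.2, 0.5, 1.0):
    print(fraction, cumulative_lift(scores, churned, fraction))
```

A lift above 1 in the top fractions means the model concentrates churners among its highest-scored customers; at a fraction of 1.0 the lift is 1 by construction.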
Build Model
Apply Process
Nature of Dataset Used for Study (real Wireless Customer Data)
Data For Modeling
                            Calibration             Current Score Data      Future Score Data
Sample Size                 100,000                 51,306                  100,462
# of Predictor Variables    171                     171                     171
Churn Indicator             Yes                     No                      No
Customer ID                 1,000,001 – 1,100,000   2,000,001 – 2,051,306   3,000,001 – 3,100,462
System Setup
 Database
 Java Environment
 Data Mining Wizard
Database: Oracle 9.2.0.1.0
Installation of Oracle Database software 9.2.0.1.0 with the Oracle Data
Mining option, plus the database patch for version 9.2.0.2.1.
Java Environment: JDeveloper
Installation of JDeveloper 9.0.3
Data Mining Wizard: DM4J
Question
Getting Started…
• Unlock the odm user
• Grant privileges on the tables so the wizard can display them
• Odm_mtr schema
Working with the DM4J Wizard
Creating a new Workspace
Configuring a Database Connection
…DM4J
Selecting a model type in the DM4J wizard.
Algorithm for Data Modeling
Selecting the Algorithm
Fine tuning the algorithm
…DM4J
The DM4J wizard generates the Java code that is compiled and
executed to create the model.
…DM4J
Here is the
Java Code!
Our Study
The input data was stored in a
table called CALIBRATION.
Our target variable for prediction is CHURN.
…study
We pick all the input predictor variables (except customer Id) from
the list of 171 to predict churn.
…study
Compilation and execution of the
Java code containing the ODM model.
The program runs in asynchronous
mode, and we can monitor the progress
of the task. The screenshot shows the
successful completion of the model build.
…study
The Adaptive Bayes Network also generates the
rules for the model in human readable form.
…study
Confusion Matrix
Testing the Model using the
data from table PRESENT
Cumulative Lift Chart
…study
The last step is to apply the tested model to the data
set for which we want to predict CHURN.
…study
After the Apply task is run
When we apply the model, the predictions are
obtained and stored in an output table
…study
Rating the importance of the various predictor
variables.
Top Ten Variables
1. DUALBAND type of phone set
2. CARTYPE dominant vehicle lifestyle
3. EDUC1 education level of first household member
4. ETHNIC ethnicity
5. TOT_ACPT total offers accepted from retention team
6. OCCU1 occupation of the first household member
7. AREA geographic area
8. INCOME estimated household income
9. DWLLSIZE dwelling size
10. PROPTYPE property type details
Cost Savings Based on Churn
Data
\[ \text{savings per churnable subscriber} = \frac{\text{net(no intervention)} - \text{net(incentive)}}{L + NL} \]
\[ \text{net(no intervention)} = (L + NL)\, C_l \]
\[ \text{net(incentive)} = (L + LS)\, C_i + (P_i L + NL)\, C_l \]
To estimate cost savings, the parameters C_i (cost of incentive per customer),
P_i (reduction in probability to churn due to incentive C_i), and C_l (lost-revenue
cost when a subscriber churns) are combined with four statistics obtained
from a predictor model:
L: number of subscribers who are predicted to leave (churn) and who actually leave
barring intervention
NL: number of subscribers who are predicted to stay (nonchurn) and who actually
leave barring intervention
LS: number of subscribers who are predicted to leave and who actually stay
SS: number of subscribers who are predicted to stay and who actually stay
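The savings formulas can be worked through with a short Python sketch. The helper names and all numbers below are hypothetical, chosen only to make the arithmetic easy to follow. The code applies the formulas as written on the slide, so Pi multiplies L directly and acts as the fraction of correctly predicted churners who still leave after receiving the incentive.

```python
def confusion_counts(predicted, actual):
    """Return (L, NL, LS, SS) from parallel lists of booleans (True = churn)."""
    L = sum(p and a for p, a in zip(predicted, actual))               # predicted leave, leaves
    NL = sum((not p) and a for p, a in zip(predicted, actual))        # predicted stay, leaves
    LS = sum(p and (not a) for p, a in zip(predicted, actual))        # predicted leave, stays
    SS = sum((not p) and (not a) for p, a in zip(predicted, actual))  # predicted stay, stays
    return L, NL, LS, SS

def savings_per_churnable_subscriber(L, NL, LS, Ci, Pi, Cl):
    """Apply the slide's formulas; SS does not enter the calculation."""
    churners = L + NL
    if churners == 0:
        raise ValueError("no churnable subscribers")
    net_no_intervention = churners * Cl
    # Everyone predicted to leave (L + LS) receives the incentive.
    net_incentive = (L + LS) * Ci + (Pi * L + NL) * Cl
    return (net_no_intervention - net_incentive) / churners

# Hypothetical inputs: 60 churners caught, 40 missed, 140 false alarms,
# a $50 incentive, Pi = 0.5, and $500 of lost revenue per churner.
example_savings = savings_per_churnable_subscriber(
    L=60, NL=40, LS=140, Ci=50, Pi=0.5, Cl=500
)
print(example_savings)

example_counts = confusion_counts(
    predicted=[True, True, False, False, True],
    actual=[True, False, True, False, True],
)
print(example_counts)
```

With these inputs, net(no intervention) is 100 x 500 = 50,000 and net(incentive) is 200 x 50 + (30 + 40) x 500 = 45,000, so the saving is 5,000 / 100 = 50 per churnable subscriber. False alarms (LS) raise the incentive bill without preventing any churn, which is why the accuracy of the predictor drives the savings.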
Churn Management
Expected Saving to Carrier / Churnable Subscriber
Source: Mozer 2000
Future Trends and Conclusion
• Real-time analytics and text mining (Oracle
10g) can take data mining to the next level.
• Oracle Data Mining can help resolve a business
problem.
• Churn prediction and churn management
can yield significant savings to the wireless
provider.
Daleen at a Glance
 Founded in 1989 with a mission to
build custom software for finance &
telecom sectors
 Worldwide base of over 80 billing &
customer care contracts since 1997
 Innovator in deployment of
convergent billing, event
management & revenue assurance
solutions for next-generation services
 Long term focus on delivering
exceptional customer service
through a site license or service
bureau relationship
 Offices in Boca Raton, St. Louis,
Amsterdam & Sydney
RevChain – high performance billing &
customer management
 Commerce - convergent billing & customer mgmt.
 Interact - pure web CSR interface for
comprehensive account management
 Care - web-based self-care with EBPP
 mCommerce - account mgmt. via the mobile device
Asuriti – centralized event management
& revenue assurance
 Configurable, rules-based architecture
 Centralized management of event data
 Data transformation & enrichment
 Revenue assurance & error management
BillingCentral – comprehensive
outsourcing solution
 Advanced billing & event management technologies
 Proven best practices & process controls
 Carrier-class hardware & networks
 Performance guarantees & revenue assurance
QUESTIONS
ANSWERS
References & Useful Links
Technet http://technet.oracle.com/products/bi/odm/9idm4j
Armstrong, G., and P. Kotler. 2001. Principles of Marketing. Prentice Hall New
Jersey.
Duke Teradata. 2002. Teradata Center for Customer Relationship Management. [Online]. Retrieved on Nov 7, 2002.
Available: http://www.teradataduke.org/news_t_2.html
In-Stat. 2002. WLNP Threatens to significantly impact wireless churn rates. [Online].
Retrieved on Sep 2002.
Available: http://www.instat.com/newmk.asp?ID=312
Mozer, Michael, Richard Wolniewicz, Eric Johnson and Howard Kaushansky. 1999.
Churn reduction in the wireless industry, Proceedings of the Neural Information
Processing Systems Conference, San Diego, CA.
Oracle9i Data Mining. 2001. An Oracle white paper, December 2001. [Online].
Retrieved on Nov 8, 2002.
Available: http://otn.oracle.com/products/bi/pdf/o9idm_bwp.pdf
Skedd, Kirsten 2002. WLNP threatens to significantly impact wireless churn rates
[On-line]. Retrieved on Sep 14, 2002.
Available: http://www.instat.com/press.asp?ID=311&sku=IN020258WP
Acknowledgements
 Dr Ravi Behara, Faculty (Florida Atlantic University)
 David Eastlund and Jennifer from Oracle
 Cohorts at Daleen Technologies
Reminder –
Please complete the
OracleWorld online session
survey.
Thank you.
Contact Information
 Email: [email protected]
 Cell Phone: (954) 609-2402
 Text Message: [email protected]