A Primer on Data Mining

Developing Business Insight Through Data Mining
A Primer On Data Mining
Berling Associates – Assisting Clients To Redefine How The Game Is Played
From Data To Insight
• Information is the lifeblood of successful businesses
• Organizations use information tactically to manage operations
• Organizations use information strategically to gain competitive advantage
• By using information effectively, organizations improve sales and reduce costs
• Information assets are created from raw data
• Data mining creates the information assets from the raw data that businesses can use to improve their competitive positions
• Data mining exists because the large volumes of low-level data available for analysis are too large to easily understand and digest
• Data mining converts this voluminous data into other forms that might be more compact (a short report), more abstract (a descriptive approximation or model of the process that generated the data), or more useful (a predictive model for estimating the value of future cases)
• Data mining is the process of sifting through vast amounts of data in order to extract meaning and discover new knowledge
• Data is one of the most plentiful regenerating assets in a company; unfortunately, many businesses lack the capabilities to transform their data into information assets
Use Of Data
[Diagram: data sources feed two uses of data. Traditional Use produces reports, graphs and charts, giving retrospective views (What happened in the past?). Data Mining produces models, giving descriptive and prospective views (Why did this happen? What is likely to happen?).]
• Traditionally, data has been used to generate reports, tables and graphs of what happened in the past. A user who knows what he is looking for can answer specific questions: How many new accounts were opened last month? Which products sold in greater quantities this month versus last month? Did we meet or exceed our budget this year?
• Data mining is the data-driven discovery and modeling of hidden patterns in large volumes of data. It differs from the traditional methods because it produces models that capture and represent hidden patterns in the data. With data mining the user can discover patterns and build models automatically without knowing exactly what he is looking for. A user can pose to a data mining model “what-if” questions that cannot be answered by querying the database or warehouse directly, for example: “What is the expected lifecycle profitability of every product? Which customers will switch their phone service if we introduce new fees?”
How Businesses Use Data Mining Results
• Marketing and Sales
In marketing, the primary application is database marketing systems, which analyze customer databases to identify different customer groups and forecast their behavior. It has been estimated that over half of retailers are using or planning to use database marketing, and those who do use it have had good results. For example, Wal-Mart uploads millions of POS transactions each day to understand, down to the Q-Tip, what is selling in the aisles. Other retailers are doing market-basket analysis to understand patterns such as "if a customer purchases X, he is likely to purchase Y." AT&T, A.C. Nielsen, and American Express are crunching terabytes of data to better design promotional strategies.
• Investment
Many financial analysts at investment companies use sophisticated data mining tools to sift vast sets of financial records, data feeds and information from other sources, searching for relationships among historic factors to predict the future performance of stocks and bonds.
• Fraud Detection
Many credit card companies use data mining models to predict credit card fraud. Transactions are monitored, and patterns in card usage serve as predictors of fraudulent use.
• Manufacturing
Data mining tools are used to model and predict the failure of aircraft engines. Hundreds of thousands of parts are monitored across thousands of engines in service. Many data elements are tracked and analyzed, and predictors of failure are formulated. Models predict which engines will fail and when, driving quality assurance inspections and repairs before the failure events.
From Data to Insight
Internal Data Sources
Call Center
- Employee
- Transaction
- Call duration
- Date/Time
- Promotion
Product, Inventory, Distribution, Returns, Marketing, Credit
Transactions
- Customer
- Product #
- Card type
- Color
- Size
- Date
- Price
- Promotion
Promotions
- Start/End
- Program Type
- Audience
- Media types
- Ring Code
- Response Method
- Promo Cost
Customer
- Account #
- Address
- Phone
- Recency, Frequency, Monetary
- Length of relationship
Insight
• How likely are customers to churn?
• Who will buy a new service?
• How likely are they to trade up to an enhanced feature set?
• Are we holding customers due to these no-charge services?
• Do our service levels need to change?
• How likely are they to recommend our products and services to another?
• Are they profitable over their lifetime?
• What else are they most likely to purchase when they purchase product X?
• Which group will be most likely to purchase from this sales promotion effort?
• Which customers should we “fire”?
• Which advertising campaigns are most effective?
• Data mining provides the analytical models to answer the questions that provide true insight into the business
• Internal data sources can be supplemented with data from external sources (web, census data, etc.) to enhance the depth of insight available
Example
A telephone company wanted to increase its cell telephone revenues in a major geographic-market area. At the same time it
wanted to reduce its marketing expenditures during the current fiscal year. The company had approximately 85,000 cell phone
users. The company decided to try to increase phone usage within its current base rather than sign up new users, believing this tack would be more efficient. Its agency had proposed a promotion that gave users a telephone battery so as to increase usage. The question was which of the cell phone user segments offered the best financial return for the promotion’s cost.
A sophisticated data mining process identified the segment with the greatest potential return. Running the promotion campaign to
a specifically targeted segment of users resulted in a 15% penetration at less than one-third of the cost of a traditional effort. Here
is how this was accomplished.
1. Three in-house databases (operational, marketing and credit) were analyzed. The operational data files identified usage times
and patterns of those who might be most likely to increase phone usage. Marketing data files identified those users who had
responded to other promotion programs for other company services in the past. Credit files identified those users who had the
ability to easily pay for additional phone usage. These three data files were analyzed and merged to develop the target market.
2. External census data was acquired and was used to identify commuting habits and locations of the target market. Database
information was purchased to obtain purchase behavior and demographics for the target group identified. A number of critical
dimensions were identified, such as: occupation, position in career path, time spent in car, wealth. One of the behavior
dimensions investigated in this analysis was the person’s comfort referring a product or service to another person. Database
information was also purchased to provide credit information on the targets identified. This information was used to cull the list
of targets created from the internal information. The target group identified at this point consisted of those who likely needed the
extra minutes, could pay for them, and would likely give a referral to another for the service.
3. Through the use of the data mining techniques, a list of 15,000 targets was identified.
The direct marketing campaign was run, and the result was a participation rate of 15% of the targets identified. This result is dramatically higher than the 0.5% participation rate typically experienced. Because the target group was smaller than the traditional direct marketing target group, the cost of the promotion campaign was significantly less.
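The arithmetic behind this comparison can be checked with a short sketch. The figures (15,000 targets, 15% participation, 0.5% typical participation) come from the example above; the variable names are illustrative only, and no cost figures are modeled because the example does not give them.

```python
# Figures taken from the example above; variable names are illustrative.
targets = 15_000        # size of the mined target list
targeted_rate = 0.15    # participation rate achieved by the targeted campaign
typical_rate = 0.005    # participation rate typically experienced (0.5%)

# Participants actually obtained from the targeted list
participants = round(targets * targeted_rate)

# Participants the same list size would yield at the typical rate
baseline_participants = round(targets * typical_rate)

# How many times higher the targeted rate is than the typical rate
lift = round(targeted_rate / typical_rate, 2)

print(participants, baseline_participants, lift)
```

On these numbers the targeted list yields thirty times the participation that a list of the same size would yield at the typical rate.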
Data Mining Process
[Diagram: Data → Selection → Target Data → Preprocessing → Preprocessed Data → Transformation → Transformed Data → Data Mining → Patterns → Evaluation]
• Data mining is the data-driven discovery and modeling of hidden patterns in large volumes of data
• It is an iterative and interactive process with many decision points
• The basic steps include:
1. Develop an understanding of the area of focus and the data, then define goals for the analysis
2. Create a target data set on which to begin discovery
3. Clean (preprocess) the data to be reviewed, e.g. decide how to handle missing fields, time sequences, etc.
4. Find useful features to represent the data, e.g. reduce the number of dimensions to be analyzed (data mining)
5. Identify which data mining analytical tools are best suited to the analysis (data mining)
6. Begin the exploratory analysis, refining the algorithms and methods to be used (data mining)
7. Search for patterns of interest in particular representations, e.g. classification rules or trees, regression and clustering (data mining)
8. Interpret the mined data patterns and proceed with additional iterations, as required (data mining)
9. Act on the insight discovered
• The process of data mining can involve many iterations and many loops between any two steps
Example
A bank wanted to identify opportunities to obtain a “greater share of their customers’ wallet.” The Marketing Department wanted to capture
product and service migration patterns within its customer base. It wanted to anticipate its customers’ needs so as to design, advertise and
promote new products proactively, thereby increasing the likelihood of customer retention through their periods of change. The challenge to the
bank’s marketing personnel was to identify which customer activities the bank could monitor in order to anticipate the need for a new product.
After reviewing and analyzing its internal data, the bank personnel felt stymied. The marketing personnel turned to their advertising agency account executive for ideas. The agency observed that the bank’s personnel might be too close to the trees to see the forest, and suggested that outside data mining consultants might be of assistance. With the consultants’ assistance, the bank’s personnel discovered that a not insignificant number of their customers, as they became empty nesters, ventured into their own small businesses. These entrepreneurs needed higher-margin commercial banking products and services, but the commercial side of the bank had no visibility into that need. With input from its agency, the bank created several migration promotion pieces to send to customers suspected to need these commercial products. The effort proved successful in retaining valued customers and selling new services.
To identify the opportunity, the bank needed to go outside the data within its institution. It needed to acquire certain databases and to apply the
information acquired against that which it had internally to define its opportunity. Here is how they went about it.
1. Information from a number of systems within the bank was duplicated and stored in a data warehouse. Transaction data for the last several
years from the checking account systems, the mortgage loan systems, the trust account systems, etc. were loaded into the data warehouse for a
group of over 100,000 customers. (Not all customers had data in all the systems reviewed.)
2. Demographic, life-style, psychographic, behavioral, and credit database information was acquired from third parties for the customers in the
group being analyzed. This information was put into the data warehouse.
3. Once all the data was assembled, a complex analysis, consisting in part of classification and regression decision trees, link analysis, neural networks and cluster detection, profiled the customers’ behavior.
4. The bank learned, among other things, that if a customer was over 40 years of age, was an empty nester, had an average income over the last three years of greater than $75,000 and had drawdowns on his or her equity line account, the likelihood was strong that the customer was starting a small business. These customers were obtaining their business banking services from somewhere, and in no case from the bank.
The agency was tasked to do market research to confirm what the data analysis indicated. It confirmed through focus groups that these empty nesters were indeed becoming entrepreneurs with their newfound freedom, both personal and financial. With input from the agency, the bank crafted a business banking package for these types of individuals along with collateral promotion material. The bank now sends this material to customers who fit the “emerging entrepreneur” profile, and wins praise from its satisfied customers.
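The profile described in step 4 can be written down as a simple rule. The sketch below is only an illustration: the `Customer` record, its field names and the sample customers are assumptions, not the bank's actual data model, and "had drawdowns" is taken to mean at least one drawdown.

```python
from dataclasses import dataclass

@dataclass
class Customer:
    # Hypothetical fields standing in for the warehouse data described above
    age: int
    empty_nester: bool
    avg_income_3yr: float        # average annual income over the last three years, USD
    equity_line_drawdowns: int   # number of drawdowns on the equity line account

def fits_emerging_entrepreneur_profile(c: Customer) -> bool:
    """Rule from step 4: over 40, empty nester, income > $75,000, equity line drawdowns."""
    return (
        c.age > 40
        and c.empty_nester
        and c.avg_income_3yr > 75_000
        and c.equity_line_drawdowns > 0
    )

# Made-up customers for illustration
customers = [
    Customer(age=52, empty_nester=True, avg_income_3yr=98_000, equity_line_drawdowns=3),
    Customer(age=35, empty_nester=False, avg_income_3yr=120_000, equity_line_drawdowns=1),
    Customer(age=47, empty_nester=True, avg_income_3yr=60_000, equity_line_drawdowns=2),
]
mailing_list = [c for c in customers if fits_emerging_entrepreneur_profile(c)]
print(len(mailing_list))
```

A rule of this shape is the kind of output a classification tree produces: each condition is one test along a path from the root to a leaf.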
Data Mining Activities
Directed data mining has a goal of using the available data to build a model that describes one particular variable of interest
in terms of the rest of the available data. There are three directed data mining activities.
Classification – Classification consists of defining classes within the data and assigning each record a class code. The activity is to build a model that can be applied to unclassified data in order to classify it. Examples include: assigning key words to articles as they come off the newswire; classifying credit applications as low, medium, or high risk; categorizing types of waste in a manufacturing process; assigning customers to predefined customer segments. In these examples we have a limited number of classes, and we expect to assign any record to one or another of them.
Estimation – Classification deals with discrete outcomes: yes or no; debit card, mortgage or car loan. Estimation deals with continuously valued outcomes. Given some input data, we use estimation to come up with a value for some unknown continuous variable such as income, height or credit card balance. Estimation is often used to perform a classification task. A bank trying to decide to whom it should offer a home equity loan might run all its customers through a model that gives each of them a score, such as a number between 0 and 10. The classification task then comes down to establishing a threshold score: anyone with a score greater than the threshold will receive an offer. Other tasks that can be cast as estimation include figuring out which customers will stop being customers, estimating the number of children in a family, estimating a family’s total household income, and estimating the value of a piece of real estate.
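The step from an estimated score to a classification is a single comparison against a threshold. The sketch below assumes a hypothetical model has already produced a score between 0 and 10 for each customer; the customer labels, scores and threshold value are made up for illustration.

```python
def offers_from_scores(scores: dict[str, float], threshold: float) -> list[str]:
    """Return the customers whose estimated score is strictly greater than the threshold."""
    return sorted(name for name, score in scores.items() if score > threshold)

# Hypothetical model output: a score between 0 and 10 for each customer
scores = {"A": 8.2, "B": 3.1, "C": 6.5, "D": 6.4}

# Customers scoring above the threshold receive the home equity loan offer
offers = offers_from_scores(scores, threshold=6.4)
print(offers)
```

Raising or lowering the threshold changes how many offers go out without rebuilding the model.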
Prediction – Prediction is frequently thought of as classification or estimation. The difference is one of emphasis. When data mining is used to classify a phone line as primarily used for home-business purposes or a credit card transaction as fraudulent, we do not expect to go back later to see if the classification was correct. Our classification may be correct or not, but the uncertainty is due only to incomplete knowledge. In the real world the relevant actions have already taken place: the phone line is or is not used primarily for business purposes, and the credit card transaction is or is not fraudulent. With enough time and effort we could find the outcome. Predictive tasks are different because the outcome has not yet occurred: we predict future behavior or estimate a future value, then make the classification, and its accuracy can only be checked by waiting to see what happens. Prediction works by building a model from historical data that explains the currently observed behavior. The model is then applied to current inputs, and the result is predicted behavior.
Example
A retail store and catalog house sent out well over 150 million catalogs each year. In one year the company set out to reduce the cost it
expended in its catalog business, while it increased the gross margin dollars generated from its catalog sales. To determine how to
achieve this seemingly contradictory goal, the personnel in the Marketing Department summoned the advertising agency account
executive, the catalog distribution company account representative and the strategy consulting firm’s representative. After receiving all
the learned inputs, the Marketing Department personnel decided that collectively they did not know enough about their customers’ purchase behavior to develop a solution. It was decided to undertake a data mining exercise to determine if the company could garner the “right” insights into its customers so as to meet its dual goal.
The data mining effort revealed three important observations: 1) store sales data showed that certain stock items were typically purchased together, resulting in a higher revenue/margin ticket; 2) stock items purchased from catalogs varied regionally; and 3) there was a cluster of people who received the catalog but never purchased from it. Armed with this information, the catalog planners reformatted the catalog in two ways. First, the catalog contents were varied across a number of defined regions within the United States, with the net result being eight different catalogs (70% of the stock was the same) with fewer pages in each. Second, page layouts were changed to position stock items that were typically purchased together in the store in close proximity on the catalog pages. Additionally, the mailing list was culled of a number of no-purchase names. Here is how this was accomplished.
1. Sales tickets from the catalog sales for the last three years were analyzed. This analysis looked at individual customer purchases
individually and over time. This purchase behavior was contrasted with purchase behavior revealed in an analysis of store tickets from
the retail operations. The same purchase linkage between stock items was not seen in the catalog sales data. (Purchase linked stock items
are placed next to each other in the retail stores.)
2. The analysis of the sales tickets was also completed on a geographic basis using mapping technology. Once the analysis was
complete, it was clear that certain stock items were moving in different parts of the country. But the most interesting aspect of the findings was that this was not solely due to expected regional preferences, e.g. shorts in the South and mittens in the North. It seemed the customer
demographics were different. This characteristic was identified by applying socio-economic factors to the customer data. This discovery
provided useful input for the advertising agency so as to adjust the messaging in different sectors of the country.
3. The most difficult decision to make was that of deleting names from the catalog mailing list. Again, the mailing lists were analyzed
against the socio-economic data collected and the purchase mapping data. The mailing list was culled of approximately 10% of the
names.
The net result of these efforts was that the company was able to achieve its seemingly contradictory goal. Catalog costs were decreased
and gross margin dollars generated were increased.
Data Mining Activities
In undirected data mining no variable is singled out as the target; the goal is to establish some relationship
among all the variables.
Affinity Grouping or Association Rules – The activity of affinity grouping determines which things go together. It is most frequently used by retailers to determine what things go together in a shopping cart. This allows retailers to plan the arrangement of items on store shelves or in a catalog so that items often purchased together will be seen together. Affinity grouping can also be used to identify cross-selling opportunities and to design attractive packages or groupings of products or services.
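A minimal sketch of how "which things go together" can be measured uses two standard quantities: support (how often a pair of items appears together) and confidence (how often Y appears given that X does). The baskets and the two cut-off values below are made up for illustration; real tools use more efficient algorithms on far larger data.

```python
from itertools import permutations

# Hypothetical shopping carts; each set is one transaction
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "cereal"},
    {"bread", "butter", "jam"},
]

def count(items: set[str]) -> int:
    """Number of baskets that contain every item in `items`."""
    return sum(items <= basket for basket in baskets)

def support(x: str, y: str) -> float:
    """Fraction of all baskets that contain both x and y."""
    return count({x, y}) / len(baskets)

def confidence(x: str, y: str) -> float:
    """Of the baskets containing x, the fraction that also contain y."""
    return count({x, y}) / count({x})

all_items = sorted(set().union(*baskets))

# Keep rules "x -> y" that are both frequent enough and reliable enough
rules = [
    (x, y)
    for x, y in permutations(all_items, 2)
    if support(x, y) >= 0.4 and confidence(x, y) >= 0.7
]
for x, y in rules:
    print(f"{x} -> {y}: support={support(x, y):.2f}, confidence={confidence(x, y):.2f}")
```

Note that a rule and its reverse can differ in confidence: here everyone who buys jam buys bread, but only half of bread buyers buy jam.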
Clustering – Clustering is the activity of segmenting a diverse group into a number of more similar subgroups or clusters. What distinguishes clustering from classification is that clustering does not rely on predefined classes. Records are grouped together based on self-similarity. A particular cluster of symptoms might indicate a particular disease. Dissimilar clusters of video and music purchases might indicate membership in different subcultures. Clustering is often done as a prelude to some other form of data mining or modeling. For example, clustering might be the first step in a market segmentation effort. Instead of trying to come up with a one-size-fits-all rule for “what kind of promotion do customers respond to best?”, first divide the customer base into clusters of people with similar buying habits, and then ask what kind of promotion works best for each cluster.
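One common way to group records by self-similarity is k-means, sketched below in plain Python. The text does not say which clustering method is meant, so this is an assumption for illustration; the customer data (purchases per year, average basket in dollars) and the starting centroids are made up, and the features are left unscaled for brevity.

```python
from math import dist

def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then recompute centroids."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # New centroid = mean of the cluster's points; keep the old one if a cluster is empty
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
    return centroids, clusters

# Hypothetical customers: (purchases per year, average basket in dollars)
points = [(2, 20), (3, 25), (2, 22), (20, 15), (22, 18), (21, 16)]

# Start from two of the points as initial centroids (an arbitrary choice)
centroids, clusters = kmeans(points, centroids=[points[0], points[3]])
for centroid, cluster in zip(centroids, clusters):
    print(centroid, cluster)
```

No class labels are supplied; the two groups (occasional and frequent buyers) emerge from the data alone, which is what separates clustering from classification.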
Description and Visualization – Data visualization is one powerful form of descriptive data mining. It is not always easy to come up with meaningful visualizations, but the right picture really can be worth thousands of association rules. For example, the statement “women support Democrats in greater numbers than do men” is a simple description of what is in the data, yet it provokes tremendous interest.
Example
A magazine publisher produces its magazines in a number of plants in the United States, which operate under
contract with the publisher. The publisher purchases paper in one-ton rolls and has it shipped from the mills to the
plants for conversion into magazines. Over the course of a year thousands of rolls of paper are consumed. The
publisher had an interest in reducing the paper waste within the conversion process. One of the key reasons the
publisher was sure that paper waste could be reduced substantially was the wide range of performance among the
printing plants.
The publisher's systems had vast quantities of extensive data. The publisher keeps detailed performance data on
every press run at every plant and on every roll of paper consumed in the process. The data is so extensive that a
page in a magazine can be traced to the section of the particular tree used to provide the pulp for the paper. But,
even with its sophisticated information and systems, the publisher did not have ready access to the explanation of
the uneven performance among the plants.
Through the effective application of data mining activities the publisher was able to define what best production
practices should be employed to reduce paper waste. The first step in the methodology was to put bounds around
what types of waste could be impacted by a change in practices, and to estimate to what extent the changes could
realistically reduce waste. The publisher's data showed approximately seven different categories of waste. Within these categories, a distinction was made between addressable and nonaddressable waste. Addressable waste is avoidable waste. Once the addressable waste had been identified, rules governing it were defined from the bottom up using decision trees and association rules.
The results of the analyses provided information the company required to develop the changes to current production
practices. It also provided the publisher with the means to achieve a significant reduction in the annual waste levels.
Different Techniques For Different Activities
Activities: Classification, Clustering, Affinity Groups, Prediction
Techniques: Standard Statistics, Association Rules, Memory Based Reasoning, Genetic Algorithms, Link Analysis, Decision Trees, Neural Networks
• An activity represents a desired outcome from our data mining effort. In order to achieve the outcome we use different techniques. An example of a prediction activity is to determine which customers are likely to churn based on their cellular phone usage, payment history and credit status.
• A technique is a group of mathematical operations, models or toolsets that have proven successful in data analysis, prediction, classification and clustering. An example of a link analysis technique is to use a tool to model patient referrals between physicians to identify those engaged in illegal “daisy-chain” schemes for insurance fraud. Each physician becomes a node in the link analysis and the referrals represent the links between the nodes.
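The physician example can be sketched as a small directed graph. The text does not define what pattern marks a “daisy-chain” scheme, so the sketch assumes, purely for illustration, that it shows up as a closed loop of referrals; the physicians and referrals are made up.

```python
# Hypothetical referral data: physician -> physicians they refer patients to
referrals = {
    "Dr. A": ["Dr. B"],
    "Dr. B": ["Dr. C"],
    "Dr. C": ["Dr. A", "Dr. D"],
    "Dr. D": [],
    "Dr. E": ["Dr. D"],
}

def in_referral_loop(start: str) -> bool:
    """True if following referral links from `start` eventually leads back to `start`."""
    stack, seen = list(referrals.get(start, [])), set()
    while stack:
        node = stack.pop()
        if node == start:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(referrals.get(node, []))
    return False

# Physicians (nodes) that sit on a closed chain of referrals (links)
loop_members = sorted(p for p in referrals if in_referral_loop(p))
print(loop_members)
```

A loop alone is not evidence of fraud; in practice it would only flag a group of nodes for closer review.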
Techniques Used in Data Mining
Memory Based Reasoning - A technique used to make predictions based on past behavior. Unknown records are matched with similar records whose outcomes are known in order to predict the behavior of the unknown records. Memory based reasoning is good for fraud applications where new cases are similar to old cases.
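A minimal sketch of this matching idea is a nearest-neighbor vote: find the known cases most similar to the new record and take the majority outcome. The known cases, the two features and the choice of three neighbors are assumptions made for illustration; a real application would also scale the features so that one does not dominate the distance.

```python
from collections import Counter
from math import dist

# Hypothetical known cases: (transaction amount in USD, hour of day) -> known outcome
known_cases = [
    ((25.0, 14), "legitimate"),
    ((40.0, 10), "legitimate"),
    ((30.0, 19), "legitimate"),
    ((900.0, 3), "fraud"),
    ((750.0, 2), "fraud"),
    ((820.0, 4), "fraud"),
]

def predict(record, k=3):
    """Label a new record by majority vote among its k most similar known cases."""
    neighbors = sorted(known_cases, key=lambda case: dist(record, case[0]))[:k]
    votes = Counter(outcome for _, outcome in neighbors)
    return votes.most_common(1)[0][0]

print(predict((800.0, 3)))
print(predict((35.0, 12)))
```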
Link Analysis - A technique based on graph theory. Data sets are represented by nodes on a graph
and the relationships between the nodes are represented by arcs. Parametric measurements can be
assigned to nodes and arcs to illustrate metrics associated with the node or the relationship.
Decision Trees - There are two main types of decision trees. Classification trees label records and assign them to the proper class; they can also provide a measure of confidence that the classification is correct. Regression trees estimate the value of a target variable that takes on numeric values. So, a regression tree might calculate the amount that a donor will contribute or the expected size of claims made by an insured person. In a standard tree every split is a test on a single variable, so a single split cannot directly express a rule that relates two variables to each other. Decision trees are often chosen for their ability to generate understandable rules.
Neural Networks – Neural networks are a good choice for most classification and prediction tasks when the results of the model are more important than understanding how the model works. Neural networks actually represent complex mathematical equations, with lots of summations, exponential functions and many parameters. These equations describe the neural network, but are quite opaque to human eyes. The equation is the rule of the network, and it is not suitable for customers’ consumption.
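The “summations and exponential functions” can be made concrete with a very small network. The sketch below assumes one hidden layer of two units with logistic (sigmoid) activations; the weights and inputs are arbitrary illustrative numbers, not a trained model.

```python
from math import exp

def sigmoid(z: float) -> float:
    """Logistic function: squashes any real number into the range (0, 1)."""
    return 1.0 / (1.0 + exp(-z))

def forward(inputs, hidden_weights, hidden_biases, output_weights, output_bias):
    """One hidden layer: weighted sums passed through sigmoids, then one more of each."""
    hidden = [
        sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)
        for weights, bias in zip(hidden_weights, hidden_biases)
    ]
    return sigmoid(sum(w * h for w, h in zip(output_weights, hidden)) + output_bias)

# Arbitrary illustrative parameters; a trained network would learn these from data
score = forward(
    inputs=[0.5, 1.0],
    hidden_weights=[[0.8, -0.4], [-0.3, 0.9]],
    hidden_biases=[0.1, -0.2],
    output_weights=[1.2, -0.7],
    output_bias=0.05,
)
print(round(score, 3))
```

Even at this size the output is a nested formula of sums and exponentials whose individual parameters carry no obvious business meaning, which is the opacity the slide describes.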
Example
A large national distributor with over 300 branches and 8 distribution centers was experiencing rapid growth, shrinking gross
margins and ballooning inventories. The company had more than tripled its size in less than three years, primarily through
acquisitions. As a result of its rapid growth, the company’s managers were having difficulty controlling the broad scope of the
company’s business both from a growth perspective and from an inventory management perspective. Because its major growth
had come via acquisitions, its IT department was challenged by seven different legacy systems. The answer, they thought, lay in a new enterprise solution, but the benefits of the proposed system were years away. In the meantime, the managers needed better, more timely information to operate the business.
The company utilized data mining activities and techniques in conjunction with other technologies to create an innovative solution that was deployed and yielding benefits within six months. First, the company addressed the challenge of getting data from its seven legacy systems in a timely and accurate manner. Monthly reports did not reach the managers until 45 to 60 days after the end of the period. The task was made more challenging by nomenclature and structure inconsistencies across the various data files used in the seven separate systems, e.g. different SKU numbers across the stock master record files for the same item.
Many man-years of effort were originally estimated to bring the stock record files into a consistent format and nomenclature.
Data mining tools were employed to complete a data extraction and analysis to define the range of possibilities of differing
SKU numbers for like-items within the stock master of over 160,000 SKUs. Once the task was defined, custom data sorting and
matching applications were run to filter and screen the data. To reduce the manpower required to complete this work,
applications were used in the majority of the instances to classify similar SKUs and to align and assign new SKU numbers.
This transformation of old SKU to new SKU was automated as a part of the recurring extraction and loading process. An operational data store was created after the data was filtered and cleaned. The operational data store was updated weekly (this could have been done daily) with information on the SKUs and with transaction data from the POS systems for sales and inventory transactions. Following right behind this effort was the extracting, transforming and loading of data related to the budget (down to the detailed branch level) and transaction activity from the general ledger, human resource and other primary applications. The operational data store became the central information source from which managers kept up with the business.
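The old-SKU-to-new-SKU alignment can be pictured with a small sketch. The example does not say how like items were matched, so the approach below (matching on a normalized item description) is an assumption for illustration only; the system names, SKU numbers, descriptions and the new numbering scheme are all made up.

```python
import re

# Hypothetical stock master rows from different legacy systems: (system, old SKU, description)
stock_master = [
    ("SYS1", "A-1001", "Copper Pipe 1/2 in x 10 ft"),
    ("SYS2", "77-204", "COPPER PIPE, 1/2 IN X 10 FT"),
    ("SYS3", "CP0510", "copper pipe 1/2 in x 10 ft "),
    ("SYS1", "A-2040", "PVC Elbow 90 deg 3/4 in"),
    ("SYS2", "81-117", "PVC ELBOW 90 DEG 3/4 IN"),
]

def normalize(description: str) -> str:
    """Lower-case, drop punctuation other than '/', and collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9/ ]", " ", description.lower())
    return " ".join(cleaned.split())

new_sku_by_key: dict[str, str] = {}              # normalized description -> new SKU
old_to_new: dict[tuple[str, str], str] = {}      # (system, old SKU) -> new SKU
for system, old_sku, description in stock_master:
    key = normalize(description)
    if key not in new_sku_by_key:
        # Assign the next new SKU number to a description not seen before
        new_sku_by_key[key] = f"N{len(new_sku_by_key) + 1:06d}"
    old_to_new[(system, old_sku)] = new_sku_by_key[key]

print(old_to_new)
```

Once built, a mapping table like `old_to_new` is what lets the translation run automatically on every recurring extraction and load.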
Getting Started
• Each of the cases presented demonstrates how data mining can yield substantial benefits to businesses. These benefits can be at the top line, from increased revenue or an expanded customer base. As seen in the examples, benefits can also come from cost savings and enhanced productivity.
• Getting started typically involves a pilot project. This project should be short and well planned. The project should be tightly focused on one very specific business need and involve business managers up front and throughout the project.
• As you may guess from the descriptions on the previous slides, these are not “overnight” projects. They can typically consume three or more months of elapsed time and require a number of people from different business functions to be involved. Because of the cross-functional personnel requirements, it is always good to have a very senior member of the management team as the project sponsor and champion.
• Before starting a data mining project, be sure to evaluate the capabilities of the personnel assigned to the project. The skill sets required for a successful data mining project are not those typically found in the IT department. As such, the IT department frequently underestimates the effort and technical support required to manipulate the large volumes of data.
Where To Find Out More
• The success stories of Berling Associates’ technology partner APower Solutions, Inc. are profiled in these authoritative best-selling books on data mining
• Please check out their website: http://www.apower.com