DATA MINING - University of Nebraska–Lincoln


Data Mining
David L. Olson
James & H.K. Stuart Professor in MIS
University of Nebraska Lincoln
Korea Telecom: KM1 Data Mining
David L. Olson
Definition
• DATA MINING: exploration & analysis
– by automatic means
– of large quantities of data
– to discover actionable patterns & rules
• Data mining is a way to utilize the massive quantities of data that businesses generate
Political Data Mining
Grossman et al., 10/18/2004, Time, 38
• 2004 Election
– Republicans: VoterVault
• From mid-1990s
• About 165 million voters
• Massive get-out-the-vote drive for those expected to vote Republican
– Democrats: Demzilla
• Also about 165 million voters
• Names typically have 200 to 400 information items
Medical Diagnosis
J. Morris, Health Management Technology, Nov 2004, 20, 22-24
• Electronic Medical Records
– Associated Cardiovascular Consultants
• 31 physicians
• 40,000 patients per year, southern NJ
– Data mined to identify efficient medical practice
– Enhanced patient outcomes
– Reduced medical liability insurance
Mayo Clinic
Swartz, Information Management Journal, Nov/Dec 2004, 8
• IBM developed EMR program
– Complete records on almost 4.4 million patients
– Doctors can ask how the last 100 Mayo patients with the same gender, age, and medical history responded to particular treatments
Retail Outlets
• Bar coding & scanning generate masses of data
– customer service
– inventory control
– MICROMARKETING
– CUSTOMER PROFITABILITY ANALYSIS
– MARKET BASKET ANALYSIS
FINGERHUT
• Founded 1948
– today sends out 130 different catalogs to over 65 million customers
– 6-terabyte data warehouse
– 3000 variables on its 12 million most active customers
– over 300 predictive models
• Focused marketing
Fingerhut
• Purchased by Federated Department Stores for $1.7 billion in 1999 (for its database)
• Fingerhut had a $1.6 to $2 billion business per year, targeted at lower-income households
• Can mail 400,000 packages per day
• Each product line has its own catalog
Fingerhut
• Uses segmentation, decision tree, regression, and neural network tools from SAS and SPSS
• Segmentation: combines order & demographic data with product offerings
– can target mailings to greatest payoff
• customers who had recently moved tripled their purchasing in the 12 weeks after the move
• send furniture, telephone, decoration catalogs
Data for SEGMENTATION
(grocery and dine out are indices)
cluster  subject  age  income   marital  grocery  dine out  savings
         1001     53    80000   wife     180       90       30000
         1002     48   120000   husband  120      110       20000
         1003     32    90000   single    30      160        5000
         1004     26    40000   wife      80       40           0
         1005     51    90000   wife     110       90       20000
         1006     59   150000   wife     160      120       30000
         1007     43   120000   husband  140      110       10000
         1008     38   160000   wife      80      130       15000
         1009     35    70000   single    40      170        5000
         1010     27    50000   wife     130       80           0
Initial Look at Data
• Want to know features of those who spend a lot dining out
• INCLUDE AS MANY ACTIONABLE VARIABLES AS POSSIBLE
– things you can identify
• Manipulate data
– sort on most likely indicator (dine out)
Sorted by Dine Out
(grocery and dine out are indices)
cluster  subject  age  income   marital  grocery  dine out  savings
         1004     26    40000   wife      80       40           0
         1010     27    50000   wife     130       80           0
         1001     53    80000   wife     180       90       30000
         1005     51    90000   wife     110       90       20000
         1002     48   120000   husband  120      110       20000
         1007     43   120000   husband  140      110       10000
         1006     59   150000   wife     160      120       30000
         1008     38   160000   wife      80      130       15000
         1003     32    90000   single    30      160        5000
         1009     35    70000   single    40      170        5000
Analysis
• Best indicators
– marital status
– groceries
• Available
– marital status might be easier to get
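The sort-and-compare step can be sketched in a few lines of Python. This is a minimal illustration using the ten subjects from the segmentation data, not the tool used on the slides; the tuple layout is an assumption made for the example. It sorts on the dine-out index and then averages that index by marital status.

```python
from collections import defaultdict

# (subject, marital status, grocery index, dine-out index) from the segmentation data
subjects = [
    (1001, "wife", 180, 90), (1002, "husband", 120, 110),
    (1003, "single", 30, 160), (1004, "wife", 80, 40),
    (1005, "wife", 110, 90), (1006, "wife", 160, 120),
    (1007, "husband", 140, 110), (1008, "wife", 80, 130),
    (1009, "single", 40, 170), (1010, "wife", 130, 80),
]

# Sort on the most likely indicator (dine-out index), lowest first.
# sorted() is stable, so ties keep their original subject order.
by_dine_out = sorted(subjects, key=lambda s: s[3])

# Average dine-out index for each marital status
groups = defaultdict(list)
for _, marital, _, dine_out in subjects:
    groups[marital].append(dine_out)
avg_dine_out = {m: sum(v) / len(v) for m, v in groups.items()}

print([s[0] for s in by_dine_out])
print(avg_dine_out)
```

The sorted order matches the "Sorted by Dine Out" table, and singles average 165 on the dine-out index against 110 for husbands and about 92 for wives, which is the pattern the Analysis slide points to.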
Fingerhut
• Mailstream optimization
– which customers are most likely to respond to existing catalog mailings
– saved nearly $3 million per year
– reversed the catalog sales industry trend in 1998
– reduced mailings by 20% while increasing net earnings to over $37 million
Banking
• Among the first users of data mining
• Used to find out what motivates their customers (to reduce churn)
• Loan applications
• Target marketing
• Norwest: 3% of customers provided 44% of profits
• Bank of America: program cultivating the top 10% of customers
CREDIT SCORING
Bank Loan Applications
Age  Income  Assets  Debts   Want   On-time
24   55557    27040   48191   1500  1
20   17152    11090   20455    400  1
20   85104        0   14361   4500  1
33   40921    91111   90076   2900  1
30   76183   101162  114601   1000  1
55   80149   511937   21923   1000  1
28   26169    47355   49341   3100  0
20   34843        0   21031   2100  1
20   52623        0   23054  15900  0
39   59006   195759  161750    600  1
Characteristics of Not On-time
Age  Income  Assets  Debts  Want   On-time
28   26169   47355   49341   3100  0
20   52623       0   23054  15900  0
Here, debts exceed assets
Age: young
Income: low
BETTER: base on statistics from a large sample;
supplement data with other relevant variables
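Why a large sample matters can be checked directly. The Python sketch below (an illustration, not part of the original material) applies the rule read off the two late payers, "debts exceed assets", to all ten loan applications and counts how many applicants it flags.

```python
# (age, income, assets, debts, want, on_time) for the ten bank loan applications
applications = [
    (24, 55557, 27040, 48191, 1500, 1),
    (20, 17152, 11090, 20455, 400, 1),
    (20, 85104, 0, 14361, 4500, 1),
    (33, 40921, 91111, 90076, 2900, 1),
    (30, 76183, 101162, 114601, 1000, 1),
    (55, 80149, 511937, 21923, 1000, 1),
    (28, 26169, 47355, 49341, 3100, 0),
    (20, 34843, 0, 21031, 2100, 1),
    (20, 52623, 0, 23054, 15900, 0),
    (39, 59006, 195759, 161750, 600, 1),
]

ASSETS, DEBTS, ON_TIME = 2, 3, 5  # tuple positions

# Rule taken from the two late payers: flag anyone whose debts exceed assets
flagged = [a for a in applications if a[DEBTS] > a[ASSETS]]
late_flagged = sum(1 for a in flagged if a[ON_TIME] == 0)
late_total = sum(1 for a in applications if a[ON_TIME] == 0)

print(f"flagged {len(flagged)} applicants; {late_flagged} of {late_total} late payers among them")
```

The rule catches both late payers but also flags five applicants who paid on time, so a rule drawn from two cases would reject mostly good customers.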
CHURN
• Customer turnover
• critical to:
– telecommunications
– banks
– human resource management
– retailers
Identify characteristics of those who leave
Age      Time-job  Time-town  min bal  checking
(years)  (months)  (months)   ($)
27        12       12          549     x
41        18       41         3259     x
28         9       15          286     x
55       301        5         2854     x
43        18       18         1112     x
29         6        3            0     x
38        55       20          321     x
63       185        3         2175     x
26        15       15          386     x
46        13       12         1187     x
37        32       25         1865     x
savings, card, loan: 12 x marks under savings and card, 5 under loan (row positions not recoverable)
Analysis
• What are the characteristics of those who leave?
– Correlation analysis
• Which customers do you want to keep?
– Customer value: net present value of the customer to the firm
Correlation
          Age   Time Job  Time Town  Min-Bal  Check  Saving  Card  Loan
Age       1.0
Job       0.6    1.0
Town      0.4    0.9       1.0
Min-Bal  -0.4   -0.6      -0.5        1.0
Check     0.0    0.1      -0.1       -0.2      1.0
Saving    0.4    0.6       0.3        0.3      0.5    1.0
Card      0.2    0.9       0.5        0.6      0.2    0.9     1.0
Loan      0.3   -0.2       0.4       -0.1      0.2    0.3     0.5   1.0
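A correlation matrix of this kind can be computed with NumPy's `np.corrcoef`. The sketch below is illustrative only: it uses just the four numeric columns of the churn table (age, time on job, time in town, minimum balance), so its values are not expected to reproduce the matrix above, which also covers the product indicators.

```python
import numpy as np

# Columns: age (years), time on job (months), time in town (months), minimum balance ($)
churn = np.array([
    [27, 12, 12, 549], [41, 18, 41, 3259], [28, 9, 15, 286],
    [55, 301, 5, 2854], [43, 18, 18, 1112], [29, 6, 3, 0],
    [38, 55, 20, 321], [63, 185, 3, 2175], [26, 15, 15, 386],
    [46, 13, 12, 1187], [37, 32, 25, 1865],
], dtype=float)
labels = ["Age", "Job", "Town", "Min-Bal"]

# rowvar=False: each column is a variable, each row is one customer
corr = np.corrcoef(churn, rowvar=False)

# Print the lower triangle, laid out like the slide
for i, name in enumerate(labels):
    print(f"{name:8}" + " ".join(f"{corr[i, j]:5.1f}" for j in range(i + 1)))
```

Only the lower triangle is printed because a correlation matrix is symmetric with ones on the diagonal.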
Mortgage Market
• Early 1990s: massive refinancing
• need to keep customers happy to retain them
• contact current customers whose rates are significantly higher than the market
– a major change in practice
– data mining & telemarketing increased Crestar Mortgage’s retention rate from 8% to over 20%
Banking
• Fleet Financial Group
– $30 million data warehouse
– hired 60 database marketers, statistical/quantitative analysts & DSS specialists
– expected to add $100 million in profit by 2001
Banking
• First Union
– concentrated on contact points
– previously had very focused product groups with little coordination
– Developed offers for customers
CREDIT SCORING
• Data warehouse including demand deposits, savings, loans, credit cards, insurance, annuities, retirement programs, securities underwriting, and other products
• Statistical & mathematical models (regression) to predict repayment
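The slide does not say which regression model the banks use. Logistic regression is one common choice for a yes/no outcome such as on-time repayment, so the sketch below assumes it. It fits the ten loan applications from the credit scoring example with plain gradient descent in NumPy. The learning rate, iteration count, and small L2 penalty are illustrative assumptions (the penalty keeps weights finite when ten rows happen to separate the classes). Ten rows are far too few for a real scorecard.

```python
import numpy as np

# Ten loan applications: age, income, assets, debts, amount wanted
X_raw = np.array([
    [24, 55557, 27040, 48191, 1500], [20, 17152, 11090, 20455, 400],
    [20, 85104, 0, 14361, 4500], [33, 40921, 91111, 90076, 2900],
    [30, 76183, 101162, 114601, 1000], [55, 80149, 511937, 21923, 1000],
    [28, 26169, 47355, 49341, 3100], [20, 34843, 0, 21031, 2100],
    [20, 52623, 0, 23054, 15900], [39, 59006, 195759, 161750, 600],
], dtype=float)
y = np.array([1, 1, 1, 1, 1, 1, 0, 1, 0, 1], dtype=float)  # 1 = repaid on time

# Standardize so one learning rate suits every column, then add an intercept column
X = np.hstack([np.ones((len(y), 1)), (X_raw - X_raw.mean(axis=0)) / X_raw.std(axis=0)])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_loss(w):
    z = X @ w
    # Mean of log(1 + e^z) - y*z: the logistic loss in a numerically stable form
    return float(np.mean(np.logaddexp(0.0, z) - y * z))

# Gradient descent on log-loss plus an L2 penalty (intercept not penalized)
LEARNING_RATE, L2_PENALTY = 0.1, 0.1  # assumed values for illustration
w = np.zeros(X.shape[1])
for _ in range(5000):
    grad = X.T @ (sigmoid(X @ w) - y) / len(y)
    grad[1:] += L2_PENALTY * w[1:]
    w -= LEARNING_RATE * grad

p_on_time = sigmoid(X @ w)
print("training log-loss:", round(log_loss(w), 3))
print("P(on time):", np.round(p_on_time, 2))
```

In practice a statistics package would fit this model, and its quality would be judged on applicants held out from fitting, not on the training rows.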
CUSTOMER RELATIONSHIP MANAGEMENT (CRM)
• understanding the value a customer provides to the firm
– Kathleen Khirallah, The Tower Group
• Banks will spend $9 billion on CRM by end of 1999
– Deloitte
• only 31% of senior bank executives were confident that their current distribution mix anticipated customer needs
Customer Value
Middle aged (41-55), 3-9 years on job, 3-9 years in town, savings account
Discount rate: 1.3
year  annual purchases  profit  discounted  net
1     1000              200     153         153
2     1000              200     118         272
3     1000              200      91         363
4     1000              200      70         433
5     1000              200      53         487
6     1000              200      41         528
7     1000              200      31         560
8     1000              200      24         584
9     1000              200      18         603
10    1000              200      14         618
Younger Customer
Young (21-29), 0-2 years on job, 0-2 years in town, no savings account
Discount rate: 1.3
year  annual purchases  profit  discounted  net
1      300               60     46           46
2      360               72     43           89
3      432               86     39          128
4      518              104     36          164
5      622              124     34          198
6      746              149     31          229
7      896              179     29          257
8     1075              215     26          284
9     1290              258     24          308
10    1548              310     22          331
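The "net" column in both tables is a running sum of each year's profit divided by 1.3 raised to the year (a 30% discount rate). The 20% profit margin and the younger customer's 20% annual purchase growth are read off the tables, not stated on the slides. The Python sketch below regenerates both tables under those readings.

```python
def customer_value(first_year_purchases, growth, margin=0.2, discount=1.3, years=10):
    """Return rows of (year, purchases, profit, discounted profit, cumulative net value).

    Profit is margin * purchases, purchases are multiplied by `growth` each year,
    and year-t profit is divided by discount ** t.
    """
    rows, net = [], 0.0
    purchases = first_year_purchases
    for year in range(1, years + 1):
        profit = margin * purchases
        discounted = profit / discount ** year
        net += discounted
        rows.append((year, purchases, profit, discounted, net))
        purchases *= growth
    return rows

middle_aged = customer_value(1000, growth=1.0)
younger = customer_value(300, growth=1.2)
for label, rows in (("middle-aged", middle_aged), ("younger", younger)):
    print(label, "10-year net value:", round(rows[-1][4]))
```

The ten-year totals round to the slides' 618 and 331. A few individual rows on the first slide look truncated rather than rounded (year 1 is 153.8), so single rows can differ by one.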
Credit Card Management
• Very profitable industry
• Card surfing: paying off an old balance with a new card
• promotions typically generate 1000 responses, a response rate of about 1%
• in the early 1990s, almost all marketing was mass marketing
• data mining improves response (lift)
LIFT
• LIFT = probability in class within the sample divided by probability in class within the population
– if population probability is 20% and sample probability is 30%, LIFT = 0.3/0.2 = 1.5
• the highest lift is not necessarily best
– need sufficient sample size
– as confidence increases, the list gets longer but lift gets lower
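The LIFT definition translates directly into a small Python function. This is a minimal sketch; the function name is illustrative.

```python
def lift(sample_rate, population_rate):
    """LIFT = probability of being in the class within the sample
    divided by the probability of being in the class in the whole population."""
    if population_rate <= 0:
        raise ValueError("population_rate must be positive")
    return sample_rate / population_rate

# The slide's example: 30% in the sample versus 20% in the population
print(round(lift(0.30, 0.20), 2))  # prints 1.5
```

The result is rounded for display because 0.3/0.2 in binary floating point is 1.4999999999999998. A lift of 1 means the sample does no better than picking from the population at random.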
Lift Example
• Product to be promoted
• Sampled over 10 identifiable segments of the potential buying population
– Profit $50 per item sold
– Mailing cost $1
– Sorted by estimated response rates
Lift Data
Seg  Rate   Rev    Cost  Profit
1    0.042  $2.10  $1     $1.10
2    0.035  $1.75  $1     $0.75
3    0.025  $1.25  $1     $0.25
4    0.017  $0.85  $1    -$0.15
5    0.015  $0.75  $1    -$0.25
6    0.013  $0.65  $1    -$0.35
7    0.009  $0.45  $1    -$0.55
8    0.005  $0.25  $1    -$0.75
9    0.004  $0.20  $1    -$0.80
10   0.001  $0.05  $1    -$0.95
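Choosing how many segments to mail reduces to a running total over the sorted table. The Python sketch below assumes every segment is the same size, which the slides do not state. Under that assumption it accumulates per-piece profit (rate × $50 − $1), tracks the cumulative lift of mailing segments 1 through k, and reports the most profitable cutoff.

```python
# Estimated response rate per segment, sorted best to worst (from the Lift Data table)
rates = [0.042, 0.035, 0.025, 0.017, 0.015, 0.013, 0.009, 0.005, 0.004, 0.001]
PROFIT_PER_SALE = 50.0   # $50 profit per item sold
COST_PER_MAILING = 1.0   # $1 per piece mailed

# Assumption: equal-sized segments, so averages can be taken over segments
overall_rate = sum(rates) / len(rates)

cum_profit, lifts = 0.0, []
best_cutoff, best_profit = 0, float("-inf")
for k, rate in enumerate(rates, start=1):
    cum_profit += rate * PROFIT_PER_SALE - COST_PER_MAILING  # per-piece profit, summed over segments
    lifts.append((sum(rates[:k]) / k) / overall_rate)         # lift of mailing segments 1..k
    if cum_profit > best_profit:
        best_cutoff, best_profit = k, cum_profit
    print(f"segments 1-{k}: cumulative profit ${cum_profit:.2f}, lift {lifts[-1]:.2f}")

print(f"most profitable cutoff: segments 1-{best_cutoff}")
```

Cumulative profit peaks at $2.10 after segment 3, while lift keeps falling as segments are added and reaches 1.0 when the whole population is mailed. This is the "longer list but lower lift" trade-off from the LIFT slide.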
Lift Chart
[Chart: LIFT; cumulative proportion of response (Cum Response) vs. Random, by segment 0 to 10]
Profit Impact
[Chart: PROFIT; Cum Revenue, Cum Cost, and Cum Profit in dollars, by segment 0 to 10]
INSURANCE
• Marketing, as in retailing & banking
• Special:
– Farmers Insurance Group: underwriting system generating millions of dollars in higher revenues and lower claims
• 7 databases, 35 million records
– better understanding of market niches
• lower rates on sports cars, increasing business
Insurance Fraud
• Specialist criminals use multiple personas
• InfoGlide specializes in fraud detection products
– similarity search engine
• links names, telephone numbers, streets, birthdays, and their variations
• identifies 7 times more fraud than exact-match systems
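InfoGlide's similarity search engine is proprietary, so the sketch below uses Python's standard `difflib.SequenceMatcher` as a stand-in to show why similarity matching finds links that exact matching misses. The claimant records, the 0.8 threshold, and the averaging of name, phone, and street scores are all hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical claimant records: (claim id, name, phone, street)
claims = [
    ("C1", "John Smith", "402-555-0101", "12 Oak Street"),
    ("C2", "Jon Smith", "402-555-0101", "12 Oak St"),
    ("C3", "Mary Jones", "402-555-0199", "48 Elm Avenue"),
    ("C4", "J. Smyth", "402-555-0110", "12 Oak Street"),
]

def similarity(a, b):
    """Ratio in [0, 1] from difflib; 1.0 means the strings are identical."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def linked(r1, r2, threshold=0.8):
    # Link two records when the average similarity of name, phone, and street
    # reaches the (hypothetical) threshold
    scores = [similarity(x, y) for x, y in zip(r1[1:], r2[1:])]
    return sum(scores) / len(scores) >= threshold

exact = [(a[0], b[0]) for a, b in combinations(claims, 2) if a[1:] == b[1:]]
fuzzy = [(a[0], b[0]) for a, b in combinations(claims, 2) if linked(a, b)]
print("exact matches:", exact)
print("similarity matches:", fuzzy)
```

Exact matching finds nothing, while similarity matching links the spelling variations of the same persona and leaves the unrelated claimant alone.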
Insurance Fraud - Link Analysis
claim type  amount  physician  attorney
back         50000  Welby      McBeal
neck         80000  Frank      Jones
arm          40000  Barnard    Fraser
neck         80000  Frank      Jones
leg          30000  Schmidt    Mason
multiple    120000  Heinrich   Feiffer
neck         80000  Frank      Jones
back         60000  Schwartz   Nixon
arm          30000  Templer    White
internal    180000  Weiss      Richards
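A basic form of link analysis counts how often the same physician and attorney appear together on claims. The Python sketch below does this for the table above. Treating any pair seen more than once as worth a closer look is an illustrative threshold, not a rule from the slides.

```python
from collections import Counter

# (claim type, amount, physician, attorney) from the link analysis table
claims = [
    ("back", 50000, "Welby", "McBeal"), ("neck", 80000, "Frank", "Jones"),
    ("arm", 40000, "Barnard", "Fraser"), ("neck", 80000, "Frank", "Jones"),
    ("leg", 30000, "Schmidt", "Mason"), ("multiple", 120000, "Heinrich", "Feiffer"),
    ("neck", 80000, "Frank", "Jones"), ("back", 60000, "Schwartz", "Nixon"),
    ("arm", 30000, "Templer", "White"), ("internal", 180000, "Weiss", "Richards"),
]

# Count how often each physician-attorney pair appears on the same claim
pair_counts = Counter((physician, attorney) for _, _, physician, attorney in claims)
suspicious = {pair: n for pair, n in pair_counts.items() if n > 1}
print(suspicious)
```

Only physician Frank and attorney Jones recur, on three identical $80,000 neck claims. That is the kind of link a tool such as NetMap would surface for review.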
Insurance Fraud
• Analytics’ NetMap for Claims
– uses an industry-wide database
– creates a data mart of internal and external data
– unusual activity for specific chiropractors and attorneys
• HNC Insurance Solutions
– workers' compensation fraud
• VeriComp: predictive software (neural nets)
– saved Utah over $2 million
TELECOMMUNICATIONS
• Deregulation: widespread competition
– churn
• 1/3 due to poor call quality, 1/2 due to poor equipment
– wireless performance monitor tracking
• reduced churn by about 61%, $580,000/year
– cellular fraud prevention
– spot problems when cell phones begin to go bad
Telecommunications
• Metapath’s Communications Enterprise Operating System
– helps identify telephone customer problems
• dropped calls, mobility patterns, demographics
• to target specific customers
– reduce subscription fraud
• $1.1 billion
– reduce cloning fraud
• cost $650 million in 1996
Telecommunications
• Churn Prophet, ChurnAlert
– data mining to predict which subscribers will cancel
• Arbor/Mobile
– set of products, including churn analysis
TELEMARKETING
• MCI uses data marts to extract data on prospective customers
– typically a 2-month program
– 20% improvement in sales leads
– multimillion-dollar investment in data marts & hardware
– staff of 45
– trend spotting (which approaches specific customers like)
Telemarketing
• Australian Tourist Commission
– has maintained a database since 1992
• responses to travel inquiries on tours, hotels, airlines, travel agents, consumers
• data mining identifies travel agents & consumers responding to various media
• sales closure rate of 10% and up
• lead lists faxed weekly to productive travel agents
Telemarketing
• Segmentation
– which customers respond to new promotions, to
discounts, to new product offers
– Determine who
• to offer new services to
• is most likely to commit fraud
Human Resource Management
• Identify individuals likely to leave the company without additional compensation or benefits
• The firm may already know that 20% use 80% of offered services
– but not which 20%
– data mining (business intelligence) can identify them
• Use the most talented people in the highest priority (or most profitable) business units
Human Resource Management
• Downsizing
– identify right people, treat them well
– track key performance indicators
– data on talents, company needs, competitor requirements
• State of Mississippi’s MERLIN network
– 30 databases (finance, payroll, personnel, capital projects)
– Cognos Impromptu system
– 230 users
CASINOS
• Casino gaming has one of the richest data sets known
• Harrah’s: incentive programs
– about 8 million customers hold Total Gold cards, used whenever the customer spends money in the casino
– comprehensive data collection
• Trump’s Taj Card similar
Casinos
• Bellagio & Mandalay Bay
– strategy of luxury visits
– child entertainment
– a change from the old strategy of cheap food
• Identify high rollers to cultivate
– identify those to discourage from play
– estimate lifetime value of players
ARTS
• computerized box offices lead to high volumes of data
• Identify potential consumers for shows
• software to manage shows
– similar to airline seating chart software