Achmea - LIACS Data Mining Group


Data mining in Health Insurance
Introduction
• Rob Konijn, [email protected]
– VU University Amsterdam
– Leiden Institute of Advanced Computer Science (LIACS)
– Achmea Health Insurance
• Currently working here
• Delivering leads for other departments to follow up
– Fraud, abuse
• Research topic keywords: data mining / unsupervised learning / fraud detection
Outline
• Intro Application
– Health Insurance
– Fraud detection
• Part 1: Subgroup discovery
• Part 2: Anomaly detection (slides partly
by Z. Slavik, VU)
Intro Application
• Health Insurance Data
• Health Insurance in NL
– Obligatory
– Only private insurance companies
– About 100 euro/month (everyone) + 170 euro (income-dependent)
– Premium increase of 5-12% each year
Achmea: about 6 million customers
Funding of Health Insurance Costs in
the Netherlands
[Flow diagram, translated from Dutch: the government contribution for insured under 18 ('rijksbijdrage verzekerden 18-', 2 bln euro) and the income-dependent employer contribution ('inkomensafhankelijke bijdrage werkgevers', 17 bln) flow into the risk equalization fund ('vereveningsfonds'), which pays an equalization contribution ('vereveningsbijdrage', 18 bln) to the health insurers ('zorgverzekeraars'); insured adults (18+) pay the insurer a nominal premium: calculation premium (~€947/insured, 12 bln) plus surcharge (~€150/insured, 2 bln); the insurers cover 30 bln of healthcare expenditure ('zorguitgaven').]
Risk equalization model ('vereveningsmodel')
• By population characteristics
– Age
– Gender
– Income, social class
– Type of work
• Calculation afterwards
– High-cost compensation (> 15,000 euro)
[Chart data: average amount per insured (euro) by age group, men ('Mannen') vs. women ('Vrouwen')]

  Age group   Men     Women
  0-4 yr      1,400   1,210
  5-9 yr      1,026     936
  10-14 yr      907     918
  15-17 yr      964   1,062
  18-24 yr      892   1,214
  25-29 yr      870   1,768
  30-34 yr      905   1,876
  35-39 yr      980   1,476
  40-44 yr    1,044   1,232
  45-49 yr    1,183   1,366
  50-54 yr    1,354   1,532
  55-59 yr    1,639   1,713
  60-64 yr    1,885   1,905
  65-69 yr    2,394   2,201
  70-74 yr    2,826   2,560
  75-79 yr    3,244   2,886
  80-84 yr    3,349   3,018
  85-89 yr    3,424   3,034
  90+ yr      3,464   3,014
Fraud in healthcare
Introduction Application:
The Data
• Transactional data
– Records of an event
– Visit to a medical practitioner
• Charged directly by medical practitioner
• Patient is not involved
• Risk of fraud
Transactional Data
• Transactions: facts
– Achmea: about 200 mln transactions per year
• Info on customers and practitioners: dimensions
Different levels of hierarchy
• Records represent events
• However, for fraud detection for example, we are interested in customers or medical practitioners
• See examples on the next pages
• Groups of records: Subgroup Discovery
• Individual patients/practitioners: outlier detection
Different types of fraud per hierarchy level
• On a patient level, or on a hospital level:
Handling the different hierarchy levels
• Creating profiles from transactional data (a sketch follows below)
• Aggregating costs over a time period
– Each record: one patient
– Each attribute i = 1 to n: cost spent on treatment i
• Feature construction, for example
– The ratio of long/short consults (G.P.)
– The ratio of 3-way and 2-way fillings (dentist)
– Usually used for one-way analysis
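A minimal pandas sketch of turning transactions into per-patient profiles plus one ratio feature; the toy table, treatment codes, and column names are illustrative assumptions, not the actual Achmea schema:

```python
import pandas as pd

# Toy transactional table: one row per claimed treatment (hypothetical)
tx = pd.DataFrame({
    "patient_id": [1, 1, 2, 2, 2],
    "treatment":  ["V11", "V21", "V11", "V11", "M55"],
    "cost":       [42.0, 15.0, 42.0, 42.0, 21.0],
})

# Profile: one row per patient, total cost per treatment code over the period
profile = tx.pivot_table(index="patient_id", columns="treatment",
                         values="cost", aggfunc="sum", fill_value=0.0)

# Constructed feature, e.g. the ratio between two treatment codes
profile["V11_over_V21"] = profile["V11"] / profile["V21"].where(profile["V21"] > 0)
```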
Different types of fraud detection
• Supervised
– A labeled fraud set
– A labeled non-fraud set
– Credit cards, debit cards
• Unsupervised
– No labels
– Health insurance, cargo, telecom, tax, etc.
Unsupervised learning in Health
Insurance Data
• Anomaly Detection (outlier detection)
– Finding individual deviating points
• Subgroup Discovery
– Finding (descriptions of) deviating groups
• Focus on differences and uncommon behavior
– In contrast to other unsupervised learning methods
• Clustering
• Frequent Pattern mining
Subgroup Discovery
• Goal: Find differences in claim behavior of medical
practitioners
• To detect inefficient claim behavior
– Actions:
• A visit from the account manager
• To include in contract negotiations
– In the extreme case: fraud
• Investigation by the fraud detection department
• By describing deviations of a practitioner from its peers
– Subgroups
Patient-level, Subgroup Discovery
• Subgroup (orange): group of patients
• Target (red)
– Indicates whether a patient visited a practitioner (1),
or not (0)
Subgroup Discovery: Quality Measures
• Target dentist: 1672 patients
– Compare with peer group, 100,000 patients in total
• Subgroup "V11 >= 42 euro": 10,347 patients
– V11: one-sided filling
• Cross table:

                 target dentist      rest      total
   V11 >= 42                871     9,476     10,347
   rest                     801    88,852     89,653
   total                  1,672    98,328    100,000
The cross table
• Cross table in the data (observed) vs. the cross table expected, assuming independence:

   Observed        target dentist      rest      total
   V11 >= 42                871       9,476     10,347
   rest                     801      88,852     89,653
   total                  1,672      98,328    100,000

   Expected        target dentist      rest      total
   V11 >= 42                173      10,174     10,347
   rest                   1,499      88,154     89,653
   total                  1,672      98,328    100,000
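• Under independence, each expected cell equals (row total × column total) / grand total; e.g., expected(V11 >= 42, target dentist) = 10,347 × 1,672 / 100,000 ≈ 173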
Calculating WRAcc and Lift
• Using the observed and expected cross tables above:
• Size of subgroup: P(S) = 0.10347; size of target dentist: P(T) = 0.01672
• Weighted Relative ACCuracy: WRAcc = P(ST) − P(S)P(T) = (871 − 173)/100,000 = 698/100,000
• Lift = P(ST) / (P(S)P(T)) = 871/173 = 5.03
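A minimal numeric check of these two measures, using only the cross-table counts above (Python, with hypothetical variable names):

```python
# Counts from the cross table: subgroup-and-target, subgroup, target, grand total
n_st, n_s, n_t, n = 871, 10347, 1672, 100000

p_s, p_t, p_st = n_s / n, n_t / n, n_st / n
expected = p_s * p_t * n            # expected count under independence, ~173

wracc = p_st - p_s * p_t            # difference: ~698 / 100000
lift = p_st / (p_s * p_t)           # ratio: ~5.03
print(round(expected), round(wracc * n), round(lift, 2))
```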
Example dentistry,
at depth 1, one target dentist
ROC analysis, target dentist
Making SD more useful:
adding prior knowledge
• Adding prior knowledge
– Patient background variables (age, gender, etc.)
– Practitioner specialism
– For dentistry: choice of insurance
• Adding already known differences
– Already detected by domain experts themselves
– Already detected during a previous data mining run
Prior Knowledge, Motivation
Example, influence of prior knowledge
The idea: create an expected cross
table using prior knowledge
Quality Measures
• Ratio (Lift)
• Difference (WRAcc)
• Squared sum (Chi-square statistic)
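As a sketch of how the three measures relate, the snippet below computes each from an observed 2x2 table and an expected table; here the expected table comes from the independence assumption, but a table derived from prior knowledge could be substituted. This is an assumed setup for illustration, not the deck's code:

```python
import numpy as np

# Observed cross table: rows = subgroup / rest, columns = target / rest
observed = np.array([[871.0, 9476.0],
                     [801.0, 88852.0]])
row, col = observed.sum(axis=1), observed.sum(axis=0)
expected = np.outer(row, col) / observed.sum()  # independence model; swap in a
                                                # prior-knowledge table if available

lift = observed[0, 0] / expected[0, 0]                      # ratio
wracc = (observed[0, 0] - expected[0, 0]) / observed.sum()  # difference
chi2 = ((observed - expected) ** 2 / expected).sum()        # squared sum
print(round(lift, 2), wracc, round(chi2, 1))
```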
Example, iterative approach
• Idea: add subgroup to prior knowledge iteratively
• Target = single pharmacy
• Patients that visited the hospital in the last 3 years are removed from the data
• Compare with peer group (400,000 patients); 2929 patients of the target pharmacy
• Top subgroup: “B03XA01 (Erythropoietin) > 0 euro”
                  target pharmacy      rest
   B03XA01 > 0            1,297         224
   rest                   1,632     396,847
Next iteration
• Add “B03XA01 (EPO) > 0 euro” to prior knowledge
• Next best subgroup: “N05AX08 (Risperdal) >= 500 euro”
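A rough sketch of this iterative loop on synthetic data; the column names, thresholds, candidate conditions, and the logistic model used to produce expected counts are all illustrative assumptions, not the deck's actual implementation:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 20000
df = pd.DataFrame({
    "age": rng.integers(0, 90, n),
    "gender": rng.integers(0, 2, n),
    "B03XA01": rng.gamma(1.0, 30.0, n) * (rng.random(n) < 0.05),
    "N05AX08": rng.gamma(1.0, 200.0, n) * (rng.random(n) < 0.03),
})
df["target"] = (rng.random(n) < 0.01).astype(int)  # 1 = patient of target pharmacy

prior_cols = ["age", "gender"]                     # prior knowledge
candidates = {"B03XA01 > 0": df["B03XA01"] > 0,
              "N05AX08 >= 500": df["N05AX08"] >= 500}

for _ in range(len(candidates)):
    # Expected target membership per patient, given current prior knowledge
    model = LogisticRegression(max_iter=1000).fit(df[prior_cols], df["target"])
    p_hat = model.predict_proba(df[prior_cols])[:, 1]
    # Score each candidate by lift: observed vs expected target patients in it
    best = max(candidates,
               key=lambda c: df.loc[candidates[c], "target"].sum()
                             / max(p_hat[candidates[c].to_numpy()].sum(), 1e-9))
    print("top subgroup:", best)
    # Add the found subgroup to the prior knowledge and repeat
    df[best] = candidates.pop(best).astype(int)
    prior_cols.append(best)
```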
Figure describing subgroup:
N05AX08 > 500
Left: target pharmacy, right: other
pharmacies
Addition: adding costs to the quality measure
– M55: dental cleaning
– V11: one-sided filling
– V21: polishing
• Average cost of treatments in the subgroup: 370 euro
• 791 more patients than expected
• Total quality: 791 × 370 ≈ 292,469 euro (figures rounded)
Iterative approach, top 3
subgroups
• V12: two-sided filling
• V21: polishing
• V60: indirect pulp capping
• V21 and V60 are not allowed on the same day
• Claim back (from all dentists): 1.3 million euro
3D isometrics, cost-based QM
Other target types: double binary
target
• Target 1: year: 2009 or 2008
• Target 2: target practitioner
• Pattern:
– M59: extensive (expensive) dental cleaning
– C12: second consult in one year
• Cross table:
Other target types: Multiclass target
• Subgroup (orange): group of patients
• Target (red) is now a multi-value column, one value per dentist
Multiclass target, in ROC Space
Anomaly Detection
The example above contains a
contextual anomaly...
Outline Anomaly Detection
• Anomalies
– Definition
– Types
– Technique categories
– Examples
• Lecture based on
– Chandola et al. (2009). Anomaly Detection: A Survey
– Paper in BB
Definition
• “Anomaly detection refers to the problem of
finding patterns in data that do not conform to
expected behavior”
• Anomalies, a.k.a.
– Outliers
– Discordant observations
– Exceptions
– Aberrations
– Surprises
– Peculiarities
– Contaminants
Anomaly types
• Point anomalies
– A data point is anomalous with respect to the rest of the data
Not covered today
• Other types of anomalies:
– Collective anomalies
– Contextual anomalies
• Other detection approaches:
– Supervised learning
– Semi-supervised
• Assume training data is from normal class
• Use to detect anomalies in the future
We focus on outlier scores
• Scores
– You get a ranked list of anomalies
– “We investigate the top 10”
– “An anomaly has a score of at least 134”
– Leads are followed up by fraud investigators
• Labels
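A trivial sketch of how such a ranked score list is consumed; the scores are hypothetical, and the 134 threshold merely echoes the slide's example:

```python
import numpy as np

scores = np.array([12.0, 98.5, 134.2, 7.3, 251.0])  # hypothetical outlier scores
ranked = np.argsort(scores)[::-1]                   # most anomalous first
top_k = ranked[:10]                                 # "we investigate the top 10"
flagged = np.where(scores >= 134)[0]                # "score of at least 134"
```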
Detection method categorisation
1. Model based
2. Depth based
3. Distance Based
4. Information theory related (not covered)
5. Spectral theory related (not covered)
Model based
• Build a (statistical) model of the data
• Normal data instances occur in high-probability regions of the stochastic model, while anomalies occur in low-probability regions
• Or: data instances that have a high distance to the model are outliers
• Or: data instances that have a high influence on the model are outliers
Example: one-way outlier detection
• Pharmacy records
• Records represent patients
• One attribute at a time
– This example: attribute describing the costs spent on fertility medication (gonadotropin) in a year
• We could use such one-way detection for each attribute in the data (see the sketch below)
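For instance, a simple one-way rule per attribute could flag values far from the bulk of the data; this robust z-score variant is one possible sketch, on synthetic data with an illustrative 3.5 threshold:

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-in: yearly cost per patient on one medication category
costs = rng.gamma(1.5, 100.0, size=1000)

med = np.median(costs)
mad = np.median(np.abs(costs - med))      # median absolute deviation
robust_z = 0.6745 * (costs - med) / mad   # ~standard-normal scale for Gaussian data
outliers = np.where(robust_z > 3.5)[0]    # one-way outliers on this attribute
```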
Example, model = parametric
probability density function
Example, model = non-parametric
distribution
• Left: kernel density estimate
• Right: boxplot
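Both non-parametric variants are easy to sketch; the KDE and the boxplot rule below use synthetic data and standard defaults, an assumed setup rather than the slide's exact one:

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
costs = rng.gamma(1.5, 100.0, size=1000)

# Left: kernel density estimate; low estimated density = anomalous
kde = gaussian_kde(costs)
density = kde(costs)
kde_outliers = np.argsort(density)[:10]

# Right: boxplot rule; outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR] = anomalous
q1, q3 = np.percentile(costs, [25, 75])
iqr = q3 - q1
box_outliers = np.where((costs < q1 - 1.5 * iqr) | (costs > q3 + 1.5 * iqr))[0]
```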
Example: regression model
Other models possible
• Probabilistic
– Bayesian networks
• Regression models
– Regression trees/ random forests
– Neural networks
• Outlier score = prediction error (residual)
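A sketch of the residual-as-score idea with a random forest on synthetic data; in-sample residuals are used for brevity, though out-of-bag predictions would be less optimistic:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.gamma(2.0, 50.0, size=(1000, 4))                 # related cost categories
y = 0.1 * X[:, :3].sum(axis=1) + rng.normal(0, 5, 1000)  # a predictable target cost

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
residual = np.abs(y - model.predict(X))   # outlier score = |prediction error|
suspects = np.argsort(residual)[::-1][:10]
```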
Depth based methods
• Applied on 1-4 dimensional datasets
– Or 1-4 attributes at a time
• Objects that have a high distance to the “center
of the data” are considered outliers
• Example Pharmacy:
– Records represent patients
– 2 attributes:
• Costs spent on diabetes medication
• Costs spent on diabetes testing material
Example: bagplot, halfspace depth
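Exact halfspace (Tukey) depth is expensive to compute, but a Monte Carlo approximation over random directions conveys the idea in 2D; this is a sketch under that approximation, not the bagplot algorithm itself:

```python
import numpy as np

rng = np.random.default_rng(0)

def approx_halfspace_depth(X, p, n_dirs=500):
    # Depth of p: over random unit directions, the minimum fraction of
    # points lying on one side of the hyperplane through p
    U = rng.normal(size=(n_dirs, X.shape[1]))
    U /= np.linalg.norm(U, axis=1, keepdims=True)
    counts = ((X - p) @ U.T >= 0).sum(axis=0)
    return counts.min() / len(X)

# e.g., diabetes medication cost vs. testing material cost (synthetic)
X = rng.normal(size=(1000, 2))
depths = np.array([approx_halfspace_depth(X, x) for x in X])
outliers = np.argsort(depths)[:10]   # lowest depth = farthest from the "centre"
```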
Distance based (nearest neighbor
based)
• Assumption:
– Normal data instances occur in dense neighbourhoods,
while anomalies occur far from their closest neighbours
Similarity/distance
• You need a similarity measure between two
data points
– Numeric attributes: Euclidean, etc.
– Nominal: simple match often enough
– Multivariate:
• Distance using all attributes
• Distance between attribute values, then combine
Example, dentistry data
• Records represent dentists
• Attributes are 14 cost categories
– Denote the percentage of patients that received a claim from the category
Option 1:
Distance to kth neighbour as anomaly
score
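With scikit-learn this is a few lines; the synthetic dentist profiles and the choice k = 5 are illustrative:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
X = rng.random(size=(500, 14))   # dentists x 14 cost-category percentages

k = 5
nn = NearestNeighbors(n_neighbors=k + 1).fit(X)  # +1: each point is its own 0-th neighbour
dist, _ = nn.kneighbors(X)
score = dist[:, k]                        # distance to the k-th neighbour
top10 = np.argsort(score)[::-1][:10]      # highest scores = strongest anomalies
```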
Option 2:
Use relative densities of neighbourhoods
• Density of the neighbourhood is estimated for each instance
• Instances in low-density neighbourhoods are anomalous, others normal
• Note:
– Distance to the kth neighbour is an estimate of the inverse of density (large distance → low density)
– But this estimates outliers in varying-density neighbourhoods badly
LOF
• Local Outlier Factor:
  LOF(x) = (average local density of the k nearest neighbours of x) / (local density of x)
• Local density of an instance:
– k divided by the volume of the smallest hypersphere centred at the instance that contains its k nearest neighbours
• Anomalous instance:
– Its local density is lower than that of its k nearest neighbours
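scikit-learn ships an implementation of this score; a minimal sketch on synthetic data, with an illustrative neighbourhood size:

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)
X = rng.random(size=(500, 14))            # e.g., dentist cost-category profiles

lof = LocalOutlierFactor(n_neighbors=20)
lof.fit(X)
score = -lof.negative_outlier_factor_     # higher = locally less dense = more anomalous
top10 = np.argsort(score)[::-1][:10]
```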
Example LOF outlier, dentistry
Clustering-based anomaly detection techniques
• Three possibilities:
1. Normal data instances belong to a cluster in the data, while anomalies do not belong to any cluster
– Use clustering methods that do not force all instances to belong to a cluster
• DBSCAN, ROCK, SNN
2. Distance to the cluster centre = outlier score (sketched below)
3. Clusters with too few points are outlying clusters
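Possibilities 2 and 3 are easy to sketch with k-means; the synthetic data, the cluster count, and the "too few points" cutoff are illustrative choices:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.random(size=(600, 14))

km = KMeans(n_clusters=6, n_init=10, random_state=0).fit(X)
# Possibility 2: distance to the assigned cluster centre as outlier score
score = np.linalg.norm(X - km.cluster_centers_[km.labels_], axis=1)
# Possibility 3: members of very small clusters are suspect
sizes = np.bincount(km.labels_)
in_small_cluster = sizes[km.labels_] < 10
```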
K-means with 6 clusters: cluster centres of the dentistry data set
• Attributes: percentage of patients that received a claim from each cost category
• Clusters correspond to specialisms:
1. Dentist
2. Orthodontist
3. Orthodontist (charged by dentist)
4. Dentist
5. Dentist
6. Dental hygienist
Combining Subgroup Discovery and
Outlier Detection
• Describe regions with outliers using SD
• Identify suspicious medical practitioners
• 2- or 3-step approach to describe outliers (sketched after this list):
1. Calculate outlier scores
2. Use subgroup discovery to describe regions with outliers
3. (optional) Identify the involved medical practitioners
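A rough end-to-end sketch of the three steps on synthetic data; the LOF score, the 1% cutoff, the threshold grid, and the WRAcc-based one-level search are all illustrative choices, not the deck's exact method:

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)
cost_cols = [f"cost_{i}" for i in range(5)]
df = pd.DataFrame(rng.gamma(2.0, 50.0, size=(5000, 5)), columns=cost_cols)
df["practitioner"] = rng.integers(0, 50, size=len(df))

# Step 1: outlier score per patient profile, binarised into a target column
lof = LocalOutlierFactor(n_neighbors=20).fit(df[cost_cols])
score = -lof.negative_outlier_factor_
df["outlier"] = (score > np.quantile(score, 0.99)).astype(int)

# Step 2: one-level subgroup search over "attribute >= threshold" conditions,
# scored by WRAcc against the binary outlier target
p_t = df["outlier"].mean()
def wracc(col, thr):
    s = df[col] >= thr
    return (s & (df["outlier"] == 1)).mean() - s.mean() * p_t
best = max(((c, t) for c in cost_cols
                   for t in np.quantile(df[c], [0.5, 0.75, 0.9])),
           key=lambda ct: wracc(*ct))

# Step 3 (optional): which practitioners dominate the best subgroup?
sub = df[df[best[0]] >= best[1]]
print(best, sub.groupby("practitioner")["outlier"].mean().nlargest(5))
```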
Example output:
• Look at patients with ‘P30>1050 euro’ for
practitioner number 221
• Left: all data, right: practitioner 221
Descriptions of outliers:
LOCI outlier score
• Step 1: calculate outlier score
– LOCI is a density-based outlier score
• Step 2: describe outlying regions
• Result, top subgroup:
– Orthodontics (dentist) 0.044 ^ Orthodontics 0.78
– Group of 9 dentists with an average score of 3.9
Conclusions
• Health insurance: an interesting application domain
– Very relevant
• Outlier detection and subgroup discovery are useful