Data Mining Preliminary Approach to Card Fraud Detection


Data Mining
Approach to Subscription Fraud
Detection for AT&T Cards
Hyunsook Lee, Summer Intern
Risk and Revenue Modeling Group, AT&T Labs
Supervised by Colin Goodall
AT&T Proprietary
Objective: finding patterns in subscription fraud
Contents
Background
Graphics
Association Rules
Discussion
Data mining
My definition:
finding patterns or systematic relationships
exploring data and TRANSFORMING them into indicators of interest
Graphical Analysis
Using DATA MINING TOOLS
SAS Enterprise Miner
Subscription Fraud Detection Analysis
What is subscription fraud?
Why is the analysis needed?
How to do it?
a. Detecting subscription fraud from patterns of usage
   High usage: thresholding, but not only that…
   Other peculiar usage patterns: such as…
b. Understanding calling cards
c. Factors are possibly correlated
d. Design and create new signatures
e. Graphics and association rules will help
Data Sets & properties
data sets: FASC, CARM
FRAT : contains fraudulent info
FPD : 1st default payment data
( ): indicates business

            FRAT            FPD
#calls      604141 (8040)   173234
#cards      7927 (50)       917
#accounts   5053 (34)       551
Period      358 days        158 days

focus on FRAT data to find specific patterns of fraudsters
Graphics
[Figure: # cards / (Paccount or BTN), FRAT vs. FPD. x-axis: 1, 3, 5, "7 and more"; y-axis: 0 to 0.35.]
Association Rules
What are Association Rules?
a. customers’ item buying patterns
b. support: P(A and B); confidence: P(A|B), for a rule B => A
How do we apply?
a. Analyze the calls of each card and generate variables
b. Variable generation based on ideas from graphics and thresholding
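As a small worked illustration of the two measures, the sketch below computes support and confidence over a handful of made-up cards, each reduced to the set of items it satisfies. The card data and the helper names `support` and `confidence` are assumptions for illustration only; they are not the AT&T data or the SAS Enterprise Miner implementation. Confidence is taken here as the conditional probability of the rule's consequent given its antecedent.

```python
# Hypothetical cards, each described by the set of items (indicators) it satisfies.
cards = [
    {"INT", "NightH"},
    {"INT", "NightH", "ZeroL"},
    {"INT"},
    {"NightH"},
    {"ZeroL"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Confidence of the rule antecedent => consequent: P(consequent | antecedent).

    Assumes the antecedent occurs at least once (otherwise this divides by zero).
    """
    both = support(set(antecedent) | set(consequent), transactions)
    return both / support(antecedent, transactions)


print(support({"INT", "NightH"}, cards))       # 2 of the 5 cards hold both items
print(confidence({"INT"}, {"NightH"}, cards))  # 2 of the 3 INT cards are also NightH
```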
Variable generation & logics
Possible characteristics of fraudulent cards
a. Many international calls
b. Zero-length calls, no-record calls
c. Many calls
d. Long duration, high rate
e. Peculiar usage after a certain period (such as 1 month)
f. Satisfy $-based threshold, etc.
NAME      Description                                                     Reference
NonUSA    At least one call made outside the USA                          FPD, 3rd
INT       > 4.3% of calls are international                               FPD, 3rd
Tint      > 1.5% of calls terminated outside the USA                      FPD, 3rd
Oint      At least one call originated outside the USA                    FPD, 3rd
NoRec     At least one call recorded as -1                                FPD, out
ZeroL     All calls are zero length                                       FPD, 3rd
BusH      > 54.5% of calls made during business hours                     FPD, 3rd
LeisH     > 65.2% of calls made during leisure hours (evening, weekend)   FPD, 3rd
NightH    > 8.9% of calls made during night hours                         FPD, 3rd
WkEnd     > 50% of calls made during the weekend
WkDay     > 50% of calls made during weekdays
TLNcard   10- or 17-digit card number
TCcard    6-, 7-, 8-, 9-, or 16-digit card number
AT&T, Commercial, LEC: Billing Number Content
…
More variables can be generated…
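The variable definitions above can be sketched as a function that turns one card's call records into a set of items. This is a minimal sketch under stated assumptions: the record fields (`start`, `seconds`, `international`), the sample calls, and the function name `card_items` are all hypothetical, and NoRec is interpreted here as a duration recorded as -1. Only the thresholds for INT, WkEnd, and WkDay and the wording of ZeroL and NoRec come from the table.

```python
from datetime import datetime

# Hypothetical call records for a single card; the field names are assumptions.
calls = [
    {"start": datetime(2001, 1, 6, 21, 15), "seconds": 120, "international": True},   # Saturday
    {"start": datetime(2001, 1, 7, 10, 0), "seconds": 0, "international": False},     # Sunday
    {"start": datetime(2001, 1, 7, 18, 30), "seconds": 300, "international": False},  # Sunday
    {"start": datetime(2001, 1, 8, 9, 5), "seconds": -1, "international": False},     # Monday
]


def card_items(calls):
    """Return the set of item names that one card's calls satisfy."""
    n = len(calls)
    if n == 0:
        return set()
    international = sum(c["international"] for c in calls)
    weekend = sum(c["start"].weekday() >= 5 for c in calls)  # 5 = Saturday, 6 = Sunday

    items = set()
    if international / n > 0.043:              # INT: > 4.3% international calls
        items.add("INT")
    if all(c["seconds"] == 0 for c in calls):  # ZeroL: all calls are zero length
        items.add("ZeroL")
    if any(c["seconds"] == -1 for c in calls): # NoRec: assumed to mean duration == -1
        items.add("NoRec")
    if weekend / n > 0.5:                      # WkEnd: > 50% of calls on the weekend
        items.add("WkEnd")
    if (n - weekend) / n > 0.5:                # WkDay: > 50% of calls on weekdays
        items.add("WkDay")
    return items


print(sorted(card_items(calls)))
```

Each card then becomes one "transaction" whose items are the variables it satisfies, which is the input shape that association-rule mining expects.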
Results from SAS Enterprise Miner
Frequency of items
Items generated by usage patterns, 60% confidence
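To make the 60% confidence cutoff concrete, the sketch below enumerates single-item rules over a few made-up cards and keeps those whose confidence is at least 0.6. It is an illustration under assumptions, not a reproduction of the SAS Enterprise Miner run: the cards, the function name `pair_rules`, and the restriction to one-item antecedents and consequents are all hypothetical.

```python
from itertools import permutations

# Hypothetical cards, each given as the set of items it satisfies.
cards = [
    {"INT", "NightH"},
    {"INT", "NightH", "ZeroL"},
    {"INT"},
    {"NightH"},
    {"ZeroL"},
]

MIN_CONFIDENCE = 0.6  # the 60% cutoff named on the slide


def pair_rules(transactions, min_confidence):
    """List rules a => b as (a, b, support, confidence) tuples meeting the cutoff."""
    items = sorted(set().union(*transactions))
    n = len(transactions)
    rules = []
    for a, b in permutations(items, 2):
        n_a = sum(a in t for t in transactions)              # cards with a
        n_ab = sum(a in t and b in t for t in transactions)  # cards with a and b
        if n_a and n_ab / n_a >= min_confidence:
            rules.append((a, b, n_ab / n, n_ab / n_a))
    return rules


for a, b, sup, conf in pair_rules(cards, MIN_CONFIDENCE):
    print(f"{a} => {b}: support={sup:.2f}, confidence={conf:.2f}")
```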
Future work
Various approaches to generating variables and association rules
Classification methods remain a challenge: trees, random forests…