Impossibility Mining

Traditional Data Mining
Using multidimensional data to find
previously unknown hidden relationships
Not just simple query/joins
Canonical: Diapers and Beer at Walmart

Urban Legend – comes from 1992 Teradata
study of Osco.
Correlation != Causation
Terminology currently has negative
connotations in the press
Il buono, il brutto, il cattivo
3 categories of “data mining” for fraud



Profiling (il brutto)
Probability Mining (il cattivo)
Anomaly Detection (il buono)
Profiling
Looking for a series of characteristics which identify a
likely problem
Demographic Profiling:


Looking for a series of personal identifiers to determine likely
suspects
Example: Corporate data thieves tend to be males between 30
and 40 years of age
Behavior Profiling:


Looking for a series of behaviors which indicate likely suspects
Example: Corporate data thieves are more likely to work
weekends, not take vacations, and be generally highly rated
Profiling - Issues
Demographic profiling, no matter how
good, will likely end up with you on CNN
Base Rate Fallacy: When real offenders are rare,
the profile's accuracy needs to be extraordinarily
close to 100% for a population of any size;
otherwise false positives swamp the real hits.
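A minimal sketch of the base rate arithmetic. The numbers are illustrative assumptions, not figures from the talk: 100,000 employees, 10 actual data thieves, and a profile that is right 99% of the time in both directions.

```python
def flagged_precision(population, offenders, sensitivity, specificity):
    """Fraction of flagged people who are real offenders (Bayes' rule on counts)."""
    innocents = population - offenders
    true_hits = offenders * sensitivity            # offenders the profile catches
    false_hits = innocents * (1.0 - specificity)   # innocents the profile flags anyway
    return true_hits / (true_hits + false_hits)

# Hypothetical inputs: 100,000 employees, 10 thieves, 99% accurate profile.
precision = flagged_precision(100_000, 10, 0.99, 0.99)
print(f"{precision:.2%} of flagged employees are actual thieves")
```

Under these assumptions roughly 1 flagged employee in 100 is a real thief, even though the profile is "99% accurate".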
Probability Mining
Identifying high probability issues to target
Can be applied to profiling or anomaly detection
Good for sliding thresholds with competing
business drivers
Example: Stolen credit cards are more likely to
be used at electronics stores for high ticket
items. Applied to a particular profile, a plasma
TV purchase may have a 10% chance of being
fraudulent.
Probability Mining - Issues
Business drivers need to be considered

Is it worth it to bother 10 legitimate credit card
holders to find 1 stolen card? What about
100? 1000?
Probability generation requires a lot of
data and a pre-labeled dataset to be
useful
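The "how many legitimate cardholders per stolen card" question can be sketched as simple arithmetic. The 10% probability is the plasma TV figure from the example above; the dollar values and function names are hypothetical, chosen only to illustrate a sliding threshold.

```python
def legit_bothered_per_fraud(p_fraud):
    """Expected legitimate cardholders contacted per stolen card found."""
    return (1.0 - p_fraud) / p_fraud

def break_even_probability(review_cost, fraud_loss):
    """Fraud probability above which a review pays for itself on average."""
    return review_cost / fraud_loss

# 10% comes from the plasma TV example; the costs below are assumptions.
p = 0.10
print(round(legit_bothered_per_fraud(p), 2), "legitimate holders bothered per stolen card")

threshold = break_even_probability(review_cost=25.0, fraud_loss=2000.0)
print(f"review when P(fraud) > {threshold:.4f}")
print("review this purchase:", p > threshold)
```

Changing the assumed cost of bothering a customer moves the threshold, which is the business driver trade-off the slide describes.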
Anomaly Detection
Sesame Street analysis
Relies on finding outliers in data
Does not require a priori expert knowledge
of the data
Does require après-analysis expert
knowledge to interpret outliers
Case Example: Anomaly Detection
Product launch event - $1.5 Million budget
Launch directors had authority for procurements
up to $10,000
Report received of a “person directing the
launch event gave a lot of vendor work to his
brother-in-law”
There were ~25 recent launch events that this
could refer to, 10 of which were male-directed
Looked at the financials for each launch event
Data
Event Launch Purchases                 Amount
Consulting – Marketing Support      $9,512.00
Supplies - General                    $250.12
Consulting - Advertising            $9,832.00
Supplies – Plasma TV Rental         $9,814.22
Supplies - Catering                 $1,233.22
Consulting – Launch Support         $9,763.00
Supplies – Secondary Plasma TV      $9,814.22
Mileage - Reimbursement               $252.84
Benford
Anomaly Detection – How we
Found ‘em
Benford’s Law


Take a look at both the last and first digits
Distribution is well off predictions
Nearness-to-threshold


Distribution should not show a logarithmic
decline away from the approval threshold
Nothing was over threshold…
Common Sense

Plasma TV Rentals - $10K to rent? Why 2?
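A rough sketch of the first-digit and nearness-to-threshold checks, using the eight amounts from the Data slide. The 5% band below the $10,000 threshold is an assumption for illustration, only the first digit is examined here, and eight rows are far too few for a statistical test; the point is only to show the shape of the check.

```python
import math
from collections import Counter

# Amounts from the Data slide; $10,000 is the launch director's approval limit.
amounts = [9512.00, 250.12, 9832.00, 9814.22, 1233.22, 9763.00, 9814.22, 252.84]
THRESHOLD = 10_000.00

def leading_digit(x):
    """First significant digit of a positive amount."""
    return int(f"{x:e}"[0])

def benford_expected(d):
    """Benford's Law: expected share of values whose first digit is d."""
    return math.log10(1 + 1 / d)

observed = Counter(leading_digit(a) for a in amounts)
for d in range(1, 10):
    share = observed[d] / len(amounts)
    print(d, f"observed={share:.2f}", f"benford={benford_expected(d):.2f}")

# Assumed band: purchases within 5% below the approval threshold.
near = [a for a in amounts if 0.95 * THRESHOLD <= a < THRESHOLD]
print(f"{len(near)} of {len(amounts)} purchases sit just under the threshold")
```

Benford's Law expects a leading 9 in under 5% of values; here it leads five of the eight amounts, and the same five all fall just under the approval limit.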
Results
Subject hired their brother-in-law to do
phantom consulting
Subject rented plasma TVs with a $1
buyout option
Case Example: Geospatial
Anomalies
Problem: Identify web activity that is
spurious in nature
Application: Successfully applied to
internal user data (activity logs) as well as
external data (attacks)
User Data
User Data – Plotted as Anomalies
Outliers – What Were They?
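Outlier Categorization (pie chart)
Categories: Foreign Users, Gambling, False Positives,
Pornography, Dating Websites, Spyware
Slices: 7%, 3%, 3%, 14%, 10%, 63% (the transcript does not
preserve which slice belongs to which category)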
Impossibility Mining
Is NOT data mining
IS an application of control testing
Looks for patterns that cannot exist in any
model of reasonable likelihood
Can be single or multifactor
Only identifies real outliers
Impossibility Mining Example –
Single Factor
Asset Management


IT Asset Management software installed on all
machines in a company
Cataloged installed hardware and software at
different points in time
Proactive Look


Identify any computers where installed memory
at time T is less than installed memory at time T-1
Identified several hundred laptops from
remote office users that met the criteria
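The single-factor check can be sketched as a comparison of two inventory snapshots. The machine names and memory sizes are hypothetical, and the sketch assumes a strict decrease is the "impossible" condition, since unchanged memory is the normal case.

```python
# Hypothetical inventory snapshots: machine name -> installed memory in MB.
snapshot_prev = {"laptop-001": 2048, "laptop-002": 2048, "desktop-007": 4096}
snapshot_curr = {"laptop-001": 1024, "laptop-002": 2048, "desktop-007": 8192}

def memory_decreased(prev, curr):
    """Machines present in both snapshots whose installed memory went down."""
    common = prev.keys() & curr.keys()
    return sorted(m for m in common if curr[m] < prev[m])

print(memory_decreased(snapshot_prev, snapshot_curr))
```

Machines that appear in only one snapshot are skipped here; in practice those may deserve their own check.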
Impossibility Mining Example –
Single Factor, cont’d
Identified commonality in laptops




All laptops were serviced by the same IT
support location
Found the drop in memory was consistent
with the last “upgrade”
Reviewed eBay activity of the local IT support
personnel
Found the thief, who was removing half of the
memory from laptops of non-power users and
selling it!
Impossibility Mining – Dual Factor
Electronic Funds Transfer Investigation
Payment Process




Manager takes in payment request and assigns to a
clerk
Clerk enters payment information and selects a payee
Manager enters EFT information for the payee and
confirms transaction (cannot change amount)
Division Head confirms name on account, amount,
and releases funds
Question: Does fraud require collusion?
Impossibility Mining – Dual Factor,
cont’d
EFT Audit

Compared actual EFTs for internal consistency
Looked for EFTs where the customer ID was the same, but
the bank routing number was different
Identified a manager who was manually changing routing
information to funnel to her husband’s account
3rd set of eyes (Division Head) did not help – ineffective
control

Two process changes
Only Division Head can add EFT information
Automated check implemented to identify EFTs where the
bank name does not match the routing number
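The internal-consistency test from the audit (same customer ID, different routing number) can be sketched as a grouping step. The record layout, customer IDs, and routing placeholders are hypothetical.

```python
from collections import defaultdict

# Hypothetical EFT records: (customer_id, routing_number, amount).
efts = [
    ("C100", "RTN-A", 1200.00),
    ("C100", "RTN-A", 1350.00),
    ("C200", "RTN-B", 900.00),
    ("C200", "RTN-C", 875.00),
]

def customers_with_multiple_routings(records):
    """Map each customer paid through more than one routing number to those numbers."""
    routings = defaultdict(set)
    for customer_id, routing_number, _amount in records:
        routings[customer_id].add(routing_number)
    return {c: sorted(r) for c, r in routings.items() if len(r) > 1}

print(customers_with_multiple_routings(efts))
```

A customer who legitimately changed banks would also show up, so each hit still needs review.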
Impossibility Mining – Data Joining
Unauthorized Computer Access



Created a table of physical sites
Calculated the minimum travel time between
sites
Identified anyone logging in to a machine at 2
sites where time between logins < minimum
travel time
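The join between login records and the site travel-time table can be sketched as follows. Site names, users, timestamps, and the two-hour travel time are all hypothetical, and the sketch only compares each login with the same user's previous login.

```python
from datetime import datetime, timedelta

# Hypothetical minimum travel times between pairs of physical sites.
MIN_TRAVEL = {frozenset({"HQ", "Plant"}): timedelta(hours=2)}

# Hypothetical login events: (user, site, timestamp).
logins = [
    ("alice", "HQ", datetime(2024, 1, 8, 9, 0)),
    ("alice", "Plant", datetime(2024, 1, 8, 9, 30)),
    ("bob", "HQ", datetime(2024, 1, 8, 8, 0)),
    ("bob", "Plant", datetime(2024, 1, 8, 13, 0)),
]

def impossible_logins(events, min_travel):
    """Flag consecutive logins at two sites closer in time than the minimum travel time."""
    flagged = []
    last_seen = {}  # user -> (site, timestamp) of that user's previous login
    for user, site, when in sorted(events, key=lambda e: (e[0], e[2])):
        prev = last_seen.get(user)
        if prev is not None and prev[0] != site:
            needed = min_travel.get(frozenset({prev[0], site}))
            gap = when - prev[1]
            if needed is not None and gap < needed:
                flagged.append((user, prev[0], site, gap))
        last_seen[user] = (site, when)
    return flagged

print(impossible_logins(logins, MIN_TRAVEL))
```

Only alice is flagged here: 30 minutes between sites that are assumed to be two hours apart.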
Impossibility Mining – Data Joining,
cont’d
Identified several stolen passwords


Also highlighted password sharing
… as well as user passwords hard-coded in
applications
Impossibility Mining - Conclusions
The less likely something is to occur, the better
a candidate it is for impossibility mining
Can always implement controls to prevent the
“impossibilities”, but they are not always
implemented correctly
Best example in the media: Insurance fraud
case - men were claiming hysterectomies,
ovarian cyst removal, PAP tests…
Questions
…Other than “Can we go yet?”