Transcript Alert!
Laila Fettah – Associate Sales Engineer SPSS
27 January 2011
Information Analysis
© 2011 IBM Corporation
Van informatie op Orde naar Informatie van Waarde (From Information in Order to Information of Value) – 27 January 2011
Agenda
Government – Challenges
Data mining
CRISP-DM
Example Application
Government faces challenges every day…
Demonstrate Effective Public Policy
Ongoing Budget Pressures
Lack of Decision-Quality Information
Transparency & Accountability
Ongoing Improvement, Fewer Resources
…and must answer critical questions every day...
Have new crime fighting tactics been effective?
How satisfied do citizens feel?
Have job creation programs helped curb benefits applications?
How have collection strategies impacted budgets?
What fraud patterns are emerging?
What is likely to happen in the long term?
…and silos often persist that impact outcomes...
Workforce/HR
Information Technology
Program Execution
Executive Leaders
Supply Chain
Services Delivery
Budgeting & Finance
Operations/Readiness
…analytics can tear down silos
Information Technology
Budgeting & Finance
Public Safety
Staff
Communities
Supply Chain Management
Operations/Readiness
Program Execution
Services Delivery
What is data mining?
Finding patterns in your data that you can use to do your business better
Business-oriented discovery of patterns, producing insight and a predictive capability which can be deployed widely
Process of autonomously retrieving useful information or knowledge (“actionable assets”) from large data stores or sets
“Predictive analysis helps connect data to effective action by drawing reliable conclusions about current conditions and future events.”
Gareth Herschel, Research Director, Gartner Group
What’s in a name?
Data Mining is not a great metaphor
– It would mean people who dig for gold are “rock miners”!
Other early candidates:
– Knowledge Discovery in Databases (KDD)
– “Torturing the data until it confesses”
• “…and if you torture it long enough, it’ll confess to anything!”
Traditional analyses
What is the profile of the repeat offenders in my district?
By count of offence or offender?
By time of crime
First by type, gender
Report 1
Report 2
Report 3
What do I do NOW???
Data Mining
A descriptive question
What is the profile of the repeat offenders in my district?
Let me think….
I know from my understanding of crime that gender, time, place, type of crime, age can be important
Data Mining Technology
Make individual profiles
There are several profiles for repeat offenders. The most important are….
Youth gangs from cities A and B that are mostly active on Thursday night in the center.
Addicts that are mostly active around the central station as pickpockets
Ok, so I need to talk with the railway and with local authorities in city A and B….
Underlying analyses
Descriptive (KPI): Statistics, Profiling, Clustering, Associations
Predictive (KPP): Classification, Prediction, Scoring, Forecasting
Prescriptive (Scenario): Prediction, Scoring, Forecasting, What If
CRISP-DM
CRoss Industry Standard Process for Data Mining
– Funding from the European Commission
– Non-proprietary
– Application/Industry neutral
– Tool neutral
– Focus on business issues as well as technical analysis
– www.crisp-dm.org
Process framework for data mining projects
– Process Standardization
CRISP-DM phases
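The six phases named in the CRISP-DM reference model, and the transitions its process diagram draws between them, can be sketched as follows. The Python names (`Phase`, `NEXT`, `can_move`) are illustrative assumptions and are not part of CRISP-DM itself.

```python
from enum import Enum


class Phase(Enum):
    BUSINESS_UNDERSTANDING = 1
    DATA_UNDERSTANDING = 2
    DATA_PREPARATION = 3
    MODELING = 4
    EVALUATION = 5
    DEPLOYMENT = 6


# Transitions as drawn in the CRISP-DM process diagram,
# including the backward arrows.
NEXT = {
    Phase.BUSINESS_UNDERSTANDING: {Phase.DATA_UNDERSTANDING},
    Phase.DATA_UNDERSTANDING: {Phase.BUSINESS_UNDERSTANDING, Phase.DATA_PREPARATION},
    Phase.DATA_PREPARATION: {Phase.MODELING},
    Phase.MODELING: {Phase.DATA_PREPARATION, Phase.EVALUATION},
    Phase.EVALUATION: {Phase.BUSINESS_UNDERSTANDING, Phase.DEPLOYMENT},
    Phase.DEPLOYMENT: set(),
}


def can_move(src: Phase, dst: Phase) -> bool:
    """True if the diagram has an arrow from src to dst."""
    return dst in NEXT[src]


print(can_move(Phase.EVALUATION, Phase.BUSINESS_UNDERSTANDING))
print(can_move(Phase.DATA_PREPARATION, Phase.DEPLOYMENT))
```

The backward arrow from Evaluation to Business Understanding is what makes the process iterative rather than a one-way pipeline.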
Example Application Areas:
Public Safety
– Reduce crime
– Improve border protection
– Proactive disease surveillance
– Intrusion and insider threat detection
Customs & Excise, Tax, Social security
– Predict & prevent fraud
– Improve collections
– Focus investigators & inspectors
Defense
– Increase battle readiness of assets
– Improve employee acquisition, retention & growth
Citizen satisfaction
– Implement continuous citizen feedback loop
– Improve operational processes
……
Johnny is arrested for breaking into a car
He is 15 years old and confesses that he wanted to belong to a group of friends
Will he become a repeat offender?
If YES: advise DA and later parole officer?
A citizen reports a burglary
Reports that her house was burglarized while she was talking to a representative from the city council
Does this crime resemble others? Is it serial?
Do we have a team working on similar crimes that we can assign it to?
A break-in at a shop is reported
The perpetrators entered by breaking a window, probably between 3am and 5am. Crime was discovered at 6 pm the next day
Does it make sense to send out a CSI team?
Is it likely that they’ll find useful evidence?
An organized crime unit wants to bust a drugs ring
The detectives are interested in identifying the central players within a narcotics network
Who are the key persons? Who are the leaders?
PD uses predictive analytics to profile crimes & criminals to improve solved crime rates and optimize resource usage
Predictive Modeling for Crime Pattern Detection
Crime Data
Crime record notes and call logs
Surveillance Data
Communication Data
Financial Data
Aspiring Repeat Offender profile
…
If male
And age 14-16
And crime = ‘car break in’
And motive = ‘peer pressure’
Then repeat risk is HIGH: ALERT DA
…
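The profile rule above can be written as a small function. This is a minimal sketch: the `Arrest` record and its field names are assumptions, mapping Johnny's stated motive to ‘peer pressure’ is an interpretation, and the value returned when the rule does not fire is a placeholder, because the slide elides the rest of the profile.

```python
from dataclasses import dataclass


@dataclass
class Arrest:
    """Hypothetical arrest record; field names are not from the slides."""
    sex: str
    age: int
    crime: str
    motive: str


def repeat_risk(arrest: Arrest) -> str:
    """Apply the single 'Aspiring Repeat Offender' rule shown on the slide."""
    matches = (
        arrest.sex == "male"
        and 14 <= arrest.age <= 16
        and arrest.crime == "car break in"
        and arrest.motive == "peer pressure"
    )
    # Only the HIGH outcome is stated on the slide; "UNKNOWN" stands in
    # for whatever the elided parts of the profile would return.
    return "HIGH" if matches else "UNKNOWN"


# Johnny: 15 years old, car break-in, wanted to belong to a group of friends.
johnny = Arrest(sex="male", age=15, crime="car break in", motive="peer pressure")
print(repeat_risk(johnny))  # prints HIGH
```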
Crime profile Team 4
Cluster ‘Bogus Official’
- Burglary,
- Visit by city official,
- Entry ‘Back door’,
- Victim ‘Elderly’
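One way to check a new report against a cluster profile like ‘Bogus Official’ is to count matching attributes. The sketch below is a plain attribute-overlap score, not a clustering algorithm; the attribute keys and the 0.5 cut-off are assumptions made for illustration.

```python
# Cluster profile from the slide; the dictionary keys are assumed names.
BOGUS_OFFICIAL = {
    "crime": "burglary",
    "pretext": "visit by city official",
    "entry": "back door",
    "victim": "elderly",
}


def match_score(report: dict, profile: dict) -> float:
    """Fraction of profile attributes that the report matches exactly."""
    hits = sum(1 for key, value in profile.items() if report.get(key) == value)
    return hits / len(profile)


# The citizen's report: entry point and victim age are not given in the scenario.
report = {
    "crime": "burglary",
    "pretext": "visit by city official",
}

score = match_score(report, BOGUS_OFFICIAL)
print(score)  # prints 0.5
print("assign to Team 4" if score >= 0.5 else "no cluster match")
```

A real deployment would more likely score similarity over many weighted attributes; exact matching keeps the idea visible.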
CS profile No Deployment
…
If Break In
And Night
And report > 12hrs
And entry = ‘broken window’
And object = ‘Commercial Property’
Then probability evidence is 6%
…
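The crime scene rule and the resulting deployment decision can be sketched as two functions. The dictionary keys are assumed names. The 10% cut-off is taken from the "Probability <10%" alert elsewhere in the deck, and deploying by default when no rule fires is an assumption.

```python
from typing import Optional


def evidence_probability(case: dict) -> Optional[float]:
    """Return the slide's 6% estimate if the case fits the rule, else None."""
    matches = (
        case.get("crime") == "break in"
        and case.get("time_of_day") == "night"
        and case.get("hours_until_report", 0) > 12
        and case.get("entry") == "broken window"
        and case.get("object") == "commercial property"
    )
    return 0.06 if matches else None


def deploy_csi(probability: Optional[float], threshold: float = 0.10) -> bool:
    """Deploy unless a rule fired with a probability below the threshold."""
    return probability is None or probability >= threshold


shop = {
    "crime": "break in",
    "time_of_day": "night",
    # At least 13 hours (5am to 6pm); more if "next day" means the following day.
    "hours_until_report": 13,
    "entry": "broken window",
    "object": "commercial property",
}

p = evidence_probability(shop)
print(p, deploy_csi(p))  # prints 0.06 False
```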
Key Players
Focus on:
• Keith Patterson
• Colin Wiertz
• Markus Haffey
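The slides do not say how key players are identified. One common starting point is degree centrality on a communication network: people with the most distinct contacts rank highest. The sketch below assumes that measure, and the call records are made up for illustration.

```python
# Made-up call records (caller, callee); purely illustrative.
calls = [
    ("P1", "P2"), ("P1", "P3"), ("P1", "P4"),
    ("P2", "P3"), ("P4", "P5"),
]


def degree_centrality(edges):
    """Distinct contacts per person, normalised by (number of people - 1)."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    n = len(neighbours)
    return {person: len(contacts) / (n - 1) for person, contacts in neighbours.items()}


scores = degree_centrality(calls)
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked[0], scores[ranked[0]])  # prints P1 0.75
```

Degree centrality finds the best-connected people, who are not necessarily the leaders; other measures such as betweenness would highlight brokers instead.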
Management Dashboard
Crime Data
Crime record notes and call logs
Surveillance Data
Communication Data
Financial Data
Capture
Predict
Act
Automated Link Analysis
Profiles & Associations
Crime Pattern & Hotspot Clustering
Predictive Modeling for Crime Pattern Detection
Criminal Career Scoring Model
MO Typology Model
Crime Scene Assessment Model
Alert! (Arresting Officer)
Aspiring Repeat Offender Risk HIGH
Advise DA and inform parole officer
Alert! (Case Assignment Officer)
Serial Crime Profile
MO fits Team 4
Alert! (CSI Resource Planner)
Very Low Likelihood Evidence
Probability <10%
No Deployment
Investigative Model Template Repository
Key Players (Investigating Officer)
Focus on:
• Keith Patterson
• Colin Wiertz
• Markus Haffey
Feedback results
Feedback loop of new data to improve and adapt predictions
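The feedback loop can be illustrated with a running estimate that is updated each time an outcome is reported back. This is a minimal sketch: the class name is hypothetical, and the starting counts (6 of 100) are invented only to reproduce the 6% figure used in the crime scene rule.

```python
from typing import Optional


class EvidenceRate:
    """Running estimate of how often a crime scene profile yields evidence."""

    def __init__(self, found: int = 0, total: int = 0) -> None:
        self.found = found
        self.total = total

    def record(self, evidence_found: bool) -> None:
        """Feed one new outcome back into the estimate."""
        self.total += 1
        self.found += int(evidence_found)

    def probability(self) -> Optional[float]:
        """Current estimate, or None before any outcome is known."""
        return self.found / self.total if self.total else None


# Assumed starting point: 6 of 100 past scenes yielded evidence.
rate = EvidenceRate(found=6, total=100)
rate.record(True)   # a new deployment found evidence
rate.record(False)  # another did not
print(rate.probability())
```

The estimate moves from 6/100 to 7/102, which is how new results adapt the prediction over time.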
Analytical Process Automation & Optimization
Automate prediction & deployment process
Analytical Process Management & Control
Monitor & manage analytics process
Management Dashboard
Start from business understanding… not from data or technique…
…and use a methodology!
Questions