The COPLINK Experience - Artificial Intelligence Laboratory


Community-based Security Informatics Research: The COPLINK Experience
Acknowledgement: NSF, CIA/ITIC, DHS, NIJ/DOJ, NLM/NIH, COPS, TPD, PPD, KCC
Hsinchun Chen, Ph.D.
Director,
COPLINK Center of Excellence,
Artificial Intelligence Lab,
Hoffman E-Commerce Lab,
University of Arizona
Outline
• COPLINK Background and Research Framework
• COPLINK Connect and Detect: Community-based Research
• COPLINK STV, Agent, and Deception Detection Research
• COPLINK Visual Criminal Network Analysis Research
• From COPLINK to BorderSafe and Terrorism Research
Outline
COPLINK Background and Research
Framework
Introduction
• The concern about national security has increased significantly since the terrorist attacks of September 11, 2001
• Intelligence agencies such as the CIA and FBI are
actively collecting and analyzing information to
investigate terrorists’ activities
• Local law enforcement agencies have also become
more alert to criminal activities in their own
jurisdictions that may be relevant to national security
COPLINK Progression
1990-present: NSF CISE funding (IIS, Digital Government, Digital Library, NSDL, ITR, IDM, CSS), NLM/NIH (medical informatics), DARPA
1997: NIJ COPLINK funding; Web-enabled data warehousing for law enforcement
2000: NIJ AGILE interoperability funding; information sharing
2001: NSF Digital Government funding; data/text mining, agents, and knowledge management; COPLINK Center
2002: NSF/CIA KDD funding; intelligence community
2003: DHS BorderSafe funding; NSF/CIA disease informatics (bioterrorism) funding; NSF ITR funding, terrorism portal
Goal: A model and testbed for law enforcement and national
security research
Crime Types
Type | Local Law Enforcement Level | National Security Level
Traffic Violations | Driving under the influence (DUI), fatal or personal injury, property damage, traffic accident, road rage | -
Sex Crime | Sexual offenses, sexual assault, child molesting | Organized prostitution
Theft | Robbery, burglary, larceny, motor vehicle theft, stolen property | Theft of national secrets or weapons
Fraud | Forgery and counterfeiting, frauds, embezzlement, identity deception | Transnational money laundering, identity fraud, transnational financial fraud
Arson | Arson on buildings, apartments |
Gang / drug offenses | Narcotic drug offenses (sales or possession) | Transnational drug trafficking
Violent Crime | Criminal homicide, armed robbery, aggravated assault, other assaults | Terrorism (bioterrorism, bombing, hijacking, etc.)
Cyber Crime | Internet frauds (e.g., credit card fraud, advance fee fraud, fraudulent Internet banking sites), illegal trading, network intrusion/hacking, virus spreading, netspionage, cyberpiracy, cyber-pornography, cyber-terrorism, theft of confidential information, hate crime | -
The COPLINK Research Framework
Building the Science of Intelligence and Security Informatics
Outline
COPLINK Testbed: Data Characteristics
Information Sharing and Interoperability
Tucson PD Data Sources
• TPD Record Management System:
Stores a wide range of information from incident reports to warrants to
pawn tickets, from person descriptions to vehicles to weapons and
property items. Incident data goes back as early as 1983.
Database: Litton PRC RMS31 on Oracle 7.3, Compaq OpenVMS
• TPD Mug Shot Database:
Stores about 90,000 mug shots taken by the ID Department.
Database: ImageWare on SQL Server 7.0, Windows NT 4.0 Server
• TPD Gang Database:
Stores comprehensive information about 3,200 gang members: their
activities, aliases, physical descriptions, vehicles, etc.
Database: In-house Access 97, Windows NT 4.0 Server
Tucson PD RMS Documents
• Incident Reports: Report number, crime type, precinct, MOs, date and time.
• Pawn Tickets: Ticket number, date and time.
• Warrants: Warrant number, docket number, type and issue date.
• Field Interviews: FI number, type, precinct, date and time.
Number of Documents (chart; categories: Reports, Pawn Tickets, Warrants, Field Interviews; values shown: 2,500,000; 150,000; 65,000; 45,000)
Tucson PD RMS Data Objects
• Person: True names, aliases, descriptions, addresses, IDs, marks and phone numbers.
• Organization: Name, address and phones.
• Vehicle: VIN, license plate, make, model, style, year and colors.
• Property: Serial number, type, make, model, size and colors.
• Weapon: Serial number, type, manufacturer, caliber and colors.
Number of Data Objects (chart; categories: Person, Property, Vehicle, Organization, Weapon; values shown: 1,300,000; 420,000; 400,000; 85,000; 39,000)
COPLINK Database: Tucson PD
Coplink 2.5 Database Size, TPD Node: Data 16 GB, Indices 7 GB
COPLINK Documentation
Sample COPLINK ERD (Entity Relationship Diagram)
• OBJECTS: OBJECTPK (PK), OBJECTTYPE, OBJECTDESC
• PERSONS: PERSONPK (PK, FK to OBJECTS), REALNAME, DOB, RACE, GENDER, MINDOB, MAXDOB, MINHEIGHT, MAXHEIGHT, MINWEIGHT, MAXWEIGHT, EYECOLOR, HAIRCOLOR, GANGFLAG, CAUTIONFLAG, WANTEDFLAG, PAWNERFLAG, FBIID, SID, LOCALID, FNGRPRTID, DNAID, PHOTOFILENAME, PHOTOIMAGE; RACE, GENDER, EYECOLOR, and HAIRCOLOR reference the lookup tables below
• L_EYECOLTYPES: COLORTYPE (PK), COLORCODE, COLORDESC, COLORRANK
• L_HAIRCOLTYPES: COLORTYPE (PK), COLORCODE, COLORDESC, COLORRANK
• L_RACETYPES: RACETYPE (PK), RACECODE, RACEDESC, RACERANK
• L_GENDERTYPES: GENDERTYPE (PK), GENDERCODE, GENDERDESC, GENDERRANK
COPLINK Documentation
COPLINK Data Dictionary: 217 Tables, 1000 attributes
Table No. 181: PERSONS
COLUMN NAME | ORDER | DATATYPE | SIZE | DEC | FK TABLE | FK COLUMN
PERSONPK | 1 | NUMERIC | 18 | 0 | OBJECTS | OBJECTPK
REALNAME | 2 | VARCHAR | 320 | | |
DOB | 3 | VARCHAR | 8 | | |
RACE | 4 | NUMERIC | 2 | 0 | L_RACETYPES | RACETYPE
GENDER | 5 | NUMERIC | 1 | 0 | L_GENDERTYPES | GENDERTYPE
MINDOB | 6 | VARCHAR | 8 | | |
MAXDOB | 7 | VARCHAR | 8 | | |
MINHEIGHT | 8 | NUMERIC | 4 | 0 | |
MAXHEIGHT | 9 | NUMERIC | 4 | 0 | |
MINWEIGHT | 10 | NUMERIC | 5 | 1 | |
MAXWEIGHT | 11 | NUMERIC | 5 | 1 | |
EYECOLOR | 12 | NUMERIC | 2 | 0 | L_EYECOLTYPES | COLORTYPE
HAIRCOLOR | 13 | NUMERIC | 2 | 0 | L_HAIRCOLTYPES | COLORTYPE
GANGFLAG | 14 | NUMERIC | 1 | 0 | |
CAUTIONFLAG | 15 | NUMERIC | 1 | 0 | |
WANTEDFLAG | 16 | NUMERIC | 1 | 0 | |
PAWNERFLAG | 17 | NUMERIC | 1 | 0 | |
FBIID | 18 | VARCHAR | 100 | | |
SID | 19 | VARCHAR | 100 | | |
LOCALID | 20 | VARCHAR | 100 | | |
FNGRPRTID | 21 | VARCHAR | 100 | | |
DNAID | 22 | VARCHAR | 100 | | |
PHOTOFILENAME | 23 | VARCHAR | 255 | | |
PHOTOIMAGE | 24 | IMAGE | | | |
PK: PERSONPK. NULL: N for 23 of the 24 columns.
COPLINK Data Formats
• Delimited ASCII text files
• SQL Server 2000 backup file
• SQL Server 2000 detached database
• Oracle 8i/9i dump file
• Oracle 8i/9i transportable tablespace
• DB2 UDB 7 backup file
• TPD data available: 10/1/2002, PPD data: 2/1/2003
Information Management Challenges:
Tucson PD Data Across all Crime Types
Information Management Challenges:
Sample COPLINK Table
Outline
COPLINK Connect and Detect: Community-based
Research
User-centered Design, Information Sharing, Information
Retrieval, HCI, and Association Rule Mining
COPLINK Connect: Information Sharing
Consolidating & sharing information promotes problem solving and collaboration
Figure: Records Management Systems (RMS), Gang Database, Mugshots Database
COPLINK Connect Functionality
• Generic, common XML-based representation of criminal elements
• Data migration (batch and incremental) and mapping for all major databases and legacy systems
• Database independent: ODBC-compliant data warehouse
• Multi-layered Web-based architecture: database server, Web server, browser
• Powerful and flexible search tools for various reports, e.g., incidents, warrants, pawns, etc.
• Graphical browser-based interface (GUI) for ease of use, training, and maintenance
H. Chen, J. Schroeder, R. V. Hauck, L. Ridgeway, H. Atabakhsh, H. Gupta, C. Boarman, K.
Rasmussen, and A. W. Clements, “COPLINK Connect: Information and Knowledge Management
for Law Enforcement,” Decision Support Systems, Special Issue on Digital Government, 2003.
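The slides name a generic, XML-based representation of criminal elements without showing its schema. The sketch below is only an illustration, assuming hypothetical tag names (`person`, `incident`, `alias`) and a hypothetical `ALIASES` field next to the PERSONS columns listed earlier; it is not COPLINK's actual format. It uses Python's standard `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from RMS column names to generic XML tags;
# COPLINK's real schema is not given on the slides.
PERSON_FIELDS = (("REALNAME", "name"), ("DOB", "dob"), ("GENDER", "gender"))


def person_to_xml(row: dict) -> ET.Element:
    """Convert one flat RMS person row into a generic <person> element."""
    person = ET.Element("person", id=str(row["PERSONPK"]))
    for column, tag in PERSON_FIELDS:
        if row.get(column) is not None:  # skip empty source columns
            ET.SubElement(person, tag).text = str(row[column])
    for alias in row.get("ALIASES", []):  # hypothetical alias list
        ET.SubElement(person, "alias").text = alias
    return person


def incident_to_xml(report_no: str, crime_type: str, people: list) -> ET.Element:
    """Wrap the people involved in one incident report in an <incident> element."""
    incident = ET.Element("incident", report=report_no, type=crime_type)
    for row in people:
        incident.append(person_to_xml(row))
    return incident


if __name__ == "__main__":
    row = {"PERSONPK": 42, "REALNAME": "DOE, JOHN", "DOB": "19700207", "ALIASES": ["J DOE"]}
    print(ET.tostring(incident_to_xml("R-0001", "BURGLARY", [row]), encoding="unicode"))
```

Because every source system is mapped into the same element vocabulary, downstream search and association code only needs to understand one format, which is the motivation the slide gives for a "generic, common" representation.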
COPLINK Detect: Crime Analysis
Consolidated information enables targeted problem solving
via powerful investigative criminal association analysis
COPLINK Detect Functionality
• Simple association rule mining applied to relationships among criminal elements
• Generic, common XML-based representation of criminal relationships
• Incremental data migration and association analysis on databases
• Supports powerful, multi-attribute queries using partial crime information
• Graphical browser-based interface (GUI) for simple crime relationship analysis and case retrieval
H. Chen, D. Zeng, H. Atabakhsh, W. Wyzga, J. Schroeder, “COPLINK: Managing
Law Enforcement Data and Knowledge,” Communications of the ACM, 2003.
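COPLINK Detect is described as applying simple association rule mining to relationships among criminal elements. As a hedged sketch of that general idea (not COPLINK's actual weighting), the code below treats each incident as a set of entity IDs and derives pairwise rules A → B with support (number of incidents containing both) and confidence (support divided by the number of incidents containing A). The entity IDs and thresholds are illustrative assumptions.

```python
from collections import Counter
from itertools import combinations


def association_rules(incidents, min_support=2, min_confidence=0.5):
    """Mine pairwise rules A -> B from incidents, each a set of entity IDs.

    support(A, B)      = number of incidents containing both A and B
    confidence(A -> B) = support(A, B) / number of incidents containing A
    """
    item_counts = Counter()
    pair_counts = Counter()
    for entities in incidents:
        items = sorted(set(entities))
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))  # each unordered pair once
    rules = []
    for (a, b), support in pair_counts.items():
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):  # rules in both directions
            confidence = support / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    # Strongest rules first; ties broken by support, then alphabetically.
    rules.sort(key=lambda r: (-r[3], -r[2], r[0], r[1]))
    return rules


if __name__ == "__main__":
    incidents = [
        {"person:JANE", "vehicle:ABC-123"},
        {"person:JANE", "person:JOE", "vehicle:ABC-123"},
        {"person:JOE", "person:BOB"},
        {"person:JOE", "person:BOB", "vehicle:XYZ-9"},
    ]
    for rule in association_rules(incidents):
        print(rule)
```

An investigator querying with partial information (for example, only a plate) could then follow the high-confidence rules from that entity to associated people.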
COPLINK Detect 2.0/2.5
COPLINK Connect/Detect Deployment
• Tucson, Phoenix (Arizona)
• Huntsville (Texas)
• Montgomery County (Maryland)
• Polk County/Des Moines (Iowa)
• Ann Arbor (Michigan)
• Boston (Massachusetts)
• Redmond (Washington)
• Henderson County (North Carolina)
• Shawnee County (Kansas)
• San Diego (CA)
• Pima County, Arizona DHS (Arizona)
• State of Alaska, Los Angeles (CA)
Serving 20+ states, 300+ agencies, protecting 30M+ citizens
Outline
COPLINK STV, Deception Detection and
Agent Research
Visualization, HCI, Agent, Data Mining
COPLINK Spatial-Temporal Visualization: Timeline Tool
• Visualizes the chronologically ordered set of events associated with user-selected database entities
• Events placed along horizontal axis
• Entities placed along vertical axis
– Entities can be grouped together
– Each row contains all events associated with the entities in a group
• Time-based Zooming
– User can zoom into a specific time interval for more detail, while hiding uninteresting portions of the timeline
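The timeline layout can be modeled as rows keyed by entity group, with an optional zoom window filtering the time axis. This is a minimal sketch; the event tuple format `(timestamp, entity_id, label)` and the group dictionary are assumptions for illustration, not the tool's data model.

```python
from datetime import datetime


def timeline_rows(events, groups, start=None, end=None):
    """Arrange events into timeline rows, one row per entity group.

    events: iterable of (timestamp, entity_id, label)
    groups: dict of group name -> set of entity IDs (the vertical axis)
    start, end: optional zoom window on the horizontal (time) axis
    """
    rows = {name: [] for name in groups}
    for ts, entity, label in events:
        # Time-based zooming: hide events outside the selected interval.
        if (start is not None and ts < start) or (end is not None and ts > end):
            continue
        for name, members in groups.items():
            if entity in members:
                rows[name].append((ts, label))
    # Chronological order along each row.
    return {name: sorted(row) for name, row in rows.items()}


if __name__ == "__main__":
    events = [
        (datetime(2002, 1, 5), "p1", "arrest"),
        (datetime(2002, 1, 1), "p2", "field interview"),
        (datetime(2002, 3, 1), "v1", "pawn"),
    ]
    groups = {"suspects": {"p1", "p2"}, "vehicles": {"v1"}}
    print(timeline_rows(events, groups, end=datetime(2002, 2, 1)))
```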
COPLINK Spatial-Temporal Visualization: GeoMapping Tool
• Plots location of incident events within a selected time interval
• Zooming/panning capabilities
• User-selectable GIS layers
• Overview map
– Provides context to the currently selected region
• Plot events over time
– Plot events as they occur, using different color shadings to indicate when each occurred relative to other events
– Plot events as they occur and remove them after they are over, using directed arrows to highlight movement from one event to the next in time
COPLINK Spatial-Temporal Visualization: Periodic Pattern Tool
• Reveals periodic patterns of incident occurrence
• Incident events will be plotted continuously on a circular graph
– Time period represented along circle (day, week, month, etc.)
– Height from center indicates number of incidents that occurred at that specific time
• Customizable granularity (e.g., year, month, day, etc.)
• 3-sigma statistical significance line
– Indicates unusually large or small number of occurrences at a specific time
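A minimal sketch of the periodic pattern computation: count incidents per slot of a repeating period (the radial heights on the circular graph) and flag slots outside mean ± 3σ. The slides do not say how σ is computed; this sketch assumes the population standard deviation of the slot counts, and the period names are illustrative.

```python
from collections import Counter
from datetime import datetime
from statistics import mean, pstdev

# Slot count and slot function for each supported granularity (assumed set).
PERIODS = {
    "hour": (24, lambda t: t.hour),
    "weekday": (7, lambda t: t.weekday()),
    "month": (12, lambda t: t.month - 1),
}


def periodic_profile(timestamps, period="hour"):
    """Return per-slot incident counts and the slots beyond the 3-sigma lines."""
    n_slots, slot_of = PERIODS[period]
    counts = Counter(slot_of(t) for t in timestamps)
    series = [counts.get(i, 0) for i in range(n_slots)]  # height per slot
    mu, sigma = mean(series), pstdev(series)
    upper, lower = mu + 3 * sigma, mu - 3 * sigma
    flags = {
        i: ("high" if c > upper else "low")
        for i, c in enumerate(series)
        if c > upper or c < lower
    }
    return series, flags


if __name__ == "__main__":
    # One incident in every hour except 2 a.m., which has 50.
    ts = [datetime(2003, 1, 1, h) for h in range(24) if h != 2]
    ts += [datetime(2003, 1, d % 28 + 1, 2) for d in range(50)]
    print(periodic_profile(ts, "hour")[1])
```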
COPLINK Data Mining Research
Deception Detection, a data mining approach
• “An agent must spell a suspect’s name exactly right, or the FBI computer will not recognize it. That can be particularly frustrating in cases such as the Sept. 11 probe, in which suspects have used multiple names and sometimes created identities by switching a few letters in their names.” – FBI
• FBI’s problem with 9/11 suspect names, e.g., “Majed M.GH Moqed,” “Majed Moqed,” and “Majed Mashaan Moqed,” and DOB, e.g., “01-01-1976” and “03-03-1976.”
• A deception taxonomy was created based on criminal deceptions in law enforcement databases
• Patterns existed in criminal deceptions, e.g., SSN variations, name variations, etc.
• Phonetic and syntactic string comparators were adopted
• Promising initial testing result: 94% accuracy in deception detection
G. Wang, H. Chen, H. Atabakhsh, “Automatically Detecting Deceptive Criminal Identities,” Communications of the ACM, forthcoming, 2002.
A Taxonomy for Deceptions in Criminal Identity
Criminal Identity Deception
• DOB Deception: birth year deception; birth month deception; birth date deception
• ID Deception
• Residency Deception: street type deception; street direction deception; street name deception; street number deception
• Name Deception: name exchange; partly missing names; similar pronunciation; abbreviation and add-on; changing middle initial; completely deceptive name; partly deceptive name
A Taxonomy of Deceptions in Criminal Identity: Name Deception
• Name Deception:
– Either false first name or false last name (62.5%)
– Only the middle initial is changed (62.5%)
– Similar pronunciation but different spelling (42%)
– A completely false name (29.2%)
– Using abbreviated names or adding extra letters (29.2%)
– Leaving out the first name or last name (29.2%)
– Exchanging last name and first name (8%)
A Taxonomy of Deceptions in Criminal Identity: DOB, SSN, Residency
• DOB and ID (SSN) deception:
– In most cases, criminals make only minor changes to DOB and SSN, e.g., 19700207 → 19700208
• Residency deception:
– 42% of the criminals in the collection deceived on address information. In most cases, only one portion of the address, e.g., the street number, is changed slightly.
String Comparators
• Phonetic Russell SoundEx code [Newcombe 1959]: encodes a name as a prefix letter followed by a three-digit number
– e.g., PEARCE and PIERCE are both coded as “P620”.
However, phonetic matching is particularly poor at finding matches [Zobel and Dart 1996].
• Spelling string comparator [Jaro 1976; Winkler 1990]
– compares spelling variations between two strings instead of phonetic codes
Limitation: common characters in both strings must be within half the length of the shorter string
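Both comparators can be sketched in a few lines. `soundex` follows the standard American Soundex rules, which reproduce the PEARCE/PIERCE example above. `jaro_winkler` uses the commonly cited matching window floor(max(len1, len2) / 2) - 1 and prefix scale 0.1; those are conventional choices assumed here, not parameters stated on the slides.

```python
# Standard American Soundex digit classes (vowels, H, W, Y have no digit).
SOUNDEX_CODES = {
    **dict.fromkeys("BFPV", "1"), **dict.fromkeys("CGJKQSXZ", "2"),
    **dict.fromkeys("DT", "3"), "L": "4", **dict.fromkeys("MN", "5"), "R": "6",
}


def soundex(name: str) -> str:
    """Prefix letter followed by three digits, e.g. PEARCE -> P620."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    code = letters[0]
    prev = SOUNDEX_CODES.get(letters[0], "")
    for c in letters[1:]:
        digit = SOUNDEX_CODES.get(c, "")
        if digit and digit != prev:  # drop adjacent duplicates
            code += digit
        if c not in "HW":  # H and W do not separate letters with the same code
            prev = digit
        if len(code) == 4:
            break
    return code.ljust(4, "0")


def jaro_winkler(s1: str, s2: str, p: float = 0.1) -> float:
    """Jaro similarity with the Winkler bonus for a shared prefix (max 4)."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if not len1 or not len2:
        return 0.0
    window = max(max(len1, len2) // 2 - 1, 0)
    used1, used2 = [False] * len1, [False] * len2
    matches = 0
    for i, c in enumerate(s1):  # common characters within the window
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not used2[j] and s2[j] == c:
                used1[i] = used2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    transpositions, k = 0, 0  # matched characters appearing out of order
    for i in range(len1):
        if used1[i]:
            while not used2[k]:
                k += 1
            transpositions += s1[i] != s2[k]
            k += 1
    t = transpositions // 2
    jaro = (matches / len1 + matches / len2 + (matches - t) / matches) / 3
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return jaro + prefix * p * (1 - jaro)


if __name__ == "__main__":
    print(soundex("PEARCE"), soundex("PIERCE"))
    print(round(jaro_winkler("MARTHA", "MARHTA"), 4))
```

The two return different kinds of evidence: Soundex only says whether two names sound alike, while Jaro-Winkler gives a graded spelling similarity, which is why the slides treat them as complementary.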
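Other Approximate String Matching Tool
• Agrep [Wu, Manber 1992]: A general string matching algorithm that can handle character variations of insertion, deletion, and substitution.
• The pattern is represented as a bit array. The computation involves only simple bit operations (RightShift) and logic operations (AND, OR) on bit arrays:
$$R^{d}_{j+1} = \mathrm{Rshift}[R^{d}_{j}]\ \mathrm{AND}\ S_{c}\ \mathrm{OR}\ \mathrm{Rshift}[R^{d-1}_{j}\ \mathrm{OR}\ R^{d-1}_{j+1}]\ \mathrm{OR}\ R^{d-1}_{j}$$
where R^d_j is the bit array after reading j text characters allowing at most d errors, and S_c is the bit mask of the next text character c.
• Agrep has been integrated into Unix and has been in wide use since June 1991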
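A sketch of the bit-parallel idea behind agrep (the Wu-Manber extension of Shift-And). It uses the convention that a set bit means "this pattern prefix matched", so Rshift becomes a left shift that brings in a 1; otherwise it mirrors the recurrence above term by term (match, insertion, substitution/deletion). It is an illustration, not agrep's source code.

```python
def agrep_search(text: str, pattern: str, k: int) -> list:
    """End positions in text where pattern matches with at most k errors
    (insertions, deletions, substitutions), using bit-parallel Shift-And."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    m = len(pattern)
    full = (1 << m) - 1
    accept = 1 << (m - 1)
    masks = {}  # S_c: bit i set when pattern[i] == c
    for i, ch in enumerate(pattern):
        masks[ch] = masks.get(ch, 0) | (1 << i)
    # R[d] bit i set <=> pattern[:i+1] matches a suffix of the text read so far
    # with at most d errors; initially up to d leading pattern chars are deleted.
    R = [((1 << d) - 1) & full for d in range(k + 1)]
    ends = []
    for pos, c in enumerate(text):
        s = masks.get(c, 0)
        prev_old = R[0]  # R^{d-1}_j for the next row
        R[0] = ((prev_old << 1) | 1) & s  # exact matching
        for d in range(1, k + 1):
            cur_old = R[d]
            R[d] = (
                (((cur_old << 1) | 1) & s)  # Rshift[R^d_j] AND S_c
                | (((prev_old | R[d - 1]) << 1) | 1)  # substitution / deletion
                | prev_old  # insertion: R^{d-1}_j
            ) & full
            prev_old = cur_old
        if R[k] & accept:
            ends.append(pos)
    return ends


if __name__ == "__main__":
    print(agrep_search("Majed Moqed", "Moqad", 1))
```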
Algorithm Design
• Compare corresponding fields of each pair of records (disagreement): S_name, S_DOB, S_addr, and S_ID
• To capture different types of name deceptions (+ denotes concatenation):
$$S_{name}(name_1, name_2) = \min\{\mathrm{agrep}(last_1 + first_1,\ last_2 + first_2),\ \mathrm{agrep}(last_1 + first_1,\ first_2 + last_2),\ \mathrm{SoundEx}(last_1 + first_1,\ last_2 + first_2),\ \mathrm{SoundEx}(last_1 + first_1,\ first_2 + last_2)\}$$
• Calculate the normalized Euclidean distance for the overall dissimilarity between two records, i.e.,
$$\mathrm{Disagreement} = \sqrt{\frac{S_{name}^2 + S_{DOB}^2 + S_{addr}^2 + S_{ID}^2}{4}}$$
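A hedged sketch of the record comparison. Assumptions: each field disagreement is a normalized Levenshtein distance in [0, 1], standing in for the agrep-based scores; the SoundEx terms of S_name are omitted for brevity; the record keys (`last`, `first`, `dob`, `addr`, `id`) are hypothetical; and a pair is treated as one identity when its disagreement is at or below the 0.48 threshold reported in the training results.

```python
import math


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertion, deletion, and substitution."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def field_disagreement(a: str, b: str) -> float:
    """Normalized edit distance in [0, 1] (stand-in for the agrep score)."""
    if not a and not b:
        return 0.0
    return levenshtein(a, b) / max(len(a), len(b))


def name_disagreement(last1, first1, last2, first2) -> float:
    # Minimum over both name orderings tolerates first/last name exchange.
    return min(field_disagreement(last1 + first1, last2 + first2),
               field_disagreement(last1 + first1, first2 + last2))


def record_disagreement(r1: dict, r2: dict) -> float:
    """Normalized Euclidean distance over the four field disagreements."""
    s_name = name_disagreement(r1["last"], r1["first"], r2["last"], r2["first"])
    s_dob = field_disagreement(r1["dob"], r2["dob"])
    s_addr = field_disagreement(r1["addr"], r2["addr"])
    s_id = field_disagreement(r1["id"], r2["id"])
    return math.sqrt((s_name**2 + s_dob**2 + s_addr**2 + s_id**2) / 4)


THRESHOLD = 0.48  # best training threshold on the slides; "<=" is assumed


def same_identity(r1: dict, r2: dict) -> bool:
    return record_disagreement(r1, r2) <= THRESHOLD
```

Note that a single completely different field contributes at most sqrt(1/4) = 0.5 on its own, so with a 0.48 threshold one wholly false field is enough to separate two otherwise identical records under this sketch.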
Experimental Results (Training: 80 cases)
Disagreement Value | R1 | R2 | R3 | R4 | R5 | R6 | R7 | R8 | … (up to R80)
R1 | * | 0.53 | 0.71 | 0.67 | 0.54 | 0.65 | 0.63 | 0.62 | …
R2 | 0.53 | * | 0.66 | 0.71 | 0.64 | 0.73 | 0.58 | 0.67 | …
R3 | 0.71 | 0.66 | * | 0.62 | 0.7 | 0.64 | 0.7 | 0.68 | …
R4 | 0.67 | 0.71 | 0.62 | * | 0.67 | 0.68 | 0.67 | 0.66 | …
R5 | 0.54 | 0.62 | 0.72 | 0.65 | * | 0.73 | 0.67 | 0.58 | …
R6 | 0.65 | 0.73 | 0.66 | 0.68 | 0.73 | * | 0.7 | 0.64 | …
R7 | 0.63 | 0.6 | 0.68 | 0.67 | 0.67 | 0.7 | * | 0.69 | …
R8 | 0.62 | 0.67 | 0.68 | 0.66 | 0.58 | 0.64 | 0.69 | * | …
… (up to R80) | … | … | … | … | … | … | … | … | *
Table: Distance matrix; each value shows the degree of disagreement between a pair of records in the training data set.
Experimental Results (Training: 80 cases)
Threshold | Accuracy | False Negative | False Positive
0.4 | 76.60% | 23.40% | 0.00%
0.45 | 92.20% | 7.80% | 0.00%
0.46 | 93.50% | 6.50% | 2.60%
0.47 | 96.10% | 3.90% | 2.60%
0.48 | 97.40% | 2.60% | 2.60%
0.49 | 97.40% | 2.60% | 6.50%
0.5 | 97.40% | 2.60% | 11.70%
Table: Determining the best threshold value (0.48)
Figure: Training result: accuracy, false-negative rate, and false-positive rate plotted against the threshold (0.35 to 0.55)
Experimental Results (Testing: 40 cases)
Threshold | Accuracy | False Negative | False Positive
0.48 | 94.0% | 0.0% | 0.0%
Table: Accuracy of deception detection when the best threshold value (0.48) is applied to the testing data set (40 records)
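The threshold selection can be sketched as a sweep over labeled record pairs. In the training table, accuracy plus false-negative rate equals 100% at every threshold, so this sketch assumes accuracy means the share of true same-person pairs detected. The tie-break by lowest false-positive rate is also an assumption, chosen because it reproduces the preference for 0.48 over 0.49 and 0.5.

```python
def sweep_thresholds(pairs, thresholds):
    """Evaluate candidate thresholds on labeled record pairs.

    pairs: list of (disagreement, is_same_person).
    A pair is flagged as one (deceptive) identity when disagreement <= t.
    Returns {t: (accuracy, false_negative_rate, false_positive_rate)}, where
    accuracy = share of true same-person pairs detected (assumed definition).
    """
    positives = [d for d, same in pairs if same]
    negatives = [d for d, same in pairs if not same]
    results = {}
    for t in thresholds:
        detected = sum(d <= t for d in positives) / len(positives) if positives else 0.0
        false_pos = sum(d <= t for d in negatives) / len(negatives) if negatives else 0.0
        results[t] = (detected, 1.0 - detected, false_pos)
    return results


def best_threshold(results):
    """Highest accuracy; ties broken by the lowest false-positive rate."""
    return max(results, key=lambda t: (results[t][0], -results[t][2]))


if __name__ == "__main__":
    toy = [(0.30, True), (0.47, True), (0.52, True), (0.46, False), (0.70, False)]
    res = sweep_thresholds(toy, [0.45, 0.48, 0.55])
    for t, (acc, fn, fp) in res.items():
        print(f"{t:.2f}  acc={acc:.2f}  FN={fn:.2f}  FP={fp:.2f}")
    print("best:", best_threshold(res))
```

The toy pairs are invented for illustration only; they are not the 80 training cases.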
COPLINK Agent Research
COPLINK Agent: alert and collaboration in a wireless
architecture
• Enhance police information timeliness, collaboration,
mobility, and safety via a web-based wireless alerting
system (under testing at TPD)
• Real-time alert of time-critical information from multiple
databases, e.g., CAD (computer-aided dispatching)
database, MVD
• Identify and inform officers/detectives who are working on
similar cases
• Push time-critical information via wireless and
personalized communications, i.e., web alert, email, cell
phone, and pager
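A minimal, hypothetical sketch of the alert-and-collaboration loop: officers register watch lists (plates, names) for their cases, each new record from a monitored database is matched against them, and matches are pushed through per-channel senders. The class names, channel names, and record format are illustrative assumptions; the slides do not describe COPLINK Agent's internals.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class Subscription:
    """An officer's watch list for one case (hypothetical structure)."""
    officer: str
    case_no: str
    watched: Set[str]
    channels: List[str] = field(default_factory=lambda: ["web"])


class AlertAgent:
    def __init__(self, senders: Dict[str, Callable[[str, str], None]]):
        self.senders = senders  # channel name -> send(officer, message)
        self.subs: List[Subscription] = []

    def subscribe(self, sub: Subscription) -> None:
        self.subs.append(sub)

    def on_record(self, source: str, record: dict) -> List[str]:
        """Match one new record (e.g., from CAD) against all watch lists."""
        values = {str(v) for v in record.values()}
        notified = []
        for sub in self.subs:
            hits = sub.watched & values
            if not hits:
                continue
            message = f"[{source}] case {sub.case_no}: match on {', '.join(sorted(hits))}"
            for channel in sub.channels:  # e.g. web, email, cell phone, pager
                self.senders[channel](sub.officer, message)
            notified.append(sub.officer)
        return notified

    def collaborators(self, officer: str) -> Set[str]:
        """Other officers whose watch lists overlap this officer's cases."""
        mine = set().union(*(s.watched for s in self.subs if s.officer == officer))
        return {s.officer for s in self.subs if s.officer != officer and s.watched & mine}
```

In a deployment the senders would wrap the actual delivery mechanisms; here they are plain callables so the matching logic can be exercised on its own.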
COPLINK Agent: Wireless Alert and Collaboration
• Allows Patrol Officers to enhance their community expertise
• Further promotes Officer safety through curbside knowledge
• Secure wireless access and alert: laptop, PDA, pager, cell phone
• Alert: 24-7 monitoring of time-critical information from different databases
• Collaboration: Automatically informing detectives working on similar cases
COPLINK Agent: Vehicle Search Form
Multi-DB Search
Notification setting
Alert Method
COPLINK Agent: Web and E-mail Collaboration
Alerts
Web Alert
Email Alert
COPLINK Agent: Cell Phone and Pager Alert
Cell phone alert
Pager alert with case number
Agent User Study and Result Summary
• Study Design:
– Case study method based on structured interviews, archival records analysis, and usability survey.
– Used the QUIS (Questionnaire for User Interaction Satisfaction) survey instrument developed by the HCI Lab at the U. of Maryland.
– 10 participants: crime analysts and detectives in several TPD units.
• Positive feedback on system effectiveness and efficiency:
– Monitoring: “… the information I have received back was instrumental in making at least 2 felony cases that will be prosecuted on the federal level.”
– Collaboration from CAD Alert: “… allowing us to respond to incidents we know are important that the field units perhaps don’t realize in a timely manner.”
– Multi-database Search: “The Tucson City Court Search was helpful because I located one of my suspects on her court date.”
• High user satisfaction from QUIS survey items:
– Averaged 5.5 for 49 items on a 7-point Likert scale (7: most useful).
– Strengths: good investigative power; easy-to-read layout; potential for collaborative information sharing; CAD integration; high intention to use.
– Weaknesses: lack of help messages; difficult for inexperienced users; obscure user preference settings.
Arizona Daily Star, Jan 7, 2001
New York Times, Nov 2, 2002
Newsweek, March 3, 2003
Interacting with the LE Community
• User-centered design (2 officers assigned to project);
frequent, focused, staged user studies (a user study
team); quick prototyping and user feedback
(quarterly)
• TPD user briefings: 30+ user groups and
management demos/briefings (2 chiefs, 7 assistant
chiefs)
• Arizona/regional partner briefings: 30+ regional
partners demos/meetings; Phoenix, Pima, etc.
• Annual COPLINK Center research workshop, under
NSF Digital Government Program
• National/regional NIJ/DOJ and LE meetings: 20+ LE
IT meetings; International Association of Chiefs of
Police (IACP) meetings
• Regional deployment and success: Arizona, TX,
Iowa, Michigan, Boston, Alaska, CA, etc.
COPLINK Lessons Learned
• Know their pain and build something they can use → what street cops need.
• Build trust and know the culture → security, policy, training, user acceptance (build a Living Lab)
• Early and consistent user involvement → 2 TPD officers, 7 asst. chiefs, 2 chiefs
• Create early and small successes → Detect/Connect, group to division and department
• Spread the success and solicit partners → Tucson, AZ, CA, TX, MA, Montgomery County (MD), Alaska, etc.
• Understand funding agencies’ expectations → NIJ (tools), NSF (research)
• Development and research prioritization → research (Ph.D.) after development (MS/BS); little cutting-edge research in the first two years
• Establish deployment partners → KCC; diff(operational system, research prototype) = $2M
• Work with university technology transfer office → office of (preventing) technology transfer?
Outline
COPLINK Visual Criminal Network
Analysis (CNA) Research: BorderSafe,
and Dark Web Terrorism Research
Research Approach
• Testbed and community-grounded algorithm, toolkit and system research and development
• Advanced visual criminal network analysis and knowledge mapping research and technologies
BorderSafe: Research Objectives
• Participate in DHS BorderSafe IFE Experiment, in partnership with CNRI, ARJIS (SD), SDSC, TPD, and AZ DHS
• Develop (1) border-crosser and border-crossing vehicle analysis techniques, by (2) leveraging local law enforcement and local DHS data, and (3) using COPLINK crime analysis abilities
• Advance visual criminal network analysis (CNA) and knowledge mapping research and technologies (e.g., terrorism, terrorist, terrorized)
Current Capability: Criminal Network
Analysis (LE and Intelligence Community)
• First generation — manual approach
– Anacapa Chart (Harper & Harris, 1975)
• Second generation — graphics-based
approach
– Analyst’s Notebook, Netmap, Watson
– COPLINK hyperbolic tree view, network view
• Third generation — structural analysis
approach
Anacapa Chart (1st generation)
Association
Matrix
• Manually extract criminal associations
from data files
• Construct an association matrix and draw
a link chart based on the association matrix
Link chart
Analyst’s Notebook, Netmap, Watson (2nd generation)
Analyst’s Notebook.
Network nodes are
automatically arranged for
easy interpretation.
Source: i2, Inc.
Netmap.
Different colors
are used to
represent
different entity
types.
Source: Netmap
Analytics, LLC.
Watson.
Relations among a group of people
(the central sphere) based on telephone
records. Source: Xanalysis, Ltd.
A 9/11 Terrorist Network: centrality, cliques, typology…
BorderSafe Visual Criminal Network Analysis (CNA) Design
Figure: Criminal-justice Data → Network Creation (Concept Space) → Networked Data → Network Partition (Hierarchical Clustering) → Structural Analysis (Centrality Measures, Blockmodeling) → Network Visualization (MDS)
J. Xu and H. Chen, “Criminal Network Analysis and Visualization: A Data Mining Perspective,” Communications of the ACM, 2004, forthcoming.
Visual CNA: Network Display
Nodes represent individual
criminals labeled by their
names
Links represent relationships
between criminals
Adjust the slider to perform
clustering and blockmodeling
Visual CNA: Subgroup Display
The reduced star structure
found using blockmodeling
• Circles represent groups.
• The size of a circle is
proportional to the number
of group members.
• Each group is labeled by
its leader’s name.
Visual CNA: Member Ranking
The rankings of each group member
in terms of centrality measures
The top-ranked member in each column is the leader, gatekeeper, and outlier, respectively
The inner structure of a selected
group
Adjust the slider to do further
blockmodeling
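The member ranking can be sketched with standard centrality measures. The role mapping used here (leader = highest degree, gatekeeper = highest betweenness, outlier = lowest closeness) is a common social network analysis convention assumed for illustration; betweenness uses Brandes' algorithm for unweighted, undirected graphs, and closeness assumes a connected network.

```python
from collections import deque


def graph_from_edges(edges):
    """Build an undirected adjacency map from (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def centralities(adj):
    """Degree, betweenness (Brandes), and closeness for an unweighted graph."""
    nodes = list(adj)
    degree = {v: len(adj[v]) for v in nodes}
    betweenness = dict.fromkeys(nodes, 0.0)
    closeness = {}
    for s in nodes:
        # BFS from s, counting shortest paths (sigma) and recording predecessors.
        dist = {s: 0}
        sigma = dict.fromkeys(nodes, 0)
        sigma[s] = 1
        preds = {v: [] for v in nodes}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate pair dependencies in reverse BFS order.
        delta = dict.fromkeys(nodes, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                betweenness[w] += delta[w]
        farness = sum(dist.values())
        closeness[s] = (len(dist) - 1) / farness if farness else 0.0
    # Each undirected shortest path was counted once from each endpoint.
    betweenness = {v: b / 2 for v, b in betweenness.items()}
    return degree, betweenness, closeness


def roles(adj):
    """Assumed role mapping: leader, gatekeeper, outlier."""
    degree, betweenness, closeness = centralities(adj)
    return {
        "leader": max(degree, key=degree.get),
        "gatekeeper": max(betweenness, key=betweenness.get),
        "outlier": min(closeness, key=closeness.get),
    }
```

For example, two tightly knit groups joined only through one intermediary give that intermediary the highest betweenness even if its degree is low, which is the "gatekeeper" pattern the ranking view is meant to expose.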
Meth World: Subgroup Verification
• Subgroups detected have different characteristics: the subgroups found are consistent with the groups’ specializations or responsibilities in a network
White gang members
who were involved in
assaults and murders
White gang members
who were involved in
crack cocaine
Drug dealers
Offenders who were
responsible for stealing,
counterfeiting, and cashing
checks and providing money
to other groups to carry out
drug transactions
Visual CNA: Network Structure
A chain structure found in a
60-member network
using blockmodel analysis
Temporal CNA: The Evolution of Meth World
The network in Year:
1995, 1996, 1998,
1999, 2002
Cross-Jurisdictional CNA: The
Extended Meth World (TPD & PPD)
• Highlighted (red) nodes represent criminals who appear in both TPD and
PPD databases
Tucson
Phoenix
Customs and Border Protection (CBP) Border
Crossing Information
• CBP has provided the Border Safe project with license plate
numbers seen crossing the border. These can be integrated with
local data to enhance the analysis.
• Video equipment automatically extracts license plate numbers of cars as they cross into or out of the country at the Douglas, AZ port of entry.
1,125,155 Records: plate, state, date, time
226,207 Distinct vehicles
209 Days of information over an 18 month period
130,195 Plates issued in AZ
5,546 Plates issued in CA
90,466 Plates issued in Mexico
Border Crossing Records and TPD
• Many of the vehicles found in the CBP data also show activity in
the TPD database.
• The fact that a vehicle frequently crosses the border is of
interest in criminal investigations.
• The TPD data provides a link between license plates and
criminal activity networks.
8,300 distinct vehicles appear in both datasets
34,632 recorded crossings involve those vehicles
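A minimal sketch of linking border-crossing records to local data: count crossings per (plate, state) and keep the vehicles that also appear in the local RMS, most frequent crossers first. The tuple layouts and the `min_crossings` filter are assumptions for illustration.

```python
from collections import Counter


def frequent_crossers(crossings, local_plates, min_crossings=1):
    """Rank vehicles seen both at the border and in local records.

    crossings: iterable of (plate, state, date, time) border-crossing records
    local_plates: iterable of (plate, state) pairs found in local RMS data
    Returns [(plate, state, crossing_count)], most frequent crossers first.
    """
    local = {(p.upper(), s.upper()) for p, s in local_plates}
    counts = Counter((plate.upper(), state.upper()) for plate, state, _date, _time in crossings)
    shared = [(p, s, n) for (p, s), n in counts.items()
              if (p, s) in local and n >= min_crossings]
    return sorted(shared, key=lambda row: (-row[2], row[0], row[1]))
```

With the full datasets, `len(result)` would correspond to the count of distinct vehicles in both datasets and `sum(n for _, _, n in result)` to the number of crossings involving them, the two figures reported above.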
A Vehicle to Watch?
This network contains 5 border crossing plates (outlined in red). The
large green dots were confirmed to be criminals of significant interest.
Shape Indicates Object Type
circles are people
rectangles are vehicles
Color Denotes Activity History
Gang related
Violent crimes
Narcotics crimes
Violent & Narcotics
Larger Size Indicates higher
levels of activity
Border Crossing Plates are
outlined in Red
A Vehicle to Watch?
Plate ABC-123
- Crossed border 35 times.
- No prior Narcotics associations
“Jane”
- Associated with Vehicle
- No known Narcotics activity
“Joe”
- Related with Jane and vehicle
in ‘Suspicious Activity’ report
- Some prior narcotics activity
“Bob”
- Related with Joe in Narcotics
- Involved in 11 narcotics incidents
- Connected to a big narcotics
network
Truncated version of previous network
Name Removed
People / Vehicles previously
never linked to narcotics
can be identified using such
Networks to focus and
support investigations.
From COPLINK to BorderSafe to Terrorism Knowledge Portal
• Terrorism: Identify key terrorism literature, resources, and experts (web portal, meta searching, citation network analysis, knowledge maps, expert finder)
• Terrorist: Understand how the terrorist groups are revealed on the web and how they use the web (Dark Web, web spidering and mining, back-link analysis, terrorist network analysis, multilingual entity and event extraction)
• Terrorized: Assist citizens and victims responding to terrorism (pattern-based chatbots, system assessment, victim consultation and resources, scalable anonymous robot assistance)
• (In collaboration with Drs. Reid, Sageman and Levine and Sandia National Lab)
Figure: The Web and the Dark Web (Hate Groups | Racial Supremacy | Suicidal Attackers | Activists / Extremists | Anti-Government)
• Information sources: terrorist group web sites, search engines, terrorism databases, government information
• Collection methods: automatic spidering, meta searching, back-link search, personal profile search, downloading from government web sites
• Filtering and data storage: International Terrorism, Domestic Terrorism, Terrorism research information
Sageman’s Global Salafi Jihad (GSJ)
Data
• Data collected and cross-validated from open sources
regarding 172 GSJ members (Dr. Sageman, U. Penn)
• Background
– From upper or middle class (3/4)
– Average age is 26
– Affiliation through friendship, kinship, discipleship, and
worship
• Four clusters (based on geographical distribution): lieutenants
and network structures
– Central Staff: Osama bin Laden
– Core Arabs: Khalid Sheikh Mohammed
– Maghreb Arabs: Zain al Abidin Mohd Hussein
– Indonesians: Abu Bakar Baasyir
Jihad CNA: “Combined” = “Link to GSJ” + “Operational” + “Family” (107 nodes)
Figure labels: Maghreb Arabs; Core Arabs; Osama bin Laden; Indonesians; annotations mark scale-free networks, cliques, and a hierarchical network
Jihad CNA: 9/11 Hijackers in “Combined” Network → Monitor New Open Sources
Figure label: Mohamed Atta
Outline
Developing the Science of Intelligence and
Security Informatics (ISI)
Develop the Science of Intelligence and
Security Informatics (ISI)
• ISI: The study of the use and development of advanced information technologies, systems, algorithms, and databases for national security related applications, through an integrated technological, organizational, and policy based approach → similar to “Biomedical Informatics” (information centric)
• National Security is a long-term mission → need to develop a long-term research agenda and partnership (researchers, practitioners, policy makers, industries, law enforcement and intelligence professionals, etc.)
• A bottom-up, success-driven approach → from selected demonstration sites, to regional partnership, and then to national deployment. Build small successes first.
Building an ISI Community
• Federal funding priority: Building community-based
“Living Labs”
• Many disparate LE, intelligence, and industry meetings
(vendor driven)
• Academic ISI special issues: JASIST, DSS, and ACM
TOIT, forthcoming, 2004
• IEEE Intelligence and Security Informatics Conference:
Sponsored by NSF, NIJ, CIA, and DHS, 2003 (Tucson),
2004 (Tucson), 2005 (Atlanta), 2006 (San Diego), 2007
(NJ), 2008 (Taiwan)
For project information:
http://ai.arizona.edu/COPLINK