Data Mining Customer-Related Subway Incidents

Data Mining Customer & Employee-Related Subway Incidents: Phase II
David Budet
Mariel Castro
Jason Jaworski
Yevgeny Khait
Florangel Marte
Client: Richard Washington, NYC Transit Authority
Presentation Summary
• Project Description
• Review
• Progression
• City Crime vs. Subway Crime
• Results: Customer Assaults
• Results: Employee Assaults
• Results: Robberies (Simple Theft)
• Results: Train Delays
• Weka ID3 Decision Trees
• Future Research Avenues
Project Description
• Phase I examined incidents to identify reasons for aggression, specifically the effect of delays on aggression incidents
• Phase II concentrates more specifically on subway assaults and possible correlations with the data's attributes
• Main focus of both phases: analyzing a dataset of incidents that occurred in the New York City Subway system over multiple years, and mining the data to establish relationships and trends
Review
The first half of the study focused on mining data with Microsoft SQL Server 2008 and Weka. Using these tools and team methodologies, we determined which stations and train lines had the most:
• Violent assaults against customers and employees
• Delays
• Simple thefts (unarmed robberies, pick-pocketing, etc.)
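The ranking described above amounts to a grouped count per station or per line. The sketch below uses Python's built-in sqlite3 module as a stand-in for SQL Server; the table name, column names, category labels, and every row are hypothetical placeholders, not the NYC Transit data or the team's actual queries.

```python
import sqlite3

# Hypothetical schema and made-up rows; the real dataset is not shown here.
ROWS = [
    ("Station A", "A", "customer_assault"),
    ("Station A", "2", "customer_assault"),
    ("Station B", "A", "delay"),
    ("Station B", "4", "customer_assault"),
    ("Station A", "A", "simple_theft"),
    ("Station C", "2", "delay"),
]


def top_by(column, category, limit=3):
    """Return (value, count) pairs for one incident category, most frequent first."""
    # Column names cannot be bound as SQL parameters, so check against a whitelist.
    if column not in ("station", "line"):
        raise ValueError(f"unsupported column: {column}")
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE incidents (station TEXT, line TEXT, category TEXT)")
    conn.executemany("INSERT INTO incidents VALUES (?, ?, ?)", ROWS)
    query = (
        f"SELECT {column}, COUNT(*) AS n FROM incidents "
        f"WHERE category = ? GROUP BY {column} "
        f"ORDER BY n DESC, {column} LIMIT ?"
    )
    result = conn.execute(query, (category, limit)).fetchall()
    conn.close()
    return result


print(top_by("station", "customer_assault"))
```

Ties are broken alphabetically here only to keep the output stable; that choice is an assumption of the sketch.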
Progression
The second half of the study had a more regional focus. The team:
• Acquired US Census data regarding crime and population in NYC
• Normalized the Census crime data and subway crime data by population for Manhattan, Brooklyn, Queens and the Bronx
• Analyzed Subway crime as a microcosm of overall NYC crime for 2007
• Created an interactive JavaScript map pinpointing the stations with the most violent incidents and delays
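Normalizing by population, as in the second step above, can be read as converting raw counts into a rate per resident. A minimal sketch follows, assuming a per-100,000-residents scale (the slides do not state the scale used); the borough names and figures are placeholders, not Census or NYC Transit numbers.

```python
def rate_per_100k(incidents, population):
    """Incidents per 100,000 residents."""
    if population <= 0:
        raise ValueError("population must be positive")
    return incidents * 100_000 / population


# Placeholder figures for illustration only: (incident count, population).
sample = {
    "Borough X": (500, 1_000_000),
    "Borough Y": (300, 400_000),
}

rates = {name: rate_per_100k(count, pop) for name, (count, pop) in sample.items()}
ranked = sorted(rates, key=rates.get, reverse=True)

print(rates)
print(ranked)
```

In this made-up example the borough with fewer incidents ranks first once population is taken into account, which is the kind of effect normalization is meant to expose.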
City Crime vs. Subway Crime
Comparing overall crime in New York City for 2007 with crime in the NYC Subway system, we found that:
• Manhattan, though the third-largest borough by population, accounted for over half the crime in NYC
• The Bronx had the smallest population of the four boroughs researched but the second-highest crime rate per resident
• Subway crime accounts for a smaller share of overall crime in Manhattan than in the other three boroughs researched
When normalized for population, subway crime in Brooklyn and Queens accounts for a greater percentage of overall crime than in Manhattan and the Bronx, signaling that these two boroughs may have more dangerous or incident-prone stations than Manhattan or the Bronx.
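One possible reading of the comparison above is a simple share: subway incidents as a percentage of all incidents in a borough. The sketch below assumes that reading; the function name and the counts are placeholders, not the study's figures.

```python
def subway_share(subway_incidents, total_incidents):
    """Subway incidents as a percentage of all incidents in a borough."""
    if total_incidents <= 0:
        raise ValueError("total_incidents must be positive")
    return 100 * subway_incidents / total_incidents


# Placeholder counts for illustration only: (subway incidents, all incidents).
sample = {
    "Borough X": (25, 1000),
    "Borough Y": (90, 1200),
}

shares = {name: subway_share(sub, total) for name, (sub, total) in sample.items()}
print(shares)
```

If both counts for a borough are divided by the same population before the ratio is taken, the population cancels, so under this reading the share can be computed directly from raw counts.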
Findings: Customer Assaults
The stations with the most assaults (all types of assault) against customers
from 2005 to 2007 were 59th Street, 14th Street and 125th Street.
Between 2005 & 2007, the highest number of assaults (all types)
committed against customers took place on the A, 2 and 4 lines.
Findings: Employee Assaults
[Chart: stations with more than 5 total assaults (all types of assault) against employees, 2005 to 2007]
Between 2005 & 2007, the highest number of assaults (all types)
committed against employees took place on the 6, 2 and A lines.
Findings: Robberies (Simple Theft)
Findings: Train Delays
[Chart: number of delays by month over the three-year period]
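A monthly delay count like the one charted on this slide can be produced by grouping delay records on year and month. This is a minimal sketch; the dates are invented for illustration and the function name is an assumption.

```python
from collections import Counter
from datetime import date


def delays_by_month(delay_dates):
    """Count delays per (year, month), returned in chronological order."""
    counts = Counter((d.year, d.month) for d in delay_dates)
    return sorted(counts.items())


# Made-up delay dates for illustration only.
sample = [
    date(2005, 1, 3),
    date(2005, 1, 17),
    date(2005, 2, 1),
    date(2006, 1, 9),
]

for (year, month), n in delays_by_month(sample):
    print(f"{year}-{month:02d}: {n}")
```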
Findings: Train Delays
Findings: Train Delays
Weka ID3 Decision Tree
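ID3 builds a decision tree by repeatedly splitting on the attribute with the highest information gain. The sketch below shows only that selection step in plain Python; it is not Weka's implementation, and the attribute names and rows are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Reduction in entropy of `target` after splitting `rows` on `attribute`."""
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r[target] for r in rows if r[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder


def best_split(rows, attributes, target):
    """Attribute ID3 would split on first: the one with the highest gain."""
    return max(attributes, key=lambda a: information_gain(rows, a, target))


# Hypothetical incident records for illustration only.
rows = [
    {"time_of_day": "night", "line": "A", "assault": "yes"},
    {"time_of_day": "night", "line": "2", "assault": "yes"},
    {"time_of_day": "day", "line": "A", "assault": "no"},
    {"time_of_day": "day", "line": "2", "assault": "no"},
]

print(best_split(rows, ["time_of_day", "line"], "assault"))
```

A full ID3 tree would apply the same selection recursively to each subset until the labels in a subset are all the same or no attributes remain.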
Future Research Avenues
• MTA and project team can separately mine an identical data set and introduce an objective methodology for determining the best results and techniques from both databases
• Continue in-depth data mining
• Identify and research other algorithms in Weka conducive to mining and correlating NYC Subway data (we propose that the next team use clustering analysis via the SimpleKMeans algorithm)
• Investigate possible correlations between neighborhood income levels and stations where subway crime is prevalent
• Continue to expand and build on the JavaScript map
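The clustering proposal above could start from a plain k-means loop such as the one below. It is a sketch of the general technique, not Weka's SimpleKMeans: centroids are seeded from the first k points for repeatability (an assumption of this sketch), and the station features, for example assault and delay counts per station, are invented.

```python
def kmeans(points, k, iterations=20):
    """Basic k-means; returns (centroids, cluster index for each point)."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    # Assumption: seed centroids with the first k points so runs are repeatable.
    centroids = [tuple(p) for p in points[:k]]
    assignment = []
    for _ in range(iterations):
        # Assign each point to its nearest centroid (squared Euclidean distance).
        assignment = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Move each centroid to the mean of its members; keep it if it has none.
        new_centroids = []
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                new_centroids.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, assignment


# Hypothetical (assaults, delays) counts per station, for illustration only.
stations = [(1, 1), (9, 9), (1, 2), (9, 8), (2, 1), (8, 9)]
centroids, assignment = kmeans(stations, k=2)
print(assignment)
```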