Current Research in Forensic Toolmark Analysis


Current Research in Forensic
Toolmark Analysis
Petraco Group
Outline
• Introduction
• Instruments for 3D toolmark analysis
• 3D toolmark data/features
• The statistics:
  o Identification error rates
  o "Match" confidence
  o "Match" probability from Empirical Bayes
  o "Match" probability from CMS data and Bayesian networks
Quantitative Criminalistics
• All forms of physical evidence can be represented as numerical patterns
  o Toolmark surfaces
  o Dust and soil categories and spectra
  o Hair/fiber categories and spectra
  o Triangulated fingerprint minutiae
• Machine learning trains a computer to recognize patterns
  o Can give "…the quantitative difference between an identification and non-identification" (Moran)
  o Can yield identification error rate estimates
  o May even give confidence measures for I.D.s
Data Acquisition For Toolmarks
• Confocal microscope
• Scanning electron microscope
• Focus variation microscope
• Comparison microscope
Screwdriver Striation Patterns in Lead
• 2D profiles
• 3D surfaces (interactive)

9mm Glock Fired Cartridge Cases
• Bottom of firing pin impression

Bullets
• Bullet base, 9mm Ruger barrel
• Bullet base, 9mm Glock barrel
• Close up: land engraved areas
What can we do with all this microscope data?
• Statistical pattern comparison!
• Modern algorithms are called machine learning
• Idea is to measure features that characterize physical evidence
• Train the algorithm to recognize "major" differences between groups of features while taking into account natural variation and measurement error
Good Features are the Key!
• We need a toolmark feature set that is:
  o Large in number
  o (possibly) translationally invariant
  o (possibly) rotationally invariant
  o Mostly statistically independent
  o DISCRIMINATORY!
Toolmark Features
Mean total profile
Mean "waviness" profile
Mean "roughness" profile

Aperture primer shear on a 9mm cartridge case fired from a Glock 19
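The total/waviness/roughness split above can be sketched in code. This is a minimal illustration only: it assumes a simple centered moving average as the low-pass filter (real surface-metrology pipelines typically use standardized Gaussian or wavelet filters), and the profile data are synthetic.

```python
# Sketch: split a 1-D surface profile into a "waviness" (low-frequency)
# part and a "roughness" (high-frequency) residual.
# Assumption: a centered moving average stands in for the low-pass filter.
import math

def moving_average(profile, window):
    """Centered moving average; edges use a shrunken window."""
    n = len(profile)
    half = window // 2
    out = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        out.append(sum(profile[lo:hi]) / (hi - lo))
    return out

def decompose(profile, window=25):
    """Return (waviness, roughness) with profile = waviness + roughness."""
    waviness = moving_average(profile, window)
    roughness = [p - w for p, w in zip(profile, waviness)]
    return waviness, roughness

# Toy profile: a slow wave plus fine texture (synthetic, not real data)
profile = [math.sin(i / 50.0) + 0.1 * math.sin(i / 2.0) for i in range(1000)]
wav, rough = decompose(profile)
```

By construction the two parts sum back to the mean total profile, mirroring the three panels on the slide.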
Take a representative for a group of toolmarks made by the same tool as a "chapter" in a "dictionary".
A toolmark is a "word" in a chapter.
A line is a "letter" in a "word".
Form the CMS-space
• Find the best matching "word" in the query to each "dictionary word"
• The similarity metric is arbitrary; we use a mix
• Database/queries are "words" in the Biasotti-Murdock "dictionary"
• The process produces a registration-free (translation/rotation-invariant) multivariate feature vector
Consecutive Matching Striae (CMS)-Space
Toolmarks (screwdriver striation profiles) form the database: a Biasotti-Murdock dictionary
• Calculation was SLOW! ~1 week on a fairly beefy desktop computer
FAST-Consecutive Matching Striae (CMS)-Space
Toolmarks (screwdriver striation profiles) form the database: a Biasotti-Murdock dictionary
• Found an approximate algorithm and parallelized it: ~3 minutes on the same fairly beefy desktop computer
• Visually explore: 3D PCA of 760 real and simulated mean profiles of primer shears from 24 Glocks
• ~45% variance retained
Support Vector Machines
• Support Vector Machines (SVM) determine efficient association rules
• Work in the absence of specific knowledge of probability densities

SVM decision boundary
18D PCA-SVM Primer Shear I.D. Model, 2000 Bootstrap Resamples
• Refined bootstrapped I.D. error rate for primer shear striation patterns = 0.35%
• 95% C.I. = [0%, 0.83%]
• (sample size = 720 real and simulated profiles)
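The resampling idea behind the bootstrapped error rate and interval can be sketched as below. The per-profile outcome vector is synthetic (3 errors in 720 profiles, roughly the magnitude quoted above), not the actual study data, and the percentile interval is just one of several bootstrap CI constructions.

```python
# Sketch: percentile-bootstrap estimate of a classifier's error rate.
# Assumption: a made-up 0/1 outcome vector stands in for real I.D. results.
import random

random.seed(7)

outcomes = [1] * 3 + [0] * 717          # 1 = misclassified; 3 errors in 720

def bootstrap_error_rate(outcomes, n_resamples=2000):
    """Resample with replacement; return sorted error-rate replicates."""
    n = len(outcomes)
    rates = []
    for _ in range(n_resamples):
        sample = [outcomes[random.randrange(n)] for _ in range(n)]
        rates.append(sum(sample) / n)
    return sorted(rates)

rates = bootstrap_error_rate(outcomes)
point = sum(rates) / len(rates)                      # bootstrap mean
lo, hi = rates[int(0.025 * len(rates))], rates[int(0.975 * len(rates))]
print(f"error rate ~ {point:.4f}, 95% CI [{lo:.4f}, {hi:.4f}]")
```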
How good of a “match” is it?
Conformal Prediction (Vovk)
• Confidence on a scale of 0%-100%
• Testable claim: the long-run I.D. error rate should be the chosen significance level
• This is an orthodox "frequentist" approach, with roots in Algorithmic Information Theory
• Data should be IID, but that's it
• Can give a judge or jury an easy-to-understand measure of the reliability of a classification result

Calibration plot (cumulative number of errors vs. sequence of unknown observation vectors): 80% confidence = 20% error, slope 0.2; 95% confidence = 5% error, slope 0.05; 99% confidence = 1% error, slope 0.01
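The core conformal computation can be sketched as follows: a p-value for a test point is the fraction of calibration nonconformity scores at least as extreme as the test point's score, and the point's label stays in the prediction set at significance level eps whenever p > eps. The scores here are made up for illustration.

```python
# Sketch: conformal p-value from a held-out calibration set.
# Assumption: nonconformity scores are precomputed; values are synthetic.

def conformal_p_value(cal_scores, test_score):
    """p = (#{calibration scores >= test score} + 1) / (n + 1)."""
    n_ge = sum(1 for s in cal_scores if s >= test_score)
    return (n_ge + 1) / (len(cal_scores) + 1)

cal = [0.1, 0.2, 0.3, 0.5, 0.9, 1.1, 1.4, 2.0, 2.5]   # synthetic scores
p = conformal_p_value(cal, 1.2)   # 3 of 9 scores >= 1.2, so p = 4/10 = 0.4
in_95_set = p > 0.05              # label kept in the 95%-confidence set
```

Run over a long IID sequence, errors accumulate at the chosen significance level, which is exactly the testable slope claim in the calibration plot.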
Conformal Prediction
• For a 95%-CPT (PCA-SVM), confidence intervals will not contain the correct I.D. 5% of the time in the long run
• Straightforward validation/explanation picture for court
• Empirical error rate: 5.3%; theoretical (long-run) error rate: 5%

14D PCA-SVM decision model for screwdriver striation patterns
How good of a “match” is it?
Empirical Bayes (Efron)
• An I.D. is output for each questioned toolmark; this is a computer "match"
• What's the probability the tool is truly the source of the toolmark?
• Similar problem in genomics: detecting disease from microarray data
• They use data and Bayes' theorem to get an estimate
Bayesian Statistics
• The basic Bayesian philosophy:

  Prior Knowledge × Data = Updated Knowledge
  (a better understanding of the world)

  Prior × Data = Posterior
Empirical Bayes
• From Bayes' theorem we can get (Efron) the estimated probability of not a true "match", given the algorithm's output z-score associated with its "match":

  P̂r(S- | z) = P̂r(S-) p̂(z | S-) / f̂(z)

• Names: posterior error probability (PEP) (Kall); local false discovery rate (lfdr) (Efron)
• Suggested interpretation for casework:

  1 - P̂r(S- | z) = estimated "believability" that the specific tool produced the toolmark
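The PEP formula above can be made concrete with stand-in density assumptions: non-match z-scores drawn from N(0, 1), match z-scores from N(4, 1), and a prior Pr(S-) = 0.9. All of these numbers are hypothetical; in the actual empirical-Bayes procedure the densities are estimated from the data.

```python
# Sketch: posterior error probability Pr(S- | z) = Pr(S-) f0(z) / f(z).
# Assumptions (hypothetical): f0 = N(0,1), match density = N(4,1), Pr(S-) = 0.9.
import math

def norm_pdf(z, mu=0.0, sd=1.0):
    return math.exp(-0.5 * ((z - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def pep(z, prior_nonmatch=0.9):
    """Posterior error probability of the computer 'match' at z-score z."""
    f0 = norm_pdf(z, 0.0, 1.0)               # density under "not a true match"
    f1 = norm_pdf(z, 4.0, 1.0)               # density under "true match"
    f = prior_nonmatch * f0 + (1 - prior_nonmatch) * f1   # marginal mixture
    return prior_nonmatch * f0 / f

believability = 1 - pep(3.5)   # est. "believability" the tool made the mark
```

A large z-score pushes the PEP toward zero, and 1 - PEP is the "believability" quantity suggested for casework.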
Empirical Bayes
• Model's use with crime scene "unknowns":
• Computer outputs "match" for unknown crime scene toolmarks compared with knowns from "Bob the burglar's" tools
• Estimated posterior probability of no association = 0.00027 = 0.027%, reported together with an uncertainty in the estimate
The "Bayesian Framework"
• Odds form of Bayes' Rule:

  Pr(Ha | E, I) / Pr(Hb | E, I) = [ Pr(E | Ha, I) / Pr(E | Hb, I) ] × [ Pr(Ha | I) / Pr(Hb | I) ]

  (posterior odds in favour of theory A) = (likelihood ratio) × (prior odds in favour of theory A)

Posterior Odds = Likelihood Ratio × Prior Odds
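A tiny worked instance of the odds form, with entirely made-up numbers: prior odds of 1:1000 in favour of the source hypothesis and evidence carrying a likelihood ratio of 10,000.

```python
# Sketch: odds form of Bayes' rule with hypothetical numbers.

def posterior_odds(likelihood_ratio, prior_odds):
    """Posterior odds = likelihood ratio x prior odds."""
    return likelihood_ratio * prior_odds

def odds_to_prob(odds):
    """Convert odds in favour to a probability."""
    return odds / (1 + odds)

post = posterior_odds(likelihood_ratio=1e4, prior_odds=1e-3)   # = 10.0
prob = odds_to_prob(post)                                      # = 10/11
```

The example shows why the LR alone is not a posterior: the same LR of 10,000 yields very different posterior probabilities under different priors.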
Bayes Factors/Likelihood Ratios
• Using the fitted posteriors and priors we can obtain the likelihood ratios (Tippett, Ramos)

Known match LR values
Known non-match LR values
Bayesian Match Probabilities from CMS
• 2007 Neel and Wells study:
  o Count the number of each type of CMS run for KM (known match) and KNM (known non-match) comparisons
  o A CMS type is its run length: "4X" means 4 matching adjacent lines in a comparison of two striation patterns
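The run-typing step can be sketched as below: encode a line-by-line comparison of two striation patterns as a boolean match sequence and count maximal runs of consecutive matches by length. The match sequence here is hypothetical.

```python
# Sketch: count CMS runs by type (run length) from a match sequence.
# Assumption: the 0/1 sequence encoding line matches is made up.
from collections import Counter
from itertools import groupby

def cms_run_counts(match_seq):
    """Map run length k -> number of kX CMS runs in the comparison."""
    return Counter(
        sum(1 for _ in grp)                    # length of this run of matches
        for is_match, grp in groupby(match_seq)
        if is_match
    )

# Hypothetical comparison with runs of lengths 4, 2 and 3:
seq = [1, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1]
counts = cms_run_counts(seq)    # one 4X, one 2X, one 3X run
```

Tabulating such counts over many KM and KNM comparisons is exactly what produces the tables that follow.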
914 KM comparisons:

  Number        CMS run lengths:
  observed       2X     3X     4X    …
  0             508    612    694
  1             186    172    135
  2             109     59     43
  3              39     29     19
  4              21     15     16
  5              10      9      2
  6               4      9      1
  7              10      6      3
  8              14      2      0
  >8             13      1      1

1411 KNM comparisons:

  Number        CMS run lengths:
  observed       2X     3X     4X    …
  0             771   1239   1357
  1             298    124     47
  2             143     35      4
  3              84     10      2
  4              46      2      1
  5              21      1      0
  6              13      0      0
  7              14      0      0
  8               6      0      0
  >8             15      0      0

Model each column of counts as arising from a multinomial distribution.
Bayesian Match Probabilities from CMS
• Model the Neel and Wells counts in each column with a multinomial likelihood
• Model each cell probability, before we've seen any data, with an "uninformative" Dirichlet prior
• Bayes' theorem gives "updated" (posterior) cell probabilities
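The multinomial-Dirichlet update has a closed form: with a Dirichlet(α, …, α) prior, the posterior mean for cell i is (n_i + α) / (N + Kα). Using the 2X KM column from the table above and assuming a flat α = 1 prior (the slide does not state which "uninformative" prior was used), the posterior means come out close to the 0.550, 0.202, … column reported on the next slide.

```python
# Sketch: posterior mean cell probabilities for a multinomial likelihood
# with a Dirichlet(alpha, ..., alpha) prior.
# Assumption: alpha = 1 (flat prior); the exact prior used is not stated.

def dirichlet_posterior_mean(counts, alpha=1.0):
    """E[p_i | counts] = (n_i + alpha) / (N + K * alpha)."""
    total = sum(counts) + alpha * len(counts)
    return [(n + alpha) / total for n in counts]

# 2X column of the 914 KM comparisons (number observed 0..8, >8):
km_2x = [508, 186, 109, 39, 21, 10, 4, 10, 14, 13]
post = dirichlet_posterior_mean(km_2x)
# post[0] = (508 + 1) / (914 + 10) ≈ 0.551, close to the tabulated 0.550
```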
Bayesian Match Probabilities from CMS
• Updated CMS run length probabilities:
KM comparisons:

  Number        CMS run lengths:
  observed       2X      3X      4X     …
  0             0.550   0.663   0.752
  1             0.202   0.187   0.147
  2             0.119   0.065   0.047
  3             0.043   0.032   0.022
  4             0.024   0.018   0.019
  5             0.012   0.011   0.003
  6             0.005   0.011   0.002
  7             0.012   0.008   0.004
  8             0.016   0.003   0.001
  >8            0.015   0.002   0.002

KNM comparisons:

  Number        CMS run lengths:
  observed       2X       3X       4X      …
  0             0.5440   0.8726   0.9556
  1             0.2099   0.0880   0.0338
  2             0.1010   0.0254   0.0035
  3             0.0598   0.0078   0.0021
  4             0.0332   0.0021   0.0014
  5             0.0155   0.0014   0.0007
  6             0.0099   0.0007   0.0007
  7             0.0105   0.0007   0.0007
  8             0.0049   0.0006   0.0007
  >8            0.0113   0.0007   0.0007
• So what can we use these for?
• Lots of stuff, but we put them into a Bayesian network: a BN model for match/non-match probabilities given the observed numbers of CMS runs
Bayesian Networks
• A "scenario" is represented by a joint probability function
  o Contains variables relevant to a situation which represent uncertain information
  o Contains "dependencies" between variables that describe how they influence each other
• A graphical way to represent the joint probability function is with nodes and directed lines
  o Called a Bayesian network (Pearl)
Bayesian Networks
• A (very!) simple example (Wikipedia):
  o What is the probability the grass is wet?
    - Influenced by the possibility of rain
    - Influenced by the possibility of sprinkler action
    - Sprinkler action influenced by the possibility of rain
  o Construct a joint probability function to answer questions about this scenario: Pr(Grass Wet, Rain, Sprinkler)
Bayesian Networks

Pr(Rain):
  Rain:     yes    no
            20%    80%

Pr(Sprinkler | Rain):
  Sprinkler:    Rain: yes    Rain: no
  was on            1%          40%
  was off          99%          60%

Pr(Grass Wet | Sprinkler, Rain):
  Grass Wet:    Sprinkler: was on    was on    was off    was off
                Rain:      yes       no        yes        no
  yes                      99%       90%       80%        0%
  no                        1%       10%       20%       100%
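A query on this network can be answered by enumerating the joint Pr(G, S, R) = Pr(G | S, R) Pr(S | R) Pr(R), using the conditional probability tables above. This brute-force enumeration is a sketch; real BN software uses more efficient propagation algorithms.

```python
# Sketch: inference by enumeration on the sprinkler network.
# CPT values are the ones from the slide (Wikipedia's example).

P_rain = {True: 0.2, False: 0.8}
P_sprinkler = {True: {True: 0.01, False: 0.99},   # rain -> {sprinkler: prob}
               False: {True: 0.40, False: 0.60}}
P_wet = {(True, True): 0.99, (True, False): 0.9,  # (sprinkler, rain) -> prob
         (False, True): 0.80, (False, False): 0.0}

def pr_rain_given_wet():
    """Pr(Rain = yes | Grass Wet = yes) by summing over the joint."""
    num = den = 0.0
    for rain in (True, False):
        for sprinkler in (True, False):
            joint = (P_rain[rain] * P_sprinkler[rain][sprinkler]
                     * P_wet[(sprinkler, rain)])
            den += joint
            if rain:
                num += joint
    return num / den

p = pr_rain_given_wet()   # observing wet grass raises Pr(rain) from 20% to ~36%
```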
Bayesian Networks
• You observe the grass is wet: Pr(Grass Wet) is set to the observation
• The other probabilities, Pr(Sprinkler) and Pr(Rain), are adjusted given the observation
Bayesian Networks
• "Prior" network based on the Neel and Wells observed counts and the Multinomial-Dirichlet model
• "Instantiated" network with observations from a comparison
• Gives an estimate of the "match" probability, which can be turned into an LR if so desired
Future Directions
• GUI modules for common toolmark comparison tasks/calculations using 3D microscope data
• 2D features for toolmark impressions
• Parallel implementation of computationally intensive routines
• Standards board to review statistical methodology/algorithms
  o Maybe part of OSAC?
Acknowledgements
• Professor Chris Saunders (SDSU)
• Professor Christophe Champod (Lausanne)
• Alan Zheng (NIST)
• Ryan Lillian and Marcus Brubaker (CADRE)
• Research Team:
• Ms. Tatiana Batson
• Dr. Martin Baiker
• Ms. Julie Cohen
• Dr. Peter Diaczuk
• Mr. Antonio Del Valle
• Ms. Carol Gambino
• Dr. James Hamby
• Mr. Nick Natalie
• Mr. Mike Neel
• Ms. Alison Hartwell, Esq.
• Ms. Loretta Kuo
• Ms. Frani Kammerman
• Dr. Brooke Kammrath
• Mr. Chris Lucky
• Off. Patrick McLaughlin
• Dr. Linton Mohammed
• Ms. Diana Paredes
• Mr. Nicholas Petraco
• Ms. Stephanie Pollut
• Dr. Peter Pizzola
• Dr. Graham Rankin
• Dr. Jacqueline Speir
• Dr. Peter Shenkin
• Mr. Chris Singh
• Mr. Peter Tytell
• Ms. Elizabeth Willie
• Ms. Melodie Yu
• Dr. Peter Zoon
Data, Programs, Reprints/Preprints:
[email protected]