Current Research in Forensic
Toolmark Analysis
Helping to satisfy the “new” needs of forensic
scientists with state of the art microscopy,
computation and statistics
Outline
• Introduction
• Instruments for 3D toolmark analysis
• 3D toolmark data
• The statistics:
  o Identification error rates
  o “Match” confidence
  o “Match” probability
  o Statistics from available practitioner data
Quantitative Criminalistics
• All forms of physical evidence can be represented as numerical patterns
  o Toolmark surfaces
  o Dust and soil categories and spectra
  o Hair/fiber categories and spectra
  o Craniofacial landmarks
  o Triangulated fingerprint minutiae
• Machine learning trains a computer to recognize patterns
  o Can give “…the quantitative difference between an identification and non-identification” (Moran)
  o Can yield identification error rate estimates
  o May even yield confidence measures for I.D.s
Data Acquisition For Toolmarks
• Confocal Microscope
• Scanning Electron Microscope
• Focus Variation Microscope
• Comparison Microscope
Screwdriver Striation Patterns in Lead
• 2D profiles
• 3D surfaces
Bullets
Bullet base, 9mm Ruger Barrel
Bullet base, 9mm Glock Barrel
Close up: Land Engraved Areas
What can we do with all this microscope data?
• Statistical pattern comparison!
• Modern pattern-comparison algorithms are called machine learning
• The idea is to measure features that characterize physical evidence
• Train an algorithm to recognize “major” differences between groups of features while taking into account natural variation and measurement error
• Visually explore: 3D PCA of 760 real and simulated mean profiles of primer shears from 24 Glocks:
  o ~45% variance retained
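The “variance retained” figure is the share of total variance captured by the leading principal components. The following is a minimal pure-Python sketch, assuming each profile has already been reduced to a fixed-length numeric vector; the function names and synthetic data are hypothetical, and power iteration with deflation stands in for the SVD routine a real analysis would more likely use:

```python
import random

def covariance_matrix(rows):
    """Sample covariance matrix of equal-length feature vectors (e.g. mean profiles)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def top_eigenvalues(cov, k, iters=500, seed=0):
    """Leading k eigenvalues of a symmetric PSD matrix via power iteration + deflation."""
    d = len(cov)
    a = [row[:] for row in cov]
    rng = random.Random(seed)
    eigs = []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(d)]
        for _ in range(iters):
            w = [sum(a[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = sum(x * x for x in w) ** 0.5
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        av = [sum(a[i][j] * v[j] for j in range(d)) for i in range(d)]
        lam = sum(vi * avi for vi, avi in zip(v, av))  # Rayleigh quotient
        eigs.append(lam)
        # Deflation: remove the component just found, A <- A - lam * v v^T
        a = [[a[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return eigs

def variance_retained(rows, k):
    """Fraction of total variance explained by the first k principal components."""
    cov = covariance_matrix(rows)
    total = sum(cov[i][i] for i in range(len(cov)))
    return sum(top_eigenvalues(cov, k)) / total
```

With 3 components kept out of many profile dimensions, a value near 0.45 would correspond to the slide's “~45% variance retained”.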
Support Vector Machines
• Support Vector Machines (SVM) determine efficient association rules in the absence of specific knowledge of the underlying probability densities
SVM decision boundary
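To make the “decision boundary” idea concrete, here is a minimal sketch of a linear soft-margin SVM trained with Pegasos-style stochastic sub-gradient steps (Shalev-Shwartz et al.). It is binary only and is not the software behind the slides; a multi-class model such as the 24-Glock primer shear model would typically combine several binary SVMs (one-vs-one or one-vs-rest). Function names and the regularization value `lam` are illustrative assumptions:

```python
import random

def train_linear_svm(xs, ys, lam=0.01, epochs=100, seed=0):
    """Linear soft-margin SVM trained with Pegasos-style stochastic sub-gradient steps.

    xs: feature vectors (e.g. PCA scores); ys: labels in {-1, +1}.
    A constant 1.0 is appended to every vector so the last weight acts as a bias.
    """
    rng = random.Random(seed)
    data = [list(x) + [1.0] for x in xs]
    w = [0.0] * len(data[0])
    order = list(range(len(data)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = ys[i] * sum(wj * xj for wj, xj in zip(w, data[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink: L2 regularization
            if margin < 1.0:  # hinge loss active: move toward this example
                w = [wj + eta * ys[i] * xj for wj, xj in zip(w, data[i])]
    return w

def svm_decide(w, x):
    """Which side of the decision boundary w . [x, 1] = 0 the vector x falls on."""
    score = sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1
```

The “efficient association rule” is simply the sign of the score: no density model for either group is fitted, which matches the point on the slide.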
18D PCA-SVM Primer Shear I.D. Model, 2000 Bootstrap Resamples
Refined bootstrapped I.D. error rate for primer shear striation patterns = 0.35%
95% C.I. = [0%, 0.83%]
(sample size = 720 real and simulated profiles)
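The slide does not spell out the “refined” bootstrap estimator, so the following sketch shows only the generic idea behind a bootstrapped error rate with a confidence interval: refit the classifier on each resample, score it on the out-of-bag observations, and take percentiles of the resulting error rates. A nearest-centroid classifier stands in for the 18D PCA-SVM model, and all names and defaults (200 resamples rather than 2000, to keep it fast) are hypothetical:

```python
import random

def nearest_centroid_fit(xs, ys):
    """Per-label mean vectors (a simple stand-in for the PCA-SVM model)."""
    sums, counts = {}, {}
    for x, y in zip(xs, ys):
        s = sums.setdefault(y, [0.0] * len(x))
        for j, v in enumerate(x):
            s[j] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in s] for y, s in sums.items()}

def nearest_centroid_predict(centroids, x):
    """Label whose centroid is closest to x (squared Euclidean distance)."""
    return min(centroids, key=lambda y: sum((a - b) ** 2 for a, b in zip(x, centroids[y])))

def bootstrap_error_rate(xs, ys, n_boot=200, alpha=0.05, seed=0):
    """Out-of-bag bootstrap I.D. error rate and a percentile (1 - alpha) interval."""
    rng = random.Random(seed)
    n = len(xs)
    errors = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        in_bag = set(idx)
        oob = [i for i in range(n) if i not in in_bag]
        if not oob:
            continue
        model = nearest_centroid_fit([xs[i] for i in idx], [ys[i] for i in idx])
        wrong = sum(nearest_centroid_predict(model, xs[i]) != ys[i] for i in oob)
        errors.append(wrong / len(oob))
    errors.sort()
    m = len(errors)
    lo = errors[int(alpha / 2 * (m - 1))]
    hi = errors[int((1 - alpha / 2) * (m - 1))]
    return sum(errors) / m, (lo, hi)
```

Scoring only out-of-bag observations keeps each resample's test points separate from the points the model was fitted on.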
How good of a “match” is it?
Conformal Prediction (Vovk)
• Confidence on a scale of 0%-100%
• Testable claim: the long-run I.D. error rate should equal the chosen significance level
• This is an orthodox “frequentist” approach
• Roots in Algorithmic Information Theory
• Data should be IID, but that’s it
• Can give a judge or jury an easy-to-understand measure of the reliability of a classification result
[Figure: cumulative # of errors vs. sequence of unknown observation vectors. 80% confidence: 20% error (slope = 0.2); 95% confidence: 5% error (slope = 0.05); 99% confidence: 1% error (slope = 0.01)]
Conformal Prediction
• For 95%-CPT (PCA-SVM), the confidence intervals will not contain the correct I.D. 5% of the time in the long run
• Straightforward validation/explanation picture for court
Empirical error rate: 5.3%
Theoretical (long-run) error rate: 5%
14D PCA-SVM decision model for screwdriver striation patterns
How good of a “match” is it?
Efron Empirical Bayes
• An I.D. is output for each questioned toolmark
• This is a computer “match”
• What’s the probability the tool is truly the source of the toolmark?
• Similar problem in genomics for detecting disease from microarray data
• They use data and Bayes’ theorem to get an estimate
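In Efron's two-groups model the observed scores follow a mixture f(z) = π0·f0(z) + π1·f1(z), and Bayes' theorem gives the local false discovery rate fdr(z) = π0·f0(z)/f(z); 1 − fdr(z) then estimates the probability that a case with score z is a true non-null (here, a true “match”). The sketch below assumes the comparison scores have already been converted to z-values with a theoretical N(0, 1) null and a fixed π0; a Gaussian kernel density estimate stands in for the Poisson-regression fit of f used in Efron's own software, and every name is hypothetical:

```python
import math
import random

def normal_pdf(z, mu=0.0, sd=1.0):
    return math.exp(-0.5 * ((z - mu) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def kde(zs):
    """Gaussian kernel density estimate with a normal-reference (Silverman-style) bandwidth."""
    n = len(zs)
    mean = sum(zs) / n
    sd = (sum((z - mean) ** 2 for z in zs) / (n - 1)) ** 0.5
    h = 1.06 * sd * n ** (-0.2)
    return lambda z: sum(normal_pdf(z, zi, h) for zi in zs) / n

def local_fdr(zs, pi0=1.0):
    """Two-groups local false discovery rate fdr(z) = pi0 * f0(z) / f(z), capped at 1."""
    f = kde(zs)
    def fdr(z):
        fz = f(z)
        if fz <= 0.0:
            return 1.0
        return min(1.0, pi0 * normal_pdf(z) / fz)
    return fdr

def simulate_z(n_null=900, n_alt=100, shift=3.0, seed=0):
    """Hypothetical synthetic z-values: mostly null N(0, 1), some shifted 'true matches'."""
    rng = random.Random(seed)
    return ([rng.gauss(0.0, 1.0) for _ in range(n_null)]
            + [rng.gauss(shift, 1.0) for _ in range(n_alt)])
```

For example, `1 - local_fdr(simulate_z(), pi0=0.9)(3.5)` estimates how believable a computer “match” with z = 3.5 is in this toy setting.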
A Bayesian Hierarchical Model: Believability Curve
JAGS MCMC Bayesian over-dispersed Poisson with intercept, on test set
Bayes Factors/Likelihood Ratios
• In the “Forensic Bayesian Framework”, the likelihood ratio is the measure of the weight of evidence.
• LRs are called Bayes Factors by most statisticians
• LRs measure the support the “evidence” lends to the “prosecution hypothesis” vs. the “defense hypothesis”
• From Bayes’ Theorem:
$$\mathrm{LR} = \frac{\Pr(E \mid H_p)}{\Pr(E \mid H_d)} = \frac{\Pr(H_p \mid E) / \Pr(H_d \mid E)}{\Pr(H_p) / \Pr(H_d)} = \frac{\text{Posterior Odds}}{\text{Prior Odds}}$$
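The odds form of Bayes' theorem can be checked numerically. A minimal sketch, assuming H_p and H_d are mutually exclusive and exhaustive (so the odds of H_p are p/(1 − p)); the function names and the probabilities in the usage lines are hypothetical:

```python
def odds(p):
    """Odds in favour of a hypothesis with probability p (0 < p < 1)."""
    return p / (1.0 - p)

def lr_from_likelihoods(pr_e_given_hp, pr_e_given_hd):
    """LR = Pr(E | Hp) / Pr(E | Hd)."""
    return pr_e_given_hp / pr_e_given_hd

def posterior_hp(lr, prior_hp):
    """Posterior odds = LR * prior odds, converted back to Pr(Hp | E)."""
    post_odds = lr * odds(prior_hp)
    return post_odds / (1.0 + post_odds)

def lr_from_probabilities(posterior, prior):
    """Recover the LR as posterior odds / prior odds."""
    return odds(posterior) / odds(prior)

if __name__ == "__main__":
    lr = lr_from_likelihoods(0.8, 0.01)    # hypothetical likelihoods, LR of about 80
    post = posterior_hp(lr, prior_hp=0.1)  # hypothetical prior Pr(Hp) = 0.1
    print(lr, post, lr_from_probabilities(post, 0.1))
```

The round trip shows why the LR is attractive as a weight of evidence: it is the factor that converts whatever prior odds the fact-finder holds into posterior odds.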
Bayes Factors/Likelihood Ratios
• Using the fitted posteriors and priors, we can obtain the likelihood ratios (Tippett, Ramos)
Known match LR values
Known non-match LR values
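Sets of known-match and known-non-match LR values are commonly summarized with Tippett plots (for each LR threshold, the proportion of LRs at or above it) and with rates of misleading evidence. A minimal sketch, assuming the LRs have already been computed as positive numbers; the function names are hypothetical:

```python
import math

def tippett_curve(lrs, log10_thresholds):
    """Proportion of LRs with log10(LR) >= each threshold: one curve of a Tippett plot."""
    logs = [math.log10(v) for v in lrs]
    return [sum(l >= t for l in logs) / len(logs) for t in log10_thresholds]

def misleading_evidence_rates(match_lrs, nonmatch_lrs):
    """Share of known matches with LR < 1 and of known non-matches with LR > 1."""
    rme_match = sum(v < 1.0 for v in match_lrs) / len(match_lrs)
    rme_nonmatch = sum(v > 1.0 for v in nonmatch_lrs) / len(nonmatch_lrs)
    return rme_match, rme_nonmatch
```

Plotting `tippett_curve` for both groups over a common grid of log10 thresholds gives the two curves; good separation means known-match LRs sit mostly above 1 and known-non-match LRs mostly below 1.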
Available Large Scale Practitioner Studies
• Two large-scale published studies
  o 10-Barrel Test (Hamby):
    - 626 practitioners (24 countries)
    - 15 “unknowns” per test set
    - At least one bullet from each of the 10 consecutively manufactured barrels
    - # examiner errors committed = 0
  o GLOCK Cartridge Case Test (Hamby):
    - 1632 9-mm Glock fired cartridge cases
    - 1 case per Glock
    - All cartridge cases pair-wise compared
    - # of pairs of cartridge cases judged to have enough surface detail agreement to be (falsely) “matching” = 0
  o AFTE Theory of Identification standard used: www.swggun.org
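For scale, and assuming each unordered pair of the 1632 cartridge cases was compared exactly once, the zero false “matches” in the GLOCK test were observed over

$$\binom{1632}{2} = \frac{1632 \times 1631}{2} = 1{,}330{,}896 \text{ pairs.}$$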
Available Large Scale Practitioner Studies
So does that mean the error rate is 0%?
• 0% error rate is the “frequentist” estimate
  o We looked to sports statistics for low-scoring games
  o “Bayesian” statistics provide complementary methods for analysis
  o These can work much better in estimating small probabilities
Available Large Scale Practitioner Studies
• For 10-Barrel, we need to estimate a small error rate
• For GLOCK, we need to estimate a small random match probability (RMP)
• Use the Bayesian “Beta-binomial” method when no “failures” are observed (Schuckers)
Available Large Scale Practitioner Studies
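• Basic idea of the Reverend Bayes:
Prior Knowledge × Data = Updated Knowledge
$$\text{Error Rate/RMP} = \frac{a}{a+b}$$
(the mean of a Beta(a, b) distribution)
$$\text{Uninf}(a,b) \times \text{Beta-Binomial}(\text{data} \mid a,b) \;\propto\; \text{Posterior}(a,b \mid \text{data})$$
Get updated estimates of error rate/RMP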
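The slides use a hierarchical Beta-binomial model with an uninformative prior on the hyperparameters (a, b). A much simpler pooled version illustrates why zero observed failures still yield a non-zero estimate: with a uniform Beta(1, 1) prior on a single common rate and 0 failures in n trials, the posterior is Beta(1, n + 1), whose quantiles have a closed form. This sketch treats all comparisons as independent (pair-wise comparisons are not), is not Schuckers' method, and is not expected to reproduce the slides' numbers; the function name is hypothetical:

```python
import math

def zero_failure_posterior(n_trials, level=0.95):
    """Pooled Beta-binomial update when 0 failures are seen in n_trials.

    Assumes a uniform Beta(1, 1) prior on a single common rate, so the posterior
    is Beta(1, n_trials + 1). Its CDF is 1 - (1 - x)**m with m = n_trials + 1,
    which gives closed-form quantiles. Returns (posterior mean, (lower, upper)).
    """
    m = n_trials + 1
    def quantile(q):
        # Solve 1 - (1 - x)**m = q for x, written with log1p/expm1 for stability
        return -math.expm1(math.log1p(-q) / m)
    tail = (1.0 - level) / 2.0
    return 1.0 / (m + 1), (quantile(tail), quantile(1.0 - tail))

if __name__ == "__main__":
    n_pairs = math.comb(1632, 2)  # GLOCK test, assuming each unordered pair counted once
    mean, (lo, hi) = zero_failure_posterior(n_pairs)
    print(f"RMP mean {100 * mean:.7f}%, interval [{100 * lo:.7f}%, {100 * hi:.7f}%]")
```

The hierarchical model on the slides additionally lets examiner-level (or case-level) rates vary around the common mean a/(a+b), which typically widens the interval.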
Available Large Scale Practitioner Studies
• So, given the observed data and assuming “prior ignorance”:
  o Posterior error rate/RMP distributions:
    - 10-Barrel: average examiner error rate for 626 participants = 0.011% [0.00023%, 0.040%]
    - GLOCK: RMP for 1632 cartridge cases = 0.000086% [0.0000020%, 0.00031%]
[Figure: posterior distribution of the average examiner error rate (10-Barrel; x-axis: error rate, %) and posterior distribution of the RMP (GLOCK; x-axis: RMP, %); y-axis: frequency]
Acknowledgements
• Professor Chris Saunders (SDSU)
• Professor Christophe Champod (Lausanne)
• Alan Zheng (NIST)
• Research Team:
• Dr. Martin Baiker
• Ms. Helen Chan
• Ms. Julie Cohen
• Mr. Peter Diaczuk
• Dr. Peter De Forest
• Mr. Antonio Del Valle
• Ms. Carol Gambino
• Dr. James Hamby
• Ms. Alison Hartwell, Esq.
• Dr. Thomas Kubic, Esq.
• Ms. Loretta Kuo
• Ms. Frani Kammerman
• Dr. Brooke Kammrath
• Mr. Chris Lucky
• Off. Patrick McLaughlin
• Dr. Linton Mohammed
• Mr. Nicholas Petraco
• Dr. Dale Purcel
• Ms. Stephanie Pollut
• Dr. Peter Pizzola
• Dr. Graham Rankin
• Dr. Jacqueline Speir
• Dr. Peter Shenkin
• Ms. Rebecca Smith
• Mr. Chris Singh
• Mr. Peter Tytell
• Ms. Elizabeth Willie
• Ms. Melodie Yu
• Dr. Peter Zoon