Random Match Probability Statistics


Random Match Probability Statistics
From single source to three-person mixtures with allelic drop out
Statistics
• “There are three kinds of lies: lies, damned
lies, and statistics.” –Benjamin Disraeli, British Prime
Minister as popularized by Mark Twain
• 18.7% of all statistics are made up
• My introduction to forensics statistics…. It had
been a loooooong time since sophomore
genetics
Heterozygote
• Alleles P and Q
• Could be PQ or it could be QP
• So…
2pq
– Where p is frequency of P
– And q is frequency of Q
• If p = 0.2 and q = 0.15, then 2(0.2)(.15) = 0.06
Most of us understood this pretty quickly
Homozygote
• Allele P
• Above stochastic threshold
• So… p × p, or p²
Most of us understood this pretty quickly too
• But there’s that Θ business
Homozygote
• You don’t use p²
– But I understood that
• Use p² + p(1-p)Θ
– I didn’t understand this
• Where did Θ come from?
• “It’s the inbreeding coefficient.”
Homozygote
• OK, but where did p² + p(1-p)Θ come from?
• “It’s the correction factor for inbreeding.”
– Not so helpful
– Why isn’t it just p² – Θ?
Homozygote
• We start with what we thought
– p²
• But some percentage is from inbreeding
– Θp
• Correct for that amount of inbreeding
– (1 – Θ)p²
• Combine them
– Θp + (1 – Θ)p²
Homozygote
Now it’s algebra
• Θp + (1 – Θ)p² (inbred p + non-inbred p²)
• Θp + p² – Θp² (expand the terms)
• p² + Θp – Θp² (we like to see p² term first)
• p² + p(Θ – Θp) (pull out p)
• p² + p(1 – p)Θ (pull out Θ to get final form)
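The algebra can be checked numerically. This is a minimal Python sketch, not part of the original slides. The function names are made up for the example, and Θ = 0.01 is an assumed value used only for illustration.

```python
import math

def homozygote_start(p, theta):
    # Starting point: inbred share (theta * p) plus non-inbred share ((1 - theta) * p^2)
    return theta * p + (1 - theta) * p ** 2

def homozygote_final(p, theta):
    # Final form after the algebra: p^2 + p(1 - p) * theta
    return p ** 2 + p * (1 - p) * theta

THETA = 0.01  # assumed value, for illustration only

# The two forms should agree for any allele frequency p
for p in (0.05, 0.2, 0.3054):
    assert math.isclose(homozygote_start(p, THETA), homozygote_final(p, THETA))
    print(p, homozygote_final(p, THETA))
```

With Θ = 0 both forms collapse back to plain p², which is one way to see that the extra term is only the inbreeding correction.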
Single source stat
• Do the 2pq calculation at each heterozygous
locus
• Do p² + p(1 – p)Θ at each homozygous locus
• Then multiply the results for all loci
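The three steps can be sketched in a few lines of Python. This is an illustration only: the three-locus profile, the Θ value of 0.01, and the function names are assumptions, not casework data or any lab's software.

```python
from math import prod

THETA = 0.01  # assumed value, for illustration only

def heterozygote(p, q):
    # 2pq for a two-allele locus
    return 2 * p * q

def homozygote(p, theta=THETA):
    # p^2 + p(1 - p) * theta for a one-allele locus above the stochastic threshold
    return p ** 2 + p * (1 - p) * theta

def single_source_rmp(loci, theta=THETA):
    """loci: one tuple of allele frequencies per locus.
    One frequency means homozygote, two frequencies mean heterozygote."""
    per_locus = []
    for freqs in loci:
        if len(freqs) == 1:
            per_locus.append(homozygote(freqs[0], theta))
        elif len(freqs) == 2:
            per_locus.append(heterozygote(freqs[0], freqs[1]))
        else:
            raise ValueError("a single-source locus has one or two alleles")
    # Multiply the per-locus results together
    return prod(per_locus)

# Hypothetical profile: heterozygote, homozygote, heterozygote
profile = [(0.2, 0.15), (0.2,), (0.1, 0.3)]
rmp = single_source_rmp(profile)
print(f"RMP = {rmp:.3e}, or about 1 in {1 / rmp:,.0f}")
```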
Partial single source stat
• What if you don’t detect everything from a
single contributor?
• Consistent with one contributor, but obviously
there is a lot of drop out
Partial single source stat
[Figure: partial single-source profile; loci are annotated “Drop out”, “No result”, or “??”]
With a sample like this, would you
1. Inconclusive data
2. Exclude only
3. Exclude or “inc a person”
4. Exclude/include no stat
5. Exclude/include stat for 2 allele loci
6. Exclude/include for all loci with something detected
7. Other
Partial single source stat
• Heterozygous loci still 2pq
Partial single source stat
• What about loci that you don’t know about?
Partial single source stat
• Any person that is a 9.3 could be the source
• How to calculate 9.3, Any?
Partial single source stat
• The 9.3 could be a homozygote
• So p² + p(1-p)Θ covers that
• But the 9.3 could be a heterozygote with any
other allele
• So 2pq, but what is q?
Partial single source stat
• You could go to the ladder
– 2(p)(q)
– p = 9.3
– q = 4 so 2(f9.3)(f4)
– q = 5 so 2(f9.3)(f5)
– q = 6 so 2(f9.3)(f6)
– …
– q = 13.3 so 2(f9.3)(f13.3)
– Then add them up
• But what about off-ladder alleles, microvariants, etc.? How do you do 2pq for those?
Partial single source stat
• Instead – if p is what you see (or detect)
• Then q must be what you don’t see (or detect)
• Since this is a binary system
– (What you see/detect) + (what you don’t) = 1.0
– (what you don’t see) = 1 – (what you see/detect)
• So q = (1-p)
• Therefore 2pq becomes 2p(1-p)
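A short Python check makes the shortcut concrete: adding up 2pq over every other allele gives the same number as 2p(1-p), as long as the frequencies sum to 1. Only f9.3 = 0.3054 comes from these slides. The other frequencies, including the "other" bucket standing in for off-ladder alleles and microvariants, are hypothetical.

```python
import math

# Hypothetical allele frequencies at one locus; they must sum to 1.0.
# Only the 9.3 value is taken from the slides.
freqs = {
    "6": 0.23,
    "7": 0.19,
    "8": 0.08,
    "9": 0.12,
    "9.3": 0.3054,
    "other": 0.0746,  # everything not listed, e.g. off-ladder alleles
}
assert math.isclose(sum(freqs.values()), 1.0)

p = freqs["9.3"]

# The long way: 2pq for each possible partner allele, then add them up
ladder_sum = sum(2 * p * q for allele, q in freqs.items() if allele != "9.3")

# The shortcut: q is "everything you don't see", so q = 1 - p
shortcut = 2 * p * (1 - p)

assert math.isclose(ladder_sum, shortcut)
print(ladder_sum, shortcut)
```

The shortcut never needs the individual partner frequencies, which is why off-ladder alleles stop being a problem.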
Partial single source stat
• Now just combine the homozygote and
heterozygote options (p = f9.3)
• [p² + p(1-p)Θ] + [2p(1-p)] for anyone with 9.3
Partial single source stat
• What about loci that look like homozygotes?
• Use your PHR and stochastic threshold studies
– If you treat a locus as a homozygote, you better be
above your stochastic threshold
– When in doubt, use Allele, Any – you’re covered
– At USACIL, Allele, Any = “modified” RMP
Partial single source stat
• The “2p” rule
• Section 5.2.1.3 – SWGDAM
5.2.1.3. For single-allele profiles where the zygosity is in question (e.g.,
it falls below the stochastic threshold):
5.2.1.3.1. The formula 2p, as described in recommendation
4.1 of NRCII, may be applied to this result.
5.2.1.3.2. Instead of using 2p, the algebraically identical formulae
2p – p² and p² + 2p(1-p) may be used to address this situation
without double-counting the proportion of homozygotes in the
population.
Partial single source stat
• 2p is an extremely conservative approximation
• There is a better way
– 2p – p²
– p² + 2p(1-p)
• But this is even better
– p² + p(1-p)Θ + 2p(1-p)
– (computers can calculate anything)
Partial single source stat
• “Algebraically identical formulae”
• f9.3 = 0.3054
• 2p – p²
– 2(0.3054) – (0.3054)²
– 0.6108 – 0.09326
– 0.5175
• p² + 2p(1-p)
– (0.3054)² + 2(0.3054)(1 – 0.3054)
– 0.09326 + 0.6108(0.6946)
– 0.09326 + 0.42426
– 0.5175
Partial single source stat
• So for 9.3, Any
– 2p = 0.6108
– 2p – p² = 0.5175
– p² + 2p(1-p) = 0.5175
– p² + p(1-p)Θ + 2p(1-p) = 0.5197
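The four values for 9.3, Any can be reproduced in Python. The slides do not state which Θ was used. Θ = 0.01 is an assumption here, chosen because it reproduces the 0.5197 figure when rounded to four places.

```python
p = 0.3054    # f9.3, from the slides
theta = 0.01  # assumed; not stated in the slides

two_p = 2 * p                                    # the "2p" rule
two_p_minus_p2 = 2 * p - p ** 2                  # 2p - p^2
p2_plus_het = p ** 2 + 2 * p * (1 - p)           # p^2 + 2p(1-p)
with_theta = p ** 2 + p * (1 - p) * theta + 2 * p * (1 - p)

for label, value in [
    ("2p", two_p),
    ("2p - p^2", two_p_minus_p2),
    ("p^2 + 2p(1-p)", p2_plus_het),
    ("p^2 + p(1-p)theta + 2p(1-p)", with_theta),
]:
    print(f"{label:30s} {value:.4f}")
```

The gap between 2p and the other three is exactly p², the homozygote frequency that 2p counts twice.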
Minor contributor stat
When the minor is probative, would you
1. Inconclusive data
2. Exclude only
3. Exclude or “inc a person”
4. Exclude/include no stat
5. Exclude/include stat for some allele loci
6. Exclude/include for all loci
7. Other
Minor contributor stat
• For our purposes, it is an intimate sample
from known female contributor
• Female is major
– Major would have a single source stat
– But isn’t probative
• Focus on the minor (or foreign) contributor
Minor contributor stat
• Situations you need to be able to calculate
– When you know the minor type
– When you are concerned about drop out
– When you are not concerned about drop out, but
you don’t know the minor type (masking/sharing)
– When you do not see any minor alleles, but still
think the minor contributor is represented
We haven’t discussed the last two yet
Minor contributor stat
• When you know the minor type
– 10, 11
• 2pq
• 2(f10)(f11)
– 6, 9.3
• 2pq
• 2(f6)(f9.3)
Minor contributor stat
• When you are concerned about drop out
– 24, Any
• p² + p(1-p)Θ + 2p(1-p)
• (f24)² + (f24)(1-(f24))Θ + 2(f24)(1-(f24))
Minor contributor stat
• When you are not concerned about drop out,
but don’t know the minor type
• What types are possible?
– 9, 9
– 8, 9
– 9, 11
• “Combo stat”
Minor contributor stat
• “Combo stat”
• 9 is above stochastic threshold
– 9, 9: p² + p(1-p)Θ
– 8, 9: 2pq
– 9, 11: 2pr
• Add them up
(f9)² + (f9)(1-(f9))Θ + 2(f8)(f9) + 2(f9)(f11)
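The “combo stat” is just a sum over the genotypes you decided are possible, which a short Python sketch can show. The allele frequencies, Θ = 0.01, and the function names are all hypothetical, chosen only to make the arithmetic visible.

```python
THETA = 0.01  # assumed value, for illustration only

def genotype_freq(a, b, freqs, theta=THETA):
    """Frequency of genotype (a, b): homozygote formula if a == b, else 2pq."""
    p, q = freqs[a], freqs[b]
    if a == b:
        return p ** 2 + p * (1 - p) * theta
    return 2 * p * q

def combo_stat(genotypes, freqs, theta=THETA):
    # Add up the frequency of every genotype that could be the minor
    return sum(genotype_freq(a, b, freqs, theta) for a, b in genotypes)

# Hypothetical frequencies for alleles 8, 9 and 11
freqs = {"8": 0.10, "9": 0.20, "11": 0.25}

# Minor could be 9,9 or 8,9 or 9,11
possible_minor_types = [("9", "9"), ("8", "9"), ("9", "11")]

stat = combo_stat(possible_minor_types, freqs)
print(f"combo stat = {stat:.4f}")
```

Passing the list of genotypes in explicitly keeps the interpretation decision (which types are possible) separate from the arithmetic.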
Minor contributor stat
• Section 5.2.2 - SWGDAM
5.2.2. When the interpretation is conditioned upon the assumption of a
particular number of contributors greater than one, the RMP is the
sum of the individual frequencies for the genotypes included following
a mixture deconvolution. Examples are provided below.
5.2.2.1. In a sperm fraction mixture (at a locus having alleles P, Q,
and R) assumed to be from two contributors, one of whom is the
victim (having genotype QR), the sperm contributor genotypes
included post-deconvolution might be PP, PQ, and PR. In this
case, the RMP for the sperm DNA contributor could be calculated
as [p² + p(1-p)Θ] + 2pq + 2pr.
Minor contributor stat
• No minor alleles present, but you know the
minor is contributing
• Every other locus has minor alleles
• Did the enzyme just get lazy?
• “Just inc the locus for stats”
– That doesn’t make any more sense than throwing
out any other locus
– You just need the right calculator
Minor contributor stat
• Two scenarios to consider
– No stochastic concerns
– Stochastic concerns
• Two slightly different stats, but can deal with
both
Minor contributor stat
• No stochastic concerns
• In some cases, PHR and P may help
– 17, 17 or possibly 16, 17
– Maybe not 16, 16
• But, you know minor must be:
– 16, 16: p² + p(1-p)Θ
– 16, 17: 2pq
– 17, 17: q² + q(1-q)Θ
• Add them up: this is the “combo” stat
Minor contributor stat
• Couple more definitions:
– “Unrestricted” RMP
• The “combo” stat where we used all possibilities
• 16,16 and 16,17 and 17,17 from previous slide
– “Restricted” RMP
• The “combo” stat where we chose not to use one (or
more) possible types based on what fits peak heights,
peak height ratios, or proportions of contributors
• 17,17 or 16,17 but not 16,16 from previous slide
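The difference between the two definitions shows up directly in the arithmetic. This is a minimal Python sketch with hypothetical frequencies for the 16 and 17 alleles and an assumed Θ of 0.01.

```python
def homozygote(p, theta):
    return p ** 2 + p * (1 - p) * theta

def heterozygote(p, q):
    return 2 * p * q

f16, f17 = 0.15, 0.25  # hypothetical allele frequencies
theta = 0.01           # assumed value

# Unrestricted: every genotype the two alleles allow (16,16 / 16,17 / 17,17)
unrestricted = homozygote(f16, theta) + heterozygote(f16, f17) + homozygote(f17, theta)

# Restricted: 16,16 left out because it does not fit the peak height data
restricted = heterozygote(f16, f17) + homozygote(f17, theta)

print(f"unrestricted = {unrestricted:.6f}")
print(f"restricted   = {restricted:.6f}")
```

The restricted value is always the smaller of the two, because it is the same sum with at least one genotype removed.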
Minor contributor stat
• What if stochastic concerns?
• You would take anyone with
– 16, Any: (p² + p(1-p)Θ) + 2p(1-p)
– 17, Any: (q² + q(1-q)Θ) + 2q(1-q)
• Add them together
• But that has the 16, 17 counted twice
– Subtract 16, 17
– But only once!
– 2pq
Modified random match probability
• Let’s look at this “double any” calculation
(p² + p(1-p)Θ) + 2p(1-p) + (q² + q(1-q)Θ) + 2q(1-q) – 2pq
• Simplify by removing Θ
p² + 2p(1-p) + q² + 2q(1-q) – 2pq
• This is the basis for dealing with any number
of “Allele, Any” contributors
• USACIL calls this a modified RMP because
“Anys” are involved
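The “double any” calculation is sketched below in Python, with and without Θ. The 16 and 17 frequencies, the Θ value, and the function names are hypothetical. This is not USACIL's calculator.

```python
def allele_any(p, theta=0.0):
    # Homozygote option plus heterozygote-with-anything option
    return p ** 2 + p * (1 - p) * theta + 2 * p * (1 - p)

def double_any(p, q, theta=0.0):
    # 16, Any plus 17, Any, minus the 16,17 genotype that both terms include
    return allele_any(p, theta) + allele_any(q, theta) - 2 * p * q

f16, f17 = 0.15, 0.25  # hypothetical allele frequencies

without_theta = double_any(f16, f17)
with_theta = double_any(f16, f17, theta=0.01)  # theta = 0.01 is assumed

print(f"without theta: {without_theta:.5f}")
print(f"with theta:    {with_theta:.5f}")
```

Leaving out the subtraction would count every 16, 17 person twice, which inflates the statistic by 2pq.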
Modified random match probability
• Let’s say we’ve got a two contributor mixture
with signs that both contributors are having
stochastic issues.
• But what you see is consistent with two
contributors
– Remember “Take a stand on the stand….”
– Validation studies, interpretation guidelines, your
experience, Tech Review agrees…
Modified random match probability
• We’ll start with this same pattern
• But stochastic concerns
– Homozygote threshold
– Mixture interpretation threshold
– Stochastic threshold
– Drop out threshold
[Figure: allele 16 (peak height 230) and allele 17 (peak height 260)]
(We’re not suggesting that you MUST do this, only that you can calculate it.)
• Let’s just call it the “Danger Zone”
– Why do I always think of “Top Gun” when I have
low peak heights?
Modified random match probability
• Remember the “Allele, Any”
– 2pq = 2p(1-p) or
– 2x(what you do see)x(what you don’t see)
– (We used it for a single allele below stochastic
threshold for partial or minor contributor)
• Because we have two contributors:
– 16, Any
– 17, Any
– Or both
Modified random match probability
• Also, remember the “combo stat” for the
combinations you can see
– p² + 2pq + q²
– We’ll rearrange this in a minute
Modified random match probability
• Allele, Any for p (16)
– 2(what you see)(what you don’t)
– 2p(1-?)
• You “see” two alleles now
• Both p and q (16 and 17)
– Stick with “1 – what you see” for what you don’t see
– 2p(1-(p+q)) for p (16)
• Same thing for q
– 2q(1-(p+q)) for q (17)
Modified random match probability
• So, the obvious combinations:
– p² + 2pq + q² (“Combo” for visible)
• The “Allele, Any” combinations:
– 2p(1-(p+q)) (Allele, Any for the 16)
– 2q(1-(p+q)) (Allele, Any for the 17)
• Add them up
Modified random match probability
• Here is the formula for multiple Allele, Any
p² + 2pq + q² + 2p(1-(p+q)) + 2q(1-(p+q))
• Now we rearrange that first part
– p² + 2pq + q²
– (p + q) × (p + q)
– (p + q)²
• That last line should look familiar
Modified random match probability
• Remember back in the good old days?
• CPI stat
– For two alleles: (p + q)²
– For three alleles: (p + q + r)²
– …
– For nine alleles: (p + q + r + s + t + u + v + w + x)²
Modified random match probability
• Two ways to think about Allele, Any
– The way we derived it for that minor contributor
[p² + 2p(1-p)] + [q² + 2q(1-q)] – 2pq
– The way that works for as many contributors as
we may need
(p + q)² + 2p(1-(p+q)) + 2q(1-(p+q))
• They are equivalent
– (Remember we dropped Θ for the top one)
– (CPI math is the foundation for the bottom one,
and doesn’t use Θ)
Modified random match probability
• Expand this one (“Double” Allele, Any – duplicate)
p² + 2p(1-p) + q² + 2q(1-q) – 2pq
• To get
p² + 2p – 2p² + q² + 2q – 2q² – 2pq
• Rearrange the terms
p² + q² + 2p + 2q – 2pq – 2p² – 2q²
Modified random match probability
• Now expand the other one (Multiple Allele, Any)
(p + q)² + 2p(1-(p+q)) + 2q(1-(p+q))
• To get
p² + 2pq + q² + 2p – 2p² – 2pq + 2q – 2q² – 2pq
• Rearrange the terms
p² + q² + 2p + 2q + 2pq – 2pq – 2pq – 2p² – 2q²
• Condense the 2pq terms
p² + q² + 2p + 2q – 2pq – 2p² – 2q²
Modified random match probability
• Now compare them:
• This was the “single source” one (2 slides ago)
p² + q² + 2p + 2q – 2pq – 2p² – 2q²
• This is the “generic” form for multiple
contributors (previous slide)
p² + q² + 2p + 2q – 2pq – 2p² – 2q²
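If you would rather trust a computer than the algebra, the equivalence can be spot-checked in Python over a grid of values. The function names and the grid are made up for this example. Θ is left out of both forms, as on the slides.

```python
import itertools
import math

def duplicate_form(p, q):
    # "Double" Allele, Any with the duplicate 16,17 subtracted once
    return p ** 2 + 2 * p * (1 - p) + q ** 2 + 2 * q * (1 - q) - 2 * p * q

def generic_form(p, q):
    # CPI-style combo for what you see, plus one Allele, Any term per allele
    seen = p + q
    return seen ** 2 + 2 * p * (1 - seen) + 2 * q * (1 - seen)

def expanded_form(p, q):
    # The fully expanded and rearranged expression both forms reduce to
    return p ** 2 + q ** 2 + 2 * p + 2 * q - 2 * p * q - 2 * p ** 2 - 2 * q ** 2

# Hypothetical allele frequencies; every pair keeps p + q below 1
grid = [0.01, 0.1, 0.25, 0.4]
for p, q in itertools.product(grid, repeat=2):
    assert math.isclose(duplicate_form(p, q), generic_form(p, q))
    assert math.isclose(generic_form(p, q), expanded_form(p, q))

print("all three forms agree on the grid")
```

A grid check is not a proof, but it catches sign slips in the expansion quickly.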
Modified random match probability
• Section 5.2.2.3 - SWGDAM
5.2.2.3. In a mixture having at a locus alleles P, Q, and R, assumed to be from
two contributors, where all three alleles are below the stochastic threshold, the
interpretation may be that the two contributors could be a heterozygote-homozygote pairing where all alleles were detected, a heterozygote-heterozygote pairing where all alleles were detected, or a heterozygote-heterozygote pairing where a fourth allele might have dropped out. In this case,
the RMP must account for all heterozygotes and homozygotes represented by
these three alleles, but also all heterozygotes that include one of the detected
alleles. The RMP for this interpretation could be calculated as (2p – p²) + (2q –
q²) + (2r – r²) – 2pq – 2pr – 2qr.
5.2.2.3.1. Since 2p includes 2pq and 2pr, 2q includes 2pq and 2qr, and 2r
includes 2pr and 2rq, the formula in 5.2.2.3 subtracts 2pq, 2pr, and 2qr to
avoid double-counting these genotype frequencies.
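The SWGDAM three-allele formula and the “generic” form from the earlier slides give the same number, which a short Python sketch can confirm. The frequencies and function names are hypothetical, and Θ is omitted as in the quoted formula.

```python
import math

def swgdam_three_allele(p, q, r):
    # (2p - p^2) + (2q - q^2) + (2r - r^2) - 2pq - 2pr - 2qr
    return ((2 * p - p ** 2) + (2 * q - q ** 2) + (2 * r - r ** 2)
            - 2 * p * q - 2 * p * r - 2 * q * r)

def generic_three_allele(p, q, r):
    # (p + q + r)^2 for the visible combinations,
    # plus one Allele, Any term for each detected allele
    seen = p + q + r
    return seen ** 2 + sum(2 * f * (1 - seen) for f in (p, q, r))

p, q, r = 0.10, 0.20, 0.30  # hypothetical allele frequencies

a = swgdam_three_allele(p, q, r)
b = generic_three_allele(p, q, r)
assert math.isclose(a, b)
print(a, b)
```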
Modified random match probability
• To use RMP you must state the number of
contributors
– Validation studies
– Experience
– Yadda, yadda
• Now that we know how to deal with drop out
via Allele, Any, we can use RMP more often
• Modified RMP (modified denotes “Anys”)
– This is the language we use at our lab
CPI compared to RMP
• But CPI is NOT the same as RMP
– CPI is used when you are unsure about the
number of contributors
– Consequently, you have problems when you have
alleles in the stochastic range – “Danger Zone”
– If you don’t know how many contributors you
have, you don’t know how many alleles are
missing
CPI compared to RMP
• But we can use the CPI math in our RMP stat
• We must make two changes to the “base” CPI
formula that we use in the RMP
– 1. We must correct for situations that change the
number of contributors
– 2. We must account for allelic drop out
• We’ve been through the second, so let’s deal
with the first
CPI compared to RMP
• Consider a four allele pattern
• We interpret the overall profile as having two
contributors.
• CPI considers all possible “visible”
combinations of contributors
– (p + q + r + s)2
– This includes P, P and Q, Q and R, R and S, S types
CPI compared to RMP
• But if you think you could have a P, P
contributor, that leaves three alleles
• We stated that there were only 2 contributors
– If Contributor #1 is P, P
– Contributor #2 cannot account for Q, R and S
alleles
• Having a homozygote changes the assumption
of the number of contributors
CPI compared to RMP
• So all we need to do is subtract the
homozygotes – but only when the presence of
a homozygote changes the number of
contributors
– 2 contributors and 4 alleles detected
– 3 contributors and 6 alleles detected
CPI compared to RMP
• Easy to do with a friendly computer
– (p + q + r + s)² – p² – q² – r² – s²
– (p + q + r + s + t + u)² – p² – q² – r² – s² – t² – u²
• USACIL defines this as an “Unrestricted” RMP
• We kind of think of it as a CPI stat corrected
for a defined number of contributors
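The correction is sketched below in Python for the two-contributor, four-allele case. It also checks that subtracting the homozygotes leaves exactly the sum of all heterozygote pairs. The frequencies and function names are hypothetical.

```python
import itertools
import math

def cpi(freqs):
    # "Base" CPI math: (p + q + r + ...)^2
    return sum(freqs) ** 2

def unrestricted_rmp_max_alleles(freqs):
    # CPI minus every homozygote, for loci where a homozygote would
    # break the stated number of contributors (e.g. 2 people, 4 alleles)
    return cpi(freqs) - sum(f ** 2 for f in freqs)

freqs = [0.10, 0.20, 0.15, 0.05]  # hypothetical p, q, r, s

unrestricted = unrestricted_rmp_max_alleles(freqs)

# What is left is every heterozygote pair among the detected alleles
het_pairs = sum(2 * a * b for a, b in itertools.combinations(freqs, 2))
assert math.isclose(unrestricted, het_pairs)

print(f"CPI              = {cpi(freqs):.4f}")
print(f"unrestricted RMP = {unrestricted:.4f}")
```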
Unrestricted RMP
• Section 5.2.2.6 - SWGDAM
5.2.2.6. The unrestricted RMP might be calculated for mixtures that display
no indications of allelic dropout. The formulae include an assumption of the
number of contributors, but relative peak height information is not utilized.
For two-person mixtures, the formulae for loci displaying one, two, or three
alleles are identical to the CPI calculation discussed in section 5.3. For loci
displaying four alleles (P, Q, R, and S), homozygous genotypes would not
typically be included. The unrestricted RMP in this case would require the
subtraction for homozygote genotype frequencies, e.g., (p + q + r + s)² –
p² – q² – r² – s².
Modified random match probability
• Same thing for our “Allele, Any” situation
• No need to consider an “Allele, Any” if it
changes the number of contributors
• It doesn’t matter how many alleles are below
your stochastic threshold
– If you say there are 2 contributors and you detect
4 alleles, by definition there are no alleles missing
– Similar for 3 contributors and 6 alleles detected
Modified random match probability
• About as bad as it can get
• 3 contributors
• All alleles are in the Danger Zone
• Each allele could be missing its sister allele
(p+q+r+s+t)² + 2p(1-(p+q+r+s+t)) + 2q(1-(p+q+r+s+t)) +
2r(1-(p+q+r+s+t)) + 2s(1-(p+q+r+s+t)) + 2t(1-(p+q+r+s+t))
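The same pattern works for any number of detected alleles, so it is easy to hand to a computer. This is a minimal Python sketch: the five frequencies are hypothetical, the function name is made up, and Θ is left out as in the formula above.

```python
import math

def modified_rmp(seen_freqs):
    """(sum of detected alleles)^2 plus one Allele, Any term per detected allele."""
    seen = sum(seen_freqs)
    combo = seen ** 2                                     # visible combinations
    anys = sum(2 * f * (1 - seen) for f in seen_freqs)    # each allele with an unseen partner
    return combo + anys

freqs = [0.10, 0.05, 0.20, 0.15, 0.08]  # hypothetical p, q, r, s, t

stat = modified_rmp(freqs)

# With S = sum of detected frequencies, the formula collapses to 2S - S^2
seen = sum(freqs)
assert math.isclose(stat, 2 * seen - seen ** 2)

print(f"modified RMP at this locus = {stat:.4f}")
```

With these made-up frequencies the locus statistic comes out at about 0.82, so a locus like this adds very little weight on its own.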
Modified random match probability
• GIANT DISCLAIMER!!
• We are not saying that you can charge ahead
and now use any profile of any number of
people with any number of alleles dropping
out if you just use a modified RMP calculation
• Bad data is bad data
• It’s science, not Voodoo