Chances Are - University of Nebraska–Lincoln


Chances Are
A look at probability
and its application to
beef production and
diagnostic testing
Everyday probabilities
Some everyday probabilities
Probability concepts



Likelihood
Predictability
Certainty/Confidence (or the lack of)
Basis of probability

Counting outcomes
– How many cows do I have? (100)
– How many cows have calves at their side this year? (82)
– How many cows were exposed to the bull? (97)
– How many cows were diagnosed pregnant? (91)
– How many cows had bull calves? (37)
– How many cows required assistance at calving? (7)
Truth of Probability
“Not everything that counts can be counted,
and not everything that can be counted
counts.” Einstein
Likelihood

What is the likelihood that a cow selected from
the herd described in the previous slide does
not have a calf at her side?
– Likelihood = odds: count the possibilities.
  The number of cows that did not have a calf at their side was 18 (100 - 82).
  The number of cows that did have calves at their side was 82.
  So the likelihood/odds would be 18:82.
Predictability

If you select a cow at random from the herd, what are
the chances that you will select a cow without a calf?
– Prediction = probability: the potential of an event expressed
  as a relative frequency, or mathematically as f/n, where n is
  the total number of events and f is the number of events of
  interest.
– The probabilities of all possible events sum to 1. If I flip a
  coin, the probability it is a head is 0.5 and a tail 0.5; the
  probability it is a head or a tail is 1.
– In this example n = 100 and f = 18. The probability that any
  one cow selected meets the criterion is 18/100 or 0.18; if the
  selection were made 100 times, you would expect to select a
  cow without a calf about 18 times (18% of the time).
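The herd counts above can be turned into odds and a probability with a few lines of Python. This is only an illustrative sketch; the function names `probability` and `odds` are made up for this example.

```python
from fractions import Fraction

def probability(f, n):
    """Relative frequency f/n: events of interest over all events."""
    return Fraction(f, n)

def odds(f, n):
    """Odds as a pair: (events of interest, all other events)."""
    return f, n - f

herd = 100                       # cows in the herd
with_calf = 82                   # cows with a calf at their side
without_calf = herd - with_calf  # 18 cows without a calf

print(odds(without_calf, herd))                # (18, 82)
print(float(probability(without_calf, herd)))  # 0.18
```

Odds compare the event with its complement (18:82), while probability compares the event with the whole herd (18/100).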
Predictability (continued)
Sometimes we might be interested in how many times
an event might occur. For example, to evaluate a new test I
test 5 cows from a herd for “Disease X”. What is the
probability that the new test will find 3 test positive and
2 test negative?
For simplicity, assume:
– We already know the status of the animals as being infected.
  The prevalence is 1.00.
– After testing, the animals are returned to the herd (in this
  scenario with known infection it doesn’t make any difference).
– The Se of the test is 0.9 (for an infected animal the probability
  the test is positive is 0.9 and the probability it is negative
  is 0.1), and the Sp is 1.
So how do we approach the problem?
Let TP = test positive and TN = test negative.
TP = 0.9, TN = 0.1. If we perform the test 5 times, the possible
outcomes are the terms of this expansion:
\[ (TP + TN)^5 = TP^5 + 5\,TP^4\,TN + 10\,TP^3\,TN^2 + 10\,TP^2\,TN^3 + 5\,TP\,TN^4 + TN^5 \]
The coefficient (1) corresponds to the number of possible
combinations and the exponent to the number of times
that event occurs. There are 10 possible
combinations of getting 3 test positives and 2 test
negatives, so the probability is
\( 10 \times 0.9^3 \times 0.1^2 = 0.0729 \).
(1) \( \text{coefficient} = \dfrac{n!}{k!\,(n-k)!} \)
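The expansion term for 3 positives and 2 negatives can be checked numerically. The sketch below assumes, as the slide does, that all 5 cows are infected and that test results are independent; `binomial_pmf` is a name chosen for this example.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

se = 0.9  # probability an infected cow tests positive

print(comb(5, 3))                        # 10 combinations
print(round(binomial_pmf(3, 5, se), 4))  # 0.0729
```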
Expanding the problem
What would be the probability of at least 3 test
positives?
\[ TP^5 + 5\,TP^4\,TN + 10\,TP^3\,TN^2 = 0.590 + 0.328 + 0.073 = 0.991 \]
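Adding the terms for 3, 4 and 5 positives is a tail sum of the binomial distribution. A minimal sketch under the same assumptions (5 infected cows, Se = 0.9, independent results); `prob_at_least` is a hypothetical helper name.

```python
from math import comb

def prob_at_least(k_min, n, p):
    """Probability of k_min or more successes in n independent trials."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(k_min, n + 1))

print(round(prob_at_least(3, 5, 0.9), 3))  # 0.991
```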
If you consider that you don’t know they are all infected
but know the prevalence (P), then the values change by
factoring in prevalence and the initial formula would
change to look like this (still with Sp of 1):
\[ P^5\,(TP + TN)^5 + Sp^5\,(1 - P)^5 \]
If you consider a test that is less than 100% specific,
then you add in the confusion of false positives
and a more complex formula.
Certainty

If I select 5 cows at random to check whether
the cows have been exposed to “X disease”
using a newly validated diagnostic test, how
certain/confident would I be that, by testing 5
cows, a statement regarding the presence of “X
disease” could be made?
THE PLOT THICKENS
Sample size formulas
\[ n = \left[ 1 - (1 - \alpha)^{1/D} \right] \times \left[ N - \frac{D - 1}{2} \right] \]
n = required sample size
α = confidence level
D = number of diseased animals
N = total population or herd size
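As a rough illustration, the sample size formula for a perfect test, n = [1 - (1 - α)^(1/D)] × [N - (D - 1)/2], can be evaluated directly. The function name and the example herd (100 head, 5 diseased, 95% confidence) are assumptions for this sketch; rounding up to a whole animal is a design choice, not part of the formula.

```python
from math import ceil

def sample_size_perfect_test(confidence, diseased, herd_size):
    """Animals to sample to find at least one diseased animal,
    assuming a perfect test (Se = Sp = 1)."""
    raw = (1 - (1 - confidence) ** (1 / diseased)) * (
        herd_size - (diseased - 1) / 2
    )
    return ceil(raw)  # round up: you cannot sample a fraction of a cow

# Hypothetical herd: N = 100, D = 5, 95% confidence
print(sample_size_perfect_test(0.95, 5, 100))  # 45
```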
Shortcomings of the formula
Only applies to a perfect test
– Diagnostic tests have limitations
    Se
    Sp
    A positive test does not indicate disease nor a negative test
    an absence of disease
Trial and error to determine confidence (or a
complex math formula)
What information do you need to
answer the questions about sample
size?
Test parameters: Se and Sp. Is the test perfect
or imperfect?
Is the population in question large or small?
Estimated prevalence?
– Sources for estimated prevalence
    Literature
    National reports, NAHMS
    Diagnostic labs (may be a biased answer)
    Clinical experience/local surveys (a topic for later)
    Others?
Methods to estimate confidence
Sample size spreadsheet
Formulas behind the spreadsheet
Formulas
Sample size to detect a positive animal with an imperfect test and a large
group (can be used with Goal Seek to determine prevalence if α, Se, Sp and
sample size are known). (a)
\[ S = \frac{\ln(\beta)}{\ln\left[\, p\,(1 - Se) + (1 - p)\,Sp \,\right]} \]
Where β = 1 - α (where α is the desired level of confidence)
S = sample size
p = prevalence
Se = test sensitivity
Sp = test specificity
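A sketch of the large-group formula in code, where the numerator is the log of one minus the desired confidence and the denominator is the log of the probability that a single sampled animal tests negative. The function name and the example values (10% prevalence, Se 0.9, Sp 1.0) are assumptions for illustration.

```python
from math import ceil, log

def sample_size_imperfect_large(confidence, p, se, sp):
    """Sample size to get at least one test-positive animal from a
    large group with an imperfect test."""
    # Probability that one randomly sampled animal tests negative
    prob_negative = p * (1 - se) + (1 - p) * sp
    return ceil(log(1 - confidence) / log(prob_negative))

# Hypothetical inputs: 95% confidence, 10% prevalence, Se 0.9, Sp 1.0
print(sample_size_imperfect_large(0.95, 0.10, 0.9, 1.0))  # 32
```

Note that when Sp is below 1, some of the "detections" counted here are false positives, so the result is the sample size needed to see a positive test, not necessarily an infected animal.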

Formula to calculate α (the level of confidence, i.e. the probability of
detecting an infected animal) for an imperfect test and a small group (can be
used with Goal Seek to determine prevalence if α, Se, Sp and sample size are
set to predetermined values). (a)
\[ \alpha = 1 - \sum_{x=0}^{s} \frac{\binom{pM}{x}\,\binom{(1-p)M}{s-x}}{\binom{M}{s}}\,(1 - Se)^{x}\,Sp^{(s-x)} \]
Where β = 1 - α is the value of the summation (α is the level of confidence)
s = sample size
p = prevalence
M = herd size
Se = test sensitivity
Sp = test specificity
x = number of infected animals in the sample
(a) Vose, David. Risk Analysis: A Quantitative Guide.
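The small-group calculation is a hypergeometric sum: for each possible number x of infected animals in the sample, weight the chance of drawing that sample by the chance that every animal in it tests negative, then subtract the total from 1. The sketch below assumes the number of infected animals is the prevalence times herd size rounded to a whole animal; the function name and example herd are hypothetical.

```python
from math import comb

def detection_confidence(herd_size, p, sample_size, se, sp):
    """Confidence of getting at least one positive test when sampling
    without replacement from a small herd with an imperfect test."""
    infected = round(p * herd_size)  # assumption: whole animals only
    healthy = herd_size - infected
    total = comb(herd_size, sample_size)
    miss = 0.0  # probability that every sampled animal tests negative
    for x in range(sample_size + 1):
        ways = comb(infected, x) * comb(healthy, sample_size - x)
        miss += ways / total * (1 - se) ** x * sp ** (sample_size - x)
    return 1 - miss

# Hypothetical herd: 100 head, 10% prevalence, 10 sampled, perfect test
print(round(detection_confidence(100, 0.10, 10, 1.0, 1.0), 2))  # 0.67
```

In a spreadsheet, Goal Seek can vary one input until the confidence hits a target; in code the same thing can be done by looping over candidate sample sizes.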
Kennedy’s oversimplified definitions
P = the proportion diseased/infected/of interest (D) in a
population (N)
(1-P) = the proportion NOT D
Se = the proportion of D that a test detects (T+|D)
(1-Se) = the proportion of D the test fails to detect (false
negatives)
Sp = the proportion of NOT D (ND) that a test correctly
identifies as negative (T-|ND)
(1-Sp) = the proportion of ND a test incorrectly identifies as D
(false positives)
Kennedy’s oversimplified formulas
A. \( P = \dfrac{D}{N} \)
B. \( (1 - P) = \dfrac{ND}{N} \)
C. \( Se = \dfrac{(T+ \mid D)}{D} \)
D. \( (1 - Se) = \dfrac{(T- \mid D)}{D} \)
E. \( Sp = \dfrac{(T- \mid ND)}{ND} \)
F. \( (1 - Sp) = \dfrac{(T+ \mid ND)}{ND} \)
N = Total Population
D = Diseased
ND = Not Diseased
P = Prevalence
Se = Sensitivity
Sp = Specificity
T+ = Test positive
T- = Test negative
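To make the definitions concrete, here is a small sketch that derives P, Se and Sp from the four cells of a 2x2 table. The counts are invented for illustration (1000 animals, 100 diseased) and the function name is hypothetical.

```python
def kennedy_proportions(pos_d, neg_d, neg_nd, pos_nd):
    """P, Se and Sp from counts of test results by true status.

    pos_d  = diseased animals testing positive      (T+|D)
    neg_d  = diseased animals testing negative      (T-|D)
    neg_nd = non-diseased animals testing negative  (T-|ND)
    pos_nd = non-diseased animals testing positive  (T+|ND)
    """
    d = pos_d + neg_d      # D
    nd = neg_nd + pos_nd   # ND
    n = d + nd             # N
    return {"P": d / n, "Se": pos_d / d, "Sp": neg_nd / nd}

# Hypothetical counts
result = kennedy_proportions(pos_d=90, neg_d=10, neg_nd=855, pos_nd=45)
print(result)  # P = 0.1, Se = 0.9, Sp = 0.95
```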
More of Kennedy’s oversimplified formulas
A. \( PPV = \dfrac{P \times Se}{P \times Se + (1 - P) \times (1 - Sp)} \)
B. \( NPV = \dfrac{(1 - P) \times Sp}{(1 - P) \times Sp + P \times (1 - Se)} \)
C. \( EFF = P \times Se + (1 - P) \times Sp \)
D. \( AP = P \times Se + (1 - P) \times (1 - Sp) \)
PPV = Positive Predictive Value
NPV = Negative Predictive Value
EFF = Efficiency: proportion of positives
and negatives correctly classified
AP = Apparent Prevalence
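The four quantities can be computed from P, Se and Sp alone. The sketch below uses the standard Bayes forms, with false positives (1 - P)(1 - Sp) in the PPV denominator and false negatives P(1 - Se) in the NPV denominator; the example inputs are hypothetical.

```python
def ppv(p, se, sp):
    """Positive predictive value: true positives over all test positives."""
    return p * se / (p * se + (1 - p) * (1 - sp))

def npv(p, se, sp):
    """Negative predictive value: true negatives over all test negatives."""
    return (1 - p) * sp / ((1 - p) * sp + p * (1 - se))

def efficiency(p, se, sp):
    """Proportion of all animals correctly classified."""
    return p * se + (1 - p) * sp

def apparent_prevalence(p, se, sp):
    """Proportion of all animals that test positive."""
    return p * se + (1 - p) * (1 - sp)

# Hypothetical inputs: 10% prevalence, Se 0.90, Sp 0.95
p, se, sp = 0.10, 0.90, 0.95
print(round(ppv(p, se, sp), 3))                  # 0.667
print(round(npv(p, se, sp), 3))                  # 0.988
print(round(efficiency(p, se, sp), 3))           # 0.945
print(round(apparent_prevalence(p, se, sp), 3))  # 0.135
```

Even with a fairly good test, a third of the positives in this example are false positives because the prevalence is low.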
Sources of diagnostic test error
or
the lab never gets it right
Where testing error happens
Pre-analytical error: wrong sample, mishandled sample,
improper sample collection, etc. Starts at collection and
continues until analysis begins.
Analytical error: analytic variation such as mechanical
wear and tear, or inherent error such as that seen with
a set of spring-type scales.
Biological variation: an average means some are
higher and some are lower.
Post-analytical error: reporting errors, misread values
or misreported values, transposition of figures, etc.
Test expectations
Repeatability
– Consider flipping a fair coin 5 times: the chance of all being
  heads is 0.5 × 0.5 × 0.5 × 0.5 × 0.5, or 3.125%.
– Consider flipping a weighted coin 5 times that is expected to
  be heads 90% of the time: the chance that 5 heads will be
  returned is 0.9 × 0.9 × 0.9 × 0.9 × 0.9, or 59%, meaning 41% of
  the time one or more tails would occur.
– The same principle applies to a diagnostic test.
– So how do you interpret when two labs disagree?
  Did the test miss?
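The coin arithmetic carries over to a test run repeatedly on truly positive samples. The sketch below assumes the runs are independent and that each has the same probability of a positive result; the two-lab calculation is an illustration added here, not a figure from the slides.

```python
def prob_all_same(p, n):
    """Probability that n independent trials all give the outcome
    that has probability p."""
    return p ** n

def prob_two_runs_disagree(se):
    """Probability that two independent runs on the same truly
    positive sample give different results (one +, one -)."""
    return 2 * se * (1 - se)

print(prob_all_same(0.5, 5))                    # 0.03125
print(round(prob_all_same(0.9, 5), 2))          # 0.59
print(round(prob_two_runs_disagree(0.9), 2))    # 0.18
```

Under these assumptions, two labs running a test with Se = 0.9 on the same infected animal would be expected to disagree about 18% of the time without either lab making a mistake.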
Rates
How many miles would you expect to drive before you
get a nail in your tire?
In 2004 how many aviation fatalities occurred per
100,000 hours flown?
– General aviation 2.15
– Commercial aviation 0.08
In every 1000 diagnostic tests performed, how many
times does the test fail beyond incorrect results due to
Se and Sp?
Repetition/Rates?
Does repeating a task increase the probability
something will go wrong? For example:
– You and your neighbor purchase new tires for your cars. Both
  of you drive to the same place to work each day on the same
  road, but you come home for lunch while he doesn’t. Who is
  more likely to get a nail in one of their tires?
– Aviation gas gets cheap so I fly twice as much. Will I be more
  likely to become a fatality? Maybe.
– If I run 1000 individual diagnostic tests, am I more likely to
  misclassify an animal than if I ran 10 tests each containing
  samples of 100 animals? Controversial.
Each mile you drive, each hour you fly, and each test you run is an independent event.
The number of times an event occurs over a given period is a rate
(rates have units; probabilities do not). Risk is the probability of a negative event;
think about life insurance. Dependent events, on the other hand, have changing probabilities.
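For independent events, the chance that something goes wrong at least once grows with the number of repetitions as 1 - (1 - p)^n. A minimal sketch of the tire example, with a per-trip probability that is purely hypothetical:

```python
def prob_at_least_one(p_per_trial, n_trials):
    """Probability an event happens at least once in n independent
    trials, each with probability p_per_trial."""
    return 1 - (1 - p_per_trial) ** n_trials

p_nail = 0.001            # hypothetical chance of a nail on one trip
neighbor_trips = 2 * 250  # to work and back, 250 working days
my_trips = 4 * 250        # same, plus home and back for lunch

print(round(prob_at_least_one(p_nail, neighbor_trips), 2))  # 0.39
print(round(prob_at_least_one(p_nail, my_trips), 2))        # 0.63
```

Doubling the exposure raises the cumulative probability but does not double it, because the probability can never exceed 1.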
Why pooled testing?
Pooled testing offers advantages over
individual testing
– Allows the diagnostician to take advantage of highly
  sensitive and specific tests while minimizing cost
– Diminishes cumulative testing error over individual
  tests
Why not pooled testing?
– Potential impact of dilution diminishing Se
– Logistical requirements for pooling samples
  (pooling of individual samples can be labor
  intensive)
– Loss of samples for follow-up testing on
  positive pools
Assumptions associated with
pooled testing
Pooled test Se must be approximately the
same as individual test Se
Samples must be easily obtainable
Pools must represent a homogeneous mixture
of samples
The outcome is binomially distributed, i.e. it follows a discrete
probability distribution of the number of successes in a sequence of
independent yes/no events, each yielding success with probability p
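Under the binomial assumption, the chance that a pool tests positive follows from the prevalence and the pool size. This sketch also assumes that pooled Se equals individual Se (no dilution effect), as listed above; the function name and inputs are hypothetical.

```python
def prob_pool_positive(p, k, se=1.0, sp=1.0):
    """Probability a pool of k independent samples tests positive,
    given individual prevalence p and pool-level Se and Sp."""
    all_negative = (1 - p) ** k  # no infected sample in the pool
    return se * (1 - all_negative) + (1 - sp) * all_negative

# Hypothetical: 2% prevalence, pools of 10, perfect test
print(round(prob_pool_positive(0.02, 10), 3))  # 0.183
```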
Our human counterparts institute
pooled testing strategies
For generations the military has attempted to screen its
applicants/inductees to ensure they were healthy and
would not become a liability on the battlefield.
Early screening involved a physical exam to ensure all
parts were present and properly located.
Later, blood tests for infectious disease became
available and were included in the screening process.
Pooled testing during WWII



Syphilis had plagued the
military since the first soldier
marched off to war.
They could mandate controls
after recruitment to help slow
its spread but that was not
enough
To minimize the risk they
looked to tests that would
detect carriers before they
were inducted.
Military Test for Syphilis
The test used during WWII to
ensure inductees were free
from infection was a
Wassermann-type blood test.
– A sample of blood is
  drawn from each inductee.
– Then each sample is
  tested.
The procedure was
expensive, time consuming
and amplified testing error.
Time and cost of test encouraged a
change in the process
The military implemented a procedure where a small
quantity of blood from multiple inductees was pooled
and a single test was run on the pool.
Sufficient blood remained that the positive individual
among the pool could be identified.
The study of using pooled testing as a screen led to
two conclusions/considerations on pooling.
– Prevalence must be low enough to make pooling more
  economical.
– It must be easier to obtain an observation on a group than on
  the individuals within the group (minimize the number of tests).
Reference: Robert Dorfman, The Detection of Defective Members of Large Populations, Annals of Mathematical Statistics,
Vol. 14, Dec 1943
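The "prevalence must be low enough" condition can be made concrete with the expected number of tests per individual under two-stage pooling: one test per pool of k, plus k individual retests whenever the pool is positive. The sketch assumes a perfect test and independent samples; function names are chosen for this example.

```python
def expected_tests_per_individual(p, k):
    """Expected tests per individual with pools of size k:
    1/k for the pool test, plus a retest of everyone in a positive pool."""
    return 1 / k + 1 - (1 - p) ** k

def best_pool_size(p, max_k=100):
    """Pool size (2..max_k) with the fewest expected tests per individual."""
    return min(range(2, max_k + 1),
               key=lambda k: expected_tests_per_individual(p, k))

print(round(expected_tests_per_individual(0.01, 10), 3))  # 0.196
print(best_pool_size(0.01))                               # 11
print(expected_tests_per_individual(0.5, 10) > 1)         # True
```

At 1% prevalence pooling needs about a fifth of the tests; at 50% prevalence it needs more than one test per individual, so individual testing is cheaper.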
More Recent Use of Pooled Testing
Strategy in Human Medicine




ELISA and Western Blot tests were used to screen for
human immunodeficiency virus (HIV).
An ELISA test was used initially and then a Western
Blot was the confirmatory test.
The ELISA alone was prone to falsely classifying
samples positive and therefore may result in an
overestimation of prevalence.
Western blots were done to confirm HIV presence, but
are expensive.
Reference: Xin M. Tu, Eugene Litvak, Marcello Pagano. Studies of AIDS and HIV surveillance. Screening tests: can we get more by doing less?
Statistics in Medicine, Vol. 13, 1905-1919 (1994).
Pooling to Screen for HIV


Blood Mobile: time and money made
individual testing at the human “herd” level
unappealing, plus creating issues of false
classifications.
So what about pooling samples?
– Up to 15 samples were pooled without a loss of Se
– Pooling diminished false positives
– Less cost
– Fewer tests were needed
A step further on the pooling

A JAMA article (Jul 2002) described the following
protocol.
– Pool samples of blood in groups of 10 to determine
  the absence of HIV antibodies
– From the negative pools, form pools of 90
  individuals and run RT-PCR to detect the presence
  of HIV
– Used to find the presence of the virus prior to the
  time antibodies are formed, allowing earlier
  treatment and preventing spread
Trial results




8000 people visiting publicly funded HIV clinics in the
Southern USA were subjects of the test
Antibody tests found 39 long term infected individuals
(those that had formed Ab to HIV)
RT-PCR testing of pools of 90 serologically negative
samples found 4 additional positive individuals.
The cost to find the 4 additional individuals was
$4109.00 per individual; if individual PCRs had been
done, the cost would have been ~$360,000.00 per
positive individual.
Pooled testing/Screening Human
Applications


Screening tests have been used to identify
infected individuals in large populations, such
as the military or blood donors.
Screening tests are used to estimate
prevalence.
Veterinarians and screening tests

Limited applications of screening test
strategies
– Salmonella contamination of eggs
– Johne’s fecal pools
– BVD
– T. foetus
Point


Human medicine has implemented the concept
of pooling when human life is at risk. Should
veterinarians be open to the concept to
address herd health issues in livestock?
Possible veterinary applications
– Screening to evaluate treatment success
– Determine prevalence prior to instituting control
  programs
– Screening to evaluate vaccine success
Estimation of prevalence using
pooled PCR
\[ AP = 1 - \left[ \frac{Se - \dfrac{x}{m}}{Se + Sp - 1} \right]^{1/k} \]
Se = sensitivity, Sp = specificity
k = pool size, m = pools tested
x = positive pools
Formula from Prev Vet Med 39 (1999),
Cowling, Gardner, Johnson
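A sketch of the pooled estimator in code, 1 - [(Se - x/m) / (Se + Sp - 1)]^(1/k). The function name, the input check and the example numbers are assumptions for illustration; the check simply rejects inputs where the bracketed ratio falls outside 0 to 1, since no prevalence is consistent with them.

```python
def estimate_prevalence(x, m, k, se=1.0, sp=1.0):
    """Estimated individual-level prevalence from pooled results.

    x = positive pools, m = pools tested, k = pool size
    """
    ratio = (se - x / m) / (se + sp - 1)
    if not 0 <= ratio <= 1:
        raise ValueError("pool results are inconsistent with Se and Sp")
    return 1 - ratio ** (1 / k)

# Hypothetical: 10 of 100 pools positive, pools of 5, perfect test
print(round(estimate_prevalence(10, 100, 5), 4))  # 0.0209
```

With 10% of pools positive and 5 animals per pool, the estimated individual prevalence is about 2%, close to but slightly above 10% divided by 5.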