Transcript Slide 1
New interval estimating procedures for the
disease transmission probability in multiple-vector transfer designs
Joshua M. Tebbs and Christopher R. Bilder
Department of Statistics
Oklahoma State University
[email protected] and [email protected]
Introduction
Plant disease is responsible for major losses in agriculture
throughout the world
Diseases are often spread by insect vectors (e.g., aphids,
leafhoppers, planthoppers, etc.)
Example:
[Images: brown planthopper and whitebacked planthopper, from
www.knowledgebank.irri.org/ricedoctor_mx/Fact_Sheets/Pests/Planthopper.htm]
Example
Ornaghi et al. (1999) study the effects of the “Mal Rio Cuarto”
(MRC) virus and its spread by the Delphacodes kuscheli
planthopper
The MRC virus is the most damaging maize virus in Argentina
The goal was to estimate p, the probability of disease
transmission for a single vector
Plant pathologists often use vector transfers
to estimate p
In such experiments, insects are moved from an infected
source to the test plants
Single-vector transfers
The most straightforward way to estimate p is by using a
single-vector transfer
Each test plant contains one vector, and test plants must be
individually caged
Under the binomial model, the proportion of infected test
plants gives the maximum likelihood estimate of p
Disadvantages with a single-vector transfer:
Requires a large amount of space (since insects must be
individually isolated)
Is a costly design since one needs a large number of test
plants and individual cages
Multiple-vector transfers
A group of s > 1 insect vectors is allocated to each test plant.
Even though test plants are occupied by multiple insects,
the goal is still to estimate p, the probability of disease
transmission for a single vector
[Figure: a greenhouse with enclosed test plants, each holding several
planthoppers. Each planthopper either transmits the virus or does not
transmit the virus. A test plant is labeled Y=1 if it becomes infected
and Y=0 otherwise.]
Multiple-vector transfers
Advantages of a multiple-vector versus single-vector transfer:
Potential savings in time, cost, and space
Statistical properties of estimators are much better (for a
fixed number of test plants)
A multiple-vector transfer is an application of the group-testing
experimental design
Other applications of group testing:
Infectious disease seroprevalence estimation in human
populations
Disease transmission in animal studies
Drug discovery applications
Notation and assumptions
Define:
n = number of test plants
s = number of insects per plant (“group size”)
Y=1 “infected test plant” – plant for which at least one
vector (out of s) infects
Y=0 “uninfected test plant” – plant for which no vectors (out
of s) infect
Assumptions:
Common group size s
The statuses of individual vectors are iid Bernoulli random
variables with mean p
The statuses of test plants are independent
Test plants are not misclassified
Maximum likelihood estimator for p
Let $T = \sum_{i=1}^{n} Y_i$ denote the number of infected test plants. Under
our design assumptions, T has a binomial distribution with
parameters n and
$\theta = 1 - (1 - p)^s$
The maximum likelihood estimator of p is given by
$\hat{p} = 1 - (1 - \hat{\theta})^{1/s}$,
where $\hat{\theta} = T/n$ (the proportion of infected test plants)
Estimates of p are computed by only examining the test plants
(and not the individual vectors themselves)
The binomial model is only appropriate if test plants do not
differ materially in their resistance to pathogen transmission
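A minimal Python sketch of the estimator on this slide, assuming the binomial model holds. The function name mle_p is illustrative and does not come from the authors' R programs.

```python
def mle_p(t: int, n: int, s: int) -> float:
    """MLE of the per-vector transmission probability p.

    t: number of infected test plants, n: number of test plants,
    s: number of vectors per plant (common group size).
    """
    if n <= 0 or s <= 0 or not 0 <= t <= n:
        raise ValueError("need n > 0, s > 0 and 0 <= t <= n")
    theta_hat = t / n  # proportion of infected test plants
    return 1.0 - (1.0 - theta_hat) ** (1.0 / s)


# Ornaghi et al. (1999) data from a later slide: t = 3, n = 24, s = 7.
print(round(mle_p(3, 24, 7), 4))
```

With s = 1 the estimator reduces to the single-vector proportion t/n.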
Properties of the MLE and the Wald CI
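The statistic $\hat{p}$ has the following properties:
Consistent as n gets large
Approximately normally distributed; more precisely,
$\hat{p} \sim AN[p, v(p)/n]$,
where
$v(p) = \frac{1 - (1 - p)^s}{s^2 (1 - p)^{s-2}}$
A 100(1-α) percent Wald confidence interval is given by
$\hat{p} \pm z_{1-\alpha/2} \sqrt{\hat{v}(\hat{p})/n}$, where
$\hat{v}(\hat{p}) = \frac{1 - (1 - \hat{p})^s}{s^2 (1 - \hat{p})^{s-2}}$
and $z_{1-\alpha/2}$ is the $1-\alpha/2$ quantile of the standard normal distribution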
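The Wald interval can be sketched in a few lines of Python. This is only an illustration under the slide's assumptions; the function name wald_interval and the restriction to 0 < t < n are choices made here, not taken from the authors' programs.

```python
from math import sqrt
from statistics import NormalDist


def wald_interval(t: int, n: int, s: int, alpha: float = 0.05) -> tuple[float, float]:
    """Wald interval for p from t infected plants out of n, s vectors per plant."""
    if not 0 < t < n:
        raise ValueError("this sketch needs 0 < t < n")
    p_hat = 1.0 - (1.0 - t / n) ** (1.0 / s)
    # Estimated variance factor v(p_hat): [1-(1-p)^s] / [s^2 (1-p)^(s-2)].
    v_hat = (1.0 - (1.0 - p_hat) ** s) / (s**2 * (1.0 - p_hat) ** (s - 2))
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    half_width = z * sqrt(v_hat / n)
    return p_hat - half_width, p_hat + half_width


# Ornaghi et al. (1999) data: t = 3, n = 24, s = 7.
lower, upper = wald_interval(3, 24, 7)
print(f"{lower:.4f}, {upper:.4f}")
```

For these data the lower limit is negative, which illustrates the drawback of the Wald interval when p is small.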
Variance stabilizing interval (VSI)
Goal: Find $g(\hat{p})$, whose variance is free of the parameter p
Solve the following differential equation:
$g'(p) = \frac{c_0 \, s \, (1 - p)^{(s-2)/2}}{\sqrt{1 - (1 - p)^s}}$
With $c_0 = 1$, a solution is given by $g(p) = 2\arctan\sqrt{(1 - p)^{-s} - 1}$
It follows that
$\left( 1 - \left[ \frac{1 + \cos(a)}{2} \right]^{1/s}, \; 1 - \left[ \frac{1 + \cos(b)}{2} \right]^{1/s} \right)$
is a 100(1-α) percent confidence interval for p. Here,
$a = 2\arctan\sqrt{(1 - \hat{p})^{-s} - 1} - z_{1-\alpha/2}/\sqrt{n}$
$b = 2\arctan\sqrt{(1 - \hat{p})^{-s} - 1} + z_{1-\alpha/2}/\sqrt{n}$
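A hedged Python sketch of the variance stabilizing interval as written on this slide. The name vsi_interval is illustrative. The sketch applies the formula directly and does not treat the cases a < 0 or b > π specially.

```python
from math import atan, cos, sqrt
from statistics import NormalDist


def vsi_interval(t: int, n: int, s: int, alpha: float = 0.05) -> tuple[float, float]:
    """Variance stabilizing interval for p (formula applied as on the slide)."""
    if not 0 < t < n:
        raise ValueError("this sketch needs 0 < t < n")
    p_hat = 1.0 - (1.0 - t / n) ** (1.0 / s)
    # g(p) = 2 * arctan( sqrt( (1 - p)^(-s) - 1 ) )
    g_hat = 2.0 * atan(sqrt((1.0 - p_hat) ** (-s) - 1.0))
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    a = g_hat - z / sqrt(n)
    b = g_hat + z / sqrt(n)
    # Back-transform the endpoints to the p scale.
    lower = 1.0 - ((1.0 + cos(a)) / 2.0) ** (1.0 / s)
    upper = 1.0 - ((1.0 + cos(b)) / 2.0) ** (1.0 / s)
    return lower, upper


# Ornaghi et al. (1999) data: t = 3, n = 24, s = 7.
lower, upper = vsi_interval(3, 24, 7)
print(f"{lower:.4f}, {upper:.4f}")
```

Unlike the Wald interval, the two limits are not symmetric about the point estimate.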
Modified Clopper-Pearson (CP) interval
The number of infected test plants, T, has a binomial
distribution with parameters n and $\theta = 1 - (1 - p)^s$
One can obtain an exact Clopper-Pearson interval for $\theta$ and
then transform back to the p scale (Chiang and Reeves, 1962)
Exact 100(1-α) percent confidence limits for p are given by
$1 - \left\{ 1 - \left[ 1 + \frac{n-t+1}{t} F_{1-\alpha/2,\,2(n-t+1),\,2t} \right]^{-1} \right\}^{1/s}$
and
$1 - \left\{ 1 - \frac{\frac{t+1}{n-t} F_{1-\alpha/2,\,2(t+1),\,2(n-t)}}{1 + \frac{t+1}{n-t} F_{1-\alpha/2,\,2(t+1),\,2(n-t)}} \right\}^{1/s}$,
where $F_{1-\alpha,a,b}$ denotes the $1-\alpha$ quantile of the central F
distribution with a (numerator) and b (denominator) degrees of
freedom
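The same limits can be sketched without F quantiles. The Python below is not the slide's closed form: it finds the Clopper-Pearson limits for θ = 1 - (1 - p)^s by bisection on the binomial tail probabilities, which define the same interval, and then transforms them to the p scale. All function names are illustrative.

```python
from math import comb


def binom_cdf(k: int, n: int, theta: float) -> float:
    """P(T <= k) for T ~ binomial(n, theta)."""
    return sum(comb(n, j) * theta**j * (1.0 - theta) ** (n - j) for j in range(k + 1))


def bisect_root(f, lo: float = 0.0, hi: float = 1.0, iters: int = 100) -> float:
    """Root of an increasing function f on [lo, hi] with f(lo) < 0 < f(hi)."""
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def cp_interval(t: int, n: int, s: int, alpha: float = 0.05) -> tuple[float, float]:
    """Modified Clopper-Pearson interval for p, via the exact interval for theta."""
    if not 0 <= t <= n:
        raise ValueError("need 0 <= t <= n")
    # Lower limit for theta solves P(T >= t | theta) = alpha / 2.
    theta_lo = 0.0 if t == 0 else bisect_root(
        lambda th: (1.0 - binom_cdf(t - 1, n, th)) - alpha / 2.0)
    # Upper limit for theta solves P(T <= t | theta) = alpha / 2.
    theta_hi = 1.0 if t == n else bisect_root(
        lambda th: alpha / 2.0 - binom_cdf(t, n, th))
    # Transform back: p = 1 - (1 - theta)^(1/s).
    to_p = lambda th: 1.0 - (1.0 - th) ** (1.0 / s)
    return to_p(theta_lo), to_p(theta_hi)


# Ornaghi et al. (1999) data: t = 3, n = 24, s = 7.
lower, upper = cp_interval(3, 24, 7)
print(f"{lower:.4f}, {upper:.4f}")
```

Bisection is used here only to keep the sketch free of external libraries; a statistics package with F or Beta quantile functions would give the limits directly.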
Comparing the Wald, VSI, and CP
The Wald interval is simple and easy to compute. However, it
has main drawbacks:
Provides symmetric confidence intervals even though the
distribution of $\hat{p}$ may be very skewed
Often produces negative lower limits when p is small!
The VSI handles each of these drawbacks
Not symmetric
Always produces lower limits within the parameter space
(i.e., strictly larger than zero)
The CP interval’s main advantage is that its coverage
probability is always greater than or equal to 1-α. However,
such intervals can be wastefully wide, especially if n is small.
Bayesian estimation
Prior distribution for p
One parameter Beta distribution
$f_P(p \mid \beta) = \beta (1 - p)^{\beta - 1} I(0 < p < 1)$
for a known value of β
Takes into account p is small
Example when β = 52.4
[Figure: prior density f(p), on a vertical scale from 0 to 50, plotted
against p from 0.00 to 0.08 for β = 52.4]
Bayesian estimation
Prior distribution for p
Why use one parameter instead of two parameter Beta?
Sensible model acknowledging p is small
Bayes and empirical Bayes estimators are simpler
Resulting estimator using squared error loss with a
two parameter Beta is a ratio of complicated alternating
sums
See Chaubey and Li (Journal of Official Statistics,
1995) for Bayes estimators
Bayesian estimation
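Posterior distribution for 0 < p < 1
$f_{P|T}(p \mid t, \beta) = f_{T,P}(t, p \mid \beta) / f_T(t \mid \beta)$
$= \frac{s \, \Gamma(n + \beta/s + 1)}{\Gamma(n - t + \beta/s) \, \Gamma(t + 1)} (1 - p)^{s(n-t) + \beta - 1} [1 - (1 - p)^s]^t$
Note: $U = 1 - (1 - P)^s \sim \text{beta}(t + 1, \, n - t + \beta/s)$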
Empirical Bayesian estimation
Use the marginal distribution for T to derive an estimate for β
Why?
Avoid possible poor choice for β
n is often small in multiple-vector transfer experiments
Posterior may be adversely affected by the prior
Marginal distribution of T for t = 0, 1, …, n
$f_T(t \mid \beta) = \frac{\beta \, \Gamma(n + 1) \, \Gamma(n - t + \beta/s)}{s \, \Gamma(n - t + 1) \, \Gamma(n + \beta/s + 1)}$
Maximize $f_T(t \mid \beta)$ as a function of β to obtain the marginal
maximum likelihood estimate, $\hat{\beta}$
Iteratively solve for β in
$\frac{\partial}{\partial \beta} \log f_T(t \mid \beta) = \frac{1}{\beta} + \frac{1}{s} \left[ \psi(n - t + \beta/s) - \psi(n + \beta/s + 1) \right] = 0$
where $\psi(\cdot)$ is the digamma function
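One way to solve the score equation numerically is sketched below in Python. Because t is an integer, the digamma difference ψ(x + t + 1) - ψ(x) equals the finite sum of 1/(x + k) for k = 0, ..., t, so no special-function library is needed. The bisection scheme, the search bounds, and the function names are assumptions of this sketch, not the authors' iterative method. The sketch is limited to 0 < t < n, where the score changes sign.

```python
def score(beta: float, t: int, n: int, s: int) -> float:
    """Derivative of log f_T(t | beta) with respect to beta."""
    x = n - t + beta / s
    # psi(x + t + 1) - psi(x) = sum_{k=0}^{t} 1 / (x + k) for integer t >= 0.
    digamma_diff = sum(1.0 / (x + k) for k in range(t + 1))
    return 1.0 / beta - digamma_diff / s


def marginal_mle_beta(t: int, n: int, s: int,
                      lo: float = 1e-6, hi: float = 1e6) -> float:
    """Marginal maximum likelihood estimate of beta by bisection on the score."""
    if not 0 < t < n:
        raise ValueError("this sketch needs 0 < t < n")
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if score(mid, t, n, s) > 0.0:
            lo = mid  # score still positive: the root lies to the right
        else:
            hi = mid
    return (lo + hi) / 2.0


# Ornaghi et al. (1999) data: t = 3, n = 24, s = 7.
print(round(marginal_mle_beta(3, 24, 7), 1))
```

For the Ornaghi et al. (1999) data this should land close to the value 52.4 quoted on the slides.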
Credible intervals
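(1 − α)100% Equal-tail
$[p_L, p_U]$ satisfy
$\int_0^{p_L} f_{P|T}(p \mid t, \hat{\beta}) \, dp = \alpha/2$ and $\int_{p_U}^{1} f_{P|T}(p \mid t, \hat{\beta}) \, dp = \alpha/2$
Use relationship with Beta distribution, $U = 1 - (1 - p)^s \sim \text{beta}(t + 1, \, n - t + \hat{\beta}/s)$
Interval: $\left( 1 - (1 - B_{\alpha/2;\,t+1,\,n-t+\hat{\beta}/s})^{1/s}, \; 1 - (1 - B_{1-\alpha/2;\,t+1,\,n-t+\hat{\beta}/s})^{1/s} \right)$
where $B_{\alpha,a,b}$ is the α quantile of a Beta(a,b) distribution
Remember that $\theta = 1 - (1 - p)^s$ implies $p = 1 - (1 - \theta)^{1/s}$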
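A hedged Python sketch of the equal-tail interval. It relies on the first Beta parameter, t + 1, being a positive integer, in which case the Beta CDF has a finite-sum form; quantiles are then found by bisection. The helper names are illustrative, and the value of β is passed in rather than estimated here.

```python
def beta_cdf(x: float, a: int, b: float) -> float:
    """CDF of Beta(a, b) at x, valid for positive integer a.

    Uses I_x(a, b) = 1 - (1 - x)^b * sum_{j=0}^{a-1} Gamma(b+j) / (Gamma(b) j!) * x^j.
    """
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    term, total = 1.0, 1.0  # j = 0 term
    for j in range(1, a):
        term *= (b + j - 1.0) * x / j
        total += term
    return 1.0 - (1.0 - x) ** b * total


def beta_quantile(q: float, a: int, b: float) -> float:
    """Quantile of Beta(a, b) by bisection on the CDF."""
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if beta_cdf(mid, a, b) < q:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def equal_tail_interval(t: int, n: int, s: int, beta: float,
                        alpha: float = 0.05) -> tuple[float, float]:
    """Equal-tail credible interval for p using U = 1 - (1 - p)^s."""
    a, b = t + 1, n - t + beta / s
    u_lo = beta_quantile(alpha / 2.0, a, b)
    u_hi = beta_quantile(1.0 - alpha / 2.0, a, b)
    # Transform the Beta quantiles back to the p scale.
    return 1.0 - (1.0 - u_lo) ** (1.0 / s), 1.0 - (1.0 - u_hi) ** (1.0 / s)


# Ornaghi et al. (1999) data with the empirical Bayes estimate 52.4 from the slides.
lower, upper = equal_tail_interval(3, 24, 7, 52.4)
print(f"{lower:.4f}, {upper:.4f}")
```

Because the transformation from U to p is increasing, quantiles of U map directly to quantiles of p.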
Credible intervals
(1 − α)100% highest posterior density (HPD) regions
Posterior is unimodal and right skewed
Find $[p_L, p_U]$ such that (1 − α)100% area of posterior
density is included and $p_U - p_L$ is as small as possible
See Tanner (1996, p. 103-4)
Key is to sample from posterior distribution
Use $U = 1 - (1 - p)^s \sim \text{beta}(t + 1, \, n - t + \hat{\beta}/s)$ relationship
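The slide obtains the HPD region by sampling from the posterior. The Python sketch below swaps in a deterministic alternative: it scans candidate intervals that each hold (1 − α) posterior mass and keeps the shortest on the p scale, which matches the HPD region when the posterior is unimodal. The grid size and all function names are assumptions of this sketch.

```python
def beta_cdf(x: float, a: int, b: float) -> float:
    """CDF of Beta(a, b) at x, valid for positive integer a (finite-sum form)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    term, total = 1.0, 1.0
    for j in range(1, a):
        term *= (b + j - 1.0) * x / j
        total += term
    return 1.0 - (1.0 - x) ** b * total


def beta_quantile(q: float, a: int, b: float) -> float:
    """Quantile of Beta(a, b) by bisection on the CDF."""
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if beta_cdf(mid, a, b) < q:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def hpd_interval(t: int, n: int, s: int, beta: float,
                 alpha: float = 0.05, grid: int = 400) -> tuple[float, float]:
    """Shortest interval for p holding (1 - alpha) posterior mass (grid search)."""
    a, b = t + 1, n - t + beta / s
    to_p = lambda u: 1.0 - (1.0 - u) ** (1.0 / s)
    best = None
    for i in range(grid + 1):
        gamma = alpha * i / grid  # posterior mass left below the interval
        lo = to_p(beta_quantile(gamma, a, b))
        hi = to_p(beta_quantile(min(gamma + 1.0 - alpha, 1.0), a, b))
        if best is None or hi - lo < best[1] - best[0]:
            best = (lo, hi)
    return best


# Ornaghi et al. (1999) data with the empirical Bayes estimate 52.4 from the slides.
lower, upper = hpd_interval(3, 24, 7, 52.4)
print(f"{lower:.4f}, {upper:.4f}")
```

A finer grid tightens the answer at the cost of more quantile evaluations; a sampling-based version, as on the slide, would replace the grid with sorted posterior draws.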
Example - Ornaghi et al. (1999)
Data
s = 7 planthoppers per plant
n = 24 plants
t = 3 infected plants observed
$\hat{\beta} = 52.4$
95% interval estimates for p
Interval                   Lower limit   Upper limit   Length
Wald                       -0.0023       0.0401        0.0424
VSI                         0.0037       0.0465        0.0428
Modified Clopper-Pearson    0.0038       0.0543        0.0505
Equal-tail                  0.0052       0.0410        0.0358
HPD                         0.0034       0.0373        0.0339
Interval comparisons
Coverage
$C(p, n, s) = \sum_{t=1}^{n-1} I(n,t,s) \binom{n}{t} \frac{[1 - (1 - p)^s]^t (1 - p)^{s(n-t)}}{1 - (1 - p)^{sn} - [1 - (1 - p)^s]^n}$,
where I(n,t,s) = 1 if the interval contains p and I(n,t,s) = 0
otherwise.
Do not consider the t = 0 and t = n cases
Poor multiple-vector transfer experimental design
See Swallow (1985, Phytopathology) for guidance in
choosing s
Brown, Cai, and DasGupta (2001, Statistical Science)
Frequentist evaluation similar to how Carlin and Louis
(2000) approach evaluating confidence and credible
intervals
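The coverage calculation can be sketched in Python as follows: sum the binomial probabilities of the outcomes t = 1, ..., n − 1 whose interval contains p, then divide by the probability that t is neither 0 nor n. The Wald interval is used only as an example; coverage accepts any function with the same signature. Function names are illustrative.

```python
from math import comb, sqrt
from statistics import NormalDist


def wald_interval(t: int, n: int, s: int, alpha: float = 0.05) -> tuple[float, float]:
    """Wald interval for p (valid for 0 < t < n)."""
    p_hat = 1.0 - (1.0 - t / n) ** (1.0 / s)
    v_hat = (1.0 - (1.0 - p_hat) ** s) / (s**2 * (1.0 - p_hat) ** (s - 2))
    half = NormalDist().inv_cdf(1.0 - alpha / 2.0) * sqrt(v_hat / n)
    return p_hat - half, p_hat + half


def coverage(p: float, n: int, s: int, interval_fn, alpha: float = 0.05) -> float:
    """Exact coverage at p, conditional on 0 < T < n."""
    theta = 1.0 - (1.0 - p) ** s
    # P(0 < T < n) = 1 - (1 - p)^(s n) - [1 - (1 - p)^s]^n
    norm = 1.0 - (1.0 - theta) ** n - theta**n
    total = 0.0
    for t in range(1, n):
        lower, upper = interval_fn(t, n, s, alpha)
        if lower <= p <= upper:  # indicator I(n, t, s)
            total += comb(n, t) * theta**t * (1.0 - theta) ** (n - t)
    return total / norm


# Settings used in the coverage plots: alpha = 0.05, n = 40, s = 10.
for p in (0.01, 0.02, 0.05):
    print(p, round(coverage(p, 40, 10, wald_interval), 3))
```

Evaluating coverage over a fine grid of p values traces out a coverage curve like those in the plots.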
Interval comparisons
α = 0.05, n=40, and s=10
[Figure: four panels titled VSI, Clopper-Pearson, Equal-tail, and HPD,
each plotting coverage (vertical axis, 0.85 to 1.00) against p
(horizontal axis, 0.00 to 0.10)]
In each panel, the black line denotes the Wald interval and the bold line
denotes the interval named in the panel title
Summary
Best interval: VSI or modified Clopper-Pearson
Credible intervals may be improved by taking into account
variability of the estimators
Bootstrap intervals mentioned in abstract – VSI and
Clopper-Pearson perform better
Many other intervals could be investigated!
Website
www.chrisbilder.com/bilder_tebbs
Contains R programs for examining the interval estimation
properties
Different values of p, n, and s can be used
Also calculates empirical Bayes estimators
Program for Ornaghi et al. (1999) data example
Contact address starting Fall 2003:
Joshua M. Tebbs, Department of Statistics, Kansas State University
Christopher R. Bilder, Department of Statistics, University of Nebraska-Lincoln
[email protected]