Transcript Slide 1

New interval estimating procedures for the
disease transmission probability in multiple-vector transfer designs
Joshua M. Tebbs and Christopher R. Bilder
Department of Statistics
Oklahoma State University
[email protected] and [email protected]
Introduction


Plant disease is responsible for major losses in agriculture
throughout the world
Diseases are often spread by insect vectors (e.g., aphids,
leafhoppers, planthoppers)
  - Example: brown planthopper and whitebacked planthopper
    [Images: www.knowledgebank.irri.org/ricedoctor_mx/Fact_Sheets/Pests/Planthopper.htm]
Example



Ornaghi et al. (1999) study the effects of the “Mal Rio Cuarto”
(MRC) virus and its spread by the Delphacodes kuscheli
planthopper
  - The MRC virus is the most damaging maize virus in Argentina
  - The goal was to estimate p, the probability of disease
    transmission for a single vector
Vector transfers are often used by plant pathologists who want
to estimate p
In such experiments, insects are moved from an infected
source to the test plants
Single-vector transfers


The most straightforward way to estimate p is by using a
single-vector transfer
  - Each test plant contains one vector, and test plants must be
    individually caged
  - Under the binomial model, the proportion of infected test
    plants gives the maximum likelihood estimate of p
Disadvantages of a single-vector transfer:
  - Requires a large amount of space (since insects must be
    individually isolated)
  - Is costly, since one needs a large number of test
    plants and individual cages
Multiple-vector transfers

A group of s > 1 insect vectors is allocated to each test plant.
  - Even though test plants are occupied by multiple insects,
    the goal is still to estimate p, the probability of disease
    transmission for a single vector
[Figure: greenhouse of enclosed test plants, each labeled Y=0 or Y=1; individual planthoppers are marked "transmits virus" or "does not transmit virus"]
Multiple-vector transfers



Advantages of a multiple-vector versus single-vector transfer:
  - Potential savings in time, cost, and space
  - Statistical properties of estimators are much better (for a
    fixed number of test plants)
A multiple-vector transfer is an application of the group-testing
experimental design
Other applications of group testing:
  - Infectious disease seroprevalence estimation in human
    populations
  - Disease transmission in animal studies
  - Drug discovery applications
Notation and assumptions


Define:
  - n = number of test plants
  - s = number of insects per plant (“group size”)
  - Y = 1, “infected test plant”: plant for which at least one
    vector (out of s) infects
  - Y = 0, “uninfected test plant”: plant for which no vectors (out
    of s) infect
Assumptions:
  - Common group size s
  - The statuses of individual vectors are iid Bernoulli random
    variables with mean p
  - The statuses of test plants are independent
  - Test plants are not misclassified
Maximum likelihood estimator for p

Let $T = \sum_{i=1}^{n} Y_i$ denote the number of infected test plants. Under
our design assumptions, T has a binomial distribution with
parameters n and
\[ \theta = 1 - (1 - p)^s \]
The maximum likelihood estimator of p is given by
\[ \hat{p} = 1 - (1 - \hat{\theta})^{1/s}, \]
where $\hat{\theta} = T/n$ (the proportion of infected test plants)
Estimates of p are computed by only examining the test plants
(and not the individual vectors themselves)
The binomial model is only appropriate if test plants do not
differ materially in their resistance to pathogen transmission
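As a rough illustration of the estimator on this slide, the Python sketch below computes the MLE of p from the number of infected test plants. The function name `mle_p` and its argument names are assumptions introduced here; the inputs are the Ornaghi et al. (1999) counts used in the data example of this talk.

```python
def mle_p(t, n, s):
    """MLE of the per-vector transmission probability p.

    t: number of infected test plants
    n: number of test plants
    s: number of vectors per plant (group size)
    """
    if not 0 <= t <= n:
        raise ValueError("t must lie between 0 and n")
    theta_hat = t / n  # proportion of infected test plants
    # Invert theta = 1 - (1 - p)^s at theta_hat
    return 1.0 - (1.0 - theta_hat) ** (1.0 / s)


# Ornaghi et al. (1999) data: s = 7, n = 24, t = 3
p_hat = mle_p(3, 24, 7)
print(round(p_hat, 4))  # prints 0.0189
```

Only the plant-level counts enter the calculation, which mirrors the point that the individual vectors are never examined.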
Properties of the MLE and the Wald CI

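The statistic $\hat{p}$ has the following properties:
  - Consistent as n gets large
  - Approximately normally distributed; more precisely,
\[ \hat{p} \sim \mathcal{AN}\left[ p, \; v(p)/n \right], \]
where
\[ v(p) = \frac{1 - (1-p)^s}{s^2 (1-p)^{s-2}} \]
A 100(1 − α) percent Wald confidence interval is given by
\[ \hat{p} \pm z_{\alpha/2} \sqrt{\hat{v}(\hat{p})/n}, \]
where $z_{\alpha/2}$ is the upper α/2 quantile of the standard normal distribution and
\[ \hat{v}(\hat{p}) = \frac{1 - (1-\hat{p})^s}{s^2 (1-\hat{p})^{s-2}} \]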
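The Wald interval on this slide can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the function name `wald_interval` is an assumption, and the guard against t = n is added here because the plug-in variance is undefined when every plant is infected.

```python
from math import sqrt
from statistics import NormalDist


def wald_interval(t, n, s, alpha=0.05):
    """Wald interval for p based on the delta-method variance v(p)/n."""
    if not 0 <= t < n:
        raise ValueError("this sketch requires 0 <= t < n")
    theta_hat = t / n
    p_hat = 1.0 - (1.0 - theta_hat) ** (1.0 / s)
    # Plug-in estimate of v(p) = [1 - (1-p)^s] / [s^2 (1-p)^(s-2)]
    v_hat = (1.0 - (1.0 - p_hat) ** s) / (s ** 2 * (1.0 - p_hat) ** (s - 2))
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # upper alpha/2 normal quantile
    half_width = z * sqrt(v_hat / n)
    return p_hat - half_width, p_hat + half_width


# Ornaghi et al. (1999) data: s = 7, n = 24, t = 3
lower, upper = wald_interval(3, 24, 7)
print(round(lower, 4), round(upper, 4))
```

With these data the lower limit falls below zero, which is the behaviour criticised later in the talk.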
Variance stabilizing interval (VSI)


Goal: Find $g(\hat{p})$, whose variance is free of the parameter p
Solve the following differential equation:
\[ g'(p) = \frac{c_0 \, s \, (1-p)^{(s-2)/2}}{\left[ 1 - (1-p)^s \right]^{1/2}} \]
With c0 = 1, a solution is given by
\[ g(p) = 2 \arctan \sqrt{(1-p)^{-s} - 1} \]
It follows that
\[ \left( 1 - \left[ \frac{1 + \cos(a)}{2} \right]^{1/s}, \;\; 1 - \left[ \frac{1 + \cos(b)}{2} \right]^{1/s} \right) \]
is a 100(1 − α) percent confidence interval for p. Here,
\[ a = 2 \arctan \sqrt{(1-\hat{p})^{-s} - 1} \; - \; z_{1-\alpha/2}/\sqrt{n} \]
\[ b = 2 \arctan \sqrt{(1-\hat{p})^{-s} - 1} \; + \; z_{1-\alpha/2}/\sqrt{n} \]
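The variance stabilizing interval can be sketched as follows, assuming the arctangent transformation g(p) = 2 arctan sqrt((1 − p)^(−s) − 1) and the cosine back-transformation given on this slide. The function name `vsi_interval` is an assumption, and the sketch does not treat the boundary cases where a < 0 or b > π.

```python
from math import atan, cos, sqrt
from statistics import NormalDist


def vsi_interval(t, n, s, alpha=0.05):
    """Variance stabilizing interval for p (sketch; requires 0 < t < n)."""
    if not 0 < t < n:
        raise ValueError("this sketch requires 0 < t < n")
    theta_hat = t / n
    # (1 - p_hat)^s equals 1 - theta_hat, so (1 - p_hat)^(-s) = 1 / (1 - theta_hat)
    g_hat = 2.0 * atan(sqrt(1.0 / (1.0 - theta_hat) - 1.0))
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    a = g_hat - z / sqrt(n)  # limit on the transformed scale (lower)
    b = g_hat + z / sqrt(n)  # limit on the transformed scale (upper)
    # Back-transform: (1 - p)^s = cos^2(g / 2) = (1 + cos g) / 2
    lower = 1.0 - ((1.0 + cos(a)) / 2.0) ** (1.0 / s)
    upper = 1.0 - ((1.0 + cos(b)) / 2.0) ** (1.0 / s)
    return lower, upper


# Ornaghi et al. (1999) data: s = 7, n = 24, t = 3
lower, upper = vsi_interval(3, 24, 7)
print(round(lower, 4), round(upper, 4))
```

Because the limits are formed on the transformed scale and then mapped back, the resulting interval is not symmetric about the point estimate.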


Modified Clopper-Pearson (CP) interval



The number of infected test plants, T, has a binomial
distribution with parameters n and θ = 1 − (1 − p)^s
One can obtain an exact Clopper-Pearson interval for θ and
then transform back to the p scale (Chiang and Reeves, 1962)
Exact 100(1 − α) percent confidence limits for p are given by
\[ 1 - \left\{ 1 - \left[ 1 + \frac{n-t+1}{t} \, F_{1-\alpha/2,\,2(n-t+1),\,2t} \right]^{-1} \right\}^{1/s} \]
and
\[ 1 - \left\{ 1 - \frac{\frac{t+1}{n-t} \, F_{1-\alpha/2,\,2(t+1),\,2(n-t)}}{1 + \frac{t+1}{n-t} \, F_{1-\alpha/2,\,2(t+1),\,2(n-t)}} \right\}^{1/s}, \]
where $F_{1-\alpha,a,b}$ denotes the 1 − α quantile of the central F
distribution with a (numerator) and b (denominator) degrees of
freedom
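A sketch of the modified Clopper-Pearson limits is given below. Instead of F quantiles, it solves the equivalent binomial tail equations P(T ≥ t | θ) = α/2 and P(T ≤ t | θ) = α/2 by bisection, which is a substitution made here so that only the Python standard library is needed. The helper names `binom_cdf`, `solve_increasing`, and `cp_interval` are assumptions.

```python
from math import comb


def binom_cdf(k, n, theta):
    """P(T <= k) for T ~ Binomial(n, theta)."""
    return sum(comb(n, j) * theta ** j * (1.0 - theta) ** (n - j)
               for j in range(k + 1))


def solve_increasing(f, target, lo=0.0, hi=1.0, iters=100):
    """Bisection for f(x) = target, where f is increasing on [lo, hi]."""
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def cp_interval(t, n, s, alpha=0.05):
    """Clopper-Pearson limits for theta, transformed to the p scale."""
    if not 0 < t < n:
        raise ValueError("this sketch requires 0 < t < n")
    # Lower limit for theta: P(T >= t | theta) = alpha / 2
    theta_lo = solve_increasing(
        lambda th: 1.0 - binom_cdf(t - 1, n, th), alpha / 2.0)
    # Upper limit for theta: P(T <= t | theta) = alpha / 2
    theta_hi = solve_increasing(
        lambda th: 1.0 - binom_cdf(t, n, th), 1.0 - alpha / 2.0)
    # Transform back: p = 1 - (1 - theta)^(1/s)
    return (1.0 - (1.0 - theta_lo) ** (1.0 / s),
            1.0 - (1.0 - theta_hi) ** (1.0 / s))


# Ornaghi et al. (1999) data: s = 7, n = 24, t = 3
lower, upper = cp_interval(3, 24, 7)
print(round(lower, 4), round(upper, 4))
```

The transformation from θ to p is monotone increasing, so the exactness of the interval for θ carries over to p.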
Comparing the Wald, VSI, and CP



The Wald interval is simple and easy to compute. However, it
has three main drawbacks:
  - Provides symmetric confidence intervals even though the
    distribution of $\hat{p}$ may be very skewed
  - Often produces negative lower limits when p is small!
The VSI handles each of these drawbacks
  - Not symmetric
  - Always produces lower limits within the parameter space
    (i.e., strictly larger than zero)
The CP interval’s main advantage is that its coverage
probability is always greater than or equal to 1 − α. However,
such intervals can be wastefully wide, especially if n is small.
Bayesian estimation
Prior distribution for p
  - One-parameter Beta distribution
\[ f_P(p \mid \beta) = \beta \, (1-p)^{\beta - 1} \, I(0 < p < 1) \]
    for a known value of β
  - Takes into account that p is small
  - Example when β = 52.4
[Figure: prior density f(p) plotted against p from 0.00 to 0.08 for β = 52.4; f(p) axis runs from 0 to 50]
Bayesian estimation

Prior distribution for p
  - Why use a one-parameter instead of a two-parameter Beta?
    - Sensible model acknowledging p is small
    - Bayes and empirical Bayes estimators are simpler
    - Resulting estimator using squared error loss with a
      two-parameter Beta is a ratio of complicated alternating
      sums
    - See Chaubey and Li (Journal of Official Statistics,
      1995) for Bayes estimators
Bayesian estimation

Posterior distribution for 0 < p < 1
\[ f_{P|T}(p \mid t, \beta) = \frac{f_{T,P}(t, p \mid \beta)}{f_T(t \mid \beta)} = \frac{s \, \Gamma(n + \beta/s + 1)}{\Gamma(n - t + \beta/s) \, \Gamma(t + 1)} \, (1-p)^{s(n-t) + \beta - 1} \, \left[ 1 - (1-p)^s \right]^t \]
Note: U = 1 − (1 − P)^s ~ beta(t + 1, n − t + β/s)
Empirical Bayesian estimation



Use the marginal distribution for T to derive an estimate for β
Why?
  - Avoid possible poor choice for β
  - n is often small in multiple-vector transfer experiments
  - Posterior may be adversely affected by the prior
Marginal distribution of T for t = 0, 1, …, n
\[ f_T(t \mid \beta) = \frac{\beta \, \Gamma(n + 1) \, \Gamma(n - t + \beta/s)}{s \, \Gamma(n - t + 1) \, \Gamma(n + \beta/s + 1)} \]
Maximize $f_T(t \mid \beta)$ as a function of β to obtain the marginal
maximum likelihood estimate, $\hat{\beta}$
  - Iteratively solve for β in
\[ \frac{\partial}{\partial \beta} \log f_T(t \mid \beta) = \frac{1}{\beta} + \frac{1}{s} \left[ \psi(n - t + \beta/s) - \psi(n + \beta/s + 1) \right] = 0 \]
where ψ(·) is the digamma function
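One way to solve the score equation numerically is sketched below. Because t is an integer, the recurrence ψ(x + 1) = ψ(x) + 1/x turns the digamma difference into a finite sum, so no special-function library is needed; this rewriting, the bisection solver, and the name `eb_beta_hat` are choices made for this illustration and are not taken from the slides.

```python
def eb_beta_hat(t, n, s, tol=1e-10):
    """Marginal maximum likelihood estimate of beta (sketch; 0 < t < n).

    With b = beta / s, the score equation multiplied by beta becomes
        1 - sum_{k = n-t}^{n} b / (k + b) = 0,
    because psi(n + b + 1) - psi(n - t + b) = sum_{k = n-t}^{n} 1 / (k + b).
    """
    if not 0 < t < n:
        raise ValueError("a finite positive root requires 0 < t < n")

    def h(b):
        # Decreasing in b: tends to 1 as b -> 0 and to -t as b -> infinity
        return 1.0 - sum(b / (k + b) for k in range(n - t, n + 1))

    lo, hi = 0.0, 1.0
    while h(hi) > 0.0:  # expand the bracket until it contains the root
        hi *= 2.0
    while hi - lo > tol:  # bisection on b = beta / s
        mid = (lo + hi) / 2.0
        if h(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return s * (lo + hi) / 2.0


# Ornaghi et al. (1999) data: s = 7, n = 24, t = 3
beta_hat = eb_beta_hat(3, 24, 7)
print(round(beta_hat, 1))
```

For t = 0 the marginal likelihood keeps increasing in β, and for t = n it keeps decreasing, so the sketch rejects both cases rather than returning a boundary value.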
Credible intervals

(1 − α)100% Equal-tail
  - [pL, pU] satisfy
\[ \int_0^{p_L} f_{P|T}(p \mid t, \hat{\beta}) \, dp = \alpha/2 \quad \text{and} \quad \int_{p_U}^{1} f_{P|T}(p \mid t, \hat{\beta}) \, dp = \alpha/2 \]
Use relationship with Beta distribution, U = 1 − (1 − p)^s ~
beta(t + 1, n − t + $\hat{\beta}$/s)
Interval:
\[ \left( 1 - \left( 1 - B_{\alpha/2;\,t+1,\,n-t+\hat{\beta}/s} \right)^{1/s}, \;\; 1 - \left( 1 - B_{1-\alpha/2;\,t+1,\,n-t+\hat{\beta}/s} \right)^{1/s} \right) \]
where $B_{\alpha,a,b}$ is the α quantile of a Beta(a,b) distribution
Remember that θ = 1 − (1 − p)^s implies p = 1 − (1 − θ)^{1/s}
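The equal-tail limits can be computed without a statistics package, as in the sketch below. It relies on the fact that the first Beta parameter, t + 1, is a positive integer, in which case the Beta CDF has a finite-sum form; quantiles are then found by bisection. The function names are assumptions, and the value 52.4 for the empirical Bayes estimate is the one reported in the data example of this talk.

```python
def beta_cdf_int_a(x, a, b):
    """CDF of Beta(a, b) at x, for a positive integer a and any b > 0.

    Uses I_x(a, b) = 1 - (1 - x)^b * sum_{j=0}^{a-1} C_j x^j,
    with C_j = Gamma(b + j) / (Gamma(b) j!).
    """
    total, term = 0.0, 1.0  # term holds C_j * x^j
    for j in range(a):
        total += term
        term *= (b + j) * x / (j + 1)
    return 1.0 - (1.0 - x) ** b * total


def beta_quantile(q, a, b, iters=100):
    """Quantile of Beta(a, b) by bisection on the increasing CDF."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if beta_cdf_int_a(mid, a, b) < q:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def equal_tail_interval(t, n, s, beta_hat, alpha=0.05):
    """Equal-tail credible interval for p from U ~ Beta(t+1, n-t+beta_hat/s)."""
    a, b = t + 1, n - t + beta_hat / s
    u_lo = beta_quantile(alpha / 2.0, a, b)
    u_hi = beta_quantile(1.0 - alpha / 2.0, a, b)
    # Transform back to the p scale: p = 1 - (1 - u)^(1/s)
    return (1.0 - (1.0 - u_lo) ** (1.0 / s),
            1.0 - (1.0 - u_hi) ** (1.0 / s))


# Ornaghi et al. (1999) data with the reported estimate beta_hat = 52.4
lower, upper = equal_tail_interval(3, 24, 7, 52.4)
print(round(lower, 4), round(upper, 4))
```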
Credible intervals

(1 − α)100% highest posterior density (HPD) regions
  - Posterior is unimodal and right skewed
  - Find [pL, pU] such that (1 − α)100% area of posterior
    density is included and pU − pL is as small as possible
  - See Tanner (1996, p. 103-4)
  - Key is to sample from posterior distribution
  - Use U = 1 − (1 − p)^s ~ beta(t + 1, n − t + $\hat{\beta}$/s) relationship
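A sampling-based approximation to the HPD region might look like the sketch below: draw U from its Beta posterior, map each draw to the p scale, and keep the shortest window that contains (1 − α)100% of the ordered draws. The function name `hpd_interval`, the number of draws, and the seed are assumptions, and the result carries Monte Carlo error, so it should only be expected to agree approximately with an exact calculation.

```python
import random


def hpd_interval(t, n, s, beta_hat, alpha=0.05, draws=100_000, seed=1):
    """Approximate HPD interval for p from posterior samples (sketch)."""
    rng = random.Random(seed)
    a, b = t + 1, n - t + beta_hat / s
    # Sample U ~ Beta(a, b), then transform each draw: p = 1 - (1 - U)^(1/s)
    p_draws = sorted(1.0 - (1.0 - rng.betavariate(a, b)) ** (1.0 / s)
                     for _ in range(draws))
    k = round((1.0 - alpha) * draws)  # index gap spanned by each candidate window
    # Slide the window over the ordered draws and keep the shortest one
    best = min(range(draws - k), key=lambda i: p_draws[i + k] - p_draws[i])
    return p_draws[best], p_draws[best + k]


# Ornaghi et al. (1999) data with the reported estimate beta_hat = 52.4
lower, upper = hpd_interval(3, 24, 7, 52.4)
print(round(lower, 4), round(upper, 4))
```

Since the posterior is right skewed, the shortest window sits to the left of the equal-tail interval and is somewhat narrower.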
Example - Ornaghi et al. (1999)

Data
  - s = 7 planthoppers per plant
  - n = 24 plants
  - t = 3 infected plants observed
  - $\hat{\beta}$ = 52.4
95% interval estimates for p
Interval                   Lower limit   Upper limit   Length
Wald                         -0.0023       0.0401      0.0424
VSI                           0.0037       0.0465      0.0428
Modified Clopper-Pearson      0.0038       0.0543      0.0505
Equal-tail                    0.0052       0.0410      0.0358
HPD                           0.0034       0.0373      0.0339
Interval comparisons

Coverage
\[ C(p, n, s) = \frac{\sum_{t=1}^{n-1} I(n,t,s) \binom{n}{t} \left[ 1 - (1-p)^s \right]^t (1-p)^{s(n-t)}}{1 - (1-p)^{sn} - \left[ 1 - (1-p)^s \right]^n}, \]
where I(n,t,s) = 1 if the interval contains p and I(n,t,s) = 0
otherwise.
  - Do not consider the t = 0 and t = n cases
    - Poor multiple-vector transfer experimental design
    - See Swallow (1985, Phytopathology) for guidance in
      choosing s
  - Brown, Cai, and DasGupta (2001, Statistical Science)
  - Frequentist evaluation similar to how Carlin and Louis
    (2000) approach evaluating confidence and credible
    intervals
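The coverage formula can be evaluated exactly by summing binomial probabilities over the outcomes whose interval contains p. The sketch below does this for any interval function and uses the Wald interval as an example; the names `coverage` and `wald_interval` and the function-argument design are assumptions made for this illustration.

```python
from math import comb, sqrt
from statistics import NormalDist


def wald_interval(t, n, s, alpha=0.05):
    """Wald interval for p (valid here because 0 < t < n)."""
    p_hat = 1.0 - (1.0 - t / n) ** (1.0 / s)
    v_hat = (1.0 - (1.0 - p_hat) ** s) / (s ** 2 * (1.0 - p_hat) ** (s - 2))
    half_width = NormalDist().inv_cdf(1.0 - alpha / 2.0) * sqrt(v_hat / n)
    return p_hat - half_width, p_hat + half_width


def coverage(p, n, s, interval, alpha=0.05):
    """Exact coverage C(p, n, s), conditional on 0 < T < n."""
    theta = 1.0 - (1.0 - p) ** s  # probability that a test plant is infected
    covered = 0.0
    for t in range(1, n):  # the t = 0 and t = n cases are excluded
        lower, upper = interval(t, n, s, alpha)
        if lower <= p <= upper:  # indicator I(n, t, s)
            covered += comb(n, t) * theta ** t * (1.0 - theta) ** (n - t)
    # Normalize by P(0 < T < n) = 1 - (1-p)^(sn) - [1 - (1-p)^s]^n
    return covered / (1.0 - (1.0 - theta) ** n - theta ** n)


# Setting used in the interval comparisons: alpha = 0.05, n = 40, s = 10
wald_cov = coverage(0.05, 40, 10, wald_interval)
print(round(wald_cov, 3))
```

Passing a different interval function with the same `(t, n, s, alpha)` signature gives the coverage of that interval at the chosen p, n, and s.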
Interval comparisons

 = 0.05, n=40, and s=10

Black line denotes the Wald interval; the bold line denotes the interval named in each panel title
[Figure: four panels (VSI, Clopper-Pearson, Equal-tail, HPD) plotting coverage, from 0.85 to 1.00, against p, from 0.00 to 0.10]
Summary


Best interval: VSI or modified Clopper-Pearson
  - Credible intervals may be improved by taking into account
    variability of the β estimators
  - Bootstrap intervals mentioned in abstract: VSI and
    Clopper-Pearson perform better
  - Many other intervals could be investigated!
Website
  - www.chrisbilder.com/bilder_tebbs
  - Contains R programs for examining the interval estimation
    properties
  - Different values of p, n, and s can be used
  - Also calculates empirical Bayes estimators
  - Program for Ornaghi et al. (1999) data example
Contact address starting Fall 2003:
Joshua M. Tebbs
Department of Statistics
Kansas State University

Christopher R. Bilder
Department of Statistics
University of Nebraska-Lincoln

[email protected]