Simulated-likelihood-based Inference
for an outbreak of influenza
Marc Baguelin
Health Protection Agency
[email protected]
Some background
Work done at the School of Veterinary Medicine of the
University of Cambridge with the Animal Health Trust in
Newmarket
Funded by the Horserace Betting Levy Board
I now work at the HPA, modelling swine flu to inform public
health policy
Equine influenza
Currently circulating strain: H3N8
(current human strains: H3N2, H1N1 and now H1N1pdm;
H5N1 does not transmit from human to human)
H7N7 also exists in horses but no longer circulates (though it is
still in the vaccine)
Two separate sub-lineages co-circulating
From the modelling point of view, the main difference with
human influenza is the way the population is distributed
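EI phylogenetic tree
[Figure: phylogenetic tree of equine influenza strains. Branch labels: American sub-lineage, European sub-lineage, S.American family. Strain labels: Moulton ‘98, Leics ‘00, KY ‘98, Nt‘1/93, KY ‘91, Suffolk ‘89, Miami ‘63, Nt‘2/93]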
Why modelling?
To help understand epidemics
Risk assessment
Test different scenarios for vaccine policies
Etc.
The 2003 Newmarket outbreak
21 training yards involved out of ~60
More than 1300 horses at risk (~2500)
The dynamics of the epidemic cannot be understood with a simple
one-yard model, as used previously for EI: a new model is needed.
The map of the outbreak
SEIR Model
S -> E -> I -> R
Example: I5 is the number of infectious horses in yard 5
Latent and infectious periods
Within-yard and between-yard transmission rates
Mixing matrix
tij = rate of transmission from an infected horse in yard i
to susceptible horses in yard j
(tjj is the corresponding rate within yard j)
T = (tij)
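To make the role of T concrete, here is a minimal sketch of how the matrix could turn the numbers of infectious horses into an infection pressure on each yard. It assumes that pressures simply add over infectious horses, with no normalisation by yard size; the function name and the 3-yard numbers are hypothetical and are not taken from the talk.

```python
def force_of_infection(T, infectious):
    """Pressure on a susceptible horse in each yard j: sum over i of T[i][j] * I_i."""
    n = len(infectious)
    return [sum(T[i][j] * infectious[i] for i in range(n)) for j in range(n)]


# Hypothetical 3-yard mixing matrix: strong within-yard, weak between-yard mixing.
T = [
    [1.00, 0.02, 0.02],
    [0.02, 1.00, 0.02],
    [0.02, 0.02, 1.00],
]
infectious = [2, 0, 1]  # infectious[k] = number of infectious horses in yard k (0-based)

pressure = force_of_infection(T, infectious)
print(pressure)
```

Yard 1 has no infectious horse of its own, so all of its pressure comes from the off-diagonal terms.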
How to find the mixing matrix?
Depends on the contacts between horses in different yards,
which are very difficult to quantify (shared facilities,
contacts when moving, routes taken to training, vets,
airborne spread…); usually considered as one spatial and
one stochastic part
Assumptions can be made to reduce the number of
parameters to find; this requires the expertise of
field epidemiologists
Some of the available data
Number of horses in training for most of the yards with age structure (from
Raceform ‘Horses in training 2003’)
Serological data giving antibody levels for some yards, which allow us to
estimate the level of protection of the horses
Geographical location of the yards
Estimate of the proportion of infected horses in each yard at the end of the
epidemic
Date of first detected cases in the infected yards
Trainer questionnaires
But…
Although a huge amount of work went into collecting the data,
few inputs are available for the model:
Many quantities have to be averaged
A stochastic (as opposed to deterministic) model gives a different output
on each run; in that sense, only one ‘run’ is available
Lack of temporal data
A classical assumption
Model with two levels of mixing: λG (global rate; between yards)
and λL (local rate; inside yards).
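A minimal sketch of what this assumption does to the mixing matrix: every diagonal entry becomes the local rate and every off-diagonal entry the global rate, so n × n unknowns collapse to two. The function name is hypothetical, the exact normalisation used in the talk is not given, and the numbers are only of the order of the estimates quoted later.

```python
def two_level_matrix(n_yards, lam_local, lam_global):
    """Mixing matrix with lam_local on the diagonal and lam_global elsewhere."""
    return [
        [lam_local if i == j else lam_global for j in range(n_yards)]
        for i in range(n_yards)
    ]


# Illustrative values only; 4 yards keeps the printout readable.
T = two_level_matrix(4, lam_local=1.03, lam_global=0.015)
for row in T:
    print(row)
```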
Number of susceptibles
As all the horses are vaccinated, the status of the initial
population is uncertain. Vaccine coverage (though
theoretically 100%) and efficacy in horses are difficult to
predict. There are fewer data than for humans, and
heterologous but cross-protecting strains circulate.
A statistical model (log regression) has been proposed for
the probability that an animal is infected given that
the virus entered its yard (using several variables, including
the antibody (AB) level)
Combining threshold theorem + the statistical model
[Equation not recovered from the slide. Its annotations were: average of the risk for the yard from the statistical model; mean infectious period; data]
Inference method
The likelihood cannot be calculated, analytically or numerically,
for each pair (λL, λG)
The model is very easy to simulate
-> use simulated likelihood to estimate λL and λG
Two approaches
1 Estimate the pair (λL, λG) simultaneously
2 Estimate λL first and then λG, since transmission is
mostly locally driven (see 1)
First method
1) Use a grid of (λL, λG)
2) For each point of the grid, simulate N realisations of the
epidemic
3) Count each time the output is close to the real data
(ideally an exact match, since the data are discrete)
4) For N sufficiently large, the frequency approximates the
likelihood
This is ABC with “flat” priors
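The four steps can be sketched as below. This is an illustration, not the model used in the talk: a toy chain-binomial epidemic stands in for the full SEIR model, the way λL and λG enter the hazard is an assumption, and the yard sizes, "observed" final sizes and grid values are all hypothetical. A run counts as a hit when the total final size is within a tolerance and the number of affected yards matches exactly.

```python
import math
import random


def simulate_final_sizes(lam_local, lam_global, yard_sizes, rng):
    """Toy chain-binomial epidemic; returns the final number infected per yard."""
    n = len(yard_sizes)
    total_pop = sum(yard_sizes)
    S = list(yard_sizes)
    I = [0] * n
    seed = rng.randrange(n)  # one initial case in a randomly chosen yard
    S[seed] -= 1
    I[seed] = 1
    while any(I):
        total_I = sum(I)
        new_I = [0] * n
        for j in range(n):
            # Assumed hazard on a susceptible in yard j over one generation:
            # local pressure from its own yard plus global pressure from the others.
            hazard = (lam_local * I[j] / yard_sizes[j]
                      + lam_global * (total_I - I[j]) / total_pop)
            p = 1.0 - math.exp(-hazard)
            new_I[j] = sum(1 for _ in range(S[j]) if rng.random() < p)
            S[j] -= new_I[j]
        I = new_I
    return [size - s for size, s in zip(yard_sizes, S)]


def summary(final_sizes):
    """Summary statistics: total final size and number of affected yards."""
    return sum(final_sizes), sum(1 for x in final_sizes if x > 0)


def simulated_likelihood(lam_local, lam_global, yard_sizes, observed,
                         n_sims, rng, tol=0.005):
    """Fraction of simulated epidemics whose summaries match the observed ones."""
    obs_total, obs_yards = summary(observed)
    hits = 0
    for _ in range(n_sims):
        sim = simulate_final_sizes(lam_local, lam_global, yard_sizes, rng)
        sim_total, sim_yards = summary(sim)
        if abs(sim_total - obs_total) <= tol * obs_total and sim_yards == obs_yards:
            hits += 1
    return hits / n_sims


def likelihood_surface(local_grid, global_grid, yard_sizes, observed, n_sims, rng):
    """Simulated likelihood at every point of the (lam_local, lam_global) grid."""
    return {
        (lam_l, lam_g): simulated_likelihood(lam_l, lam_g, yard_sizes,
                                             observed, n_sims, rng)
        for lam_l in local_grid
        for lam_g in global_grid
    }


rng = random.Random(1)
yard_sizes = [20] * 5            # hypothetical yards
observed = [14, 0, 9, 17, 0]     # hypothetical final sizes
surface = likelihood_surface([1.0, 1.5, 2.0], [0.1, 0.3, 0.6],
                             yard_sizes, observed, 200, rng)
best = max(surface, key=surface.get)
print(best, surface[best])
```

With so few simulations per grid point the surface is noisy; N has to grow as the acceptance criterion gets stricter, which is the main cost of the method.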
Result (first method)
Grey: simulations giving the final size
(+/- 0.5%)
Black: final size + exact
number of yards
(λL = 1.03; λG = 1.5e−2)
for the exponential
distributions and
(λL = 0.7; λG = 1.5e−2)
for the empirical ones.
Non-regular likelihood
The outbreak is essentially
locally driven
The local transmission can be estimated more efficiently by
using the ten yards for which we have the final sizes,
as fewer than 2% (0.63% on average) of cases come from re-introduction
Second method
For a grid of local transmission values, estimate the simulated
likelihood of obtaining simultaneously the exact counts for the ten
final sizes, treating the yards as independent sub-epidemics (seeded
from outside) with the number of susceptibles given by the
predicted risk
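A minimal sketch of this second method, under stated assumptions: each yard is simulated on its own with a toy chain-binomial model (not the SEIR model of the talk), and because the sub-epidemics are treated as independent, the joint probability of matching all final sizes is taken as the product of per-yard hit frequencies. The numbers of susceptibles and final sizes below are hypothetical, not the Newmarket data.

```python
import math
import random


def simulate_yard_final_size(lam_local, n_susceptible, rng):
    """One case seeded from outside, then chain-binomial spread within the yard."""
    S = n_susceptible - 1
    I = 1
    while I > 0:
        # Assumed per-susceptible infection probability over one generation.
        p = 1.0 - math.exp(-lam_local * I / n_susceptible)
        new_cases = sum(1 for _ in range(S) if rng.random() < p)
        S -= new_cases
        I = new_cases
    return n_susceptible - S


def local_likelihood(lam_local, yards, n_sims, rng):
    """Product over yards of the simulated probability of the exact final size."""
    like = 1.0
    for n_susceptible, observed_size in yards:
        hits = sum(
            simulate_yard_final_size(lam_local, n_susceptible, rng) == observed_size
            for _ in range(n_sims)
        )
        like *= hits / n_sims
    return like


rng = random.Random(2)
# Hypothetical (number of susceptibles, observed final size) pairs, one per yard.
yards = [(20, 14), (15, 4), (30, 22)]
profile = {lam: local_likelihood(lam, yards, 500, rng)
           for lam in [0.5, 1.0, 1.5, 2.0, 2.5]}
best = max(profile, key=profile.get)
print(best, profile[best])
```

Estimating each yard's match probability separately and multiplying is much cheaper than waiting for a single run to match all yards at once, which is why a one-dimensional grid over λL is affordable here.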
Second method: results
The estimated values
of the intra-yard
transmission rate
λL were 0.78 for the
exponential distribution
(grey) and 0.69 for the
empirical distribution
(black).
Then estimate the global
transmission given the local one
λG = 1.7e−2 for the exponential distribution and 1.6e−2 for
the empirical distribution
Conclusion
When likelihoods are difficult to derive analytically and
models are easy to simulate, simulated-likelihood-based
methods are an efficient solution
This is the case for many models of infectious disease
transmission
More work has to be done on the methodological side of
this, especially the limits/accuracy of these methods, the
most efficient way of implementing them, model selection
issues and deviations from standard theory due to the
threshold/phase transition behaviour of epidemic models
Acknowledgments
Horserace Betting Levy Board
CIDC
Epidemiology group in AHT (esp. Richard Newton)
Vet School at Cambridge University (esp. James Wood)
Prof Bryan Grenfell from Penn State
Dr Nikolaos Demiris from MRC-BSU, Cambridge, now in
Athens