SMBE3 - Daniel Wilson


TOWARDS TESTING THE EPIDEMIC CLONE MODEL OF BACTERIAL PATHOGENS
Daniel J. Wilson, Gilean A.T. McVean and Martin C.J. Maiden
Peter Medawar Building for Pathogen Research and
Departments of Statistics and Zoology, Oxford University
Overview
Neisseria meningitidis is the causal agent of meningococcal meningitis and septicaemia, yet it is
found in up to 10% of healthy individuals as an asymptomatic commensal organism of the
nasopharynx. Sporadic epidemics of virulent or hypervirulent strains are thought to contribute
little to the long-term persistence of the pathogen.
Population structure is found in the form of significant association between loci, despite
relatively high rates of recombination. The epidemic clone hypothesis posits that this is due to
recent, explosive increases in groups of closely related individuals.
However, in a finite population some degree of structuring is expected because of the stochastic
nature of the evolutionary process. To test this simpler explanation, we perform coalescent
simulations of seven housekeeping genes in N. meningitidis, modelling functional constraint as a
form of mutational bias.
Using the number of unique sequences (haplotypes) as a test statistic, we reject the null
hypothesis (p<0.00004), showing that genetic diversity is too clustered: a finding consistent with
the epidemic clone hypothesis.
Scanning electron micrograph of Neisseria meningitidis taken from http://www.sanger.ac.uk/Projects/N_meningitidis/
First the starting sequence is chosen from a distribution based on observed codon usage.
A coalescent tree is then simulated (Hudson 1990), and the sequence mutated down the tree
according to a model, the parameters of which are estimated from the observed data.
Finally the test statistic is computed for the simulated data.
When all 30,000 runs are complete, the distribution of values of the test statistic is compared to the
observed value to determine whether the model plausibly describes the observed data.
Codon frequencies The distribution of the starting codon frequencies was estimated from the
observed codon usage patterns in the MLST data in a Bayesian manner. The mean marginal codon
usage from the posterior distribution is shown in Figure 3.
Figure 3 Codon frequencies estimated from the data.
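The first simulation step, drawing a starting sequence from Bayesian estimates of codon usage, might be sketched in Python as follows. This is a minimal illustration and not the authors' implementation: the poster does not state the prior, so the symmetric Dirichlet prior (`alpha`), the exclusion of stop codons, and the function names `posterior_mean_codon_freqs` and `draw_starting_sequence` are all assumptions.

```python
import random
from itertools import product

# Standard genetic code in the usual TCAG ordering ('*' marks stop codons).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = ["".join(c) for c in product(BASES, repeat=3)]
# Assumption: in-frame gene fragments contain no stop codons.
SENSE_CODONS = [c for c, aa in zip(CODONS, AMINO) if aa != "*"]


def posterior_mean_codon_freqs(counts, alpha=1.0):
    """Posterior mean codon frequencies under an assumed symmetric
    Dirichlet(alpha) prior over the 61 sense codons."""
    total = sum(counts.get(c, 0) for c in SENSE_CODONS) + alpha * len(SENSE_CODONS)
    return {c: (counts.get(c, 0) + alpha) / total for c in SENSE_CODONS}


def draw_starting_sequence(freqs, n_codons, rng):
    """Draw n_codons codons independently from the given frequencies."""
    codons = list(freqs)
    weights = [freqs[c] for c in codons]
    return "".join(rng.choices(codons, weights=weights, k=n_codons))


# Hypothetical usage with made-up counts, for illustration only.
example_freqs = posterior_mean_codon_freqs({"ATG": 9, "GAA": 20})
example_seq = draw_starting_sequence(example_freqs, n_codons=150, rng=random.Random(1))
```

Drawing codons independently matches the assumption, stated elsewhere in the poster, that codons are independently and identically distributed.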
Model of mutational bias Under-representation of, for example, non-synonymous changes in the
sequence data can be modelled as mutational bias rather than purifying selection. Confounding
functional constraint in this way allows coalescent simulations of neutral evolution to be performed.
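As a hedged illustration of the mutational-bias idea, the Python sketch below assigns a rate to a single-nucleotide change according to whether it is a transition or transversion and whether it is synonymous. The parameter values μ=3.32, κ=5.85 and ω=0.26 are those reported in this poster; the multiplicative form (μ, κμ, ωμ, κωμ) is inferred from the four reported rates, and the function name `mutation_rate` and the treatment of changes to stop codons as ordinary non-synonymous changes are assumptions.

```python
from itertools import product

# Standard genetic code in the usual TCAG ordering ('*' marks stop codons).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
PURINES = {"A", "G"}


def mutation_rate(codon, pos, new_base, mu, kappa, omega):
    """Rate of changing codon[pos] to new_base under the assumed
    multiplicative model: mu, times kappa for a transition,
    times omega for a non-synonymous change."""
    old_base = codon[pos]
    if new_base == old_base:
        raise ValueError("new_base must differ from the current base")
    mutant = codon[:pos] + new_base + codon[pos + 1:]
    # Transition: purine <-> purine or pyrimidine <-> pyrimidine.
    is_transition = (old_base in PURINES) == (new_base in PURINES)
    is_synonymous = CODON_TABLE[codon] == CODON_TABLE[mutant]
    rate = mu
    if is_transition:
        rate *= kappa
    if not is_synonymous:
        rate *= omega
    return rate


MU, KAPPA, OMEGA = 3.32, 5.85, 0.26  # values reported in the poster
syn_transversion = mutation_rate("GGT", 2, "A", MU, KAPPA, OMEGA)     # Gly -> Gly
syn_transition = mutation_rate("CTT", 2, "C", MU, KAPPA, OMEGA)       # Leu -> Leu
nonsyn_transversion = mutation_rate("AAA", 2, "C", MU, KAPPA, OMEGA)  # Lys -> Asn
nonsyn_transition = mutation_rate("AAA", 0, "G", MU, KAPPA, OMEGA)    # Lys -> Glu
```

With these inputs the four products agree with the reported rates of 3.32, 19.4, 0.86 and 5.06 to within rounding.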
Introduction
Jolley et al (2000) sampled 218 isolates of Neisseria meningitidis from asymptomatic carriers in the
Czech Republic during 1993. They characterised seven housekeeping genes in each of the isolates
using multilocus sequence typing (MLST) (Maiden et al 1998), yielding complete nucleotide sequences
of gene fragments some 400-500 base pairs in length.

Locus     MLST fragment  Number of segregating sites        Number of unique  Tajima's D  Function (from Jolley et al in press)
          length (bp)    Synonymous  Non-synonymous  Total  sequences
abcZ      433            63          12              75     21                1.15        Putative ABC transporter
adk       465            23          2               25     19                0.817       Adenylate kinase
aroE      490            86          49              135    21                0.926       Shikimate dehydrogenase
fumC      465            45          3               48     29                0.328       Fumarate hydratase
gdh       501            23          3               26     19                1.355       Glucose-6-phosphate dehydrogenase
pdhC      480            68          15              83     25                1.433       Pyruvate dehydrogenase subunit C
pgm       450            59          22              81     25                0.811       Phosphoglucomutase
All loci  3284           367         106             473    89                1.101

The model was parameterised as follows:

Mutation type                Rate of occurrence
Synonymous transversion      μ
Synonymous transition        κμ
Non-synonymous transversion  ωμ
Non-synonymous transition    κωμ

Parameter  Interpretation
κ          Transition-transversion ratio
ω          Proportion of non-synonymous mutations that are viable

Estimates of μ, κ and ω were obtained by the method of maximum likelihood on the assumption that
codons were independently and identically distributed, that the number of mutations in the genealogy
was Poisson distributed, and that the probability of having more than one mutation at a nucleotide in
the genealogy was negligible.
Recombination Jolley et al (in press) estimated the rate of recombination to be 0.94 times the rate
of mutation, and the mean tract length of a recombination fragment to be 1.1 kilobases.
The first step in constructing models of the epidemiological process is to determine whether the
signature of evolutionary processes can be detected in the data. In other words, is it possible to outright
reject a null hypothesis in which nothing interesting is happening? Simple summary statistics such as
Tajima’s D (Tajima 1989) were unable to reject this type of null hypothesis (Jolley et al in press), so it is
to coalescent simulations that we turned.
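For reference, Tajima's D compares two estimators of diversity: the mean number of pairwise differences and the number of segregating sites scaled by a harmonic sum. The Python sketch below follows the formula of Tajima (1989). The function name `tajimas_d` and its argument names are illustrative choices. The poster does not report the mean pairwise difference for any locus, so the tabulated D values cannot be reproduced from this sketch alone.

```python
import math


def tajimas_d(n, num_segregating, mean_pairwise_diff):
    """Tajima's D for n sequences with num_segregating segregating sites
    and the given mean number of pairwise differences (Tajima 1989)."""
    if n < 4 or num_segregating < 1:
        raise ValueError("need n >= 4 sequences and at least one segregating site")
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    s = num_segregating
    # Difference between the two diversity estimators, standardised
    # by the estimated standard deviation of that difference.
    return (mean_pairwise_diff - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))
```

D is zero when the two estimators agree, positive when pairwise diversity is high relative to the number of segregating sites, and negative in the opposite case.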
Figure 1 shows a caricature of what the topology of a gene tree might look like in the case of (a) a
neutral and (b) an epidemic clonal model of meningococcal evolution. The red branches indicate a
recent expansion of a particular complex of closely-related clones.
Results and Conclusions
Figure 4 shows the distribution of the test statistic (number of haplotypes) simulated under 30,000 runs
of the null model. The median is 126, with range 97-154. The observed number of haplotypes in the
Czech MLST data was 89, outside the range of the simulated values. Thus the null hypothesis can be
overwhelmingly rejected (p<0.00004).
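The comparison above can be sketched in Python as a one-sided Monte Carlo test on the number of unique haplotypes. This is an illustration under stated assumptions, not the authors' code: the helper names `count_haplotypes` and `lower_tail_p_value` are hypothetical, the (r + 1)/(n + 1) estimator is one common convention and may differ from the calculation used for the poster, and the simulated values below are placeholders spanning the reported range of 97-154 rather than real simulation output.

```python
def count_haplotypes(isolates):
    """Test statistic: number of distinct multilocus sequences.
    Each isolate is a hashable item, e.g. a tuple of per-locus sequences."""
    return len(set(isolates))


def lower_tail_p_value(simulated_stats, observed_stat):
    """One-sided Monte Carlo p-value for an unusually small observed value,
    using the (r + 1) / (n + 1) convention."""
    r = sum(1 for s in simulated_stats if s <= observed_stat)
    return (r + 1) / (len(simulated_stats) + 1)


# Placeholder values covering the reported simulated range (97 to 154);
# these are NOT the poster's simulation results.
placeholder_sims = [97 + (i % 58) for i in range(30000)]
p_value = lower_tail_p_value(placeholder_sims, observed_stat=89)
```

Because none of the 30,000 simulated values is as small as the observed 89, the estimate is 1/30001, about 0.000033, which is consistent with the reported bound of p<0.00004.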
Figure 4 Simulated distribution of the test statistic (x-axis: number of unique haplotypes; y-axis: probability density; series: Simulation, Real Data). Arrow indicates observed value.
Figure 1 Caricatures of gene trees under the (a) neutral and (b) epidemic clonal hypotheses.
Using coalescent simulations it has been possible to reject the null hypothesis of neutral evolution with
functional constraint. Our method has detected a strong signal of evolutionary forces consistent with
the epidemic clone model, something that Tajima’s D did not have sufficient power to achieve.
The next step will be to incorporate more sophisticated hypotheses, such as the clonal epidemic model,
into the coalescent framework. Parameterisation of such models in terms of epidemiological and
evolutionary forces, and estimation of those parameters from empirical data, will exploit these efficient
methods of inference to address important problems pertaining to bacterial population biology.
Methods
The steps involved in testing the null hypothesis of meningococcal evolution are summarised in
Figure 2.
1. Starting sequence: Choose codons at random from the observed distribution of codon usage.
2. Mutational model: Estimate evolutionary parameters from the observed data.
3. Evolved sequences: Statistically test for differences between simulated and observed patterns of variation.
Figure 2 Summary of testing the null hypothesis of meningococcal evolution.
The rates of synonymous transversion, synonymous transition, non-synonymous transversion and
non-synonymous transition were estimated (in units of 10^3 Ne generations) at 3.32, 19.4, 0.86 and 5.06
respectively (μ=3.32, κ=5.85 and ω=0.26).
Acknowledgments
Thanks go to Chris Spencer, Graham Coop, Jonathan Marchini and the BBSRC for funding. St. John’s
College, Oxford kindly provided travel expenses.
Cited References
Hudson, R.R. (1990) Oxf. Surv. Evol. Biol. 7: 1-44
Jolley, K.A. et al (2000) J. Clin. Microbiol. 38: 4492-4498
Jolley, K.A. et al (in press)
Maiden, M.C.J. et al (1998) Proc. Natl. Acad. Sci. USA 95: 3140-3145
Tajima, F. (1989) Genetics 123: 585-595
www.medawar.ox.ac.uk
www.stats.ox.ac.uk/mathgen
www.neisseria.org