4. Vibrio and Annotation



The Global Evolution and Adaptation of Vibrio cholerae Across Multiple Niche Dimensions
Rob Edwards, Flinders, 2015
How to annotate a couple of hundred genomes
Annotation of microbial genomes and comparison across differences

Cholerae

Haiti

Genome Sequencing

ORF Calling

Annotation

Global evolution

Niche dimensions
Cholera is caused by Vibrio cholerae
A worldwide pandemic
About 3-5 million cases per year
About 100,000-200,000 deaths worldwide per year
Notable deaths: Tchaikovsky
Symptoms
About 75% of infected people have no symptoms
25-50 PINTS of diarrhea per DAY
Severe symptoms are caused by dehydration
Treatment
Clean water
Electrolytes
Vaccine
Not antibiotics
Multiple Pandemics
1st – 1817 to 1823: Started at the Ganges, spread by colonialists
2nd – 1829 to 1849: Worldwide spread via immigrants
3rd – 1852 to 1859: John Snow, the first epidemiologist
First epidemiological study
John Snow
Portrait painted in 1847, when he was 34 years old.
Cholera outbreak in Soho, London, 1854
Plotted all cases on a map
Found a big cluster of cases around a water well (the Broad Street pump)
John Snow’s map
On the Mode of Communication of Cholera (1854)
Cholera caused by bacteria
Outbreaks of cholera
4th – 1863 to 1879: Originated in Mecca
5th – 1881 to 1896: First cholera vaccine (1892)
6th – 1899 to 1923: Killed 800,000 people
7th – 1961 to present
Haitian Outbreak
Earthquake Jan 12th, 2010
No cholera in Haiti for more than 50 years
First case, October 22nd, 2010
By February 2011: 250,000 cases and ~5,000 deaths
What was the original source?
Haitian cholera outbreaks (source: http://www.ph.ucla.edu/)
Figure panels: cases by day (Mirebalais Hospital); cases by age (St Marc Hospital); October 20th, 2010. Source: Final Report of the Independent Panel of Experts on the Cholera Outbreak in Haiti
Haitian Outbreak
Two hypotheses:
Endemic: a waterborne strain that had been in Haiti but had not caused disease for 50 years
Imported: brought in from another country
The environmental hypothesis
“They have been fortunate in Haiti that for 50 years the conditions have been such that they haven’t had an intense increase in cholera bacterial populations. ... But they’ve had an earthquake, they’ve had destruction, they’ve had a hurricane ... I think it’s very unfortunate to look for a scapegoat. It is an environmental phenomenon that is involved”
Rita Colwell, Johns Hopkins School of Public Health
The human hypothesis
“The organism that is causing the disease is very uncharacteristic of (Haiti and the Caribbean), and is quite characteristic of the region from where the soldiers in the base came. ... I don’t see there is any way to avoid the conclusion that an unfortunate and presumably accidental introduction of the organism occurred.”
John Mekalanos, Harvard Medical School
Conditions favor the human hypothesis
Source: Final Report of the Independent Panel of Experts on the Cholera Outbreak in Haiti
Global evolution of Vibrio
Can we use genomics to identify the global evolution of Vibrio?
Which gene(s) are important for temporal/spatial variation?
Prototype Vibrio cholerae sequence
TIGR, Nature 406, 477-483 (3 August 2000)
Sequenced genomes
2011 – 32 Vibrio strains sequenced
Fabiano Thompson's Lab @ UFRJ
Fundação Oswaldo Cruz
Ion quality scores
2011 – 171 Vibrio strains sequenced
How do you analyze 250+ genomes?
The steps in genome sequencing

Generate genome sequence

Assembly

ORF calling

tRNA identification

rRNA identification

Functional annotation
(Image source: www.sigmaaldrich.com)
Putative protein
Open Reading Frame (ORF): a stretch of codons with no (in-frame) stop codon
Coding Sequence (CDS): an ORF that could encode a protein
Protein encoding gene (PEG): an ORF that could encode a protein
Hypothetical protein = putative protein: something that has not been experimentally shown
Polypeptide: a stretch of amino acids
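To make the ORF definition concrete, here is a minimal sketch of a six-frame ORF scanner in Python. It is an illustration under stated assumptions, not the method used by RAST or any other pipeline in this talk: `find_orfs`, `reverse_complement` and the 100-codon default cutoff are hypothetical choices, only ATG is accepted as a start codon (bacterial genes also use GTG and TTG), and characters other than A/C/G/T simply never match a start or stop codon. Dedicated gene callers add statistical models of coding sequence on top of a scan like this.

```python
# Minimal six-frame ORF scanner (illustrative sketch, not a production gene caller).
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 100) -> list[tuple[str, int, int, int]]:
    """Find ATG...stop ORFs in all six reading frames.

    Returns (strand, frame, start, end) tuples. Coordinates are 0-based,
    end-exclusive, and measured on the strand that was scanned (for "-",
    that is the reverse-complemented sequence). min_codons counts codons
    from the ATG up to, but not including, the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None  # position of the first ATG in the currently open ORF
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    if (i - start) // 3 >= min_codons:
                        orfs.append((strand, frame, start, i + 3))
                    start = None  # close this ORF and look for the next ATG
    return orfs


if __name__ == "__main__":
    print(find_orfs("CCATGAAACCCTAGTT", min_codons=3))  # prints [('+', 2, 2, 14)]
```

Because the scan opens an ORF at the first ATG and closes it at the next in-frame stop, it reports the longest ORF in each open stretch; an ORF that runs off the end of a contig without a stop codon is not reported.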
Figure panels: Reads per chromosome (Chr. I); Reads per chromosome (Chr. II); Cholera Toxin Phage
Assembly
ORF Calling
Annotation
Annotated the Vibrio genomes using RAST
Single nucleotide polymorphisms
ATCATCGATCAGCATGCATCAGCATCGATCAGC
ATCATCGATCAGCATGCATCAGCATCGATCAGC
ATCATCGATCAGCATGCATCAGCCTCGATCAGC
ATCATCGATCAGCATGCATCAGCCTCGATCAGC
ATCATCGATCAGCAAGCATCAGCCTCGATCAGC
ATCATCGATCAGCAAGCATCAGCCTCGATCAGC
ATCATCGATCAGCAAGCATCAGCCTCGATCAGC
ATCATCGATCAGCAAGCATCAGCCTCGAGCAGC
ATCATCGATCAGCAAGCATCAGCCTCGAGCAGC
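The alignment above can be scanned for variable columns with a few lines of code. This sketch assumes a gap-free alignment of equal-length sequences (copied from the slide) and treats every column where the bases disagree as a SNP; the function names are illustrative. Real SNP pipelines also have to handle gaps, ambiguous bases and sequencing-quality filters.

```python
# Minimal SNP finder for a gap-free multiple alignment (illustrative sketch).
ALIGNMENT = [
    "ATCATCGATCAGCATGCATCAGCATCGATCAGC",
    "ATCATCGATCAGCATGCATCAGCATCGATCAGC",
    "ATCATCGATCAGCATGCATCAGCCTCGATCAGC",
    "ATCATCGATCAGCATGCATCAGCCTCGATCAGC",
    "ATCATCGATCAGCAAGCATCAGCCTCGATCAGC",
    "ATCATCGATCAGCAAGCATCAGCCTCGATCAGC",
    "ATCATCGATCAGCAAGCATCAGCCTCGATCAGC",
    "ATCATCGATCAGCAAGCATCAGCCTCGAGCAGC",
    "ATCATCGATCAGCAAGCATCAGCCTCGAGCAGC",
]


def snp_columns(alignment: list[str]) -> list[int]:
    """Return 0-based columns where the aligned sequences do not all agree."""
    length = len(alignment[0])
    if any(len(s) != length for s in alignment):
        raise ValueError("all aligned sequences must have the same length")
    return [i for i in range(length) if len({s[i] for s in alignment}) > 1]


def snp_profiles(alignment: list[str]) -> list[str]:
    """Reduce each sequence to its bases at the SNP columns."""
    cols = snp_columns(alignment)
    return ["".join(s[i] for i in cols) for s in alignment]


def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


if __name__ == "__main__":
    print(snp_columns(ALIGNMENT))                # prints [14, 23, 28]
    print(snp_profiles(ALIGNMENT))
    print(hamming(ALIGNMENT[0], ALIGNMENT[-1]))  # prints 3
```

On this alignment the three SNP columns give the profiles TAT, TCT, ACT and ACG: each group keeps the earlier changes and adds one more, which is the nested pattern that lets SNPs order strains along a lineage.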
Global evolution (Mutreja et al., 2011)
Waves of spread of cholera (Mutreja et al., 2011)
Different evolution for each wave (Mutreja et al., 2011)
On the source of Haitian cholera
Figure: phylogenetic tree including V. harveyi, V. parahaemolyticus, V. mimicus and V. cholerae, with V. cholerae isolates from Bangladesh (1994), Haiti (2010), Bangladesh (2002), Haiti (2010) and Haiti (2010)
Nepalese soldiers?
Outbreak in Kathmandu, Nepal, before the soldiers left
Outbreaks downstream (not upstream) along the river from the Nepalese UN camp
But that could have come from river trade. Ships used to fly the yellow flag when they were quarantined for cholera
Evolution not only by SNPs
Conservation of the ~120 kb superintegron region across 210 strains
Horizontal gene transfer versus vertical evolution
Figure: a mother cell and daughter cells, with changes labelled SNPs and HGT
Niche dimensions
210 Vibrio genomes: reassembled and reannotated
Find interesting genes!

Year

Continent

Country

Lat/Lon Coordinates

Clinical or Environmental

Source

Serogroup

Serotype
V. cholerae classification
Vibrio cholerae: cholera toxin (epidemics) or non-cholera toxin (no disease)
Serogroup: O1 or O139
Biotype (O1): Classical or El Tor
Serotype (O1): Ogawa or Inaba
Response variables



15,000 genes in the pangenome
933 subsystems (pathways) present in at least one genome
SNPs (after Mutreja)
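A pangenome count like the 15,000 genes above is the union of gene families across all genomes, and the same per-genome sets give the 0/1 presence/absence matrix that a random forest can take as input. This is a sketch assuming each genome has already been reduced to a set of gene-family (or subsystem) identifiers; the strain names and gene contents in the example are hypothetical, chosen only for illustration.

```python
# Pangenome bookkeeping from per-genome gene-family sets (illustrative sketch).
# Hypothetical example data: which genes each strain carries is made up.
GENOMES = {
    "strain_A": {"ctxA", "ctxB", "tcpA", "umuC"},
    "strain_B": {"ctxA", "ctxB", "tcpA"},
    "strain_C": {"tcpA", "umuD"},
}


def pangenome_summary(genomes: dict[str, set[str]]) -> dict[str, set[str]]:
    """Split gene families into pan (any genome), core (every genome) and accessory."""
    sets = list(genomes.values())
    if not sets:
        return {"pan": set(), "core": set(), "accessory": set()}
    pan = set().union(*sets)
    core = set.intersection(*sets)
    return {"pan": pan, "core": core, "accessory": pan - core}


def presence_absence(genomes: dict[str, set[str]]) -> tuple[list[str], dict[str, list[int]]]:
    """Build a 0/1 genome-by-gene matrix with columns in sorted gene order."""
    columns = sorted(set().union(*genomes.values()))
    matrix = {name: [1 if gene in genes else 0 for gene in columns]
              for name, genes in genomes.items()}
    return columns, matrix


if __name__ == "__main__":
    summary = pangenome_summary(GENOMES)
    print({k: sorted(v) for k, v in summary.items()})
    columns, matrix = presence_absence(GENOMES)
    print(columns)
    print(matrix)
```

With real data the same two functions scale to hundreds of genomes; only the input sets change.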
Analysis
Recreate evolution of the Vibrios
What are the important genes for each niche dimension?
Who, what, when, where!
Use random forests to identify important variables
Random Forest
Figure: example data table for O1 and O139 strains, with columns O-antigen, Exopolysaccharide, Capsule, Sialic Acid and DNA recombination
Figure: example decision tree with splits on Exopolysaccharide, DNA recombination and Capsule (thresholds such as <50 and <10) and O1/O139 leaves
Each tree votes on the importance of each variable. Typically, we run 10,000 trees.
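The slides do not say which random-forest implementation or importance measure was used. As a hedged, standard-library-only sketch, the code below grows trees on bootstrap samples of a 0/1 gene presence/absence matrix, tries a random subset of about sqrt(p) features at each split, and scores each feature by its total weighted Gini impurity decrease (mean decrease in impurity); permutation-based importance is a common alternative. The trees are shallow, only importances are kept (the sketch cannot predict), the default of 500 trees is far below the 10,000 mentioned above, and all function names and toy data are hypothetical.

```python
# Random-forest variable importance on 0/1 gene presence data (illustrative sketch).
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def grow_tree(X, y, rows, mtry, depth, max_depth, rng, importance):
    """Grow one tree on `rows`, adding each split's weighted Gini decrease to `importance`.

    Only importances are accumulated; the tree itself is not stored.
    """
    labels = [y[r] for r in rows]
    if depth >= max_depth or len(set(labels)) < 2:
        return  # stop at max depth or when the node is pure
    parent = gini(labels)
    best = None
    # Random feature subset at every split: the "random" in random forest.
    for f in rng.sample(range(len(X[0])), mtry):
        left = [r for r in rows if X[r][f] == 0]
        right = [r for r in rows if X[r][f] == 1]
        if not left or not right:
            continue  # this feature does not split the node
        child = (len(left) * gini([y[r] for r in left])
                 + len(right) * gini([y[r] for r in right])) / len(rows)
        if best is None or parent - child > best[0]:
            best = (parent - child, f, left, right)
    if best is None or best[0] <= 0:
        return
    gain, f, left, right = best
    importance[f] += gain * len(rows)  # weight the decrease by node size
    for child_rows in (left, right):
        grow_tree(X, y, child_rows, mtry, depth + 1, max_depth, rng, importance)


def forest_importance(X, y, n_trees=500, mtry=None, max_depth=3, seed=0):
    """Mean-decrease-in-impurity importance per feature, normalised to sum to 1."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    mtry = mtry or max(1, int(p ** 0.5))
    importance = [0.0] * p
    for _ in range(n_trees):
        rows = [rng.randrange(n) for _ in range(n)]  # bootstrap: rows drawn with replacement
        grow_tree(X, y, rows, mtry, 0, max_depth, rng, importance)
    total = sum(importance)
    return [v / total for v in importance] if total > 0 else importance


if __name__ == "__main__":
    # Hypothetical toy data: 20 genomes x 4 genes; only gene 0 tracks serogroup.
    X = [[1 if i < 10 else 0, i % 2, (i // 2) % 2, (i // 5) % 2] for i in range(20)]
    y = ["O1" if row[0] == 1 else "O139" for row in X]
    print(forest_importance(X, y, n_trees=300, seed=1))
```

In the toy example the first column alone separates O1 from O139, so it should receive most of the importance while the three noise columns receive little. With real data, X would be the genome-by-gene matrix and y one niche dimension (serogroup, clinical/environmental, continent or year bin).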
Response variables and niche dimensions
Genes important for who? (serogroup)
Genes important for what? (clinical, environmental, ...)
Genes important for where? (continent)
Separation of functions by continent
Genes important for when? (year)
DNA Repair
DNA repair & phages
Normal DNA repair (134 strains)
Additional DNA repair (4 strains; not O1)
Phage-borne DNA repair (72 strains)
Figure: gene maps labelled umuC, umuD and prophage
Different evolution for each wave
Waves 2 & 3 have phage
Interrupted repair
(Mutreja et al., 2011)
Conclusions

Unraveling evolution and spread of new pathogens

Mining genomes and niche dimensions

Don't get scooped!
Multi-genome projects??
Current multigenome projects
Organism: Number
S. pyogenes: 3,615
S. pneumoniae: 3,085
Rice (Oryza sativa): 3,000
C. elegans: 2,007
Clostridium difficile: 1,250
The thousand (human) genome project: 1,092
Mycobacterium tuberculosis: 1,000
Plasmodium falciparum: 825
Streptococcus pneumoniae: 616
Mycobacterium tuberculosis: 390
Salmonella in cattle and humans: 373
Vibrio: 274
Shigella sonnei: 263
Mycobacterium tuberculosis: 259
Streptococcus pneumoniae: 240
Methicillin-resistant Staphylococcus aureus: 193
Campylobacter jejuni: 192
Mycobacterium abscessus in CF: 170
Nick Loman: http://lab.loman.net/