Presentation1_final_(3)


Comparative Genomics
Ben
Dan
Deepak
Esha
Kelly
Pramod
Raghav
Smruthy
Vartika
Will
Background Check
Vibrio navarrensis

An aquatic bacterium

First isolated from sewage in Navarra, Spain in 1982

Gram negative

Non-spore forming rods

Motile by means of single polar flagellum
Questions to be Addressed
1. Sixteen strains clustered with V. navarrensis type strain LMG15976
• 16S rRNA, pyrH, recA and rpoA
• Four formed a distinct cluster
• V. vulnificus: closest relative to both lineages of V. navarrensis
“Is it a different species or biotype?”
2. V. navarrensis strains isolated from various sources.
• nav_2423 (VN1) : Blood
• nav_2462 (VN2) : Surface Wound
• nav_2541 (VN3) : Sewage
• nav_2756 (VN4) : Water
“Is Vibrio navarrensis pathogenic?”
Red and blue indicate an available genome sequence.
Red indicates isolation from blood; blue indicates isolation from an environmental setting (water or sewage).
[Figure: phylogenetic tree of Vibrio navarrensis strains, with two clusters labeled L1 and L2, alongside Vibrio vulnificus LMG 13545T, CMCP6 and YJ016; branch support values shown; scale bar 0.01]
Vibrio navarrensis strains shown: 2421-86, 08-2466, 1397-6T, LMG 15976T, 2544-86, 2232 or 2541-90, 2422-86, 0053-83, 2578-87, 08-2461, 08-2462, AM 37820, 08-2467, 2462-79, AM 36848, 2543-80, 2481-86, 1048-83, 2756-81, 2538-88, 2423-01
Concatenated pyrH, recA and rpoA; 16S was not used
Neighbor-joining method, Kimura 2-parameter model, pairwise deletion and 1000 interior branch tests. 1443 nt total:
pyrH (321 nt), recA (606 nt) and rpoA (516 nt)
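As a rough illustration of the distance behind this tree, the sketch below computes a Kimura 2-parameter distance between two aligned sequences with pairwise deletion of gapped or ambiguous sites. The function name `k2p_distance` is ours for illustration only; the tree on the slide was presumably built in a dedicated phylogenetics package, not with this code.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance between two aligned nucleotide sequences.

    Sites where either sequence has a gap or ambiguous base are skipped
    (pairwise deletion).
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    sites = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # pairwise deletion of this site
        sites += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1      # A<->G or C<->T
        else:
            transversions += 1    # purine <-> pyrimidine
    if sites == 0:
        raise ValueError("no comparable sites")
    p = transitions / sites
    q = transversions / sites
    # d = -1/2 * ln((1 - 2P - Q) * sqrt(1 - 2Q)); math.log raises
    # ValueError if the sequences are too divergent for the model.
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))
```

For the concatenated data set, the three gene alignments (pyrH, recA, rpoA) would simply be joined end to end per strain before calling the function on each pair of strains.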
New Species??
Strategy for
Defining/Distinguishing Species
• ANI (average nucleotide identity)
• Robustly assessing phylogenetic relationships between strains
– Supertree approach
– Supermatrix approach
• If there is interest:
– Genes under positive selection (DN/DS)
– Rates of Divergence
Old School Method for
Defining Species
DNA / DNA Hybridization
– Tedious, with poor reproducibility
– A coherent group of strains sharing > 70% DDH is considered a species
• Still need to have a phenotype associated with the group
Genomics Approach to DDH
• Developed by Dr. Konstantinidis
– (Konstantinidis and Tiedje et al. IJSEM, 2005)
• We’re employing a modified version of his script for whole genome
ANI comparisons
Original Script:
– Takes two genomes as input
– Parses genomes into 1 kb fragments and uses blastn to find reciprocal orthologs
– Takes the average nucleotide identity (ANI) of all reciprocal orthologs for each pair of draft genomes
• Coherent groups sharing
– >95% → Same species
– <95% to sister group/subgroup → Candidate new species
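The ANI steps above can be sketched as follows. This is a minimal illustration, not the script referred to on the slide: it assumes the blastn searches have already been run and summarized into two hypothetical dictionaries (`a_to_b`, `b_to_a`) that map each fragment to its best hit and percent identity. Averaging the two directions of each reciprocal pair is also our assumption.

```python
from statistics import mean

def fragment(genome, size=1000):
    """Cut a genome string into consecutive fragments of `size` bases.
    A trailing piece shorter than `size` is dropped."""
    return [genome[i:i + size] for i in range(0, len(genome) - size + 1, size)]

def ani_from_best_hits(a_to_b, b_to_a):
    """Average identity over reciprocal best hits.

    a_to_b / b_to_a map a fragment ID to (best hit ID, percent identity).
    A pair counts only if each fragment is the other's best hit.
    """
    identities = []
    for frag_a, (frag_b, ident_ab) in a_to_b.items():
        back = b_to_a.get(frag_b)
        if back is not None and back[0] == frag_a:
            identities.append((ident_ab + back[1]) / 2)
    if not identities:
        raise ValueError("no reciprocal best hits")
    return mean(identities)

def same_species(ani, threshold=95.0):
    """Apply the >95% ANI cutoff from the slide."""
    return ani > threshold
```

A pair of genomes whose ANI falls below the threshold relative to its sister group would be flagged as a candidate new species, subject to the other lines of evidence in this deck.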
Whole Genome Tree
A. First required identification of all orthologous proteins common to all strains (should we exclude VN2?)
– Perl script: uses reciprocal blastp, keeps top hit, >70% length of reference genes, >40% ID
• Outputs a file that can be used for interrogating presence/absence of metabolic/virulence genes later on
– OrthoMCL
• Genome-scale algorithm for grouping orthologous protein sequences
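The reciprocal blastp filter in step A might look roughly like the sketch below. It is written in Python for illustration and is not the Perl script itself. The `Hit` record and its field names are hypothetical, and we assume that "reference gene" means the subject of each hit and that "top hit" means highest bit score.

```python
from collections import namedtuple

# Hypothetical summary of one blastp hit (not a real BLAST output class).
Hit = namedtuple("Hit", "query subject pident aln_len bitscore")

def best_hits(hits, ref_lengths, min_cov=0.70, min_ident=40.0):
    """Keep, per query, the top-scoring hit that passes both cutoffs."""
    best = {}
    for h in hits:
        coverage = h.aln_len / ref_lengths[h.subject]
        if coverage <= min_cov or h.pident <= min_ident:
            continue  # fails >70% length or >40% identity
        if h.query not in best or h.bitscore > best[h.query].bitscore:
            best[h.query] = h
    return {q: h.subject for q, h in best.items()}

def reciprocal_best_hits(hits_ab, hits_ba, len_a, len_b):
    """Pairs (a, b) where a's best hit is b and b's best hit is a."""
    ab = best_hits(hits_ab, len_b)
    ba = best_hits(hits_ba, len_a)
    return sorted((a, b) for a, b in ab.items() if ba.get(b) == a)
```

Running this for every pair of strains and keeping only genes with a partner in all of them would give the set of orthologs common to all strains; the same table can later be queried for presence/absence of specific genes.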
B. Align all orthologous genes
– Clustal
– MUSCLE
C. Supertree approach
– Generally considered more robust and allows further investigation of HGT
– Make separate tree for each gene, find consensus tree
D. Supermatrix approach
– Concatenate all alignments
– Generate tree
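The concatenation step of the supermatrix approach can be sketched as below. The function name and the input layout (gene name mapped to per-strain aligned sequences) are assumptions for illustration. Padding a missing strain with gaps is a fallback we added; with a true core-gene set every strain should be present in every alignment.

```python
def build_supermatrix(alignments, strains):
    """Concatenate per-gene alignments into one sequence per strain.

    alignments: dict of gene name -> dict of strain -> aligned sequence.
    Genes are joined in sorted name order so the result is reproducible.
    """
    parts = {s: [] for s in strains}
    for gene in sorted(alignments):
        aln = alignments[gene]
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"{gene}: sequences are not aligned to one length")
        width = lengths.pop()
        for s in strains:
            # Assumption: a strain missing from this gene is filled with gaps.
            parts[s].append(aln.get(s, "-" * width))
    return {s: "".join(p) for s, p in parts.items()}
```

The resulting dictionary can be written out as FASTA and passed to any of the tree-building programs discussed next.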
Tree Building Approaches
• Neighbor-joining
– Fast, decently robust when bootstrapping
• Utilizing complex substitution models
– Maximum Likelihood
– Bayesian Analysis
– Computationally demanding, thought to do better with missing
data, generally work better for divergent organisms. To publish
we’ll probably need to generate one of these trees to confirm NJ
topology
Tree Building Software
• MEGA
– Easy to use GUI
– Not very customizable, but very quick
• PHYLIP
– Command-line based
– Very customizable
• PAUP
– Command-line
– Customizable
• MrBayes
– Uses MCMC to generate Bayesian trees
– Has >11,000 citations…
Strategy for Defining Species
[Flowchart: Draft Genome → Gene Predictions → Translated Genes. ANI branch: Custom Script → ANI → ANI Dendrogram. Phylogeny branch: Identifying Core Genome (OrthoMCL) → Multiple Alignment (ClustalΩ, MUSCLE) → Super Matrix / Super Tree (MEGA, PHYLIP, PAUP, MrBayes) → Dendrogram / Consensus Tree. Outcome: New Species??]
PATHOGEN??
Pathogenicity
Challenges:
1. Well-known databases and tools lack a complete list of virulence factors.
2. Vibrios that are not pathogenic to humans are sometimes pathogenic in their marine hosts. As a result, some of them share virulence factors with the human-pathogenic Vibrios.
3. The plasticity of the Vibrio genome: many virulence factors are present in mobile elements and can be shared through horizontal gene transfer (HGT). Hence, it is difficult to draw a line between Vibrios that are pathogenic to humans and those that are not.
Types of Infection
1. Gastroenteritis
2. Septicemia
3. Wound Infection
Association of Vibrio species with different clinical symptoms
Vibrio sp.: Wound Infection, Gastroenteritis, Septicemia
Vibrio cholerae O1: **
Vibrio cholerae non O1: * ** *
Vibrio parahaemolyticus: * ** (*)
Vibrio vulnificus: ** * **
Vibrio mimicus: (*) * (*)
Vibrio alginolyticus: ** *
Vibrio fluvialis: (*) ** (*)
Photobacterium damselae: **
Grimontia hollisae: (*) ** (*)
Vibrio furnissii: **
Aliivibrio fischeri
Vibrio splendidus
Vibrio harveyi
Vibrio anguillarum
* Less common presentation, ** common presentation, (*) rare presentation
Row groups: Pathogenic, Potentially Pathogenic, Non-Pathogenic
(Daniels et al., 2000)
Genomic Islands
• Discrete DNA segments differing between closely related bacterial strains
– Usually some past or present mobility is attributed.
• Why are they of interest to us?
– Virulence factors are often associated with genomic islands (GEIs)!
• Features of GEIs:
– GEIs are relatively large segments of DNA, usually between 10 and 200 kb, detected by comparisons among closely related strains.
– GEIs may be recognized by nucleotide statistics that usually differ from the rest of the chromosome, such as
1. GC content
2. Cumulative GC skew
3. Codon usage
– GEIs are often inserted at tRNA genes.
• GEIs are often flanked by 16–20 bp perfect or almost perfect direct repeats (DR).
• GEIs often harbor functional or cryptic genes encoding integrases or factors related to plasmid conjugation systems or phages involved in GEI transfer.
• GEIs often carry insertion elements or transposons.
• GEIs often carry genes offering a selective advantage for host bacteria. According to their gene content, GEIs are often described as pathogenicity, symbiosis, metabolic, fitness or resistance islands.
General features of GEIs
(Mario Juhas et al., 2009)
Integration, development and excision of GEIs.
(Mario Juhas et al., 2009)
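Two of the nucleotide statistics named among the GEI features (GC content and cumulative GC skew) can be scanned along a chromosome with a sliding window. The sketch below is only an illustration: the function names, window and step sizes, and the tolerance used to flag a window as atypical are all assumptions, not values from the cited work.

```python
def gc_windows(seq, window=1000, step=500):
    """Return (start, GC fraction, GC skew) for each sliding window.

    GC skew is (G - C) / (G + C), or 0.0 when the window has no G or C.
    """
    seq = seq.upper()
    out = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        gc = (g + c) / window
        skew = (g - c) / (g + c) if g + c else 0.0
        out.append((start, gc, skew))
    return out

def cumulative_skew(windows):
    """Running sum of the per-window GC skew values."""
    total, result = 0.0, []
    for _, _, skew in windows:
        total += skew
        result.append(total)
    return result

def atypical_windows(windows, genome_gc, tolerance=0.05):
    """Start positions whose GC fraction deviates from the genome-wide value
    by more than `tolerance` (an arbitrary cutoff chosen for illustration)."""
    return [start for start, gc, _ in windows if abs(gc - genome_gc) > tolerance]
```

Windows flagged this way are only candidates; the other features listed above (tRNA insertion sites, flanking direct repeats, integrase genes) would be needed to support calling a region a genomic island.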
Virulence Factors in Vibrio
• Colonization
• Immunosuppression
• Immunoevasion
• Obtaining nutrition from host
• Entry into and exit out of the cell
Original strategy
(proposed by Lee Katz)
• Possible ways
– Discover homologous genes to other Vibrio virulence factors
(esp. V. vulnificus)
– Uncover genes that appear in closely-related pathogenic species
but do not appear in closely-related non-pathogenic species
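The second idea above amounts to a set comparison over ortholog groups. A minimal sketch, assuming each genome has already been reduced to a set of ortholog group IDs (the function name and the IDs in the usage note are hypothetical placeholders):

```python
def candidate_virulence_groups(navarrensis, pathogenic, nonpathogenic):
    """Ortholog groups found in V. navarrensis and in at least one pathogenic
    genome, but in none of the non-pathogenic genomes.

    navarrensis: set of ortholog group IDs.
    pathogenic / nonpathogenic: dict of genome name -> set of group IDs.
    """
    in_any_pathogen = set().union(*pathogenic.values())
    in_any_nonpathogen = set().union(*nonpathogenic.values())
    return navarrensis & (in_any_pathogen - in_any_nonpathogen)
```

For example, with `navarrensis = {"og1", "og2", "og3"}`, one pathogenic genome carrying `{"og1", "og2", "og9"}` and one non-pathogenic genome carrying `{"og2"}`, only `og1` is returned. Requiring presence in any pathogen, as opposed to all pathogens, is a design choice that could reasonably go the other way.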
[Venn diagram: V. navarrensis, pathogenic Vibrio and nonpathogenic Vibrio gene sets, with one region marked "Sweet spot"]
Distribution of virulence-associated orthologous groups across eleven Vibrionaceae genomes
(Lilburn et al., 2010)
Virulence-Related Protein Collection
• NMPDR + VFDB
• Literature Survey
• PAI DB
• MvirDB
Strategy to Determine Pathogenicity
• Checking for presence/absence of
– Toxins
– Adherence factors
• Type IV pilus system
– Secretion systems
– Siderophores
Strategy for Pathogenicity
[Flowchart: Annotated Dataset → Presence/Absence → Existence of Toxins; Machinery for Incorporation (Pili/Adherence Factors). Yes → Correlation with Pathway (KEGG) → Connecting the dots → Pathogenic or Putatively Pathogenic / Potentially Pathogenic. No → Unlikely Pathogenic]
Road ahead
• Environmental vs. clinical strains comparison
• All vs. all within our nine strains
• Core genes vs. best genes trees
(Morrison et al., 2012)