Transcript Methods
Viral Genomics
Allie Evans
Colin Lappala
Chelsea Layes
Sheena Scroggins
The Sorcerer II Global Ocean Sampling Expedition: Northwest Atlantic through
Eastern Tropical Pacific
Rusch DB, Halpern AL, Sutton G, Heidelberg KB, Williamson S, et al.
PLoS Biology Vol. 5, No. 3, e77 doi:10.1371/journal.pbio.0050077
The Sorcerer II Global Ocean Sampling Expedition: Expanding the Universe of
Protein Families
Yooseph S, Sutton G, Rusch DB, Halpern AL, Williamson SJ, et al.
PLoS Biology Vol. 5, No. 3, e16 doi:10.1371/journal.pbio.0050016
The Sorcerer II Global Ocean Sampling Expedition: Metagenomic
Characterization of Viruses within Aquatic Microbial Samples
Shannon J. Williamson, Douglas B. Rusch, Shibu Yooseph, Aaron L. Halpern, Karla B. Heidelberg, John I.
Glass, Cynthia Andrews-Pfannkoch, Douglas Fadrosh, Christopher S. Miller, Granger Sutton, Marvin
Frazier, J. Craig Venter
Baltimore Classification of
Viruses
• dsDNA
• ssDNA
• dsRNA
• +ssRNA
• -ssRNA
• ssRNA-RT
• dsDNA-RT
http://upload.wikimedia.org/wikipedia/en/thumb/0/07/Baltimore_Classification.png/720px-Baltimore_Classification.png
Bacteriophages
• Viruses that infect bacteria
• Numerically dominant type of virus in the
oceans
http://www.scienceclarified.com/images/uesc_02_img0070.jpg
Cyanophages
• Prochlorococcus
• Viruses have acquired and retained
photosynthesis genes
http://web.mit.edu/mbsulli/www/NATL2A-40-group-cropped.jpg
Phage Cycles
Lateral gene transfer
http://upload.wikimedia.org/wikipedia/commons/thumb/4/42/Transduction_(genetics).svg/800px-Transduction_(genetics).svg.png
Metagenomics
• Contribution of viral genomes to microbial
environmental processes studied through
metagenomic techniques.
• Metagenomics enables us to study
microorganisms by examining DNA that is
extracted directly from communities of
environmental microorganisms
http://camera.calit2.net/metagenomics/what-is-metagenomics.php
Metagenomic Challenges
• Inefficiencies in
sampling
• DNA extraction
methods
• Construction of
libraries
• Inadequacies in
data analysis and
visualization tools
• Low abundance
species overlooked
• Lack of reference
genomes
• Sequencing
complex
environments is
cost-prohibitive
• Standardizing
metadata
Methods
First:
• Cruise the world
• Collect 90-200 L of seawater
from each of 37 different stations
• Record pH, salinity, temperature,
etc. of water
Methods
• Pass water through 2.0, 0.8, 0.1
µm filters, then tangential flow filtration
(TFF) to 50 kDa for viral concentrate
• Store at -20°C until shipment from
next port
Sequencing Preparation
• Extract DNA
• Nebulize DNA
– Average of 1.0-2.2 kb fragments
• Gel electrophoresis extraction
– purify and determine lengths
• Subclone into E. coli
• Colonies selected for inserts
• Shotgun sequence inserts
Sequencing
• End sequence each insert
– Average of 822 bp sequenced per end
www.pasteur.fr/recherche/genopole/PF8/equipement_en.html
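A quick arithmetic sketch of what end sequencing covers, using only the numbers on the slides (1.0-2.2 kb inserts, 822 bp per end). It treats the average read length as fixed for every read, which is a simplifying assumption; the function name is illustrative.

```python
def mate_pair_gap(insert_len_bp: int, read_len_bp: int = 822) -> int:
    """Unsequenced bases between the two end reads of one insert.

    A negative value means the two end reads overlap by that many bases.
    """
    return insert_len_bp - 2 * read_len_bp


# Insert sizes spanning the 1.0-2.2 kb range quoted on the slide.
for insert_len in (1000, 1600, 2200):
    gap = mate_pair_gap(insert_len)
    status = "end reads overlap" if gap < 0 else "unsequenced gap"
    print(f"{insert_len} bp insert: {gap:+d} bp ({status})")
```

Under these assumptions the shortest inserts are read through completely, while the longest leave a few hundred unsequenced bases that only the mate-pair link spans.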
Metagenomic Assembly
• Same procedure as in humans, Drosophila,
dogs, etc.
Unitigs using 98% or 94%
homology for overlap
Scaffolding
Consensus sequence
Venter et al. (2001)
Metagenomic Assembly
New uses for shotgun sequencing and assembly
• Multiple organisms at once
• Likely novel organisms
Problems?
• Mate-pair data relied on more heavily, since overlap coverage is
low or unknown
• Need verification of assembly somehow
Metagenomic Assembly
• Created multiple distinct assemblies
– 98% homology unitigs
– 94% homology unitigs
– non-preassembled end-pairs at various stringencies
for multiple sequence alignments
• Multiple assemblies allowed cross-referencing,
quality assurance.
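The effect of the two overlap stringencies can be illustrated with a minimal sketch. It assumes the overlap between two reads has already been found and is gap-free, and it uses made-up sequences; a real assembler computes overlaps with alignment, so this is only the thresholding step.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of matching positions between two equal-length, gap-free overlaps."""
    if len(a) != len(b) or not a:
        raise ValueError("overlap strings must be non-empty and equal in length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def passes(a: str, b: str, threshold: float) -> bool:
    """True if the overlap meets the identity threshold (in percent)."""
    return percent_identity(a, b) >= threshold


# Hypothetical 50 bp overlap containing 2 mismatches.
read1_tail = "ACGT" * 12 + "AC"
read2_head = "TCGT" + "ACGT" * 10 + "ACGA" + "AC"

identity = percent_identity(read1_tail, read2_head)
print(f"identity: {identity:.1f}%")
print("joined at 98%:", passes(read1_tail, read2_head, 98.0))
print("joined at 94%:", passes(read1_tail, read2_head, 94.0))
```

An overlap like this one would be merged in the looser assembly but kept apart in the stricter one, which is why comparing the two assemblies is informative.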
Taxonomic Assignment
Protein-ORF based strategy
• 5.6 million sequences from GOS
• All ORFs in same sequence scaffold compared to
NCBI protein database using BLAST
• Votes tallied from each ORF into pools for scaffold
• Archaea, Bacteria, Eukaryota, Viral
• 5.0 million sequences assigned using this method
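The voting step can be sketched as follows. This assumes each ORF contributes one vote for the kingdom of its best BLAST hit and that the scaffold takes the plurality winner; the tie rule and the example scaffolds are hypothetical, not taken from the paper.

```python
from collections import Counter

KINGDOMS = ("Archaea", "Bacteria", "Eukaryota", "Viral")


def assign_scaffold(orf_votes):
    """Return the plurality kingdom for one scaffold.

    orf_votes holds one kingdom label per ORF (None for ORFs with no hit).
    Returns None when there are no usable votes or the top two pools tie.
    """
    tally = Counter(v for v in orf_votes if v in KINGDOMS)
    if not tally:
        return None
    ranked = tally.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # assumption: ties are left unassigned
    return ranked[0][0]


# Hypothetical scaffolds and the kingdom of each ORF's best hit.
scaffolds = {
    "scaffold_1": ["Viral", "Viral", "Bacteria", None],
    "scaffold_2": ["Bacteria", "Bacteria", "Bacteria"],
    "scaffold_3": ["Viral", "Bacteria"],
    "scaffold_4": [None, None],
}

for name, votes in scaffolds.items():
    print(name, "->", assign_scaffold(votes))
```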
Quantitative PCR
How many copies of studied proteins exist:
• from station to station?
• versus one another?
http://www.invitrogen.com/content.cfm?pageid=10037
Quantitative PCR
• Level of fluorescence checked after each PCR cycle
• Initial amount can be inferred using standard curve
• Multiple dilutions allow comparison
- Outcome reported only if:
-- Ten-fold above no-template negative control
AND
-- 10⁻² dilution yields 3-30 times more than the 10⁻³ dilution
http://www.invitrogen.com/content.cfm?pageid=10037
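A minimal sketch of the standard-curve inference and the two reporting rules. It assumes the usual qPCR model in which the threshold cycle (Ct) falls linearly with log10 of the starting copy number; the standards, the Ct values, and the choice to compare the 10⁻² dilution against the no-template control are illustrative assumptions.

```python
import math


def fit_standard_curve(copies, ct_values):
    """Least-squares fit of Ct = slope * log10(copies) + intercept."""
    xs = [math.log10(c) for c in copies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def copies_from_ct(ct, slope, intercept):
    """Invert the standard curve to estimate starting copies."""
    return 10 ** ((ct - intercept) / slope)


def reportable(copies_1e2, copies_1e3, copies_ntc):
    """Apply both rules: >= 10x the no-template control, and a 3-30x dilution ratio."""
    if copies_1e3 <= 0:
        return False
    above_control = copies_1e2 >= 10 * copies_ntc
    ratio_ok = 3 <= copies_1e2 / copies_1e3 <= 30
    return above_control and ratio_ok


# Hypothetical standards: known copy numbers and their measured Ct values.
slope, intercept = fit_standard_curve(
    [1e2, 1e3, 1e4, 1e5], [30.00, 26.68, 23.36, 20.04]
)
print(f"slope={slope:.2f}, intercept={intercept:.2f}")
print(f"copies at Ct 25.02: {copies_from_ct(25.02, slope, intercept):.0f}")
print("reportable:", reportable(copies_1e2=5000, copies_1e3=480, copies_ntc=20))
```

The 3-30 window brackets the ten-fold difference expected between adjacent ten-fold dilutions, so a ratio outside it suggests inhibition or noise rather than real template.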
Clustering and Phylogeny
• Proteins clustered and compared to NCBI
– Sequence alignments, not just domains
– Gene families bolstered with new genes
• Phylogeny trees generated
– Multiple sequence alignments with CLUSTALW
– Used only long, fairly homologous samples
• PHYLIP used to build trees
– Based on distance matrix
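The pairwise matrix that a tree-building step starts from can be sketched with the simplest possible measure, the fraction of differing sites (p-distance). This is an illustration only: the distance programs in PHYLIP apply substitution models rather than raw p-distance, and the aligned peptides below are made up.

```python
def p_distance(a: str, b: str) -> float:
    """Fraction of differing sites, skipping columns where either sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_matrix(aligned):
    """Map every ordered pair of sequence names to its p-distance."""
    names = list(aligned)
    return {(m, n): p_distance(aligned[m], aligned[n]) for m in names for n in names}


# Hypothetical rows from a multiple sequence alignment.
alignment = {
    "seq_A": "MKT-LLV",
    "seq_B": "MKTALLV",
    "seq_C": "MRTALIV",
}

matrix = distance_matrix(alignment)
for (m, n), d in matrix.items():
    if m < n:
        print(f"{m} vs {n}: {d:.3f}")
```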
Results
• 37 marine surface water samples collected
• 7.7 million sequencing reads were
produced
• Identified 154,662 viral peptide sequences
Identification of Viral Sequences
• Data from microbial fraction of water samples was
examined
• Looked for viral sequences by comparison to the NCBI
non-redundant protein database
• 154,662 viral peptide sequences were identified
• Approximately 3% of predicted proteins were identified
as viral sequences
• Number of viral sequences thought to be largely
underestimated
Classification through Protein
Clustering
• Of 154,662 viral peptide sequences, 117,123 or
76% fell within 380 protein clusters containing at
least 20 proteins
• Remaining sequences fell within clusters
containing fewer than 20 proteins
• Average cluster contained 258 peptide
sequences
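The 76% figure follows directly from the two counts on the slide, as this short check shows. The helper for a list of cluster sizes is a hypothetical illustration of how such a share would be computed from raw clustering output.

```python
def share_in_large_clusters(cluster_sizes, min_size=20):
    """Percent of all sequences that sit in clusters of at least min_size members."""
    total = sum(cluster_sizes)
    large = sum(size for size in cluster_sizes if size >= min_size)
    return 100.0 * large / total


# Counts quoted on the slide.
total_viral = 154_662
in_large_clusters = 117_123

share = 100.0 * in_large_clusters / total_viral
remaining = total_viral - in_large_clusters
print(f"{share:.1f}% of viral peptides are in clusters of 20 or more")
print(f"{remaining:,} peptides are in smaller clusters")

# Toy cluster sizes (made up) run through the same calculation.
print(f"toy example: {share_in_large_clusters([50, 30, 10, 5, 5]):.0f}%")
```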
Neighbor Functional Linkage
Analysis
• Used to verify that the host-derived genes sat on viral genomes rather
than pro-viral regions of bacterial genomes
• Proportion of viral same-scaffold ORFs range from 32%
to 92% for the metabolic gene families studied
• Occurrence of viral neighbors on same scaffolds as host-derived viral genes
supports hypothesis that sources of the sequences are viruses rather than bacteria
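One way to compute a same-scaffold proportion of this kind is sketched below. It assumes each ORF already carries a gene-family label (or none) and a taxonomic assignment, and it counts only the neighbors of the target gene, not the gene itself; the paper's exact definition may differ, and the scaffolds are invented.

```python
def viral_neighbor_proportion(scaffolds, family):
    """Share of neighboring ORFs assigned as viral on scaffolds carrying `family`.

    scaffolds maps a scaffold id to a list of (gene_family_or_None, kingdom).
    Returns None if the family has no neighbors on any scaffold.
    """
    viral = total = 0
    for orfs in scaffolds.values():
        if not any(fam == family for fam, _ in orfs):
            continue  # scaffold does not carry the gene family of interest
        for fam, kingdom in orfs:
            if fam == family:
                continue  # skip the target gene itself
            total += 1
            viral += kingdom == "Viral"
    return viral / total if total else None


# Hypothetical scaffolds.
scaffolds = {
    "s1": [("psbD", "Viral"), (None, "Viral"), (None, "Viral"), (None, "Bacteria")],
    "s2": [("psbD", "Viral"), (None, "Viral")],
    "s3": [("pstS", "Viral"), (None, "Bacteria")],
}

for family in ("psbD", "pstS"):
    print(family, viral_neighbor_proportion(scaffolds, family))
```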
Quantitative PCR
• qPCR used on DNA collected from 5 sampling locations
• Yields were initially too low, so samples were pooled
• Viral gene families psbD, petE, speD, talC, pstS, and
phoH were included
• Results indicate that host-derived viral genes are viral in
nature
• Viral genes encoding environmentally significant host-specific functions are prevalent in aquatic samples
Phylogenetic Analyses
Figure 2. Phylogenetic trees of all GOS and publicly available psbA (A) and psbD (B) sequences. BS indicates bootstrap values. GOS and public viral sequences are colored aqua and pink, respectively. GOS and public prokaryotic sequences are navy blue and lime green, respectively.
doi:10.1371/journal.pone.0001456.g002
Figure 3. Phylogenetic trees of all GOS and publicly available pstS (A) and talC (B) sequences. BS indicates bootstrap values. GOS and public viral sequences are colored aqua and pink, respectively. GOS and public prokaryotic sequences are navy blue and lime green, respectively. GOS eukaryotic sequences are colored yellow.
doi:10.1371/journal.pone.0001456.g003
• All viral gene families were positively correlated with water temperature
• Some viral gene families were correlated with salinity, water depth, and calculated
trophic status indices
• Different environmental pressures may influence acquisition of these genes by viruses
• Table S7 shows the correlations between viral gene families and environmental
parameters
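A correlation of this kind can be sketched with Pearson's r. The choice of statistic is an assumption made for illustration (the slides do not say which one the study used), and the per-station temperatures and gene-family abundances are made up.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length, at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical stations: water temperature (deg C) and normalized
# abundance of one viral gene family.
temperature = [10, 15, 20, 25, 30]
abundance = [1.0, 2.1, 2.9, 4.2, 4.8]

r = pearson_r(temperature, abundance)
print(f"r = {r:.3f}")
```

A value of r near +1 corresponds to the positive relationship with temperature described above; correlation alone does not show which environmental pressure drives gene acquisition.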
Discussion
• Most studies have focused on the filtered viral fraction of the data
• This is the first study to focus on the viral components in the
microbial fraction of the data
• Strong evidence for abundance and distribution of environmentally
important host-derived viral gene families
• Distribution patterns of host-derived viral families over environmental
gradients
• Evidence of interactions between bacteriophage and host organisms
Detection of Viruses in
Microbial Data
• Large viruses (0.1 µm–0.22 µm) get caught in the filters
because of their size and geometric shape
• Small free-living phages flow through the filter, but
viruses physically interacting with the microbes are
caught along with the microbes
• When filtering large volumes, biomass accumulates on
the filter and viruses get caught
• Most viruses found within the aquatic microbial
communities studied seemed to be in the lytic infection
cycle and were therefore actively replicating their DNA
Viruses with Metabolic Genes
• Through lateral gene transfer, metabolic genes can be acquired from
the host
• Acquisition, retention, and expression of metabolic genes may
increase fitness
• Keeping key metabolic processes and pathways running during
infection allows maximum replication
• Previous studies on host-derived metabolic viral genes have focused on
the photosynthesis genes psbA and psbD of cyanophages
• Previous studies did not focus on abundance or distribution of these
genes in the oceans
Host-Derived Metabolic
Gene Families
• In aquatic viral communities sampled, host-derived genes
were found widely distributed in significant proportions
• Quantitative PCR of these genes confirmed high
abundance
• Not known if these genes were expressed at the time of
sampling
• Unlikely to see these genes in high abundance if they:
– Were not expressed
– Did not have a fitness advantage
“Suggests that viruses may play a
more substantial role in
environmentally relevant metabolic
processes than previously
recognized such as the conversion
of light to energy, photoadaptation,
phosphate acquisition, and carbon
metabolism”
Potential Evolutionary Viral-Host
Relationships
• The study of the cyanophage found that the host-derived genes undergo
higher mutation rates than their cyanobacterial counterparts
• After phage acquisition, the genes could diversify
• Mutated viral genes could form gene reservoirs for
the host
• Through horizontal gene transfer, viruses could
promote diversity and distribution
Prochlorococcus –
P-SSM4-like Phage
• Prochlorococcus is one of
the most widespread
picophytoplankton in the
ocean
• P-SSM4-like phage may
influence the abundance,
diversity, and distribution of
Prochlorococcus
• Statistically significant
relationship between the
Prochlorococcus and the
P-SSM4-like phage
Metagenomic Viral-Microbial
Interactions
• This study of viral-microbial association between
communities was coincidental
• Horizontal transfer of metabolic genes
• More studies necessary on the viral-microbial
diversity and genetic complement
– Community relationships
– Evolutionary relationships
Any Questions?