Transcript slides
The Sorcerer II Global Ocean Sampling expedition
• Global Ocean Sampling project (GOS)
• CAMERA
• METAREP
Katrine Lekang
Background
• Microorganisms in the world oceans: what
do we know?
– Play an important role in the marine
ecosystem and global biogeochemical cycles.
– 10 million species?
• How can second generation sequencing
techniques contribute?
Craig Venter
• Human genome
• Ocean sampling (GOS)
• Synthetic biology
– Modified microorganisms
Global Ocean Sampling
• The expedition’s goal is to evaluate the microbial
diversity in the world’s oceans using the tools
and techniques developed to sequence the
human genome and the genomes of other organisms.
• The aim is to increase knowledge of microbial
diversity, which is expected to help explain how
ecosystems function and to lead to the discovery
of new genes of ecological and evolutionary
importance.
Sorcerer II - expedition
Sampling
• 200-400 liters of
water every 200 miles
• Filtering
Methods
• Total DNA was extracted from the 0.1-0.8 µm size fraction
• Random insert clone libraries
• End-sequencing of 44 000-420 000 clones
per sample (Sanger sequencing)
Development of new tools
• Fragment recruitment analyses for performing
and visualizing comparative genomic analysis
when a reference sequence is available.
• New assembly techniques that use metadata to
produce assemblies for uncultivated microbial
taxa.
• A whole-metagenome comparison tool to
compare entire samples at an arbitrary degree of
genetic divergence.
Assembly
• Primary assembly: Celera assembler
– Pairs of mated reads were tested for overlap; overlapping pairs were merged into a single pseudo-read
– 98 % overlap identity cut-off used to construct unitigs
– The resulting assembly was fragmented
• Second assembly: relaxed 94 % cut-off
• Series of assemblies at various
stringencies for subsets of GOS-data
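The mate-pair merging step above can be illustrated with a minimal sketch. It assumes a read and its mate come from opposite strands of the same clone insert, so the mate is reverse-complemented before checking whether the end of one read overlaps the start of the other. The 98 % figure from the slide is reused here as the identity threshold, and the comparison is ungapped. The function names and the `min_overlap` parameter are hypothetical; this is not the Celera Assembler implementation.

```python
# Hypothetical sketch: merge a mate pair into one pseudo-read when read1's
# suffix overlaps the reverse complement of read2 at >= min_identity (ungapped).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string over the alphabet ACGT."""
    return seq.translate(COMPLEMENT)[::-1]


def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two equal-length strings."""
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)


def merge_mates(read1: str, read2: str, min_overlap: int = 20,
                min_identity: float = 0.98):
    """Return the merged pseudo-read, or None if no acceptable overlap exists."""
    mate = reverse_complement(read2)  # put the mate on read1's strand
    # Try the longest possible overlap first, down to min_overlap.
    for olen in range(min(len(read1), len(mate)), min_overlap - 1, -1):
        if identity(read1[-olen:], mate[:olen]) >= min_identity:
            return read1 + mate[olen:]
    return None
```

For example, two 30 bp reads taken from opposite ends of a 40 bp insert overlap by 20 bp and merge back into the full 40 bp sequence; reads too short to reach `min_overlap` are left unmerged.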
Fragment recruitment
• GOS dataset compared with 584 reference
genomes of sequenced microbes (NCBI)
• BLAST alignment, minimum 55 % identity
• 70 % of the reads aligned to one or more
genomes.
– Many alignments had large gaps and low identity
• Under stringent recruitment criteria, 30 % of
the reads were recruited
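The difference between the loose (55 % identity) and stringent recruitment figures comes down to the thresholds applied to each alignment. The sketch below assumes BLAST hits have already been parsed into simple records. The `Hit` fields, the aligned-fraction filter and any "stringent" values passed in are hypothetical illustrations, not the exact GOS criteria.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One parsed alignment of a read to a reference genome (hypothetical fields)."""
    read_id: str
    genome: str
    identity: float    # percent identity of the alignment
    aligned_len: int   # alignment length in bp
    read_len: int      # full read length in bp


def recruited_reads(hits, min_identity=55.0, min_coverage=0.0):
    """IDs of reads with at least one hit passing both thresholds."""
    keep = set()
    for h in hits:
        if h.identity >= min_identity and h.aligned_len / h.read_len >= min_coverage:
            keep.add(h.read_id)
    return keep


def recruited_fraction(hits, total_reads, **thresholds):
    """Share of all reads (including reads with no hit) that are recruited."""
    return len(recruited_reads(hits, **thresholds)) / total_reads
```

Calling `recruited_fraction(hits, n)` with the defaults gives the loose figure; passing stricter `min_identity` and `min_coverage` values gives a stringent one.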
Fragment recruitment analysis
Identification of structural variation
with metagenomic data
• Variations in genome structure
(rearrangements, duplications,
insertions, deletions) can be explored
by fragment recruitment
• Mated sequencing reads make it possible
to assess structural differences between
the reference and the environmental
sequence.
• The orientation of, and distance between,
two mated sequencing reads are
determined.
• Relative location and orientation of
mated reads →metadata that can be
used to color-code a fragment
recruitment plot.
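A minimal sketch of how the relative location and orientation of two mates could be turned into a colour-coding category. It assumes a library in which the left mate aligns to the "+" strand and the right mate to the "-" strand, and it approximates insert size as the start-to-start distance on the reference. The category names, the `Placement` type and the colour map are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical colour for each mate category in a recruitment plot.
CATEGORY_COLOR = {
    "good": "blue",
    "too_close": "green",
    "too_far": "orange",
    "wrong_orientation": "red",
    "missing_mate": "grey",
}


@dataclass
class Placement:
    start: int    # leftmost reference coordinate of the read's alignment
    strand: str   # "+" or "-"


def classify_mates(a: Optional[Placement], b: Optional[Placement],
                   min_insert: int, max_insert: int) -> str:
    """Assign a mate pair to one recruitment-metadata category."""
    if a is None or b is None:
        return "missing_mate"          # only one read of the pair was recruited
    if a.strand == b.strand:
        return "wrong_orientation"
    left, right = (a, b) if a.start <= b.start else (b, a)
    if left.strand != "+":             # mates point away from each other
        return "wrong_orientation"
    distance = right.start - left.start  # approximate insert size
    if distance < min_insert:
        return "too_close"
    if distance > max_insert:
        return "too_far"
    return "good"
```

Usage: `CATEGORY_COLOR[classify_mates(p1, p2, 2000, 6000)]` gives the colour for a pair, where 2000 to 6000 bp stands in for the expected insert-size range of a library.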
Fragment recruitment
• All genome structure variations that are large enough to prevent recruitment
can be detected, because they will be associated with missing mates.
• Depending on the type of rearrangement present, other recruitment
metadata categories will appear near the rearrangement’s endpoints, making it
possible to distinguish among deletions, translocations, inversions and
inverted translocations from the recruitment plots.
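One way to act on these two observations is to scan the reference in fixed windows, flag windows where missing mates dominate, and report which other categories occur there for manual interpretation. This is a minimal sketch under stated assumptions. The window size, both thresholds and the category strings are hypothetical, and no rearrangement type is inferred automatically.

```python
from collections import Counter, defaultdict


def flag_windows(recruits, window=5000, min_reads=5, min_missing=0.5):
    """recruits: iterable of (reference_position, category) pairs.

    Returns {window_start: Counter of categories} for windows with at least
    min_reads reads whose share of "missing_mate" reads is >= min_missing.
    """
    per_window = defaultdict(Counter)
    for pos, category in recruits:
        per_window[pos // window][category] += 1
    flagged = {}
    for w, counts in sorted(per_window.items()):
        n = sum(counts.values())
        if n >= min_reads and counts["missing_mate"] / n >= min_missing:
            flagged[w * window] = counts  # candidate rearrangement endpoint
    return flagged
```

The returned category counts correspond to the other metadata categories present near a candidate endpoint, which is what the plots use to tell rearrangement types apart.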
Extreme assembly of uncultivated
populations
• Assemblies for abundant, uncultivated microbial
genera
• An assembly approach that resolves conflicts:
“extreme assembly”
• Mate-pairing data are not used to build contigs
– The contigs may therefore contain assembly artefacts
• An alternative to unguided assembly: start
from seed fragments that can be identified as
belonging to a particular taxonomic group.
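The seeded alternative can be pictured as iterative recruitment outward from the seed. This sketch is a deliberate simplification and not the extreme-assembly algorithm itself. It assumes reads are held in a dict of ID to sequence and that sharing one exact k-mer counts as an overlap. The function names, `k` and `max_rounds` are hypothetical. The reads it returns would then be handed to a real assembler.

```python
def kmers(seq, k):
    """Set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def grow_from_seeds(seeds, reads, k=21, max_rounds=10):
    """Starting from seed read IDs, repeatedly add reads that share a k-mer
    with any read already selected. Returns the set of selected read IDs."""
    selected = set(seeds)
    index = set()
    for rid in selected:
        index |= kmers(reads[rid], k)
    for _ in range(max_rounds):
        new = {rid for rid, seq in reads.items()
               if rid not in selected and kmers(seq, k) & index}
        if not new:          # nothing more can be reached from the seed
            break
        selected |= new
        for rid in new:
            index |= kmers(reads[rid], k)
    return selected
```

Each round extends the selection by one overlap step, so `max_rounds` limits how far from the seed, for example a fragment mated to a 16S sequence, the recruitment can spread.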
Fragment recruitment plots
• Investigate variation within a group of
related organisms
• Extreme assembly was repeatedly seeded
with fragments mated to a SAR11-like 16S
rRNA sequence.
Sample comparisons
• A method that assesses the genetic similarity
between two samples and can potentially use
all portions of the genome, not just the 16S
rRNA region.
• Assembly independent
• Estimate of the fraction of sequence from one
sample that could be considered to be present in
the other sample.
• Whole metagenomic similarities were computed
for all pairs of samples.
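A minimal sketch of the assembly-independent comparison described above. It assumes each read of sample A has already been aligned against sample B and only its best percent identity is kept. The 95 % threshold and all names are hypothetical. The score is asymmetric: the fraction of A found in B need not equal the fraction of B found in A.

```python
from itertools import permutations


def fraction_present(best_identity, n_reads, min_identity=95.0):
    """best_identity: read_id -> best % identity of that read in the other sample.
    Reads without any alignment are simply absent from the dict."""
    hits = sum(1 for pid in best_identity.values() if pid >= min_identity)
    return hits / n_reads


def similarity_matrix(samples, best_hits, read_counts, min_identity=95.0):
    """best_hits[(a, b)] holds the best identities of sample a's reads vs sample b.
    Returns {(a, b): fraction of a's reads considered present in b}."""
    return {
        (a, b): fraction_present(best_hits.get((a, b), {}), read_counts[a], min_identity)
        for a, b in permutations(samples, 2)
    }
```

If a single symmetric value per pair is needed, the two directions could for example be averaged; that is a design choice, not something the slide specifies.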
Variations in gene abundance
• Differences in gene content between samples
– Can identify functions that reflect the lifestyles of the
community in the context of its local environment.
• Binning of genes into functional categories
using TIGRFAM hidden Markov models.
• Genes predominantly found in a single sample.
• Differences between temperate and tropical samples
• Differences between samples with nearly identical
taxonomic composition
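Once genes are binned into functional categories, finding categories that occur predominantly in one sample is a matter of normalising counts per sample and checking how concentrated each category is. This is a minimal sketch. It assumes per-sample gene counts per category (for example TIGRFAM families) are already available. The `min_share` threshold and the category names used in testing are hypothetical.

```python
def relative_abundance(counts):
    """counts: sample -> {category: gene count}. Returns per-sample proportions."""
    out = {}
    for sample, cats in counts.items():
        total = sum(cats.values())
        out[sample] = {c: n / total for c, n in cats.items()}
    return out


def sample_specific(counts, min_share=0.8):
    """Map each category whose relative abundance is concentrated in one sample
    (that sample's share of the summed proportions >= min_share) to that sample."""
    rel = relative_abundance(counts)
    categories = {c for cats in rel.values() for c in cats}
    result = {}
    for c in sorted(categories):
        values = {s: rel[s].get(c, 0.0) for s in rel}
        total = sum(values.values())
        top = max(values, key=values.get)
        if total > 0 and values[top] / total >= min_share:
            result[c] = top
    return result
```

Normalising within each sample first keeps deeply sequenced samples from dominating simply because they contain more genes.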
CAMERA
• Community Cyberinfrastructure for Advanced
Marine Microbial Ecology Research and
Analysis
• http://camera.calit2.net/
• A systematic way is needed to explore the
structure and function of ocean ecosystems, and
their impact on global carbon processing and
climate.
– Bridge the gap between the rate at which data
are collected and the rate at which they are interpreted.
• Monitoring microbial communities in the ocean
and their response to environmental changes.
CAMERA
Metadata
• CAMERA will integrate sequence data with all
available metadata
• Allow researchers to derive correlations between
ecology and environmental conditions that may
favour one community structure or another.
• In the future, metadata from satellites and weather
stations could help interpret how these factors
affect microbial processes as well as community
composition.
New generation Bioinformatics tools
• Combine
bioinformatics
tools with large-scale compute
resources
METAREP
• JCVI Metagenomics Report
• http://www.jcvi.org/metarep/#
• Analyze and compare annotated metagenomics
datasets
• Solr/Lucene search engine
• SQL-like query syntax to filter and refine datasets
• Functional classification, GO, NCBI taxonomy
• Statistical tests
• Analyze function in the context of phylogeny
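The slide lists statistical tests without naming them. As one example of the kind of test used to compare annotated datasets, the sketch below applies a Pearson chi-square test (no continuity correction) to the count of one function in two datasets. It is not METAREP's code, and the function name is hypothetical. With one degree of freedom, the p-value equals `erfc(sqrt(stat / 2))`, so only the standard library is needed.

```python
import math


def chi_square_2x2(hits_a, total_a, hits_b, total_b):
    """Compare how often one function (or taxon) occurs in two datasets.

    hits_*: genes annotated with the function; total_*: all annotated genes.
    Returns (chi-square statistic, p-value) with 1 degree of freedom.
    """
    table = [[hits_a, total_a - hits_a], [hits_b, total_b - hits_b]]
    n = total_a + total_b
    row = [sum(r) for r in table]
    col = [table[0][j] + table[1][j] for j in range(2)]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / n
            stat += (table[i][j] - expected) ** 2 / expected
    p_value = math.erfc(math.sqrt(stat / 2))  # survival function of chi-square(1)
    return stat, p_value
```

For example, 30 of 100 genes in one dataset against 10 of 100 in another gives a statistic of 12.5 and a p-value well below 0.001. For small expected counts, an exact test would be the safer choice.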
Web analysis features