Probes can be designed in an
evolutionary hierarchy
Probes can be designed to be highly redundant to
increase the certainty of identification
The match between clone counts and hybridization intensity
Genomics terminology
• Shotgun cloning: undirected cloning effort where the
entire sample is cloned and sequenced
• Contig: assembled continuous sequence derived from
sequence reads from a single clone
• Scaffold: assembled sequence reads derived from
multiple overlapping clones
• nX coverage: mean number of times a region was
sequenced from independent clones
• Mini-scaffold: scaffold assembled only by paired ends of
overlapping contigs (approx. 1X coverage)
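As a rough illustration of the nX coverage term, mean coverage can be estimated as total sequenced bases divided by genome size. The function name and the numbers below are hypothetical, chosen only to make the arithmetic easy to follow; they are not taken from any study cited here.

```python
def mean_coverage(num_reads: int, mean_read_length_bp: float, genome_size_bp: float) -> float:
    """Mean number of times each base is sequenced (the n in nX coverage)."""
    if genome_size_bp <= 0:
        raise ValueError("genome_size_bp must be positive")
    return num_reads * mean_read_length_bp / genome_size_bp


# Hypothetical example: 30,000 reads of 700 bp from a 2.1 Mbp genome.
coverage = mean_coverage(30_000, 700, 2_100_000)
print(f"{coverage:.1f}X coverage")  # prints "10.0X coverage"
```

In a mixed environmental sample the same total read count is split unevenly among genomes, so each genome's coverage depends on its share of the reads.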
“Environmental Genomics”
Ed Delong
• Shotgun cloning of Megabase fragments
from marine environments
• Probe for those with rRNA gene of interest
• Sequence and use bioinformatics to infer
function
• Used to connect diverse psbA (photosystem
II) genes to known 16S sequence groups
Shotgun cloning and assembly of
environmental genomes
• Tyson et al 2004, Nature 428:37-43
• Venter et al 2004, Science 304:66-74.
Tyson et al. 2004. Iron Mt. study
• Pink biofilm growing at pH 0.87, which was known to be
composed of 6 rRNA types
• 103,462 sequence reads from shotgun clones provided 10X
coverage for two species (Leptospirillum III and
Ferroplasma II), and 3X coverage for Leptospirillum II.
• Very low polymorphism in Leptospirillum III interpreted
as evidence for a single strain
• Higher polymorphism (2.2%) in Ferroplasma II
interpreted as evidence for 3 strains which show evidence
of past recombination
• A single nitrogen fixer was found (Leptospirillum III)
An aside about rRNA
• 16S rRNA sequences of Fer I isolate and
assembled Fer II strains differ by less than
1%.
• The assembled genomes of Fer I and Fer II
differ by more than 22% even though gene
order and content appear to be conserved.
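The percentages above are pairwise sequence differences. A minimal sketch of that calculation, assuming the two sequences are already aligned to the same length; the function name and toy sequences are hypothetical, and skipping gap columns is only one of several possible conventions.

```python
def percent_difference(aligned_a: str, aligned_b: str) -> float:
    """Percent of compared alignment columns where two sequences differ.

    Assumes both strings come from the same alignment; columns with a gap
    ('-') in either sequence are skipped.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    mismatches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * mismatches / compared


# Toy sequences, not real rRNA data: 1 mismatch in 10 compared columns.
print(percent_difference("ACGTACGTAC", "ACGTACGTAT"))  # prints 10.0
```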
Tyson et al. FISH image of biofilm
Yellow (red + green): Leptospirillum
Green: Eubacteria
Blue: Archaea, predominantly Ferroplasma
Some numbers from the Venter et al. study
• From 200L of filtered sea water, 1.66 million
sequence reads were derived.
• 246 Mbp were assembled into 64,398 scaffolds
ranging from 826 bp to 2.6 Mbp
• 170 Mbp of miniscaffolds and unpaired reads
• 1.2 million protein-coding genes (10X more than
previously in protein database)
• 69,901 conserved open reading frames with no
assignable function
• 60,000 16S sequences, 148 of which are at least
3% different from previously known sequences
Summary of genes found in
the Sargasso Sea survey
Assembly problems
• Most abundant genomes are overrepresented
• Assembled genomes are composites of
different individuals (particularly genomes
with lower coverage)
Estimates of species diversity
• At least 300 species/sample, assuming homologous
sequences that differ by more than 6% are from different
species
• Using models based on a Poisson distribution and 3
different coverage models, estimates of species for the
whole study range from 1800 to 47,000.
• A minimum of 12X greater sequence effort would be
needed to sample 95% of the unique sequence
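One way to see why so much extra sequencing is needed: under a simple Poisson sampling assumption (an idealization in the style of Lander and Waterman, not necessarily any of the three coverage models used in the study), a genome sequenced to mean coverage c has an expected fraction 1 - e^(-c) of its bases sampled at least once. The sketch below is illustrative only, and the function names are hypothetical.

```python
import math


def fraction_sampled(mean_coverage: float) -> float:
    """Expected fraction of bases seen at least once at a given mean coverage."""
    return 1.0 - math.exp(-mean_coverage)


def coverage_needed(target_fraction: float) -> float:
    """Mean coverage at which the expected sampled fraction reaches the target."""
    if not 0.0 <= target_fraction < 1.0:
        raise ValueError("target_fraction must be in [0, 1)")
    return -math.log(1.0 - target_fraction)


for c in (0.25, 1.0, 3.0):
    print(f"{c}X coverage -> {fraction_sampled(c):.1%} of bases sampled")

print(f"95% sampled requires about {coverage_needed(0.95):.2f}X coverage")
```

Because rare genomes receive only a small share of the reads, their mean coverage stays far below that of the dominant genomes, which is why total effort must grow many-fold to capture most of the unique sequence.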
Population level findings
• Scaffolds with 14X coverage contain about
1 SNP/10,000 bases, and also contain
inserted phage sequences
• SAR11-like (a previously characterized 16S
type) sequences are abundant but are very
polymorphic
Venter et al. 2004 evidence that the composite genome
represents a population
Other interesting findings
• Species distributions are patchy even in the ocean (e.g.
Burkholderia and Shewanella abundant in sample 1 but not
2)
• Clear copy bias in rRNA gene sequences in favor of beta
and gamma proteobacteria, which typically have two or
more gene copies
• Huge diversity of rhodopsin sequences; possible non-chlorophyll light harvesting?
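The rRNA copy bias can be made concrete with a small sketch: dividing each group's 16S sequence count by its rRNA gene copy number gives a rough estimate of relative genome abundance. All group names, counts, and copy numbers below are hypothetical illustration values, not data from the study.

```python
def copy_corrected_abundance(counts: dict, copies: dict) -> dict:
    """Relative abundance after dividing 16S counts by rRNA gene copy number."""
    per_genome = {group: counts[group] / copies[group] for group in counts}
    total = sum(per_genome.values())
    return {group: value / total for group, value in per_genome.items()}


# Hypothetical counts and copy numbers, for illustration only.
counts = {"group_A": 60, "group_B": 40}
copies = {"group_A": 3, "group_B": 1}

raw_total = sum(counts.values())
corrected = copy_corrected_abundance(counts, copies)
for group in counts:
    print(f"{group}: raw {counts[group] / raw_total:.0%}, corrected {corrected[group]:.0%}")
```

Here group_A looks dominant in the raw 16S counts only because it carries three gene copies per genome; after correction it is the minority.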
Estimates of abundance of major groups based on different gene
families
Rhodopsin tree
showing the novelty
of the Sargasso Sea
samples (SAR)
Advantages and disadvantages
of environmental genomics
Advantages:
• Avoids PCR and all the inherent biases
• Not dependent on rRNA
• Lots of new genes and new information about who has them
Disadvantages:
• Currently way too expensive for mere mortals
• Not very efficient if structure and activity are the main questions
Hybridization to rRNA for
identification and quantification
• Extract total RNA from sample (no
amplification or cloning!)
• Spot it on a filter
• Probe it with oligonucleotide probes
From MacKay et al. 2002
Landeweert et al 2003
• Wanted to measure competition between two
mycorrhizal fungi: Suillus & Paxillus
• Set up pot inoculations with pine trees and either
fungus or both fungi together
• Measured: total mycelium, PLFAs
• Amplified ITS region with basidiomycete-specific
primers
• Compared via DGGE, clone counts, real-time
PCR quantification
Mycelium (white)
of Suillus in coculture with pine
DGGE gel of amplified basidiomycete ITS from soil
Conclusions
• DGGE, clone counts, and real-time
quantification agree that Suillus ITS
increases as Paxillus decreases
• What have we gained by real-time?
• Quantification of the template, rather than
the amplicons
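To illustrate what quantifying the template means in practice, real-time PCR is commonly read against a standard curve: threshold cycle (Ct) is regressed on log10 of known template amounts, and the fit is inverted for unknown samples. The sketch below uses made-up standards and hypothetical function names; it is not the analysis used by Landeweert et al.

```python
import math


def fit_standard_curve(quantities, ct_values):
    """Least-squares fit of Ct = slope * log10(quantity) + intercept."""
    xs = [math.log10(q) for q in quantities]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def quantity_from_ct(ct, slope, intercept):
    """Invert the standard curve to estimate starting template amount."""
    return 10 ** ((ct - intercept) / slope)


def amplification_efficiency(slope):
    """Efficiency implied by the slope; 1.0 means perfect doubling each cycle."""
    return 10 ** (-1.0 / slope) - 1.0


# Made-up dilution series: template copies and the Ct observed for each.
standards = [1e2, 1e3, 1e4, 1e5]
cts = [30.0, 26.7, 23.4, 20.1]

slope, intercept = fit_standard_curve(standards, cts)
print(f"slope = {slope:.2f}, efficiency = {amplification_efficiency(slope):.0%}")
print(f"Ct 25.0 corresponds to about {quantity_from_ct(25.0, slope, intercept):.0f} copies")
```

Because Ct is read during the exponential phase, the estimate reflects the starting template amount rather than the end-point amplicon yield that DGGE band intensity or clone counts depend on.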