Society for General Microbiology 2006 - Edwards @ SDSU

SGM Meeting, Warwick, April 2006
Challenges for metagenomic data analysis and
lessons from viral metagenomes
[What would you do if sequencing were free?]
Rob Edwards
http://phage.sdsu.edu/~rob
San Diego State University
Fellowship for Interpretation of Genomes
Outline
• The envy is not mine
• A tour around the world, thanks to phage
• People suck
• What is the most successful gene in evolution?
• Is there a Future?
This is all 454 sequence data
• 21 libraries
– 10 microbial, 11 phage
• 597,340,328 bp total
– 20% of the human genome
– 50% of all complete and partial microbial genomes
• 5,769,035 sequences
– Average 274,716 per library
• Average read length 103.5 bp
– Av. read length has not increased in 7 months
• Cost 0.04¢ per bp
Sequencing is cheap and easy.
Bioinformatics is neither.
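As a quick consistency check, the averages on this slide follow from the totals. The sketch below recomputes them. The total cost is not stated on the slide. It is derived here on the assumption that the 0.04¢ per bp rate applies to every base sequenced.

```python
# Totals copied from the slide above.
TOTAL_BP = 597_340_328
TOTAL_READS = 5_769_035
LIBRARIES = 21
COST_CENTS_PER_BP = 0.04


def library_summary(total_bp, total_reads, libraries, cents_per_bp):
    """Derive per-library and per-read averages, plus an implied total cost in USD."""
    return {
        "reads_per_library": total_reads / libraries,
        "mean_read_length_bp": total_bp / total_reads,
        # Assumes the per-bp rate applies uniformly to all bases.
        "total_cost_usd": total_bp * cents_per_bp / 100,
    }


summary = library_summary(TOTAL_BP, TOTAL_READS, LIBRARIES, COST_CENTS_PER_BP)
for name, value in summary.items():
    print(f"{name}: {value:,.1f}")
```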
The Soudan Mine, Minnesota
Red Stuff (oxidized)
Black Stuff (reduced)
Red and Black Samples Are Different
Black stuff: cloned and 454-sequenced 16S are indistinguishable
[Figure: 16S comparisons for black stuff and red stuff, cloned and 454 sequenced]
There are different amounts of metabolism in each environment
There are different amounts of substrates in each environment
[Figure: red stuff vs. black stuff]
But are the differences significant?
• Sample 10,000 proteins from site 1
• Count frequency of each “subsystem”
• Repeat 20,000 times
• Repeat for sample 2
• Combine both samples
• Sample 10,000 proteins 20,000 times
• Build 95% CI
• Compare medians from sites 1 and 2 with 95% CI
Rodriguez-Brito (2006). BMC Bioinformatics
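The resampling procedure above can be sketched in Python. This is a minimal illustration under stated assumptions, not the published implementation. The slide does not say whether proteins are drawn with or without replacement; this sketch draws with replacement. It also reads "compare medians with 95% CI" as flagging a subsystem when either site's median count falls outside the interval built from the pooled samples. The function names and the annotation data are hypothetical.

```python
import random
import statistics
from collections import Counter


def resampled_counts(pool, subsystems, sample_size, repeats, rng):
    """Count each subsystem in `repeats` random draws of `sample_size` proteins (with replacement)."""
    counts = {s: [] for s in subsystems}
    for _ in range(repeats):
        tally = Counter(rng.choices(pool, k=sample_size))
        for s in subsystems:
            counts[s].append(tally[s])  # Counter gives 0 for subsystems not drawn
    return counts


def ci95(values):
    """Empirical 95% interval: the 2.5th and 97.5th percentiles."""
    cuts = statistics.quantiles(values, n=40)  # 39 cut points in steps of 2.5%
    return cuts[0], cuts[-1]


def compare_sites(site1, site2, sample_size=10_000, repeats=20_000, seed=0):
    """Flag subsystems whose median count at either site lies outside the pooled 95% CI."""
    rng = random.Random(seed)
    subsystems = sorted(set(site1) | set(site2))
    med1 = {s: statistics.median(c) for s, c in
            resampled_counts(site1, subsystems, sample_size, repeats, rng).items()}
    med2 = {s: statistics.median(c) for s, c in
            resampled_counts(site2, subsystems, sample_size, repeats, rng).items()}
    # Null model: both sites drawn from one combined pool of proteins.
    pooled = resampled_counts(site1 + site2, subsystems, sample_size, repeats, rng)
    results = {}
    for s in subsystems:
        low, high = ci95(pooled[s])
        differs = not (low <= med1[s] <= high and low <= med2[s] <= high)
        results[s] = (med1[s], med2[s], (low, high), differs)
    return results


# Hypothetical subsystem annotations, one entry per protein.
site1 = ["core"] * 500 + ["iron"] * 400 + ["nitrogen"] * 100
site2 = ["core"] * 500 + ["iron"] * 100 + ["nitrogen"] * 400
for subsystem, row in compare_sites(site1, site2, sample_size=100, repeats=200, seed=1).items():
    print(subsystem, row)
```

The defaults mirror the slide (10,000 proteins, 20,000 repeats). In pure Python that means hundreds of millions of draws per pass, so the demo uses smaller values. For real data, a vectorized sampler such as NumPy's `Generator.multinomial` would be far faster.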
Subsystem differences & metabolism
Iron acquisition (Black Stuff)
• Siderophore enterobactin biosynthesis
• Ferric enterobactin transport
• ABC transporter ferrichrome
• ABC transporter heme
Black stuff: ferrous iron (Fe²⁺, ferroan [(Mg,Fe)₆(Si,Al)₄O₁₀(OH)₈])
Red stuff: ferric iron (goethite [FeO(OH)])
Nitrification differentiates the samples
Edwards (2006). BMC Genomics
The challenge is explaining the differences between samples
Red Sample
• Arg, Trp, His
• Ubiquinone
• FA oxidation
• Chemotaxis, flagella
• Methylglyoxal metabolism
Black Sample
• Ile, Leu, Val
• Siderophores
• Glycerolipids
• NiFe hydrogenase
• Phenylpropionate degradation
We can cheaply compare the important biochemistry happening in different environments
We don’t care which organisms are doing the metabolism, but we know what organisms are there
Why Phages?
• Phages are viruses that infect bacteria
– 10:1 ratio of phages:bacteria
– 10³¹ phages on the planet
• Specific interactions (probably)
– one virus : one host
• Small genome size
– Higher coverage
• Horizontal gene transfer
– 10²⁵ to 10²⁸ bp of DNA transferred per year in the oceans
• Can’t do fosmids
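The "small genome size, higher coverage" point is simple arithmetic. For a fixed number of reads, mean coverage scales inversely with genome length. The sketch below also applies the Lander-Waterman approximation, which assumes reads land uniformly at random, to estimate how much of each genome is hit at least once. The 103.5 bp read length comes from the earlier slide. The read count and the 40 kb phage and 4 Mb bacterial genome sizes are hypothetical, order-of-magnitude values.

```python
import math


def expected_coverage(reads, read_length_bp, genome_size_bp):
    """Mean fold coverage: total sequenced bases divided by genome length."""
    return reads * read_length_bp / genome_size_bp


def fraction_covered(coverage):
    """Lander-Waterman estimate of the fraction of a genome hit by at least one read."""
    return 1 - math.exp(-coverage)


READ_LENGTH_BP = 103.5    # average 454 read length reported on the earlier slide
READS_PER_GENOME = 1_000  # hypothetical number of reads from one genome

# Hypothetical, order-of-magnitude genome sizes.
for label, size_bp in [("phage", 40_000), ("bacterium", 4_000_000)]:
    c = expected_coverage(READS_PER_GENOME, READ_LENGTH_BP, size_bp)
    print(f"{label}: {c:.2f}x mean coverage, {fraction_covered(c):.1%} of genome covered")
```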
Phages In The World’s Oceans
• ARC: 56 samples, 16 sites, 1 year
• BBC: 85 samples, 38 sites, 8 years
• LI: 4 sites, 1 year
• GOM: 41 samples, 13 sites, 5 years
• SAR: 1 sample, 1 site, 1 year
Most Marine Phage Sequences are Novel
Phages are specific to environments
[Figure: Phage Proteomic Tree v. 5 (Edwards, Rohwer), with ssDNA-like, T4-like, and T7-like phage groups marked]
Thanks: Mya Breitbart
Marine Single-Stranded DNA Viruses
• 6% of SAR sequences are ssDNA phage (Chlamydia-like Microviridae)
• 40% of viral particles in SAR are ssDNA phage
• Several full-genome sequences were recovered via de novo assembly of these fragments
• Confirmed by PCR and sequencing
SAR Aligned Against the Chlamydia 4
Individual
sequence
reads
1033
Coverage
Concatenated
hits
0
389
0 bp
Chlamydia phi 4
genome
4490 bp
12,297 sequence fragments hit using TBLASTX
over a ~4.5 kb genome
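A coverage track like the one in this figure can be built by stacking hit coordinates along the genome. The sketch below uses a difference array. The hit coordinates are hypothetical and are assumed to be 0-based, end-exclusive positions already mapped onto the 4,490 bp Chlamydia phi 4 genome. Real TBLASTX output reports 1-based, inclusive coordinates, which would need converting first.

```python
def per_base_coverage(hits, genome_length):
    """Depth at every genome position from (start, end) hit coordinates.

    Coordinates are 0-based and end-exclusive. A difference array records +1 where
    a hit starts and -1 where it ends, so one running sum gives the depth.
    """
    diff = [0] * (genome_length + 1)
    for start, end in hits:
        start, end = sorted((start, end))  # accept hits given as (end, start)
        start, end = max(0, start), min(genome_length, end)
        if start < end:
            diff[start] += 1
            diff[end] -= 1
    depth, running = [], 0
    for delta in diff[:genome_length]:
        running += delta
        depth.append(running)
    return depth


GENOME_LENGTH = 4490  # Chlamydia phi 4 genome length from the slide
hits = [(0, 300), (100, 400), (4490, 4200)]  # hypothetical hits; the last is given reversed
depth = per_base_coverage(hits, GENOME_LENGTH)
print("max depth:", max(depth), "bases covered:", sum(d > 0 for d in depth))
```

The difference array touches each hit once and each base once, rather than incrementing every base of every hit. That matters when thousands of fragments hit a small genome.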
Phages, Reefs, and Human Disturbance
The Northern Line Islands Expedition, 2005
Kingman, Palmyra, Washington, Fanning, Christmas
Christmas-to-Kingman Bias in Number of Phage Hosts
Negative numbers mean relatively more phage hosts at Kingman
More photosynthesis at Kingman.
No people at Kingman.
More pathogens at Christmas.
More people at Christmas.
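The slide does not say how the bias was computed. One simple metric consistent with its sign convention is the difference in relative abundance of each host group, Christmas minus Kingman, so negative values mean relatively more hosts at Kingman. The sketch below assumes that metric. The host groups and counts are hypothetical.

```python
def host_bias(christmas_counts, kingman_counts):
    """Relative abundance at Christmas minus relative abundance at Kingman, per host group.

    Negative values mean the host group is relatively more common at Kingman.
    """
    c_total = sum(christmas_counts.values())
    k_total = sum(kingman_counts.values())
    hosts = sorted(set(christmas_counts) | set(kingman_counts))
    return {
        h: christmas_counts.get(h, 0) / c_total - kingman_counts.get(h, 0) / k_total
        for h in hosts
    }


# Hypothetical counts of phage sequences assigned to each host group.
christmas = {"Synechococcus": 20, "Vibrio": 50, "Other": 30}
kingman = {"Synechococcus": 60, "Vibrio": 10, "Other": 30}
for host, bias in host_bias(christmas, kingman).items():
    print(f"{host}: {bias:+.2f}")
```

Normalizing by each site's total keeps the comparison fair when the two libraries contain different numbers of sequences.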
Phages enrich for important genes
Rios Mesquites Stromatolites
• No photosynthesis genes in phages
Pozas Azules Stromatolites
• 5 different photosynthesis genes in phages
RNR (ribonucleotide reductase) is the most successful reaction in evolution
Computational Challenges
• Sequence annotations and analysis
– What is there?
– What is it doing?
– How is it doing it?
• Gene predictions in unknowns
– Lutz Krause (Bielefeld)
• Sequence comparisons
– BLAST
– Other ways to rapidly compare short sequences
– What happens when everyone is using 454 sequencing?
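One rapid, alignment-free alternative to BLAST for short reads is k-mer Jaccard similarity, sketched below. This is a generic technique chosen for illustration, not a method described in the talk. The default k = 8 is an arbitrary assumption.

```python
def kmers(seq, k):
    """All distinct length-k substrings (k-mers) of a DNA sequence, case-insensitive."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def kmer_jaccard(a, b, k=8):
    """Alignment-free similarity: shared k-mers divided by all distinct k-mers."""
    ka, kb = kmers(a, k), kmers(b, k)
    union = ka | kb
    return len(ka & kb) / len(union) if union else 0.0


print(kmer_jaccard("ACGTACGTGG", "ACGTACGTCC", k=4))
```

This sketch ignores reverse complements and stores every k-mer explicitly. At metagenome scale, a practical tool would use canonical k-mers and a hashing sketch such as MinHash instead.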
Sequence data from 21 libraries
600 million bp
6 million sequences
• Each BLASTX search takes 1,000 CPU hours
• 21 libraries = 21,000 CPU hours or 2.4 CPU years
• Users want
– repeat runs
– TBLASTX
– more analysis
– more data
– more, more, more, more
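The CPU arithmetic on this slide can be checked, and extended to wall-clock time, with a few lines of Python. The 256-core cluster is a hypothetical figure. The wall-clock estimate assumes the searches split perfectly across cores, with no queueing or I/O overhead.

```python
HOURS_PER_YEAR = 24 * 365


def blast_cpu_budget(libraries, cpu_hours_per_library, cores):
    """Total CPU time for one BLASTX pass over every library, plus ideal wall-clock time."""
    total_hours = libraries * cpu_hours_per_library
    return {
        "cpu_hours": total_hours,
        "cpu_years": total_hours / HOURS_PER_YEAR,
        # Assumes perfect parallelism: no queueing, I/O, or load imbalance.
        "wall_clock_days": total_hours / cores / 24,
    }


# 21 libraries and 1,000 CPU hours each come from the slide; 256 cores is hypothetical.
budget = blast_cpu_budget(libraries=21, cpu_hours_per_library=1_000, cores=256)
print(budget)
```

Every repeat run or switch to TBLASTX multiplies the total again, which is why the requests on this slide add up quickly.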
SDSU
Forest Rohwer
Beltran Rodriguez-Brito
USF
Mya Breitbart
Rohwer Lab
Linda Wegley
Florent Angly
Matt Haynes
ANL
Rick Stevens
Bob Olsen
CI Support
FIG
Veronika Vonstein
Ross Overbeek
Annotators
Also at SDSU
Anca Segall
Stanley Maloy
UBC
Curtis Suttle
Amy Chan
Stromatolites
Janet Seifert (Rice University)
Valeria Souza (UNAM, Mexico)
MIT:
Ed DeLong
Math Guys@SDSU
Peter Salamon
Joe Mahaffy
James Nulton
Ben Felts
David Bangor
Steve Rayhawk
Jennifer Mueller