PPT - Larry Smarr

The Emerging Global Community of
Microbial Metagenomics Researchers
Opening Talk
Metagenomics 2007
Calit2@UCSD
July 11, 2007
Dr. Larry Smarr
Director, California Institute for Telecommunications and
Information Technology
Harry E. Gruber Professor,
Dept. of Computer Science and Engineering
Jacobs School of Engineering, UCSD
Abstract
Calit2, the J. Craig Venter Institute, and UCSD's SDSC and Scripps Institution of
Oceanography, are creating a metagenomic Community Cyberinfrastructure for
Advanced Marine Microbial Ecology Research and Analysis (CAMERA), funded
by the Gordon and Betty Moore Foundation. The CAMERA computational and
storage cluster, which contains multiple ocean microbial metagenomic
datasets, as well as the full genomes of ~166 marine microbes, is actively in
use. End users can access the metagenomic data either via the web or over
novel dedicated 10 Gb/s light paths (termed "lambdas") through the National
LambdaRail. The end user clusters are reconfigured as "OptIPortals," providing
the end user with local scalable visualization, computing, and storage.
Currently over 1000 users from over 40 countries are registered with CAMERA,
with over a dozen remote OptIPortal sites becoming active. This CAMERA
connected community sets the stage for creating a software system to support
a social network of metagenomic researchers--a "MySpace" for scientists. We
look forward to gathering ideas from Metagenomics 2007 participants for the
functional requirements of such a system.
Calit2 Brings Computer Scientists and Engineers
Together with Biomedical Researchers
• Some Areas of Concentration:
– Algorithmic and System Biology
– Bioinformatics
– Metagenomics
– Cancer Genomics
– Human Genomic Variation and Disease
– Proteomics
– Mitochondrial Evolution
– Computational Biology
– Multi-Scale Cellular Imaging
– Information Theory and Biological Systems
– Telemedicine
UC Irvine
National Biomedical Computation Resource, an NIH supported resource center
Southern California Telemedicine Learning Center (TLC)
UC Irvine
Philip Papadopoulos, SDSC/Calit2, 2pm Friday
Paul Gilna Ex. Dir.
PI Larry Smarr
Announced January 17, 2006
$24.5M Over Seven Years
CAMERA 1.1 is Up and Running!
CAMERA Combines Genomic
and Metagenomic Tools
Can We Create a “MySpace” for Science Researchers?
Microbial Metagenomics as a Cyber-Community
Over 1000 Registered Users From 45 Countries
70 CAMERA Users
Feedback Session
Friday 2pm
Paul Gilna
USA: 583
United Kingdom: 46
Canada: 35
France: 35
Germany: 32
• Calit2 is Prototyping Social Networks for Researchers
• Research Intelligence Project
– ri.calit2.net
• Add in:
– MyProteins
– MyMicrobes
– MyEnvironments
– MyPapers
– MyGenomes
Emerging Capabilities That Tie Together
Metagenomics Researchers
• Advanced Computing Techniques
• Broad Coverage of Complete Microbe Genomes
– Moore Foundation
– DOE JGI
• Proteomics of Microbes
• Cellular Network Models
Metagenomic Challenge--Enormous Biodiversity:
Very Little of GOS Metagenomic Data Assembles Well
• Use Reference Genomes to Recruit Fragments
– Compared 334 Finished and 250 Draft Microbial Genomes
• Only 5 Microbial Genera Yielded Substantial and Uniform Recruitment
– Prochlorococcus, Synechococcus, Pelagibacter, Shewanella, and Burkholderia
Source: Douglas Rusch, et al. (PLOS Biology March 2007)
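The idea of recruiting metagenomic fragments to reference genomes can be illustrated with a toy sketch. This is not the method of the cited study: a simple shared k-mer score stands in for a real sequence aligner, and the function names, the value of `k`, and the `min_shared` threshold are all hypothetical choices made for illustration.

```python
def kmers(seq, k=8):
    """Return the set of all k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def recruit(reads, references, k=8, min_shared=0.5):
    """Assign each read to the reference sharing the most k-mers with it.

    reads:       list of read sequences (strings)
    references:  dict mapping a genome name to its sequence
    min_shared:  assumed cutoff; reads below it are left unrecruited
    Note: reverse complements are ignored to keep the sketch short.
    """
    index = {name: kmers(genome, k) for name, genome in references.items()}
    recruited = {name: [] for name in references}
    for read in reads:
        read_kmers = kmers(read, k)
        if not read_kmers:
            continue  # read shorter than k
        best_name, best_score = None, 0.0
        for name, ref_kmers in index.items():
            score = len(read_kmers & ref_kmers) / len(read_kmers)
            if score > best_score:
                best_name, best_score = name, score
        if best_name is not None and best_score >= min_shared:
            recruited[best_name].append(read)
    return recruited
```

A genome with "substantial and uniform recruitment" would, in this toy picture, collect many reads spread evenly along its length; most reference genomes would collect few or none.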
Use of Self Organizing Maps to Identify Species
Massive Computation on the Japanese Earth Simulator
SOM Created from an Unsupervised Neural Network Algorithm to Analyze Tetranucleotide Frequencies in a Wide Range of Genomes
[Figure: SOM with regions labeled C. elegans, Drosophila, Rice, Arabidopsis, Fugu, Human; 10kb Moving Window]
T. Abe, H. Sugawara, S. Kanaya, T. Ikemura
Journal of the Earth Simulator, Volume 6, October 2006, 17–23
www.es.jamstec.go.jp/publication/journal/jes_vol.6/pdf/JES6_22-Abe.pdf
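The input to such a map is a tetranucleotide frequency vector per genome window. The sketch below covers only that feature-extraction step, not the SOM training itself. The function names and the non-overlapping step size are assumptions for illustration; the slide states only a 10kb moving window.

```python
from collections import Counter
from itertools import product

# All 256 possible tetranucleotides in a fixed order, so every window
# maps to a vector with the same layout.
TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def tetranucleotide_frequencies(seq):
    """Return a 256-element list of relative tetranucleotide frequencies."""
    seq = seq.upper()
    counts = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    # Only unambiguous 4-mers are counted; those containing N etc. are skipped.
    total = sum(counts[t] for t in TETRAMERS)
    if total == 0:
        return [0.0] * len(TETRAMERS)
    return [counts[t] / total for t in TETRAMERS]


def window_vectors(genome, window=10_000, step=10_000):
    """Yield (start, frequency_vector) for each full window of the genome.

    The step size is an assumption; set step < window for overlapping windows.
    """
    for start in range(0, len(genome) - window + 1, step):
        yield start, tetranucleotide_frequencies(genome[start:start + window])
```

Each resulting vector could then be fed to any SOM implementation; windows from the same species would be expected to land near each other on the map.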
Using SOM, Sargasso Sea
Metagenomic Data Yields 92 Microbial Genera!
[Figure: SOM with regions labeled Eukaryotes, Mitochondria, Chloroplasts, Prokaryotes, Viruses; 5kb Window]
Input Genomes:
– 1500 Microbes
– 40 Eukaryotes
– 1065 Viruses
– 642 Mitochondria
– 42 Chloroplasts
T. Abe, H. Sugawara, S. Kanaya, T. Ikemura
Journal of the Earth Simulator, Volume 6, October 2006, 17–23
Moore Microbial Genome Sequencing Project
Selected Microbes Throughout the World’s Oceans
Microbes Nominated by
Leading Ocean Microbial
Biologists
www.moore.org/microgenome/worldmap.asp
Moore Foundation Funded the Venter Institute to Provide
the Full Genome Sequence of 155 Marine Microbes
Phylogenetic Trees Created
by Uli Stingl, Oregon State
Blue Means Contains
One of the Moore 155 Genomes
www.moore.org/microgenome/trees.aspx
Moore 155 Marine Microbial Genomes Give
Broad Coverage of Microbial “Tree of Life”
Phylogenetic Trees Created
by Uli Stingl, Oregon State
www.moore.org/microgenome/alpha-proteobacteria.aspx
Joint Genome Institute
is a Leading Microbial Genomic Source
JGI Metagenomics Projects (42 Projects)
2005: termite hindgut (CalTech), planktonic archaea (MIT), EBPR sludge (UW/UQ), groundwater (ORNL)
2006: AMD, Alaskan soil (UW), Gutless worm (MPI), TA-degrading bioreactor (NUS), Antarctic bacterioplankton (DRI), hypersaline mats (UCol), Korarchaeota enrichment, Farm soil (Diversa)
2007: 8 new metagenomic projects
Source: Eddie Rubin,
DOE JGI
Key Problem with Analysis
of Microbial Metagenomic Data
[Figure: phylogenetic tree of bacterial phyla: Proteobacteria, TM6, OS-K, Acidobacteria, Termite Group, OP8, Nitrospira, Bacteroides, Chlorobi, Fibrobacteres, Marine GroupA, WS3, Gemmimonas, Firmicutes, Fusobacteria, Actinobacteria, OP9, Cyanobacteria, Synergistes, Deferribacteres, Chrysiogenetes, NKB19, Verrucomicrobia, Chlamydia, OP3, Planctomycetes, Spirochaetes, Coprothermobacter, OP10, Thermomicrobia, Chloroflexi, TM7, Deinococcus-Thermus, Dictyoglomus, Aquificae, Thermodesulfobacteria, Thermotogae, OP1, OP11]
At Least 40 Phyla of Bacteria,
But Only a Few are Well Sampled
Source: Eddie Rubin,
DOE JGI
DOE Genomic Encyclopedia of Bacteria and Archaea
(GEBA) / Bergey Solution: Deep Sampling Across Phyla
[Figure: same phylogenetic tree of bacterial phyla as on the previous slide]
Well sampled phyla
No cultured taxa
Source: Eddie Rubin,
DOE JGI
GEBA / Bergey
Pilot Project at JGI
• Goal
– To Finish ~100 Bacterial and Archaeal Genomes
– Selected Based on: Phylogeny, Availability of Phenotype Information, Community Interest
– Input / Interactions with: Community Advisory Group, ASM, Academy of Microbiology, Etc…
• Approach
– Select 200 Organisms
– Order DNA from Culture Collections (DSMZ and ATCC)
– Sequence 100 for which DNA QC is Received
• Project Lead (Jonathan Eisen JGI/UC Davis)
– Project Management (David Bruce JGI/LANL)
– Methods for Sequencing in Changing Technology Landscape (Paul Richardson JGI)
– Linking to educational project (Cheryl Kerfeld JGI)
Source: Eddie Rubin,
DOE JGI
Converting Genome Sequences
to Protein Fold Space
• How many folds?
• How many sequences adopt the same fold?
• How does function vary as sequences diverge within a family?
• Are there still Kingdom-specific families?
• Can we determine function from structure?
• How diverse are metabolic pathways and networks?
[Figure: 5-amino-6-(5-phosphoribosylamino)uracil reductase, JCSG: 2hxv]
Building Genome-Scale Models
of Living Organisms
E. coli i2K
• E. Coli
– Has 4300 Genes
– Model Has 2000!
[Figure: E. coli genome-scale model map spanning Genomics, Transcriptomics, Proteomics, Metabolomics, Interactomics, and Environment, with panels for Transcription & Translation, Regulation (Regulatory Actions), Metabolism, Proteins, and Monomers & Energy. Map Legend: growth/biomass precursors, extracellular metabolite, intracellular metabolite, reaction/gene name, input signals. Citation labels on the figure: JTB 2002, JBC 2002.]
Regulatory rules shown on the map:
– If [Carbon1] > 0, tc2 = 0
– If Rh > 0, [H] is in surplus, t6a = 0
– If R1 = 0, we say [B] is not in surplus, t2a = t5 = 0
– If Oxygen = 0, we say [O2] = 0, tres = t3b = 0
Source: Bernhard Palsson
UCSD Genetic Circuits Research Group
http://gcrg.ucsd.edu
in Silico Organisms Now Available 2007:
• Escherichia coli
• Haemophilus influenzae
• Helicobacter pylori
• Homo sapiens Build 1
• Human red blood cell
• Human cardiac mitochondria
• Methanosarcina barkeri
• Mouse Cardiomyocyte
• Mycobacterium tuberculosis
• Saccharomyces cerevisiae
• Staphylococcus aureus
Biochemically, Genetically and Genomically (BiGG)
Genome-Scale Metabolic Reconstructions
S. aureus: 640 Reactions, 619 Genes
S. typhimurium: 898 Reactions, 826 Genes
M. barkeri: 619 Reactions, 692 Genes
RBC: 39 Rxns
Mitoc.: 218 Rxns
H. sapiens: 3311 Reactions, 1496 Genes
E. coli: 2035 Reactions, 1260 Genes
M. tuberculosis: 939 Reactions, 661 Genes
H. pylori: 558 Reactions, 341 Genes
H. influenzae: 472 Reactions, 376 Genes
S. cerevisiae: 1402 Reactions, 910 Genes
Systems Biology Research Group
http://systemsbiology.ucsd.edu
Use of Tiled Display Wall OptIPortal
to Interactively View Microbial Genome
Acidobacteria bacterium Ellin345
Soil Bacterium 5.6 Mb
Source: Raj Singh, UCSD
OptIPortal–Termination Device
for the Dedicated Gigabit/sec Lightpaths
Collaborative Analysis of Large Scale Images of Cancer Cells
Integration of High Definition Video Streams with Large Scale Image Display Walls
Photo Source: David Lee,
Mark Ellisman NCMIR, UCSD
An Emerging High Performance Collaboratory
for Microbial Metagenomics
[Map: OptIPortal sites: UW, UMich, NW!, UIC EVL, MIT, UC Davis, JCVI, UCI, SIO, UCSD, SDSU, CICESE]