Part 1 - C-MORE


DOE JGI
Production Genomics Facility
Opened in 1999
~240 UC Employees
~$66M Annual Budget
~47 Tb for 2012
>37 Human genomes / day
Mission:
“User facility providing high-throughput DNA
sequencing & analysis in support of DOE missions in
alternative energy, carbon cycling & bioremediation.”
Sequencing units, staff, budget and output by fiscal year:
FY2009: 49 Units, 24 FTEs, $11M, 1Tb
FY2010: 22 Units, 15 FTEs, $11M, 6Tb
FY2011: 15 Units, 9 FTEs, $8M, 29Tb
[Chart: ABI3730xl, Roche/454, GAIIx and HiSeq units, 2009-2011; axes: Staff (FTE), Budget ($ Millions), Output (Trillions Bases)]
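The yearly figures on this slide (FY2009: $11M and 1Tb; FY2010: $11M and 6Tb; FY2011: $8M and 29Tb) can be turned into a rough budget-per-output ratio. The sketch below is only arithmetic on those slide numbers; the names `yearly`, `cost_per_tb` and `fold_drop` are illustrative, and it assumes the listed budget and output refer to the same operation, which the slide does not state.

```python
# Figures as listed on the slide: (budget in $ millions, output in trillions of bases).
yearly = {
    "FY2009": (11, 1),
    "FY2010": (11, 6),
    "FY2011": (8, 29),
}


def cost_per_tb(budget_musd, output_tb):
    """Return budget divided by output, in $ millions per trillion bases."""
    return budget_musd / output_tb


costs = {year: cost_per_tb(*figures) for year, figures in yearly.items()}
for year, cost in costs.items():
    print(f"{year}: ${cost:.2f}M per Tb")

# How many times cheaper a trillion bases was in FY2011 than in FY2009.
fold_drop = costs["FY2009"] / costs["FY2011"]
print(f"FY2009 to FY2011: about {fold_drop:.0f}x lower budget per Tb")
```

On these numbers the budget per trillion bases falls from $11M to under $0.3M in two years, while the unit count and staff shrink.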
DOE JGI
Community Sequencing Program
>700 Users
[Diagram labels: CSP 1, CSP 2, CSP 3, CSP 4, FOCUS AREA]
Program Science
2012 - Focus Areas
Focus areas: PLANT-MICROBE INTERACTIONS; SOIL CARBON CYCLING; AQUATIC CARBON CYCLING
Projects (investigator: system):
Dangl: Arabidopsis
Tuskan: Poplar
Gross: Agave
Fierer: Grassland soil carbon cycling
Brodie: Mediterranean grassland
Mohn: American forest soils
Rodrigues: Amazon deforestation
Andersson: Lichen
McMahon: Crystal Bog Lake
McKay: Lake Erie
Gilbert: Gulf of Mexico
Moran: Ocean heterotrophs
Also listed: Boechera, Miscanthus
Microbial Genomics
Nikos Kyrpides
Microbial Genomics and Metagenomics Programs
DOE Joint Genome Institute
Overview of the talk
• Historical perspective
• Do we need more sequencing?
• Major Transitions in Genomics
The Quest for Darwin’s Grail
"To understand life
[unlike understanding an electron]
you must understand its history"
Carl R. Woese
"The time will come I believe,
though I shall not live to see it,
when we shall have fairly true
genealogical trees of each great
kingdom of nature"
[in a letter to T.H. Huxley]
Charles Darwin, 1857
1st Edition – 1957
“It is a waste of time to attempt a natural system of classification for bacteria”
2nd Edition – 1963
“The only possible conclusion is, accordingly that the ultimate scientific goal of biological classification cannot be achieved in the case of bacteria”
3rd Edition – 1970
“For bacteria, the general course of evolution will probably never be known, and there is simply not enough objective evidence to base their classification on phylogenetic grounds”
4th Edition – 1977
Verbatim repeat of the 3rd edition
1978 – 1st Universal Tree of Life
Microbial Genomics
“Genome sequencing has come of age, and
genomics will become central to microbiology’s
future. It may appear at the moment that the
human genome is the main focus and primary goal
of genome sequencing, but do not be deceived.
The real justification in the long run, is
microbial genomics.”
Carl Woese
1998
GREAT CHALLENGES
P. Chain et al. Science, 2009
Genome Sequencing Projects on GOLD: >15,000 projects
[Chart: number of projects, Complete vs. Incomplete; y-axis 0 to 7000]
[Table, layout lost: periods 1995-2009, 2010-2013, 2010-2015; rows Finished, Draft, Genes; values as extracted: 1000, 1000, 3,000, 3000, 20,000, 10000, 8 Million, 92, 52 Million]
Why we need more sequencing
[Figure labels: Fructokinase family, Ribokinase family, 2-dehydro-3-deoxy glucokinase family]
Genome projects 2000 (71 bacterial genomes): Proteobacteria 45%, Low G+C gram positives 24%, Actinobacteria 7%, other divisions 24%
Genome projects 2010 (5500 bacterial genomes): Proteobacteria 39%, Firmicutes 28%, other phyla 18%, Actinobacteria (labelled 11% or 18%; the pairing is unclear)
Poor sequence coverage mainly due to lack of isolates, but many gaps have unsequenced representatives
Norman Pace
Culturable vs. Unculturable
• 99% of microorganisms are not culturable with present methods.
[Figure: species complexity of metagenomes. Environments: Sargasso Sea, Acid Mine Drainage, Termite Hindgut, Soil, Human Gut; axis: Species complexity (1, 10, 100, 1000, 1000s, 10000); labels: Reference Genomes, Binning, ?]
The road to success in Metagenomics is through Microbial Genomics
Source: Susannah Tringe, JGI
[Figure: Reference Genomes by environment. Environments: Human gut, Acid Mine Drainage, Marine, Termite Gut, Soil; values as extracted: 100%, 60%, 50%, 40%, 20%, ?, 1%]
[Chart: Known diversity of cultured vs uncultured organisms; axes: 16S rRNA Distance, Number of Organisms]
[Chart: Coverage of Cultured Microbes with Genome Projects; axes: 16S rRNA Distance, Number of Organisms]
GEBA (Genomic Encyclopedia of Bacteria and Archaea)
Goal: Filling in the gaps in sequencing along the bacterial and archaeal branches of the Tree of Life
Status:
• >120 Complete
• >100 Draft
• >100 In progress
Wu et al. Nature 2009
http://img.jgi.doe.gov/GEBA
Improved gene annotation and characterization of hypothetical gene families, based on:
(a) novel gene fusions: 7.7 times more fusion events than in any other randomly selected set of 56 genomes
(b) novel gene neighbourhoods: 4.3 times more novel gene neighbourhoods than in any other randomly selected set of 56 genomes
(c) novel connections of protein families: 47 times more novel connections of protein families than in any other randomly selected set of 56 genomes
Mavromatis
Amrita Pati
Wu et al. Nature 2009
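The "times more than randomly selected 56 genomes" comparisons on this slide can be pictured as counting novel features in a chosen genome set and dividing by the count from random sets of the same size. What follows is a minimal sketch under stated assumptions, not the method of Wu et al.: the functions `novel_features` and `fold_vs_random`, the use of a mean over random draws, and all of the toy data are hypothetical.

```python
import random
from statistics import mean


def novel_features(selected, features_by_genome, known):
    """Count features found in the selected genomes but absent from `known`."""
    found = set()
    for genome in selected:
        found |= features_by_genome[genome]
    return len(found - known)


def fold_vs_random(target, pool, features_by_genome, known, trials=100, seed=0):
    """Novel features in `target` divided by the mean over random same-size sets."""
    rng = random.Random(seed)
    target_count = novel_features(target, features_by_genome, known)
    random_counts = [
        novel_features(rng.sample(pool, len(target)), features_by_genome, known)
        for _ in range(trials)
    ]
    return target_count / mean(random_counts)


# Toy data, invented for illustration only (not from the slides or the paper).
known = {"k1", "k2"}
features_by_genome = {
    "t1": {"k1", "n1", "n2", "n3"},
    "t2": {"k2", "n4", "n5", "n6"},
    "r1": {"k1", "m1"},
    "r2": {"k2", "m2"},
    "r3": {"k1", "m3"},
    "r4": {"k2", "m4"},
}
target = ["t1", "t2"]            # stands in for the phylogenetically chosen set
pool = ["r1", "r2", "r3", "r4"]  # stands in for all other sequenced genomes

fold = fold_vs_random(target, pool, features_by_genome, known)
print(fold)
```

In the toy data every random pair contributes two novel features and the target pair contributes six, so the ratio does not depend on the seed.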
[Chart: 16S distance, cultured / uncultured: 15% / 85%; ~ 35,000 OTUs; Rob Knight]
Wu et al. Nature 2009
The Microbial Earth Project
STEERING COMMITTEE
Nikos Kyrpides, DOE-JGI
George Garrity, Names4Life
Hans-Peter Klenk, DSMZ
Phil Hugenholtz, JGI
Victor Kunin, JGI
Dino Liolios, GOLD
Microbial Earth Project
Strain / species diversity
Prochlorococcus marinus Pangenome, Listeria monocytogenes Pangenome, Staphylococcus aureus Pangenome (figure labels 10, 17, 15)
Best Blast Hit with Pangenomes (Reference Genome vs. Pangenome):
14765 / 2733 = 5.4
10434 / 5820 = 1.8
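The two ratios on this slide are consistent with the four counts shown: 14765 / 2733 ≈ 5.4 and 10434 / 5820 ≈ 1.8. The sketch below checks that arithmetic and adds a toy illustration of why a pangenome can recruit more best hits than a single reference genome. Reading the counts as pangenome versus reference hit counts is an assumption, since the figure itself did not survive, and the strain names and gene-family labels are invented.

```python
# Counts as they appear on the slide; which species each pair belongs to is not recoverable.
pairs = [(14765, 2733), (10434, 5820)]
ratios = [round(larger / smaller, 1) for larger, smaller in pairs]
print(ratios)  # [5.4, 1.8]

# Hypothetical strains with made-up gene-family labels (not from the slides).
strains = {
    "strain_A": {"fam1", "fam2", "fam3"},
    "strain_B": {"fam1", "fam2", "fam4"},
    "strain_C": {"fam1", "fam5"},
}

# Pangenome: every gene family seen in any strain of the species.
pangenome = set().union(*strains.values())
# Core genome: gene families shared by all strains.
core = set.intersection(*strains.values())

reference = strains["strain_A"]
print(len(pangenome), len(reference), len(core))
```

Here the pangenome holds five families against three in the single reference strain, so a query matching `fam4` or `fam5` only finds a hit when the pangenome is used.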
PANGENOMES
HOW REPRESENTATIVE OF THE SPECIES IS THE PANGENOME?
1. How representative is the pangenome phylogenetically?
[Figure labels: PANGENOME, 16S DIVERSITY]
2. How representative is it geographically?
1960-1990
PARADIGM SHIFT
1990-2010
16S rRNA
2010-2020
Genomes
Pangenomes
Cell isolation → Amplification → 16S Screening → Sequencing