The Tree of Life Viewed by Protein Domain Content

From Reductionism Comes New Science: Protein Structure Data Reveals How Environmental Pressures Shape Evolution
PHAR 201/Bioinformatics I
Philip E. Bourne
Department of Pharmacology, UCSD
PHAR 201 Lecture 08, 2012
Introduction
• Previously we reviewed one system of reductionism – SCOP
• SCOP is used to assign superfamilies and families to complete proteomes in another resource called SUPERFAMILY
• Today we will see how this is used to do new science (Dupont et al. PNAS 2006 103(47) 17822-17827; PNAS 2010 doi: 10.1073/pnas.0912491107)
• We cast this new science in the context of the Gaia hypothesis
The SCOP Hierarchy v1.75, Based on 38221 Structures
[Figure: Class (7) > Fold (1195) > Superfamily (1962) > Family (3902) > Domain (110800)]
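The identifiers used later in these slides (a.1.1, d.17.6, ...) are SCOP concise classification strings (sccs): each dot-separated field descends one level of the hierarchy above, from class to fold to superfamily to family. A minimal Python sketch of splitting one into its levels; the `Sccs` type and `parse_sccs` function are illustrative names, not part of any SCOP tool.

```python
from typing import NamedTuple, Optional


class Sccs(NamedTuple):
    """Hierarchy levels encoded in a SCOP sccs such as 'a.1.1.2'."""
    klass: str                  # class letter, e.g. "a"
    fold: str                   # class + fold, e.g. "a.1"
    superfamily: Optional[str]  # e.g. "a.1.1" (None if not given)
    family: Optional[str]       # e.g. "a.1.1.2" (None if not given)


def parse_sccs(sccs: str) -> Sccs:
    """Split a SCOP concise classification string into its hierarchy levels."""
    parts = sccs.split(".")
    valid = (
        2 <= len(parts) <= 4
        and len(parts[0]) == 1 and parts[0].isalpha()
        and all(p.isdigit() for p in parts[1:])
    )
    if not valid:
        raise ValueError(f"not a SCOP sccs: {sccs!r}")
    # Each level is the prefix of the string up to and including that field.
    levels = [".".join(parts[: i + 1]) for i in range(len(parts))]
    levels += [None] * (4 - len(levels))
    return Sccs(*levels)


print(parse_sccs("d.17.6"))
# Sccs(klass='d', fold='d.17', superfamily='d.17.6', family=None)
```

A three-field string such as d.17.6 therefore names a superfamily, and a four-field string names a family within it.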
The Gaia Hypothesis
Gaia (pronounced /'geɪ.ə/ or /'gaɪ.ə/), from the Greek Γαῖα, "land" or "earth", is the Greek goddess personifying the Earth.
Gaia: a complex entity involving the Earth's biosphere, atmosphere, oceans, and soil; the totality constituting a feedback system which seeks an optimal physical and chemical environment for life on this planet.
James Lovelock
We Show Some Support for the Gaia Hypothesis
• Emergent properties of an organism have been influenced by the environment
• These organisms in turn have influenced the environment
Nature’s Reductionism
There are ~20^300 possible proteins of 300 residues, vastly more than all the atoms in the Universe
11.2M protein sequences from 10,854 species (source: RefSeq)
38,221 protein structures yield 1195 domain folds (SCOP 1.75)
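The size of sequence space can be checked directly: with 20 amino acids at each of 300 positions there are 20^300 distinct chains. The comparison figure of ~10^80 atoms in the observable Universe is a commonly quoted order-of-magnitude estimate, used here as an assumption rather than taken from the slide.

```python
import math

AMINO_ACIDS = 20
LENGTH = 300  # chain length implied by the 20^300 figure

# Exact integer: Python ints have arbitrary precision.
n_sequences = AMINO_ACIDS ** LENGTH
digits = len(str(n_sequences))
log10_value = LENGTH * math.log10(AMINO_ACIDS)

# Assumed order-of-magnitude estimate: ~10^80 atoms in the observable Universe.
LOG10_ATOMS_IN_UNIVERSE = 80

print(f"20^300 has {digits} digits (~10^{log10_value:.1f})")
print(f"it exceeds 10^80 by a factor of ~10^{log10_value - LOG10_ATOMS_IN_UNIVERSE:.0f}")
```

Sequence space (~10^390) therefore dwarfs the atom count by roughly 310 orders of magnitude, while the observed structures collapse into about a thousand folds.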
What Does Nature’s Reductionism Tell Us?
• The advent of a new fold is a big deal
• From new folds come new function(s)
• Are these new folds enough to distinguish “species”?
To Answer this Question We Only Need to Make Use of Existing Resources
• SCOP – further catalogs Nature’s reductionism into structural domains, folds, families and superfamilies
• SUPERFAMILY assigns the above to fully sequenced proteomes
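Method – Distance Determination
SUPERFAMILY assigns SCOP fold superfamilies (FSF) to each organism's proteome, giving a presence/absence data matrix; pairwise distances between organisms form a distance matrix.

Presence/absence data matrix (excerpt; 1 = FSF present)
FSF       C. intestinalis  C. briggsae  F. rubripes
a.1.1     1                1            1
a.1.2     1                1            1
a.10.1    0                0            1
a.100.1   1                1            1
a.101.1   0                0            0
a.102.1   0                1            1
a.102.2   1                1            1

Distance matrix
                 C. intestinalis  C. briggsae  F. rubripes
C. intestinalis  0                101          109
C. briggsae                       0            144
F. rubripes                                    0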
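One simple way to turn the presence/absence matrix into distances is to count the superfamilies present in exactly one of the two proteomes (a Hamming distance). This is a minimal sketch under that assumption: the slide does not state which distance measure Yang, Doolittle and Bourne used, and the 101/109/144 values come from whole proteomes, so they should not be expected to follow from this excerpt or necessarily from this exact formula.

```python
from itertools import combinations

# Excerpt of the presence/absence matrix above; columns follow ORGANISMS.
ORGANISMS = ["C. intestinalis", "C. briggsae", "F. rubripes"]
MATRIX = {
    "a.1.1":   (1, 1, 1),
    "a.1.2":   (1, 1, 1),
    "a.10.1":  (0, 0, 1),
    "a.100.1": (1, 1, 1),
    "a.101.1": (0, 0, 0),
    "a.102.1": (0, 1, 1),
    "a.102.2": (1, 1, 1),
}


def fsf_distance(matrix, i, j):
    """Number of FSFs present in exactly one of organisms i and j (Hamming distance)."""
    return sum(row[i] != row[j] for row in matrix.values())


def distance_matrix(matrix, organisms):
    """Upper triangle of the distance matrix as {(org_a, org_b): distance}."""
    return {
        (organisms[i], organisms[j]): fsf_distance(matrix, i, j)
        for i, j in combinations(range(len(organisms)), 2)
    }


for pair, d in distance_matrix(MATRIX, ORGANISMS).items():
    print(pair, d)
# ('C. intestinalis', 'C. briggsae') 1
# ('C. intestinalis', 'F. rubripes') 2
# ('C. briggsae', 'F. rubripes') 1
```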
The Answer Would Appear to be Yes
• It is possible to generate a reasonable tree of life from merely the presence or absence of superfamilies within a given proteome
Yang, Doolittle and Bourne 2005 PNAS 102(2): 373-378
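Once a distance matrix exists, any distance-based method can turn it into a tree. UPGMA is swapped in here only because it is the simplest such method; the slide does not say which tree-building method the study used. This sketch repeatedly merges the closest pair of clusters, places the merge at half their distance, and averages distances to the new cluster weighted by cluster size. The input is the three-species distance matrix from the method slide.

```python
def upgma(names, dist):
    """Build an ultrametric tree by UPGMA and return it as a Newick-style string.

    names: leaf labels; dist: {frozenset({a, b}): distance} for every pair.
    """
    # cluster label (its Newick text) -> (number of leaves, height of its root)
    clusters = {name: (1, 0.0) for name in names}
    d = dict(dist)
    while len(clusters) > 1:
        pair = min(d, key=d.get)  # closest pair of clusters
        a, b = sorted(pair)
        (size_a, h_a), (size_b, h_b) = clusters.pop(a), clusters.pop(b)
        height = d.pop(pair) / 2
        merged = f"({a}:{height - h_a:g},{b}:{height - h_b:g})"
        # Distance to the new cluster = size-weighted mean of its parts' distances.
        for other in clusters:
            d_a = d.pop(frozenset({a, other}))
            d_b = d.pop(frozenset({b, other}))
            d[frozenset({merged, other})] = (size_a * d_a + size_b * d_b) / (size_a + size_b)
        clusters[merged] = (size_a + size_b, height)
    (root,) = clusters
    return root + ";"


DIST = {
    frozenset({"C. intestinalis", "C. briggsae"}): 101,
    frozenset({"C. intestinalis", "F. rubripes"}): 109,
    frozenset({"C. briggsae", "F. rubripes"}): 144,
}
tree = upgma(["C. intestinalis", "C. briggsae", "F. rubripes"], DIST)
print(tree)
# ((C. briggsae:50.5,C. intestinalis:50.5):12.75,F. rubripes:63.25);
```

Strict Newick parsers may need labels containing spaces to be quoted or written with underscores.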
Moreover…
Distribution of SCOP folds among the three kingdoms, as taken from SUPERFAMILY
[Figure: Venn diagram of SCOP folds (765 total) across Eukaryota (650), Archaea (416) and Bacteria (564); region counts annotated as any genome / all genomes]
• Superfamily distributions would seem to be related to the complexity of life
• Update of the work of Caetano-Anollés and Caetano-Anollés (2003) Genome Research 13:1563
Yang, Doolittle & Bourne (2005) PNAS 102(2) 373-8
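The numbers in a Venn diagram like this one come from grouping each fold by the exact set of superkingdoms whose proteomes contain it. The following is a minimal sketch of that partition; the fold identifiers in the example are hypothetical placeholders, not SCOP data.

```python
def venn_regions(sets):
    """Count items by the exact combination of named sets that contain them.

    sets: {name: set of items}. Returns {frozenset of names: item count},
    i.e. the numbers written inside the regions of a Venn diagram.
    """
    regions = {}
    for item in set().union(*sets.values()):
        members = frozenset(name for name, s in sets.items() if item in s)
        regions[members] = regions.get(members, 0) + 1
    return regions


# Hypothetical fold identifiers, for illustration only.
folds = {
    "Eukaryota": {"f1", "f2", "f3", "f4"},
    "Archaea": {"f2", "f5"},
    "Bacteria": {"f1", "f2", "f5", "f6"},
}
regions = venn_regions(folds)
for members, n in regions.items():
    print(sorted(members), n)
```

Summing every region that lies inside one circle recovers that superkingdom's total (here 4 for Eukaryota), which is how the per-kingdom totals on the slide relate to the region counts.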
The Unique Superfamily in Archaea – d.17.6
• Archaeosine tRNA-guanine transglycosylase (tgt), C2 domain
• Catalyzes the first step in the biosynthesis of an archaea-specific modified base, archaeosine (7-formamidino-7-deazaguanosine)
• Found in tRNAs
• At present found exclusively in Archaea
Reference: InterPro IPR004804
Let us Take This a Step Further
Consider the Distribution of Disulfide Bonds among Folds
• Disulfides are only stable under oxidizing conditions
• Oxygen gradually accumulated during the Earth’s evolution
• The divergence of the three kingdoms occurred 1.8-2.2 billion years ago
• Oxygen began to accumulate ~2.0 billion years ago
• Logical deduction: disulfides should be more prevalent in folds (organisms) that evolved later
• This would seem to hold true
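Percentage of SCOP folds containing disulfides, by distribution across the three kingdoms (SCOP folds, 708 total):
Eukaryota only             31.9% (43/135)
Eukaryota and Archaea       0%   (0/10)
Archaea only                0%   (0/2)
Eukaryota and Bacteria     14.4% (17/118)
Archaea and Bacteria        5.9% (1/17)
Bacteria only              16.7% (7/42)
All three kingdoms          4.7% (18/387)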
• Can we take this further?
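The region assignment of the disulfide table can be checked arithmetically: the folds in each kingdom's regions should add up to the totals on the earlier Venn diagram (Eukaryota 650, Archaea 416, Bacteria 564), and the percentages should follow from the counts. A short Python check using only the numbers from the table:

```python
# Regions of the disulfide table:
# kingdoms sharing the folds -> (folds with disulfides, folds in region)
REGIONS = {
    frozenset({"Eukaryota"}): (43, 135),
    frozenset({"Eukaryota", "Archaea"}): (0, 10),
    frozenset({"Archaea"}): (0, 2),
    frozenset({"Eukaryota", "Bacteria"}): (17, 118),
    frozenset({"Archaea", "Bacteria"}): (1, 17),
    frozenset({"Bacteria"}): (7, 42),
    frozenset({"Eukaryota", "Archaea", "Bacteria"}): (18, 387),
}


def percent_with_disulfides(regions):
    """Percentage of folds with disulfides in each region, to one decimal."""
    return {r: round(100 * k / n, 1) for r, (k, n) in regions.items()}


def folds_per_superkingdom(regions):
    """Sum the region sizes inside each kingdom's circle."""
    totals = {}
    for region, (_, n) in regions.items():
        for kingdom in region:
            totals[kingdom] = totals.get(kingdom, 0) + n
    return totals


print(folds_per_superkingdom(REGIONS))
for region, pct in percent_with_disulfides(REGIONS).items():
    print(sorted(region), pct)
```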
Recap So Far
• Structure is a useful tool to study evolution since it is conserved over longer periods of geological time than sequence
• A coarse-grained characterization of structure, namely superfamily, distinguishes between species
• There is a tantalizing suggestion that proteomes may contain imprints of their ancient environment
Consider Changes in Metal Ion Concentrations
Chris Dupont, Scripps Institution of Oceanography (now JCVI)
Bioinformatics Final Exam 2004
Dupont, Yang, Palenik, Bourne. PNAS 2006 103(47) 17822-17827; PNAS 2010 doi: 10.1073/pnas.0912491107
Evolution of the Earth
• 4.5 billion years of change
• 300 ± 50 K
• 1-5 atmospheres
• Constant photoenergy
• Chemical and geological changes
• Life has evolved in this time
• The ocean was the “cradle” for 90% of evolution
Theoretical Levels of Trace Metals and Oxygen in the Deep Ocean Through Earth’s History
[Figure: oxygen, zinc, iron, cobalt and manganese concentrations in the deep ocean plotted against billions of years before present (4.5 to 0), with phylogenetic tree symbols for Bacteria, Archaea and Eukarya; O2 in arbitrary units, Zn and Fe in mol L⁻¹]
• Whether the deep ocean became oxic or euxinic following the rise in atmospheric oxygen (~2.3 Gya) is debated, so both scenarios are shown (oxic ocean: solid lines; euxinic ocean: dashed lines).
• The phylogenetic tree symbols at the top of the figure show one proposal for the periods of diversification of each Superkingdom.
Replotted from Saito et al., 2003, Inorganica Chimica Acta 356: 308-318
Making the Metallome of Each Species – Can Only be Done from Structure
1. Start with SCOP
2. Each (super)family-level assignment was checked manually for metal binding
3. All the structures representing the family had to bind the metal for it to be considered unambiguous
4. The literature was consulted to resolve ambiguities
5. The SUPERFAMILY database was used to map families to proteomes
6. 23 Archaea, 233 Bacteria and 57 Eukaryota were analyzed
7. Cu, Ni and Mo were ignored (<0.3% of proteome)
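Steps 5 to 7 amount to a tally: for each proteome, count the domains whose family has a known, unambiguous metal, skipping the rare metals. A minimal sketch, assuming each proteome is available as a list with one SCOP family per assigned domain; the family labels below are hypothetical placeholders, not real SCOP identifiers.

```python
from collections import Counter


def metallome(domain_families, family_metal):
    """Tally metal-binding domains in one proteome.

    domain_families: the SCOP family assigned to each domain of the proteome.
    family_metal: unambiguous family -> metal it binds; ambiguous or
    non-metal families are simply absent, so their domains are not counted.
    Returns (Counter of metal -> domain count, total number of domains).
    """
    counts = Counter(family_metal[f] for f in domain_families if f in family_metal)
    return counts, len(domain_families)


IGNORED = {"Cu", "Ni", "Mo"}  # dropped in the study (<0.3% of proteome)

# Hypothetical family labels and domain assignments, for illustration only.
family_metal = {"famZn1": "Zn", "famFe1": "Fe", "famFe2": "Fe", "famCu1": "Cu"}
family_metal = {f: m for f, m in family_metal.items() if m not in IGNORED}
proteome = ["famZn1", "famZn1", "famFe1", "famOther", "famFe2", "famCu1"]

counts, total = metallome(proteome, family_metal)
print(counts, total)
```

Keeping the total domain count alongside the metal counts is what later allows metal-binding domains to be plotted against proteome size.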
Levels of Ambiguity
• A superfamily is ambiguous if its members bind different metals or include members that are not known to bind metals
• The same criterion applies to families
• Approximately 50% of superfamilies and 10% of families are ambiguous
• Only unambiguous families were used in this study
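The ambiguity rule can be written as a single test over a family's representative structures: the family is kept only if every structure binds one and the same metal. A minimal sketch, assuming each structure has already been annotated with the metal it binds (or `None` when no metal is known); the function name is illustrative.

```python
from typing import Iterable, Optional


def unambiguous_metal(member_metals: Iterable[Optional[str]]) -> Optional[str]:
    """Return the single metal bound by every member structure, else None.

    member_metals: the metal bound by each representative structure,
    or None for a structure not known to bind a metal.
    """
    metals = set(member_metals)
    if len(metals) == 1 and None not in metals:
        return metals.pop()
    return None  # ambiguous: mixed metals, non-binders, or no members


print(unambiguous_metal(["Zn", "Zn", "Zn"]))  # Zn
print(unambiguous_metal(["Zn", "Fe"]))        # None
```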
Superfamily Distribution As Well As Overall Content Has Changed
[Figure: Fe-binding superfamilies in Bacteria and in Eukarya, plotted side by side and labeled by SCOP superfamily identifier (a.1.1 through g.41.5)]
Metallomes are Very Diverse (Discriminatory)
[Figure: quantile plot over the 108 unique Fe-binding fold families; percent of Bacterial proteomes in which each fold family occurs (x) and average copy number (♦)]
• A quantile plot showing the percentage of Bacterial proteomes in which each Fe-binding fold family occurs (x).
• The plot also shows the average copy number of that fold family in the proteomes where it occurs (♦).
• Few Fe-binding folds are found in most proteomes.
• Widespread Fe-binding folds are not necessarily abundant.
• Similar trends are observed for Zn, Mn and Co in all three Superkingdoms.
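Both series in the quantile plot can be computed from per-proteome copy numbers: the prevalence of a fold family (share of proteomes containing it) and its mean copy number over only those proteomes. A minimal sketch, assuming copy numbers are available as a dict per proteome; the proteome and family names are hypothetical.

```python
from statistics import mean


def fold_family_stats(proteomes):
    """proteomes: {proteome name: {family: copy number}}.

    For each family, return (percent of proteomes containing it,
    average copy number over the proteomes where it occurs).
    """
    n = len(proteomes)
    families = {f for counts in proteomes.values() for f, c in counts.items() if c > 0}
    stats = {}
    for fam in families:
        copies = [counts[fam] for counts in proteomes.values() if counts.get(fam, 0) > 0]
        stats[fam] = (100 * len(copies) / n, mean(copies))
    return stats


def quantile_order(stats):
    """Families sorted from most to least widespread, as along the plot's x-axis."""
    return sorted(stats, key=lambda f: stats[f][0], reverse=True)


# Hypothetical copy numbers, for illustration only.
proteomes = {
    "p1": {"famA": 1, "famB": 6},
    "p2": {"famA": 2},
    "p3": {"famA": 1, "famC": 1},
    "p4": {"famA": 3},
}
stats = fold_family_stats(proteomes)
for fam in quantile_order(stats):
    print(fam, stats[fam])
```

In this toy input famA is in every proteome but has a low copy number, while famB is rare but abundant where present, the same "widespread is not necessarily abundant" pattern described above.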
Metal Binding Proteins are Not Consistent Across Superkingdoms
[Figure: (A) total Zn-binding domains in a proteome versus total domains in a proteome, log-log axes; (B) slope of the fitted power law for Zn, Fe, Mn and Co; both panels for Archaea, Bacteria and Eukarya]
Since these data are derived from current species, they are independent of evolutionary events such as duplication, gene loss, horizontal transfer and endosymbiosis
Power Laws: Fundamental Constants in the Evolution of Proteomes
A slope of 1 indicates that a group of structural domains is in equilibrium with genome growth, while a slope > 1 indicates that the group of domains is being preferentially duplicated (or retained in the case of genome reductions).
van Nimwegen E (2006) in: Koonin EV, Wolf YI, Karev GP (Eds.), Power Laws, Scale-Free Networks and Genome Biology
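A power law N_metal ≈ c · N_total^b becomes a straight line on log-log axes, log N_metal = log c + b · log N_total, so the slope b can be estimated by ordinary least squares on the logarithms. A minimal sketch assuming plain OLS on log10 values; the slides do not give the fitting procedure actually used, and the input below is synthetic data generated with b = 1.2.

```python
import math


def power_law_slope(total_domains, metal_domains):
    """Fit metal_domains ~ c * total_domains**b by least squares in log-log space.

    Returns (b, c). b == 1: the group grows in step with the proteome;
    b > 1: the group is preferentially duplicated (or retained).
    """
    xs = [math.log10(x) for x in total_domains]
    ys = [math.log10(y) for y in metal_domains]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    c = 10 ** (my - b * mx)
    return b, c


# Synthetic proteomes generated with c = 0.01 and b = 1.2.
totals = [1000, 2000, 5000, 10000, 50000]
zn_domains = [0.01 * t ** 1.2 for t in totals]
b, c = power_law_slope(totals, zn_domains)
print(f"slope b = {b:.3f}, prefactor c = {c:.3f}")
```

On Python 3.10 and later, `statistics.linear_regression` applied to the log values gives the same slope; the explicit sums are kept here so the calculation stays visible.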
Why are the Power Laws Different for Each Superkingdom?
• Power laws are likely influenced by selective pressure. Qualitatively, the differences in the power-law slopes describing Eukarya and Prokarya correlate with the shifts in trace metal geochemistry that occurred with the rise in oceanic oxygen
• We hypothesize that proteomes contain an imprint of the environment at the time of the last common ancestor in each Superkingdom
• This suggests that Eukarya evolved in an oxic environment, whereas the Prokarya evolved in anoxic environments
Do the Metallomes Contain Further Support for this Hypothesis?

Superkingdom  Fold family                    %             Fe-binding  O2
Eukarya       Cytochrome P450                0.44 ± 0.48   heme        yes
              Cytochrome c3-like             0.13 ± 0.3    heme        no
              Cytochrome b5                  0.12 ± 0.09   heme        no
              Purple acid phosphatase        0.11 ± 0.08   amino       no
              Penicillin synthase-like       0.07 ± 0.1    amino       yes
              Hypoxia-inducible factor       0.07 ± 0.04   amino       yes
              Di-heme elbow motif            0.06 ± 0.01   heme        no
Archaea       4Fe-4S ferredoxins             1.80 ± 0.7    Fe-S        no
              MoCo biosynthesis proteins     1.60 ± 0.3    Fe-S        no
              Heme-binding PAS domain        1.10 ± 1.0    heme        see note 1
              HemN                           0.80 ± 0.20   Fe-S        no
              α-helical ferredoxin           0.60 ± 0.16   Fe-S        no
              Biotin synthase                0.55 ± 0.1    Fe-S        no
              ROO N-terminal domain-like     0.5 ± 0.1     amino       see note 2
Bacteria      High potential iron protein    0.38 ± 0.25   Fe-S        no
              Heme-binding PAS domain        0.3 ± 0.4     heme        see note 1
              MoCo biosynthesis proteins     0.21 ± 0.15   Fe-S        no
              HemN                           0.2 ± 0.15    Fe-S        no
              4Fe-4S ferredoxins             0.2 ± 0.2     Fe-S        no
              Cytochrome c                   0.14 ± 0.2    heme        no
              α-helical ferredoxin           0.12 ± 0.09   Fe-S        no

Overall percent of Fe bound by:
Superkingdom  Fe-S      heme      amino
Eukarya       21 ± 9    47 ± 19   32 ± 12
Archaea       68 ± 12   13 ± 14   19 ± 6
Bacteria      47 ± 11   22 ± 12   31 ± 16

1. Some, but not all, PAS domains actually sense oxygen.
2. The rubredoxin:oxygen oxidoreductase (ROO) protein does not contact oxygen, but catalyzes an oxygen reduction pathway.
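The "overall percent of Fe bound by" rows report a mean ± spread across proteomes. One way to produce such numbers is to compute, per proteome, the share of Fe-binding domains using each binding mode, then take the mean and sample standard deviation across proteomes. This is a sketch under two assumptions not stated on the slide: each Fe-binding domain counts once (the study may instead have weighted by Fe atoms, since a cluster holds several), and the ± value is a standard deviation. The proteomes below are hypothetical.

```python
from statistics import mean, stdev

MODES = ("Fe-S", "heme", "amino")


def mode_percentages(domain_modes):
    """Percent of one proteome's Fe-binding domains using each binding mode.

    domain_modes: list with the binding mode of every Fe-binding domain.
    """
    n = len(domain_modes)
    return {m: 100 * domain_modes.count(m) / n for m in MODES}


def summarize(proteomes):
    """Mean and sample standard deviation of each mode's percentage across proteomes."""
    per_proteome = [mode_percentages(p) for p in proteomes]
    summary = {}
    for m in MODES:
        values = [p[m] for p in per_proteome]
        summary[m] = (mean(values), stdev(values))
    return summary


# Two hypothetical proteomes, for illustration only.
toy = [
    ["Fe-S", "Fe-S", "heme", "amino"],
    ["Fe-S", "heme", "heme", "amino"],
]
for mode, (avg, sd) in summarize(toy).items():
    print(f"{mode}: {avg:.1f} ± {sd:.1f}")
```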
e- Transfer Proteins
Same Broad Function, Same Metal, Different Chemistry Induced by the Environment?
Fe-S clusters                                Cytochromes
Fe bound by S                                Fe bound by heme (and amino acids)
Cluster held in place by Cys
Generally negative reduction potentials      Generally positive reduction potentials
Very susceptible to oxidation                Less susceptible to oxidation
The Importance of “Small Class” Zn Folds to Eukarya
[Figure: total “small class” Zn-binding domains versus total number of domains in a proteome (log-log axes) for Bacteria, Archaea and Eukarya; distribution of the 53 unique small class Zn families among the three Superkingdoms; inset: theoretical oxygen, zinc and iron concentrations through Earth’s history]
Hypothesis
• Emergence of cyanobacteria changed oxygen concentrations
• This impacted metal concentrations in the ocean
• Organisms used new metals in new ways to evolve new biological processes, e.g., complex signaling
• This in turn further impacted the environment
A Final Thought
Perhaps We Should Study Both the Life Sciences and Earth Sciences Together?