The Tree of Life Viewed by Protein Domain Content
Evolutionary Insights from
Protein Structure
Philip E. Bourne
University of California San Diego
Support Open Access – All the work here does
Dalhousie December 2007
Agenda
• Why is protein structure useful?
• Tree construction using protein structure
• One protein superfamily in more detail
• Environmental influence
• On-going work
– The role of calcium over time
– Applying structural domain combinations
– Co-evolution of kinases and phosphatases
Why is protein structure useful?
Figure: structures of PKA, ChaK (“Channel Kinase”), Phosphoinositide-3 Kinase (D) and Actin-Fragmin Kinase (E)
The Key is Nature’s
Reductionism
There are ~20^300 possible proteins,
far more than all the atoms in the Universe
~6.7M protein sequences from
4734 species (source RefSeq)
34,494 protein structures
yield 1086 folds (SCOP 1.73)
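To make the scale of this comparison concrete, the following sketch works in powers of ten. It assumes a chain of 300 residues drawn from 20 amino acids (read off the 20^300 figure) and a commonly quoted order-of-magnitude estimate of about 10^80 atoms in the observable Universe; that estimate is an assumption and not a number from the talk.

```python
import math

AMINO_ACIDS = 20          # standard amino acid alphabet
CHAIN_LENGTH = 300        # assumed chain length, taken from the exponent in 20^300
ATOMS_LOG10 = 80          # assumed order of magnitude for atoms in the Universe

# log10 of the number of possible sequences: 300 * log10(20)
sequences_log10 = CHAIN_LENGTH * math.log10(AMINO_ACIDS)

# Difference in orders of magnitude between sequence space and atom count
excess_orders = sequences_log10 - ATOMS_LOG10

print(f"possible sequences ~ 10^{sequences_log10:.0f}")
print(f"exceeds the atom estimate by ~{excess_orders:.0f} orders of magnitude")
```

Against that space, the 1086 folds reported for SCOP 1.73 illustrate the reductionism the slide refers to.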
It follows that structure is more
conserved than sequence
Hence, structure comparison
reveals relationships not
detectable from sequence alone
Stated another way, structure offers
the opportunity to look at more distant
evolutionary relationships
Potential Problems in Using
Structure on a Proteomic Scale
• Is structural space well enough populated?
• Is proteome coverage by structure with
current detection methods enough?
• Currently 50-70%
Initial Bold Question:
With this level of coverage and
assuming we know a high
percentage of all folds, is
structure useful in discriminating
species?
Tree Construction Using
Protein Structure
Russ Doolittle,
Professor
Center for Molecular Genetics
UCSD
Song Yang
Former Graduate Student
Department of Chemistry and
Biochemistry
UCSD
Yang, Doolittle & Bourne (2005) PNAS 102(2) 373-8
To Answer this Question We Only
Need to Make Use of Existing
Resources
• SCOP – Further catalogs Nature’s
reductionism into structural domains, folds,
families and superfamilies
• SUPERFAMILY assigns the above to fully
sequenced proteomes
Use of SCOP Superfamilies
Using structure, how do you distinguish
convergent versus divergent evolution?
The SCOP notion of SUPERFAMILY with
evidence of weak sequence relationships
can be used to discount convergence.
Structural Organization
SCOP v1.73
Classes: 7
Folds: 1086
Superfamilies: 1777
Families: 3464
Domains: 97178
Is Structure a Useful Discriminator? Maybe…
Distribution among the three kingdoms as taken from SUPERFAMILY
Figure: Venn diagram of SCOP folds (765 total) shared among Eukaryota (650), Archaea (416) and Bacteria (564), with region counts given as any genome / all genomes
• Superfamily distributions would seem to be related to the complexity of life
• Update of the work of Caetano-Anollés & Caetano-Anollés (2003) Genome Research 13:1563
The Unique Superfamily in Archaea – d.17.6
• Archaeosine tRNA-guanine transglycosylase (tgt), C2 domain
• First step in the biosynthesis of an archaea-specific modified base, archaeosine (7-formamidino-7-deazaguanosine)
• Found in tRNAs
• Was found exclusively in Archaea
Reference: InterPro IPR004804
Method – Distance Determination
Figure: flow from SCOP and SUPERFAMILY to a presence/absence data matrix (organisms by FSF), then to a distance matrix
Presence/Absence Data Matrix
FSF | C. intestinalis | C. briggsae | F. rubripes
a.1.1 | 1 | 1 | 1
a.1.2 | 1 | 1 | 1
a.10.1 | 0 | 0 | 1
a.100.1 | 1 | 1 | 1
a.101.1 | 0 | 0 | 0
a.102.1 | 0 | 1 | 1
a.102.2 | 1 | 1 | 1
Distance Matrix
Organism | C. intestinalis | C. briggsae | F. rubripes
C. intestinalis | 0 | 101 | 109
C. briggsae | | 0 | 144
F. rubripes | | | 0
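The step from the presence/absence matrix to the distance matrix can be sketched as follows. The slide does not state which distance measure was used, so a simple Hamming distance (number of FSFs present in one organism but not the other) is assumed here purely for illustration. The seven rows are the excerpt shown on the slide, so the output will not reproduce the 101/109/144 values, which come from the full matrix.

```python
# FSF order matches the excerpt on the slide.
fsfs = ["a.1.1", "a.1.2", "a.10.1", "a.100.1", "a.101.1", "a.102.1", "a.102.2"]

# 1 = superfamily present in the proteome, 0 = absent (values copied from the slide).
presence = {
    "C. intestinalis": [1, 1, 0, 1, 0, 0, 1],
    "C. briggsae":     [1, 1, 0, 1, 0, 1, 1],
    "F. rubripes":     [1, 1, 1, 1, 0, 1, 1],
}

def hamming(u, v):
    """Count positions where two presence/absence vectors differ (assumed measure)."""
    return sum(a != b for a, b in zip(u, v))

def distance_matrix(table):
    """Return a dict keyed by (organism, organism) with pairwise distances."""
    names = list(table)
    return {(p, q): hamming(table[p], table[q]) for p in names for q in names}

dist = distance_matrix(presence)
for (p, q), d in dist.items():
    if p < q:
        print(f"{p} vs {q}: {d}")
```

A matrix like this is the usual input to a distance-based tree-building method; which method the authors used is not stated in this transcript.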
Is Structure a Useful Discriminator? Yes
Figure: tree with Archaea, Bacteria and Eukaryota as separate groups
The method cleanly placed all species in their
correct superkingdoms
Presence/absence vs. Abundance
• Abundance fails to distinctly separate the three superkingdoms
• Presence/absence succeeds in distinctly separating the three superkingdoms
• Why?
– Emergence or loss of an FSF is a major evolutionary event
– Emergence of a new FSF may lead to 1-n new functions
– Gene loss is likely; FSF loss is less likely
– Horizontal gene transfer is only relevant if it introduces an FSF
– Not affected by gene duplication
– Coverage and sensitivity, while not perfect, are enough
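The point that presence/absence is unaffected by gene duplication can be shown with a small sketch. The copy numbers below are hypothetical and only the FSF identifiers come from the talk.

```python
def to_presence(abundance):
    """Collapse FSF copy numbers to presence (1) / absence (0)."""
    return {fsf: int(count > 0) for fsf, count in abundance.items()}

# Hypothetical copy numbers for one proteome (not data from the talk).
genome = {"a.1.1": 3, "a.10.1": 0, "a.102.1": 1}

# The same proteome after a hypothetical burst of duplication in a.1.1.
duplicated = {**genome, "a.1.1": 12}

# Abundance profiles differ, presence/absence profiles do not.
print(genome == duplicated)                            # abundance comparison
print(to_presence(genome) == to_presence(duplicated))  # presence/absence comparison
```

Only the gain of a previously absent FSF, or the loss of the last copy of one, changes the presence/absence profile.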
Trees of Archaea
Figure: our tree compared with the NCBI tree for 17 archaeal species. Clade labels: Crenarchaeota, Pyrococcus, Thermoplasma, Methanogen, Euryarchaeota
Species shown: Pyrococcus furiosus, Pyrococcus horikoshii, Pyrococcus abyssi, Thermoplasma volcanium, Thermoplasma acidophilum, Sulfolobus tokodaii, Sulfolobus solfataricus, Pyrobaculum aerophilum, Aeropyrum pernix, Halobacterium sp. NRC-1, Methanosarcina mazei, Methanosarcina acetivorans, Archaeoglobus fulgidus, Methanocaldococcus jannaschii, Methanopyrus kandleri, Methanobacterium thermoautotrophicum, Methanothermobacter thermautotrophicus
Our Tree of Bacteria
• 123 Bacteria
• Parasitic bacteria are not grouped with their full gene complement counterparts
• They are sorted into proper groupings that mirror the overall tree
• A few anomalies
Figure: tree of Bacteria. Clade labels: Clostridiales, Bacilli, Deinococcus, Actinobacteria, Planctomycetacia, Spirochaetes, βγ-proteobacteria, α-proteobacteria, Thermotogae, Fusobacteria, Bacteroidetes, Cyanobacteria, Chlorobia, ε-proteobacteria, Aquificales, Mollicutes (parasitic Firmicutes), parasitic Spirochaetes, parasitic α-proteobacteria, parasitic γ-proteobacteria, parasitic Actinobacteria, Chlamydiae
Eukaryotes – Anomalies May Point
to Genome Problems
Frog genome appears
contaminated with bacterial genes
A Closer Look at One Superfamily:
The Protein Kinase-Like Superfamily
Eric Scheeff
Scheeff & Bourne 2005 PLoS Comp. Biol. 1(5): e49
A Closer Look at
One Superfamily
The Protein Kinase-like Superfamily
• A large family important
to signal transduction in
eukaryotes and many
bacteria.
• Phosphotransferases:
transfer phosphate group
from ATP to Ser/Thr or
Tyr residue on target
protein, producing a
range of downstream
signaling effects.
• PKA: an example of a
typical protein kinase
(TPK) fold, shown in
“open book” format
The Protein Kinase-Like Superfamily
• A range of different families, all phosphotransferases
• A variety of different targets
• All possess a core cassette of elements shared with the TPKs:
– ATP binding
– Catalysis
• Structures can be highly variable, particularly in the substrate binding regions
Family | Structural Representative | Phosphorylates | Biological result
Typical Protein Kinases (TPKs) | Protein Kinase A (PKA) | Ser/Thr or Tyr residues of proteins | Range of signaling effects
Alpha kinases | Channel Kinase (ChaK) | Ser/Thr residues in alpha-helices | Range of signaling effects
Actin-Fragmin Kinase (AFK) | Actin-Fragmin Kinase (AFK) | Thr residue of actin | Control of actin polymerization
Phosphatidylinositol 3- and 4-kinases | Phosphatidylinositol 3-kinase (PI3K) | Phosphatidylinositol (PI), PI-phosphates, PI-bisphosphates | Range of second-messenger signaling effects
Phosphatidylinositol phosphate kinases | Phosphatidylinositol phosphate kinase (PIPK) | PI-phosphates | Range of second-messenger signaling effects
Choline/ethanolamine kinases | Choline Kinase (CK) | Choline | Part of pathway that eventually produces phosphatidylcholine, important constituent of membranes
Aminoglycoside Kinases | Aminoglycoside Kinases (AK) | Aminoglycoside antibiotics | Antibiotic resistance
Method
• Begin with a multiple structure alignment
using CE-MC (NAR 2004) of 30
“comparable” TPKs and APKs and
manually correct in a pair-wise manner
over a period of 1-2 person years
• Review the literature on each structure
• Review the associated sequence
alignments derived from structure
Figure: structures of PKA, ChaK (“Channel Kinase”), Phosphoinositide-3 Kinase (D) and Actin-Fragmin Kinase (E)
Can We Propose an Evolutionary History for the Protein Kinase-Like
Superfamily?
• Bayesian inference of phylogeny (MrBayes)
• Manual structure alignment produces very high-quality sequence alignment of diverse homologues
• But, sequence information too degraded to produce branching with sufficient support (i.e. a high posterior probability)
• Addition of a matrix of structural characteristics (similar to morphological characteristics) produces a well supported combined model
• Neither sequence nor structural characteristics alone are sufficient to produce a resolved tree; they must be used in combination.
Example columns:
1) Ion pair analogous to K72-E91 in PKA
2) α-Helix B present
3) State of α-Helix C (0: kinked, 1: straight)
4) State of Strand 4 (0: kinked, 1: straight)
5) α-Helix D present
PDB ID | Group | 1 | 2 | 3 | 4 | 5
1BO1 | Atypical | 0 | 0 | 0 | 0 | 1
1IA9 | Atypical | 1 | 1 | 1 | 1 | 0
1E8X | Atypical | 1 | 0 | 1 | 1 | 1
1CJA | Atypical | 1 | 0 | 1 | 1 | 1
1NW1 | Atypical | 1 | 0 | 1 | 0 | 0
1J7U | Atypical | 1 | 0 | 1 | 0 | 1
1CDK | AGC | 1 | 1 | 1 | 0 | 1
1O6L | AGC | 1 | 1 | 1 | 0 | 1
1OMW | AGC | 1 | 1 | 1 | 0 | 1
1H1W | AGC | 1 | 1 | 1 | 0 | 1
1MUO | Other | 1 | 1 | 1 | 0 | 1
1TKI | CAMK | 1 | 0 | 1 | 0 | 1
1JKL | CAMK | 1 | 0 | 1 | 0 | 1
1A06 | CAMK | 1 | 0 | 1 | 0 | 1
1PHK | CAMK | 1 | 0 | 1 | 0 | 1
1KWP | CAMK | 1 | 0 | 1 | 0 | 1
1IA8 | CAMK | 1 | 0 | 1 | 0 | 0
1GNG | CMGC | 1 | 0 | 1 | 0 | 1
1HCK | CMGC | 1 | 0 | 1 | 0 | 1
1JNK | CMGC | 1 | 0 | 1 | 0 | 1
1HOW | CMGC | 1 | 0 | 1 | 0 | 1
1LP4 | Other | 1 | 0 | 1 | 0 | 1
1F3M | STE | 1 | 0 | 1 | 0 | 1
1O6Y | Other | 1 | 0 | 1 | 0 | 1
1CSN | CK1 | 1 | 0 | 1 | 0 | 1
1B6C | TKL | 1 | 0 | 1 | 0 | 1
2SRC | TK | 1 | 0 | 1 | 0 | 1
1LUF | TK | 1 | 0 | 1 | 0 | 1
1IR3 | TK | 1 | 0 | 1 | 0 | 1
1M14 | TK | 1 | 0 | 1 | 0 | 1
1GJO | TK | 1 | 0 | 1 | 0 | 1
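A small sketch of why the structural characters cannot resolve the tree on their own: with only the five example columns, many structures share an identical character pattern. The rows below are a subset copied from the slide's matrix; the full character matrix used by the authors is not shown in this transcript, so this only illustrates the idea.

```python
from collections import defaultdict

# PDB ID -> (group, five example structural characters), subset of the slide's rows.
rows = {
    "1BO1": ("Atypical", "00001"),
    "1IA9": ("Atypical", "11110"),
    "1E8X": ("Atypical", "10111"),
    "1CJA": ("Atypical", "10111"),
    "1CDK": ("AGC", "11101"),
    "1O6L": ("AGC", "11101"),
    "1TKI": ("CAMK", "10101"),
    "1GNG": ("CMGC", "10101"),
    "2SRC": ("TK", "10101"),
}

def group_by_pattern(table):
    """Map each distinct character pattern to the structures that share it."""
    patterns = defaultdict(list)
    for pdb_id, (_group, chars) in table.items():
        patterns[chars].append(pdb_id)
    return dict(patterns)

patterns = group_by_pattern(rows)
for chars, ids in patterns.items():
    print(chars, ids)
```

Structures from different groups (CAMK, CMGC, TK) collapse onto one pattern here, which is consistent with the slide's statement that the characters must be combined with the sequence alignment.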
Proposed Evolutionary History for the Protein Kinase-Like Superfamily
• Suggests distinctive history for atypical kinases, as opposed to intermittent divergence from the typical protein kinases (TPKs)
• TPK portion of tree shows high degree of agreement with Manning tree
• Branching is supported by species representation of kinase families
Figure: proposed tree with tips labeled APH, AGC, CK, CAMK, AFK, CMGC, TKL, PI3K, CK1, TK, PIPKIIβ and ChaK
• Atypical kinase families: Blue
• Typical protein kinase groups (subfamilies): Red
• Branch labels: posterior probability of branch (values shown: 0.64, 0.97, 1.0, 0.85, 0.78)
Has the Environment had an Influence
on Modern Day Proteomes?
Chris Dupont
Scripps Institution of Oceanography
UCSD
Dupont, Yang, Palenik, Bourne. 2006 PNAS 103(47) 17822-17827
Environmental Influence
Consider the Distribution of Disulfide
Bonds among Folds
• Disulfides are only stable under oxidizing conditions
• Oxygen content gradually accumulated during the earth’s evolution
• The divergence of the three kingdoms occurred 1.8-2.2 billion years ago
• Oxygen began to accumulate ~2.0 billion years ago
• Logical deduction – disulfides more prevalent in folds (organisms) that evolved later
• This would seem to hold true
Figure: Venn diagram of SCOP folds (708 total) across Eukaryota, Archaea and Bacteria, with the disulfide share in each region: 31.9% (43/135), 0% (0/10), 0% (0/2), 4.7% (18/387), 14.4% (17/118), 5.9% (1/17), 16.7% (7/42)
• Can we take this further?
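The percentages in the diagram are simple ratios of the two counts given in parentheses. A minimal check of that arithmetic, using only count pairs that appear on the slide (which Venn region each pair belongs to is not assumed here):

```python
# (numerator, denominator) pairs copied from the slide, in the order they appear.
region_counts = [(43, 135), (0, 10), (0, 2), (18, 387), (17, 118), (1, 17), (7, 42)]

def percent(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)

percentages = [percent(p, w) for p, w in region_counts]
for (p, w), pct in zip(region_counts, percentages):
    print(f"{p}/{w} = {pct}%")
```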
Theoretical Levels of Trace Metals and Oxygen in
the Deep Ocean Through Earth’s History
Figure: concentration (O2 in arbitrary units, Zn and Fe in moles L-1) of oxygen, zinc, iron, cobalt and manganese against billions of years before present (4.5 to 0), with symbols for Bacteria, Archaea and Eukarya
• Whether the deep ocean became oxic or euxinic following the rise in atmospheric oxygen (~2.3 Gya) is debated, therefore both are shown (oxic ocean: solid lines, euxinic ocean: dashed lines).
• The phylogenetic tree symbols at the top of the figure show one idea as to the theoretical periods of diversification for each Superkingdom.
Replotted from Saito et al, 2003 Inorganica Chimica Acta 356: 308-318
Making the Metallome of Each Species
– Can Only be Done from Structure
1. Start with SCOP
2. Each {super}family level assignment was checked manually for metal binding
3. All the structures representing the family had to bind the metal for it to be considered unambiguous
4. The literature was consulted to resolve ambiguities
5. SUPERFAMILY database used to map to proteomes
6. 23 Archaea, 233 Bacteria, 57 Eukaryota
7. Cu, Ni, Mo ignored (<0.3% of proteome)
Levels of Ambiguity
• An ambiguous superfamily binds different
metals or has members that are not
known to bind metals
• Ditto families
• Approx 50% of superfamilies and 10% of
families are ambiguous
• Only unambiguous families used in this
study
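The ambiguity rule described on the last two slides can be sketched as a small classifier. This is an illustration under stated assumptions, not the authors' procedure: the manual checks and literature review are not modeled, and the function name, return values and structure identifiers are hypothetical.

```python
def classify_family(metals_by_structure):
    """Classify a family from the metals bound by each representative structure.

    metals_by_structure: dict mapping a structure id to the set of metals it binds
    (an empty set means the structure is not known to bind a metal).
    """
    bound = list(metals_by_structure.values())
    if not bound or all(len(m) == 0 for m in bound):
        return ("non-metal", None)       # no member is known to bind a metal
    if any(len(m) == 0 for m in bound):
        return ("ambiguous", None)       # some members are not known to bind metals
    metals = set().union(*bound)
    if len(metals) == 1:
        return ("unambiguous", next(iter(metals)))
    return ("ambiguous", None)           # members bind different metals

# Hypothetical families with made-up structure ids.
print(classify_family({"s1": {"Fe"}, "s2": {"Fe"}}))
print(classify_family({"s1": {"Fe"}, "s2": {"Zn"}}))
print(classify_family({"s1": {"Zn"}, "s2": set()}))
```

Only families in the first category would be carried forward to the proteome mapping step.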
Superfamily Distribution As Well As
Overall Content Has Changed
Figure: Bacteria Fe superfamilies compared with Eukaryotic Fe superfamilies. Superfamilies listed on both panels: a.1.1, a.1.2, a.104.1, a.110.1, a.119.1, a.138.1, a.2.11, a.24.3, a.24.4, a.25.1, a.3.1, a.39.3, a.56.1, a.93.1, b.1.13, b.2.6, b.3.6, b.33.1, b.70.2, b.82.2, c.56.6, c.83.1, c.96.1, d.134.1, d.15.4, d.174.1, d.178.1, d.35.1, d.44.1, d.58.1, e.18.1, e.19.1, e.26.1, e.5.1, f.21.1, f.21.2, f.24.1, f.26.1, g.35.1, g.36.1, g.41.5
Metallomes are Discriminatory
Figure: quantile plot over the unique Fe-binding fold families (108 total). (x) Percent of Bacterial proteomes in which a fold family occurs (0 to 100); (♦) average copy number (0 to 14)
• A quantile plot showing the percent of Bacterial proteomes each Fe-binding fold family occurs in (x).
• This plot also shows the average copy number of that fold family in the proteomes where it occurs (♦).
• Few Fe-binding folds are in most proteomes.
• Widespread Fe-binding folds are not necessarily abundant.
• Similar trends are observed for Zn, Mn, and Co in all three Superkingdoms.
Metal Binding Proteins are Not
Consistent Across Superkingdoms
Figure, panel A: total Zn-binding domains in a proteome against total domains in a proteome (log axes). Panel B: slope of fitted power law for Zn, Fe, Mn and Co in Archaea, Bacteria and Eukarya
Since these data are derived from current species they are independent of
evolutionary events such as duplication, gene loss, horizontal transfer and
endosymbiosis
Power Laws: Fundamental Constants
in the Evolution of Proteomes
A slope of 1 indicates that a group of structural
domains is in equilibrium with genome
growth, while a slope > 1 indicates that the
group of domains is being preferentially
duplicated (or retained in the case of genome
reductions).
van Nimwegen E (2006) in: Koonin EV, Wolf YI, Karev GP, (Ed.).
Power laws, scale-free networks, and genome biology
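The slope discussed above is the exponent b in y = a·x^b, which becomes a straight line of slope b on log-log axes. A minimal sketch of how such a slope can be estimated by least squares on the logarithms follows; the data are synthetic (generated from a known power law), not values from the talk, and the function name is illustrative.

```python
import math

def fit_power_law(xs, ys):
    """Least-squares fit of log10(y) = log10(a) + b * log10(x); returns (a, b)."""
    lx = [math.log10(x) for x in xs]
    ly = [math.log10(y) for y in ys]
    n = len(lx)
    mean_x, mean_y = sum(lx) / n, sum(ly) / n
    b = (sum((u - mean_x) * (v - mean_y) for u, v in zip(lx, ly))
         / sum((u - mean_x) ** 2 for u in lx))
    a = 10 ** (mean_y - b * mean_x)
    return a, b

# Synthetic proteomes: total domains, and a domain group following y = 0.01 * x**1.5.
totals = [1_000, 3_000, 10_000, 30_000]
group = [0.01 * t ** 1.5 for t in totals]

a, b = fit_power_law(totals, group)
print(f"a = {a:.4f}, slope b = {b:.2f}")
```

In this synthetic case the recovered slope is above 1, which in the slide's terms would correspond to a group of domains being preferentially duplicated or retained.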
Metal Binding Proteins are Not
Consistent Across Superkingdoms
Figure (repeated): total Zn-binding domains in a proteome against total domains in a proteome, and slope of fitted power law for Zn, Fe, Mn and Co in Archaea, Bacteria and Eukarya
Why are the Power Laws Different
for Each Superkingdom?
• Power laws are likely influenced by selective pressure.
Qualitatively, the differences in the power law slopes
describing Eukarya and Prokarya are correlated to the
shifts in trace metal geochemistry that occur with the rise
in oceanic oxygen
• We hypothesize that proteomes contain an imprint of the
environment at the time of the last common ancestor in
each Superkingdom
Do the Metallomes Contain Further
Support for this Hypothesis?
Superkingdom | Fold Family | % | Fe-binding | O2
Eukarya | Cytochrome P450 | 0.44 ± 0.48 | heme | yes
Eukarya | Cytochrome c3-like | 0.13 ± 0.3 | heme | no
Eukarya | Cytochrome b5 | 0.12 ± 0.09 | heme | no
Eukarya | Purple acid phosphatase | 0.11 ± 0.08 | amino | no
Eukarya | Penicillin synthase-like | 0.07 ± 0.1 | amino | yes
Eukarya | Hypoxia-inducible factor | 0.07 ± 0.04 | amino | yes
Eukarya | Di-heme elbow motif | 0.06 ± 0.01 | heme | no
Archaea | 4Fe-4S ferredoxins | 1.80 ± 0.7 | Fe-S | no
Archaea | MoCo biosynthesis proteins | 1.60 ± 0.3 | Fe-S | no
Archaea | Heme-binding PAS domain | 1.10 ± 1.0 | heme | no
Archaea | HemN | 0.80 ± 0.20 | Fe-S | note 1
Archaea | α-helical ferredoxin | 0.60 ± 0.16 | Fe-S | no
Archaea | Biotin synthase | 0.55 ± 0.1 | Fe-S | no
Archaea | ROO N-terminal domain-like | 0.5 ± 0.1 | amino | note 2
Bacteria | High potential iron protein | 0.38 ± 0.25 | Fe-S | no
Bacteria | Heme-binding PAS domain | 0.3 ± 0.4 | heme | note 1
Bacteria | MoCo biosynthesis proteins | 0.21 ± 0.15 | Fe-S | no
Bacteria | HemN | 0.2 ± 0.15 | Fe-S | no
Bacteria | 4Fe-4S ferredoxins | 0.2 ± 0.2 | Fe-S | no
Bacteria | Cytochrome c | 0.14 ± 0.2 | heme | no
Bacteria | α-helical ferredoxin | 0.12 ± 0.09 | Fe-S | no
Overall percent of Fe bound by:
Superkingdom | Fe-S | heme | amino
Eukarya | 21 ± 9 | 47 ± 19 | 32 ± 12
Archaea | 68 ± 12 | 13 ± 14 | 19 ± 6
Bacteria | 47 ± 11 | 22 ± 12 | 31 ± 16
Note 1. Some, but not all, PAS domains actually sense oxygen
Note 2. The Rubredoxin oxygen:oxidoreductase (ROO) protein does not contact oxygen, but catalyzes an oxygen reduction pathway
e- Transfer Proteins
Same Broad Function, Same Metal, Different Chemistry
Induced by the Environment?
Fe-S clusters
Cytochromes
Fe bound by S
Fe bound by heme (and
amino-acids)
Cluster held in place by Cys
Generally negative reduction
potentials
Generally positive reduction
potentials
Less susceptible to oxidation
Very susceptible to oxidation
Agenda
• Why is protein structure useful?
• Tree construction using protein structure
• One protein superfamily in more detail
• Environmental influence
• On-going work
– The role of calcium over time
– Applying structural domain combinations
– Co-evolution of kinases and phosphatases
The Role of Calcium
• Calcium concentrations have not fluctuated over
evolutionary time scales to the same degree as
iron and zinc
• Low diffusion rate and rapid kinetics
• Calcium important for maintaining cell structure
• Calcium became a very important signaling
molecule in multi-cellular organisms
Calcium – Positive Selection
Across All Superkingdoms
Large number of arylsulfatases
Figure 1. Power law scaling for calcium binding domains. The abundances of Ca binding domains in Archaea, Bacteria and Eukaryotes are plotted against the total
number of structural domains in a proteome. The power-law equations and R2 values, which describe the slope of the line and the quality of the power law fit
respectively, are included next to the corresponding line label. The circled point represents Rhodopirellula baltica.
Calcium – Uni vs. Multi Cellular
Figure 4. Diversity plot of calcium binding proteins across the three domains of life and between Unicellular (Uni) and Multicellular (Multi)
Eukaryotes. The x-axis is unlabelled as the FF represented by each tick mark changes depending on the Superkingdom
Structural Domain Combinations
• Definition
– Compact, spatially distinct
– Fold in isolation
– Recurrence
• Importance
– Understand the structure
and function of the whole
protein
Domain Trees Might Provide Insights into
Horizontal Gene Transfer
Figure: domain tree with clades labeled Chlamydiales, Alveolata, Rhodophyta, Cyanobacteria, Metazoa and Actinobacteria
a.1.1.3: phycocyanin-like phycobilisome proteins
• A light-harvesting antenna of photosystem II
• Exists only in Cyanobacteria
• Among Eukaryotes, exists in only one red alga
Protein Kinases and
Phosphatases
• Protein kinases and phosphatases are components of numerous signal transduction pathways
• They are responsible for regulating many cellular processes
• Implicated in many cancers and diseases
• Comprise a significant portion of genomes
– At least 518 protein kinase genes
– At least 107 protein tyrosine phosphatase genes
Alonso et al. (2004) Cell 117(6):699-711
Co-evolution – Kinases
and Phosphatases
Manning, et al. (2002) Science 298:1912-1934
Example: ADF/Cofilin
• The Cofilin/ADF (actin depolymerizing factor)
family remodels the actin filaments of the
cytoskeleton
• They sever actin filaments and increase the rate
at which monomers leave the filament’s pointed end
• Cofilin/ADF proteins are phosphorylated at a
conserved N-terminal serine (Ser3)
• When phosphorylated, cofilin/ADF is unable to
bind actin, and is thus inactive
• When dephosphorylated, cofilin/ADF can bind and
depolymerize actin
Phosphorylation and
Dephosphorylation of ADF/Cofilin
• Two serine/threonine kinase families can
phosphorylate (deactivate) ADF/cofilin
– LIMK
– TESK
• Two phosphatase families have been
identified that dephosphorylate ADF/Cofilin
– Slingshot (SSH) phosphatases
– Chronophin (CIN)
Coordinated Divergence
• Slingshot phosphatase and TESK and LIMK protein kinase families appear to have emerged at the same point in the eukaryotic tree
• They also underwent an apparent gene duplication at the same time (after Ciona divergence)
• Can the point of divergence be more accurately pinpointed as more organisms are sequenced?
Figure: eukaryotic tree marking the points of Emergence and Gene Duplication
Parting Comments
• Structure plays a useful role at various levels of detail in the study of evolution
• Much of the data used here are sitting on the Web for anyone to apply
• Perhaps we should do more to train students in both the life sciences and the earth sciences?
Parting Comments
• The reductionism used here seems useful, but
there is a growing sense that protein structure
represents more of a continuum – perhaps
composed of unique fragments at the sub-fold
level – The Russian Doll effect
• Evidence is growing that proteins from different
superfamilies may share a functional site but
nothing else – does this speak to a very distant
evolutionary relationship?
Acknowledgements
• Kristine Briedis
• Andrew Butcher
• Russ Doolittle
• Chris Dupont
• Eric Scheeff
• Song Yang
• The Whole Group
• NSF & NIH
Backpocket
The importance of “small class”
Zn folds to Eukarya
Figure: total “small class” Zn binding domains against total number of domains in a proteome for Bacteria, Archaea and Eukarya (log axes)
Figure: distribution of 53 unique small class Zn families among Eukarya, Bacteria and Archaea, with region values given as n/53 and n/28: 30/53 and 18/28, 5/53 and 0/28, 0/53 and 0/28, 7/53 and 0/28, 0/53 and 0/28, 11/53 and 9/28, 0/53 and 1/28
Figure: inset of the oxygen, zinc and iron concentration curves, with symbols for Bacteria, Archaea and Eukarya
Conclusions
• Metallomes have diverse compositions, yet the total abundances conform to evolutionary constants
• These constants exhibit Superkingdom-specific differences consistent with ancient changes in geochemistry, a hypothesis further supported by the roles of Zn and Fe
• These results provide genomic-based evidence for the theory of Anbar and Knoll that Eukaryotic diversification and oxygen-related changes in trace metal chemistry are linked
• Prokaryotes likely diverged in anoxic environments, while Eukaryotes diverged in oxic environments (supported by the fossil records)
Possible Flaws in the Argument
Proteome Coverage: Currently only 40%
of Eukaryotes and 55% of Prokaryotes
are covered by structural families –
Estimate that 90% of the unannotated
space is covered by existing families
Possible Flaws in the Argument
Genome Bias – there is a disproportionate number of
thermophiles among Archaea, whereas the
Eukaryotes are almost entirely aerobic
Bacteria have a better distribution
The dataset does include the Eukaryotic anaerobic
amitochondriate parasite Encephalitozoon cuniculi,
which has metallomic features typical of aerobic
Eukaryotes
Principal component analysis shows oxygen tolerance
and environment have little effect upon the trends
observed. Phylogeny groupings are apparent
however (suggests vertical inheritance)
Possible Flaws in the Argument
• Zn concentrations are associated solely
with increased complexity – not the
environment
– Eukaryotes of varying complexity follow the
same power law
– Zn finger abundance not consistent with
complexity
– 3 Zn superfamilies found in Prokaryotes and
Eukaryotes are more abundant across all
Eukaryotes
Manual Annotation of SCOP
(1.68) Superfamilies and Families
• 281 of the 1495 superfamilies have at
least one metal associated structure at the
domain level
• ~50% of the 281 metal associated
superfamilies are ambiguous; ~10% of the
families
• Zn associated superfamilies are the most
prevalent, followed by Fe, Cu, Mn, Co = Mo = Ni
Dupont, Briedis, Yang, Palenik, Bourne 2005 In preparation.
Figure: thioredoxin FSF domains against total domains for Bacteria, Archaea and Eukaryotes (log axes)
• Follows an orderly progression through evolution: domain duplication events remain proportional to genome size
• Occasionally follow power law distribution
• Rough estimates of domain abundance, e.g., thioredoxins = ~1% of “global” proteome
All Fe-S FSF domains
Figure: all Fe-S FSF domains against total domains for Bacteria, Archaea and Eukaryotes. Fitted curves shown: y = 4E-05 x^1.8193 (R^2 = 0.6911) and y = 0.0082x - 2.4099 (R^2 = 0.6004)
• Archaea (1-2% of the proteome)
• Bacteria (0.7-0.8%)
• Eukaryotes (0.01-0.05%)
Cytochrome c
Figure: cytochrome c domains against total domains
• Cytochrome c evolved after Bacteria/Archaea split
Cytochrome P450
Figure: cytochrome P450 domains against total domains
• Proliferation of cytP450 in Eukaryotes
Case study II: Fe vs. Zn
• From 4 Gya to the present:
– Fe concentrations in the ocean have fallen
10,000 fold
– Zn concentrations have risen 10,000,000
fold
Fe Binding
Figure: Fe domains against total domains for Bacteria, Archaea and Eukaryotes (log axes). Fitted power laws shown: y = 0.0002 x^1.6711 (R^2 = 0.7846); y = 0.0805 x^0.7764 (R^2 = 0.6667); y = 0.0001 x^1.6317 (R^2 = 0.6998)
• 2-3% of Bacteria and Archaea proteomes are Fe-binding
• 0.5-1.5% of Eukaryota
Zn Binding
Figure: Zn domains (+phosphatases) against total domains for Bacteria, Archaea and Eukaryotes (log axes). Fitted power laws shown: y = 1.0657 x^0.5155 (R^2 = 0.788); y = 0.0935 x^0.8044 (R^2 = 0.8511); y = 0.0349 x^1.0281 (R^2 = 0.8464)
• 1.5-2.5% of Bacteria and Archaea proteomes are Zn-binding
• 4.5-5% of Eukaryota
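One way to read the fitted equations: dividing y = a·x^b by x gives the share of a proteome made up of those domains, a·x^(b-1). The sketch below evaluates that share for the three Zn fits printed on the plot. Which fit belongs to which superkingdom is not recoverable from this transcript, so the fits are left unlabeled, and the function name is illustrative.

```python
def domain_share(a, b, total_domains):
    """Fraction of a proteome predicted by the power law y = a * x**b."""
    return a * total_domains ** (b - 1)

# (a, b) pairs for the three Zn fits shown on the slide.
zn_fits = [(1.0657, 0.5155), (0.0935, 0.8044), (0.0349, 1.0281)]

for a, b in zn_fits:
    shares = [domain_share(a, b, n) for n in (1_000, 10_000, 100_000)]
    print(f"a={a}, b={b}: " + ", ".join(f"{s:.2%}" for s in shares))
```

A slope above 1 makes the share grow with proteome size, while a slope below 1 makes it shrink.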
Zn Binding by Kingdom
Hard ligands: Asp, Glu, Ser, Tyr
Soft ligands: Cys, His
Figure: stacked bars (0% to 100%) for Archaea, Bacteria and Eukaryotes showing Zn binding by soft ligands only versus hard and soft ligands
Zn: Lewis acid reactions to informational systems (Zn fingers are >60% of Zn containing superfamilies in Eukaryotes!)
Future Work
• Ca concentrations have also changed
dramatically – is this evident in modern
proteomes and if so what are the
evolutionary implications?
• Proteins associated with the nervous
system – 9% before a rapid expansion .5
Mya – around the time of the TK transition
• c.19 ubiquitous Mg binding
• Evolution of photosynthesis