Transcript Slide 1
Evolution of the Genetic Code:
Before and After the LUCA
1. The genetic code evolved to its canonical form before the Last Universal Common Ancestor (LUCA) of Archaea, Bacteria and Eukaryotes, more than 3 billion years ago. It appears to be highly optimized. How did it get to be this way?
2. Numerous small changes have occurred to the canonical code since then. What is the mechanism of codon reassignment?
Codon Reassignment – The genetic code is variable in mitochondria (and also in some other types of genome)
Standard genetic code (rows: first position; columns: second position; the four entries in each cell are for third position U, C, A, G)
      U          C          A                G
U     F F L L    S S S S    Y Y Stop Stop    C C Stop W
C     L L L L    P P P P    H H Q Q          R R R R
A     I I I M    T T T T    N N K K          S S R R
G     V V V V    A A A A    D D E E          G G G G
Reassignments in mitochondria marked on the table:
CUN Leu to Thr
AGR Arg to Ser to Stop/Gly
UGA Stop to Trp
AUA Ile to Met
CGN Arg to unassigned
etc.
But how can this happen? It should be disadvantageous.
Reassignments in Metazoa
(Figure: phylogeny of Porifera, Cnidaria, Arthropoda, Nematoda, Lophotrochozoa, Platyhelminthes, Echinodermata, Hemichordata, Urochordata, Cephalochordata and Craniata, with the following events marked on its branches:)
Loss of tRNA-Ile(CAU), but AUA remains Ile
Loss of tRNA-Arg(UCU) and AGR: Arg -> Ser
Loss of many tRNAs + import from cytoplasm
AUA: Ile -> Met
AGR: Ser -> Stop
AGR: Ser -> Gly
AAA: Lys -> Asn
AAA: Lys -> unassigned
Example 1: AUA was reassigned from Ile to Met during the early evolution of the mitochondrial genome.

Before
Codon   Amino acid   Anticodon
AUU     Ile          GAU
AUC     Ile          GAU
AUA     Ile          k2CAU
AUG     Met          CAU
Notes
G in the wobble position of the tRNA-Ile can pair with U and C in the third codon position.
Bacteria and some protist mitochondria possess another tRNA-Ile with a modified base that translates AUA only.
The tRNA-Met translates AUG only.

After
Codon   Amino acid   Anticodon
AUU     Ile          GAU
AUC     Ile          GAU
AUA     Met          UAU or f5CAU
AUG     Met          UAU or f5CAU
Notes
In animal mitochondria the k2CAU tRNA has been deleted.
There is a gain of function of the tRNA-Met by a mutation or a base modification.
Example 2: UGA was reassigned from Stop to Trp many times (12 times in mitochondria).

Before
Codon   Meaning   Anticodon (or RF)   Notes
UGA     Stop      RF                  Release factor recognizes UGA codon.
UGG     Trp       CCA                 Normal tRNA-Trp translates only UGG codons.

After
Codon   Meaning   Anticodon
UGA     Trp       UCA
UGG     Trp       UCA
Notes
In animal mitochondria (and elsewhere) there is a gain of function of the tRNA-Trp via mutation or base modification so that it translates both UGG and UGA.
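Both examples turn on which codons a tRNA can read, which is set by the base at the wobble (first) position of its anticodon. The sketch below hard-codes only the reading rules stated in the notes above (G reads U and C; unmodified C reads G; k2C reads A; U and f5C read A and G) and is not a general model of wobble pairing. The function names and the two intermediate tRNA sets (LOSS_FIRST, GAIN_FIRST) are illustrative.

```python
# Reading rules at the anticodon wobble position, as stated in the notes above
# (a partial list, not a general model of wobble pairing).
WOBBLE_READS = {
    "G": {"U", "C"},    # G pairs with U and C at codon position 3
    "C": {"G"},         # unmodified C reads G only (tRNA-Met CAU, tRNA-Trp CCA)
    "k2C": {"A"},       # modified C in the extra tRNA-Ile reads A only (AUA)
    "f5C": {"A", "G"},  # modified C in mitochondrial tRNA-Met reads A and G
    "U": {"A", "G"},    # U reads A and G (tRNA-Met UAU, tRNA-Trp UCA)
}
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def codons_read(anticodon):
    """Codons read by an anticodon written 5'->3', e.g. 'GAU' or 'k2CAU'."""
    for modified in ("k2C", "f5C"):
        if anticodon.startswith(modified):
            wobble, rest = modified, anticodon[len(modified):]
            break
    else:
        wobble, rest = anticodon[0], anticodon[1:]
    # Anticodon positions 2 and 3 pair with codon positions 2 and 1.
    first_two = COMPLEMENT[rest[1]] + COMPLEMENT[rest[0]]
    return sorted(first_two + base for base in WOBBLE_READS[wobble])


def decode(trnas):
    """Map each codon to the set of amino acids inserted by the given tRNAs."""
    table = {}
    for anticodon, amino_acid in trnas.items():
        for codon in codons_read(anticodon):
            table.setdefault(codon, set()).add(amino_acid)
    return table


BEFORE = {"GAU": "Ile", "k2CAU": "Ile", "CAU": "Met"}
AFTER = {"GAU": "Ile", "f5CAU": "Met"}
LOSS_FIRST = {"GAU": "Ile", "CAU": "Met"}                   # k2CAU deleted first
GAIN_FIRST = {"GAU": "Ile", "k2CAU": "Ile", "f5CAU": "Met"}  # tRNA-Met gains first

for name, trnas in [("before", BEFORE), ("loss first", LOSS_FIRST),
                    ("gain first", GAIN_FIRST), ("after", AFTER)]:
    print(name, "AUA ->", decode(trnas).get("AUA", "unassigned"))
```

Deleting the k2CAU tRNA before the tRNA-Met gains its new reading leaves AUA unread (unassigned), while gaining the new reading first makes AUA ambiguous between Ile and Met; these correspond to the two intermediate states of the GAIN-LOSS framework of Sengupta & Higgs (2005).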
The GAIN-LOSS framework
(Sengupta & Higgs, Genetics 2005)
LOSS = deletion or loss of function of a tRNA or RF
GAIN = gain of a new tRNA or a gain of function of an existing one.

Initial code. No problem.
  GAIN first: ambiguous codon. Selective disadvantage. Then LOSS.
  LOSS first: unassigned codon. Selective disadvantage. Then GAIN.
New code. Selective disadvantage because codons are used in wrong places.
  Mutations in coding sequences.
New code. Codons now used in right places. No problem.

Note: the strength of the selective disadvantage depends on the number of times the codon is used. There is no disadvantage if the codon disappears.
Four possible mechanisms of codon reassignment.
1. Codon Disappearance - The codon disappears. The order of the gain and
loss is irrelevant.
For the other three mechanisms the codon does not disappear.
2. Ambiguous Intermediate – The gain happens before the loss. There is a
period when the gain is fixed in the population and translation is
ambiguous.
3. Unassigned Codon – The loss happens before the gain. There is a period
when the loss is fixed in the population and the codon is unassigned.
4. Compensatory Change – The gain and loss are fixed in the population simultaneously (although they do not arise at the same time). There is no intermediate period between the old and the new codes (cf. the theory of compensatory substitutions in RNA helices).
Sengupta & Higgs (2005) showed that all four mechanisms work in a population genetics simulation.
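The four mechanisms differ only in the order in which the gain and the loss are fixed and in whether the codon is still in use at the time. A minimal sketch of that classification, together with the note that the selective cost scales with codon usage; the summary inputs (fixation times, a single codon count) and the multiplicative fitness model are assumptions made here for illustration, not the model used in the Sengupta & Higgs simulation.

```python
def classify_mechanism(gain_fixed, loss_fixed, codon_count):
    """Name the reassignment mechanism.

    gain_fixed, loss_fixed: times at which the gain and the loss are fixed
    in the population; codon_count: how often the codon is used while the
    change happens (hypothetical summary inputs, not a population model).
    """
    if codon_count == 0:
        return "Codon Disappearance"  # order of gain and loss is irrelevant
    if gain_fixed < loss_fixed:
        return "Ambiguous Intermediate"
    if loss_fixed < gain_fixed:
        return "Unassigned Codon"
    return "Compensatory Change"  # fixed together: no intermediate period


def intermediate_fitness(codon_count, cost_per_use):
    """Relative fitness during an ambiguous or unassigned period.

    Assumes each use of the codon multiplies fitness by (1 - cost_per_use),
    so the disadvantage grows with usage and vanishes when the codon is absent.
    """
    return (1.0 - cost_per_use) ** codon_count


print(classify_mechanism(gain_fixed=3, loss_fixed=8, codon_count=12))
print(intermediate_fitness(0, 0.01), intermediate_fitness(12, 0.01))
```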
Summary of Codon Reassignments in Mitochondria
Codon reassignment            | No. of times | Explained by GC->AU mutation pressure? | Change in no. of tRNAs | Is mispairing important?  | Mechanism
UAG: Stop -> Leu              | 2            | G->A at 3rd pos.                       | +1                     | No                        | CD
UAG: Stop -> Ala              | 1            | G->A at 3rd pos.                       | +1                     | No                        | CD
UGA: Stop -> Trp              | 12           | G->A at 2nd pos.                       | 0                      | Possibly. C-A at 3rd pos. | CD
CUN: Leu -> Thr               | 1            | C->U at 1st pos.                       | 0                      | No                        | CD
CGN: Arg -> Unassigned        | 5            | C->A at 1st pos.                       | -1                     | No                        | CD
AUA: Ile -> Met or Unassigned | 3 / 5        | No                                     | -1                     | Yes. G-A at 3rd pos.      | UC
AAA: Lys -> Asn               | 2            | No                                     | 0                      | Yes. G-A at 3rd pos.      | AI
AAA: Lys -> Unassigned        | 1            | No                                     | 0                      | Possibly. G-A at 3rd pos. | UC or AI
AGR: Arg -> Ser               | 1            | No                                     | -1                     | Yes. G-A at 3rd pos.      | UC
AGR: Ser -> Stop              | 1            | No                                     | 0                      | No                        | AI(b)
AGR: Ser -> Gly               | 1            | No                                     | +1                     | No                        | AI(b)
UUA: Leu -> Stop              | 1            | No                                     | 0                      | No                        | UC or AI
UCA: Ser -> Stop              | 1            | No                                     | 0                      | No                        | UC or AI
Key: CD = Codon Disappearance; UC = Unassigned Codon; AI = Ambiguous Intermediate.
The CD mechanism explains the reassignment of stop codons, because stop codons are rare initially. There are only a few examples of CD for sense codons; UC and AI are important for sense codons.
Three examples in yeasts (mutation pressure GC to AU)
(Figure: the standard code table with the three affected codon blocks marked.)
CUN is rare (replaced by UUR). CUN: Leu to Thr.
CGN is rare (replaced by AGR). CGN Arg codons become unassigned.
AUA and AUU are common and AUC is rare. Nevertheless AUA is reassigned to Met: the codon does not disappear.
Leu and Arg codons in yeasts
Codon Disappearance causes reassignments
Species | Leu CUN | Leu UUR | Arg CGN | Arg AGR
S       | 53      | 192     | 7       | 33
Y.      | 44      | 618     | 0**     | 75
C       | 3       | 279     | 12      | 29
C       | 132     | 397     | 47      | 26
C       | 66      | 547     | 39      | 45
P       | 25      | 714     | 18      | 67
K       | 0       | 286     | 0**     | 48
C       | 11*     | 294     | 1**     | 45
S       | 33*     | 333     | 7       | 49
S       | 19*     | 274     | 0**     | 40
S       | 22*     | 300     | 0**     | 46
* CUN = Thr. Unusual tRNA-Thr present instead of tRNA-Leu.
** CGN = unassigned. tRNA-Arg is deleted.
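The table can be summarized by the share of each amino acid's codons that fall in the reassigned block. This sketch simply re-enters the counts from the table above (species labels are the abbreviations shown on the slide) and computes CUN/(CUN+UUR) and CGN/(CGN+AGR); the class and helper names are illustrative.

```python
from typing import NamedTuple


class YeastCounts(NamedTuple):
    label: str      # species abbreviation as shown on the slide
    cun: int
    uur: int
    cgn: int
    agr: int
    cun_means: str  # "Leu", or "Thr" where marked *
    cgn_means: str  # "Arg", or "unassigned" where marked **


ROWS = [
    YeastCounts("S", 53, 192, 7, 33, "Leu", "Arg"),
    YeastCounts("Y.", 44, 618, 0, 75, "Leu", "unassigned"),
    YeastCounts("C", 3, 279, 12, 29, "Leu", "Arg"),
    YeastCounts("C", 132, 397, 47, 26, "Leu", "Arg"),
    YeastCounts("C", 66, 547, 39, 45, "Leu", "Arg"),
    YeastCounts("P", 25, 714, 18, 67, "Leu", "Arg"),
    YeastCounts("K", 0, 286, 0, 48, "Leu", "unassigned"),
    YeastCounts("C", 11, 294, 1, 45, "Thr", "unassigned"),
    YeastCounts("S", 33, 333, 7, 49, "Thr", "Arg"),
    YeastCounts("S", 19, 274, 0, 40, "Thr", "unassigned"),
    YeastCounts("S", 22, 300, 0, 46, "Thr", "unassigned"),
]


def share(part, other):
    """Fraction of the codons in two codon blocks that fall in `part`."""
    return part / (part + other)


def cgn_shares(meaning):
    return [share(r.cgn, r.agr) for r in ROWS if r.cgn_means == meaning]


def cun_shares(meaning):
    return [share(r.cun, r.uur) for r in ROWS if r.cun_means == meaning]


for r in ROWS:
    print(f"{r.label:3} CUN share {share(r.cun, r.uur):.3f} ({r.cun_means}), "
          f"CGN share {share(r.cgn, r.agr):.3f} ({r.cgn_means})")
```

In the species where CGN is unassigned, CGN makes up at most about 2% of the Arg-type codons, against at least about 12% where it is still Arg. The CUN shares separate less cleanly, since CUN is also rare in some species where it remains Leu.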
AUA Ile to Met in Yeasts
Codon   Amino acid   Anticodon
AUU     Ile          GAU
AUC     Ile          GAU
AUA     Ile          k2CAU
AUG     Met          CAU

Codon Usage
Species | AUU | AUC | AUA | AUG | AUA is | tRNA
J       | 133 | 40  | 32  | 48  | Ile    | k2CAU
O       | 161 | 34  | 0   | 57  | Absent | none
P       | 113 | 39  | 49  | 51  | Ile    | k2CAU
C       | 119 | 81  | 229 | 100 | Ile    | k2CAU
C       | 303 | 32  | 193 | 117 | Ile    | k2CAU
P       | 274 | 18  | 562 | 105 | Ile    | k2CAU
K       | 213 | 16  | 7   | 63  | ?      | none
C       | 207 | 21  | 16  | 73  | Met    | C*AU
S       | 239 | 31  | 60  | 73  | Met    | C*AU
S       | 203 | 7   | 101 | 56  | Met    | C*AU
S       | 218 | 11  | 95  | 70  | Met    | C*AU
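The same kind of re-tabulation for the AUA data, using the counts and species abbreviations from the codon usage table above, makes the point that AUA does not disappear explicit. A minimal sketch; the names are illustrative.

```python
# (species abbreviation, AUU, AUC, AUA, AUG, meaning of AUA, tRNA reading AUA)
AUA_TABLE = [
    ("J", 133, 40, 32, 48, "Ile", "k2CAU"),
    ("O", 161, 34, 0, 57, "absent", "none"),
    ("P", 113, 39, 49, 51, "Ile", "k2CAU"),
    ("C", 119, 81, 229, 100, "Ile", "k2CAU"),
    ("C", 303, 32, 193, 117, "Ile", "k2CAU"),
    ("P", 274, 18, 562, 105, "Ile", "k2CAU"),
    ("K", 213, 16, 7, 63, "?", "none"),
    ("C", 207, 21, 16, 73, "Met", "C*AU"),
    ("S", 239, 31, 60, 73, "Met", "C*AU"),
    ("S", 203, 7, 101, 56, "Met", "C*AU"),
    ("S", 218, 11, 95, 70, "Met", "C*AU"),
]


def aua_counts(condition):
    """AUA counts for rows whose (meaning, tRNA) pair satisfies `condition`."""
    return [row[3] for row in AUA_TABLE if condition(row[5], row[6])]


met_rows = aua_counts(lambda meaning, trna: meaning == "Met")
no_trna_rows = aua_counts(lambda meaning, trna: trna == "none")

for label, auu, auc, aua, aug, meaning, trna in AUA_TABLE:
    total = auu + auc + aua + aug
    print(f"{label:2} AUA = {aua:3d} ({aua / total:.1%} of AUN)  AUA is {meaning:6} tRNA {trna}")
```

In every species where AUA is read as Met it is still used at least 16 times, more often than in the two species listed without an AUA-specific tRNA (0 and 7 uses).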
Evolution of the canonical code - Before the LUCA
The canonical code seems to be optimized to reduce the effects of
translational and mutational errors.
Neighbouring codons code for similar amino acids.
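(Figure: amino acids placed along a scale running from about 5 to 13, in the order C, L, I, F, W, M, Y, V, P, T, A, S, G, H, Q, R, N, K, E, D.)
Woese's polar requirement scale
Measure the difference between amino acid properties by how far apart they are on this scale.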
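Principal component analysis projects the 8-dimensional space of amino acid properties into the two 'most important' dimensions.
(Figure: amino acids plotted on the first two components; axes labelled big/small and hydrophobic/hydrophilic.)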
Cost function g(a,b) for replacing amino acid a by amino acid b, e.g. the difference in polar requirement.
E = \frac{\sum_i \sum_j r_{ij} \, g(a_i, a_j)}{\sum_i \sum_j r_{ij}}
r_{ij} = rate of mistaking codon i for codon j (= 1 for single-position mistakes, 0 otherwise)
a_i = amino acid assigned to codon i
E = measure of error associated with a code
(Figure: the standard code table, 64 codons with their amino acids.)
Generate random codes by permuting the 20 amino acids in the code table.
(Figure: distribution p(E) of the error value E over random codes, with E_real for the canonical code marked.)
E is smaller for the canonical code than for almost all random codes. The fraction of random codes that do better is f ~ 10^-6: one in a million codes is better (Freeland and Hurst).
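The error measure E defined above and the random-code comparison can be sketched in a few lines. This is a minimal sketch under stated assumptions: g is taken as the squared difference in polar requirement (one common choice; the slide only says "e.g. difference in polar requirement"), every single-position mistake gets r_ij = 1 as in the definition of E, pairs involving stop codons are ignored, and the polar requirement values are the commonly tabulated Woese values rather than numbers taken from these slides. With these simplifications the fraction of better codes need not reproduce the one-in-a-million figure exactly.

```python
import random
from itertools import product

BASES = "UCAG"
CODONS = ["".join(c) for c in product(BASES, repeat=3)]  # UUU, UUC, UUA, UUG, UCU, ...
# Standard code in the same codon order; '*' marks stop codons.
STANDARD = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Woese polar requirement values as commonly tabulated (assumed, not from the slides).
POLAR_REQUIREMENT = {
    "A": 7.0, "R": 9.1, "N": 10.0, "D": 13.0, "C": 4.8,
    "Q": 8.6, "E": 12.5, "G": 7.9, "H": 8.4, "I": 4.9,
    "L": 4.9, "K": 10.1, "M": 5.3, "F": 5.0, "P": 6.6,
    "S": 7.5, "T": 6.6, "W": 5.2, "Y": 5.4, "V": 5.6,
}

# Ordered codon pairs (i, j) that differ at exactly one position: r_ij = 1.
PAIRS = [(i, j) for i, a in enumerate(CODONS) for j, b in enumerate(CODONS)
         if sum(x != y for x, y in zip(a, b)) == 1]


def error_value(code):
    """E = sum r_ij g(a_i, a_j) / sum r_ij, with g = squared polar requirement
    difference; pairs involving a stop codon are skipped (an assumption)."""
    total, weight = 0.0, 0
    for i, j in PAIRS:
        a, b = code[i], code[j]
        if "*" in (a, b):
            continue
        total += (POLAR_REQUIREMENT[a] - POLAR_REQUIREMENT[b]) ** 2
        weight += 1
    return total / weight


def random_code(rng):
    """Permute the 20 amino acids among their codon blocks; stops stay fixed."""
    amino_acids = sorted(POLAR_REQUIREMENT)
    shuffled = amino_acids[:]
    rng.shuffle(shuffled)
    relabel = dict(zip(amino_acids, shuffled))
    relabel["*"] = "*"
    return "".join(relabel[aa] for aa in STANDARD)


def fraction_better(n_codes=1000, seed=1):
    """Return E_real and the fraction of random codes with a smaller E."""
    rng = random.Random(seed)
    e_real = error_value(STANDARD)
    better = sum(error_value(random_code(rng)) < e_real for _ in range(n_codes))
    return e_real, better / n_codes


e_real, f = fraction_better()
print(f"E_real = {e_real:.2f}; fraction of random codes with smaller E = {f}")
```

Permuting amino acid labels (rather than shuffling individual codons) keeps the block structure of the table, which is what "permuting the 20 amino acids in the code table" implies.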
The statistical argument shows that the code is highly non-random but it does
not explain how the code evolved to be that way. Need a step-by-step
evolutionary argument that leads from a proposed first stage of the code to
today’s code.
Random permutations – Not Possible
Random swaps – seems unlikely
The earliest code probably had few amino acids. Which were the first?
Selection acts when new amino acids are added.
(Figure: the standard code table.)
Time scale for the origin of life
(Figure: timeline of the early Earth, with these lines of evidence marked: dating of rocks and meteorites; lunar craters and the last ocean-vaporizing impact; isotopic evidence for life; microfossil evidence; stromatolites; phylogenetic methods (divergence after LUCA).)
The origin of the genetic code is the end of the RNA World.
What preceded RNA? Another polymer? Metabolism only?
Miller-Urey experiment (1953)
Began with a mixture of CH4, NH3, H2O and H2.
Energy source = electric spark or UV light.
Obtained 10 amino acids.
Atmospheres and Chemistry
Reducing: CH4, NH3, H2O, H2; or CO2, N2, H2; or CO, N2, H2.
Hydrogen gas is present and/or hydrogen is combined with other elements (methane, ammonia, water).
Neutral: CO or CO2, N2, H2O.
No hydrogen or oxygen gas.
Oxidizing: O2, CO2, N2.
Oxygen gas present.
Prebiotic chemists favour reducing atmospheres.
Yields in Miller-Urey experiments are higher and more diverse in reducing than in neutral atmospheres. The synthesis does not work in an oxidizing atmosphere.
Planetary Atmospheres
The major element in the universe is H (from the Big Bang), so doesn't it make sense that the early atmosphere was reducing?
Jupiter retains the original mixture: H2, He + small amounts of CH4, NH3, H2O.
Smaller planets lose H2.
A new atmosphere is created by outgassing from the interior.
Geologists and astronomers favour an intermediate atmosphere.
(i) Venus: about 90 Earth atmospheres of pressure! Mostly CO2 and N2.
(ii) Carbonates in sedimentary rocks on Earth suggest there was previously a lot of CO2.
So maybe Miller and Urey were wrong?
:-(
Current Earth: mostly N2, O2 + small amounts of CO2, H2O (changed by life).
Mars: very low pressure, mostly CO2 and N2.
Alternative suggestion – Hydrothermal vents
Sea water passes through vents.
It is heated to 350 °C and cools to 2 °C in the surrounding ocean.
Supply of H2, H2S, etc.
Fierce debate as to whether these conditions favour the formation or the breakup of organic molecules (Miller & Lazcano, 1995).
Organic compounds in meteorites
The most widely studied meteorite is the Murchison meteorite, a carbonaceous chondrite that fell in Australia in 1969.
It contained both biological and non-biological amino acids.
Both optical isomers were present (later shown to be not quite equal in amount).
The compounds are not contamination.
Just about all the building-block molecules have now been found in carbonaceous meteorites (Sephton, 2002).
Astrochemistry: molecular clouds; icy grains; parent bodies of meteorites...
Delivery by: dust particles; meteorites; comets...
Was external delivery an important source of organic molecules?
Prebiotic Synthesis of amino acids
Higgs and Pudritz (2009) Astrobiology
Amino acids are found in
• Meteorites
• Atmospheric chemistry experiments (Miller-Urey)
• Hydrothermal synthesis
• Icy dust grains in space
Rank amino acids in order of decreasing frequency in each of 12 observations, then derive an overall ranking.
Comparison of amino acid frequencies produced non-biologically (concentrations normalized relative to Gly)
Amino acid | Miller
Gly        | 1.000
Ala        | 1.795
Asp        | 0.077
Glu        | 0.018
Val        | 0.044
Ser        | 0.011
Ile        | 0.011
Leu        | 0.026
Pro        | 0.003
Thr        | 0.002
(The slide also compares the Murchison and Yamato meteorites and an ice experiment, each normalized to Gly = 1.)
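The ranking step can be sketched as follows. This is a minimal sketch, assuming each observation is a mapping from amino acid to relative concentration, that undetected amino acids share the lowest ranks, and that the overall ranking orders amino acids by their mean rank across observations; this averaging rule is an assumption, not necessarily the procedure used by Higgs and Pudritz. Only the Miller column of the table above is entered here.

```python
from statistics import mean

AMINO_ACIDS = ["Gly", "Ala", "Asp", "Glu", "Val", "Ser", "Ile", "Leu", "Pro", "Thr"]
# Miller column from the table above (concentrations relative to Gly).
MILLER = {"Gly": 1.000, "Ala": 1.795, "Asp": 0.077, "Glu": 0.018, "Val": 0.044,
          "Ser": 0.011, "Ile": 0.011, "Leu": 0.026, "Pro": 0.003, "Thr": 0.002}


def ranks(observation):
    """Rank 1 = most abundant. Tied amino acids share the average of their
    ranks; amino acids missing from the observation count as 0 (not detected)."""
    values = {aa: observation.get(aa, 0.0) for aa in AMINO_ACIDS}
    result, position = {}, 1
    for value in sorted(set(values.values()), reverse=True):
        tied = [aa for aa in AMINO_ACIDS if values[aa] == value]
        for aa in tied:
            result[aa] = position + (len(tied) - 1) / 2
        position += len(tied)
    return result


def consensus_ranking(observations):
    """Order amino acids by their mean rank across observations (assumed rule)."""
    per_observation = [ranks(obs) for obs in observations]
    mean_rank = {aa: mean(r[aa] for r in per_observation) for aa in AMINO_ACIDS}
    return sorted(AMINO_ACIDS, key=lambda aa: mean_rank[aa])


print(consensus_ranking([MILLER]))
```

On the Miller data alone this gives Ala, Gly, Asp, Val, Leu, Glu, Ser, Ile, Pro, Thr; the ranking on the slides draws on all 12 observations, so it differs.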
10 amino acids are found in the Miller-Urey experiments. Very similar ones are
also found in meteorites, an Ice grain analogue experiment, and other places.
These are ‘early’ amino acids that were available for use by the first organisms.
GADEVSILPT
The other 10 are not seen. These are late amino acids that were only used when
organisms evolved a means of synthesizing them biochemically.
KRHFQNYWCM
The earliest amino acids are
those that are cheapest to
form thermodynamically
Positions of early and late amino acids in the code table...
What does this mean?
(Figure: the standard code table with the early and late amino acids distinguished.)
Maybe only 2nd position was
relevant initially.
Late amino acids took over
codons previously assigned
to amino acids with similar
properties.
Propose that the four earliest amino
acids were Val, Ala, Asp, Gly
(Table: rows = first position, columns = second position; any third position.)
      U     C     A     G
U     Val   Ala   Asp   Gly
C     Val   Ala   Asp   Gly
A     Val   Ala   Asp   Gly
G     Val   Ala   Asp   Gly
Four column code.
(Higgs Biol. Direct. 2009)
This is a triplet code but only the
second base means anything.
The second base is the most
important for codon-anticodon
recognition.
Unlikely to make a mistake at second
position.
All first and third position mistakes are
synonymous.
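The two claims about the four-column code (first- and third-position mistakes are synonymous, second-position mistakes are not) can be checked mechanically. A minimal sketch; the function and variable names are illustrative.

```python
from itertools import product

BASES = "UCAG"
# Four-column code: the amino acid depends only on the second base.
COLUMN = {"U": "Val", "C": "Ala", "A": "Asp", "G": "Gly"}
FOUR_COLUMN_CODE = {"".join(c): COLUMN[c[1]] for c in product(BASES, repeat=3)}


def synonymous_fraction(code, position):
    """Fraction of single-base mistakes at `position` (0, 1 or 2) that
    leave the encoded amino acid unchanged."""
    same = total = 0
    for codon, amino_acid in code.items():
        for base in BASES:
            if base == codon[position]:
                continue
            mistaken = codon[:position] + base + codon[position + 1:]
            total += 1
            same += code[mistaken] == amino_acid
    return same / total


for position in range(3):
    fraction = synonymous_fraction(FOUR_COLUMN_CODE, position)
    print(f"codon position {position + 1}: {fraction:.0%} of mistakes are synonymous")
```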
Code structure after addition of the 10 early amino acids.
Add new amino acids in positions that were formerly occupied by amino
acids with similar properties.
This minimizes disruption to existing gene sequences.
Summary of my argument
Selection acts at the time of addition of new amino acids to the code. The new amino acid is assigned to codons that formerly coded for an amino acid with similar properties. This minimizes disruption to existing genes.
The result is that codons in the same columns end up assigned to amino
acids with similar properties. The column structure is retained from the
earliest code.
Hence the code appears to minimize translational error with respect to
randomly reshuffled codes, even though translational error was not the main
factor being selected.
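A deliberately simplified sketch of the argument in the summary above, assuming disruption is measured by the squared polar requirement difference between the old and the new amino acid, and that a new amino acid takes its codons from one second-position column of the four-column code. The cost measure, the column-level granularity and the polar requirement values (commonly tabulated Woese values) are all assumptions for illustration.

```python
# Polar requirement values (commonly tabulated Woese values; assumed here).
POLAR_REQUIREMENT = {"Val": 5.6, "Ala": 7.0, "Asp": 13.0, "Gly": 7.9,
                     "Ile": 4.9, "Leu": 4.9, "Pro": 6.6, "Thr": 6.6, "Glu": 12.5}

# Four-column code: second-position base -> amino acid.
FOUR_COLUMN = {"U": "Val", "C": "Ala", "A": "Asp", "G": "Gly"}


def disruption(old, new):
    """Assumed cost of replacing `old` by `new` at existing sites in genes."""
    return (POLAR_REQUIREMENT[old] - POLAR_REQUIREMENT[new]) ** 2


def best_column(new_amino_acid, columns=FOUR_COLUMN):
    """Second-position column whose current amino acid is most similar."""
    return min(columns, key=lambda base: disruption(columns[base], new_amino_acid))


for amino_acid in ["Ile", "Leu", "Pro", "Thr", "Glu"]:
    base = best_column(amino_acid)
    print(f"{amino_acid} takes codons from column {base} (formerly {FOUR_COLUMN[base]})")
```

Under these assumptions Ile and Leu land in the U column, Pro and Thr in the C column and Glu in the A column, which is where they sit in the canonical code.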
Pathways of amino acid synthesis in modern organisms (from Di Giulio 2008)
Other points –
Column structure suggests that translational errors were more important
than mutational errors (tRNA structure/RNA world)
Precursor-product pairs tend to be neighbours (but doubts over
statistical significance). Maybe late amino acids took over codons
previously assigned to their biochemical precursors.
Direct chemical interactions between RNA motifs and amino acids
(“stereochemical theory”). In vitro selection experiments suggest
binding sites of aptamers preferentially contain codon and anticodon
sequences.
RNA World
First hypothesis:
There was a stage of evolution in which RNA molecules performed both genetic and catalytic roles. DNA later took over the genetic role and proteins took over the catalytic role.
Almost certainly true
Translation depends on RNA:
mRNA supplies the information for protein synthesis.
The active ingredient of the ribosome is rRNA: 3D structures show that the site of the peptidyl transferase reaction lies in the rRNA. Proteins were probably a late addition to the ribosome.
tRNAs also essential for translation.
Second hypothesis:
The RNA world arose de novo in the form of self replicating
ribozymes.
The jury is still out
The RNA world idea originated in the 1960s as a theoretical solution to the chicken-and-egg problem of DNA and proteins.
Self-splicing introns. First RNA catalysts to be discovered. Tom
Cech (1982).
‘RNA World’ term coined by Walter Gilbert (1986).
Example of an RNA catalyst
Hammerhead ribozyme
Cleaves RNA at a specific
point.
Involved in the rolling circle mechanism of replication of virus-like RNAs in plants, where it chops the long strand into pieces.
What can ribozymes do?
Ligases
An Autocatalytic Set Made from Ligases
(Figure: E' ligates A and B to make E; E ligates A' and B' to make E'.)
T. A. Lincoln, G. F. Joyce, Self-Sustained Replication of an RNA Enzyme, Science 323, 1229 (2009)
Given a supply of A, B, A', B', the E and E' make more of themselves.
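The cross-catalytic logic ("E and E' make more of themselves") can be illustrated with a toy mass-action model. This is a sketch under assumed conditions, not the kinetics measured by Lincoln and Joyce: each enzyme is assumed to ligate its partner's two substrates at a rate proportional to the enzyme and substrate concentrations, with an arbitrary rate constant k, arbitrary starting amounts, and simple explicit Euler integration.

```python
def simulate(k=1.0, dt=0.01, steps=5000, substrate=1.0, seed_enzyme=0.01):
    """Toy cross-catalytic pair: E' joins A + B into E, E joins A' + B' into E'.

    All parameters are arbitrary illustrative values (explicit Euler steps).
    """
    a = b = a_p = b_p = substrate
    e = e_p = seed_enzyme
    for _ in range(steps):
        make_e = k * e_p * a * b      # formation of E, catalysed by E'
        make_e_p = k * e * a_p * b_p  # formation of E', catalysed by E
        a -= make_e * dt
        b -= make_e * dt
        e += make_e * dt
        a_p -= make_e_p * dt
        b_p -= make_e_p * dt
        e_p += make_e_p * dt
    return {"A": a, "B": b, "E": e, "A'": a_p, "B'": b_p, "E'": e_p}


result = simulate()
print({name: round(value, 3) for name, value in result.items()})
```

Starting from a small seed of E and E', both grow roughly exponentially at first and level off as the substrates run out; the conserved totals A + E and A' + E' are a quick check that no material is created or lost.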
What can ribozymes do?
Recombinases
E. J. Hayden, G. v. Kiedrowski & N. Lehman, Angew. Chem. Int. Edit. (2008) 120, 8552
Catalyst is autocatalytic given a supply of W, X, Y and Z.
The non-covalent assembly is also a catalyst.
What can ribozymes do?
Polymerases
(Figure legend: black + blue = ribozyme; red = template; orange = primer.)
Primer extended by up to 14 nucleotides.
Johnston et al. (2001) Science
Gradual improvement of polymerases in the lab:
Wochner et al. (2011) Science, up to 95 nucleotides.
What can ribozymes do?
Nucleotide Synthetases
Unrau and Bartel (1998) Nature
An RNA organism must have had
a metabolism.
Hypothetical pathway for RNA
catalyzed RNA synthesis (Joyce)
Synthesis of nucleosides
Phosphorylation
Generation of NTPs
Creation of activated nucleotides
Stepwise polymerization
Clutter of RNA synthesis (Joyce)
Why is this particular set of monomers used for
nucleic acids?
How is this set synthesized specifically?
Where is the chemistry occurring? Earth, or space? Hydrothermal vents?
A new route to Pyrimidine ribonucleotide assembly.
MW Powner et al. Nature 459, 239-242 (2009) doi:10.1038/nature08013
Previously assumed synthesis of β-ribocytidine-2',3'-cyclic phosphate 1 (blue; note the failure of the step in which
cytosine 3 and ribose 4 are proposed to condense together) and the successful new synthesis described here
(green). p, pyranose; f, furanose.
Chemical synthesis of monomers and polymers must have occurred
before the origin of ribozymes.
Ferris (2002) Orig. Life Evol. Biosph.
Montmorillonite catalyzed synthesis of RNA oligonucleotides (30-50 mers)
Rajamani et al. (2008) Orig. Life Evol. Biosph.
Lipid assisted synthesis of RNA-like polymers from mononucleotides
Costanzo et al. (2009) J. Biol. Chem.
Synthesis of long RNA strands from cyclic nucleotides in water
Rajamani et al. (2010) J. Am. Chem. Soc.
Measurements of error rates in non-enzymatic RNA replication
There are still some experimental issues…
But this is a logical necessity!
How could the RNA world
have got started?
Getting from chemistry to
biology….
RNA replicators must have
emerged from prebiotic
synthesis of random
sequences
Jump-starting the RNA World
Wu & Higgs (2009) J. Mol. Evol.
(Figure: scheme linking precursors, monomers, activated monomers, short polymers, long polymers and ribozymes through synthesis and polymerization steps.)
Are there alternatives to RNA?
RNA
a – Threose Nucleic Acid – TNA
b – Peptide nucleic acid – PNA
c – Glycerol derived nucleic acid
d – Pyranosyl RNA
RNA hybridizes with other nucleic acids. Information is not lost.
DNA-RNA hybrids: DNA takes over at the end of the RNA world.
Maybe TNA or PNA preceded the RNA world. Information passed to RNA.
Would need to show that the alternative was easier to synthesize than RNA.
Two scenarios from
Segré & Lancet (2000)
A – RNA first (strong RNA
world hypothesis)
B – Lipids first (lipid world
hypothesis –
compositional genomes –
metabolism without
genes)