Transcript Ben Good.

Benjamin Good
March 17, 2008
The Genetic Code
 Codon
• Sequence constructed from 4 “letters” known as nucleotides
or bases, denoted “A”, “G”, “C” and “U” (in RNA) / “T” (in DNA)
• These letters form fixed-length (three-letter) “words” known as codons.
• Groups of codons form “sentences” which encode proteins.
The Genetic Code
• A given codon can either stand for a specific amino acid
or act as a “start/stop codon”, which signals either the
beginning or end of a protein’s code respectively.
• There are 4*4*4=64 different codons but only 20 amino
acids to code for, making a total of 21 different possible
meanings for a given codon (including start/stop).
• How are codons distributed among the 21 different
categories?
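The counting argument above can be checked directly. The sketch below is an illustration, not part of the original slides: it enumerates all 64 codons and tallies how the standard code distributes them. The names `BASES`, `AMINO_ACIDS` and `canonical_code` are chosen here for illustration. The one-letter string is the standard code in U, C, A, G order with "*" for stop; the start codon AUG also encodes methionine, so it shows up under "M" rather than as a category of its own.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon in UCAG order; "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# All 4*4*4 three-letter words over the alphabet, from UUU to GGG.
codons = ["".join(triplet) for triplet in product(BASES, repeat=3)]
canonical_code = dict(zip(codons, AMINO_ACIDS))

# Number of codons assigned to each meaning (20 amino acids plus stop).
codons_per_meaning = Counter(canonical_code.values())

print(len(codons))              # 64
print(len(codons_per_meaning))  # 21
for meaning, count in codons_per_meaning.most_common():
    print(meaning, count)
```

The tally is uneven: leucine, serine and arginine get six codons each, while methionine and tryptophan get only one.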
The Genetic Code
• The “Canonical Code”
• But why this arrangement and not another?
• Crick: the canonical code is a “frozen accident”, the leftover of a code
that was “good enough” to work
Why the canonical code?
• An alternative is that the canonical code
itself evolved to optimize for some
selected trait.
• Noting the connection between similar
codons and similar amino acids, several
researchers hypothesized that the
canonical code evolved to optimize
against copying/transcription errors.
The Polar Requirement
• Woese and Alf-Steinberger came up with a measure of
error susceptibility in the genetic code based on
hydrophobicity.
• A given codon is subjected to a single-base mutation. The
difference in polar requirement between the new amino acid
and the old one is calculated.
• The “error” resulting from the mutation is taken as this
difference squared; averaging over all single-base mutations
gives the code’s mean squared distance (MSD).
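The procedure above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the original authors' program. The polar requirement values in `POLAR_REQUIREMENT` are the commonly tabulated ones, transcribed here as an assumption and worth checking against a published table before any real use. Mutations to or from a stop codon are skipped, which is also an assumption of this sketch. All function and variable names are chosen for illustration.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code in UCAG order; "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CANONICAL = dict(zip(("".join(t) for t in product(BASES, repeat=3)), AMINO_ACIDS))

# Polar requirement per amino acid (assumed values, verify before relying on them).
POLAR_REQUIREMENT = {
    "A": 7.0, "R": 9.1, "N": 10.0, "D": 13.0, "C": 4.8,
    "Q": 8.6, "E": 12.5, "G": 7.9, "H": 8.4, "I": 4.9,
    "L": 4.9, "K": 10.1, "M": 5.3, "F": 5.0, "P": 6.6,
    "S": 7.5, "T": 6.6, "W": 5.2, "Y": 5.4, "V": 5.6,
}


def single_base_mutants(codon):
    """Yield the 9 codons that differ from `codon` at exactly one position."""
    for position, old_base in enumerate(codon):
        for new_base in BASES:
            if new_base != old_base:
                yield codon[:position] + new_base + codon[position + 1:]


def mean_squared_distance(code, scale):
    """Average squared change in `scale` over all single-base mutations.

    Assumption of this sketch: mutations to or from a stop codon are skipped.
    """
    squared_changes = []
    for codon, old_aa in code.items():
        if old_aa == "*":
            continue
        for mutant in single_base_mutants(codon):
            new_aa = code[mutant]
            if new_aa != "*":
                squared_changes.append((scale[old_aa] - scale[new_aa]) ** 2)
    return sum(squared_changes) / len(squared_changes)


msd = mean_squared_distance(CANONICAL, POLAR_REQUIREMENT)
print(f"MSD of the canonical code: {msd:.2f}")
```

Synonymous mutations contribute a squared difference of zero, so a code that groups similar amino acids on neighbouring codons gets a low MSD.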
How Optimal is the Canonical Code?
• Unfortunately, Alf-Steinberger’s results have not been
reproducible.
• The first reproducible “test” of the polar requirement was
published by Haig and Hurst in 1991.
• Using this method, they calculated the total error for a
large sample of possible code assignments.
• Out of 10,000 sampled codes, only two had lower error values than the
canonical code!
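A sampling experiment of this kind can be sketched as follows. Two things here are assumptions, not taken from the slides: random codes are generated by shuffling the 20 amino acids among the canonical code's blocks of synonymous codons while leaving stop codons in place, and `TOY_SCALE` is a hypothetical placeholder property (alphabetical rank) that should be swapped for a real scale such as the polar requirement. With the placeholder, the printed count has no biological meaning.

```python
import random
from itertools import product

BASES = "UCAG"
# Standard genetic code in UCAG order; "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = ["".join(t) for t in product(BASES, repeat=3)]
CANONICAL = dict(zip(CODONS, AMINO_ACIDS))

# Hypothetical placeholder property: replace with a real amino acid scale.
TOY_SCALE = {aa: float(i) for i, aa in enumerate(sorted(set(AMINO_ACIDS) - {"*"}))}


def code_error(code, scale):
    """Mean squared change in `scale` over single-base mutations between sense codons."""
    total, count = 0.0, 0
    for codon, old_aa in code.items():
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                new_aa = code[codon[:pos] + base + codon[pos + 1:]]
                if "*" not in (old_aa, new_aa):
                    total += (scale[old_aa] - scale[new_aa]) ** 2
                    count += 1
    return total / count


def random_code(rng):
    """Shuffle the 20 amino acids among the canonical codon blocks; stops stay put."""
    amino_acids = sorted(set(AMINO_ACIDS) - {"*"})
    shuffled = amino_acids[:]
    rng.shuffle(shuffled)
    relabel = dict(zip(amino_acids, shuffled))
    relabel["*"] = "*"
    return {codon: relabel[aa] for codon, aa in CANONICAL.items()}


def count_better_codes(scale, samples, seed=0):
    """Count sampled codes whose error is strictly lower than the canonical code's."""
    rng = random.Random(seed)
    baseline = code_error(CANONICAL, scale)
    return sum(code_error(random_code(rng), scale) < baseline for _ in range(samples))


print(count_better_codes(TOY_SCALE, samples=1000))
```

Keeping the block structure fixed means every sampled code has the same degeneracy pattern as the canonical code, so only the placement of amino acids is being compared.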
One in a million?
• Freeland and Hurst built upon H&H’s model to introduce
more realistic assumptions.
• Two types of single-base errors are possible: transitions
(purine ↔ purine or pyrimidine ↔ pyrimidine) and transversions
(purine ↔ pyrimidine).
• Introduced a weighting for the two types of errors because
they are not equally probable in nature.
• Also introduced a bias towards mistranslation rather than
mutation (higher error rates in the 1st and 3rd codon positions)
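The two refinements above can be sketched as a weighted version of the mean squared distance. This is an illustration under stated assumptions, not Freeland and Hurst's actual model: `transition_weight` and `position_weights` are parameter names chosen here, their default values are neutral placeholders, and `TOY_SCALE` is a hypothetical property (alphabetical rank) standing in for a real scale such as the polar requirement.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code in UCAG order; "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CANONICAL = dict(zip(("".join(t) for t in product(BASES, repeat=3)), AMINO_ACIDS))

# Hypothetical placeholder property: replace with a real amino acid scale.
TOY_SCALE = {aa: float(i) for i, aa in enumerate(sorted(set(AMINO_ACIDS) - {"*"}))}

PURINES = {"A", "G"}


def is_transition(base_a, base_b):
    """True for purine<->purine or pyrimidine<->pyrimidine changes (A<->G, U<->C)."""
    return (base_a in PURINES) == (base_b in PURINES)


def weighted_msd(code, scale, transition_weight=1.0, position_weights=(1.0, 1.0, 1.0)):
    """Weighted mean squared change over single-base errors between sense codons.

    transition_weight: relative weight of a transition versus a transversion.
    position_weights: hypothetical relative error rates for codon positions 1 to 3.
    """
    weighted_sum, weight_total = 0.0, 0.0
    for codon, old_aa in code.items():
        for pos, old_base in enumerate(codon):
            for new_base in BASES:
                if new_base == old_base:
                    continue
                new_aa = code[codon[:pos] + new_base + codon[pos + 1:]]
                if "*" in (old_aa, new_aa):
                    continue  # errors involving stop codons are skipped in this sketch
                weight = position_weights[pos]
                if is_transition(old_base, new_base):
                    weight *= transition_weight
                weighted_sum += weight * (scale[old_aa] - scale[new_aa]) ** 2
                weight_total += weight
    return weighted_sum / weight_total


for w in (1, 3, 5):
    print(w, weighted_msd(CANONICAL, TOY_SCALE, transition_weight=w))
```

With `transition_weight=1` and equal position weights the function reduces to the plain mean squared distance, which makes the unweighted model a special case.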
One in a million?
• Weighted errors make the canonical code even more optimized relative
to the rest.
• Peak efficiency occurs around w = 3, where w is the weighting of
transitions relative to transversions.
One in a million?
• Out of a sample of 1,000,000 random codes, only 1 had a lower error
value than the canonical code (CC)!
• It was relatively far away in search space, but behaved similarly to
the CC.
Beyond the Polar Requirement
• In the paper we read for class, Freeland and Hurst
question previous studies (including their own).
• Is the polar requirement a biased measurement?
• Is using the (W)MSD a biased measurement?
• Some biosynthetically related amino acids might be tied to particular
codons, so code space could be artificially symmetric.
• They proposed a new measurement based on PAM matrices,
which measure the “similarity” of two amino acids on a
functional level.
Beyond the Polar Requirement
General error metric:
A code’s total error = (equation not preserved in this transcript)
• e_i is the physical error resulting from substitution i, measured
with either the PAM matrix or the polar requirement.
• α_i is the number of transition errors leading to substitution i,
i.e. U ↔ C, A ↔ G.
• β_i is the number of transversion errors leading to substitution i,
i.e. U,C ↔ A,G.
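The formula for the total error is missing from this transcript. One form consistent with the three definitions above would weight each substitution's physical error by how many transitions and transversions produce it. The symbol Δ, the transition weight w (taken from the earlier weighting slides) and the normalization by total weight are all assumptions of this sketch, not the authors' confirmed formula:

```latex
\Delta_{\text{code}} \;=\; \frac{\sum_i \left( w\,\alpha_i + \beta_i \right) e_i}{\sum_i \left( w\,\alpha_i + \beta_i \right)}
```

With w = 1 this reduces to an unweighted average of e_i over all single-base errors, and with e_i taken as the squared difference in polar requirement it matches the weighted mean squared distance used in the earlier studies.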
Beyond the Polar Requirement
• Results (figure panels: PAM Matrix; Polar Requirement)
Far from overturning the adaptive hypothesis, this new study showed the
canonical code to be even more optimized than previously thought!
Other optimizations…
• Studies of the assignment of stop codons found that the
canonical code is highly optimized against frameshift and
nonsense mutations. (S. Naumenko et al., 2008)
• Furthermore, these same optimizations against frameshift
errors allow the canonical code to be more efficient at encoding
parallel information on top of a protein coding sequence.
(Itzkovitz and Alon, 2007)
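One way to picture what protection against frameshifts means: after a frameshift the sequence is read in the wrong frame, and the sooner a stop codon turns up there, the sooner the faulty translation ends. The sketch below only illustrates that idea and is not the analysis used in the cited studies. The function name and the example sequence are hypothetical.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def codons_until_stop(sequence, frame):
    """Return how many codons are read in `frame` (0, 1 or 2) before the first
    stop codon, or None if that frame contains no stop codon."""
    for index, start in enumerate(range(frame, len(sequence) - 2, 3)):
        if sequence[start:start + 3] in STOP_CODONS:
            return index
    return None


example = "AUGGCUGAAUAA"  # hypothetical sequence: AUG GCU GAA UAA in frame 0
print([codons_until_stop(example, frame) for frame in range(3)])  # [3, None, 1]
```

In this example frame 2 hits the hidden stop UGA after a single codon, while frame 1 contains no stop at all.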
Is the canonical code optimized?
• YES!
• But many aspects are still unclear – e.g. a
mechanism for code selection.
• Conditions in precanonical times are still
relatively unknown, and the canonical code
seems to be almost universally adhered to in
modern organisms.