Codon-usage bias
Codon Usage
Dan Graur
1
Because of the degeneracy of all
genetic codes, 18-20 amino acids are
encoded by more than one codon (2,
3, 4, or 6).
2
If synonymous mutations are strictly
neutral, synonymous codons should be
used randomly, in proportions dictated
by genomic GC content.
3
Codon-usage bias
4
Measures of codon-usage bias
5
The relative synonymous codon usage (RSCU) is the
number of times a codon appears in a gene divided by the
number of expected occurrences under equal codon usage.
$$\mathrm{RSCU}_i = \frac{X_i}{\frac{1}{n}\sum_{j=1}^{n} X_j}$$
where n = number of synonymous codons (1 ≤ n ≤ 6) for the
amino acid under study, and X_i = number of occurrences of
codon i.
If the synonymous codons of an amino acid are used with
equal frequencies, their RSCU values will equal 1.
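The RSCU definition can be sketched in a few lines of Python. The function name `rscu` and the input layout (a dict of codon counts for a single amino acid) are assumptions made for illustration. The example reuses the phenylalanine percentages from the Escherichia coli "High" column of the codon table in this lecture, read as counts per 100 codons.

```python
def rscu(counts):
    """Return RSCU values for the synonymous codons of one amino acid.

    counts maps each synonymous codon to its number of occurrences (X_i).
    Codons that never occur must still be listed with a count of 0,
    because n is the number of synonymous codons, not the number observed.
    """
    n = len(counts)
    total = sum(counts.values())
    if n == 0 or total == 0:
        raise ValueError("need at least one codon and one occurrence")
    expected = total / n  # occurrences per codon under equal usage
    return {codon: x / expected for codon, x in counts.items()}


# Phenylalanine, E. coli "High" column, read as counts per 100 codons.
phe = rscu({"UUU": 17, "UUC": 83})
print(phe)
```

A value above 1 marks a codon used more often than expected under equal usage; a value below 1 marks one used less often.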
6
The codon adaptation index (CAI) measures the degree
with which genes use preferred codons.
We first compile a table of RSCU values for highly
expressed genes. From this table, it is possible to identify
the codons that are most frequently used for each amino
acid. The relative adaptiveness of a codon (wi) is
computed as
$$w_i = \frac{\mathrm{RSCU}_i}{\mathrm{RSCU}_{\max}}$$
where RSCU_max = the RSCU value for the most
frequently used codon for an amino acid.
7
The CAI value for a gene is calculated as
the geometric mean of wi values for all the
codons used in that gene.
$$\mathrm{CAI} = \left(\prod_{i=1}^{L} w_i\right)^{1/L}$$
where L = number of codons.
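As a minimal sketch of the two steps (relative adaptiveness, then geometric mean), the following Python computes w values from per-amino-acid RSCU tables and then the CAI of a short codon list. The function names and the input layout are assumptions for illustration. The RSCU values are derived from the Escherichia coli "High" column of the codon table in this lecture (phenylalanine and valine only), and the four-codon "gene" is hypothetical. The sketch also assumes every codon in the gene has w > 0, since the logarithm of 0 is undefined.

```python
import math


def relative_adaptiveness(rscu_families):
    """Map each codon to w = RSCU / RSCU_max within its amino acid family."""
    w = {}
    for family in rscu_families:
        rscu_max = max(family.values())
        for codon, value in family.items():
            w[codon] = value / rscu_max
    return w


def cai(codons, w):
    """Geometric mean of w over the codons of a gene (computed via logs)."""
    if not codons:
        raise ValueError("gene has no codons")
    if any(w[c] <= 0 for c in codons):
        raise ValueError("this sketch assumes w > 0 for every codon used")
    return math.exp(sum(math.log(w[c]) for c in codons) / len(codons))


# RSCU values derived from the E. coli "High" column (Phe and Val).
reference = [
    {"UUU": 0.34, "UUC": 1.66},
    {"GUU": 2.40, "GUC": 0.08, "GUA": 1.12, "GUG": 0.40},
]
w = relative_adaptiveness(reference)

hypothetical_gene = ["UUC", "GUU", "UUU", "GUA"]
print(round(cai(hypothetical_gene, w), 3))
```

Summing logarithms instead of multiplying w values directly avoids numerical underflow for long genes.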
8
The effective number of codons (ENC)
$$\mathrm{ENC} = 2 + \frac{9}{F_2} + \frac{1}{F_3} + \frac{5}{F_4} + \frac{3}{F_6}$$
where Fi (i = 2, 3, 4, or 6) is the average probability that
two randomly chosen codons for an amino acid with i
codons will be identical.
ENC values range from 20 (the number of amino acids),
which means that the bias is at a maximum and only one
codon is used from each synonymous-codon group, to 61
(the number of sense codons), which indicates no
codon-usage bias.
9
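The ENC formula can be illustrated with a short Python sketch. The text defines F only verbally, so the estimator used here for a single amino acid, (n Σp² − 1)/(n − 1) with n the number of codons observed for that amino acid and p the codon frequencies, is an assumption (a sample-size-corrected homozygosity). The function names and the input layout (one list of codon counts per amino acid) are also illustrative.

```python
# Number of amino acids in each degeneracy class of the standard code;
# the two single-codon amino acids (Met, Trp) contribute the constant 2.
CLASS_SIZES = {2: 9, 3: 1, 4: 5, 6: 3}


def homozygosity(counts):
    """Estimate the probability that two randomly chosen codons are identical."""
    n = sum(counts)
    if n < 2:
        raise ValueError("need at least two codons for this amino acid")
    sum_p2 = sum((x / n) ** 2 for x in counts)
    return (n * sum_p2 - 1) / (n - 1)


def enc(families):
    """families: one list of codon counts per amino acid (length 2, 3, 4, or 6)."""
    by_class = {k: [] for k in CLASS_SIZES}
    for counts in families:
        by_class[len(counts)].append(homozygosity(counts))
    total = 2.0
    for k, n_amino_acids in CLASS_SIZES.items():
        if not by_class[k]:
            raise ValueError(f"no amino acid with {k} codons was supplied")
        f_mean = sum(by_class[k]) / len(by_class[k])
        if f_mean <= 0:
            raise ValueError("average F must be positive")
        total += n_amino_acids / f_mean
    return total


# Maximum bias: every amino acid uses a single codon.
max_bias = [[10, 0]] * 9 + [[10, 0, 0]] + [[10, 0, 0, 0]] * 5 + [[10, 0, 0, 0, 0, 0]] * 3
print(enc(max_bias))
```

With a single codon used per amino acid, every F equals 1 and the sum is 2 + 9 + 1 + 5 + 3 = 20. With equal usage and large counts, the value approaches 61.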
Amino acid       Codon    Escherichia coli     Saccharomyces cerevisiae
                          High     Low         High     Low
Leucine          UUA      1%       20%         8%       25%
                 UUG      1%       15%         89%      25%
                 CUU      2%       12%         0%       12%
                 CUC      3%       11%         0%       9%
                 CUA      1%       5%          3%       15%
                 CUG      92%      37%         0%       14%
Valine           GUU      60%      27%         52%      28%
                 GUC      2%       25%         48%      19%
                 GUA      28%      16%         0%       30%
                 GUG      10%      32%         0%       23%
Isoleucine       AUU      16%      46%         42%      43%
                 AUC      84%      37%         58%      22%
                 AUA      0%       17%         0%       35%
Phenylalanine    UUU      17%      67%         10%      69%
                 UUC      83%      33%         90%      31%
10
Universal and species-specific
patterns of codon usage
11
The genome hypothesis
All genes in a genome tend to
have the same coding strategy.
That is, they employ the codon
catalog similarly and show
similar choices between
synonymous codons.
Different taxa have different
coding strategies.
Richard Grantham
12
Are there universal preferences?
There are NO universally preferred or universally avoided
codons.
There may be some universal preferences and avoidances as
far as codon neighbor pairs are concerned. For example, the
pair NNG GNN, where N stands for any of the four possible
nucleotides, seems to be preferred, while the pair NNG CNN
seems to be avoided.
13
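Codon-neighbor patterns such as NNG GNN and NNG CNN can be tallied by looking at the base on each side of every codon boundary. The sketch below is one simple way to do that; the function name, the "X|Y" key format, and the example sequence are hypothetical, and deciding whether a pair is preferred or avoided would additionally require comparing the counts with an expectation.

```python
from collections import Counter


def boundary_pair_counts(coding_sequence):
    """Count codon-boundary contexts in an in-frame coding sequence.

    Each key has the form "X|Y": X is the third base of a codon and
    Y is the first base of the codon that follows it.
    """
    seq = coding_sequence.upper()
    if len(seq) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    counts = Counter()
    for left, right in zip(codons, codons[1:]):
        counts[left[2] + "|" + right[0]] += 1
    return counts


# Hypothetical sequence: CTG GAA CTG CGT
pairs = boundary_pair_counts("CTGGAACTGCGT")
print(pairs["G|G"], pairs["G|C"])  # NNG GNN and NNG CNN occurrences
```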
Biases in synonymous codon usage can be
caused by:
(1) mutational biases
(2) selection favoring preferred codons
(3) purifying selection against disfavored
codons
14
Mutational Biases
If the unequal codon-usage is due to biases
in mutation patterns, then the expectation is
that the magnitude and the direction of the
bias will be more or less the same for all
codon families and for all genes,
regardless of function or expression levels.
15
Mutational Biases
Let us assume that the mutation pattern in an
organism tends to result in AT-rich
sequences. Under such a mutational regime, it
is expected that all four-fold degenerate
codon families will exhibit a preference for
codons ending in A or T. Thus, the preferred
codons for valine should be GTA and GTT,
and the preferred codons for arginine should
be CGA and CGT.
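The AT-rich example can be made concrete with a small Python sketch. It assumes a deliberately simple model that is not stated in the lecture: the third codon position is drawn from the genomic base composition, with G and C each at half the GC content and A and T each at half the remainder. The function name and the GC content of 0.25 are hypothetical.

```python
def expected_fourfold_usage(prefix, gc_content):
    """Expected usage of the four codons sharing a two-base prefix.

    Assumes third-position bases follow genomic composition:
    P(G) = P(C) = gc/2 and P(A) = P(T) = (1 - gc)/2.
    """
    if not 0.0 <= gc_content <= 1.0:
        raise ValueError("gc_content must lie between 0 and 1")
    third_base = {
        "A": (1.0 - gc_content) / 2,
        "T": (1.0 - gc_content) / 2,
        "G": gc_content / 2,
        "C": gc_content / 2,
    }
    return {prefix + base: p for base, p in third_base.items()}


# Valine (GT_) and arginine (CG_) families in a hypothetical AT-rich genome.
for prefix in ("GT", "CG"):
    usage = expected_fourfold_usage(prefix, gc_content=0.25)
    preferred = sorted(usage, key=usage.get, reverse=True)[:2]
    print(prefix, sorted(preferred))
```

Under this model the same two third-position bases are favored in every four-fold family, which is the signature of a purely mutational bias.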
16
Mutational Biases
Some bacterial genomes (e.g., Mycoplasma
capricolum) exhibit this type of consistent
codon-usage bias.
Codon family    Amino acid    T/A in 3rd (%)    G/C in 3rd (%)
CU              LEU           93                7
GU              VAL           95                5
UC              SER           98                2
CC              PRO           95                5
AC              THR           98                2
GC              ALA           94                6
CG              ARG           100               0
GG              GLY           95                5
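Percentages like those tabulated for each codon family can be obtained from codon counts by summing the codons that end in T (U) or A. A minimal sketch, with a hypothetical function name and hypothetical counts:

```python
def third_position_at_percent(counts):
    """Percentage of codons in a family whose third base is A or T/U."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("codon family has no observations")
    at_ending = sum(n for codon, n in counts.items() if codon[2] in "ATU")
    return 100.0 * at_ending / total


# Hypothetical counts for a GU (valine) family in an AT-rich genome.
valine_counts = {"GUU": 50, "GUA": 45, "GUC": 3, "GUG": 2}
print(third_position_at_percent(valine_counts))
```

Running the same function over every four-fold family shows whether the direction and magnitude of the bias are consistent across families.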
17
Mutational Biases
In Escherichia coli, there is no such consistent
bias.
Codon family    Amino acid    T/A in 3rd (%)    G/C in 3rd (%)
CU              LEU           20                80
GU              VAL           42                58
UC              SER           49                51
CC              PRO           36                64
AC              THR           34                66
GC              ALA           39                61
CG              ARG           47                53
GG              GLY           45                65
18
(2) positive selection favoring preferred
codons
(3) purifying selection against disfavored
codons
19
(2) positive selection is expected to accelerate the
rate of substitution
(3) purifying selection is expected to slow down the
rate of substitution
20
There is a negative correlation between codon-usage bias
and the rate of synonymous substitution, which is the
pattern expected under purifying selection against
disfavored codons.
22
Two selective factors have been
convincingly invoked to explain
codon usage bias.
(1) translation optimization
(2) folding stability of the mRNA
24
The translation efficiency of a
codon is related to the relative
quantity of tRNA molecules that
recognize the particular codon.
25
26
Codon usage is related to translation efficiency
27
Toshimichi Ikemura
28
Is codon-usage bias uniform along the length of the mRNA?
For many highly expressed genes, codons recognized by low-abundance
tRNAs are overrepresented at the 5′ end of the coding region. This
pattern suggests that ribosomes translate slowly over roughly the
first 50 codons (the so-called ramp stage) and then translate the
remainder of the mRNA at full speed.
29
What purpose does the ramp serve in translation? Slowing translation
elongation immediately after initiation generates more uniform
spacing between ribosomes further down the mRNA, which helps prevent
ribosome congestion, translational stalling, and termination.
30
Another potential role for the ramp involves protein folding. The
length of the ramp corresponds well to the length of the polypeptide
needed to fill the exit tunnel of the ribosome, so the nascent peptide
chain should emerge from the ribosome as it transitions from the slow
ramp stage to the fast stage of elongation. This raises the possibility
that the slowdown in the ramp might increase the fraction of correctly
folded product.
31
Folding stability of the mRNA
RNA is synthesized as single strands of
ribonucleotides.
Intrastrand base pairing will produce two-dimensional (2D) structures.
32
Folding stability of the mRNA
The stability of a secondary structure is
quantified as the amount of free energy
released or used to form it. A positive free
energy means that work is required to form the
structure. A negative free energy means that
stored energy is released when it forms.
The more negative the free energy of a
structure, the more likely that structure is
to form, because more stored energy is
released.
33
Folding stability of the mRNA
Free energies are additive, so one can
determine the total free energy of a
secondary structure by adding all the
component free energies.
The local folding energy (ΔG) is computed
along the mRNA sequence using a sliding
window of 30 nucleotides (nt) in length,
moving from the start codon to the
34
Folding stability of the mRNA
ΔG = Local free energy of a sequence.
Expectation = mean local free energy of
1000 permuted sequences.
ZΔG = A measure of the extent to which a
local ΔG value deviates from expectation.
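One common way to express such a deviation is a standard score, sketched below in Python. This is an assumption: the slide says only that ZΔG measures the deviation from expectation, so dividing by the standard deviation of the permuted values is one plausible reading rather than the documented definition. The ΔG values would come from an RNA-folding program that is not specified here; the numbers in the example are hypothetical, and only three permuted values are used for brevity instead of 1000.

```python
import statistics


def z_delta_g(observed_dg, permuted_dgs):
    """Standard score of a local folding energy against permuted sequences.

    observed_dg: folding energy of the real window.
    permuted_dgs: folding energies of permuted versions of that window.
    """
    expectation = statistics.mean(permuted_dgs)
    spread = statistics.stdev(permuted_dgs)  # sample standard deviation
    if spread == 0:
        raise ValueError("permuted energies have zero variance")
    return (observed_dg - expectation) / spread


# Hypothetical energies (kcal/mol): the real window is less negative,
# i.e. less stable, than its permuted counterparts.
print(z_delta_g(-4.0, [-6.0, -7.0, -8.0]))
```

Under this reading, a less negative ΔG than expected yields a positive score and a more negative ΔG yields a negative score.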
35
Folding stability of the mRNA
A positive ZΔG means that local mRNA stability is reduced.
A negative ZΔG means that local mRNA stability is increased.
36
Codon arrangement along the mRNA
The arrangement of different codons along the length of the mRNA
influences translation efficiency.
In the autocorrelated pattern, when an amino acid recurs in the
protein, there is a strong propensity to use the same codon the
second time as that for the first occurrence of the amino acid.
In the anticorrelated pattern, when an amino acid recurs in the
protein, there is a strong tendency to use a different codon the
second time from that used in the first occurrence of the amino
acid.
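A simple way to probe for the two patterns is to measure how often a recurring amino acid reuses the codon from its previous occurrence. The Python sketch below does this; the function name, the small codon-to-amino-acid map, and the example codon lists are hypothetical. Interpreting the fraction as autocorrelated or anticorrelated would require a comparison with the value expected from overall codon frequencies, which is not computed here.

```python
def codon_reuse_fraction(codons, codon_to_amino_acid):
    """Fraction of amino acid recurrences that reuse the previous codon."""
    last_codon = {}  # amino acid -> codon used at its most recent occurrence
    same = 0
    recurrences = 0
    for codon in codons:
        amino_acid = codon_to_amino_acid[codon]
        if amino_acid in last_codon:
            recurrences += 1
            if codon == last_codon[amino_acid]:
                same += 1
        last_codon[amino_acid] = codon
    if recurrences == 0:
        raise ValueError("no amino acid occurs more than once")
    return same / recurrences


# Hypothetical two-amino-acid map and codon lists.
CODE = {"CUG": "Leu", "UUA": "Leu", "GUU": "Val", "GUC": "Val"}
reused = ["CUG", "GUU", "CUG", "GUU", "CUG"]
switched = ["CUG", "GUU", "UUA", "GUC", "CUG"]
print(codon_reuse_fraction(reused, CODE), codon_reuse_fraction(switched, CODE))
```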
37
38
Some organisms display
biased codon usage; others do
not.
Certain organisms, such as the bacterium Helicobacter
pylori and humans, present little evidence of translational
selection, while others, such as the bacterium Escherichia
coli, the yeast Saccharomyces cerevisiae, the nematode
Caenorhabditis elegans, and the fly Drosophila
melanogaster, show a marked codon bias due to selection.
39
A possible solution to this puzzle was suggested by
dos Reis et al. (2004), who discovered that tRNA-gene
redundancy and genome size are interacting forces in
determining translational selection and codon-usage bias.
They suggested that an optimal combination of these
factors exists for which the action of translational
selection is maximal.
40
The magnitude of selection was maximal in genomes 1-30
Mb in size that contain 150-600 tRNA-specifying genes.
Both Helicobacter pylori and humans fall outside this
range.
The genome of Helicobacter pylori contains only 36
tRNA-coding genes (only one tRNA-gene having two
copies).
The haploid genome size of humans is approximately
3,500 Mb.
41
Subramanian S. 2008. Genetics 178:2429-2432