
The response of amino acid frequencies to directional mutation pressure in mitochondrial genomes
is related to the physical properties of the amino acids and to the structure of the genetic code.
Daniel Urbina, Bin Tang, Paul G Higgs.
Department of Physics, McMaster University, Hamilton, Ontario L8S 4M1, Canada.
Aims of this project - Here we study the variation in frequency of DNA bases in the protein-coding regions of
mitochondrial genomes and the corresponding variation in frequency of amino acids in the proteins.
Directional mutation pressure in DNA – The rates of mutation between the four bases are usually not equal. This
causes a mutational pressure that drives the base frequencies away from 25%. If no selection acts on the DNA, base
frequencies will reach an equilibrium determined by mutation. The frequencies of bases at synonymous sites vary
enormously in mitochondrial genomes, indicating that mutation pressure varies in direction among species.
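The equilibrium mentioned above is the stationary distribution of the mutation process. As a minimal sketch, assuming the mutation process can be written as a 4x4 matrix of base-to-base rates (the rate values below are hypothetical, chosen only for illustration), the equilibrium can be found by uniformizing the rate matrix and applying power iteration:

```python
# Equilibrium base frequencies under mutation alone (no selection).
# The rate values used below are hypothetical, not estimates from this study.

BASES = "ACGT"


def equilibrium_frequencies(rates, tol=1e-12, max_iter=1_000_000):
    """Stationary base frequencies of a 4-state mutation process.

    rates[i][j] is the mutation rate from base i to base j (diagonal ignored).
    The rate matrix Q is uniformized to P = I + Q / lam, and pi <- pi P is
    iterated until it stops changing.
    """
    n = len(rates)
    out = [sum(rates[i][j] for j in range(n) if j != i) for i in range(n)]
    lam = 2.0 * max(out)  # larger than any total outflow, so P keeps self-loops
    P = [[rates[i][j] / lam if i != j else 1.0 - out[i] / lam for j in range(n)]
         for i in range(n)]
    pi = [1.0 / n] * n  # start from equal (25%) frequencies
    for _ in range(max_iter):
        new = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(new, pi)) < tol:
            return new
        pi = new
    return pi


if __name__ == "__main__":
    # Hypothetical bias: mutations *to* T are 5x faster than mutations to A, C or G.
    weight = {"A": 1.0, "C": 1.0, "G": 1.0, "T": 5.0}
    rates = [[weight[b] if a != b else 0.0 for b in BASES] for a in BASES]
    for base, f in zip(BASES, equilibrium_frequencies(rates)):
        print(f"{base}: {f:.3f}")
```

With these made-up rates the script prints 0.125 for A, C and G and 0.625 for T: when the rate depends only on the target base, the stationary frequency of each base is proportional to its target weight (5/8 for T here), far from 25%.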
Response of amino acid frequencies – Mutation pressure alters the frequency of usage of codons in gene
sequences. Changes at non-synonymous sites cause amino acid substitutions in the proteins, and these will often be
deleterious. Selection will therefore oppose variation in the frequencies of bases and amino acids. In mitochondrial
sequences, however, amino acid frequencies are observed to vary considerably in response to base frequency changes.
Mutation pressure is thus strong enough to drive amino acid frequencies away from their optimal values.
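To see how far mutation alone would push amino acid frequencies, it helps to compute a null expectation. The sketch below is an illustrative null model, not the authors' mutation-selection model: it assumes all three codon positions share the same base frequencies (the two compositions used are hypothetical) and weights each sense codon of the vertebrate mitochondrial code by the product of its base frequencies.

```python
from itertools import product

# Vertebrate mitochondrial code: codons in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
# "*" marks stop codons (TAA, TAG, AGA, AGG); note TGA = Trp and ATA = Met.
AA_BY_CODON = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), AA_BY_CODON)}


def neutral_aa_freqs(base_freqs):
    """Amino acid frequencies expected with no selection, if every codon position
    had the base frequencies `base_freqs`; renormalized over sense codons."""
    weights = {}
    for codon, aa in CODE.items():
        if aa == "*":
            continue  # stop codons do not contribute amino acids
        p = 1.0
        for b in codon:
            p *= base_freqs[b]
        weights[aa] = weights.get(aa, 0.0) + p
    total = sum(weights.values())
    return {aa: w / total for aa, w in weights.items()}


if __name__ == "__main__":
    t_poor = {"A": 0.30, "C": 0.30, "G": 0.30, "T": 0.10}  # hypothetical
    t_rich = {"A": 0.20, "C": 0.20, "G": 0.20, "T": 0.40}  # hypothetical
    low, high = neutral_aa_freqs(t_poor), neutral_aa_freqs(t_rich)
    for aa in "FSTA":
        print(aa, round(low[aa], 3), round(high[aa], 3))
```

Under this null model, T-rich codons such as Phe (TTY) gain sharply as T rises. In the real data, first- and second-position frequencies vary over a narrower range than fourfold sites, which is the signature of selection acting against such changes.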
Influence of physical properties – Most observed amino acid substitutions are between amino acids with similar
physical properties. Selection acts less strongly against these changes because they have a smaller effect on protein
structure and function. Here we will show that the physical properties of the amino acids determine the degree to
which amino acid frequencies can respond to mutation pressure.
Image reproduced from http://users.rcn.com/jkimball.ma.ultranet/BiologyPages/A/AnimalCells.html
Mitochondria are organelles inside eukaryotic cells. They possess their own
genomes that are distinct from the main genome in the nucleus. Typical animal
mitochondrial genomes contain 13 protein-coding genes, 2 rRNAs and 22 tRNAs.
Strand asymmetry - There is an asymmetry in replication of the two DNA strands in mitochondrial genomes. The
strands are subject to different mutational pressures and the base frequencies are not equal on the two strands. All the
data in this study refer to frequencies on the plus strand of the genome, which codes for the majority of genes.
This is the front page of OGRe, our relational database for the comparative analysis of mitochondrial
genomes. OGRe contains information on gene sequences, gene order and genome rearrangements.
Please visit OGRe on-line at http://ogre.mcmaster.ca
Part 4 – Vertebrate mitochondrial genetic code

                               SECOND POSITION
FIRST      T             C             A             G             THIRD
POSITION                                                           POSITION
T          TTT F  1      TCT S  6      TAT Y 10      TGT C 17      T
           TTC F         TCC S         TAC Y         TGC C         C
           TTA L  2      TCA S         TAA Stop      TGA W 18      A
           TTG L         TCG S         TAG Stop      TGG W         G
C          CTT L         CCT P  7      CAT H 11      CGT R 19      T
           CTC L         CCC P         CAC H         CGC R         C
           CTA L         CCA P         CAA Q 12      CGA R         A
           CTG L         CCG P         CAG Q         CGG R         G
A          ATT I  3      ACT T  8      AAT N 13      AGT S 20      T
           ATC I         ACC T         AAC N         AGC S         C
           ATA M  4      ACA T         AAA K 14      AGA Stop      A
           ATG M         ACG T         AAG K         AGG Stop      G
G          GTT V  5      GCT A  9      GAT D 15      GGT G 21      T
           GTC V         GCC A         GAC D         GGC G         C
           GTA V         GCA A         GAA E 16      GGA G         A
           GTG V         GCG A         GAG E         GGG G         G
This is the genetic code used in vertebrate mitochondrial DNA. It shows the
mapping between the 64 possible codons and the 20 amino acids (plus the stop
signal). The shaded boxes are the eight four-codon families (TCN Ser, CTN Leu,
CCN Pro, CGN Arg, ACN Thr, GTN Val, GCN Ala and GGN Gly). Third-position sites in
four-codon families are synonymous (fourfold degenerate): base changes at these
sites do not alter the amino acid. Hence, selection at these sites should be
negligible (or at least very weak). In contrast, most first- and second-position
changes are nonsynonymous, so selection should be significant at these sites.
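The argument in this caption can be checked directly against the code table. The sketch below (the helper names are our own, for illustration) enumerates every single-base change between sense codons of the vertebrate mitochondrial code, reports the fraction that is synonymous at each codon position, and lists the four-codon families. Changes to stop codons are excluded, which is an assumption of this sketch.

```python
from itertools import product

# Vertebrate mitochondrial code in TCAG order; "*" = stop (TAA, TAG, AGA, AGG).
AA_BY_CODON = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), AA_BY_CODON)}


def synonymous_fraction_by_position():
    """For each codon position, the fraction of single-base changes between
    sense codons that leave the amino acid unchanged."""
    syn = [0, 0, 0]
    total = [0, 0, 0]
    for codon, aa in CODE.items():
        if aa == "*":
            continue
        for pos in range(3):
            for base in "TCAG":
                if base == codon[pos]:
                    continue
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODE[mutant] == "*":
                    continue  # changes to stop codons are left out
                total[pos] += 1
                syn[pos] += CODE[mutant] == aa
    return [s / t for s, t in zip(syn, total)]


def fourfold_families():
    """Two-base prefixes whose third position is fourfold degenerate."""
    prefixes = ("".join(x) for x in product("TCAG", repeat=2))
    return sorted(p for p in prefixes if len({CODE[p + b] for b in "TCAG"}) == 1)


if __name__ == "__main__":
    for pos, frac in enumerate(synonymous_fraction_by_position(), start=1):
        print(f"position {pos}: {frac:.2f} of changes are synonymous")
    print("four-codon families:", fourfold_families())
```

In this code no second-position change is synonymous, and the only synonymous first-position changes are between the two leucine blocks (TTR and CTR), so nearly all selection on protein sequences acts through first- and second-position changes.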
Amino acid  Vol.  Bulk.  Polarity  pI     Hyd.1  Hyd.2  Surface Area  Fract. Area
Ala A       67    11.50  0.00      6.00   1.8    1.6    113           0.74
Arg R       148   14.28  52.00     10.76  -4.5   -12.3  241           0.64
Asn N       96    12.28  3.38      5.41   -3.5   -4.8   158           0.63
Asp D       91    11.68  49.70     2.77   -3.5   -9.2   151           0.62
Cys C       86    13.46  1.48      5.05   2.5    2.0    140           0.91
Gln Q       114   14.45  3.53      5.65   -3.5   -4.1   189           0.62
Glu E       109   13.57  49.90     3.22   -3.5   -8.2   183           0.62
Gly G       48    3.40   0.00      5.97   -0.4   1.0    85            0.72
His H       118   13.69  51.60     7.59   -3.2   -3.0   194           0.78
This is a table of 8 physical properties of amino
acids that are thought to influence protein folding
and function (volume, bulkiness, polarity,
hydrophobicity, etc.). Using principal component
analysis (PCA), we projected this 8-dimensional
space onto two dimensions, so that the similarities
between the amino acids can be clearly visualized.
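The projection described in this caption can be sketched as follows. This is a minimal, dependency-free illustration, not the authors' code: it uses only the nine rows reproduced in the table (the poster's plot covers all twenty amino acids), scales each property to unit variance, and finds the first two principal components by power iteration with deflation instead of a library eigensolver.

```python
import math

# Rows copied from the table (nine amino acids only). Columns: Vol., Bulk.,
# Polarity, pI, Hyd.1, Hyd.2, Surface Area, Fract. Area.
PROPS = {
    "A": [67, 11.50, 0.00, 6.00, 1.8, 1.6, 113, 0.74],
    "R": [148, 14.28, 52.00, 10.76, -4.5, -12.3, 241, 0.64],
    "N": [96, 12.28, 3.38, 5.41, -3.5, -4.8, 158, 0.63],
    "D": [91, 11.68, 49.70, 2.77, -3.5, -9.2, 151, 0.62],
    "C": [86, 13.46, 1.48, 5.05, 2.5, 2.0, 140, 0.91],
    "Q": [114, 14.45, 3.53, 5.65, -3.5, -4.1, 189, 0.62],
    "E": [109, 13.57, 49.90, 3.22, -3.5, -8.2, 183, 0.62],
    "G": [48, 3.40, 0.00, 5.97, -0.4, 1.0, 85, 0.72],
    "H": [118, 13.69, 51.60, 7.59, -3.2, -3.0, 194, 0.78],
}


def standardize(rows):
    """Centre each column and scale it to unit (population) variance."""
    n, m = len(rows), len(rows[0])
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    sds = [math.sqrt(sum((x - mu) ** 2 for x in c) / n) for c, mu in zip(cols, means)]
    return [[(r[j] - means[j]) / sds[j] for j in range(m)] for r in rows]


def principal_components(z, k=2, iters=2000):
    """Top-k eigenvectors of the covariance of z (power iteration + deflation)."""
    n, m = len(z), len(z[0])
    cov = [[sum(row[a] * row[b] for row in z) / n for b in range(m)] for a in range(m)]
    comps = []
    for _ in range(k):
        v = [1.0 / math.sqrt(m)] * m
        for _ in range(iters):
            w = [sum(cov[a][b] * v[b] for b in range(m)) for a in range(m)]
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        lam = sum(v[a] * cov[a][b] * v[b] for a in range(m) for b in range(m))
        comps.append(v)
        # Deflation: remove this component so the next pass finds the next one.
        cov = [[cov[a][b] - lam * v[a] * v[b] for b in range(m)] for a in range(m)]
    return comps


def project(z, comps):
    """Coordinates of each standardized row along the given components."""
    return [[sum(x * c for x, c in zip(row, comp)) for comp in comps] for row in z]


if __name__ == "__main__":
    names = list(PROPS)
    z = standardize([PROPS[a] for a in names])
    for a, (pc1, pc2) in zip(names, project(z, principal_components(z))):
        print(f"{a}: PC1={pc1:+.2f} PC2={pc2:+.2f}")
```

The signs of principal components are arbitrary, so a plot made this way may be mirrored relative to the poster's figure without any change in which amino acids cluster together.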
The PCA plot shows that the amino acids FLIMV in the first column of the genetic code table form a tight cluster with
very similar physical properties. This is also true for SPTA in the second column. Most of the third-column amino acids
are fairly similar to one another. Surprisingly, the fourth-column amino acids are all very different. There is also no
particular similarity between amino acids in the same row of the genetic code.
For each of the 473 species in OGRe, we measured T4 (the frequency of T at the fourfold-degenerate
sites), and T1 and T2 (the frequency of T at first and second positions). T4 varies enormously due to
mutational pressure, from less than 10% to more than 90%. T1 and T2 vary almost linearly with T4,
but over a narrower range. This shows that both mutation and selection influence T1 and T2. By
fitting a mutation-selection model to the data, we can estimate the relative strength of mutation and
selection. The slope for T2 is less than for T1, which shows that selection against second-position
substitutions is stronger than that against first-position substitutions. Similar plots are also seen
for A, C and G.
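The measurement of T1, T2 and T4 can be sketched as below. This is an illustrative helper (the function name is ours), assuming the input is the in-frame coding sequence of a gene on the plus strand; how genes on the other strand are handled is not described here, so that case is not covered. Codons in the eight four-codon families of the vertebrate mitochondrial code are treated as the fourfold-degenerate sites, and stop or ambiguous codons are skipped.

```python
from itertools import product

# Vertebrate mitochondrial code in TCAG order; "*" = stop.
AA_BY_CODON = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), AA_BY_CODON)}

# Prefixes of the four-codon families: the third base never changes the amino acid.
FOURFOLD = {p for p in ("".join(x) for x in product("TCAG", repeat=2))
            if len({CODE[p + b] for b in "TCAG"}) == 1}


def position_frequencies(seq, base="T"):
    """Return (f1, f2, f4): frequency of `base` at first positions, second
    positions and fourfold-degenerate third positions of sense codons."""
    seq = seq.upper()
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    sense = [c for c in codons if CODE.get(c, "*") != "*"]  # drop stops/ambiguous
    if not sense:
        raise ValueError("no complete sense codons")
    f1 = sum(c[0] == base for c in sense) / len(sense)
    f2 = sum(c[1] == base for c in sense) / len(sense)
    fourfold = [c for c in sense if c[:2] in FOURFOLD]
    f4 = sum(c[2] == base for c in fourfold) / len(fourfold) if fourfold else float("nan")
    return f1, f2, f4


if __name__ == "__main__":
    # Toy sequence for illustration: TTT (F), CTT (L), GCA (A), ACG (T).
    print(position_frequencies("TTTCTTGCAACG"))
```

Calling the same function with base="A", "C" or "G" gives the corresponding frequencies for the other bases; one point per genome, plotted against f4, reproduces the kind of comparison described above.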
On the left, we show the variation in the frequencies
of three amino acids in response to the variation in
T4. Serine shows a significant increase; threonine
shows a significant decrease; and alanine shows no
trend. The direction and magnitude of these trends are
influenced by mutations at all three codon positions.
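This explains what we saw in part 5. A first-position change leaves the second base unchanged, so it moves between amino
acids in the same column of the genetic code; the similarity between amino acids in columns 1, 2 and 3 means that many
first-position substitutions are only weakly selected against. A second-position change keeps the first and third bases, so
it moves along a row; the dissimilarity between amino acids in the same row means that second-position substitutions are
more strongly selected against.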
Key point – The amino acids in the first two columns (numbers
1-9) have large slopes that may be either positive or negative,
i.e. they are responsive to mutational pressure. The amino acids
in the third and fourth columns (numbers 10-21) have slopes
close to zero, i.e. they are non-responsive.
Hypothesis – An amino acid will respond significantly to
mutational pressure at the DNA level if there are neighbouring
amino acids in the genetic code to which it can mutate that have
similar physical properties. If the neighbouring amino acids are
very different in properties, selection will oppose these
mutations, and the amino acid will not be responsive to
mutation pressure.
Responsiveness – We measured the slope
for each amino acid against each of the four
base frequencies for two independent data
sets (fish and mammals). We define the
responsiveness of an amino acid as the root
mean square value of these 8 slopes.
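The two steps in this definition, a regression slope for each base frequency and a root mean square over the eight slopes, can be sketched as follows. Ordinary least squares is assumed for the linear regression, and all data values in the example are hypothetical.

```python
import math


def ols_slope(x, y):
    """Least-squares slope of y regressed on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


def responsiveness(slopes):
    """Root mean square of an amino acid's slopes (8 in the poster:
    4 base frequencies x 2 data sets)."""
    return math.sqrt(sum(s * s for s in slopes) / len(slopes))


if __name__ == "__main__":
    # Hypothetical data: one amino acid's frequency in five genomes vs T4.
    t4 = [0.10, 0.30, 0.50, 0.70, 0.90]
    freq = [0.060, 0.068, 0.075, 0.084, 0.091]
    print("slope vs T4:", round(ols_slope(t4, freq), 4))
    # Hypothetical slopes vs A, C, G, T for fish, then for mammals.
    slopes = [0.04, -0.01, -0.02, 0.04, 0.03, -0.02, -0.01, 0.05]
    print("responsiveness:", round(responsiveness(slopes), 4))
```

Squaring before averaging means that large positive and large negative slopes both count as responsive, matching the key point that responsive amino acids may have slopes of either sign.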
Proximity – We define the distance dij
between any pair of amino acids as the
Euclidean distance between them in the 8-dimensional
physical property space (after normalizing
each property to have unit variance). We
then define the proximity of an amino acid
as the mean of 1/dij for all its neighbouring
amino acids (i.e. those accessible by a single
mutation in the DNA). A high-proximity
amino acid is one whose neighbours have
similar physical properties.
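The proximity definition can be sketched as below. This is an illustrative version under stated assumptions: neighbours are found from the vertebrate mitochondrial code by single-base changes, changes to stop codons are ignored, the two serine blocks are merged into one amino acid (the poster numbers them separately), and the property vectors passed in are assumed to be already normalized to unit variance. The toy property values in the example are hypothetical.

```python
import math
from itertools import product

# Vertebrate mitochondrial code in TCAG order; "*" = stop.
AA_BY_CODON = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), AA_BY_CODON)}


def neighbours(aa):
    """Amino acids reachable from any codon of `aa` by one base change."""
    result = set()
    for codon, a in CODE.items():
        if a != aa:
            continue
        for pos in range(3):
            for base in "TCAG":
                other = CODE[codon[:pos] + base + codon[pos + 1:]]
                if other not in ("*", aa):
                    result.add(other)
    return result


def proximity(aa, props):
    """Mean of 1/d_ij over the neighbours j of aa that appear in `props`,
    where d_ij is the Euclidean distance between property vectors."""
    inv = [1.0 / math.dist(props[aa], props[j]) for j in neighbours(aa) if j in props]
    if not inv:
        raise ValueError(f"no neighbours of {aa} in the property table")
    return sum(inv) / len(inv)


if __name__ == "__main__":
    print("neighbours of D:", sorted(neighbours("D")))
    # Hypothetical 2-D normalized property vectors, for illustration only.
    toy = {"D": (0.0, 0.0), "E": (0.5, 0.0), "N": (1.0, 0.0), "G": (2.0, 0.0)}
    print("proximity of D:", proximity("D", toy))
```

Because the score averages inverse distances, a single very similar neighbour (such as Glu for Asp in the toy values) raises proximity strongly, which fits the hypothesis that one easily reachable, similar neighbour is enough to make an amino acid responsive.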
On the right, we show the slope of the linear regression of each
of the amino acid frequencies against each of the four base
frequencies. The amino acids are numbered in the order they
appear in the genetic code diagram (see part 4). Note that serine
has two separate blocks and is thus numbered twice.
Filled symbols are data points from fish genomes. Open
symbols are derived from a mutation-selection model.
The solid and dashed lines are linear regressions through
the data and theory points, respectively.
Result – The graph shows that there is a strong correlation between Proximity and Responsiveness (R = 0.86, p < 10⁻⁶). This
confirms the hypothesis in part 8, and means that physical properties have a direct influence on evolutionary properties.
Squares are data points from fish genomes.
Triangles are derived from a mutation-selection model.
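For reference, the correlation coefficient and its usual significance test can be sketched as follows. This assumes R is Pearson's coefficient, which the poster does not state explicitly. The p-value would then come from the t distribution with n - 2 degrees of freedom (for example via scipy.stats.t.sf), which this standard-library-only sketch does not compute.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for testing r != 0, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r * r))


if __name__ == "__main__":
    # Hypothetical proximity and responsiveness values for five amino acids.
    prox = [0.35, 0.42, 0.50, 0.61, 0.70]
    resp = [0.010, 0.018, 0.016, 0.030, 0.041]
    r = pearson_r(prox, resp)
    print("R =", round(r, 3), " t =", round(t_statistic(r, len(prox)), 2))
```

As an illustration only, if R were computed over 21 points (an assumption; one per numbered entry in the genetic code diagram), R = 0.86 would give t ≈ 7.35 with 19 degrees of freedom.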
Summary – The frequencies of bases and amino acids in mitochondrial genomes vary in a complex way due to the action of
directional mutation pressure on the DNA and stabilizing selection pressure on the protein sequences. Our model of mutation-selection balance explains the trends seen in these frequencies (see 5, 7 and 8). We developed a measure of similarity between
amino acids that enabled us to make quantitative predictions about the responsiveness of the different amino acids to mutation
pressure (see 6 and 9). This work also reveals non-random patterns of similarity between neighbouring amino acids in the
genetic code that are of interest from the point of view of the evolution of the genetic code itself.