Mutation - NIU Department of Biological Sciences
Changes in DNA
Any change in the DNA sequence of an organism is a mutation.
Mutation is a decay force whose ultimate roots are in the second law of
thermodynamics (entropy). Living things survive inevitable mutations by a
combination of being tolerant of a certain level of mutation, repairing
mutational damage, killing cells that are mutated beyond repair, and relying
on natural selection to remove individuals with unfavorable mutations.
Mutations are the source of the altered versions of genes that provide the
raw material for evolution.
A central tenet of biology is that the flow of information from DNA to protein
is one way. DNA cannot be altered in a directed way by changing the
environment. Only random DNA changes occur.
Some terminology: the genotype is the organism’s genetic constitution (at
bottom, the sequence of its DNA). The phenotype is the physical
characteristics of the organism: its appearance, biochemistry, reactions to
the environment, etc.
– before DNA sequencing, the genotype was deduced from the phenotypes of
parents and offspring.
– the point of genome annotation is to deduce the phenotype that will result from a
given DNA sequence.
More Mutation Generalities
• Most mutations have no effect on the organism, especially among
the eukaryotes, because a large portion of the DNA is not in genes
and thus does not affect the organism’s phenotype.
• Even within genes, mutations can have little or no effect
– the genetic code is degenerate: some mutated codons are translated into the
same amino acid
– many amino acid changes have little or no effect on protein function.
• Of the mutations that do affect the phenotype, the most common
effect is lethality, because most genes are necessary for survival.
• From a bioinformatics point of view, the three simplest types of
mutation: base substitution, small insertions and deletions (indels),
and simple sequence repeats, affect sequence alignment programs.
Larger mutations such as transposable element movements,
recombination-induced mutations, and general chromosome
rearrangements, affect large scale issues such as genomic maps.
Base Change Mutations
The simplest mutations are base changes,
where one base is converted to another.
(Also called “substitutions”, or “point
mutations”.) These can be classified as either:
– “transitions”, where one purine is changed to
another purine (A -> G, for example), or one
pyrimidine is changed to another pyrimidine (T -> C, for example).
– “transversions”, where a purine is substituted for
a pyrimidine, or a pyrimidine is substituted for a
purine. For example, A -> C.
Transitions are more common than
transversions, because they are easier to
create, and because transitions often have
less drastic effects than transversions.
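The transition/transversion distinction can be sketched in a few lines of Python. This is only an illustration: the function name `classify_substitution` is made up for this example, and it assumes plain DNA bases (A, C, G, T).

```python
# Purines have two rings, pyrimidines have one.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref: str, alt: str) -> str:
    """Label a single-base change as a transition or a transversion."""
    ref, alt = ref.upper(), alt.upper()
    if ref not in PURINES | PYRIMIDINES or alt not in PURINES | PYRIMIDINES:
        raise ValueError("expected DNA bases A, C, G or T")
    if ref == alt:
        raise ValueError("the two bases are identical: not a substitution")
    # Same chemical class on both sides -> transition.
    if {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES:
        return "transition"
    # Purine <-> pyrimidine -> transversion.
    return "transversion"


print(classify_substitution("A", "G"))  # purine to purine
print(classify_substitution("A", "C"))  # purine to pyrimidine
```

Each base has one possible transition partner and two possible transversion partners, so the higher frequency of transitions is not what random chance alone would give.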
Base change mutations are the cause of
single nucleotide polymorphisms (SNPs).
Mapping SNPs is the current best way to
locate human disease genes.
Base change mutations are the most common
mutations, and they are the easiest to handle
for statistics and evolutionary studies.
Base Change Causes
• Base changes occur naturally as errors in replication: the
wrong base gets inserted.
– DNA polymerase has an editing function that detects most
errors, then backs up, removes the wrong base and puts in the correct one.
– enzymes that replicate RNA don’t have the editing function, so
their error rate is 100 x that of DNA polymerase, causing the high
mutation rate of RNA viruses.
Various chemical changes in a base can cause
mutation. For instance, the spontaneous loss of the
amino group on cytosine converts it to uracil (which will
pair with A, not G).
• environmental chemicals that attach bulky groups onto
bases (alkylating agents) can cause the bases to be misread by DNA polymerase.
Phenotypic Effects of Base Changes
Mutations can be classified according to their effects on the protein (or mRNA)
produced by the gene that is mutated.
1. Silent mutations (synonymous mutations). Since the genetic code is degenerate,
several codons produce the same amino acid. Especially, third base changes often
have no effect on the amino acid sequence of the protein. These mutations affect the
DNA but not the protein. Therefore they are called neutral mutations, mutations
which should have no effect on the organism’s phenotype.
2. Missense mutations. Missense mutations substitute one amino acid for another.
Some missense mutations have very large effects, while others have minimal or no
effect. It depends on where the mutation occurs in the protein’s structure, and how
big a change in the type of amino acid it is.
3. Nonsense mutations convert an amino acid codon into a stop codon. The effect is to
shorten the resulting protein. Sometimes this has only a little effect, as the ends of
proteins are often relatively unimportant to function. However, often nonsense
mutations result in completely non-functional proteins.
4. Sense mutations are the opposite of nonsense mutations. Here, a stop codon is
converted into an amino acid codon. Since DNA outside of protein-coding regions
contains an average of 3 stop codons per 64, the translation process usually stops
after producing a slightly longer protein.
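The four categories above can be sketched as a lookup against the standard genetic code. This is a minimal illustration: the helper name `classify_codon_change` is made up here, and the code assumes the standard nuclear code and DNA codons (T rather than U).

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order; "*" marks a stop.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def classify_codon_change(ref_codon: str, alt_codon: str) -> str:
    """Classify a codon change by its effect on the encoded amino acid."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "silent"      # same amino acid (or stop stays stop)
    if alt_aa == "*":
        return "nonsense"    # amino acid codon -> stop codon
    if ref_aa == "*":
        return "sense"       # stop codon -> amino acid codon
    return "missense"        # one amino acid replaced by another


print(classify_codon_change("GAA", "GAG"))  # Glu -> Glu, third-base change
print(classify_codon_change("GAA", "GTA"))  # Glu -> Val
print(classify_codon_change("TAC", "TAA"))  # Tyr -> stop
print(classify_codon_change("TAA", "TAC"))  # stop -> Tyr
```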
Base changes can also affect RNA initiation, splicing and termination.
More on Substitution
In addition to synonymous
mutations, some amino acid
changes are “conservative” in
that they have little or no effect
on the protein’s function.
For example, isoleucine and
valine are both hydrophobic
and readily substitute for each
other.
Other amino acid substitutions
are very unlikely: leucine
(hydrophobic) for aspartic acid
(hydrophilic and charged). This
would be a non-conservative
substitution.
Some amino acids play unique
roles: cysteines form disulfide
bridges, prolines induce kinks
in the chain, etc.
However, some amino acids
are critical for active sites and
cannot be substituted.
Tables of substitution
frequencies for all pairs of
amino acids have been
compiled.
BLOSUM62 Table. Numbers on the diagonal
indicate the likelihood of the amino acid
staying the same. The off-diagonal numbers
are relative substitution frequencies.
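As a rough illustration of how such a table is used, the sketch below looks up a handful of BLOSUM62 entries. Only five pairs are included, copied from the published matrix (whose entries are rounded log-odds scores). The dictionary name, the function names and the "score > 0 means conservative" cutoff are assumptions made for this example, not part of BLOSUM itself.

```python
# A tiny excerpt of BLOSUM62. Keys are unordered amino acid pairs,
# so frozenset("IV") covers both I->V and V->I; frozenset("C") is C->C.
BLOSUM62_EXCERPT = {
    frozenset("IV"): 3,    # isoleucine / valine: both hydrophobic
    frozenset("IL"): 2,    # isoleucine / leucine
    frozenset("LD"): -4,   # leucine / aspartic acid: hydrophobic vs charged
    frozenset("C"): 9,     # cysteine unchanged (diagonal entry)
    frozenset("W"): 11,    # tryptophan unchanged (diagonal entry)
}


def blosum_score(a: str, b: str) -> int:
    """Look up the score for a pair; raises KeyError if not in the excerpt."""
    return BLOSUM62_EXCERPT[frozenset((a.upper(), b.upper()))]


def is_conservative(a: str, b: str) -> bool:
    """Illustrative rule of thumb: a positive score counts as conservative."""
    return blosum_score(a, b) > 0


print(blosum_score("I", "V"), is_conservative("I", "V"))
print(blosum_score("L", "D"), is_conservative("L", "D"))
```

A real analysis would load the full 20 x 20 matrix from a bioinformatics library or from the matrix files distributed with alignment programs rather than typing in values.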
Another simple type of mutation is the gain
or loss of one or a few bases. These
mutations are called indels, which is short
for “insertion/deletion”.
– When comparing two species it isn’t easy to
tell whether an insertion occurred in one
species or a deletion occurred in the other.
Indels are thought to be generated when
the DNA polymerase slips forward or
backward on the template DNA it is
copying.
– This occurs most easily in repeated
sequences, but can occur anywhere.
A second cause of short indels is
chemical- or radiation-induced loss of the
base portion of the nucleotide. The DNA
polymerase often skips right over these
sugar/phosphate stumps, leaving a
missing base in the resulting DNA chain.
Frameshifts and Reversions
Translation occurs codon by codon,
examining nucleotides in groups of 3.
If a nucleotide or two is added or
removed, the grouping of the codons
is altered. This is a frameshift
mutation, where the reading frame of
the ribosome is altered.
Frameshift mutations result in all
amino acids downstream from the
mutation site being completely
different from wild type. These
proteins are generally non-functional.
A reversion is a second mutation that
reverses the effects of an initial
mutation, bringing the phenotype back
to wild type (or almost).
Frameshift mutations sometimes have
“second site reversions”, where a
second frameshift downstream from the
first frameshift reverses the effect.
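A frameshift and its second-site reversion can be sketched with a toy coding sequence. Everything here is invented for illustration: the 24-base sequence, the positions of the inserted and deleted bases, and the helper name `translate`. The standard genetic code is assumed.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order; "*" marks a stop.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate codon by codon from position 0, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Toy wild-type gene: ATG GCC GAG ACC GGT CTG AAA TAA
wild_type = "ATGGCCGAGACCGGTCTGAAATAA"

# Frameshift: insert one base (A) after the fourth nucleotide.
frameshift = wild_type[:4] + "A" + wild_type[4:]

# Second-site reversion: delete one base a little further downstream,
# which restores the original reading frame from that point on.
revertant = frameshift[:10] + frameshift[11:]

print(translate(wild_type))   # MAETGLK
print(translate(frameshift))  # MDRDRSEI
print(translate(revertant))   # MDRAGLK
```

In the revertant only the amino acids between the two mutation sites differ from wild type, which is why the phenotype can return to wild type "or almost".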
Microsatellites/Simple Sequence Repeats
Two words for the same phenomenon.
During replication, DNA polymerase can “stutter” when it replicates several tandem
copies of a short sequence, say 2-5 bp.
For example, CAGCAGCAGCAG, 4 copies of CAG, will occasionally be converted to 3
copies or 5 copies by DNA polymerase stuttering.
Outside of genes, this effect produces useful genetic markers called SSRs (simple
sequence repeats). They are heavily used in genetic mapping, for several reasons:
– They are easy to detect.
– They are fairly stable across generations, yet have a high enough mutation rate that many
alleles exist in the population.
– They are found in many locations in the genomes of all organisms.
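Counting the copies of a repeat unit is a simple pattern-matching problem. The sketch below uses Python's standard `re` module; the function name `longest_tandem_run` is made up for this example, and it only counts perfect, uninterrupted copies of the unit.

```python
import re


def longest_tandem_run(seq: str, unit: str) -> int:
    """Return the largest number of back-to-back copies of `unit` found in `seq`."""
    # (?:unit)+ matches one or more consecutive copies of the repeat unit.
    pattern = re.compile(f"(?:{re.escape(unit)})+")
    runs = [m.group(0) for m in pattern.finditer(seq)]
    if not runs:
        return 0
    return max(len(run) for run in runs) // len(unit)


print(longest_tandem_run("CAGCAGCAGCAG", "CAG"))       # 4 copies
print(longest_tandem_run("TTCAGCAGCAGCAGCAGAA", "CAG"))  # 5 copies after a "stutter"
```

Two alleles at the same SSR locus would give different counts, which is what makes these repeats usable as genetic markers.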
Within a gene, this effect can cause certain amino acids to be repeated many times
within the protein. In some cases this causes disease.
Huntington Disease. A dominant
autosomal disease, with most people
Onset usually in middle age.
Neurological: starts with irritability and
depression, includes fidgety behavior and
involuntary movement (chorea), followed
by psychosis and death.
Caused by CAG repeats within the coding
region, giving a tract of glutamines.
Below 28 copies is normal, between 28
and 34 copies is the premutation allele:
normal phenotype but unstable copy
number that puts the next generation at
risk. Above 34 copies gives the disease.
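The copy-number ranges just given can be written as a small classifier. The cutoffs below are taken directly from the text above and are for illustration only; published clinical cutoffs differ slightly and are not reproduced here. The function name `classify_hd_allele` is made up for this example.

```python
def classify_hd_allele(cag_copies: int) -> str:
    """Classify an allele by CAG copy number, using the ranges quoted in the text."""
    if cag_copies < 0:
        raise ValueError("copy number cannot be negative")
    if cag_copies < 28:
        return "normal"
    if cag_copies <= 34:
        # Normal phenotype, but unstable copy number in the next generation.
        return "premutation"
    return "disease"


for copies in (20, 28, 34, 35, 50):
    print(copies, classify_hd_allele(copies))
```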
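HD shows “anticipation”: the age of onset
tends to get earlier with each generation. This is
due to an inverse correlation between copy
number and age of onset (more copies, earlier
onset), combined with the tendency of the
unstable repeat to expand between generations.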
There is a genetic test for the disease, but
in the absence of effective treatment few
actually take the test.
The function of the protein remains unknown;
the excess glutamines may cause it to
aggregate and lose function.
Larger Scale Mutations
Larger mutations include insertion of whole new
sequences, often due to movements of transposable
elements in the DNA or to chromosome changes such
as inversions or translocations.
Deletions of large segments of DNA also occur.
These phenomena affect the order of genes on the
chromosome.
– In classical genetics, synteny means that two genes are
on the same chromosome. This term has a slightly
different meaning in genomics and bioinformatics: that a
group of genes are in the same order on the chromosome
in different species.
– Synteny tends to be conserved in closely related species,
but breaks down in more distantly related species.
Also, the genes at the breakpoints of a large scale
mutation are often broken in half or otherwise
disrupted.
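Synteny in the genomic sense (genes in the same order in different species) can be checked with a simple scan over gene orders. This is a minimal sketch under strong assumptions: each gene name appears once per species, orthologs share the same name, and only blocks in the same orientation are detected (an inverted block would be missed). The function name `syntenic_blocks` and the gene names are hypothetical.

```python
def syntenic_blocks(order_a, order_b, min_len=2):
    """Return runs of genes that are adjacent and in the same order in both lists."""
    position_in_b = {gene: i for i, gene in enumerate(order_b)}
    blocks, current = [], []
    for gene in order_a:
        extends_block = (
            gene in position_in_b
            and current
            and position_in_b[gene] == position_in_b[current[-1]] + 1
        )
        if extends_block:
            current.append(gene)
            continue
        # The run is broken: keep it if long enough, then start a new one.
        if len(current) >= min_len:
            blocks.append(current)
        current = [gene] if gene in position_in_b else []
    if len(current) >= min_len:
        blocks.append(current)
    return blocks


species_a = ["g1", "g2", "g3", "g4", "g5", "g6"]
# Species B carries a rearrangement that swapped the two halves.
species_b = ["g4", "g5", "g6", "g1", "g2", "g3"]
print(syntenic_blocks(species_a, species_b))
```

The more rearrangements separate two species, the shorter the blocks this kind of scan reports.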
Transposable elements are DNA sequences that
move from place to place in the genome. Unlike
genes, transposable elements don’t have a fixed
location on the chromosome.
Transposable elements are essentially parasites. In
general they don’t contribute to the evolutionary
fitness of the organism.
Most of the genes in an organism are necessary, at
least under some circumstances, for the organism’s
survival. Genes avoid being destroyed by random
mutations because individuals with mutated genes
are less fit: they don’t survive or reproduce as well as
individuals without the mutation.
Transposable elements avoid being destroyed by
increasing their numbers by enough to keep some
functional copies present even if some are
destroyed.
– However, too much increase in numbers will kill the
organism because sometimes transposable elements
insert within a gene, inactivating it.
More Transposable Elements
Two basic types: those that are strictly DNA, and those that replicate
through an RNA intermediate. These are sometimes called type 1 and
type 2, but I have a hard time keeping those arbitrary numbers
straight. The most important nomenclature issue is that the prefix
“retro-” implies the use of reverse transcriptase, which copies RNA
into DNA, the defining characteristic of RNA-intermediate
elements.
Eukaryotes often contain very short (200-500 bp) elements that
contain the ends of a longer DNA transposon and miscellaneous junk
inside. They move to new locations using the transposase enzyme
from a full length element.
Most bacterial TEs are DNA only. In eukaryotes, DNA transposable
elements occur, but are less common than retrotransposons.
– Transposable elements were first studied by Barbara
McClintock in corn. They are an important source of the
variation seen in ornamental flowers.
Most common type in bacteria: Insertion Sequences (IS)
– roughly 1-3 kbp long, containing a transposase gene, and are
bounded by short (10-40 bp) inverted repeats
– many different families, not well conserved across species
Transposons are longer TEs, usually composed of 2 IS elements and
a gene(s) in between, often an antibiotic resistance gene.
RNA transposable elements are called retrotransposons in
eukaryotes. They are characterized by the use of reverse
transcriptase in their life cycle.
They are related to retroviruses, such as HIV and feline leukemia
virus.
There are a variety of retro element types, some of which
contain long terminal repeats (LTRs) and some of which don’t.
Also, there are many non-functional, degenerate sequences
in eukaryotic genomes that started out as retrotransposons.
Retrotransposons lack the gene necessary to move outside the
cell.
Up to 25% of the human genome.
In bacteria, the common RNA TE is a “mobile group II intron”.
– When transcribed into messenger RNA they can splice
themselves out without the need for proteins
– group II introns contain a gene for reverse transcriptase,
which copies the RNA back into DNA at a new location
in the genome.
• Most recombination occurs between
homologous sites: two chromosomes
line up in meiosis and have a break-and-rejoin event at the same location,
resulting in daughter chromosomes
that contain a mixture of alleles from
both parental chromosomes.
• However, any two sites that contain
similar DNA sequences can pair up
and have a crossover. These events
can significantly rearrange the
genome.
Hemophilia A: Inversion Problems
The clotting factor VIII gene, F8, is on the X
chromosome, and mutations in it are the major cause of
hemophilia.
F8 is a large gene, and completely contained
within intron 22 are two small genes
transcribed from the opposite strand.
One of these genes, F8A, has another copy
several hundred kb away, on the opposite
strand. Thus, these two very similar genes
are in opposite orientation.
Sometimes crossing over during meiosis will
pair these regions and recombination will
occur. This results in an inversion.
The inversion completely disrupts the main
F8 gene, because its 5’ half is now inverted
and far away from its 3’ half.
This accounts for about 45% of hemophilia A
cases.
Almost all new cases arise during male
meiosis: in females, the two homologous X
chromosomes are paired, which seems to
inhibit this inversion.
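At the sequence level, an inversion replaces a segment with its reverse complement. The sketch below shows this on a toy string; the function names and coordinates are made up for illustration and are unrelated to the real F8 locus.

```python
# Translation table mapping each base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the sequence as read from the opposite strand."""
    return seq.translate(COMPLEMENT)[::-1]


def invert_segment(chromosome: str, start: int, end: int) -> str:
    """Invert chromosome[start:end] in place (0-based, end exclusive)."""
    segment = chromosome[start:end]
    return chromosome[:start] + reverse_complement(segment) + chromosome[end:]


toy = "AAACCGTTT"
inverted = invert_segment(toy, 3, 6)
print(inverted)                        # the middle CCG becomes CGG
print(invert_segment(inverted, 3, 6))  # inverting again restores the original
```

If one breakpoint falls inside a gene, as with intron 22 of F8, the part of the gene inside the inverted segment ends up in the opposite orientation and far from the rest of the gene.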
• Genes are duplicated if there is more than one copy present in the
genome.
– Some duplications are “dispersed”, found in very different locations from
each other.
– Other duplications are “tandem”, found next to each other.
• Tandem duplications play a major role in evolution, because it is
easy to generate extra copies of the duplicated genes through the
process of unequal crossing over.
– These extra copies can then mutate to take on altered roles in the cell,
or they can become pseudogenes, inactive forms of the gene, by
accumulating mutations.
• Most commonly tandem duplications affect only one gene, resulting
in an array of very similar genes.
– Sometimes duplicated regions exist within a gene, which can cause
havoc in trying to align the sequences
Unequal Crossing Over
Unequal crossing over happens during prophase
of meiosis 1. Homologous chromosomes pair at
this stage, and sometimes pairing occurs between
the similar but not identical copies of a tandem
duplication. If a crossover occurs within the
mispaired copies, one of the resulting gametes will
have an extra copy of the duplication and the
other will be missing a copy.
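The bookkeeping of unequal crossing over can be sketched with chromosomes modelled as lists of gene names. This is a toy model under stated assumptions: the crossover falls inside the two mispaired copies, so each product carries one hybrid copy, written here as "X/Y" (starts as X, ends as Y). The function name and the gene labels A, B1, B2, C are invented for the example.

```python
def unequal_crossover(chrom_a, chrom_b, i, j):
    """Cross over within copy i of chrom_a mispaired with copy j of chrom_b.

    Returns the two recombinant chromosomes as lists of gene names.
    """
    hybrid_1 = f"{chrom_a[i]}/{chrom_b[j]}"  # front of a's copy, back of b's copy
    hybrid_2 = f"{chrom_b[j]}/{chrom_a[i]}"  # front of b's copy, back of a's copy
    recombinant_1 = chrom_a[:i] + [hybrid_1] + chrom_b[j + 1:]
    recombinant_2 = chrom_b[:j] + [hybrid_2] + chrom_a[i + 1:]
    return recombinant_1, recombinant_2


# B1 and B2 are the two similar copies of a tandem duplication.
homolog = ["A", "B1", "B2", "C"]
# Mispairing: B1 on one homolog lines up with B2 on the other.
short, long_ = unequal_crossover(homolog, homolog, 1, 2)
print(short)  # one copy fewer
print(long_)  # one copy more
```

The total number of copies across the two products is unchanged; one chromosome gains what the other loses.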
As an example, the beta-globin gene cluster in
humans contains 6 genes, called epsilon (an
embryonic form), gamma-G, gamma-A (the
gammas are fetal forms), pseudo-beta-one (an
inactive pseudogene), delta (1% of adult beta-type
globin), and beta (99% of adult beta-type globin).
Gamma-G and gamma-A are very similar, differing
by only 1 amino acid.
If mispairing in meiosis occurs, followed by a
crossover between delta and beta, the hemoglobin
variant Hb-Lepore is formed. This is a gene that
starts out delta and ends as beta. Since the gene
is controlled by DNA sequences upstream from
the gene, Hb-Lepore is expressed as if it were a
delta. That is, it is expressed at about 1% of the
level that beta is expressed. Since normal beta
globin is absent in Hb-Lepore, the person has
DNA sometimes breaks due to mechanical stress,
ionizing radiation, or chemical attack.
Most organisms contain enzymes that reassemble
broken DNA molecules, a process called non-homologous
end joining.
If there is more than one break, ends are joined
randomly, which can lead to a rearranged
chromosome.
– This breaks up blocks of genes over evolutionary
time.
Horizontal Gene Transfer
In eukaryotes, there is little doubt that almost all
genes are transmitted from parent to offspring,
with each species having a separate line of
descent.
This is much less true in the prokaryotes, where a
great deal of DNA is transferred across species
boundaries.
Large exceptions: endosymbionts, the mitochondria
and chloroplasts. Many genes from these formerly
free-living organisms have migrated into the nucleus.
There are other cases of single genes being
transferred.
I have seen an estimate that 15% of all prokaryotic
genes are derived from horizontal transfers.
Horizontal gene transfer is usually identified by
performing phylogenetic lineage studies on
individual genes, and seeing that some gene has
more in common with genes in distant species
than with genes in closely related species.
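The idea can be illustrated with a much cruder stand-in for a phylogenetic study: compare percent identity of already-aligned sequences. This is only a sketch of the reasoning, not a real detection method. It assumes the sequences are pre-aligned and of equal length, and the function names and toy sequences are invented.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent of aligned positions at which two equal-length sequences match."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be aligned, non-empty and of equal length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


def looks_horizontally_transferred(gene, close_relative, distant_species):
    """Flag a gene that resembles the distant species more than the close relative."""
    return percent_identity(gene, distant_species) > percent_identity(gene, close_relative)


gene_in_focus = "ATGGCTAAAGGT"
close_relative = "ATGACTCAAGGA"   # same gene in a closely related species
distant_species = "ATGGCTAAAGGA"  # same gene in a distantly related species

print(percent_identity(gene_in_focus, close_relative))
print(percent_identity(gene_in_focus, distant_species))
print(looks_horizontally_transferred(gene_in_focus, close_relative, distant_species))
```

Real studies build a tree for each gene and compare it with the species tree, since raw identity can be misleading when genes evolve at different rates.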
Sources of New DNA
Bacteria reproduce by binary
fission: replicating their DNA,
then splitting in half. Each
cell has only 1 parent, and
there is no regular sexual
process.
Bacteria have 3 main ways
of bringing in new DNA:
– conjugation: direct transfer
of DNA between 2 cells
(although not necessarily of
the same species)
– transduction: transfer of
DNA between cells using a
bacteriophage (virus) as an
intermediary
– transformation: the cell
takes up DNA molecules
from the environment
Bacteriophage (phage) are bacterial viruses: DNA (or RNA) surrounded by
a protein coat, but with no internal metabolic activity.
Most bacteriophage enter the cell, hijack its machinery to reproduce
themselves, and then kill the cell by lysing it (breaking it open). This is
called the lytic cycle.
Some phage have the ability to insert themselves into the bacterial genome
and remain there, inactive, for many generations: the lysogenic cycle.
– First described in phage lambda
– the inserted phage chromosome is called the prophage.
When conditions get harsh, the phage DNA comes out of the chromosome
and enters the normal lytic pathway. It reproduces and kills the host cell.
Sometimes the prophage is inactivated by mutation and becomes a
permanent part of the chromosome.