Transcript 2 - UPCH

EVOLUTION:
PRINCIPLES AND
QUANTIFICATION
Human nuclear genome
Only 3% coding DNA
The Evolutionary forces:
• Natural selection (Darwin, Bernardi)
• Neutral Theory: Genetic drift (Kimura)
• Small population sizes
• Mutation
– Mutationalist theory (Sueoka)
– Thermodynamic pressure theory (Zimic & Arévalo)
• Gene flow (migration), horizontal transfer
MUTATION
THE ULTIMATE SOURCE OF NEW GENETIC VARIATION.
Mutation rates are in the general range of:
Approx. 10^-7 to 10^-8 per nucleotide per generation
Approx. 10^-5 per gene per generation
Approx. 10^-3 per generation at microsatellites
BZM210: E.Willassen
Genomes, genes and molecular evolution
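[Diagram: purines (A, G) and pyrimidines (C, T). Transitions are changes within a class (A<->G, C<->T); transversions are changes between a purine and a pyrimidine.]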
Interesting links:
http://www.no.embnet.org/
http://www.ncbi.nlm.nih.gov/index.html
Transitions - transversions
[Diagram: transitions A<->G (purines) and C<->T (pyrimidines); transversions between a purine and a pyrimidine.]
Expected: TS / TV = 4 / 8 = 0.5
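The expected ratio of 4 / 8 follows from counting the 12 possible directed base changes. A minimal Python sketch of that count, plus a helper that classifies differences between two aligned sequences; the function names are illustrative and not from the lecture:

```python
from itertools import permutations

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(a: str, b: str) -> bool:
    """True when both bases are purines or both are pyrimidines."""
    return {a, b} <= PURINES or {a, b} <= PYRIMIDINES


def count_ts_tv(seq1: str, seq2: str) -> tuple[int, int]:
    """Count transitions and transversions between two aligned sequences."""
    ts = tv = 0
    for a, b in zip(seq1, seq2):
        if a == b:
            continue  # identical sites carry no substitution
        if is_transition(a, b):
            ts += 1
        else:
            tv += 1
    return ts, tv


# All 12 possible directed substitutions among the four bases.
all_changes = list(permutations("ACGT", 2))
n_ts = sum(is_transition(a, b) for a, b in all_changes)
n_tv = len(all_changes) - n_ts
expected_ratio = n_ts / n_tv
print(n_ts, n_tv, expected_ratio)  # 4 8 0.5
```

If all changes were equally likely, observed TS / TV ratios would sit near 0.5; values well above that indicate a bias toward transitions.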
TS / TV ratios
mtDNA: 9.0
α and β globins: 0.66
GENE FLOW
SPREAD OF VARIATION OVER SPACE BY
MOVEMENT AND/OR INTERMARRIAGE
AMONG PEOPLE (‘ADMIXTURE’)
INTRODUCES NEW VARIATION INTO A
POPULATION
REDUCES VARIATION BETWEEN
POPULATIONS
ADMIXTURE
IF ALLELES FROM TWO ‘PARENTAL’
POPULATIONS (1 and 2) MIX IN
PROPORTION m FROM POPULATION 1,
THE ALLELE FREQUENCY IN THE
ADMIXED POPULATION (a) WILL BE
p_a = m p_1 + (1 - m) p_2
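A small Python sketch of the admixture formula, and of its rearrangement to estimate m from observed frequencies. The function names and the numerical frequencies are hypothetical, chosen only for illustration:

```python
def admixed_frequency(m: float, p1: float, p2: float) -> float:
    """Allele frequency in the admixed population: p_a = m*p1 + (1 - m)*p2."""
    return m * p1 + (1.0 - m) * p2


def admixture_proportion(pa: float, p1: float, p2: float) -> float:
    """Solve p_a = m*p1 + (1 - m)*p2 for m (contribution of population 1)."""
    if p1 == p2:
        raise ValueError("parental frequencies must differ to estimate m")
    return (pa - p2) / (p1 - p2)


# Hypothetical frequencies: the allele is common in population 1, rare in 2.
pa = admixed_frequency(m=0.25, p1=0.8, p2=0.1)
m_hat = admixture_proportion(pa, p1=0.8, p2=0.1)
print(round(pa, 3), round(m_hat, 3))  # 0.275 0.25
```

The estimate of m is most reliable for alleles whose frequencies differ strongly between the parental populations.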
ADMIXTURE IS COMMON IN THE US
AND REFLECTS POPULATION HISTORY
[Figure: Autosomal vs. Y-specific vs. mtDNA Native American genetic contribution in the Hispanic population of San Luis Valley, Colorado. Bar chart; y-axis: % Native American contribution (0% to 100%); bars: autosomal (global), Y-chromosome, mtDNA.]
GENETIC DRIFT
ALLELE FREQUENCY CHANGE
DUE TO CHANCE FACTORS IN
SEGREGATION, SURVIVAL &
REPRODUCTION IN FINITE
POPULATIONS.
GENETIC DRIFT
INVERSELY RELATED TO POPULATION
SIZE
POSITIVELY RELATED TO TIME.
PROBABILITY OF ULTIMATE FIXATION
OF AN ALLELE IS ITS CURRENT
FREQUENCY
APPLIES TO WHOLE SPECIES, BECAUSE
TIME IS LONG
UP OR OUT IN SMALL POPULATIONS
Initial p=0.5, N=25, 80 generations
About ½ get fixed
CHANGE IS SLOWER IN BIG POPULATIONS
Initial p=0.5, N=300, 100 generations
Change is slower in
larger populations
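The two drift scenarios above (N=25 for 80 generations, N=300 for 100 generations) can be imitated with a Wright-Fisher simulation. This is a sketch under assumptions not stated on the slides: diploid individuals (2N gene copies), non-overlapping generations, and no mutation or selection. The function names, replicate count, and seed are illustrative.

```python
import random


def wright_fisher(p0: float, n_individuals: int, generations: int,
                  rng: random.Random) -> list[float]:
    """Allele-frequency trajectory under pure drift in a diploid population."""
    n_copies = 2 * n_individuals
    p = p0
    trajectory = [p]
    for _ in range(generations):
        # Binomial sampling: each of the 2N copies is drawn with probability p.
        count = sum(rng.random() < p for _ in range(n_copies))
        p = count / n_copies
        trajectory.append(p)
    return trajectory


def summarize(p0: float, n_individuals: int, generations: int,
              replicates: int = 50, seed: int = 1) -> tuple[int, int, int]:
    """Return (fixed, lost, still segregating) counts over replicate runs."""
    rng = random.Random(seed)
    finals = [wright_fisher(p0, n_individuals, generations, rng)[-1]
              for _ in range(replicates)]
    fixed = sum(f == 1.0 for f in finals)
    lost = sum(f == 0.0 for f in finals)
    return fixed, lost, replicates - fixed - lost


print("N=25, 80 generations:  ", summarize(0.5, 25, 80))
print("N=300, 100 generations:", summarize(0.5, 300, 100))
```

With a starting frequency of 0.5, roughly equal numbers of replicates should end in fixation and in loss, which matches the statement that the probability of ultimate fixation is the current frequency.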
Drift reduces variation within
populations due to fixation &
loss of neutral alleles.
Drift increases variation
between populations because
different alleles are fixed in
each population
DO HUMAN RACES EXIST?
FOUNDER EFFECT
DRIFT EFFECT ON ALLELE FREQUENCIES
WHEN A POPULATION IS FOUNDED BY A
SMALL NUMBER OF PEOPLE FROM A LARGER
POPULATION
CAN RAISE THE FREQUENCY OF A DISEASE
ALLELE BY CHANCE
e.g., RELIGIOUS ISOLATES LIKE THE AMISH
Variation, measured by heterozygosity is reduced by genetic drift due
to allele loss
Humans:
10,000 years
Effect of drift over time, random mating, no mutation:
H_t = H_0 (1 - 1/(2N))^t
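A short Python sketch of the heterozygosity decay formula, with the same assumptions as the slide (random mating, no mutation). The second helper, which inverts the formula to find how many generations are needed to lose a given share of heterozygosity, is an addition for illustration; the values N=25 and H_0=0.5 are only examples.

```python
import math


def heterozygosity(h0: float, n: int, t: int) -> float:
    """Expected heterozygosity after t generations: H_t = H_0 (1 - 1/(2N))^t."""
    return h0 * (1.0 - 1.0 / (2.0 * n)) ** t


def generations_to_fraction(n: int, fraction: float) -> float:
    """Generations needed for H_t / H_0 to fall to `fraction` (0 < fraction < 1)."""
    return math.log(fraction) / math.log(1.0 - 1.0 / (2.0 * n))


print(round(heterozygosity(0.5, 25, 80), 4))
print(round(generations_to_fraction(25, 0.5), 1))
```

Because the decay factor is (1 - 1/(2N)) per generation, the number of generations needed to halve heterozygosity grows roughly in proportion to N.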
Mean times to fixation and loss for selectively neutral alleles
Loss occurs more rapidly than fixation
Common alleles are generally old alleles
Geographically widespread alleles are usually old
GENE ‘TREES’
What happens to a DNA
sequence over time?
. . . . . and why?
(THINK OF THE DICE EXPERIMENT)
Darwin’s Tree of Life. What about a gene?
SPECIATION
A reduction in gene flow between populations,
accompanied by divergent selection and/or
genetic drift, can lead to speciation.
Evolutionary history includes the
transformation and divergence of lineages
Phylogenetic evolution or anagenesis
Looking forward in time, a
sequence will diverge among
descendant copies, because of the
accumulation of mutations.
Looking backward in time,
present-day sequences
coalesce
to a common ancestor in the past.
[Figure: DEMOGRAPHIC HISTORY OF GENE COPIES. Genealogy running from THEN to NOW through time points T1 to T5, with coalescent events (common ancestry) joining the lineages of six samples (1 to 6).]
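[Figure: MUTATION HISTORY OF ALLELES. Genealogy running from THEN to NOW through time points T1 to T5, starting from allele A1, with mutations on the branches; alleles sampled: A2, A3, A1, A1, A4, A5.]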
The Coalescent process
Mutations arise hierarchically
over time, generating a phylogeny
of cladistic (tree-like, branching)
DNA sequence relationships.
WHAT CAN WE SAY ABOUT THE
RELATIONSHIPS AMONG A SET OF
DNA SEQUENCES SAMPLED TODAY?
ACTAA
AATGA
CGAAA
CGAAG
AGTAG
DNA sequences have a common ancestor and their
variation reflects their descent history
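[Figure: genealogy of the sampled sequences, marking the MRCA (most recent common ancestor) of all samples and the MRCA of a subset of 3 samples.]
Current sample of DNA sequences:
ACTAA
AATGA
CGAAA
CGAAG
AGTAG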
Ancestral sequences can sometimes be inferred
AAAAA (common ancestor, or coalescent)
Inferred intermediate sequences:
AGAAA
AATAA
CGAAA
Current sample of DNA sequences:
ACTAA
AATGA
CGAAA
CGAAG
AGTAG
Population history is reflected in
the pattern of sequence variation,
and the geographic location where
DNA sequence haplotypes are found.
Similar sequences are found
geographically near each other.
Common alleles are usually old
Ancient alleles are found globally.
Global alleles were present at our
species’ origin
New sequences are geographically
localized.
The Neutral Theory
• The great majority of mutations that are
fixed are effectively neutral with respect
to fitness, and are fixed by genetic drift
• polymorphism within populations is
transient and due to the presence of
selectively neutral alleles on their way to
fixation or loss
The Neutral Theory
• Adaptive Evolution is due to Natural
Selection
• Advantageous mutations are rare
• most genetic variation at the molecular
level is not selected within a population
• most genetic substitutions at the
molecular level are not due to selection
Functional Constraint
• Definition:
• an amino acid in a protein cannot be changed
– at all
– only to an amino acid of the same type
• without giving rise to a deleterious mutation
Functional Constraint
• vertebrates
• fibrinopeptides
• hemoglobin
• cytochromes
• rates depend on functional constraints
[Figure: curves for hemoglobin, fibrinopeptide and cytochrome c; x-axis: millions of years since divergence.]
Functional Constraint
• Mitochondrial gene in mammals
• uniform rate
• rate difference between silent and amino-acid replacement mutations
[Figure: separate curves for silent and replacement mutations.]
Molecular Clock: observations
• hemoglobin in vertebrates
• plot amino acid differences against divergence time
• good linear approximation
Molecular Clock: observations
hemoglobin: about constant rate over time
Molecular Clock
What use is the molecular clock?
• date divergence in phylogeny
• as a first approximation
Rates of Nucleotide Substitution
Insulin
• Rate variation among different regions of a gene
• Insulin: excised C-peptide evolves faster
Functional: Rate: 0.13
Excised: Rate: 0.97
Rates of Nucleotide Substitution
Rate: number of substitutions K between two homologous sequences divided by twice the time of divergence t
[Diagram: ancestral sequence splitting into Sequence 1 and Sequence 2, each branch of length t.]
Rates of Nucleotide Substitution
• Number of substitutions
1 lineage: K = rt
2 lineages from split: K = 2rt
Molecular Clock
• molecular clock is used to put a time to phylogenies
• construct phylogeny first by clock independent method
• clock based on well established partial phylogenies
• rate tests on reference set and subsets
• estimate times on total data base
Orthologous genes or not?
Well matching sequences may not be directly homologous
• orthologous - same gene copy
• paralogous - duplicate gene copy
• xenologous - introgressed gene copy (hybridization, virus): horizontal transfer
’Multiple hits’ and ’saturation’
[Figure: base pair differences (transversions, v (3rd); axis 0.01 to 0.11) plotted against F84 distance (axis about -0.06 to 0.35).]
[Diagram: one site followed along a time axis through nodes A to E, with successive substitutions G>T, T>A, A>T (states G-T-A-T).]
Reversal to a previous state may be
detected as homoplasy. True
phylogenetic signal would be masked
with time and give false
synapomorphies.
Signal depends on mutation rates, r.
Adjusted sequence change
[Figure: base pair differences (transversions, v (3rd); axis 0.01 to 0.11) plotted against F84 distance (axis about -0.06 to 0.35), with a ’correction factor’ applied to the observed change.]
Different models have been made with the intention to correct for
multiple hits by converting observed distances between
sequences to actual (expected) distances (under the particular
model)
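As an illustration of such a correction, here is a Python sketch of the Jukes-Cantor (JC69) formula, d = -(3/4) ln(1 - 4p/3), where p is the observed proportion of differing sites. Note that JC69 is swapped in here because it is the simplest such model; the plots above use the F84 distance, which is a different and more parameter-rich model. Function names are illustrative.

```python
import math


def p_distance(seq1: str, seq2: str) -> float:
    """Observed proportion of differing sites between two aligned sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    diffs = sum(a != b for a, b in zip(seq1, seq2))
    return diffs / len(seq1)


def jukes_cantor(p: float) -> float:
    """JC69-corrected distance (expected substitutions per site)."""
    if not 0.0 <= p < 0.75:
        raise ValueError("JC69 is undefined for p >= 0.75 (saturation)")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


for p in (0.05, 0.30, 0.60):
    print(p, round(jukes_cantor(p), 3))
```

The corrected distance is close to p when p is small and grows much faster than p as the sequences approach saturation, which is the multiple-hits effect described above.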
We can use genetic differences among
populations or species to reconstruct
evolutionary history
Inferring likely evolutionary history from
genetic differences
Divergence can be used for grouping
[Figure: tree grouping Human, Horse, Cow, Kangaroo, Newt and Carp; scale bar 0.1.]
Amino acid sequences of hemoglobin alpha chains
No. of Taxa : 6
Gaps/Missing data : Complete Deletion
Distance method : Amino: Poisson correction
No. of Sites : 140
d : Estimate

                1     2     3     4     5     6
[1] Human
[2] Horse      0.13
[3] Cow        0.13  0.13
[4] Kangaroo   0.21  0.23  0.20
[5] Newt       0.57  0.64  0.60  0.64
[6] Carp       0.66  0.65  0.62  0.71  0.75   -
An example of phylogeny reconstruction
from genetic differences by UPGMA
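A minimal UPGMA sketch in Python, applied to the hemoglobin alpha distances listed above. It repeatedly joins the two clusters with the smallest average pairwise distance. The data layout (a dictionary keyed by pairs of names) and the function name are choices made for this example; ties, such as the three 0.13 values, are broken by input order.

```python
from itertools import combinations


def upgma(names, dist):
    """Return the merges as (cluster_a, cluster_b, average_distance).

    `dist` maps frozenset({name_i, name_j}) to the pairwise distance.
    """
    clusters = [frozenset([n]) for n in names]

    def average_distance(a, b):
        # UPGMA: unweighted mean over all leaf pairs between two clusters.
        total = sum(dist[frozenset((x, y))] for x in a for y in b)
        return total / (len(a) * len(b))

    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: average_distance(*pair))
        merges.append((a, b, average_distance(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return merges


names = ["Human", "Horse", "Cow", "Kangaroo", "Newt", "Carp"]
lower = [
    [],
    [0.13],
    [0.13, 0.13],
    [0.21, 0.23, 0.20],
    [0.57, 0.64, 0.60, 0.64],
    [0.66, 0.65, 0.62, 0.71, 0.75],
]
dist = {frozenset((names[i], names[j])): lower[i][j]
        for i in range(len(names)) for j in range(i)}

merges = upgma(names, dist)
for a, b, d in merges:
    print(sorted(a), "+", sorted(b), f"joined at distance {d:.3f}")
```

In a UPGMA tree each join is drawn at half the average distance, so the three placental mammals join first, then Kangaroo, then Newt, with Carp as the most divergent lineage.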
Molecular clocks: the longer the time,
the greater the genetic divergence
Sequence divergence and time
let K be the distance between two sequences
the rate of amino acid substitution, r, can be estimated if we
know the time of divergence, T
the rate is:
r = K / (2T)
(because 2 sequences are diverging)
[Diagram: Human and Horse lineages diverging from a common ancestor.]
If we know r, the time of divergence T can be estimated as:
T = K / (2r)
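A worked Python sketch of the two formulas. The distances K are the Human-Horse (0.13) and Human-Carp (0.66) values from the hemoglobin alpha table; the calibration time of 80 million years for the Human-Horse split is a hypothetical placeholder, not a value from the lecture, so the resulting rate and date are illustrative only.

```python
def substitution_rate(k: float, t: float) -> float:
    """r = K / (2T): substitutions per site per year along one lineage."""
    if t <= 0:
        raise ValueError("divergence time must be positive")
    return k / (2.0 * t)


def divergence_time(k: float, r: float) -> float:
    """T = K / (2r): time since two lineages split."""
    if r <= 0:
        raise ValueError("rate must be positive")
    return k / (2.0 * r)


K_HUMAN_HORSE = 0.13           # from the distance table
K_HUMAN_CARP = 0.66            # from the distance table
ASSUMED_T_HUMAN_HORSE = 80e6   # years; hypothetical calibration value

r = substitution_rate(K_HUMAN_HORSE, ASSUMED_T_HUMAN_HORSE)
t_human_carp = divergence_time(K_HUMAN_CARP, r)
print(f"r = {r:.3e} per site per year")
print(f"Human-Carp split about {t_human_carp / 1e6:.0f} million years ago")
```

The factor 2 appears in both directions because K accumulates along both lineages since the split.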
Dating a branch split
approximate T2 is known from historical record, vicariance event,
or fossil record
compare sequences A, B, C pairwise and compute number of
substitutions per site, K
T_1 = (K_AC + K_BC) T_2 / (2 K_AB)
[Diagram: tree of A, B and C with split times T2 and T1.]
This procedure assumes constant substitution rate in all branches
also, mutations (divergence points) may be older than the
speciation events being dated
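The dating formula can be read as two steps: calibrate the rate from the pair with the known split time T2 (A and B), then apply that rate to the average distance from C. A Python sketch under that reading; the function name is illustrative, and the example reuses table distances with a hypothetical T2 of 80 (in millions of years).

```python
def date_deeper_split(k_ab: float, k_ac: float, k_bc: float, t2: float) -> float:
    """Estimate T1 = (K_AC + K_BC) * T2 / (2 * K_AB).

    Step 1: rate from the calibrated pair, r = K_AB / (2 * T2).
    Step 2: mean distance to C divided by 2r gives the older split time.
    """
    if k_ab <= 0 or t2 <= 0:
        raise ValueError("K_AB and T2 must be positive")
    rate = k_ab / (2.0 * t2)
    mean_k_to_c = (k_ac + k_bc) / 2.0
    return mean_k_to_c / (2.0 * rate)


# A = Human, B = Horse, C = Kangaroo (distances from the table above).
# T2 = 80 is a hypothetical calibration, not a value from the lecture.
t1 = date_deeper_split(k_ab=0.13, k_ac=0.21, k_bc=0.23, t2=80.0)
print(round(t1, 1))
```

Averaging K_AC and K_BC smooths out rate differences between the A and B lineages, but the estimate still relies on the constant-rate assumption.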
Molecular clock
Zuckerkandl & Pauling (1965): rate of amino acid change
appears constant through time
Kimura (1968,1983):
•if sequence divergence between humans and horses is scaled for
time using fossils
•and estimated evolutionary rate, r, is applied to all known protein
coding loci
•one amino acid substitution has been fixed every second year on
average
Interpretation: This is too much for selection to have been
influential during evolution of the vertebrates
the fate of mutations
mutations can be neutral
mutations can be advantageous and subject to positive selection
mutations can be disadvantageous and subject to purifying
selection
Mutations can be driven by thermodynamic pressure
selection can be detected by testing sequences against the
predictions of neutral theory
(for instance synonymous vs non-synonymous codons)
Evolutionary constraints on DNAs
[Figure: entropy (0 to 0.7) by base position, shown with an RNA secondary-structure diagram in which loop and helix regions are labelled.]
Constraints are associated with functionality, for instance the need for
rRNA to base pair and form helices in a secondary molecular structure
Transcription and translation
Translation requires available tRNA with appropriate anticodons to match with each codon on mRNA
[Diagram labels: anticodon, codon.]
DNA coding
Codes in organelle genomes differ slightly from the standard code
Codon usage
codon bias: not all codons are equally frequent
Anopheles gambiae

AAcid  Codon  Fraction
Gly    GGG    0.14
Gly    GGA    0.56
Gly    GGT    0.27
Gly    GGC    0.03
Glu    GAG    0.02
Glu    GAA    0.98
Asp    GAT    0.95
Asp    GAC    0.05
Val    GTG    0.02
Val    GTA    0.50
Val    GTT    0.45
Val    GTC    0.02
Ala    GCG    0.00
Ala    GCA    0.28
Ala    GCT    0.64
Ala    GCC    0.08

Codon redundancy: synonymous (silent) substitutions give the same amino acids.
Synonymous substitutions do not affect the translation product and thus should be neutral in expressed genes.
However, availability of specific tRNAs may make some codons more ’fit’
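Fractions like those in the table can be computed by counting codons per amino acid in a coding sequence. A Python sketch, assuming the standard genetic code and an in-frame sequence; the toy sequence is made up for the example and is not Anopheles data.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codon_fractions(cds: str) -> dict[str, dict[str, float]]:
    """Fraction of each codon among the synonymous codons of its amino acid."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(c for c in codons if c in CODE)
    totals = defaultdict(int)
    for codon, n in counts.items():
        totals[CODE[codon]] += n
    table = defaultdict(dict)
    for codon, n in counts.items():
        table[CODE[codon]][codon] = n / totals[CODE[codon]]
    return dict(table)


# Toy in-frame sequence: four glycine codons, then three glutamate codons.
toy = "GGAGGAGGTGGG" + "GAAGAAGAG"
fractions = codon_fractions(toy)
print(fractions["G"])
print(fractions["E"])
```

Under strictly neutral synonymous change and uniform base composition, the fractions within one amino acid would be roughly equal; strong departures, as in the table, indicate codon bias.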
5. Anomalous DNA composition
Synonymous codons are expected to be neutral,
and are therefore expected to occur in equal frequency
Expect 50/50 frequency for two phenylalanine codons
Codon biases are found in all known prokaryotes
Codon frequencies in E. coli
Translational efficiency
depends on tRNA availability
some tRNAs may pair with different codons due to:
• ’wobbles’ on the anticodon
• modified nucleotides on the anticodon
(possibility of G-U pairing; inosine (G’) pairs with A, C and U)
for instance: codons xxC and xxU can be read by the same anticodon, xxG
[Diagram: anticodon xxG paired with codon xxC, and anticodon xxG paired with codon xxU.]
Consequently some genomes do well with a reduced number of
tRNA types in the genome:
22 in vertebrate mitochondrial DNA (mtDNA).
[Figure: Leucine codons in two organisms, comparing tRNA availability with codon usage in highly expressed and lowly expressed genes.]
Factor Analysis of codon usage of B. subtilis genes
reveals three classes of genes
Class 1 comprises the majority of the B. subtilis genes (82%)
Class 2 (5%): genes that are highly expressed under exponential growth conditions
Class 3 (13%): genes that were apparently horizontally transferred.
Kunst, F et al. Nature (1997) 390 249-256
Because some of the genes in this group showed clear relationships with
bacteriophage genes, the hypothesis has been proposed that all these genes
were alien and have been acquired horizontally from various sources.
Why do horizontally transferred genes use the genetic code differently?
Moszer I. Current Opinion in Microbiology 1999, 2:524–528
Bacterial species display a wide degree of variation
in their overall G+C content
Rocha EP. Trends Genet 2002 Jun;18(6):291-4
However, most genes have roughly the same GC content within a genome
• Distribution of A + T-rich islands along the chromosome of B. subtilis.
• Location of genes from class 3 according to codon usage analysis is indicated by dots at the bottom of the graph.
• Known prophages (PBSX, SPβ and skin) are indicated by their names, and prophage-like elements are numbered from 1 to 7.
Kunst, F et al. Nature (1997) 390 249-256
Synonymous substitutions are not necessarily neutral
lowly expressed genes: weak selection for translational efficiency; more tRNAs used; weak codon bias; high rates of silent (neutral) mutations
highly expressed genes: strong selection for translational efficiency; fewer tRNAs used; strong codon bias; low rates of silent mutations (purifying selection)
i.e. synonymous mutations are not necessarily neutral!
Code Table: Standard
Method: Nei-Gojobori (1986)
S = No. of synonymous sites
N = No. of nonsynonymous sites

           No. of sites     Redundancy for codon position
Codon      S      N         1st  2nd  3rd
TTT (F)    0.333  2.667     0    0    2
TTC (F)    0.333  2.667     0    0    2
TCT (S)    1.000  2.000     0    0    4
TCC (S)    1.000  2.000     0    0    4
TCA (S)    1.000  2.000     0    0    4
TCG (S)    1.000  2.000     0    0    4
TAA (*)    0.000  3.000     0    0    0
TAG (*)    0.000  3.000     0    0    0
TGA (*)    0.000  3.000     0    0    0
TGT (C)    0.500  2.500     0    0    2
TGC (C)    0.500  2.500     0    0    2
TGG (W)    0.000  3.000     0    0    0
CTT (L)    1.000  2.000     0    0    4
CTC (L)    1.000  2.000     0    0    4
CTA (L)    1.333  1.667     2    0    4
TTA (L)    0.667  2.333     2    0    2
TTG (L)    0.667  2.333     2    0    2
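The S and N columns can be reproduced by asking, for each codon position, what fraction of the possible single-base changes leaves the amino acid unchanged. The Python sketch below assumes the standard genetic code and, to match the values in the table (for example S = 0.5 for TGT), leaves changes that create a stop codon out of the count. That exclusion is an assumption inferred from the table, not something stated on the slide.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def syn_nonsyn_sites(codon: str) -> tuple[float, float]:
    """Return (S, N), the synonymous and nonsynonymous sites of one codon."""
    aa = CODE[codon]
    if aa == "*":
        return 0.0, 3.0  # stop codons are listed with S = 0, N = 3
    s = 0.0
    for pos in range(3):
        syn = total = 0
        for base in BASES:
            if base == codon[pos]:
                continue
            mutant = codon[:pos] + base + codon[pos + 1:]
            if CODE[mutant] == "*":
                continue  # assumption: changes to stop codons are not counted
            total += 1
            syn += CODE[mutant] == aa
        if total:
            s += syn / total
    return s, 3.0 - s


for codon in ("TTT", "TCT", "TGT", "CTA", "TTA", "TGG"):
    s, n = syn_nonsyn_sites(codon)
    print(codon, CODE[codon], f"{s:.3f}", f"{n:.3f}")
```

Summing S and N over all codons of a gene gives the denominators used when comparing synonymous and nonsynonymous substitution rates.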
Redundancy and rates on
codon positions
With codon redundancy we
would expect less selective
constraints on 3rd codon
positions.
1st and 2nd position should be
under stronger selective
pressure.
Consequently evolution rates
on 3rd codon positions are
usually found to be higher
than on 1st and 2nd positions
The problem: different molecules can yield different trees
AND may still be telling the truth
Even the most sacred principle of phylogenetic taxonomy can be violated:
[Figure: gene trees A, B and C for Archaea and Bacteria.]
Kingdoms are not monophyletic
in gene trees B and C
The solution: Horizontal Gene Transfer (HGT)
HGT possesses two ingredients sure to cause a controversy
1. Challenges the traditional tree-based view of evolution
2. Is difficult to prove unambiguously
“Infectious heredity”
The significance of horizontal transfer was first recognized in the
1950s, when resistance to multiple antibiotics was found to be transferred
simultaneously from Shigella to Escherichia coli
Xenologs arise by horizontal transfer
[Diagram: an ancestral gene followed through time across organisms; duplication gives paralogs, speciation gives orthologs, horizontal transfer gives xenologs.]
Xenologs – homologs related by horizontal transfer
Orthologs – homologs related by speciation
Paralogs – homologs related by duplication
Mechanisms of horizontal transfer (also referred to as lateral transfer)
1) Transformation – prokaryotes can take up free DNA from
their surroundings
2) Conjugation – (bacterial sex) an organism builds a tube-like
structure known as the pilus, joins it to its ‘‘mate’’, and
transfers a plasmid through the tube. E. coli has been shown to
conjugate with cyanobacteria, AND EVEN with S. cerevisiae!
3) Transduction – genes can be moved from one prokaryote
species to another via viruses.
Horizontally transferred genes retain the sequence
characteristics of the donor genome
Base composition differences are mostly due to third position of codons
Lawrence and Ochman. J Mol Evol (1997) 44:383–397
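Since base composition differences show up mainly at the third codon position, a simple screen is to compute G+C content separately for each codon position of a gene. A Python sketch, assuming an in-frame coding sequence; the function name and the toy sequence are illustrative.

```python
def gc_by_codon_position(cds: str) -> tuple[float, float, float]:
    """G+C fraction at the 1st, 2nd and 3rd codon positions of an in-frame CDS."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    fractions = []
    for pos in range(3):
        bases = cds[pos:usable:3]
        gc = sum(b in "GC" for b in bases)
        fractions.append(gc / len(bases) if bases else 0.0)
    return tuple(fractions)


# Toy sequence with codons ATG GCC AAG TTC.
print(gc_by_codon_position("ATGGCCAAGTTC"))  # (0.25, 0.25, 1.0)
```

A gene whose third-position G+C content departs strongly from the genome-wide value is a candidate for recent horizontal acquisition, though composition alone is not proof.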
4. Conservation of gene order
Gene order is not generally conserved in microbial genomes
E. coli
B. subtilis
V. cholerae
• The presence of three or more genes in the same order in distant genomes is extremely unlikely unless these genes form an operon.
• Each operon typically emerges only once during evolution and is maintained by selection ever after.
• Therefore, when an operon is present in only a few distantly related genomes, horizontal gene transfer seems to be the most likely scenario.