Evolution - Nematode bioinformatics. Analysis tools and data
Evolutionary Biology
Concepts
Molecular Evolution
Phylogenetic Inference
Reading: Ch7
BIO520 Bioinformatics
Jim Lund
Evolution
Evolution is a process that results in heritable changes in
a population spread over many generations.
"In fact, evolution can be precisely defined as any change
in the frequency of alleles within a gene pool from one
generation to the next." - Helena Curtis and N. Sue
Barnes, Biology, 5th ed. 1989 Worth Publishers, p.974
Levels of Evolution
• Changes in allele frequencies within
a species.
• Speciation.
Molecular changes:
– Single bp changes.
– Genomic changes (alterations in large
DNA segments).
Branching Descent
Populations
Individuals
Phylogeny
Branching diagram showing the
ancestral relations among species.
“Tree of Life”
History of evolutionary change
FRAMEWORK for INFERENCE
The framework for
phylogenetics
• How do we describe phylogenies?
• How do we infer phylogenies?
Inheritance
DNA → RNA → Protein → Function
Common Phylogenetic Tree Terminology
[Figure: example tree with taxa A-E]
Terminal Nodes: represent the TAXA (genes, populations, species, etc.) used to infer the phylogeny
Branches or Lineages
Internal Nodes or Divergence Points (represent hypothetical ancestors of the taxa)
Ancestral Node or ROOT of the Tree
Phylogenetic trees diagram the evolutionary
relationships between the taxa
[Figure: tree with taxa B, C, A, D, E listed from top to bottom]
No meaning to the spacing between the taxa, or to the order in which they appear from top to bottom.
This dimension (branch length) either can have no scale (for ‘cladograms’) or can be proportional to genetic distance or amount of change (for ‘phylograms’ or ‘additive’ trees).
((A,(B,C)),(D,E)) = the above phylogeny as nested parentheses
This notation says that B and C are more closely related to each other than either is to A, and that A, B, and C form a clade that is a sister group to the clade composed of D and E. If the tree has a time scale, then D and E are the most closely related.
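The nested-parentheses notation can be read mechanically. The following is a minimal Python sketch, assuming labels only (no branch lengths, no trailing semicolon); the function names parse_newick, leaf_set, and clades are illustrative and do not come from any library.

```python
def parse_newick(text):
    """Parse a label-only nested-parentheses tree into nested tuples."""
    pos = 0

    def parse_node():
        nonlocal pos
        if text[pos] == "(":
            pos += 1                      # consume "("
            children = [parse_node()]
            while text[pos] == ",":
                pos += 1                  # consume ","
                children.append(parse_node())
            pos += 1                      # consume ")"
            return tuple(children)
        start = pos                       # otherwise read a taxon label
        while pos < len(text) and text[pos] not in "(),":
            pos += 1
        return text[start:pos]

    return parse_node()


def leaf_set(node):
    """Return the set of taxa found at or below a node."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(leaf_set(child) for child in node))


def clades(node):
    """List the taxon sets of all internal nodes, root first."""
    if isinstance(node, str):
        return []
    result = [leaf_set(node)]
    for child in node:
        result.extend(clades(child))
    return result


tree = parse_newick("((A,(B,C)),(D,E))")
all_clades = clades(tree)
for clade in all_clades:
    print(sorted(clade))
```

Each printed list is one clade: the whole tree, then A+B+C, then B+C, then D+E. A and B alone never appear together, which matches the reading given above.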
Two types of trees
Cladogram: branch lengths have no meaning.
Phylogram or additive tree: branch lengths represent genetic change.
[Figure: the same tree of taxa B, C, A, D drawn both ways; branch-length labels in the figure: 6, 1, 1, 5]
Meaning of branch length differs.
All show the same evolutionary relationships,
or branching orders, between the taxa.
Rooted vs Unrooted Trees
More Trees
[Figure: tree of taxa A-F]
Trees-3
[Figure: tree of taxa A-F]
Extinction
[Figure: tree of taxa A-F]
Population Genetic Forces
Hardy-Weinberg Paradigm
p + q = 1
p^2 + 2pq + q^2 = 1
• Natural Selection (fitness)
• Drift (homozygosity by chance)
– much greater in small populations
• Mutation/Recombination (variation)
• Migration
– homogenizes gene pools
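As a worked illustration of the Hardy-Weinberg relations, here is a minimal Python sketch. It assumes one locus with two alleles, written A and a, with p the frequency of A; the function names and the genotype counts are made up for the example.

```python
def allele_frequency(n_AA, n_Aa, n_aa):
    """Frequency p of allele A from observed genotype counts."""
    total_alleles = 2 * (n_AA + n_Aa + n_aa)
    return (2 * n_AA + n_Aa) / total_alleles


def hardy_weinberg(p):
    """Expected genotype frequencies p^2, 2pq, q^2 with q = 1 - p."""
    q = 1.0 - p
    return {"AA": p * p, "Aa": 2 * p * q, "aa": q * q}


# Hypothetical sample: 49 AA, 42 Aa, 9 aa individuals.
p = allele_frequency(49, 42, 9)
expected = hardy_weinberg(p)
print(p, expected)
```

When observed genotype frequencies depart from these expected values, one or more of the forces listed above (selection, drift, mutation, migration) may be acting.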
Modes of speciation
Speciation can occur in many ways; among
the most common are:
• Geographic isolation.
• Reproductive isolation.
– Sexual selection.
– Behavioral isolation.
DNA, protein sequence change
Multiple Changes/No Change
..CCU AUA GGG..
..CCC AUA GGG..
..CCC AUG GGG..
..CCC AUG GGC..
..CCU AUG GGC..
..CCU AUA GGC..
5 mutations
1 DNA change
0 amino acid changes (net)
Enumerating bp/aa changes
underestimates evolutionary change
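The counts on this slide can be checked with a short Python sketch. It assumes the six sequences are successive states of one lineage (a reading of the flattened figure, not something the slide states), and it uses a codon table restricted to the five codons that appear here.

```python
# Successive states of the same three codons (assumed order, oldest first).
history = [
    "CCUAUAGGG",
    "CCCAUAGGG",
    "CCCAUGGGG",
    "CCCAUGGGC",
    "CCUAUGGGC",
    "CCUAUAGGC",
]

# Only the codons that occur in this example.
CODON_TABLE = {"CCU": "Pro", "CCC": "Pro", "AUA": "Ile", "AUG": "Met",
               "GGG": "Gly", "GGC": "Gly"}


def differences(a, b):
    """Count positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def translate(rna):
    """Translate an RNA string codon by codon."""
    return [CODON_TABLE[rna[i:i + 3]] for i in range(0, len(rna), 3)]


# Every mutation that actually happened along the lineage.
mutations = sum(differences(a, b) for a, b in zip(history, history[1:]))
# What is visible when only the first and last sequence are compared.
net_base_changes = differences(history[0], history[-1])
net_aa_changes = differences(translate(history[0]), translate(history[-1]))
print(mutations, net_base_changes, net_aa_changes)
```

Comparing only the endpoints hides four of the five mutations, including the temporary Ile to Met replacement, which is why simple counting underestimates the amount of change.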
Mechanisms of DNA Sequence Change
Neutral Drift vs Natural Selection
Traditional selection model
Neutral (Kimura/Jukes)
Pan-neutralism
Rate of change (evolution) of
hemoglobin protein
Each point on the graph is for a pair of species, or groups of species.
From Kimura (1983) by way of Evolution, Ridley, 3rd ed.
Mutation rate varies Gene-to-Gene
Protein       Rate (per 10^9 yr)
Lysozyme      2.0
Insulin       0.4
Histone H4    0.01
Rate varies Site-to-Site
Protein       Coding    Silent
Albumin       0.9       6.7
Histone H4    0.03      6.1
Average       0.9       4.6
From Evolution, Mark Ridley, 3rd ed.
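One way to read the coding versus silent rates is as a ratio per protein. The Python sketch below uses only the values quoted on this slide; the silent-to-coding ratio is an illustrative summary chosen for this example, not a statistic taken from the source.

```python
# (coding rate, silent rate) as quoted on the slide, same units for both.
rates = {
    "Albumin": (0.9, 6.7),
    "Histone H4": (0.03, 6.1),
    "Average": (0.9, 4.6),
}

# How many times faster silent sites change than coding sites.
ratios = {protein: silent / coding
          for protein, (coding, silent) in rates.items()}

for protein, ratio in ratios.items():
    print(f"{protein}: silent sites change about {ratio:.0f}x faster")
```

Silent rates are similar for albumin and histone H4, while coding rates differ about thirtyfold, so the ratio is far larger for the more constrained protein.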
Constraints on “Silent” Changes
• Codon Biases-translation rates
• Transcription elongation rates
– polymerase ‘pause’ sites
• “Silent” regulatory elements
– select for or against
presence/absence
• Overall genome structure
DNA, Protein Similarity
• Similarity by common descent
– phylogenetic
• Similarity by convergence (rare)
– functional importance
• Similarity by chance
– random variation not limitless
– particular problem in wide divergence
Homology: similar by
common descent
CCCAGG
CCCAAG
CCCAAA
CCTAAA
Inferring Trees and
Ancestors
CCCAGG
CCCAAG -> CCCAAG
CCCAAA -> CCTAAA
CCTAAA -> CCTAAC
Not always straightforward.
The data doesn’t always give a single, correct answer.
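A common first step toward inferring relationships is to count pairwise differences between aligned sequences. The Python sketch below does this for the four sequences CCCAGG, CCCAAG, CCCAAA, and CCTAAA shown earlier. It assumes the sequences are already aligned and of equal length, and it stops at the distance table; it is not a tree-building method.

```python
from itertools import combinations

sequences = ["CCCAGG", "CCCAAG", "CCCAAA", "CCTAAA"]


def hamming(a, b):
    """Number of aligned positions at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


# Distance for every unordered pair of sequences.
distances = {(a, b): hamming(a, b) for a, b in combinations(sequences, 2)}

for (a, b), d in distances.items():
    print(a, b, d)
```

The sequences form a chain in which each differs from the next by one change, which is the pattern a tree with few changes would be expected to explain. With real data, multiple changes at one site can make such counts misleading.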
Homology, Orthology, Paralogy
Paralogy Trap
Improper Inference
Garbage in,
garbage out!
Our Goals
• Infer Phylogeny
– Optimality criteria
– Algorithm
• Phylogenetic inference
– (interesting ones)
Watch Out
“The danger of generating incorrect results
is inherently greater in computational
phylogenetics than in many other fields
of science.”
“…the limiting factor in phylogenetic
analysis is not so much in the facility of
software application as in the conceptual
understanding of what the software is
doing with the data.”