Lecture 3
Molecular Evolution and
Phylogeny
Facts on the molecular basis of life
• Every life form is genome based
• Genomes evolve
• There are large numbers of apparently
homologous intra-genomic (paralog) and
inter-genomic (ortholog) genes
• Some genes, especially those related to
the function of transcription and
translation, are common to ALL life forms
• The closer two organisms seem to be
phylogenetically, the more similar their
genomes and corresponding genes are
Central dogma of molecular biology
DNA
RNA
Protein
Basic assumptions of molecular
evolution
• More closely related organisms have more
similar genomes
• Highly similar genes are homologs (have
the same ancestor)
• A universal ancestor exists for all life forms
• Molecular differences in homologous
genes (or protein sequences) are
positively correlated with evolutionary time
• Phylogenetic relationships can be expressed
by a dendrogram (a “tree”)
The five steps in phylogenetics dancing
1. Sequence data
2. Align sequences
   Phylogenetic signal? Patterns -> evolutionary processes?
3. Choose a method
   Distance methods: distance calculation (which model?)
   Character-based methods: MB (model?), ML (model?), MP (weighting of sites, changes?)
4. Calculate or estimate best-fit tree
   Optimality criterion: LS, ME (distances); MB, ML, MP (characters)
   Single tree: NJ
5. Test phylogenetic reliability
Modified from Hillis et al. (1993). Methods in Enzymology 224, 456-487
Why protein phylogenies?
• For historical reasons - first sequences...
• Most genes encode proteins...
• To study protein structure, function and
evolution
• Comparing DNA and protein based
phylogenies can be useful
•Different genes - e.g. 18S rRNA versus EF-2
protein
•Protein encoding gene - codons versus
amino acids
Proteins were the first molecular
sequences to be used for
phylogenetic inference
Fitch and Margoliash (1967)
Construction of phylogenetic trees.
Science 155, 279-284.
Most of what follows taken from:
Statistical Physics and Biological Information
Institute of Theoretical Physics
University of California at Santa Barbara
2001 May 7
Understanding trees
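[Figure: rooted tree drawn against a time axis starting at the root; nodes labeled 30 Mya, 22 Mya and 7 Mya; one drawing is marked "same as" another]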
Understanding trees #2
Understanding trees #3
Difference in homologous sequences is
a measure of evolution time
Part of multiple sequence alignment of Mitochondrial
Small Sub-Unit rRNA
Full length is ~ 950
11 primate species (order Primates) with mouse as outgroup
Change similarity matrix to distance matrix: d = 1 - S
From alignment construct pairwise distance*
*Note: Alignment is not the only way to compute distance
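The conversion d = 1 - S can be sketched in a few lines of Python. This is only an illustration: the function names, the toy sequences and the choice to skip columns containing a gap are assumptions, not part of the lecture.

```python
def similarity(seq_a, seq_b):
    """Fraction of gap-free aligned columns in which the two sequences match."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    # Assumption: columns with a gap ('-') in either sequence are ignored.
    pairs = [(x, y) for x, y in zip(seq_a.upper(), seq_b.upper())
             if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no gap-free columns to compare")
    return sum(x == y for x, y in pairs) / len(pairs)


def distance_matrix(aligned):
    """Pairwise d = 1 - S for a dict {name: aligned sequence}."""
    names = list(aligned)
    return {a: {b: 1.0 - similarity(aligned[a], aligned[b]) for b in names}
            for a in names}


# Toy alignment (hypothetical data, for illustration only).
toy = {"s1": "ACGTACGT", "s2": "ACGTACGA", "s3": "AC-TTCGA"}
d = distance_matrix(toy)
print(d["s1"]["s2"])  # one mismatch in eight columns
```

The resulting matrix is symmetric with zeros on the diagonal, which is the form the clustering methods later in the lecture expect.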
Models of sequence evolution
Jukes-Cantor (minimal) Model
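All substitution rates = a; all base frequencies = 1/4
[Figure: substitution diagram between bases (A and C shown), with the expression "= 3 Pij(2t)"; its left-hand side was lost in extraction]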
Derivation of Jukes-Cantor formula
• Let the probability that a site holds a given base at time t be P(t)
• After an elapsed time Δt:
Loss by mutation to the other three bases is -3aΔt P(t)
Gain from the other bases is aΔt (1 - P(t))
• Hence
P(t + Δt) = P(t) - 3aΔt P(t) + aΔt (1 - P(t))
dP(t)/dt = a - 4a P(t)
• Write P(t) = A exp(-bt) + c; the solution is b = 4a, c = 1/4
P(t) = A exp(-4a t) + 1/4
• If P(0) = 1, then A = 3/4. If P(0) = 0, then A = -1/4
• Finally
Psame(t) = 1/4 + 3/4 exp(-4a t)
Pchange(t) = 1/4 - 1/4 exp(-4a t)
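The two closed-form results can be checked numerically with a short Python sketch. The function names and the example values of a and t are assumptions chosen for illustration. The last function is the standard Jukes-Cantor distance correction, which follows from these formulas but is not written out on the slide: if two sequences are separated by a total time 2t, the expected fraction of differing sites is p = 3 Pchange(2t), and inverting that gives the expected number of substitutions per site.

```python
import math


def p_same(a, t):
    """Probability that a site holds the same base after time t (rate a)."""
    return 0.25 + 0.75 * math.exp(-4.0 * a * t)


def p_change(a, t):
    """Probability that a site holds one specific different base after time t."""
    return 0.25 - 0.25 * math.exp(-4.0 * a * t)


def jc_distance(p):
    """Expected substitutions per site given the observed fraction p of differing sites."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75) for the correction to be defined")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# Hypothetical values: rate a per specific substitution, time t on each lineage.
a, t = 0.01, 10.0
p = 3.0 * p_change(a, 2.0 * t)      # fraction of sites expected to differ
total = p_same(a, t) + 3.0 * p_change(a, t)
d = jc_distance(p)                  # should recover 3 * a * 2t
```

Under these assumptions the corrected distance equals 3a(2t), the expected number of substitutions along the path joining the two sequences, whereas the raw fraction p saturates at 3/4.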
Hasegawa-Kishino-Yano model
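Allows different rates for transitions and transversions, and unequal base frequencies
Transition: A ↔ G or C ↔ T (purine ↔ purine or pyrimidine ↔ pyrimidine)
Transversion: purine ↔ pyrimidine, e.g. A ↔ T or C ↔ G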
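The transition/transversion distinction can be made concrete with a small Python sketch. A and G are purines, C and T are pyrimidines; a change within one class is a transition and a change between classes is a transversion. The function names and the toy sequences are assumptions for illustration, and the sketch only counts differences; it does not estimate the parameters of the Hasegawa-Kishino-Yano model.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = PURINES | PYRIMIDINES


def substitution_type(x, y):
    """Classify a pair of bases as 'none', 'transition' or 'transversion'."""
    x, y = x.upper(), y.upper()
    if x not in BASES or y not in BASES:
        raise ValueError("expected bases from A, C, G, T")
    if x == y:
        return "none"
    # Same chemical class (both purines or both pyrimidines) means transition.
    same_class = (x in PURINES) == (y in PURINES)
    return "transition" if same_class else "transversion"


def count_ts_tv(seq_a, seq_b):
    """Count (transitions, transversions) over aligned columns, skipping non-ACGT."""
    ts = tv = 0
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x not in BASES or y not in BASES:
            continue  # assumption: gaps and ambiguity codes are ignored
        kind = substitution_type(x, y)
        if kind == "transition":
            ts += 1
        elif kind == "transversion":
            tv += 1
    return ts, tv


# Toy pair of aligned sequences (hypothetical data).
counts = count_ts_tv("AACG-T", "GTCGAT")
```

Counting the two kinds of difference separately is the input a model with distinct transition and transversion rates needs; the Jukes-Cantor model, by contrast, treats all differences alike.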
Part of Jukes-Cantor distance matrix
for primate examples
(the distance is much larger for the outgroup)
Matrix will be used for clustering methods
Clustering
UPGMA
Neighbor-Joining Method
N-J Method produces an Unrooted,
Additive tree
Neighbor-Joining Method
What is required for the Neighbour joining method?
An Example
0. Distance Matrix

PAM        Spinach    Rice   Mosquito   Monkey   Human
Spinach        0.0    84.9      105.6     90.8    86.3
Rice          84.9     0.0      117.8    122.4   122.6
Mosquito     105.6   117.8        0.0     84.7    80.8
Monkey        90.8   122.4       84.7      0.0     3.3
Human         86.3   122.6       80.8      3.3     0.0
1. First Step
PAM distance 3.3 (Human - Monkey) is the minimum. So we'll
join Human and Monkey to MonHum and we'll calculate the new
distances.
[Figure: Human and Monkey joined at node Mon-Hum; Mosquito, Spinach and Rice not yet joined]
2. Calculation of New Distances
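After we have joined two species in a subtree we have to compute the
distances from every other node to the new subtree. We do this with a
simple average of distances. (Note that picking the smallest entry and
averaging in this way is a simplified clustering scheme; the neighbor-joining
algorithm of Saitou and Nei chooses the pair to join with a criterion that
also accounts for each node's average distance to all other nodes.)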
Dist[Spinach, MonHum]
= (Dist[Spinach, Monkey] + Dist[Spinach, Human])/2
= (90.8 + 86.3)/2 = 88.55
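The same averaging step can be written as a small Python function and applied to all three remaining species at once. The function name and the nested-dictionary layout are assumptions for illustration; the numbers are the PAM distances from the matrix in step 0.

```python
def merged_distance(dist, other, left, right):
    """Simple average of the distances from `other` to the two joined nodes."""
    return (dist[other][left] + dist[other][right]) / 2.0


# Distances to Monkey and Human, taken from the step 0 matrix.
dist = {
    "Spinach": {"Monkey": 90.8, "Human": 86.3},
    "Rice": {"Monkey": 122.4, "Human": 122.6},
    "Mosquito": {"Monkey": 84.7, "Human": 80.8},
}

to_monhum = {name: merged_distance(dist, name, "Monkey", "Human")
             for name in dist}
for name, value in to_monhum.items():
    print(f"Dist[{name}, MonHum] = {value:.2f}")
```

Rounded to one decimal place, these are the MonHum entries of the next cycle's matrix.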
[Figure: Spinach and the Mon-Hum subtree (Human, Monkey)]
3. Next Cycle
PAM        Spinach    Rice   Mosquito   MonHum
Spinach        0.0    84.9      105.6     88.6
Rice          84.9     0.0      117.8    122.5
Mosquito     105.6   117.8        0.0     82.8
MonHum        88.6   122.5       82.8      0.0

Join: Mos-(Mon-Hum)
[Figure: Mosquito joined to the Mon-Hum subtree (Human, Monkey); Rice and Spinach not yet joined]
4. Penultimate Cycle
PAM         Spinach    Rice   MosMonHum
Spinach         0.0    84.9        97.1
Rice           84.9     0.0       120.2
MosMonHum      97.1   120.2         0.0

Join: Spin-Rice
[Figure: two subtrees, Mos-(Mon-Hum) and Spin-Rice]
5. Last Joining
PAM         SpinRice   MosMonHum
SpinRice         0.0       108.7
MosMonHum      108.7         0.0

Join: (Spin-Rice)-(Mos-(Mon-Hum))
[Figure: the Spin-Rice subtree (Spinach, Rice) joined to the Mos-(Mon-Hum) subtree (Mosquito, Human, Monkey)]
The result:
Unrooted Neighbor-Joining Tree
[Figure: unrooted tree with leaves Human, Monkey, Mosquito, Spinach and Rice]
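The whole walk-through can be reproduced with a short Python sketch. It implements exactly the procedure of the example (join the pair with the smallest distance, then replace it by a node whose distances are simple averages), not the full neighbor-joining criterion of Saitou and Nei. The function name and the parenthesized labels are assumptions for illustration. Because the sketch does not round intermediate values, its final distance is 108.6125, slightly below the 108.7 obtained from the rounded tables.

```python
from itertools import combinations

NAMES = ["Spinach", "Rice", "Mosquito", "Monkey", "Human"]
ROWS = [
    [0.0, 84.9, 105.6, 90.8, 86.3],
    [84.9, 0.0, 117.8, 122.4, 122.6],
    [105.6, 117.8, 0.0, 84.7, 80.8],
    [90.8, 122.4, 84.7, 0.0, 3.3],
    [86.3, 122.6, 80.8, 3.3, 0.0],
]


def cluster(names, rows):
    """Repeatedly join the closest pair, averaging distances to the new node."""
    dist = {a: {b: rows[i][j] for j, b in enumerate(names)}
            for i, a in enumerate(names)}
    joins = []
    while len(dist) > 1:
        # Pick the pair of current nodes with the smallest distance.
        a, b = min(combinations(dist, 2), key=lambda p: dist[p[0]][p[1]])
        joins.append((a, b, dist[a][b]))
        merged = f"({a},{b})"
        others = [k for k in dist if k not in (a, b)]
        # Distance from every other node to the new node: simple average.
        new_row = {k: (dist[k][a] + dist[k][b]) / 2.0 for k in others}
        del dist[a], dist[b]
        for k in others:
            del dist[k][a], dist[k][b]
            dist[k][merged] = new_row[k]
        new_row[merged] = 0.0
        dist[merged] = new_row
    return next(iter(dist)), joins


tree, joins = cluster(NAMES, ROWS)
print(tree)
for a, b, d in joins:
    print(f"join {a} + {b} at distance {d:.2f}")
```

The order of joins matches the steps above: Monkey with Human, then Mosquito, then Spinach with Rice, and finally the two subtrees.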
Bootstrapping
Why are trees not exact?
Pairwise distances usually not tree-like
Searching tree space
Maximum likelihood criterion
Parsimony criterion
Parsimony with molecular data
Parsimony criterion
Paul Higgs:
Is the best tree much better than others?
L: likelihood at nodes
Use Maximum Likelihood to rank alternate trees
[Figure annotations: NJ tree is 2nd best; same topology]
Use Parsimony to rank alternate trees
[Figure annotations: different topology; parsimony differentiates weakly]
Quartet puzzling
MCMC: Markov chain Monte Carlo
Topology probabilities according to MCMC
Clade probabilities compared across tree methods
NJ method is very fast and close to being the best
Lecture and Book
• Lecture by Paul Higgs
  • online.itp.ucsb.edu/online/infobio01/higgs/
  • see online.itp.ucsb.edu/online/infobio01/ for many lectures
• Book by Wen-Hsiong Li 李文雄
  • “Molecular Evolution” (Sinauer Associates, 1997)
Some web sites on Molecular Evolution
• CMS Molecular Biology Resource
  • www.unl.edu/stc-95/ResTools/cmshp.html
• Phylogeny - Molecular Evolution
  • www.unl.edu/stc-95/ResTools/biotools/biotools2.html
• The Tree of Life Web Project
  • tolweb.org/tree/phylogeny.html
• Web Resources in Molecular Evolution and Systematics
  • darwin.eeb.uconn.edu/molecular-evolution.html
Some web sites on ClustalW
• On-line service
• www.ebi.ac.uk/clustalw/
• clustalw.genome.ad.jp/
• Software
• ftp-igbmc.u-strasbg.fr/pub/ClustalX/
• ftp-igbmc.u-strasbg.fr/pub/ClustalW/