Transcript Document

Chapter 4
Evolutionary Changes in Nucleotide Sequences
Chau-Ti Ting
Unless noted, the course materials are licensed under Creative
Commons Attribution-NonCommercial-ShareAlike 3.0 Taiwan (CC
BY-NC-SA 3.0)
Introduction
Calculating the distance between two sequences
is the simplest phylogenetic analysis.
It is important because:
• It is the first step in distance methods for
phylogeny reconstruction.
• Markov-process models of nucleotide
substitution used in distance calculation form
the basis of likelihood and Bayesian analysis.
The distance between two nucleotide sequences is
defined as the expected number of nucleotide
substitutions per site.
Source: Ziheng Yang
2006. Computational Molecular Evolution., p. 3. Oxford University Press Inc,, New York, USA.
The simplest distance measure is the proportion of
different sites, sometimes called the p-distance. If
10 sites differ between two sequences,
each 100 bp long, then p = 10% = 0.1.
However, a variable site may result from more than
one substitution, and even a
constant site may harbor back or parallel
substitutions.
Multiple hits: multiple substitutions at the same site
(i.e., some changes are hidden).
Note: p is usable only for highly similar sequences,
with p < 5%.
Source: Ziheng Yang
2006. Computational Molecular Evolution., p. 3. Oxford University Press Inc,, New York, USA.
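The p-distance described above can be sketched in a few lines of Python. This is only an illustration: the function name `p_distance` and the two 10-bp sequences are hypothetical, and the sequences are assumed to be already aligned and gap-free.

```python
def p_distance(seq1, seq2):
    """Proportion of sites that differ between two aligned sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    differences = sum(1 for a, b in zip(seq1, seq2) if a != b)
    return differences / len(seq1)


# Two hypothetical 10-bp sequences that differ at the last site only
print(p_distance("ACTGAACGTA", "ACTGAACGTT"))  # 0.1
```

Because p counts only visible differences, it underestimates the true number of substitutions as soon as multiple hits become likely.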
[Figure: the sequence ACTGAACGTAACGC and two descendant sequences compared site by site.]
Source: Dan Graur and Wen-Hsiung Li
2000. Fundamentals of Molecular Evolution., p. 75. Sinauer Associates, Inc. Sunderland, MA, USA.
Jukes and Cantor’s one-parameter model
This simple model assumes that substitutions occur
with equal probability among the four nucleotide types.
The rate of substitution for each nucleotide is $3\alpha$ per
unit time, and the rate of substitution in each of the
three possible directions of change is $\alpha$. Because the
model involves a single parameter, $\alpha$, it is called the
one-parameter model.
Source: Dan Graur and Wen-Hsiung Li
2000. Fundamentals of Molecular Evolution., p. 68. Sinauer Associates, Inc. Sunderland, MA, USA.
[Figure: the one-parameter model. Every substitution among A, G, C, and T occurs at rate $\alpha$.]
National Taiwan University Chau-Ti Ting
Since we start with A, the probability that this site is occupied by
A at time 0 is $P_A(0) = 1$. At time 1, the probability of still having A
at this site is given by
$P_A(1) = 1 - 3\alpha$
in which $3\alpha$ is the probability of A changing to T, C, or G, and
$1 - 3\alpha$ is the probability that A has remained unchanged.
Source: Dan Graur and Wen-Hsiung Li
2000. Fundamentals of Molecular Evolution., p. 68. Sinauer Associates, Inc. Sunderland, MA, USA.
The probability of having A at time 2 is
$P_A(2) = (1 - 3\alpha) P_A(1) + \alpha [1 - P_A(1)]$
To derive this equation, we consider two possible scenarios:
1) the nucleotide has remained unchanged from time 0 to time 2,
and 2) the nucleotide has changed to T, C, or G at time 1, but has
subsequently reverted to A at time 2.
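[Figure: two scenarios for having A at time 2. Scenario I: A at t=0, A at t=1 with probability $P_A(1)$, A at t=2 with probability $1 - 3\alpha$. Scenario II: A at t=0, not A at t=1 with probability $1 - P_A(1)$, A at t=2 with probability $\alpha$.]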
Source: Dan Graur and Wen-Hsiung Li
2000. Fundamentals of Molecular Evolution., p. 68. Sinauer Associates, Inc. Sunderland, MA, USA.
$P_A(2) = (1 - 3\alpha) P_A(1) + \alpha [1 - P_A(1)]$
Using the above formulation, we can show that the following
recurrence equation applies to any t:
$P_A(t+1) = (1 - 3\alpha) P_A(t) + \alpha [1 - P_A(t)]$
We can rewrite this equation in terms of the amount of change in $P_A(t)$
per unit time as
$\Delta P_A(t) = P_A(t+1) - P_A(t) = \{(1 - 3\alpha) P_A(t) + \alpha [1 - P_A(t)]\} - P_A(t)$
$= -3\alpha P_A(t) + \alpha [1 - P_A(t)]$
$= -4\alpha P_A(t) + \alpha$
Source: Dan Graur and Wen-Hsiung Li
2000. Fundamentals of Molecular Evolution., p. 69. Sinauer Associates, Inc. Sunderland, MA, USA.
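Approximating the discrete-time process by a continuous-time model gives
$\frac{dP_A(t)}{dt} = -4\alpha P_A(t) + \alpha$
whose solution is
$P_A(t) = \frac{1}{4} + \left[P_A(0) - \frac{1}{4}\right] e^{-4\alpha t}$
When $P_A(0) = 1$:
$P_A(t) = P_{AA}(t) = \frac{1}{4} + \frac{3}{4} e^{-4\alpha t}$
and, for any nucleotide i,
$P_{ii}(t) = \frac{1}{4} + \frac{3}{4} e^{-4\alpha t}$
When $P_A(0) = 0$:
$P_A(t) = P_{GA}(t) = \frac{1}{4} - \frac{1}{4} e^{-4\alpha t}$
and, in general,
$P_{ij}(t) = \frac{1}{4} - \frac{1}{4} e^{-4\alpha t}$
where $i \neq j$
[Figure: $P_{ii}(t)$ and $P_{ij}(t)$ plotted against time; both approach 0.25.]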
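As a rough numerical check of the one-parameter derivation, the sketch below iterates the discrete recurrence for $P_A(t)$ and compares it with the continuous-time solution. The function names and the rate `alpha = 0.001` are hypothetical choices for illustration; the two results agree only approximately, and only when $\alpha$ is small.

```python
import math


def pa_recurrence(alpha, steps, p0=1.0):
    """Iterate P_A(t+1) = (1 - 3*alpha) * P_A(t) + alpha * (1 - P_A(t))."""
    p = p0
    for _ in range(steps):
        p = (1 - 3 * alpha) * p + alpha * (1 - p)
    return p


def pa_closed_form(alpha, t, p0=1.0):
    """Continuous-time solution P_A(t) = 1/4 + (P_A(0) - 1/4) * exp(-4*alpha*t)."""
    return 0.25 + (p0 - 0.25) * math.exp(-4 * alpha * t)


alpha = 0.001  # hypothetical substitution rate per unit time
for t in (0, 100, 1000, 10000):
    print(t, round(pa_recurrence(alpha, t), 4), round(pa_closed_form(alpha, t), 4))
```

Whatever the starting nucleotide (`p0=1.0` or `p0=0.0`), both versions converge to 1/4 as t grows, which is the equilibrium frequency under this model.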
Kimura’s two-parameter model
In this model, the rate of transitional substitution at each nucleotide
site is $\alpha$ per unit time, whereas the rate of each type of transversional
substitution is $\beta$ per unit time.
Source: Dan Graur and Wen-Hsiung Li
2000. Fundamentals of Molecular Evolution., p. 71. Sinauer Associates, Inc. Sunderland, MA, USA.
[Figure: the two-parameter model. Transitions (A and G, C and T) occur at rate $\alpha$; each type of transversion occurs at rate $\beta$.]
Let us consider the probability that a site that has A at time 0 will
have A at time t. After one time unit, the probability of A changing
to G is $\alpha$, and the probability of A changing to either C or T is $2\beta$.
Thus the probability of A remaining unchanged after one time unit
is
$P_A(1) = 1 - \alpha - 2\beta$
Source: Dan Graur and Wen-Hsiung Li
2000. Fundamentals of Molecular Evolution., p. 72. Sinauer Associates, Inc. Sunderland, MA, USA.
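[Figure: four scenarios for having A at time 2, starting from A at t=0. I: A is still A at t=1 and remains A. II: A is G at t=1 and returns to A by a transition. III and IV: A is C or T at t=1 and returns to A by a transversion.]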
$P_A(2) = (1 - \alpha - 2\beta) P_A(1) + \beta P_T(1) + \beta P_C(1) + \alpha P_G(1)$
By extension,
$P_A(t+1) = (1 - \alpha - 2\beta) P_A(t) + \beta P_T(t) + \beta P_C(t) + \alpha P_G(t)$
Similarly, we can obtain
$P_T(t+1) = \beta P_A(t) + (1 - \alpha - 2\beta) P_T(t) + \alpha P_C(t) + \beta P_G(t)$
$P_C(t+1) = \beta P_A(t) + \alpha P_T(t) + (1 - \alpha - 2\beta) P_C(t) + \beta P_G(t)$
$P_G(t+1) = \alpha P_A(t) + \beta P_T(t) + \beta P_C(t) + (1 - \alpha - 2\beta) P_G(t)$
$P_{AA}(t) = \frac{1}{4} + \frac{1}{4} e^{-4\beta t} + \frac{1}{2} e^{-2(\alpha+\beta) t}$
$P_{AA}(t) = P_{GG}(t) = P_{CC}(t) = P_{TT}(t)$
Denoting this common probability by X(t):
$X(t) = \frac{1}{4} + \frac{1}{4} e^{-4\beta t} + \frac{1}{2} e^{-2(\alpha+\beta) t}$
Let Y(t) = the probability that the initial nucleotide and the
nucleotide at time t differ from each other by a transition.
Source: Dan Graur and Wen-Hsiung Li
2000. Fundamentals of Molecular Evolution., p. 73. Sinauer Associates, Inc. Sunderland, MA, USA.
$Y(t) = P_{AG}(t) = P_{GA}(t) = P_{TC}(t) = P_{CT}(t)$
$Y(t) = \frac{1}{4} + \frac{1}{4} e^{-4\beta t} - \frac{1}{2} e^{-2(\alpha+\beta) t}$
The probability, Z(t), that the initial nucleotide and the nucleotide at
time t differ by a specific type of transversion is given by
Source: Dan Graur and Wen-Hsiung Li
2000. Fundamentals of Molecular Evolution., p. 73. Sinauer Associates, Inc. Sunderland, MA, USA.
$Z(t) = \frac{1}{4} - \frac{1}{4} e^{-4\beta t}$
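In summary:
$X(t) = \frac{1}{4} + \frac{1}{4} e^{-4\beta t} + \frac{1}{2} e^{-2(\alpha+\beta) t}$
$Y(t) = \frac{1}{4} + \frac{1}{4} e^{-4\beta t} - \frac{1}{2} e^{-2(\alpha+\beta) t}$
$Z(t) = \frac{1}{4} - \frac{1}{4} e^{-4\beta t}$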
Note that each nucleotide is subject to two types of transversion,
but only one type of transition. Also
Source: Dan Graur and Wen-Hsiung Li
2000. Fundamentals of Molecular Evolution., p. 74. Sinauer Associates, Inc. Sunderland, MA, USA.
X(t) + Y(t) + 2 Z(t) = 1
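The identity X(t) + Y(t) + 2Z(t) = 1 can be checked numerically. The sketch below simply evaluates the three two-parameter probabilities; the function name `k2p_probabilities` and the rate values are hypothetical and chosen only for illustration.

```python
import math


def k2p_probabilities(alpha, beta, t):
    """Return (X, Y, Z) under Kimura's two-parameter model.

    X: same nucleotide as at time 0
    Y: differs by a transition
    Z: differs by one specific type of transversion
    """
    e1 = math.exp(-4 * beta * t)
    e2 = math.exp(-2 * (alpha + beta) * t)
    x = 0.25 + 0.25 * e1 + 0.5 * e2
    y = 0.25 + 0.25 * e1 - 0.5 * e2
    z = 0.25 - 0.25 * e1
    return x, y, z


# Hypothetical rates: transitions twice as fast as each transversion type
for t in (0, 1, 10, 100):
    x, y, z = k2p_probabilities(0.02, 0.01, t)
    print(t, round(x, 4), round(y, 4), round(z, 4), round(x + y + 2 * z, 4))
```

At t = 0 the site is certain to hold its original nucleotide, and for large t all four nucleotides become equally likely.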
Number of nucleotide substitutions between two DNA sequences
If two sequences of length N differ from each other at n sites, then
the proportion of differences, n/N, is referred to as the degree of
divergence or Hamming distance.
If the degree of divergence is substantial, then the observed number
of differences is likely to be smaller than the actual number of
substitutions because of multiple substitutions (multiple hits) at the same
site.
Source: Dan Graur and Wen-Hsiung Li
2000. Fundamentals of Molecular Evolution., p. 74. Sinauer Associates, Inc. Sunderland, MA, USA.
[Figure: two descendant sequences of the ancestral sequence ACTGAACGTAACGC, showing the substitutions that occurred at individual sites in each lineage.]
Source: Dan Graur and Wen-Hsiung Li
2000. Fundamentals of Molecular Evolution., p. 75. Sinauer Associates, Inc. Sunderland, MA, USA.
Types of substitution: single substitution, sequential substitution, coincidental substitution, parallel substitution, convergent substitution, and back substitution.
Number of nucleotide substitutions between two noncoding
sequences
Let us start with the one-parameter model. In this model, it is sufficient
to consider only I(t), which is the probability that the nucleotide at a
given site at time t is the same in both sequences. Suppose that
the nucleotide at a given site was A at time 0. At time t, the
probability that a descendant sequence will have A at this site is $P_{AA}(t)$,
and consequently the probability that two descendant sequences have
A at this site is $P_{AA}^2(t)$. Similarly, the probabilities that both sequences
have T, C, or G at this site are $P_{AT}^2(t)$, $P_{AC}^2(t)$, and $P_{AG}^2(t)$, respectively.
Therefore,
$I(t) = P_{AA}^2(t) + P_{AT}^2(t) + P_{AC}^2(t) + P_{AG}^2(t)$
Source: Dan Graur and Wen-Hsiung Li
2000. Fundamentals of Molecular Evolution., p. 76. Sinauer Associates, Inc. Sunderland, MA, USA.
$I(t) = P_{AA}^2(t) + P_{AT}^2(t) + P_{AC}^2(t) + P_{AG}^2(t)$
Substituting the one-parameter probabilities gives
$I(t) = \frac{1}{4} + \frac{3}{4} e^{-8\alpha t}$
Note that the probability that the two sequences are different at a
site at time t is $p = 1 - I(t)$. Thus,
Source: Dan Graur and Wen-Hsiung Li
2000. Fundamentals of Molecular Evolution., p. 76. Sinauer Associates, Inc. Sunderland, MA, USA.
$p = \frac{3}{4} \left(1 - e^{-8\alpha t}\right)$
or
$8\alpha t = -\ln\left[1 - \frac{4}{3} p\right]$
[Figure: an ancestral sequence diverging into sequence 1 and sequence 2, with $3\alpha t$ substitutions per site along each lineage.]
The time of divergence between two sequences is usually
not known, and thus we cannot estimate $\alpha$. Instead, we compute
K, which is the number of substitutions per site since the time of
divergence between two sequences. In the case of the one-parameter
model, $K = 2(3\alpha t)$, where $3\alpha t$ is the number of
substitutions per site in a single lineage.
Source: Dan Graur and Wen-Hsiung Li
2000. Fundamentals of Molecular Evolution., p. 76. Sinauer Associates, Inc. Sunderland, MA, USA.
$8\alpha t = -\ln\left[1 - \frac{4}{3} p\right]$
$K = 2(3\alpha t)$
We can calculate K as
$K = -\frac{3}{4} \ln\left[1 - \frac{4}{3} p\right]$
where p is the observed proportion of different nucleotides between
two sequences.
Source: Dan Graur and Wen-Hsiung Li
2000. Fundamentals of Molecular Evolution., p. 76. Sinauer Associates, Inc. Sunderland, MA, USA.
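The one-parameter correction is easy to evaluate directly. The following is a minimal sketch; the function name `jc69_distance` is hypothetical, and the range check reflects the fact that the logarithm is undefined once p reaches 3/4.

```python
import math


def jc69_distance(p):
    """K = -(3/4) ln[1 - (4/3) p] for an observed proportion of differences p."""
    if not 0 <= p < 0.75:
        raise ValueError("p must satisfy 0 <= p < 0.75 for the correction to be defined")
    return -0.75 * math.log(1 - 4 * p / 3)


# The corrected distance K is always at least as large as p
for p in (0.01, 0.05, 0.10, 0.30, 0.60):
    print(p, round(jc69_distance(p), 3))
```

For small p the correction is negligible, while for p approaching 3/4 the estimate grows without bound, which is why very divergent sequences give unreliable distances.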
In the case of the two-parameter model, the differences between two
sequences are classified into transitions and transversions. Let P
and Q be the proportions of transitional and transversional
differences between two sequences, respectively. Then the number
of nucleotide substitutions per site between two sequences, K, is
estimated by
$K = \frac{1}{2} \ln\left[\frac{1}{1 - 2P - Q}\right] + \frac{1}{4} \ln\left[\frac{1}{1 - 2Q}\right]$
Source: Dan Graur and Wen-Hsiung Li
2000. Fundamentals of Molecular Evolution., p. 76. Sinauer Associates, Inc. Sunderland, MA, USA.
One-parameter: $K = -\frac{3}{4} \ln\left[1 - \frac{4}{3} p\right]$
Two-parameter: $K = \frac{1}{2} \ln\left[\frac{1}{1 - 2P - Q}\right] + \frac{1}{4} \ln\left[\frac{1}{1 - 2Q}\right]$
Ex. 1: two sequences of 200 nucleotides that differ by 20
transitions and 4 transversions
One-parameter: L = 200, p = 24/200 = 0.12, K ≈ 0.13
Two-parameter: L = 200, P = 20/200 = 0.10, Q = 4/200 = 0.02, K ≈ 0.13
In this example, the two models give essentially
the same estimate because the degree of
divergence is small enough that the corrected
degree of divergence (i.e., the number of
nucleotide substitutions per site, K) is only slightly
larger than the uncorrected value (i.e., the
proportion of nucleotide differences, p).
Source: Dan Graur and Wen-Hsiung Li
2000. Fundamentals of Molecular Evolution., p. 77. Sinauer Associates, Inc. Sunderland, MA, USA.
One-parameter: $K = -\frac{3}{4} \ln\left[1 - \frac{4}{3} p\right]$
Two-parameter: $K = \frac{1}{2} \ln\left[\frac{1}{1 - 2P - Q}\right] + \frac{1}{4} \ln\left[\frac{1}{1 - 2Q}\right]$
Ex. 2: two sequences of 200 nucleotides that differ by 50
transitions and 16 transversions
One-parameter: L = 200, p = 66/200 = 0.33, K ≈ 0.43
Two-parameter: L = 200, P = 50/200 = 0.25, Q = 16/200 = 0.08, K ≈ 0.48
When the degree of divergence between two
sequences is large, and especially in cases where
there are prior reasons to believe that the rate of
transition differs from the rate of transversion,
the two parameter model tends to be more
accurate than the one-parameter model.
Source: Dan Graur and Wen-Hsiung Li
2000. Fundamentals of Molecular Evolution., p. 77. Sinauer Associates, Inc. Sunderland, MA, USA.
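The two worked examples can be reproduced with a short script. This is a sketch only: the function names are hypothetical, and the inputs are the counts given in Ex. 1 and Ex. 2 (200 sites; 20 transitions and 4 transversions, then 50 transitions and 16 transversions).

```python
import math


def jc69_distance(p):
    """One-parameter estimate: K = -(3/4) ln[1 - (4/3) p]."""
    return -0.75 * math.log(1 - 4 * p / 3)


def k2p_distance(P, Q):
    """Two-parameter estimate: K = (1/2) ln[1/(1-2P-Q)] + (1/4) ln[1/(1-2Q)]."""
    return 0.5 * math.log(1 / (1 - 2 * P - Q)) + 0.25 * math.log(1 / (1 - 2 * Q))


L = 200
for transitions, transversions in ((20, 4), (50, 16)):
    P, Q = transitions / L, transversions / L
    print(round(jc69_distance(P + Q), 2), round(k2p_distance(P, Q), 2))
# prints:
# 0.13 0.13
# 0.43 0.48
```

The two estimates coincide at low divergence and separate in the second example, where transitions greatly outnumber transversions.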
Violation of assumptions
Several assumptions have been made that are not necessarily met by the
sequences under study.
1) The rate of substitution was assumed to be the same at all sites.
This assumption might not hold, as the rate may vary greatly from
site to site.
2) Substitutions were assumed to occur independently of one another.
3) The substitution matrix was assumed not to change in time, so that
the nucleotide frequencies are maintained at a constant equilibrium
value throughout their evolution.
Source: Dan Graur and Wen-Hsiung Li
2000. Fundamentals of Molecular Evolution., p. 79. Sinauer Associates, Inc. Sunderland, MA, USA.
Substitution mutations
Transition: changes between A and G, or between T and C
Transversion: changes between a purine and a pyrimidine
Source: Marjorie A. Hoy
2003. Insect molecular genetics: an introduction to principles and applications, 2 nd edition, p. 23. Academic Press. USA.
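The transition/transversion distinction can be expressed as a small classifier, which also yields the counts needed for P and Q in the two-parameter model. The function names and the four-base example are hypothetical; the sequences are assumed to be aligned, gap-free, and written in upper-case A, C, G, T.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify(a, b):
    """Classify a pair of nucleotides as identical, transition, or transversion."""
    if a == b:
        return "identical"
    if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
        return "transition"      # purine <-> purine or pyrimidine <-> pyrimidine
    return "transversion"        # purine <-> pyrimidine


def count_differences(seq1, seq2):
    """Count transitional and transversional differences between aligned sequences."""
    counts = {"transition": 0, "transversion": 0}
    for a, b in zip(seq1, seq2):
        kind = classify(a, b)
        if kind != "identical":
            counts[kind] += 1
    return counts


print(count_differences("AACT", "GCCT"))  # {'transition': 1, 'transversion': 1}
```

Dividing the two counts by the sequence length gives the proportions P and Q used in the two-parameter distance.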
Synonymous (silent mutations)
Nucleotide changes that do not affect the amino acid sequence.
Source: A. J. F. Griffiths, J. H. Miller, D. T. Suzuki, R. C. Lewontin, and W. M. Gelbart.
2000. An Introduction to Genetic Analysis, 7th edition. W. H. Freeman and Company. New York, USA.
Nonsynonymous (replacement mutations)
A change in single nucleotide in a codon can result in an
amino acid replacement.
Source: A. J. F. Griffiths, J. H. Miller, D. T. Suzuki, R. C. Lewontin, and W. M. Gelbart.
2000. An Introduction to Genetic Analysis, 7th edition. W. H. Freeman and Company. New York, USA.
DNA    mRNA   Amino acid
CCG    CCG    Proline
CTG    CUG    Leucine
CTC    CUC    Leucine
The transition/transversion rate ratio
Three definitions of the ‘transition/transversion rate ratio’
are in use:
1. The ratio of the numbers of transitional and transversional
differences between the two sequences, without
correcting for multiple hits (E(S)/E(V)).
2. κ = α/β, with κ = 1 meaning no rate difference between
transitions and transversions.
3. The average transition/transversion ratio (R): same as the
first one but with correction for multiple hits.
Source: Ziheng Yang
2006. Computational Molecular Evolution., p. 17. Oxford University Press Inc,, New York, USA.
Overall, R is convenient to use for comparing estimates
under different models, while κ is more suitable for
formulating the null hypothesis of no
transition/transversion rate difference.
Source: Ziheng Yang
2006. Computational Molecular Evolution., p. 18. Oxford University Press Inc,, New York, USA.
Models of amino acid and codon substitution
Introduction
With protein coding genes, we have the advantage of
being able to distinguish synonymous or silent
substitutions from the nonsynonymous or replacement
substitutions.
Synonymous and nonsynonymous mutations are under
very different selection pressures and are fixed at very
different rates.
Thus, comparison between synonymous and
nonsynonymous substitution rates provides a means to
understand the effect of natural selection on the protein.
This comparison does not require estimation of absolute
substitution rates
or knowledge of the divergence time.
Source: Ziheng Yang
2006. Computational Molecular Evolution., p. 40. Oxford University Press Inc,, New York, USA.
Models of amino acid replacement
Empirical models attempt to describe the relative rates of
substitution between two amino acids without explicitly considering
the factors that influence the evolutionary process.
They are often constructed by analyzing large quantities of
sequence data, as compiled from databases.
Mechanistic models consider the biological processes
involved in amino acid substitution, such as mutation
biases in the DNA, translation of the codons into amino
acids, and filtering by natural selection. Mechanistic models
have more interpretative power and are particularly useful
for studying the forces and mechanisms of gene sequence
evolution.
Source: Ziheng Yang
2006. Computational Molecular Evolution., p. 40-41. Oxford University Press Inc,, New York, USA.
The first empirical amino acid substitution matrix was
constructed by Dayhoff and colleagues.
They compiled and analyzed protein sequences available
at the time, using a parsimony argument to reconstruct
ancestral protein sequences and tabulating amino acid
changes along branches on the phylogeny.
Dayhoff et al. approximated the transition-probability
matrix for an expected distance of 0.01 changes per site,
called 1 PAM (for point accepted mutations).
PAM matrices for larger distances are derived by repeatedly
multiplying the PAM1 matrix by itself.
Source: Ziheng Yang
2006. Computational Molecular Evolution., p. 41. Oxford University Press Inc,, New York, USA.
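The idea of deriving matrices for larger distances by multiplying a one-step matrix by itself can be illustrated with a toy example. The 3-state matrix below is entirely hypothetical and merely stands in for the real 20 × 20 PAM1 transition-probability matrix, whose values are not reproduced here.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_power(m, n):
    """Raise a square matrix to the n-th power by repeated multiplication."""
    size = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = mat_mul(result, m)
    return result


# Hypothetical one-step matrix: each row sums to 1, about 1% change per step
toy_pam1 = [
    [0.990, 0.007, 0.003],
    [0.007, 0.990, 0.003],
    [0.003, 0.003, 0.994],
]
toy_pam250 = mat_power(toy_pam1, 250)
for row in toy_pam250:
    print([round(v, 3) for v in row])
```

Each row of the resulting matrix still sums to 1, but far more probability has moved off the diagonal than in the one-step matrix. The integer scores shown in a published PAM table are log-odds values derived from such probabilities, a step this sketch does not attempt.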
PAM matrix
   A  R  N  D  C  Q  E  G  H  I  L  K  M  F  P  S  T  W  Y  V  B  Z  X  *
A  2 -2  0  0 -2  0  0  1 -1 -1 -2 -1 -1 -3  1  1  1 -6 -3  0  0  0  0 -8
R -2  6  0 -1 -4  1 -1 -3  2 -2 -3  3  0 -4  0  0 -1  2 -4 -2 -1  0 -1 -8
N  0  0  2  2 -4  1  1  0  2 -2 -3  1 -2 -3  0  1  0 -4 -2 -2  2  1  0 -8
D  0 -1  2  4 -5  2  3  1  1 -2 -4  0 -3 -6 -1  0  0 -7 -4 -2  3  3 -1 -8
C -2 -4 -4 -5 12 -5 -5 -3 -3 -2 -6 -5 -5 -4 -3  0 -2 -8  0 -2 -4 -5 -3 -8
Q  0  1  1  2 -5  4  2 -1  3 -2 -2  1 -1 -5  0 -1 -1 -5 -4 -2  1  3 -1 -8
E  0 -1  1  3 -5  2  4  0  1 -2 -3  0 -2 -5 -1  0  0 -7 -4 -2  3  3 -1 -8
G  1 -3  0  1 -3 -1  0  5 -2 -3 -4 -2 -3 -5  0  1  0 -7 -5 -1  0  0 -1 -8
H -1  2  2  1 -3  3  1 -2  6 -2 -2  0 -2 -2  0 -1 -1 -3  0 -2  1  2 -1 -8
I -1 -2 -2 -2 -2 -2 -2 -3 -2  5  2 -2  2  1 -2 -1  0 -5 -1  4 -2 -2 -1 -8
L -2 -3 -3 -4 -6 -2 -3 -4 -2  2  6 -3  4  2 -3 -3 -2 -2 -1  2 -3 -3 -1 -8
K -1  3  1  0 -5  1  0 -2  0 -2 -3  5  0 -5 -1  0  0 -3 -4 -2  1  0 -1 -8
M -1  0 -2 -3 -5 -1 -2 -3 -2  2  4  0  6  0 -2 -2 -1 -4 -2  2 -2 -2 -1 -8
F -3 -4 -3 -6 -4 -5 -5 -5 -2  1  2 -5  0  9 -5 -3 -3  0  7 -1 -4 -5 -2 -8
P  1  0  0 -1 -3  0 -1  0  0 -2 -3 -1 -2 -5  6  1  0 -6 -5 -1 -1  0 -1 -8
S  1  0  1  0  0 -1  0  1 -1 -1 -3  0 -2 -3  1  2  1 -2 -3 -1  0  0  0 -8
T  1 -1  0  0 -2 -1  0  0 -1  0 -2  0 -1 -3  0  1  3 -5 -3  0  0 -1  0 -8
W -6  2 -4 -7 -8 -5 -7 -7 -3 -5 -2 -3 -4  0 -6 -2 -5 17  0 -6 -5 -6 -4 -8
Y -3 -4 -2 -4  0 -4 -4 -5  0 -1 -1 -4 -2  7 -5 -3 -3  0 10 -2 -3 -4 -2 -8
V  0 -2 -2 -2 -2 -2 -2 -1 -2  4  2 -2  2 -1 -1 -1  0 -6 -2  4 -2 -2 -1 -8
B  0 -1  2  3 -4  1  3  0  1 -2 -3  1 -2 -4 -1  0  0 -5 -3 -2  3  2 -1 -8
Z  0  0  1  3 -5  3  3  0  2 -2 -3  0 -2 -5  0  0 -1 -6 -4 -2  2  3 -1 -8
X  0 -1  0 -1 -3 -1 -1 -1 -1 -1 -1 -1 -1 -2 -1  0  0 -4 -2 -1 -1 -1 -1 -8
* -8 -8 -8 -8 -8 -8 -8 -8 -8 -8 -8 -8 -8 -8 -8 -8 -8 -8 -8 -8 -8 -8 -8  1
BLOSUM
(BLOcks of Amino Acid SUbstitution Matrix)
http://en.wikipedia.org/wiki/BLOSUM
Features of these matrices:
1. Amino acids with similar physico-chemical properties
tend to interchange with each other at higher rates than
dissimilar amino acids (D ↔ E or I ↔ V).
2. The “mutational distance” between amino acids is
determined by the structure of the genetic code. Amino
acids separated by differences at two or three codon
positions have lower rates than amino acids separated
by a difference at one codon position (R ↔ K for
nuclear proteins or for mitochondrial proteins).
Both factors may be operating at the same time.
Source: Ziheng Yang
2006. Computational Molecular Evolution., p. 42. Oxford University Press Inc,, New York, USA.
Estimate synonymous and nonsynonymous substitutions rates
Two distances are usually calculated between protein-coding DNA
sequences, for synonymous and nonsynonymous substitutions,
respectively.
dS or KS: the number of synonymous changes per synonymous site
dN or KN: the number of nonsynonymous changes per
nonsynonymous site
Two classes of methods: heuristic counting methods and the ML
method
Source: Ziheng Yang
2006. Computational Molecular Evolution., p. 49. Oxford University Press Inc,, New York, USA.
Counting Methods
Three steps:
1. Count synonymous and nonsynonymous sites
2. Count synonymous and nonsynonymous differences
3. Calculate the proportion of differences and correct for
multiple hits
Source: Ziheng Yang
2006. Computational Molecular Evolution., p. 50. Oxford University Press Inc,, New York, USA.
[Figure: the four nucleotides A, G, C, and T. Image source: Wikipedia.]
Nei and Gojobori (1986)
1. Count synonymous and nonsynonymous sites: S and N
2. Count synonymous and nonsynonymous differences: Sd and Nd
3. Calculate the proportion of differences (pS and pN) as
$p_S = S_d / S$
$p_N = N_d / N$
and apply the JC69 correction for multiple hits:
$d_S = -\frac{3}{4} \ln\left(1 - \frac{4}{3} p_S\right)$
$d_N = -\frac{3}{4} \ln\left(1 - \frac{4}{3} p_N\right)$
Source: Ziheng Yang
2006. Computational Molecular Evolution., p. 50,. Oxford University Press Inc,, New York, USA.
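[Figure: counting sites for the codon TTT (Phe). Its nine single-nucleotide neighbours are CTT (Leu), ATT (Ile), GTT (Val), TCT (Ser), TAT (Tyr), TGT (Cys), TTC (Phe), TTA (Leu), and TTG (Leu). At the third position, 1/3 of the possible changes are synonymous (TTC) and 2/3 are nonsynonymous (TTA, TTG).]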
Sequence 1 (amino acids): Ser Thr Glu Met Cys Leu
Sequence 1 (codons):      TCA ACT GAG ATG TGT TTA
Sequence 2 (codons):      TCG ACA GAG ATA TGT CTA
Sequence 2 (amino acids): Ser Thr Glu Ile Cys Leu
Two pathways between CCC (Pro) and CAA (Gln), where S is a synonymous and R a replacement change:
Path I:  CCC (Pro) → CCA (Pro) → CAA (Gln): S, then R
Path II: CCC (Pro) → CAC (His) → CAA (Gln): R, then R
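The three counting steps of Nei and Gojobori (1986) can be sketched as follows. This is a simplified illustration rather than a reference implementation: it assumes the standard genetic code, aligned sequences without gaps or stop codons, counts changes to stop codons as nonsynonymous when counting sites, and skips pathways that pass through a stop codon. All function names are hypothetical.

```python
import math
from itertools import permutations

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}


def syn_sites(codon):
    """Synonymous sites in a codon: per position, the fraction of the three
    possible changes that leave the amino acid unchanged, summed over positions."""
    s = 0.0
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODE[mutant] == CODE[codon]:
                    s += 1 / 3
    return s


def codon_differences(c1, c2):
    """Average (synonymous, nonsynonymous) differences over all pathways."""
    diff = [i for i in range(3) if c1[i] != c2[i]]
    total_s = total_n = 0.0
    n_paths = 0
    for order in permutations(diff):
        current, sd, nd, valid = c1, 0, 0, True
        for pos in order:
            nxt = current[:pos] + c2[pos] + current[pos + 1:]
            if CODE[nxt] == "*":          # assumption: ignore paths via stop codons
                valid = False
                break
            if CODE[nxt] == CODE[current]:
                sd += 1
            else:
                nd += 1
            current = nxt
        if valid:
            total_s, total_n, n_paths = total_s + sd, total_n + nd, n_paths + 1
    if n_paths == 0:                       # assumption: treat as all nonsynonymous
        return 0.0, float(len(diff))
    return total_s / n_paths, total_n / n_paths


def ng86_counts(seq1, seq2):
    """Return (S, N, Sd, Nd) for two aligned coding sequences."""
    codons1 = [seq1[i:i + 3] for i in range(0, len(seq1), 3)]
    codons2 = [seq2[i:i + 3] for i in range(0, len(seq2), 3)]
    S = sum(syn_sites(c) for c in codons1 + codons2) / 2   # averaged over both
    N = 3 * len(codons1) - S
    Sd = Nd = 0.0
    for c1, c2 in zip(codons1, codons2):
        sd, nd = codon_differences(c1, c2)
        Sd, Nd = Sd + sd, Nd + nd
    return S, N, Sd, Nd


def jc69(p):
    """JC69 correction; undefined for p >= 3/4."""
    return -0.75 * math.log(1 - 4 * p / 3)


print(syn_sites("TTT"))                  # one third of a site is synonymous
print(codon_differences("CCC", "CAA"))   # (0.5, 1.5)
S, N, Sd, Nd = ng86_counts("TCAACTGAGATGTGTTTA", "TCGACAGAGATATGTCTA")
print(round(S, 3), round(N, 3), Sd, Nd)
print(round(jc69(Nd / N), 4))            # dN for the six-codon example
```

For the six-codon example the proportion of synonymous differences sits at the 3/4 saturation limit of the JC69 correction, so only dN is computed here; real data sets use far longer sequences.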
Number of substitutions between two protein-coding genes
nondegenerate (L0): all the possible changes at this site are nonsynonymous
twofold degenerate (L2): one of the three possible changes is synonymous
fourfold degenerate (L4): all possible changes at the site are synonymous
The nucleotide differences in each class are further classified into
transitional (Si) and transversional (Vi) differences, where i = 0, 2,
and 4 denote nondegeneracy, twofold degeneracy, and fourfold
degeneracy, respectively.
All the substitutions at nondegenerate sites are nonsynonymous.
All the substitutions at fourfold degenerate sites are synonymous.
At twofold degenerate sites, transitional changes are synonymous,
whereas transversional changes are nonsynonymous.
Source: Dan Graur and Wen-Hsiung Li
2000. Fundamentals of Molecular Evolution., p. 83. Sinauer Associates, Inc. Sunderland, MA, USA.
[Figure: the codons UUU and UUC (Phe), UUA and UUG (Leu), and UCU, UCC, UCA, and UCG (Ser), with sites labelled as nondegenerate, twofold degenerate (transitions synonymous, transversions nonsynonymous), and fourfold degenerate.]
The proportion of transitional differences at i-fold degenerate sites
between two sequences is calculated as
$P_i = S_i / L_i$
Similarly, the proportion of transversional differences at i-fold
degenerate sites between two sequences is calculated as
$Q_i = V_i / L_i$
Source: Dan Graur and Wen-Hsiung Li
2000. Fundamentals of Molecular Evolution., p. 84. Sinauer Associates, Inc. Sunderland, MA, USA.
Kimura’s two-parameter method is used to estimate the number of
transitional (Ai) and transversional (Bi) substitutions per ith type site.
One-parameter model: $K = -\frac{3}{4} \ln\left(1 - \frac{4}{3} p\right)$
Two-parameter model:
$A_i = \frac{1}{2} \ln(a_i) - \frac{1}{4} \ln(b_i)$
$B_i = \frac{1}{2} \ln(b_i)$
$K_i = A_i + B_i$
where $a_i = 1/(1 - 2P_i - Q_i)$ and $b_i = 1/(1 - 2Q_i)$.
Source: Dan Graur and Wen-Hsiung Li
2000. Fundamentals of Molecular Evolution., p. 84. Sinauer Associates, Inc. Sunderland, MA, USA.
The total number of substitutions per ith type of degenerate site,
Ki, is given by
$K_i = A_i + B_i$
A2 and B2 denote the numbers of synonymous and nonsynonymous
substitutions per twofold degenerate site, respectively.
$K_4 = A_4 + B_4$ denotes the number of synonymous substitutions per
fourfold degenerate site.
$K_0 = A_0 + B_0$ denotes the number of nonsynonymous substitutions
per nondegenerate site.
Source: Dan Graur and Wen-Hsiung Li
2000. Fundamentals of Molecular Evolution., p. 84. Sinauer Associates, Inc. Sunderland, MA, USA.
Then, the number of synonymous substitutions per synonymous site
(KS) and the number of nonsynonymous substitutions per
nonsynonymous site (KA) can be obtained by
$K_S = \frac{L_2 A_2 + L_4 K_4}{(L_2/3) + L_4}$
$K_A = \frac{L_2 B_2 + L_0 K_0}{(2L_2/3) + L_0}$
Source: Dan Graur and Wen-Hsiung Li
2000. Fundamentals of Molecular Evolution., p. 85. Sinauer Associates, Inc. Sunderland, MA, USA.
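The degeneracy-class calculation above can be sketched in code. The site counts and difference counts below are invented purely for illustration, and the function names are hypothetical; the formulas are the ones given above for Ai, Bi, Ki, KS, and KA.

```python
import math


def k2p_components(S_i, V_i, L_i):
    """Return (A_i, B_i): transitional and transversional substitutions per site."""
    P, Q = S_i / L_i, V_i / L_i
    a = 1 / (1 - 2 * P - Q)
    b = 1 / (1 - 2 * Q)
    A = 0.5 * math.log(a) - 0.25 * math.log(b)
    B = 0.5 * math.log(b)
    return A, B


def ks_ka(L, S, V):
    """KS and KA from site counts L, transitions S, and transversions V.

    L, S, and V are dicts keyed by degeneracy class: 0, 2, and 4.
    """
    A, B = {}, {}
    for i in (0, 2, 4):
        A[i], B[i] = k2p_components(S[i], V[i], L[i])
    K0 = A[0] + B[0]    # nonsynonymous substitutions per nondegenerate site
    K4 = A[4] + B[4]    # synonymous substitutions per fourfold degenerate site
    ks = (L[2] * A[2] + L[4] * K4) / (L[2] / 3 + L[4])
    ka = (L[2] * B[2] + L[0] * K0) / (2 * L[2] / 3 + L[0])
    return ks, ka


# Hypothetical counts for a 100-codon alignment (300 sites in total)
L = {0: 200, 2: 60, 4: 40}
S = {0: 4, 2: 6, 4: 8}     # transitional differences per class
V = {0: 2, 2: 2, 4: 4}     # transversional differences per class
ks, ka = ks_ka(L, S, V)
print(round(ks, 3), round(ka, 3))
```

With these invented counts KS is much larger than KA, the pattern expected when most amino acid changes are removed by purifying selection.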
Li (1993) and Pamilo and Bianchi (1993) proposed to calculate the
number of synonymous substitutions by taking $(L_2 A_2 + L_4 A_4)/(L_2 + L_4)$
as an estimate of the transitional component of nucleotide
substitution at twofold and fourfold degenerate sites:
$K_S = \frac{L_2 A_2 + L_4 A_4}{L_2 + L_4} + B_4$
$K_A = A_0 + \frac{L_0 B_0 + L_2 B_2}{L_0 + L_2}$
Source: Dan Graur and Wen-Hsiung Li
2000. Fundamentals of Molecular Evolution., p. 85. Sinauer Associates, Inc. Sunderland, MA, USA.
Indirect estimations of the number of nucleotide substitution
Indirect estimates of K values are subject to much larger sampling
errors than those based on direct comparisons of nucleotide
sequences.
Source: Dan Graur and Wen-Hsiung Li
2000. Fundamentals of Molecular Evolution., p. 85. Sinauer Associates, Inc. Sunderland, MA, USA.
Number of Amino acid replacements between two proteins
From the comparison of two amino acid sequences, we can
calculate the observed proportion of different amino acids between
two sequences as
$p = n/L$
where n is the number of amino acid differences between two
sequences and L is the length of the aligned sequences.
A simple model that can be used to convert p into the number of
amino acid replacements between two sequences is the Poisson
process. The number of amino acid replacements per site, d, is
estimated as
$d = -\ln(1 - p)$
Source: Dan Graur and Wen-Hsiung Li
2000. Fundamentals of Molecular Evolution., p. 86. Sinauer Associates, Inc. Sunderland, MA, USA.
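The Poisson correction can be sketched in a couple of lines. The function name and the example counts (10 differences in 100 aligned residues) are hypothetical.

```python
import math


def poisson_distance(n, L):
    """d = -ln(1 - p), where p = n / L is the proportion of differing residues."""
    p = n / L
    if p >= 1:
        raise ValueError("the correction is undefined when every site differs")
    return -math.log(1 - p)


# Hypothetical comparison: 10 differences among 100 aligned amino acids
print(round(poisson_distance(10, 100), 4))  # 0.1054
```

As with the nucleotide corrections, d is only slightly larger than p when the sequences are similar and grows faster than p as they diverge.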
Comparison of two homologous sequences involves the identification
of the locations of deletions and insertions that might have occurred in
either of the two lineages since their divergence from a common
ancestor. This process is referred to as sequence alignment.
There are three types of aligned pairs:
A matched pair is one in which the same nucleotide appears in both
sequences.
A mismatched pair is a pair in which different nucleotides are found
in the two sequences.
A gap is a pair consisting of a base from one sequence and a null base
from the other. Null bases are denoted by -. A gap indicates that a
deletion has occurred in one sequence or an insertion has occurred in
the other.
Source: Dan Graur and Wen-Hsiung Li
2000. Fundamentals of Molecular Evolution., p. 86. Sinauer Associates, Inc. Sunderland, MA, USA.
[Figure: two sequences, ATTGTCAAAGGCTTGAGCTGATGCAT and GGCAGGCTTTA CTTACAAGGGTATCG, with the range of alignment, a mismatch, and a gap marked.]
Score: $S = \sum(\text{identities, mismatches}) - \sum(\text{gap penalties})$; the alignment sought is the one with Max(S).
In evolutionary terms, each pair in an alignment represents an
inference concerning positional homology, i.e., a claim to the
effect that the two members of the pair descended from a common
ancestral nucleotide.
Source: Dan Graur and Wen-Hsiung Li
2000. Fundamentals of Molecular Evolution., p. 87. Sinauer Associates, Inc. Sunderland, MA, USA.
ATCGCATGGTTAACGACTG
ATCACATGGTTAA– –ACTCACC
Manual alignment by visual inspection
Advantages:
1) it uses the most powerful and trainable of all tools – the brain,
2) it allows the direct integration of additional data.
The main disadvantage of this method is that it is subjective and
unscalable, i.e., its results cannot be compared to those derived
from other methods.
Source: Dan Graur and Wen-Hsiung Li
2000. Fundamentals of Molecular Evolution., p. 87. Sinauer Associates, Inc. Sunderland, MA, USA.
The dot matrix
In a dot matrix, the two sequences to be aligned are written out as
column and row headings of a two-dimensional matrix. A dot is
put in the dot matrix plot at a position where the nucleotides in
the two sequences are identical. The alignment is defined by a
path through the matrix starting with the upper-left element and
ending with the lower-right element.
There are four possible types of steps in this path:
1) a diagonal step through a dot indicates a match,
2) a diagonal step through an empty element of the matrix indicates
a mismatch,
3) a horizontal step indicates a null nucleotide in the sequence on
the top of the matrix,
4) a vertical step indicates a null nucleotide in the sequence on the
left of the matrix.
Source: Dan Graur and Wen-Hsiung Li
2000. Fundamentals of Molecular Evolution., p. 87. Sinauer Associates, Inc. Sunderland, MA, USA.
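A dot matrix as described above can be built and printed with a few lines of code. This is only a sketch: the function names are hypothetical, and the two sequences are the 11-nucleotide and 9-nucleotide sequences A and B used in this chapter's alignment examples.

```python
def dot_matrix(top, left):
    """Boolean matrix: entry [i][j] is True where left[i] equals top[j]."""
    return [[a == b for b in top] for a in left]


def render(top, left):
    """Draw the matrix as text, with '*' for a dot and '.' for an empty element."""
    lines = ["  " + " ".join(top)]
    for a, row in zip(left, dot_matrix(top, left)):
        lines.append(a + " " + " ".join("*" if hit else "." for hit in row))
    return "\n".join(lines)


print(render("TCAGACGATTG", "TCGGAGCTG"))
```

Runs of dots along a diagonal mark stretches of consecutive matches; an alignment corresponds to a path from the upper-left to the lower-right corner of this grid.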
[Figures omitted.]
Source: Dan Graur and Wen-Hsiung Li
2000. Fundamentals of Molecular Evolution., pp. 88, 90, 91, 92. Sinauer Associates, Inc. Sunderland, MA, USA.
Distance and similarity methods
The best possible alignment between two sequences, or the optimal
alignment, is the one in which the numbers of mismatches and gaps
are minimized according to certain criteria. Unfortunately, reducing
the number of mismatches usually results in an increase in the
number of gaps, and vice versa.
A: TCAGACGATTG (LA = 11)
B: TCGGAGCTG (LB = 9)
(I)
TCAG-ACG-ATTG
TC-GGA-GC-T-G
# of mismatches = 0
# of gaps = 6
(II)
TCAGACGATTG
TCGGAGCTG--
# of mismatches = 5
# of gaps = 1
(III)
TCAG-ACGATTG
TC-GGA-GCTG-
# of mismatches = 2
# of gaps = 4
Source: Dan Graur and Wen-Hsiung Li
2000. Fundamentals of Molecular Evolution., p. 90. Sinauer Associates, Inc. Sunderland, MA, USA.
As a consequence, we must find a common denominator with which
to compare gaps and mismatches. The common denominator is
called the gap penalty or gap cost. The gap penalty is a factor by
which gap values are multiplied to make the gaps equivalent in
value to the mismatches.
For any given alignment, we can calculate a distance or
dissimilarity index (D) between the two sequences in the alignment
as
$D = \sum m_i y_i + \sum w_k z_k$
where $y_i$ is the number of mismatches of type i, $m_i$ is the mismatch
penalty for an i-type mismatch, $z_k$ is the number of gaps of length
k, and $w_k$ is a positive number representing the penalty for gaps of
length k.
Source: Dan Graur and Wen-Hsiung Li
2000. Fundamentals of Molecular Evolution., p. 93. Sinauer Associates, Inc. Sunderland, MA, USA.
Alternatively, the similarity between two sequences in an alignment
may be measured by a similarity index (S). For any given
alignment, the similarity between two sequences is
$S = x - \sum w_k z_k$
where x is the number of matches, $z_k$ is the number of gaps of length
k, and $w_k$ is a positive number representing the penalty for gaps of
length k.
In the most frequently used gap penalty systems, it is assumed that the
gap penalty has two components, a gap-opening penalty and a
gap-extension penalty.
Source: Dan Graur and Wen-Hsiung Li
2000. Fundamentals of Molecular Evolution., p. 93. Sinauer Associates, Inc. Sunderland, MA, USA.
Example: a linear gap penalty system in which the mismatch penalty is
1, the gap-opening penalty is 2, and the gap-extension penalty is 6, so
that a gap of length k costs 2 + 6(k – 1).
(I)
TCAG-ACG-ATTG
TC-GGA-GC-T-G
# of mismatches = 0
# of gaps = 6
D = (0 × 1) + (6 × 2) + 6(1 – 1) = 12
(II)
TCAGACGATTG
TCGGAGCTG--
# of mismatches = 5
# of gaps = 1
D = (5 × 1) + (1 × 2) + 6(2 – 1) = 13
(III)
TCAG-ACGATTG
TC-GGA-GCTG-
# of mismatches = 2
# of gaps = 4
D = (2 × 1) + (4 × 2) + 6(1 – 1) = 10
Source: Dan Graur and Wen-Hsiung Li
2000. Fundamentals of Molecular Evolution., p. 90. Sinauer Associates, Inc. Sunderland, MA, USA.
Example: a different penalty system in which the mismatch penalty is 1,
the gap-opening penalty is 3, and the gap-extension penalty is 0.
(I)
TCAG-ACG-ATTG
TC-GGA-GC-T-G
# of mismatches = 0
# of gaps = 6
D = (0 × 1) + (6 × 3) = 18
(II)
TCAGACGATTG
TCGGAGCTG--
# of mismatches = 5
# of gaps = 1
D = (5 × 1) + (1 × 3) = 8
(III)
TCAG-ACGATTG
TC-GGA-GCTG-
# of mismatches = 2
# of gaps = 4
D = (2 × 1) + (4 × 3) = 14
Source: Dan Graur and Wen-Hsiung Li
2000. Fundamentals of Molecular Evolution., p. 94. Sinauer Associates, Inc. Sunderland, MA, USA.
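The dissimilarity index for the three alignments can be recomputed with a short script. This sketch assumes a single mismatch type and a gap cost of (opening penalty) + (extension penalty) × (k – 1) for a gap of length k, as in the two examples above; the function name is hypothetical.

```python
def alignment_distance(row1, row2, mismatch=1, gap_open=2, gap_extend=6):
    """Dissimilarity index D for a pairwise alignment written with '-' for nulls."""
    if len(row1) != len(row2):
        raise ValueError("aligned rows must have the same length")
    mismatches = 0
    gap_lengths = []     # one entry per gap, holding its length k
    previous = None      # which row held the null base in the previous column
    for a, b in zip(row1, row2):
        current = 1 if a == "-" else 2 if b == "-" else None
        if current is None:
            if a != b:
                mismatches += 1
        elif current == previous:
            gap_lengths[-1] += 1       # the same gap continues
        else:
            gap_lengths.append(1)      # a new gap opens
        previous = current
    gap_cost = sum(gap_open + gap_extend * (k - 1) for k in gap_lengths)
    return mismatch * mismatches + gap_cost


alignments = {
    "I": ("TCAG-ACG-ATTG", "TC-GGA-GC-T-G"),
    "II": ("TCAGACGATTG", "TCGGAGCTG--"),
    "III": ("TCAG-ACGATTG", "TC-GGA-GCTG-"),
}
for name, (row1, row2) in alignments.items():
    first = alignment_distance(row1, row2, gap_open=2, gap_extend=6)
    second = alignment_distance(row1, row2, gap_open=3, gap_extend=0)
    print(name, first, second)
# prints:
# I 12 18
# II 13 8
# III 10 14
```

The preferred alignment changes with the penalty system: alignment III has the smallest D under the first system and alignment II under the second.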
Alignment algorithms
The Needleman-Wunsch algorithm used dynamic programming,
which is a general computational technique used in many fields of
study.
Dynamic programming can be applied to alignment problems
because similarity indices obey the following rule:
$S_{1 \to x,\, 1 \to y} = \max S_{1 \to x-1,\, 1 \to y-1} + S_{x,y}$
in which $S_{1 \to x,\, 1 \to y}$ is the similarity index for the two sequences up to
residue x in the first sequence and residue y in the second sequence,
$\max S_{1 \to x-1,\, 1 \to y-1}$ is the similarity index for the best alignment up to
residue x-1 in the first sequence and y-1 in the second sequence, and
$S_{x,y}$ is the similarity score for aligning residues x and y.
Source: Dan Graur and Wen-Hsiung Li
2000. Fundamentals of Molecular Evolution., p. 94. Sinauer Associates, Inc. Sunderland, MA, USA.
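The dynamic programming idea can be sketched in Python as follows. This is a minimal global alignment in the Needleman-Wunsch style, not the exact formulation of the source: the scoring values (match = 1, mismatch = -1, gap = -1 per position) and the function name are assumptions for illustration. Besides the diagonal step of the rule above, the table also considers steps that place a gap in either sequence.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment of a and b; returns (score, aligned_a, aligned_b).

    The scoring values are hypothetical defaults with a simple per-position gap cost.
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + pair,  # align residue i with residue j
                score[i - 1][j] + gap,       # gap in b
                score[i][j - 1] + gap,       # gap in a
            )
    # Trace back from the bottom-right corner to recover one optimal alignment.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        pair = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair:
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))


best, row_a, row_b = needleman_wunsch("TCAGACGATTG", "TCGGAGCTG")
print(best)
print(row_a)
print(row_b)
```

Because each cell depends only on three neighboring cells that are already filled, the table takes time proportional to the product of the two sequence lengths, rather than requiring every possible alignment to be enumerated.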
Source: Dan Graur and Wen-Hsiung Li
2000. Fundamentals of Molecular Evolution., p. 96. Sinauer Associates, Inc. Sunderland, MA, USA.
Multiple sequence alignment
Multiple sequence alignment can be viewed as an extension of
pairwise sequence alignment, but the complexity of the
computation grows exponentially with the number of sequences
being considered and, therefore, it is not feasible to search
exhaustively for the optimal alignment.
Most programs use some sort of incremental or progressive
algorithm, in which a new sequence is added to a group of already
aligned sequences in order of decreasing similarity.
It is usually advisable to take a look at the final multiple alignment,
as such alignments can frequently be improved by visual inspection.
Source: Dan Graur and Wen-Hsiung Li
2000. Fundamentals of Molecular Evolution., p. 97. Sinauer Associates, Inc. Sunderland, MA, USA.
Copyright Declaration
Work
Author/Source
Page
“Calculation of the distance
between two sequences is the
simplest phylogenetic
analysis … The distance
between two nucleotide
sequences is defined as the
expected number of
nucleotide substitutions per
site.”
Ziheng Yang
2006. Computational Molecular Evolution., p. 3. Oxford University Press Inc,,
New York, USA.
It is used subject to the fair use doctrine of:
•Taiwan Copyright Act Articles 52 & 65
•The "Code of Best Practices in Fair Use for OpenCourseWare 2009
(http://www.centerforsocialmedia.org/sites/default/files/10-305-OCWOct29.pdf)" by A Committee of Practitioners of OpenCourseWare in the
U.S. The contents are based on Section 107 of the 1976 U.S. Copyright Act
P2
“The simplest distance measure
is the proportion of different
sites, … site (i.e., some
changes are hidden)
Note: p is usable only for
highly similar sequences, with
p < 5%.”
Ziheng Yang
2006. Computational Molecular Evolution., p. 3. Oxford University Press Inc,,
New York, USA.
It is used subject to the fair use doctrine of:
•Taiwan Copyright Act Articles 52 & 65
•The "Code of Best Practices in Fair Use for OpenCourseWare 2009
(http://www.centerforsocialmedia.org/sites/default/files/10-305-OCWOct29.pdf)" by A Committee of Practitioners of OpenCourseWare in the
U.S. The contents are based on Section 107 of the 1976 U.S. Copyright Act
P3
Dan Graur and Wen-Hsiung Li
2000. Fundamentals of Molecular Evolution., p. 75. Sinauer Associates, Inc.
Sunderland, MA, USA.
It is used subject to the fair use doctrine of:
•Taiwan Copyright Act Articles 52 & 65
•The "Code of Best Practices in Fair Use for OpenCourseWare 2009
(http://www.centerforsocialmedia.org/sites/default/files/10-305-OCWOct29.pdf)" by A Committee of Practitioners of OpenCourseWare in the
U.S. The contents are based on Section 107 of the 1976 U.S. Copyright Act
P4
Dan Graur and Wen-Hsiung Li
2000. Fundamentals of Molecular Evolution., p. 68. Sinauer Associates, Inc.
Sunderland, MA, USA.
It is used subject to the fair use doctrine of:
•Taiwan Copyright Act Articles 52 & 65
•The "Code of Best Practices in Fair Use for OpenCourseWare 2009
(http://www.centerforsocialmedia.org/sites/default/files/10-305-OCWOct29.pdf)" by A Committee of Practitioners of OpenCourseWare in the
U.S. The contents are based on Section 107 of the 1976 U.S. Copyright Act
P5
“This simple model assumes
that substitutions occur
with … is α. Because the
model involves a single
parameter, α, it is called the
one-parameter model.”
Licensing
P5, P6
National Taiwan University Chau-Ti Ting
“Since we start with A, the
probability that this site is
occupied by A at time …
probability that A has
remained unchanged.”
Dan Graur and Wen-Hsiung Li
2000. Fundamentals of Molecular Evolution., p. 68. Sinauer Associates, Inc.
Sunderland, MA, USA.
It is used subject to the fair use doctrine of:
•Taiwan Copyright Act Articles 52 & 65
•The "Code of Best Practices in Fair Use for OpenCourseWare 2009
(http://www.centerforsocialmedia.org/sites/default/files/10-305-OCWOct29.pdf)" by A Committee of Practitioners of OpenCourseWare in the
U.S. The contents are based on Section 107 of the 1976 U.S. Copyright Act
P6
“The probability of having A
at time 2 is … 1) the
nucleotide has remained
unchanged from time 0 to
time 2, and 2) the nucleotide
has changed to T, C, or G at
time 1, but has subsequently
reverted to A at time 2.”
Dan Graur and Wen-Hsiung Li
2000. Fundamentals of Molecular Evolution., p. 68. Sinauer Associates, Inc.
Sunderland, MA, USA.
It is used subject to the fair use doctrine of:
•Taiwan Copyright Act Articles 52 & 65
•The "Code of Best Practices in Fair Use for OpenCourseWare 2009
(http://www.centerforsocialmedia.org/sites/default/files/10-305-OCWOct29.pdf)" by A Committee of Practitioners of OpenCourseWare in the
U.S. The contents are based on Section 107 of the 1976 U.S. Copyright Act
P7
“Using the above formulation,
we can show that the
following recurrence
equation applies to any …
We can rewrite this equation
in terms of the amount of
change in PA(t) per unit time
as”
Dan Graur and Wen-Hsiung Li
2000. Fundamentals of Molecular Evolution., p. 69. Sinauer Associates, Inc.
Sunderland, MA, USA.
It is used subject to the fair use doctrine of:
•Taiwan Copyright Act Articles 52 & 65
•The "Code of Best Practices in Fair Use for OpenCourseWare 2009
(http://www.centerforsocialmedia.org/sites/default/files/10-305-OCWOct29.pdf)" by A Committee of Practitioners of OpenCourseWare in the
U.S. The contents are based on Section 107 of the 1976 U.S. Copyright Act
P8
P13
National Taiwan University Chau-Ti Ting
“In this model, the rate of
transitional substitution at
each nucleotide site is α per
unit time, whereas the rate of
each transversional
substitution is β per unit time.”
Dan Graur and Wen-Hsiung Li
2000. Fundamentals of Molecular Evolution., p. 71. Sinauer Associates, Inc.
Sunderland, MA, USA.
It is used subject to the fair use doctrine of:
•Taiwan Copyright Act Articles 52 & 65
•The "Code of Best Practices in Fair Use for OpenCourseWare 2009
(http://www.centerforsocialmedia.org/sites/default/files/10-305-OCWOct29.pdf)" by A Committee of Practitioners of OpenCourseWare in the
U.S. The contents are based on Section 107 of the 1976 U.S. Copyright Act
P14
P14, P15
National Taiwan University Chau-Ti Ting
“Let us consider the
probability that a site that has
A at time 0 will have A at time t.
After one time unit, … of A
changing to either C or T is
2β. Thus the probability of A
remaining unchanged after
one time unit is”
Dan Graur and Wen-Hsiung Li
2000. Fundamentals of Molecular Evolution., p. 72. Sinauer Associates, Inc.
Sunderland, MA, USA.
It is used subject to the fair use doctrine of:
•Taiwan Copyright Act Articles 52 & 65
•The "Code of Best Practices in Fair Use for OpenCourseWare 2009
(http://www.centerforsocialmedia.org/sites/default/files/10-305-OCWOct29.pdf)" by A Committee of Practitioners of OpenCourseWare in the
U.S. The contents are based on Section 107 of the 1976 U.S. Copyright Act
P15
“Let Y(t) = the probability
that the initial nucleotide and
the nucleotide at time t differ
from each other by a
transition.”
Dan Graur and Wen-Hsiung Li
2000. Fundamentals of Molecular Evolution., p. 73. Sinauer Associates, Inc.
Sunderland, MA, USA.
It is used subject to the fair use doctrine of:
•Taiwan Copyright Act Articles 52 & 65
•The "Code of Best Practices in Fair Use for OpenCourseWare 2009
(http://www.centerforsocialmedia.org/sites/default/files/10-305-OCWOct29.pdf)" by A Committee of Practitioners of OpenCourseWare in the
U.S. The contents are based on Section 107 of the 1976 U.S. Copyright Act
P19
“The probability, Z(t), that the
initial nucleotide and the
nucleotide at time t differ by
a specific type of
transversion is given by”
Dan Graur and Wen-Hsiung Li
2000. Fundamentals of Molecular Evolution., p. 73. Sinauer Associates, Inc.
Sunderland, MA, USA.
It is used subject to the fair use doctrine of:
•Taiwan Copyright Act Articles 52 & 65
•The "Code of Best Practices in Fair Use for OpenCourseWare 2009
(http://www.centerforsocialmedia.org/sites/default/files/10-305-OCWOct29.pdf)" by A Committee of Practitioners of OpenCourseWare in the
U.S. The contents are based on Section 107 of the 1976 U.S. Copyright Act
P20
"Note that each nucleotide
is subject to two types of
transversion, but only one
type of transition. Also”
Dan Graur and Wen-Hsiung Li
2000. Fundamentals of Molecular Evolution., p. 74. Sinauer Associates, Inc.
Sunderland, MA, USA.
It is used subject to the fair use doctrine of:
•Taiwan Copyright Act Articles 52 & 65
•The "Code of Best Practices in Fair Use for OpenCourseWare 2009
(http://www.centerforsocialmedia.org/sites/default/files/10-305-OCWOct29.pdf)" by A Committee of Practitioners of OpenCourseWare in the
U.S. The contents are based on Section 107 of the 1976 U.S. Copyright Act
P21
“If two sequences of length N
differ from each other at n
sites, then the proportion
of … the actual number of
substitutions due to multiple
substitution or multiple hit at
the same site.”
Dan Graur and Wen-Hsiung Li
2000. Fundamentals of Molecular Evolution., p. 74. Sinauer Associates, Inc.
Sunderland, MA, USA.
It is used subject to the fair use doctrine of:
•Taiwan Copyright Act Articles 52 & 65
•The "Code of Best Practices in Fair Use for OpenCourseWare 2009
(http://www.centerforsocialmedia.org/sites/default/files/10-305-OCWOct29.pdf)" by A Committee of Practitioners of OpenCourseWare in the
U.S. The contents are based on Section 107 of the 1976 U.S. Copyright Act
P22
Dan Graur and Wen-Hsiung Li
2000. Fundamentals of Molecular Evolution., p. 75. Sinauer Associates, Inc.
Sunderland, MA, USA.
It is used subject to the fair use doctrine of:
•Taiwan Copyright Act Articles 52 & 65
•The "Code of Best Practices in Fair Use for OpenCourseWare 2009
(http://www.centerforsocialmedia.org/sites/default/files/10-305-OCWOct29.pdf)" by A Committee of Practitioners of OpenCourseWare in the
U.S. The contents are based on Section 107 of the 1976 U.S. Copyright Act
P23
Dan Graur and Wen-Hsiung Li
2000. Fundamentals of Molecular Evolution., p. 76. Sinauer Associates, Inc.
Sunderland, MA, USA.
It is used subject to the fair use doctrine of:
•Taiwan Copyright Act Articles 52 & 65
•The "Code of Best Practices in Fair Use for OpenCourseWare 2009
(http://www.centerforsocialmedia.org/sites/default/files/10-305-OCWOct29.pdf)" by A Committee of Practitioners of OpenCourseWare in the
U.S. The contents are based on Section 107 of the 1976 U.S. Copyright Act
P24
“Let us start with the one-parameter model. In this
model, it is sufficient to
consider only I(t), which is the
probability that the
nucleotide … , C, G at this
site are $P_{AT}^2(t)$, $P_{AC}^2(t)$, $P_{AG}^2(t)$,
respectively. Therefore,”
“Note that the probability that
the two sequences are
different at a site at time t is p
= 1 − I(t). Thus,”
Dan Graur and Wen-Hsiung Li
2000. Fundamentals of Molecular Evolution., p. 76. Sinauer Associates, Inc.
Sunderland, MA, USA.
It is used subject to the fair use doctrine of:
•Taiwan Copyright Act Articles 52 & 65
•The "Code of Best Practices in Fair Use for OpenCourseWare 2009
(http://www.centerforsocialmedia.org/sites/default/files/10-305-OCWOct29.pdf)" by A Committee of Practitioners of OpenCourseWare in the
U.S. The contents are based on Section 107 of the 1976 U.S. Copyright Act
P25
P26
National Taiwan University Chau-Ti Ting
“The time of divergence
between two sequences is
usually not known, and
thus we cannot estimate α.
Instead, we compute … K =
2(3αt), where 3αt is the
number of substitutions per
site in a single lineage.”
Dan Graur and Wen-Hsiung Li
2000. Fundamentals of Molecular Evolution., p. 76. Sinauer Associates, Inc.
Sunderland, MA, USA.
It is used subject to the fair use doctrine of:
•Taiwan Copyright Act Articles 52 & 65
•The "Code of Best Practices in Fair Use for OpenCourseWare 2009
(http://www.centerforsocialmedia.org/sites/default/files/10-305-OCWOct29.pdf)" by A Committee of Practitioners of OpenCourseWare in the
U.S. The contents are based on Section 107 of the 1976 U.S. Copyright Act
P26
“Where p is the observed
proportion of different
nucleotides between two
sequences.”
Dan Graur and Wen-Hsiung Li
2000. Fundamentals of Molecular Evolution., p. 76. Sinauer Associates, Inc.
Sunderland, MA, USA.
It is used subject to the fair use doctrine of:
•Taiwan Copyright Act Articles 52 & 65
•The "Code of Best Practices in Fair Use for OpenCourseWare 2009
(http://www.centerforsocialmedia.org/sites/default/files/10-305-OCWOct29.pdf)" by A Committee of Practitioners of OpenCourseWare in the
U.S. The contents are based on Section 107 of the 1976 U.S. Copyright Act
P27
“In the case of the two-parameter
model, the differences
between two sequences are
classified into transitions
and … substitutions per site
between two sequences, K, is
estimated by “
Dan Graur and Wen-Hsiung Li
2000. Fundamentals of Molecular Evolution., p. 76. Sinauer Associates, Inc.
Sunderland, MA, USA.
It is used subject to the fair use doctrine of:
•Taiwan Copyright Act Articles 52 & 65
•The "Code of Best Practices in Fair Use for OpenCourseWare 2009
(http://www.centerforsocialmedia.org/sites/default/files/10-305-OCWOct29.pdf)" by A Committee of Practitioners of OpenCourseWare in the
U.S. The contents are based on Section 107 of the 1976 U.S. Copyright Act
P28
“In this example, the two
models give essentially the
same estimate because the …
substitutions, K) is only
slightly larger than the
uncorrected value (i.e., the
number of nucleotide
differences, p).”
Dan Graur and Wen-Hsiung Li
2000. Fundamentals of Molecular Evolution., p. 77. Sinauer Associates, Inc.
Sunderland, MA, USA.
It is used subject to the fair use doctrine of:
•Taiwan Copyright Act Articles 52 & 65
•The "Code of Best Practices in Fair Use for OpenCourseWare 2009
(http://www.centerforsocialmedia.org/sites/default/files/10-305-OCWOct29.pdf)" by A Committee of Practitioners of OpenCourseWare in the
U.S. The contents are based on Section 107 of the 1976 U.S. Copyright Act
P30
“When the degree of
divergence between two
sequences is large, and
especially in cases where …
of transversion, the two-parameter
model tends to be
more accurate than the one-parameter model.”
Dan Graur and Wen-Hsiung Li
2000. Fundamentals of Molecular Evolution., p. 77. Sinauer Associates, Inc.
Sunderland, MA, USA.
It is used subject to the fair use doctrine of:
•Taiwan Copyright Act Articles 52 & 65
•The "Code of Best Practices in Fair Use for OpenCourseWare 2009
(http://www.centerforsocialmedia.org/sites/default/files/10-305-OCWOct29.pdf)" by A Committee of Practitioners of OpenCourseWare in the
U.S. The contents are based on Section 107 of the 1976 U.S. Copyright Act
P32
“Several assumptions have
been made that are not
necessary … change in time,
so that the nucleotide
frequencies are maintained at
a constant equilibrium value
throughout their evolution.”
Dan Graur and Wen-Hsiung Li
2000. Fundamentals of Molecular Evolution., p. 79. Sinauer Associates, Inc.
Sunderland, MA, USA.
It is used subject to the fair use doctrine of:
•Taiwan Copyright Act Articles 52 & 65
•The "Code of Best Practices in Fair Use for OpenCourseWare 2009
(http://www.centerforsocialmedia.org/sites/default/files/10-305-OCWOct29.pdf)" by A Committee of Practitioners of OpenCourseWare in the
U.S. The contents are based on Section 107 of the 1976 U.S. Copyright Act
P33
“Transition
changes
between A and G, or between
T and C
Transversion
changes
between a purine and a
pyrimidine”
Marjorie A. Hoy
2003. Insect molecular genetics: an introduction to principles and applications,
2nd edition, p. 23. Academic Press. USA.
It is used subject to the fair use doctrine of:
•Taiwan Copyright Act Articles 52 & 65
•The "Code of Best Practices in Fair Use for OpenCourseWare 2009
(http://www.centerforsocialmedia.org/sites/default/files/10-305-OCWOct29.pdf)" by A Committee of Practitioners of OpenCourseWare in the
U.S. The contents are based on Section 107 of the 1976 U.S. Copyright Act
P34
“Synonymous (silent
mutations)
Nucleotide
changes do not affect the amino
acid sequence.”
A. J. F. Griffiths, J. H. Miller, D. T. Suzuki, R. C. Lewontin, and W. M. Gelbart. P34
2000. An Introduction to Genetic Analysis, 7th edition. W. H. Freeman and
Company. New York, USA.
http://www.ncbi.nlm.nih.gov/books/NBK21878/
It is used subject to the fair use doctrine of:
•Taiwan Copyright Act Articles 52 & 65
•The "Code of Best Practices in Fair Use for OpenCourseWare 2009
(http://www.centerforsocialmedia.org/sites/default/files/10-305-OCWOct29.pdf)" by A Committee of Practitioners of OpenCourseWare in the
U.S. The contents are based on Section 107 of the 1976 U.S. Copyright Act
“Nonsynonymous
(replacement mutations)
A change in
single nucleotide in a codon
can result in an amino acid
replacement.”
A. J. F. Griffiths, J. H. Miller, D. T. Suzuki, R. C. Lewontin, and W. M. Gelbart. P34
2000. An Introduction to Genetic Analysis, 7th edition. W. H. Freeman and
Company. New York, USA.
http://www.ncbi.nlm.nih.gov/books/NBK21878/
It is used subject to the fair use doctrine of:
•Taiwan Copyright Act Articles 52 & 65
•The "Code of Best Practices in Fair Use for OpenCourseWare 2009
(http://www.centerforsocialmedia.org/sites/default/files/10-305-OCWOct29.pdf)" by A Committee of Practitioners of OpenCourseWare in the
U.S. The contents are based on Section 107 of the 1976 U.S. Copyright Act
“Three definitions of the
‘transition/transversion rate
ratio’ are in use … ratio (R):
same as the first one but with
correction”
Ziheng Yang
2006. Computational Molecular Evolution., p. 17. Oxford University Press Inc,,
New York, USA.
It is used subject to the fair use doctrine of:
•Taiwan Copyright Act Articles 52 & 65
•The "Code of Best Practices in Fair Use for OpenCourseWare 2009
(http://www.centerforsocialmedia.org/sites/default/files/10-305-OCWOct29.pdf)" by A Committee of Practitioners of OpenCourseWare in the
U.S. The contents are based on Section 107 of the 1976 U.S. Copyright Act
P35
“Overall, R is convenient to
use for comparing estimates
under different models, while
κ is more suitable for
formulating the null
hypothesis of no
transition/transversion rate
difference.”
Ziheng Yang
2006. Computational Molecular Evolution., p. 18. Oxford University Press Inc,,
New York, USA.
It is used subject to the fair use doctrine of:
•Taiwan Copyright Act Articles 52 & 65
•The "Code of Best Practices in Fair Use for OpenCourseWare 2009
(http://www.centerforsocialmedia.org/sites/default/files/10-305-OCWOct29.pdf)" by A Committee of Practitioners of OpenCourseWare in the
U.S. The contents are based on Section 107 of the 1976 U.S. Copyright Act
P35
“With protein coding genes,
we have the advantage of
being able to distinguish …
comparison does not require
estimation of absolute
substitution rates or
knowledge of the divergence
time.”
Ziheng Yang
2006. Computational Molecular Evolution., p. 40. Oxford University Press Inc,,
New York, USA.
It is used subject to the fair use doctrine of:
•Taiwan Copyright Act Articles 52 & 65
•The "Code of Best Practices in Fair Use for OpenCourseWare 2009
(http://www.centerforsocialmedia.org/sites/default/files/10-305-OCWOct29.pdf)" by A Committee of Practitioners of OpenCourseWare in the
U.S. The contents are based on Section 107 of the 1976 U.S. Copyright Act
P36
“Empirical models attempt
to describe the relative rates
of substitution between two
amino acids … interpretative
power and are particularly
useful for studying the forces
and mechanisms of gene
sequence evolution”
Ziheng Yang
2006. Computational Molecular Evolution., p. 41 Oxford University Press Inc,,
New York, USA.
It is used subject to the fair use doctrine of:
•Taiwan Copyright Act Articles 52 & 65
•The "Code of Best Practices in Fair Use for OpenCourseWare 2009
(http://www.centerforsocialmedia.org/sites/default/files/10-305-OCWOct29.pdf)" by A Committee of Practitioners of OpenCourseWare in the
U.S. The contents are based on Section 107 of the 1976 U.S. Copyright Act
P37
“The first empirical amino
acid substitution matrix was
constructed by Dayhoff and
colleagues … multiplication
of the PAM1 matrix.”
Ziheng Yang
2006. Computational Molecular Evolution., p. 41 Oxford University Press Inc,,
New York, USA.
It is used subject to the fair use doctrine of:
•Taiwan Copyright Act Articles 52 & 65
•The "Code of Best Practices in Fair Use for OpenCourseWare 2009
(http://www.centerforsocialmedia.org/sites/default/files/10-305-OCWOct29.pdf)" by A Committee of Practitioners of OpenCourseWare in the
U.S. The contents are based on Section 107 of the 1976 U.S. Copyright Act
P38
P39
National Taiwan University Chau-Ti Ting
Features of these matrices: …
Both factors may be
operating at the same time.
Ziheng Yang
2006. Computational Molecular Evolution., p. 42. Oxford University Press Inc,,
New York, USA.
It is used subject to the fair use doctrine of:
•Taiwan Copyright Act Articles 52 & 65
•The "Code of Best Practices in Fair Use for OpenCourseWare 2009
(http://www.centerforsocialmedia.org/sites/default/files/10-305-OCWOct29.pdf)" by A Committee of Practitioners of OpenCourseWare in the
U.S. The contents are based on Section 107 of the 1976 U.S. Copyright Act
P41
“Two distances are usually
calculated between protein-coding DNA sequences, …
methods: heuristic counting
methods and the ML method”
Ziheng Yang
2006. Computational Molecular Evolution., p. 49. Oxford University Press Inc,,
New York, USA.
It is used subject to the fair use doctrine of:
•Taiwan Copyright Act Articles 52 & 65
•The "Code of Best Practices in Fair Use for OpenCourseWare 2009
(http://www.centerforsocialmedia.org/sites/default/files/10-305-OCWOct29.pdf)" by A Committee of Practitioners of OpenCourseWare in the
U.S. The contents are based on Section 107 of the 1976 U.S. Copyright Act
P42
“Three steps:
1. Count synonymous and
nonsynonymous sites
2. Count synonymous and
nonsynonymous differences
3. Calculate the proportion of
differences and correct for
multiple hits”
Ziheng Yang
2006. Computational Molecular Evolution., p. 50. Oxford University Press Inc,,
New York, USA.
It is used subject to the fair use doctrine of:
•Taiwan Copyright Act Articles 52 & 65
•The "Code of Best Practices in Fair Use for OpenCourseWare 2009
(http://www.centerforsocialmedia.org/sites/default/files/10-305-OCWOct29.pdf)" by A Committee of Practitioners of OpenCourseWare in the
U.S. The contents are based on Section 107 of the 1976 U.S. Copyright Act
P43
Wikipedia
http://en.wikipedia.org/wiki/Codon
It is used subject to the fair use doctrine of:
•Taiwan Copyright Act Articles 52 & 65
•Wikimedia Foundation Terms of Use
P44
P44
National Taiwan University Chau-Ti Ting
“Nei and Gojobori (1986)
1. Count synonymous and
nonsynonymous sites: S and
N … (pS and pN) as
apply the JC69 correction for
multiple hits “
Ziheng Yang
2006. Computational Molecular Evolution., p. 50. Oxford University Press Inc,,
New York, USA.
It is used subject to the fair use doctrine of:
•Taiwan Copyright Act Articles 52 & 65
•The "Code of Best Practices in Fair Use for OpenCourseWare 2009
(http://www.centerforsocialmedia.org/sites/default/files/10-305-OCWOct29.pdf)" by A Committee of Practitioners of OpenCourseWare in the
U.S. The contents are based on Section 107 of the 1976 U.S. Copyright Act
P45, P49
P46
National Taiwan University Chau-Ti Ting
P47
National Taiwan University Chau-Ti Ting
P48
National Taiwan University Chau-Ti Ting
“nondegenerate (L0): all the
possible changes at this
site … At twofold degenerate
sites, transitional changes are
synonymous, whereas
transversional changes are
nonsynonymous.”
Dan Graur and Wen-Hsiung Li
2000. Fundamentals of Molecular Evolution., p. 83. Sinauer Associates, Inc.
Sunderland, MA, USA.
It is used subject to the fair use doctrine of:
•Taiwan Copyright Act Articles 52 & 65
•The "Code of Best Practices in Fair Use for OpenCourseWare 2009
(http://www.centerforsocialmedia.org/sites/default/files/10-305-OCWOct29.pdf)" by A Committee of Practitioners of OpenCourseWare in the
U.S. The contents are based on Section 107 of the 1976 U.S. Copyright Act
P50, P52
P51
National Taiwan University Chau-Ti Ting
“The proportion of
transitional differences at i-fold degenerate …
degenerate sites between two
sequences is calculated as”
Dan Graur and Wen-Hsiung Li
2000. Fundamentals of Molecular Evolution., p. 84. Sinauer Associates, Inc.
Sunderland, MA, USA.
It is used subject to the fair use doctrine of:
•Taiwan Copyright Act Articles 52 & 65
•The "Code of Best Practices in Fair Use for OpenCourseWare 2009
(http://www.centerforsocialmedia.org/sites/default/files/10-305-OCWOct29.pdf)" by A Committee of Practitioners of OpenCourseWare in the
U.S. The contents are based on Section 107 of the 1976 U.S. Copyright Act
P53
“Kimura’s two-parameter
method is used to estimate
the number of transitional ($A_i$)
and transversional ($B_i$)
substitutions per ith type site
… Where $a_i = 1/(1 - 2P_i - Q_i)$,
$b_i = 1/(1 - 2Q_i)$”
Dan Graur and Wen-Hsiung Li
2000. Fundamentals of Molecular Evolution., p. 84. Sinauer Associates, Inc.
Sunderland, MA, USA.
It is used subject to the fair use doctrine of:
•Taiwan Copyright Act Articles 52 & 65
•The "Code of Best Practices in Fair Use for OpenCourseWare 2009
(http://www.centerforsocialmedia.org/sites/default/files/10-305-OCWOct29.pdf)" by A Committee of Practitioners of OpenCourseWare in the
U.S. The contents are based on Section 107 of the 1976 U.S. Copyright Act
P54
“The total number of
substitutions per ith type …
denote the numbers of
nonsynonymous substitutions
per nondegenerate site.”
Dan Graur and Wen-Hsiung Li
2000. Fundamentals of Molecular Evolution., p. 84. Sinauer Associates, Inc.
Sunderland, MA, USA.
It is used subject to the fair use doctrine of:
•Taiwan Copyright Act Articles 52 & 65
•The "Code of Best Practices in Fair Use for OpenCourseWare 2009
(http://www.centerforsocialmedia.org/sites/default/files/10-305-OCWOct29.pdf)" by A Committee of Practitioners of OpenCourseWare in the
U.S. The contents are based on Section 107 of the 1976 U.S. Copyright Act
P55
“then, the number of
synonymous substitutions per
synonymous site (KS) and the
number of nonsynonymous
substitutions per
nonsynonymous site (KA) can
be obtained by”
Dan Graur and Wen-Hsiung Li
2000. Fundamentals of Molecular Evolution., p. 85. Sinauer Associates, Inc.
Sunderland, MA, USA.
It is used subject to the fair use doctrine of:
•Taiwan Copyright Act Articles 52 & 65
•The "Code of Best Practices in Fair Use for OpenCourseWare 2009
(http://www.centerforsocialmedia.org/sites/default/files/10-305-OCWOct29.pdf)" by A Committee of Practitioners of OpenCourseWare in the
U.S. The contents are based on Section 107 of the 1976 U.S. Copyright Act
P56
“Li (1993) and Pamilo and
Bianchi (1993) proposed to
calculate the number of
synonymous substitutions
by … of the transition
component of nucleotide
substitution at twofold and
fourfold degenerate sites.”
Dan Graur and Wen-Hsiung Li
2000. Fundamentals of Molecular Evolution., p. 85. Sinauer Associates, Inc.
Sunderland, MA, USA.
It is used subject to the fair use doctrine of:
•Taiwan Copyright Act Articles 52 & 65
•The "Code of Best Practices in Fair Use for OpenCourseWare 2009
(http://www.centerforsocialmedia.org/sites/default/files/10-305-OCWOct29.pdf)" by A Committee of Practitioners of OpenCourseWare in the
U.S. The contents are based on Section 107 of the 1976 U.S. Copyright Act
P57
“Indirect estimates of K
values are subject to much
larger sampling errors than
those based on direct
comparisons of nucleotide
sequence”
Dan Graur and Wen-Hsiung Li
2000. Fundamentals of Molecular Evolution., p. 85. Sinauer Associates, Inc.
Sunderland, MA, USA.
It is used subject to the fair use doctrine of:
•Taiwan Copyright Act Articles 52 & 65
•The "Code of Best Practices in Fair Use for OpenCourseWare 2009
(http://www.centerforsocialmedia.org/sites/default/files/10-305-OCWOct29.pdf)" by A Committee of Practitioners of OpenCourseWare in the
U.S. The contents are based on Section 107 of the 1976 U.S. Copyright Act
P58
P58
National Taiwan University Chau-Ti Ting
“From the comparison of two
amino acid sequences, …
The number of amino acid
replacements per site, d, is
estimated as”
Dan Graur and Wen-Hsiung Li
2000. Fundamentals of Molecular Evolution., p. 86. Sinauer Associates, Inc.
Sunderland, MA, USA.
It is used subject to the fair use doctrine of:
•Taiwan Copyright Act Articles 52 & 65
•The "Code of Best Practices in Fair Use for OpenCourseWare 2009
(http://www.centerforsocialmedia.org/sites/default/files/10-305-OCWOct29.pdf)" by A Committee of Practitioners of OpenCourseWare in the
U.S. The contents are based on Section 107 of the 1976 U.S. Copyright Act
P59
“Comparison of two
homologous sequences
involves the identification of
the location of deletions and
insertions that might have …
A gap indicates that a
deletion has occurred in one
sequence or an insertion has
occurred in”
Dan Graur and Wen-Hsiung Li
2000. Fundamentals of Molecular Evolution., p. 86. Sinauer Associates, Inc.
Sunderland, MA, USA.
It is used subject to the fair use doctrine of:
•Taiwan Copyright Act Articles 52 & 65
•The "Code of Best Practices in Fair Use for OpenCourseWare 2009
(http://www.centerforsocialmedia.org/sites/default/files/10-305-OCWOct29.pdf)" by A Committee of Practitioners of OpenCourseWare in the
U.S. The contents are based on Section 107 of the 1976 U.S. Copyright Act
P60
P61
National Taiwan University Chau-Ti Ting
“In evolutionary terms, each
pair in an alignment
represents an inference
concerning positional
homology, i.e., a claim to the
effect that the two members
of the pair descended from a
common ancestral nucleotide.”
Dan Graur and Wen-Hsiung Li
2000. Fundamentals of Molecular Evolution., p. 87. Sinauer Associates, Inc.
Sunderland, MA, USA.
It is used subject to the fair use doctrine of:
•Taiwan Copyright Act Articles 52 & 65
•The "Code of Best Practices in Fair Use for OpenCourseWare 2009
(http://www.centerforsocialmedia.org/sites/default/files/10-305-OCWOct29.pdf)" by A Committee of Practitioners of OpenCourseWare in the
U.S. The contents are based on Section 107 of the 1976 U.S. Copyright Act
P62
P62
National Taiwan University Chau-Ti Ting
“Advantages:
1)it uses the most powerful
and trainable of all tools – the
brain,
2)it allows the direct … its
results cannot be compared to
those derived from other
methods.”
Dan Graur and Wen-Hsiung Li
2000. Fundamentals of Molecular Evolution., p. 87. Sinauer Associates, Inc.
Sunderland, MA, USA.
It is used subject to the fair use doctrine of:
•Taiwan Copyright Act Articles 52 & 65
•The "Code of Best Practices in Fair Use for OpenCourseWare 2009
(http://www.centerforsocialmedia.org/sites/default/files/10-305-OCWOct29.pdf)" by A Committee of Practitioners of OpenCourseWare in the
U.S. The contents are based on Section 107 of the 1976 U.S. Copyright Act
P63
“In a dot matrix, the two
sequences to be aligned are
written out as column and
row headings of … a vertical
step indicates a null
nucleotide in the sequence on
the left of the matrix.”
Dan Graur and Wen-Hsiung Li
2000. Fundamentals of Molecular Evolution., p. 87. Sinauer Associates, Inc.
Sunderland, MA, USA.
P64
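As a rough illustration of the dot-matrix idea quoted above, the following Python sketch marks a dot wherever the nucleotide heading a row matches the nucleotide heading a column. The function name `dot_matrix`, the `*` and `.` symbols, and the two short example sequences are assumptions made for illustration; they are not taken from the source.

```python
def dot_matrix(seq_left, seq_top):
    """Return one string per row: '*' where the row and column bases match, '.' otherwise."""
    return [
        "".join("*" if a == b else "." for b in seq_top)
        for a in seq_left
    ]


# Hypothetical example sequences, chosen only to show the layout.
top, left = "ACGT", "AGGT"
print("  " + top)
for base, row in zip(left, dot_matrix(left, top)):
    print(base + " " + row)
```

A run of dots along a diagonal corresponds to a stretch of matching positions; a break in the diagonal suggests a mismatch or a gap.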
Dan Graur and Wen-Hsiung Li
2000. Fundamentals of Molecular Evolution., p. 88. Sinauer Associates, Inc.
Sunderland, MA, USA.
P65
Dan Graur and Wen-Hsiung Li
2000. Fundamentals of Molecular Evolution., p. 90. Sinauer Associates, Inc.
Sunderland, MA, USA.
P66
Dan Graur and Wen-Hsiung Li
2000. Fundamentals of Molecular Evolution., p. 91. Sinauer Associates, Inc.
Sunderland, MA, USA.
P67
Dan Graur and Wen-Hsiung Li
2000. Fundamentals of Molecular Evolution., p. 91. Sinauer Associates, Inc.
Sunderland, MA, USA.
P67
Dan Graur and Wen-Hsiung Li
2000. Fundamentals of Molecular Evolution., p. 92. Sinauer Associates, Inc.
Sunderland, MA, USA.
P68
“The best possible alignment
between two sequences, or
the optimal alignment, is the
one in which the numbers of
mismatches and gaps are
minimized according to
certain criteria. … TC-GGAGCTG- # of gaps = 4”
Dan Graur and Wen-Hsiung Li
2000. Fundamentals of Molecular Evolution., p. 90. Sinauer Associates, Inc.
Sunderland, MA, USA.
P69
“As a consequence, we must
find a common denominator
with which to compare gaps
and mismatches. … z_k is the
number of gaps of length k,
and w_k is a positive number
representing the penalty of
gaps of length k”
Dan Graur and Wen-Hsiung Li
2000. Fundamentals of Molecular Evolution., p. 93. Sinauer Associates, Inc.
Sunderland, MA, USA.
P70
“Alternatively, the similarity
between two sequences in an
alignment may be measured
by … gap penalty systems, it
is assumed that the gap
penalty has two components,
a gap-opening penalty and a
gap-extension penalty.”
Dan Graur and Wen-Hsiung Li
2000. Fundamentals of Molecular Evolution., p. 93. Sinauer Associates, Inc.
Sunderland, MA, USA.
P71
“Using a linear gap penalty
system in which the
mismatch penalty is 1, the
gap-open penalty … D = (2 x 1) + (4 x 2) + 6(1 – 1) = 10”
Dan Graur and Wen-Hsiung Li
2000. Fundamentals of Molecular Evolution., p. 90. Sinauer Associates, Inc.
Sunderland, MA, USA.
P72
“Using a different penalty
system in which the
mismatch penalty is 1, the
gap-open penalty is 3 and
the … D = (2 x 1)+(4 x 3)=14”
Dan Graur and Wen-Hsiung Li
2000. Fundamentals of Molecular Evolution., p. 94. Sinauer Associates, Inc.
Sunderland, MA, USA.
P73
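The arithmetic in the two gap-penalty excerpts above can be reproduced with a small Python sketch. It assumes the penalty for a gap of length k has the form w_k = a + b(k - 1), where a is the gap-open penalty and b the gap-extension penalty; this form, the value b = 6 in the first example, and the reading "2 mismatches and 4 gaps of length 1" are inferred from the printed sums, not stated in the excerpts. The function names are hypothetical.

```python
def gap_penalty(k, gap_open, gap_extend):
    """Assumed penalty w_k for one gap of length k: open cost plus extension per extra position."""
    return gap_open + gap_extend * (k - 1)


def dissimilarity(mismatches, gap_counts, mismatch_penalty, gap_open, gap_extend):
    """D = mismatch_penalty * mismatches + sum over k of w_k * z_k.

    gap_counts maps a gap length k to z_k, the number of gaps of that length.
    """
    gap_total = sum(
        z_k * gap_penalty(k, gap_open, gap_extend)
        for k, z_k in gap_counts.items()
    )
    return mismatch_penalty * mismatches + gap_total


# First excerpt: (2 x 1) + (4 x 2) + 6(1 - 1)
print(dissimilarity(2, {1: 4}, mismatch_penalty=1, gap_open=2, gap_extend=6))
# Second excerpt: (2 x 1) + (4 x 3); the extension value is elided in the
# source and does not matter here because every gap has length 1 (0 is a placeholder).
print(dissimilarity(2, {1: 4}, mismatch_penalty=1, gap_open=3, gap_extend=0))
```

Because all four gaps have length 1, the extension term vanishes in both cases and only the gap-open penalty changes the result.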
“The Needleman-Wunsch
algorithm used dynamic
programming, which is a
general computational
technique used in many fields
of study. … is the similarity
score for aligning residues x
and y.”
Dan Graur and Wen-Hsiung Li
2000. Fundamentals of Molecular Evolution., p. 94. Sinauer Associates, Inc.
Sunderland, MA, USA.
P74
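To make the dynamic-programming idea in the Needleman-Wunsch excerpt above concrete, here is a minimal Python sketch that fills the score matrix for a global alignment and returns the optimal score. The scoring values (match = 1, mismatch = -1, a fixed cost of -2 per gap position) and the function name are assumptions for illustration; the excerpt does not give them, and the textbook's own scoring scheme may differ.

```python
def needleman_wunsch_score(seq_a, seq_b, match=1, mismatch=-1, gap=-2):
    """Return the optimal global alignment score of seq_a and seq_b."""

    def s(x, y):
        # Similarity score for aligning residues x and y (assumed values).
        return match if x == y else mismatch

    rows, cols = len(seq_a) + 1, len(seq_b) + 1
    H = [[0] * cols for _ in range(rows)]

    # Aligning a prefix against nothing costs one gap per residue.
    for i in range(1, rows):
        H[i][0] = i * gap
    for j in range(1, cols):
        H[0][j] = j * gap

    # Each cell takes the best of: align two residues, or gap in either sequence.
    for i in range(1, rows):
        for j in range(1, cols):
            H[i][j] = max(
                H[i - 1][j - 1] + s(seq_a[i - 1], seq_b[j - 1]),
                H[i - 1][j] + gap,
                H[i][j - 1] + gap,
            )
    return H[-1][-1]


print(needleman_wunsch_score("ACGT", "AGT"))
```

The sketch returns only the score; recovering the alignment itself would require tracing back through the matrix from the bottom-right cell.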
Dan Graur and Wen-Hsiung Li
2000. Fundamentals of Molecular Evolution., p. 96. Sinauer Associates, Inc.
Sunderland, MA, USA.
P75
Dan Graur and Wen-Hsiung Li
2000. Fundamentals of Molecular Evolution., p. 97. Sinauer Associates, Inc.
Sunderland, MA, USA.
P76
“Multiple sequence
alignment can be viewed as
an extension of pairwise
sequences … as such
alignments can be frequently
improved by visual
inspection.”
• Synonymous sites = 0 + 0 + 1/3
• Non-synonymous sites = 1 + 1 + 2/3
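The per-position split above can be checked with a short Python sketch that, for each codon position, counts what fraction of the three possible single-base changes leaves the amino acid unchanged. The source does not say which codon the numbers refer to; TTT (phenylalanine) is used here only as an assumed example that happens to give the same split under the standard genetic code. Treating a change to a stop codon as non-synonymous is also an assumption, and the function names are hypothetical.

```python
from fractions import Fraction

# Standard genetic code, codons ordered with bases T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def translate(codon):
    """Return the one-letter amino acid ('*' for stop) encoded by a DNA codon."""
    i, j, k = (BASES.index(b) for b in codon)
    return AMINO_ACIDS[16 * i + 4 * j + k]


def synonymous_fractions(codon):
    """For each of the 3 positions, the fraction of single-base changes that are synonymous."""
    original = translate(codon)
    fractions = []
    for pos in range(3):
        same = 0
        for base in BASES:
            if base == codon[pos]:
                continue
            mutant = codon[:pos] + base + codon[pos + 1:]
            if translate(mutant) == original:
                same += 1
        fractions.append(Fraction(same, 3))
    return fractions


def site_counts(codon):
    """Return (synonymous sites, non-synonymous sites) for one codon."""
    syn = sum(synonymous_fractions(codon))
    return syn, 3 - syn


print(synonymous_fractions("TTT"))
print(site_counts("TTT"))
```

For TTT, only the third-position change T to C keeps phenylalanine, so the synonymous total is 0 + 0 + 1/3 and the non-synonymous total is 1 + 1 + 2/3.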