Transcript monkey pig

What is positive selection?
dN = rate of nonsynonymous substitution
dS = rate of synonymous substitution
Let ω = dN/dS
Positive selection occurs when the ω ratio exceeds unity (ω > 1).
__________________________________
Type of Selection       Outcome
__________________________________
Purifying selection     dN/dS < 1
No selection            dN/dS = 1
Positive selection      dN/dS > 1
__________________________________
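A minimal sketch of how dN, dS and ω can be obtained from counts. It assumes the numbers of nonsynonymous and synonymous sites and differences have already been tallied for a pair of sequences (e.g., by the Nei–Gojobori counting method) and applies a Jukes–Cantor correction for multiple hits. Function names are hypothetical.

```python
import math


def jukes_cantor(p: float) -> float:
    """Correct a proportion of differences p for multiple hits (Jukes-Cantor)."""
    if p >= 0.75:
        raise ValueError("p >= 0.75: distance is undefined under Jukes-Cantor")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def omega_from_counts(nd: float, n_sites: float, sd: float, s_sites: float) -> float:
    """omega = dN/dS from nonsynonymous (nd, n_sites) and synonymous (sd, s_sites) counts."""
    d_n = jukes_cantor(nd / n_sites)
    d_s = jukes_cantor(sd / s_sites)
    if d_s == 0:
        raise ValueError("dS is zero: omega is undefined")
    return d_n / d_s


def classify(omega: float) -> str:
    """Map omega onto the table above. The numeric equality test is only a
    placeholder: in practice omega = 1 vs. omega != 1 is a statistical question."""
    if math.isclose(omega, 1.0):
        return "no selection"
    return "positive selection" if omega > 1.0 else "purifying selection"


# Hypothetical counts: 30 nonsynonymous differences over 300 sites,
# 5 synonymous differences over 100 sites.
w = omega_from_counts(30, 300, 5, 100)
print(round(w, 2), classify(w))  # 2.07 positive selection
```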
How do we test for positive selection?
1. Estimate means and variances of dN and dS for all pairwise species comparisons.
2. Use a t-test to determine whether dN and dS differ significantly.
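A minimal sketch of step 2, assuming a paired design (one dN and one dS value per species pair) with dN and dS already estimated; the function name is hypothetical. Note that it treats the pairwise comparisons as independent, which they are not when species share branches of a tree.

```python
import math
import statistics


def paired_t(dn: list[float], ds: list[float]) -> tuple[float, int]:
    """Paired t statistic for the differences dN - dS across comparisons.

    Returns (t, degrees of freedom). |t| is then compared with a critical
    value from a t table (or a statistics library) at the chosen alpha.
    """
    if len(dn) != len(ds) or len(dn) < 2:
        raise ValueError("need two equal-length lists with at least 2 entries")
    diffs = [n - s for n, s in zip(dn, ds)]
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / se, len(diffs) - 1


# Hypothetical dN and dS values from three pairwise species comparisons.
t, df = paired_t([0.05, 0.04, 0.06], [0.10, 0.12, 0.11])
print(f"t = {t:.2f} with {df} d.f.")  # t = -6.00 with 2 d.f.
```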
Some problems…
1. Averages over all amino acid positions in a protein.
2. Averages over all lineages.
3. Can detect positive selection only when it is very strong and consistent through evolutionary time.
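Problem 1 can be made concrete with a little arithmetic. Under the simplifying assumption that every codon has the same synonymous rate, the gene-wide ratio is the proportion-weighted average of the ω values of its sites, so a handful of strongly selected codons is swamped by the constrained majority. The proportions, ω values and function name below are hypothetical.

```python
def gene_wide_omega(site_classes: list[tuple[float, float]]) -> float:
    """Gene-wide omega from (proportion of codons, omega) pairs.

    Simplifying assumption: the synonymous rate is the same at every codon,
    so the gene-wide dN/dS is the proportion-weighted mean of the class omegas.
    """
    if abs(sum(p for p, _ in site_classes) - 1.0) > 1e-9:
        raise ValueError("proportions must sum to 1")
    return sum(p * w for p, w in site_classes)


# Hypothetical protein: 5% of codons under strong positive selection (omega = 3)
# and 95% under purifying selection (omega = 0.1).
print(gene_wide_omega([(0.05, 3.0), (0.95, 0.1)]))  # about 0.245, well below 1
```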
Some more problems…
4. Ignores the phylogenetic framework in which adaptive molecular evolution occurs!
Suppose a significant ω ratio (3.45) is detected between cow and pig:

           1     2     3     4     5     6
1. Cow     ---
2. Deer    ns    ---
3. Whale   ns    ns    ---
4. Hippo   ns    ns    ns    ---
5. Pig     3.45  ns    ns    ns    ---
6. Camel   ns    ns    ns    ns    ns    ---

ns = nonsignificant
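A phylogenetic perspective
[Figure: phylogeny of Cow, Deer, Whale, Hippo, Pig and Camel, rooted with an outgroup; four branches are marked "ω > 1?", asking on which lineage(s) the elevated ω ratio actually arose.]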
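A pairwise comparison does not say where on the tree the change happened: the cow–pig ω pools substitutions from every branch on the path joining the two species. The sketch below lists those branches, assuming the topology ((((Cow, Deer), (Whale, Hippo)), Pig), Camel) rooted with the outgroup; the topology, the internal-node names and the function names are assumptions for illustration.

```python
# Assumed topology: ((((Cow, Deer), (Whale, Hippo)), Pig), Camel), Outgroup.
# Each node maps to its parent; internal nodes get placeholder names.
PARENT = {
    "Cow": "cow+deer", "Deer": "cow+deer",
    "Whale": "whale+hippo", "Hippo": "whale+hippo",
    "cow+deer": "cow..hippo", "whale+hippo": "cow..hippo",
    "cow..hippo": "cow..pig", "Pig": "cow..pig",
    "cow..pig": "cow..camel", "Camel": "cow..camel",
    "cow..camel": "root", "Outgroup": "root",
}


def lineage(node: str) -> list[str]:
    """The node followed by all of its ancestors up to the root."""
    path = [node]
    while node in PARENT:
        node = PARENT[node]
        path.append(node)
    return path


def branches_between(a: str, b: str) -> list[str]:
    """Branches (each named by its child node) on the path joining a and b."""
    up_a, up_b = lineage(a), lineage(b)
    on_b = set(up_b)
    mrca = next(n for n in up_a if n in on_b)  # most recent common ancestor
    return up_a[: up_a.index(mrca)] + up_b[: up_b.index(mrca)][::-1]


print(branches_between("Cow", "Pig"))
# ['Cow', 'cow+deer', 'cow..hippo', 'Pig']
```

Any one of these four branches, or several of them, could be where ω > 1; the pairwise ratio alone cannot distinguish between them.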
An example: lysozyme evolution in colobine monkeys
• Colobine monkeys are leaf-eaters that have evolved a complex foregut (like ruminants).
• Their stomach expresses high levels of the bacteriolytic enzyme lysozyme.
Phylogeny of Colobines and Cercopithecines
[Figure: phylogeny from Messier & Stewart (1997), with the point where foregut fermentation evolved marked on the colobine lineage.
Colobines: Hanuman langur, Purple-faced langur, Dusky langur, François’ langur, Proboscis monkey, Guereza colobus, Angolan colobus.
Cercopithecines: Patas monkey, Vervet, Talapoin, Rhesus macaque, Allen’s monkey, Olive baboon, Sooty mangabey.
Chimpanzee (outgroup).]
[Figure, repeated: the same phylogeny of Colobines and Cercopithecines (Messier & Stewart 1997), now annotated with ω = 4.7.]
A Maximum-Likelihood (ML) approach to the detection of positive selection
ML methods evaluate the probability (i.e., likelihood) of obtaining a set of DNA sequences given:
1. a specific phylogenetic tree
2. an explicit model of nucleotide substitution.
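To make the idea concrete, here is a sketch of the smallest possible case: two aligned DNA sequences, so the "tree" is a single branch of length d, with the Jukes–Cantor (JC69) model as the substitution model. This is a simpler stand-in for the codon models used to detect positive selection; the function names are hypothetical and the grid search is a crude optimiser chosen for transparency.

```python
import math


def jc69_log_likelihood(d: float, n_same: int, n_diff: int) -> float:
    """Log likelihood of two aligned sequences separated by branch length d
    (expected substitutions per site) under the Jukes-Cantor model."""
    e = math.exp(-4.0 * d / 3.0)
    p_same = 0.25 + 0.75 * e      # P(base unchanged after distance d)
    p_to_other = 0.25 - 0.25 * e  # P(change to one particular other base)
    # Each site: 1/4 (equilibrium frequency of the first base) x P(change).
    return (n_same * math.log(0.25 * p_same)
            + n_diff * math.log(0.25 * p_to_other))


def ml_branch_length(seq1: str, seq2: str, step: float = 1e-4,
                     d_max: float = 3.0) -> float:
    """Grid-search maximum-likelihood estimate of d."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned (equal length)")
    n_diff = sum(a != b for a, b in zip(seq1, seq2))
    n_same = len(seq1) - n_diff
    grid = [step * i for i in range(1, int(d_max / step) + 1)]
    return max(grid, key=lambda d: jc69_log_likelihood(d, n_same, n_diff))


# Hypothetical pair: 10 differences in 100 sites (p = 0.1).
s1 = "A" * 100
s2 = "A" * 90 + "C" * 10
print(round(ml_branch_length(s1, s2), 3))  # 0.107
```

For JC69 the ML estimate coincides with the familiar Jukes–Cantor distance, −¾ ln(1 − 4p/3), which is a useful check on the optimiser.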
Some details of the model…
• Implemented in the PAML package of Yang (1997)
• Uses a Markov process to describe substitutions between sense codons
• Parameters include:
  transition/transversion ratio (κ)
  nonsynonymous/synonymous rate ratio (ω)
  codon frequencies (π)
  branch lengths scaled for time (t)
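The Markov process can be sketched as an instantaneous rate q_ij between two sense codons, in the style of the Goldman–Yang codon model on which PAML's codeml builds: changes that need more than one nucleotide substitution get rate 0; otherwise the rate is the target codon frequency π_j, multiplied by κ for a transition and by ω for a nonsynonymous change. This is a simplified sketch (unscaled, diagonal omitted, hypothetical function names), not PAML's own code.

```python
# Standard genetic code, codons ordered with bases T, C, A, G ("*" = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}
SENSE = [c for c, aa in CODE.items() if aa != "*"]  # the 61 sense codons
PURINES = set("AG")


def is_transition(x: str, y: str) -> bool:
    """For x != y: True for A<->G or C<->T, False for transversions."""
    return (x in PURINES) == (y in PURINES)


def codon_rate(i: str, j: str, kappa: float, omega: float,
               pi: dict[str, float]) -> float:
    """Unscaled instantaneous rate from sense codon i to sense codon j."""
    if i == j or CODE[i] == "*" or CODE[j] == "*":
        return 0.0
    diffs = [(x, y) for x, y in zip(i, j) if x != y]
    if len(diffs) != 1:
        return 0.0  # only single-nucleotide changes occur instantaneously
    rate = pi[j]
    if is_transition(*diffs[0]):
        rate *= kappa
    if CODE[i] != CODE[j]:
        rate *= omega  # nonsynonymous change
    return rate


pi = {c: 1.0 / len(SENSE) for c in SENSE}  # equal codon frequencies (assumption)
print(codon_rate("TTT", "TTC", kappa=2.0, omega=0.5, pi=pi))  # synonymous transition
print(codon_rate("TTT", "TTA", kappa=2.0, omega=0.5, pi=pi))  # nonsynonymous transversion
```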
Testing for positive selection involves comparing two models:
Model M7: Assumes ω ratios follow a beta distribution (i.e., constrained to the interval 0–1).
Model M8: Adds to M7 a second class of sites at which ω can exceed unity (i.e., positive selection).
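The nesting of the two models can be sketched with mixture likelihoods. Assume the likelihood of the data at each codon site, f(x_h | ω_k), has already been computed under the codon model for every ω category (in practice by the program, with the beta distribution split into K discrete categories). M7 averages these with equal weights; M8 gives the beta categories total weight p0 and adds one class with weight p1 = 1 − p0 and its own ω_s. The numbers and function names are hypothetical.

```python
import math


def site_log_likelihood(class_liks: list[float], weights: list[float]) -> float:
    """log( sum_k w_k * f(x_h | omega_k) ) for a single codon site h."""
    return math.log(sum(w * f for w, f in zip(weights, class_liks)))


def m7_weights(k: int) -> list[float]:
    """M7: k equally probable omega categories from a discretized beta on (0, 1)."""
    return [1.0 / k] * k


def m8_weights(k: int, p0: float) -> list[float]:
    """M8: the same k beta categories sharing weight p0, plus one extra
    class with weight p1 = 1 - p0 whose omega_s is free to exceed 1."""
    return [p0 / k] * k + [1.0 - p0]


def total_log_likelihood(sites: list[list[float]], weights: list[float]) -> float:
    """Sum of per-site log likelihoods (sites assumed independent)."""
    return sum(site_log_likelihood(liks, weights) for liks in sites)


# Made-up per-site likelihoods for k = 2 beta categories followed by the
# omega_s class. Under M7, zip() simply ignores the last (omega_s) column.
sites = [[0.01, 0.02, 0.05], [0.03, 0.02, 0.001]]
l_m7 = total_log_likelihood(sites, m7_weights(2))
l_m8 = total_log_likelihood(sites, m8_weights(2, p0=0.9))
print(l_m7, l_m8)
```

Setting p0 = 1 recovers M7 exactly, which is what makes M7 a null model nested within M8.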
Statistical testing can be done by likelihood ratio tests (LRTs)
1. Obtain the log likelihood score from model M7, ℓM7 (null model).
2. Obtain the log likelihood score from model M8, ℓM8 (positive selection).
3. Test for significance: χ² = 2(ℓM8 − ℓM7), compared against a χ² distribution with 2 d.f. (M8 has two more free parameters than M7).
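Step 3 can be sketched as below. The degrees of freedom should equal the number of extra free parameters in the alternative model; the chi-square tail probability has simple closed forms for 1 and 2 d.f., so no statistics library is needed. The log likelihoods in the example and the function name are made up.

```python
import math


def lrt(l_null: float, l_alt: float, df: int) -> tuple[float, float]:
    """Likelihood ratio test: returns (2 * (l_alt - l_null), chi-square p-value)."""
    stat = max(0.0, 2.0 * (l_alt - l_null))
    if df == 1:
        p = math.erfc(math.sqrt(stat / 2.0))  # P(chi2_1 > stat)
    elif df == 2:
        p = math.exp(-stat / 2.0)             # P(chi2_2 > stat)
    else:
        raise ValueError("this sketch only handles df = 1 or df = 2")
    return stat, p


# Hypothetical log likelihoods for M7 (null) and M8 (alternative).
stat, p = lrt(l_null=-1520.3, l_alt=-1514.1, df=2)
print(f"LRT statistic = {stat:.1f}, p = {p:.4f}")  # LRT statistic = 12.4, p = 0.0020
```

A p-value below the chosen significance level (e.g., 0.05) rejects M7 in favour of M8.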
Advantages of ML approach
1. Allows formal statistical testing by likelihood ratio tests.
2. Allows individual codons subject to positive selection to be identified.
3. Allows positive selection to be inferred along individual branches of a phylogeny.
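Point 2 is usually done by empirical Bayes: once the M8 parameters are estimated, the posterior probability that a site belongs to the ω_s class is that class's share of the site's mixture likelihood. The sketch below is the naive empirical Bayes (NEB) calculation with made-up numbers and hypothetical names; more recent PAML versions also report a Bayes empirical Bayes (BEB) variant that accounts for uncertainty in the parameter estimates. The 0.95 cutoff is a common but arbitrary choice.

```python
def posterior_positive(class_liks: list[float], weights: list[float]) -> float:
    """P(site is in the last class, the omega_s class | data at the site)."""
    joint = [w * f for w, f in zip(weights, class_liks)]
    return joint[-1] / sum(joint)


# M8-style weights: two beta categories sharing p0 = 0.9, omega_s class p1 = 0.1.
weights = [0.45, 0.45, 0.10]
# Made-up f(x_h | omega_k) for three codon sites (omega_s class last).
sites = {1: [0.01, 0.02, 0.05], 2: [0.0005, 0.001, 0.3], 3: [0.03, 0.02, 0.001]}

for site, liks in sites.items():
    post = posterior_positive(liks, weights)
    flag = "  <- candidate positively selected site" if post > 0.95 else ""
    print(f"site {site}: P(omega_s) = {post:.3f}{flag}")
```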
Application to the pantophysin gene in marine gadid fishes
• Pantophysin is an integral membrane protein localized to small (<100 nm) cytoplasmic microvesicles
• Believed to function in a variety of intracellular shuttling pathways
• Exact function remains unknown
Transmembrane structure of pantophysin
[Figure: predicted membrane topology of pantophysin, with the amino acid sequence (one-letter codes) running from the N-terminus (M-NH2) in the cytoplasm through four membrane-spanning segments in the microvesicle membrane, with loops in the lumen of the microvesicle, to the C-terminus (T-COOH) in the cytoplasm.]
[Figure, repeated with domain labels: intravesicular domains (IV1 and IV2) in the lumen of the microvesicle, TransMembrane (TM) domains in the microvesicle membrane, and Cytoplasmic (Cyt) domains.]
Genealogy of PanI alleles in the Atlantic cod
[Figure: genealogy of PanI alleles sampled from Atlantic cod (allele labels carry the sample codes BA, BS, IC, NS and NF), rooted with Gadus ogac. The alleles fall into two clades, PanIA alleles (N = 64) and PanIB alleles (N = 64); two nodes carry support values of 100; scale bar = 1 change.]
Amino acid differences between PanIA and PanIB alleles
[Figure: the pantophysin membrane topology with residues that differ between PanIA and PanIB alleles marked (•); the first intravesicular loop (IV1) is labeled.]
Amino acid substitutions within PanI allelic classes
___________________________________________________________________________
Allele   Codon      Amino acid    Location   Classificationᵃ   Distribution
         position   change                                     in sample
___________________________________________________________________________
PanIA    61         Lys to Gln    IV1        Radical           Fixed
         64         Asn to Thr    IV1        Radical           Fixed
         79         Ser to Thr    IV1        Radical           Fixed
PanIB    43         Glu to Val    IV1        Radical           Fixed
         61         Lys to Asn    IV1        Radical           Fixed
         64         Asn to Asp    IV1        Radical           Fixed
___________________________________________________________________________
ᵃ Following Taylor (1986).