class02 - CS, Technion

Basic Principles of Population Genetics
Lecture 2
Background Readings: Chapter 1, Mathematical and Statistical
Methods for Genetic Analysis, 1997, Kenneth Lange.
This slide show follows closely Chapter 1 of Lange's book. Prepared by Dan Geiger.
Founders’ allele frequency
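[Figure: pedigree with founders 1 and 2, whose genotypes are A1/A2, B1/B2
and A'1/A'2, B'1/B'2, and their child 3 with genotype A''1/A''2, B''1/B''2.]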
In order to write down the likelihood function of the data given a
pedigree structure and a recombination value θ, one needs to specify
the probability of the possible genotypes of each founder. Assuming
random mating we have,
Pr(G1,G2) = Pr(A1/A2, B1/B2) Pr(A'1/A'2, B'1/B'2)
The likelihood function also consists of transmission matrices that
depend on θ and penetrance matrices to be discussed later.
Hardy-Weinberg and Linkage Equilibriums
The task at hand is to establish a theoretical basis for specifying the
probability Pr(A1/A2, B1/B2) of a multilocus genotype from allele frequencies.
We will derive under various assumptions the following two rules
which are widely used in genetic analysis (Linkage & Association)
and which ease computations a great deal. Of course, the assumptions
are not satisfied for all genetic analyses.
Hardy-Weinberg (HW) Equilibrium: Pr(A1/A2) =
PA1· PA2, namely, the probability of an ordered genotype
A1/A2 is the product of the frequencies of the alleles
constituting that genotype.
Linkage Equilibrium: Pr(A1B1) = PA1· PB1, namely,
the probability of a haplotype A1,B1 is the product of the
frequencies of the alleles constituting that haplotype.
These rules imply: Pr(A1/A2, B1/B2)=PA1· PA2 · PB1 · PB2
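The product rule above can be sketched in a few lines of Python. This is only an illustration under both equilibrium assumptions; the function name and the allele frequencies are made up for the example.

```python
from itertools import product

def ordered_two_locus_genotype_prob(freq_a, freq_b, a1, a2, b1, b2):
    """Pr(a1/a2, b1/b2) for an ordered two-locus genotype,
    assuming Hardy-Weinberg and linkage equilibrium."""
    return freq_a[a1] * freq_a[a2] * freq_b[b1] * freq_b[b2]

# Hypothetical allele frequencies, chosen only for illustration.
freq_a = {"A1": 0.7, "A2": 0.3}
freq_b = {"B1": 0.4, "B2": 0.6}

prob = ordered_two_locus_genotype_prob(freq_a, freq_b, "A1", "A2", "B1", "B2")

# Summing over all ordered genotypes should give 1.
total = sum(
    ordered_two_locus_genotype_prob(freq_a, freq_b, a1, a2, b1, b2)
    for a1, a2 in product(freq_a, repeat=2)
    for b1, b2 in product(freq_b, repeat=2)
)
print(prob, total)
```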
A simple setup to study HW equilibrium
Consider a bi-allelic locus A with alleles A1, A2 .
Let u,v, and w be the frequencies of unordered genotypes
A1/A1, A1/A2, A2/A2. Clearly, u+v+w=1.
How are these frequencies related to the allele frequencies p1 and p2
of A1 and A2, respectively? Answer: p1 = u + ½v and p2 = ½v + w.
But the Hardy-Weinberg equilibrium also states that
u = p1^2
v = 2 p1 p2 (The factor 2 appears because A1/A2 genotypes are not ordered.)
w = p2^2
-------------
(p1 + p2)^2 = 1
Clearly these relations do not hold for arbitrary frequencies u,v,w ;
only for those values in the image of this polynomial mapping.
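As a small illustration, the following Python sketch tests whether given genotype frequencies lie in the image of the mapping (p1, p2) -> (p1^2, 2 p1 p2, p2^2). The function names and the tolerance are assumptions made for the example.

```python
def allele_frequencies(u, v, w):
    """Allele frequencies (p1, p2) from unordered genotype frequencies."""
    return u + 0.5 * v, 0.5 * v + w

def in_hw_proportions(u, v, w, tol=1e-12):
    """True if (u, v, w) equals (p1^2, 2 p1 p2, p2^2) up to a tolerance."""
    p1, p2 = allele_frequencies(u, v, w)
    return (abs(u - p1 * p1) < tol
            and abs(v - 2 * p1 * p2) < tol
            and abs(w - p2 * p2) < tol)

print(in_hw_proportions(0.49, 0.42, 0.09))  # p1 = 0.7, in HW proportions
print(in_hw_proportions(0.50, 0.20, 0.30))  # p1 = 0.6, but u != 0.36
```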
Assumptions made to Justify HW
1. Infinite population size
2. Discrete generations
3. Random mating
4. No selection
5. No migration
6. No mutation
7. Equal initial genotype frequencies in the two sexes
HW equilibrium can be shown to hold under more relaxed sets of
assumptions as well. These assumptions are clearly not universal.
What happens after one generation ?
Mating type              Nature of offspring              Frequency
(unordered genotypes)    and segregation ratios           of mates
A1/A1 x A1/A1            A1/A1                            u^2
A1/A1 x A1/A2            ½ A1/A1 + ½ A1/A2                2uv
A1/A1 x A2/A2            A1/A2                            2uw
A1/A2 x A1/A2            ¼ A1/A1 + ½ A1/A2 + ¼ A2/A2      v^2
A1/A2 x A2/A2            ½ A1/A2 + ½ A2/A2                2vw
A2/A2 x A2/A2            A2/A2                            w^2
Total                                                     (u+v+w)^2 = 1
Frequency of A1/A1 after one generation:
u' = u^2 + ½(2uv) + ¼v^2 = (u + ½v)^2 = p1^2
After one generation …
So, after one generation the genotype frequencies u, v, w
change to u', v', w' as follows (using the previous table):
Frequency of A1/A1: u' = u^2 + uv + ¼v^2 = (u + ½v)^2 = p1^2
Frequency of A1/A2:
v' = uv + 2uw + ½v^2 + vw = 2(u + ½v)(½v + w) = 2 p1 p2
Frequency of A2/A2:
w' = ¼v^2 + vw + w^2 = (½v + w)^2 = p2^2
Hardy-Weinberg seems to be established after one generation, but
u', v', w' are frequencies for the second generation while p1 and p2
are defined as the allele frequencies of the first generation. Are
these also the allele frequencies of the second generation?
Yes! Because p'1 = u' + ½v' = p1^2 + p1 p2 = p1(p1 + p2) = p1 and similarly p'2 = p2.
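The computation from the mating table can be checked numerically. The sketch below applies the table once and then a second time. The starting frequencies are arbitrary values chosen for illustration, and the function name is not from the lecture.

```python
def next_generation(u, v, w):
    """Offspring genotype frequencies under random mating, summed
    row by row from the mating table (mate frequency x segregation ratio)."""
    u_next = u * u + 0.5 * (2 * u * v) + 0.25 * v * v
    v_next = 0.5 * (2 * u * v) + 2 * u * w + 0.5 * v * v + 0.5 * (2 * v * w)
    w_next = 0.25 * v * v + 0.5 * (2 * v * w) + w * w
    return u_next, v_next, w_next

# Arbitrary starting genotype frequencies (not in HW proportions).
u, v, w = 0.5, 0.2, 0.3
p1, p2 = u + 0.5 * v, 0.5 * v + w

first = next_generation(u, v, w)
second = next_generation(*first)
print("after one generation: ", first)
print("HW prediction:        ", (p1 * p1, 2 * p1 * p2, p2 * p2))
print("after two generations:", second)
```

With these inputs the first and second offspring generations agree with each other and with the Hardy-Weinberg prediction, up to floating-point rounding.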
After yet another generation …
Have we reached equilibrium? Let's look at one more
generation and see that genotype frequencies are now fixed.
Frequency of A1/A1:
u'' = (u' + ½v')^2 = (p1^2 + p1 p2)^2 = p1^2
Frequency of A1/A2:
v'' = 2(u' + ½v')(½v' + w') = 2(p1^2 + p1 p2)(p2^2 + p1 p2) = 2 p1 p2
Frequency of A2/A2:
w'' = (½v' + w')^2 = (p2^2 + p1 p2)^2 = p2^2
Hardy-Weinberg is indeed established after one generation; allele and
genotype frequencies do not change under the assumptions we have
made. Can you trace where each assumption is used?
Use of Assumptions in the derivation
1. Infinite population size
2. Discrete generations (mating amongst ith generation members only)
3. Random mating
4. No selection
5. No migration
6. No mutation
7. Equal initial genotype frequencies in the two sexes
The segregation ratios below assume 1, 2, 3, 6, 7:
Mating type              Nature of offspring              Frequency
(unordered genotypes)    and segregation ratios           of mates
A1/A1 x A1/A2            ½ A1/A1 + ½ A1/A2                2uv
The frequency formula of A1/A1 after one generation,
u^2 + ½(2uv) + ¼v^2, assumes 4 and 5.
An alternative justification
Previously, we started with arbitrary genotype frequencies u,v,w
and showed that they are modified after one generation to satisfy
HW equilibrium.
Now we start with arbitrary allele frequencies p1 and p2.
Random mating is equivalent to random pairing of alleles; each
person contributes one allele with the prescribed frequencies.
So the frequency of A1/A1 in the new generation is p1^2, that of
A1/A2 is 2 p1 p2, and that of A2/A2 is p2^2. Argument completed?
The frequency of A1 in this new generation is p'1 = p1^2 + ½(2 p1 p2) = p1,
and the frequency of A2 in this new generation is p'2 = p2^2 + ½(2 p1 p2) = p2.
So after one generation the allele frequencies are unchanged and the
genotype frequencies satisfy the HW equilibrium.
Exercise: Generalize the argument to k-allelic loci.
HW equilibrium at X-linked loci
Consider an allele at an X-linked locus. At generation n, let qn
denote that allele’s frequency in females and rn denote that
allele’s frequency in males.
More explicitly,
rn = (number of X chromosomes in males having the allele)
     / (total number of X chromosomes in males)
Questions:
• What is the frequency pn of the allele in the population?
• Does pn converge, and to which value p?
• Do qn and rn converge to the same value?
Argument Outline
Assuming equal numbers of males and females (females carry two X
chromosomes and males one), we have
pn = 2/3 qn + 1/3 rn for every n.
Let p = p0 = 2/3 q0 + 1/3 r0. We will now show that both qn and
rn converge quickly to p (but not in one generation as before).
Having shown this claim, the female genotype frequency of A1/A1
must be p^2, that of A1/A2 is 2p(1-p), and that of A2/A2 is (1-p)^2,
satisfying HW equilibrium.
For males, the genotypes A1 and A2 have frequencies p and 1-p.
The recursion equations
Because a male always gets his X chromosome from his
mother, and his mother precedes him by one generation,
r_n = q_{n-1}    (Eq. 1.1)
Similarly, females get half their X chromosomes from females
and half from males,
q_n = ½ q_{n-1} + ½ r_{n-1}    (Eq. 1.2)
Eqs. 1.1 and 1.2 imply:
2/3 q_n + 1/3 r_n = 2/3 (½ q_{n-1} + ½ r_{n-1}) + 1/3 q_{n-1} = 2/3 q_{n-1} + 1/3 r_{n-1}
It follows that the allele frequency p_n = 2/3 q_n + 1/3 r_n never changes
and remains equal to p_0 = p. To see that q_n converges to p, we need to
relate the difference q_n - p to the difference q_{n-1} - p.
The fixed point solution
q_n - p = q_n - 3/2 p + ½ p
= ½ q_{n-1} + ½ r_{n-1} - 3/2 (2/3 q_{n-1} + 1/3 r_{n-1}) + ½ p
= -½ q_{n-1} + ½ p    (just cancel terms)
= -½ (q_{n-1} - p)
Continuing in this manner,
q_n - p = -½ (q_{n-1} - p) = (-½)^2 (q_{n-2} - p) = … = (-½)^n (q_0 - p) → 0
So in each step the difference is halved in magnitude and changes sign,
and q_n approaches p in a zigzag manner. Hence, r_n = q_{n-1} also converges
to p. What does this mean?
In the limit, the female genotype frequency of A1/A1 must be p^2, that
of A1/A2 is 2p(1-p), and that of A2/A2 is (1-p)^2, satisfying HW equilibrium. For
males, the genotypes A1 and A2 have frequencies p and 1-p. HW equilibrium is not
reached in one generation but is approached fast: after 5 generations the
deviation q_n - p has shrunk by a factor of 2^5 = 32.
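The zigzag convergence can be seen by iterating Eqs. 1.1 and 1.2 directly. The following Python sketch does this for hypothetical starting frequencies q0 = 0.8 and r0 = 0.2, which are assumptions for the example and not values from the lecture.

```python
def x_linked_frequencies(q0, r0, generations):
    """Iterate r_n = q_{n-1} and q_n = (q_{n-1} + r_{n-1}) / 2.
    Returns the list of (q_n, r_n) for n = 0 .. generations."""
    history = [(q0, r0)]
    q, r = q0, r0
    for _ in range(generations):
        q, r = 0.5 * q + 0.5 * r, q  # both updates use the previous q and r
        history.append((q, r))
    return history

q0, r0 = 0.8, 0.2                    # hypothetical starting frequencies
p = 2 / 3 * q0 + 1 / 3 * r0          # conserved population frequency
history = x_linked_frequencies(q0, r0, 10)

for n, (q, r) in enumerate(history):
    # The last column alternates in sign and halves in size each step.
    print(n, round(q, 5), round(r, 5), round(q - p, 5))
```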
Linkage equilibrium
Let Ai be an allele at locus A with frequency pi.
Let Bj be an allele at locus B with frequency qj.
Denote the recombination fractions between these loci by θ_f and θ_m for
females and males, respectively.
Let θ = (θ_f + θ_m)/2.
[Figure: two chromosomes with alleles Ai, A'i at locus A and Bj, B'j at locus B.]
Linkage equilibrium means that Pr(Ai Bj) = pi qj.
We use the same assumptions employed earlier to demonstrate
linkage equilibrium, namely, to show that Pn(Ai Bj) converges to pi qj
at a rate that is fastest when the recombination fraction θ is the largest.
Convergence Proof
Pn(Ai Bj) = ½ [gamete from female] + ½ [gamete from male]
= ½ [ (1-θ_f) P_{n-1}(Ai Bj) + θ_f pi qj ] + ½ [gamete from male]
  (first term: no recombination; second term: recombination)
= ½ [ (1-θ_f) P_{n-1}(Ai Bj) + θ_f pi qj ] + ½ [ (1-θ_m) P_{n-1}(Ai Bj) + θ_m pi qj ]
= (1-θ) P_{n-1}(Ai Bj) + θ pi qj
So, Pn(Ai Bj) - pi qj = (1-θ) [P_{n-1}(Ai Bj) - pi qj] = … = (1-θ)^n [P0(Ai Bj) - pi qj]
In short, writing δ_n = Pn(Ai Bj) - pi qj for the deviation from linkage
equilibrium, we have established δ_n = δ_0 (1-θ)^n. For loci far apart on a
chromosome (θ = ½), the deviation from linkage equilibrium is halved each
generation. For close loci with small θ, convergence is slow.
Exercise: Repeat this analysis for three loci (Problem 7, with
guidance, in Kenneth Lange's book).
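A short numerical check of the recursion: the Python sketch below iterates the female and male gamete contributions separately and compares the result with the closed form pi qj + (1-θ)^n [P0 - pi qj]. All numerical values, and the names theta_f and theta_m for the two sex-specific recombination fractions, are assumptions chosen for illustration.

```python
def haplotype_frequencies(p_hap0, p_i, q_j, theta_f, theta_m, generations):
    """Iterate P_n(Ai Bj) with sex-specific recombination fractions."""
    freqs = [p_hap0]
    for _ in range(generations):
        prev = freqs[-1]
        # No recombination keeps the haplotype; recombination pairs
        # independently drawn alleles, giving p_i * q_j.
        from_female = (1 - theta_f) * prev + theta_f * p_i * q_j
        from_male = (1 - theta_m) * prev + theta_m * p_i * q_j
        freqs.append(0.5 * from_female + 0.5 * from_male)
    return freqs

p_i, q_j = 0.3, 0.4                  # hypothetical allele frequencies
theta_f, theta_m = 0.12, 0.08        # hypothetical recombination fractions
theta = (theta_f + theta_m) / 2
freqs = haplotype_frequencies(0.3, p_i, q_j, theta_f, theta_m, 50)

for n in (0, 1, 10, 50):
    closed_form = p_i * q_j + (1 - theta) ** n * (freqs[0] - p_i * q_j)
    print(n, round(freqs[n], 6), round(closed_form, 6))
```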
Ramifications for Association studies
Many diseases are thought to have been caused by a single random
mutation that survived and propagated to offspring, generation after
generation.
Suppose there is a close marker:
[Figure: a marker located close to the mutated locus.]
After n generations the deviation from linkage equilibrium between the
marker and the mutated locus is δ_n = δ_0 (1-θ)^n, where θ is the
recombination fraction between them.
Would we see association in random population samples?
If the mutation happened many generations ago, no significant trace will
remain: the haplotype frequencies will have reached linkage equilibrium!
We need a combination of close markers and a recent origin of the
disease allele. Association studies like that are also called linkage
disequilibrium mapping, or LD mapping in short.
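To get a feeling for the time scales, the following Python sketch computes how many generations it takes for the deviation from linkage equilibrium to shrink to a given fraction of its initial value, assuming the geometric decay (1-θ)^n. The function name and the example values of θ are illustrative assumptions.

```python
import math

def generations_until_fraction(theta, fraction):
    """Smallest n with (1 - theta)**n <= fraction, for 0 < theta < 1
    and 0 < fraction < 1."""
    return math.ceil(math.log(fraction) / math.log(1 - theta))

# Generations until the deviation has halved, for unlinked loci
# and for two hypothetical close markers.
for theta in (0.5, 0.01, 0.001):
    print(theta, generations_until_fraction(theta, 0.5))
```

The closer the marker, the longer the association persists, which is why both a close marker and a recent mutation are needed.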
Selection and Fitness
Fitness of a genotype is the expected genetic contribution of that
genotype to the next generation, i.e., the expected number of offspring
to which it contributes an allele. Let the fitnesses of the three genotypes
of an autosomal bi-allelic locus be denoted by w_{A/A}, w_{A/a} and w_{a/a}.
If pn and qn are the allele frequencies of A and a, then the average
fitness under HW equilibrium is w_{A/A} pn^2 + w_{A/a} 2 pn qn + w_{a/a} qn^2.
Conventions: Since only the ratios of the fitnesses of the various
genotypes matter, namely, w_{A/A}/w_{A/a} and w_{a/a}/w_{A/a}, we
arbitrarily set w_{A/a} = 1 and define w_{A/A} = 1-r, w_{a/a} = 1-s, where
r ≤ 1 and s ≤ 1.
Interpretation: When s=r=0, there is no selection.
When r is negative, A/A has an advantage over A/a. Similarly with negative s.
When r is positive (it must then be a fraction), A/A has a disadvantage relative to A/a.
When both s and r are positive, there is a heterozygous advantage.
Assuming selection exists …
Our goal is to study the equilibrium of allele frequencies under
various selection possibilities (namely, different values for r and s).
In our new notation the average fitness w_n at generation n is given by
w_n = (1-r) pn^2 + 2 pn qn + (1-s) qn^2 = 1 - r pn^2 - s qn^2
(the three terms correspond to the genotypes A/A, A/a and a/a).
To find equilibrium we study the difference Δpn = p_{n+1} - pn.
First, note that p_{n+1} = [(1-r) pn^2 + pn qn] / w_n (multiply by 2 to see why).
Δpn = p_{n+1} - pn = [(1-r) pn^2 + pn qn] / w_n - pn
= [(1-r) pn^2 + pn qn - (1 - r pn^2 - s qn^2) pn] / w_n
= [pn qn (s - (r+s) pn)] / w_n
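The algebra in the last step is easy to get wrong, so here is a small Python check that the factored form of the change in allele frequency agrees with the direct recursion. The test values of p, r and s are arbitrary, and the function names are introduced only for this sketch.

```python
def mean_fitness(p, r, s):
    """Average fitness w = 1 - r p^2 - s q^2, with q = 1 - p."""
    q = 1.0 - p
    return 1.0 - r * p * p - s * q * q

def next_p(p, r, s):
    """Frequency of A after one generation of selection."""
    q = 1.0 - p
    return ((1.0 - r) * p * p + p * q) / mean_fitness(p, r, s)

def delta_p(p, r, s):
    """Factored form of the change: p q (s - (r + s) p) / w."""
    q = 1.0 - p
    return p * q * (s - (r + s) * p) / mean_fitness(p, r, s)

max_gap = 0.0
for p in (0.1, 0.5, 0.9):
    for r, s in ((0.3, 0.1), (0.2, -0.1), (-0.2, -0.4)):
        gap = abs((next_p(p, r, s) - p) - delta_p(p, r, s))
        max_gap = max(max_gap, gap)
print("largest disagreement:", max_gap)
```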
Interpretation when r>0 and s ≤ 0
We just derived Δpn = [pn qn (s - (r+s) pn)] / w_n
Equilibrium occurs when Δpn = 0, namely, when pn = 0, pn = 1
(i.e., qn = 0) or pn = s/(r+s). Where should it converge to?
Claim: When (r>0 and s ≤ 0), pn → 0, i.e., allele A disappears. In the
opposite case (r ≤ 0 and s>0), allele a should be driven to extinction.
(Why is this extinction process sometimes halted in real life?)
Proof: When (r>0 and s ≤ 0), the linear function g(p) = s - (r+s) p
satisfies g(0) ≤ 0 and g(1) < 0, hence it is negative on (0,1).
Thus, pn monotonically decreases at each step. So pn must
approach 0 at equilibrium. Similarly, with the other case.
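The claim can be illustrated by iterating the selection recursion p_{n+1} = [(1-r) pn^2 + pn qn] / (1 - r pn^2 - s qn^2). The Python sketch below runs both cases; the selection coefficients and starting frequencies are hypothetical values picked for the example.

```python
def next_p(p, r, s):
    """Frequency of A after one generation of selection."""
    q = 1.0 - p
    return ((1.0 - r) * p * p + p * q) / (1.0 - r * p * p - s * q * q)

def trajectory(p0, r, s, generations):
    """Allele frequencies p_0, p_1, ..., p_generations."""
    traj = [p0]
    for _ in range(generations):
        traj.append(next_p(traj[-1], r, s))
    return traj

# r > 0, s <= 0: allele A is eliminated even from a high frequency.
a_lost = trajectory(0.9, r=0.2, s=-0.1, generations=200)
# r <= 0, s > 0: allele a is eliminated, so p tends to 1.
a_fixed = trajectory(0.1, r=-0.1, s=0.2, generations=200)

print(a_lost[0], a_lost[50], a_lost[-1])
print(a_fixed[0], a_fixed[50], a_fixed[-1])
```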
when r and s have the same sign
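\[
\begin{aligned}
p_{n+1} - \frac{s}{r+s}
&= p_n - \frac{s}{r+s} + \Delta p_n \\
&= p_n - \frac{s}{r+s} - \frac{(r+s)\, p_n q_n \left(p_n - \frac{s}{r+s}\right)}{1 - r p_n^2 - s q_n^2} \\
&= \frac{1 - r p_n^2 - s q_n^2 - (r+s)\, p_n q_n}{1 - r p_n^2 - s q_n^2} \left(p_n - \frac{s}{r+s}\right) \\
&= \frac{1 - r p_n - s q_n}{1 - r p_n^2 - s q_n^2} \left(p_n - \frac{s}{r+s}\right) \\
&= \lambda(p_n) \left(p_n - \frac{s}{r+s}\right)
\end{aligned}
\]
where λ(p) = (1 - r p - s q) / (1 - r p^2 - s q^2) and q = 1 - p.
Conclusion I (for negative sign): If r and s are negative, λ(pn) > 1,
so pn → 1 for p0 above s/(r+s), and pn → 0 for p0 below s/(r+s).
In other words, s/(r+s) is an unstable equilibrium.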
when r and s are both positive
If both r and s are positive (heterozygous advantage), then
\[
0 \le \lambda(p_n) = \frac{1 - r p_n - s q_n}{1 - r p_n^2 - s q_n^2} < 1
\]
Hence p_{n+1} - s/(r+s) has a constant sign and declines in magnitude.
Conclusion II: If both r and s are positive, pn → s/(r+s) and this
point is a stable equilibrium.
Conclusion III (rate of convergence): If p0 ≈ s/(r+s), namely
the starting point is near equilibrium, then
\[
0 \le \lambda(p_n) \approx \lambda\!\left(\frac{s}{r+s}\right) = \frac{r + s - 2rs}{r + s - rs}
\]
and we get (locally) a geometric convergence
\[
p_n - \frac{s}{r+s} \approx \left(\frac{r + s - 2rs}{r + s - rs}\right)^{n} \left(p_0 - \frac{s}{r+s}\right)
\]
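The two same-sign cases can be compared numerically. The Python sketch below iterates the selection recursion p_{n+1} = [(1-r) p^2 + p q] / (1 - r p^2 - s q^2) and evaluates the factor λ(p) = (1 - r p - s q) / (1 - r p^2 - s q^2) at the interior equilibrium. The coefficients r = 0.3, s = 0.1 and r = s = -0.2 are hypothetical values used only for illustration.

```python
def lam(p, r, s):
    """Contraction factor lambda(p) = (1 - r p - s q) / (1 - r p^2 - s q^2)."""
    q = 1.0 - p
    return (1.0 - r * p - s * q) / (1.0 - r * p * p - s * q * q)

def next_p(p, r, s):
    """Frequency of A after one generation of selection."""
    q = 1.0 - p
    return ((1.0 - r) * p * p + p * q) / (1.0 - r * p * p - s * q * q)

def iterate(p0, r, s, generations):
    """Allele frequency after the given number of generations."""
    p = p0
    for _ in range(generations):
        p = next_p(p, r, s)
    return p

# Both positive (heterozygous advantage): s/(r+s) attracts from both sides.
r, s = 0.3, 0.1
p_star = s / (r + s)
local_rate = (r + s - 2 * r * s) / (r + s - r * s)
from_above = iterate(0.9, r, s, 500)
from_below = iterate(0.05, r, s, 500)
print(p_star, from_above, from_below)
print(local_rate, lam(p_star, r, s))

# Both negative: the same point s/(r+s) repels.
r_neg, s_neg = -0.2, -0.2
escaped_up = iterate(0.51, r_neg, s_neg, 500)
escaped_down = iterate(0.49, r_neg, s_neg, 500)
print(escaped_up, escaped_down)
```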
Heterozygous advantage
If we observe a recessive disease that is maintained at high
frequency, how can we explain it? Intuition says that it should
disappear.
However, if the A/a genotype has an advantage over the other
genotypes, then the defective allele would be kept around.
Technically, if both r and s are positive, then the A/a genotype has
the highest fitness.
The best evidence for such a phenomenon is sickle cell anemia.
In some parts of Africa, this anemia, despite being a recessive
disease, is kept at high frequency. It turns out that the A/a
genotype appears to provide protection against malaria! (So it has
high fitness in swamp-like areas.)
Sickle cell anemia
[Figure: "Red blood cells, sickle cell", photomicrograph from the Medical Encyclopedia.]
Sickle cell anemia is an inherited autosomal recessive blood disease in
which the red blood cells produce abnormal pigment (hemoglobin).
The abnormal hemoglobin causes deformity of the red blood cells into
crescent or sickle-shapes, as seen in this photomicrograph.
The sickle cell mutation is a single nucleotide substitution (A → T) at
codon 6 in the beta-hemoglobin gene, resulting in the following
substitution of amino acids: GAG (Glu) → GTG (Val).
Source (Edited): http://www.nlm.nih.gov/medlineplus/ency/imagepages/1212.htm
Facts about Sickle cell Disease
• Sickle cell disease is much more common in certain ethnic groups, affecting
approximately one out of every 500 African Americans.
• Although sickle cell disease is inherited and present at birth, symptoms usually
don't occur until after 4 months of age.
• Blocked blood vessels and damaged organs can cause acute painful episodes.
These painful crises occur in almost all patients at some point in their lives.
Some patients have one episode every few years, while others have many episodes
per year. The crises can be severe enough to require admission to the hospital for
pain control.
• Sickle cell anemia may become life-threatening when damaged red blood cells
break down (and in other circumstances). Repeated crises can cause damage to the
kidneys, lungs, bones, eyes, and central nervous system.
Balance of Mutation and Selection
Most mutations are neutral or deleterious. We discuss the balance
between deleterious mutations and selection. Let μ denote the
mutation rate from a to A. Suppose the equilibrium frequency of
allele A is p and of a is q = 1-p.
When is a balance achieved between selection (say, preferring
allele a) and mutation that changes allele a back to allele A?
The frequencies p and q must satisfy the equilibrium condition:
\[
q = \frac{pq + (1-s)\,q^2}{1 - r p^2 - s q^2}\,(1-\mu)
\]
which for a recessive disease (r>0, s=0) becomes
\[
q = \frac{pq + q^2}{1 - r p^2}\,(1-\mu) = \frac{q}{1 - r p^2}\,(1-\mu)
\]
This yields 1 - r p^2 = 1 - μ and thus p^2 = μ/r, and a balance is
achieved that retains both alleles.
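The balance can be checked by iterating the recursion for the recessive case (r>0, s=0), with selection acting first and mutation from a to A afterwards. In the Python sketch below, mu stands for the mutation rate, and the values r = 0.5 and mu = 1e-4 are hypothetical choices for illustration.

```python
def next_q(q, r, mu):
    """One generation for the recessive case (s = 0): selection against
    A/A (fitness 1 - r), then mutation a -> A at rate mu."""
    p = 1.0 - q
    return (p * q + q * q) / (1.0 - r * p * p) * (1.0 - mu)

r, mu = 0.5, 1e-4      # hypothetical selection coefficient and mutation rate
q = 0.5                # arbitrary starting frequency of allele a
for _ in range(20000):
    q = next_q(q, r, mu)

p = 1.0 - q
print("p^2 at equilibrium:", p * p)
print("mu / r:            ", mu / r)
```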
Finite Population Genetic Drift
[Figure: simulated allele frequency (vertical axis, 0 to 1000) against
generation (horizontal axis, 0 to 1000) for ten alleles; the curves for
Allele 5 and Allele 10 are labeled. Source: Gideon Greenspan]
In this simulation, after 800 generations only two of the ten
alleles remain: alleles 5 and 7.
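A drift experiment of this kind can be sketched with a simple resampling model. The Python code below is not the simulation behind the figure. It assumes a Wright-Fisher style model in which each generation's 1000 gene copies are drawn at random, with replacement, from the previous generation, starting from ten equally frequent alleles. The population size, the seed and the function name are assumptions.

```python
import random
from collections import Counter

def wright_fisher(counts, generations, rng):
    """Resample a fixed number of gene copies each generation.
    counts[a] is the number of copies of allele a; returns the
    list of count vectors for generations 0 .. generations."""
    size = sum(counts)
    alleles = list(range(len(counts)))
    history = [list(counts)]
    for _ in range(generations):
        # Each new copy picks its parent copy at random, so allele a is
        # drawn with probability counts[a] / size.
        drawn = Counter(rng.choices(alleles, weights=counts, k=size))
        counts = [drawn[a] for a in alleles]
        history.append(counts)
    return history

rng = random.Random(0)                  # fixed seed, for repeatability only
history = wright_fisher([100] * 10, generations=1000, rng=rng)

for generation in (0, 200, 800, 1000):
    surviving = [a + 1 for a, c in enumerate(history[generation]) if c > 0]
    print(generation, surviving)
```

Which alleles survive depends on the random seed. The general pattern, a steady loss of alleles through chance alone, does not.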