2002-09-10: Segregation Analysis I


Lecture 5: Segregation Analysis I
Date: 9/10/02
 Counting number of genotypes, mating types
 Segregation analysis: dominant, codominant,
estimating segregation ratio
 Testing populations: polymorphism,
heterogeneity, heterozygosity, allele
frequency.
Probability: The Need for
Permutations and Combinations
 Often, particularly in genetics, the sample
space consists of all orders or arrangements
of groups of objects (usually genes or alleles
in genetics).
 Permutations, combinations, and
combinations with repetition exist to handle
this elegantly.
Probability: Permutation
 Definition: A permutation is the number of ways
one can order r elements out of n elements. It is
often written nPr and is calculated as
$$ {}_nP_r = \frac{n!}{(n-r)!} $$
 Example: How many different types of
heterozygotes exist when there are l alleles and we
distinguish order (e.g. paternal vs. maternal)?
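One way to check the example above: an ordered heterozygote is an ordered pair of two distinct alleles, so the count is lP2 = l(l - 1). A minimal Python sketch (the function name is illustrative, not from the lecture):

```python
import math

def ordered_heterozygotes(l: int) -> int:
    """Heterozygotes among l alleles when paternal vs. maternal origin matters."""
    return math.perm(l, 2)  # l! / (l - 2)! = l * (l - 1)

print(ordered_heterozygotes(4))  # 12
```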
Probability: Combination
 Definition: A combination is the number of
ways you can select r objects from n objects
without regard to order. It is written as nCr
and has value
$$ {}_nC_r = \binom{n}{r} = \frac{n!}{r!\,(n-r)!} $$
 Example: How many different
heterozygotes exist without regard to order
when there are l types of alleles?
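For the example above, an unordered heterozygote is a choice of 2 distinct alleles out of l, so the count is lC2 = l(l - 1)/2. A short sketch, with an illustrative function name:

```python
import math

def unordered_heterozygotes(l: int) -> int:
    """Heterozygotes among l alleles when parental origin is ignored."""
    return math.comb(l, 2)  # l! / (2! (l - 2)!) = l * (l - 1) / 2

print(unordered_heterozygotes(4))  # 6
```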
Probability: Combination with
Repetition
 Definition: Suppose there are n different types of
elements and r are selected with replacement. Then
the number of combinations is given by
$$ C'(n, r) = {}_{n+r-1}C_r $$
 Examples:
    How many genotypes are possible when there are l alleles?
    How many mating types are possible when there are l alleles?
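Both examples can be read as combinations with repetition: a genotype is 2 alleles drawn from l types, and a mating type is 2 genotypes drawn from the available genotypes. The sketch below assumes unordered genotypes and unordered parental pairs; the function names are illustrative.

```python
import math

def comb_with_repetition(n: int, r: int) -> int:
    """C'(n, r) = (n + r - 1) choose r."""
    return math.comb(n + r - 1, r)

def genotype_count(l: int) -> int:
    """Unordered genotypes with l alleles: C'(l, 2) = l(l + 1)/2."""
    return comb_with_repetition(l, 2)

def mating_type_count(l: int) -> int:
    """Unordered pairs of genotypes: C'(g, 2) with g = genotype_count(l)."""
    return comb_with_repetition(genotype_count(l), 2)

print(genotype_count(2), mating_type_count(2))  # 3 6
```

With l = 2 this gives the three genotypes DD, Dd, dd and the six mating types tabulated later in the lecture.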
Review: Segregation Ratio
 Recall that the law of segregation states that
one of the two alleles of a parent is randomly
selected to pass on to the offspring.
 Definition: The segregation ratios are the
predictable proportions of genotypes and
phenotypes in the offspring of particular
parental crosses. e.g. 1 AA : 2 AB : 1 BB
following a cross of AB X AB.
Segregation Ratio Distortion
 Definition: Segregation ratio distortion is a
departure from expected segregation ratios.
The purpose of segregation analysis is to
detect significant segregation ratio distortion.
A significant departure would suggest that one of
our assumptions about the model is wrong.
Segregation Analysis: What it
Teaches Us
 Genetic model for a single locus gene: dominant,
codominant, truly single locus
 Other genetic information: selection-free,
completely penetrant.
 Data quality: systematic error, non-random
sampling.
Few important genes are single-locus. Often single
locus analysis is used to verify marker systems.
Segregation Analysis:
Experimental Design
 Run a controlled cross with known expected
segregation ratios. OR
 Sample offspring of particular mating type
with known expected segregation ratios.
 Verify segregation ratios.
Autosomal Dominant
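| Label | Mating type | Genotype DD | Genotype Dd | Genotype dd | Phenotype dominant | Phenotype recessive |
|---|---|---|---|---|---|---|
|   | DDxDD | 1    | 0   | 0    | 1    | 0    |
| A | DDxDd | 0.5  | 0.5 | 0    | 1    | 0    |
|   | DDxdd | 0    | 1   | 0    | 1    | 0    |
| B | DdxDd | 0.25 | 0.5 | 0.25 | 0.75 | 0.25 |
| C | Ddxdd | 0    | 0.5 | 0.5  | 0.5  | 0.5  |
|   | ddxdd | 0    | 0   | 1    | 0    | 1    |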
Autosomal Dominant: The
Data and Hypothesis
 Obtain a random sample of matings between
affected (Dd) and unaffected (dd)
individuals.
 Sample n of their offspring and find that r are
affected with the disease (i.e. Dd).
 H0: proportion of affected offspring is 0.5
Autosomal Dominant:
Binomial Test
 H0: p = 0.5
 If r ≤ n/2, p-value = 2P(X ≤ r)
 If r > n/2, p-value = 2P(X ≤ n − r)
 Under H0,
$$ P(X \le c) = \sum_{x=0}^{c} \binom{n}{x} \left(\frac{1}{2}\right)^{n} $$
 Observe r = 29: p-value = 0.32
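The exact test above can be sketched in a few lines of Python. The sample size n = 50 is taken from the estimate 29/50 quoted later in the lecture; the function name is illustrative.

```python
from math import comb

def binom_two_sided_p(r: int, n: int) -> float:
    """Two-sided exact p-value for H0: p = 0.5, doubling the smaller tail."""
    c = min(r, n - r)                      # by symmetry, use the lower tail
    tail = sum(comb(n, x) for x in range(c + 1)) / 2**n
    return min(1.0, 2 * tail)              # cap at 1 when r is near n/2

print(round(binom_two_sided_p(29, 50), 2))  # 0.32
```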
Autosomal Dominant: Standard
Normal Test
 μ = np
 σ² = np(1 − p)
 For large n, X is approximately N(np, np(1 − p)), so
$$ Z = \frac{X - np}{[np(1-p)]^{1/2}} \sim N(0, 1) $$
 Under H0, X ~ N(n/2, n/4)
$$ z = \frac{r - n/2}{(n/4)^{1/2}} = 1.13 $$
 Observe r = 29: p-value = 0.26
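A small sketch of the normal approximation, using only the standard library (the two-sided tail of a standard normal is erfc(|z|/√2)). It assumes n = 50, as in the estimate 29/50 given later; the function name is illustrative.

```python
import math

def normal_two_sided_p(r: int, n: int, p0: float = 0.5):
    """z statistic and two-sided p-value under H0: p = p0."""
    z = (r - n * p0) / math.sqrt(n * p0 * (1 - p0))
    p_value = math.erfc(abs(z) / math.sqrt(2))  # 2 * P(Z > |z|)
    return z, p_value

z, p_value = normal_two_sided_p(29, 50)
print(round(z, 2), round(p_value, 2))  # 1.13 0.26
```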
Autosomal Dominant: Pearson
Chi-Square Test
 The distribution of the sum of k squares of iid
standard normal variables is defined as a chi-square
distribution with k degrees of freedom.
$$ Z^2 = \frac{(X - np)^2}{np(1-p)} \sim \chi^2_1 $$
$$ Z^2 = \frac{(X - np)^2}{np} + \frac{[(n - X) - n(1-p)]^2}{n(1-p)} $$
$$ z^2 = \frac{(r - n/2)^2}{n/4} = 1.28 $$
p-value = 0.26
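As a check on the arithmetic, with r = 29 and n = 50 (the sample size implied by the estimate 29/50 later in the lecture), both forms of the statistic give the same value:

$$ z^2 = \frac{(29 - 25)^2}{50/4} = \frac{16}{12.5} = 1.28, \qquad \frac{(29 - 25)^2}{25} + \frac{(21 - 25)^2}{25} = 0.64 + 0.64 = 1.28 $$

This is simply the square of the normal statistic (1.13² ≈ 1.28), which is why the two tests return the same p-value.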
Continuity Correction
 Both the normal and chi-square are
continuous distributions, but our data is not.
 Continuity correction for Normal: r = 28.5
corrected p-value = 0.32
 Continuity correction for Chi-Square:
r = 28.5; n-r = 21.5
corrected p-value = 0.32
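A sketch of the correction, assuming the usual rule of shrinking |r − n/2| by 0.5 before standardizing (which is what r = 28.5 amounts to for r = 29, n = 50). The function name is illustrative.

```python
import math

def corrected_test(r: int, n: int):
    """Continuity-corrected z, chi-square (1 df) and two-sided p-value for H0: p = 0.5."""
    shifted = max(abs(r - n / 2) - 0.5, 0.0)   # move the count half a unit toward n/2
    z = shifted / math.sqrt(n / 4)
    chi2 = 2 * shifted**2 / (n / 2)            # sum over both classes, each with e = n/2
    p_value = math.erfc(z / math.sqrt(2))      # same p-value for z and for chi2 = z**2
    return z, chi2, p_value

z, chi2, p_value = corrected_test(29, 50)
print(round(z, 2), round(chi2, 2), round(p_value, 2))  # 0.99 0.98 0.32
```

The corrected p-value agrees with the exact binomial test to two decimals, which is the point of the correction.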
Autosomal Dominant:
Likelihood Ratio Test
 Write likelihood:
$$ L(p) = \binom{n}{r} p^{r} (1-p)^{n-r} $$
 Calculate the MLE under HA:
$$ \hat{p} = \frac{r}{n} $$
 Calculate the G statistic:
$$ G = 2(\log L_A - \log L_0) = 2\sum_{i=1}^{c} o_i \log\frac{o_i}{e_i} = 2\left[ r \log\frac{r}{0.5n} + (n-r)\log\frac{n-r}{0.5n} \right] $$
 Determine G distribution: G ~ $\chi^2_1$
 Calculate p-value = 0.26
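The G statistic can be computed directly from observed and expected counts. This sketch assumes n = 50 (from the estimate 29/50 later in the lecture) and uses the 1-df chi-square tail, which equals erfc(√(G/2)); the function name is illustrative.

```python
import math

def g_test_half(r: int, n: int):
    """G statistic and p-value (1 df) for H0: p = 0.5."""
    observed = [r, n - r]
    expected = [n / 2, n / 2]
    # Terms with o = 0 contribute nothing (o * log(o) -> 0).
    g = 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)
    p_value = math.erfc(math.sqrt(g / 2))
    return g, p_value

g, p_value = g_test_half(29, 50)
print(round(g, 2), round(p_value, 2))  # 1.29 0.26
```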
Estimating Segregation Ratio:
MOM
 first moment = np
 sample moment = r
 MOM: np = r
 MOM estimate: $\hat{p} = r/n$
Estimating Segregation Ratio:
Likelihood Method
 Set score to 0:
$$ \frac{r}{\hat{p}} - \frac{n-r}{1-\hat{p}} = 0 $$
 Solve for MLE:
$$ \hat{p} = \frac{r}{n} $$
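For completeness, the score comes from differentiating the binomial log-likelihood; a short derivation, assuming the likelihood $L(p) = \binom{n}{r} p^r (1-p)^{n-r}$:

$$ \log L(p) = \log\binom{n}{r} + r\log p + (n-r)\log(1-p) $$
$$ \frac{d}{dp}\log L(p) = \frac{r}{p} - \frac{n-r}{1-p} $$
$$ \frac{r}{\hat{p}} = \frac{n-r}{1-\hat{p}} \;\Rightarrow\; r(1-\hat{p}) = (n-r)\hat{p} \;\Rightarrow\; \hat{p} = \frac{r}{n} $$

So the maximum likelihood estimate coincides with the method-of-moments estimate.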
Estimating Confidence Interval
for Segregation Ratio
 Our estimate is X/n, where X is the random variable
representing the number of “successes” observed
and n is the sample size.
 E(X/n) = E(X)/n = np/n = p
 Var(X/n) = Var(X)/n² = np(1 − p)/n² = p(1 − p)/n
 SE(X/n) = $[\hat{p}(1-\hat{p})/n]^{1/2}$
 Therefore, X/n is unbiased and we can obtain a
confidence interval using a normal approximation
with SE(X/n).
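A minimal sketch of the normal-approximation interval, using the lecture's running counts (r = 29 affected among n = 50 offspring). The function name and the default multiplier 1.96 (for a 95% interval) are illustrative choices.

```python
import math

def segregation_ratio_ci(r: int, n: int, z: float = 1.96):
    """Point estimate, standard error and normal-approximation CI for p."""
    p_hat = r / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, se, (p_hat - z * se, p_hat + z * se)

p_hat, se, (lower, upper) = segregation_ratio_ci(29, 50)
print(round(p_hat, 2), round(se, 4), round(lower, 3), round(upper, 3))
# 0.58 0.0698 0.443 0.717
```

The interval contains 0.5, consistent with the tests that did not reject H0.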
Estimating Confidence Interval
for Segregation Ratio
$$ \hat{p} = \frac{29}{50} = 0.58 $$
$$ SE(\hat{p}) = \left[ \frac{(29/50)(21/50)}{50} \right]^{1/2} = 0.0698 $$
$$ (\hat{p} - 1.96\,SE,\ \hat{p} + 1.96\,SE) = (0.443,\ 0.717) $$
Segregation Analysis:
Codominant Loci I
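| Mating Type | Genotype DD | Genotype Dd | Genotype dd |
|---|---|---|---|
| DDxDD | 1    | 0   | 0    |
| DDxDd | 0.5  | 0.5 | 0    |
| DDxdd | 0    | 1   | 0    |
| DdxDd | 0.25 | 0.5 | 0.25 |
| Ddxdd | 0    | 0.5 | 0.5  |
| ddxdd | 0    | 0   | 1    |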
Segregation Analysis:
Codominant Loci II
 All 6 mating types are identifiable.
 Each mating type can be tested for agreement with
expected segregation ratios.
 Some mating types result in 3 types of offspring.
Must use Chi-Square or likelihood ratio test.
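For a mating type with three offspring classes, such as DdxDd with expected ratio 1:2:1, the Pearson statistic has 2 degrees of freedom, and the upper tail of a chi-square with 2 df is exp(−x/2). The counts below are hypothetical, chosen only to illustrate the calculation; the function name is also illustrative.

```python
import math

def chi2_test_121(counts):
    """Pearson chi-square (2 df) for DD : Dd : dd counts against 1:2:1."""
    n = sum(counts)
    expected = [0.25 * n, 0.5 * n, 0.25 * n]
    stat = sum((o - e) ** 2 / e for o, e in zip(counts, expected))
    p_value = math.exp(-stat / 2)   # closed-form tail for 2 degrees of freedom
    return stat, p_value

# Hypothetical offspring counts (DD, Dd, dd) from DdxDd matings.
stat, p_value = chi2_test_121([22, 55, 23])
print(round(stat, 2), round(p_value, 2))  # 1.02 0.6
```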
Multiple Populations: Testing
for Heterogeneity
 Suppose you observe segregation ratios in samples
of size n in m populations.
 Calculate a total chi-square:
$$ \chi^2_{total} = \sum_{i=1}^{m}\sum_{j=1}^{n} \frac{(o_{ij} - e_{ij})^2}{e_{ij}} $$
 Calculate a pooled chi-square:
$$ \chi^2_{pooled} = \sum_{j=1}^{n} \frac{\left(\sum_{i=1}^{m} o_{ij} - \sum_{i=1}^{m} e_{ij}\right)^2}{\sum_{i=1}^{m} e_{ij}} $$
Multiple Populations: Testing
for Heterogeneity
 Then,
$$ \chi^2_{total} - \chi^2_{pooled} \sim \chi^2_{n(m-1)} $$
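The heterogeneity test can be sketched as follows. Two assumptions are made here and should be checked against your own setup: the degrees of freedom are taken as (number of classes − 1)(m − 1), the usual convention when the expected ratios are fixed in advance (this matches n(m − 1) if n counts the free classes per population), and the chi-square tail is computed with a standard closed-form series for integer degrees of freedom. The data and function names are hypothetical.

```python
import math

def chi2_sf(x: float, k: int) -> float:
    """Upper-tail probability of a chi-square variable with integer k df."""
    if x <= 0:
        return 1.0
    if k % 2 == 0:
        term, total = 1.0, 1.0
        for r in range(1, k // 2):
            term *= (x / 2) / r
            total += term
        return math.exp(-x / 2) * total
    total = math.erfc(math.sqrt(x / 2))
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    for r in range(1, (k - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)
    return total

def heterogeneity_chi2(observed, ratios):
    """observed: class counts per population; ratios: expected class proportions."""
    total = 0.0
    pooled_obs = [0.0] * len(ratios)
    pooled_exp = [0.0] * len(ratios)
    for counts in observed:
        size = sum(counts)
        for j, (o, ratio) in enumerate(zip(counts, ratios)):
            e = size * ratio
            total += (o - e) ** 2 / e
            pooled_obs[j] += o
            pooled_exp[j] += e
    pooled = sum((o - e) ** 2 / e for o, e in zip(pooled_obs, pooled_exp))
    het = total - pooled
    df = (len(ratios) - 1) * (len(observed) - 1)
    return total, pooled, het, df, chi2_sf(het, df)

# Hypothetical (dominant, recessive) counts in three populations, expected 3:1.
data = [(32, 8), (27, 13), (61, 19)]
total, pooled, het, df, p_value = heterogeneity_chi2(data, (0.75, 0.25))
print(round(total, 2), round(pooled, 2), round(het, 2), df, round(p_value, 2))
```

In this made-up data the pooled counts fit 3:1 exactly, so all of the total chi-square is attributed to differences between populations.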
Multiple Populations: Testing
for Heterogeneity
 Alternatively, one may calculate G statistics.
 Then, $G_{total} - G_{pooled}$ is also distributed as $\chi^2_{n(m-1)}$.
$$ G_{total} = 2\sum_{i=1}^{m}\sum_{j=1}^{n} o_{ij}\log\frac{o_{ij}}{e_{ij}} $$
$$ G_{pooled} = 2\sum_{j=1}^{n}\left(\sum_{i=1}^{m} o_{ij}\right)\log\frac{\sum_{i=1}^{m} o_{ij}}{\sum_{i=1}^{m} e_{ij}} $$
Multiple Populations: Example
 In Mendel’s F2 cross of smooth and wrinkled
inbred pea lines, he sampled 10 plants and
counted the number of smooth and wrinkled
peas produced by each of those plants.
 Is there heterogeneity between plants?
 Further tests show that
    a single gene controls smooth vs. wrinkled
    smooth is dominant to wrinkled
Screening Markers for
Polymorphism
 An important step in designing mapping studies is
to find markers that show polymorphism. We are
interested in tests for polymorphism.
 A false negative would result if the marker was
truly polymorphic, but our test showed it to be
monomorphic.
 A false positive would result if the marker was truly
monomorphic, but our test showed it to be
polymorphic.
Testing for Polymorphism:
Backcross 1:1
 You design a backcross experiment to test for
polymorphism at a marker of interest. You
sample n offspring of the backcross.
 P(monomorphic) = 2(0.5)^n
Testing for Polymorphism: F2
codominant 1:2:1
 You design a F2 cross with a marker that is
codominant. You sample n F2 individuals.
 P(monomorphic) = 2(0.25)^n + (0.5)^n
Testing for Polymorphism: F2
dominant marker
 You design an F2 cross, but this time observe
a dominant marker. You sample n F2
individuals.
 P(monomorphic) = (0.75)^n + (0.25)^n
Power of Test for
Polymorphism
[Figure: Power to Detect Polymorphism. Power (y-axis) plotted against sample size (x-axis, 1 to 19) for the 1:1, 1:2:1, and 3:1 segregation ratios.]
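The power curves can be tabulated from the three P(monomorphic) formulas, taking power as 1 − P(monomorphic), that is, the chance that a truly polymorphic marker shows more than one class among n offspring. This is a sketch under that reading; the function and key names are illustrative.

```python
def p_monomorphic(n: int, design: str) -> float:
    """Probability that all n offspring fall in a single observable class."""
    if design == "1:1":      # backcross
        return 2 * 0.5**n
    if design == "1:2:1":    # F2, codominant marker
        return 2 * 0.25**n + 0.5**n
    if design == "3:1":      # F2, dominant marker
        return 0.75**n + 0.25**n
    raise ValueError(f"unknown design: {design}")

def power(n: int, design: str) -> float:
    return 1 - p_monomorphic(n, design)

for n in range(1, 20, 2):
    row = "  ".join(f"{power(n, d):.3f}" for d in ("1:1", "1:2:1", "3:1"))
    print(f"n={n:2d}  {row}")
```

Under these formulas the dominant 3:1 marker needs the largest sample to reach a given power, because one class has probability 0.75 and is likely to fill a small sample on its own.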
Estimating Heterozygosity
$$ H = 1 - \sum_{i=1}^{l} p_i^2 $$
$$ \hat{H} = \frac{n}{n-1}\left(1 - \sum_{i=1}^{l} \hat{p}_i^2\right) $$
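A small sketch of the heterozygosity estimate with the n/(n − 1) correction, where n is the sample size used to estimate the allele frequencies. The frequencies below are hypothetical and the function name is illustrative.

```python
def heterozygosity_estimate(freqs, n: int) -> float:
    """H-hat = n/(n - 1) * (1 - sum of squared allele frequency estimates)."""
    if abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    return n / (n - 1) * (1 - sum(p * p for p in freqs))

# Hypothetical estimates for a three-allele locus from a sample of size 50.
print(round(heterozygosity_estimate([0.5, 0.3, 0.2], 50), 4))  # 0.6327
```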
Var($\hat{H}$): [equation not recoverable from the transcript; the surviving fragments involve n, n − 1, $\sum_{i=1}^{l} p_i^3$ and $\sum_{i=1}^{l} p_i^2$]
Estimating Allele Frequency
 It is often assumed that alleles have equal
frequencies when there are many alleles at a
locus. This assumption can result in false
positives for linkage, so it is important to test
allele frequencies.
 Suppose there are l possible alleles A1, A2,
…, Al. You observe nij genotypes AiAj.
 You estimate genotype frequencies p̂ij.
Estimating Allele Frequencies
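$$ \hat{p}_i = \hat{p}_{ii} + \frac{1}{2}\sum_{j \ne i}^{l} \hat{p}_{ij} $$
$$ Var(\hat{p}_i) = \frac{1}{2n}\left[ p_i(1-p_i) - p_i^2 + p_{ii} \right] = \frac{p_i(1-p_i)}{2n} \text{ under HWE} $$
$$ Cov(\hat{p}_i, \hat{p}_j) = \frac{1}{4n}\left( p_{ij} - 4p_i p_j \right) = -\frac{1}{2n}\, p_i p_j \text{ under HWE} $$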
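Allele frequencies can be estimated by gene counting, which is equivalent to the estimator above: each homozygote AiAi contributes two copies of Ai and each heterozygote AiAj contributes one. The genotype counts and names below are hypothetical; the variance shown is the HWE form p(1 − p)/(2n).

```python
from collections import Counter

def allele_frequencies(genotype_counts):
    """Map allele -> estimated frequency from {(allele_a, allele_b): count}."""
    n = sum(genotype_counts.values())        # number of individuals
    copies = Counter()
    for (a, b), count in genotype_counts.items():
        copies[a] += count                   # homozygotes are counted twice,
        copies[b] += count                   # once for each allele copy
    return {allele: c / (2 * n) for allele, c in copies.items()}, n

def hwe_variance(p: float, n: int) -> float:
    """Variance of an allele frequency estimate under HWE."""
    return p * (1 - p) / (2 * n)

# Hypothetical genotype counts for a three-allele locus (50 individuals).
counts = {("A1", "A1"): 10, ("A1", "A2"): 20, ("A2", "A2"): 12,
          ("A1", "A3"): 4, ("A2", "A3"): 3, ("A3", "A3"): 1}
freqs, n = allele_frequencies(counts)
print({a: round(p, 2) for a, p in sorted(freqs.items())})
print(round(hwe_variance(freqs["A1"], n), 6))
```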

Probability of Observing an
Allele
 Suppose there is an allele Ai with frequency
pi. What is the probability of sampling at
least one allele of type Ai?
Pobserving at least one allele Ai   1  1  pi 
2n
sample
size
calculation
log 1   i 
n
2 log 1  pi 
Probability of Observing
Multiple Alleles
 Let θi be the probability of observing at least one
allele of type i.
 There are jm = $\binom{l}{m}$ ways of selecting m different
alleles and an associated probability (indexed by jm) of
detecting at least one of each, calculated from the θi.
 Then we can calculate the probability of observing
k or more alleles by summing over these
probabilities for k, k+1, …, l.
Approximate Probability of
Observing k or More Alleles
 The above procedure becomes computationally
difficult when there are many alleles and the
frequencies are unequal.
 There is a Monte Carlo approximation.
 Select a random variable Ii to be 1 with probability
θi and 0 otherwise.
 Compute $I = \sum_{i=1}^{l} I_i$ for b bootstrap trials. The
proportion of trials with I ≥ k is an estimate of the
probability of observing k or more alleles.
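The Monte Carlo procedure can be sketched as below. Like the description above, it treats the indicators as independent, which is an approximation. Under that same assumption the answer can also be computed exactly by a small dynamic program, included here only as a cross-check. The detection probabilities, function names, and default number of trials are hypothetical.

```python
import random

def mc_prob_k_or_more(detect_probs, k: int, b: int = 20000, seed: int = 0) -> float:
    """Monte Carlo estimate of P(at least k alleles observed)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(b):
        observed = sum(rng.random() < t for t in detect_probs)  # I = sum of I_i
        if observed >= k:
            hits += 1
    return hits / b

def exact_prob_k_or_more(detect_probs, k: int) -> float:
    """Exact value under independence: distribution of a sum of Bernoulli variables."""
    dist = [1.0]                       # dist[c] = P(c alleles observed so far)
    for t in detect_probs:
        new = [0.0] * (len(dist) + 1)
        for c, pr in enumerate(dist):
            new[c] += pr * (1 - t)     # allele not observed
            new[c + 1] += pr * t       # allele observed
        dist = new
    return sum(dist[k:])

# Hypothetical per-allele detection probabilities for a five-allele locus.
probs = [0.99, 0.95, 0.80, 0.60, 0.30]
print(mc_prob_k_or_more(probs, 4), round(exact_prob_k_or_more(probs, 4), 4))
```

The simulation earns its keep when the indicators cannot be treated as independent or the exact sum is awkward; the exact version is mainly useful for checking that enough trials were run.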
Summary
 Permutation and combinations: knowing how to
count number of genotypes, mating types, etc.
 Testing segregation ratios for dominant and
codominant loci.
 Testing for population heterogeneity.
 Screening for polymorphism.
 Estimating heterozygosity, probability of observing
an allele.