2002-09-12: Segregation Analysis II


Lecture 7: Linkage Analysis I
Date: 9/16/02
 Clean up segregation analysis
Steps of Segregation Analysis
• Identify mating type(s) where the trait is expected
to segregate in the offspring.
• Sample families with the given mating type from
the population.
• Sample and score the children of sampled families.
• Test H0: “expected segregation ratio” or estimate
the segregation ratio.
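The last step above can be sketched as an exact binomial test. This is only a minimal illustration: the counts are hypothetical, the function names are made up for this example, and it assumes the scored children can be pooled as independent draws from one known segregating mating type.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(K = k) for K ~ Bin(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def exact_binomial_p_value(affected, n_children, p0=0.25):
    """Two-sided exact p-value for H0: segregation ratio = p0."""
    pmf = [binom_pmf(k, n_children, p0) for k in range(n_children + 1)]
    observed = pmf[affected]
    # Add up every outcome that is no more probable than the observed count.
    return sum(prob for prob in pmf if prob <= observed * (1 + 1e-9))

# Hypothetical sample: 30 affected among 100 scored children.
p_value = exact_binomial_p_value(30, 100)
print(f"p-value = {p_value:.3f}")
```

A small p-value would argue against the expected ratio, provided the sample really was collected without ascertainment bias.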
But, you knew it wouldn’t always be that easy…
• Appropriate mating types may not be identifiable.
   ◦ Offspring of Dr x Dr cross segregate, but this mating
     type is indistinguishable from DD x DD and others.
   ◦ An incompletely penetrant trait. Some appropriate
     mating types fail to be detected because the trait is
     invisible.
• The trait is rare and you need to enrich your sample
for affecteds. You collect a nonrandom sample.
Segregation Analysis for Autosomal Recessive Genes

Mating               Genotype                  Phenotype
Type         DD      Dr      rr        Dominant    Recessive
DDxDD        1       0       0         1           0
DDxDr        0.5     0.5     0         1           0
DrxDr        0.25    0.5     0.25      0.75        0.25
DDxrr        0       1       0         1           0
Drxrr        0       0.5     0.5       0.5         0.5
rrxrr        0       0       1         0           1
Avoid Contaminating Mating Types
• Controlled crosses.
• Select only the appropriate mating types by
ascertaining them through their offspring.
Ascertainment Procedure
• Definition: An ascertainment procedure is
the way in which families come to be
included in a study.
• The ascertainment procedure may result in
incomplete selection or over-selection.
   ◦ Exclude those families that happen to have all
     normal offspring.
   ◦ Include those families which don’t segregate.
Ascertainment Terminology I
• Definition: Affected individuals that are identified
independently of all other individuals are called
probands.
• Definition: Other affected individuals in a family
with a proband are called secondary cases.
• Definition: The ascertainment probability ($\pi$) is the
probability that an affected individual in the
population is identified as a proband.
Ascertainment Terminology II
• Definition: Complete ascertainment or, more
appropriately, truncated ascertainment, is the case
where $\pi = 1$, so all families with affected children are
identified and all their children are probands. Note
that this does not imply complete selection.
• Definition: Single ascertainment is the situation
where all ascertained families will have just one
proband.
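Ascertainment Bias: Sex-Based
• Dizygotic twins: Each twin is independently
assigned sex with equal probability of being
male or female.

$$P(MM) = \left(\tfrac{1}{2}\right)^2 = \tfrac{1}{4}, \qquad P(MF \text{ or } FM) = 2\left(\tfrac{1}{2}\right)^2 = \tfrac{1}{2}, \qquad P(FF) = \left(\tfrac{1}{2}\right)^2 = \tfrac{1}{4}$$

$$P(\text{sister} \mid \text{one male twin}) = \frac{P(FM \text{ or } MF)}{P(FM \text{ or } MF \text{ or } MM)} = \frac{1/2}{3/4} = \frac{2}{3}$$

$$P(\text{brother} \mid \text{one male twin}) = \frac{1}{3}$$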
Ascertainment Bias: Sex-Based (contd)

$$P(\text{has sister} \mid \text{sample male}) = \frac{P(\text{has sister AND sample male})}{P(\text{sample male})} = \frac{1/4}{1/2} = \frac{1}{2}$$

• By randomly sampling males, we identify
only ½ of the cases where a male
twin has a sister.
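The two conditional probabilities for twins can be checked by direct enumeration. The sketch below assumes only what the slide states (each twin is independently male or female with probability 1/2); the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

# The four equally likely ordered twin pairs: MM, MF, FM, FF.
pairs = list(product("MF", repeat=2))

# Ascertain PAIRS that contain a male twin: how often is there a sister?
with_male = [pair for pair in pairs if "M" in pair]
p_sister_given_pair = Fraction(
    sum("F" in pair for pair in with_male), len(with_male)
)

# Ascertain individual MALES at random: how often does he have a sister?
# Every male in every pair is one equally likely sampled individual.
males = [(pair, i) for pair in pairs for i in (0, 1) if pair[i] == "M"]
p_sister_given_male = Fraction(
    sum(pair[1 - i] == "F" for pair, i in males), len(males)
)

print(p_sister_given_pair, p_sister_given_male)
```

The difference arises because an MM pair contributes two males to the second sampling scheme but only one pair to the first.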
Ascertainment Bias: Size-Based
• Suppose a proportion $k_x$ of families have $x$
siblings.

$$\mu = \sum_x x\, k_x, \qquad \sigma^2 = \sum_x (x - \mu)^2 k_x$$

• A random individual derives from a sibship
of size $x$ with probability:

$$q_x = \frac{x\, k_x}{\sum_z z\, k_z} = \frac{x\, k_x}{\mu}$$
Ascertainment Bias: Size-Based (contd)
• The expected number of sibs of a random
individual is:

$$\sum_x q_x (x - 1) = \sum_x \frac{k_x\, x (x - 1)}{\mu} = \frac{\sigma^2}{\mu} + (\mu - 1)$$

• Thus, by sampling random individuals, we
tend to sample more from large families and
we increase the average number of sibs.
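A short numerical check of the size-biased result, using a made-up family-size distribution (the proportions in `k` are hypothetical and the function name is illustrative):

```python
def size_biased_summary(k):
    """k maps sibship size x to the proportion k_x of families of that size."""
    mu = sum(x * kx for x, kx in k.items())
    var = sum((x - mu) ** 2 * kx for x, kx in k.items())
    # q_x = x k_x / mu: chance that a random individual comes from a size-x sibship.
    q = {x: x * kx / mu for x, kx in k.items()}
    expected_sibs = sum(qx * (x - 1) for x, qx in q.items())
    return mu, var, q, expected_sibs

# Hypothetical proportions of families with 1, 2, 3 and 4 siblings.
k = {1: 0.3, 2: 0.4, 3: 0.2, 4: 0.1}
mu, var, q, expected_sibs = size_biased_summary(k)
print(f"per-family average: {mu - 1:.3f} sibs, per-individual: {expected_sibs:.3f} sibs")
```

The per-individual figure exceeds the per-family figure by $\sigma^2 / \mu$, so the bias disappears only when every family has the same size.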
Ascertainment Bias: Genetic Samples
• In genetic samples, the ascertainment bias is
trait-based.
• Enrich for a rare trait by sampling an
individual who is affected and those who are
related to him/her.
• Obviously, among those who are related, the
probability of being affected is higher than in
an unbiased sample from the population.
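Family Ascertainment Probability
Probability that a family with $r$ affected children is ascertained:

$$p_r = 1 - (1 - \pi)^r$$

Complete ascertainment: $\pi = 1 \Rightarrow p_r = 1$
Small $\pi$: $1 - (1 - \pi)^r \approx 1 - (1 - r\pi) = r\pi$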
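To see how good the small-$\pi$ approximation is, the exact family ascertainment probability can be compared with $r\pi$ for a few illustrative values (the function name and the chosen values are assumptions for this sketch):

```python
def family_ascertainment_prob(r, pi):
    """P(at least one of r affected children becomes a proband)."""
    return 1 - (1 - pi) ** r

for pi in (0.01, 0.1, 0.5, 1.0):
    exact = family_ascertainment_prob(3, pi)
    print(f"pi = {pi:<4}  exact = {exact:.4f}  r*pi = {3 * pi:.4f}")
```

The approximation is close for rare probands and breaks down as $\pi$ grows, since $r\pi$ can exceed 1.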
Truncated Ascertainment ($\pi = 1$)
• Consider only families of size $s$.
• Let random variable $X_i$ be the number of
affected offspring in the $i$th family.
• $X_i \sim \mathrm{Bin}(s, 0.25)$
• In complete ascertainment, all families with
$X_i > 0$ are included in the study.
• We seek the distribution of $X_i \mid X_i > 0$.
Distribution of $X_i \mid X_i > 0$ (Truncated Binomial)
• Call this random variable on the new sample
space $\{1, 2, \ldots, s\}$ $Y_i$.

$$P(Y_i = r) = P(X_i = r \mid X_i > 0) = \frac{P(X_i = r,\, X_i > 0)}{P(X_i > 0)} = \frac{\binom{s}{r} p^r (1 - p)^{s - r}}{1 - (1 - p)^s}$$
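The truncated binomial probabilities are easy to tabulate. The sketch below uses a sibship size of 5 and p = 0.25 purely as illustrative values; the function name is not from the lecture.

```python
from math import comb

def truncated_binomial_pmf(r, s, p=0.25):
    """P(Y = r) where Y is Bin(s, p) conditioned on being at least 1."""
    if not 1 <= r <= s:
        raise ValueError("r must lie in 1..s")
    return comb(s, r) * p**r * (1 - p) ** (s - r) / (1 - (1 - p) ** s)

s = 5
pmf = [truncated_binomial_pmf(r, s) for r in range(1, s + 1)]
for r, prob in enumerate(pmf, start=1):
    print(f"P(Y = {r}) = {prob:.4f}")
```

Dividing by $1 - (1 - p)^s$ is what redistributes the probability of the unobservable zero class over the values 1 to $s$.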
Expected Segregation Ratio with Truncated Ascertainment
• Let $p_t$ be the segregation ratio given the
truncated ascertainment procedure.

$$E(p_t) = E\left(\frac{Y}{s}\right) = \frac{1}{s} \sum_{r=1}^{s} r\, \frac{\binom{s}{r} p^r (1 - p)^{s - r}}{1 - (1 - p)^s} = \frac{p}{1 - (1 - p)^s} \qquad \text{(biased!)}$$
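The final equality follows without evaluating the sum term by term, because the $r = 0$ term contributes nothing to the binomial mean. A short worked version, with a numerical illustration that assumes $s = 5$ and $p = 0.25$:

```latex
\begin{align*}
\sum_{r=1}^{s} r \binom{s}{r} p^r (1-p)^{s-r}
  &= \sum_{r=0}^{s} r \binom{s}{r} p^r (1-p)^{s-r} = E(X) = sp \\
E(p_t) &= \frac{1}{s} \cdot \frac{sp}{1 - (1-p)^s} = \frac{p}{1 - (1-p)^s} > p \\
\text{e.g. } s = 5,\ p = 0.25: \quad
E(p_t) &= \frac{0.25}{1 - 0.75^5} \approx \frac{0.25}{0.7627} \approx 0.328
\end{align*}
```

The upward bias shrinks as $s$ grows, because large recessive-segregating families rarely have zero affected offspring.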
Example: Truncated Ascertainment

Number of               Number Affected
Probands       1      2      3      4      5    Total
1            140     80     35      4      0      259
2                    52     12      7      1       72
3                            7      0      0        7
4                                   2      0        2
5                                          0        0
Total        140    132     54     13      1      340

$p_t = 0.33$

$$\hat{p} = \frac{623}{1700} \approx 0.3665$$
Truncated Ascertainment: Estimating p
• The expression for $p_t$ gives us a means of
estimating $p$. We observe $p_t$, assume truncated
ascertainment, and take the $\hat{p}$ that solves
$p_t = \hat{p} / (1 - (1 - \hat{p})^s)$ as the estimate of $p$.
• Indeed, since $\hat{p}$ is a function of $p_t$, we can use
previous variance formula results to get an
approximate variance for $\hat{p}$.
• Unfortunately, the equation is not analytically
solvable.
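Although there is no closed form, the equation is simple to solve numerically. Below is a minimal bisection sketch, not the method used in the lecture. It assumes sibships of size 5 and the observed ratio 623/1700 from the truncated ascertainment example; the function names are illustrative.

```python
def expected_pt(p, s):
    """Expected segregation ratio under truncated ascertainment."""
    return p / (1 - (1 - p) ** s)

def solve_p(pt, s, tol=1e-12):
    """Find p with expected_pt(p, s) = pt by bisection (needs 1/s < pt <= 1)."""
    lo, hi = 1e-9, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        # expected_pt increases with p, so move the bracket toward the target.
        if expected_pt(mid, s) < pt:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

p_hat = solve_p(623 / 1700, s=5)
print(f"p_hat = {p_hat:.4f}")
```

Bisection is chosen here only because it needs no derivative and cannot leave the interval (0, 1].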
Estimating p: EM Algorithm (E)
• Incomplete data: the number of
unascertained families with 0 affected
offspring. Call this $U_i$ at iteration $i$.
• Expectation Step: Assume $p_i$ and calculate:

$$E(U_i) = P(0 \text{ affected})\, E(\#\, Dr \times Dr \text{ mating types}) = (1 - p_i)^s \left[ n_s + E(U_i) \right]$$

$$E(U_i) = \frac{n_s (1 - p_i)^s}{1 - (1 - p_i)^s}$$
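Estimating p: EM Algorithm (M)
• Compute the maximum likelihood estimate
$\hat{p}_{i+1}$:

$$\hat{p}_{i+1} = \frac{\text{Observed number of affected offspring}}{\text{Total number of observed offspring}} = \frac{\sum_{r=1}^{s} r\, a_r}{s\, n_s + s\, U_i}$$

where $a_r$ is the number of sampled families with $r$ affected offspring
and $n_s$ is the number of sampled families of size $s$.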
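The E and M steps can be iterated in a few lines. This is a sketch under stated assumptions: sibships of size 5, the column totals of the truncated ascertainment example as the counts `a`, an arbitrary starting value, and a fixed number of iterations rather than a convergence test.

```python
def em_truncated(a, s, p_start=0.25, n_iter=200):
    """a[r - 1] is the number of ascertained families with r affected offspring."""
    n_s = sum(a)
    affected = sum(r * a_r for r, a_r in enumerate(a, start=1))
    p = p_start
    for _ in range(n_iter):
        q0 = (1 - p) ** s
        # E-step: expected number of unascertained families with 0 affected.
        u = n_s * q0 / (1 - q0)
        # M-step: affected offspring over all offspring, imputed families included.
        p = affected / (s * n_s + s * u)
    return p

a = [140, 132, 54, 13, 1]   # families with 1..5 affected (340 families, 623 affected)
p_hat = em_truncated(a, s=5)
print(f"p_hat = {p_hat:.4f}")
```

At convergence the estimate satisfies $\hat{p} / (1 - (1 - \hat{p})^s) = 623/1700$, the same equation that could not be solved analytically.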
Incomplete Ascertainment: The Norm
• Any time there are affecteds in the study that
are NOT probands, the assumption of
truncated ascertainment does not apply.
• Instead we have incomplete ascertainment.
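The Proband Method
• Use the proband to identify the mating type,
but then leave it out of subsequent
calculations.

$$\tilde{p} = \frac{\sum_{i=1}^{n} b_i (r_i - 1)}{\sum_{i=1}^{n} b_i (s_i - 1)}$$

Here $b_i$, $r_i$ and $s_i$ are the numbers of probands, affected offspring
and offspring in family $i$.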
The Proband Method: Estimating $\pi$
• Again, use the proband only to identify the
mating type. Count only the other siblings.

$$\tilde{\pi} = \frac{\sum_{i=1}^{n} b_i (b_i - 1)}{\sum_{i=1}^{n} b_i (r_i - 1)}$$
Example: Proband Method

Number      Number       Total         Affected      Proband
Affected    Probands     Siblings      Siblings      Siblings
1           140          (s-1)*140     0             0
2           80+2*52      (s-1)*184     (2-1)*184     2*52
:           :            :             :             :
Total:      260          520           210           131

$$\tilde{p} = \frac{430}{1728} \approx 0.2488$$

$$\tilde{\pi} = \frac{210}{430} \approx 0.488$$
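The two proband-method ratios can be recomputed from the counts in the truncated ascertainment example table. The sketch assumes every sibship has size 5 (which is what 1700 offspring in 340 families implies); the data layout and function name are illustrative.

```python
from fractions import Fraction

# counts[b][r] = number of families with b probands and r affected offspring.
counts = {
    1: {1: 140, 2: 80, 3: 35, 4: 4, 5: 0},
    2: {2: 52, 3: 12, 4: 7, 5: 1},
    3: {3: 7, 4: 0, 5: 0},
    4: {4: 2, 5: 0},
    5: {5: 0},
}

def proband_estimates(counts, s):
    total_sibs = affected_sibs = proband_sibs = 0
    for b, row in counts.items():
        for r, n_families in row.items():
            # Each of the b probands sees s-1 sibs, r-1 affected sibs, b-1 proband sibs.
            total_sibs += n_families * b * (s - 1)
            affected_sibs += n_families * b * (r - 1)
            proband_sibs += n_families * b * (b - 1)
    return Fraction(affected_sibs, total_sibs), Fraction(proband_sibs, affected_sibs)

p_tilde, pi_tilde = proband_estimates(counts, s=5)
print(f"p = {float(p_tilde):.4f}, pi = {float(pi_tilde):.3f}")
```

Counting a family once per proband is what lets each proband be dropped in turn while its siblings still contribute.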
Singles Method
• A single is a proband who is the only
proband in a family. Singles are not
considered effective observations because
they are observed only through their affected
status.
• Let $d$ be the number of singles in a sample of
$n$ families.
Singles Method Estimates

$$\hat{p} = \frac{\sum_{i=1}^{n} r_i - d}{\sum_{i=1}^{n} s_i - d}$$

$$\hat{\pi} = \frac{\sum_{i=1}^{n} b_i - d}{\sum_{i=1}^{n} r_i - d}$$
Singles Method Example
• There are 259 singles.

$$\hat{p} = \frac{623 - 259}{1700 - 259} \approx 0.2526$$

$$\hat{\pi} = \frac{434 - 259}{623 - 259} \approx 0.481$$
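The singles estimates need only four totals, so they can be wrapped in a small helper. The function name is illustrative and the inputs are the figures quoted in the example. One caveat: adding up the probands in the truncated ascertainment table appears to give 432 rather than 434, which would change the $\pi$ estimate to about 0.475; the quoted 434 is used here unchanged.

```python
from fractions import Fraction

def singles_estimates(total_affected, total_offspring, total_probands, singles):
    """Singles-method estimates of the segregation ratio p and of pi."""
    p = Fraction(total_affected - singles, total_offspring - singles)
    pi = Fraction(total_probands - singles, total_affected - singles)
    return p, pi

p_hat, pi_hat = singles_estimates(
    total_affected=623, total_offspring=1700, total_probands=434, singles=259
)
print(f"p = {float(p_hat):.4f}, pi = {float(pi_hat):.3f}")
```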
Variance For Proband and Singles Method
• The proband and singles method both give
estimators that are the quotient of two
random variables. Approximate equations for
variance exist:

$$\mathrm{Var}\left(\frac{X}{Y}\right) \approx \left[E\left(\frac{X}{Y}\right)\right]^2 \left[ \frac{\mathrm{Var}(X)}{E(X)^2} + \frac{\mathrm{Var}(Y)}{E(Y)^2} - \frac{2\,\mathrm{Cov}(X, Y)}{E(X)\,E(Y)} \right]$$

$$E\left(\frac{X}{Y}\right) \approx \frac{E(X)}{E(Y)} - \frac{\mathrm{Cov}(X, Y)}{E(Y)^2} + \frac{E(X)\,\mathrm{Var}(Y)}{E(Y)^3}$$
Likelihood Method
• P(offspring is proband) = $\pi p$.
• P(family ascertained) = $1 - (1 - \pi p)^s$.
• P(r affected and b probands | ascertained)
then is:

$$P(X = r, B = b \mid B > 0; s, \pi, p) = \frac{P(B = b \mid X = r; \pi)\, P(X = r; s, p)}{P(B > 0; s, \pi, p)} = \frac{\binom{r}{b} \pi^b (1 - \pi)^{r - b} \binom{s}{r} p^r (1 - p)^{s - r}}{1 - (1 - \pi p)^s}$$
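The per-family probability translates directly into code, and a log-likelihood is then a weighted sum over the observed (r, b) classes. This is a sketch only: the function names, the data layout and the parameter values used for the check are assumptions, not the lecture's software.

```python
from math import comb, log

def family_likelihood(r, b, s, p, pi):
    """P(X = r, B = b | B > 0; s, pi, p) for one ascertained family."""
    p_b_given_r = comb(r, b) * pi**b * (1 - pi) ** (r - b)
    p_r = comb(s, r) * p**r * (1 - p) ** (s - r)
    return p_b_given_r * p_r / (1 - (1 - pi * p) ** s)

def log_likelihood(families, s, p, pi):
    """families is an iterable of (r, b, count) for independent families."""
    return sum(n * log(family_likelihood(r, b, s, p, pi)) for r, b, n in families if n)

# Sanity check: over every observable class (1 <= b <= r <= s) the
# probabilities should add up to one.
s, p, pi = 5, 0.25, 0.48
total = sum(
    family_likelihood(r, b, s, p, pi) for r in range(1, s + 1) for b in range(1, r + 1)
)
print(f"total probability = {total:.12f}")
```

Maximising `log_likelihood` over both `p` and `pi` is the two-parameter problem that the Newton-Raphson update addresses.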
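Likelihood Method (contd)
• Each family is an independent observation
(assuming they are not related).
• Newton-Raphson multiple parameter update:

$$\begin{pmatrix} p_{m+1} \\ \pi_{m+1} \end{pmatrix} = \begin{pmatrix} p_m \\ \pi_m \end{pmatrix} + I(p_m, \pi_m)^{-1}\, S(p_m, \pi_m)$$

where $S$ is the score vector and $I$ is the information matrix.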
Likelihood Method: Testing Hypotheses
• Nested models can be tested using the log
likelihood ratio.
• Interesting hypotheses include:
   ◦ Complete ascertainment: $\pi = 1$
   ◦ Recessive inheritance: $p = 0.25$
   ◦ Complete ascertainment and recessive
     inheritance: $\pi = 1$ and $p = 0.25$
Likelihood Method: Sample Results

Model        p       π       chi-square    df    p-value
general      0.25    0.48    0             0     -
ca           0.31    1       11736         1     0
recessive    0.25    0.47    0.04          1     0.84

Cannot reject the recessive disease hypothesis. Can reject
complete ascertainment hypothesis.
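The p-values in the table can be reproduced from the chi-square statistics. For one degree of freedom the upper tail of the chi-square distribution has a closed form in terms of the complementary error function, which the standard library provides; the function name below is illustrative.

```python
from math import erfc, sqrt

def lrt_p_value_1df(chi_square):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return erfc(sqrt(chi_square / 2))

for model, stat in (("ca", 11736), ("recessive", 0.04)):
    print(f"{model}: chi-square = {stat}, p-value = {lrt_p_value_1df(stat):.2f}")
```

This shortcut applies only to tests with one degree of freedom; a hypothesis fixing both parameters at once would need the two-degree-of-freedom tail instead.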
Rejecting a Null Hypothesis
• Not a single locus.
• Ascertainment procedure.
• Selection.
• Environmental effects mimicking phenotype.
• Incomplete penetrance.
More Complex Ascertainment Models
• We have considered ascertainment
procedures where the probability of
ascertainment was of the form:

$$p_r = 1 - (1 - \pi)^r$$

• Allowing $\pi$ to vary covers a wide number of
cases, but not all. Still imposes a functional
relationship.
Summary
• Ascertainment procedure and the impact of
sampling.
• Segregation analysis when the ascertainment
procedure is nonrandom. Specifically, recessive
trait.
• Truncated ascertainment vs. incomplete
ascertainment.
• Proband method; singles method; likelihood
method.