2002-09-12: Segregation Analysis II
Lecture 7: Linkage Analysis I
Date: 9/16/02
Clean up segregation analysis
Steps of Segregation Analysis
Identify mating type(s) where the trait is expected
to segregate in the offspring.
Sample families with the given mating type from
the population.
Sample and score the children of sampled families.
Test H0: “expected segregation ratio” or estimate
segregation ratio.
But, you knew it wouldn’t
always be that easy…
Appropriate mating types may not be identifiable.
Offspring of Dr x Dr cross segregate, but this mating
type is indistinguishable from DD x DD and others.
An incompletely penetrant trait. Some appropriate
mating types fail to be detected because the trait is
invisible.
The trait is rare and you need to enrich your sample
for affecteds. You collect a nonrandom sample.
Segregation Analysis for
Autosomal Recessive Genes
Offspring genotype and phenotype probabilities by mating type:

Mating    Genotype              Phenotype
Type      DD     Dr     rr      Dominant  Recessive
DDxDD     1      0      0       1         0
DDxDr     0.5    0.5    0       1         0
DrxDr     0.25   0.5    0.25    0.75      0.25
DDxrr     0      1      0       1         0
Drxrr     0      0.5    0.5     0.5       0.5
rrxrr     0      0      1       0         1
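The table entries follow from enumerating the four equally likely pairings of parental alleles. A minimal Python sketch, assuming a single locus with alleles written `D` and `r`; the function names are illustrative and not from the lecture:

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def offspring_genotypes(parent1, parent2):
    """Genotype probabilities for offspring of parent1 x parent2 (e.g. "Dr", "rr")."""
    counts = Counter()
    for allele1, allele2 in product(parent1, parent2):
        # Sort so the heterozygote is always written "Dr", never "rD".
        counts["".join(sorted(allele1 + allele2))] += 1
    return {genotype: Fraction(n, 4) for genotype, n in counts.items()}


def recessive_fraction(parent1, parent2):
    """Probability that an offspring shows the recessive phenotype (genotype rr)."""
    return offspring_genotypes(parent1, parent2).get("rr", Fraction(0))


for mating in [("DD", "DD"), ("DD", "Dr"), ("Dr", "Dr"),
               ("DD", "rr"), ("Dr", "rr"), ("rr", "rr")]:
    print("x".join(mating), offspring_genotypes(*mating), recessive_fraction(*mating))
```

Only the DrxDr, Drxrr and rrxrr matings produce recessive offspring, which is why the choice of mating type matters for the segregation ratio being tested.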
Avoid Contaminating Mating
Types
Controlled crosses.
Select only the appropriate mating types by
ascertaining them through their offspring.
Ascertainment Procedure
Definition: An ascertainment procedure is
the way in which families come to be
included in a study.
The ascertainment procedure may result in
incomplete selection or over-selection.
Incomplete selection: families that happen to have
all normal offspring are excluded.
Over-selection: families which don't segregate are
included.
Ascertainment Terminology I
Definition: Affected individuals that are identified
independently of all other individuals are called
probands.
Definition: Other affected individuals in a family
with a proband are called secondary cases.
Definition: The ascertainment probability (π) is the
probability that an affected individual in the
population is identified as a proband.
Ascertainment Terminology II
Definition: Complete ascertainment or more
appropriately, truncated ascertainment, is the case
where π = 1, so all families with affected children are
identified and all their children are probands. Note
that this does not imply complete selection.
Definition: Single ascertainment is the situation
where all ascertained families will have just one
proband.
Ascertainment Bias: Sex-Based
Dizygotic twins: Each twin is independently
assigned sex with equal probability of being
male or female.
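$$P(MM) = \left(\tfrac{1}{2}\right)^2 = \tfrac{1}{4}, \qquad P(MF \text{ or } FM) = 2\left(\tfrac{1}{2}\right)^2 = \tfrac{1}{2}, \qquad P(FF) = \left(\tfrac{1}{2}\right)^2 = \tfrac{1}{4}$$

$$P(\text{sister} \mid \geq \text{one male twin}) = \frac{P(FM \text{ or } MF)}{P(FM \text{ or } MF \text{ or } MM)} = \frac{1/2}{3/4} = \frac{2}{3}$$

$$P(\text{brother} \mid \geq \text{one male twin}) = \frac{1}{3}$$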
Ascertainment Bias: Sex-Based
(contd)
$$P(\text{has sister} \mid \text{sample male}) = \frac{P(\text{has sister AND sample male})}{P(\text{sample male})} = \frac{1/4}{1/2} = \frac{1}{2}$$

By randomly sampling males, we identify
only ½ of the cases where a male
twin has a sister.
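The two conditional probabilities (2/3 per twin pair, 1/2 per sampled male) can be checked by direct enumeration. A small Python sketch, assuming the four twin pairs MM, MF, FM, FF are equally likely; the variable names are illustrative:

```python
from fractions import Fraction
from itertools import product

# The four equally likely twin pairs: MM, MF, FM, FF.
pairs = list(product("MF", repeat=2))

# Family-level view: among pairs with at least one male, how many include a female?
pairs_with_male = [pair for pair in pairs if "M" in pair]
p_sister_given_pair_has_male = Fraction(
    sum("F" in pair for pair in pairs_with_male), len(pairs_with_male))

# Individual-level view: pick one twin at random and keep the draw if he is male.
draws = [(pair[i], pair[1 - i]) for pair in pairs for i in (0, 1)]  # (sampled, co-twin)
male_draws = [draw for draw in draws if draw[0] == "M"]
p_sister_given_sampled_male = Fraction(
    sum(co_twin == "F" for _, co_twin in male_draws), len(male_draws))

print(p_sister_given_pair_has_male, p_sister_given_sampled_male)
```

The difference arises because an MM pair contributes two males to the individual-level sample but counts only once at the family level.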
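Ascertainment Bias: Size-Based
Suppose a proportion $k_x$ of families have x
siblings, with mean and variance

$$\mu = \sum_x x\,k_x, \qquad \sigma^2 = \sum_x x^2 k_x - \mu^2$$

A random individual derives from a sibship
of size x with probability:

$$q_x = \frac{x\,k_x}{\sum_z z\,k_z} = \frac{x\,k_x}{\mu}$$

Ascertainment Bias: Size-Based (contd)
The expected number of sibs of a random
individual is:

$$\sum_x q_x (x-1) = \frac{1}{\mu}\sum_x k_x\,x(x-1) = \mu + \frac{\sigma^2}{\mu} - 1$$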
Thus, by sampling random individuals, we
tend to sample more from large families and
we increase the average number of sibs.
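The size bias can be illustrated numerically. The sketch below uses a hypothetical family-size distribution (the proportions are invented purely for illustration) and compares the expected number of sibs of a randomly sampled individual with the figure obtained by averaging over families:

```python
from fractions import Fraction


def sibship_size_bias(k):
    """k maps sibship size x to the proportion k_x of families of that size."""
    mu = sum(x * kx for x, kx in k.items())
    var = sum(x * x * kx for x, kx in k.items()) - mu ** 2
    # q_x: probability that a random individual comes from a sibship of size x.
    q = {x: x * kx / mu for x, kx in k.items()}
    expected_sibs = sum(qx * (x - 1) for x, qx in q.items())
    return mu, var, q, expected_sibs


# Hypothetical distribution: 20% of families have 1 child, 50% have 2, 30% have 3.
k = {1: Fraction(1, 5), 2: Fraction(1, 2), 3: Fraction(3, 10)}
mu, var, q, expected_sibs = sibship_size_bias(k)

print("mean sibship size:", mu)
print("sibs per child, averaging over families:", mu - 1)
print("sibs of a randomly sampled individual:", expected_sibs)
```

In this example the individual-level figure exceeds the family-level one by var / mu, so the two agree only when every family has the same size.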
Ascertainment Bias: Genetic
Samples
In genetic samples, the ascertainment bias is
trait-based.
Enrich for a rare trait by sampling an
individual who is affected and those who are
related to him/her.
Obviously, among those who are related, the
probability of being affected is higher than in
an unbiased sample from the population.
Family Ascertainment
Probability
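Probability that a family with r affected offspring is ascertained:

$$p_r = 1 - (1-\pi)^r$$

Complete ascertainment: $\pi = 1 \Rightarrow p_r = 1$

Small $\pi$: $1 - (1-\pi)^r \approx 1 - (1 - r\pi) = r\pi$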
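To see how good the small-π approximation is, the exact family ascertainment probability can be compared with rπ. A brief Python sketch; the function name and the values of π tried are illustrative choices:

```python
def family_ascertainment_probability(pi, r):
    """P(a family with r affected offspring contains at least one proband)."""
    return 1 - (1 - pi) ** r


for pi in (1.0, 0.5, 0.01):
    for r in (1, 2, 3):
        exact = family_ascertainment_probability(pi, r)
        print(f"pi={pi}, r={r}: exact={exact:.6f}, r*pi={r * pi:.6f}")
```

For π = 0.01 the two columns nearly coincide, while for large π the linear approximation overshoots and can exceed 1.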
Truncated Ascertainment
(π = 1)
Consider only families of size s.
Let random variable Xi be the number of
affected offspring in the ith family.
Xi~Bin(s, 0.25)
In complete ascertainment, all families with
Xi>0 are included in the study.
We seek the distribution of Xi | Xi>0.
Distribution of Xi | Xi > 0
(Truncated Binomial)
Call this random variable on the new sample
space {1,2,…,s} $Y_i$.

$$P(Y_i = r) = P(X_i = r \mid X_i > 0) = \frac{P(X_i = r,\, X_i > 0)}{P(X_i > 0)} = \frac{\binom{s}{r} p^r (1-p)^{s-r}}{1 - (1-p)^s}$$
Expected Segregation Ratio
with Truncated Ascertainment
Let pt be the segregation ratio given the
truncated ascertainment procedure.
$$E(p_t) = E\left(\frac{Y}{s}\right) = \frac{1}{s}\sum_{r=1}^{s} r\,\frac{\binom{s}{r} p^r (1-p)^{s-r}}{1-(1-p)^s} = \frac{p}{1-(1-p)^s}$$

So $p_t$ is biased!
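The truncated binomial and its mean can be checked numerically. A short Python sketch using s = 5 and p = 0.25 as illustrative values; the function names are not from the lecture:

```python
from math import comb


def truncated_binomial_pmf(r, s, p):
    """P(Y = r) = P(X = r | X > 0) for X ~ Bin(s, p), with r in 1..s."""
    return comb(s, r) * p**r * (1 - p)**(s - r) / (1 - (1 - p)**s)


def expected_truncated_ratio(s, p):
    """E(Y / s), computed by summing over the truncated distribution."""
    return sum(r * truncated_binomial_pmf(r, s, p) for r in range(1, s + 1)) / s


s, p = 5, 0.25
total_mass = sum(truncated_binomial_pmf(r, s, p) for r in range(1, s + 1))
by_sum = expected_truncated_ratio(s, p)
closed_form = p / (1 - (1 - p)**s)

print(total_mass, by_sum, closed_form)
```

With these values the expected segregation ratio is about 0.33 rather than 0.25, which is the size of the upward bias introduced by dropping families with no affected offspring.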
Example: Truncated
Ascertainment
Number of   Number Affected
Probands    1     2     3     4     5     Total
1           140   80    35    4     0     259
2                 52    12    7     1     72
3                       7     0     0     7
4                             2     0     2
5                                   0     0
Total       140   132   54    13    1     340

$p_t$ = 0.33 (the value $p/(1-(1-p)^s)$ takes at p = 0.25, s = 5)

$$\hat{p}_t = \frac{623}{1700} = 0.3665$$
Truncated Ascertainment:
Estimating p
The expression for $p_t$ gives us a means of
estimating p. We observe $p_t$, assume truncated
ascertainment, and solve for $\hat{p}$ to estimate p.
Indeed, since $\hat{p}$ is a function of $p_t$, we can use
previous variance formula results to get an
approximate variance for $\hat{p}$.
Unfortunately, the equation is not analytically
solvable.
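Although there is no closed-form solution, the equation $p_t = p/(1-(1-p)^s)$ can be inverted numerically. The sketch below uses bisection, which is a choice made here for simplicity and is not the method the lecture goes on to describe. It takes the observed ratio 623/1700 from the example and assumes s = 5 (1700 offspring in 340 families):

```python
def expected_pt(p, s):
    """Expected segregation ratio under truncated ascertainment."""
    return p / (1 - (1 - p)**s)


def solve_p(pt_observed, s, tol=1e-12):
    """Bisection; expected_pt rises with p from 1/s (p near 0) to 1 (p = 1)."""
    lo, hi = 1e-9, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if expected_pt(mid, s) < pt_observed:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


p_hat = solve_p(623 / 1700, s=5)
print(round(p_hat, 4))
```

The solution is roughly 0.31, noticeably above the recessive value of 0.25.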
Estimating p: EM Algorithm
(E)
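Incomplete data: the number of
unascertained families with 0 affected
offspring. Call this $U_i$ at iteration i.
Expectation Step: Assume $p_i$ and calculate:

$$E(U_i) = P(0 \text{ affected}) \times E(\#\,Dr \times Dr \text{ mating types}) = (1-p_i)^s \left(n_s + E(U_i)\right)$$

$$E(U_i) = \frac{n_s (1-p_i)^s}{1 - (1-p_i)^s}$$

where $n_s$ is the number of ascertained families of size s.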
Estimating p: EM Algorithm
(M)
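Compute the maximum likelihood estimate
$p_{i+1}$:

$$\hat{p}_{i+1} = \frac{\text{Observed number of affected offspring}}{\text{Total number of observed offspring}} = \frac{\sum_{r=1}^{s} r\,a_r}{s\,n_s + s\,U_i}$$

where $a_r$ is the number of ascertained families with r affected offspring.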
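The E and M steps can be iterated in a few lines. The Python sketch below assumes the example data from the truncated-ascertainment table (column totals 140, 132, 54, 13, 1 for r = 1..5) and s = 5; the function name, starting value and stopping rule are illustrative choices:

```python
def em_truncated(affected_counts, s, p0=0.5, tol=1e-12, max_iter=1000):
    """EM estimate of p when families with 0 affected offspring are unobserved.

    affected_counts maps r (number affected) to a_r (number of families).
    """
    n_s = sum(affected_counts.values())
    total_affected = sum(r * a for r, a in affected_counts.items())
    p = p0
    for _ in range(max_iter):
        q = (1 - p)**s
        u = n_s * q / (1 - q)                      # E-step: expected missing families
        p_new = total_affected / (s * (n_s + u))   # M-step: affected / all offspring
        if abs(p_new - p) < tol:
            return p_new
        p = p_new
    return p


a = {1: 140, 2: 132, 3: 54, 13 and 4: 13, 5: 1}
p_em = em_truncated(a, s=5)
print(round(p_em, 4))
```

At convergence the estimate satisfies p / (1 - (1 - p)^s) = 623/1700, so it solves the same equation as the expected segregation ratio under truncated ascertainment.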
Incomplete Ascertainment: The
Norm
Any time there are affecteds in the study that
are NOT probands, the assumption of
truncated ascertainment does not apply.
Instead we have incomplete ascertainment.
The Proband Method
Use the proband to identify the mating type,
but then leave it out of subsequent
calculations.
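$$\tilde{p} = \frac{\sum_{i=1}^{n} b_i (r_i - 1)}{\sum_{i=1}^{n} b_i (s_i - 1)}$$

where family i has $s_i$ offspring, $r_i$ affected offspring, and $b_i$ probands.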
The Proband Method:
Estimating π
Again, use the proband only to identify the
mating type. Count only the other siblings.

$$\tilde{\pi} = \frac{\sum_{i=1}^{n} b_i (b_i - 1)}{\sum_{i=1}^{n} b_i (r_i - 1)}$$
Example: Proband Method
Number     Number      Total        Affected     Proband
Affected   Probands    Siblings     Siblings     Siblings
1          140         (s-1)*140    0            0
2          80+2*52     (s-1)*184    (2-1)*184    2*52
:          :           :            :            :
Total:     260  520  210  131  430

$$\tilde{p} = \frac{430}{1728} = 0.2488$$

$$\tilde{\pi} = \frac{210}{430} = 0.488$$
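The proband-method sums can be computed directly from the family counts. The sketch below assumes the counts from the truncated-ascertainment example table, keyed by (probands b, affected r), and s = 5 (1700 offspring in 340 families); the function name is illustrative:

```python
from fractions import Fraction

S = 5  # assumed sibship size: 1700 offspring / 340 families
# COUNTS[(b, r)] = number of families with b probands and r affected offspring.
COUNTS = {(1, 1): 140, (1, 2): 80, (1, 3): 35, (1, 4): 4,
          (2, 2): 52, (2, 3): 12, (2, 4): 7, (2, 5): 1,
          (3, 3): 7, (4, 4): 2}


def proband_method(counts, s):
    """Each proband contributes its siblings; the proband itself is left out."""
    affected_sibs = sum(n * b * (r - 1) for (b, r), n in counts.items())
    total_sibs = sum(n * b * (s - 1) for (b, r), n in counts.items())
    proband_sibs = sum(n * b * (b - 1) for (b, r), n in counts.items())
    return Fraction(affected_sibs, total_sibs), Fraction(proband_sibs, affected_sibs)


p_tilde, pi_tilde = proband_method(COUNTS, S)
print(float(p_tilde), float(pi_tilde))
```

These counts reproduce the ratios 430/1728 and 210/430 quoted in the example.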
Singles Method
A single is a proband who is the only
proband in a family. Singles are not
considered effective observations because
they are observed only through their affected
status.
Let d be the number of singles in a sample of
n families.
Singles Method Estimates
$$\hat{p} = \frac{\sum_{i=1}^{n} r_i - d}{\sum_{i=1}^{n} s_i - d}, \qquad \hat{\pi} = \frac{\sum_{i=1}^{n} b_i - d}{\sum_{i=1}^{n} r_i - d}$$
Singles Method Example
There are 259 singles.
$$\hat{p} = \frac{623 - 259}{1700 - 259} = 0.2526$$

$$\hat{\pi} = \frac{434 - 259}{623 - 259} = 0.481$$
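The singles-method arithmetic is simple enough to wrap in a helper. The sketch below takes the totals exactly as quoted in the example (623 affected, 1700 offspring, 434 probands, 259 singles); the function and parameter names are illustrative:

```python
def singles_method(total_affected, total_offspring, total_probands, singles):
    """Singles-method estimates: subtract the d singles from each total."""
    p_hat = (total_affected - singles) / (total_offspring - singles)
    pi_hat = (total_probands - singles) / (total_affected - singles)
    return p_hat, pi_hat


p_hat, pi_hat = singles_method(
    total_affected=623, total_offspring=1700, total_probands=434, singles=259)
print(round(p_hat, 4), round(pi_hat, 3))
```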
Variance For Proband and
Singles Method
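The proband and singles methods both give
estimators that are the quotient of two
random variables. Approximate equations for
variance exist:

$$\mathrm{Var}\left(\frac{X}{Y}\right) \approx \left[\frac{E(X)}{E(Y)}\right]^2 \left[\frac{\mathrm{Var}(X)}{E(X)^2} + \frac{\mathrm{Var}(Y)}{E(Y)^2} - \frac{2\,\mathrm{Cov}(X,Y)}{E(X)\,E(Y)}\right]$$

$$E\left(\frac{X}{Y}\right) \approx \frac{E(X)}{E(Y)} - \frac{\mathrm{Cov}(X,Y)}{E(Y)^2} + \frac{E(X)\,\mathrm{Var}(Y)}{E(Y)^3}$$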
Likelihood Method
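P(offspring is proband) = πp.
P(family ascertained) = 1 - (1 - πp)^s.
P(r affected and b probands | ascertained)
then is:

$$P(X = r, B = b \mid B > 0;\, s, \pi, p) = \frac{P(B = b \mid X = r;\, \pi)\,P(X = r;\, s, p)}{P(B > 0;\, s, \pi, p)} = \frac{\binom{r}{b}\pi^{b}(1-\pi)^{r-b}\,\binom{s}{r}p^{r}(1-p)^{s-r}}{1 - (1-\pi p)^{s}}$$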
Likelihood Method (contd)
Each family is an independent observation
(assuming they are not related).
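Newton-Raphson multiple parameter update:

$$\begin{pmatrix} p_{m+1} \\ \pi_{m+1} \end{pmatrix} = \begin{pmatrix} p_m \\ \pi_m \end{pmatrix} + I(p_m, \pi_m)^{-1}\, S(p_m, \pi_m)$$

where S is the score vector and I is the information matrix.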
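A Newton-Raphson fit of (p, π) can be sketched as follows. Several things here are assumptions rather than the lecture's own procedure: the data are the (probands, affected) counts from the truncated-ascertainment example with s = 5, the derivatives are approximated by finite differences instead of analytic score and information expressions, and the starting point (0.3, 0.5) is arbitrary:

```python
from math import comb, log

S = 5  # assumed sibship size: 1700 offspring / 340 families
# COUNTS[(b, r)] = number of families with b probands and r affected offspring.
COUNTS = {(1, 1): 140, (1, 2): 80, (1, 3): 35, (1, 4): 4,
          (2, 2): 52, (2, 3): 12, (2, 4): 7, (2, 5): 1,
          (3, 3): 7, (4, 4): 2}


def log_likelihood(p, pi):
    """Sum over families of log P(r affected, b probands | ascertained)."""
    log_ascertained = log(1 - (1 - pi * p) ** S)
    total = 0.0
    for (b, r), families in COUNTS.items():
        total += families * (
            log(comb(r, b)) + b * log(pi) + (r - b) * log(1 - pi)
            + log(comb(S, r)) + r * log(p) + (S - r) * log(1 - p)
            - log_ascertained)
    return total


def newton_raphson(f, p, pi, h=1e-4, iterations=25):
    """Maximize f(p, pi) using finite-difference gradient and Hessian."""
    for _ in range(iterations):
        f0 = f(p, pi)
        g_p = (f(p + h, pi) - f(p - h, pi)) / (2 * h)
        g_pi = (f(p, pi + h) - f(p, pi - h)) / (2 * h)
        h_pp = (f(p + h, pi) - 2 * f0 + f(p - h, pi)) / h**2
        h_pipi = (f(p, pi + h) - 2 * f0 + f(p, pi - h)) / h**2
        h_ppi = (f(p + h, pi + h) - f(p + h, pi - h)
                 - f(p - h, pi + h) + f(p - h, pi - h)) / (4 * h**2)
        det = h_pp * h_pipi - h_ppi**2
        # Newton step theta <- theta - H^{-1} g, with the 2x2 inverse written out.
        step_p = (h_pipi * g_p - h_ppi * g_pi) / det
        step_pi = (h_pp * g_pi - h_ppi * g_p) / det
        p, pi = p - step_p, pi - step_pi
    return p, pi


p_hat, pi_hat = newton_raphson(log_likelihood, 0.3, 0.5)
print(round(p_hat, 3), round(pi_hat, 3))
```

Because the Hessian H is the negative of the observed information, subtracting H⁻¹g is the same update as adding I⁻¹S.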
Likelihood Method: Testing
Hypotheses
Nested models can be tested using the log
likelihood ratio.
Interesting hypotheses include:
Complete ascertainment: π = 1
Recessive inheritance: p = 0.25
Complete ascertainment and recessive
inheritance: π = 1 and p = 0.25
Likelihood Method: Sample
Results
Model                          p      π      chi-square   df   p-value
general                        0.25   0.48   0            0    -
ca (complete ascertainment)    0.31   1      11736        1    0
recessive                      0.25   0.47   0.04         1    0.84

Cannot reject the recessive disease hypothesis. Can reject the
complete ascertainment hypothesis.
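The likelihood ratio test of the recessive hypothesis (p = 0.25 against p free) can be sketched as below. This rests on several assumptions: the data are the (probands, affected) counts from the truncated-ascertainment example with s = 5, the maximization uses a nested ternary search chosen here for brevity (it presumes the log-likelihood is single-peaked), and the p-value uses the chi-square distribution with 1 degree of freedom via `math.erfc`:

```python
from math import comb, erfc, log, sqrt

S = 5  # assumed sibship size: 1700 offspring / 340 families
# COUNTS[(b, r)] = number of families with b probands and r affected offspring.
COUNTS = {(1, 1): 140, (1, 2): 80, (1, 3): 35, (1, 4): 4,
          (2, 2): 52, (2, 3): 12, (2, 4): 7, (2, 5): 1,
          (3, 3): 7, (4, 4): 2}


def log_likelihood(p, pi):
    """Sum over families of log P(r affected, b probands | ascertained)."""
    log_ascertained = log(1 - (1 - pi * p) ** S)
    total = 0.0
    for (b, r), families in COUNTS.items():
        total += families * (
            log(comb(r, b)) + b * log(pi) + (r - b) * log(1 - pi)
            + log(comb(S, r)) + r * log(p) + (S - r) * log(1 - p)
            - log_ascertained)
    return total


def argmax(f, lo=1e-6, hi=1 - 1e-6, iterations=60):
    """Ternary search for the maximizer of a single-peaked function on (lo, hi)."""
    for _ in range(iterations):
        a, b = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if f(a) < f(b):
            lo = a
        else:
            hi = b
    return (lo + hi) / 2


def profile(p):
    """Log-likelihood at p after maximizing over pi."""
    return log_likelihood(p, argmax(lambda pi: log_likelihood(p, pi)))


p_general = argmax(profile)                      # general model: p and pi both free
statistic = 2 * (profile(p_general) - profile(0.25))
p_value = erfc(sqrt(statistic / 2))              # chi-square survival function, df = 1
print(round(statistic, 3), round(p_value, 2))
```

A small statistic with a large p-value means fixing p at 0.25 costs almost nothing in fit, in line with the "recessive" row of the results.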
Rejecting a Null Hypothesis
Not a single locus.
Ascertainment procedure.
Selection
Environmental effects mimicking phenotype.
Incomplete penetrance.
More Complex Ascertainment
Models
We have considered ascertainment
procedures where the probability of
ascertainment was of the form:
$$p_r = 1 - (1-\pi)^r$$

Allowing π to vary covers a wide number of
cases, but not all. It still imposes a functional
relationship.
Summary
Ascertainment procedure and the impact of
sampling.
Segregation analysis when the ascertainment
procedure is nonrandom. Specifically, recessive
trait.
Truncated ascertainment vs. incomplete
ascertainment.
Proband method; singles method; likelihood
method.