04/02/09 Overview to genetic model based


Biomath-HG 207B/ Biostat 237
February 5, 2004
Lecture 5
Linkage Analysis = Gene Mapping
Think of the markers as genetic guideposts along the
chromosome.
[Figure: a chromosome with numbered markers 1 through 11 along its length. The legend distinguishes the markers from the location of the disease susceptibility gene.]
The distortion in the expected segregation pattern of 1/4:1/4:1/4:1/4
depends on the frequency of crossovers between the two loci. The
frequency is dependent on the distance between the two loci.
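The two members of the same autosomal pair duplicate and
pair up.
[Figure: a chromosome pair with alleles A1/A2 at one locus and B1/B2 at the other, shown duplicating and pairing.]
Let the crossover frequency be 10%, so that the recombination
fraction q = 10%. The gametes then occur in these proportions:
  A1 B1   45%
  A1 B2    5%
  A2 B1    5%
  A2 B2   45%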
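The 45% / 5% / 5% / 45% split follows directly from the recombination fraction. The sketch below assumes the parent carries haplotypes A1-B1 and A2-B2 (read off the gamete percentages, not stated explicitly on the slide); the function name `gamete_frequencies` is illustrative only.

```python
def gamete_frequencies(q):
    """Gamete frequencies for a parent with haplotypes A1-B1 / A2-B2 (assumed phase).

    q is the recombination fraction between the two loci (0 <= q <= 0.5).
    """
    if not 0.0 <= q <= 0.5:
        raise ValueError("recombination fraction must lie in [0, 0.5]")
    return {
        "A1-B1": (1 - q) / 2,  # nonrecombinant
        "A1-B2": q / 2,        # recombinant
        "A2-B1": q / 2,        # recombinant
        "A2-B2": (1 - q) / 2,  # nonrecombinant
    }

freqs = gamete_frequencies(0.10)
for gamete, freq in freqs.items():
    print(f"{gamete}: {freq:.0%}")
```

At q = 1/2 all four gametes have frequency 1/4, which is the undistorted 1/4:1/4:1/4:1/4 segregation pattern.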
LOD = log base 10 of the odds in favor of linkage
= log base 10 (probability of the data when the
recombination fraction equals q / probability of the data
when the recombination fraction equals ½)
The pedigrees are independent of one another so we
sum their contribution to the overall LOD.
LOD = LOD(family 1)+ LOD(family 2) + ...
Calculate LOD for different values of q. The value of q
that maximizes the LOD is the most likely value of q.
In certain cases it is easy to calculate the LOD and
find the value of q that maximizes it: for example,
when the disease is due to a single locus
acting in a Mendelian manner with complete
penetrance.
It is particularly easy when the phase is known, that is,
when we know whether the disease allele and a marker
allele were inherited from the same parent. In this case,
we just need to count the number of recombinants
(crossovers), r, among the n informative meioses.
The contribution to the overall lod score for this family
for specified q is
\[
\mathrm{LOD}
= \log_{10}\left[\frac{\left(\frac{q}{2}\right)^{r}\left(\frac{1-q}{2}\right)^{n-r}}{\left(\frac{1/2}{2}\right)^{r}\left(\frac{1-1/2}{2}\right)^{n-r}}\right]
= \log_{10}\left[\frac{q^{r}(1-q)^{n-r}}{\left(\frac{1}{2}\right)^{n}}\right]
= \log_{10}\left[2^{n}\, q^{r}(1-q)^{n-r}\right]
\]
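The phase-known formula and the rule that independent pedigrees add can be sketched in a few lines of Python. The three families in the list are hypothetical (r, n) counts invented for illustration, and the function names are not from the lecture.

```python
import math

def lod_phase_known(q, r, n):
    """LOD = log10(2^n * q^r * (1 - q)^(n - r)) for r recombinants in n informative meioses."""
    if not 0.0 < q <= 0.5:
        raise ValueError("q must lie in (0, 0.5]")
    return n * math.log10(2) + r * math.log10(q) + (n - r) * math.log10(1 - q)

# Hypothetical phase-known families, each summarized as (r, n).
families = [(1, 5), (0, 4), (2, 10)]

def total_lod(q, families):
    """Independent pedigrees contribute additively to the overall LOD."""
    return sum(lod_phase_known(q, r, n) for r, n in families)

for q in (0.05, 0.10, 0.20, 0.30):
    print(f"q = {q:.2f}   total LOD = {total_lod(q, families):.3f}")
```

Because the phase-known formula is linear in r and n on the log scale, summing the family LODs gives the same value as applying the formula once to the pooled counts.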
Example of calculating the LOD when phase is known
Let the disease be rare, dominant and fully penetrant
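[Pedigree figure; marker genotypes by individual]
Grandparents: 1 A-A, 2 A-B
Parents: 3 (mother) A-B, 4 (father) B-B
Children: 5 A-B, 6 B-B, 7 B-B, 8 B-B, 9 B-B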
We know that mother 3 inherited the disease allele, D, and marker
allele B from her father, and the normal allele, d, and marker
allele A from her mother. Father 4 is homozygous, d-d and B-B.
Three children inherited D and B from the mother, and one child
inherited d and A from the mother; these four are nonrecombinants.
Child 7 inherited d and B from the mother, a recombinant. So r = 1
and n = 5. If q = 0.20, then the contribution to the lod is
log10{2^5 (1/5)(4/5)^4} = 0.42
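The arithmetic behind this value can be written out step by step, using r = 1 recombinant among n = 5 informative meioses as counted in the example. The last line, which notes where the LOD for this one family peaks, is an added observation and not something stated on the slide.

```latex
\begin{aligned}
\mathrm{LOD}(q = 0.20) &= \log_{10}\left[2^{5}\left(\tfrac{1}{5}\right)^{1}\left(\tfrac{4}{5}\right)^{4}\right] \\
&= \log_{10}\left(32 \times 0.2 \times 0.4096\right) \\
&= \log_{10}(2.62144) \approx 0.42 \\
\frac{d}{dq}\left[r\log q + (n-r)\log(1-q)\right] = 0 &\;\Rightarrow\; \hat{q} = \frac{r}{n} = \frac{1}{5}
\end{aligned}
```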
Example of calculating the LOD when phase is unknown
Let the disease be rare, dominant and fully penetrant
[Pedigree figure; marker genotypes by individual]
Parents: 3 (mother) A-B, 4 (father) B-B
Children: 5 A-B, 6 B-B, 7 B-B, 8 B-B, 9 B-B
We no longer know if mother 3 inherited D and B from the
same parent. There are two equally probable possibilities:
mother got D and B from one parent and d and A
from the other, or she got D and A from one parent and
d and B from the other. These are mutually exclusive
events.
Prob(data) = Prob(data | mom is D,B/d,A) P(D,B/d,A) +
Prob(data | mom is D,A/d,B) P(D,A/d,B)
If mom got D and B from one parent, then there are 1 recombinant
and 4 nonrecombinants among the informative meioses.
If mom got D and A from one parent, then there are 4 recombinants
and 1 nonrecombinant.
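\[
\begin{aligned}
\mathrm{Prob}(\text{data})
&= \mathrm{Prob}(\text{data} \mid 3 \text{ is } D,B/d,A)\,\mathrm{Prob}(D,B/d,A)
 + \mathrm{Prob}(\text{data} \mid 3 \text{ is } D,A/d,B)\,\mathrm{Prob}(D,A/d,B) \\
&= \left(\frac{q}{2}\right)^{r}\left(\frac{1-q}{2}\right)^{n-r}\frac{1}{2}
 + \left(\frac{1-q}{2}\right)^{r}\left(\frac{q}{2}\right)^{n-r}\frac{1}{2}
\end{aligned}
\]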
The contribution to the overall lod score for this family is
\[
\begin{aligned}
\mathrm{LOD}
&= \log_{10}\left[\frac{\left(\frac{q}{2}\right)^{r}\left(\frac{1-q}{2}\right)^{n-r}\frac{1}{2} + \left(\frac{1-q}{2}\right)^{r}\left(\frac{q}{2}\right)^{n-r}\frac{1}{2}}{\left(\frac{1/2}{2}\right)^{r}\left(\frac{1-1/2}{2}\right)^{n-r}\frac{1}{2} + \left(\frac{1-1/2}{2}\right)^{r}\left(\frac{1/2}{2}\right)^{n-r}\frac{1}{2}}\right] \\
&= \log_{10}\left[\frac{q^{r}(1-q)^{n-r} + (1-q)^{r}q^{n-r}}{\left(\frac{1}{2}\right)^{n-1}}\right] \\
&= \log_{10}\left\{2^{n-1}\left[q^{r}(1-q)^{n-r} + (1-q)^{r}q^{n-r}\right]\right\}
\end{aligned}
\]
For our family with 5 sibs (r = 1, n = 5), the contribution to
the overall lod when q is 0.20 is
log10{2^4 [(1/5)(4/5)^4 + (4/5)(1/5)^4]} = log10(1.3312) = 0.12
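Evaluating the phase-known and phase-unknown formulas for the same family shows how much information is lost when phase is unknown. This is a minimal sketch, assuming r = 1 and n = 5 as in the example family; the function names are illustrative only.

```python
import math

def lod_phase_unknown(q, r, n):
    """LOD = log10(2^(n-1) * [q^r (1-q)^(n-r) + (1-q)^r q^(n-r)]).

    r counts recombinants under one of the two possible phases; under the
    other phase the same meioses give n - r recombinants.
    """
    phase_1 = q ** r * (1 - q) ** (n - r)
    phase_2 = (1 - q) ** r * q ** (n - r)
    return math.log10(2 ** (n - 1) * (phase_1 + phase_2))

def lod_phase_known(q, r, n):
    """LOD = log10(2^n * q^r * (1-q)^(n-r)), for comparison."""
    return math.log10(2 ** n * q ** r * (1 - q) ** (n - r))

unknown = lod_phase_unknown(0.20, r=1, n=5)
known = lod_phase_known(0.20, r=1, n=5)
print(f"phase unknown: {unknown:.2f}   phase known: {known:.2f}")
```

The phase-unknown LOD is unchanged when r is swapped with n - r, since both phases are already averaged over.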
Plot LOD by recombination fraction.
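[Figure: LOD (vertical axis, 0 to 5) plotted against the recombination fraction q (horizontal axis, 0 to 0.5).]
\[
\mathrm{lod} = \log_{10}\left[\frac{\mathrm{prob}(Y \mid q)}{\mathrm{prob}(Y \mid q = \tfrac{1}{2})}\right]
\]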
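A LOD curve like the one plotted can be tabulated from any likelihood function by dividing by its value at q = 1/2. The sketch below is a rough illustration: `lod_curve` and `example_likelihood` are hypothetical names, and the data (two phase-known families with r = 1 of n = 5 and r = 0 of n = 6) are invented, so the numbers do not reproduce the curve on the slide.

```python
import math

def lod_curve(likelihood, grid):
    """Return [(q, LOD(q))] with LOD(q) = log10(likelihood(q) / likelihood(0.5))."""
    null = likelihood(0.5)
    return [(q, math.log10(likelihood(q) / null)) for q in grid]

def example_likelihood(q):
    """Hypothetical data: two phase-known families with (r, n) = (1, 5) and (0, 6).

    Constant factors cancel in the likelihood ratio, so they are omitted.
    """
    return (q ** 1 * (1 - q) ** 4) * (q ** 0 * (1 - q) ** 6)

grid = [i / 100 for i in range(1, 51)]  # q = 0.01, 0.02, ..., 0.50
curve = lod_curve(example_likelihood, grid)

# The grid point with the largest LOD is the estimate of q.
q_hat, max_lod = max(curve, key=lambda point: point[1])

for q, lod in curve[4::5]:  # print every fifth grid point
    print(f"q = {q:.2f}   LOD = {lod:6.3f}")
print(f"maximum near q = {q_hat:.2f}")
```

A grid scan only locates the maximum to the grid spacing; a finer grid or a numerical optimizer would be needed for a more precise estimate.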
Comments about using pedigrees to locate genes:
(1) Statistical mapping (linkage analysis) is the first
step in the strategy of positional cloning of disease
loci. Prior to cloning, linkage analysis can be used
for risk prediction.
(2) An unqualified success for mapping disease genes
with regular Mendelian patterns of inheritance,
e.g., cystic fibrosis and hypercholesterolemia.
(3) Based on the fact that genes close in distance
violate Mendel’s law of independent assortment of
loci. The actual relationship between physical
distance and genetic distance is complicated.
(4) Trait loci are positioned (mapped) relative to nondisease markers.
The genetic model based methods have been
successful in localizing genes for single gene traits with
high penetrance (Mendelian traits).
In genetic model based linkage analysis we need to know or
estimate:
(1) Penetrance: What is the probability of having the
trait given a particular genotype? What is the
probability of the marker phenotype given the
marker genotype?
(2) Priors: Marker and gene genotype frequencies
must be determined.
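(3) Transmission Probabilities: How much
recombination is there between genes and
markers?
These three components enter the likelihood of the trait data Y and marker data M:
\[
\begin{aligned}
L(Y, M \mid \Omega)
&= \sum_{g_1}\sum_{m_1}\cdots\sum_{g_s}\sum_{m_s} P(Y, M \mid g, m)\,P(g, m) \\
&= \sum_{g_1}\sum_{m_1}\cdots\sum_{g_s}\sum_{m_s}\;
   \prod_{i=1}^{s} P(Y_i \mid g_i)\,P(M_i \mid m_i)
   \prod_{j \in \text{founders}} P(g_j)\,P(m_j)
   \prod_{\{k,l,n\},\; k \in \text{offspring}} P(g_k, m_k \mid g_l, m_l, g_n, m_n, q)
\end{aligned}
\]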
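For a small nuclear family, the likelihood above can be evaluated by brute-force enumeration of every genotype configuration. The sketch below is one hedged way to do this for the phase-unknown example family, not the algorithm used by any particular linkage program. Several inputs are assumptions: the children's affection status is inferred from which of them received D (children 6, 8 and 9 affected, 5 and 7 unaffected), the allele frequencies `p_D` and `p_A` are hypothetical, and founders are taken to be in Hardy-Weinberg and linkage equilibrium.

```python
import math
from itertools import product

# Haplotype = (trait allele, marker allele); diplotype = ordered pair
# (haplotype received from father, haplotype received from mother).
HAPLOTYPES = list(product("Dd", "AB"))
DIPLOTYPES = list(product(HAPLOTYPES, repeat=2))

def prior(diplotype, p_D, p_A):
    """Founder prior, assuming Hardy-Weinberg and linkage equilibrium."""
    prob = 1.0
    for trait, marker in diplotype:
        prob *= p_D if trait == "D" else 1 - p_D
        prob *= p_A if marker == "A" else 1 - p_A
    return prob

def penetrance(person, diplotype):
    """P(Y_i | g_i) P(M_i | m_i): dominant, fully penetrant trait; error-free marker typing."""
    (t1, m1), (t2, m2) = diplotype
    trait_ok = ("D" in (t1, t2)) == person["affected"]
    marker_ok = sorted((m1, m2)) == sorted(person["marker"])
    return 1.0 if trait_ok and marker_ok else 0.0

def transmission(haplotype, parent, q):
    """P(parent passes on this haplotype) at recombination fraction q."""
    (t1, m1), (t2, m2) = parent
    gametes = [((t1, m1), (1 - q) / 2), ((t2, m2), (1 - q) / 2),
               ((t1, m2), q / 2), ((t2, m1), q / 2)]
    return sum(p for g, p in gametes if g == haplotype)

def likelihood(father, mother, children, q, p_D=0.001, p_A=0.5):
    """Sum over all parental diplotypes of prior x penetrance x transmission."""
    total = 0.0
    for gf, gm in product(DIPLOTYPES, repeat=2):
        term = (prior(gf, p_D, p_A) * penetrance(father, gf)
                * prior(gm, p_D, p_A) * penetrance(mother, gm))
        for child in children:  # children are independent given the parents
            term *= sum(transmission(hf, gf, q) * transmission(hm, gm, q)
                        * penetrance(child, (hf, hm))
                        for hf, hm in DIPLOTYPES)
        total += term
    return total

father = {"affected": False, "marker": "BB"}   # individual 4
mother = {"affected": True, "marker": "AB"}    # individual 3
children = [
    {"affected": False, "marker": "AB"},  # child 5 (assumed unaffected)
    {"affected": True, "marker": "BB"},   # child 6 (assumed affected)
    {"affected": False, "marker": "BB"},  # child 7 (assumed unaffected)
    {"affected": True, "marker": "BB"},   # child 8 (assumed affected)
    {"affected": True, "marker": "BB"},   # child 9 (assumed affected)
]

lod = math.log10(likelihood(father, mother, children, 0.20)
                 / likelihood(father, mother, children, 0.50))
print(f"LOD at q = 0.20: {lod:.2f}")
```

Under these assumptions the founder priors cancel in the ratio, and the enumeration agrees with the closed-form phase-unknown LOD for r = 1, n = 5. Full enumeration grows exponentially with pedigree size, so it is practical only for very small families.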
Does genetic model based linkage work with
complex traits?
Yes, provided the model is correctly (or nearly correctly)
specified. It has worked with complex diseases (examples:
Alzheimer's disease, breast cancer, hemolytic disease of
the newborn) if one can identify a population, the
environmental factors involved, etc. In some cases, there
is a subpopulation where the trait seems to be segregating
in a Mendelian (single gene) manner.
Model Based Linkage Analysis has worked best when the
disease gene is necessary and sufficient for disease
expression.
Genetic model based linkage analysis can give misleading
results if
(1) Incorrect ascertainment correction is used;
(2) Phenotypes are incorrectly classified (the penetrance
is misspecified);
(3) The degree of heterogeneity is misspecified
(populations differ in the causal gene); or
(4) The mode of transmission is misspecified.
(Clerget-Darpoux, 1986; J. Ott, 1991)
All of these problems can occur in linkage analysis of a
complex trait.