“Complex” inheritance - CSC's mainpage - CSC


Linkage analysis, linkage disequilibrium analysis, and joint analysis
Linkage and linkage disequilibrium analyses
[Figure: linkage analysis and LD analysis along a time axis]
LD analysis = simplified and approximate linkage analysis on an extremely large pedigree (i.e., a population) of unknown structure
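Possible tests of linkage and/or LD
• linkage in absence of LD: H0 no linkage, no LD; H1 linkage, no LD
• LD in absence of linkage (?): H0 no linkage, no LD; H1 no linkage, LD
• LD given linkage: H0 linkage, no LD; H1 linkage, LD
• linkage given LD: H0 no linkage, LD (?); H1 linkage, LD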
Linkage and linkage disequilibrium analysis
Why joint analysis of linkage and LD?
• to use as much information as possible from a dataset to map the position of a locus
• If a significant linkage signal has been obtained in a dataset, then the locus to be mapped evidently plays a substantial role in the etiology of the studied trait in that dataset. Therefore, the same (rather than a different) dataset should be used for fine mapping, e.g. by LD analysis.
• joint analysis is often more than the sum of its parts
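Evidence of linkage can provide genotype and phase information to LD analysis (and vice versa).
[Figure: a parent heterozygous D/+ at the trait locus and 1/2 at a marker, with affected offspring; the parental phase can be either I or II. Without linkage, Pr(I) = Pr(II); with linkage, the marker alleles transmitted to the affected offspring make one phase more likely, Pr(I) ≠ Pr(II).]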
• Differences in typical ascertainment protocol: ascertaining affected singletons, or trios with an affected offspring (as is typical in LD analysis), does not normally enrich for an underlying genetic etiology of the trait, whereas ascertainment on the basis of multiple affected individuals per family (as is typical in linkage analysis) often does.
Why not dispense with linkage analysis altogether and go straight for genome-wide association analysis?
This is essentially what proponents of the haplotype mapping (HapMap) project suggest.
There are at least two big problems, however.
Genetic heterogeneity
[Figure: four scenarios shown over time: locus homogeneity with allelic homogeneity; locus homogeneity with allelic heterogeneity; locus heterogeneity with allelic homogeneity (at each locus); locus heterogeneity with allelic heterogeneity (at each locus)]
Association analysis is much more susceptible to allelic heterogeneity than linkage analysis
[Figure: pedigrees (D/+ individuals) in which different disease alleles D segregate. LD analysis: not okay. Linkage analysis: okay if the different D alleles are alleles of the same locus.]
[Figure: retinitis pigmentosa (RP), a “Mendelian” disease, influenced by sex-linked dominant, sex-linked recessive, autosomal dominant, and autosomal recessive alleles at many loci (RP1, RP15, RP2, RP3, RP7, RP11, CHM, RP9, RP12, Peripherin-RDS, ROM1, RP13, Rhodopsin, RP14, CNCG, LCA1, PDEA, PDEB, ABCR), together with other genetic factors and environmental and cultural factors, with effects on quality of life and quantity of life]
Sample size requirements to detect an RP gene by affected sib-pair analysis
[Figure: number of sib-pairs required for 95% power at a lod score of 3 (y-axis, two scales: up to 5,000 and up to 100,000) vs. proportion of families with disease alleles in a given gene (x-axis, 0 to 0.3); separate curves for autosomal dominant RP (ADRP) and autosomal recessive RP (ARRP)]
Sample size requirements to detect a rhodopsin allele as a risk factor for RP by TDT analysis
[Figure: number of triads required for 95% power at a p-value of 0.0001 (y-axis, up to 500,000) vs. relative frequency of a given rhodopsin risk allele (x-axis, 0 to 0.25); separate curves labeled 0.2, 0.3, 0.4, and 0.5]
Allelic heterogeneity: examples of the COL1A1 and COL1A2 genes
[Figure from: Weiss KM (1993) Genetic variation and human disease: principles and evolutionary approaches. Cambridge University Press, Cambridge]
The genome-wide number of tests is much larger for association analysis than for linkage analysis, so a more stringent test criterion is required for association analysis.
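The multiple-testing point can be made concrete with a minimal Python sketch of a Bonferroni-corrected per-test threshold. The function name `bonferroni_threshold` is illustrative. The test counts (400 linkage markers, 500,000 association tests) are hypothetical and do not come from these slides.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test p-value threshold that keeps the family-wise error rate <= alpha."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


if __name__ == "__main__":
    # Hypothetical test counts, chosen only to contrast the two kinds of scan.
    scans = {"linkage scan (hypothetical)": 400,
             "association scan (hypothetical)": 500_000}
    for label, n_tests in scans.items():
        threshold = bonferroni_threshold(0.05, n_tests)
        print(f"{label}: {n_tests} tests, per-test threshold {threshold:.2e}")
```

Bonferroni treats every test as a separate worst case. Tests at nearby markers are correlated through LD, so the correction is conservative, but it shows how quickly the required per-test p-value shrinks as the number of tests grows.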
Linkage analysis vs. LD analysis
[Figure: linkage analysis examines the recombination status (R?) of individual meioses; LD analysis, i.e. linkage analysis on a population-sized pedigree, examines the history of recombination status over multiple meioses (R in ≥1 meiosis?). R: recombination]
Haplotype conservation: example of hereditary hemochromatosis
[Figure from: Thomas et al (1998) A haplotype and linkage disequilibrium analysis of the hereditary hemochromatosis gene region. Hum Genet 102:517]
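Decay of LD by recombination
[Figure: degree of LD (0 to max) vs. genetic distance in Morgans (-0.2 to 0.2) after 0, 1, 5, 10, 20, 50, and 100 generations]
decay rate: $\Delta_t = (1-\theta)^t\,\Delta_0$, where $\theta$ is the recombination fraction between the loci and $t$ the number of generations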
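The decay formula can be tabulated with a short Python sketch. Converting map distance (M) to the recombination fraction θ with Haldane's map function is an assumption made here, because the slide does not say which map function the figure used. The model also ignores drift, selection, mutation, and migration. Function names are illustrative.

```python
import math


def haldane_theta(distance_morgans: float) -> float:
    """Recombination fraction from map distance, assuming Haldane's map function
    (no crossover interference)."""
    if distance_morgans < 0:
        raise ValueError("distance must be non-negative")
    return 0.5 * (1.0 - math.exp(-2.0 * distance_morgans))


def ld_after(delta0: float, theta: float, generations: int) -> float:
    """LD remaining after `generations` of random mating: (1 - theta)^t * delta0."""
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    if generations < 0:
        raise ValueError("generations must be non-negative")
    return delta0 * (1.0 - theta) ** generations


if __name__ == "__main__":
    distances = [0.0, 0.01, 0.05, 0.1, 0.2]  # Morgans, as on the figure's x-axis
    for t in [0, 1, 5, 10, 20, 50, 100]:
        # Fraction of the initial LD that survives t generations at each distance.
        row = [ld_after(1.0, haldane_theta(d), t) for d in distances]
        print(f"t={t:3d}: " + "  ".join(f"{x:.3f}" for x in row))
```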
Definition of linkage disequilibrium (LD) and allelic association
Either term refers to the situation where alleles at different loci do not occur independently of each other on haplotypes, irrespective of the underlying cause of the non-independence.
Let $a_{ij}$ denote allele $j$ at locus $i$. The two alleles $a_{11}$ and $a_{21}$ at loci 1 and 2 are in LD/allelic association if and only if
$$P(a_{11}a_{21}) \neq P(a_{11})\,P(a_{21}),$$
or, equivalently,
$$P(a_{21} \mid a_{11}) \neq P(a_{21}).$$
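A minimal Python sketch of the definition estimates both sides of each inequality from phased haplotypes. It assumes phase is known, and the haplotype sample is made up for illustration. In a finite sample the two sides will almost never be exactly equal, which is why a formal test for the presence of LD is needed.

```python
def allele_association(haplotypes, allele_locus1, allele_locus2):
    """Estimate P(a11 a21), P(a11) P(a21) and P(a21 | a11) from phased haplotypes.

    Each haplotype is a (locus-1 allele, locus-2 allele) pair.
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("need at least one haplotype")
    joint = sum(1 for h in haplotypes if h == (allele_locus1, allele_locus2)) / n
    p1 = sum(1 for h in haplotypes if h[0] == allele_locus1) / n
    p2 = sum(1 for h in haplotypes if h[1] == allele_locus2) / n
    conditional = joint / p1 if p1 > 0 else float("nan")
    return joint, p1 * p2, conditional


if __name__ == "__main__":
    # Hypothetical sample of 100 phased two-locus haplotypes.
    sample = ([("1", "1")] * 30 + [("1", "2")] * 10
              + [("2", "1")] * 20 + [("2", "2")] * 40)
    joint, product, conditional = allele_association(sample, "1", "1")
    print(f"P(a11 a21) = {joint:.2f}, P(a11) P(a21) = {product:.2f}, "
          f"P(a21 | a11) = {conditional:.2f}")
```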
Forces creating and destroying LD
[Figure from: Terwilliger, Weiss (1998) Linkage disequilibrium analysis of complex disease: fantasy or reality? Current Opinion in Biotechnology 9:578]
Sources of LD
•“founder effect” (the allele of the trait locus,
along with the surrounding haplotype, in individuals
with the trait is shared IBD from a common ancestor)
•drift (random fluctuation of haplotype frequencies
from generation to generation)
•admixture (migration between populations with
different allele frequencies at the loci of interest)
•interaction between alleles at different loci
(epistasis)
•poor matching of case and control samples
(difference in allele frequencies is unrelated to the
alleles at the trait locus; “comparison of apples and
oranges”)
presence/amount of LD is a function of the genetic distance between the loci
[Figure: two-locus haplotypes, with alleles A, B, C at one locus and 1, 2 at the other. A mutation creates allele 2 on a haplotype carrying B (B2): complete disequilibrium. Recombination creates new haplotypes such as A2: incomplete disequilibrium. As time passes and more recombination occurs, the loci reach equilibrium.]
Equilibrium: the haplotype frequencies are the product of the allele frequencies, e.g. p(A1) = p(A) p(1)
“Founder effect”
When the first copy of the trait allele is introduced into the population, the allele is present on a particular haplotype. As the allele is passed on through the generations, alleles at neighboring marker loci are cotransmitted with it (a hitchhiking effect). Recombination occasionally breaks up the haplotype, reducing the length of the conserved haplotype and the amount of LD.
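The shortening of the conserved haplotype can be illustrated with a small Python simulation. It follows a single line of descent, where each generation contributes one meiosis. It also assumes Haldane's model (no crossover interference) and chromosome arms long enough that their ends can be ignored. Under these assumptions the expected length of the conserved segment around the trait locus is 2/t Morgans after t meioses. Function names are illustrative.

```python
import random


def conserved_segment_length(meioses: int, rng: random.Random) -> float:
    """Length (Morgans) of the ancestral segment around the trait locus that is
    not broken by recombination during `meioses` meioses.

    Assumes no interference and no chromosome ends: crossovers on each side of
    the locus then form a Poisson process with rate `meioses` per Morgan.
    """
    if meioses < 1:
        raise ValueError("need at least one meiosis")
    left = rng.expovariate(meioses)   # distance to the nearest crossover on the left
    right = rng.expovariate(meioses)  # distance to the nearest crossover on the right
    return left + right


def mean_conserved_length(meioses: int, n_sim: int = 20_000, seed: int = 1) -> float:
    """Monte Carlo estimate of the mean conserved segment length."""
    rng = random.Random(seed)
    total = sum(conserved_segment_length(meioses, rng) for _ in range(n_sim))
    return total / n_sim


if __name__ == "__main__":
    for g in [5, 10, 50, 100]:
        print(f"{g:3d} meioses: simulated mean {mean_conserved_length(g):.4f} M, "
              f"expected {2 / g:.4f} M")
```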
Haplotype sharing due to a “founder effect”
The apparently unrelated individuals in the sample of individuals with the trait received the same disease allele from a common ancestor; these individuals are therefore, in reality, very distant relatives.
[Figure: example genealogy over time]
Principle behind LD mapping based on admixture
Assume that two populations, each genetically homogeneous but genetically very different from each other, colonize a previously uninhabited island. Assume that the alleles at different loci in each population are in linkage equilibrium, and that a rare “Mendelian” trait, with causative allele(s) “D”, is present in only one of the two populations.
If one sampled case and control individuals from the joint population in the initial generation, before any mating between the two colonizing populations has taken place, one would detect LD between the trait and many markers, irrespective of the genetic distance between the loci. This is because all cases would have been ascertained from the population harboring the trait, so the marker allele frequencies of cases and controls would differ for any marker whose allele frequencies differ between the two colonizing populations. (This is equivalent to getting “false positives” due to poorly matched case and control groups.)
Assume that there is subsequently random mating in the joint population. Recombination will then rapidly erode the initial LD for all markers except those tightly linked to the trait locus. If one sampled cases and controls after several generations of random mating, one would therefore detect LD only with markers near the trait locus, which demonstrates the potential usefulness of admixture-based LD mapping.
Be aware that admixture creates LD between a pair of loci only if the founding populations have different allele frequencies at both loci.
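The admixture argument can be checked numerically with a short Python sketch. It assumes two founding populations, each in linkage equilibrium, mixed in proportions m and 1 - m, followed by random mating with no drift or selection. Under these assumptions the initial LD is D = m(1 - m)(p_A1 - p_A2)(p_B1 - p_B2). This vanishes whenever the two populations share allele frequencies at either locus. The allele frequencies and function names are hypothetical.

```python
def admixture_ld(m, p_a1, p_a2, p_b1, p_b2):
    """LD (D) at the moment of mixing.

    A fraction m of the founders comes from population 1 and 1 - m from
    population 2, each in linkage equilibrium. p_a1/p_a2 and p_b1/p_b2 are the
    frequencies of one allele at loci A and B in populations 1 and 2.
    """
    if not 0.0 <= m <= 1.0:
        raise ValueError("m must lie in [0, 1]")
    # Haplotype frequency in the mixture minus the product of allele frequencies.
    p_ab = m * p_a1 * p_b1 + (1 - m) * p_a2 * p_b2
    p_a = m * p_a1 + (1 - m) * p_a2
    p_b = m * p_b1 + (1 - m) * p_b2
    return p_ab - p_a * p_b  # equals m (1 - m) (p_a1 - p_a2) (p_b1 - p_b2)


def ld_after_random_mating(d0, theta, generations):
    """Remaining LD after random mating: (1 - theta)^t * d0, ignoring drift and selection."""
    return d0 * (1 - theta) ** generations


if __name__ == "__main__":
    d0 = admixture_ld(0.5, 0.9, 0.1, 0.8, 0.2)  # hypothetical allele frequencies
    for theta in [0.001, 0.01, 0.1, 0.5]:
        print(f"theta={theta}: D after 10 generations = "
              f"{ld_after_random_mating(d0, theta, 10):.4f} (initial {d0:.4f})")
```

Tightly linked markers (small theta) keep most of the initial LD after 10 generations, while unlinked markers (theta = 0.5) lose almost all of it.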
Ideal population for LD mapping based on “founder effect”
•very small, homogeneous founder population
•rapid subsequent population growth
•for detection of LD: few generations since population was founded
•for fine mapping: many generations since population was founded
•panmixia
•no admixture
•homogeneous environment
•large enough population to have a sufficient number of individuals
with trait of interest
•availability of genealogical records, high medical standards, favorable
public and private attitudes towards genetic research
Ideal population for LD mapping based on drift
•small population size
•no population growth
•many generations since population was founded
•panmixia
•no admixture
•homogeneous environment
•large enough population to have a sufficient number of individuals
with trait of interest
•availability of genealogical records, high medical standards, favorable
public and private attitudes towards genetic research
Ideal population for LD mapping based on admixture
•admixing populations are each homogeneous and genetically very different from each other
•for detection of LD: few generations since population was founded
•for fine mapping: many generations since population was founded
•panmixia in admixed population
•no admixture after initial mixing of populations
•homogeneous environment
•large enough population to have a sufficient number of individuals
with trait of interest
•availability of genealogical records, high medical standards, favorable
public and private attitudes towards genetic research
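Measures of strength of LD
Haplotype frequencies for the alleles D and + of locus A (trait) and the alleles 1 and 2 of locus B (marker):
$$
\begin{array}{c|cc|c}
 & 1 & 2 & \\ \hline
D & p_{D1} = p_D p_1 + \Delta & p_{D2} = p_D p_2 - \Delta & p_D \\
+ & p_{+1} = p_+ p_1 - \Delta & p_{+2} = p_+ p_2 + \Delta & p_+ \\ \hline
 & p_1 & p_2 & 1
\end{array}
$$
$$\Delta = p_{D1}\,p_{+2} - p_{D2}\,p_{+1}, \qquad \Delta \in [\Delta_{\min}, \Delta_{\max}],$$
$$\Delta_{\min} = \max(-p_D p_1,\; -p_+ p_2), \qquad \Delta_{\max} = \min(p_D p_2,\; p_+ p_1),$$
$$\Delta' = \begin{cases} \Delta / \Delta_{\min} & \text{if } \Delta < 0 \\ \Delta / \Delta_{\max} & \text{if } \Delta > 0 \end{cases}, \qquad \Delta' \in [0, 1].$$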
Testing for presence of LD

                          marker allele 1   marker allele 2   total
cases (“affected”)        n11               n12               n1·
controls (“unaffected”)   n21               n22               n2·
total                     n·1               n·2               n

Do the marker alleles occur in equal proportions among the cases and controls? If not, i.e. if there is a significant difference in allele frequencies, the marker locus is probably close to the trait locus of interest, provided the other sources of LD (such as admixture or poorly matched case and control samples) can be ruled out.
null hypothesis (H0): proportions are equal
alternative hypothesis (H1): proportions are not equal (2-sided alternative)
1) Fisher’s “exact” test: computes exact p-values based on the hypergeometric distribution; computationally intensive
2) chi-squared test: uses a continuous distribution (χ²) to (approximately) represent categorical data; it is therefore only appropriate when all expected cell counts are large, say, > 5; not computationally intensive
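Fisher's exact test for a 2 × 2 table can be written in a few lines of Python by enumerating all tables with the observed margins. This sketch uses the common two-sided convention of summing the probabilities of all tables no more probable than the observed one. Other two-sided definitions exist. Comparing integer binomial-coefficient products keeps the comparison exact. The function name and counts are hypothetical.

```python
from math import comb


def fisher_exact_2x2(n11: int, n12: int, n21: int, n22: int) -> float:
    """Two-sided p-value of Fisher's exact test for the table [[n11, n12], [n21, n22]]."""
    if min(n11, n12, n21, n22) < 0:
        raise ValueError("counts must be non-negative")
    r1 = n11 + n12  # first row total (e.g. alleles carried by cases)
    c1 = n11 + n21  # first column total (e.g. copies of marker allele 1)
    n = n11 + n12 + n21 + n22

    def weight(a: int) -> int:
        # Hypergeometric numerator: ways to obtain `a` in the top-left cell given the margins.
        return comb(c1, a) * comb(n - c1, r1 - a)

    observed = weight(n11)
    lo, hi = max(0, r1 - (n - c1)), min(r1, c1)
    extreme = sum(w for w in (weight(a) for a in range(lo, hi + 1)) if w <= observed)
    return extreme / comb(n, r1)


if __name__ == "__main__":
    # Hypothetical allele counts: rows = cases/controls, columns = marker alleles 1/2.
    print(fisher_exact_2x2(8, 2, 1, 5))
```

For a 2 × 2 table the enumeration is short. The cost grows with larger samples and larger tables.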
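Chi-squared test
$$X^2 = \sum_{i=1}^{I}\sum_{j=1}^{J}\frac{(\text{obs}_{ij} - \text{exp}_{ij})^2}{\text{exp}_{ij}} = \sum_{i=1}^{I}\sum_{j=1}^{J}\frac{\left(n_{ij} - n_{i\cdot}n_{\cdot j}/n\right)^2}{n_{i\cdot}n_{\cdot j}/n} \sim \chi^2_{(I-1)(J-1)} \text{ under } H_0,$$
where the rows $i$ are cases and controls, the columns $j$ are marker alleles, and $\text{exp}_{ij} = n_{i\cdot}n_{\cdot j}/n$.
For a 2 × 2 table,
$$X^2 = \frac{n\,(n_{11}n_{22} - n_{12}n_{21})^2}{n_{1\cdot}n_{2\cdot}n_{\cdot 1}n_{\cdot 2}} \sim \chi^2_1 \text{ under } H_0.$$
Often, a “continuity correction” is applied to a 2 × 2 table:
$$X^2 = \frac{n\,\left(|n_{11}n_{22} - n_{12}n_{21}| - n/2\right)^2}{n_{1\cdot}n_{2\cdot}n_{\cdot 1}n_{\cdot 2}} \sim \chi^2_1 \text{ under } H_0.$$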
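The formulas above translate directly into Python. This sketch computes the general I × J statistic and the 2 × 2 shortcut, with and without the continuity correction, and checks that the two agree. It obtains the 1-df p-value from the identity P(χ²₁ > x) = erfc(√(x/2)). Clipping the corrected difference at zero is an implementation choice added here. Names and counts are illustrative.

```python
import math


def chi2_statistic(table):
    """Pearson X^2 for an I x J table given as a list of rows."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    x2 = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = rows[i] * cols[j] / n  # expected count under H0
            if expected == 0:
                raise ValueError("every row and column total must be positive")
            x2 += (obs - expected) ** 2 / expected
    return x2


def chi2_2x2(n11, n12, n21, n22, continuity_correction=False):
    """X^2 via the 2 x 2 shortcut formula and its 1-df upper-tail p-value."""
    n1_, n2_ = n11 + n12, n21 + n22  # row totals n_{1.}, n_{2.}
    n_1, n_2 = n11 + n21, n12 + n22  # column totals n_{.1}, n_{.2}
    n = n1_ + n2_
    denom = n1_ * n2_ * n_1 * n_2
    if denom == 0:
        raise ValueError("every row and column total must be positive")
    diff = abs(n11 * n22 - n12 * n21)
    if continuity_correction:
        diff = max(0.0, diff - n / 2)  # clipped so the correction never increases X^2
    x2 = n * diff ** 2 / denom
    return x2, math.erfc(math.sqrt(x2 / 2))  # P(chi-squared_1 > x2)


if __name__ == "__main__":
    table = [[8, 2], [1, 5]]  # hypothetical allele counts (cases, controls)
    print(chi2_statistic(table), chi2_2x2(8, 2, 1, 5),
          chi2_2x2(8, 2, 1, 5, continuity_correction=True))
```

In this example the expected counts range from about 2.6 to 5.6, which is below the rule of thumb above. Fisher's exact test would be the safer choice for these counts.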
Chi-squared test: multi-allelic marker case

                          marker allele 1   2     3     …   m     total
cases (“affected”)        n11               n12   n13   …   n1m   n1·
controls (“unaffected”)   n21               n22   n23   …   n2m   n2·
total                     n·1               n·2   n·3   …   n·m   n
Either perform a separate test for each allele individually (collapsing all other alleles into one class), i.e. m tests on 2 × 2 tables, which requires correction for multiple testing (e.g. Bonferroni correction), or perform one chi-squared test on the whole 2 × m table (with m - 1 degrees of freedom).
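Both strategies can be sketched in Python using only the standard library. For integer degrees of freedom, the chi-squared tail probability follows from the standard recurrence for the regularized upper incomplete gamma function. For the per-allele tests, the Bonferroni adjustment multiplies each p-value by m, capped at 1. Function names and allele counts are hypothetical.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """P(X > x) for X ~ chi-squared with integer df >= 1, using
    Q(s + 1, z) = Q(s, z) + z**s * exp(-z) / Gamma(s + 1) with z = x / 2."""
    if df < 1 or x < 0:
        raise ValueError("need df >= 1 and x >= 0")
    z = x / 2.0
    # Start from Q(1/2, z) = erfc(sqrt(z)) for odd df, or Q(1, z) = exp(-z) for even df.
    q, s = (math.erfc(math.sqrt(z)), 0.5) if df % 2 else (math.exp(-z), 1.0)
    while s < df / 2:
        if z > 0:
            q += math.exp(s * math.log(z) - z - math.lgamma(s + 1))
        s += 1.0
    return min(q, 1.0)


def chi2_statistic(table):
    """Pearson X^2 for a table given as a list of rows."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    x2 = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = rows[i] * cols[j] / n
            if expected == 0:
                raise ValueError("every allele must be observed at least once")
            x2 += (obs - expected) ** 2 / expected
    return x2


def multiallelic_tests(case_counts, control_counts):
    """Global 2 x m test (m - 1 df) and Bonferroni-adjusted per-allele 2 x 2 tests."""
    m = len(case_counts)
    if m < 2 or len(control_counts) != m:
        raise ValueError("need the same number (>= 2) of allele counts in both rows")
    table = [list(case_counts), list(control_counts)]
    global_p = chi2_sf(chi2_statistic(table), m - 1)
    per_allele = []
    for j in range(m):
        # Collapse all other alleles into one column: allele j vs. the rest.
        collapsed = [[row[j], sum(row) - row[j]] for row in table]
        p = chi2_sf(chi2_statistic(collapsed), 1)
        per_allele.append(min(1.0, m * p))  # Bonferroni: multiply by the number of tests
    return global_p, per_allele


if __name__ == "__main__":
    cases = [30, 50, 20]     # hypothetical allele counts among cases
    controls = [45, 40, 15]  # hypothetical allele counts among controls
    print(multiallelic_tests(cases, controls))
```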
Measured genotype analysis: a fixed-effects model in which genotype-specific means are estimated
Quantitative Trait Linkage Analysis: Variance Component Approach
Modeling the Phenotype:
$$p = \mu + \sum_i \beta_i x_i + \sum_j q_j + a + e$$
μ: baseline mean
β: regression coefficients
x: scaled covariates
q: QTL effects
a: residual genetic effects
e: random environmental effects
Genotypes as covariates
If the effect of the QTL is modeled as additive:
Genotype   Cov
AA         -1
Aa          0
aa          1
To allow for non-additive models:
Genotype   Cov1 (Add)   Cov2 (Dom)
AA         -1            0
Aa          0            1
aa          1            0
FXII levels by FXII 46C/T genotype
[Figure: FXII levels by genotype: CC 128.88, CT 92.23, TT 55.58; $p < 1 \times 10^{-7}$]
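With three genotypes, a model with an intercept and the two covariates Add and Dom from the genotype coding above is saturated. Its fitted values are therefore simply the genotype-specific means of the measured genotype analysis. The Python sketch below recovers μ, the additive effect a, and the dominance deviation d from those means. Treating CC as "AA" and TT as "aa" for the FXII example is an assumption (swapping them only flips the sign of a). The function names are illustrative.

```python
def genotype_means(phenotypes_by_genotype):
    """Mean phenotype per genotype: the fixed effects of measured genotype analysis."""
    return {g: sum(v) / len(v) for g, v in phenotypes_by_genotype.items() if v}


def additive_dominance(mean_AA, mean_Aa, mean_aa):
    """Parameters of p = mu + a*Add + d*Dom under the coding
    AA: (Add=-1, Dom=0), Aa: (Add=0, Dom=1), aa: (Add=1, Dom=0)."""
    mu = (mean_AA + mean_aa) / 2  # midpoint of the two homozygote means
    a = (mean_aa - mean_AA) / 2   # additive effect per unit of Add
    d = mean_Aa - mu              # dominance deviation of the heterozygote
    return mu, a, d


if __name__ == "__main__":
    # FXII means by 46C/T genotype from the slide, with CC as "AA" and TT as "aa".
    mu, a, d = additive_dominance(128.88, 92.23, 55.58)
    print(f"mu = {mu:.2f}, a = {a:.2f}, d = {d:.2f}")
```

For the FXII means, d is zero to within rounding: the heterozygote mean lies exactly midway between the homozygote means, which is consistent with a purely additive effect.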
Prothrombin levels by G20210A genotype
[Figure: prothrombin activity levels (%) (y-axis, 50 to 190) for genotypes G/G, G/A, and A/A; $p < 1 \times 10^{-7}$]
Disequilibrium is unpredictable. A QTL may be in equilibrium with the other polymorphisms surrounding it. Disequilibrium need not be present.
[Figure: LD within F7 gene]
[Figure: POMC, pattern of LD]
Caution!
Negative results in an association study have implications only for the marker you have tested, not necessarily for the entire candidate gene.