Transcript Intro-link

Introduction to Linkage Analysis
Pak Sham
Twin Workshop 2003
Human Genome
• 22 autosomes, XY
• 3 109 base-pairs (2 metres long)
•  2% coding sequences, rest regulatory & “junk”
•  30,000 - 40,000 genes
• Much communality with other species
Genetic Variation
• Chromosomal abnormalities
• Duplication (e.g. Down’s)
• Deletion (e.g. Velo-cardio-facial syndrome)
• Major deleterious mutations
• Usually Rare (e.g. Huntington’s)
• Polymorphisms
• Single nucleotide polymorphisms (SNPs)
• Variable length repeats (e.g. microsatellites)
• Some are functional (“normal variation”)
• Most are non-functional (neutral markers)
Genetic Mapping of Disease
• Levels of Genetic Analysis
• Estimate heritability (family, twins, adoption)
• Find chromosomal locations (linkage)
• Identify risk variants (association)
• Understand mechanisms (cell biology, etc)
• Applications
• Prediction of genetic risk
• More accurate prediction of genetic risk
• Even more accurate prediction of genetic risk; prediction of prognosis and treatment response
• Development of new drug targets
Strategies of Gene Mapping
• Functional
• Uses knowledge of disease to identify candidate genes
• Finds variants in candidate genes
• Looks for association between variants and disease
• Positional
• Systematic screen of whole genome
• Uses a set of ≈ 400 evenly-spaced markers
• Looks for markers which co-segregate with disease
Co-segregation
[Figure: pedigree with marker genotypes A3A4, A1A2, A1A3, A1A2, A1A4, A2A4, A3A4, A2A3, A3A2]
Marker allele A1 cosegregates with dominant disease
Linkage Co-segregation
[Figure: parent chromosomes and gametes, showing crossing over between homologous chromosomes]
Alleles on the same chromosome tend to stay together in meiosis; therefore they tend to be co-transmitted.
Map Distance
Map distance between two loci (Morgans)
= Expected number of crossovers per meiosis
(1 Morgan = 100 centiMorgans)
Note: Map distances are additive
Heterogeneity in recombination frequencies
Total map length: ≈ 33 Morgans
(1 cM ≈ $10^6$ base pairs)
Recombination
Parental genotypes: A1Q1 / A2Q2
Non-recombinant gametes: A1Q1, A2Q2 (total probability $1-\theta$)
Recombinant gametes: A1Q2, A2Q1 (total probability $\theta$)
Recombination Fraction
Recombination fraction ($\theta$) between two loci
= Proportion of gametes that are recombinant
with respect to the two loci
Recombination & map distance
[Figure: recombination fraction (0 to 0.5) plotted against map distance (0 to 1 M)]
Haldane map function:
$\theta = \frac{1 - e^{-2m}}{2}$
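As an illustration of the Haldane map function, here is a minimal Python sketch. The function names `haldane_theta` and `haldane_distance` are made up for this example; the inverse follows from solving the formula for m and is only defined for recombination fractions below 0.5.

```python
import math

def haldane_theta(m):
    """Recombination fraction for a map distance m in Morgans (Haldane)."""
    return 0.5 * (1.0 - math.exp(-2.0 * m))

def haldane_distance(theta):
    """Inverse of the Haldane function: map distance in Morgans."""
    if not 0.0 <= theta < 0.5:
        raise ValueError("theta must lie in [0, 0.5)")
    return -0.5 * math.log(1.0 - 2.0 * theta)

# Short distances give theta close to m; long distances approach 0.5.
for m in (0.01, 0.1, 0.5, 1.0):
    print(f"m = {m:4.2f} M  ->  theta = {haldane_theta(m):.4f}")
```

For small m the recombination fraction is nearly equal to the map distance, which is why 1 cM is often read as roughly 1% recombination.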
Double Backcross: Fully Informative Gametes
[Figure: cross aabb × AABB gives AaBb; backcross AaBb × aabb]
Non-recombinant offspring: AaBb, aabb
Recombinant offspring: Aabb, aaBb
Linkage Analysis: Fully Informative Gametes
Count Data
Recombinant Gametes: R
Non-recombinant Gametes: N
Parameter
Recombination Fraction: $\theta$
Likelihood
$L(\theta) = \theta^R (1-\theta)^N$
Estimation
$\hat{\theta} = R / (N + R)$
Chi-square
$\chi^2 = 2 \left[ R \log \hat{\theta} + N \log(1 - \hat{\theta}) - (R + N) \log(0.5) \right]$
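The estimate and test statistic for fully informative gametes can be sketched in Python as follows. The function names and the counts R = 2, N = 8 are invented for illustration, natural logarithms are assumed, and the estimate is not restricted to 0.5 or below.

```python
import math

def theta_mle(R, N):
    """Maximum-likelihood estimate of the recombination fraction."""
    return R / (R + N)

def log_lik(theta, R, N):
    """Natural-log likelihood R*log(theta) + N*log(1 - theta)."""
    ll = 0.0
    if R > 0:                      # skip the term when the count is zero (0 * log 0 = 0)
        ll += R * math.log(theta)
    if N > 0:
        ll += N * math.log(1.0 - theta)
    return ll

def linkage_chi_square(R, N):
    """Likelihood-ratio statistic comparing theta_hat with theta = 0.5."""
    theta_hat = theta_mle(R, N)
    return 2.0 * (log_lik(theta_hat, R, N) - (R + N) * math.log(0.5))

R, N = 2, 8                        # hypothetical counts
print(theta_mle(R, N), linkage_chi_square(R, N))
```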
Phase Unknown Meioses
[Figure: mating AaBb × aabb with the phase of the AaBb parent unknown]
Offspring AaBb, aabb: either non-recombinant or recombinant
Offspring Aabb, aaBb: either recombinant or non-recombinant
Mixture distribution likelihood
The probability of observed data X depends on the
status of a discrete variable G
P(X|G)
The status of G is not observed but the probability
distribution of G is available
P(G)
Then the likelihood of the observed data X is
$L = \sum_{G} P(X \mid G) P(G)$
Linkage Analysis: Phase-unknown Meioses
Count Data
Either: Recombinant Gametes: X, Non-recombinant Gametes: Y
or: Recombinant Gametes: Y, Non-recombinant Gametes: X
Likelihood
$L(\theta) = \theta^X (1-\theta)^Y + \theta^Y (1-\theta)^X$
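A rough Python sketch of the phase-unknown likelihood follows. It evaluates the two-term mixture and maximises it on a grid over 0 to 0.5; the grid search, the function names and the counts are assumptions made for this example, not a method prescribed by the slides.

```python
def phase_unknown_likelihood(theta, X, Y):
    """Sum over the two possible phases of the double-heterozygous parent."""
    return theta ** X * (1 - theta) ** Y + theta ** Y * (1 - theta) ** X

def maximise_on_grid(X, Y, steps=500):
    """Grid value of theta in [0, 0.5] with the highest likelihood."""
    grid = [0.5 * i / steps for i in range(steps + 1)]
    return max(grid, key=lambda t: phase_unknown_likelihood(t, X, Y))

X, Y = 1, 5                        # hypothetical counts under one of the two phases
print(maximise_on_grid(X, Y))
```

Because the likelihood is symmetric in X and Y, the data cannot say which phase is correct; the mixture simply lets the more likely phase dominate.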
An example of incomplete data: Mixture distribution likelihood function
Parental genotypes unknown
Offspring genotypes: AaBb, aabb, Aabb, aaBb
Likelihood will be a function of
allele frequencies (population parameters)
$\theta$ (transmission parameter)
Complex Phenotypes
Penetrance parameters
Genotype    Phenotype
            Disease    Normal
AA          f2         1 - f2
Aa          f1         1 - f1
aa          f0         1 - f0
Each phenotype is compatible with multiple genotypes.
General Pedigree Likelihood
Likelihood is a sum of products
(mixture distribution likelihood)
$L = \sum_{G} \prod_{i=1}^{n} \mathrm{pen}(x_i \mid g_i) \prod_{i=1}^{f} \mathrm{pop}(g_i) \prod_{i=f+1}^{n} \mathrm{trans}(g_i \mid g_{if}, g_{im})$
number of terms = $(m_1 m_2 \cdots m_k)^{2n}$
where $m_j$ is number of alleles at locus j
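To make the sum of products concrete, here is a brute-force Python sketch for a single diallelic disease locus. It is an illustration under stated assumptions: genotypes are coded as the number of copies of allele A (so there are 3 unordered genotypes per person rather than ordered allele pairs), founders are in Hardy-Weinberg proportions, and the pedigree encoding, function names and parameter values are all invented here.

```python
from itertools import product

def pop(g, p):
    """Hardy-Weinberg probability of g copies of allele A (frequency p)."""
    return [(1 - p) ** 2, 2 * p * (1 - p), p ** 2][g]

def trans(g, gf, gm):
    """Mendelian probability of a child with g copies given the parents' copies."""
    tf, tm = gf / 2, gm / 2        # chance that each parent transmits A
    return [(1 - tf) * (1 - tm), tf * (1 - tm) + (1 - tf) * tm, tf * tm][g]

def pen(affected, g, f):
    """Penetrance term: f[g] if affected, 1 - f[g] if normal."""
    return f[g] if affected else 1 - f[g]

def pedigree_likelihood(people, p, f):
    """people: list of (affected, father_index, mother_index); founders use None."""
    total = 0.0
    for G in product(range(3), repeat=len(people)):   # every joint genotype assignment
        term = 1.0
        for i, (affected, fa, mo) in enumerate(people):
            term *= pen(affected, G[i], f)
            term *= pop(G[i], p) if fa is None else trans(G[i], G[fa], G[mo])
        total += term
    return total

# Hypothetical trio: affected father, normal mother, affected child.
trio = [(True, None, None), (False, None, None), (True, 0, 1)]
print(pedigree_likelihood(trio, p=0.01, f=(0.001, 0.9, 0.9)))
```

The loop visits 3 to the power n assignments, which is why exhaustive summation becomes impractical for large pedigrees or many loci.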
Elston-Stewart algorithm
Reduces computations by peeling:
Step 1
Condition likelihoods of family 1 on genotype of X.
Step 2
Joint likelihood of families 2 and 1
[Figure: pedigree in which individual X connects family 1 and family 2]
Lod Score: Morton (1955)
$\mathrm{Lod}(\theta) = \log_{10} \frac{L(\theta)}{L(\theta = 0.5)}$
Lod > 3 → conclude linkage
Prior odds 1:50 × linkage ratio 1000 → Posterior odds 20:1
Lod < -2 → exclude linkage
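The lod score and the odds arithmetic can be checked with a small Python sketch. It assumes fully informative gametes with counts R and N, so that the likelihood is the binomial form used for counted recombinants; the function names are illustrative only.

```python
import math

def lod(theta, R, N):
    """log10 of L(theta) / L(0.5) for R recombinant and N non-recombinant gametes."""
    log10_l = 0.0
    if R:                          # a zero count contributes nothing
        log10_l += R * math.log10(theta)
    if N:
        log10_l += N * math.log10(1.0 - theta)
    return log10_l - (R + N) * math.log10(0.5)

def posterior_odds(prior_odds, lod_score):
    """Posterior odds of linkage = prior odds x likelihood ratio 10**lod."""
    return prior_odds * 10 ** lod_score

# Ten non-recombinant gametes and no recombinants, evaluated at theta = 0.
print(lod(0.0, 0, 10))
# Prior odds of 1:50 combined with a lod of 3.
print(posterior_odds(1 / 50, 3))
```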
Lod Score Curves
[Figure: lod score plotted against $\theta$ from 0 to 0.5]
Lod score curves are additive over pedigrees
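Additivity over pedigrees can be illustrated with a short Python sketch. The three (R, N) pairs are hypothetical pedigrees with fully informative gametes, and the grid starts at 0.01 to avoid taking the logarithm of zero.

```python
import math

def lod(theta, R, N):
    """Lod score for R recombinant and N non-recombinant gametes (0 < theta <= 0.5)."""
    return (R * math.log10(theta) + N * math.log10(1.0 - theta)
            - (R + N) * math.log10(0.5))

families = [(1, 7), (0, 5), (2, 6)]            # hypothetical (R, N) per pedigree
grid = [i / 100 for i in range(1, 51)]         # theta = 0.01, 0.02, ..., 0.50

# Total lod curve: add the per-pedigree lod scores at each theta.
total_lod = [sum(lod(t, R, N) for R, N in families) for t in grid]
best = max(range(len(grid)), key=total_lod.__getitem__)
print(grid[best], total_lod[best])
```

With fully informative gametes the summed curve equals the lod curve of the pooled counts, which is a convenient sanity check.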
Lods, chi-squares & p-values
In large samples
$2 \log_e(10) \times \text{Max lod} \sim \chi^2_1$
In small samples
$P \le 10^{-\text{Max lod}}$
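The two conversions can be sketched in Python as below. The chi-square tail probability for 1 degree of freedom is computed from the complementary error function; the function names are made up, and no one-sided adjustment is applied.

```python
import math

def lod_to_chi_square(max_lod):
    """Large-sample conversion: 2 * ln(10) * max lod."""
    return 2.0 * math.log(10.0) * max_lod

def chi_square_1df_p(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))

def small_sample_bound(max_lod):
    """Upper bound on the p-value: 10 ** (-max lod)."""
    return 10.0 ** (-max_lod)

max_lod = 3.0
chi2 = lod_to_chi_square(max_lod)
print(chi2, chi_square_1df_p(chi2), small_sample_bound(max_lod))
```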
Problems with parametric linkage
• Requires parameters of the disease model to be
specified
• Allele frequency
• Penetrances
These are generally unknown for a complex trait
• Disease model assumes that a single locus is the
only source of familial resemblance
This is generally unrealistic
Linkage Analysis
Admixture Test (CAB Smith)
Model
Probability of linkage in family = $\alpha$
Likelihood
$L(\alpha, \theta) = \alpha L(\theta) + (1 - \alpha) L(\theta = 1/2)$
Note: Another example of mixture likelihood
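A minimal Python sketch of the admixture likelihood follows. It assumes each family contributes fully informative gamete counts (R, N), multiplies the per-family mixtures by summing their logs, and uses a coarse grid search; the family data and function names are invented for illustration.

```python
import math

def family_lik(theta, R, N):
    """Likelihood of one family's counted gametes at recombination fraction theta."""
    return theta ** R * (1 - theta) ** N

def admixture_loglik(alpha, theta, families):
    """Sum over families of log[alpha * L(theta) + (1 - alpha) * L(1/2)]."""
    return sum(
        math.log(alpha * family_lik(theta, R, N)
                 + (1 - alpha) * family_lik(0.5, R, N))
        for R, N in families
    )

def fit_admixture(families, steps=50):
    """Grid search over alpha in [0, 1] and theta in (0, 0.5]."""
    best = None
    for a in range(steps + 1):
        for t in range(1, steps + 1):
            alpha, theta = a / steps, 0.5 * t / steps
            ll = admixture_loglik(alpha, theta, families)
            if best is None or ll > best[0]:
                best = (ll, alpha, theta)
    return best

# Hypothetical data: two families that look linked, two that look unlinked.
families = [(0, 8), (1, 7), (4, 4), (5, 3)]
print(fit_admixture(families))
```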
Linkage Analysis:
MOD
• Maximise lod score over several sets of disease
models, e.g. dominant, recessive, additive
• Make correction for multiple (k) models
• Adjusted lod = lod – log10(k)
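The adjustment amounts to one line of arithmetic, sketched here in Python. The model names and lod values are hypothetical, and `mod_score` is a name chosen for this example.

```python
import math

def mod_score(lods_by_model):
    """Pick the best-fitting model and subtract log10(k) for the k models tried."""
    k = len(lods_by_model)
    best_model = max(lods_by_model, key=lods_by_model.get)
    return best_model, lods_by_model[best_model] - math.log10(k)

# Hypothetical maximum lod scores under three disease models.
lods = {"dominant": 3.2, "recessive": 1.1, "additive": 2.7}
print(mod_score(lods))
```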
Allele sharing
(non-parametric) methods
Penrose (1935): Sib Pair linkage
For rare disease
[Table: IBD sharing for concordant affected, concordant normal and discordant sib pairs]
Therefore affected sib pair (ASP) design efficient
Test H0: Proportion of alleles IBD =1/2
HA: Proportion of alleles IBD >1/2
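One simple way to test these hypotheses is a z test on the mean proportion of alleles shared IBD, sketched below in Python. This is an assumption of the example, not a method named in the slides: it supposes IBD status is known with certainty for every pair, uses the null variance of 1/8 for a sib pair's IBD proportion, and relies on a normal approximation. The counts are hypothetical.

```python
import math

def asp_mean_test(n0, n1, n2):
    """One-sided z test of H0: mean proportion IBD = 1/2 in affected sib pairs.

    n0, n1, n2 are the numbers of pairs sharing 0, 1 and 2 alleles IBD.
    """
    n = n0 + n1 + n2
    mean_pi = (0.5 * n1 + 1.0 * n2) / n
    # Under H0 a sib pair shares 0, 1, 2 alleles IBD with probability 1/4, 1/2, 1/4,
    # so the proportion IBD has mean 1/2 and variance 1/8.
    z = (mean_pi - 0.5) / math.sqrt(1.0 / (8.0 * n))
    p_one_sided = 0.5 * math.erfc(z / math.sqrt(2.0))
    return mean_pi, z, p_one_sided

print(asp_mean_test(15, 50, 35))
```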
Correlation between IBD of two loci
• For sib pairs
$\mathrm{Corr}(\pi_A, \pi_B) = (1 - 2\theta_{AB})^2$
• ⇒ attenuation of linkage signal with increasing
genetic distance from disease locus
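The attenuation can be tabulated with a few lines of Python. The function name is illustrative, and the recombination fractions are arbitrary example values.

```python
def ibd_correlation(theta):
    """Correlation between sib-pair IBD sharing at two loci, (1 - 2*theta)**2."""
    return (1.0 - 2.0 * theta) ** 2

# The correlation falls from 1 at theta = 0 to 0 for unlinked loci (theta = 0.5).
for theta in (0.0, 0.05, 0.1, 0.2, 0.5):
    print(f"theta = {theta:4.2f}  ->  correlation = {ibd_correlation(theta):.3f}")
```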
Joint distribution of Pedigree IBD
• IBD of relative pairs are not independent
• e.g. if IBD(1,2) = 2 and IBD(1,3) = 2 then IBD(2,3) = 2
• Inheritance vector gives joint IBD distribution
• Each element indicates whether
• paternally inherited allele is transmitted (1)
• or maternally inherited allele is transmitted (0)
• Vector of 2N elements (N = # of non-founders)
Inheritance Vector: An Example
Ordered genotype notation
1st allele = paternally inherited
2nd allele = maternally inherited
[Figure: pedigree with ordered genotypes 1/2, 1/3, 3/4, 2/3, 1/4, 2/4]
Inheritance vector = (1, 1, 1, 0, 1, 0)
Pedigree allele-sharing methods
APM: Affected Pedigree Members: Uses IBS
very sensitive to allele frequency mis-specification
less powerful than IBD-based methods
NPL: Non-Parametric Linkage (Genehunter)
Conservative at positions between markers
LRT: “Delta parameter” (Genehunter+, Allegro)
• All these methods consider affected members only
Variance Components Linkage
• Models trait values of pedigree members jointly
• Assumes multivariate normality conditional on IBD
• Covariance between relative pairs
$= V r + V_Q [\pi - E(\pi)]$
Where V = trait variance
r = correlation (depends on relationship)
$V_Q$ = QTL additive variance
$E(\pi)$ = expected proportion IBD
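A small Python sketch of the covariance formula, evaluated for a sib pair at the three possible IBD proportions. The numerical values of V, r and VQ are hypothetical, and the expected proportion IBD of 0.5 is the standard value for full sibs.

```python
def expected_covariance(V, r, VQ, pi, expected_pi):
    """Trait covariance for a relative pair: V*r + VQ*(pi - E(pi))."""
    return V * r + VQ * (pi - expected_pi)

V, r, VQ = 1.0, 0.3, 0.2           # hypothetical trait variance, correlation, QTL variance
for pi in (0.0, 0.5, 1.0):         # sib pair sharing 0, 1 or 2 alleles IBD
    print(pi, expected_covariance(V, r, VQ, pi, expected_pi=0.5))
```

Pairs sharing more alleles IBD at a linked locus are expected to be more alike, and the spread between the three covariances is what carries the linkage signal.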
Path Diagram for Sib-Pair QTL model
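[Figure: path diagram in which latent variables N, S and Q act on each sib's phenotype (PT1, PT2) through paths n, s and q; the two correlations between sibs are labelled 1 and [0 / 0.5 / 1]]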
Incomplete Marker Information
• IBD sharing cannot always be deduced from
marker genotypes with certainty
• Obtain probabilities of IBD values (Z0, Z1, Z2)
Finite mixture likelihood
$L = \sum_{i=0}^{2} Z_i \, L(X \mid \mathrm{IBD} = i)$
Pi-hat likelihood
$\hat{\pi} = Z_2 + Z_1 / 2$
$L = L(X \mid \mathrm{IBD} = 2\hat{\pi})$
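The two ways of handling uncertain IBD can be contrasted in a brief Python sketch. The IBD probabilities and the per-IBD likelihood values are hypothetical placeholders; in practice the latter would come from a trait model evaluated at each IBD state.

```python
def pi_hat(z0, z1, z2):
    """Estimated proportion of alleles IBD from the probabilities of sharing 0, 1, 2."""
    return z2 + z1 / 2.0

def mixture_likelihood(z, lik_by_ibd):
    """Finite mixture: weight the likelihood at each IBD state by its probability."""
    return sum(zi * li for zi, li in zip(z, lik_by_ibd))

z = (0.1, 0.3, 0.6)                # hypothetical IBD probabilities for one sib pair
print(pi_hat(*z))
print(mixture_likelihood(z, (1.0, 2.0, 3.0)))   # placeholder likelihood values
```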
Pi-hat Model
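[Figure: path diagram with latent variables N, S, Q, paths n, s, q and phenotypes PT1, PT2; the two correlations between sibs are labelled 1 and $\hat{\pi}$]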
Parametric / Allele Sharing
[Figure: diagram relating Trait Data, Marker Data and IBD sharing; labels: Parametric, Allele sharing]