Quantitative Trait Loci, QTL
An introduction to quantitative genetics
and common methods for mapping of
loci underlying continuous traits:
Why study quantitative traits?
• Many (most) human traits/disorders are complex
in the sense that they are governed by several
genetic loci as well as being influenced by
environmental agents;
• Many of these traits are intrinsically continuously
varying and need specialized statistical
models/methods for the localization and
estimation of genetic contributions;
• In addition, in several cases there are potential
benefits from studying continuously varying
quantities as opposed to a binary
affected/unaffected response:
For example:
• in a study of risk factors the underlying
quantitative phenotypes that predispose disease
may be more etiologically homogeneous than the
disease phenotype itself;
• some qualitative phenotypes occur once a
threshold for susceptibility has been exceeded,
e.g. type 2 diabetes, obesity, etc.;
• in such a case the binary phenotype
(affected/unaffected) is not as informative as the
actual phenotypic measurements;
A pedigree representation
Variance and variability
• methods for linkage analysis of QTL in humans rely
on a partitioning of the total variability of trait values;
• in statistical theory, the variance is the expected
squared deviation round the mean value,
$\mu_Y = E(Y)$:
$V(Y) = E[(Y - \mu_Y)^2]$;
• it can be estimated from data as:
$s^2 = \frac{1}{n} \sum_{i=1}^{n} (y_i - \bar{y})^2$;
• the square root of the variance is called the standard
deviation;
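As a small illustration of the estimator above, here is a minimal Python sketch. It uses the divisor n, as on the slide (the divisor n - 1 gives the usual unbiased alternative). The function names and the data values are hypothetical.

```python
import math

def sample_variance(values):
    """Mean squared deviation round the sample mean (divisor n, as on the slide)."""
    n = len(values)
    mean = sum(values) / n
    return sum((y - mean) ** 2 for y in values) / n

def standard_deviation(values):
    """Square root of the variance estimate."""
    return math.sqrt(sample_variance(values))

# Hypothetical trait measurements, for illustration only.
weights = [14.0, 12.0, 6.0, 12.0]
print(sample_variance(weights), standard_deviation(weights))
```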
A simple model for the phenotype
Y=X+e
where
• Y is the phenotypic value, i.e. the trait value;
• X is the genotypic value, i.e. the mean or
expected phenotypic value given the genotype;
• e is the environmental deviation with mean 0.
• We assume that the total phenotypic variance is
the sum of the genotypic variance and the
environmental variance, V (Y ) = V (X ) + V (e),
i.e. the environmental contribution is assumed
independent of the genotype of the individual;
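The additivity of the variances can be checked by simulation. The sketch below is only an illustration under stated assumptions: the three genotypic values, their probabilities and the environmental standard deviation are hypothetical, and e is drawn independently of X as the model assumes.

```python
import random

def variance(values):
    """Mean squared deviation round the sample mean (divisor n)."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / n

def simulate_phenotypes(n, seed=0):
    """Draw Y = X + e with X and e independent (all numbers are hypothetical)."""
    rng = random.Random(seed)
    genotypic_values = [6.0, 12.0, 14.0]   # hypothetical values of X
    probabilities = [0.25, 0.5, 0.25]      # hypothetical genotype frequencies
    xs = rng.choices(genotypic_values, weights=probabilities, k=n)
    es = [rng.gauss(0.0, 2.0) for _ in range(n)]  # mean-zero environmental deviation
    ys = [x + e for x, e in zip(xs, es)]
    return xs, es, ys

xs, es, ys = simulate_phenotypes(20000)
# The two numbers agree up to sampling error (the sample covariance of X and e).
print(variance(ys), variance(xs) + variance(es))
```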
Distribution of Y : a single biallelic locus
A single biallelic locus: genetic effects
Genotype          A1A1    A1A2        A2A2
Genotypic value   0       (1 + k)a    2a
• a is the homozygous effect,
• k is the dominance coefficient
• k = 0 means complete additivity,
• k = 1 means complete dominance (of A2),
• k > 1 if A2 is overdominant.
Example: The pygmy gene, pg
• From data we have the following mean
values of weight:
$X_{++} = 14$ g, $X_{+pg} = 12$ g, $X_{pgpg} = 6$ g,
• 2a = 14 -6 = 8 implies a =4,
• (1 + k)a = 12 - 6 = 6 implies k = 0.5.
Data suggest recessivity (although not
complete) of the pygmy gene.
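The arithmetic of the pygmy example can be written as a short Python sketch. It assumes the parameterization implied by the two equations above: after subtracting the lowest homozygote mean, the genotypic values are 0, (1 + k)a and 2a. The function name is hypothetical, and pg plays the role of allele A1.

```python
def homozygous_effect_and_dominance(x11, x12, x22):
    """Solve 2a = x22 - x11 and (1 + k)a = x12 - x11 for (a, k)."""
    a = (x22 - x11) / 2
    k = (x12 - x11) / a - 1
    return a, k

# Mean weights (g) from the slide: pg/pg = 6, +/pg = 12, +/+ = 14.
a, k = homozygous_effect_and_dominance(x11=6.0, x12=12.0, x22=14.0)
print(a, k)
```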
Decomposition of the genotypic value, X
• Xij is the mean of Y for AiAj-individuals;
• when k = 0 the two alleles of a biallelic locus
behave in a completely additive fashion: X is a
linear function of the number of A2-alleles;
• we can then think of each allele contributing a
purely additive effect to X ;
• this can be generalized to k ≠ 0 by
decomposition of X into additive contributions of
alleles together with deviations resulting from
dominance;
• the generalization is accomplished using least-squares
regression of X on the gene content;
Least-squares linear regression
• $X = \hat{X} + \delta$, i.e. fitted value + residual deviation;
• minimize the sum of squared residuals;
• $V(X) = V(\hat{X}) + V(\delta)$, variance decomposition
Model 1
• $X_{ij} = \hat{X}_{ij} + \delta_{ij} = \mu + \alpha_i + \alpha_j + \delta_{ij}$
• $\mu$ is the population mean phenotype,
$\alpha_i$ is the additive effect of allele $A_i$,
$\delta_{ij}$ is the residual deviation due to dominance;
• $\hat{X}_{ij} = \mu + \alpha_1 N_1 + \alpha_2 N_2$, with $N_k$ the number
of $A_k$-alleles in the genotype;
$\hat{X}_{ij} = \mu + 2\alpha_1$ for A1A1,
$\hat{X}_{ij} = \mu + \alpha_1 + \alpha_2$ for A1A2,
$\hat{X}_{ij} = \mu + 2\alpha_2$ for A2A2.
$\alpha_1 p_1 + \alpha_2 p_2 = 0$
$\alpha = \alpha_2 - \alpha_1$
$\alpha_1 = -p_2 \alpha$
$\alpha_2 = p_1 \alpha$
Interpretations
• in the linear regression $X = \hat{X} + \delta$,
$\hat{X}$ is the heritable component of the genotype,
δ is the non-heritable part;
• the sum of an individual's additive allelic effects, αi + αj, is
called the breeding value and is denoted Λij;
• under random mating αi can be interpreted as the average
excess of allele Ai;
• this is defined as the difference between the expected
phenotypic value when one allele (e.g. the paternally
transmitted) is fixed at Ai and the population average, μ;
Linear Regression
• $p_k$ = proportion of $A_k$-alleles in the population;
• the expected additive effect of a randomly drawn
allele is 0, i.e.
$\alpha_1 p_1 + \alpha_2 p_2 = 0$;
• which implies the corresponding population
variance
$\alpha_1^2 p_1 + \alpha_2^2 p_2$;
• since for a biallelic locus $N_1 = 2 - N_2$,
$X_{ij} = \tilde{\mu} + \alpha N_2 + \delta_{ij}$
where
$\tilde{\mu} = \mu + 2\alpha_1$,
$\alpha = \alpha_2 - \alpha_1$.
Graphically
Linear Regression Model: solving for α
• $X_{ij} = \tilde{\mu} + \alpha N_2 + \delta_{ij}$

X          N2    prob.
0          0     p1^2
a(1+k)     1     2 p1 p2
2a         2     p2^2

• $\alpha = \mathrm{cov}(X, N_2) / \mathrm{var}(N_2)$
$E(X) = a(1+k) \cdot 2 p_1 p_2 + 2a p_2^2 = 2a p_2 (1 + p_1 k)$
$V(X) = a^2 (1+k)^2 \cdot 2 p_1 p_2 + 4a^2 p_2^2 - 4a^2 p_2^2 (1 + p_1 k)^2$
$E(N_2) = 2 p_2$
$\mathrm{Var}(N_2) = 2 p_1 p_2$
$E(X N_2) = a(1+k) \cdot 1 \cdot 2 p_1 p_2 + 2a \cdot 2 \cdot p_2^2 = 2a p_2 (2 p_2 + p_1 (1+k))$
$\mathrm{Cov}(X, N_2) = 2a p_2 [-2 p_2 (1 + p_1 k) + 2 p_2 + p_1 (1+k)]$
$= 2a p_1 p_2 [1 + k - 2 p_2 k]$
$= 2a p_1 p_2 [1 + k (p_1 - p_2)]$
$\alpha = a [1 + k (p_1 - p_2)]$
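average excesses
$\alpha_i^* = E(X \mid \text{one allele is } A_i) - \mu_X$
$\alpha_1^* = X_{12} \, P(\text{the other allele is } A_2 \mid A_1) + X_{11} \, P(\text{the other allele is } A_1 \mid A_1) - \mu_X$
under random mating:
$\alpha_1^* = X_{12} p_2 + X_{11} p_1 - \mu_X$
$= (\alpha_1 + \alpha_2) p_2 + 2\alpha_1 p_1 = \alpha_1$
(μ cancels against $\mu_X$, the dominance deviations average to zero,
$p_1 \delta_{11} + p_2 \delta_{12} = 0$, and the last step uses $\alpha_1 p_1 + \alpha_2 p_2 = 0$)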
Interpretations under random mating
• $\alpha = a [1 + k (p_1 - p_2)]$;
$\alpha_1 = -p_2 \alpha$;
$\alpha_2 = p_1 \alpha$.
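These identities can be checked numerically. The sketch below assumes genotypic values 0, a(1 + k), 2a for A1A1, A1A2, A2A2 with Hardy-Weinberg probabilities under random mating. It computes α as the regression slope cov(X, N2)/var(N2), compares it with the closed form, and checks that the average excess of each allele equals its additive effect. Function names and the numeric inputs are hypothetical.

```python
def genotype_table(a, k, p2):
    """Rows of (genotypic value X, number of A2 alleles N2, probability)."""
    p1 = 1.0 - p2
    return [(0.0, 0, p1 * p1), (a * (1 + k), 1, 2 * p1 * p2), (2 * a, 2, p2 * p2)]

def regression_slope(a, k, p2):
    """alpha = cov(X, N2) / var(N2), computed directly from the genotype table."""
    rows = genotype_table(a, k, p2)
    e_x = sum(x * pr for x, _, pr in rows)
    e_n = sum(n * pr for _, n, pr in rows)
    cov = sum(x * n * pr for x, n, pr in rows) - e_x * e_n
    var = sum(n * n * pr for _, n, pr in rows) - e_n ** 2
    return cov / var

def average_excess(a, k, p2, allele):
    """E(X | one allele is A_allele) - E(X) under random mating."""
    p1 = 1.0 - p2
    x11, x12, x22 = 0.0, a * (1 + k), 2 * a
    mean = p1 * p1 * x11 + 2 * p1 * p2 * x12 + p2 * p2 * x22
    if allele == 1:
        return p1 * x11 + p2 * x12 - mean
    return p1 * x12 + p2 * x22 - mean

a, k, p2 = 4.0, 0.5, 0.3          # hypothetical inputs
p1 = 1.0 - p2
alpha = a * (1 + k * (p1 - p2))   # closed form
alpha_1, alpha_2 = -p2 * alpha, p1 * alpha
print(regression_slope(a, k, p2), alpha, alpha_1, alpha_2)
```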
Population parameters for k≠0
• α is called the average effect of allelic substitution:
substitute A1 → A2 for a randomly chosen
A1-allele;
• then the expected change in X is,
(X12 -X11) p1 + (X22 -X12) p2 ;
• which equals α (by a simple calculation).
α: Average effect of allelic substitution
[Diagram of the A1 → A2 substitution]
$p_1 (X_{12} - X_{11}) + p_2 (X_{22} - X_{12})$
$= p_1 \cdot a(1+k) + p_2 \cdot a(1-k)$
$= a (1 + k (p_1 - p_2))$
[Figure: α as a function of p2 and k]
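How α depends on p2 and k can also be tabulated. The sketch below assumes genotypic values 0, a(1 + k), 2a and computes the expected change in X when a randomly chosen A1-allele is replaced by A2, i.e. p1 (X12 - X11) + p2 (X22 - X12). The function name and the grid of values are hypothetical.

```python
def substitution_effect(a, k, p2):
    """Expected change in X when a random A1 allele is replaced by A2."""
    p1 = 1.0 - p2
    x11, x12, x22 = 0.0, a * (1 + k), 2 * a
    return p1 * (x12 - x11) + p2 * (x22 - x12)

# Tabulate alpha for a = 1 over a small grid of k and p2.
for k in (0.0, 0.5, 1.0):
    row = [round(substitution_effect(1.0, k, p2), 3) for p2 in (0.1, 0.5, 0.9)]
    print(f"k = {k}: {row}")
```

With k = 0 the effect equals a for every allele frequency. With k > 0 it decreases as p2 grows, in line with a[1 + k(p1 - p2)].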
Partitioning the genetic variance
• the variance, V (X ), of the genotypic values in
a population is called the genetic variance:
$V(X) = V(\hat{X} + \delta)$
$= V(\hat{X}) + V(\delta)$
$= V_A + V_D$
• $V_A = 2 p_1 p_2 \alpha^2 = 2 (p_1 \alpha_1^2 + p_2 \alpha_2^2)$
is the additive
genetic variance, i.e. variance associated with
additive allelic effects;
• $V_D = (2 p_1 p_2 a k)^2$ is the
dominance genetic
variance, i.e. due to dominance deviations;
VA
$V_A = 2 (p_1 \alpha_1^2 + p_2 \alpha_2^2)$, since $p_1 \alpha_1 + p_2 \alpha_2 = 0$
$V_A = 2 p_1 p_2 \alpha^2 + p_2^2 \cdot 4\alpha^2 - (2 p_1 p_2 \alpha + 2 \alpha p_2^2)^2$ (linear regression)
$= 2 p_1 p_2 \alpha^2$
$= 2 p_1 p_2 a^2 [1 + k (p_1 - p_2)]^2$
[Figure: V(X), VA and VD as functions of p2 and k]
VA [dashed]: $V_A = 2 p_1 p_2 [a (1 + k (p_1 - p_2))]^2$;
VD [dotted]: $V_D = (2 p_1 p_2 a k)^2$;
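The partition V(X) = VA + VD can be verified numerically. The sketch below assumes genotypic values 0, a(1 + k), 2a with Hardy-Weinberg probabilities, computes V(X) directly from the genotype distribution, and compares it with the sum of the two closed-form components. Function names and inputs are hypothetical.

```python
def variance_components(a, k, p2):
    """Closed-form additive and dominance variances for one biallelic locus."""
    p1 = 1.0 - p2
    alpha = a * (1 + k * (p1 - p2))
    v_a = 2 * p1 * p2 * alpha ** 2
    v_d = (2 * p1 * p2 * a * k) ** 2
    return v_a, v_d

def genotypic_variance(a, k, p2):
    """V(X) = E(X^2) - E(X)^2 computed directly from the three genotypes."""
    p1 = 1.0 - p2
    rows = [(0.0, p1 * p1), (a * (1 + k), 2 * p1 * p2), (2 * a, p2 * p2)]
    mean = sum(x * pr for x, pr in rows)
    return sum(x * x * pr for x, pr in rows) - mean ** 2

v_a, v_d = variance_components(4.0, 0.5, 0.3)
print(v_a, v_d, genotypic_variance(4.0, 0.5, 0.3))
```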
Example: The Booroola gene, (Lynch and Walsh, 1998)
In summary
• The homozygous effect a, and the dominance
coefficient k are intrinsic properties of allelic
products.
• The additive effect αi, and the average excess
αi* are properties of alleles in a particular
population.
• The breeding value is a property of a particular
individual in reference to a particular population.
It is the sum of the additive effects of an
individual's alleles.
• The additive genetic variance, VA, is a property
of a particular population. It is the variance of the
breeding values of individuals in the population.
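The last two points can be illustrated with a short sketch: it lists the breeding values 2α1, α1 + α2, 2α2 of the three genotypes and checks that their variance in a random-mating population equals VA = 2 p1 p2 α². The function name and inputs are hypothetical.

```python
def breeding_values(a, k, p2):
    """Return [(breeding value, genotype probability)] for A1A1, A1A2, A2A2."""
    p1 = 1.0 - p2
    alpha = a * (1 + k * (p1 - p2))
    alpha_1, alpha_2 = -p2 * alpha, p1 * alpha
    return [
        (2 * alpha_1, p1 * p1),
        (alpha_1 + alpha_2, 2 * p1 * p2),
        (2 * alpha_2, p2 * p2),
    ]

rows = breeding_values(4.0, 0.5, 0.3)
mean_bv = sum(bv * pr for bv, pr in rows)                   # zero by construction
var_bv = sum(bv * bv * pr for bv, pr in rows) - mean_bv ** 2
print(mean_bv, var_bv)
```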
Multilocus traits
• Do the separate locus effects combine in an
additive way, or do there exist non-linear
interactions between different loci: epistasis?
• Do the genes at different loci segregate
independently?
• Does gene expression vary with the
environmental context: gene by environment
interaction?
• Are specific genotypes associated with particular
environments: covariation of genotypic values
and environmental effects?
Example: epistasis
Average length of vegetative internodes in the lateral branch
(in mm) of teosinte. Table from Lynch and Walsh (1998).
Two independently segregating loci
• Extending the least-squares decomposition of X :
$X = \mu + \Lambda_1 + \delta_1 + \Lambda_2 + \delta_2 + \varepsilon$
• Λk is the breeding value of the k'th locus,
δk is the dominance deviation of the k'th locus,
ε is a residual term due to epistasis;
• if the loci are independently segregating
$V(X) = V(\Lambda_1) + V(\Lambda_2) + V(\delta_1) + V(\delta_2) + V(\varepsilon)$
$= V_{A,1} + V_{A,2} + V_{D,1} + V_{D,2} + V(\varepsilon)$
$= V_A + V_D + V(\varepsilon)$
Neglecting V (ε)
• the epistatic variance components contributing
to V (ε) are often small compared to VA and VD;
• in linkage analysis it is thus often assumed that
V (ε) = 0;
• note however: the relative magnitude of the
variance components provides only limited insight
into the physiological mode of gene action;
• epistatic interactions can greatly inflate the
additive and/or dominance components of
variance;
Resemblance between relatives
A model for the trait values of two relatives:
Yk = Xk + ek, k = 1 , 2,
where for the k’th relative
• Yk is the phenotypic value,
• Xk is the genotypic value,
• ek is the mean zero environmental deviation.
• the ek's are assumed to be mutually independent
and also independent of the Xk. Hence, the covariance
of the trait values of two relatives is given by the
genetic covariance, C(X1, X2), i.e.
C(Y1, Y2) = C(X1, X2)
A (preliminary) formula for C(X1 ,X 2)
For a single locus trait
C(X1; X2) = c1VA + c2VD
• c1 and c2 are constants determined by the type
of relationship between the two relatives.
• same formula applies for multilocus traits if no
epistatic variance components are included in
the model, i.e. V (ε) = 0.
• in this latter case VA and VD are given by summation of
the corresponding locus-specific contributions.
Joint distribution of sibling trait values
Single biallelic, dominant (k =1 ) model. Correlation 0.46.
Measures of relatedness
• N = the number of alleles shared IBD by
two relatives at a given locus;
• the kinship coefficient, θ , is given by
2 θ = E(N) / 2;
i.e. twice the kinship coefficient equals the expected
proportion of alleles shared IBD at the locus.
• The coefficient of fraternity, Δ, is defined
as
Δ = P(N = 2).
Some examples
• Siblings
(z0, z1, z2) = (1/4, 1/2, 1/4) implying E(N) = 1.
Thus θ = 1/4 and Δ = 1/4.
• Parent-offspring
(z0, z1, z2) = (0, 1, 0) implying E(N) = 1.
Thus θ = 1/4 and Δ = 0.
• Grandparent - grandchild
(z0, z1, z2) = (1/2, 1/2, 0) implying E(N) = 1/2.
Thus θ = 1/8 and Δ = 0.
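The three examples follow mechanically from the IBD probabilities. Here is a minimal sketch using exact fractions, where z0, z1, z2 are the probabilities that the pair shares 0, 1 or 2 alleles IBD. The function name is hypothetical.

```python
from fractions import Fraction as F

def relatedness(z0, z1, z2):
    """Return (E(N), kinship coefficient theta, coefficient of fraternity Delta)."""
    expected_n = z1 + 2 * z2
    theta = expected_n / 4   # from 2 * theta = E(N) / 2
    delta = z2               # Delta = P(N = 2)
    return expected_n, theta, delta

print(relatedness(F(1, 4), F(1, 2), F(1, 4)))   # siblings
print(relatedness(F(0), F(1), F(0)))            # parent-offspring
print(relatedness(F(1, 2), F(1, 2), F(0)))      # grandparent-grandchild
```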
Covariance formula for a single locus
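Under the assumed model
$X_1 = \mu + \alpha_i^1 + \alpha_j^1 + \delta_{ij}^1$
$X_2 = \mu + \alpha_i^2 + \alpha_j^2 + \delta_{ij}^2$
$\mathrm{Cov}(X_1, X_2) = \mathrm{Cov}(\alpha_i^1 + \alpha_j^1, \alpha_i^2 + \alpha_j^2) + \mathrm{Cov}(\delta_{ij}^1, \delta_{ij}^2)$
$C(Y_1, Y_2) = C(X_1, X_2)$
$= 2\theta V_A + \Delta V_D$
$= \frac{E(N)}{2} V_A + P(N = 2) V_D$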
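One way to read the last line is as an average over the IBD states N = 0, 1, 2: a pair sharing N alleles IBD contributes (N/2) VA, plus VD when N = 2, and weighting by the probabilities (z0, z1, z2) gives 2θ VA + Δ VD. The sketch below checks this for siblings. The function names and the variance values are hypothetical.

```python
def covariance_given_ibd(n_ibd, v_a, v_d):
    """Genetic covariance of a pair sharing n_ibd alleles IBD at the locus."""
    return n_ibd / 2 * v_a + (v_d if n_ibd == 2 else 0.0)

def expected_covariance(z, v_a, v_d):
    """Average the conditional covariance over the IBD distribution z = (z0, z1, z2)."""
    return sum(z[n] * covariance_given_ibd(n, v_a, v_d) for n in range(3))

v_a, v_d = 2.0, 1.0                 # hypothetical variance components
sib_z = (0.25, 0.5, 0.25)           # siblings: theta = 1/4, Delta = 1/4
print(expected_covariance(sib_z, v_a, v_d), 2 * 0.25 * v_a + 0.25 * v_d)
```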
A single locus; perfect marker data
$C(Y_1, Y_2 \mid N) = \frac{N}{2} V_A + I\{N = 2\} V_D$
with
$I\{N = 2\} = 1$ if N = 2, and $I\{N = 2\} = 0$ if N = 0 or N = 1,
i.e.
$C(Y_1, Y_2 \mid N) = 0$ if N = 0,
$C(Y_1, Y_2 \mid N) = V_A / 2$ if N = 1,
$C(Y_1, Y_2 \mid N) = V_A + V_D$ if N = 2.
Covariance formula for multiple loci
n independently segregating loci assuming no
epistatic interaction, i.e. putting V (ε) = 0:
$C(Y_1, Y_2) = C(X_1, X_2)$
$= 2\theta V_A + \Delta V_D$
$= 2\theta \sum_l V_{A,l} + \Delta \sum_l V_{D,l}$
$= \sum_l \left[ \frac{E(N_l)}{2} V_{A,l} + P(N_l = 2) V_{D,l} \right]$;
$N_l$ is the number of alleles shared IBD at locus l;
$V_{A,l}$, $V_{D,l}$ are the locus-specific additive and dominance variance
contributions, respectively.
Covariance... continued
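Define for every pair of relatives
$\pi^{(x)} = E[N_x \mid MD_x] / 2$;
and
$\Delta_2^{(x)} = P(N_x = 2 \mid MD_x)$;
For two related individuals we then have,
$C(Y_1, Y_2 \mid MD_x) = \sum_l \left( \frac{E[N_l \mid MD_x]}{2} V_{A,l} + P(N_l = 2 \mid MD_x) V_{D,l} \right)$
$= \pi^{(x)} V_{A,x} + \Delta_2^{(x)} V_{D,x} + 2\theta V_{A,R} + \Delta V_{D,R}$,
where the subscript R refers to the remaining (residual) loci, for which
the marker data at x are taken to be uninformative.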
Haseman-Elston method
• Uses pairs of relatives of the same type: most
often sib pairs;
• for each relative pair calculate the squared
phenotypic difference: $Z = (Y_1 - Y_2)^2$;
• given MDx regress the Z's on the expected
proportion of alleles IBD, π(x) = E [Nx |MDx]/2, at
the test locus;
• a slope coefficient β< 0, if statistically significant,
is considered as evidence for linkage;
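The steps above can be sketched in Python. This is a simplified illustration, not the original Haseman-Elston procedure in full: it assumes a single additive (k = 0) biallelic QTL, sib pairs whose IBD count at the locus is known exactly, and a large-sample normal approximation for the one-sided test of β < 0. All function names and parameter values are hypothetical.

```python
import math
import random
from statistics import NormalDist

def simulate_sib_pairs(n_pairs, a=2.0, p2=0.5, sd_e=1.0, seed=1):
    """Return (pi, Z) per sib pair for one additive biallelic QTL with known IBD."""
    rng = random.Random(seed)
    points = []
    for _ in range(n_pairs):
        # Parental genotypes: 1 marks an A2 allele, drawn with frequency p2.
        father = [int(rng.random() < p2), int(rng.random() < p2)]
        mother = [int(rng.random() < p2), int(rng.random() < p2)]
        # Each sib inherits one paternal and one maternal allele (index 0 or 1).
        picks = [(rng.randrange(2), rng.randrange(2)) for _ in range(2)]
        n_ibd = int(picks[0][0] == picks[1][0]) + int(picks[0][1] == picks[1][1])
        y = [a * (father[f] + mother[m]) + rng.gauss(0.0, sd_e) for f, m in picks]
        points.append((n_ibd / 2, (y[0] - y[1]) ** 2))
    return points

def he_regression(points):
    """OLS fit of Z on pi; returns intercept, slope, t statistic, one-sided p-value."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    mz = sum(z for _, z in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    slope = sum((x - mx) * (z - mz) for x, z in points) / sxx
    intercept = mz - slope * mx
    rss = sum((z - intercept - slope * x) ** 2 for x, z in points)
    t = slope / math.sqrt(rss / (n - 2) / sxx)
    p_value = NormalDist().cdf(t)   # normal approximation, alternative: slope < 0
    return intercept, slope, t, p_value

intercept, slope, t, p_value = he_regression(simulate_sib_pairs(5000))
print(intercept, slope, t, p_value)
```

With these hypothetical settings VA = 2 p1 p2 a² = 2, so the fitted slope should lie near -2 VA = -4, up to sampling error.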
HE: an example
[Figure: HE regression example; x-axis: proportion of marker alleles identical by descent.
The solid line is the fitted regression line;
the dotted line indicates the true underlying relationship.]
HE: motivation
$E[(Y_1 - Y_2)^2] = V[Y_1 - Y_2]$
$= V(Y_1) + V(Y_2) - 2 C(Y_1, Y_2)$
$= 2 V(Y) - 2 C(Y_1, Y_2)$
Assume strictly additive gene action at each locus,
i.e. VD = 0. Then, for a putative QTL at x,
$E[(Y_1 - Y_2)^2 \mid MD_x] = 2 V(Y) - 2 C(Y_1, Y_2 \mid MD_x)$
$= 2 V(Y) - 2 [\pi^{(x)} V_{A,x} + 2\theta V_{A,R}]$
NOTE: This is a linear function in $\pi^{(x)}$!
HE: linkage test
$E[(Y_1 - Y_2)^2 \mid MD_x] = \alpha + \beta \pi^{(x)}$
where
$\alpha = 2 [V(Y) - 2\theta V_{A,R}]$
$\beta = -2 V_{A,x}$
The linkage test is
$H_0: \beta = 0$ (i.e. $V_{A,x} = 0$)
vs
$H_1: \beta < 0$
HE: examples with simulated data
[Figure: simulated data from n = 200 sib-pairs;
panels top to bottom: h² = 0.50, 0.33, 0.25.]
Heritability and power
• for a given locus we may define the locus-specific
heritability as the proportion of the total variance
'explained' by that particular site, e.g. (in the narrow sense),
$h^2 = V_A / V(Y)$;
• the locus-specific heritability is the single most
important parameter for the power of QTL linkage
methods;
• heritabilities below about 10% lead, in general, to
unrealistically large sample sizes.
HE: two-point analysis
$E[(Y_1 - Y_2)^2 \mid \text{marker genotypes}] = \tilde{\alpha} + \tilde{\beta} \pi^{(m)}$
where $\pi^{(m)}$ is the expected proportion of marker
alleles shared IBD.
• $\tilde{\beta}$ depends on the type of relatives considered;
• for sib pairs $\tilde{\beta} = -2 (1 - 2\theta)^2 V_{A,l}$;
• recombination fraction (θ) and effect size ($V_{A,l}$)
are confounded and cannot be separately
estimated;
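The confounding of θ and the effect size can be made concrete. The sketch below assumes the sib-pair slope has the form -2 (1 - 2θ)² VA, with θ the recombination fraction between marker and trait locus, and shows two different (θ, VA) combinations that give exactly the same slope. The function name and numbers are hypothetical.

```python
def two_point_slope(recombination_fraction, v_a):
    """Expected HE slope for sib pairs at a marker linked to the trait locus."""
    return -2.0 * (1.0 - 2.0 * recombination_fraction) ** 2 * v_a

# A tightly linked small-effect locus and a loosely linked large-effect locus
# produce the same regression slope, so the data cannot tell them apart.
print(two_point_slope(0.0, 1.0), two_point_slope(0.25, 4.0))
# At theta = 1/2 (no linkage) the slope vanishes whatever the effect size.
print(two_point_slope(0.5, 4.0))
```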
HE: in summary
Simple, transparent and comparatively robust but:
• poor statistical power in many settings;
• different types of relatives cannot be mixed;
• parents and their offspring cannot be used in HE;
• assumptions of the statistical model not generally
satisfied;
• Remedy:
• use one of several suggested extensions of HE;
• alternatively, use VCA instead
VCA
[Diagram: the trait value is influenced by a QTL, polygenes and an independent environment]
Mathematically:
$Y_i = \mu + \beta^T a_i + g_i + q_i + e_i$
where μ is the population mean, a are the “environmental”
predictor variables, q is the major trait locus, g is the
polygenic effect, and e is the residual error.
VCA: an additive model
$Y = \mu + \sum_{i=1}^{p} \beta_i z_i + \sum_{l=1}^{n} X_l + e$
$E(Y) = \mu + \sum_{i=1}^{p} \beta_i z_i$;
$V(Y) = V_A + V_D + V(e)$
$= V_{A,x} + V_{D,x} + V_{A,R} + V_{D,R} + V(e)$
$C(Y_1, Y_2 \mid MD_x) = \pi^{(x)} V_{A,x} + \Delta_2^{(x)} V_{D,x} + 2\theta V_{A,R} + \Delta V_{D,R}$
VCA: major assumption
The joint distribution of the phenotypic values in a
pedigree is assumed to be multivariate normal with
the given mean values, variances and covariances;
• the multivariate normal distribution is completely
specified by the mean values, variances and
covariances;
• the likelihood, L, of data can be calculated and
we can estimate the variance components
$V_{A,x}$, $V_{D,x}$, $V_{A,R}$, $V_{D,R}$;
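For the smallest possible pedigree, a single sib pair, the likelihood can be written out by hand. The sketch below assumes a bivariate normal distribution with a common mean and variance and a covariance built from the QTL and residual components, using 2θ = 1/2 and Δ = 1/4 for sibs. It is an illustration only; real VCA software handles general pedigrees and maximizes over the parameters. All names are hypothetical.

```python
import math

def sib_pair_covariance(pi_x, delta_x, v_ax, v_dx, v_ar, v_dr):
    """Covariance given marker data: QTL part plus residual part for sibs."""
    return pi_x * v_ax + delta_x * v_dx + 0.5 * v_ar + 0.25 * v_dr

def pair_log_likelihood(y1, y2, mean, variance, covariance):
    """Log density of a bivariate normal with equal variances."""
    d1, d2 = y1 - mean, y2 - mean
    det = variance ** 2 - covariance ** 2
    quad = (variance * (d1 * d1 + d2 * d2) - 2 * covariance * d1 * d2) / det
    return -math.log(2 * math.pi) - 0.5 * math.log(det) - 0.5 * quad

# Hypothetical components: total variance 1.0, purely additive QTL of size 0.5.
cov_ibd2 = sib_pair_covariance(1.0, 1.0, 0.5, 0.0, 0.0, 0.0)
cov_ibd0 = sib_pair_covariance(0.0, 0.0, 0.5, 0.0, 0.0, 0.0)
# A pair with similar trait values is more likely when it shares both alleles IBD.
print(pair_log_likelihood(1.0, 1.0, 0.0, 1.0, cov_ibd2),
      pair_log_likelihood(1.0, 1.0, 0.0, 1.0, cov_ibd0))
```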
VCA: linkage test
The linkage test of
$H_0: V_{A,x} = V_{D,x} = 0$
uses the LOD score statistic
$LOD_x = \log_{10} \frac{L(\text{full model})}{L(V_{A,x} = V_{D,x} = 0)}$
When the position of the test locus, x, is varied over
a chromosomal region the result can be
summarized in a LOD score curve.
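In practice likelihoods are usually handled on the natural-log scale, so the LOD score is a difference of log-likelihoods divided by ln 10. The sketch below assumes the maximized log-likelihoods are already available from some fitting routine. The function name, the positions and the log-likelihood values are hypothetical.

```python
import math

def lod_score(loglik_full, loglik_null):
    """LOD = log10 of the likelihood ratio, from natural-log likelihoods."""
    return (loglik_full - loglik_null) / math.log(10.0)

# Hypothetical maximized log-likelihoods at three test positions (in cM).
loglik_null = -250.0
loglik_full_by_position = {10.0: -249.2, 20.0: -244.1, 30.0: -247.5}

lod_curve = {x: lod_score(ll, loglik_null) for x, ll in loglik_full_by_position.items()}
peak_position = max(lod_curve, key=lod_curve.get)
print(lod_curve, peak_position)
```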
VCA vs HE: LOD score profiles
From Pratt et al.; Am. J. Hum. Genet. 66:1153-1157, (2000)
Linkage methods for QTL
• Fully parametric linkage approach is difficult;
• Model-free tests comprise the alternative choice;
• We will discuss
Haseman-Elston Regression (HE);
Variance Components Analysis (VCA);
Both can be viewed as two-step procedures:
1. use polymorphic molecular markers to extract
information on inheritance patterns;
2. evaluate evidence for a trait-influencing locus
at specified locations;
Similarities and differences
• HE and VCA are based on estimated IBD-sharing given marker data;
• both methods require specification of a statistical
model!
('model-free' means 'does not require
specification of genetic model')
• similarity in IBD-sharing is used to evaluate trait
similarity using either
linear regression (HE) or
variance components analysis (VCA);