Gene Mapping Quantitative
Traits using IBD sharing
References:
Introduction to Quantitative Genetics, by D.S. Falconer and T.F.C. Mackay (1996) Longman Press
Chapter 5, Statistics in Human Genetics, by P. Sham (1998) Arnold Press
Chapter 8, Mathematical and Statistical Methods for Genetic Analysis, by K. Lange (2002) Springer
What is a Quantitative Trait?
A quantitative trait has numerical values that can be ordered
highest to lowest. Examples include height, weight, cholesterol
level, reading scores etc. There are discrete values where the
values differ by a fixed amount and continuous values where the
difference in two values can be arbitrarily small. Most methods for
quantitative traits assume that the data are continuous (at least
approximately).
Why use quantitative traits?
(1) More power. Fewer subjects may need to be examined (phenotyped)
if one uses the quantitative trait rather than dichotomizing it to create
a qualitative trait.
[Figure: distribution of the trait (vertical axis 0 to 0.045) divided into unaffecteds and affecteds; individuals v, w, x, y and z are marked along the trait axis.]
Individuals w and x have similar trait values, yet w is grouped with z and
x is grouped with y. Note that even among affecteds, knowing the trait
value is useful (v and z are more similar than v and w).
Why use Quantitative Traits?
(2) The genotype to phenotype relationship may be more direct. Affection
with a disease could be the culmination of many underlying events
involving gene products, environmental factors and gene-environment
interactions. The underlying events may differ among people, resulting in
heterogeneity.
Some quantitative traits are more likely under the
control of a single gene than others. An example:
Intermediate traits like factor IX levels are
influenced by fewer genes than clotting times. Genes
influencing factor IX level will be easier to map than
genes influencing clotting times.
Why use quantitative traits?
(3) End stage disease may be too late. If the
disease is late onset, then parents may not be
available anymore. However if there is a
quantitative trait that is known to predict
increased risk of the disease, then it might be
measured earlier in a person’s lifetime. Their
parents may also be available for genotyping
resulting in more information.
Why not use quantitative traits?
(1) The quantitative trait doesn’t meet the assumptions of the
proposed statistical method. For example many methods
assume the quantitative traits are unimodal but not all
quantitative traits are unimodal.
(2) The values of the quantitative trait might be very
unreliable.
(3) There are no good intermediate quantitative phenotypes
for a particular disease. The quantitative traits available
aren’t telling the whole story.
Components of the Phenotypic Variance of a Quantitative Trait
[Diagram: the trait value is influenced by shared environment, independent environment, genes and polygenes.]
The total variance in a quantitative trait, termed the phenotypic
variance, can be partitioned into the variance due to genetic
components, environmental components and gene-environment interaction components.
Components of Phenotypic Variance of a Quantitative Trait
Often we make simplifying assumptions, for example that there is no
variance component due to interactions, that there is no shared
environment and that all genes are acting independently.
In this case we can write the phenotypic variance, VP, as the sum of the
genetic variance, VG, and the environmental variance, VE.
VP = VE + VG
[Diagram: the trait value is influenced by genes and independent environment only.]
The Additive and Dominance Components
of Variance
VG = VA + VD
VA, the additive genetic variance, is attributed to
the inheritance of individual alleles.
VD, the dominance genetic variance, is attributed
to the alleles acting together as genotypes.
VG / VP= heritability in the broad-sense.
VA /VP= heritability in the narrow-sense.
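As a small worked illustration of the two heritability definitions, the sketch below assumes the simplified model VP = VG + VE with VG = VA + VD. The function name and the numerical variance values are hypothetical and chosen only for the example.

```python
def heritabilities(v_a, v_d, v_e):
    """Return (broad-sense, narrow-sense) heritability.

    Assumes the simplified model VP = VG + VE with VG = VA + VD
    (no interaction and no shared-environment components).
    """
    v_g = v_a + v_d          # total genetic variance
    v_p = v_g + v_e          # phenotypic variance
    return v_g / v_p, v_a / v_p


# Illustrative variance components (not taken from any data set)
broad, narrow = heritabilities(v_a=0.4, v_d=0.1, v_e=0.5)
print(broad, narrow)
```

The narrow-sense value can never exceed the broad-sense value, because VA is only one part of VG.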
The degree of correlation between two relatives
depends on the theoretical kinship coefficient
• An important measure of family relationship is the
theoretical Kinship coefficient.
• It is the probability that two alleles, at a randomly chosen
locus, one chosen randomly from individual i and one from
j are identical by descent.
• The kinship coefficient does not depend on the observed
genotype data.
Covariance between relatives under a polygenic model
depends on the theoretical kinship coefficient and the
probability that, at any arbitrary autosomal locus, the pair share
both genes IBD
Relationship       kinship coefficient   P(IBD=2)   covariance
parent-offspring   1/4                   0          1/2*VA
full siblings      1/4                   1/4        1/2*VA+1/4*VD
uncle-nephew       1/8                   0          1/4*VA
first cousins      1/16                  0          1/8*VA
Note: This doesn’t depend on any measured genotype effects
(marker information).
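The covariance column of the table follows from the standard relation Cov = 2 * (kinship coefficient) * VA + P(IBD=2) * VD. The sketch below reproduces the coefficients with exact fractions; the dictionary and function names are hypothetical.

```python
from fractions import Fraction as F

# (theoretical kinship coefficient, P(IBD = 2)) for each relationship
RELATIVES = {
    "parent-offspring": (F(1, 4), F(0)),
    "full siblings": (F(1, 4), F(1, 4)),
    "uncle-nephew": (F(1, 8), F(0)),
    "first cousins": (F(1, 16), F(0)),
}


def covariance_coefficients(kinship, p_ibd2):
    """Coefficients on (VA, VD) in Cov = 2*kinship*VA + P(IBD=2)*VD."""
    return 2 * kinship, p_ibd2


for name, (kinship, p_ibd2) in RELATIVES.items():
    on_va, on_vd = covariance_coefficients(kinship, p_ibd2)
    print(f"{name}: {on_va}*VA + {on_vd}*VD")
```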
Covariance among relatives also depends upon the
allele sharing at a trait locus
Allele Sharing: Identity-by-Descent (IBD)
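[Figure: parental genotypes 1/2 and 3/4, with example sibling genotype pairs sharing 0, 1 or 2 alleles IBD (for example 1/3 and 2/4 share 0, 1/3 and 1/4 share 1, 1/3 and 1/3 share 2).]
Alleles shared IBD   Proportion of alleles shared IBD
0                    0
1                    0.5
2                    1.0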
The proportion of alleles shared IBD is
equivalent to twice the conditional kinship
coefficient.
The conditional kinship coefficient is the probability that
a gene chosen randomly from person i at a specific locus
matches a gene chosen randomly from person j given the
available genotype information at markers.
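To make the link between IBD sharing and kinship concrete, the sketch below enumerates the 16 equally likely transmissions from parents with genotypes 1/2 and 3/4 to two siblings. With no marker information the expected proportion of alleles shared IBD is 1/2, and half of that is the theoretical sibling kinship coefficient 1/4. The function and variable names are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Four distinct parental alleles, so sharing an allele label means sharing IBD
FATHER, MOTHER = ("1", "2"), ("3", "4")


def sib_ibd_distribution():
    """Distribution of the number of alleles two siblings share IBD."""
    counts = Counter()
    for f1, m1, f2, m2 in product(FATHER, MOTHER, FATHER, MOTHER):
        shared = (f1 == f2) + (m1 == m2)  # 0, 1 or 2 alleles IBD
        counts[shared] += 1
    total = sum(counts.values())
    return {k: Fraction(v, total) for k, v in sorted(counts.items())}


dist = sib_ibd_distribution()
expected_proportion = sum(Fraction(k, 2) * p for k, p in dist.items())
kinship = expected_proportion / 2  # proportion shared = 2 * kinship
print(dist, expected_proportion, kinship)
```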
We expect two siblings with similar, extreme trait values to share
more alleles IBD at the trait locus than two siblings who have
dissimilar extreme trait values.
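[Figure: pedigree. Parents 1 and 2 both have marker genotype 1-2 (Y = 0.2 and Y = 0.35). Their children 3, 4, 5 and 6 have marker genotypes 1-1, 1-1, 2-2 and 1-2 and trait values Y = 3.22, 1.06, -3.01 and 3.78 (listed in extraction order; the assignment to individuals is not recoverable). IBD34 = 0, IBD56 = 1 and IBD45 = 2, corresponding to conditional kinship coefficients of 0, 1/4 and 1/2.]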
The dependence of the trait’s covariance on the IBD sharing
at a marker is a function of the distance between the trait and
the marker loci as well as the strength of the QTL.
As the map distance increases, the covariance of the trait values
becomes less dependent on IBD sharing at the marker and so
the apparent QTL variance component will decrease.
We expect two siblings with similar, extreme trait values to share more alleles IBD
at the trait locus than two siblings who have dissimilar extreme trait values. Or,
another way to think about it, we expect that the correlation among trait values will
depend on IBD sharing.
If a marker is strongly linked to the trait locus
[Figure: scatter plots of the two siblings' trait values (both axes from -4 to 4), with separate panels for IBD = 2, IBD = 1 and IBD = 0, showing the patterns we expect under linkage and under no linkage.]
Haseman-Elston Regression:
Let Y1k denote the trait value for sibling 1 in sibling pair
k and Y2k the value for sibling 2.
If the trait and the marker are linked, then the
difference in the trait values for the pair will be related
to the number of genes shared IBD by the pair.
More precisely, let $\pi_k$ denote the proportion of alleles
shared IBD at the marker for sibling pair k. Then
$(Y_{1k} - Y_{2k})^2 = \alpha - \beta \pi_k + e_k$
where $e_k$ is the variation due to measurement error or
uncontrolled environment differences.
If there is no linkage between the trait and marker then
$\beta = 0$. If the marker and trait locus are linked then $\beta > 0$,
so the expected squared difference decreases as IBD sharing increases.
[Figure: Haseman-Elston regression. Squared sib-pair trait difference (vertical axis, 0 to 1.6) plotted against IBD sharing (horizontal axis, 0 to 2).]
*********************************************
*  ANALYSIS FOR TRAIT NUMBER 01 ( TRAIT )   *
*********************************************
SIMPLE LINEAR REGRESSION ANALYSIS
Regress Y on Pi

Trait   Locus   Effective D.F.   t-value   P-value     Intercept   Slope
-----   -----   --------------   -------   ---------   ---------   ------
TRAIT   M1      102              -3.0735   .001357**   2.4431      1.5060
t-value and p-value are for the slope
INTERPRETING THE RESULTS: Under the null
hypothesis of no linkage between the trait and the marker
the probability of observing a T-stat as negative or more
negative than -3.0735 is 0.001357
CAVEATS:
(1) Method subject to the same cautions as linear
regression. For example, severe non-normality of the trait
can influence the estimates.
(2) The caveats listed for qualitative, model free linkage also
apply.
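The Haseman-Elston regression described above can be sketched as an ordinary least squares fit of the squared sib-pair difference on the proportion of alleles shared IBD. This is a minimal illustration, not the implementation of the program whose output is shown; the function name and the toy data are hypothetical. Under linkage the fitted coefficient on the IBD proportion is expected to be negative, which is why the test is one-sided.

```python
def haseman_elston(sib_pairs):
    """Least squares fit of (y1 - y2)^2 on pi.

    sib_pairs: list of (y1, y2, pi), where pi is the proportion of
    alleles shared IBD at the marker. Returns (intercept, slope).
    """
    xs = [pi for _, _, pi in sib_pairs]
    ys = [(y1 - y2) ** 2 for y1, y2, _ in sib_pairs]
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Toy data: pairs sharing more alleles IBD have more similar trait values
toy_pairs = [(3.0, 0.0, 0.0), (2.0, 0.0, 0.5), (1.0, 0.0, 1.0)]
intercept, slope = haseman_elston(toy_pairs)
print(intercept, slope)
```

A real analysis would also need a standard error for the slope to form the t-statistic, and the IBD proportions would be estimated from marker data rather than known.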
QTL mapping using a variance component model
Another way to test whether the covariance among
relatives' trait values is correlated with the IBD
sharing at a locus is to use a variance component
model.
A simple variance component model has one major trait
locus, a polygenic effect, and environmental factors that are
independent of genetic effects and independent across
family members (no household effects). The major gene
and polygenic effects are also independent.
[Diagram: the trait value is influenced by the QTL, polygenes and independent environment.]
Mathematically:
$Y_i = \mu + \beta^{T} a_i + g_i + q_i + e_i$
where $\mu$ is the population mean, $a_i$ are the “environmental”
predictor variables (with regression coefficients $\beta$), $q_i$ is the major trait locus effect, $g_i$ is the
polygenic effect, and $e_i$ is the residual error.
The variance of Y is:
$\mathrm{Var}(Y) = V_A + V_D + V_G + V_E$
and for relatives i and j:
$\mathrm{Cov}(Y_i, Y_j) = 2\hat{\phi}_{ij} V_A + \hat{\Delta}_{ij} V_D + 2\phi_{ij} V_G$
where
$V_A$ = additive genetic variance and $V_D$ = dominance genetic
variance at the major trait locus, $V_G$ = polygenic variance,
$\phi_{ij}$ = the theoretical kinship coefficient for i and j,
$\hat{\phi}_{ij}$ = the conditional kinship coefficient for i and j at a map
location,
$\hat{\Delta}_{ij}$ = the probability that i and j share both alleles ibd at a
map location.
Bottom line: the trait covariance increases as ibd sharing
increases.
Some things to consider:
The estimates of VA and VD are not the actual variances
due to the QTL - they depend on how far the map location
is from the QTL and the sampled data.
For a parent-child pair,
cov(Yi,Yj)=1/2 VA + 1/2 VG
for any map location.
Why is the conditional kinship coefficient always 1/4?
Why is the dominance variance missing from this
equation?
For two siblings i and j,
$\mathrm{Cov}(Y_i, Y_j) = 2\hat{\phi}_{ij} V_A + \hat{\Delta}_{ij} V_D + \tfrac{1}{2} V_G$
Often VD is assumed to be negligible (VD = 0):
Then for any relative pair i and j
$\mathrm{Cov}(Y_i, Y_j) = 2\hat{\phi}_{ij} V_A + 2\phi_{ij} V_G$
Under the null hypothesis of no linkage to the map location,
$\mathrm{Cov}(Y_i, Y_j) = 2\phi_{ij} V_A + 2\phi_{ij} V_G = 2\phi_{ij}(V_A + V_G)$
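The covariance formulas above can be checked numerically. The sketch below evaluates the covariance for a sibling pair at each possible IBD state, using the fact that the conditional kinship coefficient is half the proportion of alleles shared IBD. The function name and the variance values are hypothetical and serve only as an illustration.

```python
def trait_covariance(phi_hat, delta_hat, phi, v_a, v_d, v_g):
    """Cov(Yi, Yj) = 2*phi_hat*VA + delta_hat*VD + 2*phi*VG for i != j."""
    return 2 * phi_hat * v_a + delta_hat * v_d + 2 * phi * v_g


V_A, V_D, V_G = 0.3, 0.0, 0.4  # illustrative values, with VD assumed 0

sib_cov = {}
for n_ibd in (0, 1, 2):
    phi_hat = n_ibd / 4                      # half the proportion shared
    delta_hat = 1.0 if n_ibd == 2 else 0.0   # both alleles shared IBD?
    sib_cov[n_ibd] = trait_covariance(phi_hat, delta_hat, 0.25, V_A, V_D, V_G)
    print(n_ibd, sib_cov[n_ibd])

# With no linkage information, phi_hat falls back to the theoretical 1/4
null_cov = trait_covariance(0.25, 0.25, 0.25, V_A, V_D, V_G)
```

The covariance rises with the number of alleles shared IBD, and the no-information value equals 2 * (1/4) * (VA + VG), matching the null-hypothesis expression.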
Variance component methods of linkage analysis example
overview:
(1) Estimate the IBD sharing at specified locations along the
genome using marker data.
(2) Estimate the variance components VG and VE, under the null
model by maximizing the likelihood.
(3) Given the IBD sharing, estimate the variance components,
VA,VG and VE by maximizing the likelihood using the IBD
sharing at specified map positions Z.
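(4) Calculate the location score for each map position Z,
$\log_{10}\left[ L(Z) / L(Z = \infty) \right]$
where $L(Z = \infty)$ is the maximized likelihood under the null model of step (2).
Identify the map positions where the location score is large.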
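Step (4) is a simple transformation once the two maximized loglikelihoods are available. The sketch below assumes both are natural-log likelihoods, as optimizers usually report; the function name and the example numbers are hypothetical.

```python
import math


def location_score(loglik_at_z, loglik_null):
    """Base-10 log likelihood ratio from two natural-log likelihoods."""
    return (loglik_at_z - loglik_null) / math.log(10)


# Hypothetical maximized loglikelihoods (not from the source)
score = location_score(loglik_at_z=-512.3, loglik_null=-518.2)
print(round(score, 4))
```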
As the number of traits increases the complexity of the loglikelihood also increases
The loglikelihood is maximized using a steepest ascent algorithm.
It becomes more and more difficult to find the global maximum as
multiple local maxima exist.
One “solution” is to use several starting points for the maximization.
[Figure: loglikelihood curve with points labeled s1, s2, f1 and f2.]
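The multiple-starting-point idea can be sketched on a toy one-dimensional surface. The function below is a made-up curve with two local maxima, not a real loglikelihood, and the fixed-step ascent is a simplification of what a production optimizer does. All names are hypothetical.

```python
def loglik(x):
    """Toy surface with two local maxima (for illustration only)."""
    return -(x * x - 1.0) ** 2 + 0.3 * x


def gradient(x):
    """Derivative of the toy surface."""
    return -4.0 * x * (x * x - 1.0) + 0.3


def steepest_ascent(start, step=0.01, iterations=5000):
    """Fixed-step ascent: repeatedly move uphill along the gradient."""
    x = start
    for _ in range(iterations):
        x += step * gradient(x)
    return x


def multistart(starts):
    """Run the ascent from every start and keep the best end point."""
    ends = [steepest_ascent(s) for s in starts]
    return max(ends, key=loglik)


local = steepest_ascent(-1.5)               # ends near x = -0.96
best = multistart([-1.5, -0.2, 0.5, 1.5])   # ends near x = 1.04
print(local, best)
```

A single run from the left-hand start stops at the lower peak, while the multi-start search finds the higher one. More starting points raise the chance of reaching the global maximum but do not guarantee it.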
QTL Example:
The mystery trait example from the Mendel manual:
Besides the usual commands:
PREDICTOR = Grand :: Trait1
PREDICTOR = SEX :: Trait1
PREDICTOR = AGE :: Trait1
PREDICTOR = BMI :: Trait1
COEFFICIENT_FILE = Coefficient19b.in <ibd info from sibwalk
QUANTITATIVE_TRAIT = Trait1
COVARIANCE_CLASS = Additive <polygenic
COVARIANCE_CLASS = Environmental
COVARIANCE_CLASS = Qtl <now specify an additive qtl
GRID_INCREMENT = 0.005 <spacing of the map points
ANALYSIS_OPTION = Polygenic_Qtl
VARIABLE_FILE = Variable19b.in
PROBAND = 1
PROBAND_FACTOR = PROBAND
Results
• Get a summary file and a full output file
• The summary file looks like:
MARKER     MAP DISTANCE   LOCATION SCORE   AIC      NUMBER OF FACTORS
Marker01   0.0000         1.5892           6.6816   1
Marker02   0.0010         1.5679           6.7798   1
---        0.0050         1.6693           6.3126   1
---        0.0100         1.8603           5.4329   1
---        0.0150         2.1112           4.2778   1
---        0.0200         2.4028           2.9346   1
Marker03   0.0228         2.5740           2.1463   1
Marker04   0.0238         2.5757           2.1383   1
---        0.0250         2.5666           2.1804   1
---        0.0300         2.4896           2.5351   1
AIC = -2*ln(L(Z)) + 2n. The smaller the AIC the better the fit.
n = number of parameters - number of constraints
Factors will be explained in a little while.
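Because the location score is a base-10 log likelihood ratio against a fixed null model, two map positions fitted with the same n should differ in AIC by -2 * ln(10) times the difference in their location scores. The sketch below checks this against the first two rows of the summary file (Marker01 and Marker02); the function name is hypothetical.

```python
import math

# (location score, AIC) for Marker01 and Marker02 in the summary file
row1 = (1.5892, 6.6816)
row2 = (1.5679, 6.7798)


def aic_change_from_scores(score_a, score_b):
    """AIC(b) - AIC(a) implied by two location scores when n is unchanged.

    AIC = -2*ln L + 2n, and ln L differs between the two positions by
    ln(10) times the difference in location scores.
    """
    return -2.0 * math.log(10) * (score_b - score_a)


implied = aic_change_from_scores(row1[0], row2[0])
observed = row2[1] - row1[1]
print(round(implied, 4), round(observed, 4))
```

The two numbers agree to within the rounding of the printed output, so the AIC column carries the same ranking information as the location score when the number of parameters is fixed.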
There is more information in the output file including
parameter estimates. However, the estimates of locus
specific additive variance and narrow sense
heritability obtained from genome wide scans are
upwardly biased. Therefore these estimates could lead
one to overestimate the importance of the QTL in
determining trait values (Goring et al, 2001, AJHG 69:1357-1369).
The model we have been considering is very simple:
(1) When examining large pedigrees, it may be possible to
consider more realistic models for the environmental
covariance.
Example: Modeling common environmental effects using a
household indicator. $H_{ij} = 1$ if i and j are members of the same
household and $H_{ij} = 0$ if i and j are not members of the same
household.
$\mathrm{Cov}(Y_i, Y_j) = 2\hat{\phi}_{ij} V_A + \hat{\Delta}_{ij} V_D + 2\phi_{ij} V_G + H_{ij} V_c$
where $V_c$ is the variance due to the common household environment.
(2) The variance component model can use more than one
quantitative trait at once as the outcome.
Using more than one quantitative trait in the analysis
• The model extends so that multiple traits can be
considered at the same time.
• The phenotypic variance is now a matrix.
• The variance components get more complicated.
Instead of one term per variance component, there
are (1+…+n) = (n+1)*n/2 terms where n is the
number of quantitative traits.
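• As an example, consider two traits X and Y.
$\begin{pmatrix} V_{PX} & V_{PXY} \\ V_{PXY} & V_{PY} \end{pmatrix} = \begin{pmatrix} V_{gX} & V_{gXY} \\ V_{gXY} & V_{gY} \end{pmatrix} + \begin{pmatrix} V_{AX} & V_{AXY} \\ V_{AXY} & V_{AY} \end{pmatrix} + \begin{pmatrix} V_{eX} & V_{eXY} \\ V_{eXY} & V_{eY} \end{pmatrix}$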
For technical reasons it is better to reparameterize
the variances using a factor analytic approach
• Factor refers to hidden underlying variables that capture the
essence of the data
• Each variance component is parameterized in terms of factors.
• We will illustrate with the additive genetic variance matrix
for two traits X and Y (in principle any number of traits or
any of the components could have been used).
• There exists a matrix
$\Lambda_A = \begin{pmatrix} \lambda_{A1} & 0 \\ \lambda_{A12} & \lambda_{A2} \end{pmatrix}$ such that:
$V_{AX} = \lambda_{A1}^2,\quad V_{AXY} = \lambda_{A1}\lambda_{A12},\quad V_{AY} = \lambda_{A12}^2 + \lambda_{A2}^2$
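The three identities can be read off from a single matrix product: the additive genetic variance matrix is the lower-triangular factor matrix times its own transpose. The worked multiplication below writes the factor loadings as $\lambda_{A1}$, $\lambda_{A12}$ and $\lambda_{A2}$ and the matrix as $\Lambda_A$; these symbol names are an assumption, since the Greek letters did not survive in the slide text.

```latex
\Lambda_A \Lambda_A^{T}
= \begin{pmatrix} \lambda_{A1} & 0 \\ \lambda_{A12} & \lambda_{A2} \end{pmatrix}
  \begin{pmatrix} \lambda_{A1} & \lambda_{A12} \\ 0 & \lambda_{A2} \end{pmatrix}
= \begin{pmatrix} \lambda_{A1}^{2} & \lambda_{A1}\lambda_{A12} \\
                  \lambda_{A1}\lambda_{A12} & \lambda_{A12}^{2} + \lambda_{A2}^{2} \end{pmatrix}
= \begin{pmatrix} V_{AX} & V_{AXY} \\ V_{AXY} & V_{AY} \end{pmatrix}
```

Writing the matrix this way guarantees that the fitted variance matrix is symmetric and non-negative definite for any real values of the loadings.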
Can factors be used to search for pleiotropic
effects?
Could a single factor explain the QTL variance component?
A single factor is consistent with pleiotropy, although there
may be other explanations for a single factor.
When there are more than two traits we could have reduced
numbers of factors.
Reduction in Parameters
Recall the original factor matrix for the QTL
Set $\lambda_{A2} = 0$
$V_{AX} = \lambda_{A1}^2,\quad V_{AXY} = \lambda_{A1}\lambda_{A12},\quad V_{AY} = \lambda_{A12}^2$
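Dropping the second factor has a concrete consequence: the QTL covariance matrix becomes rank one, so the QTL effects on the two traits are perfectly correlated. The sketch below shows this numerically. The loading values are made up, and the function names and the `lam_` parameter names are hypothetical stand-ins for the factor loadings.

```python
def additive_covariance(lam_a1, lam_a12, lam_a2):
    """Return (VAX, VAXY, VAY) from lower-triangular factor loadings."""
    v_ax = lam_a1 ** 2
    v_axy = lam_a1 * lam_a12
    v_ay = lam_a12 ** 2 + lam_a2 ** 2
    return v_ax, v_axy, v_ay


def correlation(v_ax, v_axy, v_ay):
    """Correlation implied by a 2 x 2 variance matrix."""
    return v_axy / (v_ax * v_ay) ** 0.5


two_factor = additive_covariance(0.8, 0.5, 0.6)   # illustrative loadings
one_factor = additive_covariance(0.8, 0.5, 0.0)   # second loading set to 0

print(correlation(*two_factor))   # strictly between -1 and 1
print(correlation(*one_factor))   # magnitude 1: a single shared factor
```

The one-factor model also uses one fewer parameter per variance component, which is the reduction the AIC rewards if the fit does not get worse.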
Modifications to the control file
QUANTITATIVE_TRAIT = Trait1
QUANTITATIVE_TRAIT = Trait2
PREDICTOR = Grand :: Trait1
PREDICTOR = SEX :: Trait1
PREDICTOR = AGE :: Trait1
PREDICTOR = BMI :: Trait1
PREDICTOR = Grand :: Trait2
PREDICTOR = SEX :: Trait2
PREDICTOR = AGE :: Trait2
PREDICTOR = BMI :: Trait2
COVARIANCE_CLASS = Additive
COVARIANCE_CLASS = Environmental
COVARIANCE_CLASS = Qtl
One factor explains the results as well as two
1 factor:
MARKER     MAP DISTANCE   LOCATION SCORE   AIC       NUMBER OF FACTORS
Marker01   0.0000         1.6533           24.3863   1
Marker02   0.0010         1.6492           24.4052   1
---        0.0050         1.7558           23.9143   1
---        0.0100         1.9508           23.0161   1
.
.
.
Marker10   0.0931         0.5852           29.3049   1
Marker11   0.0941         0.5111           29.6464   1

2 factors:
MARKER     MAP DISTANCE   LOCATION SCORE   AIC       NUMBER OF FACTORS
Marker01   0.0000         1.6605           26.3529   2
Marker02   0.0010         1.6520           26.3925   2
---        0.0050         1.7568           25.9098   2
---        0.0100         1.9508           25.0162   2
.
.
.
Marker10   0.0931         0.5914           31.2764   2
Marker11   0.0941         0.5179           31.6149   2
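The claim that one factor does as well as two can be read directly from the printed results. The sketch below pairs up the rows of the one-factor and two-factor summaries by map position and reports how much the location score gains and how much the AIC rises when the second factor is added. The numbers are copied from the output above; the variable and function names are hypothetical.

```python
# (map distance, location score, AIC) copied from the two summaries
ONE_FACTOR = [
    (0.0000, 1.6533, 24.3863), (0.0010, 1.6492, 24.4052),
    (0.0050, 1.7558, 23.9143), (0.0100, 1.9508, 23.0161),
    (0.0931, 0.5852, 29.3049), (0.0941, 0.5111, 29.6464),
]
TWO_FACTOR = [
    (0.0000, 1.6605, 26.3529), (0.0010, 1.6520, 26.3925),
    (0.0050, 1.7568, 25.9098), (0.0100, 1.9508, 25.0162),
    (0.0931, 0.5914, 31.2764), (0.0941, 0.5179, 31.6149),
]


def compare(one, two):
    """Per map position: (position, score gain, AIC increase)."""
    rows = []
    for (z1, s1, a1), (z2, s2, a2) in zip(one, two):
        if z1 != z2:
            raise ValueError("rows must refer to the same map position")
        rows.append((z1, round(s2 - s1, 4), round(a2 - a1, 4)))
    return rows


comparison = compare(ONE_FACTOR, TWO_FACTOR)
for z, gain, aic_rise in comparison:
    print(z, gain, aic_rise)
```

At every listed position the second factor raises the location score by less than 0.01 while the AIC goes up by roughly 2, the penalty for one extra parameter, so the one-factor model is preferred.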
Summary
• Variance component models can be used to
understand the correlations among traits in
families
• They can also be used to map QTLs
• Variance component models provide a
powerful approach for multivariate
quantitative trait data.
