Introduction to Genetics


Introduction to Genetic
Analysis
Bruce Walsh
[email protected]
Ecology and Evolutionary Biology,
University of Arizona
Adjunct Appointments
Molecular and Cellular Biology
Plant Sciences
Epidemiology & Biostatistics
Animal Sciences
Outline
• Mendelian Genetics
– Genes, Chromosomes & DNA
– Mendel’s laws
– Linkage
– Linkage disequilibrium
• Quantitative Genetics
– Fisher’s decomposition of Genetic value
– Fisher’s decomposition of Genetic Variances
– Resemblance between relatives
– Searching for the underlying genes
Mendelian Genetics
Following a single (or several)
genes that we can directly score
Phenotype highly informative
as to genotype
Mendel’s Genes
Genes are discrete particles, with each parent passing
one copy to its offspring.
Let an allele be a particular copy of a gene. In diploids,
each individual carries two alleles for every gene, one
from each parent
Each parent contributes one of its two alleles (at
random) to its offspring
For example, a parent with genotype Aa (a heterozygote
for alleles A and a) has a 50% probability of passing an
A allele onto its offspring and a 50% probability of
passing along an a allele.
Example: Pea seed color
Mendel found that his pea lines differed in seed color,
with a single locus (with alleles Y and g) determining
green vs. yellow
YY (Y homozygote) --> yellow phenotype
Yg (heterozygote) --> yellow phenotype
gg (g homozygote) --> green phenotype
Y is dominant to g, g is recessive to Y
Note that in this simple case, each genotype maps
to a single phenotype
Likewise, the phenotype can tell us about the underlying
Genotype. Green = gg, Yellow = carries Y allele (Y-)
Cross Yg x Yg. Offspring are 1/4 YY, 1/2 Yg, 1/4 gg
3/4 yellow peas, 1/4 green peas
Cross Yg x gg. Offspring are 1/2Yg, 1/2 gg,
1/2 yellow, 1/2 green
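The two crosses above can be checked by enumerating the equally likely allele pairings. This is a minimal sketch, not part of the original slides; the function names `cross` and `yellow_fraction` are made up for illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def cross(parent1, parent2):
    """Offspring genotype frequencies for a single-locus cross.

    Each parent is a two-character genotype such as "Yg"; each of its
    two alleles is transmitted with probability 1/2.
    """
    counts = Counter()
    for a1, a2 in product(parent1, parent2):
        # Sort so that "gY" and "Yg" count as the same genotype.
        genotype = "".join(sorted((a1, a2)))
        counts[genotype] += Fraction(1, 4)
    return dict(counts)


def yellow_fraction(offspring):
    """Y is dominant, so any genotype carrying Y is yellow."""
    return sum(freq for genotype, freq in offspring.items() if "Y" in genotype)


f2 = cross("Yg", "Yg")
backcross = cross("Yg", "gg")
print(f2, yellow_fraction(f2))
print(backcross, yellow_fraction(backcross))
```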
Dealing with two (or more) genes
For 7 pea traits, Mendel observed Independent Assortment
The genotype at one locus is independent of the second
RR, Rr - round seeds, rr - wrinkled seeds
YY, Yg - yellow seeds, gg - green seeds
Pure round, green (RRgg) x pure wrinkled yellow (rrYY)
F1 --> RrYg = all round, yellow (Rg/rY)
What about the F2?
Let R- denote RR and Rr. R- are round. Note in F2,
Pr(R-) = 1/2 + 1/4 = 3/4, Pr(rr) = 1/4
Likewise, Y- are YY or Yg, and are yellow
Phenotype          Genotype   Frequency
Yellow, round      Y-R-       (3/4)*(3/4) = 9/16
Yellow, wrinkled   Y-rr       (3/4)*(1/4) = 3/16
Green, round       ggR-       (1/4)*(3/4) = 3/16
Green, wrinkled    ggrr       (1/4)*(1/4) = 1/16
Or a 9:3:3:1 ratio
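Under independent assortment the two-locus phenotype frequencies are simply products of the single-locus frequencies. A short sketch of that multiplication follows; the variable names are illustrative only.

```python
from fractions import Fraction

# Single-locus F2 phenotype probabilities (heterozygote x heterozygote).
seed_color = {"yellow": Fraction(3, 4), "green": Fraction(1, 4)}
seed_shape = {"round": Fraction(3, 4), "wrinkled": Fraction(1, 4)}

# Independent assortment: joint probability = product of the marginals.
f2_phenotypes = {
    (color, shape): p_color * p_shape
    for color, p_color in seed_color.items()
    for shape, p_shape in seed_shape.items()
}

# Express each frequency in sixteenths to recover the familiar ratio.
ratio = {pheno: int(freq * 16) for pheno, freq in f2_phenotypes.items()}
print(ratio)
```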
Mendel was wrong: Linkage
Bateson and Punnett looked at
flower color: P (purple) dominant over p (red)
pollen shape: L (long) dominant over l (round)
PPLL x ppll --> PL/pl F1
Phenotype      Genotype   Observed   Expected
Purple long    P-L-       284        215
Purple round   P-ll       21         71
Red long       ppL-       21         71
Red round      ppll       55         24
Excess of PL, pl gametes over Pl, pL
Departure from independent assortment -- why?
Chromosomal theory of inheritance
Early light microscope work on dividing cells revealed
small (usually) rod-shaped structures that appear to
pair during cell division. These are chromosomes.
It was soon postulated that Genes are carried
on chromosomes, because chromosomes behaved in a
fashion that would generate Mendel’s laws.
We now know that each chromosome consists of a
single double-stranded DNA molecule (covered with
proteins), and it is this DNA that codes for the genes.
Linkage
If genes are located on different chromosomes they
(with very few exceptions) show independent assortment.
Indeed, peas have only 7 chromosomes, so was Mendel lucky
in choosing seven traits at random that happen to all
be on different chromosomes? Ans: compute this probability.
However, genes on the same chromosome, especially if
they are close to each other, tend to be passed onto
their offspring in the same configuration as on the
parental chromosomes.
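The "was Mendel lucky?" question above can be answered under one simplifying assumption that the slides do not state: each gene is equally likely to sit on any of the 7 chromosomes, independently of the others. The function name `prob_all_unlinked` is hypothetical.

```python
from fractions import Fraction


def prob_all_unlinked(n_genes, n_chromosomes):
    """Probability that n_genes randomly placed genes all fall on
    different chromosomes (assumes equal-sized chromosomes and
    independent placement, which is only an approximation)."""
    prob = Fraction(1)
    for already_used in range(n_genes):
        # The next gene must avoid every chromosome used so far.
        prob *= Fraction(n_chromosomes - already_used, n_chromosomes)
    return prob


p_lucky = prob_all_unlinked(7, 7)
print(p_lucky, float(p_lucky))
```

Under this assumption the probability is 7!/7^7, well below 1%.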
Consider the Bateson-Punnett pea data
Let PL / pl denote that in the parent, one chromosome
carries the P and L alleles (at the flower color and
pollen shape loci, respectively), while the other
chromosome carries the p and l alleles.
Unless there is a recombination event, one of the two
parental chromosome types (PL or pl) is passed onto
the offspring. These are called the parental gametes.
However, if a recombination event occurs, a PL/pl
parent can generate Pl and pL recombinant gametes
to pass onto its offspring.
Linkage --> excess of parental gametes
Let c (or θ) denote the recombination frequency --- the
probability that a randomly-chosen gamete from the
parent is of the recombinant type (i.e., it is not a
parental gamete).
For a PL/pl parent, the gamete frequencies are
Gamete type   Frequency   Expectation under independent assortment
PL            (1-c)/2     1/4
pl            (1-c)/2     1/4
pL            c/2         1/4
Pl            c/2         1/4
Parental gametes in excess, as (1-c)/2 > 1/4 for c < 1/2
Recombinant gametes in deficiency, as c/2 < 1/4 for c < 1/2
In Bateson data, Freq(ppll) = 55/381 = 0.144.
Freq(ppll) = [(1-c)/2]^2,
Solving gives c = 0.24
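The estimate of c can be reproduced from the observed counts. The sketch below solves Freq(ppll) = [(1-c)/2]^2 for c, which gives c = 1 - 2*sqrt(Freq(ppll)); the dictionary layout is just one convenient way to hold the data.

```python
from math import sqrt

# Observed F2 counts from the Bateson-Punnett data.
observed = {"P-L-": 284, "P-ll": 21, "ppL-": 21, "ppll": 55}
total = sum(observed.values())

# ppll offspring need a pl gamete from each parent, and each pl gamete
# has frequency (1 - c)/2, so Freq(ppll) = [(1 - c)/2]^2.
freq_ppll = observed["ppll"] / total
c_hat = 1 - 2 * sqrt(freq_ppll)
print(total, round(freq_ppll, 3), round(c_hat, 2))
```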
Linkage is our friend
While linkage (at first blush) may seem a complication, it
is actually our friend, allowing us to map genes --
determining which genes are on which chromosomes and
also fine-mapping their position on a particular chromosome
Historically, the genes that have been mapped have
direct effects on phenotypes (pea color, fly eye color,
any number of simple human diseases, etc. )
In the molecular era, we are often concerned with
molecular markers, variations in the DNA sequence that
typically have no effect on phenotype
Genetic Maps and Mapping Functions
The unit of genetic distance between two markers is
the recombination frequency, c (also called θ)
If the phase of a parent is AB/ab, then 1-c is the
frequency of “parental” gametes (e.g., AB and ab),
while c is the frequency of “nonparental” gametes
(e.g., Ab and aB).
A parental gamete results from an EVEN number of
crossovers, e.g., 0, 2, 4, etc.
For a nonparental (also called a recombinant) gamete,
we need an ODD number of crossovers between A & B,
e.g., 1, 3, 5, etc.
Hence, simply using the frequency of “recombinant”
(i.e., nonparental) gametes UNDERESTIMATES
the number m of crossovers, with E[m] > c
In particular, c = Prob(odd number of crossovers)
Mapping functions attempt to estimate the expected
number of crossovers m from observed recombination
frequencies c
When considering more than two linked loci, the phenomenon
of interference must be taken into account
The presence of a crossover in one interval typically
decreases the likelihood of a nearby crossover
Suppose the order of the genes is A-B-C.
If there is no interference (i.e., crossovers occur
independently of each other) then
$c_{AC} = c_{AB}(1 - c_{BC}) + (1 - c_{AB})\,c_{BC} = c_{AB} + c_{BC} - 2c_{AB}c_{BC}$
Here c_AC = Probability(odd number of crossovers btw A and C)
First term: odd number of crossovers btw A & B and even number between B & C
Second term: even number of crossovers in A-B, odd number in B-C
We need to assume independence of crossovers in order
to multiply these two probabilities
When interference is present, we can write this as
$c_{AC} = c_{AB} + c_{BC} - 2(1 - \delta)\,c_{AB}c_{BC}$
δ is the interference parameter
δ = 0 --> No interference. Crossovers occur independently of each other
δ = 1 --> complete interference: The presence of a crossover
eliminates nearby crossovers
Mapping functions. Moving from c to m
Haldane’s mapping function (gives Haldane map
distances)
Assume the number k of crossovers in a region
follows a Poisson distribution with parameter m
This makes the assumption of NO INTERFERENCE
Pr(Poisson = k) = $\lambda^k e^{-\lambda}/k!$, where λ = expected number of successes
c = Prob(Odd number of crossovers):
$c = \sum_{k=0}^{\infty} p(m, 2k+1) = e^{-m}\sum_{k=0}^{\infty}\frac{m^{2k+1}}{(2k+1)!} = \frac{1 - e^{-2m}}{2}$
This gives the estimated Haldane distance as
$m = -\frac{\ln(1 - 2c)}{2}$
Usually reported in units of Morgans or Centimorgans (cM)
One morgan --> m = 1.0. One cM --> m = 0.01
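Haldane's mapping function and its inverse are easy to sketch in code. The function names below are illustrative, and the example value c = 0.24 is simply the estimate obtained earlier from the Bateson data.

```python
from math import exp, log


def haldane_c_from_m(m):
    """Recombination frequency implied by map distance m (in Morgans):
    the probability of an odd number of Poisson(m) crossovers."""
    return 0.5 * (1.0 - exp(-2.0 * m))


def haldane_m_from_c(c):
    """Haldane map distance (in Morgans) for recombination frequency c."""
    if not 0.0 <= c < 0.5:
        raise ValueError("c must lie in [0, 0.5)")
    return -0.5 * log(1.0 - 2.0 * c)


m_hat = haldane_m_from_c(0.24)
print(round(m_hat, 3), "Morgans =", round(100 * m_hat, 1), "cM")
```

The map distance (about 0.33 Morgans here) exceeds c, as expected since c undercounts multiple crossovers.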
Molecular Markers
You and your neighbor differ at roughly 22,000,000
nucleotides (base pairs) out of the roughly 3 billion
bp that make up the human genome
Hence, LOTS of molecular variation to exploit
SNP -- single nucleotide polymorphism. A particular
position on the DNA (say base 123,321 on chromosome 1)
that has two different nucleotides (say G or A) segregating
STR -- simple tandem arrays. An STR locus consists of
a number of short repeats, with alleles defined by
the number of repeats. For example, you might have
6 and 4 copies of the repeat on your two chromosome 7s
SNPs vs STRs
SNPs
Cons: Less polymorphic (at most 2 alleles)
Pros: Low mutation rates, alleles very stable
Excellent for looking at historical long-term
associations (association mapping)
STRs
Cons: High mutation rate
Pros: Very highly polymorphic
Excellent for linkage studies within an extended
Pedigree (QTL mapping in families or pedigrees)
Linkage disequilibrium
At linkage equilibrium (LE), alleles in gametes are independent of each other:
freq(AB) = freq(A) freq(B)
freq(ABC) = freq(A) freq(B) freq(C)
When linkage disequilibrium (LD) present, alleles are no
longer independent --- knowing that one allele is in the
gamete provides information on alleles at other loci
$\mathrm{freq}(AB) \neq \mathrm{freq}(A)\,\mathrm{freq}(B)$
The disequilibrium between alleles A and B is given by
$D_{AB} = \mathrm{freq}(AB) - \mathrm{freq}(A)\,\mathrm{freq}(B)$
Forces that Generate LD
• Selection
• Drift
• Migration (admixture)
• Mutation
• Population structure (stratification)
The Decay of Linkage Disequilibrium
The frequency of the AB gamete is given by
$\mathrm{freq}(AB) = \mathrm{freq}(A)\,\mathrm{freq}(B) + D_{AB}$
(the LE value plus the departure from LE)
If recombination frequency between the A and B loci
is c, the disequilibrium in generation t is
$D(t) = D(0)(1 - c)^t$, where D(0) is the initial LD value
Note that D(t) -> zero, although the approach can be
slow when c is very small
Not surprising that very tightly-linked markers
(c <<0.01) are often in LD
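To get a feel for how slowly LD decays for tightly linked loci, the decay formula can be evaluated directly. This is a small sketch; `generations_to_halve` is a hypothetical helper obtained by solving (1-c)^t = 1/2 for t.

```python
from math import log


def ld_after(d0, c, t):
    """Disequilibrium after t generations: D(t) = D(0) * (1 - c)^t."""
    return d0 * (1.0 - c) ** t


def generations_to_halve(c):
    """Generations needed for D to fall to half its initial value."""
    return log(0.5) / log(1.0 - c)


for c in (0.5, 0.1, 0.01, 0.001):
    print(c, round(generations_to_halve(c), 1))
```

For unlinked loci (c = 0.5) LD halves every generation, whereas for c = 0.01 it takes roughly 69 generations.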
Key Mendelian Concepts
• Genes, Chromosomes & DNA
• “Classical” vs Molecular markers
• Linkage
– Parental gametes in excess. Alleles at nearby
loci tend to segregate together
• Linkage disequilibrium (LD)
– Excess of parental gametes seen in any
particular cross
– LD implies a nonrandom association of alleles in the population
– Unlinked alleles can show LD due to population
structure
Quantitative Genetics
The analysis of traits whose
variation is determined by both
a number of genes and
environmental factors
Phenotype is highly uninformative as to
underlying genotype
Complex (or Quantitative) trait
• No (apparent) simple Mendelian basis for variation
in the trait
• May be a single gene strongly influenced by
environmental factors
• May be the result of a number of genes of equal
(or differing) effect
• Most likely, a combination of both multiple genes
and environmental factors
• Example: Blood pressure, cholesterol levels
– Known genetic and environmental risk factors
• Molecular traits can also be quantitative traits
– mRNA level on a microarray analysis
– Protein spot volume on a 2-D gel
Phenotypic distribution of a trait
Consider a specific locus influencing the trait
For this locus, mean phenotype = 0.15, while
overall mean phenotype = 0
Goals of Quantitative Genetics
• Partition total trait variation into genetic (nature)
vs. environmental (nurture) components
• Predict resemblance between relatives
– If a sib has a disease/trait, what are your odds?
• Find the underlying loci contributing to genetic
variation
– QTL -- quantitative trait loci
• Deduce molecular basis for genetic trait variation
• eQTLs -- expression QTLs, loci with a quantitative
influence on gene expression
– e.g., QTLs influencing mRNA abundance on a microarray
Dichotomous (binary) traits
Presence/absence traits (such as a disease) can
(and usually do) have a complex genetic basis
Consider a disease susceptibility (DS) locus underlying a
disease, with alleles D and d, where allele D significantly
increases your disease risk
In particular, Pr(disease | DD) = 0.5, so that the
Penetrance of genotype DD is 50%
Suppose Pr(disease | Dd ) = 0.2, Pr(disease | dd) = 0.05
dd individuals can rarely display the disease, largely
because of exposure to adverse environmental conditions
dd individuals can give rise to phenocopies 5% of the time,
showing the disease but not as a result of carrying the
risk allele
If freq(d) = 0.9, what is Prob (DD | show disease) ?
freq(disease) = 0.1^2*0.5 + 2*0.1*0.9*0.2 + 0.9^2*0.05
= 0.0815
From Bayes’ theorem,
Pr(DD | disease) = Pr(disease |DD)*Pr(DD)/Prob(disease)
= 0.1^2*0.5 / 0.0815 = 0.06 (6 %)
Pr(Dd | disease) = 0.442, Pr(dd | disease) = 0.497
Thus about 50% of the diseased individuals are phenocopies
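The Bayes calculation above can be written out in a few lines. The sketch assumes Hardy-Weinberg genotype frequencies, which is what the arithmetic on the slide implies; the variable names are illustrative.

```python
freq_d = 0.9
freq_D = 1.0 - freq_d

# Genotype frequencies, assuming Hardy-Weinberg proportions.
genotype_freq = {
    "DD": freq_D ** 2,
    "Dd": 2 * freq_D * freq_d,
    "dd": freq_d ** 2,
}
# Penetrances: Pr(disease | genotype), taken from the slide.
penetrance = {"DD": 0.5, "Dd": 0.2, "dd": 0.05}

# Population prevalence, then Bayes' theorem for each genotype.
prevalence = sum(genotype_freq[g] * penetrance[g] for g in genotype_freq)
posterior = {
    g: genotype_freq[g] * penetrance[g] / prevalence for g in genotype_freq
}
print(round(prevalence, 4), {g: round(p, 3) for g, p in posterior.items()})
```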
Basic model of Quantitative Genetics
Basic model: P = G + E
P = Phenotypic value -- we will occasionally also use z for this value
G = Genotypic value
E = Environmental value
G = average phenotypic value for that genotype
if we are able to replicate it over the universe
of environmental values, G = E[P]
G x E interaction --- G values are different
across environments. Basic model now
becomes P = G + E + GE
Contribution of a locus to a trait
Genotype               Q1Q1     Q2Q1          Q2Q2
Genotypic value        C        C + a(1+k)    C + 2a
(equivalently)         C        C + a + d     C + 2a
(or, recentered)       C - a    C + d         C + a
2a = G(Q2Q2) - G(Q1Q1)
d = ak = G(Q1Q2) - [G(Q2Q2) + G(Q1Q1)]/2
d measures dominance, with d = 0 if the heterozygote
is exactly intermediate to the two homozygotes
k = d/a is a scaled measure of the dominance
Example: Apolipoprotein E &
Alzheimer’s
Genotype               ee     Ee     EE
Average age of onset   68.4   75.5   84.3
2a = G(EE) - G(ee) = 84.3 - 68.4 --> a = 7.95
ak = d = G(Ee) - [G(EE)+G(ee)]/2 = -0.85
k = d/a = -0.11
Only small amount of dominance
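The values of a, d and k follow mechanically from the three genotypic means. A minimal sketch, with a hypothetical function name:

```python
def additive_dominance(g_low, g_het, g_high):
    """Return (a, d, k) from the genotypic values of the low homozygote,
    the heterozygote, and the high homozygote."""
    a = (g_high - g_low) / 2.0            # half the homozygote difference
    d = g_het - (g_high + g_low) / 2.0    # heterozygote minus the midpoint
    return a, d, d / a


# Average ages of onset for ee, Ee and EE from the table above.
a, d, k = additive_dominance(68.4, 75.5, 84.3)
print(round(a, 2), round(d, 2), round(k, 3))
```

The heterozygote sits slightly below the midpoint of the homozygotes, so d and k are small and negative.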
Covariances
• Cov(x,y) = E [x*y] - E[x]*E[y]
Cov(x,y) > 0, positive (linear) association between x & y
Cov(x,y) < 0, negative (linear) association between x & y
Cov(x,y) = 0, no linear association between x & y
Cov(x,y) = 0 DOES NOT imply no association
[Figure: scatterplots of Y against X illustrating cov(X,Y) > 0, cov(X,Y) < 0, and two cases with cov(X,Y) = 0]
Cov(x+y,z) = Cov(x,z) + Cov(y,z)
Cov(x,x) = Var(x)
Var(x+y) = Cov(x+y,x+y)
= Cov(x,x) + Cov(y,y) + 2Cov(x,y)
= Var(x) + Var(y) + 2 Cov(x,y)
Fisher’s (1918) Decomposition of G
One of Fisher’s key insights was that the genotypic value
consists of a fraction that can be passed from parent to
offspring and a fraction that cannot.
Consider the genotypic value G_ij resulting from an
A_iA_j individual
$G_{ij} = \mu_G + \alpha_i + \alpha_j + \delta_{ij}$
Mean value, with $\mu_G = \sum G_{ij} \cdot \mathrm{freq}(Q_iQ_j)$
Average contribution to genotypic value for allele i: α_i
Since parents pass along single alleles to their offspring,
the α_i (the average effect of allele i) represent these contributions
The genotypic value predicted from the individual
allelic effects is thus $\hat{G}_{ij} = \mu_G + \alpha_i + \alpha_j$
Dominance deviations --- the difference (for genotype
A_iA_j) between the genotypic value predicted from the
two single alleles and the actual genotypic value,
$G_{ij} - \hat{G}_{ij} = \delta_{ij}$
Fisher’s decomposition is a Regression
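$G_{ij} = \mu_G + \alpha_i + \alpha_j + \delta_{ij}$
Predicted value: $\mu_G + \alpha_i + \alpha_j$. Residual error: $\delta_{ij}$
A notational change clearly shows this is a regression,
$G_{ij} = \mu_G + 2\alpha_1 + (\alpha_2 - \alpha_1)N + \delta_{ij}$
Independent (predictor) variable: N = # of Q2 alleles
Intercept: $\mu_G + 2\alpha_1$
Regression slope: $\alpha_2 - \alpha_1$
Regression residual: $\delta_{ij}$
$2\alpha_1 + (\alpha_2 - \alpha_1)N = 2\alpha_1$ for N = 0; e.g., Q1Q1
$2\alpha_1 + (\alpha_2 - \alpha_1)N = \alpha_1 + \alpha_2$ for N = 1; e.g., Q1Q2
$2\alpha_1 + (\alpha_2 - \alpha_1)N = 2\alpha_2$ for N = 2; e.g., Q2Q2
[Figure: genotypic values G11, G21, G22 plotted against N = 0, 1, 2 with the fitted regression line of slope α2 - α1; panels for allele Q1 common, allele Q2 common, and both Q1 and Q2 frequent]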
Consider a diallelic locus, where p1 = freq(Q1)
Genotype          Q1Q1   Q2Q1     Q2Q2
Genotypic value   0      a(1+k)   2a
Mean
$\mu_G = 2p_2a(1 + p_1k)$
Allelic effects
$\alpha_2 = p_1a[1 + k(p_1 - p_2)]$
$\alpha_1 = -p_2a[1 + k(p_1 - p_2)]$
Dominance deviations
$\delta_{ij} = G_{ij} - \mu_G - \alpha_i - \alpha_j$
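The decomposition for a diallelic locus can be checked numerically. The sketch below assumes random mating (Hardy-Weinberg genotype frequencies), which the formulas above implicitly use; the function name and the example values p1 = 0.3, a = 1, k = 0.5 are arbitrary choices for illustration.

```python
def fisher_decomposition(p1, a, k):
    """Mean, average effects and dominance deviations for one locus with
    genotypic values 0, a(1+k), 2a and freq(Q1) = p1 (random mating)."""
    p2 = 1.0 - p1
    value = {"Q1Q1": 0.0, "Q1Q2": a * (1 + k), "Q2Q2": 2 * a}
    freq = {"Q1Q1": p1 ** 2, "Q1Q2": 2 * p1 * p2, "Q2Q2": p2 ** 2}

    mu = sum(value[g] * freq[g] for g in value)
    scale = a * (1 + k * (p1 - p2))
    alpha1, alpha2 = -p2 * scale, p1 * scale

    # Value predicted from the two alleles, then the residual (delta).
    predicted = {
        "Q1Q1": mu + 2 * alpha1,
        "Q1Q2": mu + alpha1 + alpha2,
        "Q2Q2": mu + 2 * alpha2,
    }
    delta = {g: value[g] - predicted[g] for g in value}
    return mu, alpha1, alpha2, delta, freq


mu, alpha1, alpha2, delta, freq = fisher_decomposition(0.3, 1.0, 0.5)
print(mu, alpha1, alpha2, delta)
```

Both the average effects and the dominance deviations average to zero over the population, as a regression intercept and residuals should.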
Average effects and Additive Genetic Values
The α values are the average effects of an allele
A key concept is the Additive Genetic Value (A) of
an individual
$A(G_{ij}) = \alpha_i + \alpha_j$
Summing over n loci,
$A = \sum_{k=1}^{n}\left(\alpha_i^{(k)} + \alpha_j^{(k)}\right)$
Why all the fuss over A?
Suppose father has A = 10 and mother has A = -2
for (say) blood pressure
Expected blood pressure in their offspring is (10-2)/2
= 4 units above the population mean.
Offspring A = Average of parental A’s
KEY: parents only pass single alleles to their offspring.
Hence, they only pass along the A part of their genotypic
Value G
Genetic Variances
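$G_{ij} = \mu_g + (\alpha_i + \alpha_j) + \delta_{ij}$
$\sigma^2(G) = \sigma^2(\mu_g + (\alpha_i + \alpha_j) + \delta_{ij}) = \sigma^2(\alpha_i + \alpha_j) + \sigma^2(\delta_{ij})$
As Cov(α,δ) = 0
Summing over n loci,
$\sigma^2(G) = \sum_{k=1}^{n}\sigma^2\left(\alpha_i^{(k)} + \alpha_j^{(k)}\right) + \sum_{k=1}^{n}\sigma^2\left(\delta_{ij}^{(k)}\right)$
$\sigma_G^2 = \sigma_A^2 + \sigma_D^2$
σ_A^2 = Additive Genetic Variance (or simply Additive Variance)
σ_D^2 = Dominance Genetic Variance (or simply dominance variance)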
Key concepts (so far)
• α_i = average effect of allele i
– Property of a single allele in a particular population
(depends on genetic background)
• A = Additive Genetic Value (A)
– A = sum (over all loci) of average effects
– Fraction of G that parents pass along to their offspring
– Property of an Individual in a particular population
• Var(A) = additive genetic variance
– Variance in additive genetic values
– Property of a population
• Can estimate A or Var(A) without knowing any of
the underlying genetical detail (forthcoming)
$\sigma_A^2 = 2E[\alpha^2] = 2\sum_{i=1}^{m}\alpha_i^2 p_i$
Since E[α] = 0,
Var(α) = E[(α - μ_α)^2] = E[α^2]
One locus, 2 alleles:
Genotype          Q1Q1   Q1Q2     Q2Q2
Genotypic value   0      a(1+k)   2a
$\sigma_A^2 = 2p_1p_2a^2[1 + k(p_1 - p_2)]^2$
When dominance is present, the additive variance is an
asymmetric function of allele frequencies
Dominance effects
$\sigma_D^2 = E[\delta^2] = \sum_{i=1}^{m}\sum_{j=1}^{m}\delta_{ij}^2\,p_ip_j$
One locus, 2 alleles:
$\sigma_D^2 = (2p_1p_2ak)^2$
Equals zero if k = 0
This is a symmetric function of allele frequencies
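As a sanity check on the one-locus formulas, the closed-form additive and dominance variances should add up to the total genotypic variance computed directly from the genotype frequencies. The sketch assumes random mating; `variance_components` is a hypothetical name.

```python
def variance_components(p1, a, k):
    """Return (var_G, var_A, var_D) for one diallelic locus with
    genotypic values 0, a(1+k), 2a and freq(Q1) = p1."""
    p2 = 1.0 - p1
    values = [0.0, a * (1 + k), 2 * a]
    freqs = [p1 ** 2, 2 * p1 * p2, p2 ** 2]

    # Total genotypic variance, straight from the definition.
    mu = sum(g * f for g, f in zip(values, freqs))
    var_g = sum(f * (g - mu) ** 2 for g, f in zip(values, freqs))

    # Closed-form components.
    var_a = 2 * p1 * p2 * a ** 2 * (1 + k * (p1 - p2)) ** 2
    var_d = (2 * p1 * p2 * a * k) ** 2
    return var_g, var_a, var_d


var_g, var_a, var_d = variance_components(0.3, 1.0, 0.5)
print(round(var_g, 4), round(var_a, 4), round(var_d, 4))
```

Swapping p1 for 1 - p1 leaves the dominance variance unchanged but alters the additive variance whenever k is nonzero, which is the symmetry contrast noted above.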
[Figure: additive variance VA plotted against allele frequency p, with no dominance (k = 0)]
[Figure: VA and VD plotted against allele frequency p, with complete dominance (k = 1)]
Epistasis
$G_{ijkl} = \mu_G + (\alpha_i + \alpha_j + \alpha_k + \alpha_l) + (\delta_{ij} + \delta_{kl})$
$+ (\alpha\alpha_{ik} + \alpha\alpha_{il} + \alpha\alpha_{jk} + \alpha\alpha_{jl})$
$+ (\alpha\delta_{ikl} + \alpha\delta_{jkl} + \alpha\delta_{kij} + \alpha\delta_{lij})$
$+ (\delta\delta_{ijkl})$
$= \mu_G + A + D + AA + AD + DD$
A: Additive Genetic value
D: Dominance value -- interaction between the two alleles at a locus
AA: Additive x Additive interactions -- interactions between a single
allele at one locus with a single allele at another
AD: Additive x Dominance interactions -- interactions between an allele
at one locus with the genotype at another, e.g. allele A_i and genotype B_kj
DD: Dominance x dominance interaction --- the interaction between the
dominance deviation at one locus with the dominance deviation at another.
These components are defined to be uncorrelated
(or orthogonal), so that
$\sigma_G^2 = \sigma_A^2 + \sigma_D^2 + \sigma_{AA}^2 + \sigma_{AD}^2 + \sigma_{DD}^2$
Heritability
• Central concept in quantitative genetics
• Proportion of variation due to additive genetic
values
– h^2 = VA/VP
– Phenotypes (and hence VP) can be directly measured
– Breeding values (and hence VA ) must be estimated
• Estimates of VA require known collections of
relatives
Key observations
• The amount of phenotypic resemblance
among relatives for the trait provides an
indication of the amount of genetic
variation for the trait.
• If trait variation has a significant genetic
basis, the closer the relatives, the more
similar their appearance
Genetic Covariance between
relatives
Genetic covariances arise because two related
individuals are more likely to share alleles than are
two unrelated individuals.
Sharing alleles means having alleles that are
identical by descent (IBD): both copies of an allele
can be traced back to a single copy in a
recent common ancestor.
[Figure: Father and Mother with offspring, illustrating no alleles IBD, one allele IBD, and both alleles IBD]
Parent-offspring genetic covariance
Cov(Gp, Go) --- Parents and offspring share
EXACTLY one allele IBD
Denote this common allele by A1
$G_p = A_p + D_p = \alpha_1 + \alpha_x + D_{1x}$
$G_o = A_o + D_o = \alpha_1 + \alpha_y + D_{1y}$
Here α_1 comes from the IBD allele, while α_x and α_y come from the non-IBD alleles
$\mathrm{Cov}(G_o, G_p) = \mathrm{Cov}(\alpha_1 + \alpha_x + D_{1x},\ \alpha_1 + \alpha_y + D_{1y})$
$= \mathrm{Cov}(\alpha_1, \alpha_1) + \mathrm{Cov}(\alpha_1, \alpha_y) + \mathrm{Cov}(\alpha_1, D_{1y})$
$+ \mathrm{Cov}(\alpha_x, \alpha_1) + \mathrm{Cov}(\alpha_x, \alpha_y) + \mathrm{Cov}(\alpha_x, D_{1y})$
$+ \mathrm{Cov}(D_{1x}, \alpha_1) + \mathrm{Cov}(D_{1x}, \alpha_y) + \mathrm{Cov}(D_{1x}, D_{1y})$
All covariance terms other than Cov(α_1, α_1) are zero.
• By construction, α and D are uncorrelated
• By construction, α from non-IBD alleles are
uncorrelated
• By construction, D values are uncorrelated unless
both alleles are IBD
Cov(α_x, α_y) = 0 if x ≠ y; i.e., not IBD
Cov(α_x, α_y) = Var(A)/2 if x = y; i.e., IBD
Var(A) = Var(α_1 + α_2) = 2Var(α_1)
so that
Var(α_1) = Cov(α_1, α_1) = Var(A)/2
Hence, relatives sharing one allele IBD have a
genetic covariance of Var(A)/2
The resulting parent-offspring genetic covariance
becomes Cov(Gp,Go) = Var(A)/2
Half-sibs
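Each sib gets exactly one allele from common father,
different alleles from the different mothers
[Figure: half-sib pedigree with a common father, two mothers (1 and 2) and their offspring (o1 and o2)]
The half-sibs share one allele IBD: occurs with probability 1/2
The half-sibs share no alleles IBD: occurs with probability 1/2
Hence, the genetic covariance of half-sibs is just
(1/2)Var(A)/2 = Var(A)/4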
Full-sibs
[Figure: full-sib pedigree with Father, Mother and two offspring]
Each sib gets exactly one allele from each parent
Paternal allele: IBD [ Prob = 1/2 ], not IBD [ Prob = 1/2 ]
Maternal allele: IBD [ Prob = 1/2 ], not IBD [ Prob = 1/2 ]
-> Prob(both alleles IBD) = 1/2*1/2 = 1/4
Prob(zero alleles IBD) = 1/2*1/2 = 1/4
Prob(exactly one allele IBD) = 1/2
= 1 - Prob(0 IBD) - Prob(2 IBD)
Resulting Genetic Covariance between full-sibs
IBD alleles   Probability   Contribution
0             1/4           0
1             1/2           Var(A)/2
2             1/4           Var(A) + Var(D)
Cov(Full-sibs) = Var(A)/2 + Var(D)/4
Genetic Covariances for General Relatives
Let r = (1/2)Prob(1 allele IBD) + Prob(2 alleles IBD)
Let u = Prob(both alleles IBD)
General genetic covariance between relatives
Cov(G) = rVar(A) + uVar(D)
When epistasis is present, additional terms appear
r^2 Var(AA) + r u Var(AD) + u^2 Var(DD) + r^3 Var(AAA) + ...
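The general formula reproduces each of the special cases derived earlier once the IBD probabilities are plugged in. The sketch below ignores epistasis; the dictionary of relative types and the example variance values (VA = 1.0, VD = 0.4) are illustrative assumptions.

```python
def relatedness_coefficients(p_zero, p_one, p_two):
    """Return (r, u) from the probabilities of sharing 0, 1 or 2 alleles IBD."""
    if abs(p_zero + p_one + p_two - 1.0) > 1e-12:
        raise ValueError("IBD probabilities must sum to 1")
    return 0.5 * p_one + p_two, p_two


def genetic_covariance(ibd_probs, var_a, var_d):
    """Cov(G) = r*Var(A) + u*Var(D), with no epistatic terms."""
    r, u = relatedness_coefficients(*ibd_probs)
    return r * var_a + u * var_d


# (Prob 0 IBD, Prob 1 IBD, Prob 2 IBD) for some common relative pairs.
IBD_PROBS = {
    "parent-offspring": (0.0, 1.0, 0.0),
    "half-sibs": (0.5, 0.5, 0.0),
    "full-sibs": (0.25, 0.5, 0.25),
    "monozygotic twins": (0.0, 0.0, 1.0),
}

for pair, probs in IBD_PROBS.items():
    print(pair, genetic_covariance(probs, var_a=1.0, var_d=0.4))
```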
Shared environmental values
Cov(P1, P2) = Cov(G1+E1, G2+E2) = Cov(G1,G2) + Cov(E1,E2)
(assuming genotypic and environmental values are uncorrelated)
In humans, relatives (esp. family members) often
share environments as well as sharing genes
Shared maternal effects potentially important as well
Sample Covariances
Cov(monozygotic twins) = VA + VD + Cov(E)
Cov(dizygotic twins) = VA/2 + VD/4 + Cov(E)
Cov(parent, offspring) = VA/2
Hence, can estimate genetic variance components
From phenotypic covariances using known sets of relatives
More generally, use all comparisons between relatives in
a complex pedigree (REML estimate of variances)
Relative risks for binary traits
Let z1 and z2 denote the trait state (0,1) in two
relatives.
Recurrence risk, KR (for relatives of type R)
= Prob(z2 =1 | z1 = 1)
James’ identity: KR = K + Cov(z1,z2)/K
where K = Prob(z=1), i.e., the population prevalence
Relative risk, λ_R = KR/K
Risch’s identity: λ_R = 1 + Cov(z1,z2)/K^2
Searching for QTLs: Marker-Trait Associations
I. Within a pedigree
Key: With linkage = excess of parental gametes
MQ/mq father -- M associated with QTL allele
Q (which increases trait value over q). Comparing
mean trait values in offspring for paternal-M vs.
paternal-m will show (for sufficiently large sample)
a significant difference.
Since the phase may differ across parents (e.g.,
mother might be Mq/mQ), critical to contrast
marker alleles from each parent separately
Searching for QTLs: Marker-Trait Associations
II. Population-level linkage disequilibrium
Key: With LD, covariance between alleles
For very tightly-linked markers (less than 1 cM), might
expect some population-level disequilibrium
Hence, can contrast (say) M vs. m grouped over all
individuals to look for a difference in trait value btw
the two groups.
If marker locus is sufficiently close to a QTL, LD might
be present and a marker-trait association detected.
Complication: Population structure can generate a
covariance btw unlinked markers
Key concepts
• P=G+E=A+D+I+E
• Var(G) = Var(A) + Var(D) + Var(I)
• Phenotypic covariances can be used to
estimate components of Var(G)
• h^2 = Var(A)/Var(P) is the heritability of a
trait, measure of how parents & offspring
resemble each other
• Can use linkage (within a pedigree) or
linkage disequilibrium (within a population)
to search for QTLs via marker-trait
associations