Introduction to Genetics

Introduction to Basic and
Quantitative Genetics
Darwin & Mendel
• Darwin (1859) Origin of Species
– Instant Classic, major immediate impact
– Problem: Model of Inheritance
• Darwin assumed Blending inheritance
• Offspring = average of both parents
• zo = (zm + zf)/2
• Fleeming Jenkin (1867) pointed out problem
– Var(zo) = Var[(zm + zf)/2] = (1/2) Var(parents)
– Hence, under blending inheritance, half the variation is
removed each generation and this must somehow be
replenished by mutation.
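Jenkin's point can be checked numerically. The following is a minimal simulation sketch, not part of the original slides; it assumes random mating, a normally distributed starting trait, and a hypothetical population size of 20,000.

```python
import random
import statistics

def blend_generation(pop, rng):
    """Each offspring is the average of two parents drawn at random (blending)."""
    return [(rng.choice(pop) + rng.choice(pop)) / 2 for _ in range(len(pop))]

rng = random.Random(0)                       # fixed seed, for reproducibility only
pop = [rng.gauss(0.0, 1.0) for _ in range(20000)]

variances = [statistics.pvariance(pop)]
for _ in range(4):
    pop = blend_generation(pop, rng)
    variances.append(statistics.pvariance(pop))

# Ratio of trait variance in each generation to that of the generation before
ratios = [variances[i + 1] / variances[i] for i in range(4)]
print([round(r, 2) for r in ratios])
```

Each ratio should come out close to 1/2, matching Var(zo) = (1/2) Var(parents).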
Mendel
• Mendel (1865), Experiments in Plant
Hybridization
• No impact, paper essentially ignored
– Ironically, Darwin had an apparently unread
copy in his library
– Why ignored? Perhaps too mathematical for
19th century biologists
• The rediscovery in 1900 (by three
independent groups)
• Mendel’s key idea: Genes are discrete
particles passed on intact from parent to
offspring
Mendel’s experiments with the Garden Pea
7 traits examined
Mendel crossed a pure-breeding yellow pea line
with a pure-breeding green line.
Let P1 denote the pure-breeding yellow line (parental line 1)
and P2 the pure-breeding green line (parental line 2)
The F1, or first filial, generation is the cross of
P1 x P2 (yellow x green).
All resulting F1 were yellow
The F2, or second filial, generation is a cross of two F1’s
In F2, 1/4 are green, 3/4 are yellow
This outbreak of variation blows the theory of blending
inheritance right out of the water.
Mendel also observed that the P1, F1 and F2 Yellow
lines behaved differently when crossed to pure green
P1 yellow x P2 (pure green) --> all yellow
F1 yellow x P2 (pure green) --> 1/2 yellow, 1/2 green
F2 yellow x P2 (pure green) --> 2/3 yellow, 1/3 green
Mendel’s explanation
Genes are discrete particles, with each parent passing
one copy to its offspring.
Let an allele be a particular copy of a gene. In Diploids,
each parent carries two alleles for every gene
Pure Yellow parents have two Y (or yellow) alleles
We can thus write their genotype as YY
Likewise, pure green parents have two g (or green) alleles
Their genotype is thus gg
Since there are lots of genes, we refer to a particular gene
by a given name, say the pea-color gene (or locus)
Each parent contributes one of its two alleles (at
random) to its offspring
Hence, a YY parent always contributes a Y, while
a gg parent always contributes a g
An individual carrying only one type of allele
(e.g. YY or gg) is said to be a homozygote
In the F1, YY x gg --> all individuals are Yg
An individual carrying two types of alleles is
said to be a heterozygote.
The phenotype of an individual is the trait value we
observe
For this particular gene, the map from genotype to
phenotype is as follows:
YY --> yellow
Yg --> yellow
gg --> green
Since the Yg heterozygote has the same phenotypic
value as the YY homozygote, we say (equivalently)
Y is dominant to g, or
g is recessive to Y
Explaining the crosses
F1 x F1 --> Yg x Yg
Prob(YY) = Pr(Y from dad)*Pr(Y from mom) = (1/2)*(1/2) = 1/4
Prob(gg) = Pr(g from dad)*Pr(g from mom) = (1/2)*(1/2) = 1/4
Prob(Yg) = Pr(Y from dad)*Pr(g from mom) + Pr(g from dad)*Pr(Y from mom) = 1/2
Equivalently, Prob(Yg) = 1 - Pr(YY) - Pr(gg) = 1/2
Hence, Prob(Yellow phenotype) = Pr(YY) + Pr(Yg) = 3/4
Prob(green phenotype) = Pr(gg) = 1/4
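The same bookkeeping can be done by enumerating the four equally likely allele pairs. This is an illustrative sketch; the helper names cross and prob_yellow are hypothetical, and genotypes are written as two-character strings such as "Yg".

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def cross(parent1, parent2):
    """Offspring genotype probabilities for a one-locus cross.
    Each parent transmits either of its two alleles with probability 1/2."""
    probs = Counter()
    for a1, a2 in product(parent1, parent2):
        genotype = "".join(sorted(a1 + a2))   # "gY" and "Yg" are the same genotype
        probs[genotype] += Fraction(1, 4)
    return dict(probs)

def prob_yellow(genotype_probs):
    """Y is dominant, so any genotype carrying at least one Y is yellow."""
    return sum(p for g, p in genotype_probs.items() if "Y" in g)

f2 = cross("Yg", "Yg")                        # F1 x F1

# Genotype mix among the yellow F2 individuals only (1/3 YY, 2/3 Yg)
f2_yellow = {g: p / prob_yellow(f2) for g, p in f2.items() if "Y" in g}

# Yellow F2 crossed to pure green: average over the yellow F2 genotypes
backcross_f2_yellow = sum(w * prob_yellow(cross(g, "gg")) for g, w in f2_yellow.items())
print(f2, prob_yellow(f2), backcross_f2_yellow)
```

The last value reproduces the 2/3 yellow observed when F2 yellows are crossed to pure green.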
Dealing with two (or more) genes
For his 7 traits, Mendel observed Independent Assortment
The genotype at one locus is independent of the second
RR, Rr - round seeds, rr - wrinkled seeds
Pure round, green (RRgg) x pure wrinkled yellow (rrYY)
F1 --> RrYg = round, yellow
What about the F2?
Let R- denote RR and Rr. R- are round. Note in F2,
Pr(R-) = 1/2 + 1/4 = 3/4
Likewise, Y- are YY or Yg, and are yellow
Phenotype          Genotype   Frequency
Yellow, round      Y-R-       (3/4)*(3/4) = 9/16
Yellow, wrinkled   Y-rr       (3/4)*(1/4) = 3/16
Green, round       ggR-       (1/4)*(3/4) = 3/16
Green, wrinkled    ggrr       (1/4)*(1/4) = 1/16
Or a 9:3:3:1 ratio
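The 9:3:3:1 ratio can also be recovered by brute force, pairing every gamete of one RrYg parent with every gamete of the other. A small sketch under independent assortment (the function name phenotype is hypothetical):

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Gametes of an RrYg F1 under independent assortment: one allele per locus,
# each of the four combinations (RY, Rg, rY, rg) with probability 1/4.
gametes = [r + y for r, y in product("Rr", "Yg")]

def phenotype(g1, g2):
    """Phenotype of the offspring formed from gametes g1 and g2."""
    shape = "round" if "R" in (g1[0], g2[0]) else "wrinkled"
    color = "Yellow" if "Y" in (g1[1], g2[1]) else "Green"
    return f"{color}, {shape}"

f2 = Counter()
for g1, g2 in product(gametes, repeat=2):     # 16 equally likely gamete pairs
    f2[phenotype(g1, g2)] += Fraction(1, 16)

for name, p in f2.items():
    print(name, p)
```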
Probabilities for more complex genotypes
Cross AaBBCcDD X aaBbCcDd
What is Pr(aaBBCCDD)?
Under independent assortment,
= Pr(aa)*Pr(BB)*Pr(CC)*Pr(DD)
= (1/2*1)*(1*1/2)*(1/2*1/2)*(1*1/2) = 1/2^5 = 1/32
What is Pr(AaBbCc)?
= Pr(Aa)*Pr(Bb)*Pr(Cc) = (1/2)*(1/2)*(1/2) = 1/8
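The product rule above generalizes to any number of unlinked loci. A minimal sketch, with hypothetical helper names locus_prob and genotype_prob and genotypes given as lists of two-character strings, one per locus:

```python
from fractions import Fraction
from itertools import product

def locus_prob(p1, p2, target):
    """Pr(offspring genotype == target) at a single locus for the cross p1 x p2."""
    hits = sum(sorted(a + b) == sorted(target) for a, b in product(p1, p2))
    return Fraction(hits, 4)                  # four equally likely allele pairs

def genotype_prob(parent1, parent2, target):
    """Multiply per-locus probabilities (independent assortment assumed)."""
    prob = Fraction(1)
    for p1, p2, t in zip(parent1, parent2, target):
        prob *= locus_prob(p1, p2, t)
    return prob

parent1 = ["Aa", "BB", "Cc", "DD"]
parent2 = ["aa", "Bb", "Cc", "Dd"]

p_aaBBCCDD = genotype_prob(parent1, parent2, ["aa", "BB", "CC", "DD"])
# Pr(AaBbCc) ignores the D locus, so only the first three loci are used
p_AaBbCc = genotype_prob(parent1[:3], parent2[:3], ["Aa", "Bb", "Cc"])
print(p_aaBBCCDD, p_AaBbCc)
```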
Mendel was wrong: Linkage
Bateson and Punnett looked at
flower color: P (purple) dominant over p (red)
pollen shape: L (long) dominant over l (round)
Phenotype      Genotype   Observed   Expected
Purple long    P-L-       284        215
Purple round   P-ll       21         71
Red long       ppL-       21         71
Red round      ppll       55         24
Excess of PL, pl gametes over Pl, pL
Departure from independent assortment
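The expected counts are simply the 9:3:3:1 proportions applied to the 381 observed plants. The sketch below recomputes them and, as an addition not in the original slides, summarizes the departure with a Pearson chi-square statistic.

```python
observed = {"Purple long": 284, "Purple round": 21, "Red long": 21, "Red round": 55}
ratio = {"Purple long": 9, "Purple round": 3, "Red long": 3, "Red round": 1}

total = sum(observed.values())

# Expected counts under independent assortment (9:3:3:1 out of 16)
expected = {k: total * ratio[k] / 16 for k in observed}

# Pearson chi-square statistic: sum of (O - E)^2 / E over the four classes
chi_square = sum((observed[k] - expected[k]) ** 2 / expected[k] for k in observed)

for k in observed:
    print(f"{k:13s} observed {observed[k]:3d}   expected {expected[k]:6.1f}")
print("chi-square =", round(chi_square, 1))
```

With four classes there are 3 degrees of freedom, for which the 5% critical value is about 7.81, so a statistic this large indicates a clear departure from independent assortment.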
Linkage
If genes are located on different chromosomes they
(with very few exceptions) show independent assortment.
Indeed, peas have only 7 pairs of chromosomes, so was Mendel lucky
in choosing seven traits at random that happen to all
be on different chromosomes? Problem: compute this probability.
However, genes on the same chromosome, especially if
they are close to each other, tend to be passed on to
their offspring in the same configuration as on the
parental chromosomes.
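One way to approach the probability problem posed above is sketched here. It assumes, purely for illustration, that each of the seven genes is equally likely to fall on any of the seven chromosomes, independently of the others; real chromosomes differ in size, so this is only a rough answer.

```python
from fractions import Fraction
from math import factorial

n_chromosomes = 7
n_traits = 7

# Number of ways to place 7 genes on 7 distinct chromosomes (7!),
# divided by all possible placements (7^7).
p_all_different = Fraction(factorial(n_chromosomes), n_chromosomes ** n_traits)
print(p_all_different, float(p_all_different))
```

Under these assumptions the chance is well below 1%.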
Consider the Bateson-Punnett pea data
Let PL / pl denote that in the parent, one chromosome
carries the P and L alleles (at the flower color and
pollen shape loci, respectively), while the other
chromosome carries the p and l alleles.
Unless there is a recombination event, one of the two
parental chromosome types (PL or pl) is passed on to
the offspring. These are called the parental gametes.
However, if a recombination event occurs, a PL/pl
parent can generate Pl and pL recombinant chromosomes
to pass onto its offspring.
Let c denote the recombination frequency --- the
probability that a randomly-chosen gamete from the
parent is of the recombinant type (i.e., it is not a
parental gamete).
For a PL/pl parent, the gamete frequencies are
Gamete type   Frequency   Expectation under independent assortment
PL            (1-c)/2     1/4
pl            (1-c)/2     1/4
pL            c/2         1/4
Pl            c/2         1/4
Parental gametes in excess, as (1-c)/2 > 1/4 for c < 1/2
Recombinant gametes in deficiency, as c/2 < 1/4 for c < 1/2
Expected genotype frequencies under linkage
Suppose we cross PL/pl X PL/pl parents
What are the expected frequencies in their offspring?
Pr(PPLL) = Pr(PL|father)*Pr(PL|mother)
= [(1-c)/2]*[(1-c)/2] = (1-c)^2/4
Likewise, Pr(ppll) = (1-c)^2/4
Recall from previous data that freq(ppll) = 55/381 = 0.144
Hence, (1-c)^2/4 = 0.144, or c = 0.24
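Solving the last equation for c gives c = 1 - 2*sqrt(freq(ppll)). A short sketch of that arithmetic, using the observed counts from the Bateson-Punnett table (the variable names are illustrative):

```python
from math import sqrt

counts = {"P-L-": 284, "P-ll": 21, "ppL-": 21, "ppll": 55}
n_total = sum(counts.values())

# The double recessive class has expected frequency (1 - c)^2 / 4
freq_ppll = counts["ppll"] / n_total

# Invert (1 - c)^2 / 4 = freq(ppll) to estimate the recombination frequency
c_hat = 1 - 2 * sqrt(freq_ppll)
print(round(freq_ppll, 3), round(c_hat, 2))
```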
A (slightly) more complicated case
Again, assume the parents are both PL/pl.
Compute Pr(PpLl)
Two situations, as PpLl could be PL/pl or Pl/pL
Pr(PL/pl) = Pr(PL|dad)*Pr(pl|mom) + Pr(PL|mom)*Pr(pl|dad)
= [(1-c)/2]*[(1-c)/2] + [(1-c)/2]*[(1-c)/2] = (1-c)^2/2
Pr(Pl/pL) = Pr(Pl|dad)*Pr(pL|mom) + Pr(Pl|mom)*Pr(pL|dad)
= (c/2)*(c/2) + (c/2)*(c/2) = c^2/2
Thus, Pr(PpLl) = (1-c)^2/2 + c^2/2
Generally, to compute the expected genotype
probabilities, we need to consider the frequencies
of gametes produced by both parents.
Suppose dad = Pl/pL, mom = PL/pl
Pr(PPLL) = Pr(PL|dad)*Pr(PL|mom)
= [c/2]*[(1-c)/2]
Notation: when a parent is PL/pl, we say that alleles P and L
are in coupling
When a parent is Pl/pL, we say that P and L are in repulsion
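The coupling and repulsion cases can be handled by one routine that returns a parent's gamete frequencies given its phase. A sketch with hypothetical function names and the illustrative value c = 0.24:

```python
def gamete_freqs(phase, c):
    """Gamete frequencies for a double heterozygote.
    phase: "coupling" (PL/pl) or "repulsion" (Pl/pL); c: recombination frequency."""
    parental, recombinant = (1 - c) / 2, c / 2
    if phase == "coupling":
        return {"PL": parental, "pl": parental, "Pl": recombinant, "pL": recombinant}
    if phase == "repulsion":
        return {"Pl": parental, "pL": parental, "PL": recombinant, "pl": recombinant}
    raise ValueError("phase must be 'coupling' or 'repulsion'")

def offspring_prob(dad, mom, gamete_pairs):
    """Sum Pr(dad's gamete) * Pr(mom's gamete) over the listed (dad, mom) pairs."""
    return sum(dad[g1] * mom[g2] for g1, g2 in gamete_pairs)

c = 0.24
coupling = gamete_freqs("coupling", c)
repulsion = gamete_freqs("repulsion", c)

# Both parents PL/pl
pr_PPLL = offspring_prob(coupling, coupling, [("PL", "PL")])
pr_PpLl = offspring_prob(
    coupling, coupling,
    [("PL", "pl"), ("pl", "PL"), ("Pl", "pL"), ("pL", "Pl")],
)

# dad = Pl/pL (repulsion), mom = PL/pl (coupling)
pr_PPLL_mixed = offspring_prob(repulsion, coupling, [("PL", "PL")])
print(pr_PPLL, pr_PpLl, pr_PPLL_mixed)
```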
Molecular Markers
You and your neighbor differ at roughly 22,000,000
nucleotides (base pairs) out of the roughly 3 billion
bp that make up the human genome
Hence, LOTS of molecular variation to exploit
SNP -- single nucleotide polymorphism. A particular
position on the DNA (say base 123,321 on chromosome 1)
that has two different nucleotides (say G or A) segregating
STR -- simple (short) tandem repeats. An STR locus consists of
a number of short repeats, with alleles defined by
the number of repeats. For example, you might have
6 and 4 copies of the repeat on your two chromosome 7s
SNPs vs STRs
SNPs
Cons: Less polymorphic (at most 2 alleles)
Pros: Low mutation rates, alleles very stable
Excellent for looking at historical long-term
associations (association mapping)
STRs
Cons: High mutation rate
Pros: Very highly polymorphic
Excellent for linkage studies within an extended
Pedigree (QTL mapping in families or pedigrees)
Quantitative Genetics
The analysis of traits whose
variation is determined by both
a number of genes and
environmental factors
Phenotype is highly uninformative as to
underlying genotype
Complex (or Quantitative) trait
• No (apparent) simple Mendelian basis for variation in the
trait
• May be a single gene strongly influenced by environmental
factors
• May be the result of a number of genes of equal (or
differing) effect
• Most likely, a combination of both multiple genes and
environmental factors
• Example: Blood pressure, cholesterol levels
– Known genetic and environmental risk factors
• Molecular traits can also be quantitative traits
– mRNA level on a microarray analysis
– Protein spot volume on a 2-D gel
Phenotypic distribution of a trait
Consider a specific locus influencing the trait
For this locus, mean phenotype = 0.15, while
overall mean phenotype = 0
Basic model of Quantitative Genetics
Basic model: P = G + E
P = Phenotypic value -- we will occasionally
also use z for this value
G = Genotypic value
E = Environmental value
G = average phenotypic value for that genotype
if we are able to replicate it over the universe
of environmental values, G = E[P]
G x E interaction --- G values are different
across environments. Basic model now
becomes P = G + E + GE
Contribution of a locus to a trait
Genotype          Q1Q1    Q2Q1         Q2Q2
Genotypic value   C       C + a(1+k)   C + 2a
                  C       C + a + d    C + 2a
                  C - a   C + d        C + a
2a = G(Q2Q2) - G(Q1Q1)
d = ak = G(Q1Q2) - [G(Q2Q2) + G(Q1Q1)]/2
d measures dominance, with d = 0 if the heterozygote
is exactly intermediate to the two homozygotes
k = d/a is a scaled measure of the dominance
Example: Apolipoprotein E &
Alzheimer’s
Genotype               ee     Ee     EE
Average age of onset   68.4   75.5   84.3
2a = G(EE) - G(ee) = 84.3 - 68.4 --> a = 7.95
ak = d = G(Ee) - [G(EE) + G(ee)]/2 = -0.85
k = d/a = -0.11
Only small amount of dominance
Example: Booroola (B) gene
Genotype              bb     Bb     BB
Average Litter size   1.48   2.17   2.66
2a = G(BB) - G(bb) = 2.66 - 1.48 --> a = 0.59
ak = d = G(Bb) - [G(BB) + G(bb)]/2 = 0.10
k = d/a = 0.17
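Both examples apply the same three formulas to the genotypic means. A small sketch (the function name locus_effects is hypothetical) that recomputes a, d and k from the tabulated values:

```python
def locus_effects(g_low, g_het, g_high):
    """Return (a, d, k) from the three genotypic means.
    2a is the difference between the homozygotes, d is the deviation of the
    heterozygote from their midpoint, and k = d/a."""
    a = (g_high - g_low) / 2
    d = g_het - (g_high + g_low) / 2
    return a, d, d / a

apoe = locus_effects(68.4, 75.5, 84.3)        # ee, Ee, EE: age of onset
booroola = locus_effects(1.48, 2.17, 2.66)    # bb, Bb, BB: litter size

print([round(x, 2) for x in apoe])
print([round(x, 2) for x in booroola])
```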
Fisher’s (1918) Decomposition of G
One of Fisher’s key insights was that the genotypic value
consists of a fraction that can be passed from parent to
offspring and a fraction that cannot.
Consider the genotypic value $G_{ij}$ resulting from an $A_iA_j$ individual
$G_{ij} = \mu_G + \alpha_i + \alpha_j + \delta_{ij}$
Mean value, with $\mu_G = \sum G_{ij} \cdot \mathrm{freq}(Q_iQ_j)$
Average contribution to genotypic value for allele i: $\alpha_i$
Since parents pass along single alleles to their offspring,
the $\alpha_i$ (the average effect of allele i)
represent these contributions
The genotypic value predicted from the individual
allelic effects is thus $\hat{G}_{ij} = \mu_G + \alpha_i + \alpha_j$
Dominance deviations -- the difference (for genotype
$A_iA_j$) between the genotypic value predicted from the
two single alleles and the actual genotypic value,
$G_{ij} - \hat{G}_{ij} = \delta_{ij}$
Fisher’s decomposition is a Regression
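$G_{ij} = \mu_G + \alpha_i + \alpha_j + \delta_{ij}$
Predicted value: $\mu_G + \alpha_i + \alpha_j$
Residual error: $\delta_{ij}$
A notational change clearly shows this is a regression,
$G_{ij} = \mu_G + 2\alpha_1 + (\alpha_2 - \alpha_1)N + \delta_{ij}$
Intercept: $\mu_G + 2\alpha_1$
Regression slope: $\alpha_2 - \alpha_1$
Independent (predictor) variable: N = # of Q2 alleles
Regression residual: $\delta_{ij}$
$2\alpha_1 + (\alpha_2 - \alpha_1)N = \begin{cases} 2\alpha_1 & \text{for } N = 0, \text{ e.g., } Q_1Q_1 \\ \alpha_1 + \alpha_2 & \text{for } N = 1, \text{ e.g., } Q_1Q_2 \\ 2\alpha_2 & \text{for } N = 2, \text{ e.g., } Q_2Q_2 \end{cases}$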
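[Figure: genotypic values G11, G21, G22 plotted against N = 0, 1, 2, with a regression line of slope = $\alpha_2 - \alpha_1$. Panels: allele Q1 common, $\alpha_2 > \alpha_1$; allele Q2 common, $\alpha_1 > \alpha_2$; both Q1 and Q2 frequent, $\alpha_1 = \alpha_2 = 0$]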
Consider a diallelic locus, where p1 = freq(Q1)
Genotype          Q1Q1   Q2Q1     Q2Q2
Genotypic value   0      a(1+k)   2a
Mean: $\mu_G = 2p_2 a(1 + p_1 k)$
Allelic effects:
$\alpha_2 = p_1 a [1 + k(p_1 - p_2)]$
$\alpha_1 = -p_2 a [1 + k(p_1 - p_2)]$
Dominance deviations: $\delta_{ij} = G_{ij} - \mu_G - \alpha_i - \alpha_j$
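The average effects can be obtained numerically as a weighted regression of genotypic value on N, the number of Q2 alleles. The sketch below assumes random mating (Hardy-Weinberg genotype frequencies), which the formula for the mean implies; the function name and the example values p1 = 0.3, a = 1, k = 0.5 are illustrative only.

```python
def average_effects(p1, a, k):
    """One locus, two alleles, random mating assumed.
    Genotypic values: Q1Q1 = 0, Q1Q2 = a(1+k), Q2Q2 = 2a."""
    p2 = 1 - p1
    freq = {0: p1 ** 2, 1: 2 * p1 * p2, 2: p2 ** 2}   # keyed by N = # of Q2 alleles
    G = {0: 0.0, 1: a * (1 + k), 2: 2 * a}

    mean_G = sum(freq[n] * G[n] for n in freq)
    mean_N = sum(freq[n] * n for n in freq)
    cov_NG = sum(freq[n] * (n - mean_N) * (G[n] - mean_G) for n in freq)
    var_N = sum(freq[n] * (n - mean_N) ** 2 for n in freq)

    slope = cov_NG / var_N          # regression slope = alpha_2 - alpha_1
    alpha1 = -p2 * slope            # follows from p1*alpha_1 + p2*alpha_2 = 0
    alpha2 = p1 * slope
    # Dominance deviations are the regression residuals
    delta = {n: G[n] - mean_G - (2 - n) * alpha1 - n * alpha2 for n in freq}
    return mean_G, alpha1, alpha2, delta, freq

p1, a, k = 0.3, 1.0, 0.5
mean_G, alpha1, alpha2, delta, freq = average_effects(p1, a, k)
print(round(mean_G, 4), round(alpha1, 4), round(alpha2, 4))
print({n: round(d, 4) for n, d in delta.items()})
```

The regression-based values agree with the closed-form expressions for the mean and the two average effects, and the dominance deviations average to zero over genotypes.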
Average effects and Additive Genetic Values
The $\alpha$ values are the average effects of an allele
A key concept is the Additive Genetic Value (A) of
an individual
$A(G_{ij}) = \alpha_i + \alpha_j$
$A = \sum_{k=1}^{n} \left( \alpha_i^{(k)} + \alpha_j^{(k)} \right)$, where k indexes the n loci
Why all the fuss over A?
Suppose father has A = 10 and mother has A = -2
for (say) blood pressure
Expected blood pressure in their offspring is (10-2)/2
= 4 units above the population mean.
Offspring A = Average of parental A's
KEY: parents only pass single alleles to their offspring.
Hence, they only pass along the A part of their genotypic
value G
Genetic Variances
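$G_{ij} = \mu_G + (\alpha_i + \alpha_j) + \delta_{ij}$
$\sigma^2(G) = \sigma^2(\mu_G + (\alpha_i + \alpha_j) + \delta_{ij}) = \sigma^2(\alpha_i + \alpha_j) + \sigma^2(\delta_{ij})$
As Cov($\alpha$, $\delta$) = 0
Summing over n loci,
$\sigma^2(G) = \sum_{k=1}^{n} \sigma^2(\alpha_i^{(k)} + \alpha_j^{(k)}) + \sum_{k=1}^{n} \sigma^2(\delta_{ij}^{(k)})$
$\sigma^2_G = \sigma^2_A + \sigma^2_D$
$\sigma^2_A$: Additive Genetic Variance (or simply Additive Variance)
$\sigma^2_D$: Dominance Genetic Variance (or simply dominance variance)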
Key concepts (so far)
• $\alpha_i$ = average effect of allele i
– Property of a single allele in a particular population (depends on
genetic background)
• A = Additive Genetic Value (A)
– A = sum (over all loci) of average effects
– Fraction of G that parents pass along to their offspring
– Property of an Individual in a particular population
• Var(A) = additive genetic variance
– Variance in additive genetic values
– Property of a population
• Can estimate A or Var(A) without knowing any of the
underlying genetical detail (forthcoming)
$\sigma^2_A = 2E[\alpha^2] = 2 \sum_{i=1}^{m} \alpha_i^2 p_i$
One locus, 2 alleles:
Genotype          Q1Q1   Q1Q2     Q2Q2
Genotypic value   0      a(1+k)   2a
Since $E[\alpha] = 0$, $\mathrm{Var}(\alpha) = E[(\alpha - \mu_\alpha)^2] = E[\alpha^2]$
$\sigma^2_A = 2p_1 p_2 a^2 [1 + k(p_1 - p_2)]^2$
Dominance affects the additive variance: when dominance is present,
$\sigma^2_A$ is an asymmetric function of allele frequencies
$\sigma^2_D = E[\delta^2] = \sum_{i=1}^{m} \sum_{j=1}^{m} \delta_{ij}^2 p_i p_j$
One locus, 2 alleles:
$\sigma^2_D = (2p_1 p_2 ak)^2$
Equals zero if k = 0
This is a symmetric function of allele frequencies
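A quick numerical check of the two one-locus formulas: under random mating (assumed here), the additive and dominance variances should sum to the total genetic variance computed directly from the genotype frequencies. The function names and parameter values are illustrative.

```python
def genetic_variances(p1, a, k):
    """Additive and dominance variance for one locus with two alleles."""
    p2 = 1 - p1
    var_A = 2 * p1 * p2 * a ** 2 * (1 + k * (p1 - p2)) ** 2
    var_D = (2 * p1 * p2 * a * k) ** 2
    return var_A, var_D

def total_genetic_variance(p1, a, k):
    """Var(G) computed directly from Hardy-Weinberg genotype frequencies."""
    p2 = 1 - p1
    freq = [p1 ** 2, 2 * p1 * p2, p2 ** 2]
    G = [0.0, a * (1 + k), 2 * a]
    mean = sum(f * g for f, g in zip(freq, G))
    return sum(f * (g - mean) ** 2 for f, g in zip(freq, G))

# Complete dominance (k = 1) at two mirror-image allele frequencies
for p1 in (0.3, 0.7):
    var_A, var_D = genetic_variances(p1, a=1.0, k=1.0)
    var_G = total_genetic_variance(p1, a=1.0, k=1.0)
    print(p1, round(var_A, 4), round(var_D, 4), round(var_G, 4))
```

The dominance variance is the same at p1 = 0.3 and p1 = 0.7, while the additive variance is not, illustrating the symmetric versus asymmetric dependence on allele frequency.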
[Figure: additive variance VA plotted against allele frequency p, with no dominance (k = 0)]
[Figure: VA and VD plotted against allele frequency p, under complete dominance (k = 1)]
Epistasis
$G_{ijkl} = \mu_G + (\alpha_i + \alpha_j + \alpha_k + \alpha_l) + (\delta_{ij} + \delta_{kl})$
$+ (\alpha\alpha_{ik} + \alpha\alpha_{il} + \alpha\alpha_{jk} + \alpha\alpha_{jl})$
$+ (\alpha\delta_{ikl} + \alpha\delta_{jkl} + \alpha\delta_{kij} + \alpha\delta_{lij})$
$+ (\delta\delta_{ijkl})$
$= \mu_G + A + D + AA + AD + DD$
A: Additive Genetic value
D: Dominance value -- interaction between the two alleles at a locus
AA: Additive x Additive interactions -- interactions between a single allele
at one locus with a single allele at another
AD: Additive x Dominance interactions -- interactions between an allele at
one locus with the genotype at another, e.g. allele Ai and genotype Bkl
DD: Dominance x dominance interaction -- the interaction between the dominance
deviation at one locus with the dominance deviation at another.
These components are defined to be uncorrelated (or orthogonal), so that
$\sigma^2_G = \sigma^2_A + \sigma^2_D + \sigma^2_{AA} + \sigma^2_{AD} + \sigma^2_{DD}$
Resemblance Between
Relatives
Heritability
• Central concept in quantitative genetics
• Proportion of variation due to additive genetic
values (Breeding values)
– h^2 = VA/VP
– Phenotypes (and hence VP) can be directly measured
– Breeding values (and hence VA ) must be estimated
• Estimates of VA require known collections of
relatives
Ancestral relatives, e.g., parent and offspring
Collateral relatives, e.g., sibs
[Figure: diagrams of relative designs. Parents 1, 2, 3, ... are each crossed (X) to give offspring o1, o2, o3, ..., ok; further panels labeled Half-sibs and Full-sibs show families 1, 2, ..., n, each with offspring 1, 2, 3, ..., k]
Key observations
• The amount of phenotypic resemblance
among relatives for the trait provides an
indication of the amount of genetic
variation for the trait.
• If trait variation has a significant genetic
basis, the closer the relatives, the more
similar their appearance
Genetic Covariance between
relatives
Genetic covariances arise because two related
individuals are more likely to share alleles than
are two unrelated individuals.
Sharing alleles means having alleles that are
identical by descent (IBD): both copies
can be traced back to a single copy in a
recent common ancestor.
[Figure: Father and Mother, with offspring pairs sharing no alleles IBD, one allele IBD, or both alleles IBD]
Parent-offspring genetic covariance
Cov(Gp, Go) --- Parents and offspring share
EXACTLY one allele IBD
Denote this common allele by A1
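$G_p = A_p + D_p = \alpha_1 + \alpha_x + D_{1x}$
$G_o = A_o + D_o = \alpha_1 + \alpha_y + D_{1y}$
Here $\alpha_1$ is the IBD allele, while $\alpha_x$ and $\alpha_y$ are the non-IBD alleles
$\mathrm{Cov}(G_o, G_p) = \mathrm{Cov}(\alpha_1 + \alpha_x + D_{1x}, \alpha_1 + \alpha_y + D_{1y})$
$= \mathrm{Cov}(\alpha_1, \alpha_1) + \mathrm{Cov}(\alpha_1, \alpha_y) + \mathrm{Cov}(\alpha_1, D_{1y})$
$+ \mathrm{Cov}(\alpha_x, \alpha_1) + \mathrm{Cov}(\alpha_x, \alpha_y) + \mathrm{Cov}(\alpha_x, D_{1y})$
$+ \mathrm{Cov}(D_{1x}, \alpha_1) + \mathrm{Cov}(D_{1x}, \alpha_y) + \mathrm{Cov}(D_{1x}, D_{1y})$
All covariance terms other than $\mathrm{Cov}(\alpha_1, \alpha_1)$ are zero.
• By construction, $\alpha$ and D are uncorrelated
• By construction, $\alpha$ from non-IBD alleles are
uncorrelated
• By construction, D values are uncorrelated unless
both alleles are IBD
$\mathrm{Cov}(\alpha_x, \alpha_y) = \begin{cases} 0 & \text{if } x \neq y, \text{ i.e., not IBD} \\ \mathrm{Var}(A)/2 & \text{if } x = y, \text{ i.e., IBD} \end{cases}$
$\mathrm{Var}(A) = \mathrm{Var}(\alpha_1 + \alpha_2) = 2\mathrm{Var}(\alpha_1)$
so that $\mathrm{Var}(\alpha_1) = \mathrm{Cov}(\alpha_1, \alpha_1) = \mathrm{Var}(A)/2$
Hence, relatives sharing one allele IBD have a
genetic covariance of Var(A)/2
The resulting parent-offspring genetic covariance
becomes Cov(Gp,Go) = Var(A)/2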
Half-sibs
Each sib gets exactly one
allele from common father,
different alleles from the
different mothers
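[Figure: half-sib offspring o1 and o2 of a common father]
The half-sibs share one allele IBD
• occurs with probability 1/2
The half-sibs share no alleles IBD
• occurs with probability 1/2
Hence, the genetic covariance of half-sibs is just
(1/2)Var(A)/2 = Var(A)/4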
Full-sibs
Each sib gets exactly one allele
from each parent
[Figure: Father and Mother with two full-sib offspring]
Paternal allele IBD [Prob = 1/2], not IBD [Prob = 1/2]
Maternal allele IBD [Prob = 1/2], not IBD [Prob = 1/2]
-> Prob(both alleles IBD) = 1/2*1/2 = 1/4
Prob(zero alleles IBD) = 1/2*1/2 = 1/4
Prob(exactly one allele IBD) = 1/2
= 1 - Prob(0 IBD) - Prob(2 IBD)
Resulting Genetic Covariance between full-sibs
IBD alleles   Probability   Contribution
0             1/4           0
1             1/2           Var(A)/2
2             1/4           Var(A) + Var(D)
Cov(Full-sibs) = Var(A)/2 + Var(D)/4
Genetic Covariances for General Relatives
Let r = (1/2)Prob(1 allele IBD) + Prob(2 alleles IBD)
Let u = Prob(both alleles IBD)
General genetic covariance between relatives
Cov(G) = rVar(A) + uVar(D)
When epistasis is present, additional terms appear
r^2 Var(AA) + r u Var(AD) + u^2 Var(DD) + r^3 Var(AAA) + ...
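The coefficients r and u follow directly from the IBD-sharing probabilities derived for each type of relative. A minimal sketch, ignoring epistasis; the function names are hypothetical and the variance components passed in at the end are made-up illustration values, not data.

```python
from fractions import Fraction

def relationship_coefficients(p0, p1, p2):
    """r and u from the probabilities of sharing 0, 1 or 2 alleles IBD."""
    if p0 + p1 + p2 != 1:
        raise ValueError("IBD probabilities must sum to 1")
    r = Fraction(1, 2) * p1 + p2
    u = p2
    return r, u

def genetic_covariance(r, u, var_A, var_D):
    """Cov(G) = r*Var(A) + u*Var(D), with no epistatic terms."""
    return r * var_A + u * var_D

half, quarter = Fraction(1, 2), Fraction(1, 4)
ibd_probs = {
    "parent-offspring": (Fraction(0), Fraction(1), Fraction(0)),
    "half-sibs": (half, half, Fraction(0)),
    "full-sibs": (quarter, half, quarter),
}
coefficients = {name: relationship_coefficients(*p) for name, p in ibd_probs.items()}

for name, (r, u) in coefficients.items():
    # Var(A) = 4 and Var(D) = 2 are arbitrary illustration values
    print(name, "r =", r, "u =", u, "Cov =", genetic_covariance(r, u, 4, 2))
```

This recovers Var(A)/2 for parent-offspring, Var(A)/4 for half-sibs, and Var(A)/2 + Var(D)/4 for full-sibs.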
Components of the Environmental Variance
E = Ec + Es
E = Total environmental value
Ec = Common environmental value experienced by all members of
a family, e.g., shared maternal effects
Es = Specific environmental value, any unique environmental effects
experienced by the individual
The Environmental variance can thus be written
in terms of variance components as
VE = VEc + VEs
One can decompose the environment further, if
desired. For example, plant breeders have terms
for the location variance, the year variance, and the
location x year variance.
Shared Environmental Effects contribute
to the phenotypic covariances of relatives
Cov(P1,P2) = Cov(G1+E1,G2+E2)
= Cov(G1,G2) + Cov(E1,E2)
Shared environmental values are expected
when sibs share the same mom, so that
Cov(Full sibs) and Cov(Maternal half-sibs)
not only contain a genetic covariance, but
an environmental covariance as well, VEc
Cov(Full-sibs) = Var(A)/2 + Var(D)/4 + VEc