Introduction to Genetics - Bruce Walsh's Home Page
Introduction to Genetics
Topics
• Darwin and Mendel
• Probability
• Mendelian genetics
– Mendel's experiments
– Mendel's laws
• Introduction to Population Genetics
• Introduction to Quantitative Genetics
Darwin & Mendel
• Darwin (1859) Origin of Species
– Instant Classic, major immediate impact
– Problem: Model of Inheritance
• Darwin assumed blending inheritance
• Offspring = average of both parents
• zo = (zm + zf)/2
• Fleeming Jenkin (1867) pointed out a problem
– Var(zo) = Var[(zm + zf)/2] = (1/2) Var(parents)
– Hence, under blending inheritance, half the variation is
removed each generation and this must somehow be
replenished by mutation.
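Jenkin's argument can be checked numerically. The sketch below assumes the two parental values are uncorrelated and have equal variance; the function name `blending_variance` is illustrative and not part of the lecture.

```python
def blending_variance(v0, generations):
    """Trait variance after each generation of blending inheritance.

    Assumes uncorrelated parents with equal variance, so that
    Var[(zm + zf)/2] = (Var(zm) + Var(zf))/4 = Var(parents)/2.
    """
    variances = [v0]
    for _ in range(generations):
        variances.append(variances[-1] / 2)
    return variances


print(blending_variance(1.0, 5))
```

After five generations only 1/32 of the initial variance is left, which is why blending inheritance would need a steady supply of new variation.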
Mendel
• Mendel (1865), Experiments in Plant
Hybridization
• No impact, paper essentially ignored
– Ironically, Darwin had an apparently unread
copy in his library
– Why ignored? Perhaps too mathematical for
19th century biologists
• The rediscovery in 1900 (by three
independent groups)
• Mendel’s key idea: Genes are discrete
particles passed on intact from parent to
offspring
Probability & Genetics
Since genes are passed on at random, an understanding
of probability is critical to understanding genetics
Let A denote an event of interest (getting a head on
the flip of a coin, rolling a 5 on a die, getting a QQ
genotype)
Let Pr(A) denote the probability that event A occurs
• Pr(A) falls between 0 and 1
• The sum of the probabilities for all (non-overlapping)
events is one --- Probabilities sum to one
• Pr(not A) = 1-Pr(A)
Example
Consider the offspring in a cross of two Qq parents
What is the probability that an offspring is
Anything EXCEPT qq?
Pr(not qq) = 1- Pr(qq) = 1-1/4 = 3/4
The AND Rule
Suppose the events A and B are independent: knowing that A has occurred does not change the
probability that B occurs.
Then Pr(A AND B) = Pr(A)*Pr(B)
“AND Rule”: if you see “AND”, multiply probabilities
Pr(A AND B AND C) = Pr(A)*Pr(B)*Pr(C)
The OR Rule
Suppose the events A and B are mutually exclusive (non-overlapping)
For example, A = roll an even number on a die and B = roll
a six are NOT mutually exclusive, but if B = roll a 5 they
are
Then Pr(A OR B) = Pr(A) + Pr(B)
“OR Rule”: if you see “OR”, add probabilities
Genetics examples
Again consider offspring from Qq x Qq cross
Prob(Not qq) = Pr(QQ or Qq) = Pr(QQ) + Pr(Qq) = 3/4
Pr(QQ) = Pr(Q from father AND Q from mother)
= Pr(Q from father)*Pr(Q from mother)
= (1/2)*(1/2) = 1/4
Pr(Qq) = Pr([f = Q AND m = q] OR [f = q AND m = Q])
= Pr(f = Q AND m = q) + Pr(f = q AND m = Q)
= Pr(f = Q)*Pr(m = q) + Pr(f = q)*Pr(m = Q)
= (1/2)*(1/2) + (1/2)*(1/2) = 1/2
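The AND and OR rules can be applied mechanically by enumerating the four equally likely combinations of paternal and maternal alleles. This is a minimal sketch; the helper name `cross` and the use of two-letter strings for genotypes are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import product


def cross(father, mother):
    """Offspring genotype probabilities for a one-locus cross.

    Each parent passes one of its two alleles with probability 1/2
    (AND rule: multiply, giving 1/4 per combination). The different
    ways of producing the same genotype are mutually exclusive
    (OR rule: add).
    """
    probs = {}
    for f_allele, m_allele in product(father, mother):
        genotype = "".join(sorted(f_allele + m_allele))  # "qQ" -> "Qq"
        probs[genotype] = probs.get(genotype, 0) + Fraction(1, 4)
    return probs


offspring = cross("Qq", "Qq")
print(offspring)
print(1 - offspring["qq"])  # Pr(not qq)
```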
Conditional probability
Let Pr(A | B) denote the probability of A given that we observe event B
Pr(A | B) = Pr(A and B) / Pr(B) = Pr(A,B)/Pr(B)
Pr(A,B) is called the joint probability of A & B
Example: Suppose QQ and Qq give purple offspring,
while qq gives green offspring. What is the probability
that a purple offspring from a Qq x Qq cross is QQ?
Pr(QQ | F1 Purple) = Pr(QQ and Purple)/Pr(Purple)
= (1/4)/(3/4) = 1/3
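The same conditional probability can be computed directly from the definition Pr(A | B) = Pr(A and B)/Pr(B). The dictionaries and the function name below are illustrative assumptions, not notation from the lecture.

```python
from fractions import Fraction

# Genotype probabilities among offspring of a Qq x Qq cross
genotype_prob = {"QQ": Fraction(1, 4), "Qq": Fraction(1, 2), "qq": Fraction(1, 4)}
# QQ and Qq are purple, qq is green
is_purple = {"QQ": True, "Qq": True, "qq": False}


def pr_genotype_given_purple(genotype):
    """Pr(genotype | purple) = Pr(genotype AND purple) / Pr(purple)."""
    pr_purple = sum(p for g, p in genotype_prob.items() if is_purple[g])
    pr_joint = genotype_prob[genotype] if is_purple[genotype] else Fraction(0)
    return pr_joint / pr_purple


for g in genotype_prob:
    print(g, pr_genotype_given_purple(g))
```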
Mendel’s experiments with the Garden Pea
7 traits examined
Mendel crossed a pure-breeding yellow pea line
with a pure-breeding green line.
Let P1 denote the pure-breeding yellow line (parental line 1) and
P2 the pure-breeding green line (parental line 2)
The F1, or first filial, generation is the offspring of the cross
P1 x P2 (yellow x green).
All resulting F1 were yellow
The F2, or second filial, generation is the offspring of a cross of two F1’s
In F2, 1/4 are green, 3/4 are yellow
This outbreak of variation blows the theory of blending
inheritance right out of the water.
Mendel also observed that the P1, F1 and F2 Yellow
lines behaved differently when crossed to pure green
P1 yellow x P2 (pure green) --> all yellow
F1 yellow x P2 (pure green) --> 1/2 yellow, 1/2 green
F2 yellow x P2 (pure green) --> 2/3 yellow, 1/3 green
Mendel’s explanation
Genes are discrete particles, with each parent passing
one copy to its offspring.
Let an allele be a particular copy of a gene. In Diploids,
each parent carries two alleles for every gene
Pure Yellow parents have two Y (or yellow) alleles
We can thus write their genotype as YY
Likewise, pure green parents have two g (or green) alleles
Their genotype is thus gg
Since there are lots of genes, we refer to a particular gene
by a given name, say the pea-color gene (or locus)
Each parent contributes one of its two alleles (at
random) to its offspring
Hence, a YY parent always contributes a Y, while
a gg parent always contributes a g
An individual carrying only one type of an allele
(e.g. YY or gg) is said to be a homozygote
In the F1, YY x gg --> all individuals are Yg
An individual carrying two types of alleles is
said to be a heterozygote.
The phenotype of an individual is the trait value we
observe
For this particular gene, the map from genotype to
phenotype is as follows:
YY --> yellow
Yg --> yellow
gg --> green
Since the Yg heterozygote has the same phenotypic
value as the YY homozygote, we say (equivalently)
Y is dominant to g, or
g is recessive to Y
Explaining the crosses
F1 x F1 -> Yg x Yg
Prob(YY) = Pr(Y from dad)*Pr(Y from mom) = (1/2)*(1/2) = 1/4
Prob(gg) = Pr(g from dad)*Pr(g from mom) = (1/2)*(1/2) = 1/4
Prob(Yg) = 1 - Pr(YY) - Pr(gg) = 1/2, or equivalently
Prob(Yg) = Pr(Y from dad)*Pr(g from mom) + Pr(g from dad)*Pr(Y from mom) = 1/2
Hence, Prob(Yellow phenotype) = Pr(YY) + Pr(Yg) = 3/4
Prob(green phenotype) = Pr(gg) = 1/4
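The same reasoning accounts for the three crosses of yellow plants to pure green. The sketch below assumes that a yellow F2 plant is YY with probability (1/4)/(3/4) = 1/3 and Yg otherwise; the function name is hypothetical.

```python
from fractions import Fraction


def pr_yellow_in_testcross(pr_YY):
    """Pr(yellow offspring) when a yellow plant is crossed to pure green (gg).

    pr_YY is the probability that the yellow parent is YY rather than Yg.
    A YY parent always passes Y; a Yg parent passes Y half the time.
    """
    pr_Yg = 1 - pr_YY
    return pr_YY * 1 + pr_Yg * Fraction(1, 2)


p1 = pr_yellow_in_testcross(Fraction(1))     # P1 yellow is always YY
f1 = pr_yellow_in_testcross(Fraction(0))     # F1 yellow is always Yg
f2 = pr_yellow_in_testcross(Fraction(1, 3))  # F2 yellow: Pr(YY | yellow) = 1/3
print(p1, f1, f2)  # 1 1/2 2/3
```

These values match the observed fractions of yellow offspring: all, 1/2, and 2/3.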
Dealing with two (or more) genes
For his 7 traits, Mendel observed Independent Assortment
The genotype at one locus is independent of the genotype at a second locus
RR, Rr - round seeds, rr - wrinkled seeds
Pure round, green (RRgg) x pure wrinkled yellow (rrYY)
F1 --> RrYg = round, yellow
What about the F2?
Let R- denote RR and Rr. R- are round. Note in F2,
Pr(R-) = 1/2 + 1/4 = 3/4
Likewise, Y- are YY or Yg, and are yellow
Phenotype          Genotype   Frequency
Yellow, round      Y-R-       (3/4)*(3/4) = 9/16
Yellow, wrinkled   Y-rr       (3/4)*(1/4) = 3/16
Green, round       ggR-       (1/4)*(3/4) = 3/16
Green, wrinkled    ggrr       (1/4)*(1/4) = 1/16
Or a 9:3:3:1 ratio
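The 9:3:3:1 ratio is just the product of two independent 3:1 ratios. A short sketch, with dictionary names chosen here for illustration:

```python
from fractions import Fraction
from itertools import product

# F2 phenotype probabilities at each locus (the one-locus 3:1 result)
color = {"yellow": Fraction(3, 4), "green": Fraction(1, 4)}
shape = {"round": Fraction(3, 4), "wrinkled": Fraction(1, 4)}

# Independent assortment: multiply the single-locus probabilities
f2 = {(c, s): color[c] * shape[s] for c, s in product(color, shape)}

for phenotype, prob in f2.items():
    print(phenotype, prob)
```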
Probabilities for more complex genotypes
Cross AaBBCcDD X aaBbCcDd
What is Pr(aaBBCCDD)?
Under independent assortment,
= Pr(aa)*Pr(BB)*Pr(CC)*Pr(DD)
= (1/2*1)*(1*1/2)*(1/2*1/2)*(1*1/2) = 1/2^5 = 1/32
What is Pr(AaBbCc)?
= Pr(Aa)*Pr(Bb)*Pr(Cc) = (1/2)*(1/2)*(1/2) = 1/8
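Multi-locus probabilities of this kind can be computed by handling each locus separately and multiplying, which is valid only under independent assortment. The function names and the list-of-genotypes representation below are assumptions for illustration.

```python
from fractions import Fraction
from itertools import product


def locus_prob(parent1, parent2, target):
    """Pr(offspring genotype == target) at one locus, e.g. ("Aa", "aa", "aa")."""
    hits = sum(
        1 for a, b in product(parent1, parent2)
        if sorted(a + b) == sorted(target)
    )
    return Fraction(hits, 4)  # four equally likely allele combinations


def genotype_prob(parent1, parent2, target):
    """Multiply single-locus probabilities (assumes independent assortment).

    Each argument is a list of two-letter genotypes, one per locus.
    """
    prob = Fraction(1)
    for p1, p2, t in zip(parent1, parent2, target):
        prob *= locus_prob(p1, p2, t)
    return prob


parent1 = ["Aa", "BB", "Cc", "DD"]
parent2 = ["aa", "Bb", "Cc", "Dd"]
print(genotype_prob(parent1, parent2, ["aa", "BB", "CC", "DD"]))  # 1/32
print(genotype_prob(parent1, parent2, ["Aa", "Bb", "Cc"]))        # 1/8
```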
Mendel was wrong: Linkage
Bateson and Punnett looked at
flower color: P (purple) dominant over p (red)
pollen shape: L (long) dominant over l (round)
Phenotype      Genotype   Observed   Expected (9:3:3:1)
Purple long    P-L-       284        215
Purple round   P-ll       21         71
Red long       ppL-       21         71
Red round      ppll       55         24
Excess of PL, pl gametes over Pl, pL
Departure from independent assortment
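The expected counts follow from applying the 9:3:3:1 ratio to the total number of plants scored. A small sketch of that arithmetic (variable names are illustrative); the results agree with the tabulated expectations up to rounding.

```python
from fractions import Fraction

observed = {"P-L-": 284, "P-ll": 21, "ppL-": 21, "ppll": 55}
ratio = {"P-L-": Fraction(9, 16), "P-ll": Fraction(3, 16),
         "ppL-": Fraction(3, 16), "ppll": Fraction(1, 16)}

total = sum(observed.values())  # 381 plants
expected = {g: float(total * r) for g, r in ratio.items()}

for g in observed:
    print(f"{g}: observed {observed[g]}, expected {expected[g]:.1f}")
```

The two parental-type classes (P-L- and ppll) are over-represented and the two recombinant-type classes are under-represented.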
Linkage
If genes are located on different chromosomes they
(with very few exceptions) show independent assortment.
Indeed, peas have only 7 pairs of chromosomes, so was Mendel lucky
in choosing seven traits at random that happen to all
be on different chromosomes? Problem: compute this probability.
However, genes on the same chromosome, especially if
they are close to each other, tend to be passed onto
their offspring in the same configuration as on the
parental chromosomes.
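One way to approach the problem of seven traits on seven chromosomes is sketched below. It rests on a simplifying assumption that is not stated in the lecture: each gene is equally likely to lie on any of the 7 chromosomes, independently of the others.

```python
from fractions import Fraction
from math import factorial


def pr_all_on_different_chromosomes(n_genes, n_chromosomes):
    """Pr(no two genes share a chromosome), genes placed uniformly at random."""
    prob = Fraction(1)
    for k in range(n_genes):
        # the (k+1)-th gene must avoid the k chromosomes already used
        prob *= Fraction(n_chromosomes - k, n_chromosomes)
    return prob


p = pr_all_on_different_chromosomes(7, 7)
assert p == Fraction(factorial(7), 7 ** 7)  # same as 7!/7^7
print(p, float(p))
```

Under this assumption the probability is 7!/7^7, roughly 0.6%.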
Consider the Bateson-Punnett pea data
Let PL / pl denote that in the parent, one chromosome
carries the P and L alleles (at the flower color and
pollen shape loci, respectively), while the other
chromosome carries the p and l alleles.
Unless there is a recombination event, one of the two
parental chromosome types (PL or pl) is passed onto
the offspring. These are called the parental gametes.
However, if a recombination event occurs, a PL/pl
parent can generate Pl and pL recombinant chromosomes
to pass onto its offspring.
Let c denote the recombination frequency --- the
probability that a randomly-chosen gamete from the
parent is of the recombinant type (i.e., it is not a
parental gamete).
For a PL/pl parent, the gamete frequencies are
Gamete type   Frequency   Expectation under independent assortment
PL            (1-c)/2     1/4
pl            (1-c)/2     1/4
pL            c/2         1/4
Pl            c/2         1/4
Parental gametes (PL, pl) are in excess, as (1-c)/2 > 1/4 for c < 1/2
Recombinant gametes (pL, Pl) are in deficiency, as c/2 < 1/4 for c < 1/2
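The gamete frequencies of a PL/pl parent can be written as a function of c. This is a minimal sketch; the function name and the range check on c are choices made here, not part of the lecture.

```python
def gamete_frequencies(c):
    """Gamete frequencies from a PL/pl parent with recombination frequency c."""
    if not 0 <= c <= 0.5:
        raise ValueError("c must lie between 0 and 1/2")
    return {
        "PL": (1 - c) / 2,  # parental
        "pl": (1 - c) / 2,  # parental
        "Pl": c / 2,        # recombinant
        "pL": c / 2,        # recombinant
    }


print(gamete_frequencies(0.5))   # c = 1/2: all four gametes at 1/4
print(gamete_frequencies(0.24))  # linked loci: parental gametes in excess
```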
Expected genotype frequencies under linkage
Suppose we cross PL/pl X PL/pl parents
What are the expected frequencies in their offspring?
Pr(PPLL) = Pr(PL|father)*Pr(PL|mother)
= [(1-c)/2]*[(1-c)/2] = (1-c)^2/4
Likewise, Pr(ppll) = (1-c)^2/4
Recall from the previous data that freq(ppll) = 55/381 = 0.144
Hence, (1-c)^2/4 = 0.144, or c = 0.24
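Solving Pr(ppll) = (1 - c) squared over 4 for c gives c = 1 - 2*sqrt(freq(ppll)). The sketch below simply carries out that arithmetic for a cross of two PL/pl parents; the function name is hypothetical.

```python
from math import sqrt


def estimate_c_from_ppll(freq_ppll):
    """Solve (1 - c)**2 / 4 = freq(ppll) for c, for a PL/pl x PL/pl cross."""
    return 1 - 2 * sqrt(freq_ppll)


c_hat = estimate_c_from_ppll(55 / 381)
print(round(c_hat, 2))  # 0.24
```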
A (slightly) more complicated case
Again, assume the parents are both PL/pl.
Compute Pr(PpLl)
Two situations, as PpLl could be PL/pl or Pl/pL
Pr(PL/pl) = Pr(PL|dad)*Pr(pl|mom) + Pr(PL|mom)*Pr(pl|dad)
= [(1-c)/2]*[(1-c)/2] + [(1-c)/2]*[(1-c)/2] = (1-c)^2/2
Pr(Pl/pL) = Pr(Pl|dad)*Pr(pL|mom) + Pr(Pl|mom)*Pr(pL|dad)
= (c/2)*(c/2) + (c/2)*(c/2) = c^2/2
Thus, Pr(PpLl) = (1-c)^2/2 + c^2/2
Generally, to compute the expected genotype
probabilities, need to consider the frequencies
of gametes produced by both parents.
Suppose dad = Pl/pL, mom = PL/pl
Pr(PPLL) = Pr(PL|dad)*Pr(PL|mom)
= [c/2]*[(1-c)/2]
Notation: when PL/pl, we say that alleles P and L
are in coupling
When parent is Pl/pL, we say that P and L are in repulsion
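The two cases above (both parents in coupling, or one in repulsion) can be handled by one routine that first builds each parent's gamete frequencies and then combines them. This is a sketch under the assumption that both parents are double heterozygotes; the function names and the "coupling"/"repulsion" string arguments are illustrative.

```python
def gametes(parent_phase, c):
    """Gamete frequencies for a double heterozygote.

    parent_phase is "coupling" (PL/pl) or "repulsion" (Pl/pL).
    """
    parental, recombinant = (1 - c) / 2, c / 2
    if parent_phase == "coupling":
        return {"PL": parental, "pl": parental, "Pl": recombinant, "pL": recombinant}
    if parent_phase == "repulsion":
        return {"Pl": parental, "pL": parental, "PL": recombinant, "pl": recombinant}
    raise ValueError("parent_phase must be 'coupling' or 'repulsion'")


def pr_offspring(dad_phase, mom_phase, gamete_pair, c):
    """Pr(offspring formed from an unordered pair of gametes, e.g. ("PL", "pl"))."""
    dad, mom = gametes(dad_phase, c), gametes(mom_phase, c)
    g1, g2 = gamete_pair
    prob = dad[g1] * mom[g2]
    if g1 != g2:
        prob += dad[g2] * mom[g1]  # the same pair with parental roles swapped
    return prob


c = 0.24
pr_PPLL = pr_offspring("repulsion", "coupling", ("PL", "PL"), c)
pr_PpLl = (pr_offspring("coupling", "coupling", ("PL", "pl"), c)
           + pr_offspring("coupling", "coupling", ("Pl", "pL"), c))
print(pr_PPLL, pr_PpLl)
```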
Allele and Genotype Frequencies
Given genotype frequencies, we can always compute allele
frequencies, e.g.,
$$p_i = \mathrm{freq}(A_i) = \mathrm{freq}(A_i A_i) + \frac{1}{2} \sum_{j \neq i} \mathrm{freq}(A_i A_j)$$
The converse is not true: given allele frequencies we
cannot uniquely determine the genotype frequencies
For n alleles, there are n(n+1)/2 genotypes
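The allele-frequency calculation, and the fact that it cannot be inverted, can be illustrated with a short sketch. The function name and the tuple-keyed dictionary are assumptions made for illustration.

```python
def allele_frequencies(genotype_freqs):
    """Allele frequencies from genotype frequencies.

    genotype_freqs maps unordered allele pairs such as ("A", "a") to
    frequencies summing to one. A homozygote contributes its full
    frequency to its allele; a heterozygote contributes half to each.
    """
    freqs = {}
    for (a, b), f in genotype_freqs.items():
        freqs[a] = freqs.get(a, 0) + f / 2
        freqs[b] = freqs.get(b, 0) + f / 2
    return freqs


# Two different sets of genotype frequencies with the same allele frequencies
pop1 = {("A", "A"): 0.25, ("A", "a"): 0.50, ("a", "a"): 0.25}
pop2 = {("A", "A"): 0.50, ("A", "a"): 0.00, ("a", "a"): 0.50}
print(allele_frequencies(pop1), allele_frequencies(pop2))
```

Both populations have freq(A) = 0.5, so allele frequencies alone do not pin down genotype frequencies.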
If we are willing to assume random mating,
$$\mathrm{freq}(A_i A_j) = \begin{cases} p_i^2 & \text{for } i = j \\ 2 p_i p_j & \text{for } i \neq j \end{cases}$$
These are the Hardy-Weinberg proportions
Hardy-Weinberg
• Prediction of genotype frequencies from allele freqs
• Allele frequencies remain unchanged over generations,
provided:
• Infinite population size (no genetic drift)
• No mutation
• No selection
• No migration
• Under HW conditions, a single generation of random
mating gives genotype frequencies in Hardy-Weinberg
proportions, and they remain forever in these proportions
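A small numerical check of these statements for the two-allele case. The sketch assumes the Hardy-Weinberg conditions listed above hold exactly; the function names are illustrative.

```python
def hardy_weinberg(p):
    """Genotype frequencies for a two-allele locus with freq(A) = p."""
    q = 1 - p
    return {"AA": p * p, "Aa": 2 * p * q, "aa": q * q}


def allele_freq_A(genotypes):
    """freq(A) = freq(AA) + freq(Aa)/2."""
    return genotypes["AA"] + genotypes["Aa"] / 2


# Start far from Hardy-Weinberg proportions: no heterozygotes at all
start = {"AA": 0.3, "Aa": 0.0, "aa": 0.7}
p = allele_freq_A(start)                    # freq(A) = 0.3
gen1 = hardy_weinberg(p)                    # one generation of random mating
gen2 = hardy_weinberg(allele_freq_A(gen1))  # a second generation
print(gen1)
print(gen2)
```

The allele frequency is the same in all three generations, and the genotype frequencies stop changing after the first round of random mating.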
Gametes and Gamete Frequencies
When we consider two (or more) loci, we follow gametes
Under random mating, gametes combine at random, e.g.
freq(AABB) = freq(AB | father) freq(AB | mother)
freq(AaBB) = freq(AB | father) freq(aB | mother)
+ freq(aB | father) freq(AB | mother)
Major complication: Even under HW conditions, gamete
frequencies can change over time
Cross: AB/AB x ab/ab parents --> F1 offspring are all AB/ab
In the F1, 50% AB gametes
50% ab gametes
If A and B are unlinked, the F2 gamete frequencies are
AB 25%
ab 25%
Ab 25%
aB 25%
Thus, even under HW conditions, gamete frequencies change
Linkage disequilibrium
Random mating and recombination eventually change
gamete frequencies so that they are in linkage equilibrium (LE).
Once in LE, gamete frequencies do not change (unless acted
on by other forces)
At LE, alleles in gametes are independent of each other:
freq(AB) = freq(A) freq(B)
freq(ABC) = freq(A) freq(B) freq(C)
When linkage disequilibrium (LD) is present, alleles are no
longer independent: knowing that one allele is in the
gamete provides information on alleles at other loci
freq(AB) ≠ freq(A) freq(B)
The disequilibrium between alleles A and B is given by
D_AB = freq(AB) - freq(A) freq(B)
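The disequilibrium can be computed directly from the four gamete frequencies, since freq(A) and freq(B) are sums over the gametes carrying each allele. A minimal sketch, with a hypothetical function name:

```python
def disequilibrium(gamete_freqs):
    """D_AB = freq(AB) - freq(A) * freq(B) from the four gamete frequencies."""
    freq_A = gamete_freqs["AB"] + gamete_freqs["Ab"]
    freq_B = gamete_freqs["AB"] + gamete_freqs["aB"]
    return gamete_freqs["AB"] - freq_A * freq_B


only_AB_and_ab = {"AB": 0.5, "Ab": 0.0, "aB": 0.0, "ab": 0.5}
all_equal = {"AB": 0.25, "Ab": 0.25, "aB": 0.25, "ab": 0.25}
print(disequilibrium(only_AB_and_ab))  # 0.25
print(disequilibrium(all_equal))       # 0.0
```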
The Decay of Linkage Disequilibrium
The frequency of the AB gamete is given by
freq(AB) = freq(A) freq(B) + D_AB
where freq(A) freq(B) is the LE value and D_AB is the departure from LE
If the recombination frequency between the A and B loci
is c, the disequilibrium in generation t is
D(t) = D(0) (1 - c)^t
where D(0) is the initial LD value
Note that D(t) -> zero, although the approach can be
slow when c is very small
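To get a feel for how slow the decay can be, the sketch below evaluates D(t) and the number of generations needed for D to halve, obtained by solving (1 - c)^t = 1/2 for t. The function names are illustrative.

```python
from math import log


def ld_after(d0, c, t):
    """D(t) = D(0) * (1 - c)**t under random mating."""
    return d0 * (1 - c) ** t


def generations_to_halve(c):
    """Generations needed for D to fall to half its initial value."""
    return log(0.5) / log(1 - c)


for c in (0.5, 0.1, 0.01):
    print(c, ld_after(0.25, c, 10), round(generations_to_halve(c), 1))
```

With c = 0.5 the disequilibrium halves every generation, while with c = 0.01 it takes about 69 generations to halve.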
Quantitative Genetics
The analysis of traits whose
variation is determined by both
a number of genes and
environmental factors
Phenotype is highly uninformative as to
underlying genotype
Complex (or Quantitative) trait
• No (apparent) simple Mendelian basis for variation
in the trait
• May be a single gene strongly influenced by
environmental factors
• May be the result of a number of genes of equal
(or differing) effect
• Most likely, a combination of both multiple genes
and environmental factors.
• Example: Blood pressure, cholesterol levels
– Known genetic and environmental risk factors
Phenotypic distribution of a trait
Consider a specific locus influencing the trait
For this locus, mean phenotype = 0.15, while
overall mean phenotype = 0
Goals of Quantitative Genetics
• Partition total trait variation into genetic (nature)
vs. environmental (nurture) components
• Predict resemblance between relatives
– If a sib has a disease/trait, what are your odds?
• Find the underlying loci contributing to genetic
variation
– QTL -- quantitative trait loci
• Deduce molecular basis for genetic trait variation
• Prediction of selection response
• Prediction of the effects of selfing & assortative
mating
Dichotomous (binary) traits
Presence/absence traits (such as a disease) can
(and usually do) have a complex genetic basis
Consider a disease susceptibility (DS) locus underlying a
disease, with alleles D and d, where allele D significantly
increases your disease risk
In particular, Pr(disease | DD) = 0.5, so that the
Penetrance of genotype DD is 50%
Suppose Pr(disease | Dd ) = 0.2, Pr(disease | dd) = 0.05
dd individuals only rarely display the disease, largely
because of exposure to adverse environmental conditions
dd individuals can give rise to phenocopies 5% of the time,
showing the disease but not as a result of carrying the
risk allele
If freq(d) = 0.9, what is Prob(DD | show disease)?
Assuming Hardy-Weinberg proportions, freq(DD) = 0.1^2, freq(Dd) = 2*0.1*0.9, freq(dd) = 0.9^2
freq(disease) = 0.1^2*0.5 + 2*0.1*0.9*0.2 + 0.9^2*0.05
= 0.0815
From Bayes’ theorem,
Pr(DD | disease) = Pr(disease | DD)*Pr(DD)/Pr(disease)
= 0.1^2*0.5 / 0.0815 = 0.06 (6%)
Pr(Dd | disease) = 0.442, Pr(dd | disease) = 0.497
Thus about 50% of the diseased individuals are phenocopies
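The full Bayes' theorem calculation for all three genotypes can be sketched as follows. It assumes, as in the calculation above, that genotype frequencies are in Hardy-Weinberg proportions; the function name and argument layout are illustrative.

```python
def posterior_genotypes(freq_d, penetrance):
    """Pr(genotype | disease) via Bayes' theorem.

    Genotype frequencies are taken to be in Hardy-Weinberg proportions.
    penetrance maps genotype -> Pr(disease | genotype).
    """
    freq_D = 1 - freq_d
    prior = {"DD": freq_D ** 2, "Dd": 2 * freq_D * freq_d, "dd": freq_d ** 2}
    joint = {g: prior[g] * penetrance[g] for g in prior}
    pr_disease = sum(joint.values())
    return pr_disease, {g: joint[g] / pr_disease for g in joint}


pr_disease, post = posterior_genotypes(0.9, {"DD": 0.5, "Dd": 0.2, "dd": 0.05})
print(round(pr_disease, 4))  # 0.0815
print({g: round(p, 3) for g, p in post.items()})
```

The posterior for dd is close to one half even though its penetrance is only 5%, because dd is by far the most common genotype.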