Lecture 2: Foundations of
Genetic Variation
January 10, 2014
Last Time
Class introduction
Basic probability theory:
Sample space
Counting rules
Permutations
Combinations
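Mathematical Tools for Population Genetics
Basic algebra
Basic calculus
Basic statistics
Probability
[Example equations on the slide, only partly recoverable from this transcript: an expression for $f_e$ involving $4N_e$; the product $P = \prod_{k=1}^{m} P_k$, shown alongside the symbol PIDsib$_k$; and an expression for $H_e$ built from sums over allele frequencies $p_i$.]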
Population Genetics and Probability
Probability is at the core of much of population genetics
Reproduction is a sampling process
Effects of mutation, gene flow, selection, and nonrandom
mating must be seen as departures from expectations based on
random processes
Example: One genetic locus with two alleles determines foliage color in a forest of
20 trees. Green is dominant.
What proportion of offspring will have white foliage?
Allele counts (allele icons not preserved in this transcript):
one allele: 4 copies
other allele: 36 copies
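One hedged way to approach this question: 20 diploid trees carry 40 allele copies, which matches the 4 + 36 copies on the slide. The sketch below assumes random mating (not stated on the slide) and that white is the recessive phenotype, since green is dominant. Because the transcript does not show which allele has 4 copies, it computes both cases. The function name is illustrative.

```python
from fractions import Fraction

def recessive_offspring_fraction(recessive_copies, total_copies):
    """Expected fraction of offspring homozygous for the recessive allele.

    Assumes random union of gametes (an assumption, not stated on the slide).
    """
    q = Fraction(recessive_copies, total_copies)  # recessive allele frequency
    return q * q  # both gametes must carry the recessive allele

TOTAL_COPIES = 2 * 20  # 20 diploid trees -> 40 allele copies

# Try both assignments of the slide's counts to the white allele.
for white_copies in (4, 36):
    frac = recessive_offspring_fraction(white_copies, TOTAL_COPIES)
    print(f"white allele = {white_copies} copies: "
          f"expected white offspring = {float(frac):.2f}")
```

If the white allele is the rare one (4 copies), about 1% of offspring are expected to be white; if it is the common one (36 copies), about 81%.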
Overview
Review of genetic variation and
Mendelian Genetics
Methods for detecting variation
Applications of probability
What is Genetic Variation?
Chromosome: structural
unit of genetic material,
containing DNA and
protein
Homologous: genetic
material that pairs
during meiosis in diploid
cells
Diploid: two sets of
homologous chromosomes
(one from each parent)
Haploid: one set of
chromosomes (the
Genome)
Locus: position on a
chromosome
Allele: different forms
of the same locus
Organelle Genomes
Mitochondria (most
Eukaryotes) and
chloroplasts (most
plants) are ancient
endosymbionts
Maintain their own
genomes, but with
greatly reduced
numbers of genes:
dependent on imports
from nucleus
Mostly maternally
inherited and haploid:
no recombination
Phenotypes versus Genotypes
Phenotype: Any observable
characteristic of an
organism
External morphology: height,
weight, color
Physiology: Metabolic rate,
photosynthetic rate, salt
sensitivity
Biochemical: Enzymatic rates,
chemical composition
Genotype: The hereditary
or genetic constitution of
an individual
http://en.wikipedia.org/wiki/
Why can’t you directly infer the
genotype from the phenotype?
Why can’t you directly infer the
phenotype from the genotype?
Genetics vs Environment
Many advances made in evolutionary theory based on
morphology
Problem was variation could be exaggerated
Only variable 'loci' scored
Phenotype vs Genotype
Var(phenotype) = Var(genotype) + Var(environment)
Heritability: Var(genotype) / Var(phenotype)
Phenotypic plasticity: organisms with the same
genotype have different phenotypes under different
conditions
Solution: control environmental variance by raising
organisms in common environment
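The variance partition above can be sketched numerically. The variance values below are hypothetical, chosen only to show how reducing environmental variance (as in a common environment) raises the ratio Var(genotype) / Var(phenotype).

```python
def heritability(var_genotype, var_environment):
    """Var(genotype) / Var(phenotype), with
    Var(phenotype) = Var(genotype) + Var(environment)."""
    var_phenotype = var_genotype + var_environment
    return var_genotype / var_phenotype

# Hypothetical variance components (arbitrary trait units squared).
field = heritability(var_genotype=6.0, var_environment=4.0)
common_garden = heritability(var_genotype=6.0, var_environment=1.0)

print(f"field: {field:.2f}")                  # field: 0.60
print(f"common garden: {common_garden:.2f}")  # common garden: 0.86
```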
Early Models of Inheritance
Lamarck (1744-1829): inheritance of
acquired characteristics
http://en.wikipedia.org/
Developed first fully
coherent evolutionary
theory
A “complexifying force”
drives organisms to higher
levels of complexity
http://morriscourse.com
Use and disuse of organs
affects their development
and inheritance
Early Models of Inheritance
http://en.wikipedia.org/
Blending Inheritance
Offspring have
phenotypes that are
intermediate between
that of their parents
Originally explored by
Francis Galton and
favored by the
“biometricians” such as
Pearson (1857-1936) and Weldon
Origin of modern
statistics
Hamilton 2009
Early Models of Inheritance
Darwin’s Theory: Pangenesis
Darwin: 1809-1882
Explains variation among
individuals, gradual
evolutionary change in
response to selection
Hereditary material
consists of “gemmules”
distributed throughout
body that accumulate in
reproductive organs
Elements of Lamarckian
inheritance
http://en.wikipedia.org/
Early Models of Inheritance
Discontinuous Variation
Darwin’s cousin Francis Galton (1822-1911)
performed experiments
to disprove pangenesis
“Sports” or mutations
with large effects
were considered key
drivers of evolution by
Francis Galton, William
Bateson and others
http://en.wikipedia.org/
Mendel and Particulate Inheritance
Gregor Mendel conducted a large
number of experiments with peas
and other plants in the
Augustinian Abbey of St Thomas
in Brno between 1857 and 1863
Studied over 29,000 pea plants
to determine how traits were
inherited
Why peas? Self-fertile, little or
no outcrossing
Bred pure lines and then
intercrossed them and followed
advanced generations
http://www.schoolnotes.com/32233/tss8.htm
Mendel’s Observations: F1 and F2
Pure bred lines will
produce only one
phenotype at F1 when
intercrossed
F2 generation has a
3:1 ratio of
dominant:recessive
phenotypes
Hamilton 2009
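The F1 and F2 observations can be reproduced by enumerating all equally likely gamete pairings. This is a minimal sketch for one locus with a dominant allele "A" and a recessive allele "a"; the allele labels and function name are illustrative.

```python
from collections import Counter
from itertools import product

def cross(parent1, parent2):
    """Count offspring genotypes over all equally likely gamete pairings."""
    offspring = Counter()
    for g1, g2 in product(parent1, parent2):
        # Sort so that "aA" and "Aa" are counted as the same genotype.
        offspring["".join(sorted(g1 + g2))] += 1
    return offspring

f1 = cross("AA", "aa")  # pure-bred lines intercrossed
f2 = cross("Aa", "Aa")  # F1 x F1

dominant = sum(n for g, n in f2.items() if "A" in g)
recessive = sum(n for g, n in f2.items() if "A" not in g)

print(dict(f1))                    # {'Aa': 4}
print(dict(f2))                    # {'AA': 1, 'Aa': 2, 'aa': 1}
print(f"{dominant}:{recessive}")   # 3:1
```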
Two Types of F2s
When F2s are selfed,
some breed true and
some of the
dominant phenotype
produce 3:1 ratios of
offspring phenotypes
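A short sketch of why selfed F2 plants fall into two types, assuming one locus with alleles "A" (dominant) and "a" (labels are illustrative): homozygotes produce only their own genotype, while heterozygotes segregate again.

```python
from collections import Counter
from itertools import product

def self_fertilize(genotype):
    """Offspring genotype counts from selfing a single diploid genotype."""
    return Counter(
        "".join(sorted(a + b)) for a, b in product(genotype, repeat=2)
    )

for f2_genotype in ("AA", "Aa", "aa"):
    offspring = self_fertilize(f2_genotype)
    verdict = "breeds true" if len(offspring) == 1 else "segregates 3:1"
    print(f2_genotype, dict(offspring), verdict)
```

Since F2 genotypes occur as 1 AA : 2 Aa : 1 aa, one third of the dominant-phenotype F2 plants (AA) breed true and two thirds (Aa) give 3:1 offspring ratios.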
Mendel’s “Law” of Independent Segregation
Based on analyzing
simply inherited
traits
During gamete
formation, two
members of a gene
pair (alleles)
segregate
separately so that
half of the gametes
carry one allele and
half carry the
other
Mendel’s “Law” of Independent Assortment
Based on analyzing
ratios of two traits
segregating
simultaneously
During gamete
formation, the
segregation of
alleles of one gene
is independent of
the segregation of
alleles of another
gene
Mendel’s “Laws” of Independent Segregation and
Assortment
Phenotype Ratio:
(3:1) x (3:1) =
9:3:3:1
Genotype Ratio:
(1:2:1) x (1:2:1) =
1:2:1:2:4:2:1:2:1
AABB:AABb:AAbb:AaBB:AaBb:Aabb:aaBB:aaBb:aabb
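The two-locus ratios above can be checked by enumerating the 16 gamete pairings of an AaBb x AaBb cross. The sketch assumes independent assortment and complete dominance at both loci; function names are illustrative.

```python
from collections import Counter
from itertools import product

def gametes(genotype):
    """Equally likely gametes of a two-locus genotype such as 'AaBb',
    assuming independent assortment."""
    return [a + b for a, b in product(genotype[:2], genotype[2:])]

def offspring_genotypes(parent1, parent2):
    """Count offspring genotypes over all gamete pairings."""
    counts = Counter()
    for g1, g2 in product(gametes(parent1), gametes(parent2)):
        locus_a = "".join(sorted(g1[0] + g2[0]))
        locus_b = "".join(sorted(g1[1] + g2[1]))
        counts[locus_a + locus_b] += 1
    return counts

def phenotype(genotype):
    """Dominant allele masks the recessive one at each locus."""
    a = "A" if "A" in genotype[:2] else "a"
    b = "B" if "B" in genotype[2:] else "b"
    return a + b

f2 = offspring_genotypes("AaBb", "AaBb")
order = ["AABB", "AABb", "AAbb", "AaBB", "AaBb", "Aabb", "aaBB", "aaBb", "aabb"]
print(":".join(str(f2[g]) for g in order))  # 1:2:1:2:4:2:1:2:1

phenotypes = Counter()
for genotype, n in f2.items():
    phenotypes[phenotype(genotype)] += n
print(":".join(str(phenotypes[p]) for p in ("AB", "Ab", "aB", "ab")))  # 9:3:3:1
```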
Morphological Markers
Traditionally used to measure
genetic variation
Mendel’s Laws derived from
simply-inherited morphological
markers in peas: genotype
directly inferred from phenotype
Genetic maps originally
constructed from such
characteristics (e.g., corn
genetic map at right)
Isozymes and Allozymes
Mutations can cause differences
in basic and acidic amino acid
composition, but no change in
enzyme function
Small changes in primary structure
can alter secondary and
quaternary structure
Isozymes: different forms of an
enzyme
Allozymes: Allelic isozymes:
different forms of an enzyme
that are coded at the same locus
Lactate Dehydrogenase
Dym et al 2000: PNAS 97:9413–9418
Detection
Separate through electrophoresis in starch
gels
Isozymes detected based on enzyme action
Stain contains substrate for enzyme,
cofactors, and oxidized salt (dye)
Resulting pattern is zymogram
Often a direct link between phenotype
(spots on gel) and genotype (genes encoding
the enzyme)
Hillis, D.M., C. Moritz and B. K. Mable. 1996.
Molecular Systematics, 2nd ed. Sinauer Assoc.
Inc., Sunderland, Mass
Allozymes revolutionized population genetics
Richard Lewontin
Landmark 1966 papers by Lewontin and
Hubby
Simple and unbiased way of detecting
genetic variation
Explosion of studies of genetic variation in
natural populations
Levels of diversity in natural populations
MUCH higher than predicted by prevailing
theory at the time
Suggested that selection is not the most important factor
determining genetic diversity: Neutral
Theory
http://www.patentdocs.us/patent_docs/2007/05/the_as_yet_unfu.html
PCR and the Molecular Revolution
PCR: Polymerase Chain Reaction
Invented by Kary Mullis in 1983
Exponential amplification of a
specific sequence of DNA
Most important molecular marker
techniques involve PCR
Components: primers,
nucleotides, template,
thermostable polymerase
http://www.dnalc.org/ddnalc/resources/pcr.html
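"Exponential amplification" can be made concrete with an idealized model in which every template is copied once per cycle. The `efficiency` parameter is an assumption added for illustration (1.0 means perfect doubling); real reactions plateau and fall short of this.

```python
def pcr_copies(initial_copies, cycles, efficiency=1.0):
    """Idealized PCR yield: each cycle multiplies copies by (1 + efficiency)."""
    return initial_copies * (1 + efficiency) ** cycles

for cycles in (10, 20, 30):
    print(cycles, int(pcr_copies(1, cycles)))
# 10 1024
# 20 1048576
# 30 1073741824
```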
Molecular Markers
Molecular markers provide closer link
between phenotype and genotype
“Anonymous” molecular markers: RFLP,
RAPD, AFLP and GBS: no knowledge of
underlying sequence polymorphism or
location in genome
“Sequence-Tagged” markers like
microsatellites or SNPs derived from
defined locations in genome
Often reveal higher levels of
polymorphism than allozymes and
morphological markers
Allow studies of neutral variation in
natural populations
Anonymous and Sequence-Tagged Markers
Anonymous markers
often have short
“primer” sequences
(e.g., 10 bp primer
sequences in RAPD)
Randomly amplify
portions of genome
TCAAGTCTCA
AGTTCAGAGT agctggactacctctacgtcagcTGAGACTTGA
ACTCTGAACT
Sequence-Tagged
markers have longer
primers (e.g., 20 bp
for microsatellite
primers)
ATGCTGAGGTCGCTTAGCAGctctctctctctctctctctcctctctctctctctGGATCCTGAATGCTGACTG
ATGCTGAGGTCGCTTAGCAGctctctctctctctGGATCCTGAATGCTGACTG
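The two microsatellite sequences above share their flanking (primer) regions and differ only in the lower-case repeat region, so the alleles can be told apart by amplicon length. A minimal sketch, relying on the slide's upper/lower-case convention; the allele labels are illustrative.

```python
import re

# Sequences copied from the slide: upper case = flanks, lower case = repeats.
alleles = {
    "long": "ATGCTGAGGTCGCTTAGCAGctctctctctctctctctctcctctctctctctctGGATCCTGAATGCTGACTG",
    "short": "ATGCTGAGGTCGCTTAGCAGctctctctctctctGGATCCTGAATGCTGACTG",
}

def split_amplicon(seq):
    """Return (left flank, repeat region, right flank) of an amplicon."""
    match = re.fullmatch(r"([A-Z]+)([a-z]+)([A-Z]+)", seq)
    if match is None:
        raise ValueError("expected FLANK + repeat + FLANK layout")
    return match.groups()

for name, seq in alleles.items():
    left, repeat, right = split_amplicon(seq)
    print(f"{name}: amplicon {len(seq)} bp, repeat region {len(repeat)} bp")
```

Because the flanks are identical, the whole difference in amplicon size comes from the repeat region, which is what a gel or capillary run resolves.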
DNA Sequencing
Direct determination of
sequence of bases at a
location in the genome
Shotgun versus PCR
sequencing
Dye terminators (Sanger)
and capillaries revolutionized
DNA sequencing
Modern sequencing methods
(sequencing by synthesis,
pyrosequencing) have
catapulted sequencing into
realm of population genetics
The human genome originally took 10 years
and hundreds of millions of
dollars to sequence
Now it can be done in a week
for <$2,000
SNPs
A Single Nucleotide Polymorphism
(SNP) is a single base mutation in
DNA.
The most common source of genetic
polymorphism (e.g., 90% of all
human DNA polymorphisms).
Identify SNPs by screening a
sample of individuals from study
population: usually 16 to 48
Once identified, SNPs are
assayed in populations using
high-throughput methods
Genotyping by Sequencing
New sequencing methods generate tens of millions of short sequences
per run
Combine restriction digests with sequencing and pooling to genotype
thousands of markers covering genome at very high density
Presence-Absence
Polymorphism
SNP
Generate tens of thousands of markers
for <$100 per sample
http://www.maizegenetics.net/images/stories/GBS_CSSA_101102sem.pdf
Genotyping by Sequencing Cost Example
http://www.maizegenetics.net/gbs-overview
If nucleotides occur randomly in a genome,
which sequence should occur more
frequently?
AGTTCAGAGT
AGTTCAGAGTAACTGATGCT
What is the expected probability of each
sequence to occur once?
How many times would each sequence be
expected to occur by chance in a 100 Mb
genome?
What is the expected probability of each
sequence to occur once?
AGTTCAGAGT
What is the sample space for the first position?
{A, T, G, C}
Probability of “A” at that position? $\frac{1}{4}$
Probability of “A” at position 1, “G” at position 2, “T”
at position 3, etc.?
$\frac{1}{4} \times \frac{1}{4} \times \cdots \times \frac{1}{4} = 0.25^{10} = 9.54 \times 10^{-7}$
AGTTCAGAGTAACTGATGCT
$0.25^{20} = 9.09 \times 10^{-13}$
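The per-position multiplication can be sketched as a product over the bases of the sequence. The equal base frequencies follow the slide's premise that nucleotides occur randomly; the `base_freqs` parameter is an illustrative addition.

```python
from math import prod

EQUAL_FREQS = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}

def sequence_probability(seq, base_freqs=EQUAL_FREQS):
    """Probability of one exact sequence when positions are independent."""
    return prod(base_freqs[base] for base in seq)

for seq in ("AGTTCAGAGT", "AGTTCAGAGTAACTGATGCT"):
    print(f"{seq}: {sequence_probability(seq):.2e}")
# AGTTCAGAGT: 9.54e-07
# AGTTCAGAGTAACTGATGCT: 9.09e-13
```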
How many times would each sequence be
expected to occur in a 100 Mb genome?
AGTTCAGAGT
$9.54 \times 10^{-7} \times 10^{8} = 95.4$
AGTTCAGAGTAACTGATGCT
$9.09 \times 10^{-13} \times 10^{8} = 9.1 \times 10^{-5}$
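The slide's arithmetic multiplies the per-sequence probability by the genome size. A minimal sketch of that same approximation, with all of its simplifications kept as is:

```python
def expected_occurrences(seq_length, genome_size, base_prob=0.25):
    """Slide's approximation: per-sequence probability times genome size."""
    return base_prob ** seq_length * genome_size

GENOME_SIZE = 100_000_000  # 100 Mb

print(f"10-mer: {expected_occurrences(10, GENOME_SIZE):.1f}")  # 10-mer: 95.4
print(f"20-mer: {expected_occurrences(20, GENOME_SIZE):.1e}")  # 20-mer: 9.1e-05
```

A 10 bp primer is therefore expected to match many sites by chance, while a 20 bp primer is expected to match essentially none, which is consistent with the contrast between anonymous and sequence-tagged markers.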
Why is this calculation wrong?