Lecture 2: Foundations of
Genetic Variation
January 10, 2014
Last Time
 Class introduction
 Basic probability theory:
Sample space
Counting rules
Permutations
Combinations
Mathematical Tools for Population Genetics
Basic algebra
Basic calculus
Basic statistics
Probability
[Example equations not recoverable from extraction; surviving symbols: f_e, N_e, P and P_k (k = 1 to m), PID_sib,k, H_e, p_i]
Population Genetics and Probability
 Probability is at the core of much of population genetics
 Reproduction is a sampling process
 Effects of mutation, gene flow, selection, and nonrandom
mating must be seen as departures from expectations based on
random processes
 Example: one genetic locus with two alleles determines foliage color
in a forest of 20 trees. Green is dominant.
 What proportion of offspring will have white foliage?
One allele: 4 copies
Other allele: 36 copies
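The foliage example can be sketched in Python as a sampling calculation. This is a minimal sketch assuming random mating, and assuming that the allele with 4 copies is the recessive white allele (the slide's allele labels did not survive, so that assignment is an assumption). The function name is illustrative only.

```python
from fractions import Fraction


def recessive_homozygote_fraction(recessive_copies, total_copies):
    """Expected fraction of offspring carrying two recessive alleles.

    Under random mating each offspring draws two alleles independently
    from the gene pool, so the fraction is q * q, where q is the
    frequency of the recessive allele.
    """
    q = Fraction(recessive_copies, total_copies)
    return q * q


# 20 diploid trees carry 40 allele copies in total.
# ASSUMPTION: the white (recessive) allele is the one with 4 copies.
white_fraction = recessive_homozygote_fraction(4, 40)
print(white_fraction, float(white_fraction))
```

If the 36-copy allele were the white one instead, the same function would be called with 36 in place of 4.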
Overview
Review of genetic variation and
Mendelian Genetics
Methods for detecting variation
Applications of probability
What is Genetic Variation?
 Chromosome: structural
unit of genetic material,
containing DNA and
protein
 Homologous: genetic
material that pairs
during meiosis in diploid
cells
 Diploid: two sets of
homologous chromosomes
(one from each parent)
 Haploid: one set of
chromosomes (the
Genome)
 Locus: position on a
chromosome
 Allele: different forms
of the same locus
Organelle Genomes
 Mitochondria (most
Eukaryotes) and
chloroplasts (most
plants) are ancient
endosymbionts
 Maintain their own
genomes, but with
greatly reduced
numbers of genes:
dependent on imports
from nucleus
 Mostly maternally
inherited and haploid:
no recombination
Phenotypes versus Genotypes
Phenotype: Any observable
characteristic of an
organism
 External morphology: height,
weight, color
 Physiology: Metabolic rate,
photosynthetic rate, salt
sensitivity
 Biochemical: Enzymatic rates,
chemical composition
Genotype: The hereditary
or genetic constitution of
an individual
http://en.wikipedia.org/wiki/
Why can’t you directly infer the
genotype from the phenotype?
Why can’t you directly infer the
phenotype from the genotype?
Genetics vs Environment
 Many advances made in evolutionary theory based on
morphology
 Problem was variation could be exaggerated
 Only variable 'loci' scored
 Phenotype vs Genotype
Var(phenotype) = Var(genotype) + Var(environment)
Heritability: Var(genotype) / Var(phenotype)
 Phenotypic plasticity: organisms with the same
genotype have different phenotypes under different
conditions
 Solution: control environmental variance by raising
organisms in common environment
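The variance partition on this slide can be illustrated with a short Python sketch. It assumes the additive model stated above, with no covariance between genotype and environment. The numbers are hypothetical values chosen only to make the arithmetic easy to follow.

```python
from statistics import pvariance


def heritability(genotypic_values, environmental_deviations):
    """Var(genotype) / Var(phenotype), with Var(P) = Var(G) + Var(E).

    The additive split assumes genotype and environment do not covary.
    """
    var_g = pvariance(genotypic_values)
    var_e = pvariance(environmental_deviations)
    return var_g / (var_g + var_e)


# Hypothetical genotypic values for four individuals (Var = 5).
genotypic = [10.0, 12.0, 14.0, 16.0]

# Hypothetical environmental deviations in a variable habitat (Var = 1).
field = [-1.0, 1.0, -1.0, 1.0]

# In a common environment the environmental deviations are all equal.
common_garden = [0.0, 0.0, 0.0, 0.0]

h2_field = heritability(genotypic, field)
h2_common = heritability(genotypic, common_garden)
print(h2_field, h2_common)
```

With the environmental variance removed, all remaining phenotypic variance is genetic, which is the logic of raising organisms in a common environment.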
Lamarck: inheritance of
acquired characteristics
http://en.wikipedia.org/
Early Models of Inheritance
1744-1829
 Developed first fully
coherent evolutionary
theory
 A “complexifying force”
drives organisms to higher
levels of complexity
http://morriscourse.com
 Use and disuse of organs
affects their development
and inheritance
Early Models of Inheritance
http://en.wikipedia.org/
Blending Inheritance
Offspring have
phenotypes that are
intermediate between
that of their parents
Originally explored by
Francis Galton and
favored by the
“biometricians” such as
Pearson and Weldon
Origin of modern
statistics
1857-1936
Hamilton 2009
Darwin’s Theory: Pangenesis
Explains variation among
individuals, gradual
evolutionary change in
response to selection
Hereditary material
consists of “gemmules”
distributed throughout
body that accumulate in
reproductive organs
Elements of Lamarckian
inheritance
http://en.wikipedia.org/
Early Models of Inheritance
1809-1882
Early Models of Inheritance
 Darwin’s cousin Galton
performed experiments
to disprove pangenesis
 “Sports” or mutations
with large effects
were considered key
drivers of evolution by
Francis Galton, William
Bateson and others
http://en.wikipedia.org/
Discontinuous Variation
1822-1911
Mendel and Particulate Inheritance
 Gregor Mendel conducted a large
number of experiments with peas
and other plants in the
Augustinian Abbey of St Thomas
in Brno between 1857 and 1863
 Studied over 29,000 pea plants
to determine how traits were
inherited
 Why peas? Self-fertile, little or
no outcrossing
 Bred pure lines and then
intercrossed them and followed
advanced generations
http://www.schoolnotes.com/32233/tss8.htm
Mendel’s Observations: F1 and F2
Pure bred lines will
produce only one
phenotype at F1 when
intercrossed
F2 generation has a
3:1 ratio of
dominant:recessive
phenotypes
Hamilton 2009
Two Types of F2s
When F2s are selfed,
some breed true and
some of the
dominant phenotype
produce 3:1 ratios of
offspring phenotypes
Mendel’s “Law” of Independent Segregation
Based on analyzing
simply inherited
traits
During gamete
formation, two
members of a gene
pair (alleles)
segregate
separately so that
half of the gametes
carry one allele and
half carry the
other
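Segregation can be checked by enumerating the equally likely gametes of two heterozygous parents. The sketch below is illustrative: the allele symbols `A` and `a` and the function name `cross` are assumptions, not notation from the slides.

```python
from collections import Counter
from itertools import product


def cross(parent1, parent2):
    """Count offspring genotypes at one locus.

    Each parent passes on either allele of its pair with equal
    probability, so every pairing of one allele from each parent is
    equally likely.
    """
    offspring = Counter()
    for allele1, allele2 in product(parent1, parent2):
        # Sort so that "aA" and "Aa" are counted as the same genotype.
        offspring["".join(sorted((allele1, allele2)))] += 1
    return offspring


# Selfing or intercrossing F1 heterozygotes gives the F2 generation.
f2 = cross("Aa", "Aa")

# A is dominant: any genotype with at least one A shows that phenotype.
dominant = sum(n for genotype, n in f2.items() if "A" in genotype)
recessive = f2["aa"]
print(dict(f2), dominant, recessive)
```

The genotype counts come out 1:2:1 and the phenotype counts 3:1, matching the F2 ratio described earlier.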
Mendel’s “Law” of Independent Assortment
Based on analyzing
ratios of two traits
segregating
simultaneously
During gamete
formation, the
segregation of
alleles of one gene
is independent of
the segregation of
alleles of another
gene
Mendel’s “Laws” of Independent Segregation and
Assortment
Phenotype Ratio:
(3:1) x (3:1) =
9:3:3:1
Genotype Ratio:
(1:2:1) x (1:2:1) =
1:2:1:2:4:2:1:2:1
AABB:AABb:AAbb:AaBB:AaBb:Aabb:aaBB:aaBb:aabb
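The two-locus ratios can be reproduced by enumerating all gamete combinations from an AaBb x AaBb cross. This is a minimal sketch that assumes the two loci assort independently and that A and B are fully dominant. The helper names are illustrative.

```python
from collections import Counter
from itertools import product


def gametes(genotype):
    """Gametes of a two-locus genotype such as 'AaBb'.

    One allele per locus, with every combination equally likely
    (independent assortment).
    """
    return [a + b for a, b in product(genotype[:2], genotype[2:])]


def locus_genotype(x, y):
    """Join two alleles of one locus in a fixed order ('Aa', never 'aA')."""
    return "".join(sorted((x, y)))


def dihybrid_cross(parent1, parent2):
    """Count two-locus offspring genotypes over all gamete pairings."""
    counts = Counter()
    for g1, g2 in product(gametes(parent1), gametes(parent2)):
        counts[locus_genotype(g1[0], g2[0]) + locus_genotype(g1[1], g2[1])] += 1
    return counts


def phenotype(genotype):
    """Dominant phenotype at a locus if at least one dominant allele is present."""
    return ("A" if "A" in genotype else "a") + ("B" if "B" in genotype else "b")


genotypes = dihybrid_cross("AaBb", "AaBb")
phenotypes = Counter()
for genotype, n in genotypes.items():
    phenotypes[phenotype(genotype)] += n

order = "AABB AABb AAbb AaBB AaBb Aabb aaBB aaBb aabb".split()
print([genotypes[g] for g in order])
print([phenotypes[p] for p in ("AB", "Ab", "aB", "ab")])
```

The first printed list follows the genotype order given on the slide, and the second gives the four phenotype classes.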
Morphological Markers
Traditionally used to measure
genetic variation
Mendel’s Laws derived from
simply-inherited morphological
markers in peas: genotype
directly inferred from phenotype
Genetic maps originally
constructed from such
characteristics (e.g., corn
genetic map at right)
Isozymes and Allozymes
 Mutations can cause differences
in basic and acidic amino acid
composition, but no change in
enzyme function
 Small changes in primary structure
can alter secondary and
quaternary structure
 Isozymes: different forms of an
enzyme
 Allozymes: Allelic isozymes:
different forms of an enzyme
that are coded at the same locus
Lactate Dehydrogenase
Dym et al 2000: PNAS 97:9413–9418
Detection
 Separate through electrophoresis in starch
gels
 Isozymes detected based on enzyme action
 Stain contains substrate for enzyme,
cofactors, and oxidized salt (dye)
 Resulting pattern is zymogram
 Often a direct link between phenotype
(spots on gel) and genotype (genes encoding
the enzyme)
Hillis, D.M., C. Moritz and B. K. Mable. 1996.
Molecular Systematics, 2nd ed. Sinauer Assoc.
Inc., Sunderland, Mass
Allozymes revolutionized population genetics
Richard Lewontin
 Landmark 1966 papers by Lewontin and
Hubby
 Simple and unbiased way of detecting
genetic variation
 Explosion of studies of genetic variation in
natural populations
 Levels of diversity in natural populations
MUCH higher than predicted by prevailing
theory at the time
 Role of selection not most important factor
determining genetic diversity: Neutral
Theory
http://www.patentdocs.us/patent_docs
/2007/05/the_as_yet_unfu.html
PCR and the Molecular Revolution
PCR: Polymerase Chain Reaction
Invented by Kary Mullis in 1983
Exponential amplification of a
specific sequence of DNA
Most important molecular marker
techniques involve PCR
Components: primers,
nucleotides, template,
thermostable polymerase
 http://www.dnalc.org/ddnalc/resources/pcr.html
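Exponential amplification can be made concrete with a small Python sketch. The `efficiency` parameter is an assumption added for illustration (1.0 means every template is copied in every cycle); the slide itself only states that amplification is exponential.

```python
def pcr_copies(initial_copies, cycles, efficiency=1.0):
    """Copies of the target sequence after a number of PCR cycles.

    Each cycle multiplies the copy number by (1 + efficiency), so with
    perfect efficiency the target doubles every cycle.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles


# One template molecule, 30 cycles, ideal doubling.
ideal = pcr_copies(1, 30)

# Same reaction with a hypothetical 90% per-cycle efficiency.
realistic = pcr_copies(1, 30, efficiency=0.9)
print(ideal, realistic)
```

Under ideal doubling a single template yields 2^30, roughly a billion, copies after 30 cycles.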
Molecular Markers
 Molecular markers provide closer link
between phenotype and genotype
 “Anonymous” molecular markers: RFLP,
RAPD, AFLP and GBS: no knowledge of
underlying sequence polymorphism or
location in genome
 “Sequence-Tagged” markers like
microsatellites or SNPs derived from
defined locations in genome
 Often reveal higher levels of
polymorphism than allozymes and
morphological markers
 Allow studies of neutral variation in
natural populations
Anonymous and Sequence-Tagged Markers
 Anonymous markers
often have short
“primer” sequences
(e.g., 10 bp primer
sequences in RAPD)
 Randomly amplify
portions of genome
TCAAGTCTCA
AGTTCAGAGT agctggactacctctacgtcagcTGAGACTTGA
ACTCTGAACT
 Sequence-Tagged
markers have longer
primers (e.g., 20 bp
for microsatellite
primers)
ATGCTGAGGTCGCTTAGCAGctctctctctctctctctctcctctctctctctctGGATCCTGAATGCTGACTG
ATGCTGAGGTCGCTTAGCAGctctctctctctctGGATCCTGAATGCTGACTG
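The two microsatellite alleles above differ only in the length of the repeat region between the primer sites. The sketch below extracts that region from each allele. The sequences are copied from the slide; the function name and the use of the flanking sequences as primer sites are assumptions for illustration.

```python
LEFT_FLANK = "ATGCTGAGGTCGCTTAGCAG"   # 20 bp flank before the repeat
RIGHT_FLANK = "GGATCCTGAATGCTGACTG"   # flank after the repeat

ALLELE_1 = ("ATGCTGAGGTCGCTTAGCAG"
            "ctctctctctctctctctctcctctctctctctct"
            "GGATCCTGAATGCTGACTG")
ALLELE_2 = ("ATGCTGAGGTCGCTTAGCAG"
            "ctctctctctctct"
            "GGATCCTGAATGCTGACTG")


def repeat_region(allele, left_flank, right_flank):
    """Return the sequence lying between the two flanking sequences."""
    start = allele.index(left_flank) + len(left_flank)
    end = allele.rindex(right_flank)
    return allele[start:end]


region_1 = repeat_region(ALLELE_1, LEFT_FLANK, RIGHT_FLANK)
region_2 = repeat_region(ALLELE_2, LEFT_FLANK, RIGHT_FLANK)

# The amplified fragments differ in length by exactly the repeat difference.
print(len(ALLELE_1), len(ALLELE_2), len(region_1) - len(region_2))
```

Because the flanks are identical, the size difference between the two amplified fragments is what distinguishes the alleles on a gel or capillary.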
DNA Sequencing
 Direct determination of
sequence of bases at a
location in the genome
 Shotgun versus PCR
sequencing
 Dye terminators (Sanger)
and capillaries revolutionized
DNA sequencing
 Modern sequencing methods
(sequencing by synthesis,
pyrosequencing) have
catapulted sequencing into
realm of population genetics
 Human genome took 10 years
to sequence originally, and
hundreds of millions of
dollars
 Now we can do it in a week
for <$2,000
SNPs
 A Single Nucleotide Polymorphism
(SNP) is a single base mutation in
DNA.
 The most common source of genetic
polymorphism (e.g., 90% of all
human DNA polymorphisms).
 Identify SNPs by screening a
sample of individuals from the study
population: usually 16 to 48 individuals
 Once identified, SNPs are
assayed in populations using
high-throughput methods
Genotyping by Sequencing
 New sequencing methods generate tens of millions of short sequences
per run
 Combine restriction digests with sequencing and pooling to genotype
thousands of markers covering genome at very high density
Presence-Absence
Polymorphism
SNP
Generate tens of thousands of markers
for <$100 per sample
http://www.maizegenetics.net/images/stories/GBS_CSSA_101102sem.pdf
Genotyping by Sequencing Cost Example
http://www.maizegenetics.net/gbs-overview
If nucleotides occur randomly in a genome,
which sequence should occur more
frequently?
AGTTCAGAGT
AGTTCAGAGTAACTGATGCT
What is the expected probability of each
sequence to occur once?
How many times would each sequence be
expected to occur by chance in a 100 Mb
genome?
What is the expected probability of each
sequence to occur once?
AGTTCAGAGT
What is the sample space for the first position? {A, T, G, C}
Probability of “A” at that position? $\frac{1}{4}$
Probability of “A” at position 1, “G” at position 2, “T”
at position 3, etc.?
$\frac{1}{4} \times \frac{1}{4} \times \cdots \times \frac{1}{4}$ (10 factors) $= 0.25^{10} = 9.54 \times 10^{-7}$
AGTTCAGAGTAACTGATGCT
$0.25^{20} = 9.09 \times 10^{-13}$
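The multiplication rule used here can be written as a short Python function. This sketch assumes, as the slide does, that bases occur independently and with equal frequency; the `base_freqs` argument is an added assumption that lets other base frequencies be tried.

```python
from math import prod

# Equal base frequencies, as assumed on the slide.
UNIFORM = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}


def sequence_probability(sequence, base_freqs=UNIFORM):
    """Probability of a specific sequence at one fixed position.

    Multiplication rule for independent events: multiply the
    probability of the required base at each position.
    """
    return prod(base_freqs[base] for base in sequence)


p10 = sequence_probability("AGTTCAGAGT")
p20 = sequence_probability("AGTTCAGAGTAACTGATGCT")
print(f"{p10:.2e}", f"{p20:.2e}")
```

Doubling the sequence length squares the probability, which is why the 20-base sequence is so much less likely than the 10-base one.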
How many times would each sequence be
expected to occur in a 100 Mb genome?
AGTTCAGAGT
$9.54 \times 10^{-7} \times 10^{8} = 95.4$
AGTTCAGAGTAACTGATGCT
$9.09 \times 10^{-13} \times 10^{8} = 9.1 \times 10^{-5}$
Why is this calculation wrong?