Transcript Lecture #12

Using DNA sequences
• Obtain sequence
• Align sequences, number of parsimony informative sites
• Gap handling
• Picking sequences (order)
• Analyze sequences (similarity/parsimony/exhaustive/Bayesian)
• Analyze output: CI, HI, bootstrap/decay indices
Good chromatogram!
Bad chromatogram…
Reverse reaction suffers same problems in opposite direction
Pull-up (too much signal)
Loss of fidelity leads to slips,
skips and mixed signals
Alignments (Se-Al)
Using DNA sequences
• Testing alternative trees: Kishino-Hasegawa test
• Molecular clock
• Outgroup
• Spatial correlation (Mantel)
• Networks and coalescence approaches
Using DNA sequences
• Bootstrap: the presence of a branch separating two groups of
microbial strains could be real or simply one of the possible
ways we could visualize microbial populations. Bootstrap tests
whether the branch is real. It does so by trying to see through
iterations if a similar branch can come out by chance for a given
dataset
• Bootstrap (BS) value: over 80 is good, over 65 is OK, under 60 is bad
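The resampling idea behind the bootstrap can be sketched for the smallest possible case, four aligned sequences. This is a simplified illustration, not what phylogenetic packages actually run: it assumes a plain count of mismatches as the distance and uses the four-point comparison to decide whether a replicate supports the branch separating (A, B) from (C, D). The sequences and function names are hypothetical.

```python
import random

def hamming(a, b):
    """Number of aligned positions at which two sequences differ."""
    return sum(x != y for x, y in zip(a, b))

def supports_split(seqs):
    """Four-point check: is the pairing (A,B)|(C,D) strictly the shortest of the three?"""
    a, b, c, d = seqs
    ab_cd = hamming(a, b) + hamming(c, d)
    ac_bd = hamming(a, c) + hamming(b, d)
    ad_bc = hamming(a, d) + hamming(b, c)
    return ab_cd < min(ac_bd, ad_bc)

def bootstrap_support(seqs, replicates=1000, seed=1):
    """Percentage of column-resampled alignments that recover the (A,B)|(C,D) branch."""
    rng = random.Random(seed)
    n_sites = len(seqs[0])
    hits = 0
    for _ in range(replicates):
        # Draw alignment columns with replacement, same length as the original
        cols = [rng.randrange(n_sites) for _ in range(n_sites)]
        resampled = ["".join(s[i] for i in cols) for s in seqs]
        if supports_split(resampled):
            hits += 1
    return 100.0 * hits / replicates

# Hypothetical alignments: every site agrees with the branch in "clean",
# while "noisy" contains one supporting and one conflicting site.
clean = ["AAGGTTCC", "AAGGTTCC", "CCTTAAGG", "CCTTAAGG"]
noisy = ["AAGT", "AAGC", "CCGT", "CAGC"]

clean_support = bootstrap_support(clean)
noisy_support = bootstrap_support(noisy)
print(clean_support, noisy_support)
```

A branch that every site agrees on is recovered in every replicate. A branch resting on a few sites that conflict with others comes and goes as columns are resampled, and its bootstrap value drops.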
From Garbelotto and Chapela,
Evolution and biogeography of matsutakes
Biodiversity within species
as significant as between
species
Genetic analysis requires variation
at loci, variation of markers
(polymorphisms)
• How the variation is structured will tell us
– Does the microbe reproduce sexually or clonally
– Is infection primary or secondary
– Is contagion caused by local infectious spreaders or by
long-distance spreaders
– How far can individuals move: how large are populations
– Is there inbreeding or are individuals freely outcrossing
CASE STUDY
• A stand of adjacent trees is infected by a disease:
How can we determine the way trees are infected?
BY ANALYSING THE GENOTYPE OF THE MICROBES: if the
genotype is the same then we have local secondary
tree-to-tree contagion. If all genotypes are different then primary
infection caused by airborne spores is the likely cause of
Contagion.
CASE STUDY
• WE HAVE DETERMINED THAT AIRBORNE SPORES (PRIMARY
INFECTION) ARE THE MOST COMMON FORM OF INFECTION
QUESTION: Are the infectious spores produced by a local
spreader, or is there a general airborne population of spores that
may come from far away ?
HOW CAN WE ANSWER THIS QUESTION?
If spores are produced by a local
spreader..
• Even if each tree is infected by different
genotypes (each representing the result of
meiosis like us here in this class)….these
genotypes will be related
• HOW CAN WE DETERMINE IF THEY ARE
RELATED?
HOW CAN WE DETERMINE IF
THEY ARE RELATED?
• By using random genetic markers we can find out whether
the genetic similarity among the genotypes
infecting adjacent trees is high
• If all spores are generated by one individual
– They should have the same mitochondrial genome
– They should have one of two mating alleles
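One simple way to put a number on "genetic similarity" from random markers is sketched below. It assumes presence/absence (band) data and uses the Jaccard coefficient, one of several possible similarity measures. The band profiles are hypothetical.

```python
def jaccard_similarity(profile_a, profile_b):
    """Bands shared by both isolates divided by bands present in at least one."""
    shared = sum(1 for a, b in zip(profile_a, profile_b) if a and b)
    either = sum(1 for a, b in zip(profile_a, profile_b) if a or b)
    return shared / either if either else 0.0

# Hypothetical marker profiles (1 = band present, 0 = band absent)
tree_1 = [1, 1, 0, 1, 0, 1, 1, 0]   # isolate from one tree
tree_2 = [1, 1, 0, 1, 0, 1, 0, 0]   # isolate from the adjacent tree
far_away = [0, 0, 1, 0, 1, 1, 0, 1]  # isolate from a distant site

neighbor_similarity = jaccard_similarity(tree_1, tree_2)
distant_similarity = jaccard_similarity(tree_1, far_away)
print(neighbor_similarity, distant_similarity)
```

High similarity among isolates from adjacent trees, compared with isolates from elsewhere, would be consistent with a local spreader.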
WE DETERMINE INFECTIOUS
SPORES ARE NOT RELATED
• QUESTION: HOW FAR ARE THEY COMING FROM?
….or……
• HOW LARGE IS A POPULATION?
Very important question: if we decide we want to wipe out
an infectious disease we need to wipe out at least the
areas corresponding to the population size, otherwise
we will achieve no result.
HOW TO DETERMINE WHETHER
DIFFERENT SITES BELONG TO
THE SAME POP OR NOT?
• Sample the sites and run the genetic markers
• If sites are very different:
– All individuals from each site will be in their own exclusive clade, if two
sites are in the same clade maybe those two populations actually are
linked (within reach)
– In AMOVA analysis, the amount of genetic variance among populations will be
significant (if the organism is sexual, the portion of variance among individuals will
also be significant)
– F statistics: Fst will be over 0.10 (suggesting strong structuring)
– There will be isolation by distance
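Isolation by distance is usually checked with a Mantel test, which correlates a matrix of geographic distances with a matrix of genetic distances and judges significance by permuting site labels. The following is a minimal sketch in plain Python. The five sites, their distances in km and the genetic distances are all hypothetical, and the function names are mine.

```python
import random
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equally long lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def upper_triangle(matrix, order):
    """Pairwise values above the diagonal, with rows/columns relabelled by `order`."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]

def mantel(geo, gen, permutations=999, seed=1):
    """Return (observed r, one-sided permutation p-value)."""
    rng = random.Random(seed)
    identity = list(range(len(geo)))
    geo_vec = upper_triangle(geo, identity)
    r_obs = pearson(geo_vec, upper_triangle(gen, identity))
    as_extreme = 1  # the observed arrangement counts as one permutation
    for _ in range(permutations):
        order = identity[:]
        rng.shuffle(order)  # shuffle site labels of the genetic matrix only
        if pearson(geo_vec, upper_triangle(gen, order)) >= r_obs:
            as_extreme += 1
    return r_obs, as_extreme / (permutations + 1)

# Hypothetical data: five sites along a transect at 0, 5, 10, 15, 20 km
positions = [0, 5, 10, 15, 20]
geo = [[abs(a - b) for b in positions] for a in positions]
gen = [
    [0.00, 0.05, 0.11, 0.14, 0.21],
    [0.05, 0.00, 0.04, 0.10, 0.16],
    [0.11, 0.04, 0.00, 0.06, 0.09],
    [0.14, 0.10, 0.06, 0.00, 0.05],
    [0.21, 0.16, 0.09, 0.05, 0.00],
]

r_obs, p_value = mantel(geo, gen)
print(round(r_obs, 3), p_value)
```

A strong positive correlation with a small p-value means genetic distance grows with geographic distance, which is the signature of isolation by distance.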
Levels of Analyses

• Individual – identifying parents & offspring – very important in
zoological circles – identify patterns of mating between
individuals (polyandry, etc.)
In fungi, it is important to identify the "individual": distinguishing
clonal individuals from unique individuals that resulted from a
single mating event.
Levels of Analyses cont…
• Families – looking at relatedness within colonies (ants,
bees, etc.)
• Population – level of variation within a population.
– Dispersal = indirectly estimate by calculating
migration
– Conservation & Management = looking for founder
effects (little allelic variation), bottlenecks (reduction
in population size leads to little allelic variation)
• Species – variation among species = what are the
relationships between species.
• Family, Order, ETC. = higher level phylogenies
What is Population Genetics?
• About microevolution (evolution of species)
• The study of the change of allele frequencies,
genotype frequencies, and phenotype
frequencies
Goals of population genetics
To provide an explanatory framework to describe the evolution
of species, organisms, and their genome, due to:
• Natural selection (adaptation)
• Chance (random events)
• Mutations
• Climatic changes (population expansions and contractions)
•…
Assumes that:
• the same evolutionary forces acting within species
(populations) should enable us to explain the differences we see
between species
• evolution leads to change in gene frequencies within populations
Pathogen Population Genetics
• must constantly adapt to changing environmental
conditions to survive
– High genetic diversity = easily adapted
– Low genetic diversity = difficult to adapt to changing
environmental conditions
– important for determining evolutionary potential of a
pathogen
• If we are to control a disease, must target a population
rather than individual
• Exhibit a diverse array of reproductive strategies that
impact population biology
Analytical Techniques
– Hardy-Weinberg Equilibrium
• p^2 + 2pq + q^2 = 1
• Departures from random mating
– F-Statistics
• measures of genetic differentiation in populations
– Genetic Distances – degree of similarity between OTUs
• Nei’s
• Reynolds
• Jaccard’s
• Cavalli-Sforza
– Tree Algorithms – visualization of similarity
• UPGMA
• Neighbor Joining
Allele Frequencies
• Allele frequencies (gene frequencies) =
proportion of all alleles, across all individuals in the
group in question, that are of a particular type
• Allele frequencies:
p + q = 1
• Expected genotype frequencies:
p^2 + 2pq + q^2 = 1
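As a small sketch of the two formulas above, allele frequencies can be counted from genotype numbers and then turned into expected genotype frequencies. The function names are mine, and the counts (10 AA, 4 Aa, 6 aa in a sample of 20 diploid individuals) are used purely for illustration.

```python
def allele_frequencies(n_AA, n_Aa, n_aa):
    """Return (p, q) for a two-allele locus from diploid genotype counts."""
    total_alleles = 2 * (n_AA + n_Aa + n_aa)  # each individual carries two alleles
    p = (2 * n_AA + n_Aa) / total_alleles     # AA carries two A, Aa carries one
    return p, 1.0 - p

def expected_genotype_frequencies(p, q):
    """Genotype frequencies expected under random mating."""
    return {"AA": p * p, "Aa": 2 * p * q, "aa": q * q}

p, q = allele_frequencies(10, 4, 6)
expected = expected_genotype_frequencies(p, q)
print(p, q, expected)
```

The three expected frequencies always sum to 1 because (p + q)^2 = 1.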
Evolutionary principles: Factors causing
changes in genotype frequency
• Selection = variation in fitness; heritable
• Mutation = change in DNA of genes
• Migration = movement of genes across populations
– Vectors = Pollen, Spores
• Recombination = exchange of gene segments
• Non-random Mating = mating between neighbors
rather than by chance
• Random Genetic Drift = if populations are small
enough, by chance, sampling will result in a different
allele frequency from one generation to the next.
The smaller the sample, the greater
the chance of deviation from an ideal
population.
Genetic drift at small population
sizes often occurs as a result of two
situations: the bottleneck effect or
the founder effect.
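The effect of population size on drift can be illustrated with a small simulation. This sketch assumes the standard Wright-Fisher model (each generation is a random draw of 2N allele copies from the previous one), which the lecture does not name. The population sizes, number of generations and number of runs are arbitrary choices.

```python
import random

def wright_fisher(p0, pop_size, generations, seed):
    """Allele-frequency trajectory under pure drift in a diploid population."""
    rng = random.Random(seed)
    n_alleles = 2 * pop_size
    p = p0
    trajectory = [p]
    for _ in range(generations):
        # Each allele copy in the next generation is drawn at random from the current pool
        copies = sum(1 for _ in range(n_alleles) if rng.random() < p)
        p = copies / n_alleles
        trajectory.append(p)
    return trajectory

def mean_abs_change(p0, pop_size, generations, runs):
    """Average absolute change in allele frequency over independent runs."""
    total = 0.0
    for seed in range(runs):
        total += abs(wright_fisher(p0, pop_size, generations, seed)[-1] - p0)
    return total / runs

small_drift = mean_abs_change(0.5, pop_size=10, generations=20, runs=100)
large_drift = mean_abs_change(0.5, pop_size=500, generations=20, runs=100)
example = wright_fisher(0.5, 10, 20, seed=0)
print(small_drift, large_drift)
```

With only 10 individuals the frequency wanders far from 0.5 within 20 generations, and some runs lose or fix the allele. With 500 individuals it barely moves.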
Founder Effects; typical of exotic
diseases
• Establishment of a population by a few individuals can
profoundly affect genetic variation
– Consequences of Founder effects
• Fewer alleles
• Fixed alleles
• Modified allele frequencies compared to source pop
• GREATER THAN EXPECTED DIFFERENCES AMONG
POPULATIONS BECAUSE POPULATIONS NOT IN EQUILIBRIUM
(IF A BLONDE FOUNDS TOWN A AND A BRUNETTE FOUNDS TOWN
B AND THERE IS NO MOVEMENT BETWEEN TOWNS, WE WILL
INSTANTANEOUSLY OBSERVE POPULATION DIFFERENTIATION)
Bottleneck Effect
• The bottleneck effect occurs when the numbers of
individuals in a larger population are drastically reduced
• By chance, some alleles may be overrepresented and
others underrepresented among the survivors
• Some alleles may be eliminated altogether
• Genetic drift will continue to impact the gene pool until
the population is large enough
Founder vs Bottleneck
Northern Elephant Seal:
Example of Bottleneck
Hunted down to 20 individuals in 1890’s
Population has recovered to over 30,000
No genetic diversity at 20 loci
Hardy Weinberg Equilibrium
and F-Stats
• In general, requires co-dominant marker
system
• Codominant = expression of heterozygote
phenotypes that differ from either
homozygote phenotype.
• AA, Aa, aa
Hardy-Weinberg Equilibrium
• Null Model = population is in HW Equilibrium
– Useful
– Often predicts genotype frequencies well
Hardy-Weinberg Theorem
if only random mating occurs, then allele frequencies
remain unchanged over time.
After one generation of random-mating, genotype frequencies
are given by
Genotype    Frequency
AA          p^2
Aa          2pq
aa          q^2
where p = freq (A) and q = freq (a)
Expected Genotype Frequencies
• The possible range for an allele frequency or
genotype frequency therefore lies between 0 and 1
• with 0 meaning complete absence of that allele or
genotype from the population (no individual in the
population carries that allele or genotype)
• 1 means complete fixation of the allele or genotype
(fixation means that every individual in the population
is homozygous for the allele -- i.e., has the same
genotype at that locus).
ASSUMPTIONS
1) diploid organism
2) sexual reproduction
3) Discrete generations (no overlap)
4) mating occurs at random
5) large population size (infinite)
6) No migration (closed population)
7) Mutations can be ignored
8) No selection on alleles
IMPORTANCE OF HW THEOREM
If the only force acting on the population is random
mating, allele frequencies remain unchanged and
genotypic frequencies are constant.
Mendelian genetics implies that genetic variability can
persist indefinitely, unless other evolutionary forces act to
remove it
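The claim that allele frequencies remain unchanged can be checked in one line of algebra. In the sketch below, p' is my notation for the frequency of allele A in the next generation. It counts all alleles of AA individuals plus half the alleles of Aa individuals, under the assumptions listed above.

```latex
\begin{aligned}
p' &= \operatorname{freq}(AA) + \tfrac{1}{2}\operatorname{freq}(Aa) \\
   &= p^2 + \tfrac{1}{2}(2pq) \\
   &= p(p + q) = p, \qquad \text{since } p + q = 1 .
\end{aligned}
```

The same argument gives q' = q, so after one generation of random mating the genotype frequencies p^2, 2pq and q^2 repeat themselves.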
Departures from HW Equilibrium
• Check Gene Diversity = Heterozygosity
– If high gene diversity = different genetic sources due
to high levels of migration
• Inbreeding - mating system “leaky” or breaks
down allowing mating between siblings
• Asexual reproduction = check for clones
– Risk of over emphasizing particular individuals
• Restricted dispersal = local differentiation leads
to non-random mating
[Figure: Pop 1 vs Pop 2, FST = 0.02; Pop 3 vs Pop 4, FST = 0.30]
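              Pop1   Pop2   Pop3
Sample size   20     20     20
AA            10     5      0
Aa            4      10     8
aa            6      5      12

Allele frequencies (first term: alleles carried by homozygotes; second term:
half of the alleles carried by heterozygotes; 40 = total alleles in 20 diploid
individuals)
Freq p
– Pop1: (20 + 1/2*8)/40 = 0.60
– Pop2: (10 + 1/2*20)/40 = 0.50
– Pop3: (0 + 1/2*16)/40 = 0.20
Freq q
– Pop1: (12 + 1/2*8)/40 = 0.40
– Pop2: (10 + 1/2*20)/40 = 0.50
– Pop3: (24 + 1/2*16)/40 = 0.80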
Local Inbreeding Coefficient
• Calculate HOBS
– Pop1: 4/20 = 0.20
– Pop2: 10/20 = 0.50
– Pop3: 8/20 = 0.40
• Calculate HEXP (2pq)
– Pop1: 2*0.60*0.40 = 0.48
– Pop2: 2*0.50*0.50 = 0.50
– Pop3: 2*0.20*0.80 = 0.32
• Calculate F = (HEXP – HOBS)/ HEXP
• Pop1 = (0.48 – 0.20)/(0.48) = 0.583
• Pop2 = (0.50 – 0.50)/(0.50) = 0.000
• Pop3 = (0.32 – 0.40)/(0.32) = -0.250
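The three steps above (observed heterozygosity, expected heterozygosity, F) can be sketched as a single function. The genotype counts are those of the three example populations (AA, Aa, aa out of 20 individuals each). The function name is mine.

```python
def inbreeding_coefficient(n_AA, n_Aa, n_aa):
    """Return (H_obs, H_exp, F) for one population at a two-allele locus."""
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)   # frequency of allele A
    q = 1.0 - p
    h_obs = n_Aa / n                  # observed proportion of heterozygotes
    h_exp = 2 * p * q                 # heterozygotes expected under Hardy-Weinberg
    f = (h_exp - h_obs) / h_exp
    return h_obs, h_exp, f

populations = {"Pop1": (10, 4, 6), "Pop2": (5, 10, 5), "Pop3": (0, 8, 12)}
results = {name: inbreeding_coefficient(*counts) for name, counts in populations.items()}
for name, (h_obs, h_exp, f) in results.items():
    print(name, round(h_obs, 2), round(h_exp, 2), round(f, 3))
```

A positive F (Pop1) means fewer heterozygotes than expected, as with inbreeding. F near zero (Pop2) matches Hardy-Weinberg expectations, and a negative F (Pop3) means an excess of heterozygotes.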
F Stats
Proportions of Variance
• FIS = (HS – HI)/(HS)
• FST = (HT – HS)/(HT)
• FIT = (HT – HI)/(HT)
Pop    HS     HI     p      q
1      0.48   0.20   0.60   0.40
2      0.50   0.50   0.50   0.50
3      0.32   0.40   0.20   0.80
Mean   0.43   0.37   0.43   0.57

HT = 2 * 0.43 * 0.57 = 0.49
FIS = (0.43 – 0.37)/(0.43) = 0.14
FST = (0.49 – 0.43)/(0.49) = 0.12
FIT = (0.49 – 0.37)/(0.49) = 0.24
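The F-statistics for the three example populations can be recomputed with a few lines of code. This sketch assumes equal weighting of populations and takes HT as 2pq at the mean allele frequencies. It works with unrounded means, so it gives FIS of about 0.15, FST of about 0.12 and FIT of about 0.25. Rounding the means to two decimals first, as in the table, shifts the values slightly.

```python
def f_statistics(pops):
    """pops: list of (p, H_obs) per subpopulation, all weighted equally."""
    k = len(pops)
    h_i = sum(h_obs for _, h_obs in pops) / k          # mean observed heterozygosity
    h_s = sum(2 * p * (1 - p) for p, _ in pops) / k    # mean expected within populations
    p_bar = sum(p for p, _ in pops) / k
    h_t = 2 * p_bar * (1 - p_bar)                      # expected in the pooled population
    return {
        "HI": h_i,
        "HS": h_s,
        "HT": h_t,
        "FIS": (h_s - h_i) / h_s,
        "FST": (h_t - h_s) / h_t,
        "FIT": (h_t - h_i) / h_t,
    }

# (p, H_obs) for Pop1, Pop2, Pop3
stats = f_statistics([(0.60, 0.20), (0.50, 0.50), (0.20, 0.40)])
for name, value in stats.items():
    print(name, round(value, 3))
```

A useful check is that the three statistics are linked: (1 – FIT) = (1 – FIS)(1 – FST).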
Important point
• Fst values are significant or not depending on
the organism you are studying or reading about:
– Fst = 0.10 would be outrageous for humans; for
fungi it means modest substructuring
Microsatellites or SSRs
• AGTTTCATGCGTAGGT CG CG CG CG CG
AAAATTTTAGGTAAATTT
• Number of CG is variable
• Design primers on FLANKING region, amplify DNA
• Electrophoresis on gel, or capillary
• Size the allele (different by one or more repeats; if number does
not match there may be polymorphisms in flanking region)
• Stepwise mutational process (2 to 3 to 4 to 3 to 2 repeats)
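To make allele sizing concrete, the sketch below counts the CG repeats in the example sequence and shows that gaining one repeat changes the fragment length by exactly one motif length. In practice the allele is sized from the amplified fragment rather than from the sequence itself, so this is only an illustration. The function name is mine.

```python
import re

def longest_repeat_run(sequence, motif):
    """Number of motif copies in the longest uninterrupted run of the motif."""
    pattern = re.compile("(?:" + re.escape(motif) + ")+")
    runs = [len(m.group(0)) // len(motif) for m in pattern.finditer(sequence)]
    return max(runs, default=0)

left_flank = "AGTTTCATGCGTAGGT"
right_flank = "AAAATTTTAGGTAAATTT"

allele_5 = left_flank + "CG" * 5 + right_flank   # the allele shown on the slide
allele_6 = left_flank + "CG" * 6 + right_flank   # one stepwise mutation later

repeats_5 = longest_repeat_run(allele_5, "CG")
repeats_6 = longest_repeat_run(allele_6, "CG")
size_difference = len(allele_6) - len(allele_5)
print(repeats_5, repeats_6, size_difference)
```

The single CG inside the left flank is not counted because only the longest run is reported. A size difference that is not a multiple of the motif length points to changes in the flanking region.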
Host islands within the California
Northern Channel
Islands create fine-scale genetic
structure in two sympatric
species of the symbiotic
ectomycorrhizal fungus
Rhizopogon
Rhizopogon occidentalis
Rhizopogon vulgaris
Rhizopogon sampling & study area
• Santa Rosa, Santa Cruz
– R. occidentalis
– R. vulgaris
• Overlapping ranges
– Sympatric
– Independent evolutionary
histories
Sampling
Bioassay – Mycorrhizal pine roots
Local Scale Population Structure
Rhizopogon occidentalis
[Map: sampling sites N, T, B, E and W, with pairwise FST values of 0.26, 0.24, 0.33 and 0.17]
5 km (T, B): Populations are similar
8-19 km: Populations are different
Grubisha LC, Bergemann SE, Bruns TD
Molecular Ecology in press.
Local Scale Population Structure
Rhizopogon vulgaris
[Map: sampling sites N, E and W, with pairwise FST values of 0.21, 0.20 and 0.25]
Populations are different
Grubisha LC, Bergemann SE, Bruns TD
Molecular Ecology in press
B. [Table: allele frequencies by locus (Rvu24.9, Rvu20.80, Rvu20.46, Rvu21.83, Rvu19.80, Rvu21.13) and allele size for Santa Cruz Island (SCI East, SCI North, SCI West) and Santa Rosa Island (SRI)]
How do we know that we are
sampling a population?
• We actually do not know
• Mostly we tend to identify samples from a
discrete location as a population, obviously
that’s tautological
• Assignment tests will use the data to define
population, that is what Grubisha et al. did
using the program STRUCTURE
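The logic of an assignment test can be sketched with a much simpler method than the one STRUCTURE uses. The code below assumes the populations and their allele frequencies are already known and assigns an individual to the population where its genotype is most likely under Hardy-Weinberg proportions. STRUCTURE instead infers the clusters themselves with a Bayesian model. All frequencies, the population names and the floor for unseen alleles are hypothetical.

```python
from math import log

UNSEEN_ALLELE_FREQ = 0.01  # arbitrary floor for alleles not sampled in a population

def genotype_log_likelihood(genotype, locus_freqs):
    """genotype: list of (allele1, allele2) per locus; locus_freqs: list of dicts per locus."""
    total = 0.0
    for (a1, a2), freqs in zip(genotype, locus_freqs):
        f1 = freqs.get(a1, UNSEEN_ALLELE_FREQ)
        f2 = freqs.get(a2, UNSEEN_ALLELE_FREQ)
        # Hardy-Weinberg genotype probability: p^2 for homozygotes, 2pq for heterozygotes
        prob = f1 * f1 if a1 == a2 else 2 * f1 * f2
        total += log(prob)
    return total

def assign(genotype, populations):
    """Name of the population under which the genotype is most likely."""
    return max(populations, key=lambda name: genotype_log_likelihood(genotype, populations[name]))

# Hypothetical allele frequencies at two microsatellite loci (allele sizes in bp)
populations = {
    "east": [{234: 0.3, 237: 0.7}, {144: 0.9, 147: 0.1}],
    "west": [{234: 0.8, 237: 0.2}, {144: 0.2, 147: 0.8}],
}

individual_1 = [(234, 234), (147, 147)]
individual_2 = [(237, 237), (144, 144)]
print(assign(individual_1, populations), assign(individual_2, populations))
```

If many individuals sampled at one location are assigned elsewhere, the sampling location is probably not a population in the genetic sense.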
Four phases of INVASION
• TRANSPORT
• SURVIVAL AND ESTABLISHMENT (LAG
PHASE)
• INVASION
• POST-INVASION
TRANSPORT
• Biology will determine how
• Normally very few organisms will make it
• Use phylogeographic approach to determine origin
(Armillaria, Heterobasidion)
• Use population genetic approach (Cryphonectria,
Ceratocystis fimbriata)
TRANSPORT-2
• Need to sample source pop or a pop that is close
enough
• Need markers that are polymorphic and will
differentiate genotypes/haplotypes
• Need analysis that will discriminate amongst
individuals and identify relationships (similarity
clustering, parsimony, Fst & N, coalescent)
ESTABLISHMENT
• LAG PHASE: normally effects are not noticed because
mortality is masked by normal background mortality
• By the time the introduction is discovered, normally too
late to eradicate
• Short lag phase= aggressive pathogen
• Long lag phase= less aggressive pathogen
ESTABLISHMENT
• NORMALLY REDUCED GENETIC VARIABILITY
INVASION
• Because of lack of equilibrium, high Fst values, i.e. strong
genetic structuring among populations
• Normally dominance of a few genotypes
• Spatial autocorrelation analyses tell us the extent of spread
INVASION-2
• Later phase: genetic differentiation
• Higher genetic difference in areas of older establishment