Human Population Structure
Human Evolution
Chelsea Bishop, Emily Bonnell, Leanne Dawe, Stephanie Mayne, Emily Porter
Outline
● Homo sapiens Background
● Human Dispersal: Out of Africa
● Human Population Structure
● Recombination and Linkage Disequilibrium
● Demographic Process
● Purifying Selection
● Adaptation
● Why Do We Get Sick?
Homo sapiens
● Humans are primates, behaviorally, morphologically and genetically
● The only surviving members of the Homo genus
● There is little justification to have a genus to ourselves: very little
nucleotide divergence
Hominidae
● This family consists of our closest relatives, the
chimpanzee (Pan troglodytes), bonobo (Pan paniscus),
gorilla (Gorilla), orangutan (Pongo), and humans
● This family falls within the parvorder Catarrhini that
arose 65-85 million years ago
● Humans and chimpanzees split about 7 mya, roughly 1% of the
time vertebrates have been on Earth and 2.5% of the time
mammals have been on Earth
● H. erectus explored Eurasia
● At 0.5 mya, the lineage split into Neanderthals and Denisovans
(these interbred with H. sapiens after H. sapiens migrated out of
Africa 50,000 ya)
Similarity between species
● Humans were not derived from chimpanzees: we share an ancestor
● 35 million sequence differences between humans and chimpanzees
● 3% of the two genomes don’t align: insertions/deletions
● This is 10x the number of single nucleotide polymorphisms in any one person
Similarity between species
● Implies the fixation of 5 new mutations in either lineage per year
● ~80 genes present in the common ancestor were lost in humans: half are olfactory, and
others are hair keratin and muscle myosin used for chewing
● A famous example is the 2 amino acid change in human FOXP2, a gene involved in speech
● Human intelligence is so complex, it is not possible to create an artificial
selection scheme to derive its occurrence
Human Dispersal: Out of Africa
Out of Africa
● Archaeological records of 2 basal hominid genera in Africa: Ardipithecus
and Australopithecus
● Both had small brains (like chimpanzees), but showed bipedalism and
adaptation to their environment (arboreal and savanna)
Genus: Homo
● 2.5mya H. habilis and H. erectus lived side by side for 1my
⚫ They had 75% of the brain volume of modern humans, an upright posture, and
stone tools
⚫ These species gave rise to H. ergaster: A taller, more hairless species
● H. ergaster migrated to eastern and western Europe over 1.5mya and to Asia
0.7mya
⚫ However, geneticists don’t believe these are the direct ancestors of H. sapiens
Southern Africa
● More evidence suggests that H. sapiens emerged in southern Africa and
dispersed more recently
● Population genetic analyses show that all human mtDNA derives from
“Mitochondrial Eve”
⚫ Mitochondrial Eve lived ~120,000 ya in a population of 10,000
● Autosomal and Y chromosome markers suggest that our species is 10,000
generations old
Proto-Humans
● Cause complications in determining human evolution
● Lineages are more modern than H. erectus but genetically outside the range of
H. sapiens
● Bones from caves in Europe and Asia show that H. neanderthalensis and H.
sapiens cohabited for thousands of years
● H. neanderthalensis went extinct suddenly 33,000ya
Proto-Humans
● DNA from the Altai mountains of Siberia show a Denisovan different from
H. sapiens and H. neanderthalensis, while another from the same cave
resembled European H. neanderthalensis
⚫ It is highly likely that these species interbred
● 2% of an average Caucasian human’s DNA is Neanderthal
● Collectively, 25% of the Neanderthal genome is present in the Human
population
The Origins of Homo Sapiens
● Evidence showing H. sapiens originated in Africa
⚫ Patterns of DNA sequence diversity
⚫ Greatest in sub-Saharan Africa
The Origins of Homo Sapiens
● Population of new regions
◦ Waves of migration
⚫ Result?
⚫ smaller amounts of genetic variation due to less time to
accumulate new mutations
⚫ Population with greatest variation is oldest
Evidence of Genetic Variation
● Sequenced genomes of two people from adjacent villages (Khoi-San
bushmen)
⚫ Appeared to be as different from one another as two non-Africans!
● Genotyping of Hunter-Gatherer Populations
⚫ Confirmed southern Africa as origin
Figure 3.2 The Great Human Migrations
What does this Show?
• Human dispersal occurred in multiple waves
• Homo erectus migrated out of Africa
• Replaced by Neanderthals and Denisovans
• Homo sapiens arose as a new species in southern Africa
• Some interbreeding with Neanderthals
What does this Show?
● Homo sapiens underwent two major migrations:
◦ 1) 50 Kya established the ancestors of Australian Aborigines
◦ 2) 25-38 Kya colonized Europe
What About Gene Flow in Agriculture?
Luca Cavalli-Sforza suggests:
◦ “In Europe, the major component of variation
spreads from the Urals in an east-to-west gradient,
while a second component follows more of a south-to-north gradient.”
What About Gene Flow in Agriculture?
● This pattern agrees with the spread of genes
that accompanied the spread of agriculture
⚫ Confirmed by sequencing the genomes of 5000
year-old remains of 3 hunter-gatherers and 1
farmer from Sweden
What About Gene Flow in Agriculture?
● Results?
◦ Farmer’s genome
⚫ far closer to that of southern Europeans
◦ Hunter-gatherers
⚫ closely resemble northern Europeans
What About Gene Flow in Agriculture?
● Meaning?
◦ Hunter-gatherers and farmers lived side-by-side for thousands of years
⚫ LITTLE gene flow between the two
Human Population Structure (HPS)
● The composition of a population
⚫ Common to see strong patterns in human genome
⚫ Allow population geneticists to determine ancestry based on genes
● Approaches for Quantifying Population Structure:
⚫ 1) Structure software
⚫ 2) Statistical Principal Component Analysis (PCA)
Approaches for Quantifying Population
Structure
● 1) Structure Software
⚫ Assigns a proportion of the genome (of each individual) to one or more of
K populations in a probabilistic manner
● Produces graphical picture of global structure in data set
⚫ When K=6, the major continental populations are shown
⚫ Africa, Europe, the Middle East, Central and East Asia, Oceania, and
America
Approaches for Quantifying Population
Structure
● 2) Statistical Principal Component Analysis (PCA)
● Quantifies population structure without assigning individuals to
one group or another
⚫ Thus, quantifies which polymorphisms tend to be found in the same
individuals
● Yields a series of “eigenvectors” that capture major components of
covariance
⚫ Accomplished by reducing a matrix of hundreds of thousands of SNPs
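As a minimal sketch of the PCA step described above (not the EIGENSTRAT implementation), the example below centers a small genotype matrix and takes its singular value decomposition with NumPy. The function name genotype_pca, the 0/1/2 allele-count coding, and the toy data are assumptions made for illustration only.

```python
import numpy as np

def genotype_pca(genotypes, n_components=2):
    """PCA of an individuals-by-SNPs matrix of 0/1/2 allele counts (assumed coding)."""
    g = np.asarray(genotypes, dtype=float)
    centered = g - g.mean(axis=0)  # subtract each SNP's mean allele count
    # SVD of the centered matrix; rows of vt are the SNP loadings (eigenvectors)
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]  # coordinates of each individual
    explained = s**2 / np.sum(s**2)                  # fraction of variance per component
    return scores, explained[:n_components]

# Hypothetical toy data: three individuals from each of two groups, four SNPs.
toy = np.array([
    [2, 2, 0, 1],
    [2, 1, 0, 1],
    [2, 2, 0, 0],
    [0, 0, 2, 1],
    [0, 1, 2, 1],
    [0, 0, 2, 0],
])
scores, explained = genotype_pca(toy)
print(scores[:, 0])   # PC1 coordinate of each individual
print(explained)      # share of variance captured by PC1 and PC2
```

In this toy set the two groups fall on opposite sides of PC1. The sign of a component is arbitrary, so only the separation is meaningful, not which group is positive.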
Approaches for Quantifying Population
Structure
● 2) PCA continued…
● Principal Component Analysis
◦ Components are identified until all variance is
captured
⚫ Every SNP contributes to each
component
Approaches for Quantifying Population
Structure
● 2) PCA continued…
◦ Meaning?
⚫ % associated with each PC
⚫ how much variance in
genotype frequencies is
captured by that particular
component
⚫ Supports that most human
variation is shared among groups
and HPS reflects shifts in
frequencies of some alleles
Agreeability Among Approaches
● Agree for the most part
◦ At K=2
⚫ The two groups tend to have + or – values of PC1
◦ At K=3
⚫ Third group tends to separate along PC2, and so forth
⚫ Less agreement as more components are added
Agreeability Among Approaches
● Minor components
◦ Correspond to either:
⚫ close relatives (share ½ or ¼ of their genotype) or
⚫ small blocks of one chromosome that have different frequencies
● Program available:
⚫ EIGENSTRAT
⚫ Performs PCA
Problems with Ignoring Population
Structure
● Can cause false conclusions
⚫ Example: variants contribute to diseases whose prevalence differs among
populations
● Including population structure can help determine how much the following contribute to those differences in prevalence:
⚫ Genetic divergence
⚫ Admixture
Role of Eigenvectors Detected by PCA
in Genetics Association Studies
● Correct for genome-wide
population structure
● Calculation of classical Fst statistics
● Carrying out population structure
analyses
⚫ One SNP at a time or across a small
region of a locus
Hardy-Weinberg Equilibrium
● Can determine population differentiation
⚫ Measure difference between the observed heterozygosity and that expected
under Hardy-Weinberg
● $(p + q)^2 = p^2 + 2pq + q^2 = 1$
⚫ p and q are allele frequencies
⚫ 2pq is the expected heterozygote frequency
⚫ F is the fixation index: the proportional shortfall of observed heterozygosity relative to 2pq
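The comparison of observed and expected heterozygosity can be sketched as below. The sketch assumes F is computed as 1 minus the ratio of observed to expected heterozygosity, which is the usual definition of the fixation index. The function name and the genotype counts are hypothetical.

```python
def hardy_weinberg_f(n_AA, n_Aa, n_aa):
    """Return (p, q, expected het, observed het, F) from genotype counts at one locus."""
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)  # frequency of allele A
    q = 1 - p                        # frequency of allele a
    h_exp = 2 * p * q                # heterozygote frequency expected under Hardy-Weinberg
    h_obs = n_Aa / n                 # heterozygote frequency actually observed
    if h_exp == 0:
        raise ValueError("locus is monomorphic; F is undefined")
    f = 1 - h_obs / h_exp            # assumed definition: F = 1 - H_obs / H_exp
    return p, q, h_exp, h_obs, f

# Hypothetical counts: the first sample matches Hardy-Weinberg, the second lacks heterozygotes.
print(hardy_weinberg_f(25, 50, 25))
print(hardy_weinberg_f(40, 20, 40))
```

Both samples have p = q = 0.5, but only the second shows a heterozygote deficit, so only it gives F above zero.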
Hardy-Weinberg Equilibrium
● Heterozygosity ↓ as populations drift apart
⚫ Due to random assortment of alleles ↓
⚫ Causes a form of inbreeding, thus ↑ in homozygotes
● Under admixture:
⚫ # of heterozygotes can ↑ (temporarily)
Hardy-Weinberg Equilibrium
● If F=0 and allele frequencies are as expected
⚫ No deviation
⚫ No population structure
● If F is between 0 and 0.1
⚫ Slight divergence
Hardy-Weinberg Equilibrium
● If F > 0.1
◦ Strong genetic divergence
⚫ Only shown by a small fraction of the genome
⚫ Between any two populations
⚫ Thus, scans along the genome can identify where divergence is greater than expected by
chance
⚫ One method used to detect selection acting on individual genes
Fixation Index (Fst)
● Measure of population differentiation due to genetic structure
⚫ Commonly estimated from data regarding SNPs
Fixation Index (Fst)
● Based on the out-of-Africa migration model:
◦ Fst should ↑ with geographic distance between populations
⚫ Genes that deviate strongly from this trend are possible local
adaptations
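For a single SNP, Fst can be sketched as the drop in expected heterozygosity within populations relative to the pooled total. This is the simple equal-sample-size form, Fst = (H_T - H_S) / H_T, and not necessarily the estimator used in the studies summarized here. The function name and allele frequencies are hypothetical.

```python
def fst_from_frequencies(freqs):
    """Fst = (H_T - H_S) / H_T for one biallelic SNP, given one allele frequency per population."""
    if not freqs:
        raise ValueError("need at least one population")
    p_bar = sum(freqs) / len(freqs)                       # pooled frequency, equal sample sizes assumed
    h_t = 2 * p_bar * (1 - p_bar)                         # expected heterozygosity in the pooled total
    h_s = sum(2 * p * (1 - p) for p in freqs) / len(freqs)  # mean within-population heterozygosity
    if h_t == 0:
        return 0.0                                        # no variation anywhere, so no differentiation
    return (h_t - h_s) / h_t

print(fst_from_frequencies([0.5, 0.5]))  # identical frequencies
print(fst_from_frequencies([0.1, 0.9]))  # strongly diverged frequencies
print(fst_from_frequencies([0.0, 1.0]))  # fixed for different alleles
```

Running this per SNP along a chromosome is one way to picture the genome scans mentioned above: most sites give small values and a few outliers stand out.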
Recombination & Linkage
Disequilibrium
Recombination
• Important governor of diversity
• Responsible for producing patterns of
genetic diversity seen in populations
• Crossing over of chromosomes during
meiosis leads to recombination of
different alleles of genes on the same
chromosome
• The production of offspring with
combinations of traits that differ from
those found in parents
Recombination
• Rate increases with physical
distance
• Varies among humans
• Typically scales inversely
with genomic length
• In humans, rate is ~60%
higher in females
• Studies in flies and rodents
Recombination
• Does not occur at random in humans
• Concentrated in “hot spots”
• Closely associated with sex
o Sex synonymous with mixing of
genes between individuals
o Evolution of sex equated to
evolution of genetic
recombination
• Frequencies used to map gene
locations
Linkage Disequilibrium
• Tendency of 2 alleles to be transmitted
together more often than expected
under independent assortment
• Non-random association of alleles at
different sites
• Can determine what allele you will get
on second site from what allele is at
first site
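The non-random association described above is commonly summarized with the coefficient D and the squared correlation r². The sketch below computes both from counts of the four haplotypes at two biallelic sites. The function name and the counts are hypothetical.

```python
def linkage_disequilibrium(n_AB, n_Ab, n_aB, n_ab):
    """Return (D, r2) from counts of the four haplotypes at two biallelic sites."""
    n = n_AB + n_Ab + n_aB + n_ab
    p_AB = n_AB / n            # frequency of the A-B haplotype
    p_A = (n_AB + n_Ab) / n    # frequency of allele A at the first site
    p_B = (n_AB + n_aB) / n    # frequency of allele B at the second site
    d = p_AB - p_A * p_B       # excess of A-B over the independence expectation
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, r2

print(linkage_disequilibrium(50, 0, 0, 50))    # alleles always travel together
print(linkage_disequilibrium(25, 25, 25, 25))  # alleles assort independently
```

With r² = 1 the allele at the first site fully predicts the allele at the second; with r² = 0 it carries no information.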
Linkage Disequilibrium
• Haplotype = combination of 2 or
more alleles in the same region
of a chromosome
• Haplotype blocks = variant
alleles transmitted together in a
population (i.e., in LD)
• Large haplotype blocks = large
LD within a population
Linkage Disequilibrium
• Natural Selection is one of the
most important factors in
creating LD
• LD can be created between
genes by favouring specific
combos of alleles
• Snails
Linkage Disequilibrium
• Can be visualized by color-coding or plotting in triangular heat map
Why Is This Relevant?
Recombination Versus Linkage
Disequilibrium
• Recombination breaks up LD
• Decay of LD depends on recombination rate
• Recombination changes arrangement of
haplotypes and creates new ones, which affects LD
• High LD is found where recombination rates are low
Figure: LD versus recombination rate
Linkage Disequilibrium & Recombination
Rate in Different Human Populations
Recombination Rate in Different Human
Populations
• More and longer haplotype blocks
throughout European populations
• Lower recombination rate and
diversity among Europeans
• Less LD in Nigerian populations as
haplotype blocks shorter
• More diversity present in these
populations for a longer period of
time
Demographic Processes
• Four evolutionary processes having effect on human
genome…
• Mutation
• Migration
• Inbreeding
• Recent population expansion (not discussed)
Mutation
●Change in DNA bases
◦ Missense
◦ Nonsense
◦ Point
◦ Frameshift
◦ Silent
●Mutagen
◦ Agent causing genetic mutation
Mutation
●Germline mutations
◦ Errors in DNA replication (meiosis)
●Somatic mutation
◦ Tobacco
◦ Pollution
◦ Household plastics
◦ Radiation (sunlight, nuclear accidents)
⚫Accumulate: machinery to fix errors can be stopped by
mutations
⚫Do not contribute to genome evolution
Mutation
● Important in shaping the population distribution of disease susceptibility
● Can be estimated using 4Nμ, the expected neutral diversity (N = number of individuals in the population, μ = neutral mutation
rate)
Mutation
● A better measurement involves comparing genomes of offspring to
parents
● Rate of mutations ~10^-8 per base per generation
● New neutral mutations have a 1/(2N) chance of becoming fixed as the new allele at that
site
◦ Most lost within a few generations
◦ Still cannot ignore collective impact they have on phenotypic
variability when they are in gene pool
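The two quantities on these slides are simple arithmetic, sketched below. The sketch assumes a diploid population of N individuals, uses the 4Nμ expression for neutral diversity, and takes 1/(2N) as the fixation probability of a new neutral mutation. The value N = 10,000 is used only as an illustration, and the function names are hypothetical.

```python
def neutral_diversity(n_individuals, mutation_rate):
    """Expected neutral diversity per site, 4 * N * mu, for a diploid population."""
    return 4 * n_individuals * mutation_rate

def neutral_fixation_probability(n_individuals):
    """Chance that one new neutral mutation eventually fixes: 1 / (2N) gene copies."""
    return 1 / (2 * n_individuals)

N = 10_000   # illustrative population size (assumption)
MU = 1e-8    # mutations per base per generation

print(neutral_diversity(N, MU))           # about 4e-4 differences per site
print(neutral_fixation_probability(N))    # 5e-05
```

The small fixation probability is why most new mutations are lost within a few generations.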
Migration
●Genome-wide effects; faster & larger
changes in allele frequencies
compared to natural selection
●Bring together combination of
different genotypes
◦ ↑ population fitness
◦ ↓ rare disease
Migration: Founder effect
• Small group migrates from
original group
• Have small sample of alleles
from source population
• Loss of genetic variation
Migration: Admixture
• Reproduce with another
population in the new territory
• Affects allele frequencies of both
populations
• Seen in variance of allele
frequencies of “genetic melting
pots” such as North America
Genetic Drift
• Chance change of
allele frequency over
time
• Caused by founder
effects or genetic
bottlenecks
Inbreeding
●Close inbreeding: marriage of relatives (cousins)
●Certain societies have “incest taboos”
●Globally ~10% humans related to their partner
at least at second cousin level
◦ Risk of autosomal recessive diseases increased
●“Homophily”: Tendency of people to marry
similar to themselves (height, temperament..
Etc.)
◦ Doesn’t affect entire genome
Purifying Selection
●Process to remove deleterious alleles
●Allele causing childhood mortality or sterility won’t be
passed on
●Severe deficiencies have low chance of being passed on
◦ Deficiencies of cognitive function
◦ Immune system
◦ Essential organ function etc.
●Recessive allele can drift to ~1% before selection will act
against it
●~19-26% reduction in hominid autosomal genetic
diversity by purifying selection
Purifying Selection: SNPs
• Single letter change of DNA called Single-Nucleotide Polymorphisms (SNPs or
“snips”)
• Natural genetic variation
• Synonymous (no change in protein sequence) or nonsynonymous (change amino
acid)
• Transitions (C↔T or A↔G) or Transversions (C/T↔A/G)
Purifying Selection: SNPs
● Different results depending on position of SNP in triplet
◦ 1st position= missense (almost always)
◦ 2nd position= missense (always)
◦ 3rd position= silent (typical) or missense (occasional)
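The position rule above can be checked against the standard genetic code. The sketch below enumerates every single-base change in the 61 sense codons and counts how many leave the amino acid unchanged at each codon position. Changes that create a stop codon are counted as non-silent, and the function name is hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order; "*" marks stop codons.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def silent_changes_by_position():
    """Map codon position (1-3) to [silent changes, all changes] over sense codons."""
    counts = {1: [0, 0], 2: [0, 0], 3: [0, 0]}
    for codon, aa in CODON_TABLE.items():
        if aa == "*":
            continue  # start only from codons that encode an amino acid
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                mutant = codon[:pos] + base + codon[pos + 1:]
                counts[pos + 1][1] += 1
                if CODON_TABLE[mutant] == aa:
                    counts[pos + 1][0] += 1
    return counts

for position, (silent, total) in silent_changes_by_position().items():
    print(position, silent, total)
```

The counts show why the first position is "almost always" rather than always missense: a few leucine and arginine codons tolerate a first-position change.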
Distribution of SNPs in Human Genome
“The HapMap is a catalog of common genetic variants that occur in
human beings. It describes what these variants are, where they
occur in our DNA, and how they are distributed among people
within populations and among populations in different parts of the
world.” (International HapMap Consortium, 2005)
• HapMap data on two human populations
• Derived alleles are variants that replaced an ancestral allele
• Derived allele frequencies shifted towards greater number of low frequency minor alleles
• Nonsynonymous substitutions less common in genome
• Rare variants with low derived allele frequency more often nonsynonymous
• Average ~1 SNP per kilobase per individual
• Not even along the chromosome
• Lower in upstream 5’ region
• Increased in 3’ region
• Also not even among different genes
Purifying Selection Continued…
●Human population in 2011 ~7 billion
●Consequence of this exponential growth is less
purifying selection
●Mutation accumulation in humans not observed in
other species which have not had this population
explosion
●Tools for identifying deleterious variants?
◦ Biochemical assays
◦ Testing variant in model organisms
◦ Bioinformatics
⚫Algorithms used: PolyPhen and SIFT
Adaptation
Adaptation Definition
• The opposite of purifying selection
• Selecting positive mutations that will increase the fitness of a species
• Key to the concept of evolution
Local or Global Distribution
• Correlations between allele frequencies and
geographic location give evidence that where you
live may affect certain genetic variants
• Similarly the environmental conditions of where you
live may contribute to variation
Research
• Statistical models have been used to show adaptation through human
history
• Positive selection has been shown to be involved in growth, immune
responses, and metabolism
• Adaptive scenarios have been pieced together to explain genetic
contribution to some diseases
Hard versus Soft Selection
Hard Selection
• Type we are most used to talking about
• When a trait gives an organism an advantage so it is more likely to be
transferred generation to generation
• It eventually becomes a fixed trait
Hard versus Soft Selection
Soft Selection
1. Heterogeneity: while a polymorphism persists, however temporary and
at whatever frequency, other functional alleles can appear at the same
gene and evolve in parallel
2. Changed circumstances can turn an existing allele, whether it was deleterious,
adaptive or neutral, into an advantageous allele.
• Soft selection leaves very different traces in our genome than hard selection
Detecting Selection
● The easiest way to detect whether there have been any changes in a gene
◦ find the ratio of nonsynonymous to synonymous nucleotide changes in the
coding region
● More advanced tests of neutrality (like Hudson-Kreitman-Aguadé) extend
analyses to non-coding regions by comparing the genetics with other
species/populations
Extended runs of homozygosity
● Extended haplotype homozygosity (EHH) compares the lengths of
haplotype blocks around the alleles proposed to be under selection.
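One common way to put a number on EHH is the chance that two randomly chosen chromosomes carrying the core allele are identical across the whole window around it. The sketch below assumes that definition. Haplotypes are written as strings of 0/1 alleles, and the function name and data are hypothetical.

```python
from collections import Counter
from math import comb

def ehh(haplotypes, core_index, distance):
    """EHH at a given distance (in SNPs) either side of the core site.

    haplotypes: equal-length strings, all carrying the same core allele.
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two haplotypes")
    start = max(0, core_index - distance)
    end = min(len(haplotypes[0]), core_index + distance + 1)
    # Group chromosomes that are identical across the whole window
    groups = Counter(h[start:end] for h in haplotypes)
    # Pairs of identical chromosomes divided by all possible pairs
    return sum(comb(count, 2) for count in groups.values()) / comb(n, 2)

# Hypothetical chromosomes that all carry allele "1" at the core site (index 2).
carriers = ["00100", "00100", "00101", "10100"]
for d in range(3):
    print(d, ehh(carriers, core_index=2, distance=d))
```

EHH starts at 1 at the core site and decays with distance as recombination and mutation break up the haplotype. A slow decay around an allele is the signature looked for in scans for recent selection.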
Genetic Hitchhiking
• Selection on one nucleotide increases the
frequencies of all other nucleotides in linkage
disequilibrium
• Hard selection expected to have less diversity
in selected allele area
Figure 3.8 Extended Haplotype Homozygosity (Part 1)
Figure 3.8 Extended Haplotype Homozygosity (Part 2)
Major Histocompatibility Complex
(MHC)
• Most adaptations involve more subtle combinations
• MHC (also known as HLA-Human leukocyte antigen) most polymorphic
region of our genome
• It encodes our immune responses protecting us against different bacteria
and viruses
Major Histocompatibility Complex
(MHC)
• This complex shows combined genes from both Neanderthal and
Denisovan genomes
• Evidence of different environmental contributions
• Soft selection has been observed in this complex
Why Do We Get Sick?
• Why do we vary so much in our susceptibility to certain diseases?
⚫Evolutionists believe that there must be some advantage to the
deleterious gene thus why it persists in the population
⚫Many researchers search for the positive aspects of harmful genes
Why Do We Get Sick?
• Other researchers believe rare alleles are responsible for disease and that
some disease in the world is unavoidable
• The last view we will discuss is that disease is due to there being a normal
amount of variation in alleles
• In this model as long as distribution is normal some disease will exist,
natural selection can only limit the number of people affected
Natural Selection Fitness and Disease
• Main conclusion of research- natural selection should limit disease
• Many diseases have a late onset- after we have had a chance to reproduce
• It has been speculated that diseases that occur later may be at the cost of
increased fecundity
Problem with the Theories
• The minor allele is not always the one with the disease associated with it
Demographics
• Genes can evolve to protect one against disease
• By looking at different populations patterns are seen
• East Asians for instance have a lower risk of Type II diabetes
• It has been found that they have a high frequency of protective derived
alleles associated with Type II diabetes
Demographics
• Correlations such as the one on the previous slide lead us to wonder how
disease differs among populations
• Genetic association studies show that most variants which cause risk in
one population do so in another
Why are Diseases Rising?
• Famine and infectious diseases dominate the developing world
• World Health Organization estimates that by 2020 in developed countries 7/10
deaths will be due to:
• Diabetes
• Heart disease
• Cancer
• Depression
Emerging Diseases
• While many diseases can be attributed to increased diagnoses it does not
explain some diseases
• Diseases that were never heard of prior to the 19th century have become
frequent in today’s society (Autism, Alzheimer's)
• Changes may be due to behavioural and environmental changes
Figure 3.9 The Global Burden of Disease
References
• Carr, S. (2016). Biology 2250 - Principles of Genetics. Retrieved 15 March, 2016, from https://www.mun.ca/biology/scarr/2250_Genetic_Code_2015.html
• Enard, W., et al. (2002). Molecular evolution of FOXP2, a gene involved in speech and language. Nature 418: 869-872.
• Fixation index. (2016). Wikipedia. 16 March 2016. http://www.en.wikipedia.org/wiki/fixation_index
• Gibson, G. (2015). A Primer of Human Genetics. Sunderland, MA: Sinauer.
• International HapMap Consortium. (2005). A haplotype map of the human genome. Nature 437: 1299-1320.
• Linkage disequilibrium and recombination. (2005). Bio 107/207. http://bio.classes.ucsc.edu/bio107/Class%20pdfs/W05_lecture15.pdf
• Mikkelsen, T. S., et al. (2005). Initial sequence of the chimpanzee genome and comparison with the human genome. Nature 437: 69-87.
• Scally, A., et al. (2012). Insights into hominid evolution from the gorilla genome sequence. Nature 483: 169-175.
• Staveley, B.E. (2016). Sexual Reproduction, Meiosis, and Genetic Recombination. Principles of Cell Biology (BIOL2060). http://www.mun.ca/biology/desmid/brian/BIOL2060/BIOL2060-20/CB20.html
• Wall, Jeff. (2011). Linkage Disequilibrium and Association Studies in Different Human Populations. California Academic Media Services. https://www.youtube.com/watch?v=Z5K90DVkj1M