PopGenomics_2009B

Are we still evolving?
Mapping sites of selection in the human genome
Simon Myers
Targets of selection are important
What makes us human?
(FOXP2, gene loss)
What parts of our genome are functional?
(Genes, regulatory regions, siRNAs, …)
Pathogen evolution
Humans
Understand how we adapt to our environment
• Diet (Lactase, amylase)
• Mating success
• Physical environment (SLC24A5, EDAR…)
• Disease (LARGE, Duffy,…)
• ??
Other species
Resistance to pesticides
Adaptive evolution
Time
• Advantageous mutations arise by chance
• Once arisen, carriers have more offspring
• “Positive selection”
• On average, higher rate of change towards
advantageous mutations
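A minimal sketch of the "higher rate of change" point, assuming a deterministic haploid model in which carriers have relative fitness 1 + s. The starting frequency, the value of s, and the function name are illustrative assumptions, not values from the talk. The sketch ignores the chance effects (drift) that the slide mentions, so it shows only the average behaviour.

```python
def allele_trajectory(p0, s, generations):
    """Deterministic frequency of an allele with relative fitness 1 + s.

    Haploid model: each generation p' = p(1 + s) / (1 + s*p),
    where 1 + s*p is the mean fitness of the population.
    """
    freqs = [p0]
    p = p0
    for _ in range(generations):
        p = p * (1 + s) / (1 + s * p)
        freqs.append(p)
    return freqs


# Hypothetical parameters: 1% starting frequency, 5% fitness advantage.
neutral = allele_trajectory(0.01, 0.0, 200)
selected = allele_trajectory(0.01, 0.05, 200)

print(f"neutral after 200 generations:  {neutral[-1]:.4f}")
print(f"selected after 200 generations: {selected[-1]:.4f}")
```

With s = 0 the expected frequency does not move, whereas the advantageous allele approaches fixation. In a real population the neutral allele would also drift at random.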
Looking for positive selection
• Direct approach is very difficult
– Need to observe trait for long time
– Need very strong selection
• In many cases, need a more indirect approach
– Compare genomes among closely related species
– Look for “accelerated evolution”
– Current day patterns of diversity
– Look for “signature of selection”
FOXP2
• Gene coding for a transcription factor
• Mutations in this gene cause speech impairment
and other problems (Lai et al., Nature 2001)
– Mutation in FOXP2 co-segregates with a disorder in a
family in which half of the members have severe
speech, linguistic and grammatical difficulties
– Translocation in same gene in unrelated individual
with similar disorder
• Are changes in this gene associated with human
language development?
FOXP2 (Enard et al., Nature 2002)
Yellow: human lineage mutations (since chimpanzee-human split)
Blue: mutations on all other lineages
Very conserved gene (top 5% of 1,880 genes)
Only 3 non-repeat amino acid changes in 130 million years between human and mouse
2 occurred on human lineage in last 5-6 million years
FOXP2 (Enard et al., Nature 2002)
156 synonymous changes, 0 on human lineage
4 non-synonymous changes, 2 on human lineage
(p = 0.0005 by Fisher's exact test)
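The test on this slide can be sketched as follows. The sketch assumes the counts are totals across the tree (156 synonymous changes with 0 on the human lineage, 4 non-synonymous changes with 2 on the human lineage), which gives the 2x2 table below. That reading of the slide, and the function name, are assumptions; the calculation is a one-sided Fisher's exact test written with the standard library only.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns P(top-left cell >= a) with all row and column totals fixed,
    using the hypergeometric distribution.
    """
    row1 = a + b
    col1 = a + c
    n = a + b + c + d
    total = comb(n, row1)
    p = 0.0
    for k in range(a, min(row1, col1) + 1):
        p += comb(col1, k) * comb(n - col1, row1 - k) / total
    return p


# Rows: human lineage, all other lineages.
# Columns: non-synonymous, synonymous.
p_value = fisher_exact_greater(2, 0, 2, 156)
print(f"p = {p_value:.6f}")
```

Under this reading the result is 6/12720, about 0.00047, which agrees with the rounded p = 0.0005 quoted on the slide.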
Gene loss
• CMAH: Loss of enzymes that transform sialic acid
– Sugar on cell surface that mediates a variety of recognition events involving pathogenic microbes and toxins
• Myosin heavy chain
– Reduces masticatory muscles?
– Associated with gracilization
• KRTHAP1:
– Hair keratin
Wang et al. (2006)
Is this the answer?
• Comparative genomics has disadvantages
– Need repeated mutations to give power
– Tells little about the timescale
– Recent research suggests Neanderthals may
share FOXP2 mutations with humans (Krause
et al., Current Biology 2007)
• How do we find out if, and where, we’re
currently evolving?
Looking for positive selection
• Direct approach is difficult
– Need to observe trait for long time
• In many cases, need a more indirect approach
– Compare genomes among closely related species
– Look for “accelerated evolution”
– Current day patterns of diversity
– Look for “signature of selection”
Variation data and selection
• Revolution in population genetics
• Genome-wide datasets
– HapMap project
– Many unrelated individuals (60 CEU, 60 YRI, 45 JPT
and 45 CHB)
– Typed at ~4,000,000 loci that vary within population
• Allow systematic searches for selection
– Comparison of interesting regions to genome
– Identification of novel candidates for selection
Neutral alleles
I: Neutral variation
II: Neutral allele arises
III: Recombination scrambles variation over time (e.g. HapMap)
The signature of positive selection
I: Neutral variation
II: Advantageous allele arises
III: Spreads (sweeps) rapidly through population
Recombination has much less time to scramble variation on selected background
The signature of positive selection
SelSim (Spencer and Coop, Bioinformatics 2004)
Neutral mutation at 50%
Selected mutation at 50%
EHH
• Several authors have developed tests based on similar idea
– Sabeti et al. (Nature 2002)
– Focus on potentially selected mutation
– Measure proportion of haplotypes identical, as a function of distance on either side
– Compare selected/nonselected types
– Look for signal of “extended haplotype homozygosity” (EHH)
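The idea in the list above can be sketched on toy data. This version assumes haplotypes are equal-length strings of '0'/'1' alleles and measures distance in marker positions rather than physical distance. The haplotypes, the function name and its parameters are all hypothetical. EHH is taken here as the probability that two randomly chosen carriers of the core allele are identical from the core out to the given distance, computed separately for each side.

```python
from collections import Counter
from math import comb


def ehh(haplotypes, core, allele, distance, direction=1):
    """Extended haplotype homozygosity among carriers of `allele` at `core`.

    direction=+1 extends to the right of the core, -1 to the left.
    Returns the fraction of carrier pairs that are identical over the span.
    """
    carriers = [h for h in haplotypes if h[core] == allele]
    n = len(carriers)
    if n < 2:
        return 0.0
    if direction > 0:
        segments = [h[core:core + distance + 1] for h in carriers]
    else:
        segments = [h[max(0, core - distance):core + 1] for h in carriers]
    counts = Counter(segments)
    return sum(comb(c, 2) for c in counts.values()) / comb(n, 2)


# Toy sample: core SNP at index 2; '1' plays the putatively selected allele.
haps = [
    "01110", "01110", "01110", "01111",  # carriers of '1'
    "00001", "10010", "11000", "01011",  # carriers of '0'
]

for d in (1, 2):
    print(d, ehh(haps, 2, "1", d), ehh(haps, 2, "0", d))
```

In this toy sample the '1' carriers stay identical further from the core than the '0' carriers, which is the pattern the selected/nonselected comparison looks for.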
Simulation results (Voight et al., PLoS Biology 2006)
Lactase gene
– 70% of all humans are lactose intolerant
– In Europe, 95% lactose tolerance
Lactase gene
• DNA variant C/T-13910
• 14kb upstream of Lactase gene
• Predicts lactose persistence (Enattah et al., Nature Genetics 2002)
• Mutation enhances promoter activity, so probably causal (Olds et al., Hum. Mol. Genet. 2003)
• Other mutations exist in some groups
EHH around Lactase
From Bersaglieri et al. (AJHG, 2004)
EHH around Lactase
5’: p=.012
3’: p<0.0004
Human evolution in action
Malaria resistance
Infection by Lassa virus
From the HapMap paper (Nature, 2005)
A complementary approach
• SNPs that are at highly different frequencies across populations are excellent candidates for selection
– EDAR (hair follicle development, HapMap paper, Sabeti et al. Nature 2007)
– SLC24A5, SLC45A2 (HapMap paper, Lamason et al. Science 2005)
– Explored in practical
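One way to rank SNPs by how different their frequencies are across populations is sketched below. The slides do not name a statistic. This sketch uses a simple two-population FST, (HT - HS) / HT, with equal population weights as a stand-in. The SNP labels and frequencies are invented for illustration and are not HapMap values.

```python
def fst_two_pops(p1, p2):
    """Simple FST for one biallelic SNP from allele frequencies in two populations.

    HT: expected heterozygosity at the pooled (mean) frequency.
    HS: mean expected heterozygosity within each population.
    """
    p_bar = (p1 + p2) / 2
    h_total = 2 * p_bar * (1 - p_bar)
    if h_total == 0:
        return 0.0  # SNP is not variable in either population
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    return (h_total - h_within) / h_total


# Hypothetical allele frequencies in two populations.
snps = {
    "snp_a": (0.10, 0.90),
    "snp_b": (0.45, 0.55),
    "snp_c": (0.00, 1.00),
}

ranked = sorted(snps, key=lambda name: fst_two_pops(*snps[name]), reverse=True)
for name in ranked:
    print(name, round(fst_two_pops(*snps[name]), 3))
```

SNPs at the top of such a ranking are candidates only. Frequency differences can also arise from drift, so outliers are usually judged against the genome-wide distribution.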
Non-synonymous SNP in FY gene
Conclusions
• Population genetics provides diverse information about
molecular evolution
• Combining population genetics with knowledge of
genomic sequence
– New insights into adaptive evolution
– Evolution is ongoing, and influenced by local environment
– Limited power means we will probably never find all sites of
selection
• Avalanche of variation data being gathered
– Will bring many more insights
– Presents major challenges in utilising vast and highly informative
datasets, whilst keeping analyses computationally tractable
Purifying selection
• Much of the work of selection is removing
disadvantageous alleles
Maladaptive mutation
Fewer offspring
Mutation lost
• Regions performing some useful function (e.g.
genes!) evolve more slowly
• Once again, comparative genomics can help!
– Look for regions that are conserved between distantly
related species
Identifying conserved regions
5% of genome is “conserved” – but only 1.5% exonic sequence
SNP frequency “spectrum” in CNCs
• SNPs are at lower frequencies in CNCs (p = 3 × 10⁻¹⁸)
• Signal is weak – not all CNCs selected?
– Stronger near genes
– Strongest at very highly conserved elements (Katzman et al., Science 2007)
Drake et al. (Nature Genetics, 2005)
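A frequency spectrum comparison of this kind can be sketched as below. The allele frequencies are invented toy values and the bin count is arbitrary. This does not reproduce the test or the data from Drake et al. It only shows how a spectrum is tabulated so that a shift towards rare alleles becomes visible.

```python
def frequency_spectrum(freqs, n_bins=5):
    """Proportion of SNPs falling in each of n_bins equal-width frequency bins."""
    counts = [0] * n_bins
    for f in freqs:
        idx = min(int(f * n_bins), n_bins - 1)  # a frequency of 1.0 goes in the last bin
        counts[idx] += 1
    total = len(freqs)
    return [c / total for c in counts]


# Hypothetical allele frequencies for SNPs inside and outside conserved regions.
conserved = [0.02, 0.05, 0.08, 0.10, 0.15, 0.30, 0.55]
other = [0.05, 0.25, 0.35, 0.45, 0.65, 0.75, 0.90]

conserved_spec = frequency_spectrum(conserved)
other_spec = frequency_spectrum(other)

print("conserved:", [round(x, 2) for x in conserved_spec])
print("other:    ", [round(x, 2) for x in other_spec])
```

An excess of SNPs in the lowest bin for the conserved class is the kind of shift expected under purifying selection, since disadvantageous alleles are kept rare.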