Detection of positive selection in the human genome

Introduction
Before and after genome sequencing
Detection Methods
1.- High proportion of function-altering mutations
Sperm protamine P1: protamines are small, arginine-rich nuclear proteins
that replace histones late in the haploid phase of spermatogenesis and are
believed to be essential for sperm head condensation and DNA stabilization.
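One standard way to quantify a high proportion of function-altering mutations in a coding sequence is the ratio of non-synonymous to synonymous substitution rates, dN/dS (ω); values above 1 suggest positive selection. The sketch below is a minimal illustration under stated assumptions, not the method used in the study: it assumes the numbers of non-synonymous and synonymous sites and differences have already been counted (for example with the Nei-Gojobori method) and applies the Jukes-Cantor correction. The function names and example counts are hypothetical.

```python
import math


def jukes_cantor(p: float) -> float:
    """Correct an observed proportion of differing sites for multiple hits."""
    if not 0 <= p < 0.75:
        raise ValueError("proportion must be in [0, 0.75) for Jukes-Cantor")
    return -0.75 * math.log(1 - 4 * p / 3)


def dn_ds(n_diffs: int, n_sites: float, s_diffs: int, s_sites: float) -> float:
    """Return omega = dN / dS from pre-counted sites and differences."""
    d_n = jukes_cantor(n_diffs / n_sites)  # non-synonymous substitutions per site
    d_s = jukes_cantor(s_diffs / s_sites)  # synonymous substitutions per site
    if d_s == 0:
        raise ValueError("no synonymous differences: dN/dS is undefined")
    return d_n / d_s


# Hypothetical counts for one gene compared between two species.
omega = dn_ds(n_diffs=12, n_sites=120.0, s_diffs=2, s_sites=40.0)
print(f"dN/dS = {omega:.2f}")
```

Here 10% of non-synonymous sites differ versus 5% of synonymous sites, so ω comes out above 1, the pattern expected when amino-acid changes are favoured.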
2.- Reduction in genetic diversity
Region with low diversity
and excess of rare alleles
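An excess of rare alleles is commonly measured with Tajima's D, which compares mean pairwise diversity (π) with the number of segregating sites (S); negative values mean more rare variants than expected under neutrality, as after a selective sweep (population expansion can produce the same signal). A minimal sketch, assuming aligned, equal-length sequences from one population; the function names and toy sequences are illustrative.

```python
import math
from itertools import combinations


def pairwise_diversity(seqs):
    """Mean number of differences over all pairs of aligned sequences (pi)."""
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs)
    return diffs / len(pairs)


def segregating_sites(seqs):
    """Number of alignment columns carrying more than one allele (S)."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)


def tajimas_d(seqs):
    n = len(seqs)
    if n < 4:
        raise ValueError("need at least 4 sequences (variance terms vanish below that)")
    s = segregating_sites(seqs)
    if s == 0:
        raise ValueError("no segregating sites: D is undefined")
    pi = pairwise_diversity(seqs)
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


rare = ["AAAA", "AAAA", "AAAA", "AAAT"]    # one singleton variant
common = ["AAAA", "AAAA", "AAAT", "AAAT"]  # one intermediate-frequency variant
print(tajimas_d(rare), tajimas_d(common))
```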
3.- High-frequency derived alleles
African populations
A derived allele at high frequency in African populations is thought to be the result of selection for resistance to Plasmodium vivax malaria.
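An excess of high-frequency derived alleles is often summarized with Fay and Wu's H, the difference between two diversity estimators, θπ and θH; θH weights derived alleles by the square of their count, so strongly negative H points to more high-frequency derived alleles than neutrality predicts. A minimal, unnormalized sketch, assuming the ancestral state of each site is known (e.g. from an outgroup such as chimpanzee) so derived-allele counts can be supplied directly; the function name and inputs are illustrative.

```python
def fay_wu_h(derived_counts, n):
    """Unnormalized Fay and Wu's H = theta_pi - theta_H.

    derived_counts: for each segregating site, the number of the n sampled
    chromosomes that carry the derived allele.
    """
    if any(not 0 < i < n for i in derived_counts):
        raise ValueError("each derived count must be between 1 and n - 1")
    denom = n * (n - 1)
    theta_pi = sum(2 * i * (n - i) for i in derived_counts) / denom
    theta_h = sum(2 * i * i for i in derived_counts) / denom
    return theta_pi - theta_h


# A derived allele on 3 of 4 chromosomes pushes H negative;
# a derived singleton pushes it positive.
print(fay_wu_h([3], 4), fay_wu_h([1], 4))
```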
4.- Differences between populations
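Differences between populations at a single SNP are often summarized as F_ST. The sketch below uses the simple form F_ST = (H_T - H_S) / H_T for a biallelic SNP in two equally weighted populations, with expected heterozygosity H = 2p(1 - p). It assumes allele frequencies are known without sampling error, so it omits the sample-size corrections of estimators such as Weir and Cockerham's; the frequencies are hypothetical.

```python
def fst_two_pops(p1: float, p2: float) -> float:
    """F_ST for one biallelic SNP from allele frequencies in two populations."""
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)                          # total heterozygosity
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2      # mean within-population
    if h_t == 0:
        return 0.0  # monomorphic across both populations: no differentiation
    return (h_t - h_s) / h_t


print(fst_two_pops(0.5, 0.5))  # same frequency in both populations
print(fst_two_pops(0.9, 0.1))  # strongly differentiated
print(fst_two_pops(1.0, 0.0))  # fixed for different alleles
```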
5.- Long haplotype
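Long-haplotype tests build on extended haplotype homozygosity (EHH): among chromosomes carrying a given core allele, the probability that two chosen at random are identical over the whole stretch from the core SNP to a given marker. A recent sweep leaves EHH high far from the core, because recombination has not yet broken the haplotype down. The sketch assumes phased haplotypes encoded as equal-length strings, one character per SNP; it computes EHH only and is not a full iHS or XP-EHH implementation.

```python
from collections import Counter
from math import comb


def ehh(haplotypes, core, allele, marker):
    """EHH between SNP index `core` and SNP index `marker` for carriers of `allele`."""
    carriers = [h for h in haplotypes if h[core] == allele]
    n = len(carriers)
    if n < 2:
        raise ValueError("need at least two chromosomes carrying the core allele")
    start, end = sorted((core, marker))
    # Group carriers by their exact sequence across the core-to-marker interval.
    groups = Counter(h[start:end + 1] for h in carriers)
    return sum(comb(k, 2) for k in groups.values()) / comb(n, 2)


haplotypes = ["0000", "0001", "0011", "1011"]
for marker in range(4):
    print(marker, ehh(haplotypes, core=0, allele="0", marker=marker))
```

EHH decays from 1 at the core toward 0 as carriers' haplotypes diverge; under a recent sweep that decay is slow.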
Results
Candidate region characteristics:
• Mean length: 815 kb
• Maximum length: 3.5 Mb
• Often contain multiple genes (mean: 4; max: 15)
• A typical region harbours 400-4,000 common SNPs (frequency > 5%)
• About ¾ of these SNPs are in the SNP database; about ½ were genotyped in HapMap Phase II
Which of these are the true signatures of positive selection?
• SLC24A5:
  – 600 kb region
  – 914 genotyped SNPs
  – Filter application:
    • 857 SNPs associated with the long-haplotype signal
    • 233 of these are high-frequency derived alleles
    • 12 of which are highly differentiated between populations
    • 5 of which are common in Europe and rare in Asia and Africa
    • 1 of these 5 is the only one implicated as functional by current knowledge
  – Strongest signal of positive selection
  – Encodes the A111T polymorphism, associated with pigmentation differences in humans
• LCT:
  – 2.4 Mb region
  – 24 SNPs fulfill the first two criteria
  – The variant conferring adult lactase persistence was identified as functional only after extensive study of the LCT gene
• They performed a similar analysis on all 22 candidate regions:
  – 9,166 SNPs associated with the long-haplotype signal (long haplotype)
  – 480 satisfied the two other criteria (population differences and high-frequency derived allele)
  – 41 (0.2% of all SNPs genotyped in the regions) are possibly functional on the basis of a newly compiled database
  – 8 of the 41 encode non-synonymous changes:
    • SLC24A5 (well known) · EDAR
    • PCDH15 · ADAT1
    • KARS · HERC1
    • SLC30A9 · BLZF1
  – The remaining 33 potentially functional SNPs lie within:
    • Conserved transcription factor motifs
    • Introns
    • UTRs
    • Other non-coding regions
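The SLC24A5 and 22-region analyses follow the same funnel: keep SNPs tied to the long-haplotype signal, then those that are high-frequency derived and differentiated between populations, then those annotated as possibly functional. A minimal sketch of that sequential filtering, assuming each SNP already carries these annotations; the `Snp` record, the threshold defaults, and the SNP identifiers are hypothetical placeholders, not values from the study.

```python
from dataclasses import dataclass


@dataclass
class Snp:
    snp_id: str
    long_haplotype: bool  # associated with the long-haplotype signal
    derived_freq: float   # derived-allele frequency in the population under selection
    fst: float            # differentiation between populations
    functional: bool      # annotated as possibly functional (e.g. non-synonymous)


def apply_filters(snps, min_derived_freq=0.8, min_fst=0.5):
    """Apply the criteria in sequence and return the survivors of each stage.

    The threshold defaults are illustrative placeholders only.
    """
    stage1 = [s for s in snps if s.long_haplotype]
    stage2 = [s for s in stage1 if s.derived_freq >= min_derived_freq]
    stage3 = [s for s in stage2 if s.fst >= min_fst]
    stage4 = [s for s in stage3 if s.functional]
    return {
        "long haplotype": stage1,
        "high-frequency derived": stage2,
        "differentiated": stage3,
        "possibly functional": stage4,
    }


snps = [
    Snp("snp_a", True, 0.95, 0.70, True),
    Snp("snp_b", True, 0.90, 0.20, False),
    Snp("snp_c", True, 0.40, 0.80, True),
    Snp("snp_d", False, 0.99, 0.90, True),
]
for stage, survivors in apply_filters(snps).items():
    print(f"{stage}: {len(survivors)}")
```

Each stage only narrows the previous one, which is why the counts shrink so sharply (9,166 to 480 to 41 across the 22 regions).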
Some specific cases
• Positive selection (PS) on copy number
  – Expression differences exist between populations; they can confer different fitness advantages and thus be positively selected.
  – Therefore, positive selection can potentially act on copy number and on non-coding regions.
  – AMY1: copy number is positively correlated with salivary amylase protein expression.
    • Mean AMY1 copy number was higher in the high-starch population.
• PS on non-coding genomic regions
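The AMY1 observation (higher mean copy number in the high-starch population) is a group difference that can be checked with a permutation test: shuffle population labels many times and ask how often a difference at least as large arises by chance. This is one simple approach, not the analysis used in the original AMY1 work, and the copy numbers below are invented for illustration.

```python
import random
from statistics import fmean


def permutation_test(group_a, group_b, n_perm=10_000, seed=0):
    """One-sided permutation test of mean(group_a) > mean(group_b).

    Returns the observed difference in means and a p-value with the
    usual +1 correction so that it is never exactly zero.
    """
    rng = random.Random(seed)
    observed = fmean(group_a) - fmean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # randomly reassign individuals to the two groups
        if fmean(pooled[:n_a]) - fmean(pooled[n_a:]) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical diploid AMY1 copy numbers (not real data).
high_starch = [6, 7, 8, 7, 9, 6, 8]
low_starch = [4, 5, 5, 6, 4, 5, 6]
diff, p = permutation_test(high_starch, low_starch)
print(f"difference in means = {diff:.2f}, one-sided p = {p:.4f}")
```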
Discussion
Why have many earlier results fared poorly in genome-wide studies?
Figure legend: red triangles, previous candidates for selection (81); gray diamonds, newly available genome-wide empirical data set.
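Placing earlier candidates on a genome-wide empirical distribution amounts to asking how extreme each candidate's score is relative to all other loci. A minimal sketch of an empirical p-value, assuming one selection score per locus and that larger scores mean stronger evidence; the function name and scores are illustrative.

```python
def empirical_p(candidate_score, genome_scores):
    """Fraction of genome-wide scores at least as large as the candidate's score."""
    if not genome_scores:
        raise ValueError("genome-wide distribution is empty")
    exceed = sum(1 for s in genome_scores if s >= candidate_score)
    return exceed / len(genome_scores)


# Hypothetical genome-wide scores: 0, 1, ..., 99.
genome = list(range(100))
print(empirical_p(99, genome))  # in the extreme tail
print(empirical_p(10, genome))  # unremarkable
```

An empirical cutoff still flags neutral loci, which ties into the first discussion point: taking the top 1% of 10,000 neutral windows would flag about 100 of them.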
1.- False positives and negatives
2.- Ascertainment bias of data
3.- Demographic events
4.- Biased DNA repair
Bibliography