Leveraging Genetic variability across populations for


Human Genetic Variation
Genetics of Complex Diseases
Challenges
Challenge 2: Correcting genotyping errors

How can we detect genotyping errors?
- Hardy-Weinberg Equilibrium
- If we have mother-father-child trios, we can check Mendelian consistency.
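
Both checks can be sketched in a few lines of Python. This is a minimal illustration, not the procedure of any particular genotyping pipeline: the function names, the genotype encoding (two-character strings such as "AG") and the use of a plain chi-square statistic are assumptions made for the example. A SNP whose statistic is far above 3.84 (the chi-square critical value for 1 degree of freedom at the 0.05 level) deviates from Hardy-Weinberg proportions and is a candidate genotyping error.

```python
def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square statistic (1 degree of freedom) for deviation from
    Hardy-Weinberg proportions, given the three genotype counts."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p                      # frequency of allele B
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    # Skip classes with zero expected count (monomorphic SNPs).
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


def mendelian_consistent(mother, father, child):
    """True if the child's genotype can be formed by taking one allele
    from the mother and one from the father."""
    return any(
        sorted(m + f) == sorted(child)
        for m in mother
        for f in father
    )


# Hypothetical counts: a SNP with no heterozygotes at all is suspicious.
suspicious = hwe_chi_square(50, 0, 50)
# Hypothetical trio: two AA parents cannot have an AG child.
trio_ok = mendelian_consistent("AA", "AA", "AG")
```

A Mendelian inconsistency flags a problem in the trio at that SNP but does not say which of the three genotypes is wrong.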
Challenge 3: Population Substructure

Imagine that all the cases are collected from Africa, and all the controls are from Europe.
- Many association signals are going to be found.
- The vast majority of them are false. Why?
Different evolutionary forces (drift, selection, mutation, migration, population bottlenecks) give the two populations different allele frequencies, regardless of disease status.
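
The effect can be illustrated with a small simulation. This is a sketch under stated assumptions: no SNP has any effect on disease, cases are drawn from one population and controls from another, and the sample size, the frequency ranges and all function names are invented for the example. Each SNP is tested with a 2x2 chi-square test on allele counts.

```python
import random

CHI2_CRIT_05 = 3.841  # chi-square critical value, 1 df, alpha = 0.05


def allele_chi_square(case_alt, case_n, ctrl_alt, ctrl_n):
    """2x2 chi-square statistic on allele counts in cases vs. controls."""
    table = [[case_alt, case_n - case_alt], [ctrl_alt, ctrl_n - ctrl_alt]]
    total = case_n + ctrl_n
    row = [case_n, ctrl_n]
    col = [case_alt + ctrl_alt, total - case_alt - ctrl_alt]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / total
            if expected > 0:
                stat += (table[i][j] - expected) ** 2 / expected
    return stat


def fraction_significant(n_snps, n_chrom, freq_shift, rng):
    """Fraction of non-causal SNPs that pass the 0.05 threshold when the
    case population's allele frequency differs from the control
    population's by a random amount of at most freq_shift."""
    hits = 0
    for _ in range(n_snps):
        f_controls = rng.uniform(0.2, 0.8)
        f_cases = f_controls + rng.uniform(-freq_shift, freq_shift)
        case_alt = sum(rng.random() < f_cases for _ in range(n_chrom))
        ctrl_alt = sum(rng.random() < f_controls for _ in range(n_chrom))
        stat = allele_chi_square(case_alt, n_chrom, ctrl_alt, n_chrom)
        if stat > CHI2_CRIT_05:
            hits += 1
    return hits / n_snps


rng = random.Random(0)
# Cases and controls from the same population: only chance findings.
same_population = fraction_significant(300, 400, 0.0, rng)
# Cases and controls from populations with shifted allele frequencies.
different_populations = fraction_significant(300, 400, 0.2, rng)
```

With matched populations the fraction of "significant" SNPs should stay near the nominal 5%; with mismatched populations a large share of SNPs pass the threshold even though none of them is related to the disease.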
Shaping Genetic Variation



- Mutations add to genetic variation.
- Natural selection controls the frequency of certain traits and alleles.
- Genetic drift
[Figure labels: ancestral population, migration, genetic drift, different allele frequencies]
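
How drift alone produces different allele frequencies can be sketched with a Wright-Fisher style simulation. The model choice, the population size, the number of generations and the function name are assumptions made for illustration: two populations start from the same ancestral frequency and each generation resamples its chromosomes at random.

```python
import random


def wright_fisher(freq, pop_size, generations, rng):
    """Allele frequency after random drift in a population of pop_size
    diploid individuals (2 * pop_size chromosomes per generation)."""
    n_chrom = 2 * pop_size
    for _ in range(generations):
        # Each chromosome of the next generation copies the allele
        # with probability equal to its current frequency.
        copies = sum(rng.random() < freq for _ in range(n_chrom))
        freq = copies / n_chrom
    return freq


rng = random.Random(1)
ancestral_freq = 0.5
# Two populations that split from the same ancestral population.
pop_a = wright_fisher(ancestral_freq, pop_size=100, generations=50, rng=rng)
pop_b = wright_fisher(ancestral_freq, pop_size=100, generations=50, rng=rng)
```

No selection or mutation is involved here; any difference between `pop_a` and `pop_b` comes from sampling noise alone, and it grows faster in smaller populations.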
Population Substructure

What can we do about it?
Ancestry Inference

To what extent can population structure be detected from SNP data?

What can we learn from these inferences?

Can we build the tree of life?

How do we analyze complex (mixed) populations?
Novembre et al., Nature, 2008
Principal Component Analysis



- Dimensionality reduction
- Based on linear algebra
- Intuition: find the 'most important' features of the data.
The first principal component: the data are plotted on the one-dimensional line for which the spread is maximized.
- In our case, we want to look at two dimensions at a time.
- The original data points have many dimensions: each SNP corresponds to one dimension.
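
A minimal sketch of this idea on simulated genotypes follows. Everything here is an assumption made for illustration: the toy data (two populations with very different allele frequencies at 50 SNPs), the function names, and the use of power iteration, which stands in for the eigen-decomposition or SVD routine a numerical library would normally provide. Each individual is a row of genotypes coded 0, 1 or 2, and each SNP is one column (one dimension).

```python
import math
import random


def center_columns(rows):
    """Subtract each SNP's mean so every column has mean zero."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def top_components(rows, k=2, iterations=200, seed=0):
    """Top-k principal axes of centered data, found by power iteration."""
    rng = random.Random(seed)
    n_features = len(rows[0])
    axes = []
    for _ in range(k):
        v = [rng.gauss(0, 1) for _ in range(n_features)]
        for _ in range(iterations):
            # Remove the parts of v that lie along axes already found.
            for a in axes:
                dot = sum(x * y for x, y in zip(v, a))
                v = [x - dot * y for x, y in zip(v, a)]
            # Multiply v by X^T X: project the rows, then recombine.
            scores = [sum(x * y for x, y in zip(row, v)) for row in rows]
            v = [sum(s * row[j] for s, row in zip(scores, rows))
                 for j in range(n_features)]
            norm = math.sqrt(sum(x * x for x in v))
            v = [x / norm for x in v]
        axes.append(v)
    return axes


def project(rows, axes):
    """Coordinates of each individual along each principal axis."""
    return [[sum(x * y for x, y in zip(row, a)) for a in axes]
            for row in rows]


rng = random.Random(0)
freqs_a = [0.9] * 25 + [0.1] * 25  # hypothetical population A
freqs_b = [0.1] * 25 + [0.9] * 25  # hypothetical population B


def sample(freqs):
    # Genotype = number of copies of the allele on two chromosomes.
    return [(rng.random() < f) + (rng.random() < f) for f in freqs]


genotypes = ([sample(freqs_a) for _ in range(10)]
             + [sample(freqs_b) for _ in range(10)])
centered = center_columns(genotypes)
axes = top_components(centered, k=2)
coords = project(centered, axes)  # one (PC1, PC2) pair per individual
```

Plotting `coords` as a scatter plot gives the two-dimensional view described above; in this toy data set the two populations separate along the first component.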
Data Available
The International HapMap Project: an international consortium that aims to genotype 270 individuals from four different populations.
HUJI 2006
- Launched in 2002.
- First phase (2005): ~1 million SNPs for 270 individuals from four populations
- Second phase (2007): ~3.1 million SNPs for 270 individuals from four populations
- Third phase (ongoing): > 1 million SNPs for 1115 individuals across 11 populations
HapMap Populations
[Map of the 11 HapMap populations: ASW, CEU, CHB, CHD, GIH, JPT, LWK, MEX, MKK, TSI, YRI]
[Figure: HapMap PCA 1-2]
[Figure: HapMap PCA 1-3]
[Figure: HapMap PCA 1,2,4]
Lessons from the HapMap
- African populations have higher genetic diversity than other populations.
  • Evidence for bottlenecks or founder effects in the other populations
  • Evidence for the out-of-Africa theory
- HapMap was used to detect:
  • Common deletions across the genome
  • Regions under selection
  • Recombination rates, hotspots
  • Associations of SNPs with disease

Example: detection of deletions using SNPs
Conrad et al., Nature Genetics, 2006

Conrad et al. applied the method to the HapMap and found:
- Typical individuals have roughly 30-50 deletions larger than 5 kb (500-750 kb total sequence length).
- Deletions tend to be gene-poor.
- The deletions detected in the HapMap span 267 known and predicted genes.

Deletions were found to be related to different conditions such as schizophrenia (Stefansson et al., 2008), lupus glomerulonephritis (Aitman et al., Nature, 2006), and others.
Distribution of deletion length
Conrad et al., Nature Genetics, 2006
Significant Region
- Why do we have differences between data1 and data2?
- How come so many SNPs seem to be associated in this region?
• Maybe there are multiple 'causal SNPs'?
• Or maybe there are correlations between the SNPs?
Linkage Disequilibrium
Signatures of History
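
Whether the SNPs in a region are correlated can be quantified from haplotype data. The sketch below computes two standard measures for a pair of sites: D, the difference between the observed two-allele haplotype frequency and the frequency expected if the sites were independent, and r², the squared correlation between the alleles. The function name and the tiny haplotype samples are made up for illustration.

```python
def ld_stats(haplotypes, i, j, allele_i, allele_j):
    """Return (D, r^2) between sites i and j of a list of haplotype
    strings, measured with respect to the two given alleles."""
    n = len(haplotypes)
    p_i = sum(h[i] == allele_i for h in haplotypes) / n
    p_j = sum(h[j] == allele_j for h in haplotypes) / n
    p_ij = sum(h[i] == allele_i and h[j] == allele_j
               for h in haplotypes) / n
    d = p_ij - p_i * p_j
    denom = p_i * (1 - p_i) * p_j * (1 - p_j)
    # r^2 is undefined when either site is monomorphic.
    r2 = d * d / denom if denom > 0 else float("nan")
    return d, r2


# Hypothetical sample in which allele A at site 0 always travels with T.
linked = ld_stats(["AT", "AT", "GC", "GC"], 0, 1, "A", "T")
# Hypothetical sample in which the two sites are independent.
unlinked = ld_stats(["AT", "AC", "GT", "GC"], 0, 1, "A", "T")
```

When r² between a causal SNP and its neighbours is high, the neighbours show association signals too, which is one way a single region can contain many associated SNPs.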
Haplotypes vs. Genotypes
Haplotypes (one on the chromosome inherited from the mother, one on the chromosome inherited from the father):
ATCCGA
AGACGC
Genotype:
A {T,G} {C,A} C G {A,C}
Cost-effective genotyping technology gives genotypes and not haplotypes.
Possible phases:
ATACGA / AGCCGC
AGACGA / ATCCGC
….
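
The ambiguity can be made concrete by enumerating every haplotype pair that is consistent with a genotype. This is a small sketch; the function name and the encoding of a genotype as a list of allele pairs are assumptions for the example. A genotype with k heterozygous sites has 2^(k-1) possible phases (for k of at least 1), so the six-site genotype above, with three heterozygous sites, has four.

```python
from itertools import product


def possible_phases(genotype):
    """genotype: list of unordered allele pairs, one per site.
    Returns every unordered pair of haplotypes consistent with it."""
    het_sites = [k for k, (a, b) in enumerate(genotype) if a != b]
    phases = set()
    for choice in product((0, 1), repeat=len(het_sites)):
        # For each heterozygous site, decide which allele goes on the
        # first haplotype; the other allele goes on the second.
        pick = dict(zip(het_sites, choice))
        hap1 = "".join(site[pick.get(k, 0)]
                       for k, site in enumerate(genotype))
        hap2 = "".join(site[1 - pick.get(k, 0)]
                       for k, site in enumerate(genotype))
        phases.add(tuple(sorted((hap1, hap2))))
    return sorted(phases)


# The genotype from the slide: heterozygous at sites 2, 3 and 6.
genotype = [("A", "A"), ("T", "G"), ("C", "A"),
            ("C", "C"), ("G", "G"), ("A", "C")]
phases = possible_phases(genotype)
```

The true pair ATCCGA / AGACGC is only one of the enumerated phases; the genotype alone cannot distinguish it from the others.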
Haplotypes cluster naturally