Admixture Mapping - Division of Statistical Genomics

Admixture Mapping
Qunyuan Zhang
Division of Statistical Genomics
GEMS Course M21-621
Computational Statistical Genetics
March 25, 2010
Three Mapping Strategies
Linkage Analysis (linkage): genotype & phenotype data from family (or families)
Association Scan (LD): genotype & phenotype data from population(s) or families
Admixture Mapping (LD): genotype data from admixed and ancestral populations, phenotype data from admixed populations
(1) Ancestry-phenotype association mapping
(2) Ancestry info for population structure control
Genetic Admixture
[Diagram: Ancestral Population 1 and Ancestral Population 2 (Africans, Caucasians) mix to form the Admixed Population (African Americans). Admixture information from ancestry analysis feeds into admixture mapping.]
Rationale of Admixture Mapping
Suppose a disease has a genetic component and the disease allele is more frequent in pop 2 than in pop 1. After pop 1 and pop 2 admix, diseased individuals in the admixed generations will tend to carry disease alleles with more ancestry from pop 2 than from pop 1.
If a marker is linked to the disease gene, then because of linkage disequilibrium the diseased individuals will also tend to carry marker copies with more ancestry from pop 2 than from pop 1.
Conversely, if the pop 2 ancestry at a marker/locus differs significantly between the diseased and non-diseased groups, we consider this marker/locus to be linked with (or part of) a disease gene.
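The comparison in the last paragraph can be sketched as a two-group test on locus-specific ancestry. This is a minimal illustration, assuming each subject's pop 2 ancestry at the locus is already known (0, 0.5 or 1) and using a normal approximation for the test statistic; the function name and the toy data are hypothetical, not taken from the slides.

```python
import math
import numpy as np

def ancestry_difference_test(case_ancestry, control_ancestry):
    """Compare mean pop 2 ancestry at one locus between cases and controls.

    Returns (difference in means, z statistic, two-sided p-value).
    The p-value uses a normal approximation (an assumption of this sketch).
    """
    cases = np.asarray(case_ancestry, dtype=float)
    controls = np.asarray(control_ancestry, dtype=float)
    diff = cases.mean() - controls.mean()
    # Standard error of the difference, allowing unequal group variances.
    se = math.sqrt(cases.var(ddof=1) / cases.size
                   + controls.var(ddof=1) / controls.size)
    z = diff / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return diff, z, p_value

# Toy data: pop 2 ancestry at one locus for 100 cases and 100 controls.
cases = [1.0] * 30 + [0.5] * 50 + [0.0] * 20
controls = [1.0] * 10 + [0.5] * 40 + [0.0] * 50
diff, z, p_value = ancestry_difference_test(cases, controls)
```

In this toy example the cases carry more pop 2 ancestry at the locus than the controls, which is the pattern admixture mapping looks for.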
Illustration of Admixture
Advantages of Admixture Mapping
An admixed population has more genetic variation and polymorphism than relatively pure ancestral populations.
Admixture produces new LD in the admixed population. Because the admixed population has a shorter genetic history than the ancestral populations, more of this LD is retained (a long genetic history breaks LD down), so LD can be detected even for relatively loose linkage.
Ancestry information can be used to control population stratification caused by genetic admixture.
In simulations, admixture mapping shows higher power than regular methods and requires a smaller sample size.
Flexible design: case-control or case-only, qualitative or quantitative traits, no need for pedigree information.
Ancestry
Proportion of genetic material descended from each founding population
Population level: population admixture proportion
Individual level: individual admixture proportion
Individual-locus level: locus-specific ancestry
Two Ways of Using Ancestral Info.
Individual Ancestry (IA) can be used as a genetic background covariate for population structure control:
Phenotype = a + b * Genotype + c * IA + Error
Locus-specific Ancestry (LSA) can be used directly to detect association (admixture mapping):
Phenotype = a + b * LSA + Error
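Both models are ordinary linear regressions, so they can be fitted by least squares. The sketch below is only an illustration: the helper name, the simulated covariates and the noise-free phenotypes are assumptions made so that the recovered coefficients are easy to check.

```python
import numpy as np

def fit_linear_model(phenotype, *covariates):
    """Least-squares fit of phenotype = intercept + sum(coef * covariate).

    Returns the coefficients, intercept first.
    """
    y = np.asarray(phenotype, dtype=float)
    columns = [np.ones_like(y)]
    columns += [np.asarray(c, dtype=float) for c in covariates]
    design = np.column_stack(columns)
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

# Simulated (hypothetical) data for 200 admixed subjects.
rng = np.random.default_rng(0)
n = 200
ia = rng.uniform(0.5, 1.0, n)                 # individual ancestry
genotype = rng.binomial(2, 0.2 + 0.4 * ia)    # marker genotype (0, 1, 2)
lsa = rng.binomial(2, ia) / 2.0               # locus-specific ancestry

# Model 1: IA as a covariate for structure control (noise-free for clarity).
pheno_1 = 1.0 + 0.5 * genotype + 2.0 * ia
coef_ia = fit_linear_model(pheno_1, genotype, ia)   # estimates of a, b, c

# Model 2: LSA as the predictor (admixture mapping).
pheno_2 = 0.3 - 1.2 * lsa
coef_lsa = fit_linear_model(pheno_2, lsa)           # estimates of a, b
```

With real data the Error term is not zero, and the test of b (its standard error and p-value) is what matters rather than the point estimate alone.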
Individual Ancestry (IA) Estimation
using MLE
G: observed genotypes of admixed and ancestral populations
Q: allele frequencies in ancestral populations
P: individual ancestry to be estimated
Goal: obtain P that maximizes Pr(G|P,Q)
1. Assign initial values for Q (randomly, or estimated from ancestral population genotype data) and P (randomly)
2. Compute P(i) by solving $\frac{\partial \Pr(G \mid Q, P)}{\partial P} = 0$
3. Compute Q(i) by solving $\frac{\partial \Pr(G \mid Q, P)}{\partial Q} = 0$
4. Iterate Steps 2 and 3 until convergence.
Tang et al. Genetic Epidemiology, 2005(28): 289–301
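The P update can be illustrated with a minimal sketch, assuming biallelic markers and ancestral allele frequencies Q that are known and held fixed, so that only P is re-estimated. The update used here is the standard EM step for mixture proportions, which is not necessarily the exact scheme of Tang et al.; the function name and the simulated data are hypothetical.

```python
import numpy as np

def estimate_individual_ancestry(genotypes, freqs, n_iter=500, tol=1e-8):
    """EM estimate of one individual's ancestry proportions P, with Q fixed.

    genotypes: length-J array, copies (0, 1, 2) of the reference allele.
    freqs: (K, J) array, reference-allele frequency in each ancestral population.
    """
    g = np.asarray(genotypes, dtype=float)
    f = np.clip(np.asarray(freqs, dtype=float), 1e-6, 1 - 1e-6)
    n_pops, n_loci = f.shape
    p = np.full(n_pops, 1.0 / n_pops)
    for _ in range(n_iter):
        # E-step: probability that an allele copy came from each population.
        w_ref = p[:, None] * f
        w_ref /= w_ref.sum(axis=0, keepdims=True)
        w_alt = p[:, None] * (1.0 - f)
        w_alt /= w_alt.sum(axis=0, keepdims=True)
        # M-step: expected share of the 2*J allele copies per population.
        new_p = (w_ref * g + w_alt * (2.0 - g)).sum(axis=1) / (2.0 * n_loci)
        converged = np.max(np.abs(new_p - p)) < tol
        p = new_p
        if converged:
            break
    return p

# Simulate one admixed individual with 80% ancestry from population 0.
rng = np.random.default_rng(1)
n_loci = 2000
freqs = rng.uniform(0.05, 0.95, size=(2, n_loci))
from_pop0 = rng.random((2, n_loci)) < 0.8          # ancestry of each allele copy
allele_freq = np.where(from_pop0, freqs[0], freqs[1])
genotypes = (rng.random((2, n_loci)) < allele_freq).sum(axis=0)
estimated_p = estimate_individual_ancestry(genotypes, freqs)
```

A full implementation would also update Q from the pooled genotype data and alternate the two updates until both stabilize.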
Locus-specific Ancestry Estimation
using MCMC
Observed G: genotypes of admixed and ancestral populations
Unknown Z: admixed individuals’ locus-specific ancestries from ancestral populations
Problem: how to estimate Z?
Maximum likelihood estimate (MLE): how to obtain a Z that maximizes Pr(G|Z)?
Z is a huge parameter space, which is difficult to search with likelihood methods.
Bayesian and Markov chain Monte Carlo (MCMC) methods:
1. Assume the number of ancestral populations, K
2. Define a prior distribution Pr(Z) under K
3. Use MCMC to sample from the posterior distribution Pr(Z|G) ∝ Pr(Z)∙Pr(G|Z)
4. Average over a large number of MCMC samples to obtain an estimate of Z
Falush et al. Genetics, 2003(164):1567–1587
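The four steps can be sketched with a small Gibbs sampler. This is a simplified illustration, not the STRUCTURE algorithm: it assumes unlinked biallelic loci, fixed ancestral allele frequencies, one individual, and a symmetric Dirichlet prior on that individual's ancestry proportions. All names and the toy data are hypothetical.

```python
import numpy as np

def gibbs_locus_ancestry(alleles, freqs, n_samples=500, burn_in=100,
                         alpha=1.0, seed=0):
    """Posterior mean locus-specific ancestry for one individual.

    alleles: (2, J) array of 0/1 allele copies.
    freqs: (K, J) array, frequency of allele 1 in each ancestral population.
    Returns a (K, J) array: mean share of the two copies from each population.
    """
    rng = np.random.default_rng(seed)
    a = np.asarray(alleles)
    f = np.clip(np.asarray(freqs, dtype=float), 1e-6, 1 - 1e-6)
    n_pops = f.shape[0]
    # Likelihood of each allele copy under each population: shape (K, 2, J).
    lik = np.where(a[None, :, :] == 1, f[:, None, :], 1.0 - f[:, None, :])
    p = np.full(n_pops, 1.0 / n_pops)
    z_counts = np.zeros(lik.shape)
    pops = np.arange(n_pops)[:, None, None]
    for it in range(burn_in + n_samples):
        # Sample Z given P and G: one ancestry label per allele copy.
        post = p[:, None, None] * lik
        post /= post.sum(axis=0, keepdims=True)
        u = rng.random(a.shape)
        z = (u[None] > np.cumsum(post, axis=0)).sum(axis=0)
        z = np.minimum(z, n_pops - 1)
        # Sample P given Z from its Dirichlet full conditional.
        counts = np.bincount(z.ravel(), minlength=n_pops)
        p = rng.dirichlet(alpha + counts)
        if it >= burn_in:
            z_counts += (z[None] == pops)
    # Average over retained samples and over the two allele copies.
    return z_counts.sum(axis=1) / (2.0 * n_samples)

# Toy data: 50 loci where allele 1 is common in population 0 only.
toy_freqs = np.array([[0.99] * 50, [0.01] * 50])
toy_alleles = np.ones((2, 50), dtype=int)
lsa = gibbs_locus_ancestry(toy_alleles, toy_freqs)
```

Averaging the sampled labels, as in step 4, gives a per-locus ancestry estimate between 0 and 1 for each population.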
Software
STRUCTURE
Falush D, Stephens M, Pritchard JK (2003)
Inference of population structure using multilocus genotype data: linked
loci and correlated allele frequencies. Genetics 164:1567–1587.
ADMIXMAP
Hoggart CJ, Parra EJ, Shriver MD, Bonilla C, Kittles
RA, Clayton DG, McKeigue PM (2003) Control of confounding of genetic
associations in stratified populations. Am J Hum Genet 72:1492–1504.
ANCESTRYMAP
Patterson N, Hattangadi N, Lane B, Lohmueller
KE, Hafler DA, Oksenberg JR, Hauser SL, Smith MW, O’Brien SJ, Altshuler
D, Daly MJ, Reich D (2004) Methods for high-density admixture mapping of
disease genes. Am J Hum Genet 74:979–1000
References
D.C. Rife. Populations of hybrid origin as source material for the detection of linkage. Am. J. Hum. Genet. 1954, 6:26-33
R. Chakraborty et al. Admixture as a tool for finding linked genes and detecting that difference from allelic association between loci. Proc. Natl. Acad. Sci. 1988, 85:9119-9123
N. Risch. Mapping genes for complex disease using association studies with recently admixed populations. Am. J. Hum. Genet. Suppl. 1992, 51:13
…
P.M. McKeigue. Prospects for admixture mapping of complex traits. Am. J. Hum. Genet. 2005, 76:1-7
X. Zhu et al. Admixture mapping for hypertension loci with genome-scan markers. Nature Genetics 2005, 37(2):177-181
Q. Zhang et al. Genome-wide admixture mapping for coronary artery calcification in African Americans: the NHLBI Family Heart Study. Genet. Epidemiol. 2008, 32(3):264-72
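Marker Information Content (MIC) Distribution Used for Simulation (300 Loci)
Mean = 0.22, Std Dev = 0.1003
$\mathrm{MIC}_i = \frac{1}{2} \sum_{k=1}^{n} \left| f_{ik}^{W} - f_{ik}^{B} \right|$
where $f_{ik}^{W}$ is the frequency of allele k at locus i in Caucasians, $f_{ik}^{B}$ is the frequency of allele k at locus i in Africans, and n is the number of alleles at locus i.
[Figure: distribution of MIC across the 300 loci]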
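As a worked illustration, the sketch below computes MIC for one locus, assuming MIC is half the summed absolute difference in allele frequencies between the two ancestral populations. That reading of the slide's formula is an assumption, and the function name and example frequencies are hypothetical.

```python
import numpy as np

def marker_information_content(freq_w, freq_b):
    """MIC for one locus, assumed to be half the summed absolute
    allele-frequency difference between the two ancestral populations."""
    fw = np.asarray(freq_w, dtype=float)
    fb = np.asarray(freq_b, dtype=float)
    if fw.shape != fb.shape:
        raise ValueError("both populations need the same allele list")
    return 0.5 * np.abs(fw - fb).sum()

# A three-allele locus: 0.5 * (0.3 + 0.0 + 0.3) = 0.3
mic_example = marker_information_content([0.5, 0.3, 0.2], [0.2, 0.3, 0.5])
```

Under this definition MIC ranges from 0 (identical allele frequencies) to 1 (no shared alleles), so larger values mark loci that are more informative about ancestry.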
African Americans: 622 subjects from 211 families
400 microsatellite markers, average distance 10 cM
Coronary and aortic artery calcium (CAC): calcified plaque, quantified by CT
Admixture mapping of CAC loci
Data
Samples: 1672 subjects from 3 populations:
622 African Americans (211 families) from FHS-SCAN
893 Caucasians (320 families) from FHS-SCAN
157 Africans (unrelated) from Marshfield Center
Genotypes
302 microsatellite loci for all subjects
Average marker distance 11.9 cM
Phenotype
Coronary and aortic artery calcium (CAC) of 622 African Americans, Blom transformation
Statistical Procedure
Step 1
Randomly draw one subject from each family to create a sample of 688 unrelated subjects, comprising:
211 African Americans from 211 families (FHS-SCAN)
320 whites from 320 families (FHS-SCAN)
157 unrelated Africans (Marshfield Center)
Step 2
Ancestry estimation, STRUCTURE 2.1
Step 3
Ancestry-CAC association analysis: regress the 211 African Americans’ CAC scores on their locus-specific ancestries from Africans.
Step 4
Repeat Steps 1 to 3 (100 times) and obtain the average p-value for each locus
Step 5
For each locus: permutation test on the average p-value
Number of random permutations: 10000
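The permutation idea in Step 5 can be sketched for a single locus and a single resampled data set. This is a simplification of the procedure above: it permutes the phenotype and uses the absolute phenotype-ancestry correlation as the test statistic, rather than the p-value averaged over 100 resamplings. The function name and simulated data are hypothetical.

```python
import numpy as np

def permutation_p_value(phenotype, lsa, n_perm=10000, seed=0):
    """Permutation p-value for association between a phenotype and
    locus-specific ancestry at one locus."""
    rng = np.random.default_rng(seed)
    y = np.asarray(phenotype, dtype=float)
    x = np.asarray(lsa, dtype=float)

    def abs_corr(a, b):
        return abs(np.corrcoef(a, b)[0, 1])

    observed = abs_corr(y, x)
    exceed = 0
    for _ in range(n_perm):
        # Shuffling the phenotype breaks any real link to ancestry.
        if abs_corr(rng.permutation(y), x) >= observed:
            exceed += 1
    # Add-one correction keeps the estimate strictly positive.
    return (exceed + 1) / (n_perm + 1)

# Simulated example: 211 subjects with a strong ancestry effect.
rng = np.random.default_rng(2)
lsa = rng.binomial(2, 0.78, size=211) / 2.0
cac = 2.0 * lsa + rng.normal(0.0, 0.5, size=211)
p_perm = permutation_p_value(cac, lsa, n_perm=999)
```

In simple regression the absolute correlation orders permutations the same way as the slope's t statistic, which is why it is used here as the statistic.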
RESULTS
Sources of Variation of Ancestry-from-Africans
Sources of variation         Variance component   Percent (%)
Families                     0.01054              48.19
Subjects within family       0.00492              22.50
Loci within subject          0.00599              27.39
Replications within locus    0.00042              1.92
[Pie chart: Var(families) 48%, Var(loci/subject) 27%, Var(subjects/family) 23%, Var(replications/locus) 2%]
RESULTS
Ancestry Analysis at Population Level
Population Admixture Proportions in African Americans
Founding population    Ancestry (%)
From Caucasians        22.04
From Africans          77.96
RESULTS
Ancestry Analysis at Individual Level
Individual Ancestry Distribution of 622 African Americans
Ancestry-from-Africans: average 77.96% (3.1%~96.9%)
RESULTS
Ancestry Analysis at Individual-locus Level
[Figure: Distribution of locus-specific ancestries from Africans in an example African American. y-axis: ancestry from Africans; x-axis: 302 microsatellite loci ordered by chromosome and position, from Chrom. 1 (4.22 cM) to Chrom. 23 (104.83 cM).]
RESULTS
Locus-specific Ancestry-CAC association analysis
No.  Locus        Chr  Pos. (cM)        Permu. p  Reg. coeff.  R2
1    AFM063XF4    10   19.0 (10p14)     0.0021    -1.2442      0.0310
2    GATA64D02    6    80.45 (6q12)     0.0024    -2.2112      0.0205
3    GATA42H02    4    181.93 (4q32)    0.0083    2.7996       0.0198
4    AFMB337ZH9   22   60.61            0.0120    1.1594       0.0194
5    GGAA20G10    2    27.6             0.0133    0.7271       0.0166
6    GATA73H09    12   78.14            0.0170    -1.4403      0.0150
7    GGAA3F06     7    41.69            0.0173    1.6652       0.0163
8    UT1307       20   69.5             0.0178    -1.1565      0.0175
9    UT7136       22   52.61            0.0194    1.9457       0.0162
10   GATA163B10   6    42.27            0.0267    -1.3473      0.0165
11   GATA88F09    10   4.32             0.0315    -2.1781      0.0153
12   GATA26D02    12   83.19            0.0319    -2.0880      0.0130
13   ATA1B07      11   54.09            0.0339    1.1540       0.0143
14   ATA4E02      1    192.05           0.0394    0.7829       0.0122
15   GATA137H02   7    29.28            0.0418    1.3168       0.0121
16   GATA4D07     2    145.08           0.0455    -1.0065      0.0125
17   ATA31G11     10   28.31            0.0461    -1.2933      0.0134
-log(p value) of Markers on Chromosome 4
[Figure: -log(p) versus distance (cM) for the 20 markers on Chromosome 4; labeled marker: GATA42H02.]
-log(p value) of Markers on Chromosome 6
[Figure: -log(p) versus distance (cM) for the 16 markers on Chromosome 6; labeled marker: GATA64D02.]
-log(p value) of Markers on Chromosome 10
[Figure: -log(p) versus distance (cM) for the 14 markers on Chromosome 10; labeled marker: AFM063XF4.]