Transcript Document
Genome-Wide Association Studies
Xiaole Shirley Liu
Stat 115/215
Association Studies
• Association between genetic markers and
phenotype
– E.g. cystic fibrosis: ~70% of cystic fibrosis patients have a 3-base-pair deletion that removes the phenylalanine at amino acid position 508 of the CFTR gene product
• In particular, find disease genes and SNP/haplotype markers for susceptibility prediction and diagnosis
Influences individual decisions on lifestyle, prevention, screening, and treatment
Warfarin and CYP2C9:
SNPs in Pharmacogenomics
• Warfarin is an anticoagulant drug; the enzyme encoded by the CYP2C9 gene metabolizes warfarin.
• Compared with the normal population, patients requiring a low warfarin dose have an odds ratio of 6.21 for carrying one variant allele
• The subgroup of patients who are poor metabolizers of warfarin is potentially at higher risk of bleeding
Aithal et al., 1999, Lancet.
Genome-Wide Association Studies
• Quality Control
– Unusual similarity between individuals
– Wrong sex
– Trio has non-Mendelian inheritance
– Genotyping quality
• Two strategies:
– Family-based association studies
– Population-based case-control association
studies
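The trio check in the quality-control list can be sketched as follows. This is a minimal illustration, assuming biallelic genotypes stored as two-character strings (e.g. "AG") with no missing calls; the names `is_mendelian` and `mendelian_error_rate` are hypothetical and not taken from any particular QC tool.

```python
from itertools import product


def is_mendelian(father, mother, child):
    """True if the child's genotype can arise from the parents' genotypes.

    Genotypes are two-character allele strings such as "AG"; allele order is ignored.
    """
    # Each parent transmits exactly one of its two alleles to the child.
    possible = {"".join(sorted(pair)) for pair in product(father, mother)}
    return "".join(sorted(child)) in possible


def mendelian_error_rate(trios):
    """Fraction of (father, mother, child) genotype triples that break Mendelian inheritance."""
    errors = sum(not is_mendelian(f, m, c) for f, m, c in trios)
    return errors / len(trios)


# Made-up trios at one SNP: the second child cannot inherit G from two AA parents.
trios = [("AA", "GG", "AG"), ("AA", "AA", "AG")]
print(mendelian_error_rate(trios))  # 0.5
```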
Quality Control: SNP calls
• % SNP called
[Figure: example SNP calls, panels labeled "Good calls!" and "Bad calls!"]
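One way to read "% SNP called" is as a per-SNP call rate: the fraction of samples that received a genotype call. The sketch below assumes that interpretation, uses a hypothetical "NN" code for failed calls, and applies an illustrative 95% cutoff; real studies choose their own thresholds.

```python
MISSING = "NN"  # hypothetical placeholder for a failed genotype call


def call_rate(calls):
    """Fraction of samples with a non-missing genotype call at one SNP."""
    return sum(call != MISSING for call in calls) / len(calls)


def passing_snps(genotypes, min_rate=0.95):
    """Keep SNPs whose call rate reaches min_rate (0.95 is an illustrative cutoff)."""
    return [snp for snp, calls in genotypes.items() if call_rate(calls) >= min_rate]


genotypes = {
    "snp1": ["AA", "AG", "NN", "GG"],  # 3 of 4 samples called
    "snp2": ["AA", "AG", "AG", "GG"],  # all samples called
}
print(passing_snps(genotypes))  # ['snp2']
```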
Family-based Association Studies
Look at allele transmission in unrelated families, each with one affected child
Like a coin toss: test the likelihood that the coin is fair
TDT: Transmission Disequilibrium Test
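• Only heterozygous parents matter; compare observed with expected transmissions
$Z_{TDT}^2 = \frac{(A - a)^2}{A + a} = \frac{(9 - 2)^2}{9 + 2} \approx 4.45, \quad Z_{TDT}^2 \sim \chi^2_{1\,df}$
where A and a count how often heterozygous parents transmit allele A or allele a, respectively, to the affected child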
• Could also compare allele frequencies between affected and unaffected children in the same family
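The TDT arithmetic above (9 transmissions of allele A versus 2 of allele a) can be checked with the sketch below. It uses only the standard library, relying on the fact that the upper tail of a 1-df chi-square equals erfc(sqrt(x/2)); the function name `tdt` is hypothetical.

```python
import math


def tdt(transmitted_a, transmitted_other):
    """Transmission disequilibrium test from heterozygous parents.

    transmitted_a: times heterozygous parents passed allele A to the affected child
    transmitted_other: times they passed allele a instead
    """
    chi2 = (transmitted_a - transmitted_other) ** 2 / (transmitted_a + transmitted_other)
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


chi2, p = tdt(9, 2)
print(f"chi2 = {chi2:.2f}, p = {p:.3f}")
```

Like the coin-toss analogy, a fair "coin" would transmit each allele about half the time, so 9 versus 2 gives a nominal p-value just under 0.05.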
Case Control Studies
• Compare SNP/haplotype marker frequency in a sample of affected cases with that in an age-, sex-, and population-matched sample of unaffected controls
From Genotyping to Allele Counts
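Turning genotypes into allele counts amounts to counting each person's two alleles. A minimal sketch, assuming complete two-character genotype calls such as "CG"; the genotypes below are made up for illustration.

```python
from collections import Counter


def allele_counts(genotypes):
    """Count alleles over diploid genotype calls such as "CG".

    Each genotype contributes two alleles, so n people give 2n allele counts.
    """
    counts = Counter()
    for genotype in genotypes:
        counts.update(genotype)  # one count per character, i.e. per allele
    return counts


cases = ["CC", "CG", "GG", "CC"]  # made-up genotypes for illustration
print(dict(allele_counts(cases)))  # {'C': 5, 'G': 3}
```

Doing the same for controls yields the 2 × 2 allele-count table (cases/controls by allele) used in the tests that follow.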
Test Significant Associations
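• Expected counts (row total × column total / N):
– (24 + 278) * (24 + 86) / (24 + 278 + 86 + 296) ≈ 49
– (278 + 296) * (86 + 296) / (24 + 278 + 86 + 296) ≈ 321
• $\chi^2 = \sum_{i,j} \frac{(o_{ij} - e_{ij})^2}{e_{ij}} = 27.5$, 1 df, p < 0.001, where $o_{ij}$ and $e_{ij}$ are the observed and expected counts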
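The chi-square computation can be reproduced with the sketch below. The 2 × 2 layout [[24, 278], [86, 296]] is inferred from the expected-count formulas above, since the slide's count table itself is not in the transcript. With unrounded expected counts the statistic comes out near 26.5, slightly below the 27.5 quoted (the slide's value may reflect rounded expected counts); either way p < 0.001.

```python
import math


def pearson_chi_square(table):
    """Pearson chi-square statistic for a table of observed counts (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # row total x column total / N
            stat += (observed - expected) ** 2 / expected
    return stat


# 2 x 2 layout inferred from the expected-count formulas; (2 - 1) * (2 - 1) = 1 df.
table = [[24, 278],
         [86, 296]]
stat = pearson_chi_square(table)
# For 1 df, P(chi2 > x) = erfc(sqrt(x / 2)).
p = math.erfc(math.sqrt(stat / 2))
print(f"chi2 = {stat:.1f}, p = {p:.1e}")
```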
Association of Alleles and Genotypes of
rs1333049 (‘3049) with Myocardial Infarction
           Allele C, N (%)   Allele G, N (%)   χ² (1 df)   P-value
Cases      2,132 (55.4)      1,716 (44.6)      55.1        1.2 × 10⁻¹³
Controls   2,783 (47.4)      3,089 (52.6)
Allelic odds ratio = 1.38
• OR = 1: no disease association
• OR > 1: allele increases risk of disease
• OR < 1: allele decreases risk of disease
Samani N et al, N Engl J Med 2007; 357:443-453.
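The allelic odds ratio of 1.38 can be recovered from the C and G allele counts in the rs1333049 table (cases: 2,132 C and 1,716 G; controls: 2,783 C and 3,089 G). The function name below is hypothetical; the formula is the standard ratio of allele odds in cases to allele odds in controls.

```python
def allelic_odds_ratio(case_a, case_b, control_a, control_b):
    """Odds of allele a among case alleles divided by its odds among control alleles."""
    return (case_a / case_b) / (control_a / control_b)


# C versus G allele counts for rs1333049: cases first, then controls.
odds_ratio = allelic_odds_ratio(2132, 1716, 2783, 3089)
print(round(odds_ratio, 2))  # 1.38
```

Counting G first gives the reciprocal, about 0.73, so the same data read as OR < 1 for the G allele.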
Multiple hypothesis testing?
GWAS Pvalues
[Figure: GWAS P-values for type II diabetes]
• Bonferroni correction: the most common approach; the genome-wide threshold is typically p < 10⁻⁷ or 10⁻⁸
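Bonferroni divides the overall significance level by the number of tests. The sketch below assumes roughly one million independent tests genome-wide, an illustrative figure rather than something stated on the slide, which lands between the 10⁻⁷ and 10⁻⁸ thresholds above. The helper names are hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test p-value cutoff that keeps the family-wise error rate at or below alpha."""
    return alpha / n_tests


def significant(pvalues, alpha=0.05):
    """Indices of tests that pass the Bonferroni cutoff for this set of tests."""
    cutoff = bonferroni_threshold(alpha, len(pvalues))
    return [i for i, p in enumerate(pvalues) if p < cutoff]


# Assuming about one million independent tests genome-wide (illustrative figure):
print(bonferroni_threshold(0.05, 1_000_000))
```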
Manhattan Plot
McCarthy et al, Nat Rev Genetics, 2008
Size Matters
Visscher, AJHG 2012
How to Improve Statistical Power?
• Without increasing the sample size?
• Test association of disease with haplotypes instead of individual SNPs
– Also reduces genotyping errors
• Split samples:
– First half: narrow down promising SNPs/haplotypes
– Second half: refine hits (far fewer hypotheses to correct for)
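The split-sample strategy can be sketched as a two-stage filter: the first half of the samples nominates candidates, and the second half tests only those candidates, so the Bonferroni correction covers far fewer hypotheses. The stage-1 cutoff of 10⁻³ and the function name are illustrative assumptions, and the p-values would come from whatever association test each half uses.

```python
def two_stage(stage1_p, stage2_p, stage1_cutoff=1e-3, alpha=0.05):
    """Select SNPs in the first half of the samples, then confirm them in the second half.

    stage1_p, stage2_p: dicts mapping SNP id -> p-value computed in each half.
    stage1_cutoff: illustrative screening threshold (an assumption, not a standard).
    """
    candidates = [snp for snp, p in stage1_p.items() if p < stage1_cutoff]
    if not candidates:
        return []
    # Bonferroni over the candidates only, not over every SNP genotyped.
    threshold = alpha / len(candidates)
    return [snp for snp in candidates if stage2_p[snp] < threshold]


stage1 = {"rs1": 1e-4, "rs2": 0.5, "rs3": 5e-4}
stage2 = {"rs1": 0.01, "rs2": 1e-9, "rs3": 0.2}
print(two_stage(stage1, stage2))  # ['rs1']
```

With two candidates the second-stage cutoff is 0.05 / 2 = 0.025, so rs1 passes; rs2 is never tested in stage 2 because it did not pass the first screen.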
Unusual Pvalue distributions
• Pvalue QQ plot
• Population stratification
Marchini, Nat Genet. 2004
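A P-value QQ plot compares sorted observed −log10 P-values with those expected under the null, where P-values are Uniform(0, 1). The sketch below computes the plotting coordinates only (no plotting library), using the common (i − 0.5)/n plotting positions, which are a convention assumed here. Points that rise above the diagonal across the whole range, rather than only in the far tail, are the pattern associated with population stratification.

```python
import math


def qq_points(pvalues):
    """Return (expected, observed) -log10 p-value pairs for a QQ plot.

    Under the null hypothesis p-values are Uniform(0, 1), so the k-th smallest
    of n (k = 1..n) is expected near (k - 0.5) / n.
    """
    observed = sorted(pvalues)
    n = len(observed)
    return [(-math.log10((i + 0.5) / n), -math.log10(p)) for i, p in enumerate(observed)]


for expected, observed in qq_points([0.5, 0.25, 0.01, 0.8]):
    print(f"{expected:.2f}  {observed:.2f}")
```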
European population structure
[Figure: population structure plots; 1,387 samples, ~200K SNPs; UK WTCCC1 study; labeled groups: Afro-Caribbean samples, South Asian samples, Africa, European, Chinese + Japanese]
Genomic control
• Devlin and Roeder (1999) used theoretical arguments to propose that, with population structure, the genome-wide distribution of Cochran-Armitage trend test statistics is inflated by a constant multiplicative factor λ.
• We can estimate the inflation factor with the statistic $\lambda = \mathrm{median}(X_i^2)/0.456$, where 0.456 is approximately the median of a $\chi^2_{1\,df}$ distribution.
• An inflation factor λ > 1 indicates population structure and/or genotyping error.
• We can carry out an adjusted test of association that takes account of any mismatching of cases and controls at any SNP using the statistic $X_i^2/\lambda$.
[Figure annotations: "True hits?"; "Population outliers and/or structure?"; inflation factor λ = 1.11]
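Genomic control as described above takes two lines of arithmetic: estimate λ from the median statistic, then divide every statistic by λ. The sketch below assumes the inputs are 1-df trend-test statistics, one per SNP, and follows the slide's 0.456 constant; the exact median of a 1-df chi-square, computed here with `statistics.NormalDist`, is about 0.455. The function name and example statistics are hypothetical.

```python
import math
from statistics import NormalDist, median

# Exact median of a 1-df chi-square: the square of the standard normal 75th percentile.
EXACT_NULL_MEDIAN = NormalDist().inv_cdf(0.75) ** 2  # about 0.455


def genomic_control(chi2_stats, null_median=0.456):
    """Estimate the inflation factor lambda and return GC-adjusted statistics and p-values.

    chi2_stats: 1-df association statistics (e.g. trend tests), one per SNP.
    """
    lam = median(chi2_stats) / null_median
    adjusted = [x / lam for x in chi2_stats]
    # 1-df chi-square tail probability: P(X > x) = erfc(sqrt(x / 2)).
    pvalues = [math.erfc(math.sqrt(x / 2)) for x in adjusted]
    return lam, adjusted, pvalues


lam, adjusted, pvalues = genomic_control([0.2, 0.5, 0.6, 3.0, 30.0])
print(f"lambda = {lam:.2f}")
```

Because every statistic is divided by the same λ, the ranking of SNPs does not change; only their significance is deflated.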
IBD: Identity By Descent Test
• If two individuals share a common ancestor, they will share many SNPs/haplotype blocks across their genomes (identical by state: IBS)
• Pairwise IBD probability between samples
• Probability that two individuals share 0 (Z0), 1 (Z1), or 2 (Z2) haplotypes IBD, estimated across the genome
• Remove related samples (one from each pair with high IBD)
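From Z0, Z1 and Z2, the overall proportion of the genome shared IBD is commonly summarized as Z1/2 + Z2 (for example, 0.5 for parent-offspring or full-sibling pairs). The sketch below uses that summary to drop one member of each closely related pair. The 0.185 cutoff, the tuple layout, and the greedy "drop the second sample" rule are illustrative assumptions, not part of the slide.

```python
def proportion_ibd(z0, z1, z2):
    """Proportion of the genome shared IBD: half of Z1 plus all of Z2."""
    return z1 / 2 + z2


def samples_to_remove(pairs, cutoff=0.185):
    """pairs: (id1, id2, z0, z1, z2) tuples. Drop one member of every pair above cutoff.

    The cutoff and the rule of dropping the second sample are illustrative choices.
    """
    removed = set()
    for id1, id2, z0, z1, z2 in pairs:
        if id1 in removed or id2 in removed:
            continue  # this pair is already resolved
        if proportion_ibd(z0, z1, z2) > cutoff:
            removed.add(id2)
    return removed


pairs = [
    ("s1", "s2", 0.0, 1.0, 0.0),     # parent-offspring pattern
    ("s1", "s3", 1.0, 0.0, 0.0),     # unrelated
    ("s4", "s5", 0.25, 0.5, 0.25),   # full-sibling pattern
]
print(sorted(samples_to_remove(pairs)))  # ['s2', 's5']
```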
Manolio et al., J Clin Invest 2008
Acknowledgement
• Tim Niu
• Kenneth Kidd, Judith Kidd and Glenys Thomson
• Joel Hirschhorn
• Greg Gibson & Spencer Muse
• Jim Stankovich
• Teri Manolio
• David Evans
• Guodong Wu
• Bo Li