Jianfeng Xu, MD, DrPH: GWA - UCLA School of Public Health
Download
Report
Transcript Jianfeng Xu, MD, DrPH: GWA - UCLA School of Public Health
GWA ─ promising but challenging
Jianfeng Xu, M.D., Dr.PH
Professor of Public Health and Cancer Biology
Director, Program for Genetic and Molecular Epidemiology of Cancer
Associate Director, Center for Human Genomics
Wake Forest University School of Medicine
Outline
The need for genome-wide association studies
The reality of genome-wide association studies
Important issues in genome-wide association studies
Genome coverage
Strategies for pre-association analysis
Strategies for association analysis
Sample size and false positives (Type I and II errors)
Confirmation in independent study populations
Increase the magnitude of effects of a specific gene
The need for GWA
Current understanding of disease etiology is limited
Therefore, candidate genes or pathways are insufficient
Current understanding of functional variants is limited
Therefore, the focusing on nonsynonymous changes is not sufficient
Results from linkage studies are often inconsistent and broad
Therefore, the utility of identified linkage regions is limited
GWA studies offer an effective and objective approach
Better chance to identify disease associated variants
Improve understanding of disease etiology
Improve ability to test gene-gene interaction and predict disease risk
GWA is promising
Many diseases and traits are influenced by genetic factors
i.e., they are caused by sequence variants in the genome
Over 6 millions SNPs are known in the genome
i.e., some SNPs will be directly or indirectly associated with causal variants
The cost of SNP Genotyping is reduced
i.e., it is affordable to genotype a large number of SNPs in the genome
Large numbers of cases and controls are available
i.e., there is statistical power to detect variants with modest effect
When the above conditions are met…
…associated SNPs will have different frequencies between cases and controls
GWA is challenging
Many diseases and traits are influenced by genetic factors
But probably due to multiple modest risk variants
They confer a stronger risk when they interact
True associated SNPs are not necessary highly significant
Too many SNPs are evaluated
False positives due to multiple tests
Single studies tend to be underpowered
False negatives
Considerable heterogeneity among studies
Phenotypic and genetic heterogeneity
False positives due to population stratification
Reality of GWA
AMD, IBD, T1D, etc.
Parkinson’s, nicotine dependence, T2D, etc.
Prostate cancer, breast cancer, and other ongoing studies
Heart diseases, lung diseases, psychiatric diseases, inflammatory
diseases, cancers, and many other studies that are in planning stages
Important issues in genome-wide
association studies
Genome coverage
Strategies for pre-association analysis
Strategies for association analysis
Sample size and false positives (Type I and II errors)
Confirmation in independent study populations
Increase the magnitude of effects of a specific gene
Genome coverage
Two major platforms for GWA
Illumina: HumanHap300, HumanHap550, and HumanHap1M
Affymetrix: GeneChip 100K, 500K, and 1M
Genome-wide coverage
The percentage of known SNPs in the genome that are in LD with the genotyped
SNPs
Calculated based on HapMap
Calculated based on ENCODE
Genome coverage
Genome-wide coverage
Genome coverage of common SNPs (MAF ≥ 0.05)
Genome coverage of rare SNPs
Genome coverage using multi-markers
Pe’er, 2006
Genome coverage
Genome coverage for common SNPs (MAF ≥ 0.05)
Pe’er, 2006
Genome coverage
Genome coverage for common SNPs (MAF ≥ 0.05)
Genome coverage for common and rare SNPs
Pe’er, 2006
Genome coverage
Genome coverage of common SNPs (MAF ≥ 0.05)
Genome coverage of common and rare SNPs
Genome coverage using multi-markers
Pe’er, 2006
Important issues in genome-wide
association studies
Genome coverage
Strategies for pre-association analysis
Strategies for association analysis
Sample size and false positives (Type I and II errors)
Confirmation in independent study populations
Increase the magnitude of effects of a specific gene
Strategies for pre-association analysis
Quality control
Filter SNPs by genotype call rates
Filter SNPs by minor allele frequencies
Filter SNPs by testing for Hardy-Weinberg Equilibrium
Strategies for pre-association analysis
Quality control
Quantile-quantile plot (Q-Q plot)
Evaluate whether there is an upward bias in association tests
Q-Q plot
All SNPs
Filter by call rate
Adjust for stratification
Clayton, 2006
Strategies for pre-association analysis
Quality control
Quantile-quantile plot (Q-Q plot)
Population stratification
Genomic control
Correct for stratification by adjusting association statistics at each SNP by a uniform
overall inflation factor
Is susceptible to over or under adjustment
Strategies for pre-association analysis
Quality control
Quantile-quantile plot (Q-Q plot)
Population stratification
Genomic control
Structure (STRUCTURE)
Used to assign the samples to discrete subpopulation clusters and then aggregate
evidence of association within each cluster
Estimate individual proportion of ancestry and treat it as a covariate
Computationally intensive when there are a large number of AIMs
Strategies for pre-association analysis
Quality control
Quantile-quantile plot (Q-Q plot)
Population stratification
Genomic control
Structure (STRUCTURE)
Principal component analysis (EIGENSTRAT)
Identify several eigenvectors (ancestries or geographic regions)
Adjust genotypes and phenotypes along each eigenvector
Compute association statistics using adjusted genotypes and phenotypes
No need for AIMs
Important issues in genome-wide
association studies
Genome coverage
Strategies for pre-association analysis
Strategies for association analysis
Sample size and false positives (Type I and II errors)
Confirmation in independent study populations
Increase the magnitude of effects of a specific gene
Strategies for association analysis
Single SNP analysis using pre-specified genetic models
2 x 3 table (2-df)
Additive model (1-df), and test for additivity
All possible genetic models
Strategies for association analysis
Single SNP analysis using pre-specified genetic models
Haplotype analysis
Two-marker and three-marker slide
Multi-marker
Within haplotype block
Between two recombination hot spots
Strategies for association analysis
Single SNP analysis using pre-specified genetic models
Haplotype analysis
Gene-gene and gene-environment interactions
Interaction with main effect
Logistic regression
Interaction without main effect: data mining
Classification and recursive tree (CART)
Multifactor Dimensionality Reduction (MDR)
Important issues in genome-wide
association studies
Genome coverage
Strategies for pre-association analysis
Strategies for association analysis
Sample size and false positives (Type I and II errors)
Confirmation in independent study populations
Increase the magnitude of effects of a specific gene
Sample size and false positives
Estimate sample size
Sample size
OR
MAF
Type I error
Power
Quanto
Effective sample
size
Sample size and false positives
Estimate sample size
False positives: too many dependent tests
Adjust for number of tests
Bonferroni correction
Nominal significance level = study-wide significance / number of tests
Nominal significance level = 0.05/500,000 = 10-7
Effective number of tests
Take LD into account
Permutation procedure
Permute case-control status
Mimic the actual analyses
Obtain empirical distribution of maximum test statistic under null hypothesis
Sample size and false positives
Estimate sample size
False positives: too many dependent tests
Adjust for number of tests
False discovery rate (FDR)
Expected proportion of false discoveries among all discoveries
Offers more power than Bonferroni
Holds under weak dependence of the tests
Sample size and false positives
Estimate sample size
False positives: too many dependent tests
Adjust for number of tests
False discovery rate (FDR)
Bayesian approach
Taking a priori into account, False-Positive Report Probability (FPRP)
Important issues in genome-wide
association studies
Genome coverage
Strategies for pre-association analysis
Strategies for association analysis
Sample size and false positives (Type I and II errors)
Confirmation in independent study populations
Increase the magnitude of effects of a specific gene
Confirmation in independent study
populations
The above approaches may limit the number of false positives
Confirmation is needed to dissect true from false positives
Replication, examine the results from the 2nd stage only
Joint analysis, combining data from 1st stage with 2nd stage
Multiple stages
Replication vs. joint analysis
Skol, 2006
Multiple stages
1st stage
# of Risk SNPs
# of SNPs tested
# of true sig. SNPs (80% power)
# of total sig. SNPs (a = 0.01)
% of true sig. SNPs
2nd stage
3rd stage
20
16
13
500,000
5,016
63
16
13
10
5,016
63
10
0.38%
21%
100%
Important issues in genome-wide
association studies
Genome coverage
Strategies for pre-association analysis
Strategies for association analysis
Sample size and false positives (Type I and II errors)
Confirmation in independent study populations
Increase the magnitude of effects of a specific gene
Increase the magnitude of effects of a
specific gene
Increase their effects by focusing on a subset of study subjects
Cases with a uniform phenotype, e.g. aggressive or early onset
Study aggressive cases
0.20
Controls
0.18
Low grade
High grade
MAF of rs1447295
0.16
0.14
0.12
0.10
0.08
0.06
0.04
0.02
0.00
Iceland
Sweden
Chicago
CGEMS
JHU
Increase the magnitude of effects of a
specific gene
Increase their effects by focusing on a subset of study subjects
Cases with a uniform phenotypes, e.g. aggressive or early onset
Cases with family history
Study cases with family history
Antoniou and Easton, 2003
Increase the magnitude of effects of a
specific gene
Increase their effects by focusing on a subset of study subjects
Cases with a uniform phenotypes, e.g. aggressive or early onset
Cases with family history
Controls that are disease free
Disease free controls
0.20
Controls
0.18
Low grade
High grade
MAF of rs1447295
0.16
0.14
0.12
0.10
0.08
0.06
0.04
0.02
0.00
Iceland
Sweden
Chicago
CGEMS
JHU
Increase the magnitude of effects of a
specific gene
Increase their effects by focusing on a subset of study subjects
Cases with a uniform phenotypes, e.g. aggressive or early onset
Cases with family history
Controls that are disease free
Increase their effects by studying a homogeneous population
Lower levels of genetic heterogeneity
Summary
GWA studies are promising but difficult
There are many important issues in GWA
The impact of these issues can be minimized by a well-
designed study