Transcript 36311

Genetics for Epidemiologists
Lecture 4: Genetic Association Studies
U.S. Department of Health and Human Services
National Institutes of Health
National Human Genome Research Institute
Teri A. Manolio, M.D., Ph.D.
Director, Office of Population Genomics and
Senior Advisor to the Director, NHGRI,
for Population Genomics
Topics to be Covered
• Case-control and cohort studies in genomic
research
• Candidate gene studies
• Genome-wide association studies
• Randomized/experimental designs
Collins FS, Nature 2004; 429:475-77.
Desirable Characteristics of Large US
Cohort Study
• Large sample size
• Full representation of US minority groups
• Broad range of ages
• Broad range of genetic backgrounds and
environmental exposures
• Family-based recruitment for at least part of
cohort to control for population stratification
• Broad array of clinical and laboratory data,
regular follow up for events, additional
exposure assessment
After Collins FS, Nature 2004; 429:475-77.
Desirable Characteristics of Large US
Cohort Study (continued)
• Technologically advanced dietary, lifestyle, and
environmental exposure data
• Collection and storage of biological specimens
• Sophisticated data management system
• Broad access to materials and data
• Goals should not be “hypothesis-limited”
• Comprehensive community engagement from
the outset
• State of the art (?dynamic) consent to allow
multiple uses of data and regular feedback
After Collins FS, Nature 2004; 429:475-77.
Larson, G. The Complete Far Side. 2003.
Manolio TA et al. Nat Rev Genet 2006; 7:812-820.
Willett WC et al. Nature 2007; 445:257-258.
Collins FS et al. Nature 2007; 445:259.
Pros and Cons of Case-Control Studies
Advantages
• May be the only way to study rare diseases
or those of long latency
• Existing records can occasionally be used if
risk factor data collected independent of
disease status
• Can study multiple etiologic factors
simultaneously
• May be less time-consuming and expensive
• If assumptions met, inferences are reliable
Pros and Cons of Case-Control Studies
Disadvantages
• Relies on recall or records for information
on past exposures; validation can be difficult
or impossible
• Selection of appropriate comparison group
may be difficult
• Multiple biases may give spurious evidence
of association between risk factor and
disease
• Usually cannot study rare exposures
• Temporal relationship between exposure
and disease can be difficult to determine
“But,” they say, “This Is Genetics!”
(you dumb epidemiologist)
“This Is Different!”
• Genes are measured the same way in cases and
controls
• Information on key exposure is easy to validate
• No recall or reporting involved
• Temporal relationship between genes and disease
is piece of cake
“BUT,” I SAY,
• Bias-free ascertainment of cases and controls is still
major concern; cases in most clinical series unlikely
to be representative
• Assessment of risk modifiers or gene-environment
interactions is likely to be incomplete or flawed
Appreciation of Weaknesses of Case-Control Studies
Larson, G. The Complete Far Side. 2003.
http://www.mainlesson.com/display.php3?author=treadwell&book=primer&story=chickenlittle
Candidate Genes
Genetic Studies in Unrelated Individuals
(pre-2005): Candidate Gene Studies
• Goal: characterize candidate genes and variants
related to disease
• Not typically intended to “find genes,” generally
begun after disease-related variants identified
• Assess generalizability of family-based
observations (genetic heterogeneity)
• Assess importance of allelic variation at
population level (PAR, penetrance)
• Identify modification of genetic association by
environmental factors (GxE interaction)
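The population-level importance of a variant (PAR, above) depends on both how common the variant is and how strongly it raises risk. The sketch below assumes Levin's formula for population attributable risk, PAR = p(RR - 1) / (1 + p(RR - 1)); the lecture does not say which definition it intends. The function name and the input values are hypothetical and chosen only for illustration.

```python
def population_attributable_risk(exposure_prevalence: float, relative_risk: float) -> float:
    """Levin's formula: share of disease in the population attributable to the exposure."""
    excess = exposure_prevalence * (relative_risk - 1.0)
    return excess / (1.0 + excess)


# Hypothetical inputs, not taken from the lecture.
common_modest = population_attributable_risk(0.30, 1.3)  # common variant, modest risk
rare_strong = population_attributable_risk(0.01, 5.0)    # rare variant, strong risk

print(f"common variant, modest risk: PAR = {common_modest:.3f}")
print(f"rare variant, strong risk:   PAR = {rare_strong:.3f}")
```

Under these assumed inputs the common, modest-risk variant accounts for a larger share of disease in the population than the rare, high-risk one.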
Population Studies of Genetic Variants:
Angiotensin I-Converting Enzyme (ACE)
Larsen: Williams Textbook of Endocrinology, 10th ed., 2003
ACE Gene Identification
• cDNA sequence determined for human
testicular ACE, identical from residue 27 to C
terminus to C-terminal domain of endothelial
ACE (Ehlers et al, PNAS 1989)
• Assigned to chromosome 17q23 by in situ
hybridization (Mattei et al, Cytogenet Cell
Genet 1989)
• Linked to elevated blood pressure in rat model
of hypertension (Jacob et al, Cell 1991)
• Mapped to human chromosome 17q22-q24
(Jeunemaitre et al, Nat Genet 1992)
ACE Gene Polymorphisms
• Insertion/deletion polymorphism identified through
restriction fragment length polymorphism (RFLP) analysis
• Two alleles result from 250-bp insertion in intron 16;
allele frequencies = 0.41 for I allele and 0.59 for D allele
• Accounted for 47% variance in serum ACE in 80 subjects
ACE Genotype      II           ID           DD
ACE (µg/L)        299 (49)     392 (67)     494 (88)
Ln-ACE (µg/L)     5.7 (0.2)    6.0 (0.2)    6.2 (0.2)
Rigat et al, J Clin Invest 1990; 86:1343-46.
Nature 1992; 359:641-44.
Frequency of ACE Genotypes in 1,304 MI
Cases and Controls
[Figure: bar chart of genotype frequency (%) by ACE genotype in cases and controls. Counts printed on the bars (legend order: cases, controls): DD 197, 200; ID 309, 390; II 104, 104. OR = 1.34, p = 0.007]
Cambien et al, Nature 1992; 359:641-44.
Frequency of ACE Genotypes in 1,304 MI
Cases and Controls
[Figure: bar chart of genotype frequency (%) in cases and controls, in two panels; counts printed on the bars are listed as cases, controls]
Low Risk: DD 38, 46; ID/II 41, 143; OR = 3.2 [1.7,5.9]
High Risk: DD 159, 154; ID/II 372, 390; OR = 1.1 [0.9,1.5]
Cambien et al, Nature 1992; 359:641-44.
Age-Adjusted Odds on Hypertension by
ACE ID/DD Genotype and Sex
                 DD      ID      II      P-value
Men: % HTN       53.1    45.8    44.4
Men: OR          1.67    1.19    1.00    0.004
Women: % HTN     43.3    41.8    44.4
Women: OR        1.01    0.80    1.00    0.15
after O’Donnell C et al, Circulation 1998; 97:1766-72.
Number of New, Significant Gene-Disease
Associations by Year, 1984 - 2000
Hirschhorn J et al, Genet Med 2002; 4:45-61.
Of 600 Gene-Disease Associations, Only 6
Significant in > 75% of Identified Studies
Disease/Trait                Gene     Polymorphism      Frequency
DVT                          F5       Arg506Gln         0.015
Graves’ Disease              CTLA4    Thr17Ala          0.62
Type 1 DM                    INS      5’ VNTR           0.67
HIV/AIDS                     CCR5     32 bp Ins/Del     0.05-0.07
Alzheimer’s                  APOE     Epsilon 2/3/4     0.16-0.24
Creutzfeldt-Jakob Disease    PRNP     Met129Val         0.37
Hirschhorn J et al, Genet Med 2002; 4:45-61.
Reports For and Against Associations of
Variants with Carotid Atherosclerosis
Polymorphism        Present                   Absent    Summary
ACE I/D             13 with D; 1 with I       18        favors none
APOE                8 with ε4, 2 with ε2      9         equivocal
AGT M235T           0                         8         none
AGTR1 A1166C        0                         7         none
MTHFR               7 with T, 1 with non-T    8         equivocal
PON1 Q192R          3 with R                  10        none
PON1 L55M           5 with L (subgroups)      1         weak
NOS3 G894T          1 with T                  4         none
MMP3 -1516 5A/6A    4 with 6A                 0         association
IL6 G-174C          1 with G                  3         none
Manolio et al, ATVB 2004; 24:1567-1577.
Summary Points: Candidate Gene
Studies
• Initial enthusiasm markedly damped by failure
to replicate
• Can probably find study or story that will fit
almost any candidate to any disease/trait
• Understanding of genome structure and
function, and of pathophysiologic mechanisms,
just too preliminary to project more than a
handful of plausible candidates at present
[Figure: timeline with panels labeled by year (2005, 2006, 2007, 2008) and by quarter (first through fourth)]
Manolio et al., J Clin Invest 2008; 118:1590-605.
2007: The Year of GWA Studies
Pennisi E, Science 2007; 318:1842-43.
Diseases and Traits with Published GWA
Studies (n = 53, 5/9/08)
• Macular Degeneration
• Exfoliation Glaucoma
• Lung Cancer
• Prostate Cancer
• Breast Cancer
• Colorectal Cancer
• Neuroblastoma
• Crohn’s Disease
• Celiac Disease
• Gallstones
• Irritable Bowel Syndrome
• QT Prolongation
• Coronary Disease
• Stroke
• Hypertension
• Atrial Fibrillation/Flutter
• Coronary Spasm
• Lipids and Lipoproteins
• Parkinson Disease
• Amyotrophic Lat. Sclerosis
• Multiple Sclerosis
• Prog. Supranuclear Palsy
• MS Interferon-β Response
• Alzheimer’s Disease
• Cognitive Ability
• Memory
• Restless Legs Syndrome
• Nicotine Dependence
• Methamphetamine Depend.
• Neuroticism
• Schizophrenia
• Bipolar Disorder
• Family Chaos
• Rheumatoid Arthritis
• Systemic Lupus Erythematosus
• Psoriasis
• HIV Viral Setpoint
• Childhood Asthma
• Type 1 Diabetes
• Type 2 Diabetes
• Diabetic Nephropathy
• End-Stage Renal Disease
• Obesity, BMI, Waist, IR
• Height
• Osteoporosis
• F-Cell Distribution
• Fetal Hgb Levels
• C-Reactive Protein
• 18 groups of Framingham Traits
• Pigmentation
• Uric Acid Levels
• Recombination Rate
NHGRI Catalog of GWA Studies:
http://www.genome.gov/gwastudies/
• First author/Date/Journal/Study
• Disease/Trait
• Initial Sample Size
• Replication Sample Size
• Region
• Gene
• Strongest SNP – Risk Allele
• Risk Allele Frequency in Controls
• P-value
• OR per copy [95% CI]
• Platform and SNPs passing QC
What is a Genome-Wide Association Study?
• Method for interrogating all 10 million variable
points across human genome
• Variation inherited in groups, or blocks, so not
all 10 million points have to be tested
• Blocks are shorter (so need to test more
points) the less closely people are related
• Technology now allows studies in unrelated
persons, assuming 5,000 – 10,000 base pair
lengths in common (300,000 – 1,000,000
markers)
Association of Alleles and Genotypes of
rs1333049 (‘3049) with Myocardial Infarction
Alleles
            C N (%)         G N (%)         χ² (1df)    P-value
Cases       2,132 (55.4)    1,716 (44.6)    55.1        1.2 x 10^-13
Controls    2,783 (47.4)    3,089 (52.6)
Allelic Odds Ratio = 1.38
Genotypes
            CC N (%)      CG N (%)        GG N (%)      χ² (2df)    P-value
Cases       586 (30.5)    960 (49.9)      378 (19.6)    59.7        1.1 x 10^-14
Controls    676 (23.0)    1,431 (48.7)    829 (28.2)
Heterozygote Odds Ratio = 1.47
Homozygote Odds Ratio = 1.90
Samani N et al, N Engl J Med 2007; 357:443-453.
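The three odds ratios on this slide can be recomputed from the genotype counts alone. The sketch below is illustrative: the function and variable names are hypothetical, and it assumes GG is the reference genotype and C the risk allele, which is how the reported values come out.

```python
def odds_ratio(case_exposed, case_reference, control_exposed, control_reference):
    """Cross-product odds ratio for a 2x2 table."""
    return (case_exposed * control_reference) / (case_reference * control_exposed)


def allele_counts(genotypes):
    """Each homozygote carries two copies of an allele, each heterozygote one."""
    c = 2 * genotypes["CC"] + genotypes["CG"]
    g = 2 * genotypes["GG"] + genotypes["CG"]
    return c, g


# Genotype counts for rs1333049 as shown on the slide.
cases = {"CC": 586, "CG": 960, "GG": 378}
controls = {"CC": 676, "CG": 1431, "GG": 829}

case_c, case_g = allele_counts(cases)
control_c, control_g = allele_counts(controls)

allelic_or = odds_ratio(case_c, case_g, control_c, control_g)
heterozygote_or = odds_ratio(cases["CG"], cases["GG"], controls["CG"], controls["GG"])
homozygote_or = odds_ratio(cases["CC"], cases["GG"], controls["CC"], controls["GG"])

print(f"Allelic OR      = {allelic_or:.2f}")       # 1.38
print(f"Heterozygote OR = {heterozygote_or:.2f}")  # 1.47
print(f"Homozygote OR   = {homozygote_or:.2f}")    # 1.90
```

The allele counts derived from the genotypes (2,132 C and 1,716 G in cases; 2,783 C and 3,089 G in controls) match the allele table on the slide.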
Nicotine Dependence among Smokers
Bierut LJ et al, Hum Molec Genet 2007; 16:24-35.
P Values of GWA Scan for Age-Related
Macular Degeneration
Klein et al, Science 2005; 308:385-389.
Genome-Wide Scan for QTc Interval
Arking D et al, Nat Genet 2006; 38:644-651.
Genome-Wide Scan for Type 2 Diabetes in a
Scandinavian Cohort
http://www.broad.mit.edu/diabetes/scandinavs/type2.html
Wellcome Trust Genome-Wide Association
Study of Seven Common Diseases
WTCCC, Nature 2007; 447:661-678.
“There have been few, if any, similar bursts of
discovery in the history of medical research…”
Hunter DJ and Kraft P, N Engl J Med 2007; 357:436-439.
Lessons Learned from Initial GWA Studies
Signals in Previously Unsuspected Genes
Macular Degeneration                   CFH
Coronary Disease                       CDKN2A/2B
Childhood Asthma                       ORMDL3
Type II Diabetes                       CDKAL1
QT interval prolongation               NOS1AP
Signals in Gene “Deserts”
Prostate Cancer                        8q24
Crohn’s Disease                        5p13.1, 1q31.2, 10p21
Signals in Common
Diabetes, CHD, Melanoma, Frailty       CDKN2A/2B
Prostate, Breast, Colorectal Cancer    8q24 region
Crohn’s Disease, Psoriasis             IL23R
Crohn’s Disease, T1DM                  PTPN2
Rheumatoid Arthritis, T1DM             PTPN22
Unique Aspects of GWA Studies
• Permit examination of inherited genetic variability at
unprecedented level of resolution
• Permit "agnostic" genome-wide evaluation
• Once genome measured, can be related to any trait
• Most robust associations in GWA studies have not
been with genes previously suspected of
association with the disease
• Some associations in regions not even known to
harbor genes
“The chief strength of the new approach also contains its
chief problem: with more than 500,000 comparisons per
study, the potential for false positive results is
unprecedented.”
Hunter DJ and Kraft P, N Engl J Med 2007; 357:436-439.
Ways of Dealing with Multiple Testing
• Bonferroni correction: most common, typically
p < 10^-7 or 10^-8
• False discovery rate: proportion of significant
associations that are actually false positives
• False positive report probability: probability
that the null hypothesis is true, given a
statistically significant finding
• Replication, replication, replication
Chanock S, Manolio T, et al, Nature 2007; 447:655-660.
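The first three approaches above can be sketched in a few lines. This is a minimal illustration, not the method of any cited study: the false discovery rate is controlled here with the Benjamini-Hochberg step-up procedure, and the false positive report probability is assumed to take the form alpha(1 - prior) / (alpha(1 - prior) + power x prior). All function names, p-values, the prior, and the power are hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


def benjamini_hochberg(p_values, q):
    """Indices of tests declared significant at false discovery rate q (step-up procedure)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    n_reject = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            n_reject = rank  # keep the largest rank that passes its threshold
    return sorted(order[:n_reject])


def false_positive_report_probability(alpha, power, prior):
    """Probability that the null is true given a significant result (assumed form)."""
    false_positives = alpha * (1.0 - prior)
    true_positives = power * prior
    return false_positives / (false_positives + true_positives)


# 500,000 comparisons per study, as in the Hunter and Kraft quotation.
print(bonferroni_threshold(0.05, 500_000))

# Hypothetical p-values for ten tests.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(benjamini_hochberg(p_values, q=0.05))

# Hypothetical prior of 1 in 10,000 that a given SNP is truly associated, power 0.8.
fprp_nominal = false_positive_report_probability(0.05, 0.8, 1e-4)
fprp_strict = false_positive_report_probability(1e-7, 0.8, 1e-4)
print(f"{fprp_nominal:.3f} {fprp_strict:.4f}")
```

With 500,000 tests the Bonferroni threshold at alpha = 0.05 comes out at 10^-7, the value quoted on the slide. Under the assumed prior, a result that is significant only at p < 0.05 is almost certainly a false positive, while one that passes 10^-7 rarely is.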
Replication Strategy for Prostate Cancer
Study in CGEMS
Initial Study: 1,150 cases / 1,150 controls, >500,000 Tag SNPs
Replication Study #1: 3,000 cases / 3,000 controls, ~24,000 SNPs
Replication Study #2: 2,400 cases / 2,400 controls, ~1,500 SNPs
Replication Study #3: 2,500 cases / 2,500 controls, 200+ New ht-SNPs
25-50 Loci
Hoover R, Epidemiology 2007; 18:13-17.
Replication, Replication, Replication
Initial study: Sufficient description to permit replication
• Sources of cases and controls
• Participation rates and flow chart of selection
• Methods for assessing affected status
• Standard “Table 1” including rates of missing data
• Assessment of population heterogeneity
• Genotyping methods and QC metrics
Replication study:
• Similar population, similar phenotype
• Same genetic model, same SNP, same direction
• Adequately powered to detect postulated effect
Chanock S, Manolio T, et al, Nature 2007; 447:655-660.
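Whether a replication study is "adequately powered" can be checked roughly before it starts. The sketch below is an assumption-laden illustration, not a method from the lecture: it uses a normal approximation to a two-sided comparison of allele frequencies between cases and controls, counts two alleles per person, and derives the expected case allele frequency from the control frequency and a per-allele odds ratio. Function names are hypothetical; the example inputs are the control C-allele frequency (47.4%), allelic odds ratio (1.38), and sample sizes from the rs1333049 slide.

```python
from math import sqrt
from statistics import NormalDist


def case_allele_frequency(control_freq, odds_ratio):
    """Allele frequency expected in cases for a given per-allele odds ratio."""
    odds = odds_ratio * control_freq / (1.0 - control_freq)
    return odds / (1.0 + odds)


def allelic_test_power(control_freq, odds_ratio, n_cases, n_controls, alpha):
    """Approximate power of a two-sided allele-frequency comparison."""
    p0 = control_freq
    p1 = case_allele_frequency(p0, odds_ratio)
    # Each participant contributes two alleles.
    se = sqrt(p1 * (1 - p1) / (2 * n_cases) + p0 * (1 - p0) / (2 * n_controls))
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1.0 - alpha / 2.0)
    z_effect = abs(p1 - p0) / se
    return std_normal.cdf(z_effect - z_crit) + std_normal.cdf(-z_effect - z_crit)


# Postulated effect and sample sizes taken from the rs1333049 example.
large = allelic_test_power(0.474, 1.38, n_cases=1924, n_controls=2936, alpha=1e-7)
# A hypothetical smaller replication attempt at the same threshold.
small = allelic_test_power(0.474, 1.38, n_cases=500, n_controls=500, alpha=1e-7)

print(f"power with 1,924 cases / 2,936 controls: {large:.2f}")
print(f"power with 500 cases / 500 controls:     {small:.2f}")
```

Under this approximation the larger sample has high power at a threshold of 10^-7, while the hypothetical 500/500 study would usually miss the same effect.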
Replication Strategy in Easton Breast
Cancer Study
Stage    Cases     Controls    SNPs
1        408       400         266,722
2        3,990     3,916       13,023
3        23,734    23,639      31
Final                          6
• ABCFS
• BCST
• COPS
• GENICA
• HBCS
• HBCP
• TBCS
• KConFab/AOCS
• KBCP
• LUMCBCS
• MCBCS
• MCCS
• MEC-W
• MEC-J
• NHS
• PBCS
• RBCS
• SASBAC
• SEARCH2
• SEARCH3
• SBCP
• SBCS
• CNIOBCS
• USRT
Easton et al, Nature 2007; 447:1087-93.
Replication Strategy in CGEMS Prostate
Cancer Study
Stage    Cases    Controls    SNPs
1        1,172    1,157       527,869
2        3,941    3,964       26,958*
* Selected for p < 0.068
SNP           Gene     Stage 1+2 P-value    Initial Rank    Initial P-value
rs10993994    MSMB     7 x 10^-13           24,223          0.042
rs10896449    11q13    2 x 10^-9            2,439           0.004
rs4962416     CTBP2    2 x 10^-7            319             4 x 10^-4
rs10486567    JAZF1    2 x 10^-6            24,407          0.042
Thomas et al, Nat Genet 2008; 40:310-15.
Genome-wide Association and Cohort Studies
Cupples et al, BMC Med Genet 2007; 8(Suppl 1):S1
Ridker et al, Clin Chem 2008; 54:249-55.
http://www.ncbi.nlm.nih.gov/sites/entrez
Genetic Association and Clinical Trials
7/2000
6/2006
1/2008
Genome-wide Association and Clinical
Trials
5/2007
HLA DRB1*0701 and Transaminase
Elevations Following Ximelagatran Treatment
Kindmark et al, Pharmacogen J 2007; May 15 (on-line)
Genome-wide Association and Clinical Trials
5/2007
1/2008
Summary Points: Genetic Association
Studies
• Candidate gene studies enormously prone to
spurious associations
• GWA presents new paradigm, is unconstrained
by current imperfect understanding of genome
structure and function
• Initial findings astoundingly positive
• Most are skimming surface of what could be
learned
• GWA beginning to be applied to cohort studies
• Very little work in genetic association in clinical
trials and treatment response