Replication Studies on Genome


An Introduction to
Genome-Wide Association Studies
Shen-Chih Chang, PhD
Epi 295
Oct 2, 2009
What is a genome-wide association study?

Any study of genetic variation across the entire human genome
designed to identify genetic association with a disease.

It usually refers to studies with genetic markers of 100,000 or
more to represent a large proportion of variation in the human
genome.

It allows efficient and comprehensive analysis of common
genomic variation without a priori hypotheses
based on gene function or disease pathways.

It requires very large series of cases and controls to ensure
adequate statistical power, and multiple subsequent studies to
confirm the initial findings.
Why are such studies possible now?

With the completion of the Human Genome
Project in 2003 and the International HapMap
Project in 2005, researchers now have a set of
research tools that make it possible to find the
genetic contributions to common diseases.

The tools include
◦ computerized databases that contain the reference
human genome sequence
◦ a map of human genetic variation
◦ a set of new technologies that can quickly and
accurately analyze whole-genome samples.

Molecular Epidemiology Before/After GWAS
Before GWAS
◦ A few markers at a time
◦ Gene functions
◦ Disease pathways
◦ Biological mechanisms
→ Association studies
After GWAS
◦ Hundreds of thousands of markers at a time
◦ Gene hunting
◦ Association studies
→ Gene functions, disease pathways, biological mechanisms
Terms Frequently Used in GWAS

Single-nucleotide polymorphism (SNP)
◦ DNA sequence variation resulting from a single-base
substitution.
◦ Most common form of genetic variation in the genome.

Nonsynonymous SNP
◦ A polymorphism which results in a change in the amino acid
sequence of a protein (and therefore may affect the
function of the protein)

Alleles
◦ Alternative DNA sequences at the same physical gene
locus, which may or may not result in different phenotypic
traits.
*Adapted from Pearson et al., 2008.
Terms Frequently Used in GWAS
(continued)

Minor allele
◦ The allele of a bi-allelic polymorphism that is less frequent
in the study population.

Minor allele frequency (MAF)
◦ Proportion of the less common allele in a population,
ranging from less than 1% to 50%.

Haplotype
◦ A group of specific alleles at neighboring genes or markers
that tend to be inherited together
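
The MAF definition above can be illustrated with a minimal Python sketch. The function names are hypothetical, and the example counts are the rs16969968 genotype counts among White controls reported in Table 3 later in these slides (ref/het./hom. = 4,827 / 5,019 / 1,373).

```python
def allele_frequencies(n_ref_hom, n_het, n_alt_hom):
    """Return (ref_allele_freq, alt_allele_freq) from genotype counts."""
    total_alleles = 2 * (n_ref_hom + n_het + n_alt_hom)
    if total_alleles == 0:
        raise ValueError("no genotyped samples")
    # Each heterozygote carries one alternate allele, each homozygote two.
    alt_freq = (n_het + 2 * n_alt_hom) / total_alleles
    return 1.0 - alt_freq, alt_freq


def minor_allele_frequency(n_ref_hom, n_het, n_alt_hom):
    """MAF is the frequency of whichever allele is less common."""
    return min(allele_frequencies(n_ref_hom, n_het, n_alt_hom))


if __name__ == "__main__":
    maf = minor_allele_frequency(4827, 5019, 1373)
    print(f"MAF = {maf:.3f}")
```

The result rounds to 0.35, in line with the allele frequency listed for this variant in Table 3.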

Terms Frequently Used in GWAS
(continued)
Linkage disequilibrium (LD)
◦ Associations between 2 alleles located near each other on a
chromosome, such that they are inherited together more
frequently than expected by chance.

Tag SNP
◦ A SNP which is in strong LD with other SNPs so that it can
serve as a proxy for these SNPs. So we only need to genotype
the Tag SNPs which can represent the variation of the gene(s).

Hardy-Weinberg equilibrium (HWE)
◦ The population distribution of two alleles (with frequencies p
and q) is stable from generation to generation.
◦ Genotypes occur at frequencies of p², 2pq, and q² for the major
allele homozygote, heterozygote, and minor allele homozygote,
respectively.
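
As a rough illustration of the HWE definition, the sketch below compares observed genotype counts with the expected p², 2pq, q² proportions using a 1-degree-of-freedom chi-square test. This is one common way to test HWE, not necessarily the test used in the studies discussed here; the function name is hypothetical.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test of Hardy-Weinberg proportions (1 degree of freedom).

    Returns (chi2, p_value) for observed counts of the two homozygotes
    (n_aa, n_bb) and the heterozygote (n_ab).
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotyped samples")
    p = (2 * n_aa + n_ab) / (2 * n)   # frequency of allele A
    q = 1.0 - p                       # frequency of allele B
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square with 1 df: erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


if __name__ == "__main__":
    print(hwe_chi_square(25, 50, 25))   # counts exactly at HWE proportions
    print(hwe_chi_square(50, 0, 50))    # no heterozygotes at all
```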
Other types of common DNA variation
• Deletion/Insertion
• Copy number variation

Single Nucleotide Polymorphism (SNP)
Person 1
TTCAGTCAGATCCTAGCCC
AAGTCAGTCTAGGATCGGG
Person 2
TTCAGTCAGATCCCAGCCC
AAGTCAGTCTAGGGTCGGG

Insertion/Deletion
Person 1
TTCAGTCAGATCCTAGCCC
AAGTCAGTCTAGGATCGGG
Person 2
TTCAGTCAGATCCCTAGCCC
AAGTCAGTCTAGGGATCGGG

Copy Number Variation (3 vs. 4 trinucleotide repeats)
Person 1
TTCACAGCAGCAGCAGAGCCC
AAGTGTCGTCGTCGTCTCGGG
Person 2
TTCACAGCAGCAGAGCCC
AAGTGTCGTCGTCTCGGG
Genotyping in GWA Studies
• GWA studies rely on LD information.
• Usually at least 1 SNP within a group of
SNPs in high LD (r² ≥ 0.8) will be
included on the platform.
• Genotyping platforms comprising 500,000
to 1,000,000 SNPs have been estimated
to capture:
◦ 67% to 89% of common SNP variation in
populations of European and Asian ancestry
◦ 46% to 66% in populations of African ancestry
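
The r² threshold above can be made concrete with a small sketch that computes r² from the four haplotype counts of two bi-allelic SNPs (alleles A/a and B/b). The function names and the example counts are illustrative assumptions; real pipelines estimate haplotype frequencies from unphased genotypes, which this sketch does not attempt.

```python
def ld_r_squared(n_AB, n_Ab, n_aB, n_ab):
    """r^2 between two bi-allelic loci from phased haplotype counts."""
    n = n_AB + n_Ab + n_aB + n_ab
    if n == 0:
        raise ValueError("no haplotypes")
    p_A = (n_AB + n_Ab) / n          # frequency of allele A at locus 1
    p_B = (n_AB + n_aB) / n          # frequency of allele B at locus 2
    d = n_AB / n - p_A * p_B         # disequilibrium coefficient D
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denom == 0:
        raise ValueError("at least one locus is monomorphic")
    return d * d / denom


def can_tag(r2, threshold=0.8):
    """True if one SNP is an acceptable proxy for the other at this threshold."""
    return r2 >= threshold


if __name__ == "__main__":
    r2 = ld_r_squared(48, 2, 2, 48)  # hypothetical, strongly linked pair
    print(round(r2, 3), can_tag(r2))
```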
Genotyping Platforms

Affymetrix
◦ Genome-Wide Human SNP Array 6.0
• More than 906,600 SNPs
• More than 946,000 copy number probes

Illumina
◦ HumanOmni1-Quad / Human1M-Duo
• ~1 million markers
◦ Human660W-Quad
• ~650,000 markers
◦ HumanCytoSNP-12
• ~300,000 markers
Quality Control in GWA Studies

Certain thresholds should be set to
ensure genotyping quality:
◦ the SNP call rate, typically > 95%
◦ the minor allele frequency, typically > 1%
◦ the absence of violations of Hardy-Weinberg equilibrium
◦ concordance rates in duplicate samples,
typically > 99.5%.
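
A minimal sketch of how the per-SNP thresholds above might be applied in code. The `SnpQC` record and `passes_qc` function are hypothetical names, and the HWE p-value cutoff of 1e-4 is an assumption chosen for illustration (the slide gives no number); duplicate concordance is a sample-level check and is left out.

```python
from dataclasses import dataclass


@dataclass
class SnpQC:
    """Per-SNP quality metrics (hypothetical record layout)."""
    snp_id: str
    call_rate: float   # fraction of samples with a genotype call
    maf: float         # minor allele frequency
    hwe_p: float       # p-value from a Hardy-Weinberg test


def passes_qc(snp, min_call_rate=0.95, min_maf=0.01, min_hwe_p=1e-4):
    """Apply the call-rate and MAF thresholds from the slide, plus an
    assumed HWE p-value cutoff."""
    return (
        snp.call_rate > min_call_rate
        and snp.maf > min_maf
        and snp.hwe_p >= min_hwe_p
    )


if __name__ == "__main__":
    snps = [
        SnpQC("snp_a", call_rate=0.99, maf=0.20, hwe_p=0.50),
        SnpQC("snp_b", call_rate=0.90, maf=0.20, hwe_p=0.50),   # low call rate
        SnpQC("snp_c", call_rate=0.99, maf=0.005, hwe_p=0.50),  # too rare
    ]
    print([s.snp_id for s in snps if passes_qc(s)])
```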
Study Design in GWA Studies

A multistage approach can reduce the amount of
genotyping required, without sacrificing power.

In stage 1, a full set of SNPs is genotyped in a fraction of
samples, and a p-value threshold is used to identify a subset
of SNPs with putative associations.

In the second and possibly third stages, the SNPs identified
from the first stage are re-tested in populations that are
larger or of a similar size.

The replication results can be used to distinguish the few
true-positive associations identified in stage 1 from the
many false-positive results that occur by chance.
Multistage Study Designs
Joel N. Hirschhorn & Mark J. Daly
Nature Reviews Genetics 6, 95-108, 2005
Examining GWAS DATA

Population stratification (population structure)
◦ Refers to confounding in genetic association studies caused by
genetic differences between cases and controls unrelated to
disease but due to sampling them from populations of different
ancestries.

Quantile-Quantile plot (Q-Q plot)
◦ In a GWAS, a Q-Q plot can be used to assess the magnitude of
population stratification.
◦ Observed association statistics (χ² or -log10 P values) are ranked
from smallest to largest on the y-axis and plotted against the
distribution expected under the null
hypothesis of no association on the x-axis.
◦ Deviations from the identity line suggest either a very highly
associated locus or significant differences in population structure.
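
The construction of a Q-Q plot described above can be sketched as follows. The plotting positions (i - 0.5)/n for the expected null p-values are one common convention and are an assumption here, as is the function name; the sketch only produces the (expected, observed) pairs and leaves the plotting to any charting tool.

```python
import math


def qq_points(p_values):
    """Return (expected, observed) -log10(p) pairs, smallest to largest.

    Expected values assume p-values are uniform on (0, 1) under the null
    hypothesis of no association.
    """
    n = len(p_values)
    if n == 0:
        raise ValueError("no p-values")
    if any(p <= 0 or p > 1 for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    observed = sorted(-math.log10(p) for p in p_values)
    expected = sorted(-math.log10((i - 0.5) / n) for i in range(1, n + 1))
    return list(zip(expected, observed))


if __name__ == "__main__":
    # Hypothetical p-values: points far above the identity line at the
    # upper end would suggest a strongly associated locus.
    for exp_val, obs_val in qq_points([0.9, 0.6, 0.4, 0.1, 1e-9]):
        print(f"{exp_val:.2f}\t{obs_val:.2f}")
```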
Hypothetical Quantile-Quantile Plots in Genome-wide Association Studies
(Pearson, T. A. et al. JAMA 2008;299:1335-1344.)
◦ The sharp deviation above an expected value of approximately 8 could be
due to a strong association of the disease with SNPs in a heavily
genotyped region.
◦ Inflation of observed statistics is more likely due to population
structure than disease susceptibility genes.
Analyzing GWA Studies

Different genetic models
◦ Additive model: each copy of the allele is assumed
to increase risk by the same amount.
◦ Dominant model: rare allele carriers compared to
homozygotes of the common allele
◦ Recessive model: homozygotes of the rare allele
compared to common allele carriers
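
The three genetic models differ only in how a genotype is coded before it enters the association test. A minimal sketch, assuming genotypes are stored as the number of rare-allele copies (0, 1, or 2); the function name is hypothetical.

```python
def encode_genotype(rare_allele_count, model):
    """Code a genotype (0, 1, or 2 copies of the rare allele) under a model."""
    if rare_allele_count not in (0, 1, 2):
        raise ValueError("genotype must be 0, 1, or 2 rare-allele copies")
    if model == "additive":
        # Each copy of the allele counts the same amount.
        return rare_allele_count
    if model == "dominant":
        # Rare-allele carriers vs. homozygotes of the common allele.
        return 1 if rare_allele_count >= 1 else 0
    if model == "recessive":
        # Homozygotes of the rare allele vs. common-allele carriers.
        return 1 if rare_allele_count == 2 else 0
    raise ValueError(f"unknown model: {model}")


if __name__ == "__main__":
    for model in ("additive", "dominant", "recessive"):
        print(model, [encode_genotype(g, model) for g in (0, 1, 2)])
```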

Correction for multiple comparisons
◦ Bonferroni correction has been the most
commonly used correction in GWAS:
◦ A threshold of P = 0.05/number of tests
performed
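
A short sketch of the Bonferroni threshold, using the 306,207 SNPs from the deCODE discovery stage described in the following slides as the number of tests. The function names are hypothetical.

```python
def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test significance threshold under Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def is_genome_wide_significant(p_value, n_tests, alpha=0.05):
    """True if a single-SNP p-value survives Bonferroni correction."""
    return p_value < bonferroni_threshold(n_tests, alpha)


if __name__ == "__main__":
    threshold = bonferroni_threshold(306_207)
    print(f"threshold = {threshold:.2e}")
    print(is_genome_wide_significant(5e-16, 306_207))
```

With 306,207 tests the threshold is on the order of 2 × 10⁻⁷, which is the significance level quoted for that study.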
Selected Genome-Wide Association Studies on Lung Cancer
deCODE Genetics, Iceland
Population: European descent
Outcome: smoking quantity
First Stage – discovery
• 10,995 Icelandic smokers
• Infinium HumanHap300 SNP chips (Illumina)
• 306,207 SNPs
• Significance threshold = 2 × 10⁻⁷ ≈ 0.05/306,207

• Identified allele T of rs1051730 as most strongly
associated with smoking quantity (P = 5 × 10⁻¹⁶)
• On chromosome 15q24
• Within the CHRNA3 gene, in a linkage
disequilibrium block containing CHRNA5 and
CHRNB4 (which encode nicotinic acetylcholine
receptors)
Second Stage – replication
• Genotyping rs1051730 in additional samples
(Centaurus)
• 523 smokers from Spain
• 1,375 smokers from The Netherlands
Third Stage – expansion
association with lung cancer and
peripheral arterial disease
• For lung cancer: three case-control studies from
Iceland, Spain, and The Netherlands
• For peripheral arterial disease: five case-control studies
from Iceland, New Zealand, Austria, Sweden, and Italy

*Adjusted for sex and year of birth.
IARC Central Europe lung cancer study
Population: European descent
Outcome: lung cancer
First Stage - discovery
• 1,926 lung cancer cases/2,522 controls
• Illumina Sentrix HumanHap300 BeadChip (Illumina)
• 310,023 SNPs (≈ 80% of common genomic
variation)
• Significance threshold = 5 × 10⁻⁷

• Identified two SNPs on chromosome 15q25:
◦ rs1051730 (P = 5 × 10⁻⁹); allelic ORadj, 1.30 (1.19-1.43)
◦ rs8034191 (P = 9 × 10⁻¹⁰); allelic ORadj, 1.32 (1.21-1.45)
Second Stage - expansion

Genotyping 34 additional 15q25 SNPs (Taqman):
◦ SNPs with an association P-value of < 10⁻⁶ from the Centre
d’Etude du Polymorphisme Humain Utah (CEU)
HapMap (using an imputation method)
◦ SNPs of CHRNA5 and CHRNA3 from previous
studies on nicotine dependence
◦ All non-synonymous SNPs in dbSNP from the six
genes within or near the association region

Findings:
◦ 23 showed evidence of association exceeding the
genome-wide significance level
◦ Strong LD region
Third Stage - confirmation


Genotyping rs8034191 (from the first panel) and
rs16969968 (from the second panel)
In five independent studies of lung cancer
◦ the European Prospective Investigation in Cancer and Nutrition
(EPIC) cohort study (781 cases/1,578 controls)
◦ the Beta-Carotene and Retinol Efficacy Trial (CARET) cohort
study (764 cases/1,515 controls)
◦ the Health Study of Nord-Trondelag (HUNT) and Tromso cohort
studies (235 cases/392 controls)
◦ the Liverpool lung cancer case-control study (403 cases/814
controls)
◦ the Toronto lung cancer case-control study (330 cases/453
controls)
Third Stage - replication
Findings:
•An increased risk for both heterozygous and homozygous
carriers was observed in all five replication samples.
•The two SNPs are in high LD (D’ = 1.00, r² = 0.92).
•An increased risk was also observed among non-smokers.
Fourth Stage – expansion
association with head and neck cancer

Genotyping rs8034191 in two separate studies of
head and neck cancer in Europe
◦ Five of the six countries in the original GWAS,
overlapping with the lung cancer controls (726
cases/694 controls)
◦ The ARCAGE study with eight countries in Europe
(1,536 cases/1,443 controls)

Findings:
◦ No association with HNC was observed.
◦ No evidence of an association with nicotine addiction.
MD Anderson
Population: European descent
Outcome: lung cancer
First Stage - discovery
• 1,154 lung cancer cases/1,137 controls (all smokers)
• Illumina HumanHap300 v1.1 BeadChips (Illumina)
• 315,860 SNPs
• Significance threshold = 4.9 × 10⁻⁵ (the least
significant result among the ten SNPs retained for
follow-up)

Choose 10 SNPs with the most significant
associations
Second Stage - replication

Genotyping 10 most significant associations
from the discovery phase (Taqman) in two
studies:
◦ Texas (711 cases/632 controls)
◦ UK (2,013 cases/3,062 controls)

Findings:
◦ Elevated risks were replicated for 2 of the 10 SNPs:
• rs1051730
• rs8034191
• The two SNPs are in high LD (r² > 0.8)
*Adjusted for age, sex, pack-years, and center
A Catalog of Published Genome-Wide Association Studies

http://www.genome.gov/26525384#1
GWA Replication Studies
Lung Cancer
Bladder Cancer
Head and Neck Cancer
• Manuscript submitted to JNCI
• Organizer: International Lung Cancer Consortium (ILCCO)
• Participants: 21 studies (11,645 cases/14,954 controls)
• 9 from North America
• 8 from Europe
• 4 from Asia
Table 1 – Summary of the participating studies from ILCCO
Columns: Ref.; Coordinating institute; Study location; Recruitment period; Eligibility; Control source; Cases*; Controls*

Coordinating institutes (Whites):
◦ MD Anderson Cancer Center
◦ Karmanos Cancer Institute (KCI), Wayne State University
◦ University of Hawaii
◦ Mayo Clinic
◦ Norris Cotton Cancer Center, Dartmouth Medical School (NELCS study)
◦ Penn State University College of Medicine
◦ University of California, Los Angeles (UCLA)
◦ University of California, San Francisco (UCSF)
◦ National Institute of Occupational Health
◦ University of Sheffield
◦ INSERM U794
◦ Helmholtz Center Munich
◦ University of Göttingen Medical School
◦ German Cancer Research Center (DKFZ) (three studies)
◦ University Hospital of Cologne
◦ Division of Medical Oncology, University Hospital
◦ Radboud University Nijmegen Medical Centre
◦ CHS National Cancer Control Center at Carmel Medical Center and Technion
◦ Samuel Lunenfeld Research Institute

Coordinating institutes (Asians):
◦ University of California, Los Angeles (UCLA)
◦ University of Hawaii
◦ Seoul National University
◦ National University of Singapore
◦ Aichi Cancer Center

Study locations (recruitment period, where recoverable):
◦ Texas, US; Michigan, US; Florida, US; France
◦ Hawaii, US (1992-1997); Minnesota, US (1997-2006); New Hampshire, US (2005-2008)
◦ California, US (1999-2004); San Francisco, US (1998-2003, 2005-2009)
◦ Norway (1986-2005); United Kingdom (2005-2009)
◦ Munich, Göttingen, Germany (2000-2008); Munich, Germany (1990-1998)
◦ Heidelberg, Germany (1997-2007); Heidelberg, Germany (1994-1998)
◦ Saarland, Germany (2000-2003); Cologne, Germany (2005-2008)
◦ Zaragoza, Spain (2006-2008); Netherlands (2008); Haifa, Israel (2007-2009)
◦ Ontario, Canada (1997-2002)
◦ Korea (2001-2008); Singapore (2005-2007); Japan (2000-2005)

Eligibility criteria varied by study (for example: only ever smokers; only women; current smokers or quit smoking <5 years; diagnosed under age 61 or reported family history; residents of Greater Toronto Area; age ranges such as 18-65, 18-75, 20-79, 26-79, or 50-75 years old).
Control sources: hospital, population, community, or population recruited through family.

Total: 11,645 cases / 14,954 controls
* Maximum number of cases and controls of European and Asian ethnic groups with DNA
SNPs selected

15q25
◦ Genes: CHRNA5 (cholinergic receptor, nicotinic, alpha 5); CHRNA3 (cholinergic receptor, nicotinic, alpha 3); LOC123688 (similar to RIKEN cDNA C630028N24 gene)
◦ Gene function: Nicotinic acetylcholine receptors are members of a superfamily of ligand-gated ion channels that mediate fast signal transmission at synapses. LOC123688 is a hypothetical gene.
SNP ID | Nucleotide change | Amino acid change | Rare allele freq* (Han Chinese) | Rare allele freq* (European)
rs16969968 | Ex5-54G>A | Asp398Asn | 0.03 (A) | 0.42 (A)
rs931794 | -31881G>A | | 0.31 (G) | 0.43 (G)
rs12914385 | IVS4-4117G>A | | 0.25 (A) | 0.43 (A)
rs1317286 | T>C | | 0.08 (C) | 0.41 (C)
rs8034191 | IVS2+256T>C | | 0.04 (C) | 0.43 (C)

6p
Gene symbol (name) | SNP ID | Nucleotide change | Rare allele freq* (Han Chinese) | Rare allele freq* (European)
NOL5BP (nucleolar protein 5B pseudogene) | rs4324798 | G>A | 0.00 (A) | 0.11 (A)
(Unknown) | rs2256543 | C>T | 0.17 (T) | 0.43 (T)

5p15
Gene symbol (name) | SNP ID | Nucleotide change | Rare allele freq* (Han Chinese) | Rare allele freq* (European)
CLPTM1L (cisplatin resistance related protein CRR9p) | rs402710 | IVS16+9G>A | 0.27 (A) | 0.33 (A)
TERT (telomerase reverse transcriptase) | rs2736100 | IVS2-3777G>T | 0.41 (G) | 0.45 (T)
◦ CLPTM1L is a predicted transmembrane protein that is expressed in a range of normal and malignant tissues including skin, lung, breast, ovary and cervix.
◦ TERT: the enzyme consists of a protein component with reverse transcriptase activity, encoded by this gene, and an RNA component which serves as a template for the telomere repeat.
*From HapMap
• For Caucasians: two variants in the 15q25 locus (rs8034191 and rs16969968),
two variants in 5p15 (rs402710 and rs2736100), and two variants in 6p21
(rs4324798 and rs2256543).
• For Asians: three additional variants were selected in the 15q25 region
(rs12914385, rs1317286 and rs931794) and the variants in 6p21 were not
genotyped based on their low prevalence in these populations
Methods

Study population
◦ The control group was frequency matched to cases on age and sex in
most of the studies.
◦ Some other studies also matched on ethnicity, residence or smoking
status
◦ Only Whites or Asians were included

Genotyping method
◦ Genotyping was performed locally using Taqman probes supplied
centrally from IARC
◦ Toronto and France studies used data from the Illumina HumanHap300
BeadChip
◦ German Multicentre and Saarland study used Sequenom’s iPLEX assay
◦ Spain and Netherlands studies used the Centaurus (Nanogen) platform
Methods (continued)
• Quality control
◦ 90 standard DNAs were used as inter-lab control
◦ The study was excluded from the analysis of a variant if more
than one discrepancy for that variant was found
◦ Average call rates per SNP: 97.1% to 99.6%
◦ No deviation from HWE was observed (P cutoff = 0.0005,
considering 100 independent tests)

Statistical analysis
◦ Unconditional logistic regression to estimate ORs and 95%
CIs
◦ Adjusted for age, sex, center, and smoking pack-years
◦ Cochran’s Q test for heterogeneity
◦ SAS 9.1 software was used
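
Cochran's Q test for heterogeneity, mentioned above, can be sketched from per-study odds ratios and their 95% confidence intervals. This is an illustrative inverse-variance calculation in Python, not the SAS code used in the study; the function name is hypothetical, and standard errors are recovered from the CI width on the log scale.

```python
import math


def cochran_q(estimates, z=1.96):
    """Cochran's Q from a list of (OR, lower_95, upper_95) tuples.

    Returns (Q, degrees_of_freedom, pooled_OR), where pooled_OR is the
    fixed-effect inverse-variance estimate. Under homogeneity, Q follows
    approximately a chi-square distribution with k - 1 degrees of freedom.
    """
    if len(estimates) < 2:
        raise ValueError("need at least two studies")
    log_ors, weights = [], []
    for odds_ratio, lower, upper in estimates:
        se = (math.log(upper) - math.log(lower)) / (2 * z)
        log_ors.append(math.log(odds_ratio))
        weights.append(1.0 / se ** 2)
    pooled = sum(w * b for w, b in zip(weights, log_ors)) / sum(weights)
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, log_ors))
    return q, len(estimates) - 1, math.exp(pooled)


if __name__ == "__main__":
    # Hypothetical per-study estimates.
    q, df, pooled_or = cochran_q([(1.30, 1.10, 1.54), (1.20, 1.05, 1.37),
                                  (1.45, 1.15, 1.83)])
    print(f"Q = {q:.2f} on {df} df, pooled OR = {pooled_or:.2f}")
```

A Q value close to its degrees of freedom suggests the study-specific estimates are compatible with a single common effect.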
Table 2 – Distribution of selected demographic variables by ethnic group
Variable | Whites cases, n (%) | Whites controls, n (%) | Asians cases, n (%) | Asians controls, n (%)
Sex
Male | 5,741 (57.7) | 7,325 (57.1) | 838 (49.5) | 902 (42.4)
Female | 4,210 (42.3) | 5,503 (42.9) | 856 (50.5) | 1,224 (57.6)
Age (years)
<50 | 1,252 (12.6) | 2,969 (23.1) | 182 (10.7) | 243 (11.4)
50-59 | 2,499 (25.1) | 3,443 (26.8) | 426 (25.1) | 492 (23.1)
60-69 | 3,273 (32.9) | 3,859 (30.1) | 565 (33.4) | 679 (31.9)
70-79 | 2,451 (24.6) | 2,310 (18.0) | 443 (26.2) | 612 (28.8)
≥80 | 476 (4.8) | 247 (1.9) | 78 (4.6) | 100 (4.7)
Smoking status
Never | 962 (9.7) | 4,136 (32.2) | 674 (39.8) | 1,270 (59.7)
Former smoker | 4,125 (41.4) | 4,491 (35.0) | 461 (27.2) | 470 (22.1)
Current smoker | 4,644 (46.7) | 3,173 (24.7) | 526 (31.1) | 308 (14.5)
Ex or current | 134 (1.3) | 455 (3.6) | 23 (1.4) | 20 (0.9)
Missing | 86 (0.9) | 573 (4.5) | 10 (0.6) | 58 (2.7)
Histology (cases only)
Adenocarcinoma | 3,892 (39.1) | | 329 (30.1) |
Squamous cell | 2,370 (23.8) | | 317 (29.0) |
Large cell | 413 (4.2) | | 96 (8.8) |
Small cell | 1,235 (12.4) | | 109 (10.0) |
Other or not specified | 2,041 (20.5) | | 243 (22.2) |
Total | 9,951 | 12,828 | 1,694 | 2,126
Table 3 – Summary estimates of the main effects of the selected variants in White and Asian ethnic groups
Variant | Risk allele | Allele freq. | Cases ref/het./hom. | Controls ref/het./hom. | Heterozygotes OR (95% CI) | Homozygotes OR (95% CI) | Per allele OR (95% CI) | p-trend | p-heterogeneity (by study)
Whites
Chr 15q25
rs16969968 | A | 0.35 | 3,371 / 4,523 / 1,484 | 4,827 / 5,019 / 1,373 | 1.33 (1.24-1.41) | 1.54 (1.41-1.69) | 1.26 (1.21-1.32) | 2×10⁻²⁶ | 0.09
rs8034191 | G | 0.35 | 2,586 / 3,488 / 1,185 | 4,036 / 4,256 / 1,171 | 1.33 (1.24-1.43) | 1.62 (1.47-1.79) | 1.29 (1.23-1.35) | 6×10⁻²⁵ | 0.15
Chr 5p15
rs2736100 | C | 0.51 | 1,878 / 4,526 / 2,722 | 2,853 / 5,817 / 3,142 | 1.16 (1.07-1.25) | 1.32 (1.21-1.43) | 1.15 (1.10-1.20) | 1×10⁻⁸ | 0.60
rs402710 | G | 0.65 | 873 / 3,847 / 4,140 | 1,115 / 4,178 / 3,905 | 1.16 (1.04-1.28) | 1.30 (1.18-1.45) | 1.14 (1.09-1.19) | 5×10⁻¹⁰ | 0.73
Chr 6p
rs2256543 | A | 0.43 | 2,898 / 4,519 / 1,803 | 3,860 / 5,813 / 2,260 | 1.03 (0.96-1.10) | 1.07 (0.98-1.16) | 1.03 (0.99-1.08) | 0.14 | 0.92
rs4324798 | A | 0.08 | 8,066 / 1,630 / 111 | 10,580 / 1,911 / 99 | 1.04 (0.96-1.12) | 1.39 (1.04-1.87) | 1.07 (0.99-1.14) | 0.07 | 0.11
Asians
Chr 15q25
rs16969968 | A | 0.03 | 1,591 / 98 / 2 | 1,986 / 125 / 5 | 0.98 (0.75-1.30) | 0.44 (0.08-2.31) | 0.94 (0.73-1.23) | 0.67 | 0.07
rs8034191 | G | 0.03 | 1,583 / 104 / 3 | 1,992 / 122 / 3 | 1.06 (0.81-1.40) | 1.06 (0.21-5.36) | 1.06 (0.82-1.37) | 0.66 | 0.09
rs12914385 | T | 0.30 | 728 / 647 / 148 | 584 / 762 / 177 | 1.05 (0.91-1.21) | 1.04 (0.81-1.32) | 1.03 (0.93-1.14) | 0.58 | 0.10
rs1317286 | G | 0.10 | 1,223 / 291 / 13 | 1,521 / 313 / 22 | 1.18 (0.99-1.41) | 0.73 (0.36-1.47) | 1.10 (0.94-1.30) | 0.23 | 0.10
rs931794 | G | 0.37 | 591 / 721 / 213 | 764 / 828 / 264 | 1.12 (0.96-1.29) | 1.01 (0.82-1.25) | 1.03 (0.93-1.14) | 0.54 | 0.10
Chr 5p15
rs2736100 | C | 0.39 | 538 / 836 / 312 | 775 / 1,014 / 312 | 1.24 (1.07-1.43) | 1.51 (1.24-1.83) | 1.23 (1.12-1.35) | 2×10⁻⁵ | 0.32
rs402710 | G | 0.68 | 144 / 694 / 842 | 219 / 917 / 981 | 1.15 (0.91-1.46) | 1.32 (1.05-1.66) | 1.15 (1.04-1.27) | 0.007 | 0.22
Ref.: reference class; Het.: heterozygote; Hom.: homozygote for the risk allele
Risk allele frequencies are calculated among controls
ORs are adjusted for age, sex, and study
•Among whites:
•Increased risk was observed for two SNPs on Chr15, and
two SNPs on Chr5
•Among Asians:
•Increased risk was observed for two SNPs on Chr5
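
To show where an odds ratio of this kind comes from, the sketch below computes a crude (unadjusted) OR with a Woolf-type 95% confidence interval from a 2×2 table. It uses the rs16969968 heterozygote and reference counts for Whites from Table 3; because the published ORs are adjusted for age, sex, and study, the crude value is expected to differ somewhat. The function name is hypothetical.

```python
import math


def crude_odds_ratio(case_exposed, case_ref, control_exposed, control_ref, z=1.96):
    """Crude OR and Woolf (log-based) confidence interval from a 2x2 table."""
    cells = (case_exposed, case_ref, control_exposed, control_ref)
    if any(c <= 0 for c in cells):
        raise ValueError("all four cells must be positive")
    odds_ratio = (case_exposed * control_ref) / (case_ref * control_exposed)
    se_log_or = math.sqrt(sum(1.0 / c for c in cells))
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


if __name__ == "__main__":
    # Heterozygotes vs. reference genotype, rs16969968, Whites (Table 3).
    or_, lo, hi = crude_odds_ratio(4523, 3371, 5019, 4827)
    print(f"crude OR = {or_:.2f} ({lo:.2f}-{hi:.2f})")
```

The crude estimate is about 1.29, close to the adjusted heterozygote OR of 1.33 reported in Table 3.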
Figure 1 – Stratified analysis for rs16969968 (Chr 15) in Whites
CHRNA5 (rs16969968); forest plot, x-axis: OR
Stratum | Ca | Co | OR | 95% CI
Heterozygous | 4523 | 5019 | 1.33 | 1.24-1.41
Homozygous | 1484 | 1373 | 1.54 | 1.41-1.69
Co-dominant | 9378 | 11219 | 1.26 | 1.21-1.32
By histology (p-heterogeneity=0.38)
Adenocarcinomas | 3776 | 11219 | 1.22 | 1.15-1.29
Squamous | 2128 | 11219 | 1.31 | 1.23-1.41
Large cell | 384 | 10896 | 1.23 | 1.06-1.43
Small cell | 1106 | 10597 | 1.21 | 1.10-1.33
By smoking status (p-heterogeneity=0.0001)
Never smokers | 922 | 3706 | 1.02 | 0.91-1.14
Former smokers | 3994 | 4181 | 1.27 | 1.18-1.37
Current smokers | 4277 | 2875 | 1.39 | 1.29-1.50
By packyears (p-heterogeneity=0.75)
>0-<10 packyears | 453 | 1591 | 1.19 | 0.99-1.43
10-<20 packyears | 771 | 1271 | 1.22 | 1.05-1.42
20-<30 packyears | 1076 | 1165 | 1.22 | 1.07-1.39
30-<40 packyears | 1373 | 1057 | 1.28 | 1.13-1.46
40-<50 packyears | 1257 | 691 | 1.11 | 0.97-1.29
50-<60 packyears | 919 | 400 | 1.29 | 1.07-1.56
>=60 packyears | 2069 | 693 | 1.14 | 1.00-1.30
By age (p-trend=0.002)
<50 | 1177 | 2323 | 1.49 | 1.34-1.66
50-60 | 2356 | 3084 | 1.31 | 1.21-1.42
60-70 | 3095 | 3530 | 1.19 | 1.11-1.29
>=70 | 2750 | 2282 | 1.16 | 1.06-1.27
By gender (p-heterogeneity=0.88)
Men | 5264 | 6445 | 1.26 | 1.19-1.33
Women | 4114 | 4774 | 1.27 | 1.19-1.35
•No association among never smokers
•Stronger associations in current
smokers than in former smokers
Table 4 – Association between rs16969968 and smoking intensity expressed in cigarettes per day in
White ethnic group. Means are adjusted by age, sex, study and case/control status when appropriate.
rs16969968 (CHRNA5) | All: n | mean (95% CI) | Controls: n | mean (95% CI) | Cases: n | mean (95% CI)
GG | 5425 | 20.74 (20.36-21.12) | 2610 | 17.99 (17.45-18.53) | 2815 | 22.68 (22.14-23.22)
GA | 6597 | 21.85 (21.49-22.20) | 2701 | 19.20 (18.67-19.78) | 3896 | 23.70 (23.22-24.18)
AA | 2039 | 23.48 (22.92-24.04) | 746 | 20.56 (19.68-21.44) | 1293 | 25.56 (24.84-26.28)
p-trend | | 7×10⁻¹⁹ | | 6×10⁻⁹ | | 5×10⁻¹²
• The mean of cigarettes per day was higher among
homozygous carriers for the risk allele compared to carriers
of the common allele.
Figure 2 – Stratified analysis for rs2736100 and rs402710 (Chr 5) in Whites and Asians
TERT (rs2736100); forest plots, x-axis: OR
Stratum | Whites: Ca | Co | OR (95% CI) | Asians: Ca | Co | OR (95% CI)
Heterozygous | 4526 | 5817 | 1.16 (1.07-1.25) | 836 | 1014 | 1.24 (1.07-1.43)
Homozygous | 2722 | 3142 | 1.32 (1.21-1.43) | 312 | 312 | 1.51 (1.24-1.83)
Co-dominant | 9162 | 11812 | 1.15 (1.10-1.20) | 1686 | 2101 | 1.23 (1.12-1.35)
By histology (p-heterogeneity: Whites 0.0002, Asians 0.01)
Adenocarcinomas | 3551 | 11666 | 1.20 (1.13-1.27) | 927 | 2101 | 1.32 (1.18-1.48)
Squamous | 2162 | 11812 | 1.06 (0.99-1.13) | 317 | 2101 | 0.93 (0.78-1.12)
Large cell | 405 | 11344 | 1.33 (1.15-1.54) | 94 | 2101 | 1.17 (0.87-1.59)
Small cell | 1205 | 11733 | 1.00 (0.92-1.09) | 109 | 2101 | 1.00 (0.75-1.33)
By smoking status (p-heterogeneity: Whites 0.44, Asians 0.75)
Never smokers | 934 | 3972 | 1.22 (1.09-1.35) | 671 | 1264 | 1.27 (1.10-1.46)
Former smokers | 3699 | 4095 | 1.14 (1.07-1.23) | 458 | 454 | 1.24 (1.02-1.51)
Current smokers | 4309 | 2760 | 1.12 (1.04-1.20) | 524 | 305 | 1.15 (0.93-1.42)
By age (p-heterogeneity: Whites 0.13, Asians 0.63)
<50 | 1192 | 2708 | 1.18 (1.06-1.31) | 181 | 242 | 1.24 (0.92-1.67)
50-60 | 2327 | 3234 | 1.24 (1.15-1.35) | 423 | 487 | 1.20 (0.99-1.44)
60-70 | 2986 | 3536 | 1.11 (1.04-1.20) | 562 | 674 | 1.34 (1.14-1.59)
>=70 | 2657 | 2334 | 1.09 (1.00-1.20) | 520 | 698 | 1.15 (0.98-1.37)
By gender (p-heterogeneity: Whites 0.02, Asians 0.03)
Men | 5288 | 6758 | 1.10 (1.05-1.17) | 834 | 886 | 1.10 (0.96-1.26)
Women | 3874 | 5054 | 1.22 (1.14-1.30) | 852 | 1215 | 1.35 (1.19-1.54)
• More important in adenocarcinomas and large cell
carcinomas.
•Stronger association in women (no heterogeneity by gender
was observed in adenocarcinoma analysis only)
Figure 2 – Stratified analysis for rs2736100 and rs402710 (Chr 5) in Whites and Asians
(continued)
CLPTM1L (rs402710); forest plots, x-axis: OR
Stratum | Whites: Ca | Co | OR (95% CI) | Asians: Ca | Co | OR (95% CI)
Heterozygous | 3847 | 4178 | 1.16 (1.04-1.28) | 694 | 917 | 1.15 (0.91-1.46)
Homozygous | 4140 | 3905 | 1.30 (1.18-1.45) | 842 | 981 | 1.32 (1.05-1.66)
Co-dominant | 8860 | 9198 | 1.14 (1.09-1.19) | 1680 | 2117 | 1.15 (1.04-1.27)
By histology (p-heterogeneity: Whites 0.03, Asians 0.58)
Adenocarcinomas | 3583 | 9052 | 1.18 (1.11-1.26) | 921 | 2117 | 1.20 (1.06-1.36)
Squamous | 2004 | 9198 | 1.15 (1.07-1.25) | 316 | 2117 | 1.15 (0.95-1.39)
Large cell | 364 | 8731 | 1.21 (1.03-1.42) | 95 | 2117 | 1.07 (0.78-1.46)
Small cell | 1102 | 8571 | 1.00 (0.91-1.10) | 109 | 2117 | 0.98 (0.73-1.31)
By smoking status (p-heterogeneity: Whites 0.27, Asians 0.67)
Never smokers | 845 | 3070 | 1.15 (1.01-1.30) | 663 | 1263 | 1.08 (0.92-1.25)
Former smokers | 3617 | 3155 | 1.20 (1.11-1.30) | 460 | 469 | 1.20 (0.98-1.47)
Current smokers | 4217 | 2547 | 1.09 (1.01-1.18) | 525 | 307 | 1.07 (0.87-1.33)
By age (p-heterogeneity: Whites 0.51, Asians 0.61)
<50 | 1144 | 2108 | 1.12 (1.00-1.25) | 182 | 241 | 1.01 (0.75-1.35)
50-60 | 2202 | 2642 | 1.21 (1.10-1.32) | 420 | 491 | 1.19 (0.97-1.45)
60-70 | 2910 | 2783 | 1.11 (1.02-1.20) | 562 | 676 | 1.24 (1.04-1.48)
>=70 | 2604 | 1665 | 1.13 (1.02-1.24) | 516 | 709 | 1.09 (0.92-1.31)
By gender (p-heterogeneity: Whites 0.85, Asians 0.92)
Men | 4977 | 5472 | 1.13 (1.07-1.20) | 836 | 899 | 1.15 (1.00-1.32)
Women | 3883 | 3726 | 1.14 (1.07-1.23) | 844 | 1218 | 1.14 (0.99-1.31)
• Heterogeneity by histology observed in Whites only
Table 6 – Association between risk of lung cancer and combined genotypes of
rs402710, rs2736100 and rs16969968 in Whites
Number of risk alleles | ca | co | OR | 95% CI | p-value
All | 7964 | 8212 | | |
0 | 95 | 153 | 1.00 | reference |
1 | 551 | 765 | 1.16 | 0.87-1.55 | 0.32
2 | 1538 | 1856 | 1.30 | 0.98-1.71 | 0.06
3 | 2364 | 2458 | 1.53 | 1.16-2.01 | 0.003
4 | 2097 | 1955 | 1.72 | 1.30-2.26 | 1×10⁻⁴
5 | 1099 | 883 | 1.98 | 1.49-2.63 | 2×10⁻⁶
6 | 220 | 142 | 2.64 | 1.86-3.74 | 4×10⁻⁸
Per risk allele | | | 1.15 | 1.12-1.18 | 1×10⁻²⁶
• An OR of 2.64 was found for homozygous carriers of the
three risk variants compared to individuals with no risk allele.
Conclusion


The largest pools of independent studies not
included in previous GWAS.
For 15q25
◦ Replicated the results from GWAS in Whites.
◦ Expanded to Asians with no association.

For 5p15
◦ Confirmed the results in Whites.
◦ Reported an association in Asians.

For 6p21
◦ Results were not replicated.
• Nature Genetics, 41, 991 - 995 (2009).
• MD Anderson
• European descent
• Bladder Cancer
First Stage - discovery
• 969 bladder cancer cases/957 controls
• Illumina HumanHap610 BeadChip (Illumina)
• 556,426 SNPs

• None of the SNPs reached genome-wide
significance.
• After removing highly linked SNPs, three SNPs
had a P-value < 10⁻⁵ and 50 SNPs showed a P-value < 10⁻⁴.
No evidence for inflation of the chi-squared test statistics
(none of the SNPs reached genome-wide significance)
Second Stage - replication

Genotyping the top 50 SNPs and the top 10
additional SNPs in 8q24 in 3 additional US
sites:
◦ New Hampshire (800 cases/912 controls)
◦ Texas (764 cases/2,807 controls)
◦ MSKCC (149 cases/152 controls)
• One SNP, rs2294008, showed consistent
results in the discovery and replication phases.
• 9 additional European populations were used
to replicate this SNP.

Overall allelic OR, 1.15 (95% CI, 1.10-1.20).
No heterogeneity between populations was observed.
Similar associations were observed across different
strata of gender, smoking status, and age.
Third Stage - expansion


rs2294008 is located in exon 1 of the PSCA gene
(prostate stem cell antigen), which is upregulated
in most bladder tumors.
Resequenced the genomic region of PSCA in 106
individuals of European ancestry.
◦ 27 SNPs were identified.
◦ All of the high frequency SNPs are in strong LD with
rs2294008.
◦ 7 of the SNPs were genotyped in the discovery set and
identical ORs compared to rs2294008 were observed.
Fourth Stage - expansion

Bladder cancer cell line study of rs2294008
◦ the T allele-containing haplotypes showed significantly lower
promoter activity
◦ substitution of C to T significantly reduced promoter
activity
◦ substitution of T to C increased promoter activity
◦ rs2294008 is a functional variant in vitro

Paradoxical
◦ T allele reduces the transcriptional activity of the PSCA
promoter
◦ PSCA has been shown to be overexpressed in bladder
tumors
◦ The functional consequence of the T allele in vivo is still
unclear
The following slides were presented by Dr. McKay at the
2009 INHANCE meeting, Paris
Total sample numbers: Genotypes (replication)
Updated: 06/26/09
Site | Ca | Co
Bremen | 163 | 189
South America | 1228 | 1076
Rome | 235 | 222
Seattle | 208 | 413
UNC | 1288 | 1362
Penn State | 429 | 685
UCLA | 319 | 934
ORC | 477 | 487
Brown | 568 | 651
Pittsburgh | 633 | 793
Netherlands | 454 | 304
Total replicates | 6002 | 7116
Discovery phase: 4429 ca/5996 co; Replication: 5322 ca/6218 co
marker | segment | GeneSymbol | coding_status | Reason | Discovery OR (95% CI) | Discovery P | Replication OR (95% CI) | Replication P
rs1573496 | 4q | ADH7 | | p_all<1x10-5 | 0.70 (0.62-0.79) | 8E-09 | 0.78 (0.71-0.85) | 6E-08
rs7431530 | 3p2 | RBMS3 | | p_all<1x10-5 | 0.81 (0.74-0.88) | 2E-06 | 0.94 (0.88-1.00) | 0.03
rs4767364 | 12q24.13a | FLJ13089 | SYNON | p_all<1x10-5 | 1.21 (1.12-1.32) | 2E-06 | 1.12 (1.04-1.19) | 0.001
rs10801805 | 1p | ZNF326 | | p_all<1x10-5 | 1.20 (1.11-1.29) | 3E-06 | 1.03 (0.97-1.09) | 0.29
rs2287802 | 19p | COL5A3 | | p_all<1x10-5 | 1.19 (1.11-1.28) | 4E-06 | 1.02 (0.96-1.08) | 0.46
rs4799863 | 18q1 | FHOD3 | | p_all<1x10-5 | 0.84 (0.78-0.91) | 5E-06 | 0.95 (0.90-1.00) | 0.06
rs11067362 | 12q24.21b | TBX3 | | p_all<1x10-5 | 1.35 (1.19-1.54) | 6E-06 | 1.01 (0.92-1.10) | 0.89
rs2299851 | 6p | MSH5 | | p_all<1x10-5 | 0.72 (0.62-0.83) | 6E-06 | 1.09 (0.98-1.22) | 0.10
rs1431918 | 8q | ASPH | | p_all<1x10-5 | 1.19 (1.10-1.28) | 7E-06 | 1.04 (0.98-1.10) | 0.18
rs7924284 | 10q | CWF19L1 | | p_all<1x10-5 | 1.38 (1.20-1.59) | 8E-06 | 0.97 (0.88-1.07) | 0.59
rs2517452 | 6p | C6orf15 | | p_oral<5x10-7 | 0.86 (0.80-0.93) | 8E-05 | 1.01 (0.94-1.08) | 0.75
rs16837730 | 1p3 | OPRD1 | | P_heavy<5x10-7 | 1.35 (1.14-1.60) | 4E-04 | 1.11 (0.98-1.25) | 0.11
rs1041973 | 2q | IL1RL1 | NONSYN | NONSYN p<1x10-4 | 0.83 (0.76-0.90) | 3E-05 | 0.93 (0.87-0.99) | 0.01
rs3810481 | 20q13 | PRIC285 | NONSYN | NONSYN p<1x10-4 | 1.22 (1.11-1.34) | 6E-05 | 1.06 (0.98-1.15) | 0.14
rs2012199 | 1q | FCRL5 | NONSYN | NONSYN p<1x10-4 | 1.24 (1.11-1.38) | 1E-04 | 1.07 (0.99-1.16) | 0.10
rs484870 | 19p | FLJ35784 | NONSYN | NONSYN p<1x10-4 | 1.16 (1.08-1.26) | 1E-04 | 0.98 (0.93-1.04) | 0.59
rs1494961 | 4q | HEL308 | NONSYN | NONSYN p<1x10-4 | 1.15 (1.07-1.24) | 1E-04 | 1.11 (1.05-1.17) | 0.0001
Additional SNPs genotyped, but not selected from GWAS study
rs1229984 | 4q | ADH1B | NONSYN | ADH1B | 0.49 (0.40-0.60) | 2E-12 | 0.68 (0.60-0.78) | 8E-09
rs16969968 | 15q25 | CHRN | NONSYN | Lung cancer smoking variant | 1.04 (0.96-1.13) | 0.351 | 1.09 (1.03-1.15) | 0.002
Alcohol dehydrogenase 7
G → C substitution
Gly → Ala
MAF: Caucasian 0.09
Asian 0-0.5 (?)
RNA binding motif, single
stranded interacting
protein
C → T substitution
Intron
MAF: Caucasian 0.28
Asian 0.41
chromosome 12 open
reading frame 30
A → G substitution
Intron
MAF: Caucasian 0.65
Asian 0.03
interleukin 1 receptor-like 1
C → A substitution
Ala → Glu
MAF: Caucasian 0.13
Asian 0.17
helicase, POLQ-like
G → A substitution
Val → Ile
MAF: Caucasian 0.48
Asian 0.23
alcohol dehydrogenase 1B
A → G substitution
His → Arg
MAF: Caucasian 0.99
Asian 0.23
Nicotinic acetylcholine
receptors
G→ A substitution
Asp → Asn
MAF: Caucasian 0.42
Asian 0.03
Limitations in GWAS
• Potential false-positive (and false-negative) results.
• Lack of information on gene function.
• Insensitivity to rare variants.
• Cannot assay insertion/deletion variants.
• Requirement of large sample sizes.
• Bias due to case and control selection.
• Findings are many steps away from actual
clinical application.

What have we learned from GWAS?
• International collaboration.
• Don’t give up on negative results.
• Be an active thinker; explore all
possibilities.

What’s next after GWAS?
Thank you!!
References
Thorgeirsson TE, Geller F, Sulem P, Rafnar T, Wiste A, Magnusson KP, Manolescu A,
Thorleifsson G, Stefansson H, Ingason A, Stacey SN, Bergthorsson JT, Thorlacius S,
Gudmundsson J, Jonsson T, Jakobsdottir M, Saemundsdottir J, Olafsdottir O, Gudmundsson LJ,
Bjornsdottir G, Kristjansson K, Skuladottir H, Isaksson HJ, Gudbjartsson T, Jones GT, Mueller
T, Gottsäter A, Flex A, Aben KK, de Vegt F, Mulders PF, Isla D, Vidal MJ, Asin L, Saez B, Murillo L,
Blondal T, Kolbeinsson H, Stefansson JG, Hansdottir I, Runarsdottir V, Pola R, Lindblad B, van
Rij AM, Dieplinger B, Haltmayer M, Mayordomo JI, Kiemeney LA, Matthiasson SE, Oskarsson
H, Tyrfingsson T, Gudbjartsson DF, Gulcher JR, Jonsson S, Thorsteinsdottir U, Kong A,
Stefansson K. A variant associated with nicotine dependence, lung cancer and peripheral
arterial disease. Nature. 2008 Apr 3;452(7187):638-42.

Hung RJ, McKay JD, Gaborieau V, Boffetta P, Hashibe M, Zaridze D, Mukeria A, Szeszenia-Dabrowska N, Lissowska J, Rudnai P, Fabianova E, Mates D, Bencko V, Foretova L, Janout V,
Chen C, Goodman G, Field JK, Liloglou T, Xinarianos G, Cassidy A, McLaughlin J, Liu G, Narod
S, Krokan HE, Skorpen F, Elvestad MB, Hveem K, Vatten L, Linseisen J, Clavel-Chapelon F, Vineis
P, Bueno-de-Mesquita HB, Lund E, Martinez C, Bingham S, Rasmuson T, Hainaut P, Riboli E,
Ahrens W, Benhamou S, Lagiou P, Trichopoulos D, Holcátová I, Merletti F, Kjaerheim K, Agudo
A, Macfarlane G, Talamini R, Simonato L, Lowry R, Conway DI, Znaor A, Healy C, Zelenika D,
Boland A, Delepine M, Foglio M, Lechner D, Matsuda F, Blanche H, Gut I, Heath S, Lathrop M,
Brennan P. A susceptibility locus for lung cancer maps to nicotinic acetylcholine receptor
subunit genes on 15q25. Nature. 2008 Apr 3;452(7187):633-7.

Amos CI, Wu X, Broderick P, Gorlov IP, Gu J, Eisen T, Dong Q, Zhang Q, Gu X,
Vijayakrishnan J, Sullivan K, Matakidou A, Wang Y, Mills G, Doheny K, Tsai YY, Chen
WV, Shete S, Spitz MR, Houlston RS. Genome-wide association scan of tag SNPs
identifies a susceptibility locus for lung cancer at 15q25.1. Nat Genet. 2008
May;40(5):616-22. Epub 2008 Apr 2.

Wu X, Ye Y, Kiemeney LA, Sulem P, Rafnar T, Matullo G, Seminara D, Yoshida T, Saeki
N, Andrew AS, Dinney CP, Czerniak B, Zhang ZF, Kiltie AE, Bishop DT, Vineis P, Porru
S, Buntinx F, Kellen E, Zeegers MP, Kumar R, Rudnai P, Gurzau E, Koppova K,
Mayordomo JI, Sanchez M, Saez B, Lindblom A, de Verdier P, Steineck G, Mills GB,
Schned A, Guarrera S, Polidoro S, Chang SC, Lin J, Chang DW, Hale KS, Majewski T,
Grossman HB, Thorlacius S, Thorsteinsdottir U, Aben KK, Witjes JA, Stefansson K,
Amos CI, Karagas MR, Gu J. Genetic variation in the prostate stem cell antigen gene
PSCA confers susceptibility to urinary bladder cancer. Nat Genet. 2009
Sep;41(9):991-5. Epub 2009 Aug 2.

Pearson TA, Manolio TA. How to interpret a genome-wide association study. JAMA.
2008 Mar 19;299(11):1335-44. Erratum in: JAMA. 2008 May 14;299(18):2150.