Genome Wide Association Studies
Genome-Wide Association Studies
Weikuan Gu, PhD
Professor
Director of Translational Research
Department of Orthopaedic Surgery & Biomedical Engineering
Director, Gene Discovery Core
Center of Genomic and Bioinformatics & Center of Disease of Connective
Tissues
Special Assistance to the Dean for International Collaboration
College of Medicine
Email: [email protected]
Webpage: http://www.genediscovery.org/
The DNA Discovery Core
Dr. Weikuan Gu, Director (http://www.genediscovery.org)
(Dept Orthopaedic Surgery and Biomedical Engineering, Pathology)
This Core has three elements:
• Service: The core will provide microarray analysis, genome screening,
and polymorphism detection for research and education at UT and other
institutions in Tennessee. Meanwhile, the core will continue to develop
and modify protocols for faster processing and lower costs.
• Research: The core serves as a resource for the development of
research projects that involve gene profiling, genetic mapping, fine
mapping, genome screening, DNA sequencing, and positional cloning.
• Education: The core serves as an education and training base for
genetic and genomic analyses. Technique training currently includes
genomic analysis and genome comparison, genome screening, simple
sequence repeat length polymorphism (SSLP) analysis, single-nucleotide
polymorphism (SNP) detection, and DNA sequencing.
[Figure: Published GWAS Reports, 2005 – 6/2011 (951 in total). Y-axis: total
number of publications; x-axis: calendar quarter. Slide note: "Through
9/30/10 postings".]
What is GWAS: A genome-wide association study, also known as a
whole genome association study (WGA study, or WGAS), is an
examination of a large number of common genetic variants in
different individuals to see if any variant is associated with a
trait. GWAS typically focus on associations between
single-nucleotide polymorphisms (SNPs) and traits like major
diseases.
Genetic variants (Common)
Single-nucleotide polymorphisms
Traits
Associations
Large number common variants
Different individuals
1. Genetic variants: (1) a single-nucleotide polymorphism (SNP); (2) a
mutation, in case it is a rare genetic variant (single- or multiple-nucleotide
mutation, deletion, insertion, inversion, etc.); (3) a copy-number variation.
Common genetic variants used in GWAS: single-nucleotide polymorphisms
2. Single-nucleotide polymorphism: DNA sequence variation
occurring when a single nucleotide — A, T, C or G — in the genome
(or other shared sequence) differs between members of a biological
species or paired chromosomes in an individual
3. Traits: A trait is a distinct variant of a phenotypic character of an
organism that may be inherited, be environmentally determined or
be a combination of the two.
There are phenotypic and genetic traits.
A phenotypic trait is an obvious and observable trait; it is the
expression of genes in an observable way
4. Associations: the co-occurrence or linkage of two or more traits, of
which at least one trait is known to be genetic. The co-occurrence or
linkage cannot be explained by chance. Association does not explain
the mechanism underlying a cause-and-effect relationship.
Wisdom of History-All start with the gene
Positional cloning from mouse disease model: Giant axonopathy (gaxp)
Single gene mutation - single locus - discrete phenotype
The trouble caused by multiple genes
[Figure: bar charts of phenotype value by genotype. Panels: dominant vs.
recessive; single locus (AA, Aa, aa); two loci (AABB, AAB-, A-B-, A-BB, aaaa).]
Chr/Mouse | Position (cM) | LOD/P-value* | Genomic sequence** (bp) | Chr/human | Human homologous*** | References
1 | 82-106 | 8.8 | 159161492-195694164 | 1 | 1q21-25; 1q42-q44 | 15, 16, 17
1 | 64-82 | 24 | 123137140-159201852 | 2, 1 | 2q37-q21; 1q25-32 | 18
2 | 2.2-23 | 4.05 | 3374309-35049497 | 10, 9, 2 | 10p15-p13; 9q34-q34.1; 2q14.2 | 19
2 | 35-45 | 3.5 | 62721162-67146921 | 2 | 2q24-q32 |
2 | 45-55 | p=0.002 | 67146921-104378449 | 11, 2 | 11p13-p12; 2q31-2q32 | 20, 16, 21
4 | 35-54 | 16.3 | 61124566-113400698 | 9, 1 | 9p24-p21; 1p31-34 | 18
4 | 60-80 | 9 | 130853280-154299058 | 1 | 1p33-p36.3 | 16, 17
5 | 25-36 | 3.6 | 42713374-71142400 | 4 | 4p16.3-p14 | 15
6
7
51.0
11-13
4.56
P=0.001
~117348206
25522212-29345204
3
19
3p26.2-p24; 3q21-q24;
19q13-p13.1
18
19
7
11
44
31.0
P=0.0007
6.76
~79815461
~51857154
15, 11,
5, 17
20, 17
18
11
49-60
5.4
87424373-100790282
17
15q23-q26; 11q13-q21
5q21-q23.3; 5q31-q35;
17p11-p13
17q12-q24
11
58-72
10.8
97586515-117908416
17
17q11.2-q25.3
22
13
0-10
6.1
0-21072268
1, 7, 6
15
13
35
7.73
~54537729
5, 9
1q42-q43; 7p15-13; 6p23p21
5q31-q35.3; 9q21.3-q22.3
13
13
22
10-30
5.5
5.8
~40945185
21072268-63274779
6
7, 9, 6, 17
19
22, 23
14
40
4.3
~63201604
8, 13
6p24-p22.3;
7p15-p13; 9q21-q22;
6p25.1.-p21; 17q22-q24
8p21-p11.2; 13q14.1-q21.1
14
15
2.0
24-55
P=0.0007
3.2
~5168925
59453579 -100348551
10, 3
8, 12, 22
20
15
16
27.6
4.07
~38896865
3
16
9-28.4
P=0.01
10q22.1-q23; 3p21-p14
8q21-q24.3; 12q12-14;
22q13-q13.3
3q12-q13.2; 3q21-q23;
3q28-q29
8q11-q11.2; 22q11-q11.2;
3q21-q29
5q21-q23; 5q31-q32
N/A
18
Total: 15
24
13.67
N/A
Chr/human Human homologous***
s
8, 22, 3
15488977- 38208380
~45399525
5
References
19
19, 16, 17
18
18
13
14
13
N/A
Single gene mutation: when changes in a genome are restricted to a
single gene, a single-gene defect ensues. Usually this involves a
point mutation and leads to an altered amino acid sequence in the
protein that is coded in this section of the DNA.
-Autosomal dominant inheritance
-Autosomal recessive inheritance
-X chromosomal inheritance
Complex traits are those that are influenced by more than one
factor. The factors can be genetic or environmental. This is in
contrast to simple genetic traits, whose variations are controlled
by variations in single genes.
-Each factor contributes a small portion of a trait,
-Genetic factors are influenced by environment,
-Multiple genes each contribute small portions of a trait,
-Therefore the causal genes are usually not defective; instead, small
changes occur in either their expression or function.
QTL mapping has been used to locate genetic factors for
complex or quantitative traits.
Strategy for identification of genetic factors for complex traits:
Quantitative trait loci (QTL) mapping to determine the chromosome
regions of genetic factors of quantitative traits. Mapping regions of the
genome that contain genes involved in specifying a quantitative trait is
done using molecular tags such as AFLP or microsatellite markers or,
more commonly, SNPs.
Candidate gene evaluation: A candidate gene is a gene, located in a
chromosome region suspected of being involved in the expression of
a trait such as a disease, whose protein product suggests that it could
be the gene in question. Usually an integrative approach using
resources of animal models, genomics, high-throughput technologies,
and bioinformatics tools is used.
GWAS: An approach aimed at detecting variants at genomic loci that
are associated with complex traits in the population and, in particular,
at detecting associations between common single-nucleotide
polymorphisms (SNPs) and common diseases.
What is the difference between QTL and GWAS approaches?
(homework)
Sample collection: where, what, possibilities, protocols
Sample characterization: age, gender, history, ...
Phenotypic data: case, control, detailed clinical features
DNA extraction (RNA): quality and quantity
Whole-genome SNP: microarray chip platforms
Association analysis: software
Candidate verification: larger or other populations, functional tests, pathways
Example calculation illustrating the methodology of a case-control GWA study. The allele counts of each
measured SNP are evaluated, in this case with a chi-squared test, in order to identify variants associated
with the trait in question. The numbers in this example are taken from a 2007 study of coronary artery
disease (CAD), which showed that individuals with the G-allele of SNP1 (rs1333049) were
overrepresented amongst CAD patients (Wellcome Trust Case Control Consortium, June 2007).
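The allele-count comparison described in this caption can be sketched in a few lines of Python. This is a minimal illustration, not the consortium's pipeline: the function name and the counts are hypothetical (the published CAD counts are not reproduced on this slide), and the test is a plain Pearson chi-squared test on a 2x2 table with one degree of freedom.

```python
import math


def allelic_chi_squared(case_g, case_c, control_g, control_c):
    """Pearson chi-squared test (1 df) on a 2x2 table of allele counts.

    Rows are cases and controls, columns are the two alleles (here G and C).
    Returns (chi-squared statistic, p-value, allelic odds ratio for G).
    """
    table = [[case_g, case_c], [control_g, control_c]]
    row_totals = [sum(row) for row in table]
    col_totals = [case_g + control_g, case_c + control_c]
    total = sum(row_totals)

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected

    # For 1 degree of freedom the upper-tail p-value is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    odds_ratio = (case_g * control_c) / (case_c * control_g)
    return chi2, p_value, odds_ratio


# Hypothetical allele counts for one SNP, NOT the published CAD data.
chi2, p_value, odds_ratio = allelic_chi_squared(2100, 1900, 2800, 3200)
print(f"chi2={chi2:.2f}  p={p_value:.3g}  OR={odds_ratio:.2f}")
```

In a real scan the same test is repeated for every genotyped SNP, which is why the resulting p-values need a multiple-testing correction before any SNP is called significant.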
Klein RJ, Zeiss C, Chew EY, Tsai JY, Sackler RS, Haynes C, Henning AK,
SanGiovanni JP, Mane SM, Mayne ST, Bracken MB, Ferris FL, Ott J, Barnstable
C, Hoh J (April 2005). "Complement Factor H Polymorphism in Age-Related
Macular Degeneration". Science 308 (5720):
Age-related macular degeneration (AMD) is a major cause of blindness in the
elderly. We report a genome-wide screen of 96 cases and 50 controls for
polymorphisms associated with AMD.
Among 116,204 single-nucleotide polymorphisms genotyped, an intronic and
common variant in the complement factor H gene (CFH) is strongly associated
with AMD (nominal P value < 10^-7).
In individuals homozygous for the risk allele, the likelihood of AMD is increased
by a factor of 7.4 (95% confidence interval 2.9 to 19). Resequencing revealed a
polymorphism in linkage disequilibrium with the risk allele representing a
tyrosine-histidine change at amino acid 402. This polymorphism is in a region of
CFH that binds heparin and C-reactive protein. The CFH gene is located on
chromosome 1 in a region repeatedly linked to AMD in family-based studies.
(A) P values of genome-wide association scan for genes that affect the risk
of developing AMD. -log10(p) is plotted for each SNP in chromosomal order.
The spacing between SNPs on the plot is uniform and does not reflect
distances between SNPs on the chromosomes. The dotted horizontal line shows
the cutoff for P = 0.05 after Bonferroni correction. The vertical dotted
lines show chromosomal boundaries. The arrow indicates the peak for SNP
rs380390, the most significant association, which was studied further.
(B) Variation in genotype frequencies between cases and controls.
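The Bonferroni cutoff drawn in this plot is the family-wise error rate divided by the number of tests. The short sketch below shows the arithmetic, assuming that the number of tests equals the 116,204 genotyped SNPs quoted in the abstract; the study may have tested a quality-filtered subset, so the exact height of the published line can differ. The function name is illustrative.

```python
import math


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold that keeps the family-wise error at alpha."""
    return alpha / n_tests


# Assumption: one test per genotyped SNP, using the count from the abstract.
n_snps = 116_204
threshold = bonferroni_threshold(0.05, n_snps)

# Height of the dotted horizontal line on a -log10(p) axis.
line_height = -math.log10(threshold)

print(f"per-SNP threshold: {threshold:.3g}")
print(f"-log10(threshold): {line_height:.2f}")
```

Under this assumption the per-SNP threshold is roughly 4.3 × 10^-7, so the reported nominal P value below 10^-7 clears it.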
Table 1
Odds ratios and population attributable risks (PARs) for AMD. The
dominant odds ratio and PAR compare the likelihood of AMD in
individuals with at least one copy of the risk allele versus individuals
with no copy of the risk allele. The recessive odds ratio and PAR
compare the likelihood of AMD in individuals with two copies of the risk
allele versus individuals with no more than one copy of the risk allele.
The population frequencies for the risk genotypes are taken from the
CEU HapMap population (CEPH collection of Utah residents of
northern and western European ancestry).
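The dominant and recessive comparisons in this caption differ only in how the three genotype groups are pooled. The sketch below makes that explicit. The genotype counts are hypothetical, and the PAR function uses Levin's formula with the odds ratio standing in for relative risk, which is an assumption: the slide does not say how the paper computed PAR.

```python
def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """Odds ratio from a 2x2 exposure table."""
    return (exposed_cases * unexposed_controls) / (unexposed_cases * exposed_controls)


def genotype_odds_ratios(cases, controls):
    """cases/controls map the number of risk alleles (0, 1, 2) to a count.

    Dominant model: at least one copy (1 or 2) versus no copy (0).
    Recessive model: two copies (2) versus no more than one copy (0 or 1).
    """
    dominant = odds_ratio(
        cases[1] + cases[2], cases[0], controls[1] + controls[2], controls[0]
    )
    recessive = odds_ratio(
        cases[2], cases[0] + cases[1], controls[2], controls[0] + controls[1]
    )
    return dominant, recessive


def population_attributable_risk(risk_genotype_freq, ratio):
    """Levin's formula; `ratio` is an odds ratio used as a relative-risk proxy."""
    excess = risk_genotype_freq * (ratio - 1.0)
    return excess / (1.0 + excess)


# Hypothetical genotype counts, NOT the AMD study data.
cases = {0: 10, 1: 40, 2: 50}
controls = {0: 20, 1: 20, 2: 10}
dominant_or, recessive_or = genotype_odds_ratios(cases, controls)
print(dominant_or, recessive_or)
```

The population frequency passed to `population_attributable_risk` would be the frequency of the risk genotype in a reference panel, such as the HapMap CEU frequencies the caption mentions.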
(A) Linkage disequilibrium across the CFH region, plotted as pairwise D′
values. The red/orange box in the center of the plot is the region in strong
linkage disequilibrium with the two associated SNPs in our data. (B) Schematic
of the region in strong linkage disequilibrium with the two associated SNPs in
our data. The vertical bars represent the approximate location of the SNPs
available in our data set. The shaded region is the haplotype block found in
the HapMap data. (C) Haplotype blocks in the HapMap CEU data across the
region. Darker shades of red indicate higher values of D′. Light blue
indicates high D′ with a low logarithm of the odds ratio for linkage (lod
score). The dark lines show the boundaries of haplotype blocks. (D)
Maximum-parsimony cladogram derived from haplotypes across the 6-SNP region.
The number near each line indicates which of the six SNPs changes along that
branch. The two red numbers are the two SNPs initially identified as being
associated with AMD. SNP 4 is rs380390 and SNP 6 is rs1329428.
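The pairwise D′ values plotted in this figure come from haplotype and allele frequencies. The sketch below shows the standard definitions of D, |D′| and r^2 for two biallelic SNPs. It assumes phased haplotype frequencies are already known and that both SNPs are polymorphic; the function name and the frequencies are hypothetical.

```python
def linkage_disequilibrium(p_ab, p_a, p_b):
    """Return (D, |D'|, r^2) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A at locus 1 and B at locus 2
    p_a, p_b: frequencies of allele A and allele B (both strictly between 0 and 1)
    """
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r_squared = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r_squared


# Hypothetical frequencies: allele A only ever occurs together with allele B.
d, d_prime, r_squared = linkage_disequilibrium(p_ab=0.3, p_a=0.3, p_b=0.6)
print(f"D={d:.3f}  |D'|={d_prime:.2f}  r^2={r_squared:.3f}")
```

The example is chosen so that |D′| is 1 while r^2 stays well below 1, which is the usual reason the two measures are reported side by side: D′ flags the absence of historical recombination, whereas r^2 says how well one SNP predicts the other.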
Immunofluorescence localization of CFH protein in human retina. Neighboring
human retina sections are stained with (A) antibody to CFH or (B) antibody to
CFH preabsorbed with CFH as negative control. (C) High-magnification view of
the boxed area in (A). For (A), (B), and (C), left panels are the fluorescence
images, with CFH labeling in green and DAPI
(4′,6′-diamidino-2-phenylindole)–stained nuclei in blue; right panels are
differential interference contrast (DIC) images showing the tissue morphology.
In (C), the CFH signal is superimposed onto the DIC image. Labeling of CFH is
intense in choroid, including blood vessels and areas bordering RPE [(A) and
(C)]; this CFH signal is competed away by purified CFH protein (B), which
demonstrates the labeling specificity. The fluorescence signal from RPE arises
from lipofuscin autofluorescence, which cannot be competed away with CFH
protein [(A) and (B)]. The black spots in DIC images correspond to melanin
granules in RPE and choroids. The cell layers are indicated: GC, ganglion
cells; INL, inner nuclear layer; ONL, outer nuclear layer; RPE, retinal
pigment epithelium. Scale bars: 40 μm in (A) and (B), 20 μm in (C).
Nat Genet. 2011 Dec 4;44(1):73-7. doi: 10.1038/ng.1005.
A genome-wide association study in Han Chinese identifies new susceptibility loci for
ankylosing spondylitis.
Lin Z, Bei JX, Shen M, Li Q, Liao Z, Zhang Y, Lv Q, Wei Q, Low HQ, Guo YM, Cao S, Yang M,
Hu Z, Xu M, Wang X, Wei Y, Li L, Li C, Li T, Huang J, Pan Y, Jin O, Wu Y, Wu J, Guo Z, He P,
Hu S, Wu H, Song H, Zhan F, Liu S, Gao G, Liu Z, Li Y, Xiao C, Li J, Ye Z, He W, Liu D, Shen
L, Huang A, Wu H, Tao Y, Pan X, Yu B, Tai ES, Zeng YX, Ren EC, Shen Y, Liu J, Gu J.
To identify susceptibility loci for ankylosing spondylitis, we performed a
two-stage genome-wide association study in Han Chinese. In the discovery stage,
we analyzed 1,356,350 autosomal SNPs in 1,837 individuals with ankylosing
spondylitis and 4,231 controls; in the validation stage, we analyzed 30
suggestive SNPs in an additional 2,100 affected individuals and 3,496
controls. We identified two new susceptibility loci between EDIL3 and HAPLN1
at 5q14.3 (rs4552569; P = 8.77 × 10^-10) and within ANO6 at 12q12
(rs17095830; P = 1.63 × 10^-8). We also confirmed previously reported
associations in Europeans within the major histocompatibility complex (MHC)
region (top SNP, rs13202464; P < 5 × 10^-324) and at 2p15 (rs10865331; P
= 1.98 × 10^-8). We show that rs13202464 within the MHC region mainly
represents the risk effect of HLA-B*27 variants (including HLA-B*2704,
HLA-B*2705 and HLA-B*2715) in Chinese. The two newly discovered loci implicate
genes related to bone formation and cartilage development, suggesting their
potential involvement in the etiology of ankylosing spondylitis.
Broad Institute of MIT and Harvard
Wellcome Trust Case-Control Consortium
Ten Basic Questions to Ask About a Genome-wide Association Study Report
1. Are the cases defined clearly and reliably so that they can be compared with
patients typically seen in clinical practice?
2. Are case and control participants demonstrated to be comparable to each other
on important characteristics that might also be related to genetic variation and to the
disease?
3. Was the study of sufficient size to detect modest odds ratios or relative risks (1.3-1.5)?
4. Was the genotyping platform of sufficient density to capture a large proportion of
the variation in the population studied?
5. Were appropriate quality control measures applied to genotyping assays,
including visual inspection of cluster plots and replication on an independent
genotyping platform?
6. Did the study reliably detect associations with previously reported and replicated
variants (known positives)?
7. Were stringent corrections applied for the many thousands of statistical tests
performed in defining the P value for significant associations?
8. Were the results replicated in independent population samples?
9. Were the replication samples comparable in geographic origin and phenotype
definition, and if not, did the differences extend the applicability of the findings?
10. Was evidence provided for a functional role for the gene polymorphism
identified?
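Question 3 in this list can be made concrete with a rough sample-size calculation. The sketch below uses the textbook normal-approximation formula for comparing two proportions, applied to allele frequencies in cases and controls. The control allele frequency of 0.3, the 80% power target, and the use of 5 × 10^-8 as the significance level are illustrative assumptions, and the function names are hypothetical. A real design would rely on dedicated power software.

```python
from statistics import NormalDist


def case_allele_freq(control_freq, odds_ratio):
    """Allele frequency in cases implied by a control frequency and an allelic OR."""
    odds = odds_ratio * control_freq / (1.0 - control_freq)
    return odds / (1.0 + odds)


def alleles_needed_per_group(control_freq, odds_ratio, alpha=5e-8, power=0.8):
    """Approximate number of alleles per group (cases, controls) for a two-sided
    two-proportion z-test. Divide by 2 for the number of individuals."""
    p0 = control_freq
    p1 = case_allele_freq(p0, odds_ratio)
    p_bar = (p0 + p1) / 2.0
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_beta = NormalDist().inv_cdf(power)
    numerator = (
        z_alpha * (2.0 * p_bar * (1.0 - p_bar)) ** 0.5
        + z_beta * (p0 * (1.0 - p0) + p1 * (1.0 - p1)) ** 0.5
    ) ** 2
    return numerator / (p1 - p0) ** 2


# Compare the two ends of the "modest" range named in question 3.
for effect in (1.3, 1.5):
    n_alleles = alleles_needed_per_group(control_freq=0.3, odds_ratio=effect)
    print(f"OR={effect}: about {n_alleles / 2:.0f} individuals per group")
```

The point of the comparison is the trend: the required sample size grows quickly as the odds ratio approaches 1, which is why small studies cannot reliably detect modest effects.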
JAMA. 2008;299(11):1335-1344.
Five years of GWAS discovery
Visscher PM, Brown MA, McCarthy MI, Yang J. Am J Hum Genet. 2012 Jan
13;90(1):7-24.
The past five years have seen many scientific and biological discoveries made
through the experimental design of genome-wide association studies (GWASs).
These studies were aimed at detecting variants at genomic loci that are
associated with complex traits in the population and, in particular, at detecting
associations between common single-nucleotide polymorphisms (SNPs) and
common diseases such as heart disease, diabetes, auto-immune diseases, and
psychiatric disorders. We start by giving a number of quotes from scientists and
journalists about perceived problems with GWASs. We will then briefly give the
history of GWASs and focus on the discoveries made through this experimental
design, what those discoveries tell us and do not tell us about the genetics and
biology of complex traits, and what immediate utility has come out of these
studies. Rather than giving an exhaustive review of all reported findings for all
diseases and other complex traits, we focus on the results for auto-immune
diseases and metabolic diseases. We return to the perceived failure or
disappointment about GWASs in the concluding section.
GWAS Discoveries over Time. Data obtained from the Published GWAS Catalog
(see Web Resources). Only the top SNPs representing loci with association p
values < 5 × 10^-8 are included, and so that multiple counting is avoided, SNPs
identified for the same traits with LD r^2 > 0.8 estimated from the entire HapMap
samples are excluded.
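The two filters named in this caption (a genome-wide significance cutoff and removal of SNPs in high LD for the same trait) can be sketched as a simple greedy pass. The caption does not describe the exact procedure, so the ordering by p-value is an assumption, and the SNP names, traits, p-values and r^2 values below are hypothetical.

```python
GENOME_WIDE_P = 5e-8   # significance cutoff from the caption
LD_R2_LIMIT = 0.8      # LD cutoff from the caption


def select_independent_loci(hits, r2):
    """Keep genome-wide significant hits, dropping any SNP in high LD with an
    already-kept SNP for the same trait.

    hits: list of (snp, trait, p_value)
    r2: dict mapping frozenset({snp1, snp2}) to their pairwise r^2
    """
    kept = []
    # Assumption: visit hits from most to least significant.
    for snp, trait, p_value in sorted(hits, key=lambda hit: hit[2]):
        if p_value >= GENOME_WIDE_P:
            continue
        linked = any(
            kept_trait == trait
            and r2.get(frozenset((snp, kept_snp)), 0.0) > LD_R2_LIMIT
            for kept_snp, kept_trait, _ in kept
        )
        if not linked:
            kept.append((snp, trait, p_value))
    return kept


# Hypothetical input, for illustration only.
hits = [
    ("snpA", "trait1", 1e-10),
    ("snpB", "trait1", 1e-9),   # in high LD with snpA, same trait
    ("snpC", "trait1", 1e-6),   # not genome-wide significant
    ("snpD", "trait2", 2e-9),   # in high LD with snpA, different trait
]
r2 = {frozenset(("snpA", "snpB")): 0.9, frozenset(("snpA", "snpD")): 0.95}
loci = select_independent_loci(hits, r2)
print([snp for snp, _, _ in loci])
```

Note that the LD filter is applied per trait, so the same region can be counted once for each trait it is associated with.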
Increase in Number of Loci Identified as a Function of Experimental Sample
Size. (A) Selected quantitative traits. (B) Selected diseases. The coordinates
are on the log scale. The complex traits were selected with the criteria that
there were at least three GWAS papers published on each in journals with a
2010–2011 journal impact factor >9 (e.g., Nature, Nature Genetics, the
American Journal of Human Genetics, and PLoS Genetics) and that at least one
paper contained more than ten genome-wide significant loci. These traits are a
representative selection among all complex traits that fulfilled these
criteria.
Is the GWAS approach founded on a flawed assumption that genetics plays an
important role in the risk for common diseases?
Have GWASs been disappointing in not explaining more genetic variation in
the population?
Have GWASs delivered meaningful biologically relevant knowledge or results of
clinical or any other utility?
Are GWAS results spurious?
If we assume that the GWAS results from Figure 1 represent a total of 500,000
SNP chips and that on average a chip costs $500, then this is a total investment of
$250 million. If there are a total of ∼2,000 loci detected across all traits, then this
implies an investment of $125,000 per discovered locus. Is that a good
investment? We think so: The total amount of money spent on candidate-gene
studies and linkage analyses in the 1990s and 2000s probably exceeds $250M,
and they in total have had little to show for it. Also, it is worthwhile to put these
amounts in context. $250M is of the order of the cost of one or two stealth fighter
jets and much less than the cost of a single navy submarine. It is a fraction of the
∼$9 billion cost of the Large Hadron Collider. It would also pay for about 100 R01
grants. Would those 100 non-funded R01 grants have made breakthrough
discoveries in biology and medicine? We simply can't answer this question, but we
can conclude that a tremendous number of genuinely new discoveries have been
made in a period of only five years.
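The cost estimate in this passage is simple arithmetic, reproduced here with the figures quoted in the text (500,000 chips, $500 per chip, about 2,000 loci). The variable names are illustrative.

```python
# Figures quoted in the passage.
snp_chips = 500_000
cost_per_chip_usd = 500
loci_detected = 2_000

total_investment_usd = snp_chips * cost_per_chip_usd
cost_per_locus_usd = total_investment_usd // loci_detected

print(f"total investment: ${total_investment_usd:,}")   # $250,000,000
print(f"cost per locus:   ${cost_per_locus_usd:,}")     # $125,000
```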
GWAS Improvement:
Disease –Disease subtypes
SNP - SNP, Expression, SNP
+Expression
Common-Common rare
allele/diseases
GWAS Expansion:
Humans- Animals
Software
Reviewers and related researchers
QTL | GWAS
Microsatellite markers/SNPs | SNPs
Relatively small number | Large number
Easy to conduct | Requires a statistician or software
Multiple loci | Multiple loci
Large genomic region | Narrows down to several SNPs or a haplotype
Genome-wide association studies (GWAS) are becoming increasingly popular
in genetic research, and they are an excellent complement to QTL mapping.
Whereas QTL contain many linked genes, which are then challenging to
separate, GWAS produce many unlinked individual genes or even
nucleotides, but these studies are riddled with large expected numbers of
false positives.
Home/class work:
1. Please list at least two items of similarity and two items of difference
between QTL studies and GWAS.
2. Please identify the candidate genes for a genomic region flanked by two
SNPs, rs2019727 (located on human chromosome 1, at 48163350 bp) and
rs187391907 (on chromosome 1, at 50019040 bp), which are the significant
SNPs for a GWAS on the blood-brain barrier.
3. Please conduct a simulation study using the GSAA program with the data
provided to get data on: 1) GSAA, 2) GSAA-SNP, and 3) GSEA.
Please email me if you have problems converting the Excel file into a gmt file.
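For the Excel-to-gmt conversion mentioned in the homework, the sketch below shows one possible route. It assumes the spreadsheet has first been saved as CSV with three columns named `set_name`, `description` and `gene`, one gene per row; the layout of the file provided for the course may differ, and these column names are hypothetical. The output follows the usual GMT convention of one gene set per line, tab-separated: set name, description, then the member genes.

```python
import csv
import io


def csv_to_gmt(csv_text):
    """Convert CSV text with columns set_name, description, gene (hypothetical
    layout) into GMT text: one tab-separated line per gene set."""
    gene_sets = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = (row["set_name"], row["description"])
        gene_sets.setdefault(key, []).append(row["gene"])

    lines = [
        "\t".join([name, description] + genes)
        for (name, description), genes in gene_sets.items()
    ]
    return "\n".join(lines) + "\n"


# Placeholder content standing in for the exported spreadsheet.
example_csv = (
    "set_name,description,gene\n"
    "SET1,demo,GENE_A\n"
    "SET1,demo,GENE_B\n"
    "SET2,demo,GENE_C\n"
)
gmt_text = csv_to_gmt(example_csv)
print(gmt_text, end="")
```

To write an actual file, the returned text can be saved with a `.gmt` extension; check the program's own documentation for any additional requirements on gene identifiers.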
Acknowledgements
UT collaborators:
Yan Jiao
Griffin Gibson
Yue Huang
XiaoYun Liu
YanHong Cao
Hank Chiu
Qing Xiong
Feng Jiao
Stephanie Troemel
HongBin Yang
Peilan Zhang
Dejeaune Smith
Feng Lan
Jian Yan
Zhiping Jia
Caili Han
Chi Zhang
Shan Sun
Non UT Collaborators:
Robert Williams
Dan Goldowitz
Mark Ledoux
John Stuart
Arnold Postlethwaite
Karen Hasty
Lawrence Pfeffer
Christopher Nosrat
Kristin Hamre
UM collaborators
Eugene C. Eckstein
Waldemar G. de Rijk
Charles D Blaha
Guy Mittleman
Support: NIH, UTHSC
Bruce Roe, Univ. Oklahoma
Xinmin Li, UCLA
Harry Jerett, UT San Antonio
David Mount, Harvard Med. Sch.
Beth Bennett, U Colo
Wesley G. Beamer
Leah Rae Donahue
Cliff Rosen
The Jackson Laboratory
Yun Jiao, St Jude
YungJun Wang, Tiantan Hospital, BeiJing, PRC