IRS1 in Type 2 Diabetes


Genome-wide association studies
Misha Kapushesky
Slides: Johan Rung, EBI
St. Petersburg, Russia, 2010
Overview
• Methods for genome-wide association studies
• Montreal GWAS for Type 2 Diabetes
• GWAS results - context and caveats
Study coverage
• Associating phenotype/disease state to genetic variation
• Cost per genotype has decreased
• Instead of a candidate gene approach, just scan the
entire genome
• SNP microarrays covering up to 5M SNPs on one chip
• Increased sample sizes
Recombination
Linkage disequilibrium
Two markers on the genome are in linkage disequilibrium when their alleles are
inherited together more often than would be expected by chance
This leads to high correlation between nearby markers within the same haplotype
block
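As a small illustration of how this correlation is usually quantified, the sketch below computes D, D' and r² for two biallelic markers from haplotype and allele frequencies. The function name and the example frequencies are hypothetical and not taken from the slides.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (D, D', r^2) for two biallelic markers.

    p_ab: frequency of the haplotype carrying allele A (marker 1) and B (marker 2)
    p_a:  frequency of allele A at marker 1
    p_b:  frequency of allele B at marker 2
    """
    # D is the excess of the observed haplotype frequency over independence
    d = p_ab - p_a * p_b
    # D' rescales |D| by the largest value it could take given the allele frequencies
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    # r^2 is the squared correlation between the two markers
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2


# Hypothetical example: allele B only ever occurs on haplotypes carrying A
d, d_prime, r2 = ld_stats(p_ab=0.2, p_a=0.5, p_b=0.2)
```

In this example D' is 1 while r² is only 0.25, which is why r² rather than D' is normally used to judge how well one marker can stand in for another.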
Haplotypes and genotype tagging
Association studies
• Linkage disequilibrium enables association studies through
detection by proxy: not every variant needs to be typed
Study power
[Figure: cases 1-4 and controls 1-4, panels A and B]
Study power
• The power of a study is the probability of correctly detecting a true
positive
• To calculate this, you need:
• risk model
• genotype relative risk
• allele frequency
• number of cases and controls
• population penetrance
• acceptable rate of false positives
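A rough sketch of how these ingredients combine is given below. It assumes a multiplicative risk model, an allelic two-proportion test with a normal approximation, and controls whose allele frequency equals the population frequency (so penetrance is ignored). The function name and all example numbers are hypothetical; dedicated power calculators handle the general case.

```python
from statistics import NormalDist


def allelic_test_power(risk_allele_freq, genotype_relative_risk,
                       n_cases, n_controls, alpha):
    """Approximate power of a two-sided allelic test (normal approximation)."""
    p0 = risk_allele_freq
    grr = genotype_relative_risk
    # Allele frequency in cases under a multiplicative model
    p1 = grr * p0 / (1 + p0 * (grr - 1))
    # Each individual contributes two alleles
    se = (p1 * (1 - p1) / (2 * n_cases) + p0 * (1 - p0) / (2 * n_controls)) ** 0.5
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)
    shift = abs(p1 - p0) / se
    # Probability that the test statistic falls beyond either critical value
    return nd.cdf(shift - z_crit) + nd.cdf(-shift - z_crit)


# Hypothetical comparison at a stringent genome-wide threshold
power_small = allelic_test_power(0.3, 1.2, 500, 500, 5e-8)
power_large = allelic_test_power(0.3, 1.2, 5000, 5000, 5e-8)
```

With a relative risk of 1.0 the function returns alpha, as expected for a test with no true effect.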
How many SNPs should be tested? Studies of small
regions revealed linkage disequilibrium blocks in which
common SNPs are highly correlated (usually <10,000–
30,000 base pairs in African populations or 30,000–50,000
base pairs in the newer European and Asian populations)
(22). This motivated the HapMap Project (www.hapmap.org
[12]), which has validated approximately 4 million SNPs,
including 2.8 million of the estimated 10 million common
SNPs in major world populations, while creating
competition among biotechnology companies to develop
high-throughput genotyping technologies. Sequencing and
genotyping studies showed that sets of 500,000 (European
populations) to 1,000,000 (African populations) SNPs could
"tag" (serve as proxies for) approximately 80% of common
SNPs (23).
Quality controls
• Call rates for samples and SNPs
• Exclusion of low frequency SNPs
• Exclusion of SNPs out of Hardy-Weinberg Equilibrium
• Clean (or take into account) population stratification
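A minimal sketch of the first two checks for a single SNP is shown below. The genotype encoding (0, 1 or 2 copies of one allele, `None` for a failed call), the function name and the default thresholds are assumptions made for illustration; real pipelines typically use a tool such as PLINK.

```python
def snp_qc(genotypes, min_call_rate=0.95, min_maf=0.01):
    """Return (call_rate, minor_allele_frequency, keep) for one SNP.

    genotypes: non-empty list with 0, 1 or 2 copies of one allele per
    sample, or None where the genotype call failed.
    The thresholds are illustrative assumptions, not values from the study.
    """
    called = [g for g in genotypes if g is not None]
    call_rate = len(called) / len(genotypes)
    # Frequency of the counted allele among successfully called genotypes
    allele_freq = sum(called) / (2 * len(called)) if called else 0.0
    maf = min(allele_freq, 1 - allele_freq)
    keep = call_rate >= min_call_rate and maf >= min_maf
    return call_rate, maf, keep


# One missing call out of four samples fails a 95% call-rate threshold
call_rate, maf, keep = snp_qc([0, 1, 2, None])
```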
Hardy-Weinberg Equilibrium
• If the alleles A and B have frequencies p and q, you
would expect the following genotype frequencies:
AA: p²
AB: 2pq
BB: q²
Hardy-Weinberg Equilibrium
• When observed genotype frequencies deviate from the
ones expected under HWE, this is indicative of
• population stratification
• different mutation rates between males and females
• different fitness between alleles
• genotype calling problems
• true association at the locus
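One common way to check for such deviations is a χ² goodness-of-fit test with one degree of freedom, sketched below. The function name and the example counts are hypothetical; exact tests are often preferred when genotype counts are small.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Compare observed genotype counts with Hardy-Weinberg expectations.

    Returns (expected_counts, chi2, p_value), using 1 degree of freedom.
    """
    n = n_aa + n_ab + n_bb
    # Allele frequency of A estimated from the genotype counts
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return expected, chi2, p_value


# Hypothetical counts: a complete lack of heterozygotes is a strong deviation
expected, chi2, p_value = hwe_chi_square(50, 0, 50)
```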
Binary or real-valued phenotypes
• Binary traits are typically disease state labels (case
or control)
• Real-valued traits are quantitatively measured
phenotypes:
• blood sugar
• lipids
• height
• BMI
• gene expression
Molecular vs disease phenotypes
• Disease phenotypes are the result of combinations of
molecular phenotypes in the body
• Progression with time
• Precision of phenotype measurement
Molecular vs disease phenotypes
• Many physiological phenotypes involved in disease
dynamics
Molecular vs disease phenotypes
Molecular phenotypes can give more precise information
about disease state
Association statistics
• Association statistics for binary traits are most often
based on a χ²-statistic computed from the genotype count
table, or on a logistic regression model
• The χ²-statistic summarizes independence between
disease state and genotype
Association statistics
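• For aa in cases, you would expect the count
$\tilde{r}_0 = N \cdot \frac{R}{N} \cdot \frac{n_0}{N} = \frac{R\,n_0}{N}$
• The sum over all cells of the squared differences between observed and
expected counts, each divided by the expected count, is χ²-distributed

| | aa | aA | AA | Sum |
|---|---|---|---|---|
| Cases | r0 | r1 | r2 | R |
| Controls | s0 | s1 | s2 | S |
| Count | n0 | n1 | n2 | N |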
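The genotype count table can be turned into a test statistic as sketched below, using the slide's symbols (r0..r2 for cases, s0..s2 for controls). This assumes the standard Pearson χ² test with two degrees of freedom; the function name and the example counts are hypothetical.

```python
import math


def genotype_chi_square(cases, controls):
    """Pearson chi-square test on a 2 x 3 genotype count table.

    cases = (r0, r1, r2), controls = (s0, s1, s2).
    Returns (chi2, p_value) with 2 degrees of freedom.
    """
    R, S = sum(cases), sum(controls)
    N = R + S
    column_totals = [r + s for r, s in zip(cases, controls)]  # n0, n1, n2
    chi2 = 0.0
    for row, row_total in ((cases, R), (controls, S)):
        for observed, n_i in zip(row, column_totals):
            expected = row_total * n_i / N  # e.g. R * n0 / N for aa in cases
            if expected > 0:
                chi2 += (observed - expected) ** 2 / expected
    # Survival function of a chi-square distribution with 2 degrees of freedom
    p_value = math.exp(-chi2 / 2)
    return chi2, p_value


# Hypothetical counts for 100 cases and 100 controls
chi2, p_value = genotype_chi_square((30, 50, 20), (20, 50, 30))
```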
Regression
• For real-valued phenotypes, use linear regression
• For binary phenotypes, use logistic regression
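For the real-valued case, a minimal sketch is shown below: genotypes are encoded under an additive, dominant or recessive model and the phenotype is regressed on that encoding by ordinary least squares. The function names and the toy data are hypothetical, and no p-value is computed here; a statistics package would normally provide that, as well as logistic regression for binary traits.

```python
def encode(genotype, model):
    """Encode a genotype (0, 1 or 2 copies of the tested allele)."""
    if model == "additive":
        return genotype
    if model == "dominant":
        return 1 if genotype >= 1 else 0
    if model == "recessive":
        return 1 if genotype == 2 else 0
    raise ValueError("unknown model: " + model)


def linear_fit(x, y):
    """Ordinary least squares for one predictor: returns (slope, intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Toy data: the phenotype rises by one unit per allele copy
genotypes = [0, 1, 2, 0, 1, 2]
phenotype = [1.0, 2.0, 3.0, 1.0, 2.0, 3.0]
slope, intercept = linear_fit([encode(g, "additive") for g in genotypes], phenotype)
```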
Population stratification
• Population stratification occurs when groups or
subpopulations within your sample are more related
than would be expected by chance
• This introduces spurious correlations and inflates association
statistics, and needs to be corrected for
Genomic control
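A minimal sketch of the genomic control idea, assuming one-degree-of-freedom χ² association statistics: the inflation factor λ is the median observed statistic divided by the median expected under the null (about 0.455), and statistics are divided by λ when it exceeds 1. The function name is hypothetical.

```python
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 degree of freedom (about 0.455)
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def genomic_control(chi2_stats):
    """Return (inflation factor lambda, adjusted statistics)."""
    lam = median(chi2_stats) / CHI2_1DF_MEDIAN
    # By convention, statistics are only deflated, never inflated
    scale = max(lam, 1.0)
    return lam, [x / scale for x in chi2_stats]
```

In practice λ is estimated from all genome-wide statistics, most of which are assumed to come from SNPs with no true association.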
Eigenstrat
Imputation
• Using a reference population (like HapMap or 1000
Genomes) we can infer the genotypes of SNPs that were
not tested
• IMPUTE and MACH are commonly used
• Yields probabilistic genotypes that need special treatment
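One simple form of this special treatment is to carry the uncertainty forward as an expected allele dosage rather than forcing a hard genotype call. The sketch below assumes three posterior genotype probabilities per sample; the function names and the 0.9 calling threshold are hypothetical and do not reflect the actual output formats of IMPUTE or MACH.

```python
def expected_dosage(p_aa, p_ab, p_bb):
    """Expected number of B alleles given genotype probabilities."""
    if abs(p_aa + p_ab + p_bb - 1.0) > 1e-6:
        raise ValueError("genotype probabilities must sum to 1")
    return p_ab + 2 * p_bb


def hard_call(p_aa, p_ab, p_bb, threshold=0.9):
    """Return the most likely genotype, or None if it is too uncertain."""
    probs = {"AA": p_aa, "AB": p_ab, "BB": p_bb}
    best = max(probs, key=probs.get)
    return best if probs[best] >= threshold else None


dosage = expected_dosage(0.1, 0.3, 0.6)   # between one and two B alleles
call = hard_call(0.1, 0.3, 0.6)           # too uncertain for a hard call
```

The dosage can then be used directly as the predictor in a regression model.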
Imputation
Wu et al, Nat. Genet. 41, 991-995, 2009
Montreal GWAS
Type 2 diabetes
• Blood glucose levels are regulated by insulin release
• Increased blood glucose levels trigger the release of
insulin, which signals muscle cells to take up glucose
• Through β-cell dysfunction or insulin resistance, insulin
regulation is impaired, leading to increased glucose
levels and eventually type 2 diabetes
Type 2 diabetes
Genetics of type 2 diabetes
• Before GWAS, T2D genetics was studied with linkage studies and
candidate gene approaches
• Results in particular for MODY variants, caused by disruptions of
single genes
• Genome-wide association studies and SNP arrays made it possible
to study complex diseases
• Five large GWAS for T2D in 2007
• DIAGRAM meta-analysis in 2008
Montreal GWAS
• Part of a larger T2D project at McGill and Genome
Quebec
• After initial planning for candidate gene genotyping, we
switched to a GWAS strategy
Multi-stage GWAS
• Two main strategies for increasing study power
• Meta-analyses increase effective sample size by
combining results from different studies
• Multi-stage approaches scan the whole genome with
relatively low power, followed by focusing in on the hits
with higher power
• Maximizing power in a single study in a cost-effective way
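To illustrate how combining studies increases the effective sample size, the sketch below applies a fixed-effect, inverse-variance weighted meta-analysis to per-study effect estimates. This is one standard scheme and is assumed here for illustration; the function name and the example numbers are hypothetical.

```python
import math


def inverse_variance_meta(betas, standard_errors):
    """Fixed-effect meta-analysis: returns (beta, se, z, two-sided p-value)."""
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    se = math.sqrt(1.0 / total_weight)
    z = beta / se
    # Two-sided p-value from the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return beta, se, z, p_value


# Two hypothetical studies with the same effect estimate and precision
beta, se, z, p_value = inverse_variance_meta([0.2, 0.2], [0.1, 0.1])
```

Each study alone has z = 2, while the combined z is about 2.83, which shows the gain from pooling.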
Multi-stage GWAS
Study design
CASE-CONTROL T2D ASSOCIATION
• Stage 1: Genome-wide scan - 392,365 SNPs - French (N=1,376): 679 cases, 697 controls
• Fast-track confirmation - 57 SNPs - French (N=5,511): 2,617 cases, 2,894 controls (previously published, Nature, Feb 2007)
• Focused Stage 2 - 16,273 SNPs - French (N=4,977): 2,245 cases, 2,732 controls
• Focused Stage 3 - 28 SNPs - Danish (N=7,698): 3,334 cases, 4,364 controls
QT ASSOCIATION IN POPULATIONS
• Stage 4: population effect study - 1 SNP (rs2943641)
• Population based study samples: French (N=3,351), Finnish (N=5,183), Danish (N=5,824)
Fasting glucose, normoglycemic individuals (previously published, Science, May 2007)
• Stage 1: French (N=654)
• Stage 2: rs560887 (N=9,353)
Stage 1 samples
• French individuals: 690 cases, 670 controls
• Criteria for cases:
• T2D
• First degree relative with T2D
• Non-obese (BMI < 31 kg/m², 25.8 ± 2.8 kg/m²)
• Controls from DESIR, a prospective French cohort
• Normal glucose tolerance for the 9 years of the study
Stage 1 SNPs
• Tested on Illumina Human1 (100k) and HumanHap300
(300k)
• 392,935 unique SNPs from the combined arrays
Stage 1 results
Fast-track validation
• Top 57 fast-tracked and tested on a Sequenom panel on
2,617 cases, 2,894 controls
• Relaxed criteria for cases
• BMI < 35 kg/m² (28.9 ± 3.7 kg/m²)
• Sladek et al., Nature 445, 881-885, 2007
Results
| SNP | Chr | Position | pMAX | Closest gene |
|---|---|---|---|---|
| rs7903146 | 10 | 114748339 | 1.5 × 10⁻³⁴ | TCF7L2 |
| rs13266634 | 8 | 118253964 | 6.1 × 10⁻⁸ | SLC30A8 |
| rs1111875 | 10 | 94452862 | 3.0 × 10⁻⁶ | HHEX |
| rs7923837 | 10 | 94471897 | 7.5 × 10⁻⁶ | HHEX |
| rs7480010 | 11 | 42203294 | 1.1 × 10⁻⁴ | LOC387761 |
| rs3740878 | 11 | 44214378 | 1.2 × 10⁻⁴ | EXT2 |
| rs11037909 | 11 | 44212190 | 1.8 × 10⁻⁴ | EXT2 |
| rs1113132 | 11 | 44209979 | 3.3 × 10⁻⁴ | EXT2 |
SLC30A8
Chimienti et al. Biometals 18:313
HHEX
[Figure: HHEX locus (genes IDE, KIF11, HHEX): -log10(p) per SNP, with pairwise linkage disequilibrium (D') on a scale from 0 to 1]
HHEX controls pancreatic development
Hex homeobox gene-dependent tissue positioning is required for organogenesis of the
ventral pancreas. Bort (2004)
Heart induction by Wnt antagonists depends on the homeodomain transcription factor
Hex. Foley (2005)
The homeobox gene Hex is required in definitive endodermal tissues for normal
forebrain, liver and thyroid formation. Martinez Barbera (2000)
Habener Endocrinology 146:1025
Stage 2
• Top 5% of GWAS hits were selected for design of a
focused Stage 2
• Control for population bias with EIGENSTRAT
• iSelect array with 16,405 SNPs, tested on 2,245 cases,
2,732 controls (French)
• Analysis with EIGENSTRAT and selection of 28 SNPs for
a focused Stage 3
QC

| Exclusion criterion | Samples |
|---|---|
| Call rate < 95% | 27 |
| Continental stratification | 296 |
| Sex mismatch | 64 |
| Related individuals | 70 |
| Total | 457 |

| Chromosome | SNPs | Failed HWE | Failed MAF | Successful |
|---|---|---|---|---|
| TOTAL | 16,360 | 48 | 43 | 16,273 |

EIGENSTRAT correction
[Figure panels: filters for MAF, HWE, call rate; filters for MAF, HWE, call rate and r²]
Results - stage 1 vs stage 2
Results - taking out known loci
Stage 3
• The top 28 SNPs were tested using a Sequenom panel in
~7,700 Danish cases and controls
• We confirm association of TCF7L2, WFS1, CDKAL1 and
find one new association: rs2943641 near IRS1
rs2943641
• We studied the effect of variation in rs2943641 on T2D
risk and metabolic phenotypes in general populations:
• DESIR: 3,351 French adults
• Inter99: 5,824 Danish adults
• NFBC 1986: 5,183 Finnish adolescents
Metabolic traits
• A variety of indexes to capture β-cell function and insulin
resistance
• HOMA-B and HOMA-IR based on fasting levels of
glucose and insulin
• For Inter99, we had access to OGTT data and could
calculate other measures of insulin response:
• time course data
• AUC
• corrected insulin response (CIR)
• disposition indexes
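For reference, the sketch below shows the widely used original HOMA formulas and a trapezoidal area under the curve for OGTT time-course data. It assumes fasting glucose in mmol/l and fasting insulin in µU/ml; insulin reported in pmol/l would need a unit conversion first. The function names and example values are hypothetical, and the study may have used different variants of these indexes.

```python
def homa_ir(glucose_mmol_l, insulin_uU_ml):
    """HOMA insulin resistance index (original HOMA1 formula)."""
    return glucose_mmol_l * insulin_uU_ml / 22.5


def homa_b(glucose_mmol_l, insulin_uU_ml):
    """HOMA beta-cell function index in percent (original HOMA1 formula)."""
    return 20.0 * insulin_uU_ml / (glucose_mmol_l - 3.5)


def auc_trapezoid(times, values):
    """Area under a time course using the trapezoidal rule."""
    area = 0.0
    for i in range(1, len(times)):
        area += (times[i] - times[i - 1]) * (values[i] + values[i - 1]) / 2.0
    return area


# Reference point of the HOMA model: glucose 4.5 mmol/l, insulin 5 µU/ml
ir = homa_ir(4.5, 5.0)
b = homa_b(4.5, 5.0)
# Hypothetical OGTT glucose values at 0, 30 and 120 minutes
auc = auc_trapezoid([0, 30, 120], [5.0, 8.0, 5.5])
```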
Oral Glucose Tolerance Test
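Metabolic traits 1
Genotype columns refer to rs2943641.

| Metabolic trait | Cohort | C/C | C/T | T/T | P add | P dom | P rec |
|---|---|---|---|---|---|---|---|
| Age | NFBC 1986 | 16 | 16 | 16 | | | |
| Age | DESIR | 47.1 ± 9.8 | 47.5 ± 9.9 | 47.6 ± 10.1 | | | |
| Age | INTER99 | 44.9 ± 7.9 | 45.4 ± 7.8 | 45.2 ± 7.6 | | | |
| Sex | NFBC 1986 | 1062/1092 | 1153/1208 | 322/346 | | | |
| Sex | DESIR | 645/728 | 728/812 | 216/222 | | | |
| Sex | INTER99 | 776/942 | 974/1070 | 307/354 | | | |
| BMI (kg/m²) | NFBC 1986 | 21.3 ± 3.8 | 21.3 ± 3.7 | 21.1 ± 3.5 | 0.24 | 0.43 | 0.21 |
| BMI (kg/m²) | DESIR | 24.5 ± 3.7 | 24.4 ± 3.5 | 24.4 ± 3.4 | 0.55 | 0.63 | 0.61 |
| BMI (kg/m²) | INTER99 | 25.6 ± 3.9 | 25.4 ± 4.1 | 25.7 ± 4.2 | 0.57 | 0.094 | 0.24 |
| Fasting plasma glucose (mmol/l) | NFBC 1986 | 5.13 ± 0.41 | 5.14 ± 0.40 | 5.13 ± 0.41 | 0.77 | 0.62 | 0.90 |
| Fasting plasma glucose (mmol/l) | DESIR | 5.21 ± 0.44 | 5.20 ± 0.42 | 5.18 ± 0.43 | 0.05 | 0.32 | 0.07 |
| Fasting plasma glucose (mmol/l) | INTER99 | 5.31 ± 0.40 | 5.31 ± 0.41 | 5.33 ± 0.39 | 0.66 | 0.93 | 0.32 |
| Fasting serum insulin (pmol/l) | NFBC 1986 | 78.7 ± 48.6 | 76.8 ± 44.5 | 71.7 ± 32.1 | 0.001 | 0.03 | 0.0009 |
| Fasting serum insulin (pmol/l) | DESIR | 50.6 ± 32.9 | 48.4 ± 29.7 | 49.1 ± 29.1 | 0.05 | 0.003 | 0.76 |
| Fasting serum insulin (pmol/l) | INTER99 | 38.8 ± 24.7 | 36.4 ± 21.9 | 37.6 ± 23.3 | 0.018 | 0.0043 | 0.49 |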
Metabolic traits 2

| Metabolic trait | Cohort | C/C | C/T | T/T | P add | P dom | P rec |
|---|---|---|---|---|---|---|---|
| HOMA-B | NFBC 1986 | 141 ± 95.1 | 136 ± 80.1 | 131 ± 91.6 | 0.006 | 0.05 | 0.009 |
| HOMA-B | DESIR | 109 ± 87.0 | 103 ± 64.8 | 108 ± 92.2 | 0.16 | 0.006 | 0.24 |
| HOMA-B | INTER99 | 75.2 ± 65.6 | 68.3 ± 42.2 | 71.0 ± 49.9 | 0.005 | 0.0011 | 0.32 |
| HOMA-IR | NFBC 1986 | 2.52 ± 1.63 | 2.47 ± 1.58 | 2.29 ± 1.06 | 0.007 | 0.07 | 0.005 |
| HOMA-IR | DESIR | 1.95 ± 1.35 | 1.86 ± 1.20 | 1.88 ± 1.17 | 0.03 | 0.004 | 0.95 |
| HOMA-IR | INTER99 | 1.54 ± 1.00 | 1.44 ± 0.89 | 1.49 ± 0.95 | 0.026 | 0.0058 | 0.59 |
| Insulin 30' | INTER99 | 300 ± 183 | 277 ± 172 | 281 ± 169 | 0.0019 | 8.1 × 10⁻⁴ | 0.14 |
| Insulin 120' | INTER99 | 176 ± 138 | 163 ± 127 | 162 ± 124 | 0.0059 | 0.011 | 0.057 |
| AUC insulin | INTER99 | 22000 ± 13800 | 20300 ± 12900 | 20500 ± 12700 | 6.9 × 10⁻⁴ | 2.2 × 10⁻⁴ | 0.12 |
| Glucose 30' | INTER99 | 8.19 ± 1.53 | 8.17 ± 1.56 | 8.22 ± 1.50 | 0.72 | 0.34 | 0.55 |
| Glucose 120' | INTER99 | 5.51 ± 1.11 | 5.51 ± 1.11 | 5.47 ± 1.15 | 0.54 | 0.99 | 0.23 |
| AUC glucose | INTER99 | 182 ± 101 | 181 ± 102 | 180 ± 99.5 | 0.44 | 0.48 | 0.59 |
| AUC insulin / AUC glucose | INTER99 | 32.5 ± 17.4 | 30.1 ± 16.2 | 30.6 ± 16.1 | 6.0 × 10⁻⁴ | 1.6 × 10⁻⁴ | 0.13 |
| CIR | INTER99 | 1140 ± 4210 | 1000 ± 1130 | 1000 ± 1060 | 0.045 | 0.066 | 0.17 |
| ISI | INTER99 | 0.151 ± 0.095 | 0.16 ± 0.098 | 0.156 ± 0.096 | 0.026 | 0.0058 | 0.59 |
| Disp. Index (CIR * ISI) | INTER99 | 180 ± 1610 | 147 ± 220 | 143 ± 174 | 0.73 | 1.0 | 0.50 |
IRS1 locus - rs2943641
IRS1
• G972R (rs1801278) is a missense polymorphism in IRS1 that is known to impair insulin signalling
(Almind 1993)
• G972R is associated with insulin resistance and insulin release (Clausen 1995, Sesti 2001)
• In mice, IRS1 disruption causes disrupted insulin action, both in target tissues and in
β-cells (Nandi 2004)
• Also linked to insulin resistance, glucose intolerance, islet hyperplasia
(Tamemoto 1994, Araki 1994, Terauchi 1997, Withers 1998)
• G972R not conclusively associated with T2D (Florez 2004, Florez 2007, Jellema 2003, Zeggini 2004)
• We detect no epistasis between rs2943641 and G972R in DESIR or NFBC, only
nominal significance in Inter99
• Evidence for link between rs2943641 and IRS1?
rs2943641 - IRS1 protein association

| | rs2943641 CC | rs2943641 CT | rs2943641 TT | PAdd | PDom | PRec |
|---|---|---|---|---|---|---|
| n (male/female) | 74 (35/39) | 88 (51/37) | 28 (10/18) | | | |
| Age (years) | 42.5 ± 17.1 | 43.5 ± 16.9 | 43.2 ± 17.6 | | | |
| BMI (kg/m²) | 25.0 ± 3.8 | 24.9 ± 3.9 | 25.3 ± 4.1 | 0.3 | 0.7 | 0.2 |
| Rd insulin clamp (mg/kgFFM/min) | 10.4 ± 3.5 | 11.0 ± 3.2 | 11.7 ± 3.7 | 0.2 | 0.2 | 0.4 |
| Di (× 10⁻⁷) | 1.7 ± 1.1 | 1.8 ± 1.3 | 1.8 ± 1.1 | 0.8 | 0.8 | 0.9 |
| IRS-1 protein basal (AU) | 296.7 ± 167.7 | 314.0 ± 155.1 | 413.1 ± 227.6 | 0.03 | 0.3 | 0.009 |
| IRS-1 protein insulin (AU) | 276.6 ± 143.6 | 280.9 ± 156.4 | 313.3 ± 147.9 | 0.3 | 0.7 | 0.2 |
| IRS-1-associated PI3K activity basal (AU) | 25.0 ± 12.6 | 26.6 ± 15.4 | 30.1 ± 17.2 | 0.3 | 0.4 | 0.4 |
| IRS-1-associated PI3K activity insulin (AU) | 47.1 ± 29.9 | 56.6 ± 32.1 | 72.2 ± 41.3 | 0.001 | 0.02 | 0.002 |
Conclusions
• The multi-stage study detected T2D risk loci that were
later confirmed in other cohorts (SLC30A8, HHEX)
• Variation in rs2943641 is associated with
• T2D risk
• increased insulin levels
• impaired insulin sensitivity
• IRS1 protein levels
• IRS1 activity in insulin signaling pathway
• Study provided a "full story" from GWAS scan to
functional evidence thanks to rich phenotyping
Paper
Rung et al., Nature Genetics, 41, 1110-1115, 2009
Acknowledgements
Rosalie Frechette
Alexander Montpetit
Valérie Catudal
Charlotta Pisinger
Philippe Laflamme
Barry Posner
Stephane Cauchi
Anneli Pouta
Oluf Pedersen
Christian Dina
Marc Prentki
Constantin Polychronakos
David Meyre
Rasmus Ribel-Madsen
Ghislain Rocheleau
Christine Cavalcanti-Proença
Aimo Ruokonen
Alexander Mazur
Anders Albrechtsen
Anelli Sandbaek
Torben Hansen
Jean Tichet
Knut Borch-Johnsen
Martine Vaxillaire
Daniel Vincent
Torsten Lauritzen
Jorgen Wojtaszewski
Alexandre Belisle
Marjo-Riitta Järvelin
Allan Vaag
Samy Hadjadj
Jaana Laitinen
Beverley Balkau
Emmanuelle Durand
Barbara Heude
Paul Elliott
Johan Rung
Rob Sladek
Philippe Froguel
Lishuang Shen
David Serre
Philippe Boutin
Guillaume Charpentier
Tom Hudson
Sebastien Brunet
François Bacot
Samy Hadjadj
Michel Marre
GWAS into context
Complexity of interactions in biological systems...
Complexity
...a lot of complexity
Redundancy
Network structure
• Biological networks have a scale-free structure
• Most genes have few connections
• Few genes have many connections
[Figure: log(# genes) versus log(# edges)]
Signal propagation
• The structure of biological networks results in robustness
against random errors
• Most mutations, even knockouts, can go unnoticed
because of redundancy and network wiring
• Low probability to knock out a hub
Common diseases
• What is most common: disease caused by many variants
with low effect, or by few rare variants with strong effects?
• GWAS so far have by necessity focused on common
variants
• Many known rare variants are associated with common
diseases, or with phenotypes that may contribute and
progress to disease
Common disease / common variant
• The hypothesis that most common diseases are caused
by a large number of variants, common in a general
population, but each adding just a small risk
• GWAS results find many loci for common complex
diseases, with small risk
• But... GWAS detected loci so far only explain a very small
fraction of the observed variation
Rare variants
• With improved and lower cost sequencing, we can
address rare variants
• Not just SNPs
• Utility of “extreme cohorts”
• Ex. “A new highly penetrant form of obesity due to
deletions on chromosome 16p11.2” (Nature Feb 4, 2010)
Polygenic contributions
• Groups of non-genome-wide significant SNPs proven to
be associated with phenotype
• Individual SNPs cannot be inferred, just "group action"
• Supports the idea of many weak variants responsible for
effect
• Ex. "Common polygenic variation contributes to risk of
schizophrenia and bipolar disorder" (Nature 460, 748-752)
Meta-analysis caveats
• Meta-analysis on heterogeneous data
• Phenotypes
• Quality control
• Platforms
• Genotype calling
• Analysis
Future directions for GWAS
• Sequencing is cheaper and yielding higher quality data
• Better basis for studying and detecting rare variants and
their effect on diseases or phenotypes
• Copy number variants
• Genetic interactions, GxE interactions
• More samples => higher power
Future directions for GWAS
• Complex phenotypes
• Association of genetic loci to
• genome-wide expression levels
• protein levels
• metabolite levels
Future directions for GWAS
• More data shared => better quality of results
• As in other branches of science, data sharing,
transparency and openness should be promoted
Resources
• Analysis software packages
• PLINK - http://pngu.mgh.harvard.edu/~purcell/plink/
• *Abel - http://mga.bionet.nsc.ru/~yurii/ABEL/
• MERLIN - http://www.sph.umich.edu/csg/abecasis/merlin/
• Imputations
• IMPUTE - http://mathgen.stats.ox.ac.uk/impute/impute.html
• MACH - http://www.sph.umich.edu/csg/abecasis/MACH/
• Population structure
• Eigenstrat - http://genepath.med.harvard.edu/~reich/Software.htm
• EMMA(X) - http://genetics.cs.ucla.edu/emmax/index.html
• Meta-analysis
• METAL - http://www.sph.umich.edu/csg/abecasis/METAL/
• GWAMA - http://www.well.ox.ac.uk/gwama/index.shtml
• Data
• EGA - http://www.ebi.ac.uk/ega/
• dbGAP - http://www.ncbi.nlm.nih.gov/gap