SNP Analysis (GAW15 data)


High Density SNP analysis of 642 Caucasian
Families with Rheumatoid Arthritis Identifies
Two Novel Regions of Linkage on
Chromosomes 11p12 and 2q33
Christopher I. Amos, Wei V. Chen, and NARAC
Previous Genome Scans using Microsatellites

We performed two independent genome-wide scans using microsatellites at 10 cM
intervals. Outside of the MHC, no genetic region has been identified that
meets accepted criteria for “definite” linkage. Table 4

A meta-analysis of data published by groups in the U.K., Europe, and the
U.S. provided evidence (p<0.01) for loci influencing RA risk on
chromosomes 6p, 6q, 16 (centromeric) and 12p.
Current Status for Worldwide RA Gene Searching

• Consistent associations clearly implicate the major histocompatibility
complex (MHC) on chromosome 6p21 in risk for rheumatoid arthritis.
– The MHC region makes the largest single contribution (relative recurrence risk ~1.8) to
disease susceptibility [7].
– A set of alleles at the DRB1 locus, many of which share a common polymorphic sequence, the
“shared epitope”, explains a large portion, but not all, of the genetic risk within the MHC.

• Recently, the R620W variant of the PTPN22 locus on chromosome 1p13 has been
shown to confer increased risk for rheumatoid arthritis, with odds ratios ranging
between 1.5 and 2.0 for heterozygotes, and over 3.0 for homozygous carriers of the
variant.
– This finding has been extensively replicated, and is now accepted as the most robust genetic
association with RA outside of the MHC.
– Interestingly, the 620W PTPN22 allele is also associated with several other autoimmune
disorders including type 1 diabetes, autoimmune thyroid disease, systemic lupus erythematosus
and some forms of juvenile arthritis.
– Additional variability in the PTPN22 locus may account for a smaller proportion of risk for
RA.

• Associations with CTLA4 (chromosome 2q33.1) and PADI4 (chromosome 1p36)
have also been supported in some additional studies, suggesting that these genes
may also contribute to RA susceptibility in Caucasian populations, but with rather
modest relative risks.
Goals of this study

• Obtain refined evidence for linkage using a larger sample size and denser
map of markers, thus improving informativity
• Evaluate evidence for linkage by clinical strata to identify more
homogeneous subgroups
• Comparison of microsatellite and SNP scans
Largest Single Genome Wide Linkage Scan in RA

• >5,700 informative SNP markers (Illumina IV SNP linkage panel)
– Data from 5744 markers passed quality control requirements in the lab (98.1%)
– Hardy-Weinberg disequilibrium detected in 5/4995 markers at p<=0.001 (excluding
chromosomes 6, X and XY)
– 18 markers on the Y chromosome were dropped; 866 markers were dropped for strong LD
(D’>0.7) from many analyses
– 293 markers are on chromosome X, 19 on XYp and 7 on XYq
• 642 Caucasian families containing affected sibling pairs with rheumatoid
arthritis, recruited by the North American Rheumatoid Arthritis
Consortium (NARAC)
– Table 1. Structure and sampling of 642 Caucasian sibling pair families
studied
Informativity of SNP scan Improved
• Information content (entropy), microsatellites: 0.526
• Information content including all SNP markers: 0.756
• Information content after excluding SNP markers in LD: 0.749

Information Content
Information content measures how much of the available inheritance
information a marker, or a map of markers, extracts from a collection
of pedigrees for use in a linkage analysis.
Information content is a function of marker
heterozygosity and the number of meioses in the
genetic study. For multipoint linkage analysis,
information content is also a function of marker density
and spacing.
It is important to have high information content
throughout the genome for genome-wide searches for
disease susceptibility loci or other traits so that regions
of no linkage can be excluded, regions of significant
linkage can be detected, and the linkage interval can be
accurately defined.
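
As a rough illustration of an entropy-based information content, the sketch below assumes the definition commonly used in multipoint linkage software: I = 1 - E/E0, where E is the Shannon entropy of the distribution over inheritance vectors given the marker data and E0 is the entropy of a uniform prior over the same vectors. The function name and the toy distributions are hypothetical and are not taken from the study.

```python
import math

def information_content(posterior):
    """Entropy-based information content at one position for one pedigree.

    posterior: probabilities of each possible inheritance vector given the
    marker data (must sum to 1). The prior is ASSUMED uniform over the same
    vectors, so its entropy is log2(len(posterior)).
    """
    if len(posterior) < 2:
        raise ValueError("need at least two inheritance vectors")
    if abs(sum(posterior) - 1.0) > 1e-9:
        raise ValueError("posterior must sum to 1")
    # Shannon entropy in bits; zero-probability vectors contribute nothing.
    entropy = -sum(p * math.log2(p) for p in posterior if p > 0)
    prior_entropy = math.log2(len(posterior))
    return 1.0 - entropy / prior_entropy

# Toy distributions (hypothetical, for illustration only).
fully_informative = information_content([1.0, 0.0, 0.0, 0.0])
uninformative = information_content([0.25, 0.25, 0.25, 0.25])
partly_informative = information_content([0.5, 0.5, 0.0, 0.0])
```

A value near 1 means the marker data almost fully determine inheritance at that position; a value near 0 means the data add little beyond the prior.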
Statistical Analysis
• Applied SNPLINK, a set of Perl scripts that manage SNP data and can call
Merlin
• Summarized evidence for linkage using Kong and Cox LOD scores from Merlin
• Analyzed the complete data, and again after eliminating markers showing
D’ > 0.7
• Reanalyzed pseudoautosomal regions after balancing same-sex and
opposite-sex pairs

Removal of Markers in LD


• Linkage analysis of tightly linked loci can lead to an excess of false positive results if
the markers are in strong linkage disequilibrium and parents are not available for
genotyping (Huang et al., 2004).
• We chose to remove markers that showed D’ values greater than 0.7, because earlier
analyses on simulated data have shown little inflation in LOD score if this criterion is
used.
– 866 markers were dropped for strong LD (D’>0.7) from analyses (15.12%); when only
autosomal chromosomes are considered, 784 of the 5407 markers tested were dropped
(14.50%).
– Over all markers, there was an average decrease in LOD score of 0.129 when markers in
linkage disequilibrium were dropped. For the autosomal chromosomes only, dropping
markers in LD led to a decrease of 0.120 in LOD score.
– Dramatic decreases in LOD scores were noted on a few chromosomes when markers in
LD were dropped; for example, on chromosome 21 the LOD score decreased from 11.59 to
1.11. The most prominent of these instances involved markers located at the telomeres,
and can be explained by the lack of any flanking markers that are not in linkage
disequilibrium with the set of markers.
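
The two LD measures used as cutoffs in these slides, D’ and R-square, can be computed from haplotype and allele frequencies. The sketch below uses the standard textbook definitions for two biallelic markers. The function names and example frequencies are hypothetical, and the slides do not describe how haplotype frequencies were estimated.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (D, D_prime, r_squared) for two biallelic markers.

    p_ab: frequency of the haplotype carrying allele A at marker 1 and
          allele B at marker 2.
    p_a, p_b: frequencies of allele A and allele B.
    """
    d = p_ab - p_a * p_b
    # D is scaled by its maximum possible magnitude given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r_squared = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r_squared

def in_strong_ld(p_ab, p_a, p_b, d_prime_cutoff=0.7):
    """True if the marker pair exceeds the D' cutoff (0.7 in these slides)."""
    return ld_stats(p_ab, p_a, p_b)[1] > d_prime_cutoff

# Hypothetical frequencies: D' is 1 while r-squared is only about 0.29,
# which is one reason the two cutoffs drop different sets of markers.
d, d_prime, r_squared = ld_stats(p_ab=0.3, p_a=0.3, p_b=0.6)
```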
Effects of Different Cutoffs for LD Removal


• R-square > 0.16: fewer markers dropped
• R-square > 0.05: also did not qualitatively change the results
– The LOD scores on chromosomes 2, 4, 7, 10 and 11 increased slightly, with the maximum
increase being from 2.35 to 2.55 on chromosome 4, while on chromosomes 5, 6, 12, 16 and 18
there were modest decreases in LOD score, with the largest decreases being on chromosome 18
from 1.47 to 1.08 and chromosome 5 from 2.55 to 2.32.
• The overall moderate changes in LOD scores argue strongly against false
positive results due to LD after adjustment for LD (D’< 0.7) between markers.
Effects of Untyped Parents


• We compared the LOD scores among families with no parents genotyped
(59%) versus families with at least one genotyped parent (41%) for the
major regions of linkage on chromosomes 2, 4, 7, 10 and 11.
• LOD scores remained positive in all family groups.
– On chromosomes 2, 7, and 11 the LOD scores from the families with one or
more parents typed were higher, while on chromosomes 4 and 10 the LOD
scores were higher for the set of families without typed parents.
– The only potential region of concern occurs on chromosome 4, for which the
maximum LOD score in families without typed parents was 2.50, while
among the families with at least one genotyped parent the LOD score was
only 0.08. However, the finding that LOD scores on chromosome 4 do not
decrease when restricting to R-squared values for LD among adjacent
markers of 0.05 or less indicates that the evidence for linkage on this
chromosome does not reflect a false positive due to LD among tightly linked
markers.
Effects of Large Centromere



• Since genetic maps are not available for the majority of SNPs in this
panel, we assumed that 1 megabase corresponds to 1 centiMorgan.
• Near the centromeres this mapping approach is inaccurate because the
centromeric regions have suppressed recombination.
• We repeated the analysis with recombination set to zero in the centromeric
regions of the 6 chromosomes with large centromeres (1, 3, 9, 11, 16, 19).
Excluding recombination in these regions led to slightly lower LOD scores.
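
The 1 Mb = 1 cM assumption, and the variant with no recombination across a centromere, can be sketched as follows. Haldane's map function is used here only for illustration; the slides do not say which map function was applied. The centromere boundaries in the example are hypothetical placeholders, while the two SNP positions are the chromosome 11 peak positions listed in the Results table.

```python
import math

def mb_to_cm(position_bp, cm_per_mb=1.0):
    """Convert a physical position to a map position, assuming 1 Mb = 1 cM."""
    return position_bp / 1e6 * cm_per_mb

def mb_to_cm_skipping(position_bp, cen_start_bp, cen_end_bp, cm_per_mb=1.0):
    """Same conversion, with zero recombination inside [cen_start, cen_end]."""
    if position_bp <= cen_start_bp:
        effective_bp = position_bp
    elif position_bp >= cen_end_bp:
        effective_bp = position_bp - (cen_end_bp - cen_start_bp)
    else:
        effective_bp = cen_start_bp
    return effective_bp / 1e6 * cm_per_mb

def haldane_theta(distance_cm):
    """Recombination fraction for a map distance (Haldane, no interference)."""
    return 0.5 * (1.0 - math.exp(-2.0 * distance_cm / 100.0))

# Distance between the two chromosome 11 peak SNPs from the Results table.
distance_cm = mb_to_cm(41024049) - mb_to_cm(39201341)
theta = haldane_theta(distance_cm)

# Hypothetical centromere interval (50 Mb to 55 Mb) on a toy chromosome.
collapsed_cm = mb_to_cm_skipping(60_000_000, 50_000_000, 55_000_000)
```

Collapsing the centromere shortens the map distance between markers on opposite arms, so markers flanking the centromere are treated as more tightly linked than their physical separation suggests.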
Effects of Markers in Hardy Weinberg Disequilibrium
• Hardy-Weinberg disequilibrium was detected in 5/4995 markers at p<=0.001
(excluding chromosomes 6, X and XY).
– Chromosome 6: known strong genetic factors
– This conforms well to the expected frequency of Hardy-Weinberg disequilibrium
for a set of markers this large.
• Of note, marker rs238510 at 103.49 megabases in the linkage peak on
chromosome 4 and just proximal to the potential candidate gene, B-cell
scaffold protein with ankyrin repeats 1 (BANK1), showed a significant
(p<=0.001) departure from Hardy-Weinberg equilibrium.
• Departures from Hardy-Weinberg equilibrium can occur for numerous
reasons, including genotype call failures and association between
marker alleles and disease susceptibility. Since the latter would be
expected for certain regions showing strong evidence for linkage, we
here only report results including all markers.
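
As a simple check on the Hardy-Weinberg figures, the sketch below computes a one-degree-of-freedom chi-square test for a single marker and the number of markers expected to reach p<=0.001 by chance among 4995 tests. It treats genotyped individuals as unrelated, which is an assumption made for illustration; the slides do not state which test was used, and the genotype counts in the example are hypothetical.

```python
import math

def hwe_chi2_pvalue(n_aa, n_ab, n_bb):
    """P-value of a 1-df chi-square test for Hardy-Weinberg equilibrium.

    n_aa, n_ab, n_bb: observed genotype counts at a biallelic marker.
    Individuals are ASSUMED unrelated in this simple version.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes")
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square variable with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2.0))

# Markers expected to reach p <= 0.001 by chance alone among 4995 tests.
expected_by_chance = 4995 * 0.001

# Hypothetical genotype counts, for illustration only.
p_in_equilibrium = hwe_chi2_pvalue(25, 50, 25)
p_no_heterozygotes = hwe_chi2_pvalue(50, 0, 50)
```

About 5 markers are expected at this threshold by chance, which matches the 5 observed.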
Chromosome 1
[Figure: LOD score versus position (Mb) on chromosome 1 for microsatellites, SNPs (wo LD) and SNPs (w LD).]
SNPs Identify a 2q Linkage
[Figure: LOD score versus position (Mb) on chromosome 2 for microsatellites, SNPs (wo LD) and SNPs (w LD).]
SNPs Identify an 11p Linkage
[Figure: LOD score versus position (Mb) on chromosome 11 for microsatellites, SNPs (wo LD) and SNPs (w LD).]
Chromosome 4
[Figure: LOD score versus physical distance (Mb) on chromosome 4, w-LD and wo-LD.]
Chromosome 5
[Figure: LOD score versus physical distance (Mb) on chromosome 5, w-LD and wo-LD.]
Chromosome 6
[Figure: LOD score versus physical distance (Mb) on chromosome 6, w-LD and wo-LD.]
Chromosome 7
[Figure: LOD score versus physical distance (Mb) on chromosome 7, w-LD and wo-LD.]
Chromosome 10
[Figure: LOD score versus physical distance (Mb) on chromosome 10, w-LD and wo-LD.]
Chromosome 18
[Figure: LOD score versus physical distance (Mb) on chromosome 18, w-LD and wo-LD.]
Reanalysis of pseudoautosomal regions after
balancing same-sex and opposite sex pairs



• 19 markers are on XYp and 7 are on XYq.
• Selection of sib pairs that include an excess of same-sex pairs leads to an
excess of sharing in the pseudoautosomal region, while opposite-sex pairs
show a decrease in sharing.
• There are more affected female-female sib pairs in the dataset than
affected female-male or male-male sib pairs, so even after dropping
male-male affected pairs there are still more female-female pairs than
female-male pairs (same-sex and opposite-sex pairs are not equal in
number). Two approaches were used to bring female-male pairs into balance
with sex-concordant pairs:
– 1. After dropping male-male pairs and 180 (37%) of female-female pairs,
the LOD scores for all SNPs dropped to less than 0 after LD removal.
– 2. When including all male-male pairs and dropping 236 (49%) of
female-female pairs, the LOD scores for all SNPs also dropped to less than 0
after LD removal.
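
The two balancing schemes can be sketched as a subsampling step. The sketch assumes female-female pairs were dropped at random, which the slides do not state; the function name, the tuple layout, and the toy pair counts are hypothetical.

```python
import random

def balance_pairs(pairs, keep_male_male, seed=0):
    """Subsample female-female pairs so same-sex pairs equal opposite-sex pairs.

    pairs: list of (pair_id, sex1, sex2) with sexes given as "F" or "M".
    keep_male_male: False mirrors scheme 1 (drop all male-male pairs);
                    True mirrors scheme 2 (keep all male-male pairs).
    Random selection of the pairs to drop is an ASSUMPTION of this sketch.
    """
    rng = random.Random(seed)
    ff = [p for p in pairs if p[1] == "F" and p[2] == "F"]
    mm = [p for p in pairs if p[1] == "M" and p[2] == "M"]
    fm = [p for p in pairs if p[1] != p[2]]
    if not keep_male_male:
        mm = []
    # Keep enough female-female pairs that FF + MM matches the FM count.
    n_ff_keep = min(max(len(fm) - len(mm), 0), len(ff))
    return rng.sample(ff, n_ff_keep) + mm + fm

def count_same_sex(pairs):
    return sum(1 for p in pairs if p[1] == p[2])

# Toy data (hypothetical counts): 10 FF, 2 MM and 6 FM pairs.
toy = ([(f"ff{i}", "F", "F") for i in range(10)]
       + [(f"mm{i}", "M", "M") for i in range(2)]
       + [(f"fm{i}", "F", "M") for i in range(6)])
scheme1 = balance_pairs(toy, keep_male_male=False)
scheme2 = balance_pairs(toy, keep_male_male=True)
```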
Results
Chromosome  LOD (w-LD)  LOD (wo-LD)  Peak SNP (w-LD)  Position (w-LD)  Peak SNP (wo-LD)  Position (wo-LD)
1           1.65        1.41         rs335523         211882862        rs1547502         216620855
2           4.02        3.52         rs1354905        189649014        rs1949429         192602000
4           3.78        2.35         rs223383         104209760        rs1384401         105023898
5           3.06        2.55         rs1857844        24569227         rs903391          43206838
6p          18.53       16.14        rs169679         28964566         rs11908           32991663
6q          5.32        0.84         (peak SNP entries missing from source; remaining value: 14880100)
Candidate Genes Near Linkage Peaks
Chr  SNP        Genes
1    rs1547502  LYPLAL1, TGFB2, ZNT8, EPRS
2    rs1949429  MYO1B, STAT1, STAT4, GLS, serum deprivation response (phosphatidylserine binding protein), transmembrane protein with EGF-like and two follistatin-like domains 2
4    rs1384401  Tachykinin receptor 3, CENPE, MANBA, NFKB, BANK1 (B-cell scaffold protein with ankyrin repeats 1)
Results
Chromosome  LOD (w-LD)  LOD (wo-LD)  Peak SNP (w-LD)  Position (w-LD)  Peak SNP (wo-LD)  Position (wo-LD)
7           2.49        1.94         rs903898         146244418        rs2040587         114230026
8           1.61        0.91         rs1735173        146076058        rs1375956         106521094
10          3.76        2.54         rs579142         60472666         rs1227938         70502871
11          3.92        3.09         rs2035693        39201341         rs1462224         41024049
12          5.09        1.46         rs7960480        132017084        rs2009625         22216512
16          1.73        1.69         rs1946155        53412897         rs1946155         53412897
Candidate Genes Near Linkage Peaks
Chr  SNP        Genes
5    rs903391   Chemokine L28, Selenoprotein SEPP1, HMG-CoA synthase
7    rs2040587  H1C, FoxP2, protein phosphatase 1, regulatory (inhibitor) subunit 3A, desert
10   rs1227938  Tachykinin receptor 2, HK1, transmembrane 4 superfamily member tetraspan NET-7, neurogenin 3, PROTEOGLYCAN 1
11   rs1462224  Desert, NGL1, TRAF6 (4Mb), CD44, API5
Results
Chromosome  LOD (w-LD)  LOD (wo-LD)  Peak SNP (w-LD)  Position (w-LD)  Peak SNP (wo-LD)  Position (wo-LD)
17          1.74        0.97         rs411602         69136700         rs764426          67565205
18          1.9         1.47         rs1792723        52015361         rs663220          52349253
19          4.52        0.21         rs1465789        63637868         rs306450          61192862
20          3.11        1.02         rs1434789        132900           rs914433          55196924
21          11.59       1.11         rs2835629        37442056         rs2837710         40861199
X           3.22        1.17         rs2015312        56385990         rs2057652         10529107
XY          8.69        3.91         rs2535444        2282688          rs700447          153441492
Candidate Genes Near Linkage Peaks
Chr  SNP        Genes
12   rs2009625  Cytidine 5-prime-monophosphate N-acetylneuraminic acid synthetase, ABCC9, STAT8A
16   rs1946155  RBL2, FTS, fto (Fatso), MMP2, SLC6A2
XY   rs1462224  SYBL1, SPRY3, IL9R
Stratified Analysis

• Previous studies identified major differences among some strata, notably
when stratifying by sex and shared epitope, and when adjusting for antiCCP
levels.
• Individuals with antiCCP titers less than 20 were categorized as negative
and individuals with antiCCP titers of 20 or higher were classified as
positive.
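
The antiCCP rule above amounts to a single threshold. A minimal sketch follows, with hypothetical function names and made-up titers; the slides give the cutoff of 20 but not the titer units.

```python
def anti_ccp_status(titer, cutoff=20):
    """Classify a titer: below the cutoff is negative, at or above is positive."""
    return "positive" if titer >= cutoff else "negative"

def stratify_by_anti_ccp(titers_by_person, cutoff=20):
    """Group person IDs into antiCCP-positive and antiCCP-negative strata."""
    strata = {"positive": [], "negative": []}
    for person_id, titer in titers_by_person.items():
        strata[anti_ccp_status(titer, cutoff)].append(person_id)
    return strata

# Made-up titers, for illustration only.
strata = stratify_by_anti_ccp({"p1": 5, "p2": 20, "p3": 143.5, "p4": 19.9})
```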
Conclusions

• Strong evidence for linkage to chromosomes 2, 6 and 11
– Prominent and previously unreported linkage peaks are observed on
chromosomes 2q33 and 11p12 (within a ‘gene desert’) with LOD scores
of 3.52 and 3.09 respectively, after adjustment for LD (D’< 0.7) between
markers.
– Broad linkage signal on chromosome 6
• Stratification/covariate effects most pronounced for antiCCP strata for
chromosomes 4, 5, 6 and 7
• HLA shared epitope status and sex had significant impact on linkage
evidence within the MHC region of chromosome 6 only.
Future Study


• Follow-up association studies on the 2q33 and 11p12 regions, and on the antiCCP+ disease subset
• Fine mapping and gene identification, supplemented by the selection of
candidate genes based on evolving knowledge of the disease
The End.
Thank you!