Transcript SNP

Genotyping: principle, technology, application, and practice
Jer-Yuarn Wu 鄔哲源
National Genotyping Center at Academia Sinica
“Genotype” vs “Phenotype”
Genotype: the genetic makeup of an organism, i.e., the set of DNA variants found at one or more loci in an individual, as distinguished from its physical appearance (phenotype).
Phenotype: our observable features, which differ greatly between individuals, e.g., skin color, eye shape and color, hair texture; drug efficacy/sensitivity; predisposition to complex diseases such as hypertension and diabetes.
Genotype determines phenotype
Human Genome Sequence Variations
1. Restriction Fragment Length Polymorphism (RFLP)
2. Variable Number of Tandem Repeat (VNTR)
3. Short Tandem Repeat Polymorphism (STRP)
4. Single Nucleotide Polymorphism (SNP)
Restriction Fragment Length Polymorphism (RFLP)
Restriction enzyme cleavage site (4-8 bp): EcoRI recognizes GAATTC; the variant GGATTC is not cut.
[Figure: EcoRI digestion of paternal (P) and maternal (M) alleles carrying GAATTC or GGATTC]
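The digestion step can be illustrated by scanning a sequence for the EcoRI site and computing fragment lengths. This is a minimal Python sketch, assuming complete digestion and looking only at the top strand; the function name and the example sequences are hypothetical.

```python
import re

ECORI_SITE = "GAATTC"  # EcoRI recognition sequence; the enzyme cuts G^AATTC


def ecori_fragments(seq: str) -> list[int]:
    """Return top-strand fragment lengths after complete EcoRI digestion."""
    seq = seq.upper()
    # Each cut falls one base into its recognition site (after the G).
    cuts = [m.start() + 1 for m in re.finditer(ECORI_SITE, seq)]
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


# A GAATTC allele is cut in two; the GGATTC variant stays intact.
print(ecori_fragments("AAAAGAATTCTTTT"))  # [5, 9]
print(ecori_fragments("AAAAGGATTCTTTT"))  # [14]
```

The two alleles therefore yield different fragment sizes, which is what the gel separates.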
Variable Number of Tandem Repeat (VNTR)
Minisatellite (repeat unit 20-50 bp)
[Figure: VNTR alleles on paternal (P) and maternal (M) chromosomes]
Short Tandem Repeat Polymorphism (STRP)
Microsatellite (repeat unit 2-5 bp)
[Figure: STRP alleles on paternal (P) and maternal (M) chromosomes]
What is STRP?
Weber JL and May PE, 1989. Abundant class of human DNA polymorphisms which can be typed using the polymerase chain reaction. Am J Hum Genet 44, 388-396.
Short Tandem Repeat Polymorphism (also
known as microsatellite or simple sequence
length polymorphism)
Characteristics of STRP
Multiple copies of an identical DNA sequence arranged in direct succession in a particular region of a chromosome
Highly polymorphic, abundant, multiallelic DNA polymorphism
E.g., the sequence “GCGCGCGCGCGC” represents six copies of the dimer “GC”
Repeat unit normally 2-5 base pairs
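Counting repeat copies, as in the “GC” example, can be sketched as a scan for the longest in-frame run of a motif. A minimal Python illustration; the function name is hypothetical and real STRP genotyping measures PCR product length rather than reading the sequence.

```python
def longest_repeat_run(seq: str, motif: str) -> int:
    """Return the largest number of consecutive, in-frame copies of motif in seq."""
    if not motif:
        raise ValueError("motif must be non-empty")
    seq, motif = seq.upper(), motif.upper()
    best = 0
    for start in range(len(seq)):
        copies, pos = 0, start
        # Walk forward one motif length at a time while the motif keeps matching.
        while seq.startswith(motif, pos):
            copies += 1
            pos += len(motif)
        best = max(best, copies)
    return best


print(longest_repeat_run("GCGCGCGCGCGC", "GC"))  # 6
```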
Application of STRP
Forensic Tests
Paternity Tests
Genetic Study of Hereditary Diseases
Molecular Evolution and Phylogenetics
PCR Amplification
Fluorescence-labeled primer (blue); PCR product length = 40 bp flank + repeat region + 60 bp flank
9 AT repeats marker: 40 + 18 + 60 = 118 bp
7 AT repeats marker: 40 + 14 + 60 = 114 bp
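The product-length arithmetic above, and its inverse (inferring the repeat number from a measured length), can be sketched as follows. The 40 bp and 60 bp flanks and the 2 bp AT unit come from the slide; the function names are hypothetical, and real capillary sizes are non-integer and must first be size-called.

```python
def amplicon_length(repeats: int, unit_bp: int = 2,
                    left_flank_bp: int = 40, right_flank_bp: int = 60) -> int:
    """PCR product length = left flank + repeat region + right flank."""
    return left_flank_bp + repeats * unit_bp + right_flank_bp


def repeats_from_length(length_bp: int, unit_bp: int = 2,
                        left_flank_bp: int = 40, right_flank_bp: int = 60) -> int:
    """Invert amplicon_length(); reject sizes that do not fit a whole number of repeats."""
    core = length_bp - left_flank_bp - right_flank_bp
    if core < 0 or core % unit_bp:
        raise ValueError(f"{length_bp} bp is not consistent with whole {unit_bp}-bp repeats")
    return core // unit_bp


print(amplicon_length(9), amplicon_length(7))  # 118 114
print(repeats_from_length(118))                # 9
```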
Instrument
ABI 384-well GeneAmp® PCR system 9700
Pooling PCR Product
Principle of pooling PCR products:
Different dyes with the same size range can be pooled into one tube
The same dye with different sizes can be pooled into one tube
Benefit of pooling
Cost saving
Throughput increase
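The two pooling rules can be turned into a simple tube-assignment procedure. This Python sketch is an assumption-laden illustration: the dye labels, size ranges, and the 10 bp safety gap between same-dye markers are hypothetical, and the greedy first-fit strategy is not guaranteed to use the fewest tubes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Marker:
    name: str
    dye: str      # hypothetical dye label, e.g. "blue"
    min_bp: int   # smallest expected allele size
    max_bp: int   # largest expected allele size


def can_share_tube(a: Marker, b: Marker, gap_bp: int = 10) -> bool:
    """Different dyes always pool; the same dye pools only if size ranges are gap_bp apart."""
    if a.dye != b.dye:
        return True
    return a.max_bp + gap_bp < b.min_bp or b.max_bp + gap_bp < a.min_bp


def assign_pools(markers: list[Marker], gap_bp: int = 10) -> list[list[Marker]]:
    """Greedy first-fit: put each marker into the first tube it is compatible with."""
    pools: list[list[Marker]] = []
    for m in sorted(markers, key=lambda x: (x.dye, x.min_bp)):
        for pool in pools:
            if all(can_share_tube(m, other, gap_bp) for other in pool):
                pool.append(m)
                break
        else:
            pools.append([m])
    return pools


markers = [
    Marker("D1", "blue", 100, 130),
    Marker("D2", "green", 100, 130),
    Marker("D3", "blue", 200, 240),
    Marker("D4", "blue", 110, 140),
]
for i, pool in enumerate(assign_pools(markers)):
    print(i, [m.name for m in pool])
# 0 ['D1', 'D3', 'D2']
# 1 ['D4']
```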
Instrument
Electrophoresis system
ABI 3730 48-capillary DNA analyzer
Electrophoresis Analysis
[Figure: capillary electrophoresis with laser, cathode, anode, sensor, and data collection]
Software
Genotyping software: ABI GeneMapper® v3.0
Size calling
Allele calling
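Size calling and allele calling can be sketched as two small steps: convert a peak's scan position into base pairs using a co-run size standard, then snap the size to the nearest allele bin. This is a simplified stand-in (plain linear interpolation), not GeneMapper's own sizing algorithm; the size-standard values, bins, and 0.5 bp tolerance are hypothetical.

```python
from bisect import bisect_right
from typing import Optional


def size_peak(scan: float, std_scans: list[float], std_bp: list[float]) -> float:
    """Estimate fragment size by linear interpolation between flanking size-standard peaks."""
    if not std_scans[0] <= scan <= std_scans[-1]:
        raise ValueError("peak lies outside the size-standard range")
    i = min(bisect_right(std_scans, scan), len(std_scans) - 1)
    x0, x1 = std_scans[i - 1], std_scans[i]
    y0, y1 = std_bp[i - 1], std_bp[i]
    return y0 + (scan - x0) * (y1 - y0) / (x1 - x0)


def call_allele(size_bp: float, bins: dict[str, float],
                tolerance_bp: float = 0.5) -> Optional[str]:
    """Assign a sized peak to the nearest allele bin, or None if no bin is close enough."""
    label, center = min(bins.items(), key=lambda kv: abs(kv[1] - size_bp))
    return label if abs(center - size_bp) <= tolerance_bp else None


std_scans, std_bp = [1000, 1200, 1400], [100, 120, 140]  # hypothetical size standard
bins = {"7 repeats": 114.0, "9 repeats": 118.0}
size = size_peak(1180, std_scans, std_bp)
print(size, call_allele(size, bins))  # 118.0 9 repeats
```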
Results of STRP genotyping
Genotyping Quality Control
Check genotypes of CEPH (Centre d'Etude du
Polymorphisme humain) controls
Check genotypes of identical samples
Missing rate of each marker
Missing rate of each sample
Missing rate of each batch
Mendelian inheritance errors, if family information is available
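Several of these QC checks reduce to simple counting. A minimal Python sketch, assuming genotypes are stored per sample as unordered allele pairs with None for a failed call; the data layout and function names are hypothetical. The missing rate per batch is the per-sample or per-marker computation applied to the samples of one batch.

```python
from typing import Optional

Genotype = Optional[tuple[str, str]]  # None marks a failed (missing) call


def missing_rate_by_marker(calls: dict[str, dict[str, Genotype]]) -> dict[str, float]:
    markers = sorted({m for per_sample in calls.values() for m in per_sample})
    n = len(calls)
    return {m: sum(calls[s].get(m) is None for s in calls) / n for m in markers}


def missing_rate_by_sample(calls: dict[str, dict[str, Genotype]]) -> dict[str, float]:
    return {s: sum(g is None for g in per_sample.values()) / len(per_sample)
            for s, per_sample in calls.items()}


def mendelian_consistent(child: Genotype, father: Genotype, mother: Genotype) -> bool:
    """True unless the child's genotype cannot be formed from one paternal and one maternal allele."""
    if child is None or father is None or mother is None:
        return True  # cannot be evaluated
    a, b = child
    return (a in father and b in mother) or (b in father and a in mother)


calls = {
    "S1": {"M1": ("A", "G"), "M2": None},
    "S2": {"M1": None, "M2": ("C", "C")},
}
print(missing_rate_by_marker(calls))  # {'M1': 0.5, 'M2': 0.5}
print(missing_rate_by_sample(calls))  # {'S1': 0.5, 'S2': 0.5}
print(mendelian_consistent(("G", "G"), ("A", "A"), ("G", "T")))  # False
```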
Human Genome Unraveling in 2001
What is a SNP?
A variation in the DNA sequence at a single base position. Example order of bases in a section of DNA on a chromosome:
...C C A T T G A C...
...G G T A A C T G...
...C C G T T G A C...
...G G C A A C T G...
Some people have a different base at a given location (here, A-T vs. G-C at the third position).
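Locating the variable position in the example amounts to comparing aligned sequences column by column. A minimal Python sketch, assuming the sequences are already aligned to equal length; the function name is hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def snp_positions(seqs: list[str]) -> list[int]:
    """Return 0-based positions where aligned, equal-length sequences differ."""
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    return [i for i, bases in enumerate(zip(*seqs)) if len(set(bases)) > 1]


top_1, top_2 = "CCATTGAC", "CCGTTGAC"
print(snp_positions([top_1, top_2]))  # [2]
print(top_1.translate(COMPLEMENT))    # GGTAACTG, the paired strand in the example
```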
Single Nucleotide Polymorphism (SNP)
[Figure: SNP classes (gSNP, cSNP, rSNP, iSNP, nSNP) by location: intergenic vs. intragenic region, coding vs. non-coding; functional variants (5%)]
Features of Single Nucleotide
Polymorphisms (SNPs)
The variant sequence type has a frequency of at least 1% in the population.
High frequency of SNPs in the human genome: estimated ~1 SNP/kb.
SNPs are bi-allelic markers with 2 common nucleotide-substitution alleles (only 0.1% of SNPs in the TSC data are tri-allelic).
SNPs have a lower mutation rate than repeat sequences, but are less informative than microsatellite markers.
Detection methods for SNPs are potentially better suited to automated, large-scale genetic screening.
SNPs, especially cSNPs, are likely to be responsible for functional changes underlying disease.
Allele frequencies of some SNPs tend to be population-specific.
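Why a bi-allelic SNP is less informative than a multiallelic microsatellite can be made concrete with expected heterozygosity, H = 1 - Σ p_i². The allele frequencies below are hypothetical; the point is that H can never exceed 0.5 with two alleles.

```python
def expected_heterozygosity(freqs: list[float]) -> float:
    """H = 1 - sum(p_i^2): chance that two randomly drawn alleles differ."""
    if abs(sum(freqs) - 1.0) > 1e-6:
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 - sum(p * p for p in freqs)


print(expected_heterozygosity([0.5, 0.5]))   # 0.5, the maximum for a bi-allelic SNP
print(expected_heterozygosity([0.125] * 8))  # 0.875 for 8 equally frequent STRP alleles
```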
Application of SNP
Markers for whole genome linkage study of monogenic
diseases
Disease loci fine-mapping
Candidate genes association study
Whole genome association study of common diseases
Adverse drug reaction
Personalized medicine
SNP genotyping
Genome-wide scans for linkage analysis: 2,800-4,600 markers
Candidate gene (LD mapping): 100 to 1,000 markers per cM
Genome-wide scans for association studies:
~3,000,000 SNP markers
If a haplotype map is constructed, ~200,000 markers
Genome-wide Linkage Analysis
Microsatellite markers
Genome-wide scan with 10 cM microsatellite markers
Advantage: low cost, high heterozygosity
Disadvantage: slow (genotyping and allele calling are difficult to automate), widely spaced
SNP markers
Genome-wide linkage scan with ~4,600 SNPs (~1 cM spacing)
Advantage: high throughput (genotyping and allele calling are easy to automate), densely spaced
Disadvantage: high cost, low heterozygosity
Disease Gene Mapping
[Figure: marker density along a chromosome: microsatellites, SNPs, genes, and the disease gene]
Microsatellite markers are too widely spaced
to localize the disease gene.
SNPs are distributed densely (>1 SNP/kb).
What is “Haplotype”?
A set of closely linked alleles (genes or DNA polymorphisms) inherited as a unit; a contraction of the phrase "haploid genotype".
Haploid: having a single set of chromosomes, as in the egg and sperm cells of animals.
Different combinations of polymorphisms are known as haplotypes.
Collectively, the results from several loci can be referred to as a haplotype.
"Haplo" comes from the Greek word for "single".
International Haplotype Mapping
(HapMap) Project
The Hypothesis
Linkage disequilibrium (LD) occurs in blocks (10-100 kb) of low haplotype diversity
Few SNPs are required to test common variation in each block for association with a phenotype
LD mapping is more evident with haplotypes than with single markers
Haplotype blocks are population-specific
Block size: Chinese > Caucasian > African
Linkage disequilibrium: the occurrence of some alleles (SNPs) together more often than would be expected by chance
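"More often than expected" is usually quantified with D = p_AB - p_A·p_B and its normalized forms D' and r². A minimal Python sketch, assuming phased haplotypes are already available (in practice they are often inferred from genotypes); the function name and example haplotypes are hypothetical.

```python
def ld_stats(haplotypes: list[tuple[int, int]]) -> tuple[float, float, float]:
    """Return (D, D', r^2) for two bi-allelic loci from phased haplotypes.

    Each haplotype is (allele at locus 1, allele at locus 2), coded 0 or 1.
    D' is signed here; |D'| is what is usually reported.
    """
    n = len(haplotypes)
    p_a = sum(h[0] for h in haplotypes) / n          # allele-1 frequency, locus 1
    p_b = sum(h[1] for h in haplotypes) / n          # allele-1 frequency, locus 2
    p_ab = sum(h == (1, 1) for h in haplotypes) / n  # frequency of haplotype 1-1
    d = p_ab - p_a * p_b                             # deviation from independence
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max if d_max > 0 else 0.0
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2


complete = [(1, 1)] * 5 + [(0, 0)] * 5
independent = [(1, 1), (1, 0), (0, 1), (0, 0)]
print(ld_stats(complete))     # (0.25, 1.0, 1.0)
print(ld_stats(independent))  # (0.0, 0.0, 0.0)
```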
International Haplotype Mapping
(HapMap) Project
The Project
$100 million investment over three years
1.5 million SNP assays to be developed
Participating countries include the United States, United Kingdom, Canada, Japan, and China
Study populations: Caucasian, African, Japanese, Chinese
Cost and throughput consideration for genomewide association study using SNP markers
Assume the average size of linkage disequilibrium (LD) is 30-50 kb:
• 20 kb per SNP marker
• 150,000 SNP markers analyzed per association study
• At least 1,000 patients and 1,000 controls in a cohort
• Cost per genotype: 0.01 US dollar
This gives a total of 150,000 × 2,000 = 300,000,000 SNP genotypes per study, to be tested for association with the available phenotypes.
Cost: 300,000,000 × 0.01 = 3,000,000 US dollars
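The budget arithmetic can be written as a small function so the assumptions are easy to vary. This sketch assumes a ~3 Gb human genome (implied by 20 kb per marker giving 150,000 markers); the function name is hypothetical.

```python
GENOME_BP = 3_000_000_000  # assumption: ~3 Gb human genome


def gwas_budget(spacing_bp: int = 20_000, cases: int = 1_000, controls: int = 1_000,
                usd_per_genotype: float = 0.01) -> tuple[int, int, float]:
    """Return (markers, genotypes, cost in US dollars) for a genome-wide case/control scan."""
    markers = GENOME_BP // spacing_bp
    genotypes = markers * (cases + controls)
    return markers, genotypes, genotypes * usd_per_genotype


markers, genotypes, cost = gwas_budget()
print(f"{markers:,} markers, {genotypes:,} genotypes, ${cost:,.0f}")
# 150,000 markers, 300,000,000 genotypes, $3,000,000
```

Halving the marker spacing or doubling the cohort doubles both the genotype count and the cost.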
Factors in SNP Technology
Platform considerations
Cost
Throughput
Accuracy
Experimental design
Multiplex?
Pooling?
A workflow chart for SNP genotyping
Target fragment amplification
PCR (all methods except the Invader assay)
Specific amplicon: filter out repeat sequences before designing PCR primers
PCR clean-up to remove unincorporated dNTPs (SAP) and primers (Exo I)
Uniplex and/or multiplex
Strand displacement amplification
Biochemistry of Allelic Discrimination
Biochemical feature: high specificity and accuracy
Enzymes or processes used:
DNA ligases: the most specific
DNA polymerases: vary in specificity
Hybridization: relatively lower discriminating power
Allele discrimination based on
DNA polymerases
Utilizes features of DNA polymerase:
DNA synthesis activity in an accurate and precise way
5’ and 3’ exonuclease activity
Methods:
Single-base extension or mini-sequencing
Allele-specific extension and allele-specific PCR
Synchronized DNA synthesis: pyrosequencing
Structure-specific cleavage: the Invader assay
The 5’ exonuclease activity: the TaqMan assay
Single-base extension
A primer is extended by one base, a chain-terminating ddNTP, at the polymorphic site
SBE is highly specific, error rate < 10^-4
Superior discrimination
Methods for allele-specific product identification
Mass spectrometry, MALDI-TOF
Microarray-based detection
Fluorescent resonance energy transfer (FRET) detection
Fluorescent polarization (FP)
Allele-specific extension (ASE)
and allele-specific PCR (AS-PCR)
ASE and AS-PCR utilize the difference in extension efficiency between matched and mismatched 3’ bases
ASE requires two allele-specific primers that anneal to the target with their 3’ bases matching the two alleles of an SNP
Less discrimination power than SBE
Synchronized DNA synthesis: pyrosequencing
When DNA polymerase incorporates a nucleotide onto the 3’ end of the primer, it releases pyrophosphate.
In pyrosequencing, the pyrophosphate triggers an enzymatic cascade that produces a detectable light signal
High accuracy and quantifiable (suitable for pooling)
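Because each dispensed nucleotide produces a signal proportional to the number of bases incorporated, a pyrogram can be simulated directly, and a heterozygote or DNA pool appears as a weighted mix of the two allele pyrograms. A minimal Python sketch, assuming ideal, complete extension and a linear light response; the sequences and dispensation order are hypothetical.

```python
def pyrogram(synthesized: str, dispensation: str) -> list[int]:
    """Peak height per dispensed nucleotide = length of the homopolymer run incorporated."""
    pos, peaks = 0, []
    for nt in dispensation:
        run = 0
        while pos < len(synthesized) and synthesized[pos] == nt:
            run += 1
            pos += 1
        peaks.append(run)
    return peaks


def mixed_pyrogram(allele1: str, allele2: str, dispensation: str,
                   frac1: float = 0.5) -> list[float]:
    """Signal from a mixture: frac1 = allele-1 fraction (0.5 for a heterozygote, or a pool frequency)."""
    p1, p2 = pyrogram(allele1, dispensation), pyrogram(allele2, dispensation)
    return [frac1 * a + (1 - frac1) * b for a, b in zip(p1, p2)]


print(pyrogram("AAGT", "ACGT"))           # [2, 0, 1, 1]
print(mixed_pyrogram("GT", "CT", "CGT"))  # [0.5, 0.5, 1.0]
```

The peak ratio at the SNP dispensations is what makes the method quantifiable for pooled samples.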
Principle of pyrosequencing system
Structure-specific cleavage: the Invader assay
An archaeal DNA polymerase has an endonuclease activity that requires a specific substrate structure
Structural features:
Bifurcated duplex with a free 5’ end
At least one base of overlap between the strand with the free 5’ end and the strand annealed to the template
The enzyme cleaves the strand with the free 5’ end when one or more overlapping bases are present
The Invader assay design uses 3 oligo probes: two allele-specific probes and one invasive probe
Highly specific
Genotypes SNPs directly from genomic DNA without PCR amplification
The quality of the Invader probe is critical
The Invader assay
The 5’ exonuclease activity: TaqMan assay
The TaqMan assay exploits the 5’ nuclease activity of DNA polymerase
Two dual-labeled fluorescent probes anneal to the SNP site
The TaqMan assay exploits the difference in stability between perfectly matched and single-base mismatched duplexes
Requires optimization because of the subtle difference between perfectly matched and mismatched duplexes
Probe design is empirical
Detection by fluorescent resonance energy transfer (FRET)
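After the run, each well gives one reporter signal per allele probe (commonly FAM and VIC dyes), and the genotype follows from their relative strength. This Python sketch uses a simple signal-fraction rule with hypothetical thresholds; it is not the clustering method of any specific instrument software, which typically clusters all wells on a plate together.

```python
from typing import Optional


def call_taqman(allele1_signal: float, allele2_signal: float,
                min_total: float = 1.0, homo_frac: float = 0.8) -> Optional[str]:
    """Call a genotype from background-corrected reporter signals of the two allele probes.

    Thresholds are illustrative assumptions.
    """
    total = allele1_signal + allele2_signal
    if total < min_total:
        return None  # too little signal: undetermined
    frac1 = allele1_signal / total
    if frac1 >= homo_frac:
        return "1/1"
    if frac1 <= 1.0 - homo_frac:
        return "2/2"
    return "1/2"


for signals in ((5.0, 0.5), (3.0, 3.0), (0.3, 4.0), (0.2, 0.3)):
    print(signals, call_taqman(*signals))
```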
TaqMan genotyping assay
Allelic discrimination based on
DNA ligases
DNA ligases join the ends of two oligonucleotides annealed next to each other on a template
Ligation occurs only if the oligonucleotides are perfectly matched to the template
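The ligation rule can be modeled as a string match: an allele-specific oligo (ASO) and a locus-specific oligo (LSO) are joined only where both match adjacent target positions exactly. In this Python sketch, probes are written as the target-strand sequence they must match (strand orientation is deliberately ignored), and all sequences are hypothetical.

```python
def ligates(chromosome: str, aso: str, lso: str) -> bool:
    """Model: ligation happens only if ASO and LSO match adjacent target positions exactly."""
    return (aso + lso) in chromosome


def ola_genotype(chromosomes: list[str], asos: dict[str, str], lso: str) -> list[str]:
    """Return the alleles whose allele-specific oligo ligated on any chromosome copy."""
    return sorted(allele for allele, aso in asos.items()
                  if any(ligates(c, aso, lso) for c in chromosomes))


asos = {"A": "GATCA", "G": "GATCG"}  # hypothetical ASOs; the 3' base sits on the SNP
lso = "TTGC"                          # hypothetical locus-specific oligo
allele_a, allele_g = "CCGATCATTGCAA", "CCGATCGTTGCAA"
print(ola_genotype([allele_a, allele_g], asos, lso))  # ['A', 'G'] (heterozygote)
print(ola_genotype([allele_a, allele_a], asos, lso))  # ['A']
```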
Oligonucleotide ligation genotyping
Allelic Discrimination based on
Hybridization
SNP genotyping by hybridization exploits the subtle difference in thermodynamics between perfectly matched and single-base mismatched duplexes
Detection basis:
Steady-state
Dynamic
Microarray genotyping
Mechanisms for product detection
and identification
Homogeneous detection mechanism
Fluorescent resonance energy transfer (FRET)
Fluorescent polarization (FP)
Solid-phase-mediated detection mechanism
Identification of products by mass spectrometry
Microarray as supporter for genotyping
Fluorescent coded microbeads as supporters
Separation of products by electrophoresis
Fluorescent resonance energy
transfer (FRET)
FRET occurs between two fluorescent groups when they are in physical proximity and the emission spectrum of one fluorophore (the donor) overlaps the excitation spectrum of the other (the acceptor).
When FRET occurs and the donor is excited, donor emission is quenched and acceptor emission increases
FRET can be monitored by quenching of donor emission or increase of acceptor emission
Strategies for FRET detection
Separation of two closely spaced fluorescent groups
Bringing two distant fluorescent groups together to promote energy transfer between the fluorophores
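Both strategies rely on the steep distance dependence of transfer efficiency, given by the Förster relation E = 1 / (1 + (r/R0)^6), where R0 is the distance at which half the energy is transferred. The Python sketch below evaluates it for a hypothetical R0 of 5 nm; separating the dyes (for example by probe cleavage) lowers E and restores donor emission.

```python
def fret_efficiency(distance_nm: float, r0_nm: float) -> float:
    """Förster relation: E = 1 / (1 + (r / R0)^6)."""
    if distance_nm < 0 or r0_nm <= 0:
        raise ValueError("distance must be >= 0 and R0 > 0")
    return 1.0 / (1.0 + (distance_nm / r0_nm) ** 6)


R0 = 5.0  # hypothetical Förster distance in nm
for r in (2.5, 5.0, 10.0):
    e = fret_efficiency(r, R0)
    # Relative donor emission falls as transfer efficiency rises.
    print(f"r = {r:4.1f} nm  E = {e:.3f}  donor emission ~ {1 - e:.3f}")
```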
FRET detection
SNP genotyping technologies that use FRET
Successfully used to detect allele-specific products from SBE, ASE, DNA hybridization, DNA ligation, and structure-specific cleavage
SNP technologies that utilize FRET:
The TaqMan 5’ nuclease assay
Molecular beacons
The Invader assay
The Scorpion AS-PCR assay
Identification of products by mass
spectrometry
Discriminates alleles by their mass difference
A combination of ddNTPs and dNTPs for SBE extension
Multiplexable
Higher throughput and accuracy
One of the best quantification methods for allele frequency estimation in pooled samples
Microarray as supporter for
genotyping
High density (10,000-100,000 SNPs available)
High throughput (half a million SNPs per day)
Well-characterized and unique DNA sequences attached to the microarray
Arraying the entire target sequence on the chip using multiple overlapping oligonucleotides: resequencing by hybridization
Novel SNP discovery
Cost
Low accuracy? (accuracy improved by multiple hybridizations at one SNP)
Linkage mapping, linkage disequilibrium, and association studies
Small numbers of subjects and large numbers of SNPs
Fluorescent coded microbeads as
supporters
Similar to arrays: well-characterized and unique DNA sequences are attached to microbeads that carry a fluorescent signature
Capable of genotyping large numbers of SNPs in parallel
More flexible than chips
Cost competitive with chips
Separation of products by
electrophoresis
Tedious protocol
High cost
DHPLC, SNaPshot, SNuPe, etc.
Direct sequencing (gold standard)
Single-strand conformation polymorphism analysis
Shi MM, Clin Chem, 2001
Direct heterozygote sequencing
Cost structure of SNP genotyping
Instrument and initial setup costs
Fixed cost per SNP marker
Consumable cost proportional to usage
Technician time: depends on protocol and automation
Maintenance cost
MassARRAY Genotyping Platform
Utilizing MALDI-TOF Mass Spectrometer
MALDI-TOF MS system
The DNA fragment (extension product) is mixed with matrix
Laser beam
The matrix absorbs laser energy and transfers it to the DNA fragment
The DNA fragment enters the gas phase and is ionized
The ionized DNA fragment is accelerated by an electric field and flies through a vacuum
The detector registers the ions and produces a signal; flight time is proportional to the square root of the mass-to-charge ratio (m/z), so lighter ions arrive first
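The flight-time relation follows from energy conservation: an ion of charge ze accelerated through potential U gains zeU = ½mv², so over a drift length L the flight time is t = L·√(m / (2zeU)). The Python sketch below assumes an idealized linear TOF (time spent in the acceleration region and delayed extraction are ignored); the 20 kV, 1 m, and the masses are hypothetical.

```python
import math

DALTON_KG = 1.66053906660e-27          # 1 Da in kg
ELEMENTARY_CHARGE_C = 1.602176634e-19  # charge of one proton in coulombs


def flight_time_us(mass_da: float, charge: int = 1,
                   accel_v: float = 20_000.0, drift_m: float = 1.0) -> float:
    """Idealized linear TOF: z*e*U = m*v^2/2, so t = L / v = L * sqrt(m / (2*z*e*U))."""
    m = mass_da * DALTON_KG
    q = charge * ELEMENTARY_CHARGE_C
    velocity = math.sqrt(2.0 * q * accel_v / m)
    return drift_m / velocity * 1e6


# Hypothetical primer and extension-product masses; the heavier ion arrives later.
for mass in (7000.0, 7300.0):
    print(mass, round(flight_time_us(mass), 3))
```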
MassARRAY process
1. DNA isolation
2. Target amplification
3. SAP treatment and primer extension reaction
4. Conditioning and nano-dispensing of products onto SpectroCHIP
5. MALDI-TOF mass spectrometry
6. Automated data analysis and allele calling
Principle of SNP genotyping
[Figure: SAP treatment to neutralize free dNTPs, followed by primer extension with terminators]
Mass Spectrometry for Analysis
and Scoring
Haff and Smirnov, Genome
Res. 7 (1997): 378
Use mass spectrometry to score which base(s) are added
Multiplex 4-10 assays with known primer masses
Pool 50 to 500 samples
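For pooled samples, the allele frequency is estimated from the relative peak areas of the two extension products. One commonly described approach corrects for unequal signal strength with a factor k, the mean allele-1/allele-2 area ratio observed in known heterozygotes: p̂ = A1 / (A1 + k·A2). This Python sketch uses that correction as an assumption, not as the platform's documented algorithm; the areas are hypothetical.

```python
def heterozygote_ratio(het_areas: list[tuple[float, float]]) -> float:
    """k = mean(area1 / area2) over individually genotyped heterozygotes."""
    return sum(a1 / a2 for a1, a2 in het_areas) / len(het_areas)


def pooled_allele_freq(area1: float, area2: float, k: float = 1.0) -> float:
    """Estimate allele-1 frequency in a DNA pool as A1 / (A1 + k * A2)."""
    denom = area1 + k * area2
    if denom <= 0:
        raise ValueError("peak areas must be positive")
    return area1 / denom


k = heterozygote_ratio([(1.9, 1.0), (2.1, 1.0)])  # hypothetical: allele 1 reads ~2x stronger
print(round(k, 3), round(pooled_allele_freq(3.0, 1.0, k), 3))  # 2.0 0.6
```

Without the correction (k = 1) the same areas would suggest a frequency of 0.75.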
MassEXTEND Reaction
[Figure: MassEXTEND reaction. The same unlabeled 23-mer primer anneals next to the SNP on both alleles; the reaction contains enzyme, ddATP, and dCTP/dGTP/dTTP.
Allele 1 (template TCT): ddA is added immediately, giving a 24-mer extended primer.
Allele 2 (template ACT): T and G are added before ddA terminates extension, giving a 26-mer extended primer.]
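The reaction logic can be simulated: the primer is extended with complementary bases until a dideoxy terminator (here only ddA) is added, so the two alleles give products of different length and mass. In this Python sketch, the template is read in the direction of extension starting at the SNP; the residue masses are approximate average values for nucleotides within a DNA strand, and the function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
# Approximate average masses (Da) of nucleotide residues inside a DNA strand.
RESIDUE_DA = {"A": 313.21, "C": 289.18, "G": 329.21, "T": 304.20}
OXYGEN_DA = 16.00  # a dideoxy residue lacks the 3'-OH oxygen


def mass_extend(template: str, terminators: frozenset[str] = frozenset({"A"})) -> str:
    """Return the bases added to the primer; stop after the first dideoxy terminator."""
    added = []
    for t in template:
        base = COMPLEMENT[t]
        added.append(base)
        if base in terminators:
            break
    return "".join(added)


def added_mass_da(added: str, terminators: frozenset[str] = frozenset({"A"})) -> float:
    """Approximate mass added to the primer, correcting for a terminal dideoxy residue."""
    mass = sum(RESIDUE_DA[b] for b in added)
    if added and added[-1] in terminators:
        mass -= OXYGEN_DA
    return mass


primer_len = 23
for allele, template in (("Allele 1", "TCT"), ("Allele 2", "ACT")):
    ext = mass_extend(template)
    print(allele, ext, f"{primer_len + len(ext)}-mer", round(added_mass_da(ext), 2))
# Allele 1 A 24-mer 297.21
# Allele 2 TGA 26-mer 930.62
```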
Multiplex Analysis: 5-plex
[Figure: mass spectrum of a 5-plex assay, with peaks labeled by the added base]
MassARRAY Pooled Genotyping
Cost Effectiveness of Pooled Genotyping
Cost and Time for Case/Control Study
1,000 individuals × 100,000 SNPs = 100 million genotypes

                 Conventional technology    Higher-throughput technology
Study cost       $1.00/genotype             $0.01/genotype
                 $100 million               $1 million
Time             100,000 genotypes/day      2 million genotypes/day
                 4 years                    2 months
Genotyping Technology Platforms
SNP (Single Nucleotide Polymorphism) genotyping platform
Sequenom MassARRAY 7K System: open system (highly flexible), relatively low throughput (7,000 genotypes/day)
Major instruments (SNP, Sequenom): SpectroPREP, SpectroPOINT, SpectroREADER
STRP (short tandem repeat polymorphism) genotyping platform
Major instruments (STRP): Applied Biosystems 3730 DNA Analyzer, Hamilton MPH-96
“Bead-Array” System Genotyping Platform
Affymetrix GeneChip SNP
Genotyping Platform
(Microarray-based)
Affymetrix Platform
SNP (Single Nucleotide Polymorphism) genotyping platform
Affymetrix GeneChip System: closed system, ultra-high throughput (half-million genotypes/day)
Sequenom MassARRAY and Affymetrix are functionally complementary technology platforms
Major instruments (SNP, Affymetrix): PCR machine, probe array, hybridization oven, fluidics station, scanner, data analysis software
SNPlex Overview
OLA/PCR on CE for Genotyping
SNPlex OLA/PCR Assay
1. OLA: allele-specific oligos (ASOX for allele A, ASOY for allele G) and a locus-specific oligo (LSO), carrying universal PCR priming sites and a ZipCode, anneal to the gDNA target; a ligation product forms with the matching ASO
2. Clean-up
3. Multiplexed universal PCR (universal PCR primers, one biotinylated)
4. Capture (streptavidin)
5. Drag Chute hybridization
6. Wash and release Drag Chutes
7. Load on CE instrument for detection; SNP 1, SNP 2, ... are resolved by their Drag Chutes
Universal Drag Chute: fluorescent label, mobility modifiers, and cZipcode
GER = Genome Equivalent Region
ASO = Allele Specific Oligo
LSO = Locus Specific Oligo
SNPlex 30-plex assay (48-plex capable)
[Figure: CE electropherogram with peaks labeled by SNP number and allele (e.g., 16G, 18G, 1T)]
Comparison between GeneChip and Bead-Array SNP genotyping

Feature            GeneChip                    Bead-Array
System             Closed                      Closed
Throughput         Half million SNPs/day       Million SNPs/day
Equipment cost     300K US$                    1.5-2 million US$
Genotyping cost    0.01 US$/SNP                0.05 US$/SNP
Labor              More labor-intensive        Less labor-intensive
Accuracy           Equivalent                  Equivalent
Could high-density SNP panels be used in case/control association studies?
Stevens-Johnson Syndrome (SJS)
• Severe adverse drug reaction
• Caused by medication
• Widespread erythematous cutaneous macules with blisters
• Severe mucosal erosions
Toxic epidermal necrolysis (TEN)
• >30% skin detachment
Incidence and Epidemiology of
Stevens-Johnson Syndrome
• Potentially life-threatening, with high morbidity and mortality (5-15%)
• Over 100 drugs can cause SJS

                                           Taiwan                 Western countries*
Incidence (per million people per year)    8                      1-3
Major culprit drug                         Carbamazepine (CBZ)    Sulfonamides
*Ref: N Engl J Med. 1994, 1995
Aim: to identify the susceptibility gene for carbamazepine-induced Stevens-Johnson syndrome (CBZ-SJS)
• Candidate gene approach
A. Drug-metabolizing enzymes (bioactivation, detoxification):
1. Phase I enzymes: cytochrome P450 superfamily: CYP2C9, 2B6, 2C19, 2D6, 2E1, 3A4, 2C8, and 1A2…
2. Phase II enzymes: microsomal epoxide hydrolase, arylamine N-acetyltransferase, UDP-glucuronosyltransferase…
3. Receptors
4. Transporters: MDR1 P-glycoprotein…
B. Immune response-related genes:
1. Antigen recognition: HLA (human leukocyte antigen) HLA-A, B, C, DQ, DP, DR; T cell receptor: Vβ…
2. Mediators: cytokines/chemokines: TNF-α; complements; apoptosis proteins: Fas; enzymes: perforin, granzyme, leukotriene synthase…; extracellular matrix components
Comparison of allele frequencies in the MHC
region (Candidate gene approach)
• 56 CBZ-SJS patients vs. 101 tolerant controls
46 SNPs: p < 0.01: MHC class I, II and III
6 SNPs: p < 10^-10: between HLA-C and DRA
rs3130690: p < 10^-30: near HLA-B locus
[Figure: -log(p value), SJS patients vs. tolerant group, by position of SNP on Chr6 (29.8-33.8 Mb), with HLA-A, C, B, and DRA marked]
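Each point on such a plot comes from comparing allele counts between patients and tolerant controls. A common choice is the Pearson chi-square test on a 2×2 allele-count table (1 degree of freedom), whose upper-tail p-value is erfc(√(χ²/2)). This Python sketch assumes that test (the slide does not state which test was used), and the counts are hypothetical, not the study's data.

```python
import math


def allelic_chi2(case_alleles: tuple[int, int],
                 control_alleles: tuple[int, int]) -> tuple[float, float]:
    """Pearson chi-square (1 df, no continuity correction) on a 2x2 table of allele counts.

    case_alleles / control_alleles: (count of allele 1, count of allele 2).
    Returns (chi2, p); -log10(p) is the quantity plotted along the chromosome.
    """
    a, b = case_alleles
    c, d = control_alleles
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column of the table is empty")
    chi2 = n * (a * d - b * c) ** 2 / denom
    p = math.erfc(math.sqrt(chi2 / 2.0))  # upper tail of chi-square with 1 df
    return chi2, p


chi2, p = allelic_chi2((90, 10), (10, 90))  # hypothetical counts
print(chi2, -math.log10(p))
```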
Genome-wide scan approach with
100,000 SNP markers
40 ADR controls vs. 41 SJS patients
Affymetrix GeneChips: 100K Xba/Hind 240
Call rate of the reference sample genotyped by NGC:

                    Hind                    Xba                     Overall
Call rate           93.71% (53641/57244)    94.30% (55597/58960)    94.01% (109238/116204)
Concordance rate    99.47%                  99.61%                  99.54%
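Call rate and concordance are straightforward ratios: the fraction of attempted SNPs that received a call, and, among SNPs called in both runs, the fraction whose genotypes agree. A minimal Python sketch; treating "AG" and "GA" as the same genotype is an assumption about how calls are encoded, and the function names are hypothetical.

```python
from typing import Optional


def call_rate(n_called: int, n_attempted: int) -> float:
    return n_called / n_attempted


def _normalize(genotype: str) -> str:
    # Treat "AG" and "GA" as the same unordered genotype.
    return "".join(sorted(genotype))


def concordance(calls: list[Optional[str]], reference: list[Optional[str]]) -> float:
    """Fraction of SNPs called in both runs whose genotypes agree."""
    pairs = [(a, b) for a, b in zip(calls, reference) if a is not None and b is not None]
    if not pairs:
        raise ValueError("no SNP was called in both runs")
    return sum(_normalize(a) == _normalize(b) for a, b in pairs) / len(pairs)


print(f"{call_rate(53641, 57244):.2%}")  # 93.71%, the Hind call rate in the table
print(concordance(["AA", "AG", None, "GG"], ["AA", "GA", "AG", "AG"]))
```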
Call Rates at NGC: mean ± 1 S.D. (min, max)

               Hind*                               Xba                                 Overall
ADR cases      0.9845 ± 0.0101 (0.9501, 0.9948)    0.9898 ± 0.0070 (0.9730, 0.9977)    0.9871 ± 0.0090 (0.9501, 0.9977)
ADR controls   0.9903 ± 0.0061 (0.9685, 0.9977)    0.9945 ± 0.0017 (0.9889, 0.9983)    0.9924 ± 0.0049 (0.9685, 0.9983)
Overall        0.9874 ± 0.0088 (0.9501, 0.9977)    0.9921 ± 0.0056 (0.9730, 0.9983)    0.9898 ± 0.0077 (0.9501, 0.9983)
* One sample with call rate = 0.9385 was excluded.
[Figure: call rate, Xba vs. Hind]
[Figure: call rate, cases vs. controls]
Minor Allele Frequency

MAF        Frequency   Percent   Cumulative frequency   Cumulative percent
0.0        18977       16.33     18977                  16.33
0.0-0.1    25983       22.36     44960                  38.69
0.1-0.2    20723       17.83     65683                  56.52
0.2-0.3    18167       15.63     83850                  72.16
0.3-0.4    16394       14.11     100244                 86.27
0.4-0.5    15960       13.73     116204                 100.00
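A table like this can be produced by converting each SNP's allele frequency to a minor allele frequency, binning it, and accumulating counts. In this Python sketch, placing boundary values such as exactly 0.1 in the lower bin is an assumption (the slide does not say how ties are binned), and the input frequencies are hypothetical.

```python
from bisect import bisect_left

LABELS = ["0.0", "0.0-0.1", "0.1-0.2", "0.2-0.3", "0.3-0.4", "0.4-0.5"]
UPPER_EDGES = [0.1, 0.2, 0.3, 0.4, 0.5]  # assumption: bins are closed on the right


def minor_allele_freq(allele1_freq: float) -> float:
    return min(allele1_freq, 1.0 - allele1_freq)


def maf_table(mafs: list[float]) -> list[tuple[str, int, float, int, float]]:
    """Rows of (bin, count, percent, cumulative count, cumulative percent)."""
    if not mafs:
        raise ValueError("no SNPs given")
    counts = [0] * len(LABELS)
    for m in mafs:
        if not 0.0 <= m <= 0.5:
            raise ValueError(f"MAF out of range: {m}")
        idx = 0 if m == 0.0 else bisect_left(UPPER_EDGES, m) + 1
        counts[idx] += 1
    total, running, rows = len(mafs), 0, []
    for label, n in zip(LABELS, counts):
        running += n
        rows.append((label, n, 100.0 * n / total, running, 100.0 * running / total))
    return rows


for row in maf_table([minor_allele_freq(p) for p in (0.0, 1.0, 0.95, 0.25, 0.5)]):
    print(row)
```

Monomorphic SNPs (allele frequency 0 or 1) land in the first bin, matching the non-polymorphic count in the table.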
[Figure: allele 1 frequency distributions, Hind and Xba]
Table of map by group: allele 1 frequency bin by array (n = number of SNPs; % = percent of all 116,204 SNPs; %array = percent of that array; %bin = percent of that frequency bin)

Allele 1   Hind                           Xba                            Total
freq.      n      %     %array %bin       n      %     %array %bin       n       %
0.00       5559   4.78  9.71   58.53      3938   3.39  6.68   41.47      9497    8.17
.0-.1      6733   5.79  11.76  50.05      6720   5.78  11.40  49.95      13453   11.58
.1-.2      4685   4.03  8.18   46.43      5405   4.65  9.17   53.57      10090   8.68
.2-.3      4317   3.72  7.54   47.43      4784   4.12  8.11   52.57      9101    7.83
.3-.4      3679   3.17  6.43   45.78      4358   3.75  7.39   54.22      8037    6.92
.4-.5      3542   3.05  6.19   45.89      4176   3.59  7.08   54.11      7718    6.64
.5-.6      3670   3.16  6.41   44.38      4599   3.96  7.80   55.62      8269    7.12
.6-.7      3714   3.20  6.49   44.83      4571   3.93  7.75   55.17      8285    7.13
.7-.8      4199   3.61  7.34   45.82      4965   4.27  8.42   54.18      9164    7.89
.8-.9      5068   4.36  8.85   48.13      5462   4.70  9.26   51.87      10530   9.06
.9-.99     6435   5.54  11.24  51.15      6145   5.29  10.42  48.85      12580   10.83
1.0        5643   4.86  9.86   59.53      3837   3.30  6.51   40.47      9480    8.16
Total      57244  49.26                   58960  50.74                   116204  100.00

Non-polymorphic SNPs (allele 1 frequency 0.00 or 1.0): 9497 + 9480 = 18977
Allele 1 Frequency by Ethnic Group
Non-SNP Allele Frequency by Ethnic Group (cells: frequency (percent of 116,204 SNPs))

Table of Han by Asian
                    Asian Non-SNP      Asian SNP          Total
Han Non-SNP         14260 (12.27)      4717 (4.06)        18977 (16.33)
Han SNP             1975 (1.70)        95252 (81.97)      97227 (83.67)
Total               16235 (13.97)      99969 (86.03)      116204 (100.00)

Table of Han by Caucasian
                    Caucasian Non-SNP  Caucasian SNP      Total
Han Non-SNP         7131 (6.14)        11846 (10.19)      18977 (16.33)
Han SNP             3764 (3.24)        93463 (80.43)      97227 (83.67)
Total               10895 (9.38)       105309 (90.62)     116204 (100.00)

Table of Han by African
                    African Non-SNP    African SNP        Total
Han Non-SNP         1000 (0.86)        17977 (15.47)      18977 (16.33)
Han SNP             2093 (1.80)        95134 (81.87)      97227 (83.67)
Total               3093 (2.66)        113111 (97.34)     116204 (100.00)

Table of Asian by Caucasian
                    Caucasian Non-SNP  Caucasian SNP      Total
Asian Non-SNP       6768 (5.82)        9467 (8.15)        16235 (13.97)
Asian SNP           4127 (3.55)        95842 (82.48)      99969 (86.03)
Total               10895 (9.38)       105309 (90.62)     116204 (100.00)

Table of Asian by African
                    African Non-SNP    African SNP        Total
Asian Non-SNP       955 (0.82)         15280 (13.15)      16235 (13.97)
Asian SNP           2138 (1.84)        97831 (84.19)      99969 (86.03)
Total               3093 (2.66)        113111 (97.34)     116204 (100.00)

Table of Caucasian by African
                    African Non-SNP    African SNP        Total
Caucasian Non-SNP   913 (0.79)         9982 (8.59)        10895 (9.38)
Caucasian SNP       2180 (1.88)        103129 (88.75)     105309 (90.62)
Total               3093 (2.66)        113111 (97.34)     116204 (100.00)
[Figure: allele 1 frequency, Han vs. African American]
[Figure: allele 1 frequency, Han vs. Caucasian]
[Figure: allele 1 frequency, Han vs. Asian]
[Figure: heterozygosity]
[Figure: test of HWE proportion]
[Figure: test of allele type association]
[Figure: -log(p) in the Chr 6 HLA region]
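The heterozygosity and HWE panels summarize, for each SNP, how observed genotype counts compare with Hardy-Weinberg expectations (p², 2pq, q²). This Python sketch uses the Pearson chi-square test with 1 degree of freedom, one common choice (the slides do not state which test was used; an exact test is preferable for small counts); the genotype counts are hypothetical.

```python
import math


def hwe_chi2(n_aa: int, n_ab: int, n_bb: int) -> tuple[float, float]:
    """Pearson chi-square test of Hardy-Weinberg proportions (1 df) for a bi-allelic SNP."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele a
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    if min(expected) == 0:
        raise ValueError("monomorphic marker: HWE test not defined")
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))  # upper tail of chi-square with 1 df


def heterozygosity(n_aa: int, n_ab: int, n_bb: int) -> tuple[float, float]:
    """Observed and expected (2pq) heterozygosity."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    return n_ab / n, 2 * p * (1 - p)


print(hwe_chi2(25, 50, 25))  # (0.0, 1.0)
print(hwe_chi2(50, 0, 50))   # chi2 = 100.0 with a very small p: no heterozygotes
```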
Affymetrix Platform - Summary
Overall average call rate of Affymetrix 100K GeneChips is 0.9898 ± 0.0077
Non-polymorphic SNPs account for 16.33% in the Han samples, significantly higher than in Caucasians (9.38%) and African Americans (2.66%)
The results of the previous candidate gene approach can be replicated in the genome-wide scan approach with 100,000 SNPs
Challenge of ultra-high throughput
SNP genotyping
Enormous SNP data volumes for storage, transfer, and extraction
- Billions of SNP genotypes per study
Statistical methods for large numbers of SNPs
Computational power
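One standard way to ease the storage problem is to note that a bi-allelic genotype has only four states (two homozygotes, the heterozygote, missing), so it fits in 2 bits, four genotypes per byte. This Python sketch shows the idea with a hypothetical coding; it is not the layout of any particular file format.

```python
from typing import Optional

# Hypothetical 2-bit coding; this is not the layout of any particular file format.
CODE = {"AA": 0b00, "AB": 0b01, "BB": 0b10, None: 0b11}
DECODE = {v: k for k, v in CODE.items()}


def pack(genotypes: list[Optional[str]]) -> bytes:
    """Store four genotypes per byte, two bits each."""
    out = bytearray((len(genotypes) + 3) // 4)
    for i, g in enumerate(genotypes):
        out[i // 4] |= CODE[g] << (2 * (i % 4))
    return bytes(out)


def unpack(data: bytes, n: int) -> list[Optional[str]]:
    """Recover the first n genotypes from packed bytes."""
    return [DECODE[(data[i // 4] >> (2 * (i % 4))) & 0b11] for i in range(n)]


genos = ["AA", "AB", None, "BB", "AB"]
packed = pack(genos)
print(len(packed), unpack(packed, len(genos)) == genos)  # 2 True
print((300_000_000 + 3) // 4)  # 75000000 bytes for 300 million genotypes
```

At this density, the 300 million genotypes of the study estimated earlier occupy about 75 MB before any compression.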