Some Practical Considerations for Planning Candidate Gene
Considerations for Planning a Candidate
Gene Association Study With TagSNPs
Shehnaz K. Hussain, PhD, ScM
Epidemiology 243: Molecular Epidemiology
Objectives
Molecular genetics primer
Databases and tools to conduct in silico
analyses for tagSNP selection/prioritization
Factors influencing statistical power
Central dogma
DNA (A, T, C, G) → mRNA → Protein
What are SNPs?
More than 99% of all nucleotides are the same
in all humans
Fewer than 1% of nucleotides are polymorphic
SNPs are far more common than insertions-deletions
Bi-allelic (two possible nucleotides), e.g. T (80%) or A (20%)
Where do SNPs occur?
Exons
Introns
Flanking regions
What are haplotypes?
A haplotype is the pattern of nucleotides on a
single chromosome
Two “copies” of each chromosome
The haplotype inference problem
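[Figure: unphased genotypes TA TT CG GG TA AA at six SNPs on a pair of chromosomes. The homozygous sites (T, G, A) are known on both chromosomes; the alleles at the three heterozygous sites are marked "?" because genotype data alone cannot assign them to one chromosome or the other.]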
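To make the ambiguity concrete, the sketch below enumerates every pair of haplotypes that is consistent with the unphased genotypes TA TT CG GG TA AA from this slide. The function name and data layout are illustrative assumptions and do not come from any tool named in these slides.

```python
from itertools import product


def compatible_haplotype_pairs(genotypes):
    """Enumerate unordered haplotype pairs consistent with unphased genotypes.

    genotypes: list of 2-character strings, one per SNP (e.g. "TA").
    """
    het_sites = [i for i, g in enumerate(genotypes) if g[0] != g[1]]
    pairs = set()
    # Each heterozygous site can be phased two ways; homozygous sites are fixed.
    for flips in product([False, True], repeat=len(het_sites)):
        flip_at = dict(zip(het_sites, flips))
        hap1, hap2 = [], []
        for i, (a, b) in enumerate(genotypes):
            if flip_at.get(i, False):
                a, b = b, a
            hap1.append(a)
            hap2.append(b)
        # Sort so that (h1, h2) and (h2, h1) count as the same pair.
        pairs.add(tuple(sorted(("".join(hap1), "".join(hap2)))))
    return sorted(pairs)


pairs = compatible_haplotype_pairs(["TA", "TT", "CG", "GG", "TA", "AA"])
for h1, h2 in pairs:
    print(h1, h2)
```

With k heterozygous sites there are 2^(k-1) candidate pairs, so the three heterozygous sites here leave four possibilities that genotypes alone cannot distinguish.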
What is linkage disequilibrium?
Linkage disequilibrium (LD) describes the nonrandom association of nucleotides on the
same chromosome in a population
One nucleotide at one position (locus) predicts the
occurrence of another nucleotide at another locus
[Figure: nucleotides at two loci on the same chromosome, shown with no LD and with LD]
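The definition above can be quantified with the standard pairwise LD measures D, D' and r2. The following is a minimal sketch, assuming two bi-allelic loci with known haplotype and allele frequencies; the function and argument names are hypothetical.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (D, |D'|, r2) for two bi-allelic loci.

    p_ab: frequency of the haplotype carrying allele A at locus 1 and
          allele B at locus 2.
    p_a, p_b: frequencies of allele A and allele B (each strictly
              between 0 and 1).
    """
    # D is the excess of the observed haplotype frequency over the
    # frequency expected if the two loci were independent.
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2


# No LD: the haplotype frequency equals the product of allele frequencies.
print(ld_stats(p_ab=0.25, p_a=0.5, p_b=0.5))
# Complete LD: allele A always occurs together with allele B.
print(ld_stats(p_ab=0.2, p_a=0.2, p_b=0.2))
```

An r2 of 1 means one locus fully predicts the other, which is the property tagSNP selection relies on.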
What are markers?
[Figure: a candidate gene containing marker loci (SNPs) that are in LD with a disease susceptibility locus (DSL). One arrow shows the test for genetic association between the disease phenotype and the DSL; another shows the test for association between the phenotype and the marker loci.]
What are tagSNPs?
TagSNPs are a subset of all SNPs in a gene
that mark groups of SNPs in LD
Avoids redundant genotyping
[Figure: marker loci (SNPs) grouped by LD, with one group in LD with the disease susceptibility locus]
The joint effect of tagSNPs in
cytokine genes and cigarette
smoking in cervical cancer risk
T-cell proliferation
[Figure: activated T-cell showing the IL-2 gene, the IFNγ gene, IL-2, the IL-2 receptor, IFNγ, and proliferation of TH1-cells]
Background
Cigarette smoking ↑ 1.5- to 3-fold cancer risk
Cigarette smoking ↓ levels of IL-2 and IFNγ
(cervical and circulating)
↓ levels of IL-2 and IFNγ
HPV persistence in the cervix
Cervical neoplasia
Decreased survival from invasive cervical cancer
Model
[Figure: cigarette smoking and SNPs in IL-2, IL-2R, and IFNG acting jointly on HPV-associated squamous cell cervical cancer]
Methods
Study design
Population-based case-only study
Subjects
308 Caucasian squamous cell cervical cancer cases
diagnosed 1986-2004
Residing in 3 western Washington counties
Data collection
Structured in-person interviews
DNA isolated from buffy coats
Multi-stage tagSNP design
1. Select reference panel
2. Re-sequence panel, identify SNPs (many markers, few subjects)
3. Choose tagSNPs
4. Genotype tagSNPs in main study (few markers, many subjects)
1. Select reference panel
Definition
A sample of your study population
Most representative
Samples from the Coriell Repository
Ability to integrate your data with other
resources
[Figure legend: candidate gene SNPs; HapMap SNPs]
2. Re-sequence reference panel
Amplify and sequence DNA of the candidate gene
Phred (Ewing, 1998)
Phrap (Ewing, 1998)
PolyPhred (Nickerson, 1997)
Alternatives to re-sequencing
Program for Genomic Applications (PGA)
SeattleSNPs – inflammation
NIEHS SNPs – environmental response
Innate Immunity
International HapMap Project
5 million SNPs in four ethnically distinct
populations
3. Choose tagSNPs (LD)
Option                      LDSelect (Carlson, 2002)   Tagger (de Bakker, 2005)
r2 threshold (0.80)         Yes                        Yes
SNP exclusions/inclusions   No                         Yes
SNP design score            No                         Yes
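The idea of grouping SNPs by an r2 threshold can be illustrated with a simplified greedy binning routine. This is a sketch in the spirit of LD-based bin selection, not the actual LDSelect or Tagger implementation; the function name, the data layout, and the toy r2 values are all assumptions made for illustration.

```python
def greedy_ld_bins(snps, r2, threshold=0.80):
    """Group SNPs into LD bins and list the tagSNPs of each bin.

    snps: list of SNP identifiers.
    r2: dict mapping frozenset({snp_a, snp_b}) to pairwise r2;
        missing pairs are treated as r2 = 0.
    """
    def linked(a, b):
        return a == b or r2.get(frozenset((a, b)), 0.0) >= threshold

    remaining = set(snps)
    bins = []
    while remaining:
        # Pick the SNP linked to the most remaining SNPs
        # (ties broken alphabetically so the result is deterministic).
        best = max(sorted(remaining),
                   key=lambda s: sum(linked(s, t) for t in remaining))
        members = {t for t in remaining if linked(best, t)}
        # A tagSNP must exceed the threshold with every SNP in its bin.
        tags = sorted(s for s in members
                      if all(linked(s, t) for t in members))
        bins.append({"sites": sorted(members), "tags": tags})
        remaining -= members
    return bins


# Hypothetical pairwise r2 values for four made-up SNPs.
toy_r2 = {
    frozenset(("A", "B")): 0.90,
    frozenset(("A", "C")): 0.85,
    frozenset(("B", "C")): 0.50,
}
bins = greedy_ld_bins(["A", "B", "C", "D"], toy_r2)
for number, b in enumerate(bins, start=1):
    print(number, len(b["sites"]), b["tags"])
```

Genotyping one tagSNP per bin then covers every SNP in that bin at the chosen threshold, which is how redundant genotyping is avoided.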
LDSelect output for IL-2
SeattleSNPs, r2≥0.80, MAF ≥0.05, Caucasians
Bin   Total Number of Sites   TagSNPs
1     2                       rs2069763, rs2069772
2     2                       rs2069776, rs2069778
3     2                       rs2069777, rs2069779
4     1                       rs2069762
Genomic context
Exons (cSNPs)
SIFT (Ng, 2002)
PolyPhen (Ramensky, 2002)
Upstream flanking region
Intron-exon junctions
Sequence conservation
UCSC Genome Browser, PhastCons score (Siepel, 2005)
Repeat region
Unique region
Minor allele frequency and genetic model
300 cases, 300 controls, alpha=0.05
[Figure: power (0.0 to 1.0) plotted against effect size (1.0 to 2.5) in separate panels for the log-additive, dominant, and recessive genetic models, with one curve per minor allele frequency (0.10, 0.20, 0.30, 0.40, 0.50)]
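The dependence of power on minor allele frequency and genetic model can be approximated with a two-proportion normal approximation. This is a rough sketch under stated assumptions and not the method used by Quanto or to produce the slide's curves: the MAF is taken as the control allele frequency, controls are assumed to be in Hardy-Weinberg equilibrium, the effect size is treated as an odds ratio for carrying the risk genotype, and only the dominant and recessive models are covered. All function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def risk_genotype_frequency(maf, model):
    """Control frequency of the risk genotype, assuming Hardy-Weinberg."""
    if model == "dominant":      # one or two copies of the minor allele
        return 1 - (1 - maf) ** 2
    if model == "recessive":     # two copies of the minor allele
        return maf ** 2
    raise ValueError("model must be 'dominant' or 'recessive'")


def approximate_power(maf, odds_ratio, model,
                      n_cases=300, n_controls=300, alpha=0.05):
    """Approximate power of a two-sided two-proportion z-test."""
    p0 = risk_genotype_frequency(maf, model)
    # Convert the odds ratio into the risk-genotype frequency among cases.
    odds_cases = odds_ratio * p0 / (1 - p0)
    p1 = odds_cases / (1 + odds_cases)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    p_bar = (n_cases * p1 + n_controls * p0) / (n_cases + n_controls)
    se_null = sqrt(p_bar * (1 - p_bar) * (1 / n_cases + 1 / n_controls))
    se_alt = sqrt(p1 * (1 - p1) / n_cases + p0 * (1 - p0) / n_controls)
    # Probability of rejecting in the direction of the true effect.
    z = (abs(p1 - p0) - z_alpha * se_null) / se_alt
    return NormalDist().cdf(z)


for model in ("dominant", "recessive"):
    for maf in (0.10, 0.30, 0.50):
        power = approximate_power(maf, odds_ratio=2.0, model=model)
        print(model, maf, round(power, 2))
```

Under these assumptions a recessive effect at low minor allele frequency is much harder to detect than a dominant one, because very few subjects carry two copies of the minor allele.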
LD
SNPs genotyped   SNPs not genotyped   r2     Sample size requirement (S1 / S2)
S1 and S2        -                    -      600 / 600
S1               S2                   1.00   600 / 600
S1               S2                   0.85   600 / 706
N/r2 (Pritchard, 2001)
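The N/r2 rule can be checked with a few lines of arithmetic. The sketch below reproduces the 600 and 706 figures from the table; the function name is a hypothetical choice, and rounding up to a whole subject is an assumption.

```python
from math import ceil


def required_sample_size(n_direct, r2):
    """Sample size needed when an ungenotyped SNP is tested via a tagSNP.

    n_direct: sample size that would suffice if the SNP were genotyped.
    r2: LD (r2) between the tagSNP and the ungenotyped SNP, 0 < r2 <= 1.
    """
    if not 0 < r2 <= 1:
        raise ValueError("r2 must be in the interval (0, 1]")
    # Inflate the direct sample size by 1/r2 and round up to whole subjects.
    return ceil(n_direct / r2)


print(required_sample_size(600, 1.00))
print(required_sample_size(600, 0.85))
```

The lower the r2 between the tagSNP and the untyped SNP, the larger the sample needed to retain the same power.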
Genotype error
Generally non-differential
Reduces your power
Every 1% increase in the genotyping error rate
requires a 2-8% increase in sample size
(Zou et al, 2004, Genetic Epidemiology)
Depends on error model
Power calculators
Quanto
G, E, G × E, G × G
Case-control, case-sibling, case-parent, and
case-only designs
Quantitative or binary outcome
htPowercc
r2
Power for Association With Error (PAWE)
Genotyping errors
TagSNP summary
Efficient yet comprehensive coverage of the
genetic variation in our candidate genes
Reduce costs
Preference should be given to putatively
functional variants:
Literature, gene context, sequence conservation
Influences on statistical power:
MAF, genetic model, LD, and genotyping error
Programs for Genomic Applications
SeattleSNPs, http://pga.mbt.washington.edu
NIEHS, http://egp.gs.washington.edu/
Innate Immunity, http://innateimmunity.net/
International HapMap, http://www.hapmap.org/
Coriell cell repository, www.coriell.org
cSNP predictive analysis:
SIFT, http://blocks.fhcrc.org/sift/SIFT.html
PolyPhen, http://coot.embl.de/PolyPhen
Vista, http://genome.lbl.gov/vista/index.shtml
The following programs can be found at the Rockefeller
site, http://linkage.rockefeller.edu/soft/
Tagger
LDSelect
PAWE
Quanto