Some Practical Considerations for Planning Candidate Gene

Considerations for Planning a Candidate Gene Association Study With TagSNPs
Shehnaz K. Hussain, PhD, ScM
Epidemiology 243: Molecular Epidemiology
Objectives
- Molecular genetics primer
- Databases and tools to conduct in silico analyses for tagSNP selection/prioritization
- Factors influencing statistical power
Central dogma
[Figure: DNA (bases A, T, C, G) → mRNA → protein]
What are SNPs?
- More than 99% of all nucleotides are the same in all humans
- Less than 1% of nucleotides are polymorphic
- SNPs far outnumber insertion-deletion polymorphisms
- Most SNPs are bi-allelic, e.g., T (80%) vs. A (20%)
- Where do SNPs occur?
  - Exons
  - Introns
  - Flanking regions
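A minimal Python sketch of how allele frequencies and the minor allele frequency (MAF) could be computed from unphased genotype calls. The function names and the genotype counts are hypothetical, chosen only to reproduce the T (80%) / A (20%) example.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Return {allele: frequency} from unphased diploid calls such as "TA"."""
    counts = Counter()
    for call in genotypes:
        counts.update(call)  # each character of the call is one allele
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def minor_allele(genotypes):
    """Return (minor allele, minor allele frequency) for a bi-allelic SNP."""
    freqs = allele_frequencies(genotypes)
    allele = min(freqs, key=freqs.get)
    return allele, freqs[allele]


# Hypothetical calls for 100 people: 64 TT, 32 TA, 4 AA
calls = ["TT"] * 64 + ["TA"] * 32 + ["AA"] * 4
print(allele_frequencies(calls))  # {'T': 0.8, 'A': 0.2}
print(minor_allele(calls))        # ('A', 0.2)
```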
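What are haplotypes?
- A haplotype is the pattern of nucleotides on a single chromosome
- Two "copies" of each chromosome
- The haplotype inference problem
[Figure: genotypes TA TT CG GG TA AA at six SNPs. At the homozygous sites (TT, GG, AA) both chromosome copies carry the same allele, but at the heterozygous sites (TA, CG, TA) it is unknown (?) which allele lies on which copy.]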
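To make the haplotype inference problem concrete, this sketch (hypothetical function name) enumerates every haplotype pair consistent with the genotype in the figure. With k heterozygous sites there are 2^(k-1) such pairs, so the three heterozygous sites here give 4 candidate pairs. Statistical phasing methods choose among them using population haplotype frequencies, which this sketch does not attempt.

```python
from itertools import product


def compatible_haplotype_pairs(genotype):
    """List every haplotype pair consistent with an unphased genotype.

    genotype: one two-letter call per SNP, e.g. ["TA", "TT", ...].
    """
    het_sites = [i for i, (a, b) in enumerate(genotype) if a != b]
    if not het_sites:  # fully homozygous: phase is already known
        hap = "".join(a for a, _ in genotype)
        return [(hap, hap)]
    pairs = []
    # Fix the orientation at the first heterozygous site so that (h1, h2)
    # and (h2, h1) are not counted twice; flip the remaining sites freely.
    for flips in product((False, True), repeat=len(het_sites) - 1):
        flip_at = dict(zip(het_sites[1:], flips))
        h1, h2 = [], []
        for i, (a, b) in enumerate(genotype):
            if flip_at.get(i, False):
                a, b = b, a
            h1.append(a)
            h2.append(b)
        pairs.append(("".join(h1), "".join(h2)))
    return pairs


for pair in compatible_haplotype_pairs(["TA", "TT", "CG", "GG", "TA", "AA"]):
    print(pair)
```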
What is linkage disequilibrium?
- Linkage disequilibrium (LD) describes the nonrandom association of nucleotides at different loci on the same chromosome in a population
- The nucleotide at one position (locus) predicts the nucleotide at another locus
[Figure: two loci with no LD vs. two loci in LD]
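LD between two bi-allelic loci (alleles A/a and B/b) is commonly summarized by D = p_AB - p_A × p_B and r² = D² / (p_A(1 - p_A) p_B(1 - p_B)), the r² measure used for tagSNP selection later in this lecture. A minimal sketch, assuming phased haplotype counts are already available; the function name is hypothetical.

```python
def ld_stats(n_AB, n_Ab, n_aB, n_ab):
    """Return (D, r^2) for two bi-allelic loci from phased haplotype counts."""
    n = n_AB + n_Ab + n_aB + n_ab
    p_AB = n_AB / n                 # frequency of the A-B haplotype
    p_A = (n_AB + n_Ab) / n         # frequency of allele A at locus 1
    p_B = (n_AB + n_aB) / n         # frequency of allele B at locus 2
    D = p_AB - p_A * p_B            # departure from independence
    r2 = D ** 2 / (p_A * (1 - p_A) * p_B * (1 - p_B))
    return D, r2


print(ld_stats(50, 0, 0, 50))    # complete LD: only A-B and a-b haplotypes
print(ld_stats(25, 25, 25, 25))  # no LD: alleles combine at random
```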
What are markers?
[Figure: within a candidate gene, marker loci (SNPs) are in LD with a disease susceptibility locus (DSL). The study tests for association between the disease phenotype and the marker loci; because of LD, this stands in for a test of genetic association between the phenotype and the DSL itself.]
What are tagSNPs?
- TagSNPs are a subset of all SNPs in a gene that mark groups of SNPs in LD
- Genotyping only the tagSNPs avoids redundant genotyping
[Figure: groups of marker loci (SNPs) in LD with one another and with the disease susceptibility locus]
The joint effect of tagSNPs in cytokine genes and cigarette smoking on cervical cancer risk
[Figure: T-cell proliferation in an activated T cell: IL-2 gene → IL-2 → IL-2 receptor → proliferation; IFNγ gene → IFNγ; proliferation of TH1 cells]
Background
- Cigarette smoking increases cervical cancer risk 1.5- to 3-fold
- Cigarette smoking decreases levels of IL-2 and IFNγ (cervical and circulating)
- Decreased levels of IL-2 and IFNγ are associated with:
  - HPV persistence in the cervix
  - Cervical neoplasia
  - Decreased survival from invasive cervical cancer
Model
[Figure: cigarette smoking and SNPs in IL-2, IL-2R, and IFNG jointly → HPV-associated squamous cell cervical cancer]
Methods
- Study design
  - Population-based case-only study
- Subjects
  - 308 Caucasian squamous cell cervical cancer cases diagnosed 1986-2004
  - Residing in 3 western Washington counties
- Data collection
  - Structured in-person interviews
  - DNA isolated from buffy coats
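In a case-only design, a multiplicative gene-environment interaction is commonly estimated as the odds ratio relating genotype to exposure among cases alone, which is valid only if genotype and exposure are independent in the source population (and the disease is rare). A minimal sketch with a Woolf-type 95% confidence interval; the function name and the counts are hypothetical illustrations, not study data.

```python
import math


def case_only_interaction_or(a, b, c, d, z=1.96):
    """Case-only estimate of a multiplicative G x E interaction odds ratio.

    2 x 2 table among cases only:
                     exposed   unexposed
      carrier           a          b
      non-carrier       c          d
    Assumes G and E are independent in the source population.
    """
    or_ge = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # Woolf's method
    lo = math.exp(math.log(or_ge) - z * se_log)
    hi = math.exp(math.log(or_ge) + z * se_log)
    return or_ge, (lo, hi)


# Hypothetical counts for illustration only (not results from this study)
print(case_only_interaction_or(20, 40, 30, 108))
```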
Multi-stage tagSNP design
1. Select reference panel
2. Re-sequence panel, identify SNPs (many markers, few subjects)
3. Choose tagSNPs
4. Genotype tagSNPs in main study (few markers, many subjects)
1. Select reference panel
- Definition
- A sample of your study population
  - Most representative
- Samples from the Coriell Repository
  - Ability to integrate your data with other resources
[Figure legend: candidate gene SNPs vs. HapMap SNPs]
2. Re-sequence reference panel
[Figure: gene → amplify and sequence DNA → base calling with Phred and sequence assembly with Phrap (Ewing, 1998) → SNP detection with PolyPhred (Nickerson, 1997)]
Alternatives to re-sequencing
- Programs for Genomic Applications (PGA)
  - SeattleSNPs: inflammation
  - NIEHS SNPs: environmental response
  - Innate Immunity
- International HapMap Project
  - 5 million SNPs in four ethnically distinct populations
3. Choose tagSNPs (LD)

Option                      | LDSelect (Carlson, 2002) | Tagger (de Bakker, 2005)
r² threshold (0.80)         | Yes                      | Yes
SNP exclusions/inclusions   | No                       | Yes
SNP design score            | No                       | Yes
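The core idea behind LD-based tag selection can be sketched as greedy binning: keep SNPs above a MAF threshold, seed a bin with the SNP that is in LD (r² ≥ threshold) with the most remaining SNPs, and repeat until every SNP is binned. This is a simplified Python illustration, not the actual code of LDSelect or Tagger; the function name, input format, and toy r² values are hypothetical.

```python
def greedy_ld_bins(snps, r2, maf, r2_min=0.80, maf_min=0.05):
    """Greedy LD binning in the spirit of LDSelect (simplified sketch).

    snps: SNP ids; r2: {frozenset({s, t}): pairwise r^2}; maf: {snp: MAF}.
    Returns a list of (bin members, candidate tagSNPs) tuples.
    """
    def linked(s, t):
        return r2.get(frozenset((s, t)), 0.0) >= r2_min

    remaining = {s for s in snps if maf[s] >= maf_min}  # drop rare SNPs
    bins = []
    while remaining:
        # Seed with the SNP covering the most remaining SNPs
        # (ties broken by SNP id so the result is reproducible).
        seed = max(sorted(remaining),
                   key=lambda s: sum(linked(s, t) for t in remaining if t != s))
        members = {seed} | {t for t in remaining if t != seed and linked(seed, t)}
        # Any member in LD with every other member can serve as the tagSNP.
        tags = [m for m in sorted(members)
                if all(linked(m, o) for o in members if o != m)]
        bins.append((sorted(members), tags))
        remaining -= members
    return bins


# Toy input (hypothetical SNPs s1-s6; s6 falls below the MAF threshold)
r2 = {frozenset(p): v for p, v in [(("s1", "s2"), 0.90), (("s1", "s3"), 0.85),
                                    (("s2", "s3"), 0.70), (("s4", "s5"), 0.95)]}
maf = {"s1": 0.2, "s2": 0.3, "s3": 0.25, "s4": 0.1, "s5": 0.12, "s6": 0.01}
for members, tags in greedy_ld_bins(list(maf), r2, maf):
    print(members, tags)
```

In the toy run, the two-SNP bin lists both SNPs as candidate tags, as in the LDSelect output for IL-2, where either SNP of a 2-site bin can serve as its tagSNP.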
LDSelect output for IL-2
(SeattleSNPs data; r² ≥ 0.80, MAF ≥ 0.05; Caucasians)

Bin | Total number of sites | TagSNPs
1   | 2                     | rs2069763, rs2069772
2   | 2                     | rs2069776, rs2069778
3   | 2                     | rs2069777, rs2069779
4   | 1                     | rs2069762
Genomic context
- Exons (cSNPs)
  - SIFT (Ng, 2002)
  - PolyPhen (Ramensky, 2002)
- Upstream flanking region
- Intron-exon junctions
Sequence conservation
- UCSC Genome Browser, phastCons (Siepel, 2005)
[Figure: conservation score track across the gene, contrasting a repeat region with a unique region]
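Within an LD bin, every qualifying tagSNP captures the same LD information, so the choice can favor putatively functional variants. A hypothetical tie-breaking rule: rank candidates by genomic context, then by a conservation score (such as a phastCons value), then by ID. The category names, their ordering, and the SNP IDs are illustrative assumptions, not a published scheme.

```python
# Hypothetical priority order for genomic context (0 = most likely functional).
CONTEXT_PRIORITY = {
    "nonsynonymous": 0,    # amino-acid-changing cSNP (e.g., checked with SIFT/PolyPhen)
    "splice_junction": 1,  # intron-exon junction
    "upstream": 2,         # upstream flanking (putative regulatory) region
    "synonymous": 3,
    "intronic": 4,
}


def choose_tag(candidates, context, conservation):
    """Pick one tagSNP from equally good LD candidates within a bin.

    Ranks by genomic context first, then by conservation score
    (higher = more conserved), then by SNP id for reproducibility.
    Unannotated SNPs rank after all listed categories.
    """
    def rank(snp):
        return (CONTEXT_PRIORITY.get(context.get(snp), len(CONTEXT_PRIORITY)),
                -conservation.get(snp, 0.0),
                snp)
    return min(candidates, key=rank)


context = {"snpA": "intronic", "snpB": "upstream", "snpC": "upstream"}
conservation = {"snpA": 0.9, "snpB": 0.2, "snpC": 0.6}
print(choose_tag(["snpA", "snpB", "snpC"], context, conservation))
```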
Minor allele frequency and genetic model
[Figure: power (0 to 1.0) versus effect size (1.0 to 2.5) for 300 cases and 300 controls at alpha = 0.05, in panels for log-additive, dominant, and recessive models; curves for minor allele frequencies of 0.10, 0.20, 0.30, 0.40, and 0.50]
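Power curves like these can be approximated with a normal-approximation allelic test. This sketch assumes a log-additive (multiplicative) model, Hardy-Weinberg equilibrium, a rare disease (so the allele odds in cases equal the per-allele odds ratio times the allele odds in controls), and that the minor allele is the risk allele with frequency `maf` in controls. It covers only the log-additive case and is not the method Quanto uses, so its numbers may differ somewhat from the figure; the function name is hypothetical.

```python
import math
from statistics import NormalDist


def allelic_test_power(n_cases, n_controls, maf, odds_ratio, alpha=0.05):
    """Approximate power of a two-sided allelic (2 x 2 allele count) test."""
    std_normal = NormalDist()
    p0 = maf                                # risk-allele frequency in controls
    odds1 = odds_ratio * p0 / (1 - p0)      # risk-allele odds in cases
    p1 = odds1 / (1 + odds1)                # risk-allele frequency in cases
    n1, n0 = 2 * n_cases, 2 * n_controls    # number of alleles per group
    p_bar = (n1 * p1 + n0 * p0) / (n1 + n0)
    se_null = math.sqrt(p_bar * (1 - p_bar) * (1 / n1 + 1 / n0))
    se_alt = math.sqrt(p1 * (1 - p1) / n1 + p0 * (1 - p0) / n0)
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    diff = p1 - p0
    return (std_normal.cdf((diff - z_crit * se_null) / se_alt)
            + std_normal.cdf((-diff - z_crit * se_null) / se_alt))


# Rough analogue of the log-additive panel: 300 cases, 300 controls
for maf in (0.10, 0.20, 0.30, 0.40, 0.50):
    row = [round(allelic_test_power(300, 300, maf, odds_ratio), 2)
           for odds_ratio in (1.0, 1.5, 2.0, 2.5)]
    print(maf, row)
```

At an odds ratio of 1.0 the function returns alpha, which is a quick sanity check that the test is calibrated.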
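[Figure: SNPs genotyped vs. SNPs not genotyped, linked by LD (r²)]

SNPs genotyped | SNPs not genotyped | r²   | Sample size required: S1 | Sample size required: S2
S1 and S2      | -                  | -    | 600                      | 600
S1             | S2                 | 1.00 | 600                      | 600
S1             | S2                 | 0.85 | 600                      | 706

- Sample size required to detect an untyped SNP ≈ N/r², where N is the sample size needed if that SNP were genotyped directly (Pritchard, 2001)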
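The 706 in the table follows from the N/r² rule: 600 / 0.85 ≈ 705.9, rounded up. A minimal sketch of that arithmetic; the function name is hypothetical.

```python
import math


def indirect_sample_size(n_direct, r2):
    """Approximate N when the causal SNP is captured by a marker at LD r^2.

    Uses the N / r^2 rule of thumb; requires 0 < r2 <= 1.
    """
    if not 0 < r2 <= 1:
        raise ValueError("r2 must be in (0, 1]")
    return math.ceil(n_direct / r2)


print(indirect_sample_size(600, 1.00))  # 600
print(indirect_sample_size(600, 0.85))  # 706
```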
Genotype error
- Generally non-differential
- Reduces your power
- Every 1% increase in the genotyping error rate requires a 2-8% increase in sample size (Zou et al., 2004, Genetic Epidemiology)
  - Depends on the error model
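Why non-differential error costs power can be seen with the simplest error model, used here as an assumption (it is not necessarily the model of Zou et al.): if each allele is miscalled as the other allele with probability e, equally in cases and controls, the observed allele-frequency difference shrinks by a factor of (1 - 2e). Because the required sample size scales roughly with 1/difference², this gives an inflation of about 1/(1 - 2e)², roughly 1.04 at e = 0.01, inside the 2-8% range quoted above; other error models give other values. Function names are hypothetical.

```python
def attenuated_difference(p_cases, p_controls, error):
    """Observed case-control allele-frequency difference under symmetric,
    non-differential allele miscalling with probability `error`."""
    def observed(p):
        # true allele kept with prob. 1 - error; other allele miscalled into it
        return p * (1 - error) + (1 - p) * error
    return observed(p_cases) - observed(p_controls)


def sample_size_inflation(error):
    """Rough multiplier on required N: ~1 / (1 - 2e)^2.

    Ignores the small change in allele-frequency variance.
    """
    return 1 / (1 - 2 * error) ** 2


for e in (0.0, 0.01, 0.02):
    print(e, round(attenuated_difference(0.4, 0.3, e), 4),
          round(sample_size_inflation(e), 3))
```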
Power calculators
- Quanto
  - G, E, G × E, G × G
  - Case-control, case-sibling, case-parent, and case-only designs
  - Quantitative or binary outcome
- htPowercc
  - r²
- Power for Association With Error (PAWE)
  - Genotyping errors
TagSNP summary
- Efficient yet comprehensive coverage of the genetic variation in our candidate genes
- Reduced costs
- Preference should be given to putatively functional variants:
  - Literature, gene context, sequence conservation
- Influences on statistical power:
  - MAF, genetic model, LD, and genotyping error
- Programs for Genomic Applications
  - SeattleSNPs, http://pga.mbt.washington.edu
  - NIEHS, http://egp.gs.washington.edu/
  - Innate Immunity, http://innateimmunity.net/
- International HapMap, http://www.hapmap.org/
- Coriell cell repository, www.coriell.org
- cSNP predictive analysis:
  - SIFT, http://blocks.fhcrc.org/sift/SIFT.html
  - PolyPhen, http://coot.embl.de/PolyPhen
- Vista, http://genome.lbl.gov/vista/index.shtml
- The following programs can be found at the Rockefeller site, http://linkage.rockefeller.edu/soft/
  - Tagger
  - LDSelect
  - PAWE
  - Quanto