Pharmacogenomics and Personalized Medicine:
advances and possible roles of bioinformatics
Caroline Lee, PhD
Associate Professor, Department of Biochemistry,
National University of Singapore
Associate Professor, DUKE-NUS Graduate
Medical School
Principal Investigator, National Cancer Centre
Jan 15, 2001
Lab Members
Wang Yu
Steven
Wolf
Peiyun
Soo Ting
Cheryl
Yin Yee
Zihua
Jianwei
Jingbo
Jingbo
Steven
Theng
Grace
Lester
Naajia
Thuan Tzen
Way Champ
Yiwei
Similar Yet Not the Same!
Genotypically, we are similar
Phenotypically, we are different
99.9% of the genome is identical between different
individuals
Predisposition to diseases
Response to drugs
Hence the 0.1% difference in our genome
(polymorphisms) is important in determining
our differences
Inter-Individual Variation???
Various forms of variation (polymorphisms) exist, e.g.
dinucleotide repeats, microsatellite repeats, copy-number repeats...
The commonest are SINGLE NUCLEOTIDE POLYMORPHISMS (SNPs)
They account for 90% of inter-individual variation
Why is there so much
interest in SNPs?
The Promise of
Personalized
Medicine!!
Disease Prevention, Management and Treatment
Tailored according to the Individual’s Genetic Profile!!
Why is “Personalized
Medicine” so hotly
pursued??
Because Current Medicines are
Optimized for Many, Not for Everyone!
BUT people differ in susceptibility to disease
and response to medicine
“One-Size-Fits-All” Doesn’t Really Fit
In the US alone, > 1 billion prescriptions for tens of billions
of drug doses are written for >10,000 different medications
Yet a drug that is effective for one individual may not be
effective for another... or worse, another individual
may suffer from ADVERSE DRUG REACTIONS (ADRs)
ADRs are one of the leading causes of hospitalization and death in
the US
Enormous economic cost on health care providers
In the US in 1994, ADRs accounted for >2.2 million serious cases and over 100,000 deaths
FDA Reports:
8% of Phase I Trial Drugs are ultimately approved
4% of approved drugs are withdrawn
Caskey, 2007. The Drug Development Crisis. Efficiency and Safety. Ann Rev Med 58:1-6
John is diagnosed with hypertension
Current approach: Reactive
Drug 1: hypertension unresolved, serious side effect
Drug 2: hypertension eased, side effect
Drug 3: hypertension controlled, no side effects
In many visits! With 3 drugs!
Personalized medicine: Proactive
Drug response prediction test
Drug 3: the hypertension is controlled without significant side effects
In one visit! With one drug!
CURRENT PROBLEM about SNPs !!
•~56 million SNPs deposited in NCBI !!
•>14 million non-redundant SNPs
•~6.6 million validated SNPs
However, NOT all the SNPs
are functionally significant
•Affect function
•Associated with Disease /
Drug Response
Shi (2001) Enabling Large-Scale Pharmacogenetic Studies by High Throughput
Mutation Detection and Genotyping Technologies. Clinical Chemistry 47(2):164-172
Current Strategies for Association Studies
Whole Genome Association (WGA)
Primarily tag-SNPs or t-SNPs (e.g. Illumina) or ‘quasi-random’
(QR-SNPs) (e.g. Affymetrix)
Successful for some diseases (e.g. diabetes) but not for others
(e.g. Parkinson's disease)
Major Problems:
Does not directly interrogate the causative SNP
Multiple Testing / Type I error
Candidate Gene Approach
Even a single gene e.g. ABCB1 which is ~210 kb long has >1000
SNPs.
Selection of SNPs for association based on
Non-synonymous SNPs
Not all non-synonymous SNPs are functionally important
Some synonymous/intronic SNPs may be functional/causative, e.g. by changing exonic splicing enhancer (ESE)
sites or codon usage/protein folding (e.g. ABCB1 synonymous SNP 3435)
“gut” feeling??
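The "Multiple Testing / Type I error" problem listed above can be made concrete with a small sketch. It assumes independent tests and a Bonferroni correction, neither of which is stated on the slides. The figure of 1,000 SNPs echoes the ABCB1 example; the 500,000-SNP array size is an illustrative assumption, not a number from this talk.

```python
def expected_false_positives(alpha: float, n_tests: int) -> float:
    """Expected number of false positives when every null hypothesis is true
    and each SNP is tested at the uncorrected level alpha."""
    return alpha * n_tests


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test p-value cut-off that keeps the family-wise error rate <= alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


alpha = 0.05
# 1_000: roughly the SNP count quoted for a single gene (ABCB1).
# 500_000: an assumed genome-wide array size, for illustration only.
for n_snps in (1_000, 500_000):
    print(
        n_snps,
        expected_false_positives(alpha, n_snps),
        bonferroni_threshold(alpha, n_snps),
    )
```

Under these assumptions, testing more SNPs forces a stricter per-SNP threshold, which is one motivation for restricting an association study to a smaller, better-chosen set of SNPs.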
Current Strategies for the selection of a subset of
SNPs for Association Study
Quasi-random SNPs (QR-SNPs) (Affymetrix)
SNPs selected “unbiasedly” and quasi-randomly to cover the entire
genome
Selection of SNPs limited by sequence constraints of the Affymetrix genotyping
technology
High Type I error
Frequently NOT interrogating the functionally significant SNPs
Tagging-SNPs (t-SNPs)
Subset of SNPs that still retains as much of the genetic variation as
the full set.
Not all genes are amenable to tagging-SNP strategy (e.g. low LD
genes and recent positively selected SNPs)
Still High Type I error
Frequently NOT interrogating the functionally significant SNPs
Concept of tag-SNPs (t-SNPs)
–Tagging SNPs
•Genetic variants that are near each other tend to be inherited
together (Linkage Disequilibrium or LD)
•The architecture of genetic variants shows that there are blocks of SNPs
in high LD
–Determine the extent and number of blocks in the human
genome
–Genotype a representative SNP in each block.
www.rikenresearch.riken.jp/frontline/400/
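The tagging idea above (genotype one representative SNP per block of high LD) can be sketched as a greedy selection over pairwise r^2 values. This is a minimal illustration, not the algorithm used by any particular genotyping vendor. The function name `pick_tag_snps`, the SNP names, the r^2 values and the 0.8 threshold are all assumptions made for the example.

```python
from typing import Dict, FrozenSet, List, Set


def pick_tag_snps(
    snps: List[str],
    r2: Dict[FrozenSet[str], float],
    threshold: float = 0.8,  # assumed cut-off, not taken from the slides
) -> Dict[str, Set[str]]:
    """Greedy tag-SNP selection.

    Returns a mapping: tag SNP -> set of SNPs it represents (itself included).
    Pairs missing from `r2` are treated as r^2 = 0.
    """

    def covered_by(s: str) -> Set[str]:
        return {
            t for t in snps
            if t == s or r2.get(frozenset((s, t)), 0.0) >= threshold
        }

    untagged = set(snps)
    tags: Dict[str, Set[str]] = {}
    while untagged:
        # Pick the SNP that represents the most still-untagged SNPs;
        # ties go to the SNP listed first.
        best = max(snps, key=lambda s: len(covered_by(s) & untagged))
        tags[best] = covered_by(best) & untagged
        untagged -= tags[best]
    return tags


# Hypothetical data: two LD blocks, {snp1, snp2, snp3} and {snp4, snp5}.
example_r2 = {
    frozenset(("snp1", "snp2")): 0.95,
    frozenset(("snp1", "snp3")): 0.85,
    frozenset(("snp2", "snp3")): 0.90,
    frozenset(("snp4", "snp5")): 0.90,
    frozenset(("snp3", "snp4")): 0.10,
}
tags = pick_tag_snps(["snp1", "snp2", "snp3", "snp4", "snp5"], example_r2)
print(tags)
```

In this toy case two tag SNPs stand in for five. A functional SNP that is not in high LD with any tag would be missed, which is the limitation the slides raise for low-LD genes.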
Our Approach
Integrate Different Resources and Computational
Approaches to Predict Potentially Functionally
Significant SNPs
Reported Functional Significance
OMIM, PubMed, dbGAP, HGMD
Inferred / Predicted Functional
Significance
SNPs under Natural Selection
SNPs predicted to reside in
functionally important domains
SNPs predicted to alter consensus sequences
of functionally important sites
Potentially Functionally Significant SNPs
General
SNPs reported to be associated with
disease or function
OMIM, PubMed, HGMD, dbGAP
SNPs that show evidence of selection
Recent Positive Selection (RPS)
Population Differentiation
Conserved Non-coding Regions (CNS)
SNPs that show functionality
at mRNA Level
Seed, mature non-seed, stem, loop
Non-synonymous SNPs
Domain SNPs reside in
Affect glycosylation/phosphorylation, etc
Predicted to be deleterious
Synonymous SNPs
Show significant codon-usage differences
Affect Nonsense Mediated Decay
(NMD)
Change ESE/ESS sites
Intronic SNPs
SNPs in coding region
Change TF Binding Sites
Coding Region SNPs
SNPs residing in miRNA coding
region
Promoter SNPs
Reside in Splice Junction
Change ISRE
3’UTR SNPs
Change miRNA binding
Reside on conserved 3’UTR region
Change TF Binding Sites
Characteristics of Potentially Functionally Significant
SNPs (pfsSNPs)
930,324 of the 14,171,351 SNPs in the genome (~6.6%)
were found to be potentially functionally significant.
pfsSNPs are enriched in coding and promoter regions
[Figure: two pie charts comparing the regional distribution of ALL SNPs (dbSNP129) and pfsSNPs. Categories shown: genic, intergenic, coding, noncoding, nonsynonymous, synonymous, intron, promoter, 3’UTR]
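SNPs within the pfsSNPs database are different
from those in the t-SNP and QR-SNP databases
[Figure: three-set Venn diagram of t-SNPs, QR-SNPs and pfsSNPs. Region counts as extracted (assignment of counts to regions is not recoverable from the transcript): 640,520; 179,955; 88,874; 620,865; 74,562; 584,261; 146,023]
The 3 datasets are UNIQUE, with less than 10% of SNPs
in common amongst the different methods of SNP
selection.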
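A quick consistency check on the Venn counts is possible even though the transcript does not say which count belongs to which region. In a three-set Venn diagram each circle is made of four regions, so four of the seven counts should add up to the 930,324 pfsSNPs reported earlier. The sketch below searches for such a combination. Reading the matching counts as "the pfsSNP circle" is an inference, not something the slide states.

```python
from itertools import combinations

PFS_TOTAL = 930_324        # pfsSNPs reported earlier in the talk
GENOME_TOTAL = 14_171_351  # all SNPs reported earlier in the talk
# The seven region counts, in the order they appear in the transcript.
VENN_COUNTS = [640_520, 179_955, 88_874, 620_865, 74_562, 584_261, 146_023]


def subsets_summing_to(values, target, size):
    """All combinations of `size` values whose sum equals `target`."""
    return [c for c in combinations(values, size) if sum(c) == target]


matches = subsets_summing_to(VENN_COUNTS, PFS_TOTAL, 4)
print(matches)

pfs_percent = round(100 * PFS_TOTAL / GENOME_TOTAL, 1)
print(pfs_percent)
```

Exactly one combination matches, which suggests that 620,865, 146,023, 88,874 and 74,562 are the four regions inside the pfsSNP circle and that the remaining three counts lie outside it.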
Gross chromosome structure of the 3 datasets is similar
[Figure: SNP positions along chromosomes 1 to 22, X and Y for the three datasets; axis: chromosomal position in bp]
SNP coverage for pfsSNPs (red), t-SNPs (green) and QR-SNPs
(blue) is similar and evenly distributed across the chromosomes
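pfsSNPs dataset has more ungenotyped and
monomorphic SNPs than t-SNP / QR-SNP datasets
[Figure: bar chart of the fraction of SNPs not genotyped and monomorphic in each population (CHB, JPT, CEU, YRI); y-axis 0 to 0.6. Values as percentages:]
Dataset   Measure            CHB     JPT     CEU     YRI
t-SNP     % Not Genotyped    4.57    4.53    4.11    3.76
t-SNP     % Monomorphic     16.40   17.88   10.82    7.91
QR-SNP    % Not Genotyped    0.74    0.75    0.80    1.16
QR-SNP    % Monomorphic     16.52   20.13   12.38    5.59
pfsSNP    % Not Genotyped   47.89   47.88   47.98   48.22
pfsSNP    % Monomorphic     25.55   26.77   20.59   15.14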
pfsSNPs are More Enriched in Regions of
Functional Significance
[Figure: four pie charts of the regional distribution of SNPs in the whole genome, pfsSNPs, t-SNPs and QR-SNPs. Categories shown: genic, intergenic, coding, noncoding, nonsynonymous, synonymous, intron, promoter, 3’UTR]
t-SNP and QR-SNP datasets contain a low
percentage of pfsSNPs
[Figure: bar chart of % of regional pfsSNPs (y-axis, 0 to 100) present in the t-SNP and QR-SNP datasets. Categories: TFBS, NMD, ESE/ESS, domain, deleterious, aa ∆, ISRE, splice site, miR binding, conserved region, miR coding. Group labels: promoter, mRNA, protein, coding region, intron, 3’ UTR, miRNA. Bar labels as extracted (assignment to categories is not recoverable from the transcript): 76.4%, 4.2%, 10.2%, 7.1%, 9.1%, 22.4%, 26.8%, 10.1%, 18.8%, 7.8%, 8.8%, 30.5%, 8.1%, 5.2%, 6.9%, 20.3%, 28.3%, 37.4%, 53.5%, 15.1%]
t-SNPs contain more putatively functionally
significant SNPs than QR-SNPs
Which SNP selection Method is Better Associated
with Differences in Gene Expression?
2 publications associated genotype
(from HapMap) with expression:
Stranger: 1218 expression-associated
SNPs from 1,156,868 SNPs in
HapMap I, r16.
Kwan: 412 transcript-isoform-associated
SNPs from 4,000,107
SNPs in HapMap II, r21
Stranger et al                   pfsSNP     t-SNP      QR-SNP
Expression-related SNPs          408        672        275
No. SNPs genotyped in HapMap     173,193    456,366    251,899
Functional SNPs per 1000 SNPs    2.36       1.47       1.09
Kwan et al                       pfsSNP     t-SNP      QR-SNP
Expression-related SNPs          155        177        103
No. SNPs genotyped in HapMap     486,642    945,242    838,072
Functional SNPs per 1000 SNPs    0.32       0.19       0.12
[Figure: two bar charts (Stranger et al, y-axis 0 to 0.35%; Kwan et al, y-axis 0 to 0.035%) of the percentage of expression-associated SNPs among QR-SNPs, t-SNPs and pfsSNPs]
pfsSNPs are better associated with Differences in Gene Expression than
t-SNPs /QR-SNPs
Stranger et al (2007) Science 315:848-853
Kwan et al (2008) Nat Genetics 40:225
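The "functional SNPs per 1000 SNPs" figures above are simple ratios and can be recomputed from the counts on the slide. The sketch below does that. The assignment of each triple of numbers to a study follows the slide's headings and chart axis ranges, so it should be read as an interpretation of the flattened slide. The names `COUNTS`, `rates` and `fold_over` are introduced here for illustration only.

```python
# (expression-associated SNPs, SNPs genotyped in HapMap) per selection method,
# as read from the slide.
COUNTS = {
    "Stranger et al": {
        "pfsSNP": (408, 173_193),
        "t-SNP": (672, 456_366),
        "QR-SNP": (275, 251_899),
    },
    "Kwan et al": {
        "pfsSNP": (155, 486_642),
        "t-SNP": (177, 945_242),
        "QR-SNP": (103, 838_072),
    },
}


def per_thousand(hits: int, genotyped: int) -> float:
    """Expression-associated SNPs per 1000 genotyped SNPs."""
    return 1000 * hits / genotyped


def rates(study: str) -> dict:
    """Rate per 1000 SNPs for each selection method, rounded to 2 decimals."""
    return {m: round(per_thousand(*c), 2) for m, c in COUNTS[study].items()}


def fold_over(study: str, method: str, baseline: str) -> float:
    """How many times higher the rate of `method` is than that of `baseline`."""
    return per_thousand(*COUNTS[study][method]) / per_thousand(*COUNTS[study][baseline])


for study in COUNTS:
    print(study, rates(study))
```

The recomputed rates agree with the values on the slide. On these counts the pfsSNP rate is roughly 1.6 to 1.7 times the t-SNP rate and roughly 2.2 to 2.6 times the QR-SNP rate.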
Web Resource that provides
detailed descriptions of the potential
functionality of pfsSNPs
Target Audience of the Web-Resource
Scientists interested in
Whole Genome Association Studies
Gene-Based Association Studies
Identify pfsSNPs in
Specific Genes
Genes expressed in Specific Tissues
Genes located in Specific Chromosome Regions
Genes in Specific Pathways/Gene Ontology Terms
Genes associated with Specific Disease
Genes associated with Specific Drugs
Specific genomic regions
Specific populations at certain threshold frequency.
Designing Experiments to address functionality of SNPs
Identifying pfsSNPs that are in LD with the scientist’s
SNP-of-interest.
Useful for scientists who have performed GWAS using the
Affymetrix QR-SNPs or the Illumina t-SNPs
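Finding pfsSNPs "in LD with" a SNP-of-interest comes down to a pairwise LD measure. One common choice is r^2, sketched below for two biallelic SNPs from phased haplotypes. How the web resource itself computes LD is not described on the slides, so the function `ld_r2` and its 0/1 allele coding are assumptions for illustration.

```python
from typing import Sequence, Tuple


def ld_r2(haplotypes: Sequence[Tuple[int, int]]) -> float:
    """r^2 between two biallelic SNPs.

    Each haplotype is (allele_at_snp1, allele_at_snp2), coded 0 or 1.
    r^2 = D^2 / (pA * (1 - pA) * pB * (1 - pB)), with D = pAB - pA * pB.
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("no haplotypes given")
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        # A monomorphic SNP carries no LD information.
        raise ValueError("r^2 is undefined when a SNP is monomorphic")
    d = p_ab - p_a * p_b
    return d * d / denom


# Hypothetical haplotypes: the two SNPs always travel together.
perfect = [(1, 1)] * 4 + [(0, 0)] * 6
# Hypothetical haplotypes: every allele combination is equally common.
independent = [(1, 1), (1, 0), (0, 1), (0, 0)]
print(ld_r2(perfect), ld_r2(independent))
```

A GWAS hit from a t-SNP or QR-SNP array could then be followed up by listing the pfsSNPs whose r^2 with the hit exceeds a chosen threshold.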
Query Page
Biologist-Friendly Features of the Web Resource
Highly Customizable Query Interface.
Auto-Complete Prompt-As-You-Type feature in
the Query Interface.
SNP ID can be identified from information on
the gene, specific genomic region or
amino acid number.
Excel-Like Presentation of Results with In-Depth Resource Reference
Information Sharing and Contribution by the
Scientific Community.
Typical Results (in Excel Format)
Demonstration of Web Resource
http://pfs.nus.edu.sg/
Acknowledgements
Wang Jingbo (RA and Part-Time Graduate
Student)
Mostafa Ronaghi (Stanford University)