Introduction

Presented by:
Andrew McMurry
Boston University Bioinformatics
Children’s Hospital Informatics Program
Harvard Medical School Center for BioMedical Informatics
This Presentation Available at:
http://pixelshelf.com/~justandy/f-snp.ppt
Outline

Incidental Findings and Disconnected Patient Cohorts

Disease Association Studies Using SNPs

How SNPs cause disease

Computationally predict effect of SNPs within introns, exons,
and regulatory regions

The Future Is Now:
SNPs, Personalized Medicine, and Translational Research
Incidental Findings and Disconnected Patient Cohorts

IF the central dogma of Biology is:
“From DNA ->RNA ->Protein”

THEN where is the patient data for association studies?
Very little patient data spanning DNA/RNA/
protein/phenotype across a single cohort
Need to obtain “robust” sample sizes to avoid incidental
findings due to multiple testing [1]
[1] Isaac Kohane, Daniel Masys, and Russ Altman.
"The Incidentalome: A Threat to Genomic Medicine"
JAMA 296(2): 212-215. July 12, 2006.
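
The multiple-testing concern can be made concrete with a little arithmetic. The sketch below assumes 500,000 independent tests (the SNP chip size quoted later in this talk) and a conventional significance level of 0.05; both numbers and the function name are illustrative assumptions, not figures from the cited paper.

```python
# Illustrative multiple-testing arithmetic; inputs are assumptions, not study data.
def multiple_testing_summary(num_tests: int, alpha: float = 0.05) -> dict:
    """Summarize what testing `num_tests` independent hypotheses at level `alpha` implies."""
    return {
        # Number of "significant" results expected by chance alone if no SNP has any effect.
        "expected_false_positives": num_tests * alpha,
        # Probability of at least one false positive across all tests.
        "familywise_error": 1.0 - (1.0 - alpha) ** num_tests,
        # Per-test threshold that keeps the familywise error at or below alpha (Bonferroni).
        "bonferroni_threshold": alpha / num_tests,
    }


summary = multiple_testing_summary(500_000)
for name, value in summary.items():
    print(f"{name}: {value:g}")
```

Under these assumptions, chance alone is expected to produce tens of thousands of nominally significant SNPs, which is why small cohorts yield incidental findings.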
Disease Association Studies Using SNPs

DNA sequencing technologies still very expensive 
Stunningly few patients
Minimal sequence coverage

Could change in time with Solexa/454

Even with Solexa/454 there is a massive task of piecing together
the results (the maximum read length is often shorter than a single
repeated gene)

Rate limiting step: Adoption rate of DNA sequencing

Use what is available in abundance! SNP chips 
Abundance of SNP chips in public repos on many diseases
Whole genome coverage 500k SNPs for $250
Disease Association Studies Using SNPs
DNA to RNA to Protein

Associating DNA & RNA
GEO alone well over 100k Gene Expression Arrays
What if we could correlate SNPs with their effect on Gene Expression?

Associating DNA & Gene Product (protein)
Countless public protein databases
What if we could correlate SNPs with their effect on Protein Coding?

Association studies involving multiple genomic measurements
What are the existing studies and models (HMMs/Bayes nets)
that could be strengthened with evidence from SNP chips?
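
One simple way to picture "correlating a SNP with gene expression" is to code each patient's genotype as the number of minor alleles (0, 1, or 2) and correlate that with the expression level of a gene measured in the same patient. The sketch below is only an illustration: the data values are made up, the function name is hypothetical, and a real study would need matched samples from a single cohort plus correction for multiple testing.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Made-up toy values, for illustration only (not real patient data).
genotype_dosage = [0, 0, 1, 1, 2, 2]                 # minor-allele count per patient
expression_level = [5.1, 4.9, 6.0, 6.2, 7.1, 6.9]    # expression of one gene per patient

r = pearson(genotype_dosage, expression_level)
print(f"correlation between genotype and expression: {r:.3f}")
```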
How SNPs cause disease

Intron
• Likely no effect

Protein Coding
• Missense
  • Synonymous -> Same Amino Acid
  • Non Synonymous -> Different Amino Acid
• Nonsense
  • Premature STOP

Splicing Regulation
• Incorrect final mRNA transcript

Transcriptional Regulation
• Differential gene expression

Post Translational
• Protein phosphorylation
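
The protein-coding cases above can be illustrated by translating the reference and variant codons with the standard genetic code and comparing the amino acids. This is a minimal sketch, not how F-SNP or its component tools work; the function name and the example codons are illustrative choices.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_coding_snp(ref_codon: str, alt_codon: str) -> str:
    """Classify a single-codon change by its effect on the encoded amino acid."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "synonymous"        # same amino acid
    if alt_aa == "*":
        return "nonsense"          # premature STOP
    return "non-synonymous"        # different amino acid (includes loss of a STOP)


print(classify_coding_snp("GAA", "GAG"))  # Glu -> Glu
print(classify_coding_snp("GAA", "GAT"))  # Glu -> Asp
print(classify_coding_snp("GAA", "TAA"))  # Glu -> STOP
```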
So how do we measure all these effects of SNPs?
F-SNP : integrated approach
1. Classify SNP site using dbSNP
• Intron
• Coding Region
• Splice Site
• TF binding Site
• Post-Translational Site
2. Evaluate using the specialized algorithms/dbs
• Coding region (missense/nonsense mutations)
• Splice Site (intronic/exonic sites)
• TF binding Site (promoter/repressor/etc)
• Post-Translational Site (Phospho/Tyrosine/O-glycosylation)
3. “Majority Vote” across algorithms
F-SNP decision procedure for functional SNPs
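
The "majority vote" step can be sketched as follows. This is a simplified illustration, not the actual F-SNP decision procedure (the paper describes that in full): the labels, the tie-handling rule, and the tool name `tool_C` are assumptions made for the example. PolyPhen and SNPs3D are two of the tools mentioned in this talk.

```python
from collections import Counter


def majority_vote(predictions: dict) -> str:
    """Return the label predicted by most tools.

    `predictions` maps a tool name to its predicted label.
    Ties are reported as "undetermined" (an assumption of this sketch).
    """
    if not predictions:
        return "no prediction"
    ranked = Counter(predictions.values()).most_common()
    top_label, top_count = ranked[0]
    if len(ranked) > 1 and ranked[1][1] == top_count:
        return "undetermined"
    return top_label


# Hypothetical votes for one SNP in the protein-coding category.
votes = {"PolyPhen": "benign", "SNPs3D": "deleterious", "tool_C": "benign"}
print(majority_vote(votes))
```

Keeping each tool's vote alongside the final call makes disagreements visible, such as one tool reporting a benign effect while another reports a deleterious one.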
F-SNP: User Interfaces & Data Download

Public Web Site

Federated Query =
entire database cannot be downloaded

Currently:
no SOAP (webservice) support
no RSS support
no source code available

However:
Paper gives explicit instructions on how to reproduce the
algorithm and construct the database using dbSNP, OMIM,
etc.
“Large N Study” using F-SNP
Functional Category          # of Assessed SNPs   # of Functional SNPs
Protein Coding               154,140              66,899
Splicing Regulation          73,051               8,075
Transcriptional Regulation   453,710              78,296
Post Translation             64,736               4,477
Total                        559,322              115,356
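
The counts in this table can be turned into the fraction of assessed SNPs predicted to be functional in each category. The sketch below uses only the numbers shown above; the variable and function names are illustrative. The per-category assessed counts add up to more than the Total row, which suggests a SNP can be assessed in more than one category.

```python
# (assessed, functional) counts copied from the table above.
COUNTS = {
    "Protein Coding": (154_140, 66_899),
    "Splicing Regulation": (73_051, 8_075),
    "Transcriptional Regulation": (453_710, 78_296),
    "Post Translation": (64_736, 4_477),
}
TOTAL = (559_322, 115_356)


def functional_fraction(assessed: int, functional: int) -> float:
    """Fraction of assessed SNPs that were predicted functional."""
    return functional / assessed


for category, (assessed, functional) in COUNTS.items():
    print(f"{category}: {functional_fraction(assessed, functional):.1%}")
print(f"Total: {functional_fraction(*TOTAL):.1%}")

# Sum of per-category assessed counts, for comparison with the Total row.
category_sum = sum(assessed for assessed, _ in COUNTS.values())
print(f"Sum of category counts: {category_sum:,} vs. total {TOTAL[0]:,}")
```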
Evaluate Individual SNP (rs28897699)
SNP summary and Functional Predictions
SNP Primary Information (rs28897699)
• Locus
• Alleles
• Ancestral Allele
• Validation (if any)
• Region
• Link to References
F-SNP: Functional Predictions
F-SNP Prediction Detail:
PolyPhen = benign effect on protein coding
F-SNP Prediction Detail:
SNPs3D = deleterious to protein coding
NCBI Gene Information
Product: breast cancer 1, early onset
Other names: BRCA1, BRCAI, BRCC1, IRIS, PSCP, RNF53
NCBI Entrez Gene Summary: This gene encodes a nuclear phosphoprotein that plays a role in
maintaining genomic stability and acts as a tumor suppressor. (…) Mutations in this gene are
responsible for approximately 40% of inherited breast cancers and more than 80% of inherited
breast and ovarian cancers. Alternative splicing plays a role in modulating the
subcellular localization and physiological function of this gene. Many alternatively spliced
transcript variants have been described for this gene but only some have had their full-length
natures identified. (…)
F-SNP functional prediction
on Protein Coding
-> 2 votes benign, 1 deleterious, 1 nonsynonymous
on Splicing Regulation
-> predicted functional impact (by majority vote)
Gene level view of BRCA1
Query by gene name = “BRCA1”
Returns list of SNPs in BRCA1
Returns list of Cancers associated with BRCA1
Gene level view of BRCA1
our SNP has functional impact
our SNP has neighboring functional SNPs
Disease Level View : Breast Cancer
Show all disease genes associated with breast cancer
Denote if SNPs are present in those genes (5k up/downstream)
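
The "5k up/downstream" rule amounts to a simple interval check: a SNP counts as present in a gene if its position falls within the gene's span extended by 5,000 bases on each side. The sketch below assumes the SNP and gene are on the same chromosome and use the same coordinate system; the function name and the coordinates in the example are hypothetical.

```python
FLANK_BP = 5_000  # bases upstream/downstream of the gene, per the slide


def snp_near_gene(snp_pos: int, gene_start: int, gene_end: int, flank: int = FLANK_BP) -> bool:
    """True if the SNP lies within the gene or within `flank` bases on either side.

    Assumes all positions are on the same chromosome and gene_start <= gene_end.
    """
    if gene_start > gene_end:
        raise ValueError("gene_start must not exceed gene_end")
    return gene_start - flank <= snp_pos <= gene_end + flank


# Hypothetical coordinates, for illustration only.
print(snp_near_gene(95_500, 100_000, 150_000))   # inside the upstream flank
print(snp_near_gene(160_000, 100_000, 150_000))  # beyond the downstream flank
```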
Recap of Disease Level View
The Future Is Now:
SNPs, Personalized Medicine, and Translational Research
SNP profiling becoming part of routine care [2]
Increase # of clinically annotated SNP chips 
Increase # of disease association studies using SNPs
Increase in NIH focus on “translational research” that bridges routine
care delivery with research efforts
Genome Wide Association Studies (GWAS) that actually get funded
[2] Kohane IS, Mandl KD, Taylor PL, Holm IA, Nigrin DJ, Kunkel LM.
"Medicine. Reestablishing the researcher-patient compact."
Science. 2007 Nov 16;318(5853):1068.
F-SNP Summary

Incidental Findings and Disconnected Patient Cohorts

Central dogma of biology is DNA->RNA->Protein, yet we lack a cohort that spans all measurements

Using a limited sample size will inevitably lead to incidental findings

Disease Association Studies Using SNPs

Don’t wait for DNA sequencing to become widespread

SNPs are becoming an abundant resource and are not going to disappear

How SNPs cause disease

Protein Coding

Splicing Regulation

Transcription Regulation

Post Translation

Computationally predict effect of SNPs within introns, exons, and regulatory
regions

Multitude of existing SNP analysis tools and resources
F-SNP provides a single web based resource to mine SNP disease associations
Query and analysis by SNP, Gene, Disease
The role of SNPs in Personalized Medicine & Translational Research