PolyPhen and SIFT: Tools for
predicting functional effects of SNPs
Epi 244
Spring 2009
Sam S. Oh
Human genome variation
• 3.2 billion base pairs (bp)
• 99.9% similarity across individuals
– 3.2 million bp dissimilar
• ~11 million SNPs
– Coding vs. non-coding (intron and intergenic regions)
– Most are synonymous
Frazer et al. Nat Rev Genet,
2009;10:241-251
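The 3.2 million bp figure above is simple arithmetic on the two numbers given on the slide. A minimal sketch, using integer arithmetic so no rounding is involved (the variable names are illustrative only):

```python
GENOME_BP = 3_200_000_000       # genome size from the slide (3.2 billion bp)
SIMILARITY_PER_MILLE = 999      # 99.9% similarity, written in tenths of a percent

# Fraction that differs is 0.1%, i.e. 1 part in 1000.
differing_bp = GENOME_BP * (1000 - SIMILARITY_PER_MILLE) // 1000

print(f"{differing_bp:,} bp differ between two individuals")
```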
DNA → RNA → Protein
Example: sickle-cell anemia
• A to T SNP of beta-globin gene results in
glutamate (hydrophilic) to valine
(hydrophobic) substitution
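The sickle-cell substitution can be sketched in a few lines. This assumes the commonly cited codon change GAG to GTG (the codon itself is not given on the slide), uses a deliberately tiny codon table, and borrows Kyte-Doolittle hydropathy values to show the hydrophilic-to-hydrophobic shift. The helper names are hypothetical.

```python
# Minimal codon table: only the codons needed for this example.
CODON_TO_AA = {"GAG": "Glu", "GAA": "Glu", "GTG": "Val"}

# Kyte-Doolittle hydropathy values (negative = hydrophilic, positive = hydrophobic).
HYDROPATHY = {"Glu": -3.5, "Val": 4.2}


def apply_snp(codon, offset, new_base):
    """Return the codon with the base at `offset` (0-based) replaced."""
    return codon[:offset] + new_base + codon[offset + 1:]


def classify(ref_codon, alt_codon):
    """Translate both codons and label the change."""
    ref_aa, alt_aa = CODON_TO_AA[ref_codon], CODON_TO_AA[alt_codon]
    kind = "synonymous" if ref_aa == alt_aa else "non-synonymous"
    return ref_aa, alt_aa, kind


ref_codon = "GAG"                          # assumed reference codon (glutamate)
alt_codon = apply_snp(ref_codon, 1, "T")   # A -> T at the middle base
ref_aa, alt_aa, kind = classify(ref_codon, alt_codon)

print(f"{ref_codon} ({ref_aa}) -> {alt_codon} ({alt_aa}): {kind}")
print(f"hydropathy: {HYDROPATHY[ref_aa]} -> {HYDROPATHY[alt_aa]}")
```

By contrast, `classify("GAG", "GAA")` is labeled synonymous, since both codons encode glutamate.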
Example: MTHFR
• Folate metabolism
Finding MTHFR SNPs
Highlight all
refSNP numbers
(use scroll bar)
and copy
Note Build number
(currently Build 130)
SIFT
• Sorting Intolerant From Tolerant
• Predicts whether an amino acid (AA)
substitution (i.e., a non-synonymous SNP)
is tolerated, based on
– Sequence homology
– Physical properties of amino acids
• Can be applied to naturally occurring
nonsynonymous polymorphisms and
laboratory-induced missense mutations
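To give a feel for the sequence-homology idea, here is a toy sketch: count which residues appear at one column of an alignment of homologous sequences and scale the counts by the most common residue. This is an illustration only, not SIFT's actual algorithm (which, among other things, uses pseudocounts and sequence weighting). The alignment and the function name are made up.

```python
from collections import Counter


def scaled_frequencies(alignment, column):
    """Residue counts at one alignment column, scaled so the most
    frequent residue gets 1.0. Gaps ('-') are ignored."""
    residues = [seq[column] for seq in alignment if seq[column] != "-"]
    counts = Counter(residues)
    if not counts:
        return {}
    top = max(counts.values())
    return {aa: n / top for aa, n in counts.items()}


# Made-up alignment of four short homologous sequences.
toy_alignment = ["MGKL", "MGRL", "MGKI", "MAKL"]

col1 = scaled_frequencies(toy_alignment, 1)   # column 1 holds G, G, G, A
print(col1)
print("E at column 1:", col1.get("E", 0.0))   # never observed in this toy column
```

A residue never seen at a conserved column gets a low value here, which mirrors the intuition that such a substitution is less likely to be tolerated.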
Compare Build numbers
Copy all SNP IDs and
paste into SIFT. Choose
“Submit Query”
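When copying refSNP numbers by hand, stray text and duplicates tend to come along. A minimal sketch for tidying a pasted block before submitting it: keep only tokens of the form rs followed by digits, drop repeats, and print one ID per line. The one-per-line layout is an assumption about what the form accepts, and `rs0000001` is a made-up placeholder.

```python
import re

RS_ID = re.compile(r"rs\d+")


def clean_rs_ids(pasted_text):
    """Return unique, well-formed rs IDs in their original order."""
    seen = set()
    ids = []
    for token in pasted_text.split():
        token = token.strip(",;").lower()
        if RS_ID.fullmatch(token) and token not in seen:
            seen.add(token)
            ids.append(token)
    return ids


# rs2274974 is from the slides; rs0000001 is a placeholder for illustration.
pasted = "rs2274974, RS2274974\nBuild130 rs12345x rs0000001; rs2274974"
cleaned = clean_rs_ids(pasted)
print("\n".join(cleaned))
```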
Getting more info for rs2274974
Enter “rs2274974”
Flanking sequence,
IUPAC code,
Allele info
flanking seq
Build number
mRNA name
Protein name
Contig name
Position of SNP in
mRNA, protein, contig
Scroll down
Select protein
Note AA1, AA2,
and position
Copy FASTA-formatted
protein sequence
Paste FASTA-formatted
protein sequence
Enter AA substitution
[Letter1-position-Letter2]
Substitution occurs
at AA 566
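Before submitting, it is worth checking that the substitution string matches the pasted sequence. The sketch below reads a FASTA-formatted sequence, parses a substitution written as Letter1-position-Letter2 with no separators (e.g., G566E), and confirms that Letter1 really is the residue at that position. The toy sequence and all function names are hypothetical; the real MTHFR sequence is not reproduced here.

```python
import re

SUBSTITUTION = re.compile(r"([A-Z])(\d+)([A-Z])")


def read_fasta_sequence(fasta_text):
    """Join the sequence lines of a single-record FASTA string."""
    lines = [ln.strip() for ln in fasta_text.strip().splitlines()]
    return "".join(ln for ln in lines if ln and not ln.startswith(">"))


def parse_substitution(text):
    """Split 'G566E' into ('G', 566, 'E'); position is 1-based."""
    m = SUBSTITUTION.fullmatch(text.strip().upper())
    if m is None:
        raise ValueError(f"not in Letter1-position-Letter2 form: {text!r}")
    return m.group(1), int(m.group(2)), m.group(3)


def matches_reference(sequence, ref_aa, position):
    """True if the sequence has ref_aa at the given 1-based position."""
    return 1 <= position <= len(sequence) and sequence[position - 1] == ref_aa


toy_fasta = ">toy_protein (made-up sequence)\nMKTG\nAEL\n"
seq = read_fasta_sequence(toy_fasta)          # "MKTGAEL"
ref, pos, alt = parse_substitution("G4E")
print(seq, ref, pos, alt, matches_reference(seq, ref, pos))
```

With the real protein sequence pasted in, `parse_substitution("G566E")` followed by `matches_reference` would catch an off-by-one position or a sequence from the wrong protein version.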
Scroll down
Check tolerance of
AA substitutions
Scroll down
“Substitution at pos 566 from G to E is predicted to
AFFECT PROTEIN FUNCTION with a score of 0.01.”
Tolerance of specified
substitution
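The reported score can be turned into a call with a single comparison. The sketch below assumes the conventional SIFT cutoff of 0.05 (scores below it are called deleterious); the cutoff is not stated on the slides, and the function name is illustrative.

```python
SIFT_CUTOFF = 0.05   # assumed conventional cutoff; not given on the slides


def sift_call(score, cutoff=SIFT_CUTOFF):
    """Map a SIFT score in [0, 1] to a tolerance call."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score out of range: {score}")
    return "AFFECT PROTEIN FUNCTION" if score < cutoff else "TOLERATED"


print(sift_call(0.01))   # the G566E score reported above
print(sift_call(0.40))
```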
PolyPhen: Polymorphism Phenotyping
• Tool for prediction of possible impact of amino acid
substitution (i.e., non-synonymous SNPs) on protein
structure and function based on:
– Amino acid sequence
• In which part of the protein does the SNP occur? (E.g., active site,
binding site, transmembrane region)
– Multiple alignments with homologous proteins and mammalian
orthologues
• How compatible is the substitution based on proteins of comparable
sequence?
– 3D structural properties with the substituted amino acid
• What is the substitution’s effect on the protein’s physicochemistry?
(E.g., hydrophobicity, electrostatic interactions, ligand binding)
PolyPhen data flow
Four potential predictions
• Probably damaging
– Predicted with high confidence to affect protein
function or structure
• Possibly damaging
– Predicted to affect protein function or structure
• Benign
– Most likely lacking any phenotypic effect
• Unknown
– Lack of data does not allow PolyPhen to make a
prediction
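As a rough illustration of how a score could map onto the four outcomes, the sketch below uses the PSIC score difference alone. The cutoffs (above 2.0 and above 1.5) are assumptions for illustration and a heavy simplification: PolyPhen's actual rules also weigh functional-site annotation and structural features, none of which is modeled here. The function name is hypothetical.

```python
from typing import Optional


def polyphen_like_call(psic_delta: Optional[float]) -> str:
    """Simplified call from a PSIC score difference.

    Cutoffs are illustrative assumptions, not PolyPhen's full rule set.
    None stands for 'no data available'.
    """
    if psic_delta is None:
        return "unknown"
    if psic_delta > 2.0:
        return "probably damaging"
    if psic_delta > 1.5:
        return "possibly damaging"
    return "benign"


for delta in (2.093, 1.7, 0.3, None):
    print(delta, "->", polyphen_like_call(delta))
```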
Copy FASTA-formatted
protein sequence
Enter AA position, ancestral
AA, and substituted AA
In dbSNP Build 129, corresponds to
protein NP_005948.3
Enter SNP rs#
Query vs. SNP Collection
                 Query                SNP Collection
Prediction       Probably damaging    Probably damaging
PSIC             2.093                2.172
dbSNP Build#     N/A                  126
References
• NCBI dbSNP
– http://www.ncbi.nlm.nih.gov/sites/entrez
• SIFT
– http://sift.jcvi.org/
• PolyPhen
– http://genetics.bwh.harvard.edu/pph/index.html