Single Nucleotide Polymorphisms

Outline of SNP bioinformatics lecture
• Brief introduction
• SNPs in cell biology
• SNP discovery
• SNP assessment
• SNP databases
• SNPs in genome browsers
Single Nucleotide Polymorphisms
• Must be present in at least 1% of the population
• Account for most (about 90%) of the sequence variation between two genomes
• Two humans differ at about 0.1% of positions
• About 1 SNP per 300 bp in the human genome
– Lower in coding regions
• About 10 million in the human genome
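The figures on this slide can be checked with a little arithmetic. The sketch below is only an illustration: the genome size of 3 billion bp is an assumed round number that is not on the slide, and the function names are made up for this example.

```python
# Assumed round figure for the haploid human genome size (not from the slide).
GENOME_SIZE_BP = 3_000_000_000


def expected_snp_count(genome_size_bp, bp_per_snp):
    """Number of SNPs if there is one SNP every `bp_per_snp` base pairs."""
    return genome_size_bp // bp_per_snp


def expected_pairwise_differences(genome_size_bp, percent_different):
    """Number of positions at which two genomes differ."""
    return round(genome_size_bp * percent_different / 100)


def meets_snp_threshold(minor_allele_count, chromosomes_sampled, min_frequency=0.01):
    """True if the rarer allele reaches the 1% frequency cut-off."""
    return minor_allele_count / chromosomes_sampled >= min_frequency


# 1 SNP per 300 bp over 3 billion bp gives the 10 million quoted on the slide.
print(expected_snp_count(GENOME_SIZE_BP, 300))
# A 0.1% difference between two people corresponds to about 3 million positions.
print(expected_pairwise_differences(GENOME_SIZE_BP, 0.1))
# A variant seen on 3 of 200 sampled chromosomes (1.5%) passes the 1% cut-off.
print(meets_snp_threshold(3, 200))
```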
Categories of SNPs
• Missense/Non-synonymous
– Changes an amino acid
– About half of the SNPs in coding sequence
– Can alter function and/or structure of the protein
– Cause of most monogenic diseases
• Hemochromatosis (HFE)
• Cystic fibrosis (CFTR)
• Hemophilia (F8)
• Nonsense
– Introduces a stop codon
– Same consequences as non-synonymous
Categories of SNPs
• Synonymous
– Does not alter the amino acid sequence
– May alter splicing
• Non-coding
– Can be located in promoter or regulatory regions
– Can impact the expression of the gene
• All SNPs can be used as markers
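The coding categories above can be told apart by translating the codon before and after the substitution. The following is a minimal sketch using the standard genetic code; the function name `classify_coding_snp` and its arguments are hypothetical, and cases such as the loss of a stop codon are simply lumped in with missense changes.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G ("*" marks a stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_coding_snp(ref_codon, position, alt_base):
    """Classify a single-base change at `position` (0, 1 or 2) of a codon."""
    alt_codon = ref_codon[:position] + alt_base + ref_codon[position + 1:]
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"   # same amino acid, protein sequence unchanged
    if alt_aa == "*":
        return "nonsense"     # new stop codon, truncated protein
    return "missense"         # different amino acid (stop loss also lands here)


print(classify_coding_snp("GAA", 2, "G"))  # GAA -> GAG, Glu -> Glu
print(classify_coding_snp("GAG", 1, "T"))  # GAG -> GTG, Glu -> Val
print(classify_coding_snp("TGG", 2, "A"))  # TGG -> TGA, Trp -> stop
```

Non-coding SNPs fall outside this scheme, since they have no codon to translate.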
Uses for the cell biologist
• Association studies
– Use SNPs as markers to find regions associated with phenotype
• Causative SNPs
– Altered protein
– Altered expression
• Regions of altered conservation between strains/species/individuals
• Evolutionary analyses
• Etc…
SNP discovery
• SNPs are usually discovered by sequencing
• Discovery relies on separating sequencing errors from 'real' differences and assessing their frequency in the sequenced population
• Paralogous sequences must be separated
• Validation, genotyping
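The idea of separating sequencing errors from real differences can be illustrated with a simple filter over one column of aligned base calls. This is a rough sketch, not the method used by any of the tools in this lecture: the function name, the quality cut-off of 20 (a Phred score of 20 corresponds to a 1% error probability) and the minimum counts are all assumptions chosen for illustration.

```python
from collections import Counter


def call_candidate_snp(base_calls, min_quality=20, min_minor_count=2,
                       min_minor_freq=0.01):
    """Return a candidate SNP for one aligned position, or None.

    base_calls: list of (base, phred_quality) tuples, one per sequence.
    """
    # Discard low-quality calls, which are the most likely sequencing errors.
    trusted = [base for base, quality in base_calls if quality >= min_quality]
    counts = Counter(trusted)
    if len(counts) < 2:
        return None  # only one allele observed among trusted calls

    (major, _), (minor, minor_count) = counts.most_common(2)
    minor_freq = minor_count / len(trusted)

    # Require the rarer allele to be seen more than once and often enough.
    if minor_count >= min_minor_count and minor_freq >= min_minor_freq:
        return {"major": major, "minor": minor, "minor_freq": minor_freq}
    return None


column = [("A", 40)] * 8 + [("G", 35)] * 2 + [("T", 5)]
print(call_candidate_snp(column))
```

In this example the single low-quality T is dropped as a probable error, while the G seen twice at high quality is kept as a candidate for validation and genotyping.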
SNP discovery resources
• PolyBayes
– SNP discovery in redundant sequences
• PolyPhred
– SNP discovery based on phred/phrap/consed
• NovoSNP
– Graphical identification of SNPs
Example: PolyPhred
• Detects heterozygotes from chromatograms
• Runs together with phred/phrap/consed
• Command line
SNP assessment
• Assess SNPs for functional effects
– Non-synonymous SNPs
• Conservation across species
• Amino acid properties
• Protein structure
• Transmembrane regions, signal peptides etc.
SNP assessment resources
• SIFT
• PolyPhen
• Pmut
• SNPs3D
• PANTHER PSEC
• TopoSNP
• MAPP
• Etc
Example: SIFT
• Sorting Intolerant From Tolerant
• Builds an alignment of similar sequences
• Calculates a score based on the aa in the alignment
• Takes the environment into account
• Takes the properties of the aa into account
• Does not use structure
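To give a feel for alignment-based scoring, the sketch below scores each amino acid at one alignment column by how often it is seen relative to the most common residue. It is a deliberately simplified illustration and not the SIFT algorithm itself: it ignores amino acid properties and sequence weighting, the flat pseudocount is an assumption, and the function names are hypothetical. The 0.05 cut-off mirrors the threshold SIFT commonly uses to flag a substitution as not tolerated.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def tolerance_scores(column, pseudocount=0.1):
    """Score every amino acid at one alignment column, scaled to a maximum of 1.

    column: string with the residue each aligned sequence has at this position.
    """
    counts = Counter(aa for aa in column if aa in AMINO_ACIDS)
    if not counts:
        raise ValueError("column contains no amino acids")
    # A small pseudocount keeps unseen residues from scoring exactly zero.
    weighted = {aa: counts[aa] + pseudocount for aa in AMINO_ACIDS}
    top = max(weighted.values())
    return {aa: weighted[aa] / top for aa in AMINO_ACIDS}


def predict(column, alt_aa, threshold=0.05):
    """Label a substitution to `alt_aa` as tolerated or intolerant."""
    score = tolerance_scores(column)[alt_aa]
    return "intolerant" if score < threshold else "tolerated"


column = "LLLLLLLLIV"  # residues seen at one position in ten similar sequences
print(predict(column, "I"))  # seen in the alignment
print(predict(column, "D"))  # never seen at this conserved position
```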
SNP databases
• Maps of SNPs in human, mouse, etc
• Haplotype maps
• Functional SNPs
• Disease databases
SNP databases
• dbSNP
• F-SNP
• HGVbase
• PolyDoms
• OMIM
• Etc…
Example: dbSNP
• 50 million submissions
• 18 million clusters
• 7 million in genes
• 44 organisms
• 91 million SNPs submitted
dbSNP
• Search for SNPs, location, etc
• Information submitted on method, flanking sequence, alleles, population, sample size, validation etc
• Information computed on SNPs at same location including functional analysis, population diversity etc
SNPs in genome browsers
• Ensembl
• UCSC
Example: UCSC
HapMap
• Aim: a haplotype map of the human genome describing common patterns of sequence variation
• A haplotype map is based on the observation that alleles of SNPs located close together are inherited together
• HapMap will identify which SNPs are informative in mapping, reducing the number of SNPs to genotype by an order of magnitude
• Populations from Asia, Europe and Africa
• 2nd generation map with over 3.1 million SNPs
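The reason fewer SNPs need to be genotyped is that SNPs inherited together carry largely the same information. One common way to quantify this is the r² measure of linkage disequilibrium between two SNPs. The sketch below computes r² from phased haplotypes and then greedily keeps only SNPs that are not already well predicted by a chosen one. The 0.8 threshold, the greedy strategy and the function names are assumptions for illustration, not the HapMap project's procedure.

```python
def r_squared(haplotypes, i, j):
    """Linkage disequilibrium r^2 between SNPs i and j.

    haplotypes: list of strings of '0'/'1' alleles, one string per chromosome.
    """
    n = len(haplotypes)
    p_a = sum(h[i] == "1" for h in haplotypes) / n
    p_b = sum(h[j] == "1" for h in haplotypes) / n
    p_ab = sum(h[i] == "1" and h[j] == "1" for h in haplotypes) / n
    denominator = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denominator == 0:
        return 0.0  # at least one site is monomorphic, r^2 is undefined
    d = p_ab - p_a * p_b  # deviation from independent inheritance
    return d * d / denominator


def pick_tag_snps(haplotypes, threshold=0.8):
    """Greedily choose SNPs so that every SNP is tagged at r^2 >= threshold."""
    tags = []
    for snp in range(len(haplotypes[0])):
        if not any(r_squared(haplotypes, tag, snp) >= threshold for tag in tags):
            tags.append(snp)
    return tags


# Four chromosomes, three SNPs: SNP 0 and SNP 1 always occur together.
haplotypes = ["000", "110", "110", "001"]
print(r_squared(haplotypes, 0, 1))
print(pick_tag_snps(haplotypes))
```

Here SNP 1 is perfectly predicted by SNP 0, so only SNPs 0 and 2 would need to be genotyped.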
References
Ng PC, Henikoff S.
Predicting the effects of amino acid substitutions on protein function.
Annu Rev Genomics Hum Genet. 2006;7:61-80. Review.
Bhatti P, Church DM, Rutter JL, Struewing JP, Sigurdson AJ.
Candidate single nucleotide polymorphism selection using publicly available tools: a guide
for epidemiologists.
Am J Epidemiol. 2006 Oct 15;164(8):794-804. Epub 2006 Aug 21.
Clifford RJ, Edmonson MN, Nguyen C, Scherpbier T, Hu Y, Buetow KH.
Bioinformatics tools for single nucleotide polymorphism discovery and analysis.
Ann N Y Acad Sci. 2004 May;1020:101-9. Review.
The International HapMap Consortium.
A second generation human haplotype map of over 3.1 million SNPs.
Nature 449, 851-861. 2007.