PS401 – Lec 10
Sequence Analysis
MUPGRET June workshops
Today
What can you do with the sequence?
What can you do with the ESTs?
The case of SNP and Indel
What can you do with the sequence?
Gene prediction
Motif identification
Promoter identification
Survey gene expression across tissues
Full length gene isolation
NCBI Tools
National Center for Biotechnology Information (NCBI)
National Library of Medicine, NIH
Created in 1988 to develop information
systems for molecular biology.
Provides data retrieval systems and
computational resources.
Database Resources
Database retrieval tools
BLAST family of sequence-similarity
search programs.
Resources for gene-level sequences
Resources for genome-scale analysis
Database Resources
Resources for analyzing gene expression
patterns and phenotypes
Molecular modeling database, conserved
domain database, conserved domain
architecture retrieval tool.
Database Retrieval Tools
Entrez – for DNA and protein sequences
PubMed Central – for literature
Taxonomy – organisms and associated
sequences
LocusLink – provides links from sequence
info to map and other information.
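As a rough illustration of scripted retrieval, the sketch below builds a request URL for NCBI's E-utilities `efetch` service. E-utilities is not mentioned on the slides and is brought in here as an assumption about how Entrez records might be fetched programmatically. The accession string is a hypothetical placeholder, and the code only constructs the URL without contacting NCBI.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities (an assumption: the slides only name Entrez itself).
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def efetch_url(db, accession, rettype="fasta", retmode="text"):
    """Build an efetch URL for one record. No network request is made."""
    params = {"db": db, "id": accession, "rettype": rettype, "retmode": retmode}
    return f"{EUTILS_BASE}/efetch.fcgi?{urlencode(params)}"


# "EXAMPLE_ACCESSION" is a hypothetical placeholder, not a real record.
url = efetch_url("nucleotide", "EXAMPLE_ACCESSION")
print(url)
```

Swapping `db` to `"protein"` would point the same request at protein records; the returned URL can be opened in a browser or passed to any HTTP client.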
BLAST family
Basic local alignment search tool
Sequence similarity search against various
databases in GenBank
Gapped alignments with links to various
other databases such as UniGene or
LocusLink.
BLAST
Pairwise alignment, but can display multiple
alignments with the “query-anchored” feature.
Each alignment has a statistical significance
(E-value).
Accounts for amino acid sequence
Outputs a list of matches including start,
stop, score, and e-value.
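To make the E-value on this slide more concrete, here is a minimal sketch of the standard Karlin-Altschul relation between a normalized bit score S' and the expected number of chance hits, E = m · n · 2^(−S'). The function name and the use of raw rather than edge-corrected lengths are simplifying assumptions; real BLAST reports use effective lengths, so its numbers will differ somewhat.

```python
def expect_value(bit_score, query_len, db_len):
    """Approximate E-value from a bit score: E = m * n * 2**(-S').

    query_len (m) and db_len (n) are taken as raw residue counts here,
    which is a simplification of the effective lengths BLAST uses.
    """
    return query_len * db_len * 2.0 ** (-bit_score)


# A higher bit score against the same search space gives a smaller E-value.
weak = expect_value(bit_score=10, query_len=100, db_len=1024)
strong = expect_value(bit_score=40, query_len=100, db_len=1024)
print(weak, strong)
```

The practical reading is that an E-value near zero means a hit of that score is unlikely to arise by chance in a search space of that size, while the same score against a much larger database is less significant.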
5 BLAST Programs
BLASTN – Nucleotide vs. Nucleotide
BLASTP – Protein vs. Protein
BLASTX – Nucleotide translation (query) vs.
Protein (database)
TBLASTN – Protein (query) vs. nucleotide
translation (database)
TBLASTX – Nucleotide translation vs.
nucleotide translation.
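One way to keep the five programs straight is a small lookup table, reading each pair as query type followed by database type. The sketch below is only an illustration: the table and the helper `candidates` are hypothetical names, not part of any BLAST software.

```python
# Each entry: program -> (what the query is compared as, what the database is compared as).
BLAST_PROGRAMS = {
    "blastn": ("nucleotide", "nucleotide"),
    "blastp": ("protein", "protein"),
    "blastx": ("translated nucleotide", "protein"),
    "tblastn": ("protein", "translated nucleotide"),
    "tblastx": ("translated nucleotide", "translated nucleotide"),
}


def molecule(kind):
    """Reduce 'translated nucleotide' to the molecule the user actually supplies."""
    return "nucleotide" if "nucleotide" in kind else "protein"


def candidates(query, database):
    """List the programs that accept this query type against this database type."""
    return [
        name
        for name, (q_kind, db_kind) in BLAST_PROGRAMS.items()
        if molecule(q_kind) == query and molecule(db_kind) == database
    ]


print(candidates("nucleotide", "protein"))     # an EST searched against proteins
print(candidates("nucleotide", "nucleotide"))  # direct or translated comparison
```

For an EST query against a protein database, for example, the table leaves only the translated search as an option.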
BLAST family
BLAST2Sequences – dot plot of alignment
between two sequences
MegaBLAST – nearly exact matches
PSI-BLAST – position-specific iterated
protein search that reduces false positive hits
BLink – Allows display of alignments by
taxonomic criteria, database origin, relation
to a complete genome, relation to a 3D
protein structure or conserved domain.
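The dot plot mentioned for BLAST2Sequences can be illustrated with a short word-match sketch. This is not how BLAST2Sequences computes its plot; it is a simplified stand-in, assuming exact matches of a fixed word length, and the function names are hypothetical.

```python
def dot_matches(a, b, word=3):
    """Return sorted (i, j) pairs where a[i:i+word] exactly equals b[j:j+word]."""
    hits = []
    for i in range(len(a) - word + 1):
        for j in range(len(b) - word + 1):
            if a[i:i + word] == b[j:j + word]:
                hits.append((i, j))
    return hits


def render(a, b, word=3):
    """Draw the matches as text: rows follow sequence a, columns follow sequence b."""
    hits = set(dot_matches(a, b, word))
    rows = []
    for i in range(len(a) - word + 1):
        rows.append("".join("*" if (i, j) in hits else "."
                            for j in range(len(b) - word + 1)))
    return "\n".join(rows)


# Identical sequences give an unbroken diagonal; indels would shift it.
print(render("ACGT", "ACGT", word=2))
```

A run of matches along a diagonal corresponds to an aligned segment, and a jump from one diagonal to another marks an insertion or deletion between the two sequences.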
Gene-Level Sequences
UniGene – Identifies a non-redundant set of
ESTs based on GenBank sequences.
ProtEST – displays pre-computed BLAST
alignments between protein sequences from
model organisms and the 6-frame
translation of the UniGene nucleotide
sequences.
Gene-Level Sequences
HomoloGene – Curated and calculated gene
orthologs and homologs for 14 organisms.
RefSeq – Curated reference sequences for
mRNAs, genomic sequences, etc.
ORF Finder – 6-frame translation with
graph of ORF position.
e-PCR – Locates sequence-tagged sites.
dbSNP – Contains SNPs and indels.
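The 6-frame translation idea behind ORF Finder, listed above, can be sketched in a few lines. This is a minimal illustration rather than NCBI's implementation: it assumes ORFs start at ATG and end at the first in-frame stop codon, uses the standard stop codons only, and reports coordinates on whichever strand was scanned.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_codons=2):
    """Scan all six reading frames for ATG...stop open reading frames.

    Returns (strand, frame, start, end) tuples; start and end are 0-based,
    end-exclusive positions on the strand that was scanned, and the length
    in codons includes the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first ATG opens a candidate ORF
                elif start is not None and codon in STOP_CODONS:
                    if (i + 3 - start) // 3 >= min_codons:
                        orfs.append((strand, frame, start, i + 3))
                    start = None  # look for the next ATG in this frame
    return orfs


# Toy sequence with one ORF (ATG AAA TAG) in the third forward frame.
print(find_orfs("CCATGAAATAGCC"))
```

Raising `min_codons` filters out the many short ORFs that occur by chance in real sequence, which is the same role the minimum-length setting plays in ORF-finding tools.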
Genome-Scale Analysis
Entrez Genomes – taxonomic, genome or
chromosome view of the current sequence
data for an organism.
COGs – List of orthologous protein groups
from completely sequenced organisms.
Retroviral genotyping tools – Important in
viral genetic diversity, tracking outbreaks,
and vaccine development.
Genome-Scale Analysis
Eukaryotic Genomic Resources – location
of Plant Genomes Central with information
from various plant genome projects.
Map Viewer – Displays genome assemblies
using chromosome map views.
Model Maker (MM) – Generates transcript
models using exon data from prediction or
from GenBank alignments.
Genome-Scale Analysis
Evidence Viewer – Graphical summary of
alignments relative to contigs including
insertion/deletion or mismatches.
Human-Mouse Homology Maps – List of
genes in homologous segments.
Cancer Chromosome Aberration Project –
List of recurrent chromosome aberrations
associated with cancer.
Gene Expression/Phenotype
SAGEmap – A way to look at SAGE data
including two-way mapping between SAGE tag
and UniGene.
Gene Expression Omnibus (GEO) – Data
repository and retrieval system for expression data
from all sources.
OMIM – Catalog of human genes and genetic
disorders including phenotypes and polymorphism
information.
MMDB, CDD, CDART
Molecular Modeling Database
Based on Protein Data Bank
Conserved Domain Database
PSI-BLAST-derived scores indicating
domains in the protein data bank.
Conserved Domain Architecture Retrieval
Tool – Identifies conserved domains and
displays their structure.
Sequence Analysis References
Korf, Yandell, and Bedell. 2003. An
Essential Guide to the Basic Local
Alignment Search Tool: BLAST. O’Reilly
& Associates, Sebastopol, CA.
Markel and Leon. 2003. Sequence Analysis
in a Nutshell: A Guide to Common Tools
and Databases. O’Reilly & Associates,
Sebastopol, CA.
Sequence Analysis References
Baxevanis and Ouellette. 2001.
Bioinformatics: A Practical Guide to the
Analysis of Genes and Proteins. Wiley
Interscience, New York.
Mount. 2000. Bioinformatics: Sequence and
Genome Analysis. Cold Spring Harbor
Laboratory, New York.