PS401 – Lec 10

Sequence Analysis
MUPGRET June workshops
Today

What can you do with the sequence?
 What can you do with the ESTs?
 The case of SNP and Indel
What can you do with the sequence?

Gene prediction
 Motif identification
 Promoter identification
 Survey gene expression across tissues
 Full length gene isolation
NCBI Tools

National Center for Biotechnology Information (NCBI)
 National Library of Medicine, NIH
 Created in 1988 to develop information
systems for molecular biology.
 Provides data retrieval systems and
computational resources.
Database Resources

Database retrieval tools
 BLAST family of sequence-similarity
search programs.
 Resources for gene-level sequences
 Resources for genome-scale analysis
Database Resources

Resources for analyzing gene expression
patterns and phenotypes
 Molecular modeling database, conserved
domain database, conserved domain
architecture retrieval tool.
Database Retrieval Tools

Entrez-for DNA and protein sequences
 PubMed Central-for literature
 Taxonomy-organisms and associated
sequences
 LocusLink-provides links from sequence
info to map and other information.
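The retrieval tools above can also be queried from a script. The sketch below only builds request URLs for NCBI's E-utilities, the programmatic interface to Entrez; the slides do not mention E-utilities, so treating them as the access route is an assumption. The helper names `esearch_url` and `efetch_url` are hypothetical, and no network request is made.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities (assumed access route, not from the slides).
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(db, term, retmax=20):
    """Build an ESearch URL that would return record IDs matching `term`."""
    params = {"db": db, "term": term, "retmax": retmax}
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


def efetch_url(db, ids, rettype="fasta", retmode="text"):
    """Build an EFetch URL that would download the records for the given IDs."""
    params = {"db": db, "id": ",".join(ids), "rettype": rettype, "retmode": retmode}
    return f"{EUTILS_BASE}/efetch.fcgi?{urlencode(params)}"


if __name__ == "__main__":
    # Nucleotide records for maize; the IDs in the second call are placeholders.
    print(esearch_url("nucleotide", "Zea mays[Organism]"))
    print(efetch_url("nucleotide", ["ID_1", "ID_2"]))
```

Changing `db` to `protein`, `pubmed`, or `taxonomy` points the same kind of request at the other databases listed on this slide.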
BLAST family

Basic local alignment search tool
 Sequence similarity search against various
databases in GenBank
 Gapped alignments with links to various
other databases such as unigene or
locuslink.
BLAST

Pairwise alignment, but can show multiple
alignments with the “query-anchored” view.
 Each alignment has a statistical significance
(E-value)
 Accounts for amino acid sequence
 Outputs a list of matches including start,
stop, score, and e-value.
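As a rough illustration of how the score and E-value in the BLAST output relate, the sketch below applies the Karlin-Altschul formula E = K·m·n·e^(−λS). The values of `lam` and `k` in the usage lines are illustrative placeholders rather than parameters taken from the slides, and real BLAST also applies effective-length corrections that are omitted here.

```python
import math


def bit_score(raw_score, lam, k):
    """Convert a raw alignment score S into a normalized bit score."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def e_value(raw_score, query_len, db_len, lam, k):
    """Expected number of chance alignments scoring at least `raw_score`.

    query_len (m) and db_len (n) are used as given; BLAST's edge-effect
    length corrections are left out of this sketch.
    """
    return k * query_len * db_len * math.exp(-lam * raw_score)


if __name__ == "__main__":
    lam, k = 0.267, 0.041  # illustrative placeholder parameters
    for score in (50, 100, 150):
        e = e_value(score, query_len=300, db_len=1_000_000, lam=lam, k=k)
        print(score, round(bit_score(score, lam, k), 1), f"{e:.3g}")
```

Higher scores give exponentially smaller E-values, while a larger database raises the E-value for the same score.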
5 BLAST Programs
BLASTN – Nucleotide vs. Nucleotide
 BLASTP – Protein vs. Protein
 BLASTX – Nucleotide translation vs.
Protein
 TBLASTN – Protein vs. nucleotide
translation
 TBLASTX – Nucleotide translation vs.
nucleotide translation.
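The choice among the five programs depends only on the type of the query and the type of the database, read as "query vs. database". A minimal sketch of that lookup follows; the function name `choose_blast_program` and the `translate_both` flag are hypothetical, introduced only for illustration.

```python
# (query type, database type) -> program; translation is implied where types differ.
BLAST_PROGRAMS = {
    ("nucleotide", "nucleotide"): "blastn",
    ("protein", "protein"): "blastp",
    ("nucleotide", "protein"): "blastx",   # query is translated in six frames
    ("protein", "nucleotide"): "tblastn",  # database is translated in six frames
}


def choose_blast_program(query_type, db_type, translate_both=False):
    """Return the BLAST program name for a query/database type pair.

    translate_both selects tblastx, which translates a nucleotide query
    and a nucleotide database and compares them at the protein level.
    """
    if translate_both:
        if (query_type, db_type) != ("nucleotide", "nucleotide"):
            raise ValueError("tblastx needs a nucleotide query and database")
        return "tblastx"
    try:
        return BLAST_PROGRAMS[(query_type, db_type)]
    except KeyError:
        raise ValueError(f"unknown combination: {query_type} vs. {db_type}")


if __name__ == "__main__":
    # An EST (nucleotide) searched against a protein database.
    print(choose_blast_program("nucleotide", "protein"))
```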

BLAST family

BLAST2Sequences-dot plot of alignment
 MegaBLAST-nearly exact matches
 PSI-BLAST – Position-Specific Iterated BLAST;
protein matching that reduces false positive hits
 BLink – Allows display of alignments by
taxonomic criteria, database origin, relation
to a complete genome, relation to a 3D
protein structure or conserved domain.
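To show what a dot plot of two sequences conveys, here is a small sketch that marks every position where a short word is shared between them. This is a plain exact-word dot plot, a simpler technique swapped in for illustration; BLAST2Sequences itself plots its BLAST alignments, which this code does not reproduce. The function names are hypothetical.

```python
def dot_plot(seq_a, seq_b, window=3):
    """Return the set of (i, j) where seq_a[i:i+window] == seq_b[j:j+window]."""
    # Index every word of seq_b by its start positions.
    index = {}
    for j in range(len(seq_b) - window + 1):
        index.setdefault(seq_b[j:j + window], []).append(j)
    hits = set()
    for i in range(len(seq_a) - window + 1):
        for j in index.get(seq_a[i:i + window], []):
            hits.add((i, j))
    return hits


def render(hits, n_rows, n_cols):
    """Draw the hits as text: rows follow seq_a, columns follow seq_b."""
    return "\n".join(
        "".join("*" if (i, j) in hits else "." for j in range(n_cols))
        for i in range(n_rows)
    )


if __name__ == "__main__":
    a, b, w = "ACGTACGGT", "ACGTTCGGT", 3
    hits = dot_plot(a, b, w)
    print(render(hits, len(a) - w + 1, len(b) - w + 1))
```

Similar regions appear as diagonal runs of marks; a break or shift in the diagonal suggests a mismatch or an insertion/deletion.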
Gene-Level Sequences
UniGene – Identifies a non-redundant set of
gene-oriented clusters from GenBank sequences, including ESTs.
 ProtEST – displays pre-computed BLAST
alignments between protein sequences from
model organisms and the 6-frame
translation of the UniGene nucleotide
sequences.

Gene-Level Sequences
HomoloGene – Curated and calculated gene
orthologs and homologs for 14 organisms.
 RefSeq – Curated reference sequences for
mRNAs, genomic sequences, etc.
 ORF Finder – 6-frame translation with
graph of ORF position.
 e-PCR – Locates sequence-tagged sites (STSs).
 dbSNP – Contains SNPs and indels.
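The 6-frame translation used by ORF Finder and ProtEST can be sketched in a few lines. The code below assumes the standard genetic code and a simple rule that an ORF runs from the first ATG after a stop to the next stop; NCBI's ORF Finder offers other genetic codes and start rules that are not reproduced here, and all function names are hypothetical.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate codon by codon; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frames(seq):
    """Return translations of frames +1..+3 and -1..-3."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


def find_orfs(protein, min_len=10):
    """Return (start, stop) amino-acid positions of M...* stretches."""
    orfs, start = [], None
    for pos, aa in enumerate(protein):
        if aa == "M" and start is None:
            start = pos
        elif aa == "*":
            if start is not None and pos - start >= min_len:
                orfs.append((start, pos))
            start = None
    return orfs


if __name__ == "__main__":
    for name, protein in six_frames("ATGAAATTTTAGCCC").items():
        print(name, protein, find_orfs(protein, min_len=3))
```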

Genome-Scale Analysis
Entrez Genomes – taxonomic, genome or
chromosome view of the current sequence
data for an organism.
 COGs – List of orthologous protein groups
from completely sequenced organisms.
 Retroviral genotyping tools – Important in
viral genetic diversity, tracking outbreaks,
and vaccine development.

Genome-Scale Analysis
Eukaryotic Genomic Resources – location
of Plant Genomes Central with information
from various plant genome projects.
 Map Viewer – Displays genome assemblies
using chromosome map views.
 Model Maker (MM) – Generates transcript
models using exon data from prediction or
from GenBank alignments.

Genome-Scale Analysis
Evidence Viewer – Graphical summary of
alignments relative to contigs including
insertion/deletion or mismatches.
 Human-Mouse Homology Maps – List of
genes in homologous segments.
 Cancer Chromosome Aberration Project –
List of recurrent chromosome aberrations
associated with cancer.

Gene Expression/Phenotype
SAGEmap – A way to look at SAGE data
including two-way mapping between SAGE tag
and UniGene.
 Gene Expression Omnibus (GEO) – Data
repository and retrieval system for expression data
from all sources.
 OMIM – Catalog of human genes and genetic
disorders including phenotypes and polymorphism
information.
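The two-way mapping between SAGE tags and UniGene mentioned for SAGEmap can be pictured with a small sketch. It assumes the classic SAGE protocol, in which the tag is the 10 bases immediately 3' of the last CATG (NlaIII) site in a transcript. The cluster names and sequences are invented for illustration and are not real UniGene entries.

```python
def sage_tag(transcript, anchor="CATG", tag_len=10):
    """Return the tag following the 3'-most anchor site, or None if absent."""
    seq = transcript.upper()
    site = seq.rfind(anchor)
    if site == -1:
        return None
    tag = seq[site + len(anchor): site + len(anchor) + tag_len]
    return tag if len(tag) == tag_len else None


def build_maps(clusters):
    """Build tag -> clusters and cluster -> tags lookups.

    `clusters` maps a cluster name to a list of member transcript sequences.
    """
    tag_to_clusters, cluster_to_tags = {}, {}
    for name, transcripts in clusters.items():
        for transcript in transcripts:
            tag = sage_tag(transcript)
            if tag is None:
                continue
            tag_to_clusters.setdefault(tag, set()).add(name)
            cluster_to_tags.setdefault(name, set()).add(tag)
    return tag_to_clusters, cluster_to_tags


if __name__ == "__main__":
    # Hypothetical clusters with made-up sequences.
    clusters = {
        "cluster_A": ["GGGCATGAAACCCGGGTTTT"],
        "cluster_B": ["TTCATGCCCCATGACGTACGTACGG"],
    }
    tag_to_clusters, cluster_to_tags = build_maps(clusters)
    print(tag_to_clusters)
    print(cluster_to_tags)
```

A tag that maps to more than one cluster is ambiguous, which is one reason the lookup is useful in both directions.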

MMDB, CDDB, CDART

Molecular Modeling Database
 Based on Protein Data Bank
 Conserved Domain Database
 PSI-BLAST-derived scores indicating
domains in the protein data bank.
 Conserved Domain Architecture Retrieval
Tool – Identifies conserved domains and
displays their structure.
Sequence Analysis References

Korf, Yandell, and Bedell. 2003. An
Essential Guide to the Basic Local
Alignment Search Tool: BLAST. O’Reilly
& Associates, Sebastopol, CA.
 Markel and Leon. 2003. Sequence Analysis
in a Nutshell: A Guide to Common Tools
and Databases. O’Reilly & Associates,
Sebastopol, CA.
Sequence Analysis References

Baxevanis and Ouellette. 2001.
Bioinformatics: A Practical Guide to the
Analysis of Genes and Proteins. Wiley
Interscience, New York.
 Mount. 2000. Bioinformatics: Sequence and
Genome Analysis. Cold Spring Harbor
Laboratory, New York.