Basic Local Alignment Search Tool


DNA Sequence Analysis
David Shiuan
Department of Life Science
Institute of Biotechnology and
Interdisciplinary Program of Bioinformatics
National Dong Hwa University
DNA Sequence Analysis (I)



BLAST comparison
ORF (open reading frame) Finder
Promoter Search
- Promoter Prediction (BCM)
- EPD (Eukaryote Promoter Database)
- NNPP prokaryote promoter prediction (BCM)
- ProtScan (BIMAS)
DNA Sequence Analysis (II)





Sequence Alignment (Clustal W)
Tree Analysis (MEGA, PAUP, UPGMA)
Motif Prediction
Restriction Analysis (TCGA)
RNAFOLD (GCG)
Basic Local Alignment Search Tool

A sequence comparison algorithm,
optimized for speed, that searches
sequence databases for optimal local
alignments to a query.

Algorithm : A fixed procedure embodied in
a computer program.
Basic Local Alignment Search Tool

The initial search is done for a word of length
"W" that scores at least "T" when compared to
the query using a substitution matrix. Word hits
are then extended in either direction in an
attempt to generate an alignment with a score
exceeding the threshold of "S". The "T"
parameter dictates the speed and sensitivity of
the search.
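
The seed-and-extend idea above can be sketched in a few lines of Python. This is a minimal illustration, not the BLAST implementation: the match/mismatch scores stand in for a substitution matrix, and the function names, the default values, and the `x_drop` stopping rule are assumptions made for the example.

```python
def score_pair(a, b, match=2, mismatch=-1):
    """Toy scoring (assumption): stands in for a substitution matrix."""
    return match if a == b else mismatch


def word_hits(query, subject, W=3, T=6):
    """Return (query_pos, subject_pos, score) for words of length W scoring >= T."""
    hits = []
    for i in range(len(query) - W + 1):
        for j in range(len(subject) - W + 1):
            s = sum(score_pair(query[i + k], subject[j + k]) for k in range(W))
            if s >= T:
                hits.append((i, j, s))
    return hits


def extend(query, subject, i, j, W, seed_score, x_drop=3):
    """Extend a word hit without gaps in both directions.

    Extension in one direction stops once the running score falls more
    than x_drop below the best score seen so far (an assumed stopping rule).
    Returns the best score and the (start, end) span on the query.
    """
    best = cur = seed_score
    best_right = 0
    k = 0
    while i + W + k < len(query) and j + W + k < len(subject):
        cur += score_pair(query[i + W + k], subject[j + W + k])
        if cur > best:
            best, best_right = cur, k + 1
        elif best - cur > x_drop:
            break
        k += 1

    cur = best
    best_left = 0
    k = 1
    while i - k >= 0 and j - k >= 0:
        cur += score_pair(query[i - k], subject[j - k])
        if cur > best:
            best, best_left = cur, k
        elif best - cur > x_drop:
            break
        k += 1

    return best, (i - best_left, i + W + best_right)


if __name__ == "__main__":
    query, subject = "ACGTACGT", "TTACGTACGTTT"
    for i, j, s in word_hits(query, subject):
        score, span = extend(query, subject, i, j, 3, s)
        print(i, j, score, span)
```

An extended hit would then be reported only if its score exceeds the threshold "S". Raising "T" produces fewer word hits, which makes the search faster but less sensitive.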
Calculating alignment scores
BLOSUM62 Substitution Scoring Matrix


The BLOSUM62 matrix is a 20 x 20 matrix
in which every possible identity and
substitution is assigned a score based on the
observed frequencies of such occurrences in
alignments of related proteins.
Identities are assigned the most positive scores.
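
As a rough illustration of how such a matrix is used, the sketch below sums per-position scores for an ungapped alignment. Only a small excerpt of BLOSUM62 is hard-coded here; the dictionary name and helper functions are assumptions made for the example, and a real analysis would load the full 20 x 20 matrix.

```python
# Small excerpt of BLOSUM62 (identities and a few conservative substitutions).
BLOSUM62_EXCERPT = {
    ("A", "A"): 4, ("C", "C"): 9, ("W", "W"): 11,
    ("I", "I"): 4, ("L", "L"): 4, ("V", "V"): 4,
    ("K", "K"): 5, ("R", "R"): 5,
    ("I", "L"): 2, ("I", "V"): 3, ("L", "V"): 1, ("K", "R"): 2,
}


def pair_score(a, b):
    """Look up a residue pair; the matrix is symmetric, so try both orders."""
    if (a, b) in BLOSUM62_EXCERPT:
        return BLOSUM62_EXCERPT[(a, b)]
    if (b, a) in BLOSUM62_EXCERPT:
        return BLOSUM62_EXCERPT[(b, a)]
    raise KeyError(f"pair {a}/{b} is not in this excerpt")


def alignment_score(seq1, seq2):
    """Score two equal-length, ungapped aligned sequences position by position."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must have the same length")
    return sum(pair_score(a, b) for a, b in zip(seq1, seq2))


if __name__ == "__main__":
    print(alignment_score("WIK", "WIK"))  # identities only
    print(alignment_score("KIL", "RLV"))  # conservative substitutions
```

Rare residues such as tryptophan receive a higher identity score than common ones such as alanine, which reflects how seldom they are replaced in related proteins.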
The NCBI BLAST family of programs





blastp: compares an amino acid query sequence against a protein sequence database
blastn: compares a nucleotide query sequence against a nucleotide sequence database
blastx: compares a nucleotide query sequence translated in all reading frames against a protein sequence database
tblastn: compares a protein query sequence against a nucleotide sequence database dynamically translated in all reading frames
tblastx: compares the six-frame translations of a nucleotide query sequence against the six-frame translations of a nucleotide sequence database
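
To make "translated in all reading frames" concrete, here is a minimal sketch of a six-frame translation using the standard genetic code. The function names and the frame labels (+1 to +3 for the given strand, -1 to -3 for the reverse complement) are assumptions made for the example; this is not how the BLAST programs are implemented internally.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an uppercase DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frames(dna):
    """Return a dict mapping frame labels to protein translations."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


if __name__ == "__main__":
    for label, protein in six_frames("ATGGCCATTGTAATGGGCCGC").items():
        print(label, protein)
```

blastx and tblastx compare translations like these for the query, while tblastn and tblastx apply the same idea to the database sequences.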
Peptide Sequence Databases
for BLAST search

nr: All non-redundant GenBank CDS translations+PDB+SwissProt+PIR+PRF
month: All new or revised GenBank CDS translations+PDB+SwissProt+PIR+PRF released in the last 30 days
swissprot: Last major release of the SWISS-PROT protein sequence database (no updates)
Filtering of low-complexity segments
E-value for the score S

The expected number of HSPs with score
at least S is given by the formula
$E = K m n e^{-\lambda S}$
HSP : high-scoring segment pairs
m and n : sequence lengths
K and $\lambda$ : parameters
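
The formula can be evaluated directly. In the sketch below the function name and the numeric values of K and lambda are placeholders chosen for illustration, not parameters taken from any particular scoring system.

```python
import math


def expected_hsps(S, m, n, K, lam):
    """E = K * m * n * exp(-lambda * S): expected number of HSPs scoring >= S."""
    return K * m * n * math.exp(-lam * S)


if __name__ == "__main__":
    # Placeholder values (assumptions): a 250-residue query against
    # a database of 1e8 residues.
    for S in (40, 60, 80):
        print(S, expected_hsps(S, m=250, n=1e8, K=0.04, lam=0.27))
```

E grows linearly with the search space m x n and falls exponentially as the score S rises, so the same score is less significant in a larger database.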
Promoter Search




ProtScan (at BIMAS)
EPD (Eukaryote Promoter Database)
Promoter Prediction (BCM)
NNPP (Prokaryote Promoter Prediction at BCM)
About the neural network method




NNPP is a method that finds eukaryotic and
prokaryotic promoters in a DNA sequence.
It has been shown that multiple functional sites
in the primary DNA are involved in the
polymerase binding process.
For eukaryotes, these elements include the
TATA-box and the transcription start site
("Initiator").
These promoter elements are present in various
combinations separated by various distances in
the sequence.
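
For contrast with the neural network approach, the simplest way to look for one such element is a plain consensus scan. The sketch below is not NNPP and does not use a neural network; it only reports positions matching an assumed TATA-box consensus (TATA followed by A/T, A, A/T), and the pattern and function name are choices made for the example.

```python
import re

# Assumed consensus for a TATA-box-like element: TATA[AT]A[AT].
# The lookahead lets overlapping matches be reported.
TATA_PATTERN = re.compile(r"(?=(TATA[AT]A[AT]))")


def find_tata_like(seq):
    """Return (0-based position, matched text) for each TATA-like site."""
    seq = seq.upper()
    return [(m.start(), m.group(1)) for m in TATA_PATTERN.finditer(seq)]


if __name__ == "__main__":
    print(find_tata_like("ggctataaaaggcgcgtatatatcc"))
```

A scan like this ignores the combinations of elements and the spacing between them, which is the information a trained predictor is meant to take into account.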