Presentation Slides

Download Report

Transcript Presentation Slides

BLAST – A heuristic
algorithm
Anjali Tiwari
Pannaben Patel
Pushkala Venkataraman
1
2
Basic Local Alignment
Search Tool
BLAST
Rapid Searching of
Protein & nucleotide
DBs
Seeking similar
sequences
GenBank
Database
nr = non redundant database
SwissProt
nr
PIR
PDB
PRF
3
Program
Blastp
Blastn
Blastx
Tblastn
Tblastx
Query
Amino acid
Nucleotide
Nucleotide
Amino acid
Nucleotide
Database
Amino acid
Nucleotide
Amino acid
Nucleotide
Nucleotide
Search Level
Amino acid
Nucleotide
Amino acid
Amino acid
Amino acid
BLAST – 3 STEP ALGORITHM
Compile Words
Scan DB
Extend
4
Some definitions
Alignment
Process of lining up 2 or more
sequences to asses similarity
BLOSUM62
A 20*20 substitution matrix
for amino acids
Gap
Space introduced into
alignment to compensate
for insertions/deletions in
1 sequence relative to
another
5
Similarity
Measures
Similarity
Matrix - BLOSUM
Local Search
Algorithms
Identities & Conservative
Replacements = +ve
Unlikely
Replacements = -ve
6
General Concept of working of BLAST
Query Input
1000’s of
sequences
Calculate
HSP
Calculate
MSP
MSP – Maximal Segment Pair
HSP – High Scoring Pair
Display
output
7
Key Idea – BLAST1
Compile a list of high scoring words of
length w from query (w=3 for proteins,
11 for nucleic acids)
Scan for word hits in the database
of score greater than
threshold, T
Extend word hit in
both directions to find High
Scoring Pairs with scores greater
than S
Step 1
Step 2
Step 3
8
Example
Step -1
Query – QQGPHUIQEGQQGKEEDPP
Words of length 3 –w = QQG, QGP, GPH, PHU, HUI…
Take first triple – QQG
Make neighborhood words – w’ = QQG, QEG, GQG…
Find high scoring triples – Blosum(w, w’) > T where T = Threshold
parameter
Suppose Blosum (QQG, QEG) =18
Blosum(QQG,GQG) = 12
Blosum(QQG, QQG)= 16
T=13
Choose QQG and QEG since Blosum Value > T value
9
Step -2
Suppose Database Sequence = PKLMMQQGKQEGM
Matching Word Pairs in
DB sequence
10
Step -3
Query
DB Sequence
QQGPHUIQEGQQGKEEDPP
Blosum(QQG, QQG) =16
PKLMMQQGKQEGM
QQGPHUIQEGQQGKEEDPP
PKLMMQQGKQEGM
Blosum(QQGK, QQGK) =21
QQGPHUIQEGQQGKEEDPP
Blosum(QQGKE, QQGKQ)
PKLMMQQGKQEGM =23
QQGPHUIQEGQQGKEEDPP
Blosum(QQGKEE,
PKLMMQQGKQEGM QQGKQE) =28
QQGPHUIQEGQQGKEEDPP Blosum(QQGKEED,
PKLMMQQGKQEGM
QQGKQEG) =27
11
Extension to the right stops here because BLOSUM
value is beginning to decrease
ADVANTAGES
DISADVANTAGES
•Faster than Dynamic Programming
•Removes low complexity regions
•Spends less time on uninteresting
search
•Statistical significance of results can
be obtained & these are very good
•Finds & reports only local
alignments
•Finds too many word hits per
Sequence thus reducing speed
•Does not allow for gaps in sequence
*** New Models to combat disadvantages ***
BLAST2, PSI Blast
12
BLAST2 – Combination of 2 Hit & Gapped
2 Hit Method - 3 Step method
Step 1 and Step 2 as BLAST –1
Step – 3 is where they differ BLAST now looks for 2 words in a sequence
instead of 1 while aligning. The 2 words are at a distance < A and are
not overlapping. Typically A=40
A
13
Gapped Blast


Gapped alignment is introduced to get an optimal alignment
Two sequences:
Seq A = ACGTA
Seq B = ACATA
Normal alignment is
ACGTA
ACATA
But if a penalty of mismatch is larger than
the penalty of gap then the best optimal alignment is as below.
AC-GTA
ACG-TA
ACA-TA
AC-ATA
14
Gapped BLAST - Allows gaps to come while aligning
Query – ATTGTCAAAGACTTGAGCTGATGCAT
DB
GGCAGACATGACTGACAAGGGTATCG
ATTGTCAAAGACTTGAGCTGATGCAT
GGCAGACATGA CTGACAAGGGTATCG
Mismatch
Gap
15
PSI – BLAST- Position specific iterated BLAST. Used for
multiple alignments
Query Sequence
BLAST search
of DB
Sequences with high
scores collected
New sequences added
& process iterated
Multiple alignment &
profile made
DB searched with
profile
16
References
•Altschul, S.F., Gish, W., Miller, W., Myers, E.W. & Lipman,
D.J. (1990) "Basic local alignment search tool." Journal of
Molecular Biology 215:403-410.
•Altschul, S.F.,Thomas L.M., Alejandro A.S, Jinghui Z,
Zheng Z, W. Miller & David J.L. (1997) “Gapped BLAST
and PSI-BLAST: a new generation of protein database
search programs.” Nucleic Acids Research.
•http://www.ncbi.nlm.nih.gov/
•http://bioinf.man.ac.uk/ember/prototype/
17
References (Continued)
•
•
•
•
http://www.psc.edu/biomed/training/tutorials/sequence
/db/index.html
http://aracyc.stanford.edu/~jshrager/jeff/mbcs/match.ht
ml
http://www.ime.usp.br/~durham/cursos/ibi5032/pub/do
c/allignmentTutorial.pdf
http://ibivu.cs.vu.nl/teaching/masters/seq_analysis/sa_l
ecture3.pdf
18