Tutorial_4 (2016) - Protein Alignments


Tutorial 4
Substitution matrices and PSI-BLAST
Agenda
• Why study distant homologies?
• Substitution Matrices
– PAM - Point Accepted Mutations
– BLOSUM - Blocks Substitution Matrix
• PSI-BLAST
Cool story of the day:
Why should we care about
cellular fusion in worms?
How proteins evolve
• Throughout evolution proteins change
• Some change more than others, and at different
rates in different regions of the protein.
Why study distant homologies?
• When we study a new organism we may find
many unknown sequences that we would like
to characterize.
We might not be able to find
any close homologs.
• Substitution matrices model different
evolutionary distances.
• PSI-BLAST makes it possible to find more distant
relationships between proteins.
Amino acids were not born equal
Both substitution matrices and PSI-BLAST
are designed to model the process by which
AAs mutate.
Substitution Matrix
• Scoring matrix S of size 20x20
• Si,j is the score (gain or penalty) for
substituting AAj by AAi (i – row, j – column)
– Based on the likelihood that this substitution is found in
nature
– Computed differently in PAM and BLOSUM
• Each matrix is tailored to a particular
evolutionary distance
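A minimal sketch of how such a matrix is used: the score of an ungapped alignment is the sum of Si,j over the aligned pairs. The four-letter `TOY_SCORES` table and the function names are assumptions made for illustration; the values are illustrative and are not meant as an authoritative copy of any published matrix.

```python
# Illustrative scores for four amino acids only (L, I, D, E).
# A real substitution matrix covers all 20x20 pairs.
TOY_SCORES = {
    ("L", "L"): 4, ("I", "I"): 4, ("D", "D"): 6, ("E", "E"): 5,
    ("L", "I"): 2, ("D", "E"): 2,
    ("L", "D"): -4, ("L", "E"): -3, ("I", "D"): -3, ("I", "E"): -3,
}


def score(a, b):
    """Look up the score for a pair; the table is symmetric, so try both orders."""
    if (a, b) in TOY_SCORES:
        return TOY_SCORES[(a, b)]
    return TOY_SCORES[(b, a)]


def alignment_score(seq1, seq2):
    """Sum the pair scores of two equal-length sequences aligned without gaps."""
    if len(seq1) != len(seq2):
        raise ValueError("an ungapped alignment needs equal-length sequences")
    return sum(score(a, b) for a, b in zip(seq1, seq2))


print(alignment_score("LIDE", "LIDE"))  # identical residues
print(alignment_score("LIDE", "ILED"))  # chemically similar substitutions
print(alignment_score("LIDE", "DELI"))  # dissimilar substitutions
```

Identical and similar residues contribute positive scores, while unlikely substitutions pull the total down.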
Computing probability of Mutation (Mi,j)
• PAM - Point Accepted Mutations
– Based on a small set of closely related
proteins
– All matrices other than PAM1 are theoretical (extrapolated from PAM1).
• BLOSUM - Blocks Substitution Matrix
– Based on a wider database of proteins that
includes protein families with conserved
regions.
– The matrices are empirical (derived from observed alignments).
PAM
• Based on a small set of closely related
proteins
• PAM1 captures mutation rates between closely related
proteins: proteins with 1% divergence
• Problematic when comparing distant proteins:
1% divergence does not capture the rarer,
more sporadic mutations
PAM-X
• To apply PAM to more distant proteins, the
PAM1 matrix is multiplied by itself (PAM-X is
PAM1 raised to the power X). This models the
evolutionary accumulation of mutations.
• The higher the number of the matrix, the
more suitable it is for finding distant
homologies.
• Other than PAM1, the matrices are
theoretical.
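The self-multiplication can be sketched with a toy mutation probability matrix. `TOY_PAM1` below is a made-up 3-letter alphabet with invented probabilities (an assumption for illustration, not real PAM data); real PAM1 is 20x20. Raising it to the power X gives the toy equivalent of PAM-X.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(m, power):
    """Raise a square matrix to a non-negative integer power by repeated multiplication."""
    n = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(power):
        result = mat_mul(result, m)
    return result


# Hypothetical 3-letter "PAM1": each row holds the probabilities that one
# residue stays the same (diagonal) or mutates (off-diagonal) in one PAM unit.
TOY_PAM1 = [
    [0.990, 0.007, 0.003],
    [0.007, 0.990, 0.003],
    [0.003, 0.003, 0.994],
]

toy_pam250 = mat_pow(TOY_PAM1, 250)
for row in toy_pam250:
    print([round(p, 3) for p in row])
```

After many steps the diagonal shrinks and the off-diagonal entries grow, which is why high-numbered PAM matrices tolerate more substitutions.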
BLOSUM
• Scores for each pair of amino acids are derived from
the observed frequencies of substitutions in
blocks of local alignments of related proteins.
• BLOSUM62 is built from blocks whose members
share at most 62% identity with any other member
of that block (more similar sequences are clustered
and counted as one).
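A minimal sketch of how scores can be derived from a block, assuming the standard log-odds form (twice the base-2 logarithm of observed over expected pair frequency, rounded). The tiny `TOY_BLOCK` is invented for illustration, and the clustering of highly similar sequences is skipped here.

```python
import math
from collections import Counter
from itertools import combinations


def pair_frequencies(block):
    """Count every residue pair within each column and return relative frequencies."""
    counts = Counter()
    for column in zip(*block):
        for a, b in combinations(column, 2):
            counts[tuple(sorted((a, b)))] += 1
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()}


def log_odds_scores(block):
    """Return rounded half-bit log-odds scores for each observed residue pair."""
    q = pair_frequencies(block)
    # Background frequency of each residue, derived from the pair frequencies.
    p = Counter()
    for (a, b), freq in q.items():
        if a == b:
            p[a] += freq
        else:
            p[a] += freq / 2
            p[b] += freq / 2
    scores = {}
    for (a, b), freq in q.items():
        # Expected frequency of the pair if residues were paired at random.
        expected = p[a] * p[a] if a == b else 2 * p[a] * p[b]
        scores[(a, b)] = round(2 * math.log2(freq / expected))
    return scores


# Hypothetical block: four aligned sequences, three columns.
TOY_BLOCK = ["ASA", "ASA", "ASA", "ASS"]
print(log_odds_scores(TOY_BLOCK))
```

Pairs seen more often than chance predicts get positive scores; pairs seen less often get negative scores.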
BLOSUM-X
[Diagram: groups of proteins with similar functions feed the BLOCKS DB. Blocks with up to 50% similarity give Substitution Matrix A; blocks with up to 32% similarity give Substitution Matrix B.]
PAM vs. BLOSUM
PAM
• Based on global alignments of closely
related proteins.
• PAM1 is calculated from comparisons of
sequences with no more than 1% divergence.
• Other PAM matrices are extrapolated
from PAM1.
BLOSUM
• Based on local alignments.
• BLOSUM62 is calculated from comparisons of
sequences with no more than 62% identity in the blocks.
• All BLOSUM matrices are based on
observed alignments.
They are not extrapolated from
comparisons of closely related proteins.
BLOSUM matrices are the substitution matrices in use.
Use Recommendations
Approximate equivalents, from closely related to highly divergent:
PAM100 ~ BLOSUM90 (closely related)
PAM120 ~ BLOSUM80
PAM160 ~ BLOSUM60
PAM200 ~ BLOSUM52
PAM250 ~ BLOSUM45 (highly divergent)

Query length   Matrix     Gap costs (existence, extension)
<35            PAM30      9,1
35-50          PAM70      10,1
50-85          BLOSUM80   10,1
>85            BLOSUM62   11,1
http://www.ncbi.nlm.nih.gov/blast/html/sub_matrix.html
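The query-length table can be expressed as a small lookup, sketched below. The function name is hypothetical, and because the table's ranges share the boundary value 50, assigning exactly 50 to the PAM70 row is an assumption.

```python
def recommended_scoring(query_length):
    """Return (matrix, gap existence cost, gap extension cost) for a query length.

    Assumption: a length of exactly 50 is assigned to the 35-50 row,
    because the table's ranges overlap at that boundary.
    """
    if query_length < 35:
        return ("PAM30", 9, 1)
    if query_length <= 50:
        return ("PAM70", 10, 1)
    if query_length <= 85:
        return ("BLOSUM80", 10, 1)
    return ("BLOSUM62", 11, 1)


for length in (20, 40, 70, 300):
    print(length, recommended_scoring(length))
```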
Example
• Query: an uncharacterized (hypothetical)
protein
• Database: nr
• BLAST program: BLASTP
• Matrices: PAM30 / PAM250 and
BLOSUM45 / BLOSUM90
PSI-BLAST
Position Specific Iterative BLAST
Designed to find more distant homologs than standard BLAST can detect
PSI-BLAST Steps
1. Search a query against a protein database.
2. Construct a specialized multiple sequence
alignment based on the top results.
3. Create a position-specific scoring matrix (PSSM).
4. Use the PSSM as a query against the database.
5. Estimate statistical significance (E-values).
Repeat steps 3-5 iteratively.
[Diagram: Query -> Search against Protein DB -> Results -> PSSM -> Search again, repeated over iterations]
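The PSSM idea from steps 3 and 4 can be sketched as follows. This is a simplified illustration, not PSI-BLAST's actual procedure: it assumes an ungapped alignment, a uniform background frequency, and a flat pseudocount, whereas PSI-BLAST weights sequences and derives pseudocounts differently. All names and the three aligned sequences are hypothetical.

```python
import math
from collections import Counter

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def build_pssm(aligned, pseudocount=1.0):
    """Build one log-odds score per residue for every alignment column."""
    background = 1 / len(ALPHABET)  # assumption: uniform background
    pssm = []
    for column in zip(*aligned):
        counts = Counter(column)
        total = len(column) + pseudocount * len(ALPHABET)
        pssm.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in ALPHABET
        })
    return pssm


def score_with_pssm(pssm, seq):
    """Score a sequence of the same length as the PSSM, position by position."""
    return sum(column[aa] for column, aa in zip(pssm, seq))


# Hypothetical top hits, already aligned without gaps.
aligned_hits = ["MKV", "MKI", "MRV"]
pssm = build_pssm(aligned_hits)
print(score_with_pssm(pssm, "MKV"))
print(score_with_pssm(pssm, "WWW"))
```

Unlike a 20x20 substitution matrix, the score of a residue here depends on its position, so conserved columns are weighted more strongly.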
Example
We will use the sequence of an uncharacterized (hypothetical) protein.
E-value threshold for the initial BLAST search (default: 10)
E-value threshold for inclusion in PSI-BLAST iterations (default: 0.005)
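The slides use the web form, but the same two thresholds can be set on the command line. The sketch below only builds the argument list and assumes the NCBI BLAST+ `psiblast` program is installed; the file name `query.fasta` and the choice of 3 iterations are placeholders, not values from the slides.

```python
def psiblast_command(query_fasta, database, num_iterations=3,
                     evalue=10, inclusion_ethresh=0.005):
    """Build a psiblast argument list carrying the two thresholds.

    -evalue            E-value threshold for reporting hits
    -inclusion_ethresh E-value threshold for including a hit in the PSSM
    """
    return [
        "psiblast",
        "-query", query_fasta,
        "-db", database,
        "-num_iterations", str(num_iterations),
        "-evalue", str(evalue),
        "-inclusion_ethresh", str(inclusion_ethresh),
    ]


# Hypothetical query file; pass the list to subprocess.run to execute it.
cmd = psiblast_command("query.fasta", "nr")
print(" ".join(cmd))
```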
The results are all hypothetical proteins
Cool Story of the day
Why should we care about
cellular fusion in worms?
Cellular fusion
In cellular fusion, two cells unite to form one cell
• Fertilization
• Muscle cells are composed of rows of fused cells
• The placenta is made up of powerful multinucleated cells
that are actually numerous individual cells that have
fused
• The lenses of the eyes are formed of rows of fused cells
• Cellular fusion also occurs in bones.
• Fusion processes are also involved in cancer, viral
infections and stem cells.
http://www1.technion.ac.il/_local/includes/blocks/scinews-items/100513-elegans/news-item-en.htm
Cellular fusion in C. elegans
• The exact way fusion takes place is still not
completely clear and is the focus of work in
Prof. Podbilewicz's lab.
• The worm suits cell fusion research because
intensive cell-cell fusion takes place in its skin
and can be easily followed.
• The lab identified the protein responsible for
the worm's fusion activity: the EFF-1
protein.
• The researchers showed that in mutant
worms skin cells do not fuse and the cells
begin to migrate through the body.
Beni Podbilewicz
“...we identified fusion family
(FF) proteins within and beyond
nematodes, and divergent
members from the human
parasitic nematode
Trichinella spiralis and the
chordate Branchiostoma
floridae could also fuse
mammalian cells…”