w0506_tutorial8
Introduction to Bioinformatics Tutorial no. 8
Predicting protein structure
PSI-BLAST
PHDsec and PSIpred
PHDsec (Rost & Sander, 1993): based on sequence family alignments
PSIpred (Jones, 1999): based on PSI-BLAST profiles
Both consider long-range interactions
PSIpred Input
Input sequence
Type of Analysis
PSIpred Input (2)
Filtering Options
Email address
GO!
PSIpred Output
Conf: Confidence level (0=low, 9=high)
Pred: Predicted secondary structure (H=helix, E=strand, C=coil)
  AA: Target sequence

Conf: 988766667637889999877999871289878877049963202468899999997887
Pred: CCCCCCCCCCHHHHHHHHHHHHHHHHHCCCCCCHHHCCCCCHHHCHHHHHHHHHHHHHHH
  AA: MQRSPLEKASVVSKLFFSWTRPILRKGYRQRLELSDIYQIPSVDSADNLSEKLEREWDRE
              10        20        30        40        50        60

Conf: 742888731467888768899999999999999987557888998875227887303678
Pred: HHCCCCCCHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHCCCCCCCHHHH
  AA: LASKKNPKLINALRRCFFWRFMFYGIFLYLGEVTKAVQPLLLGRIIASYDPDNKEERSIA
              70        80        90       100       110       120
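The Conf/Pred/AA rows above can be joined and scanned programmatically, for example to list helices predicted with high confidence. A minimal Python sketch, assuming only the layout shown on this slide (blocks of rows prefixed `Conf:`, `Pred:` and `AA:`, with position rulers in between); `parse_psipred_horizontal` and `segments` are hypothetical helpers, not part of PSIpred, and the example input is a toy.

```python
def parse_psipred_horizontal(text):
    """Join the Conf/Pred/AA rows of PSIpred-style horizontal output.

    Assumes every block has rows starting with 'Conf:', 'Pred:' and 'AA:'
    (possibly indented); any other line, such as a position ruler, is ignored.
    Returns (list of confidence digits, predicted states, amino acids).
    """
    rows = {"Conf": [], "Pred": [], "AA": []}
    for line in text.splitlines():
        stripped = line.strip()
        for key in rows:
            if stripped.startswith(key + ":"):
                rows[key].append(stripped[len(key) + 1:].strip())
    conf, pred, aa = ("".join(rows[key]) for key in ("Conf", "Pred", "AA"))
    if not len(conf) == len(pred) == len(aa):
        raise ValueError("Conf, Pred and AA rows have different lengths")
    return [int(c) for c in conf], pred, aa


def segments(pred, state, conf=None, min_conf=0):
    """Return 1-based (start, end) runs of `state` in `pred`.

    If `conf` is given, only residues with confidence >= min_conf count.
    """
    runs, start = [], None
    for i, s in enumerate(pred):
        keep = s == state and (conf is None or conf[i] >= min_conf)
        if keep and start is None:
            start = i + 1
        elif not keep and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(pred)))
    return runs


# Toy two-block input in the layout shown above.
example = """\
Conf: 9876543
Pred: CHHHCEE
  AA: MKLVAGT
Conf: 999
Pred: EEC
  AA: RSA
"""
conf, pred, aa = parse_psipred_horizontal(example)
print(pred)                          # CHHHCEEEEC
print(segments(pred, "E"))           # [(6, 9)]
print(segments(pred, "H", conf, 7))  # [(2, 3)]
```

Raising `min_conf` trims low-confidence residues from the ends of a predicted element, which is one way to see which part of a helix the method is actually sure about.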
PHDsec Input (1)
Email address
Type of prediction
Additional output
Output format
Reduce processing
PHDsec Input (2)
Type (number) of input sequences
Upload file
Enter sequence
Wait for results?
PHDsec Output (1)
Protein classification
Structure proportions
Amino acid proportions
PHDsec Output (2)
Estimated structure
Confidence level
Structure with high confidence
PSI-BLAST
Position-Specific Iterative BLAST
Finds more distantly related sequences than BLASTP
An extension of BLASTP
Detects distant relatives whose E-values in an ordinary BLASTP search are insignificant
Even in distantly related sequences, important domains can be highly conserved; PSI-BLAST gives more weight to these conserved positions
PSI-BLAST Profile
When closely related sequences are aligned, areas of conservation become apparent.
The scoring matrix becomes position-specific:
Each column has its own set of amino acid frequencies.
The score is column-specific, based on amino acid frequency.
A more frequent amino acid gets a higher score.
A new sequence is scored using the new scoring matrix.
123456
AMTYQR
CTTYQS
SMTYQA
Position-Specific Scoring Matrix
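The column-specific scoring described above can be illustrated on the three aligned sequences. A minimal Python sketch under simplifying assumptions that are not PSI-BLAST's actual procedure: no sequence weighting, a uniform background frequency of 1/20, a flat pseudocount of 1 per amino acid, and scores expressed as log2 odds ratios. `build_pssm` and `score` are hypothetical names.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(alignment, pseudocount=1.0):
    """One {amino acid: log-odds score} dict per alignment column.

    Simplifications (assumptions): every sequence counts equally, the
    background frequency is uniform (1/20), and `pseudocount` is added to
    every amino acid so unseen residues do not score minus infinity.
    """
    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for col in range(len(alignment[0])):
        counts = {aa: pseudocount for aa in AMINO_ACIDS}
        for seq in alignment:
            counts[seq[col]] += 1
        total = sum(counts.values())
        pssm.append({aa: math.log2(counts[aa] / total / background)
                     for aa in AMINO_ACIDS})
    return pssm


def score(pssm, segment):
    """Score an ungapped segment: sum of the column-specific scores."""
    return sum(column[aa] for column, aa in zip(pssm, segment))


alignment = ["AMTYQR", "CTTYQS", "SMTYQA"]
pssm = build_pssm(alignment)
print(round(pssm[2]["T"], 2))  # 1.8  (T in all 3 sequences at position 3)
print(round(pssm[0]["A"], 2))  # 0.8  (A in 1 of 3 sequences at position 1)
print(round(pssm[0]["W"], 2))  # -0.2 (W never seen at position 1)
```

The same letter T would get a different score in column 2 than in column 3, which is exactly what "position-specific" means: the score depends on where in the profile the residue is aligned.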
A PSI-BLAST Iteration
1. Collect all database sequence segments that were aligned with the query sequence with an E-value below the set threshold (default 0.01).
2. Construct a position-specific scoring matrix for the collected sequences. Rough idea:
• Align all sequences to the query sequence, which serves as the template.
• Assign weights to the sequences.
• Construct the position-specific scoring matrix.
3. Find sequences that match the profile.
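The iteration can be sketched as a toy loop in Python. The assumptions are deliberately strong: all sequences are equal-length ungapped segments (so there is no alignment step), a raw score cutoff (`threshold`) stands in for the E-value threshold, the weights follow Henikoff and Henikoff's position-based scheme as one plausible weighting choice, and the background is uniform. Real PSI-BLAST differs in all of these details; every function name here is hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
BACKGROUND = 1.0 / len(AMINO_ACIDS)  # assumption: uniform background


def position_based_weights(seqs):
    """Henikoff-style weights: in each column a sequence earns 1 / (r * s),
    r = distinct residues in the column, s = sequences sharing its residue.
    Redundant (near-identical) sequences therefore share their weight."""
    weights = [0.0] * len(seqs)
    for col in range(len(seqs[0])):
        column = [seq[col] for seq in seqs]
        r = len(set(column))
        for i, aa in enumerate(column):
            weights[i] += 1.0 / (r * column.count(aa))
    total = sum(weights)
    return [w / total for w in weights]


def build_pssm(seqs, pseudocount=0.5):
    """Log2-odds PSSM from weighted frequencies mixed with the background."""
    weights = position_based_weights(seqs)
    pssm = []
    for col in range(len(seqs[0])):
        freq = {aa: 0.0 for aa in AMINO_ACIDS}
        for seq, w in zip(seqs, weights):
            freq[seq[col]] += w
        pssm.append({aa: math.log2((freq[aa] + pseudocount * BACKGROUND)
                                   / (1 + pseudocount) / BACKGROUND)
                     for aa in AMINO_ACIDS})
    return pssm


def score(pssm, segment):
    """Sum of column-specific scores for an ungapped segment."""
    return sum(column[aa] for column, aa in zip(pssm, segment))


def iterate_search(query, database, threshold, max_iterations=5):
    """Profile -> search -> collect hits -> rebuild profile, until no change."""
    included = [query]
    for iteration in range(1, max_iterations + 1):
        pssm = build_pssm(included)
        hits = [seq for seq in database
                if seq != query and score(pssm, seq) >= threshold]
        new_included = [query] + hits
        if set(new_included) == set(included):
            return included, iteration  # converged: no new sequences
        included = new_included
    return included, max_iterations


database = ["CMTYQR", "CTTYQS", "GGGGGG", "WWWWWW"]
first_round = build_pssm(["AMTYQR"])
print(round(score(first_round, "CTTYQS"), 1))  # 6.6: below the cutoff of 10
found, rounds = iterate_search("AMTYQR", database, threshold=10.0)
print(found, rounds)  # ['AMTYQR', 'CMTYQR', 'CTTYQS'] 3
```

The toy run shows why iterating helps: CTTYQS scores below the cutoff against a profile built from the query alone, but is picked up once the closer relative CMTYQR has been added to the profile.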
Using PSI-BLAST (1)
Available from the main BLAST page
Or switch it on in BLASTP
E-value threshold for initial inclusion in the multiple alignment used to build the profile
Using PSI-BLAST (2)
Align selected sequences, generate profile, search again
Number of results to show in the next iteration
New result
Select whether to include in the next iteration
Exercise 1
1. There is a protein with an unknown structure:
>some protein
MEAFLGTWKMEKSEGFDKIMERLGVDFVTRKMGNLVKPNLIVTDLGGGKYK
MRSESTFKTTECSFKLGEKFKEVTRFTRGHFFMITVENGVMKHEQDDKTKV
TYIERVVEGNELKATVKVDEVVCVRTYSKVA
Can BLAST help us predict its secondary structure (SS)?
2. Use any secondary structure prediction method to predict the
secondary structure of 1O8V and compare it to the solved structure.
NOTE: The secondary structure definition in the PDB uses a 7-letter code instead of the 3-letter code (H, E, C). For comparison, treat G, H and I as H; E as E; and everything else, including blanks, as C.
3. What can you conclude about the secondary structure prediction in
this case?
4. Are the results consistent with the confidence value of the prediction?
5. Can you explain the prediction results based on the real structure?
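The conversion rule in the note and the residue-by-residue comparison asked for in questions 2 and 4 can be scripted. A minimal Python sketch; it assumes the predicted and observed strings already cover the same residues in the same order, and it uses the common Q3 measure (percentage of residues whose three-state assignment is correct). The confidence helper reports Q3 only over residues at or above a chosen confidence value. Function names are hypothetical and the strings below are toy data, not 1O8V.

```python
def to_three_state(ss):
    """Map PDB secondary-structure letters to H/E/C:
    G, H, I -> H;  E -> E;  anything else (including blanks) -> C."""
    mapping = {"G": "H", "H": "H", "I": "H", "E": "E"}
    return "".join(mapping.get(letter, "C") for letter in ss)


def q3(predicted, observed):
    """Percentage of residues whose predicted state equals the observed one."""
    if len(predicted) != len(observed):
        raise ValueError("strings must cover the same residues")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)


def q3_above_confidence(predicted, observed, conf, min_conf):
    """Q3 restricted to residues whose confidence digit is >= min_conf."""
    keep = [i for i, c in enumerate(conf) if int(c) >= min_conf]
    if not keep:
        return None
    matches = sum(predicted[i] == observed[i] for i in keep)
    return 100.0 * matches / len(keep)


# Toy strings for illustration only (not 1O8V data).
observed = to_three_state(" HHHHGGTT EEEE S")
predicted = "CHHHHHHCCCEEECCC"
conf = "7999887468988655"
print(observed)                                           # CHHHHHHCCCEEEECC
print(q3(predicted, observed))                            # 93.75
print(q3_above_confidence(predicted, observed, conf, 8))  # 100.0
```

If accuracy among high-confidence residues is clearly better than overall accuracy, the prediction is behaving consistently with its confidence values.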
Exercise 2
• The prion protein is responsible for mad cow disease. In its normal form, the amino acids of a specific region are arranged in an α-helix (H1); in its abnormal form, this region changes into a β-strand conformation.
• This conformational change is thought to be the origin of the disease, which causes rapid degeneration of the nervous system and is usually fatal.
• It is assumed that prion molecules which have changed conformation accelerate the same conformational change in additional molecules.
1. Check what conformation is predicted for this protein.
2. The PDB code of the prion protein is 1ag2. The helix is located at positions 21-30 of the sequence in this file. Does the predicted SS correlate with the real one in the region of interest?
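To answer question 2, the helix region has to be located in the prediction string. Positions 21-30 are 1-based and inclusive, whereas Python strings are 0-based, so the region is the slice `pred[20:30]`. A minimal sketch; `region_composition` is a hypothetical helper, and the prediction string in the example is a toy, not the actual prediction for 1ag2.

```python
def region_composition(pred, start, end):
    """Fraction of H, E and C in positions start..end (1-based, inclusive)."""
    if not 1 <= start <= end <= len(pred):
        raise ValueError("region outside the sequence")
    region = pred[start - 1:end]
    return {state: region.count(state) / len(region) for state in "HEC"}


# Toy 3-state prediction, for illustration only.
pred = "C" * 5 + "E" * 4 + "C" * 11 + "H" * 7 + "E" * 2 + "C" * 5
print(pred[20:30])                       # HHHHHHHEEC
print(region_composition(pred, 21, 30))  # {'H': 0.7, 'E': 0.2, 'C': 0.1}
```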