Is it good at TM Prediction? Yes, it is TM Pro!


TM PRO & Comparison of Algorithms for
“Protein Stability Prediction Upon Mutations”
Madhavi Ganapathiraju
Graduate student
Carnegie Mellon University
Overview
• TMpro evaluations on PDBTM, TMPDB and MPTOPO
are complete
• Additional inputs to TMPro are being studied
– Yule values (not successful)
– Evolutionary Profile (promising)
• TMPro website has been completed
• Evaluation of algorithms to predict protein stability
changes upon mutations
Part 1: TM pro
TMPro Evaluations
(Qok, F score, recall and precision are segment-level; Q2 is residue-level)

     Method     Qok   F score   Recall   Precision   Q2   Misclassified as soluble
MPtopo (101 TM proteins)
2a   TMHMM      66    91        89       94          84   5
2b   TMpro NN   60    93        92       94          79   0
PDBTM (191 TM proteins)
3a   TMHMM      68    90        89       90          84   13
3b   TMpro NN   57    93        93       93          81   2
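As a consistency check on the evaluation numbers, the segment F score can be recomputed from segment precision and recall. This is a minimal sketch that assumes the standard definition (harmonic mean of precision and recall); the slides do not state the formula, and `f_score` is an illustrative name. The (precision, recall) pairs are as read from the evaluation table.

```python
def f_score(precision, recall):
    """Harmonic mean of precision and recall (assumed standard F1 definition)."""
    if precision + recall == 0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)

# (precision, recall) pairs as read from the evaluation table
rows = {
    "MPtopo TMHMM": (94, 89),
    "MPtopo TMpro NN": (94, 92),
    "PDBTM TMHMM": (90, 89),
    "PDBTM TMpro NN": (93, 93),
}
recomputed = {name: f_score(p, r) for name, (p, r) in rows.items()}
```

Under this assumption the recomputed values are about 91.4, 93.0, 89.5 and 93.0, against reported F scores of 91, 93, 90 and 93. The small difference for TMHMM on PDBTM would be expected if the table entries were rounded from unrounded precision and recall.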

TMPro web-server is fully functional!
Competition for TMpro logo
Prize: see your logo on the web!
Attempts to overcome confusion with
globular soluble helices (1)
• Yule value features to be added
– Yule value features that discriminate amino acid neighbor
propensities between TM and nonTM helices were computed
earlier
– Tried adding these features as input to the NN predictor, but
could not achieve any quantitative improvement
– I will discuss this in the future, once I have results to present
Attempts to overcome confusion with
globular soluble helices (2)
• Evolutionary profile information
– Knowledge of a protein's evolutionary profile is known to
improve prediction accuracy substantially
• TMPro is capable of predicting TMs without requiring
knowledge of profile
– Useful when you cannot extract sequence alignments from
known proteins
• But where profile is known, we would like to use that
additional information
Profile generation
Those of you who have worked
with evolutionary analysis before, please give feedback
• Get multiple sequence alignments
• Compute position specific scoring matrix for each
protein
– 21 rows (20 amino acids, and 1 row for gaps)
PSSM(i,j) = log( C(i,j) / total counts at position j )
            log( C(i,j) / unigram count of i in the protein )
• Profile is generated for each protein in the training
and test sets
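The profile computation can be sketched as follows. This is only an illustration under stated assumptions: the slide lists two log-ratio expressions without saying how they are combined, so the sketch returns them as two separate 21-row matrices; the first sequence of the alignment is taken to be the protein itself; and a pseudocount is added purely to avoid log(0). The names `profile_log_ratios`, `ALPHABET` and `pseudocount` are hypothetical.

```python
import math
from collections import Counter

ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"  # 20 amino acids plus one gap row = 21 rows

def profile_log_ratios(alignment, pseudocount=1.0):
    """Return the two log-ratio matrices, each 21 x alignment length.

    alignment[0] is taken to be the protein itself (an assumption); the
    pseudocount is also an assumption, added only to avoid log(0).
    """
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("aligned sequences must all have the same length")

    # C(i, j): how often symbol i occurs in alignment column j
    counts = {sym: [0] * length for sym in ALPHABET}
    for row in alignment:
        for j, sym in enumerate(row):
            if sym in counts:          # unknown symbols are ignored
                counts[sym][j] += 1

    column_totals = [sum(counts[sym][j] for sym in ALPHABET) for j in range(length)]
    unigram = Counter(alignment[0])    # count of each symbol in the protein

    by_column, by_unigram = [], []
    for sym in ALPHABET:
        by_column.append([
            math.log((counts[sym][j] + pseudocount)
                     / (column_totals[j] + pseudocount * len(ALPHABET)))
            for j in range(length)
        ])
        by_unigram.append([
            math.log((counts[sym][j] + pseudocount) / (unigram[sym] + pseudocount))
            for j in range(length)
        ])
    return by_column, by_unigram

# Toy three-sequence alignment, only to exercise the function
by_column, by_unigram = profile_log_ratios(["MA-", "MAL", "MV-"])
```

Each returned matrix has one row per symbol in `ALPHABET` and one column per alignment position, so a column can be read off as a 21-value feature vector for that position.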
Doubts
What labels to assign to gaps?
• We have labels for training sequences
– But when the original sequence picks up gaps in the alignment, how
should the labels at the gap positions be interpreted?
[Alignment excerpt, two blocks: 2a65 (row of n/M labels), 2A65_A and AAC07817 at residues 369-377 and 378-385, and YP_001956 at residues 364-372 and 373-377. Gap characters (-) appear between the aligned residues, including in the block labeled M.]
Even TM regions contain gaps, as shown above
Doubts
What to do with missing segment info for some sequences
• When nothing is shown (neither gap nor aligned residue) for some
sequences, I am counting those positions as gaps
Alignment excerpt (sequence ID, residue range, aligned row):
XP_659910 (47-86): L-......K.----------...KAP----RSNQV.-..FVAGTMGLASAVGA.AT
AAW43619 (100-136): .....A..A-----------KNP----NTTRNV-..FMVGALGALGASSV.ST
CAB59195 (59-83): ----.N.RP.-A..VIGSARFAYMAWTRVA
XP_466001 (107-130): SKRA.-A.FVLSGGRFIYASLLRLL
AAA20832 (103-126): SKRA.-A.FVLTGGRFVYASLVRLL
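The convention of counting uncovered positions as gaps can be sketched as padding each aligned fragment out to the full alignment width. This is a hypothetical illustration: `pad_fragment`, `start_col` and `width` are made-up names, the toy fragments are not taken from the alignment, and the real alignment output may encode the offset of each fragment differently.

```python
def pad_fragment(fragment, start_col, width, gap="-"):
    """Place an aligned fragment at start_col and fill every uncovered column with gaps."""
    if start_col < 0 or start_col + len(fragment) > width:
        raise ValueError("fragment does not fit inside the alignment window")
    left = gap * start_col
    right = gap * (width - start_col - len(fragment))
    return left + fragment + right

# Hypothetical window of 12 columns; the second row only covers columns 5..11
rows = [
    pad_fragment("LKAPRSNQVFVA", 0, 12),
    pad_fragment("SKRAAFV", 5, 12),
]
```

After padding, every row has the same length, so per-column counts can be taken without special-casing sequences that cover only part of the window.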
Using profile for prediction
Studied independently of TMpro
Neural network with 21 input, 21 hidden and 1 output neurons
[Plot: predicted output (nonmembrane = 0, membrane = 1) and experimentally observed locations of TM helices, against residue number]
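A network of this shape (21 inputs, 21 hidden neurons, 1 output) can be sketched as a plain forward pass. Several things here are assumptions rather than details from the slides: sigmoid activations, bias terms, one 21-value profile column per residue as the input, and random untrained weights standing in for the trained model. All function names are hypothetical.

```python
import math
import random

N_INPUT, N_HIDDEN = 21, 21   # layer sizes from the slide; one output neuron

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def init_network(seed=0):
    """Random untrained weights (placeholder values, not the trained model)."""
    rng = random.Random(seed)
    hidden = [[rng.uniform(-0.5, 0.5) for _ in range(N_INPUT + 1)]  # +1 for bias
              for _ in range(N_HIDDEN)]
    output = [rng.uniform(-0.5, 0.5) for _ in range(N_HIDDEN + 1)]  # +1 for bias
    return hidden, output

def predict(network, profile_column):
    """Forward pass: 21 profile values -> score in (0, 1); near 1 means membrane."""
    hidden_w, output_w = network
    if len(profile_column) != N_INPUT:
        raise ValueError("expected 21 input values")
    # The last weight of each neuron is its bias; zip stops after the 21 inputs
    hidden_act = [
        sigmoid(w[-1] + sum(wi * xi for wi, xi in zip(w, profile_column)))
        for w in hidden_w
    ]
    return sigmoid(output_w[-1] + sum(wi * hi for wi, hi in zip(output_w, hidden_act)))

network = init_network()
score = predict(network, [0.0] * N_INPUT)
```

Calling `predict` once per residue gives a per-residue score that can be plotted against residue number and compared with the observed TM helix locations.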
Another output
The NN architecture needs to be modified
But instead I post-processed the neural network output
Computed its wavelet transform
Mexican hat wavelet, scale = 10
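The post-processing step can be sketched as a single-scale continuous wavelet transform of the per-residue network output. This is a minimal illustration, assuming the usual Mexican hat (Ricker) wavelet with unit-energy normalisation and a direct summation; the slides do not say which implementation or normalisation was used, and the input signal below is made up.

```python
import math

def mexican_hat(t):
    """Mexican hat (Ricker) mother wavelet, unit-energy normalisation."""
    norm = 2.0 / (math.sqrt(3.0) * math.pi ** 0.25)
    return norm * (1.0 - t * t) * math.exp(-t * t / 2.0)

def wavelet_transform(signal, scale=10.0):
    """Continuous wavelet transform of `signal` at a single scale (direct O(n^2) sum)."""
    n = len(signal)
    out = []
    for b in range(n):                      # b: residue position (translation)
        total = sum(signal[k] * mexican_hat((k - b) / scale) for k in range(n))
        out.append(total / math.sqrt(scale))
    return out

# Hypothetical NN output: a 20-residue stretch scored 1 (membrane), 0 elsewhere
nn_output = [0.0] * 100
for k in range(40, 60):
    nn_output[k] = 1.0
response = wavelet_transform(nn_output, scale=10.0)
peak = max(range(len(response)), key=response.__getitem__)
```

At scale 10 the positive lobe of the wavelet spans roughly 20 residues, so a helix-length stretch of high scores produces a single smooth peak near its centre, which is easier to threshold than the raw network output.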
Some more wavelet outputs
Note that these are from the training data itself;
overall performance has yet to be checked
Part 2: Stability upon Mutations
Evaluation of predictions of protein
stability changes upon mutations
• Data on the effects of mutations on 2 TM proteins are available in
our group
– The two proteins are rhodopsin and bacteriorhodopsin
– Data show how much misfolding occurs
– And how the stability of the protein is affected
• There are algorithms that can also predict these
changes
• We compared how accurate or reliable the prediction
methods are, by comparing their results with our
experimental data
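One way to score such a comparison is the percentage of mutations for which the predicted direction of the stability change matches the experimental one. The slides do not say how agreement was scored, so this is only a sketch under assumptions: both predictions and experiments are expressed as signed stability changes (negative meaning a decrease), and the numbers below are made up.

```python
def direction(value, tol=0.0):
    """Map a signed stability change to 'increase', 'decrease' or 'neutral'."""
    if value > tol:
        return "increase"
    if value < -tol:
        return "decrease"
    return "neutral"

def percent_agreement(predicted, experimental, tol=0.0):
    """Percentage of mutations whose predicted direction matches the experimental one."""
    if len(predicted) != len(experimental) or not predicted:
        raise ValueError("need two equally long, non-empty lists")
    hits = sum(direction(p, tol) == direction(e, tol)
               for p, e in zip(predicted, experimental))
    return 100.0 * hits / len(predicted)

# Made-up numbers for four mutations, only to exercise the function
predicted = [-1.2, 0.4, -0.3, -2.0]
experimental = [-0.8, -0.5, -0.1, 1.1]
accuracy = percent_agreement(predicted, experimental)
```

Here two of the four made-up mutations agree in direction, so `accuracy` is 50.0.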
3 Prediction algorithms
• I-Mutant 2.0
– Support vector machine
– Features: amino acid neighbors in a 9 Å sphere, temperature,
pH, relative solvent-accessible surface area
– http://gpcr2.biocomp.unibo.it/cgi/predictors/I-Mutant2.0/I-Mutant2.0.cgi
• DFIRE
– Knowledge based statistical potentials
– http://phyyz4.med.buffalo.edu/hzhou/mutation.html
• FOLDX
– Statistical mechanics; accounts for various energy terms
– http://fold-x.embl-heidelberg.de:1100/
Authors’ claims in 3 papers
Our results

Rhodopsin (PDB: 1U19)
                            Folding   Meta 2   Both
Number of known mutations   147       159      279
I mutant                    35.4      56.0     55.3
DFIRE                       37.1      47.5     38.7
FOLD-X                      55.7      67.2     52.7

Bacteriorhodopsin (PDB: 1QM8)
                            Folding   Meta 2   Both
Number of known mutations   52        32       84
I mutant                    54.7      78.1     64.3
DFIRE                       57.7      73.3     63.0
FOLD-X                      50        46.9     50.6
Bias in # of mutations that
increase/decrease stability
Database bias affects the apparent accuracy of the algorithms.
I-mutant, for example, predicts a decrease in stability for the majority of
mutations.
Whether the experimentally studied mutations preserve the natural bias
toward stability-decreasing mutations therefore affects the apparent
accuracy of the prediction algorithms.
                    Experimental   I-mutant   DFIRE   FOLDX
Rhodopsin           63             75         46      66
Bacteriorhodopsin   81             97         81      65
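The effect of this bias on apparent accuracy can be illustrated with a trivial predictor that labels every mutation as stability-decreasing: its accuracy equals the fraction of decreasing mutations in the experimental set. The sketch below uses made-up data sets and a hypothetical function name; it is not a reanalysis of the numbers above.

```python
def always_decrease_accuracy(experimental_directions):
    """Accuracy (in %) of a trivial predictor that labels every mutation 'decrease'."""
    if not experimental_directions:
        raise ValueError("need at least one mutation")
    n_decrease = sum(1 for d in experimental_directions if d == "decrease")
    return 100.0 * n_decrease / len(experimental_directions)

# Two hypothetical experimental sets of 10 mutations with different bias
balanced = ["decrease"] * 5 + ["increase"] * 5
skewed = ["decrease"] * 8 + ["increase"] * 2
baseline_balanced = always_decrease_accuracy(balanced)
baseline_skewed = always_decrease_accuracy(skewed)
```

The same predictor scores 50% on the balanced set and 80% on the skewed one, so an algorithm that mostly predicts decreases looks better or worse depending only on the composition of the test data.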
Correlation with known data
           Rhodopsin   Bacteriorhodopsin
I-mutant   0.11        -0.09
DFIRE      0.16        0.18
FOLDX      0.24        -0.18
Reported correlations for these methods are high (>0.7)
On the data compared here, the correlations are low (at most 0.24)
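A correlation of this kind can be computed as below. The slides do not say which coefficient was used; this sketch assumes the Pearson correlation between predicted and experimental values, and the data are made up to show the two extremes.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of the same length, at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)

# Made-up values: experimental is exactly twice predicted, then sign-flipped
predicted = [-1.2, 0.4, -0.3, -2.0, 0.9]
experimental = [-2.4, 0.8, -0.6, -4.0, 1.8]
r_perfect = pearson_r(predicted, experimental)
r_reversed = pearson_r(predicted, [-e for e in experimental])
```

Values near +1 or -1 indicate a strong linear relationship; values near 0, like those in the table, indicate that the predictions carry little linear information about the measurements.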
Notes ..
• Local installations of blast and netblast are on cologne:
– /usr1/blast-2.2.13/
– /usr1/netblast-2.2.13/
• Java SDK on Cologne
– /usr1/j2sdk1.4.2_11/
Acknowledgements
Judith Klein-Seetharaman
Christopher Jon Jursa
Pitt Information sciences
(for developing web interface)