Is it good at TM Prediction? Yes, it is TM Pro!


TMpro
&
Comparison of Algorithms for “Protein Stability Prediction Upon Mutations”
Madhavi Ganapathiraju
Graduate student
Carnegie Mellon University
Overview
• TMpro evaluations on PDBTM, TMPDB and MPtopo are complete
• Additional inputs to TMpro are being studied
– Yule values (not successful)
– Evolutionary profile (promising)
• The TMpro website has been completed
• Evaluation of algorithms that predict protein stability changes upon mutations
Part 1: TMpro
TMpro Evaluations
                            Segment   Segment   Segment   Segment     Residue-level   Misclassified
Method                      Qok       F score   recall    precision   Q2              as soluble
MPtopo (101 TM proteins)
2a TMHMM                    66        91        89        94          84              5
2b TMpro NN                 60        93        92        94          79              0
PDBTM (191 TM proteins)
3a TMHMM                    68        90        89        90          84              13
3b TMpro NN                 57        93        93        93          81              2
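The segment-level scores in the table above can be computed roughly as follows. This is a minimal sketch, not the evaluation code used for TMpro: it assumes per-residue label strings in which 'M' marks TM residues, and it assumes that an observed and a predicted segment match when they overlap by at least 3 residues (a common convention, treated here as an assumption). Qok counts proteins whose segments are all predicted with no extra or missing ones; a TM protein with no predicted segment counts as "misclassified as soluble". All function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Segment = Tuple[int, int]  # inclusive (start, end) residue indices


def segments_from_labels(labels: str, tm: str = "M") -> List[Segment]:
    """Turn a per-residue label string such as 'ooMMMMii' into TM segments."""
    segs: List[Segment] = []
    start = None
    for i, ch in enumerate(labels):
        if ch == tm and start is None:
            start = i
        elif ch != tm and start is not None:
            segs.append((start, i - 1))
            start = None
    if start is not None:
        segs.append((start, len(labels) - 1))
    return segs


def overlap(a: Segment, b: Segment) -> int:
    """Number of residues shared by two inclusive segments."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def count_matched(ref: List[Segment], other: List[Segment], min_overlap: int) -> int:
    """How many segments in `ref` overlap some segment in `other` by >= min_overlap."""
    return sum(any(overlap(r, o) >= min_overlap for o in other) for r in ref)


def evaluate(proteins: Sequence[Tuple[str, str]], min_overlap: int = 3) -> Dict[str, float]:
    """proteins: (observed_labels, predicted_labels) pairs of equal length."""
    n_obs = n_pred = obs_hit = pred_hit = 0
    q_ok = missed = correct_res = total_res = 0
    for obs_lab, pred_lab in proteins:
        obs, pred = segments_from_labels(obs_lab), segments_from_labels(pred_lab)
        hit_o = count_matched(obs, pred, min_overlap)  # observed segments that were found
        hit_p = count_matched(pred, obs, min_overlap)  # predicted segments that are real
        n_obs += len(obs)
        n_pred += len(pred)
        obs_hit += hit_o
        pred_hit += hit_p
        # Qok: every observed segment found, no over- or under-prediction.
        if hit_o == len(obs) and hit_p == len(pred) and len(obs) == len(pred):
            q_ok += 1
        # A TM protein with no predicted segment is "misclassified as soluble".
        if obs and not pred:
            missed += 1
        # Q2: residue-level two-state (TM vs non-TM) agreement.
        correct_res += sum((a == "M") == (b == "M") for a, b in zip(obs_lab, pred_lab))
        total_res += len(obs_lab)
    recall = obs_hit / n_obs if n_obs else 0.0
    precision = pred_hit / n_pred if n_pred else 0.0
    f_score = 2 * recall * precision / (recall + precision) if recall + precision else 0.0
    return {
        "Qok": 100.0 * q_ok / len(proteins),
        "recall": 100.0 * recall,
        "precision": 100.0 * precision,
        "F": 100.0 * f_score,
        "Q2": 100.0 * correct_res / total_res,
        "misclassified_as_soluble": missed,
    }
```

For example, `evaluate([("ooMMMMMMiiiiMMMMMMoo", "oooMMMMMiiiiiMMMMMoo"), ("ooMMMMMMoo", "oooooooooo")])` gives Qok = 50, recall of about 66.7, precision = 100 and F = 80, and counts the second protein as misclassified as soluble. Note that the F score is the harmonic mean of segment recall and precision, which is a useful consistency check on reported tables.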
TMpro web server is fully functional!
Competition for a TMpro logo
Prize: see your logo on the web!
Attempts to overcome confusion with globular soluble helices (1)
• Adding Yule value features
– Yule value features that discriminate amino acid neighbor propensities between TM and non-TM helices were computed earlier
– I tried adding these features as inputs to the NN predictor, but could not achieve a quantitative improvement
– I will discuss this in the future once I have results to present
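Yule's Q for a 2x2 contingency table with cells a, b, c, d is (ad - bc)/(ad + bc), ranging from -1 (perfect avoidance) to +1 (perfect association). The neighbor-propensity features above can be sketched as follows, assuming the table counts residue x at position i against residue y at position i + d within helix sequences, and assuming the discriminating feature is the per-pair difference between TM and non-TM helices. The distance parameter `d`, the difference feature and the function names are illustrative assumptions, not necessarily how the earlier features were computed.

```python
from collections import Counter
from itertools import product
from typing import Dict, Iterable, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def yule_table(seqs: Iterable[str], d: int = 1) -> Dict[Tuple[str, str], float]:
    """Yule's Q for every residue pair (x at position i, y at position i + d)."""
    pairs: Counter = Counter()
    for s in seqs:
        for i in range(len(s) - d):
            pairs[(s[i], s[i + d])] += 1
    total = sum(pairs.values())
    first: Counter = Counter()   # how often x occupies position i
    second: Counter = Counter()  # how often y occupies position i + d
    for (x, y), n in pairs.items():
        first[x] += n
        second[y] += n
    table = {}
    for x, y in product(AMINO_ACIDS, repeat=2):
        a = pairs[(x, y)]       # x followed by y
        b = first[x] - a        # x followed by something other than y
        c = second[y] - a       # something other than x followed by y
        e = total - a - b - c   # neither x nor y in these positions
        denom = a * e + b * c
        table[(x, y)] = (a * e - b * c) / denom if denom else 0.0
    return table


def discriminative_yule(tm_helices: Iterable[str], non_tm_helices: Iterable[str],
                        d: int = 1) -> Dict[Tuple[str, str], float]:
    """Per-pair difference Q_TM - Q_nonTM (hypothetical feature definition)."""
    q_tm = yule_table(tm_helices, d)
    q_non = yule_table(non_tm_helices, d)
    return {pair: q_tm[pair] - q_non[pair] for pair in q_tm}
```

In the toy case `yule_table(["LLLLAAAA"])`, L is always followed by L or A in runs, so Q(L, L) = 1 and Q(L, A) = -1.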
Attempts to overcome confusion with globular soluble helices (2)
• Evolutionary profile information
– It is known that a protein's evolutionary profile can improve prediction accuracy substantially
• TMpro can predict TM segments without requiring a profile
– Useful when sequence alignments with known proteins cannot be obtained
• But where a profile is available, we would like to use that additional information
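Profile generation
Those of you who have worked with evolutionary analysis before, please give feedback
• Get multiple sequence alignments
• Compute a position-specific scoring matrix (PSSM) for each protein
– 21 rows (20 amino acids, and 1 row for gaps)
– $C(i,j)$: count of residue type $i$ (or gap) at alignment position $j$
– $\mathrm{PSSM}(i,j) = \log\left(\dfrac{C(i,j)}{N_j}\right)$, where $N_j$ is the total count at position $j$, or
– $\mathrm{PSSM}(i,j) = \log\left(\dfrac{C(i,j)}{U_i}\right)$, where $U_i$ is the unigram count of residue type $i$ in the protein
• A profile is generated for each protein in the training and test sets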
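Both PSSM forms above can be computed from an alignment as sketched below. Assumptions: aligned rows have equal length with '-' for gaps (sequences with missing segments are padded with '-', so those positions count as gaps); the "protein" in the unigram form is taken to be the query row; and a pseudocount is added to avoid log(0), which the formulas above do not include. The function name, `mode` and `pseudocount` are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence

ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"  # 20 amino acids + 1 row for gaps


def pssm(alignment: Sequence[str], query_index: int = 0, mode: str = "position",
         pseudocount: float = 1.0) -> List[List[float]]:
    """21 x L log-score matrix from an alignment of equal-length rows ('-' = gap).

    mode="position": log(C(i,j) / N_j), N_j = total count in column j
    mode="unigram":  log(C(i,j) / U_i), U_i = count of residue i in the query protein
    """
    n_cols = len(alignment[0])
    if any(len(row) != n_cols for row in alignment):
        raise ValueError("all aligned rows must have the same length")
    unigram = Counter(alignment[query_index].replace("-", ""))
    matrix = [[0.0] * n_cols for _ in ALPHABET]
    for j in range(n_cols):
        column = Counter(row[j] for row in alignment)
        n_j = sum(column[a] for a in ALPHABET)
        for i, aa in enumerate(ALPHABET):
            c_ij = column[aa] + pseudocount  # pseudocount avoids log(0)
            if mode == "position":
                denom = n_j + pseudocount * len(ALPHABET)
            elif mode == "unigram":
                denom = unigram[aa] + pseudocount
            else:
                raise ValueError("mode must be 'position' or 'unigram'")
            matrix[i][j] = math.log(c_ij / denom)
    return matrix
```

With the "position" form each column is a smoothed log-probability distribution over the 21 symbols; the "unigram" form instead compares each column against the composition of the query protein, so it behaves more like a log-odds score.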
Doubts
What labels should be assigned to gaps?
• We have labels for the training sequences
– But when the original sequence has gaps in the alignment, how should the labels of the gap positions be interpreted?
[Figure: alignment excerpt of 2a65, 2A65_A, AAC07817 and YP_001956 (residues 369 to 385; 364 to 377 for YP_001956)]
Even TM regions contain gaps, as shown above
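One possible convention for the open question above, offered only as a hypothetical sketch: map each residue's label onto its alignment column, and let a gap column inherit a label only when the nearest residues on both sides carry the same label (so a gap inside a TM segment stays TM); any other gap gets a separate gap label. The function name and this rule are assumptions, not an established practice.

```python
from typing import List, Optional


def project_labels(aligned_seq: str, labels: str, gap: str = "-") -> str:
    """Map per-residue labels onto the alignment columns of one aligned sequence.

    Hypothetical rule: a gap column takes label L only when the nearest residues
    on both sides are both labelled L; otherwise it gets `gap`.
    """
    n_res = sum(ch != "-" for ch in aligned_seq)
    if n_res != len(labels):
        raise ValueError("number of labels must equal number of residues")
    cols: List[Optional[str]] = []
    k = 0
    for ch in aligned_seq:
        if ch == "-":
            cols.append(None)
        else:
            cols.append(labels[k])
            k += 1
    n = len(cols)
    prev: List[Optional[str]] = [None] * n
    nxt: List[Optional[str]] = [None] * n
    last: Optional[str] = None
    for j in range(n):                # nearest residue label to the left
        if cols[j] is not None:
            last = cols[j]
        prev[j] = last
    last = None
    for j in range(n - 1, -1, -1):    # nearest residue label to the right
        if cols[j] is not None:
            last = cols[j]
        nxt[j] = last
    out = []
    for j, lab in enumerate(cols):
        if lab is not None:
            out.append(lab)
        elif prev[j] is not None and prev[j] == nxt[j]:
            out.append(prev[j])
        else:
            out.append(gap)
    return "".join(out)
```

For example, `project_labels("MA--LW-T", "MMMMo")` returns `"MMMMMM-o"`: the two gaps inside the TM run become TM, while the gap at the TM/non-TM boundary stays unlabelled.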
Doubts
What to do with missing segment information for some sequences?
• When nothing is shown (neither a gap nor an aligned residue) for some sequences, I am counting those positions as gaps
XP_659910   47  L-......K.----------...KAP----RSNQV.-..FVAGTMGLASAVGA.AT  86
AAW43619   100  .....A..A-----------KNP----NTTRNV-..FMVGALGALGASSV.ST  136
CAB59195    59  ----.N.RP.-A..VIGSARFAYMAWTRVA  83
XP_466001  107  SKRA.-A.FVLSGGRFIYASLLRLL  130
AAA20832   103  SKRA.-A.FVLTGGRFVYASLVRLL  126
Using profile for prediction
Studied independently of TMpro
Neural network with 21 input, 21 hidden and 1 output neurons
[Figure: predicted NN output (non-membrane = 0, membrane = 1) and experimentally observed locations of TM helices, plotted against residue number]
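The 21-21-1 network above can be sketched as a plain feed-forward net with sigmoid units trained by backpropagation. Assumptions: each residue's input is its 21-value profile (PSSM) column with no sequence window, the target is 1 for membrane and 0 for non-membrane residues, and the loss is squared error. The class name, initialization range and learning rate are hypothetical choices, not the configuration actually used.

```python
import math
import random
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


class ProfileNet:
    """21 inputs -> 21 sigmoid hidden units -> 1 sigmoid output (0 = non-membrane, 1 = membrane)."""

    def __init__(self, n_in: int = 21, n_hidden: int = 21, seed: int = 0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x: Sequence[float]) -> Tuple[List[float], float]:
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(self.w1, self.b1)]
        y = sigmoid(sum(w * hk for w, hk in zip(self.w2, h)) + self.b2)
        return h, y

    def train_step(self, x: Sequence[float], target: float, lr: float = 0.1) -> float:
        """One backpropagation step on 0.5 * (y - target)**2; returns the loss before the step."""
        h, y = self.forward(x)
        delta_out = (y - target) * y * (1.0 - y)
        # Hidden deltas use the output weights before they are updated.
        delta_hidden = [delta_out * w * hk * (1.0 - hk) for w, hk in zip(self.w2, h)]
        for k, hk in enumerate(h):
            self.w2[k] -= lr * delta_out * hk
        self.b2 -= lr * delta_out
        for k, row in enumerate(self.w1):
            for i, xi in enumerate(x):
                row[i] -= lr * delta_hidden[k] * xi
            self.b1[k] -= lr * delta_hidden[k]
        return 0.5 * (y - target) ** 2

    def predict(self, profile_columns: Sequence[Sequence[float]]) -> List[float]:
        """Per-residue membrane score for a protein given its 21-value profile columns."""
        return [self.forward(col)[1] for col in profile_columns]
```

Because each residue is scored from its own profile column only, the output can flicker from residue to residue, which is one reason smoothing or post-processing of the output is useful.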
Another output
The NN architecture needs to be modified
Instead, I post-processed the neural network output
Computed a wavelet transform
Mexican hat wavelet, scale = 10
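A single-scale Mexican hat (Ricker) wavelet transform of the per-residue NN output can be sketched as a direct convolution. The wavelet used here is A(1 - (t/s)^2)exp(-(t/s)^2/2) with A = 2/(sqrt(3s) pi^(1/4)); its positive lobe spans |t| < s, so at scale s = 10 it responds most strongly to runs of high output about 20 residues long. Treating the signal as 0 outside the protein, the truncation at 5 scales and the peak picker are assumptions; function names are hypothetical.

```python
import math
from typing import List, Optional, Sequence


def ricker(t: float, scale: float) -> float:
    """Mexican hat (Ricker) wavelet: A * (1 - (t/s)^2) * exp(-(t/s)^2 / 2)."""
    amp = 2.0 / (math.sqrt(3.0 * scale) * math.pi ** 0.25)
    u = (t / scale) ** 2
    return amp * (1.0 - u) * math.exp(-u / 2.0)


def mexican_hat_transform(signal: Sequence[float], scale: float = 10.0,
                          half_width: Optional[int] = None) -> List[float]:
    """Single-scale wavelet response at every residue (signal treated as 0 outside)."""
    if half_width is None:
        half_width = int(5 * scale)  # the wavelet is negligible beyond ~5 scales
    n = len(signal)
    out = []
    for b in range(n):
        lo, hi = max(0, b - half_width), min(n, b + half_width + 1)
        out.append(sum(signal[t] * ricker(t - b, scale) for t in range(lo, hi)))
    return out


def local_maxima(values: Sequence[float], threshold: float = 0.0) -> List[int]:
    """Indices of local peaks above threshold (candidate TM helix centres)."""
    return [i for i in range(1, len(values) - 1)
            if values[i] > threshold and values[i] >= values[i - 1] and values[i] > values[i + 1]]
```

On a toy signal that is 1 over residues 40 to 59 and 0 elsewhere, the response has a single positive peak at the middle of the stretch (residue 49 or 50, which tie mathematically).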
Some more wavelet outputs
Note that these are from the training data itself.
I have yet to check how it performs overall.
Part 2: Stability upon Mutations
Evaluation of predictions of protein stability changes upon mutations
• Experimental data on the effects of mutations in 2 TM proteins are available in our group
– The two proteins are rhodopsin and bacteriorhodopsin
– The data show how much misfolding occurs
– The data also show how the stability of the protein is affected
• There are algorithms that can also predict these changes
• We assessed how accurate and reliable the prediction methods are by comparing their results with our experimental data
Three prediction algorithms
• I-Mutant 2.0
– Support vector machine
– Features: amino acid neighbors within a 9 Å sphere, temperature, pH, relative solvent accessible surface area
– http://gpcr2.biocomp.unibo.it/cgi/predictors/I-Mutant2.0/I-Mutant2.0.cgi
• DFIRE
– Knowledge-based statistical potentials
– http://phyyz4.med.buffalo.edu/hzhou/mutation.html
• FOLDX
– Statistical mechanics; accounts for various energy terms
– http://fold-x.embl-heidelberg.de:1100/
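To illustrate the general idea behind knowledge-based statistical potentials, here is a minimal sketch with a simple random-mixing reference state. This is not DFIRE itself, which uses a distance-scaled finite ideal-gas reference state. Pair energies come from inverse-Boltzmann statistics of observed residue contacts, E(a,b) = -kT ln(P_obs(a,b) / (P(a)P(b))), and a point mutation is scored by the change in summed contact energy with its structural neighbors. The function names, pseudocount, kT = 0.593 kcal/mol (about 298 K) and the sign convention (positive = less favorable) are assumptions; the real tools use their own conventions.

```python
import math
from collections import Counter
from typing import Dict, Iterable, Sequence, Tuple


def contact_energies(contacts: Iterable[Tuple[str, str]], kT: float = 0.593,
                     pseudocount: float = 1.0) -> Dict[Tuple[str, str], float]:
    """Inverse-Boltzmann pair energies E(a,b) = -kT * ln(P_obs(a,b) / (P(a) * P(b)))."""
    pair: Counter = Counter()
    for a, b in contacts:
        pair[(a, b)] += 1
        pair[(b, a)] += 1  # contacts are unordered, so count both orientations
    total = sum(pair.values())
    single: Counter = Counter()
    for (a, _), n in pair.items():
        single[a] += n
    types = sorted(single)
    energies = {}
    for a in types:
        for b in types:
            p_obs = (pair[(a, b)] + pseudocount) / (total + pseudocount * len(types) ** 2)
            p_ref = (single[a] / total) * (single[b] / total)  # random-mixing reference state
            energies[(a, b)] = -kT * math.log(p_obs / p_ref)
    return energies


def mutation_ddg(energies: Dict[Tuple[str, str], float], wild: str, mutant: str,
                 neighbours: Sequence[str]) -> float:
    """Change in summed contact energy for wild -> mutant; > 0 means less favourable here."""
    return sum(energies[(mutant, n)] - energies[(wild, n)] for n in neighbours)
```

In a toy database where L-L contacts are common and L-K contacts rare, replacing an L surrounded by two L neighbors with K gives a positive score, i.e. a predicted loss of stability under this sign convention.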
Authors’ claims in 3 papers
Our results
Rhodopsin (PDB: 1U19)
                              Folding   Meta 2   Both
Number of known mutations     147       159      279
I-Mutant                      35.4      56.0     55.3
DFIRE                         37.1      47.5     38.7
FOLD-X                        55.7      67.2     52.7

Bacteriorhodopsin (PDB: 1QM8)
                              Folding   Meta 2   Both
Number of known mutations     52        32       84
I-Mutant                      54.7      78.1     64.3
DFIRE                         57.7      73.3     63.0
FOLD-X                        50        46.9     50.6
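If the percentages in these tables measure how often the predicted direction of the stability change agrees with experiment (an assumption; the exact scoring is not spelled out here), they can be computed per data subset (Folding, Meta 2, Both) as sketched below. The function name and the threshold parameter are hypothetical, and both inputs must use the same sign convention for a stability change.

```python
from typing import Sequence


def sign_accuracy(predicted: Sequence[float], observed: Sequence[float],
                  threshold: float = 0.0) -> float:
    """Percent of mutations whose predicted and observed changes fall on the same side of threshold."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    agree = sum((p > threshold) == (o > threshold) for p, o in zip(predicted, observed))
    return 100.0 * agree / len(observed)
```

For example, `sign_accuracy([1.2, -0.3, 0.5, -1.0], [0.8, 0.4, 1.1, -0.2])` returns 75.0, since three of the four predicted changes have the experimentally observed direction.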
Bias in the number of mutations that increase/decrease stability
Database bias affects the apparent accuracies of the algorithms
I-Mutant, for example, predicts a decrease in stability for the majority of mutations.
Whether the experimentally studied mutations preserve the natural bias toward stability-decreasing mutations affects the apparent accuracy of the prediction algorithms.

Mutations that decrease stability (%)
                     Experimental   I-Mutant   DFIRE   FOLDX
Rhodopsin            63             75         46      66
Bacteriorhodopsin    81             97         81      65
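The effect of this bias can be made concrete with simple arithmetic. Consider a hypothetical predictor whose calls carry no information about the true effect but which predicts "decrease" for a fraction q of mutations. If a fraction p of the tested mutations truly decrease stability, its expected accuracy is pq + (1 - p)(1 - q), and a predictor that always says "decrease" (q = 1) scores exactly p. This sketch assumes the table above gives percentages of mutations that decrease stability; the function name is hypothetical, and this is not a claim about how I-Mutant actually behaves.

```python
def chance_accuracy(frac_decrease_observed: float, frac_decrease_predicted: float) -> float:
    """Expected accuracy (%) when predictions are statistically independent of the true effect."""
    p, q = frac_decrease_observed, frac_decrease_predicted
    return 100.0 * (p * q + (1.0 - p) * (1.0 - q))
```

Plugging in the bacteriorhodopsin percentages (p = 0.81, q = 0.97) gives about 79%, and the rhodopsin percentages (p = 0.63, q = 0.75) give 56.5%, so a data set dominated by destabilizing mutations inflates the accuracy of any method biased toward predicting destabilization.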
Correlation with known data
                     Rhodopsin   Bacteriorhodopsin
I-Mutant             0.11        -0.09
DFIRE                0.16        0.18
FOLDX                0.24        -0.18
The correlations reported for these methods are large (> 0.7)
On the data compared here, the correlations are low
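The table above does not say which correlation coefficient was used; assuming it is the Pearson correlation between predicted and experimentally measured stability changes for the same mutations, it can be computed as sketched below. The function name is hypothetical, and the coefficient is undefined when either series is constant.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

For example, `pearson([1, 2, 3, 4], [1, 3, 2, 4])` is 0.8. Values around 0.1 to 0.2, as in the table, mean the predicted values explain only a few percent of the variance in the measured ones.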
Notes
• Local installations of BLAST and netblast are on cologne:
– /usr1/blast-2.2.13/
– /usr1/netblast-2.2.13/
• Java SDK on Cologne
– /usr1/j2sdk1.4.2_11/
Acknowledgements
Judith Klein-Seetharaman
Christopher Jon Jursa
Pitt Information Sciences
(for developing the web interface)
