Characterization of Transmembrane Helices
Madhavi Ganapathiraju
Nov 07, 2005
JKS-seminar
Summary
Classification procedures for TM prediction using the LSA features have been completed
A web tool for TM prediction has been designed; it is being developed by Christopher Jursa
TMPDB, a set of 119 transmembrane proteins, has also been processed and included in the evaluations
KChannelDB, a database of KChannel proteins subdivided into families with 1, 2, 4 and 6 TMs each, has been collected and processed; the first two families (1 and 2 TMs) have been evaluated
Decision tree and support vector machine classifiers have been evaluated
A paper summarizing the work has been written
The Qok values in previous evaluations were found to be incorrect; they have been corrected
Recap: TM prediction method
(A) Map the amino acid sequence to 5 different property sequences
– Example: MDPML… is mapped to property sequences such as -n---p--....R, O..OO and aDDad
(B) Window analysis from left to right: place a moving window at position i, count Ci1, Ci2, …, Ci10, then set i = i+1
– Output: matrix of counts, (L-l+1) x 10
(C) Singular value decomposition (PCA)
– Output: features, (L-l+1) x 4
(D) Neural network (4 input nodes, 1 output node)
– Output: prediction and confidence, Lx1 and Lx1
(E) Hidden Markov model
– Output: prediction, Lx1
(L = length of the protein sequence, l = window length)
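As an illustration of step (B), the window analysis could be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the TMpro code: the slides do not say which 10 symbols are counted, so the count keys (each a hypothetical pair of property-sequence index and symbol) and the toy property sequences are placeholders.

```python
from typing import List, Sequence, Tuple

# Hypothetical count key: (index of the property sequence, symbol to count).
# The actual 10 count categories behind the (L-l+1) x 10 matrix are not listed on the slide.
CountKey = Tuple[int, str]


def window_counts(prop_seqs: Sequence[str], window: int,
                  keys: Sequence[CountKey]) -> List[List[int]]:
    """Slide a window of length `window` over the property sequences from left
    to right and count each keyed symbol at every window position i.

    Returns an (L - window + 1) x len(keys) matrix, where L is the common
    length of the property sequences.
    """
    if not prop_seqs:
        raise ValueError("need at least one property sequence")
    length = len(prop_seqs[0])
    if any(len(s) != length for s in prop_seqs):
        raise ValueError("all property sequences must have the same length L")
    if not 1 <= window <= length:
        raise ValueError("window length must be between 1 and L")

    matrix: List[List[int]] = []
    for i in range(length - window + 1):          # window placed at position i
        row = [prop_seqs[k][i:i + window].count(symbol) for k, symbol in keys]
        matrix.append(row)                        # then move on: i = i + 1
    return matrix


# Toy usage: two property sequences of length L = 6, window l = 3, two keys.
counts = window_counts(["nn--pn", "aDDadd"], window=3, keys=[(0, "n"), (1, "D")])
print(counts)  # [[2, 2], [1, 2], [0, 1], [1, 0]]
```

With 10 such keys, the result has the (L-l+1) x 10 shape described in step (B).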
Neural Net Classifier
The 4 dimensions of the vector obtained by LSA (Dimensions 1 to 4) form the input
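A forward pass through such a classifier can be sketched as below, assuming one hidden layer of tanh units and a sigmoid output. The slide only fixes 4 input nodes and 1 output node, so the hidden-layer size, the activation functions and all weights here are hypothetical.

```python
import math
from typing import Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def nn_forward(features: Sequence[float],
               w_hidden: Sequence[Sequence[float]], b_hidden: Sequence[float],
               w_out: Sequence[float], b_out: float) -> float:
    """Forward pass: 4 LSA inputs -> tanh hidden layer -> 1 sigmoid output.

    Returns a value between 0 and 1 for one window; reading it as a TM score
    is an assumption. Each row of `w_hidden` holds the 4 input weights of one
    hidden unit.
    """
    if len(features) != 4:
        raise ValueError("expected the 4 LSA dimensions as input")
    hidden = [math.tanh(sum(w * x for w, x in zip(row, features)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    return sigmoid(sum(w * h for w, h in zip(w_out, hidden)) + b_out)


# Toy usage with made-up weights (two hidden units).
score = nn_forward([0.3, -1.2, 0.8, 0.05],
                   w_hidden=[[0.5, -0.2, 0.1, 0.3], [-0.4, 0.6, 0.2, -0.1]],
                   b_hidden=[0.0, 0.1], w_out=[1.5, -0.7], b_out=-0.2)
print(round(score, 3))
```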
Decision Tree & SVM Classifiers
Used MATLABarsenal, the wrapper tools developed by Rong (LTI), to assess the performance of the following classifiers on the feature set:
– Decision trees
– SVM (2nd-degree polynomial kernel)
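For reference, a 2nd-degree polynomial kernel and the resulting SVM decision value can be written as below. This is a hedged sketch: the kernel form (gamma * <x, z> + coef0)^2 is the usual one, but the gamma and coef0 defaults and the support-vector inputs are hypothetical, and the settings actually used through MATLABarsenal are not stated on the slide.

```python
from typing import Sequence


def poly2_kernel(x: Sequence[float], z: Sequence[float],
                 gamma: float = 1.0, coef0: float = 1.0) -> float:
    """Second-degree polynomial kernel K(x, z) = (gamma * <x, z> + coef0) ** 2."""
    if len(x) != len(z):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(x, z))
    return (gamma * dot + coef0) ** 2


def svm_decision(x: Sequence[float], support_vectors: Sequence[Sequence[float]],
                 dual_coefs: Sequence[float], bias: float) -> float:
    """Kernel SVM decision value f(x) = sum_i c_i * K(s_i, x) + bias,
    with c_i = alpha_i * y_i; the sign of f(x) gives the predicted class."""
    return sum(c * poly2_kernel(sv, x)
               for sv, c in zip(support_vectors, dual_coefs)) + bias


print(poly2_kernel([1.0, 2.0], [3.0, 4.0]))  # (1*3 + 2*4 + 1) ** 2 = 144.0
```

Expanding (<x, z> + 1)^2 shows why this kernel suits a small input such as the 4 LSA features: it implicitly adds all pairwise products of the input dimensions without computing them explicitly.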
Evaluation Data Sets
Benchmark
– 36 proteins with high-resolution TM information
TMPDB
– 119 proteins of known 3D structure
KChannelDB
– Multiple sequence alignments of KChannel proteins with 1 and 2 TM segments
Results: 36 high-resolution proteins
Segment level: Qok, F, Qobs, Qpred; residue level: Q2, F2T, F2N

# | Method       | Qok | F  | Qobs | Qpred | Q2 | F2T | F2N
1 | TMHMM*       | 71  | 90 | 90   | 90    | 80 | 74  | 77
2 | TMpro (LC)*  | 61  | 94 | 94   | 94    | 76 | ?   | ?
3 | TMpro (HMM)* | 66  | 95 | 97   | 92    | 77 | 76  | 76
4 | TMpro (NN)*  | 75  | 95 | 95   | 94    | 73 | 70  | 75

* Evaluations have been performed by submitting data to the benchmark server
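Because the summary reports that the Qok values had to be corrected, a small reference implementation of the segment-level scores can help when re-checking them. The sketch below is not the benchmark server's code. It assumes that a predicted segment matches an observed one when they share at least 3 residues, that Qobs and Qpred are pooled over all segments in the set, that F is the harmonic mean of Qobs and Qpred, and that a protein counts toward Qok only when every observed and predicted segment is matched and the segment counts agree. The (start, end) segment representation is also an assumption.

```python
from typing import Sequence, Tuple

# Hypothetical representation: (start, end), 0-based, end inclusive.
Segment = Tuple[int, int]


def overlap(a: Segment, b: Segment) -> int:
    """Number of residues shared by two segments."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def n_matched(ref: Sequence[Segment], other: Sequence[Segment], min_overlap: int) -> int:
    """Count segments in `ref` that overlap some segment in `other` by >= min_overlap residues."""
    return sum(1 for r in ref if any(overlap(r, o) >= min_overlap for o in other))


def segment_metrics(proteins: Sequence[Tuple[Sequence[Segment], Sequence[Segment]]],
                    min_overlap: int = 3) -> dict:
    """Segment-level Qok, Qobs, Qpred and F (in %) over a data set.

    `proteins` holds one (observed_segments, predicted_segments) pair per protein.
    """
    n_obs = n_pred = hit_obs = hit_pred = n_ok = 0
    for observed, predicted in proteins:
        m_obs = n_matched(observed, predicted, min_overlap)
        m_pred = n_matched(predicted, observed, min_overlap)
        n_obs += len(observed)
        n_pred += len(predicted)
        hit_obs += m_obs
        hit_pred += m_pred
        # Assumed Qok rule: all segments matched both ways and equal segment counts.
        if m_obs == len(observed) and m_pred == len(predicted) \
                and len(observed) == len(predicted):
            n_ok += 1
    q_obs = 100.0 * hit_obs / n_obs if n_obs else 0.0
    q_pred = 100.0 * hit_pred / n_pred if n_pred else 0.0
    f = 2 * q_obs * q_pred / (q_obs + q_pred) if q_obs + q_pred else 0.0
    q_ok = 100.0 * n_ok / len(proteins) if proteins else 0.0
    return {"Qok": q_ok, "Qobs": q_obs, "Qpred": q_pred, "F": f}


proteins = [
    ([(10, 30), (50, 70)], [(12, 31), (52, 69)]),  # both segments found
    ([(5, 25)], [(20, 40), (60, 80)]),             # one extra predicted segment
]
print(segment_metrics(proteins))
# Qok = 50.0, Qobs = 100.0, Qpred = 75.0, F ≈ 85.7
```

The any-overlap matching does not enforce a one-to-one pairing; the segment-count check in the Qok rule is only a simple guard against one predicted segment covering two observed ones.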
Results: TMPDB
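Segment level: F, Qobs, Qpred; residue level: Q2, F2T, F2N

Method         | F  | Qobs | Qpred | Q2 | F2T | F2N
TMHMM          | 90 | 89   | 90    | 89 | 80  | 90
NN             | 90 | 90   | 89    | 86 | 75  | 90
HMM            | 85 | 90   | 80    | 84 | 74  | 77
SVM            | 93 | 95   | 90    | 84 | 77  | 88
Decision Trees | 92 | 97   | 86    | 83 | 75  | 87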
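Likewise, the residue-level columns can be computed residue by residue. This sketch assumes that Q2 is the percentage of residues correctly labelled as TM or non-TM and that F2T and F2N are the residue-level F-scores for the TM and non-TM classes; the 'T'/'N' label encoding is hypothetical.

```python
from typing import Sequence, Tuple


def f_score(tp: int, n_true: int, n_pred: int) -> float:
    """F-score in %: harmonic mean of recall tp/n_true and precision tp/n_pred,
    which simplifies to 2 * tp / (n_true + n_pred)."""
    return 200.0 * tp / (n_true + n_pred) if n_true + n_pred else 0.0


def residue_metrics(proteins: Sequence[Tuple[str, str]]) -> dict:
    """Residue-level Q2, F2T and F2N (in %) pooled over a data set.

    Each protein is an (observed, predicted) pair of equal-length label strings,
    'T' for a TM residue and 'N' for a non-TM residue (hypothetical encoding).
    """
    total = correct = 0
    tp = {"T": 0, "N": 0}
    n_obs = {"T": 0, "N": 0}
    n_pred = {"T": 0, "N": 0}
    for observed, predicted in proteins:
        if len(observed) != len(predicted):
            raise ValueError("observed and predicted labels must have the same length")
        for o, p in zip(observed, predicted):
            if o not in tp or p not in tp:
                raise ValueError("labels must be 'T' or 'N'")
            total += 1
            n_obs[o] += 1
            n_pred[p] += 1
            if o == p:
                correct += 1
                tp[o] += 1
    q2 = 100.0 * correct / total if total else 0.0
    return {"Q2": q2,
            "F2T": f_score(tp["T"], n_obs["T"], n_pred["T"]),
            "F2N": f_score(tp["N"], n_obs["N"], n_pred["N"])}


print(residue_metrics([("TTTTNNNN", "TTNNNNNN")]))
# Q2 = 75.0, F2T ≈ 66.7, F2N = 80.0
```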
Other things
Processed KChannelDB proteins for evaluation
– Initial evaluations are done, but are not yet ready for discussion
…
TMPro web service
The TMPro website is being developed by Christopher, Dr. Karimi’s student
– It should be up in 2 weeks’ time
Developed standalone versions of the feature processing required by the web service for DT and SVM
Charge-rich proteins
I seem not to have mailed myself the latest figures for this; I will show them separately
Ongoing work
Qok is not high for the TMPDB data set
To address this, error analysis is being performed
– Measure how far the prediction is from the “truth” (what threshold would have classified the segment correctly as TM or non-TM)
– Characterize the misclassified segments
Are they only traditional globular hydrophobic segments? Can aromatic and other properties be used to recover from the errors?
Combination with TMHMM prediction for improved performance
– A rule-based combination using the aromatic property has previously been shown to improve TMHMM predictions on high-resolution proteins (March/June 2005?)
– Do this on the TMPDB set as well
Should other NN architectures be studied? The erroneously predicted TM segments should be studied further using the DT rules that fail on them
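The first error-analysis item, finding what threshold would have classified a segment correctly, could be prototyped as follows. This is a hedged sketch: it assumes per-residue scores between 0 and 1 from the classifier, a segment score taken as the mean over the segment's residues, and a rule that a segment counts as TM when its score is at or above the threshold. None of these choices is stated on the slides, and the function name is hypothetical.

```python
from statistics import mean
from typing import Sequence, Tuple


def threshold_gap(scores: Sequence[float], segment: Tuple[int, int],
                  is_tm: bool, threshold: float = 0.5) -> Tuple[float, float]:
    """For one observed segment (start, end; 0-based, end inclusive),
    return (segment_score, gap).

    segment_score: mean per-residue score over the segment (assumed aggregation).
    gap: distance the threshold would have to move to reach the segment score
    when the segment falls on the wrong side of it; 0.0 if the segment is
    already classified correctly or sits exactly on the threshold.
    """
    start, end = segment
    if not 0 <= start <= end < len(scores):
        raise ValueError("segment out of range")
    score = mean(scores[start:end + 1])
    if is_tm:
        gap = max(0.0, threshold - score)   # TM needs score >= threshold
    else:
        gap = max(0.0, score - threshold)   # non-TM needs score < threshold
    return score, gap


scores = [0.1, 0.2, 0.3, 0.4, 0.9, 0.8, 0.7]
print(threshold_gap(scores, (0, 2), is_tm=True))   # score ≈ 0.2, gap ≈ 0.3
print(threshold_gap(scores, (4, 6), is_tm=True))   # score ≈ 0.8, gap 0.0
```

Collecting the gaps over all misclassified segments would show whether most errors lie just past the threshold or far from it.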