Madhavi_11072005

Characterization of
Transmembrane Helices
Madhavi Ganapathiraju
Nov 07, 2005
JKS-seminar
Summary
Classification procedures for TM prediction using the LSA features have been completed.
A web tool for TM prediction has been designed; it is being developed by Christopher Jursa.
TMPDB, a set of 119 transmembrane proteins, has also been processed and included in the evaluations.
KChannelDB, the database of KChannel proteins subdivided into families of 1, 2, 4 and 6 TMs each, has been collected and processed. The first 2 families have been evaluated.
Decision tree and support vector machine classifiers have been evaluated.
A paper summarizing the work has been written.
The Qok metric was found to be incorrect in previous evaluations; it has been corrected.
Recap: TM prediction method
Example: MDPML…
(A) Map the amino acid sequence to 5 different property sequences:
-n---p--....R
O..OO
aDDad
(B) Window analysis from left to right: place a moving window at position i, count Ci1, Ci2 … Ci10, then set i = i+1. Result: matrix of counts, (L-l+1) x 10.
(C) Singular Value Decomposition (PCA). Result: features, (L-l+1) x 4.
(D) Neural Network (4 input nodes, 1 output node). Result: prediction and confidence, L x 1 and L x 1.
(E) Hidden Markov Model. Result: prediction, L x 1.
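As a rough illustration of steps (B) and (C), the sketch below builds a window-count matrix from property sequences and reduces it with SVD. It is a minimal Python/NumPy sketch under stated assumptions, not the actual implementation: the function names, the toy property symbols, and the choice to run the SVD directly on the toy matrix are all hypothetical. The slides use 5 properties, 10 counts per window and 4 retained dimensions; the toy input is smaller.

```python
import numpy as np

def window_count_matrix(prop_seqs, columns, window):
    """Slide a window over the property sequences and count symbols.

    prop_seqs: equal-length strings, one per property (hypothetical encoding).
    columns:   (property_index, symbol) pairs, one per count column.
    Returns an array of shape (L - window + 1, len(columns)).
    """
    length = len(prop_seqs[0])
    n_windows = length - window + 1
    counts = np.zeros((n_windows, len(columns)))
    for i in range(n_windows):
        for j, (p, symbol) in enumerate(columns):
            counts[i, j] = prop_seqs[p][i:i + window].count(symbol)
    return counts

def svd_features(counts, k=4):
    """Project each window's count vector onto the top-k singular directions."""
    u, s, _ = np.linalg.svd(counts, full_matrices=False)
    return u[:, :k] * s[:k]

# Toy input: two made-up property sequences of length 8, four count columns.
toy_props = ["ppnnppnn", "aaDDaDaD"]
toy_columns = [(0, "p"), (0, "n"), (1, "a"), (1, "D")]
toy_counts = window_count_matrix(toy_props, toy_columns, window=3)
toy_features = svd_features(toy_counts, k=2)
```

With a sequence of length L and a window of length l, the count matrix has L-l+1 rows, which matches the matrix sizes quoted in steps (B) and (C).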
Neural Net Classifier
The 4 dimensions of the vector obtained by LSA (Dimension 1 to Dimension 4) form the input.
Decision Tree & SVM Classifiers
Used MATLABarsenal, the wrapper tools developed by Rong (LTI), to see the performance of classifiers on the feature set
– Decision Trees
– SVM (2nd degree polynomial kernel)
Evaluation Data Sets
Benchmark
– 36 proteins with high-resolution TM information
TMPDB
– 119 proteins of known 3D structure
KChannelDB
– Multiple sequence alignments of KChannel proteins with 1 and 2 TM segments
Results: 36 high-resolution proteins
                        Segment level               Residue level
No.  Method             Qok   F     Qobs  Qpred     Q2    F2T   F2N
1    TMHMM*             71    90    90    90        80    74    77
2    TMpro (LC)*        61    94    94    94        76    ?     ?
3    TMpro (HMM)*       66    95    97    92        77    76    76
4    TMpro (NN)*        75    95    95    94        73    70    75
Evaluations have been performed by submitting data to the benchmark server.
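For reference, a residue-level score such as Q2 can be computed as sketched below, assuming Q2 denotes the usual two-state per-residue accuracy (the percentage of residues labelled correctly as TM or non-TM). The function name and the label strings are illustrative only and are not taken from the benchmark server.

```python
def q2(observed, predicted):
    """Percentage of residues whose two-state label is predicted correctly.

    observed, predicted: equal-length strings, e.g. 'M' for TM and '-' for non-TM
    (this encoding is an assumption made for the example).
    """
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    if not observed:
        raise ValueError("sequences must not be empty")
    matches = sum(o == p for o, p in zip(observed, predicted))
    return 100.0 * matches / len(observed)

# Toy example: the predicted segment is shifted by one residue.
toy_observed = "--MMMM--"
toy_predicted = "-MMMM---"
toy_q2 = q2(toy_observed, toy_predicted)  # 6 of 8 residues agree
```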
Results: TMPDB
                  Segment level         Residue level
Method            F     Qobs  Qpred     Q2    F2T   F2N
TMHMM             90    89    90        89    80    90
NN                90    90    89        86    75    90
HMM               85    90    80        84    74    77
SVM               93    95    90        84    77    88
Decision Trees    92    97    86        83    75    87
Other things

Processed KChannelDB proteins for evaluation
– Initial evaluations are done, but not ready for discussion

…
TMpro web service
The TMpro website is being developed by Christopher, Dr. Karimi’s student
– Should be up in two weeks' time
Developed standalone versions of the feature processing required by the web service for DT and SVM
Charge rich proteins

I seem not to have mailed myself the latest figures for this; I will show them separately.
Ongoing work
Qok is not high for the TMPDB data set
To overcome this, error analysis is being performed
– Measure how far away from “truth” the prediction is (what threshold would have classified the segment correctly as TM or non-TM)
– Characteristics of the misclassified segments: are they only traditional globular hydrophobic segments, and can aromatic and other properties be used to recover from the error?
Combination with TMHMM prediction for improved performance
– Rule-based combination on the aromatic property has previously been shown to improve TMHMM predictions (March/June 2005?) on high-resolution proteins
– Do this on the TMPDB set as well
Should other NN architectures be studied? Misclassified TM segments are to be studied further using the DT rules that fail
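One way to make the "how far away from truth" measurement concrete is sketched below. It rests on a hypothetical decision rule that is not stated in the slides: a segment is called TM when the mean of its per-residue scores reaches a cutoff. Under that assumption, the threshold that would have recovered a missed segment is simply its mean score, and the gap to the cutoff in use measures how badly it was missed. The function names and toy scores are made up for illustration.

```python
def recovering_threshold(scores, start, end):
    """Mean score over residues [start, end).

    Under the assumed rule (segment is TM when its mean score >= cutoff),
    this is the largest cutoff that would still call the segment TM.
    """
    segment = scores[start:end]
    if not segment:
        raise ValueError("empty segment")
    return sum(segment) / len(segment)

def distance_from_truth(scores, start, end, cutoff):
    """Positive when a true TM segment falls short of the cutoff in use."""
    return cutoff - recovering_threshold(scores, start, end)

# Toy per-residue scores; the true TM segment spans residues 2..4.
toy_scores = [0.1, 0.2, 0.4, 0.4, 0.3, 0.1]
toy_gap = distance_from_truth(toy_scores, 2, 5, cutoff=0.5)
```

A small positive gap would suggest a borderline miss that a modest threshold change could fix, while a large gap points to segments that need other properties or rules to recover.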