Transcript Slides
Genome scale enzyme-metabolite and
drug-target interaction predictions using
the signature molecular descriptor
Faulon, J. L., M. Misra, et al. (2008), Bioinformatics 24(2): 225-33.
05/02/2008
Jae Hyun Kim
Contents
Terminology
Motivation
Method
Molecular Signature
Signature Kernel
Signature Product Kernel
Results
Conclusion
[email protected]
Terminology (1)
Catalyst
Increases the rate of a chemical reaction / biological process
Remains unchanged
Enzyme
Biomolecules that catalyze chemical reactions
Usually proteins
Metabolite
Intermediates & products of metabolism
Restricted to small molecules
Reference:
www.wikipedia.org
Terminology (2)
Inhibitor
Molecules that decrease enzyme activity
Compete with substrates
Most drugs and poisons are inhibitors
Reference:
www.wikipedia.org
Enzyme Commission (EC) Number
EC Number
Numerical classification scheme for enzyme-catalyzed reactions
Four levels of hierarchy
Example: EC 3.4.11.4 : tripeptide aminopeptidases
EC 3 : hydrolases (enzymes that use water to break up some other molecule)
EC 3.4 : hydrolases that act on peptide bonds
EC 3.4.11 : hydrolases that cleave off the amino-terminal amino acid from a polypeptide
EC 3.4.11.4 : hydrolases that cleave off the amino-terminal end from a tripeptide
Reference:
www.wikipedia.org
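The four-level hierarchy can be made concrete with a short sketch. The helper names `ec_levels` and `shared_depth` are hypothetical and chosen only for illustration; the code simply splits the dotted number into its nested classes.

```python
def ec_levels(ec_number):
    """Return the nested class prefixes of an EC number, most general first."""
    parts = ec_number.split(".")
    if not 1 <= len(parts) <= 4:
        raise ValueError("an EC number has between one and four levels")
    return [".".join(parts[:i]) for i in range(1, len(parts) + 1)]


def shared_depth(ec_a, ec_b):
    """Count how many leading hierarchy levels two EC numbers have in common."""
    depth = 0
    for a, b in zip(ec_a.split("."), ec_b.split(".")):
        if a != b:
            break
        depth += 1
    return depth


print(ec_levels("3.4.11.4"))
print(shared_depth("3.4.11.4", "3.4.13.1"))
```

Two enzymes that share more leading levels catalyze more closely related reactions, which is why classification can be evaluated separately at each level.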
Motivation
Genome scale : large-scale
Enzyme-metabolite and drug-target interaction : protein-chemical interaction
Predictions : machine-learning technique
Using the signature molecular descriptor
Molecular Signature
G=(V,E) : Molecular Graph
V : vertex (atom) set
E : edge (bond) set
Atomic Signature
Canonical representation of the subgraph surrounding a particular atom
Includes atoms and bonds up to a predefined distance (height)
Molecular Signature of G : $^{h}\sigma(G)$
$^{h}\sigma_G(x)$ : atomic signature in G rooted at x of height h
Height
Chemicals : 0~6
Protein : 6~18 (amino acid residue 1~7)
Molecular Signature: Example
[Figure: example signatures for glycine, leucine, and isoleucine]
•Depth First Search up to “height” deep
•‘(’ going down, ‘)’ going back up
c_, n_: sp3 carbon/nitrogen atom
c=, o= : sp2 (double-bond) carbon/oxygen atom
h_: hydrogen
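The depth-first construction can be sketched in a few lines. This is a simplified illustration, not the canonicalization used in the paper: it assumes child strings are sorted alphabetically to obtain a canonical order, separates them with spaces, never steps back to the parent atom, and omits hydrogens. The function names, the heavy-atom glycine graph, and the `o_` label (used here by analogy with `c_` and `n_`) are assumptions made for the example.

```python
from collections import Counter


def atomic_signature(labels, adj, root, height, parent=None):
    """Simplified atomic signature: depth-first walk from `root` down to `height`.

    '(' is written when going down a level and ')' when coming back up.
    Sorting the child strings is an assumed stand-in for canonical ordering.
    """
    if height == 0:
        return labels[root]
    children = sorted(
        atomic_signature(labels, adj, nbr, height - 1, root)
        for nbr in adj[root]
        if nbr != parent  # do not walk back along the bond we came from
    )
    if not children:
        return labels[root]
    return labels[root] + "(" + " ".join(children) + ")"


def molecular_signature(labels, adj, height):
    """Count how often each atomic signature occurs in the molecule."""
    return Counter(atomic_signature(labels, adj, atom, height) for atom in labels)


# Glycine, heavy atoms only: N - C(alpha) - C(=O) - O
labels = {0: "n_", 1: "c_", 2: "c=", 3: "o=", 4: "o_"}
adj = {0: [1], 1: [0, 2], 2: [1, 3, 4], 3: [2], 4: [2]}

for atom in labels:
    print(atom, atomic_signature(labels, adj, atom, 1))
print(molecular_signature(labels, adj, 1))
```

Raising `height` makes each atomic signature describe a larger neighbourhood, so the same atom type splits into more distinct signatures.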
Reaction Signature
General form of enzymatic reaction R
$s_1 S_1 + s_2 S_2 + \dots + s_n S_n \rightarrow p_1 P_1 + p_2 P_2 + \dots + p_m P_m$
Height h signature of reaction R
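The formula for the reaction signature did not survive in this transcript. The sketch below assumes the usual reading, in which the signature of a reaction is the stoichiometrically weighted sum of the product signatures minus that of the substrate signatures. The function name and the toy signature counts are hypothetical.

```python
from collections import Counter


def reaction_signature(substrates, products):
    """Assumed definition: sum_j p_j * sig(P_j) - sum_i s_i * sig(S_i).

    `substrates` and `products` are lists of (coefficient, signature) pairs,
    where each signature maps an atomic signature string to its count.
    """
    total = Counter()
    for coeff, sig in products:
        for key, count in sig.items():
            total[key] += coeff * count
    for coeff, sig in substrates:
        for key, count in sig.items():
            total[key] -= coeff * count
    # Atomic signatures untouched by the reaction cancel out and are dropped.
    return {key: value for key, value in total.items() if value != 0}


# Toy height-0 signatures (made-up counts, for illustration only)
substrate = Counter({"c_": 2, "o_": 1})
product = Counter({"c_": 1, "c=": 1, "o=": 1})
print(reaction_signature([(1, substrate)], [(1, product)]))
```

Under this reading, only the atoms whose environment changes contribute, so the reaction signature describes the chemical transformation rather than the molecules themselves.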
Pairwise Kernel
To predict/classify protein-protein
interactions
To measure similarity between two pairs of
proteins
Kernel Function K( (X1,X2), (X’1,X’2) )
How to measure similarity between
pairs?
Kernel Types
Pairwise similarity by component similarity
If X1~X1’ and X2~X2’ then (X1,X2)~(X1’,X2’)
Assess similarity between pairs directly
$x_{12} = (x_{1i} x_{2j} + x_{2i} x_{1j})$ : pairwise representation of (X1, X2)
[Figure labels: similarity inside the pair; similarity between pairs]
From
Ben-Hur, A. and W. S. Noble (2005). "Kernel methods for predicting protein-protein interactions." Bioinformatics 21 Suppl 1: i38-46.
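The pairwise representation above can be checked numerically. The sketch assumes a linear (dot-product) base kernel and the symmetric form K((X1,X2),(X1',X2')) = K(X1,X1')K(X2,X2') + K(X1,X2')K(X2,X1'); the function names and vectors are illustrative only.

```python
def dot(u, v):
    """Linear base kernel between two feature vectors."""
    return sum(a * b for a, b in zip(u, v))


def pairwise_kernel(x1, x2, y1, y2):
    """Pair similarity built from component similarities, symmetric in each pair."""
    return dot(x1, y1) * dot(x2, y2) + dot(x1, y2) * dot(x2, y1)


def pairwise_features(x1, x2):
    """Explicit pair representation with entries x1[i]*x2[j] + x2[i]*x1[j]."""
    n = len(x1)
    return [x1[i] * x2[j] + x2[i] * x1[j] for i in range(n) for j in range(n)]


a, b = (1, 2), (0, 1)
c, d = (2, 1), (1, 1)

k = pairwise_kernel(a, b, c, d)
explicit = dot(pairwise_features(a, b), pairwise_features(c, d))
print(k, explicit)
```

With a linear base kernel, the dot product of the explicit pair features equals twice the kernel value, and swapping the members of a pair leaves the result unchanged.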
Signature Kernel
Definition
Apply to chemicals, proteins, reactions
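The definition itself is missing from this transcript. A minimal sketch, assuming the signature kernel is the dot product of two signature count vectors, is shown below. The function name is hypothetical, and the height-1 counts for glycine and alanine (heavy atoms only) were written by hand for illustration.

```python
def signature_kernel(sig_a, sig_b):
    """Assumed definition: dot product of two signature count vectors.

    Each signature maps an atomic signature string to its number of occurrences.
    """
    return sum(count * sig_b.get(key, 0) for key, count in sig_a.items())


# Hand-written height-1 signatures, heavy atoms only (illustrative)
glycine = {"n_(c_)": 1, "c_(c= n_)": 1, "c=(c_ o= o_)": 1, "o=(c=)": 1, "o_(c=)": 1}
alanine = {"n_(c_)": 1, "c_(c= c_ n_)": 1, "c_(c_)": 1,
           "c=(c_ o= o_)": 1, "o=(c=)": 1, "o_(c=)": 1}

print(signature_kernel(glycine, alanine))
print(signature_kernel(glycine, glycine))
```

Because the kernel only needs count vectors, the same function can be applied to chemical, protein, or reaction signatures.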
Signature Product Kernel (1/2)
P: Protein, C: Chemical
Definition : Signature of Complex PC
Two pairs of P-C interaction (P,C) & (Q,D)
Signature Product Kernel (2/2)
Similarly,
Therefore,
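The equations on these two slides were lost in extraction. The sketch below assumes that the signature of a complex PC is the tensor product of the protein and chemical signatures, in which case the kernel between two complexes factorizes into the product of a protein kernel and a chemical kernel. All names and toy counts are hypothetical.

```python
def dot(sig_a, sig_b):
    """Dot product of two sparse count vectors stored as dicts."""
    return sum(count * sig_b.get(key, 0) for key, count in sig_a.items())


def complex_signature(sig_p, sig_c):
    """Assumed signature of a complex PC: tensor product of sig(P) and sig(C)."""
    return {(kp, kc): vp * vc
            for kp, vp in sig_p.items()
            for kc, vc in sig_c.items()}


def signature_product_kernel(sig_p, sig_c, sig_q, sig_d):
    """Kernel between pairs (P, C) and (Q, D) as k(P, Q) * k(C, D)."""
    return dot(sig_p, sig_q) * dot(sig_c, sig_d)


# Toy signatures (made-up keys and counts)
P, Q = {"a": 2, "b": 1}, {"a": 1, "c": 3}
C, D = {"x": 1, "y": 2}, {"y": 1, "z": 4}

explicit = dot(complex_signature(P, C), complex_signature(Q, D))
factored = signature_product_kernel(P, C, Q, D)
print(explicit, factored)
```

The factored form is the practical one: it never builds the tensor product, whose size is the product of the two signature sizes.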
Signature Kernel : Example (height 1)
# of occurrences
Signature Product Kernel : Example
Signature Similarity vs. Sequence Alignment Scores
• Computed for every pair of amino acids
• Correlation : chemically similar → high BLOSUM62 score
EC Number Classification
Positive Examples: downloaded from KEGG; more than 50, max 500
Negative Examples: equal number, random selection
Signature Kernel, 5-fold CV
Using only reactions
Using only protein sequences
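The sampling and cross-validation setup can be sketched with the standard library. This is one reading of the slide, not the authors' code: it assumes a class is used only if it has more than 50 positives, that positives are capped at 500, and that an equal number of negatives is drawn at random from the remaining examples. Function names and the toy identifiers are hypothetical.

```python
import random


def build_dataset(positives, candidates, min_pos=50, max_pos=500, seed=0):
    """Return (positives, negatives) of equal size, or None if the class is too small."""
    if len(positives) <= min_pos:
        return None
    rng = random.Random(seed)
    pos = list(positives)
    if len(pos) > max_pos:
        pos = rng.sample(pos, max_pos)
    positive_set = set(positives)
    pool = [c for c in candidates if c not in positive_set]
    # rng.sample raises ValueError if the pool holds fewer items than requested.
    neg = rng.sample(pool, len(pos))
    return pos, neg


def k_fold_splits(items, k=5, seed=0):
    """Yield (train, test) lists for k-fold cross-validation."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, folds[i]


positives = [f"P{i}" for i in range(60)]
candidates = positives + [f"N{i}" for i in range(200)]
pos, neg = build_dataset(positives, candidates)
splits = list(k_fold_splits(pos + neg, k=5))
print(len(pos), len(neg), len(splits))
```

Each fold would then be scored with a kernel classifier trained on the other four folds; that step is omitted here.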
EC Classification
•Using both sequences & reactions
•Signature Product Kernel
Class 1
Class 1.1
Class 1.1.1
Class 1.1.1.1
Comparison with other Methods
•Accuracy = (TP+TN)/(TP+TN+FP+FN)
•AUC = Area Under Curve
•Precision = TP/(TP+FP)
•Sensitivity = TP/(TP+FN)
•Specificity = TN/(TN+FP)
•Jaccard Coefficient = TP/(TP+FP+FN)
•A larger number indicates better results
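The count-based measures above translate directly into code. The function name and the example counts are made up for illustration; AUC is left out because it needs ranked scores rather than a single confusion matrix, and zero denominators are not handled.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute the slide's measures from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": tp / (tp + fp),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "jaccard": tp / (tp + fp + fn),
    }


# Made-up counts for illustration
metrics = classification_metrics(tp=40, tn=45, fp=5, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```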
Predicting New Enzyme Interactions
Prediction
EC No. accepted in September 2006 : Test Set
Predict whether or not a given enzyme will catalyze a
given reaction
Signature Product Kernel
Predict DRUGBANK Using KEGG
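•Class I : Both drug and target in training set
•Class II : Both in training set, but with different partners
•Class III : Only the target in training set
•Class IV : Only the drug in training set
•Class V : Neither in training set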
•Signature Product Kernel
Area under ROC = 0.74
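The five classes can be read as a decision rule on a test pair. The sketch below is one interpretation of the slide, assuming Class I means the drug-target pair itself appears in the training set and Class II means both appear there but only with other partners. The function name and identifiers are hypothetical.

```python
def pair_class(drug, target, train_pairs):
    """Assign a test (drug, target) pair to Class I-V relative to the training pairs."""
    train_pairs = set(train_pairs)
    drugs = {d for d, _ in train_pairs}
    targets = {t for _, t in train_pairs}
    if (drug, target) in train_pairs:
        return "I"      # both seen, as partners
    if drug in drugs and target in targets:
        return "II"     # both seen, but with different partners
    if target in targets:
        return "III"    # only the target seen
    if drug in drugs:
        return "IV"     # only the drug seen
    return "V"          # neither seen


# Made-up training pairs for illustration
train = [("drugA", "prot1"), ("drugB", "prot2")]
for pair in [("drugA", "prot1"), ("drugA", "prot2"), ("drugX", "prot1"),
             ("drugB", "protY"), ("drugX", "protY")]:
    print(pair, pair_class(*pair, train))
```

Grouping test pairs this way shows how far a prediction has to generalize beyond the training data.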
Conclusion
Unified method for predicting protein-chemical interactions
Atomistic structure representation of proteins encompasses information stored in substitution matrices.