An ILP Approach to Model and
Classify Hexose Binding Sites
Houssam Nassif, Hassan Al-Ali, Sawsan Khuri, Walid
Keirouz, and David Page
Problem Description
Hexoses play a key role in many cellular pathways
Hexose binding properties are of great interest to biomedical
researchers
Current protein-sugar computational models are based on
prior biochemical knowledge
Goal: Investigate the empirical support for biochemical
findings by comparing ILP-induced rules to actual
biochemical results
Biochemical Review
An amino acid consists of:
a central carbon atom C
an amino group NH2
a carboxyl group COOH
a hydrogen atom H
and a side chain R
Amino acids differ by their side chain R
The side chain confers on each amino acid its distinctive properties
There are 20 different amino acids, also called residues
Biochemical Review
A protein is a long chain of amino acids linked together
Residue sequence determines protein shape and function
Similar residues can be easily interchanged in a protein
Biochemical Review
Hexoses are 6-carbon sugar molecules
Hexoses consist of:
A core pyranose ring
Hydroxyl (-OH) groups sticking out
Biochemical Review
Interaction forces between hexoses and residues are due to:
Charge
Hydrogen bond
Hydrophobicity
Pyranose ring: apolar (no charge), hydrophobic (water hating)
Hydroxyl (-OH) groups: polar (negative charge), hydrophilic (water loving), form hydrogen bonds
Prior Biochemical Findings
Planar polar residues (Asn, Asp, Gln, Glu, Arg) are
frequently involved in hydrogen bonding.
The aromatic residues (Trp, Tyr, Phe, His) stack
against the apolar surface of the sugar pyranose ring.
Planar polar and aromatic residues are present at higher
frequencies in hexose binding sites.
Ordered water molecules and metal ions are involved in
binding specificity and affinity.
Hexoses and their binding sites are neither hydrophobic nor
hydrophilic. They exhibit both properties in a dual nature.
Predicting Sugar Binding Sites
Prior biochemical findings have been incorporated in binding
site classifiers
Black box models
We take the opposite approach:
Given hexose binding sites data, what biochemical rules
can we extract with no prior biochemical knowledge?
Dataset
Mine Protein Data Bank for galactose, glucose and mannose
Remove redundancies and keep proteins with a hexose
docked in binding site
Get 80 protein-hexose binding sites (positive set)
Extract 80 negatives: non-hexose binding sites and
non-binding surface grooves
Total of 160 entries, equally divided.
Binding Site Representation
Only the few atoms present at the binding site determine the
binding site affinity.
We define the binding site as a sphere of radius 10 Å,
centered at the binding site center
We extract all atoms in this sphere
We ONLY consider atoms, not residues
For every atom, we compute its charge, hydrogen bonding,
and hydrophobicity properties
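The sphere-based extraction described above can be sketched as follows. This is a minimal illustration, not the authors' code: the atom records, the `atoms_in_sphere` helper, and the choice to include atoms lying exactly on the boundary are all assumptions made for the example.

```python
import math

RADIUS = 10.0  # sphere radius in Å, as stated on the slide


def atoms_in_sphere(atoms, center, radius=RADIUS):
    """Keep the atoms whose coordinates lie within `radius` of `center`.

    `atoms` is a list of dicts with a "name" and an "xyz" tuple
    (a hypothetical record layout chosen for this sketch).
    """
    return [a for a in atoms if math.dist(a["xyz"], center) <= radius]


# Hypothetical atoms around a binding site centered at the origin.
atoms = [
    {"name": "OD1", "xyz": (1.0, 2.0, 2.0)},  # 3 Å from the center
    {"name": "ND1", "xyz": (3.0, 4.0, 0.0)},  # 5 Å from the center
    {"name": "CA", "xyz": (9.0, 9.0, 9.0)},   # about 15.6 Å, outside
]
site_atoms = atoms_in_sphere(atoms, (0.0, 0.0, 0.0))
site_names = [a["name"] for a in site_atoms]
```

Only the first two atoms survive the filter; residue membership plays no role at this step.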
Problem Formulation
Use Aleph heuristic search to learn first-order rules
Estimate the performance using 10-fold cross-validation
The consequent of any rule is bind(A), where A is predicted to
be a hexose binding site
Restrict clause length to a maximum of 8 literals
Tolerate a clause coverage of up to 5 training-set negatives
Minimize the cost function:
cost = (# covered negatives) − (# covered positives)
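A small sketch of how the two constraints and the cost function above interact when ranking candidate clauses. The candidate tuples and helper names are hypothetical, and whether the clause head counts toward the 8-literal limit is not specified on the slide; this sketch simply compares a given literal count against the limit.

```python
MAX_LITERALS = 8   # maximum clause length from the slide
MAX_NEG_COVER = 5  # tolerated number of covered training-set negatives


def clause_cost(pos_covered, neg_covered):
    """cost = (# covered negatives) - (# covered positives); lower is better."""
    return neg_covered - pos_covered


def is_admissible(n_literals, neg_covered):
    """Check the length and negative-coverage constraints."""
    return n_literals <= MAX_LITERALS and neg_covered <= MAX_NEG_COVER


# Hypothetical candidates: (label, literal count, positives, negatives).
candidates = [
    ("c1", 4, 20, 3),
    ("c2", 9, 30, 1),  # rejected: too many literals
    ("c3", 5, 14, 0),
    ("c4", 3, 25, 6),  # rejected: covers too many negatives
]
admissible = [c for c in candidates if is_admissible(c[1], c[3])]
best = min(admissible, key=lambda c: clause_cost(c[2], c[3]))
```

Here `c1` wins with a cost of -17, ahead of `c3` at -14, even though `c3` covers no negatives.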
Literals
Individual atom literal:
point(A,B,C,D,E,F,G,H,I,J)
A: binding site
B: atom number
C, D, E: Cartesian coordinates
F: charge value (nominal)
G: hydrogen bonding value (nominal)
H: hydrophobicity value (nominal)
I, J: atomic element and its name
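To make the ten arguments concrete, the literal can be mirrored as a small record type. This is an illustrative sketch only: the field names and the nominal values used in the example ("neg", "acceptor", "hydrophilic") are assumptions, since the slides do not list the actual value vocabulary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    site: str     # A: binding site
    atom_id: int  # B: atom number
    x: float      # C: Cartesian x
    y: float      # D: Cartesian y
    z: float      # E: Cartesian z
    charge: str   # F: nominal charge value
    hbond: str    # G: nominal hydrogen bonding value
    hydro: str    # H: nominal hydrophobicity value
    element: str  # I: atomic element
    name: str     # J: atom name

    def as_literal(self):
        """Render the record in the point(A,...,J) argument order."""
        args = (self.site, self.atom_id, self.x, self.y, self.z,
                self.charge, self.hbond, self.hydro, self.element, self.name)
        return "point(" + ",".join(str(a) for a in args) + ")"


# Hypothetical atom; all values are made up for illustration.
p = Point("s1", 12, 1.5, -2.0, 3.25, "neg", "acceptor", "hydrophilic", "o", "od1")
literal = p.as_literal()
```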
Literals
Distance between two atoms literal:
dist(A,B1,B2,M,N)
A: binding site
B1, B2: the two atom numbers
M: their Euclidean distance
N: the error, resulting in M±N (set to 0.5 Å)
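The M±N matching can be sketched as a tolerance check on the Euclidean distance. The function name `dist_holds` and the coordinates are hypothetical; only the 0.5 Å tolerance comes from the slide.

```python
import math

TOLERANCE = 0.5  # N, in Å, as set on the slide


def dist_holds(p, q, m, n=TOLERANCE):
    """True if the distance between points p and q lies within m ± n."""
    return abs(math.dist(p, q) - m) <= n


# Two hypothetical atoms exactly 5 Å apart (a 3-4-5 triangle).
a1 = (0.0, 0.0, 0.0)
a2 = (3.0, 4.0, 0.0)

matches_5_24 = dist_holds(a1, a2, 5.24)  # |5.00 - 5.24| = 0.24, within 0.5
matches_3_41 = dist_holds(a1, a2, 3.41)  # |5.00 - 3.41| = 1.59, outside 0.5
```

With this tolerance, a learned distance such as 5.24 Å effectively stands for the interval from 4.74 Å to 5.74 Å.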
Results
32.5% error rate over 10 folds (p-value < 0.0002)
comparable to other general sugar binding site classifiers
Although we only consider atoms, we can infer valuable
information regarding amino acids
Example: ND1 atoms are only present in His residues. A
rule requiring ND1 is actually requiring His.
We present the rules’ English translations, with residue
substitution, sorted by coverage
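The residue substitution step can be sketched as a lookup from atom name to the residues that contain an atom of that name. The table below is a partial, hand-written excerpt based on standard PDB atom naming, not the authors' mapping; the helper names are hypothetical.

```python
# Partial atom-name -> residue table (standard PDB atom names).
ATOM_TO_RESIDUES = {
    "ND1": {"His"},
    "OD1": {"Asp", "Asn"},
    "OE1": {"Glu", "Gln"},
    "NE2": {"Gln", "His"},
    "CG1": {"Val", "Ile"},
}


def residues_for(atom_name):
    """Residues that can carry the given atom name (empty set if unlisted)."""
    return ATOM_TO_RESIDUES.get(atom_name, set())


def pins_down_residue(atom_name):
    """True when the atom name occurs in exactly one residue type."""
    return len(residues_for(atom_name)) == 1


nd1_residues = residues_for("ND1")  # ND1 identifies His on its own
od1_residues = residues_for("OD1")  # OD1 is ambiguous: Asp or Asn
```

An ambiguous atom name is why some translated rules read "Asp or Asn" rather than naming a single residue.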
A is a hexose-binding site if:
1. It has a Trp residue and a Glu with an OE1 atom that is
8.53 Å away from a negatively charged Oxygen.
[Pos cover = 22, Neg cover = 4]
2. It has a Phe or Tyr residue and an Asp with an OD1
atom that is 5.24 Å away from an Asp or Asn’s OD1.
[Pos cover = 21, Neg cover = 3]
3. It has a branching aliphatic residue (Leu, Val, Ile), an
Asp and an Asn. Asp and Asn’s OD1 atoms are 3.41 Å
away.
[Pos cover = 15, Neg cover = 0]
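As an illustration of how such a translated rule could be checked against a site, here is one possible reading of rule 3 in code. It is a sketch under assumptions: the atoms carry a residue tag (the learned rules themselves only see atoms), the 0.5 Å distance tolerance from the dist literal is applied, and the example site is entirely made up.

```python
import math
from itertools import product

BRANCHING_ALIPHATIC = {"Leu", "Val", "Ile"}


def rule3(atoms, target=3.41, tol=0.5):
    """One reading of rule 3: a branching aliphatic residue is present, and
    some Asp OD1 lies within target ± tol Å of some Asn OD1."""
    if not BRANCHING_ALIPHATIC & {a["res"] for a in atoms}:
        return False
    asp = [a["xyz"] for a in atoms if a["res"] == "Asp" and a["name"] == "OD1"]
    asn = [a["xyz"] for a in atoms if a["res"] == "Asn" and a["name"] == "OD1"]
    return any(abs(math.dist(p, q) - target) <= tol for p, q in product(asp, asn))


# Hypothetical site: the two OD1 atoms are 3.4 Å apart.
site = [
    {"res": "Leu", "name": "CD1", "xyz": (5.0, 5.0, 5.0)},
    {"res": "Asp", "name": "OD1", "xyz": (0.0, 0.0, 0.0)},
    {"res": "Asn", "name": "OD1", "xyz": (3.4, 0.0, 0.0)},
]
with_leu = rule3(site)
without_leu = rule3(site[1:])  # same site with the Leu atom removed
```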
A is a hexose-binding site if:
4. It has a hydrophilic non-hydrogen bonding Nitrogen atom (Pro,
Arg, His) that is 7.95 Å away from a His ND1
nitrogen, and 9.60 Å away from a branching aliphatic residue’s CG1.
[Pos cover = 10, Neg cover = 0]
5. It has a hydrophobic CD2 atom, a hydrophilic Pro backbone or His
ND1 nitrogen and two Glu (or two Gln) distant by 11.89 Å.
[Pos cover = 11, Neg cover = 2]
6. It has an Asp B, two identical atoms Q and X, and a hydrophilic
hydrogen-bonding atom K. Atoms K, Q and X have the same charge.
B’s ODE1 oxygen shares the same Y-coordinate with K and the same Z-coordinate with Q. Atom X is 8.29 Å away from atom K.
[Pos cover = 8, Neg cover = 0]
A is a hexose-binding site if:
7. It has a Ser, and two Gln and/or His, with NE2 atoms
that are 3.88 Å apart.
[Pos cover = 8, Neg cover = 2]
8. It has an Asn and a Phe, Tyr or His residue, with a
CE1 atom that is 7.07 Å away from a Calcium.
[Pos cover = 5, Neg cover = 0]
9. It has a Lys or Arg, a Phe or Tyr, a Pro or His, and
a Sulfate or a Phosphate.
[Pos cover = 3, Neg cover = 0]
Discussion
We infer most of the established biochemical information
Rules 1 and 2, with highest coverage, rely on the aromatic residues
Trp, Tyr, and Phe. The fourth aromatic residue, His, is
mentioned in many different rules.
This highlights the docking interaction between the hexose and
the aromatic residues.
All rules require the presence of a planar polar residue (Asn,
Asp, Gln, Glu, Arg).
These residues are most frequently involved in hexose hydrogen bonding.
Discussion
The residues most often mentioned in the rules are aromatic and
planar polar, which mirrors the fact that they are present at
higher frequencies in hexose binding sites
Rule 5 requires both a hydrophobic and a hydrophilic
element. It reflects the dual nature of hexose docking.
Rules 8 and 9 require the presence of different ions
(Calcium, Sulfate, Phosphate), confirming the relevance of
ions in hexose binding.
New discovery?
Rule 2 suggests a dependency between Phe/Tyr and
Asn/Asp. Such a relation has been proven in lectins.
Similarly, rule 1 suggests a dependency between Trp and
Glu.
A link not previously identified in the literature
Further investigation is needed to confirm this finding
Conclusion
ILP achieves an accuracy similar to that of other general sugar
black-box classifiers
In addition, it offers insight into the discriminating process.
Aleph was able to induce most of the known hexose-protein
interaction biochemical rules.
ILP finds a previously unreported dependency between Trp
and Glu.