Transcript Document
Advanced Bioinformatics
Lecture 7: Computer-aided lead identification
ZHU FENG
http://idrb.cqu.edu.cn/
Innovative Drug Research Centre in CQU
Innovative Drug Research and Bioinformatics Laboratory
Table of Contents
1. Schematic of DOCKing
2. Pharmacophore-based docking
3. INVDOCK Strategy
4. Ligand-based drug design
5. Classification of drugs by SVM
2
What is docking?
Given two molecules, find their correct association.
[Figure: molecule + molecule = complex]
Docking computationally predicts the structure of a protein-ligand complex by sampling the ligand's conformations and orientations in the binding site. The orientation (pose) that maximizes the interaction is taken as the predicted structure of the complex.
3
General protein–ligand binding
Ligand
− Molecule that binds with a protein
Protein active site(s)
− Allosteric binding
− Competitive binding
Function of binding interaction
− Natural and artificial
4
Docking strategy
PDB file → Surface Representation → Patch Detection → Matching Patches → Scoring & Filtering → Candidate complexes
5
Schematic of docking methodology
(A) The target binding site is filled with site points.
(B) Distances between atoms in a molecule are matched to the distances between site points.
(C) A transformation matrix is calculated for an orientation.
(D) The molecule is docked into the binding site, and the fit of that conformer is scored.
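As a rough illustration of step (B), the sketch below brute-forces every assignment of three ligand atoms to three site points and keeps those whose internal distances agree within a tolerance. The function name `match_distances` and the 0.5 Å tolerance are hypothetical; DOCK itself uses a more efficient matching scheme, and the transformation matrix of step (C) is not computed here.

```python
import itertools
import math

def match_distances(ligand, sites, k=3, tol=0.5):
    """Brute-force step (B): find assignments of k ligand atoms to k site
    points whose pairwise distances agree within `tol` angstroms.

    ligand, sites: lists of (x, y, z) tuples.
    Returns a list of matches, each a tuple of (atom_index, site_index) pairs.
    """
    matches = []
    for atoms in itertools.combinations(range(len(ligand)), k):
        for spots in itertools.permutations(range(len(sites)), k):
            pairs = list(zip(atoms, spots))
            # Every atom-atom distance must equal the matching site-site distance.
            consistent = all(
                abs(math.dist(ligand[a1], ligand[a2]) - math.dist(sites[s1], sites[s2])) <= tol
                for (a1, s1), (a2, s2) in itertools.combinations(pairs, 2)
            )
            if consistent:
                matches.append(tuple(pairs))
    return matches

# Toy example: a 3-4-5 triangle of ligand atoms, the same triangle translated
# into the binding site, and one distant decoy site point.
ligand = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 4.0, 0.0)]
sites = [(10.0, 10.0, 10.0), (13.0, 10.0, 10.0), (10.0, 14.0, 10.0), (30.0, 30.0, 30.0)]
print(match_distances(ligand, sites))  # → [((0, 0), (1, 1), (2, 2))]
```

The cost grows combinatorially with the numbers of atoms and site points, so this brute-force form is only practical for tiny inputs.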
6
Design of HIV-1 protease inhibitor
Step 1: creation of spheres to fit a cavity
7
Design of HIV-1 protease inhibitor
Step 2: place a ligand to match the position of spheres
8
Design of HIV-1 protease inhibitor
Step 3: check chemical complementarity
9
Scoring in ligand-protein docking
Potential energy description
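A common force-field-style form for such a potential-energy score sums, over ligand-receptor atom pairs, a Lennard-Jones 12-6 term and a Coulomb term, here with a distance-dependent dielectric ε = 4r. The sketch below is a minimal version under stated assumptions: a single pair of A/B coefficients for all atom types, an 8 Å cutoff and the function names are illustrative choices, not the lecture's parameters; real scoring functions use atom-type-specific coefficients.

```python
import math

COULOMB_CONST = 332.0  # kcal*Å/(mol*e^2): converts q_i*q_j/r into kcal/mol

def pair_energy(r, a, b, qi, qj):
    """Lennard-Jones 12-6 term plus Coulomb term with an assumed
    distance-dependent dielectric eps(r) = 4r."""
    vdw = a / r**12 - b / r**6
    elec = COULOMB_CONST * qi * qj / (4.0 * r * r)
    return vdw + elec

def interaction_energy(ligand, receptor, a=1.0e5, b=100.0, cutoff=8.0):
    """Sum pair energies over ligand-receptor atom pairs closer than `cutoff` Å.
    Each atom is ((x, y, z), partial_charge). Lower (more negative) = better.
    Using one (a, b) pair for every atom type is a simplification."""
    total = 0.0
    for pos_i, qi in ligand:
        for pos_j, qj in receptor:
            r = math.dist(pos_i, pos_j)
            if r < cutoff:
                total += pair_energy(r, a, b, qi, qj)
    return total

attractive = interaction_energy([((0, 0, 0), 0.5)], [((4, 0, 0), -0.5)])
repulsive = interaction_energy([((0, 0, 0), 0.5)], [((4, 0, 0), 0.5)])
print(attractive < 0 < repulsive)  # → True
```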
10
Some techniques
Surface representation: efficiently represents the docking surface and identifies the regions of interest
− Connolly surface
− Lenhoff technique, etc.
[Figure panels: dense MS surface (Connolly); sparse surface (Shuo Lin et al.)]
11
Connolly surface
Each atomic sphere is given the van der Waals radius of the atom.
Rolling a probe sphere over the van der Waals surface traces the solvent-excluded surface (contact plus reentrant surface), also called the molecular or Connolly surface.
12
Lenhoff technique
Computes a “complementary” surface for the receptor instead of the Connolly surface, i.e., computes possible positions for the atom centers of the ligand
[Figure: atom centers of the ligand placed outside the receptor's van der Waals surface]
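The idea of enumerating positions where ligand atom centers could sit can be sketched with Shrake-Rupley-style point sampling, used here as a stand-in for Lenhoff's actual algorithm: each receptor atom's sphere is expanded by a probe radius (the ligand atom's radius), and sample points on it are kept unless they fall inside another atom's expanded sphere. The 1.4 Å probe radius, the sampling density and the function names are assumptions for illustration.

```python
import math

def sphere_points(n):
    """n roughly uniform points on the unit sphere (golden-section spiral)."""
    increment = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for k in range(n):
        y = 1.0 - 2.0 * (k + 0.5) / n
        radius = math.sqrt(1.0 - y * y)
        phi = k * increment
        points.append((radius * math.cos(phi), y, radius * math.sin(phi)))
    return points

def ligand_center_positions(atoms, probe=1.4, n=200):
    """Candidate ligand-atom centers around a receptor.

    atoms: list of ((x, y, z), vdw_radius). A point at distance r_i + probe
    from atom i is kept unless it is closer than r_j + probe to another atom j,
    i.e. a ligand atom of radius `probe` could touch atom i there without clashing.
    """
    unit = sphere_points(n)
    result = []
    for i, (ci, ri) in enumerate(atoms):
        expanded = ri + probe
        for u in unit:
            p = tuple(c + expanded * d for c, d in zip(ci, u))
            buried = any(
                math.dist(p, cj) < rj + probe - 1e-9
                for j, (cj, rj) in enumerate(atoms) if j != i
            )
            if not buried:
                result.append(p)
    return result

single = ligand_center_positions([((0.0, 0.0, 0.0), 1.7)], n=50)
pair = ligand_center_positions([((0.0, 0.0, 0.0), 1.7), ((2.0, 0.0, 0.0), 1.7)], n=50)
print(len(single), len(pair))
```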
13
Pharmacophore-based docking
Basic idea
An appropriate spatial arrangement of a small number of functional groups in a molecule is sufficient to achieve a desired biological effect.
Ensemble formation is guided by these functional groups.
14
3-D representation of a protein binding site
[Figure: binding groups with inter-group distances of 6.7, 4.2-4.7, 5.2, 5.1-7.1 and 4.8 Å]
The distances between binding groups (in Ångströms) and the type of interaction are searchable.
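A search of this kind can be sketched as follows: a query lists feature types and allowed distance ranges between them, and a molecule matches if some assignment of its features satisfies every type and range. The three-point query and its ranges below are hypothetical and are not the distances from the figure.

```python
import itertools
import math

# Hypothetical 3-point query: feature types and allowed distance ranges (Å)
# between query points i and j.
QUERY_TYPES = ("donor", "acceptor", "aromatic")
QUERY_RANGES = {(0, 1): (4.0, 5.0), (0, 2): (5.0, 7.0), (1, 2): (4.5, 5.5)}

def matches_query(features, types=QUERY_TYPES, ranges=QUERY_RANGES):
    """features: list of (feature_type, (x, y, z)) for one molecule/conformer.
    True if some ordered choice of features has the query's types and
    satisfies every distance range."""
    for combo in itertools.permutations(features, len(types)):
        if any(f_type != q_type for (f_type, _), q_type in zip(combo, types)):
            continue
        if all(lo <= math.dist(combo[i][1], combo[j][1]) <= hi
               for (i, j), (lo, hi) in ranges.items()):
            return True
    return False

hit = [("donor", (0.0, 0.0, 0.0)), ("acceptor", (4.5, 0.0, 0.0)), ("aromatic", (3.0, 5.0, 0.0))]
miss = [("donor", (0.0, 0.0, 0.0)), ("acceptor", (8.0, 0.0, 0.0)), ("aromatic", (3.0, 5.0, 0.0))]
print(matches_query(hit), matches_query(miss))  # → True False
```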
15
Pharmacophore Fingerprint
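One common way to turn pharmacophore geometry into a fingerprint is to record, for every pair of features, the two feature types together with a binned distance; molecules are then compared with the Tanimoto coefficient. The sketch below assumes a two-point scheme and hypothetical bin edges; the lecture does not specify the exact fingerprint definition.

```python
import itertools
import math

DISTANCE_BIN_EDGES = (2.0, 4.0, 6.0, 8.0, 10.0)  # hypothetical bin edges in Å

def distance_bin(d):
    """Index of the first edge that d does not exceed; past the last edge -> overflow bin."""
    for k, edge in enumerate(DISTANCE_BIN_EDGES):
        if d <= edge:
            return k
    return len(DISTANCE_BIN_EDGES)

def two_point_fingerprint(features):
    """features: list of (feature_type, (x, y, z)).
    Returns the set of 'on' bits, each (type_a, type_b, distance_bin), with the
    two types sorted so that a bit does not depend on feature order."""
    bits = set()
    for (ta, pa), (tb, pb) in itertools.combinations(features, 2):
        a, b = sorted((ta, tb))
        bits.add((a, b, distance_bin(math.dist(pa, pb))))
    return bits

def tanimoto(fp1, fp2):
    """|A ∩ B| / |A ∪ B|; two empty fingerprints count as identical."""
    union = fp1 | fp2
    return len(fp1 & fp2) / len(union) if union else 1.0

mol1 = [("donor", (0.0, 0.0, 0.0)), ("acceptor", (3.0, 0.0, 0.0))]
mol2 = [("acceptor", (0.0, 0.0, 0.0)), ("donor", (0.0, 3.5, 0.0))]
print(two_point_fingerprint(mol1), tanimoto(two_point_fingerprint(mol1), two_point_fingerprint(mol2)))
```

Both toy molecules place a donor and an acceptor in the same distance bin, so their fingerprints coincide even though the coordinates differ.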
16
Schematic of PhDOCK methodology
[Figure: DOCK and PhDOCK workflows compared]
17
Advantages and disadvantages of PhDOCK
Advantages: (1) speed increase from the rapid elimination of ligands containing functional groups that would interfere with binding; (2) speed increase over docking individual molecules one at a time; (3) more information about the entire molecule is retained (no rigid portions); (4) chemical matching and critical clusters are encouraged.
Disadvantages: (1) complex queries are extremely slow; (2) most of the information contained in the target structure is not considered during the search.
18
INVDOCK Strategy
Existing methods: given a protein, find putative binding ligands from a chemical database (given Lock, find Key). Forward lead identification (Science 1992; 257:1078).
INVDOCK methods: given a ligand, find putative protein targets from a protein database (given Key, find Lock). Backward MOA (mechanism of action) prediction (Proteins 1999; 36:1).
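Inverse docking reverses the loop of forward screening: one ligand is docked against every structure in a protein database, and the targets with favourable scores are kept. In the sketch below `dock_score` is a hypothetical placeholder for a real docking program and the score threshold is an assumed parameter; INVDOCK's actual scoring and selection criteria are not described here.

```python
from typing import Callable, Dict, List, Tuple

def inverse_screen(ligand: str,
                   targets: Dict[str, object],
                   dock_score: Callable[[str, object], float],
                   threshold: float) -> List[Tuple[str, float]]:
    """Dock one ligand into every target structure ("given Key, find Lock").
    Keeps targets whose score is at or below `threshold` (more negative means
    stronger predicted binding) and returns them best-first."""
    hits = []
    for target_id, structure in targets.items():
        score = dock_score(ligand, structure)
        if score <= threshold:
            hits.append((target_id, score))
    hits.sort(key=lambda hit: hit[1])
    return hits

# Toy usage: hypothetical targets whose "structure" is just a precomputed score.
toy_targets = {"T1": -8.0, "T2": -12.0, "T3": -2.0}
print(inverse_screen("drug", toy_targets, lambda lig, s: s, threshold=-5.0))
```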
19
INVDOCK Test on Drug Target Prediction
Anticancer Drug Tamoxifen
PDB Id | Protein | Experimental finding
1a25 | Protein kinase C | Secondary target
1a52 | Estrogen receptor | Drug target
1bhs | 17β-hydroxysteroid dehydrogenase (17β-HSD) | Inhibitor
1bld | Basic fibroblast growth factor (bFGF) | Inhibitor
1cpt | Cytochrome P450-TERP | Metabolism
1dmo | Calmodulin | Secondary target
Proteins. 1999; 36:1
Tamoxifen is a widely used anticancer drug for the treatment of breast cancer. In 1998 the FDA approved it as the first cancer-preventive drug, and 30 million people were expected to use it.
20
INVDOCK Test on Drug Target Prediction
Drug Toxicity Targets (J. Mol. Graph. Model. 2001, 20, 199)
Columns:
(a) Number of experimentally confirmed or implicated toxicity targets
(b) Number of toxicity targets predicted by INVDOCK
(c) Number of toxicity targets without structure or involving a covalent bond
(d) Number of toxicity targets missed by INVDOCK
(e) Number of INVDOCK-predicted toxicity targets without experimental finding
Compound | (a) | (b) | (c) | (d) | (e)
Aspirin | 15 | 9 | 2 | 4 | 2
Gentamicin | 17 | 5 | 2 | 10 | 2
Ibuprofen | 5 | 3 | 0 | 2 | 2
Indinavir | 6 | 4 | 0 | 2 | 2
Neomycin | 14 | 7 | 1 | 6 | 6
Penicillin G | 7 | 6 | 0 | 1 | 8
Tamoxifen | 2 | 2 | 0 | 0 | 4
Vitamin C | 2 | 2 | 0 | 0 | 3
Total | 68 | 38 | 5 | 25 | 29
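Read in the order they appear in each row of the toxicity-target table, the first number (known toxicity targets) always equals the sum of the second (predicted by INVDOCK), the third (no structure or covalent binding) and the fourth (missed by INVDOCK). The short script below checks that identity on the transcribed rows, computes the column totals, and derives the fraction of structurally dockable known targets that INVDOCK recovered (38 of 63, where 63 = 68 − 5). The column interpretation is inferred from this arithmetic.

```python
# Rows of the toxicity-target table, numbers in the order they appear:
# known, predicted by INVDOCK, no structure / covalent, missed by INVDOCK,
# predicted without experimental finding.
TABLE = {
    "Aspirin": (15, 9, 2, 4, 2),
    "Gentamicin": (17, 5, 2, 10, 2),
    "Ibuprofen": (5, 3, 0, 2, 2),
    "Indinavir": (6, 4, 0, 2, 2),
    "Neomycin": (14, 7, 1, 6, 6),
    "Penicillin G": (7, 6, 0, 1, 8),
    "Tamoxifen": (2, 2, 0, 0, 4),
    "Vitamin C": (2, 2, 0, 0, 3),
}

def summarize(table):
    """Check known == predicted + no_structure + missed for every compound;
    return the column totals and the fraction of dockable known targets
    (known - no_structure) that INVDOCK recovered."""
    for name, (known, hit, no_structure, missed, _) in table.items():
        if known != hit + no_structure + missed:
            raise ValueError(f"row does not add up: {name}")
    totals = [sum(column) for column in zip(*table.values())]
    known, hit, no_structure, _, _ = totals
    return totals, hit / (known - no_structure)

totals, recovered = summarize(TABLE)
print(totals, round(recovered, 3))  # → [68, 38, 5, 25, 29] 0.603
```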
21
Results of docking studies
The docked (blue) and crystal (yellow) structures of ligands in several PDB ligand-protein complexes. The PDB Id of each structure is shown.
22
Dataset and Testing Results
Protein-protein cases from the protein-protein docking benchmark:
− Enzyme-inhibitor: 22 cases
− Antibody-antigen: 16 cases
Protein-DNA docking: 2 unbound-bound cases
Protein-drug docking: tens of bound cases (estrogen receptor, HIV protease, COX)
Performance: several minutes for large protein molecules and seconds for small drug molecules on a standard PC.
[Figure: estrogen receptor with the estradiol molecule from the complex and the docking solution]
Estrogen receptor with estradiol (1A52): RMSD 0.9 Å, rank 1, running time 11 seconds.
[Figure: endonuclease with DNA and the docking solution]
Endonuclease I-PpoI (1EVX) with DNA (1A73): RMSD 0.87 Å, rank 2.
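The RMSD values quoted above compare a docked pose with the crystal pose. A minimal sketch, assuming both poses list the same atoms in the same order and are already in the receptor's coordinate frame (so no re-superposition is done); symmetry-equivalent atoms are not handled.

```python
import math

def rmsd(docked, crystal):
    """Root-mean-square deviation (Å) between two poses of the same atoms,
    given as lists of (x, y, z) in the same order and the same frame."""
    if not docked or len(docked) != len(crystal):
        raise ValueError("poses must be non-empty and contain the same atoms")
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(docked, crystal))
    return math.sqrt(squared / len(docked))

# Every atom shifted by 1 Å along z gives an RMSD of exactly 1 Å.
print(rmsd([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)], [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]))  # → 1.0
```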
23
Classification of Drugs by SVM
A drug is classified as either belonging (+) or not belonging (−) to a class.
Drug class: inhibitor of a protein, BBB-penetrating, genotoxic, etc.
Protein class: enzyme EC 3.4 family, DNA-binding, etc.
By screening against all classes, the properties of a drug or the function of a protein can be identified.
[Figure: the drug is passed through the SVM of every class; Class-1 SVM: −, Class-2 SVM: +, …, Class-n SVM: −, so the drug belongs to class-2]
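The screening scheme in the figure amounts to a set of one-versus-rest binary classifiers: the drug's feature vector is passed to each class's SVM, and every class with a positive decision value is reported. The linear decision functions below are hypothetical stand-ins for trained SVMs.

```python
from typing import Callable, Dict, List, Sequence

DecisionFunction = Callable[[Sequence[float]], float]

def linear_svm(w: Sequence[float], b: float) -> DecisionFunction:
    """Decision function f(x) = w.x + b of a (hypothetical) trained linear SVM."""
    return lambda x: sum(wi * xi for wi, xi in zip(w, x)) + b

def screen_all_classes(features: Sequence[float],
                       classifiers: Dict[str, DecisionFunction]) -> List[str]:
    """One-versus-rest screening: report every class whose SVM says '+'."""
    return [name for name, f in classifiers.items() if f(features) > 0]

classifiers = {
    "Class-1": linear_svm([1.0, 0.0], -5.0),
    "Class-2": linear_svm([0.0, 1.0], -1.0),
    "Class-n": linear_svm([-1.0, -1.0], 0.0),
}
print(screen_all_classes([2.0, 3.0], classifiers))  # → ['Class-2']
```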
24
Classification of drugs by SVM
What is SVM?
• Support vector machines (SVMs): a machine-learning method from artificial intelligence, grounded in statistical learning, that learns by example to classify objects into one of two classes.
Advantages of SVM:
• Class members may be structurally diverse (no bias toward a particular chemical scaffold).
• Uses structure-derived physicochemical features as the basis for drug classification (the algorithm does not require structural similarity).
25
Artificial Intelligence (AI)
26
Machine learning method
Inductive learning (example-based learning)
27
Machine learning method
Feature vectors
A = (1, 1, 1)
B = (0, 1, 1)
C = (1, 1, 1)
D = (0, 1, 1)
E = (0, 0, 0)
F = (1, 0, 1)
28
Machine learning method
Feature vectors in input space
[Figure: feature vectors A-F plotted as points in a 3-D input space with axes X, Y and Z]
29
SVM Method
[Figure: drug family members and nonmembers separated by a border in the input space, and by a new border after projection]
Project to a higher-dimensional space
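A one-dimensional toy case shows why projection helps: family members sit between nonmembers on a line, so no single cut point separates them, but after mapping each x to (x, x²) a straight line does. The data and the map φ(x) = (x, x²) are illustrative assumptions.

```python
def phi(x):
    """Project a 1-D point into 2-D: x -> (x, x**2)."""
    return (x, x * x)

def decision(z, w, b):
    """Linear decision value w.z + b in the projected space."""
    return sum(wi * zi for wi, zi in zip(w, z)) + b

def separable_by_one_cut(pos, neg):
    """In 1-D a linear border is a single cut point: separable only if one
    class lies entirely to the left of the other."""
    return max(pos) < min(neg) or max(neg) < min(pos)

members = [-1.0, 0.0, 1.0]   # family members sit in the middle
nonmembers = [-3.0, 3.0]     # nonmembers on both sides
w, b = (0.0, -1.0), 4.0      # border x**2 = 4 in the projected space

print(separable_by_one_cut(members, nonmembers))           # → False
print(all(decision(phi(x), w, b) > 0 for x in members))     # → True
print(all(decision(phi(x), w, b) < 0 for x in nonmembers))  # → True
```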
30
SVM Method
[Figure: the new border separating protein family members from nonmembers, with the support vectors marked]
31
SVM Method
[Figure: protein family members and nonmembers, the new border, and the support vectors]
32
Best Linear Separator?
33
Find closest points in convex hulls
[Figure: closest points c and d of the two classes' convex hulls]
34
Plane bisects the closest points
[Figure: plane bisecting the segment between the closest points c and d]
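Assuming the closest points c and d of the two convex hulls are already known (finding them requires solving a small quadratic program, not shown), the bisecting plane has normal w = c − d and passes through the midpoint of c and d. A minimal sketch with illustrative points:

```python
def bisecting_plane(c, d):
    """Plane w.x + b = 0 that perpendicularly bisects the segment from d to c:
    w = c - d and b = -w.(c + d)/2, so c lies on the positive side."""
    w = [ci - di for ci, di in zip(c, d)]
    midpoint = [(ci + di) / 2.0 for ci, di in zip(c, d)]
    b = -sum(wi * mi for wi, mi in zip(w, midpoint))
    return w, b

def classify(x, w, b):
    """+1 for c's class, -1 for d's class (points exactly on the plane go to -1)."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1

w, b = bisecting_plane(c=(2.0, 2.0), d=(0.0, 0.0))
print(w, b)                                                     # → [2.0, 2.0] -4.0
print(classify((3.0, 3.0), w, b), classify((-1.0, 0.0), w, b))  # → 1 -1
```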
35
Best Linear Separator
Supporting plane method
Maximize the distance between two parallel supporting planes, $w \cdot x + b = +1$ and $w \cdot x + b = -1$
Distance = “Margin” = $\dfrac{2}{\lVert w \rVert}$
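The margin of a candidate separator can be checked directly: every point must satisfy y(w·x + b) ≥ 1, i.e., lie on or beyond its supporting plane, and the margin is then 2/‖w‖. The sketch below only evaluates given candidates; it does not solve the optimization, and the toy points are illustrative.

```python
import math

def margin_if_valid(points, labels, w, b):
    """Return the margin 2/||w|| if every point satisfies y*(w.x + b) >= 1,
    i.e. lies on or beyond its supporting plane w.x + b = +1 or -1; else None."""
    for x, y in zip(points, labels):
        if y * (sum(wi * xi for wi, xi in zip(w, x)) + b) < 1:
            return None
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))

points = [(2.0, 2.0), (3.0, 3.0), (0.0, 0.0), (-1.0, 0.0)]
labels = [1, 1, -1, -1]
wide = margin_if_valid(points, labels, (0.5, 0.5), -1.0)     # 2*sqrt(2) ≈ 2.83
narrow = margin_if_valid(points, labels, (1.0, 1.0), -2.0)   # sqrt(2) ≈ 1.41
invalid = margin_if_valid(points, labels, (0.5, 0.5), -1.5)  # (2, 2) violates its plane
print(wide, narrow, invalid)
```

For these points, w = (0.5, 0.5), b = −1 gives a margin of 2√2, equal to the distance between the closest points (2, 2) and (0, 0), and its separating plane x + y = 2 is exactly the plane bisecting them.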
36
Best Linear Separator
Supporting plane method
37
SVM Method
Border line is nonlinear
38
SVM Method
Non-linear transformation: use of a kernel function
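A kernel computes the inner product in the projected space without building the projection explicitly. The sketch below checks this for the homogeneous degree-2 polynomial kernel K(x, z) = (x·z)², whose explicit feature map for 2-D inputs is φ(x) = (x₁², √2·x₁x₂, x₂²), and also defines the Gaussian (RBF) kernel; the γ value is an arbitrary assumption.

```python
import math

def poly2_kernel(x, z):
    """Homogeneous degree-2 polynomial kernel K(x, z) = (x . z)**2."""
    return sum(a * b for a, b in zip(x, z)) ** 2

def phi2(x):
    """Explicit feature map for 2-D inputs with phi2(x) . phi2(z) == poly2_kernel(x, z)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)

def rbf_kernel(x, z, gamma=0.5):
    """Gaussian (RBF) kernel exp(-gamma * ||x - z||**2); gamma is an assumed value."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

x, z = (1.0, 2.0), (3.0, -1.0)
explicit = sum(a * b for a, b in zip(phi2(x), phi2(z)))
print(poly2_kernel(x, z), explicit)  # both equal 1 (up to rounding)
```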
39
SVM Method
40
SVM Method
41
SVM Method
42
SVM Method
43
SVM Method
44
SVM for classification of drugs
How to represent a drug?
• Each structure is represented by a specific feature vector assembled from structural and physicochemical properties:
− Simple molecular properties (molecular weight, number of rotatable bonds, etc.; 18 in total)
− Molecular connectivity and shape (28 in total)
− Electro-topological state polarity (84 in total)
− Quantum chemical properties (electric charge, polarizability, etc.; 13 in total)
− Geometrical properties (molecular size vector, van der Waals volume, molecular surface, etc.; 16 in total)
J. Chem. Inf. Comput. Sci. 44, 1630 (2004)
J. Chem. Inf. Comput. Sci. 44, 1497 (2004)
Toxicol. Sci. 79, 170 (2004)
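The descriptor groups listed above add up to 18 + 28 + 84 + 13 + 16 = 159 features per compound. SVMs are sensitive to feature scale, so descriptors are commonly rescaled before training; the min-max scaling below is an assumed preprocessing step, not one stated in the lecture, and the group names are paraphrased.

```python
# Descriptor group sizes listed on the slide (group names paraphrased).
DESCRIPTOR_GROUPS = {
    "simple molecular properties": 18,
    "molecular connectivity and shape": 28,
    "electro-topological state": 84,
    "quantum chemical properties": 13,
    "geometrical properties": 16,
}
N_FEATURES = sum(DESCRIPTOR_GROUPS.values())

def min_max_scale(vectors):
    """Rescale each descriptor (column) to [0, 1] over the training set.
    Constant columns carry no information and are mapped to 0.0."""
    columns = list(zip(*vectors))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, lo, hi in zip(vec, lows, highs)]
        for vec in vectors
    ]

print(N_FEATURES)                                # → 159
print(min_max_scale([[0, 10, 5], [10, 20, 5]]))  # → [[0.0, 0.0, 0.0], [1.0, 1.0, 0.0]]
```

In practice the minima and maxima taken from the training set would be reused to scale new compounds before classification.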
45
SVM-based drug design and property prediction software
[Figure: workflow. Your drug's chemical structure is either input on a local machine loaded with SVMProt or sent through the internet to the classifier (options one and two). An SVM classifier for every drug class answers "Which class does your drug belong to?", and the identified classes are used to design a drug or predict a property.]
46
SVM drug prediction results
Protein inhibitor/activator/substrate prediction
• 86% of 129 estrogen receptor activators and 84% of 101 nonactivators correctly predicted
• 81% of 116 P-glycoprotein substrates and 79% of 85 nonsubstrates correctly predicted
Drug toxicity prediction
• 97% of 102 TdP+ and 84% of 243 TdP− agents (agents that do or do not cause torsades de pointes) correctly predicted
• 73% of 229 genotoxic and 93% of 631 non-genotoxic agents correctly predicted
Pharmacokinetics prediction
• 95% of 276 BBB+ and 82% of 139 BBB− agents (blood-brain barrier penetrating or not) correctly predicted
• 90% of 131 human intestinal absorption and 80% of 65 non-absorption agents correctly predicted
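Each result above reports sensitivity (accuracy on positives) and specificity (accuracy on negatives) separately. Assuming the percentages are exact, the overall accuracy follows as Q = (sens·N₊ + spec·N₋)/(N₊ + N₋); for the estrogen-receptor data this is (0.86·129 + 0.84·101)/230 ≈ 0.85. Because the reported percentages are rounded, these derived values are approximate.

```python
def overall_accuracy(sensitivity, n_pos, specificity, n_neg):
    """Q = (TP + TN) / (N+ + N-), with TP = sensitivity * N+ and TN = specificity * N-."""
    return (sensitivity * n_pos + specificity * n_neg) / (n_pos + n_neg)

# (sensitivity, N+, specificity, N-) as reported on the slide.
RESULTS = {
    "estrogen receptor activators": (0.86, 129, 0.84, 101),
    "P-glycoprotein substrates": (0.81, 116, 0.79, 85),
    "TdP": (0.97, 102, 0.84, 243),
    "genotoxicity": (0.73, 229, 0.93, 631),
    "BBB penetration": (0.95, 276, 0.82, 139),
    "human intestinal absorption": (0.90, 131, 0.80, 65),
}

for name, (se, n_pos, sp, n_neg) in RESULTS.items():
    print(f"{name}: Q ≈ {overall_accuracy(se, n_pos, sp, n_neg):.2f}")
```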
47
Projects Q&A!
1. Biological pathway simulation
2. Computer-aided anti-cancer drug design
3. Disease-causing mutation on drug target
Any questions? Thank you!
48