Chemoinformatics approaches to virtual
screening and in silico design
Alexandre Varnek
Laboratoire d’Infochimie, Université de Strasbourg
http://infochim.u-strasbg.fr/
Strasbourg
Paris
Laboratory of Chemoinformatics
Master on Chemoinformatics
(since 2002)
Chemoinformatics:
a new discipline combining several "old" fields
Chemical databases,
QSAR,
Virtual screening,
In silico design,
...
OUTLOOK
• The need for chemoinformatics
• Fundamentals of chemoinformatics
• Some applications
Chemoinformatics: why?
• Amount of information
many millions of compounds and reactions
many millions of publications
Storage, organization and search
experimental data
Chemical Databases
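[Figure: sizes of chemical databases in May 2009 and September 2010; the database labels were lost in extraction. Counts shown: 54,984,228; 62,105,511; 39,804,330; 281,474; 43,995,234; 831,886. Increases shown: +7 M, +2 M, +22 M]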
Problem: Flood of Information
• > 5 million new compounds / year
• 800,000 publications / year
• > 54 million compounds
[Figure: number of structures vs. year, 1965 to 2000; y-axis from 0 to 30,000,000]
=> Can anyone read 4,000 publications / day?
Chemical information should be well organized and searchable
Problem: Not Enough Information
• > 54,000,000 chemical compounds
• > 500,000 3D structures in the Cambridge Crystallographic File: about 1 % of all compounds
• 230,000 infrared spectra in the largest database (Bio-Rad): 0.4 % of all compounds
What about physico-chemical and biological properties?
The goal of chemoinformatics is to develop predictive approaches and tools
Chemoinformatics as a modeling discipline
• What structure do I need for a certain property? => structure-activity relationships
• How do I make this structure? => synthesis design
• What is the product of my reaction? => reaction prediction, structure elucidation
Theoretical chemistry
Quantum Chemistry
Force Field
Molecular Modelling
Chemoinformatics
- Molecular model
- Basic concepts
- Major applications
- Learning approaches
Molecular Model
Quantum Chemistry: electrons and nuclei
Force Field Molecular Modelling: atoms and bonds
Chemoinformatics: molecular graph, descriptor vector
Basic mathematical approaches
Quantum Chemistry: Schrödinger equation, HF, DFT, …
Force Field Molecular Modelling: classical mechanics, statistical mechanics
Chemoinformatics: graph theory, statistical learning theory
Basic concepts
Quantum Chemistry: wave/particle dualism
Force Field Molecular Modelling: classical mechanics
Chemoinformatics: chemical space
Chemical space = objects + metrics
• Objects:
- molecular graphs;
- descriptor vectors {Di} = f(molecular structure)
• Metrics:
- graph hierarchy,
- similarity measures
[Figure: example structure drawings omitted]
Navigation in Chemical Space:
topological space of chemical structures
Relationships between the
objects:
• Hierarchical scaffold-tree approach
• Structural mutation rules
• Network-like Similarity Graphs
• Combinatorial Analog Graphs
• ………….
=> Rational organisation of structural data
=> Exploration of the chemical space
=> Identification of new objects (e.g., active scaffolds, R-group combinations, etc.)
Navigation in Chemical Space:
vectorial space defined by molecular descriptors
Relationships between the objects:
In this space, each molecule is represented as a vector
whereas the metric is defined by similarity measures.
=> In properly selected spaces, neighboring molecules possess similar properties.
=> Different databases can be compared.
=> Compound subsets for screening can be rationally selected.
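To make the idea of a metric in descriptor space concrete, here is a minimal sketch that ranks a small library by similarity to a query. It assumes each molecule is already encoded as a set of "on" fingerprint bits and uses the Tanimoto coefficient as the similarity measure; the molecule names and bit values are invented for illustration, and the slides do not say which measure the author uses.

```python
def tanimoto(a, b):
    """Tanimoto coefficient between two sets of 'on' fingerprint bits."""
    if not a and not b:
        return 1.0  # convention: two empty fingerprints are identical
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def nearest_neighbours(query, library, k=2):
    """Return the k library entries most similar to the query."""
    ranked = sorted(
        library.items(),
        key=lambda item: tanimoto(query, item[1]),
        reverse=True,
    )
    return [(name, round(tanimoto(query, bits), 3)) for name, bits in ranked[:k]]


# Hypothetical fingerprints (sets of bit positions), for illustration only.
library = {
    "mol_A": {1, 4, 7, 9},
    "mol_B": {1, 4, 8},
    "mol_C": {2, 3, 5},
}
query = {1, 4, 7}

hits = nearest_neighbours(query, library)
print(hits)
```

Under the neighborhood assumption stated above, the top-ranked entries would be the first candidates to inherit the query's properties.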
Example :
Hansch Analysis
Biological Activity = f (Physicochemical parameters ) + constant
$\log(1/C) = a(\log P)^2 + b\log P + \sigma + \delta E_s + \mathrm{const}$
• Physicochemical parameters can be broadly classified into three general types:
• Electronic ($\sigma$)
• Steric ($\delta E_s$)
• Hydrophobic ($\log P$)
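As a rough illustration of how a Hansch-type equation is used once its coefficients are known, the sketch below evaluates the model and locates the optimal log P of the parabolic term. All coefficient values are hypothetical, and the explicit coefficient `rho` on the electronic term is an assumption added here (the slide shows that term without one).

```python
def hansch_activity(log_p, sigma, e_s, a, b, rho, delta, const):
    """Evaluate log(1/C) for a Hansch-type model.

    a, b   : coefficients of the parabolic hydrophobic term
    rho    : coefficient of the electronic parameter sigma (assumed here)
    delta  : coefficient of the steric parameter E_s
    const  : additive constant
    """
    return a * log_p ** 2 + b * log_p + rho * sigma + delta * e_s + const


def optimal_log_p(a, b):
    """log P that maximises the parabolic term; requires a < 0."""
    if a >= 0:
        raise ValueError("a must be negative for the parabola to have a maximum")
    return -b / (2 * a)


# Hypothetical coefficients, chosen only to make the arithmetic easy to follow.
a, b = -0.5, 2.0
best_log_p = optimal_log_p(a, b)
activity = hansch_activity(best_log_p, sigma=0.0, e_s=0.0,
                           a=a, b=b, rho=1.0, delta=1.0, const=1.0)
print(best_log_p, activity)
```

The negative quadratic coefficient expresses the usual reading of the parabolic term: activity first rises with hydrophobicity and then falls again.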
Molecular Descriptors
Constitutional
(mol. weight, the number of S, N or O atoms, …)
Topological
(Randic index, informational content, …)
Geometrical
(molecular size, distances between functional groups, … )
Electrostatic
(electrostatic potential, charges, …)
Charged Partial Surface Area
Quantum-chemical
(energies of molecular orbitals, reactivity indices, …)
Thermodynamical
(heat of formation, logP, …)
Fragments
(sequences of atoms and bonds, augmented atoms, …)
More than 4000 types of descriptors are known
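The simplest descriptors in this list, the constitutional ones, can be computed from the molecular formula alone. The following sketch derives the molecular weight and heteroatom counts; the formula C22H36N2O5S is assumed here for tirofiban (the drug discussed later in these slides), and the mass table covers only the elements needed for this example.

```python
import re

# Standard atomic masses (g/mol) for the few elements used in this example.
ATOMIC_MASS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999, "S": 32.06}


def parse_formula(formula):
    """Turn a simple formula such as 'C2H6O' into {'C': 2, 'H': 6, 'O': 1}."""
    counts = {}
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + (int(number) if number else 1)
    return counts


def constitutional_descriptors(formula):
    """Molecular weight and N/O/S atom counts from a molecular formula."""
    counts = parse_formula(formula)
    mol_weight = sum(ATOMIC_MASS[s] * n for s, n in counts.items())
    return {
        "mol_weight": round(mol_weight, 1),
        "n_N": counts.get("N", 0),
        "n_O": counts.get("O", 0),
        "n_S": counts.get("S", 0),
    }


# Formula assumed for tirofiban; parentheses and charges are not handled.
descriptors = constitutional_descriptors("C22H36N2O5S")
print(descriptors)
```

Topological, geometrical and quantum-chemical descriptors need the molecular graph or a 3D structure and are outside the scope of this sketch.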
Learning approach
Quantum Chemistry: deductive >> inductive
Force Field Molecular Modelling: deductive ≈ inductive
Chemoinformatics: deductive << inductive
Learning approach
• In chemoinformatics, the logic of learning is not based on existing physical theories. Chemoinformatics considers the world too complex to be described a priori by any set of rules. Thus, the rules (models) in chemoinformatics are not explicitly taken from rigorous physical models, but learned inductively from the data.
Chemoinformatics: From Data to Knowledge
[Diagram: measurement or calculation → data → (context) → information → (generalization) → knowledge; inductive learning leads from data to knowledge, deductive learning from knowledge to data]
Models
• In chemoinformatics, a model is a set of rules or a mathematical equation linking a given property (activity) to the molecular structure.
PROPERTY = f (structure)
• Two main types of models:
- binary classification (SAR)
- regression (QSAR)
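The difference between the two model types can be sketched with a single descriptor. In the hedged example below, a least-squares line plays the role of a regression (QSAR) model and a threshold on its output plays the role of a binary classification (SAR) model; the data points and the activity threshold are invented, and real models use many descriptors and the learning methods listed later.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def qsar_predict(x, slope, intercept):
    """Regression model: returns a numerical activity value."""
    return slope * x + intercept


def sar_classify(x, slope, intercept, threshold):
    """Classification model: returns True (active) or False (inactive)."""
    return qsar_predict(x, slope, intercept) >= threshold


# Hypothetical training data: one descriptor value and one activity per molecule.
descriptor = [1.0, 2.0, 3.0, 4.0]
activity = [2.1, 3.9, 6.1, 7.9]

slope, intercept = fit_line(descriptor, activity)
print(qsar_predict(5.0, slope, intercept))
print(sar_classify(1.5, slope, intercept, threshold=5.0))
```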
Organic chemistry:
exercise of « intuitive » chemoinformatics
Extraction of rules from the data
The Markovnikov Rule: When a Brønsted acid, HX, adds to an
unsymmetrically substituted double bond, the acidic hydrogen of the
acid bonds to that carbon of the double bond that has the greater
number of hydrogen atoms already attached to it.
Major applications
Algorithms for organising and searching the data:
- fingerprints,
- graph theory,
- similarity measures
Machine-learning approaches:
- MLR,
- Decision Trees,
- Artificial Neural Networks,
- Support Vector Machines,
- ...
Chemical
Databases
Structure-Activity
Models
Virtual screening
In silico design
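Fingerprints and graph theory meet in fragment-based encodings. As an illustrative sketch only (not the author's fragment descriptors), the code below enumerates linear atom paths in a molecular graph and hashes them into a small bit set. Bond orders and hydrogens are ignored, and the 64-bit length and the CRC32 hash are arbitrary choices made for this example.

```python
import zlib


def path_fragments(atoms, bonds, max_atoms=3):
    """Collect all linear atom sequences of up to max_atoms atoms."""
    neighbours = {i: set() for i in range(len(atoms))}
    for i, j in bonds:
        neighbours[i].add(j)
        neighbours[j].add(i)

    fragments = set()

    def extend(path):
        labels = [atoms[i] for i in path]
        # A path and its reverse are the same fragment: keep one canonical form.
        fragments.add(min("".join(labels), "".join(reversed(labels))))
        if len(path) < max_atoms:
            for nxt in neighbours[path[-1]]:
                if nxt not in path:
                    extend(path + [nxt])

    for start in range(len(atoms)):
        extend([start])
    return fragments


def fingerprint(fragments, n_bits=64):
    """Hash each fragment to a bit position (collisions are possible)."""
    return {zlib.crc32(f.encode()) % n_bits for f in fragments}


# Ethanol heavy-atom graph: C0-C1-O2.
atoms = ["C", "C", "O"]
bonds = [(0, 1), (1, 2)]

frags = path_fragments(atoms, bonds)
bits = fingerprint(frags)
print(sorted(frags), sorted(bits))
```

Two such bit sets can then be compared with any similarity measure, which is what makes fingerprints convenient for database searching.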
Chemoinformatics:
some applications
Discoverer of the Periodic Table: an early "Chemoinformatician"
Dmitri Mendeleev (1834 – 1907)
• Russian chemist who arranged the 63 known elements into a periodic table based on atomic mass, which he published in Principles of Chemistry in 1869. Mendeleev left space for new elements and predicted three yet-to-be-discovered elements: Ga (1875), Sc (1879) and Ge (1886).
Periodic Table
Chemical properties of the elements vary gradually along the two axes
[Diagram: large libraries of molecules → virtual screening (computations) against the target protein → small library of selected hits → high-throughput screening (experiment) → hit]
Virtual screening is indispensable for analysing the huge number of protein-ligand combinations
Human proteome:
• 84,000 peptides
Chemical universe:
• > 50 M compounds are currently available
• 10^60 druglike molecules could be synthesised
Virtual screening must be very fast and efficient!
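The scale argument can be made explicit with a back-of-the-envelope calculation using the two figures quoted above. The cost of one second per protein-ligand evaluation is a purely hypothetical assumption, chosen only to show the order of magnitude.

```python
proteins = 84_000          # proteome size quoted on the slide
compounds = 50_000_000     # currently available compounds quoted on the slide

pairs = proteins * compounds

SECONDS_PER_PAIR = 1.0     # hypothetical cost of evaluating one pair
SECONDS_PER_YEAR = 3600 * 24 * 365

years = pairs * SECONDS_PER_PAIR / SECONDS_PER_YEAR
print(f"{pairs:.1e} protein-ligand pairs, about {years:,.0f} CPU-years")
```

Even under this optimistic per-pair cost, exhaustive evaluation on a single processor is out of reach, which motivates cheap early filters.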
Virtual screening "funnel"
CHEMICAL DATABASE: ~10^6 – 10^9 molecules
Successive stages:
- Filters
- Similarity search
- Pharmacophore models
- (Q)SAR
- Docking
VIRTUAL SCREENING HITS: ~10^1 – 10^3 molecules
All other molecules are discarded as INACTIVES
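The funnel can be read as a chain of increasingly expensive filters, each applied only to the survivors of the previous one. The sketch below shows that control flow with three toy stages; the property names, the thresholds and the four-molecule library are hypothetical placeholders, not values from the slides.

```python
def run_funnel(molecules, stages):
    """Apply each (name, keep) stage in order and record how many survive."""
    survivors = list(molecules)
    report = [("input", len(survivors))]
    for name, keep in stages:
        survivors = [m for m in survivors if keep(m)]
        report.append((name, len(survivors)))
    return survivors, report


# Hypothetical precomputed properties for a tiny library.
library = [
    {"id": "m1", "mol_weight": 320.0, "similarity": 0.82, "predicted_pIC50": 7.1},
    {"id": "m2", "mol_weight": 650.0, "similarity": 0.90, "predicted_pIC50": 8.0},
    {"id": "m3", "mol_weight": 410.0, "similarity": 0.30, "predicted_pIC50": 7.5},
    {"id": "m4", "mol_weight": 280.0, "similarity": 0.70, "predicted_pIC50": 5.0},
]

# Cheap stages come first; all thresholds are arbitrary example values.
stages = [
    ("filter", lambda m: m["mol_weight"] <= 500.0),
    ("similarity search", lambda m: m["similarity"] >= 0.6),
    ("(Q)SAR model", lambda m: m["predicted_pIC50"] >= 6.5),
]

hits, report = run_funnel(library, stages)
print(report)
print([m["id"] for m in hits])
```

Ordering the stages from cheapest to most expensive keeps the slow steps, such as docking, confined to a small fraction of the database.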
REACH regulation
• The European Union adopted the Regulation on the Registration, Evaluation, Authorisation and Restriction of Chemicals (the "REACH Regulation"), which entered into force on June 1, 2007.
• REACH requires information on physico-chemical, toxicological and eco-toxicological parameters for chemicals produced in quantities exceeding 1 tonne per year.
• More than 30,000 compounds must be tested. The total cost estimated by the EU Commission over an 11-15 year period is €2.8 - €5.2 bn.
No Data, No Market!
Chemoinformatics tools in SciFinder:
predictions of > 20 physico-chemical
properties and NMR spectra for each
individual compound
Drug design
Virtual screening: success stories & drugs
Virtual screening - what does it give us?
Herbert Koppen (Boehringer, Germany)
Current Opinion Drug Discovery & Dev. (2009) 12: 397-407
From virtuality to reality
Ulrich Rester (Bayer, Germany)
Current Opinion Drug Discovery & Dev. (2008) 11: 559-568
What has virtual screening ever done for drug discovery?
David E Clark (Argenta Discovery Ltd, UK)
Expert Opinion on Drug Discovery (2008) 8: 841-851
In silico screening: success stories & drugs
Market: tirofiban (1999)
Aggrastat (trade name) from Merck, a GP IIb/IIIa antagonist (used in myocardial infarction; it is an anticoagulant)
(2S)-2-(butylsulfonylamino)-3-[4-[4-(4-piperidyl)butoxy]phenyl]propanoic acid (Mol. mass: 440.6 g/mol)
PK data: bioavailability: IV (intravenous) only; half-life: 2 hours
Combined with heparin and aspirin, but with numerous precautions
http://www.bioscience.ws/encyclopedia/
Materials design
Ionic Liquids
Ionic Liquids are composed of large organic cations
[Figure: structures of nitrogen-containing organic cations bearing substituents R1 to R4 omitted]
and anions:
PF6-, Cl-, BF4-, CF3SO3-, [(CF3SO2)2N]-
There exist 10^18 combinations of ions that could lead to useful ionic liquids
Viscosity predictions for 23 new ILs
Solvionics company
None of these ionic liquids was used for model preparation
Ionic Liquids viscosity: experimental validation of the neural network models
[Figure: predicted vs. experimental viscosity; RMSE = 73 cP]
• The prediction error (~70 cP) is similar to the "noise" in the experimental data used for training the model
G. Marcou, I. Billard, A. Ouadi and A. Varnek, submitted
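For reference, the root-mean-square error quoted for this validation is computed from paired predicted and experimental values as sketched below. The three viscosity values are invented for illustration and are not the data behind the reported 73 cP.

```python
import math


def rmse(predicted, experimental):
    """Root-mean-square error between paired predictions and measurements."""
    if len(predicted) != len(experimental) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - e) ** 2 for p, e in zip(predicted, experimental)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical viscosities in cP, for illustration only.
predicted = [100.0, 250.0, 400.0]
experimental = [130.0, 210.0, 450.0]

error = rmse(predicted, experimental)
print(f"RMSE = {error:.1f} cP")
```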
Metabolites prediction
Prediction of aromatic hydroxylation sites for human CYP1A2 substrates
[Scheme: CYP1A2 substrate → aromatic hydroxylation by CYP1A2 → product; potential hydroxylation sites are marked "?"]
Method: SVM + descriptors derived from condensed graphs of reaction
The obtained model correctly predicts the hydroxylation products with a probability of ≈80%
(see poster of C. Muller)
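A condensed graph of reaction merges reactants and products into one graph in which each bond carries its state before and after the transformation. The sketch below illustrates that idea in a simplified form, assuming atom-mapped bond dictionaries as input; it is not the descriptor-generation procedure used for the model above, and the hydrogenation example with its atom numbering is hypothetical.

```python
def condensed_graph(reactant_bonds, product_bonds):
    """Label every bond by how its order changes during the reaction.

    Both inputs map (atom_i, atom_j) with atom_i < atom_j to a bond order,
    using the same atom numbering on both sides (atom mapping).
    """
    cgr = {}
    for bond in set(reactant_bonds) | set(product_bonds):
        before = reactant_bonds.get(bond, 0)
        after = product_bonds.get(bond, 0)
        if before == after:
            label = "unchanged"
        elif before == 0:
            label = "formed"
        elif after == 0:
            label = "broken"
        else:
            label = "order changed"
        cgr[bond] = (before, after, label)
    return cgr


# Hydrogenation of a C=C bond: atoms 1, 2 and 5 are carbons, 3 and 4 the H2 hydrogens.
reactants = {(1, 2): 2, (2, 5): 1, (3, 4): 1}
products = {(1, 2): 1, (2, 5): 1, (1, 3): 1, (2, 4): 1}

cgr = condensed_graph(reactants, products)
for bond in sorted(cgr):
    print(bond, cgr[bond])
```

Fragments counted on such a graph can serve as descriptors of the whole reaction, in the same way that fragments of a molecular graph describe a molecule.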
Reaction conditions
Search for optimal reaction conditions
[Scheme: reaction query, substrate + H2 → potential products A, B and C]
A, B and C are potential products of the reaction. Compound A is the target.
Experimental validation
[Scheme: Sub + H2 → A]
Conditions suggested by the program, with experimental validation:
No.  Catalyst        Solvent  Additive     Yield (exp.)
1    Pt/C (10%)      THF      None         A: 98 %
2    Pt/C (10%)      DMF      None         A: 90 %, Sub: 2 %
3    Ir/CaCO3 (5%)   EtOH     NEt3 (5 %)   A: 100 %
4    Ir/CaCO3 (5%)   Hexane   None         INSOLUBLE
5    Ir/CaCO3 (5%)   DMF      None         A: 27 %, Sub: 69 %
A. Varnek, in “Chemoinformatics and Computational Chemical Biology", J. Bajorath, Ed., Springer, 2010
« We are perhaps not far removed from the
time when we shall be able to submit the bulk
of chemical phenomena to calculation »
Joseph Louis Gay-Lussac, Mémoires de la Société d'Arcueil 2:207 (1808)
Visit our website: http://infochim.u-strasbg.fr