ISMB2006_Intro1 - Donald Bren School of Information and

Chemoinformatics
P. Baldi, J. Chen, and S. J. Swamidass
School of Information and Computer Sciences
Institute for Genomics and Bioinformatics
University of California, Irvine
Overall Outline
1. Introduction
2. Molecular Representations
3. Chemical Data and Databases
4. Molecular Similarity
5. Chemical Reactions
6. Machine Learning and Other Predictive Methods
7. Molecular Docking and Drug Discovery
1. Introduction
• What is Chemoinformatics
• Resources
• Brief Historical Perspective
• Chemical Space: Small Molecules
• Overview of Problems and Methods
What is Chemoinformatics?
• chemoinformatics encompasses the
design, creation, organisation,
management, retrieval, analysis,
dissemination, visualization and use of
chemical information
What is Chemoinformatics?
• "the mixing of information resources to
transform data into information and
information into knowledge, for the
intended purpose of making better
decisions faster in the arena of drug
lead identification and optimization"
What is Chemoinformatics?
• “the set of computer algorithms and
tools to store and analyse chemical
data in the context of drug discovery
and design projects”
• However: drug design/discovery is to
chemoinformatics like DNA/RNA/
protein sequencing is to bioinformatics
Resources
Books:
J. Gasteiger and T. Engel (Editors) (2003).
Chemoinformatics: A Textbook. Wiley.
A.R. Leach and V. J. Gillet (2005). An Introduction to
Chemoinformatics. Springer.
Journal:
Journal of Chemical Information and Modeling
Web:
http://cdb.ics.uci.edu
and many more…
Brief Historical Perspective
• Historical perspective: physics, chemistry
and biology
• Theorem:
computers/biology or computers/physics >> computers/chemistry
• Proof:
GenBank, Swissprot, PDB, Web (CERN), etc.
Caveat: Long Tradition
• Quantum Mechanics
• Docking
• Beilstein
• ACS
• Etc.
Gasteiger, J. (2006). "Chemoinformatics: a new
field with a long tradition." Anal Bioanal
Chem 384: 57-64.
Possible Causes
• Alchemy
• Industrial age and early commercial
applications of chemistry
• Concurrent development of modern
computers and modern biology
• Scientific differences (theory/process)
• Psychological perceptions (life/inert)
• ACM
Chemical Space: Small Molecules
in Organic Chemistry
• Understanding chemical space
• Small molecules:
– chemical synthesis
– drug design
– chemical genomics
– systems biology
– nanotechnology
– etc.
“A mathematician is a machine that converts coffee into theorems”
P. Erdos
Cholesterol
Aspirin
“A chemoinformatician is a machine …”
Chemical Space
            Stars        Small Mol.
Existing    10^22        10^7
Virtual     0            10^60 (?)
Mode        Real         Virtual
Access      Difficult    “Easy”
Chemoinformatics
• Historical perspective: physics, chemistry and biology
• Understanding chemical space
• Small molecules (chemical synthesis, drug design,
chemical genomics, systems biology, nanotechnology)
• Predict physical, chemical, biological properties
(classification/regression)
• Build filters/tools to efficiently navigate chemical space to
discover new drugs, new reactions, new “galaxies”, etc.
Chemo/Bio Informatics
Two Key Ingredients
1. Data
2. Similarity Measures
Bioinformatics analogy and differences:
– Data (GenBank, Swissprot, PDB)
– Similarity (BLAST)
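The role a similarity measure plays can be sketched in a few lines. The slide does not name a specific measure; the Tanimoto (Jaccard) coefficient used below is one common choice for comparing molecular fingerprints and is an assumption here, as are the toy fingerprints and the names `fp_query` and `fp_db`.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto (Jaccard) similarity between two sets of 'on' feature ids."""
    if not a and not b:
        return 1.0  # convention chosen for this sketch: two empty fingerprints match
    return len(a & b) / len(a | b)

# Hypothetical fingerprints: each integer stands for some structural feature.
fp_query = {1, 4, 7, 9}
fp_db = {
    "mol_A": {1, 4, 7, 8},
    "mol_B": {2, 3, 5},
    "mol_C": {1, 4, 7, 9},
}

# Rank database entries by similarity to the query, most similar first.
ranked = sorted(fp_db, key=lambda name: tanimoto(fp_query, fp_db[name]), reverse=True)
print(ranked)
```

As with BLAST in bioinformatics, the data (here `fp_db`) and the measure (here `tanimoto`) together define what "searching" the collection means.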
Computational/Predictive Methods
• Spectrum of methods:
– Quantum Mechanics
– ….
– Molecular Mechanics
– ….
– Machine Learning
Quantum Mechanics
Schrodinger’s Equation (time independent)
$H\psi = E\psi$
$H = -\frac{h^2}{8\pi^2 m}\frac{\partial^2}{\partial x^2} + V$ = Hamiltonian operator
$E$ = energy
$V$ = external potential (time independent)
$\psi = \psi(x,t)$ = (complex) wave function $= \psi(x)\,T(t)$ (time-independent case)
$|\psi|^2 = \psi^{*}\psi$ = probability density function (particle at position $x$)
Schrodinger Equation
• Partial differential eigenvalue equation
• Where are the electrons and nuclei of a molecule in
space?
• Under a given set of conditions, what are their energies?
• Difficult to solve exactly as the number of particles grows
(electron-electron interactions, etc.)
• Approximate methods
– Ab initio
– Semi empirical
• 3D structures
• Reaction mechanisms, rates
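To make the eigenvalue statement concrete, here is a minimal numerical check on the one case that is easy to solve exactly: a single particle in a one-dimensional box with V = 0. This system is not on the slides; it is a standard textbook example chosen for illustration. The units (h = 1, m = 1, box length 1) and all function names are assumptions of this sketch.

```python
import math

# Assumed units for illustration: h = 1, m = 1, box length = 1, V = 0 inside the box.
H_PLANCK, MASS, BOX = 1.0, 1.0, 1.0
COEFF = H_PLANCK**2 / (8 * math.pi**2 * MASS)  # prefactor of the second-derivative term

def psi(n: int, x: float) -> float:
    """Textbook particle-in-a-box eigenfunction (real-valued)."""
    return math.sqrt(2 / BOX) * math.sin(n * math.pi * x / BOX)

def energy(n: int) -> float:
    """Textbook eigenvalue E_n = n^2 h^2 / (8 m L^2)."""
    return n**2 * H_PLANCK**2 / (8 * MASS * BOX**2)

def apply_hamiltonian(n: int, x: float, dx: float = 1e-4) -> float:
    """H psi = -COEFF * psi'' (V = 0), with psi'' from a central difference."""
    second = (psi(n, x + dx) - 2 * psi(n, x) + psi(n, x - dx)) / dx**2
    return -COEFF * second

def norm(n: int, steps: int = 10_000) -> float:
    """Midpoint-rule integral of |psi|^2 over the box; should be close to 1."""
    dx = BOX / steps
    return sum(psi(n, (i + 0.5) * dx) ** 2 for i in range(steps)) * dx

for n in (1, 2, 3):
    x = 0.3
    # The two middle columns should agree: H psi evaluated at x versus E psi at x.
    print(n, apply_hamiltonian(n, x), energy(n) * psi(n, x), norm(n))
```

For a molecule, the wave function depends on the coordinates of every electron and nucleus at once, so no such closed form exists; that is where the approximate methods listed above come in.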
Ab Initio
• Limited to tens of atoms and best performed
using a cluster or supercomputer
• Can be applied to organics, organo-metallics,
and molecular fragments (e.g. catalytic
components of an enzyme)
• Vacuum or implicit solvent environment
• Can be used to study ground, transition, and
excited states (certain methods)
• Specific implementations include: GAMESS,
GAUSSIAN, etc.
Semiempirical Methods
• Semiempirical methods use parameters that
compensate for neglecting some of the time-consuming
mathematical terms in Schrodinger's equation, whereas
ab initio methods include all such terms.
• The parameters used by semiempirical methods can be
derived from experimental measurements or by
performing ab initio calculations on model
systems.
• Limited to hundreds of atoms
• Can be applied to organics, organo-metallics, and small
oligomers (peptide, nucleotide, saccharide)
• Can be used to study ground, transition, and excited
states (certain methods).
• Specific implementations include: AMPAC, MOPAC, and
ZINDO.
Molecular Mechanics
• Force field approximation
• Ignore electrons
• Calculate energy of a system as a function
of nuclear positions
Molecular Mechanics
Energy = Stretching Energy + Bending Energy + Torsion
Energy + Non-Bonded Interactions Energy
Stretching Energy
Bending Energy
Torsion Energy
Non-Bonded Energy
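The functional forms of the four terms did not survive in this transcript. The sketch below uses forms that are common in classical force fields (harmonic stretching and bending, a periodic cosine torsion, Lennard-Jones plus Coulomb for non-bonded pairs); these are assumptions, not necessarily the expressions shown on the original slides. All parameter values are hypothetical, and the Coulomb constant assumes kcal/mol, Angstrom, and elementary-charge units.

```python
import math

def stretch_energy(r, r0, k_b):
    """Harmonic bond stretching: k_b * (r - r0)^2."""
    return k_b * (r - r0) ** 2

def bend_energy(theta, theta0, k_theta):
    """Harmonic angle bending: k_theta * (theta - theta0)^2, angles in radians."""
    return k_theta * (theta - theta0) ** 2

def torsion_energy(phi, barrier, periodicity, phase=0.0):
    """Periodic torsion: (barrier / 2) * (1 + cos(periodicity * phi - phase))."""
    return 0.5 * barrier * (1 + math.cos(periodicity * phi - phase))

def nonbonded_energy(r, epsilon, sigma, q1, q2, coulomb_const=332.06):
    """Lennard-Jones 12-6 plus Coulomb; coulomb_const is approximate (kcal*A/(mol*e^2))."""
    lj = 4 * epsilon * ((sigma / r) ** 12 - (sigma / r) ** 6)
    coulomb = coulomb_const * q1 * q2 / r
    return lj + coulomb

# Total energy as a sum over one term of each kind (hypothetical parameters).
total = (
    stretch_energy(r=1.10, r0=1.09, k_b=340.0)
    + bend_energy(math.radians(111.0), math.radians(109.5), k_theta=35.0)
    + torsion_energy(math.radians(60.0), barrier=2.8, periodicity=3)
    + nonbonded_energy(r=3.5, epsilon=0.1, sigma=3.4, q1=0.0, q2=0.0)
)
print(total)
```

In a real force field each term is summed over every bond, angle, dihedral, and non-bonded atom pair in the system. Every term depends only on nuclear positions, which is what "ignore electrons" means in practice.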
Statistical/Machine Learning
Methods
NNs and recursive NNs
GA
SGs
Graphical Models
Kernels
…
Representations are essential. Must either (1) deal
with non-standard data structures of variable
size; or (2) represent the data in a standard
vector format.
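Option (2) can be illustrated with a deliberately crude featurizer that maps molecular graphs of different sizes to vectors of one fixed length. The feature choice (atom and bond-order counts), the vocabulary, and the name `featurize` are assumptions made for this sketch; representations used in practice are far richer.

```python
from collections import Counter

# Fixed vocabulary (an assumption of this sketch); unlisted elements count as "other".
ELEMENTS = ["C", "N", "O", "S", "P", "other"]
BOND_ORDERS = [1, 2, 3]

def featurize(atoms, bonds):
    """Map a molecular graph of any size to a vector of fixed length.

    atoms: list of element symbols (hydrogens omitted)
    bonds: list of (atom_index, atom_index, bond_order) tuples
    """
    atom_counts = Counter(a if a in ELEMENTS else "other" for a in atoms)
    bond_counts = Counter(order for _, _, order in bonds)
    return [atom_counts[e] for e in ELEMENTS] + [bond_counts[o] for o in BOND_ORDERS]

# Two hydrogen-suppressed graphs with different numbers of atoms and bonds.
ethanol = (["C", "C", "O"], [(0, 1, 1), (1, 2, 1)])
acetic_acid = (["C", "C", "O", "O"], [(0, 1, 1), (1, 2, 2), (1, 3, 1)])

vec_ethanol = featurize(*ethanol)
vec_acetic_acid = featurize(*acetic_acid)
print(vec_ethanol, vec_acetic_acid)
```

Both vectors have the same length regardless of molecule size, so they can be fed to any standard learner. The price is lost information: counts alone cannot distinguish isomers, which is one motivation for option (1), methods that operate on the variable-size structure directly.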