Representation and Manipulation of 3D Molecular Structures

3D Molecular Structures
C371 Fall 2004
Morgan Algorithm (Leach & Gillet, p. 8)
Bioisosteres (Leach & Gillet, p. 31)
Milestones In Chemical Information: IV (PW)
• Structure diagrams are planar but molecules are not, so existing
2D screening and graph-search methods had to be extended to
allow 3D substructure searching (Pfizer and Lederle, 1986-87)
• Sources of 3D structural data
– Experimental data (Cambridge Structural Database)
– Computational chemistry (quantum mechanics, molecular
mechanics, molecular dynamics)
– Structure-generation methods for databases of molecules
• CONCORD (Texas, 1987)
• CORINA (Munich/Erlangen, 1990)
• Further extensions to allow flexible searching (ICI, MDL and
Tripos, 1991-94)
Milestones In Molecular Modelling: IV (PW)
• Use of 3D information in QSAR to facilitate structure-based
approaches to drug discovery
• Comparative Molecular Field Analysis (CoMFA; Tripos, 1988) and
related approaches
– Calculate energies at
points on a 3D grid
surrounding a molecule
– Statistical correlation with
activity to identify important
positions in space
– Need for alignment
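The grid step can be sketched as follows. This is a minimal illustration rather than the CoMFA implementation: it computes only a Coulomb term for a +1 probe, and the padding, spacing, energy cap, dielectric and function names are all assumptions chosen for the example. A steric field is typically computed as well, and the grid values are then correlated with activity in a separate statistical step not shown here.

```python
import math

# Coulomb constant in kcal*Angstrom/(mol*e^2); vacuum dielectric is an assumption.
COULOMB_K = 332.06

def grid_points(atoms, padding=4.0, spacing=2.0):
    """Regular 3D lattice enclosing the molecule. atoms: list of (charge, (x, y, z))."""
    xs, ys, zs = zip(*(pos for _, pos in atoms))

    def axis(lo, hi):
        n = int(math.floor((hi - lo + 2 * padding) / spacing)) + 1
        return [lo - padding + i * spacing for i in range(n)]

    return [(x, y, z)
            for x in axis(min(xs), max(xs))
            for y in axis(min(ys), max(ys))
            for z in axis(min(zs), max(zs))]

def electrostatic_field(atoms, points, probe_charge=1.0, cap=30.0, min_dist=1e-3):
    """Coulomb energy of a probe charge at each grid point, truncated to +/- cap."""
    field = []
    for p in points:
        energy = 0.0
        for charge, pos in atoms:
            r = max(math.dist(p, pos), min_dist)  # avoid division by zero on an atom
            energy += COULOMB_K * charge * probe_charge / r
        field.append(max(-cap, min(cap, energy)))
    return field

# Hypothetical two-atom "molecule" with partial charges, for illustration only.
toy = [(+0.5, (0.0, 0.0, 0.0)), (-0.5, (1.5, 0.0, 0.0))]
points = grid_points(toy)
field = electrostatic_field(toy, points)
```

Each molecule in a data set yields one such vector of grid energies, which is why the molecules must first be aligned in a common frame.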
Pharmacophore (Leach & Gillet, p. 32)
3D Substructure Searching (PW)
[Figure: 3D substructure query made up of N and O atoms separated by the following interatomic distances:
a = 8.62 ± 0.58 Å
b = 7.08 ± 0.56 Å
c = 3.35 ± 0.65 Å
The query is followed by drawings of retrieved structures, of which only the heteroatom labels (N, O, S, P) survived extraction.]
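A distance-based query of this kind can be checked against a set of 3D coordinates with a few lines of code. The sketch below assumes the query consists of one N and two O atoms, with a and b as the two N...O distances and c as the O...O distance; that assignment is a guess from the figure labels, and the function names and toy coordinates are hypothetical.

```python
import math
from itertools import permutations

# (target, tolerance) in Angstroms, taken from the query distances a, b, c.
# Assumption: a and b are N...O distances, c is the O...O distance.
QUERY = {"a": (8.62, 0.58), "b": (7.08, 0.56), "c": (3.35, 0.65)}

def within(distance, key):
    target, tol = QUERY[key]
    return abs(distance - target) <= tol

def find_hits(atoms):
    """atoms: list of (element, (x, y, z)). Returns matching (N, O, O) index triples."""
    nitrogens = [i for i, (el, _) in enumerate(atoms) if el == "N"]
    oxygens = [i for i, (el, _) in enumerate(atoms) if el == "O"]
    hits = []
    for n in nitrogens:
        # Ordered pairs, because a and b apply to different oxygens.
        for o1, o2 in permutations(oxygens, 2):
            d_a = math.dist(atoms[n][1], atoms[o1][1])
            d_b = math.dist(atoms[n][1], atoms[o2][1])
            d_c = math.dist(atoms[o1][1], atoms[o2][1])
            if within(d_a, "a") and within(d_b, "b") and within(d_c, "c"):
                hits.append((n, o1, o2))
    return hits

# Hypothetical coordinates for a single rigid conformation.
toy_molecule = [
    ("N", (0.0, 0.0, 0.0)),
    ("O", (8.62, 0.0, 0.0)),
    ("O", (6.5, 2.7, 0.0)),
    ("C", (3.0, 1.0, 0.0)),
]
hits = find_hits(toy_molecule)
```

The exhaustive loop is only practical for a single small molecule; database searching needs a fast screen in front of it.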
Current Activities: Virtual Screening (PW)
• Need to prioritise the many molecules that could be
tested
• Increasingly sophisticated level of filtering to maximise
the numbers of potential leads
– “Drugability” considerations
– Similarity searching (both 2D and 3D) using initial
weak leads
– 3D substructure searching once possible
pharmacophoric patterns have been identified
– Docking once the 3D structure of the biological target
is available
Cambridge Structural Database
• X-ray crystal structures of more than
250,000 compounds (organic and
organometallic)
• Established in 1965
• Textual queries
• Structural queries
• Specific 3D constraints (conformation or
distance variables)
Protein Data Bank
• More than 25,000 X-ray and NMR
structures of proteins and protein-ligand
complexes
• Some nucleic acid and carbohydrate
structures
• Founded in 1971 at Brookhaven National
Laboratory; now run by a consortium
• Retrieval by textual queries or, in some
interfaces, by amino acid sequence
Uses of the CSD and PDB
• Data mining for conformational properties and
intermolecular interactions (CSD & PDB)
• Further understanding of the nature of protein
structure and its relationship to amino acid
sequence (PDB)
• Homology modeling (comparative modeling)
(PDB)
3D Pharmacophores
• Definition: a set of features together with their
relative spatial orientation that are thought to be
capable of interaction with a particular biological
target
– Hydrogen bond donors and acceptors
– Positively and negatively charged groups
– Hydrophobic regions and aromatic rings
• Depends on atomic properties rather than
element types
• Does not depend on specific chemical
connectivity
Lipinski Rule of Five
• Poor absorption or permeation is more
likely when a molecule has:
– More than five hydrogen bond donors
– More than ten hydrogen bond acceptors
– LogP greater than five
– Molecular weight greater than 500
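The four criteria translate directly into a filter. The sketch below assumes the property values have already been computed elsewhere; the function name and the example values are hypothetical.

```python
def lipinski_violations(h_donors, h_acceptors, logp, mol_weight):
    """Return the list of Rule of Five criteria that the given properties violate."""
    checks = [
        ("more than 5 hydrogen bond donors", h_donors > 5),
        ("more than 10 hydrogen bond acceptors", h_acceptors > 10),
        ("logP greater than 5", logp > 5),
        ("molecular weight greater than 500", mol_weight > 500),
    ]
    return [label for label, violated in checks if violated]

# Hypothetical property values, for illustration only.
clean = lipinski_violations(h_donors=2, h_acceptors=5, logp=3.1, mol_weight=350.0)
flagged = lipinski_violations(h_donors=6, h_acceptors=8, logp=4.0, mol_weight=650.0)
```

Returning the violated criteria rather than a single pass/fail flag leaves the choice of threshold to the caller.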
3D Database Searching
• As with 2D searching, usually involves a two-stage process
– Rapid screen to eliminate molecules that
cannot match the query
– Graph matching to identify matches
• Interatomic distances between pairs of
atoms are important
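The two stages can be sketched as below. This is an illustrative toy, not a production algorithm: the screen only asks whether each required distance range occurs somewhere in the molecule, and the second stage is a simple backtracking match. The query format (a list of element types plus a dictionary of distance ranges keyed by query index pairs with i < j), the function names and the three toy molecules are all assumptions.

```python
import math
from itertools import combinations

def pair_distances(atoms):
    """Index every interatomic distance by its (sorted) pair of element types."""
    table = {}
    for (e1, p1), (e2, p2) in combinations(atoms, 2):
        table.setdefault(tuple(sorted((e1, e2))), []).append(math.dist(p1, p2))
    return table

def passes_screen(table, query_types, constraints):
    """Stage 1: reject molecules lacking any atom pair in a required distance range."""
    for (i, j), (lo, hi) in constraints.items():
        key = tuple(sorted((query_types[i], query_types[j])))
        if not any(lo <= d <= hi for d in table.get(key, [])):
            return False
    return True

def match(atoms, query_types, constraints, assigned=()):
    """Stage 2: backtracking assignment of molecule atoms to query nodes."""
    k = len(assigned)
    if k == len(query_types):
        return assigned
    for idx, (elem, pos) in enumerate(atoms):
        if elem != query_types[k] or idx in assigned:
            continue
        consistent = True
        for prev in range(k):
            rng = constraints.get((prev, k))
            if rng is not None:
                d = math.dist(atoms[assigned[prev]][1], pos)
                if not rng[0] <= d <= rng[1]:
                    consistent = False
                    break
        if consistent:
            result = match(atoms, query_types, constraints, assigned + (idx,))
            if result is not None:
                return result
    return None

def search(database, query_types, constraints):
    hits = {}
    for name, atoms in database.items():
        if not passes_screen(pair_distances(atoms), query_types, constraints):
            continue  # eliminated cheaply, no graph matching needed
        mapping = match(atoms, query_types, constraints)
        if mapping is not None:
            hits[name] = mapping
    return hits

# Hypothetical query and database.
query_types = ["N", "O", "O"]
constraints = {(0, 1): (4.0, 6.0), (0, 2): (4.0, 6.0), (1, 2): (2.0, 3.0)}
database = {
    "hit": [("N", (0, 0, 0)), ("O", (5, 0, 0)), ("O", (5, 2.5, 0))],
    "screen_only": [("N", (0, 0, 0)), ("O", (5, 0, 0)), ("O", (20, 0, 0)), ("O", (22.5, 0, 0))],
    "rejected": [("N", (0, 0, 0)), ("O", (1, 0, 0))],
}
hits = search(database, query_types, constraints)
```

The "screen_only" molecule shows why the second stage is needed: every required distance occurs somewhere in it, but never on one consistent set of atoms.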
Structure Generation Programs
• CONCORD (Coordinates found in the
CAS Registry File)
• CORINA (COoRdINAtes)
– About CORINA
– Generating 3D structures with CORINA
Conformational Search and Analysis; Systematic Conformational Search
• Goal of Conformational Analysis: identify
all accessible minimum-energy structures
of a molecule
• Global minimum-energy conformation: the
minimum with the lowest energy
• Systematic searches assign values to the
torsion angles of the rotatable bonds in the
molecule
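A systematic search over torsion angles can be sketched as a grid enumeration. The energy function here is a hypothetical threefold torsional term standing in for a real force field, and the 60-degree step and function names are assumptions for the example.

```python
import math
from itertools import product

def torsion_grid(n_rotatable, step_deg=60):
    """Return an iterator over every combination of torsion values on a regular grid."""
    values = range(0, 360, step_deg)
    return product(values, repeat=n_rotatable)

def toy_energy(torsions):
    # Hypothetical threefold term with minima at 60, 180 and 300 degrees.
    return sum(1.0 + math.cos(math.radians(3 * t)) for t in torsions)

def systematic_search(n_rotatable, step_deg=60, window=1e-6):
    """Evaluate every grid conformer and keep those within `window` of the lowest energy."""
    conformers = [(toy_energy(t), t) for t in torsion_grid(n_rotatable, step_deg)]
    e_min = min(e for e, _ in conformers)
    low = [t for e, t in conformers if e - e_min <= window]
    return len(conformers), e_min, low

n_conformers, e_min, low_energy = systematic_search(n_rotatable=2)
```

The number of conformers is (360 / step) raised to the number of rotatable bonds, so the grid grows exponentially with molecular flexibility.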
Random Conformational Search
• Simulated annealing: temperature is
gradually reduced from a high value to a
low one
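A minimal sketch of simulated annealing over torsion angles, assuming a Metropolis acceptance test and a geometric cooling schedule. The energy function, step size, temperatures (in the same arbitrary units as the energy) and function names are all hypothetical choices for illustration.

```python
import math
import random

def toy_energy(torsions):
    # Hypothetical energy: a threefold term plus a bias that makes 180 degrees
    # the global minimum (energy 0) for each torsion.
    return sum(1.0 + math.cos(math.radians(3 * t))
               + 0.5 * (1.0 + math.cos(math.radians(t)))
               for t in torsions)

def anneal(n_torsions, t_start=5.0, t_end=0.01, cooling=0.95,
           steps_per_temp=50, seed=0):
    rng = random.Random(seed)
    current = [rng.uniform(0.0, 360.0) for _ in range(n_torsions)]
    e_current = toy_energy(current)
    best, e_best = list(current), e_current
    temp = t_start
    while temp > t_end:
        for _ in range(steps_per_temp):
            # Random move: perturb one torsion angle.
            trial = list(current)
            i = rng.randrange(n_torsions)
            trial[i] = (trial[i] + rng.gauss(0.0, 30.0)) % 360.0
            e_trial = toy_energy(trial)
            delta = e_trial - e_current
            # Metropolis test: always accept downhill moves, and accept uphill
            # moves with a probability that shrinks as the temperature falls.
            if delta <= 0 or rng.random() < math.exp(-delta / temp):
                current, e_current = trial, e_trial
                if e_current < e_best:
                    best, e_best = list(current), e_current
        temp *= cooling  # gradual reduction from high to low temperature
    return best, e_best

best_torsions, best_energy = anneal(n_torsions=3)
```

At high temperature the walk crosses torsional barriers freely; as the temperature drops it settles into a low-energy well, with no guarantee that this is the global minimum.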
Other Conformational Searches
• Distance geometry
• Molecular dynamics
Deriving 3D Pharmacophores
• Pharmacophore mapping: the process of
deriving a 3D pharmacophore
– Conformational flexibility
– Different combinations of pharmacophoric
groups in the molecule
• Genetic algorithms: a class of optimization
method based on computational models of
Darwinian evolution
Applications: Structural Genomics
• Definitions (Goals)
– Characterization of all protein structures in a
given genome
– Provide sufficient coverage of fold space to
facilitate accurate homology modeling of the
majority of proteins of biological interest
– PDB Target Database
(http://targetdb.rcsb.org/)
Searching 3D Protein Structures (PW)
• Searching protein sequences is well established: how to search
the 3D structures in the Protein Data Bank (PDB)?
• Extensive collaboration between Information Studies and
Molecular Biology and Biotechnology to develop graph
representations of proteins that can be searched with
isomorphism algorithms analogous to those used for chemical
structures
• Focus here on folding motifs (secondary structure elements) in
proteins, but the approach has also been applied to:
– Protein amino acid sidechains
– Carbohydrates
– Nucleic acids
Representation Of Protein Folding Motifs: I (PW)
• The helix and strand secondary structure elements (SSE) are
both approximately linear, repeating structures, which can hence
be represented by vectors drawn along their major axes
• The nodes of the graph are these vectors and the edges
comprise:
– The angle between a pair of vectors
– The distance of closest approach of the two vectors
– The distance between the vectors’ mid-points
• PROTEP compares such representations using a maximal
common subgraph isomorphism algorithm to identify common
folds
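The three edge attributes can be computed from the end points of each SSE vector. The sketch below treats each SSE as a finite line segment of non-zero length and uses a standard closest-points-between-segments calculation; it is not PROTEP's code, and the function names and coordinates are hypothetical.

```python
import math

def sub(u, v):
    return tuple(a - b for a, b in zip(u, v))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def clamp01(x):
    return max(0.0, min(1.0, x))

def angle_deg(d1, d2):
    cos_t = dot(d1, d2) / math.sqrt(dot(d1, d1) * dot(d2, d2))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))

def closest_approach(p1, q1, p2, q2):
    """Shortest distance between segments p1-q1 and p2-q2 (both of non-zero length)."""
    d1, d2, r = sub(q1, p1), sub(q2, p2), sub(p1, p2)
    a, e = dot(d1, d1), dot(d2, d2)
    b, c, f = dot(d1, d2), dot(d1, r), dot(d2, r)
    denom = a * e - b * b  # zero when the segments are parallel
    s = clamp01((b * f - c * e) / denom) if denom > 1e-12 else 0.0
    t = (b * s + f) / e
    if t < 0.0:
        t, s = 0.0, clamp01(-c / a)
    elif t > 1.0:
        t, s = 1.0, clamp01((b - c) / a)
    c1 = tuple(p + s * d for p, d in zip(p1, d1))
    c2 = tuple(p + t * d for p, d in zip(p2, d2))
    return math.dist(c1, c2)

def midpoint(p, q):
    return tuple((a + b) / 2 for a, b in zip(p, q))

def sse_edge(sse_a, sse_b):
    """Edge label for a pair of SSE vectors, each given as (start, end)."""
    (p1, q1), (p2, q2) = sse_a, sse_b
    return {
        "angle": angle_deg(sub(q1, p1), sub(q2, p2)),
        "closest": closest_approach(p1, q1, p2, q2),
        "midpoint": math.dist(midpoint(p1, q1), midpoint(p2, q2)),
    }

# Hypothetical axis end points in Angstroms.
helix = ((0.0, 0.0, 0.0), (10.0, 0.0, 0.0))
strand = ((5.0, -1.0, 3.0), (5.0, 3.0, 3.0))
edge = sse_edge(helix, strand)
```

A protein then becomes a fully connected labelled graph with one node per SSE, and two edges are treated as equivalent when their three values agree within chosen tolerances.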
Representation Of Protein Folding Motifs: II (PW)
Structural Relationship Between Leucine Aminopeptidase And Carboxypeptidase A (PW)
• Use of 1LAP as the target for a PROTEP
search requiring structures with at least 7
SSEs in common with the target
• The four carboxypeptidase structures in the
PDB at that time have a fold containing five
helices and eight strands in a sheet in
common with 1LAP