Transcript talk
MSDchem and the chemistry of the wwPDB
EMBO 22nd-26th September 2008
EMBL-EBI Hinxton UK
EBI is an Outstation of the European Molecular Biology Laboratory.
The PDB Chemical components
The PDB holds more than the 3-D folds of standard polymers
It gives insight into interesting special chemistry
Bound ligands
Modified amino acids
Non-standard chemical components are often the most interesting
The PDB ligand dictionary has served for many years as the reference dictionary for the chemical definitions of the 3-letter codes in PDB data
The ligand dictionary has been maintained by the curators at all wwPDB sites
Problems accumulated
Duplicate entries
Impossible chemistry
The definition of what a 3-letter code represents was neither clear nor consistent
Stereo-chemistry was ignored
The MSDchem database
The database that supported the chemical component dictionary in the MSD
From the start, the curation team had an explicit, clear definition of a ligand:
A distinct stereo-isomer;
connectivity,
bond orders,
absolute stereo-descriptors of atoms and bonds
This was reflected in the design and implementation of the MSDChem database
MSDchem ligand definition
The ligand identity
Atoms, elements, bonds and bond orders
Absolute stereo-descriptors of atoms and bonds (Cahn-Ingold-Prelog)
Equivalent to a canonical stereo SMILES or InChI string
DCF: C4' R, C3' S, C1' R
DCM: C4' S, C3' R, C1' S
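The identity rule above (elements, bonds, bond orders and absolute stereo-descriptors, but not atom names or ordering) can be illustrated with a small Python sketch. It is not MSDchem code: the `Atom`/`Bond` tuples and `identity_key` are hypothetical, and the brute-force canonicalisation over every atom ordering stands in for the canonical SMILES/InChI generation a real chemistry toolkit would perform. The toy molecule is CHBrClF with hydrogen left implicit.

```python
from itertools import permutations
from typing import List, Optional, Tuple

# Hypothetical minimal identity model (not the MSDchem schema):
# atom = (element, CIP descriptor or None)
# bond = (atom index, atom index, bond order, bond stereo descriptor or "")
Atom = Tuple[str, Optional[str]]
Bond = Tuple[int, int, int, str]


def identity_key(atoms: List[Atom], bonds: List[Bond]) -> tuple:
    """Canonical encoding that ignores atom names and atom order.

    Brute force: encode the molecule under every atom ordering and keep the
    lexicographically smallest encoding. Cost grows as n!, so toy sizes only.
    """
    best = None
    for perm in permutations(range(len(atoms))):
        # perm[k] is the original index of the atom placed at position k
        new_pos = {old: new for new, old in enumerate(perm)}
        atom_part = tuple((atoms[old][0], atoms[old][1] or "") for old in perm)
        bond_part = tuple(sorted(
            (min(new_pos[i], new_pos[j]), max(new_pos[i], new_pos[j]), order, stereo)
            for i, j, order, stereo in bonds
        ))
        encoding = (atom_part, bond_part)
        if best is None or encoding < best:
            best = encoding
    return best


# The two enantiomers of CHBrClF.
r_form = [("C", "R"), ("Br", None), ("Cl", None), ("F", None)]
s_form = [("C", "S"), ("Br", None), ("Cl", None), ("F", None)]
star = [(0, 1, 1, ""), (0, 2, 1, ""), (0, 3, 1, "")]

# The same R form with the atoms listed in a different order.
r_reordered = [("F", None), ("C", "R"), ("Br", None), ("Cl", None)]
star_reordered = [(1, 0, 1, ""), (1, 2, 1, ""), (1, 3, 1, "")]

print(identity_key(r_form, star) == identity_key(r_reordered, star_reordered))  # True
print(identity_key(r_form, star) == identity_key(s_form, star))  # False
```

Because CIP descriptors are absolute, they can be stored as plain atom labels: two stereo-isomers differ in every encoding, while a mere renumbering leaves the key unchanged.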
Other properties
Atom names and atom/bond ordering
Representative coordinates
Derived properties
Aromatic bonds
SMILES and InChI strings
Systematic names
Idealised coordinates
Rings – planes
Atom energy types
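Ring perception is one of the derived properties. As a hedged illustration only, the number of independent rings (the size of the smallest set of smallest rings) follows from the cycle rank, bonds - atoms + connected components. The sketch computes that count, not the rings themselves or their planes; `ring_count` is a hypothetical helper.

```python
from typing import Dict, List, Set, Tuple


def ring_count(atom_ids: List[str], bonds: List[Tuple[str, str]]) -> int:
    """Cycle rank: number of bonds - number of atoms + connected components.

    Assumes each bond is listed once and atom_ids has no duplicates.
    """
    adjacency: Dict[str, Set[str]] = {a: set() for a in atom_ids}
    for a, b in bonds:
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen: Set[str] = set()
    components = 0
    for start in atom_ids:
        if start in seen:
            continue
        components += 1  # new component found; flood it with a depth-first search
        seen.add(start)
        stack = [start]
        while stack:
            for nbr in adjacency[stack.pop()]:
                if nbr not in seen:
                    seen.add(nbr)
                    stack.append(nbr)
    return len(bonds) - len(atom_ids) + components


# Naphthalene carbon skeleton: 10 atoms, 11 bonds, one component -> 2 rings.
naph_atoms = ["C1", "C2", "C3", "C4", "C4A", "C5", "C6", "C7", "C8", "C8A"]
naph_bonds = [("C1", "C2"), ("C2", "C3"), ("C3", "C4"), ("C4", "C4A"),
              ("C4A", "C8A"), ("C8A", "C1"), ("C4A", "C5"), ("C5", "C6"),
              ("C6", "C7"), ("C7", "C8"), ("C8", "C8A")]
print(ring_count(naph_atoms, naph_bonds))  # 2
```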
Ligand curation
For known ligands, coordinates are checked against the ligand definition (program DOHLC)
Atom labeling is checked
A new ligand may have to be defined
For a new ligand
Fundamental properties are checked
Derived properties are generated
Is it identical to an existing ligand with another code? (DOHLC)
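The coordinate check can be sketched as follows. What DOHLC actually tests is not described here, so this is an assumption-laden stand-in: `check_instance` is hypothetical, it compares the atom labels of one deposited instance with the dictionary definition, and it flags defined bonds longer than `max_bond` or non-bonded pairs closer than `min_nonbonded` (both thresholds, in Å, are illustrative guesses).

```python
import math
from typing import Dict, List, Set, Tuple

Coord = Tuple[float, float, float]


def check_instance(
    defined_atoms: Set[str],
    defined_bonds: Set[Tuple[str, str]],
    coords: Dict[str, Coord],
    max_bond: float = 2.0,       # assumed longest acceptable bond, in Å
    min_nonbonded: float = 1.0,  # assumed closest acceptable non-bonded contact, in Å
) -> List[str]:
    """List discrepancies between one ligand instance and its definition."""
    issues: List[str] = []
    present = set(coords)

    # Atom labelling: every defined atom present, no undefined names.
    for name in sorted(defined_atoms - present):
        issues.append(f"missing atom {name}")
    for name in sorted(present - defined_atoms):
        issues.append(f"unexpected atom {name}")

    # Geometry, checked only for atoms whose names match the definition.
    bonded = {frozenset(pair) for pair in defined_bonds}
    names = sorted(defined_atoms & present)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            d = math.dist(coords[a], coords[b])
            is_bond = frozenset((a, b)) in bonded
            if is_bond and d > max_bond:
                issues.append(f"bond {a}-{b} too long ({d:.2f} Å)")
            elif not is_bond and d < min_nonbonded:
                issues.append(f"non-bonded {a}/{b} too close ({d:.2f} Å)")
    return issues


# Usage on a made-up C1-C2-O1 fragment.
atoms = {"C1", "C2", "O1"}
bonds = {("C1", "C2"), ("C2", "O1")}
good = {"C1": (0.0, 0.0, 0.0), "C2": (1.52, 0.0, 0.0), "O1": (2.95, 0.0, 0.0)}
print(check_instance(atoms, bonds, good))  # []
relabelled = {"C1": good["C1"], "C2": good["C2"], "O2": good["O1"]}
print(check_instance(atoms, bonds, relabelled))  # ['missing atom O1', 'unexpected atom O2']
```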
Example: 3TH (not possible; proposed as a new ligand) is actually 6CP
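The duplicate question reduces to a lookup once every component has a canonical identity key. In this minimal sketch the keys are placeholder strings standing in for canonical stereo SMILES or InChI produced by a chemistry toolkit, and the codes `LG1` to `LG3` and both helpers are hypothetical; the same index also exposes duplicates already inside the dictionary.

```python
from typing import Dict, List, Optional, Tuple


def build_index(codes_to_keys: Dict[str, str]) -> Tuple[Dict[str, str], List[Tuple[str, str]]]:
    """Map identity key -> first code seen; also report codes sharing a key."""
    index: Dict[str, str] = {}
    duplicates: List[Tuple[str, str]] = []
    for code in sorted(codes_to_keys):
        key = codes_to_keys[code]
        if key in index:
            duplicates.append((index[key], code))
        else:
            index[key] = code
    return index, duplicates


def existing_code(index: Dict[str, str], candidate_key: str) -> Optional[str]:
    """Return the code of an identical existing component, if any."""
    return index.get(candidate_key)


# Placeholder keys standing in for canonical stereo SMILES/InChI strings.
dictionary = {"LG1": "key-1", "LG2": "key-2", "LG3": "key-1"}
index, duplicates = build_index(dictionary)
print(duplicates)                     # [('LG1', 'LG3')]
print(existing_code(index, "key-2"))  # LG2
print(existing_code(index, "key-9"))  # None
```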
Ligands in the wwPDB
Improvement of the chemical dictionary
A core task of the wwPDB remediation project
Remaining issues and data errors were fixed
Duplicate identical ligands
No representative coordinates
Wrong valences
The definition of ligand identity, and of the permitted deviations from it, was agreed across the wwPDB
The wwPDB invested significantly in this area with a new software toolkit (ChemComp)
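A wrong-valence check of the kind listed above might look like the sketch below. `valence_errors` is hypothetical, and the `ALLOWED_VALENCE` table is an assumption covering neutral atoms with explicit hydrogens only; charged atoms, metals and other special chemistry would need additional rules.

```python
from typing import Dict, List, Tuple

# Assumed valences for neutral atoms with all hydrogens explicit.
ALLOWED_VALENCE: Dict[str, Tuple[int, ...]] = {
    "H": (1,), "C": (4,), "N": (3,), "O": (2,),
    "P": (3, 5), "S": (2, 4, 6),
    "F": (1,), "CL": (1,), "BR": (1,), "I": (1,),
}


def valence_errors(elements: Dict[str, str], bonds: List[Tuple[str, str, int]]) -> List[str]:
    """Report atoms whose summed bond orders are not an allowed valence.

    Elements missing from ALLOWED_VALENCE (e.g. metals) are skipped.
    """
    total = {name: 0 for name in elements}
    for a, b, order in bonds:
        total[a] += order
        total[b] += order
    errors = []
    for name, element in elements.items():
        allowed = ALLOWED_VALENCE.get(element.upper())
        if allowed is not None and total[name] not in allowed:
            errors.append(f"{name} ({element}): valence {total[name]}, expected {allowed}")
    return errors


# Formaldehyde is fine; adding a third hydrogen makes carbon pentavalent.
atoms = {"C": "C", "O": "O", "H1": "H", "H2": "H"}
bonds = [("C", "O", 2), ("C", "H1", 1), ("C", "H2", 1)]
print(valence_errors(atoms, bonds))  # []
print(valence_errors({**atoms, "H3": "H"}, bonds + [("C", "H3", 1)]))
# ['C (C): valence 5, expected (4,)']
```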
Replaced most of the MSDChem backend
Additional investment in chemical software
Use of chemical software packages
CACTVS
OpenEye
CORINA
LexiChem
MSDChem is no longer a separate data resource
It is just a load of the wwPDB ligand dictionary into Oracle
IUPAC atom names, deoxy-bases, better chemical names
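To show what loading the dictionary into a relational database involves, here is a sketch that uses Python's built-in `sqlite3` in place of Oracle. The table and column names mirror the mmCIF categories of the chemical component dictionary (`chem_comp`, `chem_comp_atom`, `chem_comp_bond`), but the schema and the `load` helper are assumptions, and water serves as a minimal illustrative record.

```python
import sqlite3

# Assumed schema; names follow mmCIF chem_comp* categories, layout is illustrative.
SCHEMA = """
CREATE TABLE chem_comp (id TEXT PRIMARY KEY, name TEXT, formula TEXT);
CREATE TABLE chem_comp_atom (comp_id TEXT REFERENCES chem_comp(id),
                             atom_id TEXT, type_symbol TEXT,
                             PRIMARY KEY (comp_id, atom_id));
CREATE TABLE chem_comp_bond (comp_id TEXT REFERENCES chem_comp(id),
                             atom_id_1 TEXT, atom_id_2 TEXT, value_order TEXT);
"""


def load(conn, comp, atoms, bonds):
    """Insert one component: (id, name, formula), [(atom, element)], [(a1, a2, order)]."""
    conn.execute("INSERT INTO chem_comp VALUES (?, ?, ?)", comp)
    conn.executemany("INSERT INTO chem_comp_atom VALUES (?, ?, ?)",
                     [(comp[0], atom, element) for atom, element in atoms])
    conn.executemany("INSERT INTO chem_comp_bond VALUES (?, ?, ?, ?)",
                     [(comp[0], a, b, order) for a, b, order in bonds])


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
load(conn, ("HOH", "WATER", "H2 O"),
     [("O", "O"), ("H1", "H"), ("H2", "H")],
     [("O", "H1", "SING"), ("O", "H2", "SING")])

# Example query: element counts for one component.
rows = conn.execute(
    "SELECT type_symbol, COUNT(*) FROM chem_comp_atom "
    "WHERE comp_id = ? GROUP BY type_symbol ORDER BY type_symbol", ("HOH",)
).fetchall()
print(rows)  # [('H', 2), ('O', 1)]
```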
Difficult Issues
Molecules too big to be a single chemical component
Special chemistry (like metal complexes)
Limitations of chemical software
Legacy chemical components that are hard to deal with (like ions)
Components that have never been fully observed
Modified components
The MSDChem web application
Public pages for the wwPDB ligand dictionary
Based on an Oracle database load
Various search options
Visualisation and navigation
Exporting in other formats
Has been running for almost 6 years
It is used and referenced by
Ligand Depot (RCSB equivalent)
ChEBI at EBI
PubChem at NCBI
HIC-Up and others
Statistics
Charts: hits per location (edu, uk, ebi, other, eu, com, net) and number of ligands, 2000 to 2007
Daily average load of MSDChem
~400 queries
~100 distinct IP addresses
Searching from a reference
Most common case: search for a 3-letter code seen in a PDB file
Search for a chemical name, or part of one, found in the literature
All known names are searched
Common, PDB
Systematic
A synonym
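The name search can be sketched as an exact match on the 3-letter code combined with a case-insensitive substring match over every known name. The record layout (`code`, `pdb_name`, `systematic_name`, `synonyms`) and `search` are hypothetical, and the two records are illustrative, not dictionary extracts.

```python
from typing import Dict, List

# Illustrative records only; field names are hypothetical.
COMPONENTS: List[Dict] = [
    {"code": "HOH", "pdb_name": "WATER", "systematic_name": "", "synonyms": []},
    {"code": "ATP", "pdb_name": "ADENOSINE-5'-TRIPHOSPHATE",
     "systematic_name": "", "synonyms": ["adenosine triphosphate"]},
]


def search(components: List[Dict], query: str) -> List[str]:
    """Match the 3-letter code exactly, or any known name by case-insensitive substring."""
    q = query.strip().upper()
    hits = []
    for comp in components:
        names = [comp["pdb_name"], comp["systematic_name"], *comp["synonyms"]]
        if comp["code"] == q or any(q in name.upper() for name in names if name):
            hits.append(comp["code"])
    return hits


print(search(COMPONENTS, "hoh"))           # ['HOH']
print(search(COMPONENTS, "triphosphate"))  # ['ATP']
```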
MSDChem search
3 letter code
Chemical name
Common, PDB
Systematic
A synonym
Ligand details
For every kind of search there is a result list
Summary information
Preview icon of the molecule
Links to pages for every chemical component
With detailed images
Links for more information about atoms, bonds, etc.
Various options for 3-D visualization
Download options for common chemical formats
Screenshots: results overview; ligand overview
Visualisation - Export
Coordinates
Ideal
Representative
Chemical formats
PDB
Molfile (SDF)
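Export to Molfile can be sketched as a minimal MDL V2000 writer. It covers only atoms, coordinates and bond orders (no charges, stereo flags or property blocks); the `to_molfile`/`to_sdf` helpers and the program line are hypothetical, and the files MSDChem actually exports may carry more information.

```python
from typing import List, Tuple

Atom = Tuple[str, float, float, float]  # (element, x, y, z)
Bond = Tuple[int, int, int]             # (1-based atom, 1-based atom, bond order)


def to_molfile(name: str, atoms: List[Atom], bonds: List[Bond]) -> str:
    """Write a minimal V2000 connection table (no charges, stereo or properties)."""
    lines = [name, "  sketch", ""]  # header: name, program line, comment
    # Counts line: atoms, bonds, eight unused fields, 999, version tag.
    lines.append(f"{len(atoms):3d}{len(bonds):3d}  0  0  0  0  0  0  0  0999 V2000")
    for element, x, y, z in atoms:
        # 3 x 10.4f coordinates, element in 3 columns, then zeroed atom fields.
        lines.append(f"{x:10.4f}{y:10.4f}{z:10.4f} {element:<3} 0" + "  0" * 11)
    for a, b, order in bonds:
        lines.append(f"{a:3d}{b:3d}{order:3d}  0")
    lines.append("M  END")
    return "\n".join(lines) + "\n"


def to_sdf(records: List[str]) -> str:
    """Concatenate molfile records into an SD file, each terminated by $$$$."""
    return "".join(record + "$$$$\n" for record in records)


water = to_molfile("HOH",
                   [("O", 0.0, 0.0, 0.0), ("H", 0.9572, 0.0, 0.0), ("H", -0.24, 0.9266, 0.0)],
                   [(1, 2, 1), (1, 3, 1)])
print(to_sdf([water]))
```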
Searching for chemical composition
Often some aspects of the composition are known, but not the exact structure
Like particular elements (metals, etc.)
Or particular chemical fragments
User-friendly expression-building pages based on formula or fragments
Visually browse through the results
Formula range
Expressions can be built with a web form
Example: O1-4 N3-100 F0
1 to 4 oxygens
3 to 100 nitrogens
No fluorine
Anything else
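The formula-range expression can be parsed and evaluated in a few lines. The grammar is inferred from the single example above and is an assumption: each term is an element symbol followed by `min-max`, a lone number means an exact count, and elements not mentioned are unconstrained. Component formulas are assumed to be space-separated, as in `C9 H14 N3 O7 P`; all function names are hypothetical.

```python
import re
from typing import Dict, Tuple

# One element per token; the second letter may be lower- or upper-case ("Cl" or "CL").
_QUERY_TERM = re.compile(r"([A-Z][A-Za-z]?)(\d+)(?:-(\d+))?")
_FORMULA_TERM = re.compile(r"([A-Z][A-Za-z]?)(\d*)")


def parse_query(expr: str) -> Dict[str, Tuple[int, int]]:
    """'O1-4 N3-100 F0' -> {'O': (1, 4), 'N': (3, 100), 'F': (0, 0)}."""
    ranges: Dict[str, Tuple[int, int]] = {}
    for token in expr.split():
        m = _QUERY_TERM.fullmatch(token)
        if m is None:
            raise ValueError(f"cannot parse query term {token!r}")
        low = int(m.group(2))
        high = int(m.group(3)) if m.group(3) else low  # lone number = exact count
        ranges[m.group(1).capitalize()] = (low, high)
    return ranges


def parse_formula(formula: str) -> Dict[str, int]:
    """'C9 H14 N3 O7 P' -> {'C': 9, 'H': 14, 'N': 3, 'O': 7, 'P': 1}."""
    counts: Dict[str, int] = {}
    for token in formula.split():
        m = _FORMULA_TERM.fullmatch(token)
        if m is None:
            raise ValueError(f"cannot parse formula term {token!r}")
        element = m.group(1).capitalize()
        counts[element] = counts.get(element, 0) + int(m.group(2) or 1)
    return counts


def matches(formula: str, expr: str) -> bool:
    """True if every constrained element count lies within its range."""
    counts = parse_formula(formula)
    return all(low <= counts.get(el, 0) <= high
               for el, (low, high) in parse_query(expr).items())


query = "O1-4 N3-100 F0"
print(matches("C4 H5 N3 O", query))     # True  (cytosine)
print(matches("C4 H4 F N3 O", query))   # False (contains fluorine)
print(matches("C9 H14 N3 O7 P", query)) # False (7 oxygens)
```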
Fragment search
Web form
Significant fragments
Example:
More than 2 benzimidazoles
No piperazine
Anything else
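Fragment constraints can be evaluated the same way once fragment counts are known. How MSDChem perceives fragments is not described, so this sketch assumes the counts are precomputed; `fragment_match` and the codes `LG1` to `LG3` are hypothetical. "More than 2" is encoded as a minimum of 3 with no upper limit.

```python
from typing import Dict, Optional, Tuple

Range = Tuple[int, Optional[int]]  # (minimum count, maximum count or None for no limit)


def fragment_match(counts: Dict[str, int], query: Dict[str, Range]) -> bool:
    """True if every constrained fragment count lies in range; others are ignored."""
    for fragment, (low, high) in query.items():
        n = counts.get(fragment, 0)
        if n < low or (high is not None and n > high):
            return False
    return True


# More than 2 benzimidazoles, no piperazine, anything else.
query: Dict[str, Range] = {"benzimidazole": (3, None), "piperazine": (0, 0)}

# Hypothetical precomputed fragment counts for made-up components.
library = {
    "LG1": {"benzimidazole": 3, "phenyl": 1},
    "LG2": {"benzimidazole": 4, "piperazine": 1},
    "LG3": {"benzimidazole": 2},
}
print([code for code, c in library.items() if fragment_match(c, query)])  # ['LG1']
```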
Searching for parts of structure
An outline of the structure, or of some characteristic part, is known
Looking for variants of molecules
Load the known target and remove the unimportant parts
Perform a subgraph search
Looking for chemical components with similar fragments and localized chemistry
Load the known target and perform a fingerprint search
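A fingerprint search compares bit sets instead of matching graphs exactly. The sketch below is a generic path-based fingerprint with Tanimoto similarity, not MSDChem's actual fingerprint: every simple path of up to `max_len` bonds is written as a direction-independent element/bond-order string and hashed into `N_BITS` positions with `zlib.crc32`, which, unlike Python's built-in `hash`, is stable across runs. The function names and `N_BITS` are assumptions.

```python
import zlib
from typing import Dict, List, Set, Tuple

N_BITS = 1024  # assumed fingerprint length


def path_fingerprint(elements: Dict[int, str],
                     bonds: List[Tuple[int, int, int]],
                     max_len: int = 3) -> Set[int]:
    """Set bits for every simple path of 0..max_len bonds (element + bond-order labels)."""
    adj: Dict[int, List[Tuple[int, int]]] = {a: [] for a in elements}
    for a, b, order in bonds:
        adj[a].append((b, order))
        adj[b].append((a, order))

    bits: Set[int] = set()

    def walk(path: List[int], tokens: List[str]) -> None:
        # Use the smaller of the forward/reverse labels so direction does not matter.
        label = min("|".join(tokens), "|".join(reversed(tokens)))
        bits.add(zlib.crc32(label.encode("utf-8")) % N_BITS)
        if len(path) - 1 == max_len:
            return
        for nbr, order in adj[path[-1]]:
            if nbr not in path:
                walk(path + [nbr], tokens + [str(order), elements[nbr]])

    for start in elements:
        walk([start], [elements[start]])
    return bits


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """|A & B| / |A | B|; defined as 1.0 when both fingerprints are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Heavy-atom graphs: ethanol (C-C-O) and dimethyl ether (C-O-C).
ethanol = path_fingerprint({0: "C", 1: "C", 2: "O"}, [(0, 1, 1), (1, 2, 1)])
ether = path_fingerprint({0: "C", 1: "O", 2: "C"}, [(0, 1, 1), (1, 2, 1)])
print(tanimoto(ethanol, ethanol))  # 1.0
print(tanimoto(ethanol, ether))
```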
Substructure search
Applet to draw a diagram
Load and modify an existing ligand
May take a couple of minutes
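The subgraph search itself can be sketched as a backtracking match that maps each query atom onto a distinct target atom of the same element, requiring every query bond to appear in the target with the same order (extra target bonds are allowed, as is usual for substructure search). This is a generic, hypothetical implementation, not the applet's; substructure matching is NP-complete in general, which is consistent with searches over a whole dictionary taking minutes.

```python
from typing import Dict, List, Optional, Tuple

Bond = Tuple[int, int, int]  # (atom, atom, bond order)


def adjacency(elements: Dict[int, str], bonds: List[Bond]) -> Dict[int, Dict[int, int]]:
    """Neighbour -> bond order, for every atom."""
    adj: Dict[int, Dict[int, int]] = {a: {} for a in elements}
    for a, b, order in bonds:
        adj[a][b] = order
        adj[b][a] = order
    return adj


def substructure_match(q_elements: Dict[int, str], q_bonds: List[Bond],
                       t_elements: Dict[int, str], t_bonds: List[Bond]) -> Optional[Dict[int, int]]:
    """Return one query->target atom mapping, or None if the query does not occur."""
    q_adj = adjacency(q_elements, q_bonds)
    t_adj = adjacency(t_elements, t_bonds)
    q_atoms = list(q_elements)
    mapping: Dict[int, int] = {}

    def extend(k: int) -> bool:
        if k == len(q_atoms):
            return True
        qa = q_atoms[k]
        used = set(mapping.values())
        for ta in t_elements:
            if ta in used or t_elements[ta] != q_elements[qa]:
                continue
            # Bonds from qa to already-mapped query atoms must exist with the same order.
            if all(t_adj[ta].get(mapping[qb]) == order
                   for qb, order in q_adj[qa].items() if qb in mapping):
                mapping[qa] = ta
                if extend(k + 1):
                    return True
                del mapping[qa]  # backtrack
        return False

    return dict(mapping) if extend(0) else None


carbonyl = ({0: "C", 1: "O"}, [(0, 1, 2)])
acetic_acid = ({0: "C", 1: "C", 2: "O", 3: "O"}, [(0, 1, 1), (1, 2, 2), (1, 3, 1)])
ethanol = ({0: "C", 1: "C", 2: "O"}, [(0, 1, 1), (1, 2, 1)])
print(substructure_match(*carbonyl, *acetic_acid))  # {0: 1, 1: 2}
print(substructure_match(*carbonyl, *ethanol))      # None
```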
Links to the PDB
MSDchem searches only the reference dictionary
But it provides links to the PDB entries that include a ligand or a set of ligands
From ligand details pages
And from any query results page
Links to the summary pages for the entries (MSD Atlas pages)
Or to instances of the ligands in entries, along with their environment and interactions (MSDmotif)
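Linking from a ligand, or a set of ligands, to the entries that contain them amounts to an inverted index from component code to entry IDs, intersected over the query codes. The entry identifiers below are hypothetical placeholders, and the helper names are illustrative.

```python
from typing import Dict, Iterable, List, Set


def build_inverted_index(entry_ligands: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Map each component code to the set of entries that contain it."""
    index: Dict[str, Set[str]] = {}
    for entry, codes in entry_ligands.items():
        for code in codes:
            index.setdefault(code, set()).add(entry)
    return index


def entries_with_all(index: Dict[str, Set[str]], codes: Iterable[str]) -> List[str]:
    """Entries containing every requested code (empty list for an empty query)."""
    sets = [index.get(code, set()) for code in codes]
    return sorted(set.intersection(*sets)) if sets else []


# Hypothetical entry IDs; real links would point at MSD Atlas or MSDmotif pages.
entries = {
    "entry_a": {"ATP", "MG"},
    "entry_b": {"ATP"},
    "entry_c": {"MG", "HOH"},
}
index = build_inverted_index(entries)
print(entries_with_all(index, ["ATP"]))        # ['entry_a', 'entry_b']
print(entries_with_all(index, ["ATP", "MG"]))  # ['entry_a']
```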
Link to PDB
From any result page
Like a fragment search
Link to PDB entries with such ligands
Link to binding sites
Details - interactions of these ligands in entries
Statistics – search within results
Ligand index – download
Download of the complete archive
Compressed tar archives of
Molfiles (SDF)
CML (ChEBI style)
MSDChem XML
Relational database
Just listings
SMILES strings and names
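Reading the downloaded SDF archive needs only the standard library. The archive layout is not described here, so this sketch assumes a gzip-compressed tar whose SD files end in `.sdf`, use UTF-8 text and terminate each record with a `$$$$` line; `sdf_records` is a hypothetical helper, and a tiny in-memory archive stands in for the real download.

```python
import io
import tarfile
from typing import Iterator, Tuple


def sdf_records(archive: tarfile.TarFile) -> Iterator[Tuple[str, str]]:
    """Yield (member name, record text) for each SD record in every .sdf member."""
    for member in archive.getmembers():
        if not (member.isfile() and member.name.endswith(".sdf")):
            continue
        handle = archive.extractfile(member)
        text = handle.read().decode("utf-8").replace("\r\n", "\n")
        # Split on the $$$$ record terminator; skip the empty tail after the last one.
        for chunk in text.split("$$$$\n"):
            if chunk.strip():
                yield member.name, chunk


# Usage with a tiny in-memory archive standing in for the real download.
dummy = "HOH\n  sketch\n\n  0  0  0  0  0  0  0  0  0  0999 V2000\nM  END\n$$$$\n"
payload = (dummy + dummy.replace("HOH", "ATP", 1)).encode("utf-8")
buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w:gz") as tar:
    info = tarfile.TarInfo("ligands/all.sdf")
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))
buffer.seek(0)

with tarfile.open(fileobj=buffer, mode="r:gz") as archive:
    records = list(sdf_records(archive))
print([text.split("\n", 1)[0] for _, text in records])  # ['HOH', 'ATP']
```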
Summary
The wwPDB ligand dictionary provides the chemistry of the PDB
The MSDChem backend has been merged into the remediation project
The state of the dictionary has improved
The MSDChem web application lets users search the dictionary by
Name
Formula
Substructure
Fragments - similarity