Transcript talk

MSDchem and the
chemistry of the
wwPDB
EMBO 22nd-26th September 2008
EMBL-EBI Hinxton UK
EBI is an Outstation of the European Molecular Biology Laboratory.
The PDB Chemical components
PDB has more than the folding of standard
polymers in 3-D
It gives insight into interesting special chemistry
Bound ligands
Modified amino acids
Non-standard chemical components are often the
most interesting
The PDB ligand dictionary has served for many
years
As the reference dictionary for the chemical
definition of 3 letter codes in the PDB data
The ligand dictionary has been maintained by the
curators in all wwPDB sites
Problems accumulated
Duplicate entries
Impossible chemistry
The definition of what a 3 letter code represents
was not clear and consistent
Stereo-chemistry was ignored
The MSDchem database
The database that supported the chemical
component dictionary in the MSD.
The curation team had an explicit, clear definition
of a ligand, right from the start
A distinct stereo-isomer;
connectivity,
bond orders,
absolute stereo-descriptors of atoms and bonds
This was reflected in the design and the
implementation of the MSDChem database
MSDchem ligand definition
The ligand identity
Atom, elements, bonds and bond orders
Atom and bond absolute stereo-descriptors
(Cahn-Ingold-Prelog)
Equivalent to a canonical stereo SMILES or
InChI string
DCF
C4' R
C3' S
C1' R
DCM
C4' S
C3' R
C1' S
Other properties
Atom names, and atom/bond ordering
Representative coordinates
Derived properties
Aromatic bonds
SMILES and InChI strings
Systematic names
Idealised coordinates
Rings – planes
Atom Energy types
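A minimal sketch of the identity rule on this slide, assuming a component is keyed by its bond table plus its CIP descriptors and that both components share one atom naming, so no graph canonicalisation is needed. The `Component` class and the three-bond connectivity are hypothetical placeholders; only the R/S descriptors for DCF and DCM come from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    code: str          # 3-letter code (a label, not part of the identity)
    bonds: frozenset   # {(atom_1, atom_2, bond_order)}
    stereo: frozenset  # {(atom_name, CIP descriptor)}

    def identity(self):
        # Connectivity, bond orders and stereo-descriptors define the ligand.
        return (self.bonds, self.stereo)


# Placeholder connectivity (assumption): only a fragment of the sugar ring.
SHARED_BONDS = frozenset({("C1'", "C2'", 1), ("C2'", "C3'", 1), ("C3'", "C4'", 1)})

dcf = Component("DCF", SHARED_BONDS,
                frozenset({("C4'", "R"), ("C3'", "S"), ("C1'", "R")}))
dcm = Component("DCM", SHARED_BONDS,
                frozenset({("C4'", "S"), ("C3'", "R"), ("C1'", "S")}))

same_connectivity = dcf.bonds == dcm.bonds
same_identity = dcf.identity() == dcm.identity()
print(same_connectivity, same_identity)
```

With stereo-descriptors left out of the key, the two codes would collapse into one entry, which is the kind of ambiguity the older dictionary suffered from.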
Ligand curation
For known ligands, coordinates are checked
against the ligand definition (Program DOHLC)
Atom labeling is checked
A new ligand may have to be defined
For a new ligand
Fundamental properties are checked
Derived properties are generated
Is it identical to an existing ligand with
another code? (DOHLC)
3TH
Not possible
New ligand
Actually it is
6CP
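The duplicate check can be illustrated with a small graph-matching sketch. This is not the DOHLC algorithm, whose internals the talk does not describe; it is a plain backtracking isomorphism test over element-labelled atoms and bond orders, with stereochemistry and hydrogens ignored for brevity. The function name and the toy molecules are assumptions.

```python
def is_same_graph(atoms_a, bonds_a, atoms_b, bonds_b):
    """atoms: {atom_name: element}; bonds: {frozenset({name_1, name_2}): order}."""
    # Cheap rejections: element composition and bond count must agree.
    if sorted(atoms_a.values()) != sorted(atoms_b.values()):
        return False
    if len(bonds_a) != len(bonds_b):
        return False
    names_a = list(atoms_a)

    def extend(mapping):
        if len(mapping) == len(names_a):
            return True
        a = names_a[len(mapping)]
        for b in atoms_b:
            if b in mapping.values() or atoms_a[a] != atoms_b[b]:
                continue
            # Bonds (or their absence) to every already-mapped atom must match.
            consistent = all(
                bonds_a.get(frozenset({a, prev}))
                == bonds_b.get(frozenset({b, mapping[prev]}))
                for prev in mapping
            )
            if consistent:
                mapping[a] = b
                if extend(mapping):
                    return True
                del mapping[a]
        return False

    return extend({})


# Toy heavy-atom graphs (hypothetical): same chemistry, different atom names.
atoms_1 = {"C1": "C", "O1": "O", "N1": "N"}
bonds_1 = {frozenset({"C1", "O1"}): 2, frozenset({"C1", "N1"}): 1}
atoms_2 = {"NB": "N", "CA": "C", "OX": "O"}
bonds_2 = {frozenset({"CA", "OX"}): 2, frozenset({"CA", "NB"}): 1}
# Same atoms as the second graph, but the double bond has moved.
bonds_3 = {frozenset({"CA", "OX"}): 1, frozenset({"CA", "NB"}): 2}

print(is_same_graph(atoms_1, bonds_1, atoms_2, bonds_2))
print(is_same_graph(atoms_1, bonds_1, atoms_2, bonds_3))
```

A match under a different code flags a candidate duplicate for a curator to review, as in the 3TH and 6CP case above.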
Ligands in the wwPDB
Improvement of the chemical dictionary
A core task of the wwPDB remediation project
Remaining issues and data errors were fixed
Duplicate identical ligands
No representative coordinates
Wrong valences
The definition of the ligand identity and the
deviations were agreed among the wwPDB sites
The wwPDB invested significantly in this area
with a new software toolkit (ChemComp)
Replaced most of the MSDChem backend
Additional investment in chemical software
Use of chemical software packages
CACTVS
OpenEye
CORINA
LexiChem
MSDChem not a separate data resource
Just loading of the wwPDB ligand dictionary in
Oracle
IUPAC atom names, deoxy-bases, better
chemical names
Difficult Issues
Molecules too big to be a single chemical
component
Special chemistry (like metal complexes)
Limitations of chemical software
Legacy chemical components that are hard to
deal with (like ions)
Components that have never been fully
observed
Modified components
The MSDChem web application
Public pages for the wwPDB ligand dictionary
Based on an Oracle database load
Various search options
Visualisation and navigation
Exporting in other formats
Has been running for almost 6 years
Is used and referenced by
Ligand Depot (RCSB equivalent)
ChEBI at EBI
PubChem at NCBI
HIC-Up and others
Statistics
[Chart: hits per location (edu, uk, ebi, other, eu, com, net)]
[Chart: number of ligands per year, 2000 to 2007 (axis 0 to 8000)]
Daily average load of MSDChem
~ 400 queries
~ 100 distinct IP addresses
Search following references
Most common case: search for a 3 letter code
seen in a PDB file
Search for a chemical name or part of it found in
the literature
All known names are searched
Common, PDB
Systematic
A synonym
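A name search over all known names might look like the sketch below, assuming a case-insensitive substring match. The record layout and the `search_by_name` helper are hypothetical and do not reflect the MSDChem schema; the two records are illustrative.

```python
# Illustrative records (assumed layout): PDB name, systematic name, synonyms.
RECORDS = [
    {"code": "EOH", "name": "ETHANOL", "systematic": "ethanol",
     "synonyms": ["ethyl alcohol"]},
    {"code": "GOL", "name": "GLYCEROL", "systematic": "propane-1,2,3-triol",
     "synonyms": ["glycerin"]},
]


def search_by_name(records, text):
    """Return the codes of records where any known name contains `text`."""
    needle = text.lower()
    hits = []
    for rec in records:
        names = [rec["name"], rec["systematic"], *rec["synonyms"]]
        if any(needle in name.lower() for name in names):
            hits.append(rec["code"])
    return hits


print(search_by_name(RECORDS, "glycer"))
print(search_by_name(RECORDS, "triol"))
print(search_by_name(RECORDS, "ol"))
```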
MSDChem search
3 letter code
Chemical name
Common, PDB
Systematic
A synonym
Ligand details
For every kind of search there is a result list
Summary information
Preview icon of the molecule
Links to pages for every chemical component
With detailed images
Links for more information about atoms, bond etc.
Various options for 3-D visualization
Download options for common chemical formats
Ligand details
Results overview
Ligand details
Ligand overview
Visualisation - Export
Coordinates
Ideal
Representative
Chemical formats
PDB
Molfile (SDF)
Searching for chemical composition
Often aspects of composition are known but not the
exact structure
Like particular elements (metals etc.)
Or particular chemical fragments
User friendly expression building pages based on
formula or fragments
Visually browse through the results
Formula range
Expression can be
built with web form
Example :
O1-4 N3-100 F0
1 to 4 oxygens
3 or more nitrogens (up to 100)
No Fluorine
Anything else
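The formula-range expression can be read as one count interval per element. The sketch below assumes inclusive bounds, as the reading of O1-4 suggests, and treats a single number as an exact count. The parser and function names are hypothetical, not the MSDChem implementation.

```python
import re

# One token per element: symbol, lower bound, optional "-upper bound".
TOKEN = re.compile(r"^([A-Z][a-z]?)(\d+)(?:-(\d+))?$")


def parse_formula_range(expr):
    """Turn 'O1-4 N3-100 F0' into {'O': (1, 4), 'N': (3, 100), 'F': (0, 0)}."""
    limits = {}
    for token in expr.split():
        match = TOKEN.match(token)
        if match is None:
            raise ValueError(f"cannot parse token: {token!r}")
        element, low, high = match.group(1), int(match.group(2)), match.group(3)
        limits[element] = (low, int(high) if high is not None else low)
    return limits


def matches(composition, limits):
    """Elements not named in the expression are unconstrained ('anything else')."""
    return all(low <= composition.get(element, 0) <= high
               for element, (low, high) in limits.items())


limits = parse_formula_range("O1-4 N3-100 F0")
print(limits)
print(matches({"C": 10, "H": 12, "N": 4, "O": 2}, limits))          # passes all three
print(matches({"C": 6, "H": 4, "N": 3, "O": 1, "F": 1}, limits))    # has fluorine
print(matches({"C": 2, "H": 6, "O": 1}, limits))                    # no nitrogen
```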
Fragment search
Web form
Significant
fragments
Example :
More than 2
benzimidazoles
No piperazine
Anything else
Searching for parts of structure
An outline of the structure or of some characteristic
part is known
Looking for variants of molecules
Load the known target and remove the unimportant parts
Perform a subgraph search
Looking for chemical components with similar
fragments and localized chemistry
Load the known target and perform a fingerprint search
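A fingerprint search ranks components by how many features they share with the target. The talk does not name the similarity measure, so the sketch below assumes the Tanimoto coefficient over sets of fragment labels. The codes, fragment sets and threshold are hypothetical.

```python
def tanimoto(fp_a, fp_b):
    """Shared features divided by all features present in either fingerprint."""
    union = fp_a | fp_b
    if not union:
        return 0.0
    return len(fp_a & fp_b) / len(union)


def rank_by_similarity(target, library, threshold=0.5):
    """Return (code, score) pairs at or above the threshold, best first."""
    scored = [(code, tanimoto(target, fp)) for code, fp in library.items()]
    kept = [pair for pair in scored if pair[1] >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Hypothetical fingerprints: each component is a set of fragment labels.
target = {"benzimidazole", "amide", "phenyl"}
library = {
    "LIG_A": {"benzimidazole", "amide", "phenyl", "hydroxyl"},
    "LIG_B": {"piperazine", "phenyl"},
    "LIG_C": {"benzimidazole", "amide", "phenyl"},
}

ranked = rank_by_similarity(target, library)
print(ranked)
```

Unlike the subgraph search, this comparison needs no atom-by-atom matching, which is why it suits the "similar fragments and localized chemistry" case.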
Substructure search
Applet to draw
diagram
Load and modify
existing ligand
May take a
couple of
minutes
Links to the PDB
MSDchem searches only the reference
dictionary
But provides links to the PDB entries that
include a ligand or a set of ligands
From ligand details pages
And from any query results page
Links to the summary pages for the entries
(MSD Atlas pages)
Or instances of the ligands in entries along with
their environment and interactions (MSDmotif)
Link to PDB
From any result page
Like a fragment search
Link to PDB entries with such ligands
Link to Binding sites
Details - interactions of these ligands in entries
Statistics – search within results
Ligand index – download
Download of the
complete archive
Compressed tar of
Molfiles (SDF)
CML (ChEBI style)
MSDChem XML
Relational database
Just listings
SMILES strings and names
Summary
The wwPDB ligand dictionary provides the
chemistry of the PDB
The MSDChem backend has been merged into the
remediation project
The state of the dictionary has improved
The MSDChem web application provides searching
of the dictionary
Name
Formula
Substructure
Fragments - similarity