Transcript - ChemAxon

What’s new in JChem back-end and Markush storage, search and enumeration
Szabolcs Csepregi
Solutions for Cheminformatics
Contents
• ChemAxon chemical database tools
• Main features of JChem Base, Cartridge
• Example interfaces: JSP, ASP, AJAX
• Integration with other ChemAxon products
• Markush structure storage, search and enumeration
• Recent developments, plans
Chemical database products
JChem Base
– A library for adding chemical structures to relational database systems. Available for Java, JSP and .NET.
– An open-source web application example is available.
JChem Cartridge for Oracle
– Extends Oracle SQL with chemical operators and index.
– SQL interface for ChemAxon functionality
Instant JChem
– An all-in-one desktop chemical database application.
JChem Web Services – SOAP interface to JChem Base
JC4XL – Excel integration (coming)
Compatibility and integration
Supported chemical file formats:
• SMILES
• MDL MOL/RXN/SDF/RDF (v2000 and v3000)
• CML, MRV
• IUPAC and traditional names
• InChI, mol2, PDB, etc.
Database engines:
• Oracle, MySQL, MS SQL Server, MS Access, PostgreSQL, IBM DB2, Derby, etc.
All operating systems through:
• Java API (JChem Base)
• .NET API (JChem Base + IKVM) – for Windows
• SQL (Cartridge)
Structure searching: features
• Search types: substructure, similarity, full, full fragment, etc.
• Wide range of query atoms
• Query properties
• R-group queries
• Full SMARTS support
• Coordination compounds
• Link nodes
• Pseudo atoms, lone pairs
• Relative stereo
• Reaction search features
• Polymers
• Position variation
• Hit coloring ...
www.chemaxon.com/conf/Structural_Search.ppt
Structure searching: options
Some selected structure search options:
– Chemical Terms filter constraint
– Tautomer search
– Stereo on/off
– Ignore charge/isotope/radical/valence/polymers
– Vague bond matching modes: "or aromatic"; ignore bond types
– Inverse hit list
– Maximum search time / number of hits
– SQL SELECT statement for pre-filtering
– Ordering of results
– etc.
Structure search: performance
Test environment: JChem Base 5.2.0, Intel Quad Q6600 2.4 GHz, 8 GB RAM; Oracle 10.2.0.3

Compound registration (elapsed time):
Number of compounds    Duplicates not checked    Duplicates checked
10,000                 21 s                      26 s
100,000                2 min                     2 min 36 s
200,000                3 min 45 s                5 min 5 s

Substructure search in PubChem (19.5 million compounds), one row per query structure:
Number of hits    Search time
2                 0.81 s
93                0.79 s
5,855             1.457 s
142,950           11.076 s
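To put the registration timings above into perspective, the short Python sketch below converts them into average throughput and into the relative cost of duplicate checking. It is plain arithmetic on the numbers from the slide; the function names are made up for this illustration and are not part of JChem.

```python
# Registration timings from the slide, converted to seconds:
# (number of compounds, duplicates not checked, duplicates checked)
timings = [
    (10_000, 21, 26),
    (100_000, 2 * 60, 2 * 60 + 36),
    (200_000, 3 * 60 + 45, 5 * 60 + 5),
]


def throughput(compounds, seconds):
    """Average registration rate in compounds per second."""
    return compounds / seconds


def duplicate_check_overhead(unchecked_s, checked_s):
    """Extra elapsed time caused by duplicate checking, as a percentage."""
    return (checked_s - unchecked_s) / unchecked_s * 100


for n, unchecked_s, checked_s in timings:
    print(
        f"{n:>7,} compounds: "
        f"{throughput(n, unchecked_s):.0f}/s unchecked, "
        f"{throughput(n, checked_s):.0f}/s checked, "
        f"+{duplicate_check_overhead(unchecked_s, checked_s):.0f}% time"
    )
```

With these figures, duplicate checking adds roughly 24% to 36% to the elapsed registration time.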
Table types
The table type controls which chemical structures are allowed and which operations are available:
• Molecule
• Reaction
• Markush
• Query
• Any structure
Example web applications
Open-source JSP and ASP examples
– Marvin applets are used for query drawing and structure visualization
AJAX example
– Back-end is JChem Web Services
– No Java is needed for browsing
Demo
Integration
Integration with other ChemAxon tools:
– Custom, uniform chemical representation (Standardizer; see separate presentation today)
– Properties calculated automatically by Chemical Terms; calculated columns (Calculator plugins)
– Additional similarity calculations (Screen; JChem Base only)
– Tautomer handling:
• Tautomer search
• Tautomer duplicate filter table/index option
• Custom tautomer transforms or canonical tautomer using Standardizer
– Query drawing and structure visualization (Marvin)
Provides the most consistent interface and back-end.
Integration
Additional Cartridge functionality
– JChem index (for non-JChem tables)
– Communication with Oracle optimizer
– Reaction-based enumeration (Reactor)
– Format conversions, including image generation
– Markush enumeration (Calculator plugins)
– Property predictions through Chemical Terms (Calculator plugins)
Registration system
• A new component for building registration systems is under development (API only)
• Main features:
– Customizable business logic
• Multilevel duplication control
• Customizable corporate registration ID
• Handling of salts, batches, lots, samples, and mixtures
– Identification, splitting and registration of salt and solvent structures
– Storage of input structures in original format
– Mock registration (dry run)
– Pre-registration through a transitory area
– Basic, customizable implementation examples
• Separate examples for chemists and registrars
• Web and Instant JChem interfaces will follow later
Handling of Markush structures
Markush structures
• Combinatorial Markush structure registration and search
• Features handled in search and enumeration:
– R-groups (nesting to any depth)
– Atom lists, bond lists
– Position variation bond
– Link nodes
– Repeating units
– Homology groups (aryl, alkyl, etc.): built-in and user-defined
• Compatible Markush enumeration plugin
Markush Enumeration
• Markush enumeration plugin
– Full enumeration
– Selected parts only
– Random enumeration
– Calculate library size: exact size of huge Markush libraries (arbitrary precision) or magnitude
– Scaffold alignment and coloring
– Markush code
– Optional example homology group enumeration
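The library size calculation mentioned above can be illustrated with a small Python sketch. It is not the plugin's implementation: the nested-tuple data model and the names `library_size`, `magnitude` and `leaf` are assumptions made up for this example, and symmetry-related duplicates are ignored. The idea it shows is that alternatives of one R-group add up, independent R-groups multiply, and arbitrary-precision integers keep the result exact however large it gets.

```python
# Hypothetical toy model: a fragment is (label, {R-group name: [alternative fragments]}).
def library_size(fragment):
    """Exact number of enumerated structures for a fragment with nested R-groups."""
    _label, rgroups = fragment
    size = 1
    for alternatives in rgroups.values():
        # Alternatives of one R-group add up; independent R-groups multiply.
        size *= sum(library_size(alt) for alt in alternatives)
    return size


def magnitude(size):
    """Order of magnitude (floor of log10), read off the exact integer."""
    return len(str(size)) - 1


def leaf(label):
    """A fragment without further R-groups."""
    return (label, {})


alkyls = [leaf(f"C{n}-alkyl") for n in range(1, 11)]          # 10 alternatives
aryl = ("aryl", {"R3": [leaf("H"), leaf("F"), leaf("Cl")]})   # nested R-group, 3 variants
scaffold = ("core", {"R1": alkyls + [aryl], "R2": alkyls})    # (10 + 3) * 10 structures

# A much larger toy library: 40 independent R-groups with 10 alternatives each.
big = ("core", {f"R{i}": alkyls for i in range(1, 41)})

print(library_size(scaffold), magnitude(library_size(scaffold)))
print(library_size(big), magnitude(library_size(big)))
```

Python integers are unbounded, so the exact count needs no special handling here; in a language with fixed-width integers a big-integer type would play the same role.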
Markush storage & search
• Available in JChem Base and Instant JChem
• No enumeration involved – can handle very complex Markush structures (tested up to a library size of 10^40, but no explicit limits were built in)
• Substructure and Full structure search
• Basic query features supported
• Substructure hit visualization: "Markush structure reduction"
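Why searching without enumeration scales to very large libraries can be sketched with a deliberately simplified Python model. This is not how JChem matches structures (real Markush search works on molecular graphs); the dictionary representation and the name `is_member` are assumptions for illustration only. The point is that checking each variable position independently costs one lookup per position, while the library itself grows as the product of the alternatives.

```python
from math import prod

# Hypothetical toy Markush structure: each variable position lists its allowed substituents.
substituents = {"H", "F", "Cl", "Br", "Me", "Et", "OMe", "OH", "NH2", "CN"}
markush = {f"R{i}": set(substituents) for i in range(1, 41)}  # 40 positions, 10 choices each


def is_member(markush, candidate):
    """Check a fully specified candidate position by position, without enumerating the library."""
    return candidate.keys() == markush.keys() and all(
        candidate[position] in allowed for position, allowed in markush.items()
    )


library_size = prod(len(allowed) for allowed in markush.values())

inside = {position: "F" for position in markush}
outside = dict(inside, R7="I")  # iodine is not among the allowed substituents

print(library_size)                 # size of the library that is never enumerated
print(is_member(markush, inside))   # decided with 40 set lookups
print(is_member(markush, outside))
```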
Markush demo
What’s new
What’s new: JChem Base
5.1
– Position variation in queries
– New fast & reliable tautomer duplicate search
5.2
– .NET API
– Polymer storage and search
– New query options and features, including searching of attached data, group matching of undefined R-atoms, and repeating units
– Improved substructure search performance
– JChem Web Services
– New metrics for similarity search (Tversky, etc.) (5.2.2)
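For readers unfamiliar with the Tversky metric, here is a minimal Python sketch of the textbook definition on fingerprints represented as sets of "on" bit positions. It uses the general formula S = c / (c + α·a + β·b), where c is the number of common bits and a, b are the bits unique to the query and the target; how JChem parametrizes and evaluates the metric is not stated in the slides, so the function name, argument order and the empty-fingerprint convention are assumptions.

```python
def tversky(query_bits, target_bits, alpha, beta):
    """Tversky similarity of two fingerprints given as sets of set-bit positions."""
    common = len(query_bits & target_bits)
    only_query = len(query_bits - target_bits)
    only_target = len(target_bits - query_bits)
    denominator = common + alpha * only_query + beta * only_target
    # Assumption: two fingerprints with nothing to compare are treated as identical.
    return common / denominator if denominator else 1.0


query = {1, 2, 3, 4}
target = {3, 4, 5, 6, 7, 8}

print(tversky(query, target, 1.0, 1.0))  # alpha = beta = 1 is the Tanimoto coefficient
print(tversky(query, target, 0.5, 0.5))  # alpha = beta = 0.5 is the Dice coefficient
print(tversky(query, target, 1.0, 0.0))  # asymmetric: only unmatched query bits are penalized
```

The asymmetric setting is what distinguishes Tversky from Tanimoto: with a small β, a large target that contains most of the query's features can still score highly.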
What’s new: JChem Base
Polymer support details
• Polymer brackets and properties (type, connectivity, etc.) are considered during search and registration
• Attached data search (optional): data attached to atoms/bonds/brackets
• Equivalence of source-based and structure-based representations is checked (can be switched off):
– Addition to a double bond, e.g. polystyrene
– Polymerization through elimination of water or HCl, e.g. polyester, polyamide
What’s new: JChem Base
Polymer support details (cont.)
• Ladder type polymers
• Phase shifting (for head-to-tail SRUs; can be switched off)
• End group matching:
– * atoms: unspecified end groups
– Search option to switch on/off end group matching
• Copolymer types: co, alt, rnd, blk, grf, xl, mer, mod
• Polymer mixtures
• New search options
What’s new: Cartridge-specific
5.1
– Tautomer duplicate filtering index option
– Alter index option
– Improved import speed (5.1.3)
– Improved upgrade: no need to remove/recreate indices (5.1.4)
5.2
– Interactive installer
– Increased substructure search performance (5.2.2)
– Tversky similarity search (5.2.2)
What’s new: Markush
• New Features
– Homology groups
• 19 built-in groups
• Customizable:
– Examples (for built-in groups, enumeration only)
– Full user-defined homology groups defined by R-group definition
• Marvin templates for easier sketching
– Import reagent files as R-groups
– Position variation and Repeating units
Plans
Plans: JChem Base & Cartridge
JChem Base
• Further speed improvements (SSS, similarity)
• New vague bond level options
• R-group decomposition integration
• Improved support for Screen molecular descriptors
Cartridge
• Screen molecular descriptors (BCUT, pharmacophore similarity, chemical hashed fingerprint, etc.) and metrics (Euclidean, Dice, etc.) for similarity search
• User-defined descriptor fingerprints
• Markush tables and search
• JChem Server, JChem cluster
Plans: Markush
– .VMN import (format used by Merged Markush Service & Derwent World Patents Index)
– Multiple graphical attachment points of R-groups
– Homology variation queries
– Overlap analysis of Markush structures
– Homology group properties (number of atoms, branching points, number of heteroatoms, etc.)
– Conditions for Markush variables
Summary
• JChem Base and Cartridge are comprehensive and efficient
• Markush structure storage, search and enumeration are now approaching coverage of the features used in patents
• Continuous development; improvements are in the pipeline
Find out more
• Product descriptions & links
www.chemaxon.com/products.html
• Forum
www.chemaxon.com/forum
• Presentations and posters
www.chemaxon.com/conf
• Download
www.chemaxon.com/download.html