InstantJChem: a flexible chemical database system
G. Marcou, D. Horvath
Laboratoire d’infochimie, Université de Strasbourg, 1, rue Blaise Pascal, 67000 Strasbourg
Introduction
The goal is to present InstantJChem for the
storage and manipulation of chemical
information
1. General presentation
2. Database search
3. Creation of a database from scratch
What is a database?
A database stores data about a specific subject in an
organized form.
A relational database stores information in
tables that reference one another.
A relational database management system
(RDBMS) is software that manages relational
databases.
InstantJChem is not a database and is not an
RDBMS.
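To make "tables that reference one another" concrete, here is a minimal sketch using Python's built-in sqlite3 module as the RDBMS. The table names (compound, measurement), the column names and the two rows are hypothetical and only serve as an illustration; they are not part of InstantJChem.

```python
import sqlite3

# In-memory database; SQLite plays the role of the RDBMS here.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Two tables: each measurement row references a compound row by its id.
conn.execute("CREATE TABLE compound (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE measurement ("
    " id INTEGER PRIMARY KEY,"
    " compound_id INTEGER NOT NULL REFERENCES compound(id),"
    " boiling_point REAL)"
)

# Hypothetical illustration data (boiling points in degrees Celsius).
conn.executemany(
    "INSERT INTO compound (id, name) VALUES (?, ?)",
    [(1, "methane"), (2, "ethane")],
)
conn.executemany(
    "INSERT INTO measurement (compound_id, boiling_point) VALUES (?, ?)",
    [(1, -161.5), (2, -88.6)],
)

# The inter-reference lets one query combine both tables.
rows = conn.execute(
    "SELECT c.name, m.boiling_point"
    " FROM compound c JOIN measurement m ON m.compound_id = c.id"
    " ORDER BY c.id"
).fetchall()
print(rows)
```

The JOIN follows the reference from measurement to compound, which is the kind of bookkeeping an RDBMS handles on behalf of a front end.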
What is InstantJChem?
InstantJChem is a user-friendly interface between an
RDBMS, chemical information and the user.
User
RDBMS
Chemical Information
Key concepts of InstantJChem
Projects
Schema
Databases and Tables
Entities
Data Trees
Views
Exercise 1
Create a new project named IJCExercises…
Key concept: Project
Project (icon): contains resources and connections to one or more databases.
Exercise 1
…and import the file SC100.SDF into it…
Key concept: Schema
Schema/Database (icon): contains the connection to a database and special tables (JChemProperties).
Key concept: Database and Tables
Table (icon): databases and tables are managed by the RDBMS. They are what actually stores the information.
What can be stored

Standard table
  Type             Description
  Integer          Long integer: 2^32 = 4294967296
  Text             The user can specify text field widths as large as needed.
  Real             Double-precision real number
  Date             Stores dates.
  Boolean          Value is True or False.
  List (Standard)  Stores a list of database items.

JChem table
  Type             Description
  Chemical terms   A list of functions evaluated on chemical structures: logD, pKa, tautomers, ...
  Structure        Chemical structure, created automatically with a JChem table
Key concept: Entities
Entity (icon): an entity is a representation of data.
It is a single interface to conceptually different
types of tables (Standard, Chemical, SQL,
Extractions, etc.).
Key concept: Data Trees
Data Tree (icon): a collection of entities and views.
Organizes information using a hierarchy (parent-child relationships between entities).
Exercise 1
…Customize a browser for it.
Key concept: Views
Views (icon): an interface to data.
For simple data, a spreadsheet view is relevant.
For complex relational data, a form is mandatory.
Exercise 2
In the SC100 database, search for fluorobenzene- and pyridine-containing
molecules. Use Substructure or Similarity search.
Substructure search: 20 hits
Similarity search: 0 hits
Substructure search: 14 hits
Similarity search: 0 hits
Similarity search uses Chemical Hashed
Fingerprints defined at database creation.
Chemical Hashed Fingerprints (CHF)
Efficient annotation to accelerate structure search
• Pattern Length: number of bonds in a pattern
• Fingerprint Length: total number of bits used to store the fingerprint
• Bits per pattern: number of bits that each pattern sets
www.chemaxon.com
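The three parameters above can be illustrated with a toy hashed fingerprint. This is a simplified sketch and not ChemAxon's implementation: molecules are plain atom-label graphs, bond orders and aromaticity are ignored, Pattern Length is treated as the maximum number of bonds in a linear path, and the helper names (linear_paths, fingerprint, tanimoto, ring) are hypothetical. The Tanimoto coefficient is used as the similarity measure because it is a common choice for bit fingerprints; the slides do not state which metric InstantJChem uses.

```python
import hashlib

def linear_paths(atoms, bonds, max_bonds):
    """Collect atom-label paths with 0..max_bonds bonds (no atom visited twice)."""
    neighbors = {i: [] for i in range(len(atoms))}
    for a, b in bonds:
        neighbors[a].append(b)
        neighbors[b].append(a)
    patterns = set()

    def walk(path):
        forward = "-".join(atoms[i] for i in path)
        backward = "-".join(atoms[i] for i in reversed(path))
        # A path and its reverse are the same pattern: keep one canonical form.
        patterns.add(min(forward, backward))
        if len(path) - 1 < max_bonds:
            for nxt in neighbors[path[-1]]:
                if nxt not in path:
                    walk(path + [nxt])

    for start in neighbors:
        walk([start])
    return patterns

def fingerprint(atoms, bonds, pattern_length=3, fp_length=64, bits_per_pattern=2):
    """Return the set of bit positions switched on by the molecule's patterns."""
    bits = set()
    for pattern in linear_paths(atoms, bonds, pattern_length):
        for k in range(bits_per_pattern):
            digest = hashlib.sha256(f"{k}:{pattern}".encode()).digest()
            bits.add(int.from_bytes(digest[:4], "big") % fp_length)
    return bits

def tanimoto(a, b):
    """Shared bits divided by all bits set in either fingerprint."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def ring(n):
    """Bonds of an n-membered ring over atom indices 0..n-1."""
    return [(i, (i + 1) % n) for i in range(n)]

# Toy molecules (heavy atoms only, all bonds treated alike).
benzene = (["C"] * 6, ring(6))
fluorobenzene = (["C"] * 6 + ["F"], ring(6) + [(0, 6)])
pyridine = (["N"] + ["C"] * 5, ring(6))

fp_benzene = fingerprint(*benzene)
fp_fluoro = fingerprint(*fluorobenzene)
fp_pyridine = fingerprint(*pyridine)

# Screening: a target can contain the query only if it has every query bit.
# Benzene is a substructure of fluorobenzene, so this is always True here.
print(fp_benzene <= fp_fluoro)
print(tanimoto(fp_fluoro, fp_pyridine))
```

Because every path of a substructure also occurs in the larger molecule, a cheap bitwise subset test can discard most non-matching targets before the expensive atom-by-atom search. Hash collisions mean the test can pass for a non-match, but it never rejects a true match.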
Exercise 3
Combine molecules 25 and 89 into a pseudo-molecule to perform a
superstructure query.
Exercise 4
Use compound 46 as a Full query and as a Full fragment query to search the
database. Repeat after removing the bromide from the query.
Structure Searches
www.chemaxon.com
Exercise 5
Search for benzene-containing compounds whose name contains
“pyrimidin” and which are annotated as “Good” for aqueous
solubility.
Exercise 6
Search for compounds with at least one aromatic ring containing
at least one nitrogen atom.
Exercise 7
Search for compounds with MolWeight > 200 that do not contain
a benzene ring.
Exercise 8
Search for compounds with MolWeight > 200, then for compounds
without a benzene ring, and take the union of the two hit lists.
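Exercises 7 and 8 differ only in how the two criteria are combined: a single query with both conditions behaves like a set intersection, while merging two hit lists is a set union. A minimal sketch with hypothetical records (the identifiers, weights and flags below are invented for illustration and are not taken from SC100):

```python
# Hypothetical records: (id, MolWeight, contains a benzene ring)
compounds = [
    (1, 150.2, True),
    (2, 250.7, True),
    (3, 310.4, False),
    (4, 120.1, False),
]

# Two independent hit lists, one per criterion.
heavy = {cid for cid, mw, _ in compounds if mw > 200}
no_benzene = {cid for cid, _, has_benzene in compounds if not has_benzene}

both = heavy & no_benzene    # Exercise 7: both conditions must hold (AND)
either = heavy | no_benzene  # Exercise 8: union of the two hit lists (OR)

print(sorted(both))    # [3]
print(sorted(either))  # [2, 3, 4]
```

The union is never smaller than the intersection, so Exercise 8 is expected to return at least as many hits as Exercise 7.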
Exercise 9
Search for compounds with more than 4 microspecies at
pH = 4.0…
Exercise 9
… Export your hit list.
Exercise 10
Import in your project the file ISICCRsm.RDF…
Exercise 10
… Create a Browser for this database
Exercise 11
Search for reactions with an imidazole ring in their
reactants, then in their products.
Exercise 12
Add to your Schema a new data tree and structure entity named
AlkanBoilingPoint…
Exercise 12
… and add a floating point value field named BoilingPoint.
Exercise 13
Add to the AlkanBoilingPoint entity the following data.
Exercise 14
Add to the AlkanBoilingPoint entity a new date field named Date
and fill it.
Exercise 15
Add to the AlkanBoilingPoint entity a calculated value of LogP
using a Chemical terms field.
Summary
Create a project and schema
Import data
Search by substructure, superstructure, similarity,
and exact match
Search by keyword
Combine queries and result lists
Export query results
Create a new database
Conclusion
InstantJChem is a chemoinformatics layer on top of a
standard RDBMS.
It provides many more chemoinformatics services
(database overlap, QSPR modeling, plots,
enumeration, scripting).
RDBMS
InstantJChem