Transcript Slide 1

ChemModLab: A Web-based Cheminformatics Modeling Laboratory
S. Stanley Young + ECCR and ChemSpider Teams
ChemSpider: A Web-based Chemical Informatics Resource
What is ChemSpider?

ChemSpider is a molecular structure-centric web service for chemists:

• Chemical structure drawing, manipulation, visualization, modeling & databasing
• Web location to deposit, curate and enhance data associated with chemical structures
• Web structure-based access to federated chemistry databases representing chemical vendors, literature, online data, patents and other forms of chemistry data
3
How do people generally use ChemSpider?

Searching for chemical structures, in rank order, via:

• Registry numbers, trade names and synonyms
• Structure identifiers such as SMILES or InChI
• Intrinsic properties: commonly mass-based searches executed by mass spectrometrists
• By systematic names: IUPAC or CAS Index name
• Generation of physicochemical properties
• Text-based searching of Open Access articles
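
A mass-based search of the kind mass spectrometrists run can be sketched as a tolerance-window query over monoisotopic masses. This is only an illustration: the `RECORDS` table and the `search_by_mass` helper are hypothetical and are not part of any ChemSpider interface. The atomic masses are the standard monoisotopic values for the most abundant isotopes.

```python
import re

# Monoisotopic masses (in u) of the most abundant isotope of each element.
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503,
    "N": 14.0030740,
    "O": 15.9949146,
    "F": 18.9984032,
}

def monoisotopic_mass(formula):
    """Sum atomic masses for a simple formula such as 'C17H18F3NO'."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += MONOISOTOPIC[element] * (int(count) if count else 1)
    return total

# Hypothetical mini-database of (name, molecular formula) pairs.
RECORDS = [
    ("fluoxetine", "C17H18F3NO"),
    ("caffeine", "C8H10N4O2"),
    ("aspirin", "C9H8O4"),
]

def search_by_mass(target, tolerance=0.01):
    """Return record names within +/- tolerance of target, closest first."""
    hits = []
    for name, formula in RECORDS:
        error = abs(monoisotopic_mass(formula) - target)
        if error <= tolerance:
            hits.append((error, name))
    return [name for _, name in sorted(hits)]
```

Returning hits closest-first mirrors the "in rank order" idea on the slide; a real service would index masses rather than scan every record.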
4
ChemSpider Status August 2007

• Online database of over 16.5 million structures
• Systems in place for:
    • Single structure and data collection depositions
    • Association of analytical data with structures
    • Ability to curate data for each individual record
• Indexing of and Integration to:
    • Over 70 individual databases
    • Patents from the US, European and Asian Patent offices
    • Text-based searching of over 50,000 Open Access articles
• Over a thousand unique users access ChemSpider per day
5
Flexible Boolean Searching
6
Predicted Properties Details “Prozac”
7
Search result: 49 hits in 2.8 seconds
8
Integrated Visualization Tools
9
External Integrations - Wikipedia
The links between Wikipedia and
ChemSpider are formed automatically
10
What is ChemModLab?

ChemModLab is a Web Service for building and evaluating
QSAR models.

Send your data: assay results and SD file.

Use any or all of five descriptor types (2D).
(Use your own descriptors)

Use any or all of 16 statistical modeling methods.

Predict potency of untested compound.
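
The workflow above (descriptors plus assay results in, predicted potency out) can be sketched with one of the listed method families, k-nearest neighbors. This is a minimal illustration and not ChemModLab's implementation: the function name `knn_predict` is hypothetical, and the two-descriptor training set is toy data invented for the example.

```python
import math

def knn_predict(train_x, train_y, query, k=3):
    """Predict potency as the mean response of the k nearest training compounds."""
    ranked = sorted(
        zip(train_x, train_y),
        key=lambda pair: math.dist(pair[0], query),  # Euclidean distance in descriptor space
    )
    nearest = ranked[:k]
    return sum(y for _, y in nearest) / len(nearest)

# Toy data, invented purely for illustration: each row is a compound's
# descriptor vector, each response is its measured assay potency.
train_x = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (1.0, 1.0), (0.9, 1.1), (1.1, 0.9)]
train_y = [5.0, 5.2, 4.8, 8.0, 8.2, 7.8]

# An untested compound whose descriptors sit near the second group.
untested = (0.95, 1.0)
predicted = knn_predict(train_x, train_y, untested, k=3)
```

In practice the descriptor vectors would be computed from the structures in the SD file rather than typed in by hand.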
11
Virtual Screening
ChemModLab
ChemSpider
12
ChemModLab Dialog (1)
Data Input
13
ChemModLab Dialog (2)
Five 2D Descriptor Sets
14
ChemModLab Dialog (3)
16 Modeling Methods
15
ChemModLab Modeling Methods
16 Statistical Modeling Methods
• Trees: randomForest, rpart, tree
• Neural networks
• k-nearest neighbors
• Support vector machines
• Partial least squares
• Partial least squares with linear discriminant analysis
• Least angle regression
• Ridge regression
• Elastic net
• Principal components regression
• Family ensemble of k-nearest neighbors, using 70% selection
• Family ensemble of tree, using 70% selection
• Family ensemble of rpart, using 70% selection
• randomForest using 70% selection
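
The slides do not define "70% selection". The sketch below assumes one plausible reading: each ensemble member is fit on a random 70% of the descriptor columns, and the members' predictions are averaged. Both that interpretation and every name in the code (`subset_ensemble_predict`, `knn_predict`) are assumptions made for illustration, not ChemModLab's actual procedure.

```python
import math
import random

def knn_predict(train_x, train_y, query, k=3):
    """Base learner: mean response of the k nearest training compounds."""
    ranked = sorted(zip(train_x, train_y), key=lambda p: math.dist(p[0], query))
    nearest = ranked[:k]
    return sum(y for _, y in nearest) / len(nearest)

def subset_ensemble_predict(train_x, train_y, query,
                            n_members=25, fraction=0.7, k=3, seed=0):
    """Average k-NN predictions, each made on a random subset of descriptors.

    ASSUMPTION: 'selection' is read here as choosing a random `fraction`
    of the descriptor columns for every ensemble member.
    """
    rng = random.Random(seed)
    n_desc = len(query)
    n_keep = max(1, round(fraction * n_desc))
    predictions = []
    for _ in range(n_members):
        cols = rng.sample(range(n_desc), n_keep)          # descriptors for this member
        sub_x = [[row[c] for c in cols] for row in train_x]
        sub_q = [query[c] for c in cols]
        predictions.append(knn_predict(sub_x, train_y, sub_q, k))
    return sum(predictions) / len(predictions)
```

Averaging over many descriptor subsets tends to reduce the influence of any single noisy descriptor, which is the usual motivation for this kind of ensemble.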
16
ECCR@NCSU + ChemSpider Plan
User submits data to ChemModLab to get QSAR Model(s).
Model is sent to ChemSpider.
ChemSpider computes a “virtual screen”.
The hit-list is clustered and sent to the user.
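
The plan above can be sketched end to end: score a library with a model, keep the top-scoring hits, then cluster the hit list. Everything here is an assumption made for illustration: fingerprints are represented as plain sets of feature IDs, the model is a stand-in scoring function, and leader clustering on Tanimoto similarity is just one simple choice (the slides do not say which clustering method is used).

```python
def tanimoto(a, b):
    """Tanimoto similarity between two fingerprints given as sets of feature IDs."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def virtual_screen(model, library, top_n):
    """Score every library compound and keep the top_n highest-scoring IDs."""
    ranked = sorted(library, key=lambda cid: model(library[cid]), reverse=True)
    return ranked[:top_n]

def leader_cluster(hits, library, threshold=0.5):
    """Assign each hit to the first cluster whose leader is similar enough."""
    clusters = []  # each cluster is a list of IDs; element 0 is the leader
    for cid in hits:
        for cluster in clusters:
            if tanimoto(library[cid], library[cluster[0]]) >= threshold:
                cluster.append(cid)
                break
        else:
            clusters.append([cid])  # no similar leader found: start a new cluster
    return clusters

# Hypothetical library: compound ID -> fingerprint (set of feature IDs).
library = {
    "cpd1": {1, 2, 3, 4},
    "cpd2": {1, 2, 3, 5},
    "cpd3": {7, 8, 9},
    "cpd4": {7, 8, 10},
    "cpd5": {20, 21},
}

# Stand-in for a trained QSAR model: counts features from a made-up "active" set.
ACTIVE_FEATURES = {1, 2, 3, 7, 8}

def toy_model(fingerprint):
    return len(fingerprint & ACTIVE_FEATURES)

hits = virtual_screen(toy_model, library, top_n=4)
clusters = leader_cluster(hits, library)
```

Clustering the hit list gives the user a handful of structural families to inspect instead of a flat ranked list.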
17
Accumulation Curves
Compare descriptor sets, given a method
18
Accumulation Curves
Compare modeling methods, given a descriptor set
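
An accumulation curve ranks compounds by predicted activity and counts how many true actives have been found after testing the top 1, 2, 3, ... compounds; a curve that rises faster indicates a better model. The sketch below computes such curves for two sets of scores. The function name and the toy scores are hypothetical and serve only to illustrate the comparison.

```python
def accumulation_curve(scores, is_active):
    """Cumulative number of actives found when testing in descending score order."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    curve, found = [], 0
    for i in order:
        found += is_active[i]
        curve.append(found)
    return curve

# Toy example: six compounds, three of them truly active (1 = active).
active = [1, 1, 0, 1, 0, 0]

# Predicted scores from two hypothetical models (or descriptor sets).
scores_a = [0.9, 0.8, 0.7, 0.4, 0.2, 0.1]
scores_b = [0.1, 0.5, 0.9, 0.2, 0.8, 0.3]

curve_a = accumulation_curve(scores_a, active)
curve_b = accumulation_curve(scores_b, active)
```

Plotting each curve against the number of compounds tested gives the side-by-side comparison shown on the slides.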
19
Diversity Map
Cluster Active Compounds
Modeling Methods
20
Continuous Response
21
Continuous Response
22
Continuous Response
23
Model Evaluation
Take detailed looks at which models?
AID348 (NCGC):
KNN – Ph
ENet – CAP
RF – B#
RF – CAP
RF – FF
Tree – CAP
Tree – Ph
Tree – FF
PLS – CAP
24
Summary
1. ChemSpider is a web chemical informatics center.
2. ChemModLab is a free web service for QSAR.
3. Together they support sophisticated virtual screening.
* ChemModLab is supported by the NCI RoadMap project.
25
ECCR@NCSU Group: ChemModLab Team
Jacqueline M. Hughes-Oliver
Atina D. Brooks
Gary W. Howell
Kirtesh Patil
Stan Young
Qianyi Zhang
eccr.stat.ncsu.edu

ChemSpider Group: ChemSpider Team
Antony Williams (project lead)
A rotating team of advisors and developers, including many contributions from the Open Source community
www.chemspider.com
26