Cheminformatics Web Service Infrastructure


Indiana University ECCR Summary
Infrastructure: Cheminformatics web service infrastructure made available as a community resource including 2D and 3D databases, predictive models, statistics, docking, name to structure conversion, 2D to 3D conversion, similarity and clustering.
Research: Major research areas include data mining of PubChem bioassays including network models, predictive models for cytotoxicity, QSAR domain applicability, clustering huge datasets, data mining chemistry literature and documents, and Semantic Web applications.
Education: A leading center in cheminformatics education offering Ph.D., M.S. and innovative Distance Education program aimed at government, academia and industry. Collaboration with University of Michigan.
Figure: Delivering a CIC distance class at Michigan
Indiana University School of
http://www.chembiogrid.org
Figure: 2D Map of PubChem from GTM
Cheminformatics Web Service Infrastructure
• Extensive set of (Web) services made available as a community resource including 2D and 3D databases, predictive models, statistics, docking, name to structure conversion, 2D to 3D conversion, similarity and clustering. Teragrid and local supercomputing resources provide scalability to millions of compounds. The project prototypes a comprehensive open community infrastructure framework for algorithm & application deployment:
Figure: framework diagram with two headings, Infrastructure and Algorithms & Applications, and these component labels:
• PubChem and other public chemical & biological information sources
• Aggregation and data mining algorithms
• Cheminformatics, comp. chemistry, and statistical (R) Web services
• Knowledge discovery tools
• Teragrid, IU, and other Grid/Cloud cyberinfrastructure and supercomputing resources
• Educational applications
Exemplar Current Project: Pub3D Services
• Provides 3D structures for 17M PubChem compounds.
• Scalable to hundreds of millions of structures.
• Accessible via SQL, Web page and Web service interfaces.
• Can be included in workflows.
• Will include multiple conformers.
• Backed by novel algorithms and distributed DB architecture to enable fast shape queries.
• Also enables analysis of the density of chemical space.
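The slides do not say how Pub3D represents molecular shape, so the following is only an illustrative sketch of one well-known approach: reducing each 3D structure to a short vector of distance moments (in the style of Ultrafast Shape Recognition), so that a shape query becomes a comparison of fixed-length vectors that a database can index. The function names and the toy coordinates are hypothetical and are not taken from Pub3D.

```python
import math


def _moments(values):
    """Return mean, standard deviation, and cube root of the third central moment."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    third = sum((v - mean) ** 3 for v in values) / n
    # Signed cube root keeps the sign of the skew.
    return [mean, math.sqrt(var), math.copysign(abs(third) ** (1 / 3), third)]


def shape_descriptor(coords):
    """Build a 12-value shape vector from a list of (x, y, z) atom positions.

    Four reference points are used: the centroid, the atom closest to it,
    the atom farthest from it, and the atom farthest from that atom.
    """
    n = len(coords)
    centroid = tuple(sum(c[i] for c in coords) / n for i in range(3))
    d_ctd = [math.dist(c, centroid) for c in coords]
    closest = coords[d_ctd.index(min(d_ctd))]
    farthest = coords[d_ctd.index(max(d_ctd))]
    d_fct = [math.dist(c, farthest) for c in coords]
    far_from_far = coords[d_fct.index(max(d_fct))]

    descriptor = []
    for ref in (centroid, closest, farthest, far_from_far):
        descriptor += _moments([math.dist(c, ref) for c in coords])
    return descriptor


def shape_similarity(desc_a, desc_b):
    """Score in (0, 1]; 1 means the two descriptors are identical."""
    diff = sum(abs(a - b) for a, b in zip(desc_a, desc_b))
    return 1.0 / (1.0 + diff / len(desc_a))


if __name__ == "__main__":
    # Toy coordinates, not real molecules.
    query = [(0, 0, 0), (1, 0, 0), (0, 2, 0), (0, 0, 3)]
    moved = [(x + 5, y - 2, z + 1) for x, y, z in query]
    rod = [(0, 0, 0), (4, 0, 0), (8, 0, 0), (12, 0, 0)]
    q = shape_descriptor(query)
    print(shape_similarity(q, shape_descriptor(moved)))
    print(shape_similarity(q, shape_descriptor(rod)))
```

Because the descriptor depends only on interatomic distances, it does not change when a structure is translated or rotated, which is what makes a precomputed, indexable vector practical for large collections.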
Current Major Research Areas
• Data mining of PubChem bioassays using Bayesian models.
• Using cytotoxicity models to predict acute toxicity
– Collaboration with Stephan Schurer, Scripps (Florida)
• Algorithms for domain applicability of QSAR models.
• Exploration of chemical spaces using density of space approach.
• Virtual screening for anti-malarials
– Collaboration with Jean-Claude Bradley, Drexel University
– Two micromolar inhibitors of falcipain-2 identified
• Supporting predictive model deployment and exchange (PMML)
• Cheminformatics cyberinfrastructure
Fast comparison of toxicity data sets
using binary fingerprints.
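As a rough illustration of comparing data sets through binary fingerprints, the sketch below packs each fingerprint into a Python integer and scores pairs with the Tanimoto coefficient, a standard similarity measure for bit vectors. The slides do not state which measure or fingerprint type was used, so the Tanimoto choice, the helper names, and the "mean nearest-neighbour similarity" summary are all assumptions made for the example.

```python
from typing import Iterable, List


def fingerprint(on_bits: Iterable[int]) -> int:
    """Pack a collection of 'on' bit positions into a single integer."""
    fp = 0
    for bit in on_bits:
        fp |= 1 << bit
    return fp


def tanimoto(fp_a: int, fp_b: int) -> float:
    """Shared on-bits divided by on-bits set in either fingerprint."""
    union = bin(fp_a | fp_b).count("1")
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return bin(fp_a & fp_b).count("1") / union


def mean_nearest_neighbour(set_a: List[int], set_b: List[int]) -> float:
    """Average, over set_a, of each fingerprint's best match in set_b."""
    best = [max(tanimoto(a, b) for b in set_b) for a in set_a]
    return sum(best) / len(best)


if __name__ == "__main__":
    # Hypothetical bit positions standing in for two small data sets.
    set_a = [fingerprint([0, 1, 3]), fingerprint([2, 5])]
    set_b = [fingerprint([0, 1]), fingerprint([2, 5, 7])]
    print(tanimoto(set_a[0], set_b[0]))
    print(mean_nearest_neighbour(set_a, set_b))
```

Bitwise AND and OR on integers keep each pairwise comparison cheap, which is the main reason binary fingerprints suit comparisons across large collections.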
Graduate Program in Cheminformatics
• Unique program in cheminformatics: to our knowledge, we are the only center in the U.S. offering a range of formal qualifications in cheminformatics.
– As of fall 2008 we will have 6 Ph.D. students, 8 M.S. students, and 4 graduate certificate students, who come from government, industry & academia.
– All courses are available by Distance Education, including CIC courseshare with Michigan.
– We have received Industry Fellowships from Lilly and Symyx.
• General review of cheminformatics education in Drug Discovery Today 11, 9&10 (May 2006), pp. 436-439
• Distance Education: J. Chem. Inf. Model. 2006, 46, 495-502
Figure: Delivering a CIC distance class at Michigan
Indiana University School of
http://cheminfo.informatics.indiana.edu
Future Directions
• New initiative: linking bioinformatics and chemical informatics.
• Parallel deterministic annealing/GTM clustering and multidimensional scaling (MDS) of huge datasets
– Exploiting multicore and other advanced chip architectures
• Extracting and mining chemical information in journal articles (SMILES index, NLP, and structure/ontology searching)
• Use of Semantic Web for automatic workflow composition.
• Compound & Bioassay Network Models allow investigation of cross-assay relationships and the use of PubChem as a source of polypharmacology.
• Network view of SARs providing a framework to analyze structure-activity landscapes.
Figure: 2D Map of PubChem from GTM
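To make the compound and bioassay network idea concrete, here is a minimal sketch that assumes activity data arrives as (compound, assay) pairs for active outcomes. It links two assays when they share active compounds and lists compounds active in several assays, which is one simple way to look for cross-assay relationships. The data layout, function names, and identifiers are hypothetical and do not reflect the project's actual model.

```python
from collections import defaultdict
from itertools import combinations


def assay_network(activities, min_shared=1):
    """Return {(assay_a, assay_b): n_shared} for assay pairs sharing actives."""
    actives_by_assay = defaultdict(set)
    for compound, assay in activities:
        actives_by_assay[assay].add(compound)

    edges = {}
    for a, b in combinations(sorted(actives_by_assay), 2):
        shared = actives_by_assay[a] & actives_by_assay[b]
        if len(shared) >= min_shared:
            edges[(a, b)] = len(shared)
    return edges


def multi_assay_compounds(activities, min_assays=2):
    """Return {compound: sorted assays} for compounds active in several assays."""
    assays_by_compound = defaultdict(set)
    for compound, assay in activities:
        assays_by_compound[compound].add(assay)
    return {
        compound: sorted(assays)
        for compound, assays in assays_by_compound.items()
        if len(assays) >= min_assays
    }


if __name__ == "__main__":
    # Hypothetical identifiers, not real PubChem CIDs or AIDs.
    activities = [
        ("C1", "A1"), ("C1", "A2"), ("C1", "A3"),
        ("C2", "A1"), ("C2", "A2"),
        ("C3", "A3"),
    ]
    print(assay_network(activities, min_shared=2))
    print(multi_assay_compounds(activities))
```

The edge weight (number of shared actives) is the simplest possible choice; a real analysis would likely normalize for assay size before drawing conclusions.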