VS Explorer – Analyzing large scale docking experiments
ChemAxon 2005 User Group Meeting
Marc Zimmermann
Martin Hofmann
Marc Zimmermann, 2005
ChemAxon UGM05
Selection of Potential Drugs
• 28 million compounds currently known
• Drug company biologists screen up to 1 million compounds against target
using ultra-high throughput technology
• Chemists select 50-100 compounds for follow-up
• Chemists work on these compounds, developing new, more potent
compounds
• Pharmacologists test compounds for pharmacokinetic and toxicological
profiles
• 1-2 compounds are selected as potential drugs
High Volume Screening Analysis – the Methods
[Diagram: pipeline stages Screening (HTS, vHTS: similarity, docking; outcome active / inactive), Assembling, Filtering, Clustering, Modeling]
Virtual Screening – Computational or in silico analog of biological screening
o Score, rank, and/or filter a set of structures using one or more computational procedures
o Helps to decide:
▪ Which compounds to screen
▪ Which libraries to synthesize
▪ Which compounds to purchase from an external source
High Volume Screening Analysis – the Tools at SCAI
[Diagram: the pipeline stages Screening, Assembling, Filtering, Clustering, Modeling, annotated with the SCAI tools VS Explorer, FTrees, FlexX, GRID Layer, ProMiner, TopNet, HTSview, DB Annotator]
Computational Aspects of Drug Discovery : Virtual Screening
• Enable scientists to quickly and easily find compounds binding to a particular target protein
o growth in the number of targets
o growth in 3D structure determination (PDB database)
o growth in computing power
o growth in prediction quality for protein-compound interactions
• Experimental screening is very expensive: out of reach for academic groups and small companies
• Aim: increase the ratio of active molecules to tested molecules
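The aim on this slide can be read as a hit rate: the fraction of tested molecules that turn out to be active. The sketch below is only an illustration of that arithmetic. The counts are made up (they are not from the talk), and the "enrichment" helper is an assumed, commonly used way to compare a pre-screened subset with an unfiltered library.

```python
def hit_rate(active: int, tested: int) -> float:
    """Fraction of tested molecules that are active."""
    if tested <= 0:
        raise ValueError("tested must be positive")
    return active / tested


def enrichment(subset_rate: float, baseline_rate: float) -> float:
    """How many times higher the subset hit rate is than the baseline."""
    return subset_rate / baseline_rate


# Hypothetical counts, chosen only to illustrate the arithmetic.
baseline = hit_rate(active=100, tested=100_000)  # unfiltered library
focused = hit_rate(active=50, tested=5_000)      # top-ranked virtual hits
factor = enrichment(focused, baseline)

print(f"baseline {baseline:.2%}, focused {focused:.2%}, enrichment {factor:.1f}x")
```

With these invented numbers, testing only the top-ranked subset finds half of the actives while testing a twentieth of the molecules.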
Grids for neglected diseases and diseases of the developing world
[Diagram: in silico drug discovery process on grids (EGEE, Swissgrid, …), showing SCAI Fraunhofer, Clermont-Ferrand, the Swiss Biogrid consortium and local research centres in plagued areas; support to local centres in plagued areas (genomics research, clinical trials and vector control)]
The grid impact:
• Computing and storage resources for genomics research and in silico drug discovery
• Cross-organizational collaboration space to progress research work
• Federation of patient databases for clinical trials and epidemiology in developing countries
Structure-Based Virtual Screening
Protein-Ligand Docking
o Aims to predict 3D structures when a molecule “docks” to a protein
▪ Need a way to explore the space of possible protein-ligand geometries (poses)
▪ Need to score or rank the poses
o Problem: many degrees of freedom (rotation, conformation, solvent effects)
[Figure: Target Protein and Ligand database as input to molecular docking; result: ligand docked into protein’s active site]
Grid VS Results Browser
• Quick overview on very large log-files
• Sorting and merging of files
• Storing and retrieval in databases
• Similarity searches and property predictions
• Interface to R statistics box
• Prototype is under construction
[Screenshot: SDF file excerpt]
M  END
> <Object Id>
MAC-0000100
> <Batch Ref>
03
> <Supplier Object Id>
6743501
> <ENZ_KINETIC_RES_ACT.RES_ACT>
[Screenshot: CSV file excerpt]
"Smiles";"Data"
"c1(N2CCC(CC2)C(OCC)=O)sc3c(ccc(Cl)c3)n1";MAC-0000001;02;101.66;104.66
"C(=O)(Nc(cc1)ccc1Cl)N(CCCN2c(c(Cl)cc3C(F)(F)F)nc3)CC2";MAC-0000002;02;101.14;105.89
"n1(CC(CNCCNc2nccc(n2)C(F)(F)F)O)c3c(cc1)cccc3";MAC-0000003;02;101.64;97.32
"[N+](=O)([O-])c(ccc1N(CCCN2C(=S)Nc3ccc(cc3Cl)Cl)CC2)cn1";MAC-0000004;02;100.09;101.14
"[N+](=O)([O-])c(ccc1N(CCCN2C(=S)Nc3ccc(cc3Br)F)CC2)cn1";MAC-0000005;02;108.98;97.02
"C(F)(F)(F)c1ccnc(NCCNC(=O)c2ccco2)n1";MAC-0000006;02;110.19;106.15
"C(F)(F)(F)c1ccnc(NCCNC(c2ccccc2)=O)n1";MAC-0000007;02;107.42;98.46
"C(NCc1ccco1)(=S)Nc(cccn2)c2";MAC-0000008;02;103.86;97.98
"C(F)(F)(F)c1ccnc(NCCNC(=S)Nc(cccn2)c2)n1";MAC-0000009;02;107.77;98.6
"C(=O)(c1cccs1)N(CCCN2CC(O)COc(ccc3C(C)=O)cc3)CC2";MAC-0000010;02;107.41;104.92
"C(F)(F)(F)c1ccnc(NCC=C)n1";MAC-0000011;02;105.78;106.84
"N1(CCNc2ncccc2C(F)(F)F)C(=O)CC3(CCCC3)C1=O";MAC-0000012;02;105.26;103.38
"N1(CCCNc(c(Cl)cc2C(F)(F)F)nc2)C(=O)CC3(CCCC3)C1=O";MAC-0000013;02;102;106.84
[Screenshot: database query result]
concat('ZINC', lpad(p.sub_id_fk,8,'0')) | target | ligand | conformations || score || time
ZINC00000057 | 1cet | ZINC00000057 | 172 || -7.45 || 3.25 s
ZINC00000061 | 1cet | ZINC00000061 | 203 || -18.37 || 3.84 s
ZINC00000066 | 1cet | ZINC00000066 | 241 || -25.58 || 39.92 s
ZINC00000122 | 1cet | ZINC00000122 | 399 || -14.14 || 7.41 s
ZINC00000197 | 1cet | ZINC00000197 | 272 || -8.60 || 2.44 s
ZINC00000290 | 1cet | ZINC00000290 | 259 || -15.00 || 20.40 s
ZINC00000349 | 1cet | ZINC00000349 | 82 || -10.81 || 22.20 s
ZINC00000453 | 1cet | ZINC00000453 | 256 || -14.61 || 3.76 s
ZINC00000484 | 1cet | ZINC00000484 | 447 || -18.33 || 35.53 s
ZINC00000607 | 1cet | ZINC00000607 | 418 || -15.77 || 7.43 s
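The docking result rows on this slide lend themselves to a small parsing and ranking example. The sketch below is not the VS Explorer code. It parses the pipe-separated rows as they appear on the slide and sorts them by score, assuming that a lower (more negative) score means a better predicted binding, which is the usual convention for docking energies but is not stated on the slide. The `DockingResult` type and `parse_row` helper are hypothetical names.

```python
from typing import NamedTuple

# Rows copied from the query result shown on the slide.
RAW = """\
ZINC00000057 | 1cet | ZINC00000057 | 172 || -7.45 || 3.25 s
ZINC00000061 | 1cet | ZINC00000061 | 203 || -18.37 || 3.84 s
ZINC00000066 | 1cet | ZINC00000066 | 241 || -25.58 || 39.92 s
ZINC00000122 | 1cet | ZINC00000122 | 399 || -14.14 || 7.41 s
ZINC00000197 | 1cet | ZINC00000197 | 272 || -8.60 || 2.44 s
ZINC00000290 | 1cet | ZINC00000290 | 259 || -15.00 || 20.40 s
ZINC00000349 | 1cet | ZINC00000349 | 82 || -10.81 || 22.20 s
ZINC00000453 | 1cet | ZINC00000453 | 256 || -14.61 || 3.76 s
ZINC00000484 | 1cet | ZINC00000484 | 447 || -18.33 || 35.53 s
ZINC00000607 | 1cet | ZINC00000607 | 418 || -15.77 || 7.43 s
"""


class DockingResult(NamedTuple):
    ligand: str
    target: str
    conformations: int
    score: float
    seconds: float


def parse_row(line: str) -> DockingResult:
    # Treat "||" like "|", then strip the whitespace around each field.
    fields = [f.strip() for f in line.replace("||", "|").split("|")]
    ligand, target, _ligand_again, conformations, score, time = fields
    return DockingResult(ligand, target, int(conformations),
                         float(score), float(time.rstrip("s ")))


results = [parse_row(line) for line in RAW.splitlines() if line.strip()]

# Assumption: lower score = better pose, so sort ascending.
ranked = sorted(results, key=lambda r: r.score)
for r in ranked[:3]:
    print(f"{r.ligand}  score={r.score:7.2f}  time={r.seconds:6.2f} s")
```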
Rapid prototyping using ChemAxon Libraries
• 100% pure Java (JRE)
o Swing
o JTable
• Using ChemAxon (MarvinBeans) for the chemical stuff
• OJDBC for database connection to Oracle
[Architecture diagram: GUI (Swing), Table Module, Chem Module, DB connect, File I/O]
Molecule Rendering
From spreadsheets to molecular spreadsheets
o Overloading cellRenderer with Marvin from ChemAxon
▪ Switch between SMILES and structure display (on / off)
File Import / Export
• Implemented as a thread
• Comma Separated Files
o CSV Parser
o Preview Window
o Tag missing Values
• SDF Molecular Files
o SDF Properties Names as Row-Keys
o Import Coordinates
o Based on MolImporter from ChemAxon
[Screenshot: Preview]
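The import steps on this slide can be illustrated with a standard-library sketch. This is not the VS Explorer implementation (which is Java and uses ChemAxon's importer); it only shows the ideas of running an import in a worker thread, tagging missing CSV values, and using SDF data item names as keys. The `MISSING` tag, the function names, and the sample column names are assumptions made for the example.

```python
import csv
import io
import threading

MISSING = "N/A"  # hypothetical tag for empty cells


def read_csv(stream, delimiter=";"):
    """Parse delimited text; tag empty cells instead of dropping them."""
    return [[cell if cell.strip() else MISSING for cell in row]
            for row in csv.reader(stream, delimiter=delimiter)]


def read_sdf_properties(stream):
    """Collect the '> <Name>' data items of each SDF record into a dict."""
    records, current, key = [], {}, None
    for raw in stream:
        line = raw.rstrip("\r\n")
        if line.startswith(">") and "<" in line:
            key = line[line.index("<") + 1:line.rindex(">")]
            current[key] = ""
        elif line.strip() == "$$$$":          # end of one record
            records.append(current)
            current, key = {}, None
        elif key is not None:
            if line.strip():
                current[key] = f"{current[key]}\n{line}".lstrip("\n")
            else:
                key = None                    # blank line ends the data item
    return records


def import_in_background(reader, stream, on_done):
    """Run a reader in a worker thread so a GUI would stay responsive."""
    worker = threading.Thread(target=lambda: on_done(reader(stream)))
    worker.start()
    return worker


# Column names are invented; the data row follows the slide's CSV layout.
csv_text = '"Smiles";"Id";"Batch"\n"C(NCc1ccco1)(=S)Nc(cccn2)c2";MAC-0000008;\n'
table = []
import_in_background(read_csv, io.StringIO(csv_text), table.extend).join()

sdf_text = "\n  example\n\nM  END\n> <Object Id>\nMAC-0000100\n\n> <Batch Ref>\n03\n\n$$$$\n"
props = read_sdf_properties(io.StringIO(sdf_text))
print(table[1], props[0])
```

In a real GUI the `on_done` callback would hand the rows to the table model on the event thread rather than extend a plain list.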
Smart Indexing for large Collections
• Large index storing file pointers or database keys
• Java TableModel only stores the full information for a limited number of elements (cache)
[Diagram: Index entries pointing to FilePointer]
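The indexing idea on this slide can be sketched as follows: keep one file offset per row in memory and hold the fully parsed row only for a small number of recently used entries. This is an illustration in Python, not the Java TableModel from the talk. The class name `IndexedFile`, the semicolon-separated row format, and the least-recently-used eviction policy are assumptions.

```python
import os
import tempfile
from collections import OrderedDict


class IndexedFile:
    """Keep one byte offset per row in memory; cache only a few parsed rows."""

    def __init__(self, path, cache_size=100):
        self.path = path
        self.cache_size = cache_size
        self.cache = OrderedDict()  # row number -> parsed row, in LRU order
        self.offsets = []           # row number -> byte offset (the "file pointer")
        with open(path, "rb") as fh:
            while True:
                pos = fh.tell()
                if not fh.readline():
                    break
                self.offsets.append(pos)

    def __len__(self):
        return len(self.offsets)

    def row(self, i):
        if i in self.cache:
            self.cache.move_to_end(i)       # mark as recently used
            return self.cache[i]
        with open(self.path, "rb") as fh:   # cache miss: read just this row
            fh.seek(self.offsets[i])
            parsed = fh.readline().decode().rstrip("\r\n").split(";")
        self.cache[i] = parsed
        if len(self.cache) > self.cache_size:
            self.cache.popitem(last=False)  # evict the least recently used row
        return parsed


# Build a throwaway file with 1000 rows to exercise the index.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as tmp:
    for n in range(1000):
        tmp.write(f"MAC-{n:07d};{n * 0.5}\n")

table = IndexedFile(tmp.name, cache_size=3)
for i in (0, 1, 2, 500):
    table.row(i)
last = table.row(999)
print(len(table), last, sorted(table.cache))
os.remove(tmp.name)
```

Memory use then scales with the number of rows times the size of one offset, not with the size of the rows themselves.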
Interactive Focus on Data
• Large index storing file pointers or database keys
• Java TableModel only stores the full information for a limited number of elements
• EventHandler for scrolling triggers reload from external memory (e.g. a cursor for RDB)
• Update of the TableModel
[Diagram: Index entries pointing to FilePointer]
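The scroll-triggered reload can be sketched with an in-memory SQLite database standing in for the relational back end. Note the swapped-in technique: the slide mentions a database cursor, while this example pages with `LIMIT ... OFFSET ...` for simplicity. The table name `results`, the class `WindowedTable`, and the window sizes are hypothetical.

```python
import sqlite3


class WindowedTable:
    """Holds only the rows around the visible area; reloads on scroll."""

    def __init__(self, conn, window=50):
        self.conn = conn
        self.window = window
        self.first = 0      # row number of the first cached row
        self.rows = []      # cached rows
        self.reloads = 0    # number of database round trips
        self.on_scroll(0)

    def on_scroll(self, first_visible, visible=10):
        """Scroll handler: refill the cache if the view left the window."""
        last_visible = first_visible + visible
        if self.rows and self.first <= first_visible \
                and last_visible <= self.first + len(self.rows):
            return  # visible rows are still covered by the cached window
        self.first = first_visible
        self.rows = self.conn.execute(
            "SELECT ligand, score FROM results ORDER BY rowid LIMIT ? OFFSET ?",
            (self.window, first_visible)).fetchall()
        self.reloads += 1

    def value_at(self, row, col):
        return self.rows[row - self.first][col]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (ligand TEXT, score REAL)")
conn.executemany("INSERT INTO results VALUES (?, ?)",
                 [(f"ZINC{n:08d}", -n / 10) for n in range(1000)])

table = WindowedTable(conn, window=50)
table.on_scroll(20)   # rows 20..29 are inside the cached window: no reload
table.on_scroll(45)   # rows 45..54 reach past row 49: reload
print(table.reloads, table.value_at(45, 0))
```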
Column Sorting
• EventHandler starting a sorting thread
• Resorting of the Index for flat files
• New database query: + ORDER BY columnLabel
• Coming next:
o Implementation of efficient online sorting algorithms in order to reduce the file access
o Merging of two tables
[Diagram: Index, sort(List), Object, FilePointer]
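Both sorting paths named on this slide can be illustrated briefly. In the sketch below, a list of row numbers stands in for the index of file pointers and is re-sorted in a worker thread while the rows themselves stay in place; for the database path, the query is rebuilt with an `ORDER BY` clause. The table name `results`, the column whitelist, and the helper names are assumptions. The whitelist is a design choice added here because a column label cannot be passed as a bound SQL parameter.

```python
import threading

# Stand-in for rows that would live in a flat file (values from the talk's table).
rows = [
    ("ZINC00000057", -7.45),
    ("ZINC00000061", -18.37),
    ("ZINC00000066", -25.58),
    ("ZINC00000122", -14.14),
    ("ZINC00000197", -8.60),
]
index = list(range(len(rows)))  # stand-in for the file-pointer index


def sort_index(index, key_of, done):
    """Re-sort the index only; the underlying rows are never moved."""
    index.sort(key=key_of)
    done.set()


done = threading.Event()
threading.Thread(target=sort_index,
                 args=(index, lambda i: rows[i][1], done)).start()
done.wait()  # a GUI would instead refresh the table when notified

ALLOWED_COLUMNS = {"ligand", "score", "time"}  # hypothetical column labels


def order_by_query(column_label):
    """Build the database variant: same query plus ORDER BY columnLabel."""
    if column_label not in ALLOWED_COLUMNS:
        raise ValueError(f"unknown column: {column_label}")
    return f"SELECT * FROM results ORDER BY {column_label}"


print(index, order_by_query("score"))
```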
DB Annotator: Semantics for databases
Semantic annotation of relational data
o Linking databases and ontologies
o Using the VS Explorer as Plugin
[Screenshot: VS Explorer, Ontology browser]
DHFR Assay for E. coli:
• Folate -> DHF -> THF -> synthesis of thymidine
• Important for cell growth
• DHFR inhibitor: Trimethoprim
Zolli-Juran M, Cechetto JD, Hartlen R, Daigle DM, Brown ED. High throughput screening identifies novel inhibitors of Escherichia coli dihydrofolate reductase that are competitive with dihydrofolate. Bioorg Med Chem Lett. 2003 Aug 4; 13(15):2493-6
http://hts.mcmaster.ca/HTSDataMiningCompetition.htm
[Figures: structures of DHF and Trimethoprim]
Docking with FlexX [1]
• PDB structure 1RA2
• Cocrystallized DHFR and NADP
• FlexX places water particles
Zimmermann M, Tresch A, Maass A, Hofmann M. Drilling into a HTS data set of E. coli. Poster, 15th Symposium on QSAR 2004.
[1] Rarey M, Kramer B, Lengauer T and Klebe G, J Mol Biol 1996, 261(3):470-89.
In silico Screening Workflow:
[Workflow diagram with the boxes: HTS, Training Set, Test Set, Docking, Fragment Analysis, QSAR, 2D Similarity Analysis, MD Simulation, Classification (active / inactive), Activity Region, Candidates]
1CET – Lactate Dehydrogenase of Plasmodium falciparum
Malaria Target:
o Chloroquine binds in the cofactor binding site of Plasmodium falciparum lactate dehydrogenase
o PDB structure: 1CET
o Ligand: Chloro-Quinolin
o Test Ligands: Ambinter data set from ZINC
1CET vs. 50 000 Compounds on 200 Nodes: Global Statistics
• Done: 100%
• Rescheduled: 46
• Running on nodes: 2296 h – 96 days
o Autodock.pl: 2288 h
o Total transfer: 8 h
• Grid Time: 205.5 h
o Scheduled: 179 h
o Ready: 78 min
o Waiting: 78 min
o Submitted: 24 h
• Submission script: 36 h
• Time gain of: 64 (instead of 200)
• Ideal: 11.5 h
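The figures on this slide are consistent with one another, which a few lines of arithmetic make visible. The interpretation below is an assumption: the time gain is read as node time divided by the wall-clock time of the submission script, and the ideal time as node time divided by the number of nodes. The efficiency value is derived here and does not appear on the slide.

```python
nodes = 200
autodock_hours = 2288      # Autodock.pl
transfer_hours = 8         # Total transfer
wall_hours = 36            # duration of the submission script

node_hours = autodock_hours + transfer_hours   # "Running on nodes"
node_days = node_hours / 24

speedup = node_hours / wall_hours              # "time gain"
ideal_hours = node_hours / nodes               # perfect use of all nodes
efficiency = speedup / nodes                   # derived, not on the slide

# Grid time as the sum of its parts (Ready and Waiting are given in minutes).
grid_hours = 179 + 24 + 78 / 60 + 78 / 60

print(f"node time   {node_hours} h = {node_days:.0f} days")
print(f"time gain   {speedup:.0f} (instead of {nodes})")
print(f"ideal       {ideal_hours:.1f} h")
print(f"efficiency  {efficiency:.0%}")
print(f"grid time   {grid_hours:.1f} h")
```

Under this reading, roughly a third of the theoretical 200-fold gain was realized; the rest went to scheduling, queueing and rescheduled jobs.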
Planning Next Steps
• 2M compounds vs. 1 protein target
o Input: 13 GB
o Output: 2 TB (dlg), 0.5 TB (pdb)
o 12 CPU-years
o Ideal: 3 days with 1350 CPUs
o Reality: grid of clusters with users, queues, errors…
• Challenges for our application?
o 100% of results obtained
o Minimal process time
o Grid resource consumption (storage, CPU)
o User interface for the application
o …
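The planning numbers can be cross-checked against the 1CET run reported earlier in the talk (2296 node hours for 50 000 compounds). The sketch below assumes the cost per compound stays the same at larger scale and uses decimal units (1 TB = 10^6 MB); both are simplifications, and the per-compound output size is derived here rather than taken from the slide.

```python
HOURS_PER_YEAR = 365 * 24

# Measured in the 1CET run: 2296 node hours for 50 000 compounds.
hours_per_compound = 2296 / 50_000

compounds = 2_000_000
cpu_years = compounds * hours_per_compound / HOURS_PER_YEAR

# "Ideal: 3 days with 1350 CPUs", expressed in CPU-years.
ideal_cpu_years = 1350 * 3 * 24 / HOURS_PER_YEAR

# Output volume per compound: 2 TB (dlg) + 0.5 TB (pdb), decimal units.
output_mb_per_compound = (2 + 0.5) * 1_000_000 / compounds

print(f"extrapolated cost  {cpu_years:.1f} CPU-years")
print(f"3 days x 1350 CPUs {ideal_cpu_years:.1f} CPU-years")
print(f"output per ligand  {output_mb_per_compound:.2f} MB")
```

Both estimates land close to the 12 CPU-years quoted on the slide, so the ideal figure of 3 days assumes nearly perfect utilization of all 1350 CPUs.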