04_Arthur_Thomas - Swiss National Grid
DO (WILL) GRIDS MATTER IN DRUG DISCOVERY?
Arthur Thomas
SIB/Vital-IT and SwissBioGrid
Biology: Big Science!
Sanger Institute: Sequencing Factory
Automation Partnership: HTS “Factory”, 2.5 x 10^5/8 hr, 10^7 data points/year
Argonne Advanced Photon Source: World’s largest X-ray Crystallography System
US NHMFL: 900 MHz 21-T wide-bore NMR Facility
Osaka/Hitachi UHVEM: World’s largest electron microscope
Siemens PET scanner
Biology: Big Data!
• 32,000 measures/spectrum
• 900 spectra/LC run
= 28,800,000 measurements (55MB)/LC run
[Source: Selinger et al. Trends in Biotech. (2003)]
• 55 MB/LC run
• 3 MS-MS/spectrum
• 200 KB/MS-MS
• (900 x 3 x 200 KB) + 55 MB = 595 MB
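The per-run arithmetic above can be checked with a short Python sketch. The input figures are the ones quoted on the slide; the constant and function names are illustrative, and decimal units (1 MB = 1000 KB) are an assumption.

```python
# Figures quoted on the slide; names are illustrative only.
MEASURES_PER_SPECTRUM = 32_000
SPECTRA_PER_LC_RUN = 900
MS_MS_PER_SPECTRUM = 3
KB_PER_MS_MS = 200
PRIMARY_MB_PER_LC_RUN = 55  # size of the primary spectra, as quoted

KB_PER_MB = 1000  # assumption: decimal units


def measurements_per_run() -> int:
    """Number of individual measurements in one LC run."""
    return MEASURES_PER_SPECTRUM * SPECTRA_PER_LC_RUN


def total_mb_per_run() -> float:
    """Primary spectra plus MS-MS data for one LC run, in MB."""
    ms_ms_kb = SPECTRA_PER_LC_RUN * MS_MS_PER_SPECTRUM * KB_PER_MS_MS
    return ms_ms_kb / KB_PER_MB + PRIMARY_MB_PER_LC_RUN


if __name__ == "__main__":
    print(measurements_per_run())  # 28800000
    print(total_mb_per_run())      # 595.0
```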
• 10 spectra/mm = 100 spectra/mm²
• 100 x 100 = 10,000 spectra/cm²
• 16 x 16 cm gel
• 16 x 16 x 10,000 = 2,560,000 spectra/gel
• 2,560,000 x 200 KB = 512 GB
[Source: Ron Appel (SIB)]
Biology: Big Data!
~1000 different biology reference databases:
• Genome/Nucleotide Sequence Databases
• RNA sequence databases
• Protein sequence databases
• Structure Databases
• Metabolic and Signaling Pathways
• Human Genes and Diseases
• Microarray and other Gene Expression Databases
• Proteomics Resources
• Other Molecular Biology Databases
• Organelle databases
• Plant databases
• Immunological databases
Source: M Y Galperin, Nucleic Acids Research (2006)
Source: GenomeNet, Kyoto
Biology: Visualisation! Collaboration!
NCMIR “BioWall”
SAGE
HP Halo Collaboration Studio
Drug Discovery & Development
12+ years, $1-1.25 billion
Sequence Homology, Gene Expression, Proteomics, System & Disease Modelling
Comb. Libraries
HTS
ADME/Tox
QSAR
Paradigm Change
Old Science -> New Science
Classical chemistry -> Combinatorial chemistry, ‘Omics
Basic biology -> ‘Omics, Biotechnology
Experimentation -> Computation
Low throughput -> High throughput
Animal studies -> Molecular imaging, Trial Design
The Discovery Sieve
Getting Less and Less for More and More
Source: PPD Inc.
Pharma Challenges
• Declining productivity and ROI
– $1+ billion to bring a drug to market, $1 million/day revenue lost to delay, declining post-patent lifetimes (5-7 years)
– Most drug candidates fail
• 1:10 development candidates fail
• 1:2 clinical trial candidates fail
– Number of NCEs has been falling for a decade
– 2:3 drugs do not generate a lifetime return
– Blockbuster (“one size fits all”) and “me too” mentalities not sustainable; many patents (~$72b) expiring in next 5 years
– Stricter regulation (pre- and post-market), greater price pressure and greater liability (Vioxx, Baycol, …)
• Deluge of data, drought of knowledge
– Huge investment in high-throughput data generation technologies not matched by investment in data analysis technologies
– Poorly integrated data silos
• Increasingly collaborative landscape
– Challenges of sharing information across enterprise boundaries
New Pharma Ecosystem?
• 1,500 ($50b+) pharma/biotech partnerships in last 7 years
– e.g. 50% of Roche pharma/diagnostic revenues from licensing deals
Source: Recombinant Capital
Typical Grid Applications
“Instead of spending millions of dollars and years in the lab screening hundreds of thousands of compounds, now it will be possible to screen hundreds of millions of molecules in months” (Graham Richards)
• Drug Discovery
– Sequence analysis
– Microarray analysis/network inference
– Virtual Screening (Autodock, CHARMM, Glide, FlexX)
• Development
– ADME, PK/PD (NONMEM, WinNonLin)
– Trial design (TrialSimulator)
– Process validation, compliance
• Marketing
– Market data analysis (SAS, SPSS)
Pharma Grids: the Good News
• J&JPRD¹
– 1,200 rising to 3,000 PCs; mix of Linux (clusters) and Windows (desktops)
– 20+ applications
– United Devices GridMP
• Novartis²
– Began in 2001
– Now 2,700+ PCs (out of 65,000), 5+ Tflops, 25,000 PCs eventually?
– Apps: docking, genome annotation, chemoinformatics, clinical trial simulation, text mining
– $400k investment, $2+ million annual savings
– United Devices GridMP for PC farm
– Rigidly standardized PC environment
• GSK¹
– 1,000+ PCs
– $1 million estimated annual savings
– United Devices GridMP for PC farm
¹ Source: United Devices, Inc.
² Source: Manuel Peitsch, Novartis
Pharma Grids: not-so-good News
“Less than half of the top 20 pharmaceutical companies are implementing Grids”
[William Fellows, 451 Group]
Barriers to Grid adoption
• Difficulty of Building a Business Case
– Cui bono?
– Measuring the ROI?
• Unsuitable licensing models: driving open source?
• Trust and Access Control issues
– Extending to the balkanized (fire-walled) global enterprise
– Extending to the whole development ecosystem
• Technical Barriers
– Lack of suitable (“embarrassingly parallel”) applications
– Heterogeneity of platforms
– Poor standardization of middleware (commercial vs open source): will SOA (OGSA) solve this?
– Poor data grid management, semantic integration: driving development of ontologies?
– Limited bandwidth: increasing use of Lambda rails?
Overcoming the Barriers: Building a Business Case
• Capacity Improvement
– Driven by ROI
– Reduced build and running costs of PC Grids cf. dedicated clusters
• R&D Process Innovation
– Driven by need for new ways of doing
– Collaborative research (industry/academia)
– “Open source research” (NIH, Wellcome)
Overcoming the Barriers: Technical
• Software
– Less intrusive, more standardized middleware
– Web services, OGSA
• Data Management
– DataGrid technologies
• Data Integration
– Ontologies and shared knowledge spaces
• “Utility/On-Demand” Computing
• Bandwidth
– National and international LambdaRails
• Virtual Laboratories/Organizations
LambdaRails™
Source: OptiPuter Group
SwissBioGrid: A National Resource
• Dedicated to large-scale computational applications in bioinformatics, modelling, chemoinformatics and bio-medical sciences
• CSCS manages GRID infrastructure, middleware, security
• SIB/Vital-IT has primary responsibility for providing bioinformatics application validation and optimization, Web services, database services
• Some sites compute-intensive, some data-intensive
SwissBioGrid: A Mixture of Clusters and PCs
ProtoGRID Metascheduler
ETHZ Hreidar (Sun Grid Engine)
SIB Vital-IT (Platform LSF)
UniZH Matterhorn (Sun Grid Engine)
UniBS BC2 cluster (Platform LSF)
UniBS/FMI PC farms
CSCS
- Ticino Cluster (Itanium, LSF)
- Terrane Cluster (PS 5, PBS)
- Sun Cluster (PBS)
Some Good News…
“Open source discovery” is thriving!
• Anthrax (7,000+ CPU years)
• Smallpox (68,000+ CPU years)
– 400,000+ CPUs, 53,000+ CPU years to date, 75+ CPU years/day
• Human Proteome folding, Phase II (761+ CPU years)
• Cancer project Phase II (437+ CPU years)
• AIDS project (25,000+ CPU years)
More Good News…
WISDOM
• Malaria [500 million infections, 1.3 million deaths/year]
– Autodock, FlexX
– 80 CPU years in 6 weeks
– 1,000,000 ligands against 11 targets
– Top 1,000 hits identified
• Avian Flu [the next Big One]
– 77 CPU years on 2000 computers
– 300,000 ligands against 8 Influenza A neuraminidase targets
– Hits now being analyzed
Dengue [10 million infections, 100,000 deaths/year]
[Figure labels: Novartis PC-Grid, Uni ZH PC-Grid, VitalIT IA64, VitalIT Nocona, BC2 PC-Grid, BC2 Athlon, BC2 Opteron]
– Autodock, Glide
– Mixed PC and cluster Grid
– 130,000 ligands from NCI DTP library docked against dengue NS5 protein
– ~1 CPU min/dock
– 70 hits found, being evaluated in vitro
– Plan to dock 2.7 million ligands from ZINC library
– 1875 CPU-days for 1 target/1 site/1 parameter set/1 library (“parameter sweep”)
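The 1875 CPU-day figure follows directly from about 1 CPU minute per dock times 2.7 million ligands. The Python sketch below reproduces that estimate and shows one way such a sweep could be cut into independent work units. The function names, the 1,000-CPU grid size and the 1,000-ligand chunk size are hypothetical, and the wall-clock estimate assumes perfect scaling with no scheduling overhead.

```python
from typing import Iterator, Tuple

MINUTES_PER_DAY = 24 * 60


def docking_cpu_days(n_ligands: int, cpu_min_per_dock: float = 1.0,
                     n_targets: int = 1, n_sites: int = 1,
                     n_param_sets: int = 1) -> float:
    """CPU-days for a full sweep: every ligand against every
    target/site/parameter-set combination."""
    n_docks = n_ligands * n_targets * n_sites * n_param_sets
    return n_docks * cpu_min_per_dock / MINUTES_PER_DAY


def wall_clock_days(cpu_days: float, n_cpus: int) -> float:
    """Idealised elapsed time, assuming perfect scaling (an assumption)."""
    return cpu_days / n_cpus


def work_units(n_ligands: int, ligands_per_unit: int) -> Iterator[Tuple[int, int]]:
    """Yield half-open (start, stop) index ranges over the ligand library.
    Each range can be docked independently of the others."""
    for start in range(0, n_ligands, ligands_per_unit):
        yield start, min(start + ligands_per_unit, n_ligands)


if __name__ == "__main__":
    total = docking_cpu_days(2_700_000)        # 1 target, 1 site, 1 parameter set
    print(total)                               # 1875.0
    print(wall_clock_days(total, 1000))        # 1.875 (hypothetical 1,000 CPUs)
    print(len(list(work_units(2_700_000, 1000))))  # 2700 work units
```

Because no work unit depends on another, adding targets, sites or parameter sets multiplies the cost linearly, which is why each extra dimension of the sweep matters.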
From Data Sharing to Knowledge Sharing
• DataGrid
– SwissBioGrid experiment in data grid using Avaki
– Complex update patterns
• KnowledgeGrid
– Aggressive use of ontologies for knowledge standardization and sharing
• Gene Ontology
Thank You!
Questions?