Architecting the Virtual Organisation
Rob Gill
Biology Domain MDR-IT
March 2007
Drug Discovery Process
GSK is a research based pharmaceutical company – new
products from R&D pipeline
Gene
ID
Disease
Association
Gene-to-Target
Identify gene or target
protein important for
disease mechanism.
Target
Selection
Screening
Hit to
Lead
Cost
Target to Lead
Identify and Optimise
Inhibitors / modulators
Preclinical
Candidate
Candidate
Selection
Evaluate
• Safety
• Dose
• Toxicity
PoC,
File
Clinical Launch
FTIM
Test Hypothesis
in Man
• Efficacy
Informatics links the information used
to find and validate new targets
TARGETS
SCREENS
CANDIDATES
?
[Chart: vertical axis from 0 to 700,000]
BUT….. All these tools are built
and integrated within GSK
DRUGS
R&D Strategy: Virtualisation
• Total investment in drug research:
– GSK R&D: ~15,000 scientists, >$4B R&D spend
– PhRMA member companies¥ (2005): >$51B R&D spend
– Biotechs (2006): >$25B R&D spend
• R&D strategy is to tap into external knowledge and
expertise through a network of external alliances,
sharing the risk, reward and control
• Centre of Excellence for External Drug Discovery
(CEEDD) launched in 2005
¥ Includes GSK
R&D Strategy: Globalisation
• The rate of growth of scientific and technical graduates
in Asia is outpacing that of the US and Europe
• China is one of the fastest growing countries:
– well over a million scientists and engineers are
graduating each year (National Bureau of Statistics,
2005)
– by 2010 it is predicted that China will surpass the
United States in the number of science and
engineering PhDs conferred (National Bureau of
Economic Research)
• R&D strategy is to be able to tap into this wealth of
knowledge along with other developments in the global
market.
GSK IT Definition of Virtualisation
“Virtualisation will allow GSK to operate seamlessly across
organisational, process and technology boundaries enabling GSK
to: exploit process expertise, partner effectively and flexibly
outsource”
Here’s why….
• Development of Alliances
• Access to Service Providers (CROs)
• Integrated Outsource Partners
• Better access for key opinion leaders
• Acquisition
• Globalised distributed workforce
Challenge for IT in GSK
• Business trend is away from the “fortress”
towards an “ecosystem”
• However the GSK IT environment was not
designed for this
• Significant overhead required to be integrated
into R&D/GSK processes
• Most solutions are “bolted on” to our current
infrastructure
• Virtualisation is one of the key streams of GSK
strategy
IT Techniques needed to support this
transition
Technical Architectures
• Speed vs Flexibility
• Need to simplify our environment
• Necessity to support remote sites
• Reduce costs
– Development
– Support
Information Architecture (IA)
• Capture processes across Biology
• Publish and maintain models
• Use to direct the generation of data services
• Creation/Use of Ontologies
• Identify “integration points”
• Design out overlaps
• Drive GSK Integration
Business Process Modelling & Model
Driven Design
• If we are to fully embrace the Virtualisation challenge we need to look again
at how we capture and implement business processes.
• Process Definition and Information Standards
– Architects and business analysts, in conjunction with business process owners,
define the process model and the information passed between process tasks
using a business process modeling tool
• Implementation
– Software developers implement services and define business rules
• Process Optimization and Monitoring
– Business process owners can then monitor metrics to analyze and optimize
the process
• Better modeled processes and SOA approaches then open up GSK systems
to a number of delivery approaches
– Enterprise Service Bus (ESB)
– Workflow Tools (InforSense, Taverna …)
Business Process Management Architecture
Bespoke point-to-point interfaces between systems and ad hoc communication do not allow business processes to be
easily reconfigured, outsourced, or tracked. Workflow is either hard-coded into applications or handled by ad hoc
communication.
Inventory
System
Request
System
QC
System
Delivery
System
email
phone
phone
phone
Workflow is externalized and visible, and point-to-point interfaces are eliminated.
This allows business processes to be easily reconfigured and outsourced, and
their metrics to be tracked.
Process Server
Request
Product
Build
Product
QC
Product
Deliver
Product
Task queue
Process Monitor
(captures metrics)
Request
System
Product
system
QC
System
Delivery
System
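The externalised arrangement in the diagram above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not GSK's implementation: the `ProcessServer` and `ProcessMonitor` classes, the four handler functions and the order fields are all hypothetical names chosen to mirror the diagram labels. The point it shows is that the step order lives in one list held by the process server, so no system calls another directly and every step is timed by the monitor.

```python
import time
from collections import deque


class ProcessMonitor:
    """Captures simple per-step metrics: (call count, total seconds)."""

    def __init__(self):
        self.metrics = {}

    def record(self, step_name, seconds):
        count, total = self.metrics.get(step_name, (0, 0.0))
        self.metrics[step_name] = (count + 1, total + seconds)


class ProcessServer:
    """Holds the workflow definition outside the individual systems."""

    def __init__(self, steps, monitor):
        self.steps = steps          # ordered list of (step name, handler)
        self.monitor = monitor
        self.task_queue = deque()   # entries are (next step index, order)

    def submit(self, order):
        self.task_queue.append((0, order))

    def run(self):
        finished = []
        while self.task_queue:
            index, order = self.task_queue.popleft()
            if index == len(self.steps):
                finished.append(order)   # no steps left: the order is done
                continue
            name, handler = self.steps[index]
            started = time.perf_counter()
            order = handler(order)
            self.monitor.record(name, time.perf_counter() - started)
            self.task_queue.append((index + 1, order))
        return finished


# Hypothetical stand-ins for the Request, Product, QC and Delivery systems.
def request_system(order):
    return {**order, "requested": True}


def product_system(order):
    return {**order, "built": True}


def qc_system(order):
    return {**order, "qc_passed": True}


def delivery_system(order):
    return {**order, "delivered": True}


steps = [
    ("Request Product", request_system),
    ("Build Product", product_system),
    ("QC Product", qc_system),
    ("Deliver Product", delivery_system),
]
monitor = ProcessMonitor()
server = ProcessServer(steps, monitor)
server.submit({"id": "ORDER-1"})
finished = server.run()
```

Reconfiguring or outsourcing a step then means editing the `steps` list (for example, swapping `qc_system` for a partner's service) without touching the other systems.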
Use of Information Architecture
• Business Process Management takes us
closer to the Virtualisation goals
• However building this on top of an
unmanaged data architecture still leaves us
unable to fully benefit from the distributed
approach
• To properly Virtualise we need to both
understand business process and have a
managed data architecture.
Historical Ad-Hoc Integration (As Required)
[Diagram: Networks View, Reagents View, Gene View and Platform View joined by “connection spaghetti”]
Move to “services” (But no IA)
[Diagram: Networks View, Reagents View, Gene View and Platform View each sit on Services, over “hidden spaghetti”]
Modelled data and business process
Business
BRAD
Perspective
Genomics
Perspective
Discovery
Perspective
Services
Services
Data
Services
Genetics
Perspective
Gene Data
Protein Data
Gene
Omics
Results
Platform
Networks
Data
Network
Workflow Analysis
[Diagram: services (Get Gene Sequence, Get Protein Sequence, Get Gene Annotation, Get platform results, Get all results, Filter By Technology, Filter by Species, Get closest neighbours, Get All Pathways, Visualise) drawn over the data stores (Gene Data, Protein Data, Omics Results, Networks Data)]
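A workflow of this kind amounts to chaining small, independent services. The sketch below is a hypothetical illustration only: the in-memory `PLATFORM_RESULTS` records, the function names and the field names are invented to echo the service labels in the diagram and do not describe any real GSK service.

```python
from functools import reduce

# Hypothetical in-memory stand-in for an omics results store.
PLATFORM_RESULTS = [
    {"gene": "GeneA", "technology": "microarray", "species": "human", "value": 2.1},
    {"gene": "GeneA", "technology": "proteomics", "species": "mouse", "value": 0.7},
    {"gene": "GeneB", "technology": "microarray", "species": "human", "value": 1.3},
]


def get_platform_results(gene):
    """'Get platform results': fetch every result recorded for one gene."""
    return [row for row in PLATFORM_RESULTS if row["gene"] == gene]


def filter_by_technology(technology):
    """'Filter By Technology': returns a step that keeps matching rows."""
    def step(rows):
        return [row for row in rows if row["technology"] == technology]
    return step


def filter_by_species(species):
    """'Filter by Species': returns a step that keeps matching rows."""
    def step(rows):
        return [row for row in rows if row["species"] == species]
    return step


def run_workflow(start, steps):
    """Feed the output of each step into the next one, in order."""
    return reduce(lambda data, step: step(data), steps, start)


workflow = [
    get_platform_results,
    filter_by_technology("microarray"),
    filter_by_species("human"),
]
result = run_workflow("GeneA", workflow)
```

Because each step only takes data in and hands data on, steps can be reordered, replaced or reused in another workflow without changes to the others.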
Why Workflow?
• Drives modularisation and aligns with model
driven approach
– Good for prototyping and process capture
– Very fast response time for development
• Captures scientific knowledge as part of the
workflow design process
• Built in flexibility for a rapidly changing
environment
• Supports Integrity and retention initiatives
• Lowers the barrier of software development
Target Application Architecture
Visualization & Analysis Tools
Sequencing LIMS
Workflow
Analytics
microArray LIMS
Bio Catalog
Proteomics LIMS
Cellular LIMS
Sample
Management
Biological
Inventory
Metabolomics LIMS
Experimental
&
Analysis
Results
Reference Biology
(Sequences, Genes, Proteins, Biological Networks)
Data Integration
Study Design &
Sample ID
Generation
Request
Management
Data
Marts
Biology Target Architecture Goals
• All Biology applications made accessible
to vendors and partners by hosting in an open
environment.
• Remove all major business workflow from
individual applications and implement workflow
control through business process management
with well-defined Service Data Objects.
• Look to standardize data formats and participate in
semantic GRID computing
Demonstrations of Workflow and Grid
usage at GSK
POCs in the Biology Domain
Demonstrators in the portfolio space
• Workflow tools have been heavily used across both
the Biology and Chemistry space
– Some moving to production status
– Still some challenges to overcome (discussed later)
• Primary use in process management and reporting
• Portfolio space has some novel and potentially
valuable opportunities.
– Using Workflow from the “Top down” has the
opportunity to drive integration across different
scientific disciplines
– Portfolio view will need to span the whole GSK space
as we move to the virtualised environment
Portfolio support using Workflow and
Semantic Grid
• Target is an overloaded term.
– Confusion over meaning and identification
– Controlled Vocabulary, Standards and
inappropriate identification have led to a
complex mix of data types which are difficult
to traverse.
• A GSK “target” is actually a scientifically
generated concept linking a disease to a set
of entities and relationships needing to
be tested within models.
Concept Capture
“Gene” Data
“Disease” Data
“Compound” Data
Concept
Concept ID
Sequence
db
Sequence
ID
Reagent
db
Reagent
ID
Assay
db
Assay
ID
Assay
Results
Target Management
• Once captured as a network it is possible to query and
search across “Target space” in a far more flexible and
intuitive way.
• Standards can be enforced and aligned with GSK
requirements.
• Automated analysis can be used to answer numerous
questions about the inferences and results generated
within a portfolio program.
• Using services generated in the Biology and
Chemistry space it is possible to annotate the portfolio
in new ways and to answer fundamental
questions around portfolio progression.
• This process uses simple independent services on top
of available GSK data and is reusable
Drug Discovery Process (again)
GSK is a research based pharmaceutical company – new
“Concepts” for R&D pipeline
Gene
ID
Disease
Association
Request
System
Target
Selection
Screening
Inventory
System
Preclinical
Candidate
Hit to
Lead
PoC,
File
Clinical Launch
QC
System
Delivery
System
email
phone
phone
X
phone
X
1. Workflow in reagent tracking POC
• Using Workflow to manage reagent processing
• Numerous questions can be asked:
– Where is / how many Target(s)?
– Programs using a molecular target
– Availability of reagents for a program
– Lead series generated from a program
– Mapping status of bio-reagents to Molecular Target
– Which bio-reagents are used within a specific screen
• Very process-driven approach focussed on GSK
data.
Target Identification
Assay Development
Gene Info Query
Assay Info Query
Chemical
Query
Bioreagent
Info Query
ProtA
Compound 123
ProtB
Multimeric protein
SoC
Workflow challenges for Production
Status
• Service availability
– Using services in this manner requires a
high availability of services
• Service Orchestration
– Need to ensure that once started a workflow
runs to completion
• Service support
– Can this be managed internally and
externally?
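The "runs to completion" requirement is commonly approached by retrying a step when its service is temporarily unavailable. The sketch below shows that general pattern under stated assumptions; it is not a description of how any GSK or vendor workflow engine behaves, and `ServiceUnavailable`, `call_with_retry`, `run_to_completion` and `make_flaky` are hypothetical names.

```python
import time


class ServiceUnavailable(Exception):
    """Raised by a service that cannot answer right now."""


def call_with_retry(service, payload, attempts=3, delay_seconds=0.0):
    """Call a service, retrying on ServiceUnavailable up to `attempts` times."""
    for attempt in range(1, attempts + 1):
        try:
            return service(payload)
        except ServiceUnavailable:
            if attempt == attempts:
                raise                      # give up: surface the failure
            time.sleep(delay_seconds * attempt)


def run_to_completion(steps, payload, attempts=3):
    """Run each (name, service) step in order; return the result and a log."""
    completed = []
    for name, service in steps:
        payload = call_with_retry(service, payload, attempts)
        completed.append(name)
    return payload, completed


def make_flaky(failures):
    """Hypothetical service that fails `failures` times, then succeeds."""
    remaining = {"count": failures}

    def service(payload):
        if remaining["count"] > 0:
            remaining["count"] -= 1
            raise ServiceUnavailable("temporarily down")
        return payload + ["annotated"]

    return service


steps = [
    ("lookup", lambda payload: payload + ["looked up"]),
    ("annotate", make_flaky(failures=2)),
]
result, completed = run_to_completion(steps, [])
```

Retries only help with short outages; a production workflow would also need to persist its state so that a long outage does not lose the work already done.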
2. Semantic Grid for Portfolio POC
• Create an environment to query “concept” data
across GSK and externally
• Access novel algorithms / analysis for use on
concept networks to expand knowledge and
identify project “risks”.
– Data expansion (e.g. platform analysis)
  • Used in Validation / Reagent generation
– Network expansion (looking for connections)
  • Used by Discovery groups / Informatics
Assay Validation Analysis
Assay Development
Target Identification
Bioreagent
Analysis
Transcript
Analysis
GRID
“Concept
Novelty”
TransA_1
GeneA
TransA_2
GSK00000
X
TransA_3
Assay may fail to identify
this variant.
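The transcript check in this slide can be illustrated with a toy model. Everything below is an assumption made for the example: the exon sets assigned to each transcript and to the assay are invented, and modelling an assay as "the set of exons it detects" is a simplification chosen here, not something stated in the slides.

```python
# Hypothetical exon composition of three transcript variants of GeneA.
TRANSCRIPTS = {
    "TransA_1": {"exon1", "exon2", "exon3"},
    "TransA_2": {"exon1", "exon3"},
    "TransA_3": {"exon1", "exon2"},
}


def missed_transcripts(transcripts, assay_exons):
    """Return variants that lack at least one exon the assay relies on."""
    return sorted(
        name
        for name, exons in transcripts.items()
        if not assay_exons <= exons      # assay exons not all present
    )


# Hypothetical assay that depends on exon3 being present.
missed = missed_transcripts(TRANSCRIPTS, {"exon3"})
```

With these invented inputs the check flags `TransA_3`, which is the kind of early warning the slide describes: the assay may fail to identify that variant.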
Business Value?
• Methodology opens up Target space for broader analysis
– Supports MDR requirements for target tracking and analysis
– Linkage of Biology and Chemistry domains for better tracking
– Drives standards and vocabulary down to the business
– Removes need to redundantly identify “Molecular Target”
throughout process.
• Methods can be used to spot issues with programs and allow “kill
early” decisions
– Transcript analysis and assays
– Bio-reagent availability (incl. ABs / Crystallography etc.)
• Methods and services can be reused across the organisation outside
of portfolio area
GRID Challenges in Industry
• Service Productionisation
– Availability, Security, Granularity
– Relationship management
  • Define the negotiation paradigm
  • Licensed service model
  • Implement local scoped catalogue
– Service Orchestration
• How to manage skews in Ontologies & technology
– Vendors
– Academia
– Knowledge Engineers?
• GSK is looking at developing standards
One such group is SIMDAT
What is SIMDAT?
• Objectives:
– Develop federated versions of problem solving
environments
– Support of distributed product and process
development
– Test and enhance grid technology for access to
distributed databases
– Tools for semantic transformation between these
databases
– Grid support for knowledge discovery
– Promote de facto standards
– Raise awareness in important industrial sectors
SIMDAT - Overview
Four sectors of
international economic
importance:
Automotive
Pharmaceutical
Aerospace
Meteorology
Seven Grid-technology
development areas:
Grid infrastructure
Distributed Data Access
VO Administration
Workflows
Ontologies
Analysis Services
Knowledge Services
The solution of industrially relevant complex problems
using data-centric Grid technology.
Drivers for joining SIMDAT
• SIMDAT provides a platform to look at all the following
requirements and develop novel strategies to deliver them:
– IT/IX tasked with delivering cutting edge analysis and data
management systems
– Need to develop an environment to drive integration and
introduce flexibility.
– Buy not build when technology or tools are available and
appropriate
– Lower “Activation Energy” around external collaborations
– Look to Academia for “cutting edge” research
– Look to Vendors for solid support / continuity & Value
– Ensure Pharma discovery process continues as a production
system with inbuilt flexibility
– Look for opportunities to Virtualise GSK infrastructure
– Support the needs for Offshoring and Outsourcing
GSK architectural Direction
SIMDAT
Applications
Solution
Vendors
Security
PSTUD.
B*
Service
Enactment
GATE
Gene
GSK
Sample
Microarray
??
Payment
Continuity
Value
Relationships
??
??
Other Pharma /
Biotech
First B2B Scenario
GSK GnG database
Integrate and Output
Inpharmatica via GRIA
SIMDAT gives us
• Framework for Service Consumption
– Specialised Software Granularity
– Data and Algorithm Services
– Accounting and Consumption Model
– Stratified Information
– Open Service Market
– Choreography to Orchestration
• Relationship Management
– Specialised Services
– Focused Licensing
– Zero Setup
– Negotiated Catalogue
– Dynamic Fee Dimensioning
Implemented Components of
the B2B Scenario
Licensed
Services
Job execution
Annotate project portal
User
Workflow
Semantic
Broker
Knowledge
Portal
InforSense
KDE
Updating SB
E2E
Job execution
Job execution
E2E
BIOCLIP Service
GRIA
wrapper
Inpharmatica
GRIA
Job execution
GRIA
wrapper
Chemistry Services
Service Updater
Final Scenario: B2A
Industrial Service
Provider
Web Portal
GRIA layer
Internet
Local
applications
Academic Service
Provider
Academic Consumer
Web Portal
GRIA layer
Local
applications
In Conclusion
• Virtualisation & Globalisation are key
objectives for GSK
• IT must move to support these initiatives
• Workflow and semantic directions can
support this goal but MUST move into
production
• Consolidation of the various efforts in Grid
and Ontology would make this move
simpler.
Acknowledgements
MDR-IT
• Richard Ashe, Biology technical architect
• Simon Dear, Biology information architect
• Mike Moore, Biology POC
• John Armstrong, Biology Business Analysis
• Bart Ailey, SIMDAT Contractor
Thank you!