
Integrative Biology
exploiting e-Science to combat fatal diseases
Damian Mac Randal
CCLRC
1
Overview of Talk
• Project background
• The scientific challenge
• The e-Scientific challenge
• Proposed system
2
Scientific background
• Breakthroughs in biotechnology and IT have provided a wealth
(mountain) of biological data
• Key post-genomic challenge is to transform this data into information
that can be used to determine biological function
• Biological function arises from complex non-linear interactions between
biological processes occurring over multiple spatial and temporal
scales
• Gaining an understanding of these processes is only possible via an
iterative interplay between experimental data (in vivo and in vitro),
mathematical modelling, and HPC-enabled simulation
3
e-Scientific background
• The majority of the first round of UK e-Science projects focused
primarily on data-intensive applications (data storage,
aggregation, and synthesis)
• Life Sciences projects focused on supporting the data
generation work of laboratory-based scientists
• In other scientific domains, projects such as RealityGrid,
GEODISE, and gViz began to consider compute-intensive
applications.
4
The Science and e-Science Challenge
• To build an Integrative Biology Grid to support application
scientists addressing the key post-genomic aim of determining biological function
• To use this Grid to begin to tackle the two chosen
Grand Challenge problems: the in-silico modelling
of heart failure and of cancer.
– Why these two? Together they cause 61% of all deaths in the UK.
5
[Video: normal beating vs. fibrillation. Courtesy of Peter Kohl (Physiology, Oxford)]
6
Multiscale modelling of the heart

[Figure panels: MRI image of a beating heart; fibre orientation ensures
correct spread of excitation; contraction of individual cells; current
flow through ion channels]
7
Heart modelling
• Typically solving coupled systems of PDEs (tissue level)
and non-linear ODEs (cellular level) for the electrical
potential, as sketched below
• Complex three-dimensional geometries
• Anisotropic
• Up to 60 variables
• FEM and FD approaches
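For illustration, the widely used monodomain formulation is one such coupled system (an assumption for concreteness: the talk says only "coupled PDEs and ODEs", not which tissue-level model the project uses). An anisotropic reaction-diffusion PDE governs the transmembrane potential V, coupled at every point to a stiff non-linear ODE system u (e.g. the Noble 98 cell model, with tens of state variables):

```latex
% Illustrative monodomain model; the project's actual formulation is not
% specified in this talk.
\chi \Bigl( C_m \frac{\partial V}{\partial t} + I_{\mathrm{ion}}(V,\mathbf{u}) \Bigr)
  = \nabla \cdot \bigl( \boldsymbol{\sigma}\,\nabla V \bigr),
\qquad
\frac{d\mathbf{u}}{dt} = \mathbf{f}(V,\mathbf{u})
```

where σ is the anisotropic conductivity tensor (set by fibre orientation), χ the cell surface-to-volume ratio and C_m the membrane capacitance.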
8
Details of test-run of Auckland heart simulation code on HPCx
• Modelled 2 ms of electrophysiological excitation of a 5700 mm³ volume of tissue from the
left ventricular free wall
• Noble 98 cell model used
• Mesh contained 20,886 bilinear elements (spatial resolution 0.6 mm)
• 0.05 ms timestep (40 timesteps in total)
• Required 978 s of CPU time on 8 processors and 2.5 GB of memory
• A complete simulation of the ventricular myocardium would require up to 30 times the
volume and at least 100 times the duration
• Estimated max compute time to investigate arrhythmia ~10⁷ s (~100 days), requiring
~100 GB of memory (compute time scales to the power ~5/3; see the check below)
• At high efficiency this scales to approximately 1 day on HPCx
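A back-of-envelope check of that extrapolation, as a sketch only: the slide gives the ~5/3 exponent but not exactly what it applies to, so this assumes compute time grows as volume^(5/3) and linearly with simulated duration, and memory linearly with volume:

```python
# Hedged reproduction of the slide's scaling estimate (assumptions noted above).
base_cpu_s = 978.0     # CPU time for the 5700 mm^3, 2 ms test run
base_mem_gb = 2.5      # memory used by the test run

volume_factor = 30     # full ventricular myocardium: up to 30x the volume
duration_factor = 100  # at least 100x the simulated duration

cpu_s = base_cpu_s * volume_factor ** (5 / 3) * duration_factor
mem_gb = base_mem_gb * volume_factor  # assumed linear in volume

print(f"estimated CPU time: {cpu_s:.1e} s (~{cpu_s / 86400:.0f} days)")
print(f"estimated memory:   ~{mem_gb:.0f} GB")
# -> ~2.8e7 s and ~75 GB: the same order of magnitude as the slide's
#    ~1e7 s (~100 days) and ~100 GB figures.
```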
9
Multiscale modelling of cancer
10
Cancer modelling
• Focusing on avascular tumours
• Current models range from discrete population-based
models and cellular automata, to non-linear ODE systems
and complex systems of non-linear PDEs (an illustrative
example follows below)
• Key goal is the coupling (where necessary) of these
models into an integrated system which can be used to
gain insight into experimental findings, to help design new
experiments, and ultimately to test novel approaches to
cancer detection, and new drugs and treatment regimes
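For illustration only (the project's actual equations are not given in this talk), a classic Greenspan-style avascular model is representative of the non-linear PDE systems mentioned: a nutrient c diffuses into, and is consumed within, a spherical tumour of radius R(t), whose growth depends on the nutrient available above a critical level c_crit:

```latex
% Representative textbook-style avascular tumour model (an assumption,
% not the project's specific formulation).
\frac{\partial c}{\partial t} = D \nabla^{2} c - \lambda c
  \quad (r < R(t)),
\qquad
\frac{dR}{dt} = \frac{S}{R^{2}} \int_{0}^{R}
  \bigl( c(r,t) - c_{\mathrm{crit}} \bigr)\, r^{2}\, dr
```

with D the nutrient diffusivity, λ the consumption rate and S a proliferation-rate constant.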
11
Summary of the scientific challenge
Modelling and coupling phenomena which occur on many different
length and time scales:

Length scales (range = 10⁹):
• 1 m    person
• 1 mm   tissue morphology
• 1 µm   cell function
• 1 nm   pore diameter of a membrane protein

Time scales (range = 10¹⁵):
• 10⁹ s (years)    human lifetime
• 10⁷ s (months)   cancer development
• 10⁶ s (days)     protein turnover
• 10³ s (hours)    digest food
• 1 s              heart beat
• 1 ms             ion channel gating
• 1 µs             Brownian motion
12
The e-Science Challenge
• To leverage the first round of e-Science projects and the
global Grid infrastructure to build an international
“collaboratory” which places the applications scientist
“within” the Grid, allowing fully integrated and
collaborative use of:
– HPC resources (capacity and capability)
– Computational steering, performance control and visualisation
– Storage and data-mining of very large data sets
– Easy incorporation of experimental data
– User- and science-friendly access
=> Predictive in-silico models to guide experiment and, ultimately,
design of novel drugs and treatment regimes
13
Key e-Science Deliverables
• A robust and fault-tolerant infrastructure to
support post-genomic research in
integrative biology that is user and
application driven
• A 2nd-generation Grid bringing together
components across a range of current
EPSRC pilot projects
14
e-Science/Grid Research Issues
– Ability to carry out large-scale distributed coupled HPC
simulations reliably and resiliently
– Ability to co-schedule Grid resources based on a GGF-agreed
standard
– Secure data management and access-control in a Grid environment
– Grid services for computational steering conforming to an agreed
GGF standard
– Development of powerful visualisation and computational steering
capabilities for complex models
• Contributing projects:
– RealityGrid, gViz, Geodise, myGrid, BioSimGrid, eDiaMoND, GOLD,
various CCLRC projects, ….
15
Service-oriented Architecture
• The user-accessible services will initially be grouped into
four main categories (sketched as interfaces after this list):
– Job management
• including deployment, co-scheduling and workflow management
across heterogeneous resources
– Computational steering
• both interactive for simulation monitoring/control and pre-defined for
parameter space searching
– Data management
• from straightforward data handling and storage of results to location
and assimilation of experimental data for model development and
validation
– Analysis and visualization
• final results, interim state, parameter spaces, etc, for steering
purposes
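A minimal sketch of how these four categories might be expressed as service interfaces. All class and method names here are hypothetical illustrations, not the project's actual API:

```python
# Hypothetical interface sketch for the four user-accessible service
# categories; names and signatures are illustrative assumptions only.
from abc import ABC, abstractmethod

class JobManagement(ABC):
    @abstractmethod
    def deploy(self, code: str, resource: str) -> str: ...  # returns a job id
    @abstractmethod
    def co_schedule(self, job_ids: list[str]) -> None: ...  # coupled jobs

class ComputationalSteering(ABC):
    @abstractmethod
    def monitor(self, job_id: str) -> dict: ...             # interactive monitoring
    @abstractmethod
    def set_parameter(self, job_id: str, name: str, value: float) -> None: ...

class DataManagement(ABC):
    @abstractmethod
    def store(self, key: str, data: bytes) -> None: ...     # storage of results
    @abstractmethod
    def locate(self, query: str) -> list[str]: ...          # find experimental data

class AnalysisVisualization(ABC):
    @abstractmethod
    def render(self, dataset: str, view: str) -> bytes: ... # final or interim state
```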
16
“Strawman Architecture”

[Architecture diagram. Recoverable components and interactions: an IB
Solver handles job composition, looking up models in a model library
(CellML?) and codes in a simulation library via a Resource Directory;
jobs are set up through Job Submission onto external resources and
registered in a Job Directory. The Simulation Engine (solver, model,
data) exposes monitoring state and steering parameters; Simulation
Control, Computational Steering and Data Control exchange
commands/feedback with it through an IB Server, fronted by a Browser
and a Portlet Server portal. Data Management (SRB?) covers data
storage, databases, metadata and data “mining”; a coupled-solver
variant repeats the same details. A Visualization Pipeline (filter and
map stages) under Visualization Control delivers data, interim results
and final results to local and remote displays. Security spans all
services.]
17
Software architecture
• Underpinning development of the architecture are three fundamental
considerations:
– standardization, scalability and security
• Initially, Web service technology is being used for interactions between
the system components (a sketch follows after this list)
• Many of the underlying components are being adopted from previous
projects, and adapted if necessary, in collaboration with their original
developers
• Portal/portlet technology, integrated with the user's desktop
environment, will provide users with a lightweight interface to the
operational services
• The data management facilities are being built using Storage
Resource Broker technology to provide a robust and scalable data
infrastructure
• Security is being organized around “Virtual Organizations” to mirror
existing collaborations
• A “rapid prototyping” development methodology is being adopted
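For concreteness, a minimal sketch of one component calling another over a Web service interface, as the architecture describes. The endpoint URL, operation name and parameters are all hypothetical; the project's actual WSDL is not given in this talk:

```python
# Hypothetical SOAP-over-HTTP call to a steering service; all names are
# illustrative assumptions (requests is a third-party HTTP library).
import requests

ENDPOINT = "https://ib.example.org/services/SimulationControl"  # hypothetical

soap_body = """<?xml version="1.0" encoding="UTF-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <setParameter xmlns="urn:ib:steering">  <!-- hypothetical operation -->
      <jobId>job-42</jobId>
      <name>timestep_ms</name>
      <value>0.05</value>
    </setParameter>
  </soap:Body>
</soap:Envelope>"""

resp = requests.post(
    ENDPOINT,
    data=soap_body.encode("utf-8"),
    headers={
        "Content-Type": "text/xml; charset=utf-8",
        "SOAPAction": "urn:ib:steering#setParameter",
    },
)
resp.raise_for_status()  # surface transport-level failures
print(resp.text)         # the service's SOAP response envelope
```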
18
Demonstrators
• Objectives:
– Immediate boost in size/complexity of problems scientists can
tackle
– Validation of the Architecture
– Learning exercise, exploring new technology
– Introduce scientists to the potential of advanced IT, so they can better
specify requirements
• 4 Demonstrators, chosen for diversity
19
Demonstrators
• Implementation of GEODISE job submission middleware (via
MATLAB) using the Oxford JISC cluster on the NGS. (A simple
cellular model of nerve excitation)
• MPI implementation of Jon Whiteley and Prasanna
Pathmanathan’s soft tissue deformation code (for use in image
analysis for breast disease). (FEM code, non-linear elasticity)
• MPI implementation of Alan Garny’s 3D model of the SAN
incorporating the ReG Steering Library (FD code for non-linear
reaction-diffusion (anisotropic) plus an XML-based parser for
cellular model definition)
• CMISS modelling environment for complex bioengineering
problems - Peter Hunter, Auckland, NZ (production-quality
FE/BE library plus front/back ends)
20
Resources
• Project manager, project architect, 7.5 post-docs and 6 PhD students
broken down into three main teams
• Heart modelling and HPC: 1.5 post-docs, 2 PhD students in Oxford,
0.5 post-doc at UCL. Led by Denis Noble and myself.
• Cancer Modelling: 1 senior post-doc and 2 PhD students in
Nottingham, 1 post-doc and 1 PhD student in Oxford, 1 PhD student
in Birmingham. Led by Helen Byrne in Nottingham. (Several further
PhD students have also been funded from other sources)
• Interactive services and Grid team: Project architect plus 2 post-docs
at CCLRC, 1 post-doc in Leeds, 0.5 post-doc at UCL
• Note: well over half of the effort is dedicated to the science
21
Current Status
• Official project start date 1/2/04, recruitment of staff now complete
• Initial project structure defined and agreed, initial requirements gathering and
security policy exercises completed, initial architecture agreed
• Heart-modelling and cancer-modelling workshops held in Oxford in June, with
talks by user communities
• Cancer modelling meeting with all users in Oxford in July
• Full IB workshop with all stakeholders in Oxford, 29th September
• Survey of capabilities of existing middleware under way (thanks to everyone
who has given us lots of their time)
• Four demonstrators identified and development commenced
22
Summary
• Science-driven project that aims to build on
existing middleware to (begin to) prove the
benefits of Grid computing for complex systems
biology – i.e. to do some novel science
• Huge and increasing (initial) buy-in from the user
community
• Challenge is to develop sufficiently robust and
usable tools to maintain that interest.
23