7.BioSimGrid


BioSimGRID and BioSimGRID ‘lite’: towards a worldwide repository for biomolecular simulation
www.biosimgrid.org
Philip C Biggin
http://indigo1.biop.ox.ac.uk
Overview
• Introduction
- Motivation
- Consortium
- Case studies – added value from comparisons
• Design
- Architecture
- Data schema
• How to use
- Deposition
- Analysis
- Worldwide application
• The Future
- Towards computational systems biology
Current Paradigm for MD Simulations

Target selection: literature based; interesting protein/problem

System preparation: highly interactive; slow; idiosyncratic

Simulation: diversity of protocols

Analysis: highly interactive; slow; idiosyncratic

Dissemination: traditional – papers, posters, talks

Archival: ‘archive’ data … and then mislay the tape!

No third party involvement
bioinformatics & structural biology
Integrating Simulations and Structural Biology of Proteins
[Diagram labels: Novel structure (RCSB); Sequence alignment; Biomedically relevant homologue(s); Homology model(s); MD simulations (bacterial K channel, mammalian K channel; dynamics in membrane); BioSimGRID biomolecular simulation database; Comparative analysis; Interaction site dynamics; Evaluation/refinement of model; Biological and pharmacological simulation & modelling, e.g. drug discovery; drug docking calculations]
Consortium
• Oxford: Mark Sansom, Paul Jeffreys, Bing Wu, Kaihsu Tai
• Southampton: Jon Essex, Simon Cox, Stuart Murdock, Muan Hong Ng, Hans Fangohr, Steven Johnston
• London: David Moss
• Nottingham: Charlie Laughton
• York: Leo Caves
• Bristol: Adrian Mulholland
[Map labels: York, Nottingham, RAL, Oxford, Bristol, Southampton, London]
Comparative Simulations: Drug Receptors

Why? – increase significance of results

Sampling – long simulations and multiple simulations

Sampling via biology – exploiting evolution

Biology emerges from comparisons…

e.g. mammalian receptor vs. bacterial binding protein
[Figure labels: glutamate, D1, D2]

Rat GluR2 EC fragment
Major receptor in mammalian brains – drug target
MD simulations with/without bound ligands
Analyse inter-domain motions
GluR2 – Flexibility & Gating…
[Figure: RMSD (Å) against time (ns), 0 to 2.0 ns, for the empty, +Glu and +Kai simulations; diagram labels: Kainate, empty, Glutamate, >, >>, “ON”, “OFF”]
Flexibility depends on ligand occupancy & species
Gating mechanism – decrease in flexibility on channel activation
But … incomplete sampling
Need: longer simulations & comparative simulations
GlnBP – A Bacterial Binding Protein
[Figure: X-ray structures (empty, Gln bound) and MD simulation (empty, + Gln / Gln bound) of GlnBP]

GlnBP – bacterial 2-domain periplasmic binding protein

Similar fold to mammalian GluR2

X-ray shows ligand binding induces domain closure

MD shows ligand binding reduces inter-domain motions - cf. GluR2 simulations
Case Study 2…
AChE: acetylcholinesterase
OMPLA: outer-membrane phospholipase
So how do we compare…
- Similar active sites or similar motions
- Different structures
- Simulated with different MD packages (analysis is difficult, even if visualization is not)
- On different hard drives/tapes/CDs/DVDs
- Under different graduate students’ desks
- Under different postdocs’ beds
- In different rubbish bins!
Answer…
Create a worldwide repository of molecular simulations…
BioSimGrid = BioSimDB + Toolkits + Integration
BioSimGrid Architecture…
[Layered diagram, top to bottom]
GUI: Web Application; Python Application
HTTP(S) / SSH
Apache / Tomcat / SSL / Python
Service: Authentication, Authorisation, Accounting; Data Retrieval Tool; Data Deposition Tool; Analysis Tool; HTML Generator; Trajectory Query Tool; Video/Img Engine; SQL Editor
TCP/IP
Middleware: BioSim Data Engine / Storage Resource Broker
TCP/IP
DB/Data: Database; Flat Files
                     DB       Flat File
Size/GB              7.5      3.0
Random Access /s     560.8    18.6
Sequential Access    389.0    5.5
Cross-software Analysis…
• BioSimDB = PDB (or NDB) for MD
- enable discovery of new science (cf. genomics/proteomics initiatives)
[Diagram: CHARMM, AMBER, GROMACS, NAMD, LAMMPS and TINKER arranged around BioSimDB]
It’s a Distributed Database
- Nobody has enough disk space in one place anyway
- Distributed and duplicated
- Any piece of information is stored in at least two sites
- …for resilience
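The "at least two sites" rule can be illustrated with a small placement sketch. This is only an illustration under stated assumptions: the `place_replicas` and `survives_loss` functions and the hash-based choice of sites are hypothetical and are not taken from BioSimGrid itself; only the two host names appear in the talk.

```python
import hashlib

# The two hosts named in the talk; any further sites would be assumptions.
SITES = ["oxford.biosimgrid.org", "soton.biosimgrid.org"]


def place_replicas(trajectory_id, sites=SITES, copies=2):
    """Pick `copies` distinct sites for one trajectory (assumed policy)."""
    if copies > len(sites):
        raise ValueError("not enough sites for the requested number of copies")
    # Hash the identifier so placement is deterministic and spread across sites.
    digest = hashlib.sha256(trajectory_id.encode("utf-8")).digest()
    start = digest[0] % len(sites)
    return [sites[(start + i) % len(sites)] for i in range(copies)]


def survives_loss(placement, failed_site):
    """True if at least one copy remains after `failed_site` goes down."""
    return any(site != failed_site for site in placement)


placement = place_replicas("traj-0001")
print(placement)
print(survives_loss(placement, "oxford.biosimgrid.org"))
```

With two copies on distinct sites, losing either single site still leaves one readable copy, which is the resilience argument made on the slide.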
Current Architecture
[Diagram: two mirrored sites, oxford.biosimgrid.org and soton.biosimgrid.org. Each runs BioSim Data Engine Services with: IDA; MCAT; DB Interface, DB Engine, Database; SRB Server, SRB Agent; F/F Interface, F/F Engine, Flat Files; Cache]
Data Schema
- The hierarchy is like that in the PDB:
- Chain → residue → atom → coordinate
- …but also extended in the time dimension: frames
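As a rough illustration of a PDB-like hierarchy extended in time, the sketch below stores one coordinate per frame on each atom. The class and function names are hypothetical; the actual BioSimGrid schema is a database schema and is not shown in the talk.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Coordinate = Tuple[float, float, float]


@dataclass
class Atom:
    name: str
    # One coordinate per frame: index i holds the position in frame i.
    coordinates: List[Coordinate] = field(default_factory=list)


@dataclass
class Residue:
    name: str
    number: int
    atoms: List[Atom] = field(default_factory=list)


@dataclass
class Chain:
    identifier: str
    residues: List[Residue] = field(default_factory=list)


def frame_coordinates(chain: Chain, frame: int) -> Dict[str, Coordinate]:
    """Collect the position of every atom in one frame, keyed by residue and atom."""
    return {
        f"{res.name}{res.number}:{atom.name}": atom.coordinates[frame]
        for res in chain.residues
        for atom in res.atoms
    }


ca = Atom("CA", [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0)])  # two frames
chain = Chain("A", [Residue("GLY", 1, [ca])])
print(frame_coordinates(chain, 1))
```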
Metadata…
- …is the data about data
- MD setup, parameters, instantaneous properties, etc.
- People currently write this in papers
- People forget something
- The disciplined way: …structured schema
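One way to picture the "disciplined" approach is a record that is checked against a list of required fields at deposition time, so that forgotten items are caught early. The field names below are assumptions chosen for illustration and are not the actual BioSimGrid metadata schema.

```python
# Hypothetical required fields; the real schema is not shown in the talk.
REQUIRED_FIELDS = {
    "package": str,        # e.g. "GROMACS"
    "force_field": str,
    "timestep_fs": float,
    "temperature_K": float,
    "n_frames": int,
}


def missing_or_invalid(metadata):
    """Return a sorted list of fields that are absent or have the wrong type."""
    problems = []
    for name, expected_type in REQUIRED_FIELDS.items():
        if name not in metadata or not isinstance(metadata[name], expected_type):
            problems.append(name)
    return sorted(problems)


# A record where two items were forgotten, as can happen in a paper.
record = {"package": "GROMACS", "timestep_fs": 2.0, "n_frames": 1000}
print(missing_or_invalid(record))
```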
Deposition…
Unified deposition for trajectories from any package.
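A common way to get a single deposition path for many packages is a registry of per-format readers that all return the same internal structure. The sketch below assumes that design; the `reader` decorator, the `deposit` function and the toy "example-xyz" format are hypothetical and do not correspond to any real package format or to the BioSimGrid deposition tool.

```python
# Hypothetical registry mapping a package name to a reader function.
READERS = {}


def reader(package):
    """Decorator that registers a reader for one simulation package."""
    def register(func):
        READERS[package.upper()] = func
        return func
    return register


@reader("example-xyz")
def read_xyz_like(text):
    """Toy reader: one frame per line, 'x y z' triples separated by ';'."""
    frames = []
    for line in text.strip().splitlines():
        atoms = [tuple(float(v) for v in atom.split()) for atom in line.split(";")]
        frames.append(atoms)
    return frames


def deposit(package, raw_text):
    """Convert any registered format into the same list-of-frames structure."""
    try:
        parse = READERS[package.upper()]
    except KeyError:
        raise ValueError(f"no reader registered for {package!r}") from None
    return {"source_package": package, "frames": parse(raw_text)}


entry = deposit("example-xyz", "0 0 0; 1 0 0\n0 0 0; 1.5 0 0")
print(len(entry["frames"]), entry["frames"][1][1])
```

Supporting another package would then mean registering one more reader, while everything downstream sees the same structure.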
Analysis
BioSimDB Toolkit
• Analysis tools
- Radius of Gyration
- Surface and Volume
- RMSD/RMSF
- Centre of Mass
- Inter-atomic distances
- Distance matrix
- Internal angles
- Principal Component Analysis
- Average structure
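Three of the listed quantities can be written in a few lines each. The sketch below uses the standard textbook definitions in plain Python; it is not the BioSimDB toolkit code, and the RMSD shown skips the structural superposition that is normally done first.

```python
import math


def centre_of_mass(coords, masses):
    """Mass-weighted mean position of a set of atoms."""
    total = sum(masses)
    return tuple(
        sum(m * c[axis] for m, c in zip(masses, coords)) / total
        for axis in range(3)
    )


def radius_of_gyration(coords, masses):
    """Mass-weighted root-mean-square distance from the centre of mass."""
    com = centre_of_mass(coords, masses)
    total = sum(masses)
    weighted = sum(m * math.dist(c, com) ** 2 for m, c in zip(masses, coords))
    return math.sqrt(weighted / total)


def rmsd(frame, reference):
    """RMSD between two frames with the same atom order (no superposition)."""
    if len(frame) != len(reference):
        raise ValueError("frames must contain the same number of atoms")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(frame, reference))
    return math.sqrt(squared / len(frame))


reference = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
frame = [(0.0, 1.0, 0.0), (2.0, 1.0, 0.0)]  # every atom shifted by 1 along y
masses = [12.0, 12.0]

print(centre_of_mass(reference, masses))
print(radius_of_gyration(reference, masses))
print(rmsd(frame, reference))
```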
Current Implementation
New workflow with BioSimGrid

Target selection: literature based; interesting protein/problem

Perform simulation (or use someone else’s)

Protocols more systematically recorded/checked/confirmed

Archive data to BioSimGrid

Analyse shared data (either locally or distributed)

Dissemination: traditional – papers, posters, talks

Store results in BioSimGrid

Third parties can analyse data you deposit
That’s dandy - but who is this aimed at?
• Novice and Expert
- Novice (web/GUI)
  - Makes selections
  - Guided through the options
  - Can only do specific things
  - Difficult to make mistakes
- Expert (employ scripting)
  - Python interpreter
  - Much available
  - Reasonably unrestricted
Example sessions
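Even in script mode the syntax is quite informative:
FC = FrameCollection('2, 100-200')  # select frames with a selection string
myRMSD = RMSD(FC)                   # RMSD analysis over the selected frames
myRMSD.createPNG()                  # render the result as a PNG image
- Provides biochemists who have little computational experience with a means of analysing computational data and obtaining meaningful results.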
Example sessions
Viewlet of a session; Demo4.html
BioSimGrid ‘Lite’
- Light version before final rollout
- Provides equilibrated lipid bilayer boxes
- Also provides ontogeny: how the box came about…
  - …metadata
  - …equilibration process (all the frames)
Deliverables to Date…
• Database schema
• Sample database (with test trajectories)
• Prototype shared between 2 sites
• Analysis tools – preliminary versions (about 14 tools)
• Interface to database for data retrieval
• Python hosting environment
Roadmap

Dec 2002 – project started

July 2003 – (internal) prototype

September 2003 – working prototype (All Hands meeting)

November 2003 – test ‘real world’ applications

December 2003 – multi-site prototype

2004 – multi-site deposition of data

2005 – open up to additional groups for deposition/testing
If you are interested…
The team would like to hear from interested parties, especially those with new ideas.
- Benefits to you
  - New directions are implemented
  - Toolkit suits your needs
  - Shared development of code
  - Faster and more thorough development
- BioSimGrid benefits
  - Larger user community
  - More work gets done
  - Code is efficient
  - BioSimGrid and its community are successful
Future Directions in the GRID context
1. HTMD – simulations coupled to structural genomics (Diamond light source)
2. Computational systems biology – virtual outer membrane (HPCx)
3. Multiscale biomolecular simulations – from QM/MM to meso-scale modelling (GRID-enabled simulations)
[Diagram label: BioSimGrid]
Structural Genomics & HTMD
[Diagram labels: synchrotron; compute GRID; MD database; novel biology…]

Overall vision – simulation as an integral component of structural genomics

Needs capacity computation – GRID?

MD database (distributed) – BioSimGRID
Towards a Virtual Outer Membrane (vOM)
[Figure labels: Pi, TolC, OMPLA, OmpT, PiBP, TonB, FhuD, FhuA, OmpX, PhoE, OpcA, OmpF, OmpA, LamB, MalE, d+]
First step towards computational systems biology – a suitable system

Bacterial OMs – 5 or 6 proteins = 90% of protein content

Structures or good homology models of proteins are available

Complex lipid – outer leaflet is lipopolysaccharide (LPS)

Minimum system size ca. 2.5 × 10^6 atoms; simulation times ca. 50 ns

cf. current FhuA – 80,000 atoms & 10 ns – need HPCx
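To get a feel for the jump from the FhuA simulation to the virtual outer membrane, the quoted figures can be compared directly. The combined cost ratio below assumes that cost grows linearly with the number of atoms times the simulated time; that scaling is an assumption made for this estimate and is not a claim from the talk.

```python
# Figures quoted on the slide.
vom_atoms, vom_time_ns = 2.5e6, 50.0
fhua_atoms, fhua_time_ns = 80_000, 10.0

size_ratio = vom_atoms / fhua_atoms        # how many times more atoms
time_ratio = vom_time_ns / fhua_time_ns    # how many times longer
# Assumption: cost grows linearly with atoms x simulated time.
cost_ratio = size_ratio * time_ratio

print(f"{size_ratio:.2f}x atoms, {time_ratio:.0f}x time, ~{cost_ratio:.0f}x cost")
```

Under that linear assumption the vOM run is roughly two orders of magnitude more expensive than the current FhuA run, which is consistent with the stated need for HPCx.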
Multiscale Biomolecular Simulations
QM (Bristol)
Drug-binding (Southampton)
Protein Motions (Oxford)
Drug Diffusion (London)

Membrane bound enzymes – major drug targets (cf. ibuprofen, anti-depressants, endocannabinoids)

Complex multi-scale problem: QM/MM; ligand binding; membrane/protein fluctuations; diffusive motion of substrates/drugs in multiple phases

Need for GRID-based integrated simulations
References…
1. K. Tai, S. Murdock, B. Wu, M.H. Ng, S. Johnston, H. Fangohr, S. Cox, P. Jeffreys, J. Essex, M.S.P. Sansom. Org. Biomol. Chem. Under review.
2. M.H. Ng, S. Johnston, S. Murdock, B. Wu, K. Tai, H. Fangohr, S. Cox, J. Essex, M.S.P. Sansom, P. Jeffreys. UK e-Science Programme All Hands Meeting 2004. Accepted.
3. Python website: www.python.org
4. BioSimGrid: www.biosimgrid.org
Acknowledgements
Oxford
Professor Mark Sansom
Dr Carmen Domene
Dr Alessandro Grottesi
Dr Andrew Hung
Dr Daniele Bemporad
Dr Shozeb Haider
Dr Kaihsu Tai (curation and integration)
Dr George Patargias
Oliver Beckstein
Jennifer Johnston
Syma Khalid
Jorge Pikunic
Pete Bond
Zara Sands
Jonathan Cuthbertson
Sundeep Deol
Jeff Campbell
Yalini Pathy
Loredana Vaccaro
Shiva Amiri
Katherine Cox
Robert d’Rozario
John Holyoake
Samantha Kaye
Anthony Ivetac
Sylvanna Ho
Oxford e-Science Center
Professor Paul Jeffreys
Dr Bing Wu (database management)
Matthew Dovey
Ivaylo Kostadinov
Southampton
Dr Stuart Murdock (generic analysis tools)
Dr Muan Hong Ng (data retrieval)
Dr Hans Fangohr
Steven Johnston
Prof Simon Cox
Dr Jon Essex
Elsewhere
Leo Caves (York)
Charles Laughton (Nottingham)
David Moss (Birkbeck)
Oliver Smart (Birmingham)
Adrian Mulholland (Bristol)
Marc Baaden (Paris)
Funding
BBSRC
EC (TMR)
MRC
DTI
The Wellcome Trust
OeSC (EPSRC & DTI)
EPSRC
GSK
OSC (JIF)
More information…
www.biosimgrid.org