pims - European Bioinformatics Institute

PIMS
data management and harvesting
General Introduction
Design a LIMS
Protein Production Data Model
What can PIMS do for you?
Information Management System
■ Information Management System (IMS) is a joint
database and information management system
■ A database management system (DBMS) is a
system, usually automated and computerized, for
the management of any collection of compatible,
and ideally normalized, data
■ Information management is the handling of
knowledge acquired by many disparate sources in
a way that optimizes access by all who have a
share in that knowledge
Scientific goals
■ Recording laboratory information
■ Large volumes of data to keep
■ 10,000s of experiments
■ 1,000,000s of samples
■ Data interchange and interoperation
■ Collaboration in protein production
■ Share data between stages and sites
■ Data transfer to beamline or NMR ops
■ Data mining and reporting
■ Analysis
■ Negative results can be mined to improve methods
■ Scientific publications
■ Data deposition
PIMS
■ Protein Information Management System
■ Started in January 2005
■ Five-year UK project, funded by the
Biotechnology and Biological Sciences
Research Council (BBSRC)
■ Based on the Protein Production Data
Model paper
■ Proteins. 2005 Feb 1;58(2):278-84. “Design of
a data model for developing laboratory
information management and analysis systems
for protein production.”
Scope of PIMS
[Diagram: the pipeline stages covered by PIMS, with data import and export.
Stage labels: target selection, target optimisation, cloning, expression,
purification & concentration, crystallisation, microcrystals, data collection,
phasing, model building, refinement.
Group labels: Bioinformatics, Molecular Biology, Crystallography.]
Stakeholders
■ BBSRC SPoRT funding
■ Scottish Structural Proteomics Facility (SSPF)
■ Universities of Dundee, St. Andrews, Glasgow and Warwick.
■ Membrane Protein Structure Initiative (MPSI)
■ Universities of Glasgow, Leeds, Oxford, Sheffield, Imperial
College, Birkbeck College, UMIST and CCLRC Daresbury.
■ Protein Information Management System (PIMS)
[Diagram labels: BBSRC funding; PIMS, SSPF, MPSI]
■ CCP4, Diamond
■ Oxford Protein Production Facility
■ IBBMC, University Paris Sud
■ European Bioinformatics Institute
■ York Structural Biology Laboratory
■ Daresbury Laboratory
■ Other UK protein scientists
■ Other protein scientists worldwide
Collaborations
■ Seamless data transfer and a consistent UI ...
■ ... from target to structure deposition
■ ... so far as possible
■ Bioinformatics: SSPF pipeline, EBI workflow
■ Crystallization: NKI, EMBL Hamburg & Grenoble
(BIOXHIT)
■ Data transfer: e-HTPX
■ Data collection: DNA, X-track
■ Structure solution: CCP4, CCPN
■ Instruments: Kendro, CSols
Design
■ The data model
■ focuses on what data should be stored
■ is used to design the entities (classes or tables)
that we are dealing with, their various
attributes, and their relationships
■ The goal of the data model is to make sure
that all the data objects required are
completely and accurately represented
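
To make "entities, attributes and relationships" concrete, here is a minimal Python sketch. The entity names (`Target`, `Sample`, `Experiment`), their fields and the helper `samples_for_target` are assumptions chosen for illustration; they are not the actual PIMS data model.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Target:
    """A protein under investigation (hypothetical entity)."""
    name: str
    sequence: str


@dataclass
class Sample:
    """A physical sample derived from a target."""
    label: str
    target: Target  # relationship: many samples point to one target


@dataclass
class Experiment:
    """One recorded laboratory step with input and output samples."""
    kind: str
    inputs: List[Sample] = field(default_factory=list)
    outputs: List[Sample] = field(default_factory=list)


def samples_for_target(experiments: List[Experiment], target: Target) -> List[str]:
    """Return the labels of all output samples that belong to a target."""
    return [s.label for e in experiments for s in e.outputs if s.target is target]


# Illustrative data only; the sequence is a short placeholder fragment.
lysozyme = Target(name="lysozyme", sequence="KVFGRCELAA")
pellet = Sample(label="S-001", target=lysozyme)
purified = Sample(label="S-002", target=lysozyme)
runs = [
    Experiment(kind="expression", outputs=[pellet]),
    Experiment(kind="purification", inputs=[pellet], outputs=[purified]),
]
print(samples_for_target(runs, lysozyme))
```

Because each sample holds a reference to its target, a question such as "which samples came from this target?" can be answered by following relationships rather than by matching free-text labels.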
Reliability
■ Loss of data is inexcusable
■ Must be able to correct wrong data
■ Must keep audit trails
■ Must allow future changes
■ All made feasible by
■ Data model
■ Database
■ Software engineering standards
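
One common way for a database to support both "correct wrong data" and "keep audit trails" is a trigger that records every change. The sketch below uses Python's standard `sqlite3` module; the table and column names (`sample`, `sample_audit`, `volume_ul`) are hypothetical, and nothing here is taken from the PIMS schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sample (
    id INTEGER PRIMARY KEY,
    label TEXT NOT NULL,
    volume_ul REAL NOT NULL
);
CREATE TABLE sample_audit (
    audit_id INTEGER PRIMARY KEY AUTOINCREMENT,
    sample_id INTEGER NOT NULL,
    old_volume_ul REAL,
    new_volume_ul REAL,
    changed_at TEXT DEFAULT CURRENT_TIMESTAMP
);
-- Every correction to volume_ul leaves a row in the audit table.
CREATE TRIGGER sample_volume_audit
AFTER UPDATE OF volume_ul ON sample
BEGIN
    INSERT INTO sample_audit (sample_id, old_volume_ul, new_volume_ul)
    VALUES (OLD.id, OLD.volume_ul, NEW.volume_ul);
END;
""")

# A value is entered wrongly, then corrected.
conn.execute("INSERT INTO sample (id, label, volume_ul) VALUES (1, 'S-001', 50.0)")
conn.execute("UPDATE sample SET volume_ul = 5.0 WHERE id = 1")
conn.commit()

audit_rows = conn.execute(
    "SELECT sample_id, old_volume_ul, new_volume_ul FROM sample_audit"
).fetchall()
print(audit_rows)
```

The corrected value replaces the wrong one in `sample`, while the old value survives in `sample_audit`, so no information is lost.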
Ancestry
■ HalX: an open-source LIMS (Laboratory Information Management
System) for small- to large-scale laboratories.
■ Prilusky J, Oueillet E, Ulryck N, Pajon A, Bernauer J, Krimm I,
Quevillon-Cheruel S, Leulliot N, Graille M, Liger D, Tresaugues L,
Sussman JL, Janin J, van Tilbeurgh H, Poupon A.
■ Acta Crystallogr D Biol Crystallogr. 2005 Jun;61(Pt 6):671-8.
■ OPPF based on Nautilus
■ MOLE: a data management application based on a protein
production data model.
■ Morris C, Wood P, Griffiths SL, Wilson KS, Ashton AW.
■ Proteins. 2005 Feb 1;58(2):285-9.
PIMS
■ The aim is to provide a Laboratory Information
Management System (LIMS)
■ for laboratories that produce proteins from target genes
■ that can be incorporated into commercial software in the area
of biotech and protein production
■ Improve the quality of the experimental data
deposited into the PDB
■ by providing software for lab scientists to harvest their
daily experimental data from protein production to
structure
■ My roles
■ Data Model
■ Database / Persistence layer / Java API
■ Java Applet development
Why is Data Modelling Important?
■ A Data Model is a plan
for building a
database
■ detailed enough to be
used to create the
physical structure
■ simple enough to
communicate to the
end user the data
structure
■ The Unified Modelling
Language (UML)
Data Model
■ Related to protein production & crystallisation
■ Suitable for large & small facilities
■ Required to reproduce the samples & experiments
involved
■ Used for tracking samples, experiments & results
■ Developed to help software developers to collect,
store and exchange information through the
provision of a common platform
Area covered
■ Protein production work is
generally the investigation
of a particular protein, the
Target
■ The work often aims to
produce a derivative of the
Target, such as a single
domain or complexes
[Diagram labels: target, protein production, crystallisation, X-ray,
phasing, NMR tube, NMR, structure]
The Core Data Model
Change Control Board
■ The data model is a work in progress
■ The science is developing too
■ Local protocols, which are novel and confidential
■ Not easy work
■ Thanks to…
■ Geoff Barton (Dundee)
■ Steve Prince (Manchester)
■ Anne Poupon (IBBMC)
■ Jon Diprose (OPPF)
■ Alun Ashton (Diamond)
■ Rasmus Fogh (CCPN)
Generation machinery
■ Implemented in UML
(Object Domain)
■ Developed within a
framework provided
by the CCPN project
■ Information stored in
the UML Data Model
is used to generate
automatically
■ SQL schema,
■ Java Application
Program Interfaces
(APIs) and
■ Documentation
[Diagram: the UML data model passes through the CCPN framework
(www.ccpn.ac.uk) to generate the XML schema, Python API, SQL schema,
Java API and documentation]
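
The idea of generating several artefacts from one model can be shown with a toy example. The sketch below is not the CCPN generation machinery: the `MODEL` dictionary is a hypothetical, heavily simplified stand-in for the UML model, and `generate_sql` and `generate_docs` are illustrative helpers.

```python
import sqlite3

# Hypothetical stand-in for the UML model: entity -> {attribute: SQL type}.
MODEL = {
    "Target": {"name": "TEXT", "sequence": "TEXT"},
    "Sample": {"label": "TEXT", "target_id": "INTEGER"},
}


def generate_sql(model):
    """Emit one CREATE TABLE statement per entity in the model."""
    statements = []
    for entity, attributes in model.items():
        columns = ["id INTEGER PRIMARY KEY"]
        columns += [f"{name} {sql_type}" for name, sql_type in attributes.items()]
        statements.append(f"CREATE TABLE {entity.lower()} ({', '.join(columns)});")
    return "\n".join(statements)


def generate_docs(model):
    """Emit a plain-text attribute listing from the same model."""
    return "\n".join(
        f"{entity}: {', '.join(attributes)}" for entity, attributes in model.items()
    )


schema_sql = generate_sql(MODEL)
docs = generate_docs(MODEL)

# Load the generated schema into SQLite to confirm it is valid SQL.
conn = sqlite3.connect(":memory:")
conn.executescript(schema_sql)
tables = sorted(
    row[0] for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
)
print(schema_sql)
print(docs)
```

Since the schema and the documentation are both derived from `MODEL`, a change to the model propagates to both without manual editing.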
Architecture
■ The API provides methods to access the
underlying DB to store and retrieve data
■ This allows applications to manipulate data without a
detailed knowledge of the way in which the data is stored
■ Various different applications make use of the API
■ LIMS
■ Any High Throughput applications (non-GUI)
■ They are able to exchange data easily
[Diagram labels: storage, SQL schema, DB API, persistence layer,
Java API, tools: GUI, standalone applications,…]
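
The layering described on this slide can be sketched as a small class that hides storage behind methods. This is an illustration in Python using the standard `sqlite3` module, not the PIMS Java API; `SampleStore`, `add_sample` and `find_by_stage` are hypothetical names.

```python
import sqlite3


class SampleStore:
    """Hypothetical API layer: callers never see SQL or the table layout."""

    def __init__(self, path=":memory:"):
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS sample ("
            "id INTEGER PRIMARY KEY, label TEXT UNIQUE NOT NULL, stage TEXT NOT NULL)"
        )

    def add_sample(self, label, stage):
        """Store a sample and return its generated identifier."""
        with self._conn:  # commits on success, rolls back on error
            cursor = self._conn.execute(
                "INSERT INTO sample (label, stage) VALUES (?, ?)", (label, stage)
            )
        return cursor.lastrowid

    def find_by_stage(self, stage):
        """Return the labels of all samples at a given stage, sorted."""
        rows = self._conn.execute(
            "SELECT label FROM sample WHERE stage = ? ORDER BY label", (stage,)
        )
        return [row[0] for row in rows]


# Two kinds of caller share the same API: an interactive entry and a batch load.
store = SampleStore()
store.add_sample("S-001", "expression")      # e.g. typed into a form
for label in ("S-002", "S-003"):             # e.g. loaded by a non-GUI script
    store.add_sample(label, "purification")
print(store.find_by_stage("purification"))
```

If the table layout changes, only `SampleStore` needs updating; both callers keep working, which is the point of putting an API between applications and the database.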
From data model to application
■ Data Model
■ Use cases
■ Scientific logic into requirements
■ Specifications
■ security, performance, usability, etc
■ Java API
■ Test data
■ UI Design
■ Application
Modular Construction
■ http://www.pims-lims.org/project/use-case-suite.html
■ Training & Support
■ Workflow
■ Reporting
■ Scheduling
■ Instrument Management
■ System Administration
■ Data Capture
■ Inventory Management
■ Setup & Configuration
■ Visualisation
■ Data Mining
■ Mobile Data Collection
■ Sample Management
■ Access Rights Management
■ Bioinformatics
■ Project Management
■ Reference Data
Reference data
■ Supplier details
■ Protocols
■ documenting a set of editable default protocols
■ user interface design with Ed Daniel
■ Reagents
■ protocol-related reference samples
■ chemical hazard information
■ e.g. R and S-phrases
■ documenting lab chemicals as ‘MolComponents’
■ includes synonyms, formula, CAS-number and mass
■ naming system under discussion with NKI
■ ~400 identified, ~180 based on crystallisation screens
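
A reference-data record for a lab chemical might look like the Python sketch below. The class name `MolComponent` is borrowed from the slide, but its fields and the validation step are assumptions for illustration, not the PIMS definition. The check-digit rule in `cas_is_valid` is the standard one for CAS Registry Numbers: weight the digits 1, 2, 3, … from the right, sum them, and compare the sum modulo 10 with the final digit.

```python
from dataclasses import dataclass, field
from typing import List


def cas_is_valid(cas_number: str) -> bool:
    """Validate a CAS Registry Number (format and check digit)."""
    parts = cas_number.split("-")
    if len(parts) != 3 or not all(p.isascii() and p.isdigit() for p in parts):
        return False
    if not (2 <= len(parts[0]) <= 7 and len(parts[1]) == 2 and len(parts[2]) == 1):
        return False
    digits = parts[0] + parts[1]
    # Weight digits 1, 2, 3, ... starting from the rightmost one.
    total = sum(int(d) * w for w, d in enumerate(reversed(digits), start=1))
    return total % 10 == int(parts[2])


@dataclass
class MolComponent:
    """Hypothetical record for one lab chemical."""
    name: str
    formula: str
    cas_number: str
    mass: float  # molar mass in g/mol
    synonyms: List[str] = field(default_factory=list)

    def __post_init__(self):
        # Reject obviously mistyped CAS numbers at entry time.
        if not cas_is_valid(self.cas_number):
            raise ValueError(f"invalid CAS number: {self.cas_number}")


water = MolComponent(
    name="water",
    formula="H2O",
    cas_number="7732-18-5",
    mass=18.015,
    synonyms=["dihydrogen oxide"],
)
print(water)
```

Validating identifiers when reference data is entered keeps typing errors out of every experiment record that later refers to the chemical.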
Instrument management
■ Analytical Data: A Tower of Babel
[Figure: example traces for NMR (parts per million), LC (minutes),
MS (mass, m/z) and IR (wavenumber, cm-1)]
■ Integration
■ CSols
■ produces a widely used Instrument Integration Package
■ if the PIMS I/O is implemented in a reasonable timescale,
CSols may develop a PIMS Driver
■ Kendro/Thermo
What can PIMS do for you?
Not a lot right now
Whatever you want, eventually ...
... as long as it's data management
for protein production
Version 0.2
■ October 2005
■ Then incremental delivery
■ … for one customer at a time and integrate with trunk
■ … and repeat until project complete
Protocol Editor
Applet Protocol Editor
■ Choose a step from a list
■ Draw a temperature step
■ List the protocol's completed steps at the bottom of the screen
and reload them from there
■ Record the protocol in the DB
■ Display the list of protocols from the DB in the explorer and reload
any one of them
Applet Workflow
■ Select the experiment categories using the tabs
■ Drag and drop the selected experiments
■ Build a workflow or load an existing one
■ Associate a protocol with an experiment
A collaborative framework
■ … to develop a family of LIMSes
■ Developers have difficulty in justifying the time
required to create the software needed
■ The biologist doesn't want to wait
■ The result is a rapidly written LIMS that is fragile
and cannot scale as the project grows
■ Need a generic LIMS
■ helps to solve these problems by giving developers a
tool that can scale to meet the needs of a large project
■ and which welcomes plugins for novel methods
Conclusion
■ Each “Click” could be a lot of coding ...
■ What do molecular biologists really want?
■ Expectations are High!
■ Users make an indispensable contribution
■ Tell us when it's not good enough ...
■ ... we will respond
Acknowledgements
■ PIMS developer group
■ Chris Morris (CCP4)
■ Anne Pajon (EBI)
■ Ed Daniel (Daresbury)
■ Peter Troshin (MPSI)
■ Jo van Niekerk (SSPF)
■ Susy Griffiths (YSBL)
■ Jon Diprose (OPPF)
■ Katherine Pilicheva (OPPF)
■ Anne Poupon (IBBMC)
■ Eric Oeuillet (IBBMC)
■ Sabrina Haquin (IBBMC)
■ Alun Ashton (Diamond)
■ EBI-MSD
■ Kim Henrick
■ Wim Vranken
■ John Ionides
■ CCPN
■ Wayne Boucher
■ Rasmus Fogh
■ Tim Stevens
■ Dan