Why an NMR Data Model?


Summary
• What is CCPN?
• What approach are we taking and why?
• What are (some of) the technical details?
• Software team
– Cambridge (Rasmus Fogh, Tim Stevens)
– EBI (Wim Vranken, Anne Pajon, John Ionides)
CCPN Project
• Modelled on CCPx (N = NMR)
• Main goals:
– Disseminate best practice
– Standards for software
– Development of software
– Repository for third-party software
NMR Software
• Problem
– Heterogeneous collection of developers
– Lots of stand-alone programs
– Lots of proprietary data formats
– Lots of conversion scripts
• Solution
– Standards (production to deposition)
– Libraries (open, modular, …)
Data Format vs. Model
• Data format (syntax: how data is stored)
– STAR
– XML
– SQL
– Tab-separated ASCII
– Python pickle file
• Data model (semantics: what data means)
– BMRB NMRStar
– RCSB mmCIF
– XML DTD or schemas
NMR Community Consensus
• Data model rather than data format
– Format independent
– Language independent
– Science (descriptive)
• API to manipulate data model in memory
– Creation and manipulation of objects
– One for each language
– Bookkeeping
• I/O modules to load/store data from/to disk
– One for each (storage format, language)
– Bookkeeping
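The split between an in-memory model and per-format I/O modules can be sketched as follows. This is a minimal illustration, not CCPN code: the Experiment dataclass and the to_xml, from_xml, to_tsv and from_tsv helpers are hypothetical names, and the XML and tab-separated layouts are assumptions made for the example.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Experiment:
    """In-memory object: carries the meaning, knows nothing about storage."""
    name: str
    numDim: int


def to_xml(expt):
    """I/O module for one storage format (XML); layout is hypothetical."""
    elem = ET.Element("Experiment", name=expt.name, numDim=str(expt.numDim))
    return ET.tostring(elem, encoding="unicode")


def from_xml(text):
    elem = ET.fromstring(text)
    return Experiment(name=elem.get("name"), numDim=int(elem.get("numDim")))


def to_tsv(expt):
    """I/O module for a second storage format (tab-separated text)."""
    return "%s\t%d" % (expt.name, expt.numDim)


def from_tsv(line):
    name, num_dim = line.split("\t")
    return Experiment(name=name, numDim=int(num_dim))


# The same object survives a round trip through either format.
original = Experiment(name="my experiment", numDim=2)
assert from_xml(to_xml(original)) == original
assert from_tsv(to_tsv(original)) == original
```

Application code only ever touches Experiment; adding a storage format means adding one more pair of functions, not changing the model.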
Application View
[Diagram: layers from top to bottom]
• User
• GUI
• Application1, Application2, Application3
• API
• In-memory representation (Python, Java, C++, C)
• I/O
• Data store (XML, SQL)
Model Driven Architecture
• UML: Unified Modelling Language
– Abstract representation of semantics
– Pictorial
• Mapping from UML: to anything
– Multi-language
– Multi-format
– Architecture neutral (e.g. distributed or not)
• Power: good and bad
• CCPN uses Object Domain as its UML tool
– Python as scripting language
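The idea of mapping one abstract description to several languages can be sketched in a few lines. Everything here is an assumption made for illustration: the MODEL dictionary stands in for a UML class, and python_stub and java_stub are hypothetical generators, not CCPN's scripts.

```python
# Hypothetical, language-neutral description of one class.
MODEL = {
    "name": "Experiment",
    "attributes": [("name", "String"), ("numDim", "Int")],
}

# Assumed mapping from abstract types to Java types.
JAVA_TYPES = {"String": "String", "Int": "int"}


def python_stub(model):
    """Emit Python source for a class with one attribute per model entry."""
    names = [attr for attr, _ in model["attributes"]]
    lines = ["class %s(object):" % model["name"]]
    lines.append("    def __init__(self, %s):" % ", ".join(names))
    for attr in names:
        lines.append("        self.%s = %s" % (attr, attr))
    return "\n".join(lines)


def java_stub(model):
    """Emit Java source for the same class from the same description."""
    lines = ["public class %s {" % model["name"]]
    for attr, abstract_type in model["attributes"]:
        lines.append("    private %s %s;" % (JAVA_TYPES[abstract_type], attr))
    lines.append("}")
    return "\n".join(lines)


python_source = python_stub(MODEL)
java_source = java_stub(MODEL)

# The generated Python source is ordinary code and can be executed.
namespace = {}
exec(python_source, namespace)
expt = namespace["Experiment"]("my experiment", 2)
print(expt.numDim)
```

Both outputs are derived from the same description, so a change to the model reaches every target language at once.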
UML Example
CCPN UML
• CCPN only uses part of UML (logical model)
• Roughly equivalent to MOF subset of UML
• Influenced by XML
– Parent classes (but not all links are parent-child)
• Influenced by SQL
– Keys
• Documentation
– Auto generated HTML (from text in UML)
• Modular (packages)
• Main focus is NMR but entire architecture is independent of NMR
Methodology Summary
• MetaModel: classes for defining semantics
– E.g. MetaClass, MetaAttribute, MetaRole
• Model: instantiation of MetaModel classes
– E.g. (Meta)Experiment
• API: classes which define semantics
– E.g. Experiment
• Developer: instantiation of API classes
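The four levels above can be shown side by side in a toy example. This is a sketch under stated assumptions: the MetaClass and MetaAttribute classes here are simplified stand-ins with invented fields, and make_api_class is a hypothetical helper that builds the API class at run time, whereas CCPN generates source code.

```python
from dataclasses import dataclass, field


# Level 1, MetaModel: classes for defining semantics.
@dataclass
class MetaAttribute:
    name: str
    valueType: type


@dataclass
class MetaClass:
    name: str
    attributes: list = field(default_factory=list)


# Level 2, Model: an instantiation of MetaModel classes.
metaExperiment = MetaClass(
    "Experiment",
    [MetaAttribute("name", str), MetaAttribute("numDim", int)],
)


# Level 3, API: a class derived from the model (built at run time here).
def make_api_class(metaClass):
    allowed = {a.name: a.valueType for a in metaClass.attributes}

    def __init__(self, **values):
        for key, value in values.items():
            if key not in allowed:
                raise AttributeError("unknown attribute: %s" % key)
            if not isinstance(value, allowed[key]):
                raise TypeError("%s must be %s" % (key, allowed[key].__name__))
            setattr(self, key, value)

    return type(metaClass.name, (object,), {"__init__": __init__})


Experiment = make_api_class(metaExperiment)

# Level 4, Developer: an instantiation of an API class.
expt = Experiment(name="my experiment", numDim=2)
```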
UML to API
• Stage 1: UML to CCPN Model
– Currently script dependent on UML program
– Eventually move to XMI
• Stage 2: CCPN Model to CCPN API
– Script independent of UML program
• Most developers work independently of UML
CCPN MetaModel
• Method of describing model semantics
• Implemented as Python classes
– Could do same in Java, …
– Independent of end language
– Independent of actual model
• Hand coded
• Currently around 12 classes, 3000 lines
– MetaPackage, MetaClass, MetaAttribute, …
CCPN Model
• Instantiation of MetaModel
– Creation of MetaClass objects, …
• Might disappear with introduction of XMI
• Auto generated from UML
• Currently over 300 classes (2300 metaobjects)
• NMR main focus so far
– Being worked on: protein production
– In future: X-ray, …
– Shared packages: Molecule, Coordinates, etc.
CCPN API
• Auto generated from CCPN Model
• Classes for developers
– Mainly getters and setters
– More than just code stubs
– Constraints (e.g. cardinality) enforced
– Links the hard part
• Mostly (> 99%) auto generated from UML
– Some helper functions and constraints hand coded
• Currently around 270,000 lines in Python and 600,000 lines in Java
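What "more than just code stubs" might mean in practice is sketched below: setters that enforce constraints and a parent-child link that is kept consistent on both sides. The classes are hand-written stand-ins, not generated CCPN code; the getter and setter names and the rule that numDim must be a positive integer are assumptions made for the example.

```python
class Project(object):
    def __init__(self, name):
        self._name = name
        self._experiments = []

    def getName(self):
        return self._name

    def getExperiments(self):
        # Return a copy so callers cannot bypass the link bookkeeping.
        return tuple(self._experiments)


class Experiment(object):
    def __init__(self, project, name, numDim):
        self._project = project
        self.setName(name)
        self.setNumDim(numDim)
        # Link bookkeeping: the parent learns about the child only after
        # every constraint has passed.
        project._experiments.append(self)

    def getProject(self):
        return self._project

    def getName(self):
        return self._name

    def setName(self, value):
        # Cardinality 1..1: the attribute must always hold a value.
        if value is None:
            raise ValueError("name is mandatory")
        self._name = value

    def getNumDim(self):
        return self._numDim

    def setNumDim(self, value):
        # Assumed constraint for illustration.
        if not isinstance(value, int) or value < 1:
            raise ValueError("numDim must be a positive integer")
        self._numDim = value

    def delete(self):
        # Both ends of the link are updated together.
        self._project._experiments.remove(self)
        self._project = None
```

The link code is where most of the care goes: a failed constructor must not leave a half-registered child, and deletion has to clean up both directions.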
Python API Use
# Create a project and two experiments that belong to it
p = Project(name='my project')
e = Experiment(p, name='my experiment', numDim=2)
Experiment(p, name='another experiment', numDim=3)
print(e.project.name)
print(p.experiments[1].name)
print(len(p.experiments))
# Look up a single experiment by attribute value
expt = p.findFirstExperiment(name='another experiment')
if expt:
    print(expt.name, expt.numDim)
# Look up every experiment that matches
expts = p.findAllExperiments(numDim=3)
for expt in expts:
    print(expt.name)
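The keyword-style queries in the example can be imitated with two small helpers. This is only a sketch of the idea: find_all and find_first are hypothetical free functions, and the objects are plain SimpleNamespace instances rather than CCPN API objects.

```python
from types import SimpleNamespace

_MISSING = object()  # marks an attribute the object does not have


def find_all(objects, **conditions):
    """Return every object whose attributes equal all the keyword values."""
    return [
        obj for obj in objects
        if all(getattr(obj, key, _MISSING) == value
               for key, value in conditions.items())
    ]


def find_first(objects, **conditions):
    """Return the first match, or None if nothing matches."""
    matches = find_all(objects, **conditions)
    return matches[0] if matches else None


experiments = [
    SimpleNamespace(name="my experiment", numDim=2),
    SimpleNamespace(name="another experiment", numDim=3),
]

expt = find_first(experiments, name="another experiment")
if expt:
    print(expt.name, expt.numDim)
```

Returning None for "no match" is what makes the `if expt:` test in the slide's example necessary.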
Developer Benefits
• Specified (in-memory) data model
• No I/O code
• Concentrate on science, not bookkeeping
• Extendible
– Application data can be assigned to any object
– UML model can be extended (packages)
• Notifiers
– Register interest when specified attribute changes (class, not object, level)
• Undo/Redo (in future)
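Class-level notifiers can be sketched as a registry keyed by class and attribute name. The names below (Notifying, registerNotify, _set) are hypothetical and chosen for the example; they are not the CCPN API's actual signatures.

```python
from collections import defaultdict


class Notifying(object):
    """Base class: setters call callbacks registered per (class, attribute)."""
    _notifiers = defaultdict(list)

    @classmethod
    def registerNotify(cls, attribute, callback):
        # Interest is registered on the class, so it covers every instance.
        Notifying._notifiers[(cls, attribute)].append(callback)

    def _set(self, attribute, value):
        setattr(self, "_" + attribute, value)
        for callback in Notifying._notifiers.get((type(self), attribute), []):
            callback(self, value)


class Experiment(Notifying):
    def __init__(self, name):
        self._name = name

    def setName(self, value):
        self._set("name", value)


changes = []
Experiment.registerNotify("name", lambda obj, value: changes.append(value))

first = Experiment("a")
second = Experiment("b")
first.setName("renamed a")
second.setName("renamed b")
print(changes)  # ['renamed a', 'renamed b']
```

One registration reaches both objects, which is the point of working at class rather than object level: a GUI can register once and stay in step with every object of that class.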
Current CCPN State
• API releases
– Beta release of Python API in May 2003
– Alpha release of Java API in December 2003
– C/C++ API next (probably 2005)
• Storage formats
– XML with Python and Java
– SQL with Java in April 2004, Python later
• Applications
– Conversion scripts for legacy NMR data
– Graphical NMR assignment program
• Website: http://www.ccpn.ac.uk
CCPN Acknowledgements
• Institut Pasteur (M. Nilges, J. Linge, M. Habeck, W. Rieping)
• Utrecht (R. Kaptein, A. Bonvin, A. Nederveen)
• Nijmegen (G. Vriend, C. Spronk, S. Nabuurs)
• RCSB (H. Berman, J. Westbrook)
• BMRB (J. Markley, E. Ulrich, J. Doreleijers)
• CCPN Executive Committee (E. Laue, P. Driscoll, C.-W. Chung, C. Redfield, B. Smith, M. Williamson, D. Harding)
• Funding: BBSRC and EU (NMRQUAL and TEMBLOR)
Salutary Quotation
Victoria Livschitz (IT architect, Sun):
“We now have a generation of young programmers who think
of software in terms of angle brackets. An enormous mess
of XML documents that are now being created by
enterprises at an alarming rate will be haunting our
industry for decades. With all that excitement, no one
seems to have the slightest interest in basic computer
science.”
(online interview, February 2004)