Taverna in e-Lico

• e-Lico is an EU project (2009-2012) to create a virtual laboratory for data mining and data-intensive sciences
• Main partners:
– University of Geneva – project coordination
– University of Manchester – Taverna and text mining
– Institut National de la Santé et de la Recherche Médicale (Toulouse) – biological data
– Greece – image mining
– Rapid-I – RapidMiner/RapidAnalytics
– University of Zurich – planner
Who in Manchester
• Robert Stevens
• Simon Jupp
• Rishi Ramgolam
• James Eales
• Alan Williams
What has been done so far?
• Populous – an adaptation of RightField, used for data entry and ontology creation
• Taverna plugin to expose RapidMiner operators as Taverna services
• Taverna extension to create data mining plans that are translated into workflows
• RapidAnalytics repository browser for choosing and uploading data, and so obtaining its metadata
• Skinning of myExperiment – done by Don Cruickshank
What will be done – 1?
• Saving the provenance, data and workflow together – call it a run
• Uploading the run to e-Lico’s myExperiment
• Ensuring semantics so that the data is known to be the input to port “fred” of the workflow. Even better, provenance item X is known to be for invocation 237 of service Y in run 93 of workflow…
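The idea of a run, and of tying each provenance item to one invocation, can be sketched as a small data structure. This is only an illustrative sketch, not the e-Lico or Taverna data model: the class names `WorkflowRun` and `ProvenanceItem`, their fields, the workflow name "example-workflow" and the path "data/input.csv" are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ProvenanceItem:
    """One provenance record, tied to a specific service invocation (hypothetical model)."""
    item_id: str      # e.g. "X"
    service: str      # e.g. "Y"
    invocation: int   # e.g. 237


@dataclass
class WorkflowRun:
    """A 'run': the workflow plus the data and provenance of one execution (hypothetical model)."""
    workflow: str
    run_number: int
    inputs: dict = field(default_factory=dict)      # port name -> data reference
    provenance: list = field(default_factory=list)  # ProvenanceItem records

    def describe(self, item: ProvenanceItem) -> str:
        """Spell out which invocation, service, run and workflow an item belongs to."""
        return (f"provenance item {item.item_id} is for invocation "
                f"{item.invocation} of service {item.service} in run "
                f"{self.run_number} of workflow {self.workflow}")


# Placeholder values, chosen only to mirror the slide's example.
run = WorkflowRun(workflow="example-workflow", run_number=93)
run.inputs["fred"] = "data/input.csv"  # the data is known to be the input to port "fred"
item = ProvenanceItem(item_id="X", service="Y", invocation=237)
run.provenance.append(item)
print(run.describe(item))
```

Keeping the port binding and the provenance records inside one object is what would let a run be uploaded and later re-opened as a single unit.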
What will be done – 2?
• Re-opening of a run from myExperiment
• Integration of provenance from RapidMiner’s operators with the workflow provenance
• Tweaking of the RapidMiner and planning extensions
• Additional/improved renderers – a Cytoscape renderer is wanted
• SPARQL service plugin
• Development and documentation of workflow patterns
What will be done – 3?
• Coping with “dominating operators”
• Training workshops
• Benchmarking and documentation of results
What will go back into Taverna
• Plugin to expose RapidMiner services
• Plugin to allow planning of data mining
• SPARQL service
• Workflow run upload/browse/reload
• Repository browser, possibly as an additional data source
• New renderers
What is e-Lico doing that is affected by other projects?
• The saving of a workflow run depends upon
– myExperiment having semantic pack capabilities
– Some stronger mechanism to specify relationships (wf4ever?)
• Browsing/viewing of a workflow run may leverage
– Polish MSc student’s work
• Integration of provenance
– Overlap with eScience Central
What is e-Lico doing that affects other projects?
• The e-Lico workflow run may be a good test case for wf4ever
• Renderers
– Specific projects not yet known – should start a list
• Data gathering mechanism
– Ditto
• Populous
– Highly related to RightField
– What about TavernaLC (Taverna in a spreadsheet)?
• SPARQL service
– Harmonization/relationship with SADI
How could e-Lico be of use?
• Data mining services
– Use in data-oriented projects, e.g. MethodBox or SysMO?
• Metadata about data
– MethodBox or SysMO?
• Planning of workflows
– Does it work in practice?