Delivering data mining to the Life Science Community


e-LICO
An e-Laboratory for Interdisciplinary Collaborative
research in data mining and data intensive sciences
Delivering data mining to the Life Science Community
Simon Jupp
School of Computer Science
University of Manchester, United Kingdom
October 12th, 2010
e-LICO project overview
 Infrastructure to support collaborative, data mining
enabled experimental research
 Knowledge-driven planning of DM workflows
– Improve planning by meta-mining
 Support research in data-intensive, knowledge-rich
domains
– Systems biology use case
European Project
 European Project, 9 partners. (Month 20/36)
– Specialists from Data Mining, Semantic Web, Grid
computing and Systems Biology
• University of Manchester, UK
• University of Geneva, Switzerland
• Inserm, France
• Josef Stefan Institute, Slovenia
• NHRF, Greece
• Poznan University, Poland
• Rapid-I GmbH, Germany
• Ruder Boskovic Institute, Croatia
• University of Zurich, Switzerland
An EU-FP7 Collaborative Project (2009-2012)
Theme ICT-4.4: Intelligent Content and Semantics
Problems…

Steep learning curve
– Many operators to choose from
– Best combination of operators
– Hard for non Data Miners

Capturing the workflow
– Explanation
– Error detection / Repair
– Reproducibility
– Provenance
Problems… and solutions (e-LICO planned workflows)

Steep learning curve
– Many operators to choose from
– Best combination of operators
– Hard for non Data Miners
Solution: develop an “Intelligent Discovery Assistant” (IDA) for Data Analysis
– Automatically generate workflows by planning
– Assist the user in solving DM task
– Structure workflows in workflow templates
– Self improvement through Meta-Mining

Capturing the workflow
– Explanation
– Error Detection / Repair
– Reproducibility
– Provenance
Solution: ontology based data model
– Adds semantics
– OWL/RDF based
– Data Mining Experiment Repository
The e-LICO workflow
[Figure: diagram of the e-LICO workflow with four numbered components: ontology based AI planner, workflow execution engine, meta-mining, publish and share. Input: data. Output: data, provenance and models.]
Ontology based AI planner
Workflow planning
 Hierarchical Task Network (HTN) planning
 Set of Tasks to achieve possible Data Mining Goals
 Tasks have an I/O specification and set of associated Methods to
achieve that task
 Methods composed of simpler Task/Methods
 Some methods are Operators with Conditions and Effects
Example: My task is ‘Data Mining With Evaluation’, my Goal is to get a
workflow that does this Evaluation via Cross-Validation
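The decomposition described on this slide can be sketched in a few lines of Python. This is only a minimal illustration of HTN planning, not the e-LICO planner: the operator names, their conditions and effects, and the method table are all hypothetical and chosen to mirror the 'Data Mining With Evaluation' example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operator:
    """A primitive step with conditions (required facts) and effects (added facts)."""
    name: str
    conditions: frozenset
    effects: frozenset


# Hypothetical operators, for illustration only.
OPERATORS = {
    "ReplaceMissingValues": Operator(
        "ReplaceMissingValues",
        frozenset({"data"}),
        frozenset({"no_missing_values"}),
    ),
    "CrossValidation": Operator(
        "CrossValidation",
        frozenset({"data", "no_missing_values"}),
        frozenset({"model", "report"}),
    ),
}

# Hypothetical methods: each task maps to alternative lists of simpler tasks.
METHODS = {
    "DataMiningWithEvaluation": [["CleanMV", "Evaluate"]],
    "CleanMV": [["ReplaceMissingValues"]],
    "Evaluate": [["CrossValidation"]],
}


def plan(tasks, state):
    """Return a list of operator names that achieves `tasks`, or None."""
    if not tasks:
        return []
    head, rest = tasks[0], tasks[1:]
    if head in OPERATORS:
        op = OPERATORS[head]
        if not op.conditions <= state:
            return None  # conditions not met in the current state
        tail = plan(rest, state | op.effects)
        return None if tail is None else [head] + tail
    # Compound task: try each method (decomposition) in turn.
    for subtasks in METHODS.get(head, []):
        result = plan(subtasks + rest, state)
        if result is not None:
            return result
    return None


workflow = plan(["DataMiningWithEvaluation"], frozenset({"data"}))
print(workflow)
```

Starting from a state that only contains `data`, the sketch expands the top-level task into the two operators in order; asking for `Evaluate` alone fails because the cross-validation condition `no_missing_values` is not yet satisfied.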
The Data Mining Workflow Ontology (DMWF)
Class     | Description                         | Examples
IO Object | Input and output used by operators  | Data, Model, Report
MetaData  | Characteristics of the IOObjects    | Attribute, AttributeType, DataColumn, DataFormat
Operator  | DM operators                        | DataTableProcessing, ModelProcessing, Modeling, MethodEvaluation
Goal      | A DM goal that the user could solve | DescriptiveModelling, PatternDiscovery, PredictiveModelling, RetrievalByContent
Task      | A task is used to achieve a goal    | CleanMV, CategorialToScalar, DiscretizeAll, PredictTarget
Methods   | A method is used to solve a task    | CategorialToScalarRecursive, CleanMVRecursive, DiscretizeAllRecursive, DoPrediction
Workflow Planning
 AI Planner
 Brute force planning
 Probabilistic Planning
 What will likely produce better results?
 Case-based Planning
– How did we solve that previously?
 DMOP (Workflow optimization ontology)
– Algorithm and Model selection given a particular task
– Meta-mining by abstraction and generalisation
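The case-based question ("How did we solve that previously?") can be illustrated with a nearest-neighbour lookup over dataset meta-data. This is a sketch under stated assumptions, not the e-LICO implementation: the meta-features (`missing_ratio`, `nominal_ratio`), the stored cases, and the operator names are all hypothetical.

```python
# Hypothetical case base: dataset meta-data plus the workflow that was used.
PAST_CASES = [
    {
        "meta": {"missing_ratio": 0.0, "nominal_ratio": 0.1},
        "workflow": ["Normalize", "SupportVectorMachine"],
    },
    {
        "meta": {"missing_ratio": 0.3, "nominal_ratio": 0.8},
        "workflow": ["ReplaceMissingValues", "DecisionTree"],
    },
]


def squared_distance(a, b):
    """Squared Euclidean distance over the meta-features present in `a`."""
    return sum((a[key] - b[key]) ** 2 for key in a)


def retrieve(new_meta, cases=PAST_CASES):
    """Return the stored case whose meta-data is closest to `new_meta`."""
    return min(cases, key=lambda case: squared_distance(new_meta, case["meta"]))


# A new dataset with many missing values and mostly nominal attributes.
closest = retrieve({"missing_ratio": 0.25, "nominal_ratio": 0.7})
print(closest["workflow"])
```

The retrieved workflow would only be a starting point; a planner would still need to check that each operator's conditions hold for the new data.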
Meta-Mining

Initially, the AI planner recommends applicable DM workflows, not
necessarily good ones

Self-improves with experience through meta-mining

The meta-miner
– Applies DM techniques to meta-data from past DM experiments
– Extracts workflow patterns that are signatures of high predictive
performance

The planner uses these workflow patterns to design and recommend
promising workflows
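The loop described above (mine past experiments, extract patterns, use them to rank new workflows) can be sketched as follows. The sketch swaps in a deliberately simple technique, the mean accuracy per pair of consecutive operators, as a stand-in for the meta-miner; the experiment records and operator names are invented for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical meta-data from past experiments: (workflow, accuracy).
EXPERIMENTS = [
    (["ReplaceMissingValues", "Normalize", "SVM"], 0.90),
    (["ReplaceMissingValues", "DecisionTree"], 0.80),
    (["Normalize", "SVM"], 0.86),
    (["Discretize", "NaiveBayes"], 0.70),
]


def bigrams(workflow):
    """Pairs of consecutive operators, used here as workflow patterns."""
    return list(zip(workflow, workflow[1:]))


def mine_patterns(experiments):
    """Map each pattern to the mean accuracy of the workflows containing it."""
    scores = defaultdict(list)
    for workflow, accuracy in experiments:
        for pattern in set(bigrams(workflow)):
            scores[pattern].append(accuracy)
    return {pattern: mean(values) for pattern, values in scores.items()}


def rank(candidates, patterns, default=0.0):
    """Order candidate workflows by the mean score of their known patterns."""
    def score(workflow):
        known = [patterns[p] for p in bigrams(workflow) if p in patterns]
        return mean(known) if known else default
    return sorted(candidates, key=score, reverse=True)


patterns = mine_patterns(EXPERIMENTS)
ranked = rank([["Discretize", "NaiveBayes"], ["Normalize", "SVM"]], patterns)
print(ranked[0])
```

With more experiments in the repository the pattern scores change, which is the sense in which the recommendations improve with experience.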
e-LICO Kick-Off, Geneva
4/12/2016
Workflow Execution

All operators in the ontology (200+) are exposed as SOAP or REST based Web
Services

Plans converted to Workflow execution language (SCUFL 2)

Provenance capture
– Execution times, intermediate model returned to planner
Taverna
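Provenance capture of execution times can be illustrated with a small runner that records one entry per step. This is only a sketch: local Python callables stand in for the Web Service operators, nothing here uses Taverna or SCUFL 2, and the step names are hypothetical.

```python
import time


def run_with_provenance(steps, data):
    """Run (name, callable) steps in order and record a provenance entry for each."""
    provenance = []
    for name, operator in steps:
        start = time.perf_counter()
        data = operator(data)
        provenance.append(
            {
                "operator": name,
                "seconds": time.perf_counter() - start,
                "output_type": type(data).__name__,
            }
        )
    return data, provenance


# Hypothetical two-step plan: drop rows with missing values, then count rows.
steps = [
    ("DropMissing", lambda rows: [row for row in rows if None not in row]),
    ("CountRows", len),
]

result, provenance = run_with_provenance(steps, [(1, 2), (None, 3), (4, 5)])
for entry in provenance:
    print(entry["operator"], entry["output_type"], entry["seconds"])
```

In the setting described on the slide, records like these (together with intermediate models) would be returned to the planner rather than printed.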
Workflow Publishing and Sharing

Workflows and data can be shared via myExperiment

Build a community of data miners

Set of re-usable workflows, data and workflow templates (packs)
Use case – Obstructive nephropathy
 Demonstrated with a Systems Biology use case
– Biomarker discovery and pathway modelling in the study of
chronic kidney disease
– KUP challenge initiated (August 2010)
[Figure: KUP use case diagram. Components: expression data; text-mining / image mining; KUP KB (RDF store); new models and hypotheses; further wet lab experiments.]
Research Questions

How and when does a planner based “Intelligent Discovery Assistant” help
the end user?

Can we improve planning and suggest better workflows through meta-mining?

Can we plan complex workflows with Scientific Goals that answer biological
questions?
– KUP goal is to construct diagnostic models that accurately connect the biological
views to the severity of this pathology
Where are we now
Availability
 http://www.e-lico.eu
 1st year demo –
http://www.youtube.com/watch?v=JtmqZfzyEKs
 eProPlan plugin for Protégé 4.0
 Ontologies available
 Taverna
http://www.taverna.org.uk
 RapidMiner
http://rapid-i.com
Summary

e-LICO: virtual laboratory for interdisciplinary collaborative research in
data-mining

Ontology based AI planning of KDD workflows

Generic E-Science platform for DM

Application layer for Systems Biology
Acknowledgments
Robert Stevens (Manchester)
Alan Williams (Manchester)
Rishi Ramgolam (Manchester)
Jorg-Uwe Kietz (Zurich)
Melanie Hilario (Geneva)
E-LICO consortium