Delivering data mining to the Life Science Community
e-LICO
An e-Laboratory for Interdisciplinary Collaborative
Research in Data Mining and Data-Intensive Sciences
Delivering data mining to the Life Science Community
Simon Jupp
School of Computer Science
University of Manchester, United Kingdom
October 12th, 2010
e-LICO project overview
Infrastructure to support collaborative, data-mining-enabled experimental research
Knowledge-driven planning of DM workflows
– Improve planning by meta-mining
Support research in data-intensive, knowledge-rich
domains
– Systems biology use case
European Project
European project, 9 partners (month 20 of 36)
– Specialists from Data Mining, Semantic Web, Grid
computing and Systems Biology
• University of Manchester, UK
• University of Geneva, Switzerland
• Inserm, France
• Josef Stefan Institute, Slovenia
• NHRF, Greece
• Poznan University, Poland
• Rapid-I GmbH, Germany
• Ruder Boskovic Institute, Croatia
• University of Zurich, Switzerland
An EU-FP7 Collaborative Project (2009-2012)
Theme ICT-4.4: Intelligent Content and Semantics
Problems…
Steep learning curve
– Many operators to choose from
– Best combination of operators
– Hard for non-data-miners
Capturing the workflow
– Explanation
– Error detection / Repair
– Reproducibility
– Provenance
Problems… and solutions (e-LICO planned workflows)
Steep learning curve
– Many operators to choose from
– Best combination of operators
– Hard for non-data-miners
Solution: develop an “Intelligent Discovery Assistant” (IDA) for data analysis
– Automatically generate workflows by planning
– Assist the user in solving DM tasks
– Structure workflows in workflow templates
– Self-improvement through meta-mining
Capturing the workflow
– Explanation
– Error detection / repair
– Reproducibility
– Provenance
Solution: ontology-based data model
– Adds semantics
– OWL/RDF based
– Data Mining Experiment Repository
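To illustrate what an OWL/RDF-based record of a workflow run might look like, here is a minimal sketch that writes one run as N-Triples using only the Python standard library. The namespace http://example.org/dm-experiment# and the property names (executesWorkflow, usedDataset, hasStep, appliesOperator) are hypothetical placeholders, not the e-LICO vocabulary; only the operator names come from the DMWF examples.

```python
# Minimal sketch: recording one workflow run as RDF triples in N-Triples syntax.
# Assumption: the namespace and property names are hypothetical placeholders,
# not the actual e-LICO ontology vocabulary.

EX = "http://example.org/dm-experiment#"  # hypothetical namespace


def iri(local: str) -> str:
    """Wrap a local name from the hypothetical namespace as an N-Triples IRI."""
    return f"<{EX}{local}>"


def literal(value: str) -> str:
    """Quote a plain string literal, escaping backslashes and double quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def run_to_ntriples(run_id: str, workflow: str, dataset: str, steps: list[str]) -> str:
    """Describe which workflow ran on which dataset, plus one node per operator step."""
    triples = [
        (iri(run_id), iri("executesWorkflow"), iri(workflow)),
        (iri(run_id), iri("usedDataset"), iri(dataset)),
    ]
    for position, operator in enumerate(steps, start=1):
        step = f"{run_id}_step{position}"
        triples.append((iri(run_id), iri("hasStep"), iri(step)))
        triples.append((iri(step), iri("appliesOperator"), literal(operator)))
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


if __name__ == "__main__":
    # Hypothetical run identifiers; operator names taken from the DMWF examples.
    print(run_to_ntriples("run42", "CrossValidationWorkflow", "kidneyExpression",
                          ["CleanMV", "DiscretizeAll", "DoPrediction"]))
```

The escaping here only covers quotes and backslashes, which is enough for simple operator names; a real repository would use a proper RDF library and a published ontology.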
The e-LICO workflow
[Figure: the e-LICO workflow cycle, with numbered stages (1–4) linking Input Data, the Ontology based AI planner, the Workflow execution engine, Publish and share, and Meta-mining, producing Output: data, provenance and models]
Ontology based AI planner
Workflow planning
Hierarchical Task Network (HTN) planning
Set of Tasks to achieve possible Data Mining Goals
Tasks have an I/O specification and a set of associated Methods that achieve the task
Methods are composed of simpler Tasks/Methods
Some Methods are Operators with Conditions and Effects
Example: my task is ‘Data Mining With Evaluation’, and my goal is a workflow that performs this evaluation via cross-validation
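The task/method/operator decomposition can be sketched as a tiny total-order HTN planner in Python. The domain below (facts such as "no_missing_values", operators such as TrainDecisionTree, and the CleanData and EvaluateByCrossValidation tasks) is hypothetical and far simpler than the DMWF; it only illustrates how a task like ‘Data Mining With Evaluation’ is refined through methods until only operators with conditions and effects remain.

```python
# Minimal total-order HTN planning sketch over a hypothetical toy domain.
# Compound tasks are refined by methods; primitive tasks are operators whose
# preconditions must hold in the current state and whose effects extend it.
from typing import Optional

# Operators: name -> (preconditions, effects added to the state)
OPERATORS: dict[str, tuple[set[str], set[str]]] = {
    "ReplaceMissingValues": ({"data"}, {"no_missing_values"}),
    "SplitFolds": ({"data", "no_missing_values"}, {"folds"}),
    "TrainDecisionTree": ({"folds"}, {"model"}),
    "ApplyModel": ({"model", "folds"}, {"predictions"}),
    "ComputePerformance": ({"predictions"}, {"performance_report"}),
}

# Methods: compound task -> alternative decompositions (precondition, subtasks), tried in order
METHODS: dict[str, list[tuple[set[str], list[str]]]] = {
    "DataMiningWithEvaluation": [({"data"}, ["CleanData", "EvaluateByCrossValidation"])],
    "CleanData": [
        ({"no_missing_values"}, []),             # already clean: nothing to do
        ({"data"}, ["ReplaceMissingValues"]),    # otherwise impute missing values
    ],
    "EvaluateByCrossValidation": [
        ({"no_missing_values"},
         ["SplitFolds", "TrainDecisionTree", "ApplyModel", "ComputePerformance"]),
    ],
}


def plan(tasks: list[str], state: frozenset) -> Optional[list[str]]:
    """Return operator names that accomplish `tasks` from `state`, or None if impossible."""
    if not tasks:
        return []
    first, rest = tasks[0], tasks[1:]
    if first in OPERATORS:
        preconditions, effects = OPERATORS[first]
        if not preconditions <= state:
            return None
        tail = plan(rest, state | effects)  # apply the effects, then plan the rest
        return None if tail is None else [first] + tail
    for preconditions, subtasks in METHODS.get(first, []):
        if preconditions <= state:
            result = plan(subtasks + rest, state)  # replace the task by its subtasks
            if result is not None:
                return result
    return None  # no applicable method (or unknown task)


if __name__ == "__main__":
    print(plan(["DataMiningWithEvaluation"], frozenset({"data"})))
```

Starting from raw data, the planner picks the imputation method for CleanData and then expands the cross-validation task; if the data were already free of missing values, the first CleanData method would contribute no operators at all.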
The Data Mining Workflow Ontology (DMWF)
Class | Description | Examples
IOObject | Input and output used by operators | Data, Model, Report
MetaData | Characteristics of the IOObjects | Attribute, AttributeType, DataColumn, DataFormat
Operator | DM operators | DataTableProcessing, ModelProcessing, Modeling, MethodEvaluation
Goal | A DM goal that the user could solve | DescriptiveModelling, PatternDiscovery, PredictiveModelling, RetrievalByContent
Task | A task is used to achieve a goal | CleanMV, CategorialToScalar, DiscretizeAll, PredictTarget
Method | A method is used to solve a task | CategorialToScalarRecursive, CleanMVRecursive, DiscretizeAllRecursive, DoPrediction
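One way to see how MetaData about an IOObject drives task selection is the small sketch below. The attribute fields and the rule that the chosen learner needs scalar attributes without missing values are hypothetical illustrations; only the task names CleanMV, CategorialToScalar and PredictTarget come from the table.

```python
# Sketch: MetaData describing an IOObject decides which DMWF-style tasks apply.
# Assumption: field names and the "learner needs scalar, complete data" rule are hypothetical.
from dataclasses import dataclass, field


@dataclass
class Attribute:
    """MetaData for one column of a data table."""
    name: str
    attribute_type: str  # "scalar" or "categorial" (hypothetical values)
    missing_values: int = 0


@dataclass
class DataTable:
    """An IOObject passed between operators, described by its MetaData."""
    attributes: list[Attribute] = field(default_factory=list)
    target: str = ""


def required_tasks(table: DataTable) -> list[str]:
    """Return the preprocessing tasks needed before predicting the target, plus PredictTarget."""
    features = [a for a in table.attributes if a.name != table.target]
    tasks = []
    if any(a.missing_values > 0 for a in features):
        tasks.append("CleanMV")             # missing values must be handled first
    if any(a.attribute_type == "categorial" for a in features):
        tasks.append("CategorialToScalar")  # the assumed learner needs scalar inputs
    tasks.append("PredictTarget")
    return tasks


if __name__ == "__main__":
    table = DataTable([Attribute("age", "scalar", missing_values=3),
                       Attribute("sex", "categorial"),
                       Attribute("severity", "categorial")],
                      target="severity")
    print(required_tasks(table))
```

In an ontology-based planner these checks would be conditions expressed over the MetaData classes rather than hard-coded Python, but the dependency is the same: the characteristics of the data determine which tasks are applicable.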
Workflow Planning
AI Planner
Brute force planning
Probabilistic Planning
What will likely produce better results?
Case-based Planning
– How did we solve that previously?
DMOP (Workflow optimization ontology)
– Algorithm and Model selection given a particular task
– Meta-mining by abstraction and generalisation
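Case-based planning (“How did we solve that previously?”) can be illustrated with a nearest-neighbour lookup: describe each past dataset by a few meta-features and reuse the workflow of the closest one. This is a sketch under assumptions: the three meta-features, their [0, 1] scaling, the case base and the operator names are hypothetical, and Euclidean distance is only one reasonable similarity measure.

```python
# Case-based planning sketch: reuse the workflow from the most similar past case.
# Assumptions: meta-features, their [0, 1] scaling and the case base are hypothetical.
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Case:
    # (fraction of categorial attributes, fraction of missing values, class imbalance)
    meta_features: tuple[float, ...]
    workflow: tuple[str, ...]  # operator sequence that was used on that dataset


CASE_BASE = [
    Case((0.9, 0.0, 0.5), ("CategorialToScalar", "NaiveBayes")),
    Case((0.1, 0.3, 0.2), ("CleanMV", "DecisionTree")),
    Case((0.0, 0.0, 0.9), ("SVM",)),
]


def retrieve(case_base: list[Case], query: tuple[float, ...]) -> Case:
    """Return the case whose meta-features are nearest to `query` (Euclidean distance)."""
    return min(case_base, key=lambda case: math.dist(case.meta_features, query))


if __name__ == "__main__":
    print(retrieve(CASE_BASE, (0.2, 0.25, 0.3)).workflow)
```

For the query (0.2, 0.25, 0.3) the second case is closest, so its CleanMV, DecisionTree workflow would be proposed; adapting the retrieved workflow to the new data is left out of this sketch.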
Meta-Mining
Initially, the AI planner recommends applicable DM workflows, not
necessarily good ones
Self-improves with experience through meta-mining
The meta-miner
– Applies DM techniques to meta-data from past DM experiments
– Extracts workflow patterns that are signatures of high predictive
performance
The planner uses these workflow patterns to design and recommend
promising workflows
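The meta-mining loop can be sketched in a few lines: mine patterns from past experiment records, score each pattern by how often it appears in high-performing runs, then let the planner rank candidate workflows by those scores. Everything here is a simplifying assumption: patterns are just adjacent operator pairs, the records and the 0.85 threshold are toy values, and unseen patterns get a neutral score of 0.5.

```python
# Meta-mining sketch: find workflow patterns associated with high performance,
# then rank candidate workflows by those patterns.
# Assumptions: patterns = adjacent operator pairs; records and threshold are toy values.
from collections import defaultdict
from statistics import mean

Workflow = tuple[str, ...]
Pattern = tuple[str, str]

# Toy meta-data from past experiments: (workflow, observed accuracy)
EXPERIMENTS: list[tuple[Workflow, float]] = [
    (("CleanMV", "Normalize", "SVM"), 0.90),
    (("CleanMV", "Normalize", "KNN"), 0.86),
    (("CleanMV", "DecisionTree"), 0.70),
    (("Normalize", "SVM"), 0.88),
    (("CleanMV", "KNN"), 0.65),
]


def patterns(workflow: Workflow) -> set[Pattern]:
    """Adjacent operator pairs, used here as a deliberately simple pattern language."""
    return set(zip(workflow, workflow[1:]))


def mine_patterns(experiments: list[tuple[Workflow, float]], threshold: float) -> dict[Pattern, float]:
    """Map each pattern to the fraction of runs containing it that reached `threshold`."""
    hits: dict[Pattern, int] = defaultdict(int)
    totals: dict[Pattern, int] = defaultdict(int)
    for workflow, performance in experiments:
        for pattern in patterns(workflow):
            totals[pattern] += 1
            if performance >= threshold:
                hits[pattern] += 1
    return {pattern: hits[pattern] / totals[pattern] for pattern in totals}


def rank(candidates: list[Workflow], scores: dict[Pattern, float]) -> list[Workflow]:
    """Sort candidates by the mean score of their patterns; unseen patterns count as 0.5."""
    def score(workflow: Workflow) -> float:
        found = patterns(workflow)
        return mean(scores.get(p, 0.5) for p in found) if found else 0.5
    return sorted(candidates, key=score, reverse=True)


if __name__ == "__main__":
    scores = mine_patterns(EXPERIMENTS, threshold=0.85)
    print(rank([("CleanMV", "DecisionTree"), ("CleanMV", "Normalize", "SVM")], scores))
```

A real meta-miner would work on much richer, ontology-level abstractions of workflows and data, but the division of labour is the same: patterns are learned offline from past runs and consulted by the planner when recommending new workflows.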
Workflow Execution
e-LICO Kick-Off, Geneva
4/12/2016
Workflow Execution
All operators in the ontology (200+) are exposed as SOAP- or REST-based web services
Plans are converted to a workflow execution language (SCUFL2)
Provenance capture
– Execution times and intermediate models are returned to the planner
Taverna
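The execution-with-provenance idea can be sketched as follows: run each planned step in order and record its execution time and a summary of its intermediate output, which a planner could then consume. Plain Python functions stand in for the SOAP/REST operator services, and the operator names and ProvenanceRecord fields are hypothetical; the sketch does not model SCUFL2 or Taverna.

```python
# Sketch of plan execution with provenance capture.
# Assumption: local functions stand in for the SOAP/REST operator services.
import time
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class ProvenanceRecord:
    step: int
    operator: str
    seconds: float       # wall-clock execution time of the step
    output_summary: str  # truncated repr of the intermediate result


def replace_missing(values: list[Optional[float]]) -> list[float]:
    """Replace None entries with the mean of the present values."""
    present = [v for v in values if v is not None]
    fill = sum(present) / len(present)
    return [fill if v is None else v for v in values]


def min_max_normalize(values: list[float]) -> list[float]:
    """Rescale values to [0, 1]; a constant column maps to zeros."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0
    return [(v - lo) / span for v in values]


def train_mean_model(values: list[float]) -> dict[str, float]:
    """A trivial 'model' that predicts the mean."""
    return {"mean": sum(values) / len(values)}


# Hypothetical operator registry: name -> callable standing in for a web service
SERVICES: dict[str, Callable[[Any], Any]] = {
    "ReplaceMissingValues": replace_missing,
    "Normalize": min_max_normalize,
    "TrainMeanModel": train_mean_model,
}


def execute(plan: list[str], data: Any) -> tuple[Any, list[ProvenanceRecord]]:
    """Run each planned operator in order, recording provenance for every step."""
    provenance: list[ProvenanceRecord] = []
    current = data
    for step, name in enumerate(plan, start=1):
        start = time.perf_counter()
        current = SERVICES[name](current)
        elapsed = time.perf_counter() - start
        provenance.append(ProvenanceRecord(step, name, elapsed, repr(current)[:60]))
    return current, provenance


if __name__ == "__main__":
    model, records = execute(["ReplaceMissingValues", "Normalize", "TrainMeanModel"],
                             [2.0, None, 6.0])
    print(model)
    for record in records:
        print(record)
```

With remote services, the recorded times would mostly reflect network and service latency rather than the local computation shown here, which is one reason to capture them per step.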
Workflow Publishing and Sharing
Workflow Publishing and Sharing
Workflows and data can be shared via myExperiment
Build a community of data miners
Set of re-usable workflows, data and workflow templates (packs)
Use case – Obstructive nephropathy
Demonstrated with the Systems Biology use case
– Biomarker discovery and pathway modelling in the study of
chronic kidney disease
– KUP challenge initiated (August 2010)
[Figure: KUP use case, linking expression data, text mining / image mining, the KUP KB (RDF store), new models and hypotheses, and further wet-lab experiments]
Research Questions
How and when does a planner-based “Intelligent Discovery Assistant” help the end user?
Can we improve planning and suggest better workflows through meta-mining?
Can we plan complex workflows with scientific goals that answer biological questions?
– The KUP goal is to construct diagnostic models that accurately connect the biological views to the severity of obstructive nephropathy
Where are we now
Availability
http://www.e-lico.eu
1st year demo –
http://www.youtube.com/watch?v=JtmqZfzyEKs
eProPlan plugin for Protégé 4.0
Ontologies available
Taverna
http://www.taverna.org.uk
RapidMiner
http://rapid-i.com
Summary
e-LICO: virtual laboratory for interdisciplinary collaborative research in data mining
Ontology-based AI planning of KDD workflows
Generic e-Science platform for DM
Application layer for Systems Biology
Acknowledgments
Robert Stevens (Manchester)
Alan Williams (Manchester)
Rishi Ramgolam (Manchester)
Jorg-Uwe Kietz (Zurich)
Melanie Hilario (Geneva)
e-LICO consortium