The GridMiner project
GridMiner
A Framework for Knowledge Discovery
on the Grid – from a Vision to Design and
Implementation
Peter Brezany, Ivan Janciak, Alexander Wöhrer, A Min Tjoa
University of Vienna
Institute for Software Science
email: [email protected]
www.gridminer.org
… Intelligent Grid Solutions
GridMiner Overview
Start: Jan. 2003
Host:
University of Vienna
Vienna University of Technology
Target:
provide tools to discover and access relevant knowledge and information
from different distributed and heterogeneous data sources
Test application area: medical
traumatic brain injury treatment
Predicting the outcome of seriously ill patients
analytical part focuses on data mining and On-Line Analytical Processing
(OLAP)
CGW'04, 13. Dec. 04
Project members
Project leaders:
Prof. A Min Tjoa, Vienna University of Technology
Prof. Peter Brezany, University of Vienna
Visualization:
Radoslav Ivanov
Data streaming:
Nguyen Manh Tho
Data mediation:
Alexander Wöhrer
Knowledge Mgt:
Ivan Janciak
Job Control:
Günter Kickinger
Sequence Rules:
Michael Rinner
Clustering:
Markus Mayer
Decision rules:
Christian Kloner
Juergen Hofer
GUI:
Paul Panhofer
Autonomic aspects:
Michael Bergmann
OLAP:
Bernhard Fiser
Umut Onan
Ibrahim Elsayed
Outline
Motivation / Requirements
GridMiner Services
Architecture
Dynamic Service Control Engine
OLAP
Knowledge base
Data Integration
Graphical user interface
Implementation
Summary
The process to cover
Data distributed over
participating hospitals
Access from
different platforms
(handheld, PC, …) for
data generation,
querying, analysis
Process needs to
access various data
sources
GridMiner
Motivation
integrate knowledge discovery and knowledge
management as an autonomic system
manage and control the whole lifecycle of knowledge
give strong support to other intelligent entities in their
need for knowledge
Basic Requirements
Ability to access and analyze a huge amount of information –
typically heterogeneous and geographically distributed
Intelligent behavior: the ability to maintain, discover, extend, present
and communicate knowledge
High performance (real-time or soft real-time) query processing
Strong security guarantees
GridMiner Services
Dynamic Workflow Control Service
Data mining services
Sequences (SPADE)
Clustering (SimpleKMeans)
Decision rules (SPRINT)
OLAP (sequential/parallel version)
Association rules on OLAP
Grid Data Mediator Service
GridMiner Architecture
User environment
Web
Graphical User Interface
Knowledge Base
Service configuration
DSCE Client
Grid
Dynamic service control engine (DSCE)
Data Access and Integration
Data mining services
Dynamic Service Control Engine
Processes a workflow described in DSCL
Based on the Open Grid Services Architecture
Supports both interactive and batch processing
User-independent processing of the workflow
Provision of all intermediate results from the
involved services
Full user control during workflow execution
Supports the OGSA Notification Model
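The engine behaviour listed above (run a workflow, keep every intermediate result, notify interested clients) can be illustrated with a minimal sketch. It is not the DSCE implementation: the slides do not show DSCL syntax, so the workflow here is a plain Python list of steps, and the names WorkflowEngine, subscribe and run are hypothetical.

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical stand-in for a workflow: an ordered list of (name, activity) pairs.
# The real engine reads a DSCL document, whose syntax is not shown on the slides.
Activity = Callable[[Any], Any]
Listener = Callable[[str, Any], None]


class WorkflowEngine:
    """Runs activities in order, keeps each intermediate result, notifies listeners."""

    def __init__(self, steps: List[Tuple[str, Activity]]) -> None:
        self.steps = steps
        self.results: Dict[str, Any] = {}
        self.listeners: List[Listener] = []

    def subscribe(self, listener: Listener) -> None:
        self.listeners.append(listener)

    def run(self, initial: Any) -> Any:
        value = initial
        for name, activity in self.steps:
            value = activity(value)
            self.results[name] = value      # intermediate result stays available
            for listener in self.listeners:
                listener(name, value)       # push-style notification per step
        return value


# Illustrative data only.
events: List[str] = []
engine = WorkflowEngine([
    ("select", lambda rows: [r for r in rows if r["age"] >= 18]),
    ("count", len),
])
engine.subscribe(lambda name, _value: events.append(name))
final = engine.run([{"age": 12}, {"age": 40}, {"age": 67}])
```

Because results are stored per step, a client that connects later can still read engine.results["select"] without rerunning the workflow.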
Dynamic Service Control Engine
(cont.)
Knowledge Base
SWRL
Rules
OWL
Metadata
Domain Ontology
Data mining Ont.
Activity Ontology
Datasource Ont.
Facts
Web Ontology Language OWL
+ OWL-S
XML, XML Schema (XSL)
(WebRowSet, PMML, …)
OLAP
Multidimensional data analysis by
sequential and distributed / parallel
OLAP engines.
Cube construction and querying
Representation of query results by
OLAP Modeling Markup Language
Integration with data mining engines
(Association rules on OLAP)
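Cube construction and querying can be sketched in a few lines, assuming a sequential in-memory engine that sums one measure over every subset of the dimensions. The function build_cube, the column names and the sample rows are illustrative and do not come from the GridMiner code.

```python
from collections import defaultdict
from itertools import combinations
from typing import Any, Dict, List, Sequence, Tuple

Row = Dict[str, Any]
Cuboid = Dict[Tuple[Any, ...], int]


def build_cube(rows: List[Row], dimensions: Sequence[str],
               measure: str) -> Dict[Tuple[str, ...], Cuboid]:
    """Sum `measure` for every subset of `dimensions` (one cuboid per subset)."""
    cube: Dict[Tuple[str, ...], Cuboid] = {}
    for size in range(len(dimensions) + 1):
        for dims in combinations(dimensions, size):
            cells: Cuboid = defaultdict(int)
            for row in rows:
                key = tuple(row[d] for d in dims)
                cells[key] += row[measure]
            cube[dims] = dict(cells)
    return cube


# Made-up sample data, for illustration only.
rows = [
    {"hospital": "A", "outcome": "good", "patients": 3},
    {"hospital": "A", "outcome": "poor", "patients": 1},
    {"hospital": "B", "outcome": "good", "patients": 2},
]
cube = build_cube(rows, ["hospital", "outcome"], "patients")

# Querying is a dictionary lookup: first pick the cuboid, then the cell.
per_hospital = cube[("hospital",)]
grand_total = cube[()][()]
```

A parallel engine could build each cuboid, or each partition of the rows, on a different node and merge the partial sums. The slides do not say how GridMiner distributes this work.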
Grid Data Mediation Service
Principles
Tight Federation:
global (relational) schema
Virtual integration:
leave the data where it is
always up-to-date data
No proprietary solution
inherit well-solved aspects from OGSA-DAI
Not bound to a specific architecture
Supported data sources:
RDBMS (via JDBC), XMLDB (Xindice), CSV files
Operators: “Union all” and “inner join”
Operators are XQuery based (using SAXON)
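The two mediation operators have simple semantics, sketched below on in-memory rows. The service itself implements them in XQuery (using SAXON); this Python version is only a stand-in for the behaviour, and the function names, column names and sample rows are assumptions.

```python
from collections import defaultdict
from typing import Any, Dict, List

Row = Dict[str, Any]


def union_all(*sources: List[Row]) -> List[Row]:
    """Concatenate rows from all sources, keeping duplicates."""
    return [dict(row) for source in sources for row in source]


def inner_join(left: List[Row], right: List[Row], key: str) -> List[Row]:
    """Pair each left row with every right row that has the same key value."""
    index: Dict[Any, List[Row]] = defaultdict(list)
    for row in right:
        index[row[key]].append(row)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


# Made-up rows standing in for three physical sources.
site_a = [{"id": 1, "name": "Ann Lee"}]
site_b = [{"id": 2, "name": "Bob Ray"}]
treatments = [{"id": 1, "drug": "x"}, {"id": 1, "drug": "y"}]

patients = union_all(site_a, site_b)                 # one virtual patient table
joined = inner_join(patients, treatments, "id")      # unmatched patients drop out
```

Nothing is copied into a warehouse here: the virtual table is assembled at query time, which is what keeps the result up to date.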
Data Integration Scenario
Heterogeneities:
Name in A is “First Last” (as the target format)
Name in C has to be combined
Distribution:
3 data sources
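The name heterogeneity can be resolved with one small mapping function per source, as in the sketch below. It assumes that "has to be combined" means source C stores first and last name in separate columns. The source column names (name, first, last) are hypothetical, and the third source is left out because the slide does not describe its format.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def from_a(row: Row) -> Row:
    """Source A already stores the name as 'First Last'."""
    return {"id": row["id"], "p_name": row["name"]}


def from_c(row: Row) -> Row:
    """Source C stores the parts separately (assumed), so they are combined."""
    parts = (row["first"].strip(), row["last"].strip())
    return {"id": row["id"], "p_name": " ".join(p for p in parts if p)}


# Made-up rows, for illustration only.
source_a: List[Row] = [{"id": 1, "name": "Ann Lee"}]
source_c: List[Row] = [{"id": 10, "first": "Bob ", "last": "Ray"}]

# Rows from both sources now share the global schema (id, p_name).
patient = [from_a(r) for r in source_a] + [from_c(r) for r in source_c]
```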
Data Integration Scenario (cont.)
Query:
SELECT p_name FROM patient WHERE id=10
Standard to optimized
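The slide does not say which optimization is applied. One common choice for a mediated query like this is to push the WHERE predicate down to each source so that only matching rows are transferred. The sketch below compares the two plans under that assumption, with hypothetical function names and made-up rows.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def standard_plan(sources: List[List[Row]], patient_id: int) -> Tuple[List[str], int]:
    """Fetch every row from every source, then filter at the mediator."""
    transferred = [row for source in sources for row in source]
    names = [row["p_name"] for row in transferred if row["id"] == patient_id]
    return names, len(transferred)


def optimized_plan(sources: List[List[Row]], patient_id: int) -> Tuple[List[str], int]:
    """Filter at each source first (assumed optimization), then combine."""
    transferred = [row for source in sources
                   for row in source if row["id"] == patient_id]
    return [row["p_name"] for row in transferred], len(transferred)


# Three made-up sources already mapped to the global schema (id, p_name).
sources = [
    [{"id": 9, "p_name": "Ann Lee"}, {"id": 10, "p_name": "Bob Ray"}],
    [{"id": 11, "p_name": "Cy Moe"}],
    [{"id": 12, "p_name": "Di Poe"}],
]

standard_names, standard_rows = standard_plan(sources, 10)
optimized_names, optimized_rows = optimized_plan(sources, 10)
```

Both plans return the same answer. They differ only in how many rows cross the network, which is the cost that matters when the sources are in different hospitals.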
Implementation/Technology
Globus 3.2
OGSA-DAI
GUI – workflow construction / results
visualization (JGraph, Java Web Start,
JavaServer Pages)
Service configuration (JavaServer
Pages/PHP/..)
Knowledge base – (XML, OWL)
Data mining Scenario
Decision Rules (SPRINT)
(Select 10k rows)
Decision Rules (C4.5)
Database
(100k rows)
(Select 20k rows)
Decision Rules (C4.5)
Graphical User Interface
Summary
Integrated data mining infrastructure
Covers the whole process
Service Oriented Architecture
Implemented Prototype
Project ongoing
New data mining tasks (algorithms)
Knowledge management
More information:
http://www.gridminer.org
Thank you
Questions?