The GridMiner project

GridMiner
A Framework for Knowledge Discovery on the Grid – from a Vision to Design and Implementation
Peter Brezany, Ivan Janciak, Alexander Wöhrer, A Min Tjoa
University of Vienna
Institute for Software Science
email: [email protected]
www.gridminer.org
… Intelligent Grid Solutions
GridMiner Overview
• Start: Jan. 2003
• Host: University of Vienna, Vienna University of Technology
• Target: provide tools to discover and access relevant knowledge and information from different distributed and heterogeneous data sources
• Test application area: medical
  - traumatic brain injury treatment
  - predicting the outcome of seriously ill patients
• Analytical part focuses on data mining and On-Line Analytical Processing (OLAP)
CGW'04, 13. Dec. 04
Project members
Project leaders: Prof. A Min Tjoa (Vienna University of Technology), Prof. Peter Brezany (University of Vienna)
Visualization: Radoslav Ivanov
Data streaming: Nguyen Manh Tho
Data mediation: Alexander Wöhrer
Knowledge Mgt: Ivan Janciak
Job Control: Günter Kickinger
Sequence Rules: Michael Rinner
Clustering: Markus Mayer
Decision rules: Christian Kloner, Juergen Hofer
GUI: Paul Panhofer
Autonomic aspects: Michael Bergmann
OLAP: Bernhard Fiser, Umut Onan, Ibrahim Elsayed

Outline
• Motivation / Requirements
• GridMiner Services
• Architecture
• Dynamic Service Composition Engine
• OLAP
• Knowledge base
• Data Integration
• Graphical user interface
• Implementation
• Summary

The process to cover
• Data is distributed over the participating hospitals
• Access from different platforms (handheld, PC, …) for data generation, querying, and analysis
• The process needs to access various data sources

GridMiner
• Motivation
  - integrate knowledge discovery and knowledge management as an autonomic system
  - manage and control the whole lifecycle of knowledge
  - give strong support to other intelligent entities in their needs for knowledge
• Basic Requirements
  - Ability to access and analyze a huge amount of information, typically heterogeneous and geographically distributed
  - Intelligent behavior: the ability to maintain, discover, extend, present, and communicate knowledge
  - High-performance (real-time or soft real-time) query processing
  - High security guarantee

GridMiner Services
• Dynamic Workflow Control Service
• Data mining services
  - Sequences (SPADE)
  - Clustering (SimpleKMeans)
  - Decision rules (SPRINT)
  - OLAP (sequential/parallel version)
  - Association rules on OLAP
• Grid Data Mediator Service

GridMiner Architecture
• User environment (Web): Graphical User Interface, Knowledge Base, Service configuration, DSCE Client
• Grid: Dynamic service control engine (DSCE), Data Access and Integration, Data mining services

Dynamic Service Control Engine
• Processes a workflow described in DSCL
• Based on the Open Grid Services Architecture
• Supports both interactive and batch processing
• User-independent processing of the workflow
• Provides all intermediate results from the involved services
• Full user control during workflow execution
• Supports the OGSA Notification Model

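The engine behaviour listed for the Dynamic Service Control Engine (workflow-driven execution, retained intermediate results, notifications to interested clients) can be illustrated with a minimal Python sketch. This is not DSCL and not the GridMiner API: `Activity`, `WorkflowEngine`, and the listener callback are hypothetical names, and plain Python functions stand in for Grid services.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class Activity:
    """One workflow step: a name plus a callable standing in for a Grid service."""
    name: str
    service: Callable[[Any], Any]


@dataclass
class WorkflowEngine:
    """Runs activities in sequence, keeps every intermediate result,
    and notifies subscribed listeners after each step (hypothetical design)."""
    activities: List[Activity]
    results: Dict[str, Any] = field(default_factory=dict)
    listeners: List[Callable[[str, Any], None]] = field(default_factory=list)

    def subscribe(self, listener: Callable[[str, Any], None]) -> None:
        self.listeners.append(listener)

    def run(self, data: Any) -> Any:
        for activity in self.activities:
            data = activity.service(data)
            self.results[activity.name] = data      # intermediate result stays available
            for listener in self.listeners:
                listener(activity.name, data)       # notification after each step
        return data


def run_demo():
    """Two-step toy workflow: select even values, then scale them."""
    events = []
    engine = WorkflowEngine([
        Activity("select", lambda rows: [r for r in rows if r % 2 == 0]),
        Activity("scale", lambda rows: [r * 10 for r in rows]),
    ])
    engine.subscribe(lambda name, result: events.append(name))
    final = engine.run([1, 2, 3, 4])
    return final, engine.results, events
```

Because the engine, not the client, holds the results dictionary, a client could disconnect and later read any intermediate result, which is the idea behind user-independent processing.
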
Dynamic Service Control Engine
(cont.)
Knowledge Base
SWRL
Rules
OWL
Metadata
Domain Ontology
Datamining Ont.
Activity Ontology
Datasource Ont.
Facts
Web Ontology Language OWL
+ OWL-S
XML, XML Schema (XSL)
(webrowset, pmml…)

OLAP
• Multidimensional data analysis by sequential and distributed/parallel OLAP engines
• Cube construction and querying
• Representation of query results in the OLAP Modeling Markup Language
• Integration with data mining engines (association rules on OLAP)

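Cube construction and querying can be sketched in a few lines of Python. The sketch below is a sequential, in-memory illustration only, not the GridMiner OLAP engine: the function names, the `*` roll-up marker, and the sample facts (hospitals, outcomes, patient counts) are all assumptions made for the example.

```python
from collections import defaultdict
from itertools import product

ALL = "*"  # marker for a rolled-up (aggregated) dimension


def build_cube(facts, dimensions, measure):
    """Sum `measure` for every combination of concrete and rolled-up dimension values."""
    cube = defaultdict(float)
    for fact in facts:
        # Each dimension contributes either its concrete value or ALL.
        choices = [(fact[d], ALL) for d in dimensions]
        for cell in product(*choices):
            cube[cell] += fact[measure]
    return dict(cube)


def query(cube, dimensions, **fixed):
    """Look up one cell; dimensions not named in `fixed` are rolled up."""
    cell = tuple(fixed.get(d, ALL) for d in dimensions)
    return cube.get(cell, 0.0)


# Made-up sample data for illustration.
DIMENSIONS = ("hospital", "outcome")
FACTS = [
    {"hospital": "A", "outcome": "good", "patients": 3},
    {"hospital": "A", "outcome": "poor", "patients": 1},
    {"hospital": "B", "outcome": "good", "patients": 2},
]
CUBE = build_cube(FACTS, DIMENSIONS, "patients")
```

For example, `query(CUBE, DIMENSIONS, hospital="A")` rolls up over outcomes, while `query(CUBE, DIMENSIONS)` returns the grand total. Precomputing every cell trades memory for constant-time lookups, and the cell count grows as 2^d per fact for d dimensions, which is one reason real engines distribute the work.
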
Grid Data Mediation Service
Principles
• Tight federation:
  - global (relational) schema
• Virtual integration:
  - leave the data where it is
  - data is always up to date
• No proprietary solution
  - inherits well-solved aspects from OGSA-DAI
• Not bound to a special architecture
• Supported data sources:
  - RDBMS (via JDBC), XMLDB (Xindice), CSV files
• Operators: "union all" and "inner join"
• Operators are XQuery-based (using SAXON)

Data Integration Scenario
• Heterogeneities:
  - Name in A is "First Last" (as the target format)
  - Name in C has to be combined
• Distribution:
  - 3 data sources

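The heterogeneities in this scenario (a single name field in source A, separate name parts in source C) and the two mediator operators, "union all" and "inner join", can be sketched in Python as follows. The real service expresses its operators in XQuery; this sketch only illustrates the mapping idea. The source field names (`name`, `first`, `last`), the sample rows, and the omission of the third source (whose layout the slides do not describe) are assumptions.

```python
def from_source_a(row):
    """Source A already stores the name as 'First Last' (the target format)."""
    return {"id": row["id"], "p_name": row["name"]}


def from_source_c(row):
    """Source C stores the name in parts that have to be combined."""
    return {"id": row["id"], "p_name": f'{row["first"]} {row["last"]}'}


def union_all(*mapped_sources):
    """UNION ALL: concatenate rows from all sources, keeping duplicates."""
    merged = []
    for rows in mapped_sources:
        merged.extend(rows)
    return merged


def inner_join(left, right, key):
    """INNER JOIN: keep only row pairs whose `key` values match."""
    index = {}
    for rrow in right:
        index.setdefault(rrow[key], []).append(rrow)
    return [{**lrow, **rrow} for lrow in left for rrow in index.get(lrow[key], [])]


# Made-up sample rows for illustration.
SOURCE_A = [{"id": 1, "name": "Anna Berg"}]
SOURCE_C = [{"id": 10, "first": "Carl", "last": "Dorn"}]

# Virtual 'patient' relation of the global schema, built on demand.
PATIENT = union_all(
    [from_source_a(r) for r in SOURCE_A],
    [from_source_c(r) for r in SOURCE_C],
)
```

Since `PATIENT` is computed from the sources each time it is needed, nothing is copied into a warehouse, which matches the virtual-integration principle of always seeing current data.
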
Data Integration Scenario (cont.)
• Query:
SELECT p_name FROM patient WHERE id=10
[Figure: from standard to optimized]

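The slide does not state which optimization is applied to `SELECT p_name FROM patient WHERE id=10`. One common rewrite for a selection over a "union all" view is to push the `WHERE` predicate down to each source, so that only matching rows travel to the mediator. The Python sketch below compares the two plans under that assumption; `make_source`, both plan functions, and the sample rows are hypothetical.

```python
def make_source(rows):
    """Wrap a list of rows as a source that can filter by id locally."""
    def fetch(patient_id=None):
        if patient_id is None:
            return list(rows)                       # full scan, everything is shipped
        return [r for r in rows if r["id"] == patient_id]
    return fetch


def standard_plan(sources, patient_id):
    """Fetch every row from every source, merge, then filter at the mediator."""
    merged = [row for fetch in sources for row in fetch()]
    names = [r["p_name"] for r in merged if r["id"] == patient_id]
    return names, len(merged)                       # (result, rows transferred)


def optimized_plan(sources, patient_id):
    """Push the selection into each source so only matching rows are transferred."""
    merged = [row for fetch in sources for row in fetch(patient_id)]
    return [r["p_name"] for r in merged], len(merged)


# Made-up sample rows for illustration.
SOURCES = [
    make_source([{"id": 1, "p_name": "Anna Berg"}, {"id": 2, "p_name": "Ben Falk"}]),
    make_source([{"id": 10, "p_name": "Carl Dorn"}, {"id": 11, "p_name": "Dora Ek"}]),
]
```

Both plans return the same names; they differ only in how many rows cross the network, which is what matters when the sources are geographically distributed.
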
Implementation/Technology
• Globus 3.2
• OGSA-DAI
• GUI: workflow construction and results visualization (JGraph, Java Web Start, JavaServer Pages)
• Service configuration (JavaServer Pages/PHP/..)
• Knowledge base (XML, OWL)

Data mining Scenario
[Workflow figure; recoverable labels: Database (100k rows); Select 10k rows; Select 20k rows; Decision Rules (SPRINT); Decision Rules (C45)]

Graphical User Interface
Summary
• Integrated data mining infrastructure
• Covers the whole process
• Service-oriented architecture
• Implemented prototype
• Project ongoing
• New data mining tasks (algorithms)
• Knowledge management
• More information: http://www.gridminer.org

Thank you
Questions?