
Designing Services for Grid-based
Knowledge Discovery
A. Congiusta, A. Pugliese, Domenico Talia, P. Trunfio
DEIS
University of Calabria
ITALY
Future Generation Grids, Dagstuhl Seminar, November 2004
SUMMARY



The use of computers is changing the way we make discoveries
and is improving both the speed and the quality of discovery
processes.
In this scenario, the Grid can provide effective computational
support for distributed knowledge discovery from large,
distributed data sets. To this end, we designed a system called
Knowledge Grid.
This talk discusses how to design distributed knowledge
discovery services according to the OGSA model by using the
Knowledge Grid services, from searching Grid resources and
composing software and data elements to executing the
resulting application on a Grid.
OUTLINE

MOTIVATIONS

TOWARDS KNOWLEDGE SERVICES

THE KNOWLEDGE GRID

OGSA SERVICES FOR KNOWLEDGE DISCOVERY

A META-LEARNING EXAMPLE

CONCLUSIONS
MOTIVATIONS

Lots of data collected and warehoused.

Data collected and stored at enormous speeds in local databases,
from remote sources, or from the sky.

Scientific simulations generating terabytes of data.

Huge data sets are hard to understand.

Traditional techniques are infeasible for raw data.

Computational science is evolving toward data-intensive
applications that include
• data analysis,
• information management, and
• knowledge discovery.




Most data will never be examined by humans; it is
analyzed and summarized by computers.
Data analysis is becoming a key element in scientific
discovery and in business processes.
Data-intensive applications are those that explore, query,
analyze, visualize and, in general, process very large-scale
data sets.
Data-intensive applications help
• scientists in hypothesis formation;
• companies provide better, customized services and support
decision making.
TOWARDS KNOWLEDGE SERVICES
SCIENTIFIC OBJECTIVES
Grid-aware Knowledge Discovery Systems: to support the
unification of data management and knowledge discovery
systems with Grid technologies in order to provide
knowledge-based Grid services.
This objective can be achieved through
• the development of techniques and tools for supporting
data-intensive applications, and
• the integration of Data and Computation Grids with
Information and Knowledge Grids.
THE KNOWLEDGE GRID



PAST
KNOWLEDGE GRID: a distributed knowledge discovery
architecture that integrates data mining techniques and
computational Grid resources.
In the KNOWLEDGE GRID architecture, data mining tools are
integrated with lower-level Grid mechanisms and services, and
they exploit Data Grid services.
This approach benefits from "standard" Grid services and offers
an open architecture that can be configured on top of generic
Grid middleware.
KNOWLEDGE GRID ARCHITECTURE
[Figure: KNOWLEDGE GRID architecture. High level K-Grid layer: RPS (Result Presentation Service), DAS (Data Access Service), TAAS (Tools and Algorithms Access Service), EPMS (Execution Plan Management Service). Core K-Grid layer: KDS (Knowledge Directory Service) and RAEMS (Resource Allocation and Execution Management Service), with the KMR, KEPR and KBR repositories and resource, execution plan and model metadata. Bottom layer: Generic and Data Grid Services.]
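The layering in the architecture figure can be sketched as a small registry. This is a minimal illustration, not Knowledge Grid code. The `Layer` enum, the `may_call` rule (a service may use only services in its own layer or in a lower one) and the `GRID` stand-in name are all assumptions made for the example.

```python
from enum import IntEnum


class Layer(IntEnum):
    """Architecture layers, lowest first (hypothetical encoding of the figure)."""
    GENERIC_AND_DATA_GRID = 0
    CORE_K_GRID = 1
    HIGH_LEVEL_K_GRID = 2


# Acronyms as they appear in the architecture figure.
SERVICES = {
    "RPS": Layer.HIGH_LEVEL_K_GRID,    # Result Presentation Service
    "DAS": Layer.HIGH_LEVEL_K_GRID,    # Data Access Service
    "TAAS": Layer.HIGH_LEVEL_K_GRID,   # Tools and Algorithms Access Service
    "EPMS": Layer.HIGH_LEVEL_K_GRID,   # Execution Plan Management Service
    "KDS": Layer.CORE_K_GRID,          # Knowledge Directory Service
    "RAEMS": Layer.CORE_K_GRID,        # Resource Allocation and Execution Management Service
    "GRID": Layer.GENERIC_AND_DATA_GRID,  # hypothetical stand-in for generic and Data Grid services
}


def may_call(caller: str, callee: str) -> bool:
    """Assumed layering rule: a service may use services in its own layer or below."""
    return SERVICES[callee] <= SERVICES[caller]
```

With this rule, `may_call("EPMS", "RAEMS")` holds and `may_call("KDS", "DAS")` does not. This matches the idea that high-level services are built on top of the core layer, which is itself built on generic Grid middleware.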
THE KNOWLEDGE GRID
[Figure: Component Selection / Service Selection (PAST / FUTURE), Application Workflow Composition, and Application Execution on the Grid, over elements labelled D1–D4, S1–S3 and H1–H3.]
OGSA KNOWLEDGE GRID SERVICES
FUTURE

The KNOWLEDGE GRID is an abstract service-based Grid
architecture that does not limit the user in developing and
using service-based knowledge discovery applications.


We are defining a set of Grid Services that export
functionality and operations of the KNOWLEDGE GRID.
Each of the KNOWLEDGE GRID services is exposed as a
persistent service, using the OGSA conventions and
mechanisms.
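The factory idea behind persistent OGSA services can be sketched as follows. A long-lived factory on each node creates transient service instances on request, and each instance is identified by a handle. The class names, the `create_service`/`destroy` methods and the handle format are hypothetical. They only mimic the pattern and are not the OGSI or Globus Toolkit API.

```python
import itertools
from dataclasses import dataclass, field


@dataclass
class ServiceInstance:
    """A transient service instance created by a factory (hypothetical model)."""
    handle: str
    kind: str
    params: dict = field(default_factory=dict)


class Factory:
    """Persistent factory: stays up with its node and creates transient instances."""

    def __init__(self, node: str, kind: str) -> None:
        self.node = node
        self.kind = kind
        self._ids = itertools.count(1)
        self.instances: dict[str, ServiceInstance] = {}

    def create_service(self, **params) -> ServiceInstance:
        # Handles are node-qualified and unique per factory (format is an assumption).
        handle = f"{self.node}/{self.kind}/{next(self._ids)}"
        instance = ServiceInstance(handle, self.kind, params)
        self.instances[handle] = instance
        return instance

    def destroy(self, handle: str) -> None:
        # Transient instances are released explicitly once their work is done.
        del self.instances[handle]
```

For instance, `Factory("NC", "learner").create_service(training_set="TR1")` yields an instance with handle `NC/learner/1`. The factory persists, while the instances it creates come and go with each application run.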
KNOWLEDGE SERVICES: A Meta-Learning Example




A simple example of a meta-learning process over the
KNOWLEDGE GRID.
Its purpose is to show how the execution of a significant
distributed data mining application can benefit from the
Knowledge Grid services provided through the OGSA model.
Meta-learning aims to generate a number of
independent classifiers by applying learning programs
to a collection of distributed data sets in parallel.
The classifiers computed by learning programs are then
collected and combined to obtain a global classifier.
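The meta-learning scheme described above, with independent classifiers learned in parallel and then combined, can be sketched in a few lines. The slides do not name the learning program or the combining strategy. This sketch therefore assumes a toy nearest-centroid learner and plain majority voting, and it uses threads in one process as stand-ins for distributed nodes.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from statistics import fmean


def train_centroid(training_set):
    """Toy learner (assumed): nearest-centroid classifier over numeric feature vectors."""
    by_label = {}
    for x, y in training_set:
        by_label.setdefault(y, []).append(x)
    centroids = {y: [fmean(col) for col in zip(*xs)] for y, xs in by_label.items()}

    def classify(x):
        # Label whose centroid is closest in squared Euclidean distance.
        return min(centroids, key=lambda y: sum((a - b) ** 2 for a, b in zip(x, centroids[y])))

    return classify


def meta_learn(training_sets, learner=train_centroid):
    """Learn one classifier per training set in parallel, then combine them by majority vote."""
    with ThreadPoolExecutor() as pool:
        classifiers = list(pool.map(learner, training_sets))

    def global_classifier(x):
        votes = Counter(c(x) for c in classifiers)
        return votes.most_common(1)[0][0]

    return global_classifier
```

Ties in the vote go to the label counted first. A real deployment would replace `train_centroid` with an actual learning program and might use a more elaborate combiner.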
[Figure: the meta-learning process in three steps. On NodeA, the Partitioner P splits the data set DS into training sets TR1, …, TRn, a testing set TS and a validation set VS; on Node1, …, Noden, learners L1, …, Ln build classifiers C1, …, Cn from the training sets; on NodeZ, the Combiner/Tester CT combines the classifiers, using VS and TS, into the global classifier GC.]


A user application interacts with Knowledge Grid nodes
to generate a classifier by combining the classifiers built
from different subsets of a given data set.
The scenario comprises five nodes:
• NU, running the user application that builds the meta-learning
application and visualizes the global classifier;
• NS, used for resource discovery and for steering the execution
of the meta-learning application;
• NA, which holds the original data set and provides a data
partitioning service;
• NC, providing learning services that run in parallel on a
homogeneous cluster;
• NZ, providing a combiner/tester service used to compute the
global classifier.
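The five-node scenario can be captured as a small node directory that answers "which nodes expose service X?", the kind of question resource discovery asks. The service labels follow the figures in this example, simplified to a single "Reservation Factory" label. The `Node` record and the `nodes_offering` helper are hypothetical illustrations, not Knowledge Grid interfaces.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """One Knowledge Grid node in the example scenario (hypothetical record)."""
    name: str
    role: str
    services: frozenset[str]


# Roles follow the scenario; service labels follow the example's figures.
NODES = [
    Node("NU", "runs the user application", frozenset()),
    Node("NS", "resource discovery and steering", frozenset({"DAS", "TAAS", "EPMS"})),
    Node("NA", "holds DS and partitions it",
         frozenset({"DAS", "Database Service", "Partitioner Factory"})),
    Node("NC", "parallel learning on a cluster",
         frozenset({"DAS", "TAAS", "Learner Factory", "Reservation Factory"})),
    Node("NZ", "combining and testing",
         frozenset({"DAS", "TAAS", "Combiner Factory", "Reservation Factory"})),
]


def nodes_offering(service: str, nodes: list[Node] = NODES) -> list[str]:
    """Names of the nodes that expose `service`, in directory order."""
    return [node.name for node in nodes if service in node.services]
```

For example, `nodes_offering("Learner Factory")` returns `["NC"]`. In the Knowledge Grid, this kind of lookup is delegated to directory and access services rather than to a static list.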
RESOURCE DISCOVERY AND EXECUTION PLANNING
The user application invokes the DAS and TAAS services on node
Ns in order to obtain information about the resources needed
for the meta-learning process.
On node Ns, the DAS and TAAS services invoke the corresponding
services on other Knowledge Grid nodes, specifying the required
resources.
Contacted nodes providing the needed resources reply to node Ns,
sending this information.
The DAS and TAAS services of node Ns send the information to the
user application (U.A.).
Information about nodes Nc and Nz is analyzed, and two such
nodes (a learner node and a meta-combiner/tester node) are
identified as candidates for the execution.
The application builds an execution plan of the meta-learning
process, specifying strategies for data movement and algorithm
computation.
The execution plan is submitted to the EPMS of node Ns for
execution and for resource reservation.
[Figure: nodes NU (User Application), NS (DAS, TAAS, EPMS), NA (Database Service, DAS, Partitioner Factory), NC (DAS, TAAS, Learner Factory) and NZ (DAS, TAAS, Combiner Factory), together with Resource Reservation and Storage Reservation factories.]
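An execution plan that specifies data movement and computation can be viewed as a dependency graph that the EPMS walks in order. The sketch below encodes the meta-learning plan that way and derives one valid sequential order with Python's `graphlib.TopologicalSorter`. The step names and the dependency structure are assumptions drawn from the execution steps of this example, not the actual plan format used by the Knowledge Grid.

```python
from graphlib import TopologicalSorter


def build_plan(n_learners: int) -> dict[str, set[str]]:
    """Hypothetical meta-learning plan: each step maps to the steps it depends on."""
    plan = {
        "reserve@NC": set(),
        "reserve@NZ": set(),
        "partition@NA": set(),
        "transfer TS+VS NA->NZ": {"partition@NA", "reserve@NZ"},
        "combine+test@NZ": {"transfer TS+VS NA->NZ"},
        "transfer GC NZ->NU": {"combine+test@NZ"},
    }
    for i in range(1, n_learners + 1):
        plan[f"transfer TR{i} NA->NC"] = {"partition@NA", "reserve@NC"}
        plan[f"learn C{i}@NC"] = {f"transfer TR{i} NA->NC"}
        plan[f"transfer C{i} NC->NZ"] = {f"learn C{i}@NC"}
        # The combiner needs every partial classifier before it can start.
        plan["combine+test@NZ"].add(f"transfer C{i} NC->NZ")
    return plan


def run_order(plan: dict[str, set[str]]) -> list[str]:
    """One valid sequential order of the steps (independent steps could run in parallel)."""
    return list(TopologicalSorter(plan).static_order())
```

`TopologicalSorter` also provides `prepare()`, `get_ready()` and `done()`, which a scheduler could use to dispatch independent steps, such as the n learner runs, concurrently instead of sequentially.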
KDD APPLICATION EXECUTION
The EPMS invokes the factories on Na, Nc and Nz, requesting the
creation of a partitioner service on node Na and of two
reservation services on Nc and Nz.
On node Nc, computing cycles are reserved (on each computing
element) to execute the learner programs, and storage space is
reserved to hold the subsets extracted from DS and the partial
classifiers. On node Nz, storage space is reserved to hold the
partial and global classifiers.
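The reservations made on Nc (computing cycles per computing element plus storage) and on Nz (storage only) can be sketched as all-or-nothing bookkeeping. The `ReservationService` class, its capacities and the megabyte unit are hypothetical. The slides do not describe how reservations are granted.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class ReservationService:
    """Toy per-node reservation bookkeeping (capacities and units are assumptions)."""
    cpu_slots: dict[str, int]   # free slots per computing element
    storage_mb: int             # free storage on the node, in MB
    granted: list = field(default_factory=list)

    def reserve(self, elements: list[str], storage_mb: int) -> bool:
        """Grant all requested slots and storage, or nothing at all."""
        needed = Counter(elements)  # an element may be requested more than once
        if storage_mb > self.storage_mb:
            return False
        if any(self.cpu_slots.get(e, 0) < k for e, k in needed.items()):
            return False
        for e, k in needed.items():
            self.cpu_slots[e] -= k
        self.storage_mb -= storage_mb
        self.granted.append((tuple(elements), storage_mb))
        return True
```

A rejected request leaves the node state unchanged, so the EPMS can retry elsewhere without having to undo partial reservations.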
[Figure: same node layout (NU, NS, NA, NC, NZ); the EPMS on NS invokes the factories on NA, NC and NZ.]
The requests made by the EPMS result in the creation of the
requested services.
[Figure: same node layout; a Partitioner Service now runs on NA, and Reservation Services run on NC and NZ.]
The partitioner service interacts with the database service on
the same node to extract the needed subsets from DS: n training
sets, a testing set and a validation set.
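The extraction of n training sets, a testing set and a validation set from DS can be sketched as below. The 20% / 10% fractions, the seeded shuffle and the round-robin split are assumptions. The slides do not say how the partitioner selects the subsets.

```python
import random


def partition(ds, n, test_frac=0.2, val_frac=0.1, seed=0):
    """Split DS into n disjoint training sets, one testing set and one validation set.

    Fractions, shuffling and round-robin assignment are illustrative assumptions.
    """
    rows = list(ds)
    random.Random(seed).shuffle(rows)
    n_test = int(len(rows) * test_frac)
    n_val = int(len(rows) * val_frac)
    ts = rows[:n_test]
    vs = rows[n_test:n_test + n_val]
    rest = rows[n_test + n_val:]
    # Round-robin assignment keeps training-set sizes within one row of each other.
    training_sets = [rest[i::n] for i in range(n)]
    return training_sets, ts, vs
```

With 100 rows and n = 4, this yields a 20-row testing set, a 10-row validation set and training sets of 18, 18, 17 and 17 rows, with every row used exactly once.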
[Figure: same node layout; on NA, the Partitioner Service interacts with the Database Service.]
The EPMS invokes (i) the DAS service on node Na, requesting the
transfer of the training sets to node Nc and of the testing and
validation sets to node Nz, and (ii) the learner factory on Nc,
requesting the creation of n learner service instances to be
run on the same node.
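The transfers requested of the DAS on Na can be sketched with in-memory node stores. `DataAccessService`, `put`, `transfer` and `distribute` are hypothetical names. A real DAS would rely on Grid data transfer services, which this sketch does not model.

```python
from collections import defaultdict


class DataAccessService:
    """Toy stand-in for the DAS: copies named data sets between in-memory node stores."""

    def __init__(self) -> None:
        self.stores = defaultdict(dict)   # node name -> {data set name: payload}
        self.log = []                     # (data set, source node, destination node)

    def put(self, node: str, name: str, payload) -> None:
        self.stores[node][name] = payload

    def transfer(self, name: str, src: str, dst: str) -> None:
        # Copy rather than move: the source node keeps its data set.
        self.stores[dst][name] = self.stores[src][name]
        self.log.append((name, src, dst))


def distribute(das: DataAccessService, n: int) -> None:
    """Transfers in this step: TR1..TRn from NA to NC, TS and VS from NA to NZ."""
    for i in range(1, n + 1):
        das.transfer(f"TR{i}", "NA", "NC")
    for name in ("TS", "VS"):
        das.transfer(name, "NA", "NZ")
```

The transfer log records each movement, which is the kind of information the EPMS needs in order to know when the learner and combiner nodes have their inputs.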
[Figure: same node layout; the EPMS invokes the DAS on NA and the Learner Factory on NC.]
On node Nc, n learner service instances are created. On each
computing element of node Nc, the learner service instances
generate the partial classifiers. As soon as each partial
classifier is obtained, a notification message is sent to the
EPMS.
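The "notify as soon as each partial classifier is ready" behaviour can be sketched with `concurrent.futures.as_completed`, which yields futures in completion order rather than submission order. A `queue.Queue` stands in for the EPMS notification endpoint, and `run_learners` and the message tuple format are hypothetical. The slides do not specify the messaging mechanism.

```python
import queue
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_learners(training_sets, learn, notifications: queue.Queue) -> dict:
    """Run one learner instance per training set; notify as each classifier is ready.

    `learn` maps a training set to a classifier; the queue is an assumed stand-in
    for the EPMS notification endpoint.
    """
    classifiers = {}
    with ThreadPoolExecutor(max_workers=max(1, len(training_sets))) as pool:
        futures = {pool.submit(learn, ts): f"C{i}" for i, ts in enumerate(training_sets, start=1)}
        for future in as_completed(futures):
            name = futures[future]
            classifiers[name] = future.result()
            # Sent as soon as this partial classifier exists, in completion order.
            notifications.put(("classifier-ready", name))
    return classifiers
```

Notifying per classifier, rather than once at the end, lets the EPMS start moving early results toward Nz while slower learners are still running.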
[Figure: same node layout; n Learner Services now run on NC.]
The EPMS invokes (i) the DAS service on node Nc, requesting the
transfer of the generated classifiers to node Nz, and (ii) the
combiner/tester factory on Nz, requesting the creation of a
combiner/tester service to be run on the same node.
[Figure: same node layout; the EPMS invokes the DAS on NC and the Combiner Factory on NZ.]
On node Nz, a combiner/tester service is created to perform the
combining and testing processes and generate the global
classifier GC.
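One possible combiner/tester is sketched below. Each partial classifier is weighted by its accuracy on the validation set VS, the weighted votes form GC, and GC is then measured on the testing set TS. Accuracy-weighted voting is an assumption made for illustration. The slides do not state which combining strategy the combiner/tester uses, and other meta-learning combiners (for example arbiter or stacking schemes) would fit the same step.

```python
from collections import defaultdict


def accuracy(classify, labelled):
    """Fraction of (x, y) pairs that `classify` labels correctly."""
    return sum(classify(x) == y for x, y in labelled) / len(labelled)


def combine_and_test(classifiers, validation_set, testing_set):
    """Build GC by accuracy-weighted voting, then report its accuracy on the testing set.

    VS is used only to compute the weights; TS is used only to evaluate GC.
    """
    weights = [accuracy(c, validation_set) for c in classifiers]

    def gc(x):
        scores = defaultdict(float)
        for c, w in zip(classifiers, weights):
            scores[c(x)] += w
        return max(scores, key=scores.get)

    return gc, accuracy(gc, testing_set)
```

Keeping VS and TS separate avoids measuring GC on the same data used to tune the combination.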
[Figure: same node layout; a Combiner Service now runs on NZ.]
The EPMS invokes the DAS service on node Nz, requesting the
transfer of the generated global classifier to node Nu.
[Figure: same node layout; the global classifier is transferred from NZ to NU.]
OPEN ISSUES

Data privacy and security

KDD process state management

FUTURE
Complex processing patterns (Web Services are too simple
to express distributed data mining processes and
applications)

KDD Grid Service standards (towards OGSA-KDAI?)

KDD processes as G-Services Workflows

Asynchronous services

……
CONCLUSIONS




The knowledge-building process in a distributed setting involves
data and information collection, generation, and distribution
followed by the collective interpretation of processed
information into “knowledge.”
Next-generation Grids must be able to produce, use, and
deploy knowledge as a basic element of advanced
applications.
Knowledge-based Grids can offer tools, components and services
to support data analysis, inference, and discovery in scientific
and business applications.
OGSA-based services for distributed knowledge discovery are a
key element in supporting e-science and e-business on a large
scale.
THANKS
www.icar.cnr.it/kgrid
CREDITS:
M. Cannataro
C. Comito