Data Mining Engineering

Toward Knowledge Discovery in
Databases Attached to Grids
Peter Brezany
Institute for Software Science
University of Vienna
E-mail : [email protected]
P.Brezany
Institut für Softwarewissenschaft - Universität Wien
Media That Radically Influenced Society
1500s: Printing Press
1840s: Penny Post
1850s: Telegraph
1920s: Telephone
1930s: Radio
1950s: TV
1990s: Web
20xx: Grid
Talk Outline
• Data Mining on the Grid – Background Information
• Application Examples
• Architecture of a Traditional Data Mining System
• GridMiner – A framework for Data Mining on the Grid
• GridMiner Architecture
• Functional and Data Access Model
• Conclusions
Data Mining on the Grid
• Data mining on the Grid (DMG) : finding unknown data patterns in an
environment with geographically distributed data and computation.
• Data may be highly heterogeneous and frequently updated.
• A good DMG algorithm analyzes data in a distributed fashion with
modest data communication overhead.
• A typical DMG algorithm involves local data analysis followed by the
generation of a global data model.
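As a minimal sketch of the "local analysis, then global model" pattern listed above, the Python example below assumes each site reduces its data to three sufficient statistics (count, sum, and sum of squares) and ships only those to a coordinator. The names `Summary`, `local_summary`, and `global_model`, the sample values, and the choice of mean and variance as the "model" are illustrative assumptions, not part of any specific DMG system.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Summary:
    """Sufficient statistics a site sends instead of its raw data."""
    n: int
    total: float
    total_sq: float


def local_summary(values: Iterable[float]) -> Summary:
    """Local analysis: one pass over the data held at a single site."""
    n, total, total_sq = 0, 0.0, 0.0
    for v in values:
        n += 1
        total += v
        total_sq += v * v
    return Summary(n, total, total_sq)


def global_model(summaries: List[Summary]) -> dict:
    """Global step: merge the per-site summaries into one mean and variance."""
    n = sum(s.n for s in summaries)
    if n == 0:
        raise ValueError("no data at any site")
    total = sum(s.total for s in summaries)
    total_sq = sum(s.total_sq for s in summaries)
    mean = total / n
    variance = total_sq / n - mean * mean  # population variance
    return {"n": n, "mean": mean, "variance": variance}


# Hypothetical data held at three sites.
site_data = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0]]
model = global_model([local_summary(d) for d in site_data])
print(model)
```

Only three numbers per site cross the network, regardless of how many records each site holds, which is the kind of modest communication overhead the slide asks for.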
Application Examples
• Determining how the emergence of hepatitis C depends on weather
patterns: this requires access to a large hepatitis C DB at one location
and an environmental DB at another location.
• Two major financial organizations want to cooperate. They need to share
data patterns relevant to the data mining task, but they do not want to
share the data itself because it is sensitive; combining the databases
may not be feasible.
• Federating Brain Data Project – Integrating several neuroscience DBs
• A major multinational corporation wants to analyze its
customer transaction records in order to quickly develop
successful business strategies.
- It has thousands of establishments throughout the world.
- Collecting all the data in a centralized data warehouse,
followed by analysis using existing commercial data mining
software, takes too long.
Telemedical Applications
AMG – Austrian Medical Grid
[Figure labels: Database, Raw Medical Data, Derived Medical Data, Database, Reconstructed Medical Data, Web]
Telemedical Collaboration - Example
A patient living in a remote village has a heart problem.
An ECG is taken by the local doctor and all the patient’s details
are stored in the doctor’s PC-based telemedical system.
MRI and CT scans are taken within different departments of a
general hospital and stored in the telemedical DB. A consultant
compiles a report and saves it in the DB.
If necessary, a 3D ultrasound scan is taken in a specialized clinic
and a further report is compiled.
If complicated surgery is required, an external specialist using Virtual
Reality techniques defines how the surgery should be planned.
The resulting operation is recorded on video, e.g., for education.
→ Data mining support/assistance is needed.
Architecture of a Data Mining System
[Figure: components from top to bottom]
• Graphical user interface
• Pattern evaluation
• Data mining engine
• Database or data warehouse server
• Data cleaning, data integration, filtering
• Database, data warehouse
• Knowledge base (shown beside the pattern evaluation and data mining engine components)
On Line Analytical Mining (OLAM)
GridMiner – A Framework for Data
Mining on Grids
System Requirements:
- Algorithm and data publishing and integration
- Compatibility with Grid infrastructure and Grid
awareness
- Openness
- Scalability
- Security and data privacy
Functionality requirements:
- Mining different kinds of knowledge in databases
- Incremental data mining algorithms
- Interactive mining of knowledge at multiple levels
of abstraction
GridMiner (Layered) Architecture
(Based on K.F. Jeffery's idea)
Functional and Data Access Model
MDS
Example: Mining Patterns for Data
Classification and Associations
use database dat1, dat2
mine classifications
analyze credit_rating
using g_parsimony
display as tree

use database DBs attributes
mine associations
using method attributes
display as rules
Knowledge Grid Architecture Layers
High level layer
- Data Access Service
- Tools and Algorithms Access Service
- Execution Plan Management
- Result Presentation Service
Core layer
- Knowledge Directory Service
- Resource Allocation and Execution Management
Generic Grid and Data Grid Services
Conclusions
• Grid data mining is a relevant research topic
• The GridMiner approach may contribute to this research domain
• Collaborations are needed
• IPG (Information Power Grid) is the only Grid project that
aims to address knowledge discovery issues
• Looking for pilot application(s)
• Open issues
- basic Grid technology: Globus, DataGrid, Jini, JXTA?
Data Storage and the Components
[Figure: distribution of processing steps across sites]
• Site A: Preprocessing, Local DM
• Site B: Preprocessing, Local DM
• Site C: Preprocessing, Local DM
• Site D: Preprocessing, Local DM
• Site E: Construction of the Global Model, GUI
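The component layout on this slide can be sketched in Python as follows. This is an illustrative sketch only: it assumes the local data mining step counts item occurrences in transaction records and that the global model is a table of globally frequent items built at a coordinating site. The function names, the sample transactions, and the `min_support` threshold are hypothetical and do not come from GridMiner.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple


def preprocess(raw: List[List[str]]) -> List[Set[str]]:
    """Per-site preprocessing: normalize item names, drop empty records."""
    cleaned = []
    for record in raw:
        items = {item.strip().lower() for item in record if item.strip()}
        if items:
            cleaned.append(items)
    return cleaned


def local_dm(transactions: List[Set[str]]) -> Tuple[int, Counter]:
    """Local DM: count in how many local transactions each item occurs."""
    counts: Counter = Counter()
    for t in transactions:
        counts.update(t)
    return len(transactions), counts


def build_global_model(
    local_results: List[Tuple[int, Counter]], min_support: float
) -> Dict[str, float]:
    """Global step: merge site counts and keep items above min_support."""
    total = sum(n for n, _ in local_results)
    merged: Counter = Counter()
    for _, counts in local_results:
        merged.update(counts)
    return {
        item: c / total for item, c in merged.items() if c / total >= min_support
    }


# Hypothetical raw transaction data held at sites A to D.
sites = {
    "A": [["Bread", "milk"], ["milk "]],
    "B": [["bread"], ["", " "]],
    "C": [["milk", "eggs"]],
    "D": [["Milk", "bread", "eggs"]],
}
local_results = [local_dm(preprocess(raw)) for raw in sites.values()]
global_model = build_global_model(local_results, min_support=0.5)
print(global_model)
```

Each site sends only its transaction count and item counts, so the raw records never leave the site where they are stored.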