Modeling and Language Support for the management of PBMS


Manolis Terrovitis
Panos Vassiliadis
Spiros Skiadopoulos
Elisa Bertino
Barbara Catania
Anna Maddalena
1
Outline
Introduction
Modeling of data and patterns
Query operators
Summary and future work
2
Motivation
Huge amounts of data are produced.
Interesting knowledge has to be detected
and extracted.
Knowledge extraction techniques (i.e.,
data mining) are not sufficient:
Huge amounts of results (clusters, association
rules, decision trees, etc.)
Arbitrary modeling of results
3
Motivation (cont'd)
We need to be able to manipulate the
knowledge discovered!
The basic requirements:
A generic and homogeneous model for patterns.
Well-defined query operators.
Efficient storage.
4
The Patterns and PBMS
[Rizzi et al., ER 2003]
Patterns are compact, semantically rich
representations of raw data.
Clusters, association rules, decision trees, etc.
Pattern Base Management System
Patterns are treated as first-class citizens
Pattern-based queries
Approximate mapping between patterns and
raw data
5
Contributions
We formally define the logical foundations
for pattern management
We present a pattern specification
language
We introduce queries and query operators
6
Outline
Introduction
Modeling of data and patterns
Query operators
Summary and future work
7
PBMS architecture
[Figure: PBMS architecture in three layers.
Pattern Space: Pattern Types, Pattern Classes, Patterns (edges labeled "Instance of" and "Member of").
Intermediate Results: Intermediate Mappings.
Data Space: DB1, DB2.
Data Mining Algorithms and Pattern Recognition Algorithms connect the layers.]
8
The patterns
Patterns hold information about:
the data source
the structure of the pattern
the relation between the structure and the
source, expressed as an approximate logical formula.
9
Pattern - Cluster Example
Pid: 337
Structure: [CENTER: [X: 21, Y: 1200], RAD: 12]
Data: EMP: {[Age, Salary]}
Formula: (t.Age - 21)^2 + (t.Salary - 1200)^2 ≤ 12^2, where t ∈ EMP
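The cluster pattern above can be written down as a small record whose formula is an executable predicate. This is only a minimal sketch: the `Pattern` class, its field names, and the sample EMP rows are assumptions made for illustration and are not part of the presentation.

```python
from dataclasses import dataclass
from typing import Callable, Dict

Row = Dict[str, float]


@dataclass
class Pattern:
    pid: int
    structure: dict                  # e.g. center and radius of a cluster
    source: str                      # name of the source relation
    formula: Callable[[Row], bool]   # approximate pattern-data relation


# Pattern 337 from the slide: a disk of radius 12 centered at (21, 1200).
cluster = Pattern(
    pid=337,
    structure={"CENTER": {"X": 21, "Y": 1200}, "RAD": 12},
    source="EMP",
    formula=lambda t: (t["Age"] - 21) ** 2 + (t["Salary"] - 1200) ** 2 <= 12 ** 2,
)

# Hypothetical EMP rows, invented for illustration only.
emp = [
    {"Age": 21, "Salary": 1200},
    {"Age": 25, "Salary": 1205},
    {"Age": 40, "Salary": 1200},
]

# Rows that the formula claims are represented by the pattern.
covered = [t for t in emp if cluster.formula(t)]
```

With these invented rows, the first two fall inside the disk and the third does not.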
10
Pattern type - example
Name: Disk
Structure Schema: [CENTER: [X: real, Y: real], RAD: real]
Data Schema: REL: {[X: real, Y: real]}
Formula Schema: (t.X - CENTER.X)^2 + (t.Y - CENTER.Y)^2 ≤ RAD^2, where t ∈ REL
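One way to read the Disk pattern type is as a template: binding concrete values to the structure schema turns the formula schema into the formula of one pattern. The sketch below assumes a hypothetical `PatternType` class and flattens CENTER into `CENTER_X` and `CENTER_Y` for brevity; neither choice comes from the presentation.

```python
from dataclasses import dataclass
from typing import Callable, Dict

Row = Dict[str, float]
Structure = Dict[str, float]


@dataclass(frozen=True)
class PatternType:
    name: str
    structure_fields: tuple   # structure schema (field names only)
    data_fields: tuple        # data schema (attributes of REL)
    formula_schema: Callable[[Row, Structure], bool]

    def instantiate(self, structure: Structure) -> Callable[[Row], bool]:
        """Bind structure values, yielding the formula of one pattern."""
        missing = set(self.structure_fields) - set(structure)
        if missing:
            raise ValueError(f"missing structure fields: {sorted(missing)}")
        return lambda t: self.formula_schema(t, structure)


disk = PatternType(
    name="Disk",
    structure_fields=("CENTER_X", "CENTER_Y", "RAD"),
    data_fields=("X", "Y"),
    formula_schema=lambda t, s: (
        (t["X"] - s["CENTER_X"]) ** 2 + (t["Y"] - s["CENTER_Y"]) ** 2
        <= s["RAD"] ** 2
    ),
)

# Instantiate the type with the structure of pattern 337.
formula_337 = disk.instantiate({"CENTER_X": 21, "CENTER_Y": 1200, "RAD": 12})
inside = formula_337({"X": 21, "Y": 1210})    # 10 units from the center
outside = formula_337({"X": 21, "Y": 1213})   # 13 units from the center
```

Keeping the formula schema separate from the bound values mirrors the type/instance split on the slides: many patterns can share one type.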
11
The formula
An intensional description of the pattern-data relation
pros:
Efficiency, more intuitive results
cons:
Accuracy
12
Intensional vs. Extensional
[Figure: plot with axes Age and Salary]
13
The formula (cont'd)
The formula is a predicate:
fp(x, y), where x ∈ Source, y ∈ Structure
Expressiveness.
Functions and predicates
Safety.
Range restriction.
Queries employing the formula are n-depth
domain independent.
14
Outline
Introduction
Modeling of data and patterns
Query operators
Summary and future work
15
Query Operators
Query operator classes:
Database operators
Pattern Base operators
Crossover database operators
Crossover pattern base operators
16
Crossover Operators
Exact evaluation, via the intermediate mappings
Approximate evaluation, via the formula
[Figure: a pattern (PID, data, formula, structure) in the Pattern Space linked to the Data Space; links labeled "Exact" and "Approximation".]
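The two evaluation strategies can be contrasted on a toy example. This is a sketch under stated assumptions: the tuple ids, the `mappings` table, and the function names are hypothetical, and the formula is the disk of pattern 337 from the earlier cluster example.

```python
# Hypothetical data space: tuple id -> (Age, Salary).
emp = {1: (21, 1200), 2: (25, 1205), 3: (28, 1205), 4: (40, 1200)}

# Hypothetical intermediate mappings: pattern id -> ids of the tuples
# that the mining algorithm actually assigned to the pattern.
mappings = {337: {1, 2}}


def formula_337(age, salary):
    """Approximate pattern-data relation of pattern 337 (a disk)."""
    return (age - 21) ** 2 + (salary - 1200) ** 2 <= 12 ** 2


def exact(pid):
    """Exact evaluation: look the pattern up in the intermediate mappings."""
    return set(mappings[pid])


def approximate(formula):
    """Approximate evaluation: test every tuple against the formula."""
    return {tid for tid, (age, salary) in emp.items() if formula(age, salary)}


# Tuples the formula accepts although the mappings do not list them.
extra = approximate(formula_337) - exact(337)
```

In this invented data, tuple 3 lies inside the disk but was never assigned to the pattern, so the approximate answer is a superset of the exact one. The formula avoids storing and scanning the mappings, at the cost of accuracy.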
17
Crossover Operators
Database
Drill-Through: Which data are represented by
these patterns?
Data-Covering: Which data from this dataset
can be represented by this pattern?
Pattern Base
Pattern-Covering: Which of these patterns
represent this dataset?
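The three crossover operators can be sketched with approximate (formula-based) evaluation. Everything here is an assumption for illustration: the `Pattern` tuple, the function signatures, and in particular the reading of Pattern-Covering as "the formula holds for every row of the dataset".

```python
from typing import Callable, Dict, List, NamedTuple

Row = Dict[str, float]


class Pattern(NamedTuple):
    pid: int
    source: str                      # name of the source relation
    formula: Callable[[Row], bool]   # approximate pattern-data relation


def drill_through(patterns: List[Pattern], database: Dict[str, List[Row]]) -> List[Row]:
    """Which data are represented by these patterns?"""
    result: List[Row] = []
    for p in patterns:
        for row in database[p.source]:
            if p.formula(row) and row not in result:
                result.append(row)
    return result


def data_covering(pattern: Pattern, dataset: List[Row]) -> List[Row]:
    """Which data from this dataset can be represented by this pattern?"""
    return [row for row in dataset if pattern.formula(row)]


def pattern_covering(patterns: List[Pattern], dataset: List[Row]) -> List[Pattern]:
    """Which of these patterns represent this dataset (every row of it)?"""
    return [p for p in patterns if all(p.formula(row) for row in dataset)]


# Hypothetical database and patterns, invented for illustration.
database = {"EMP": [{"Age": 21, "Salary": 1200}, {"Age": 50, "Salary": 3000}]}
young = Pattern(1, "EMP", lambda t: t["Age"] < 30)
everyone = Pattern(2, "EMP", lambda t: t["Age"] >= 0)

drilled = drill_through([young], database)
covered = data_covering(young, database["EMP"])
covering = pattern_covering([young, everyone], database["EMP"])
```

Drill-Through and Data-Covering return data, so they are database operators; Pattern-Covering returns patterns, so it is a pattern base operator.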
18
Query Example
Drill-through({ p | p intersects q })
[Figure: patterns p and q plotted over axes Age and Salary]
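The query composes a pattern base selection (patterns that intersect q) with a drill-through into the data. A minimal sketch, assuming that all patterns are disks, that "intersects" means the disks overlap, and that evaluation is approximate; the disks and EMP tuples below are invented.

```python
import math

# Hypothetical disk patterns: pid -> (center Age, center Salary, radius).
disks = {"p1": (25, 1200, 5), "p2": (60, 3000, 4)}
q = (30, 1202, 3)

# Hypothetical EMP tuples as (Age, Salary) pairs.
emp = [(24, 1201), (61, 3001), (45, 2000)]


def intersects(a, b):
    """Two disks overlap when their centers are at most r_a + r_b apart."""
    return math.hypot(a[0] - b[0], a[1] - b[1]) <= a[2] + b[2]


def contains(disk, point):
    """Disk formula: the point lies within the radius of the center."""
    return math.hypot(point[0] - disk[0], point[1] - disk[1]) <= disk[2]


# { p | p intersects q }
selected = [d for d in disks.values() if intersects(d, q)]

# Drill-through: tuples represented by at least one selected pattern.
result = [t for t in emp if any(contains(d, t) for d in selected)]
```

Only p1 overlaps q here, so the tuple near p2 is excluded even though p2 represents it.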
19
Outline
Introduction
Modeling of data and patterns
Query operators
Summary and future work
20
Summary
Formal specification of basic PBMS
concepts
Investigation of the representation of the
pattern-data relation
Formal definition of query operators
21
Future Work
Query language
Generic similarity measures
Efficient implementation of intermediate
mappings
Statistical measures for the patterns
22