Modeling and Language Support for the management of PBMS
Manolis Terrovitis
Panos Vassiliadis
Spiros Skiadopoulos
Elisa Bertino
Barbara Catania
Anna Maddalena
1
Outline
Introduction
Modeling of data and patterns
Query operators
Summary and future work
2
Motivation
Huge amounts of data are produced.
Interesting knowledge has to be detected
and extracted.
Knowledge extraction techniques (i.e.,
Data Mining) are not sufficient:
Huge amounts of results (clusters, association
rules, decision trees, etc.)
Arbitrary modeling of results
3
Motivation (cont'd)
We need to be able to manipulate the
knowledge discovered!
The basic requirements:
A generic and homogeneous model for patterns.
Well-defined query operators.
Efficient storage.
4
The Patterns and PBMS
[Rizzi et al., ER 2003]
Patterns are compact, semantically rich
representations of raw data.
Clusters, association rules, decision trees, etc.
Pattern Base Management System
Patterns are treated as first class citizens
Pattern-based queries
Approximate mapping between patterns and
raw data
5
Contributions
We formally define the logical foundations
for pattern management
We present a pattern specification
language
We introduce queries and query operators
6
Outline
Introduction
Modeling of data and patterns
Query operators
Summary and future work
7
PBMS architecture
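Pattern Space: Pattern Types, Pattern Classes, Patterns
Data Space: DB1, DB2
[Figure: PBMS architecture. Labels recovered from the diagram: patterns are linked to pattern types ("instance of") and to pattern classes ("member of"); data mining and pattern recognition algorithms connect the Data Space to the Pattern Space; intermediate results and intermediate mappings sit between the two spaces.]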
8
The patterns
Patterns hold information for:
the data source
the structure of the pattern
the relation between the structure and the
source, expressed as an approximate logical formula.
9
Pattern - Cluster Example
Pid: 337
Structure: [CENTER: [X: 21, Y: 1200], RAD: 12]
Data: EMP: {[Age, Salary]}
Formula: $(t.Age - 21)^2 + (t.Salary - 1200)^2 \leq 12^2$, where $t \in EMP$
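The cluster pattern above can be sketched in Python as a small record whose formula is an ordinary predicate. This is only an illustration: the class name `DiskPattern`, its field names, and the sample `emp` tuples are assumptions and do not come from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiskPattern:
    """Hypothetical in-memory form of the cluster pattern (illustrative only)."""
    pid: int
    center_x: float  # CENTER.X, matched against Age
    center_y: float  # CENTER.Y, matched against Salary
    rad: float       # RAD

    def formula(self, age: float, salary: float) -> bool:
        # (t.Age - X)^2 + (t.Salary - Y)^2 <= RAD^2
        return (age - self.center_x) ** 2 + (salary - self.center_y) ** 2 <= self.rad ** 2


# Pattern 337 from the slide.
p337 = DiskPattern(pid=337, center_x=21, center_y=1200, rad=12)

# Made-up EMP tuples (Age, Salary), used only to exercise the predicate.
emp = [(21, 1200), (25, 1205), (40, 3000)]
covered = [t for t in emp if p337.formula(*t)]
```

Here `covered` keeps the first two tuples, which fall inside the disk, and drops the third.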
10
Pattern type - example
Name: Disk
Structure Schema: [CENTER: [X: real, Y: real], RAD: real]
Data Schema: REL: {[X: real, Y: real]}
Formula Schema: $(t.X - CENTER.X)^2 + (t.Y - CENTER.Y)^2 \leq RAD^2$, where $t \in REL$
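One way to read the relationship between a pattern type and its patterns is as a schema plus an instantiation step. The sketch below assumes a flattened structure (`CENTER_X`, `CENTER_Y`, `RAD`) and uses hypothetical names (`PatternType`, `Pattern`, `instantiate`); it is not the specification language presented in the talk.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class PatternType:
    """Hypothetical pattern type: a name, two schemas, and a formula schema."""
    name: str
    structure_schema: Dict[str, type]
    data_schema: Dict[str, type]
    formula_schema: Callable[[dict, dict], bool]  # (tuple t, structure) -> bool

    def instantiate(self, pid: int, structure: dict) -> "Pattern":
        # Reject structures that do not provide every field of the schema.
        missing = set(self.structure_schema) - set(structure)
        if missing:
            raise ValueError(f"missing structure fields: {sorted(missing)}")
        return Pattern(pid, self, dict(structure))


@dataclass
class Pattern:
    pid: int
    ptype: PatternType
    structure: dict

    def formula(self, t: dict) -> bool:
        # Bind the type's formula schema to this pattern's structure values.
        return self.ptype.formula_schema(t, self.structure)


# The Disk type, with the nested CENTER flattened for brevity (an assumption).
DISK = PatternType(
    name="Disk",
    structure_schema={"CENTER_X": float, "CENTER_Y": float, "RAD": float},
    data_schema={"X": float, "Y": float},
    formula_schema=lambda t, s: (t["X"] - s["CENTER_X"]) ** 2
    + (t["Y"] - s["CENTER_Y"]) ** 2 <= s["RAD"] ** 2,
)

p = DISK.instantiate(337, {"CENTER_X": 21, "CENTER_Y": 1200, "RAD": 12})
```

Calling `p.formula({"X": 21, "Y": 1200})` evaluates the Disk formula schema with the values of pattern 337 substituted in.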
11
The formula
An intensional description of the pattern-data relation
Pros:
Efficiency, more intuitive results
Cons:
Loss of accuracy
12
Intensional vs. Extensional
[Figure: intensional vs. extensional representation, plotted on Age and Salary axes]
13
The formula (cont'd)
The formula is a predicate:
$f_p(x, y)$, where $x \in Source$, $y \in Structure$
Expressiveness.
Functions and predicates
Safety.
Range restriction.
Queries employing the formula are n-depth
domain independent.
14
Outline
Introduction
Modeling of data and patterns
Query operators
Summary and future work
15
Query Operators
Query operator classes:
Database operators
Pattern Base operators
Crossover database operators
Crossover pattern base operators
16
Crossover Operators
Exact evaluation, via the intermediate mappings
Approximate evaluation, via the formula
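[Figure: the Data Space and the Pattern Space. A pattern's components (PID, data, formula, structure) are shown; links between the two spaces are labeled "Exact" (intermediate mappings) and "Approximation" (formula).]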
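The difference between the two evaluation modes can be illustrated with a toy example. The sketch assumes that intermediate mappings are stored as a dictionary from pattern id to the row ids assigned by the mining algorithm; that storage layout, the function names, and the sample rows are all hypothetical.

```python
def exact_drill_through(pid, mappings, table):
    """Exact evaluation: follow the stored pattern-to-row mappings."""
    return {rid: table[rid] for rid in mappings.get(pid, ())}


def approx_drill_through(formula, table):
    """Approximate evaluation: keep every row that satisfies the formula."""
    return {rid: row for rid, row in table.items() if formula(row)}


# Made-up EMP rows keyed by row id: (Age, Salary).
table = {1: (21, 1200), 2: (25, 1205), 3: (32, 1203), 4: (40, 3000)}

# Assumed intermediate mappings: the mining step put rows 1 and 2 in cluster 337.
mappings = {337: {1, 2}}


def formula_337(row):
    # Formula of pattern 337: disk centered at (21, 1200) with radius 12.
    age, salary = row
    return (age - 21) ** 2 + (salary - 1200) ** 2 <= 12 ** 2


exact = exact_drill_through(337, mappings, table)
approx = approx_drill_through(formula_337, table)
```

In this toy data, `approx` also contains row 3, which lies inside the disk but was not assigned to the cluster. The formula needs no stored mappings, but its answer can differ from the exact one.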
17
Crossover Operators
Database
Drill-Through: Which data are represented by
these patterns?
Data-Covering: Which data from this dataset
can be represented by this pattern?
Pattern Base
Pattern-Covering: Which of these patterns
represent this dataset?
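The three crossover operators can be sketched with approximate (formula-based) evaluation. The function signatures below are assumptions, and so is the reading of Pattern-Covering as "every tuple of the dataset satisfies the pattern's formula"; the talk's formal definitions may differ.

```python
def disk(cx, cy, r):
    """Build the formula of a disk pattern as a predicate over (x, y) tuples."""
    return lambda t: (t[0] - cx) ** 2 + (t[1] - cy) ** 2 <= r ** 2


def data_covering(formula, dataset):
    """Which data from this dataset can be represented by this pattern?"""
    return [t for t in dataset if formula(t)]


def drill_through(formulas, dataset):
    """Which data are represented by (at least one of) these patterns?"""
    formulas = list(formulas)
    return [t for t in dataset if any(f(t) for f in formulas)]


def pattern_covering(named_formulas, dataset):
    """Which of these patterns represent this dataset?

    Assumption: a pattern represents a dataset when all its tuples satisfy
    the pattern's formula.
    """
    return [name for name, f in named_formulas.items()
            if all(f(t) for t in dataset)]


# Two hypothetical disk patterns and made-up (Age, Salary) tuples.
patterns = {"p": disk(21, 1200, 12), "q": disk(30, 1210, 12)}
emp = [(21, 1200), (28, 1205), (40, 3000)]

covered_by_p = data_covering(patterns["p"], emp)
drilled = drill_through(patterns.values(), emp)
covering = pattern_covering(patterns, emp[:2])
```

The first two operators return data, while Pattern-Covering returns patterns, which matches the Database / Pattern Base split on the slide.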
18
Query Example
Drill-through({ p | p intersects q })
[Figure: patterns p and q plotted on Age and Salary axes]
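The query above composes a pattern-base selection with a drill-through. A minimal sketch for disk patterns follows, assuming that two disks intersect when the distance between their centers is at most the sum of their radii and that drill-through is evaluated through the formula; the helper names and sample values are hypothetical.

```python
import math


def disk(cx, cy, r):
    """Hypothetical disk pattern: center (Age, Salary) and radius."""
    return {"center": (cx, cy), "rad": r}


def distance(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])


def intersects(p, q):
    # Two disks overlap when their centers are no farther apart than r1 + r2.
    return distance(p["center"], q["center"]) <= p["rad"] + q["rad"]


def satisfies(p, t):
    # Disk formula: tuple t lies within the pattern's radius.
    return distance(p["center"], t) <= p["rad"]


def drill_through(patterns, dataset):
    # Approximate evaluation: keep tuples satisfying any selected pattern.
    return [t for t in dataset if any(satisfies(p, t) for p in patterns)]


q = disk(30, 1210, 10)
pattern_base = {"p1": disk(21, 1200, 12), "p2": disk(60, 3000, 5)}
emp = [(21, 1200), (28, 1205), (61, 3001)]  # made-up (Age, Salary) tuples

# { p | p intersects q }
selected = [p for p in pattern_base.values() if intersects(p, q)]
result = drill_through(selected, emp)
```

Only `p1` overlaps `q`, so the drill-through returns the tuples inside `p1` and ignores the one that belongs to `p2`.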
19
Outline
Introduction
Modeling of data and patterns
Query Operators
Summary and future work
20
Summary
Formal specification of basic PBMS
concepts
Investigation of the representation of the
pattern-data relation
Formal definition of query operators
21
Future Work
Query language
Generic similarity measures
Efficient implementation of intermediate
mappings
Statistical measures for the patterns.
22