Data Management for P2P Computing: A Vision
Download
Report
Transcript Data Management for P2P Computing: A Vision
Data Management for
Peer-to-Peer Computing:
A Vision
Philip A. Bernstein
Microsoft Research
Fausto Giunchiglia
Univ. of Trento
Anastasios Kementsietsidis Univ. of Toronto
John Mylopoulos
Univ. of Toronto
Luciano Serafini
Univ. of Trento
Ilya Zaihrayeu
Univ. of Trento
May 28, 2002
P2P Databases
1
A Peer to Peer (P2P) Research Project
Database and AI researchers intermittently connect to
exchange research ideas … about P2P DBs
Seattle Toronto
May 28, 2002
Trento (Italy)
P2P Databases
2
Is There a Role for P2P DBs ?
Peers come and go, but must still be able to interoperate.
To us, the big question is how to cope with DBs that
are incomplete, overlapping, and mutually inconsistent
dynamically appear and disappear
have limited connectivity.
Scenario
Databases of medical patients
Complete integration is likely to be infeasible
But dynamic integration of DBs relevant to one patient
could have high value.
May 28, 2002
P2P Databases
3
Contributions
Why P2P databases are different
A P2P database scenario
A logic for P2P databases
Architecture and implementation issues
May 28, 2002
P2P Databases
4
A Model for P2P Databases
Each peer is a node with a database. It exchanges data
and services with acquaintances (i.e. other peers).
The set of acquaintances changes often, due to
site availability
changing usage patterns
Peers are fully autonomous.
No global control or central server.
Hence no global schema
It would be impractical to build one for each peer
And it might be impossible because of inconsistencies
May 28, 2002
P2P Databases
5
A Motivating Scenario
A patient may be described
in several DBs, which use
different patient id formats,
disease descriptions, etc.
When a patient is admitted
to the hospital, H becomes
acquainted with D
D: Doctor
P: Pharmacist
H: Hospital
The acquaintance is dropped
when treatment is over
When the doctor prescribes a
drug, D becomes acquainted
with P
A patient is injured skiing, so
more DBs get involved
May 28, 2002
Ski Clinic
P2P Databases
6
Proposal: Local Relational Model (LRM)
A logic for P2P data integration
Instead of a global schema, each peer has
coordination formulas – each specifies semantic
interdependencies between two acquaintances
binary domain relations – each specifies how
symbols in one database translate to symbols
in an acquaintance’s database.
Each expression in a coordination formula is relative
to just one participating database
Use coordination formulas and domain relations for
query and update processing.
May 28, 2002
P2P Databases
7
A Coordination Formula
p: pharmacist DB
medication(PrescriptionID, Pid, Prod)
d: doctor DB
treatment(Tid, Pid, Description, Type)
where type {“hospital”, “home”}.
(i:x).A(x) means
for all v in the domain of database i, A(v) is true.
A coordination formula
(p:y).(p:z).(p: (x).medication(x, y, z)
d: (w).treatment(w, y, z, “home”) )
“There’s a row in treatment in the doctor DB
for each row in medication in the pharmacist DB”
May 28, 2002
P2P Databases
8
Domain Relation
A row <d1,d2> in domain relation rik specifies that
value d1 in DBi corresponds to value d2 in DBk
rik may be partial
rik,rki need not be symmetric
Example - DBi contains lengths in meters and
DBk in kilometers (total but not symmetric)
rik(x) = roundToClosestK(x)
rik(653)=1, rik(453)=0
rki(x) = x*1000
rki(1)=1000
May 28, 2002
P2P Databases
9
Queries
A query is a coordination formula of the form
A(x) i: q(x), where
A(x) is a coordination formula
x has n variables
i is the database against which the query is posed
q is a new n-ary predicate symbol
A relational space is a pair <db,r> where db is a set of
DBs and r associates an rik with each pair of DBs
<db,r> ⊨ f A relational space <db,r> satisfies a
coordination formula f
The answer to a query:
n
{ddomi | <db,r> ⊨ ((i:x).A(x) i:x=d)}
May 28, 2002
P2P Databases
10
Interpreting a Query
A query
((i:P(x) j:R(y)) k:S(x,y) ) h: q(x,y)
Evaluate P,R,S in i,j,k (respectively)
Map these results via rih,rjh,rkh to sets si,sj,sk
And then compute ((si sj) sk)
May 28, 2002
P2P Databases
11
View-Based Data Integration
The standard model for data integration is based on
Global as view
Local as view
How does this relate to LRM?
Could use LRM to express view definitions
Either map “standard” view defns to LRM, or
Specify a restricted LRM for view definitions
Could customize LAV/GAV query interpretation for such LRM
views and queries
This is work in progress.
May 28, 2002
P2P Databases
12
Implementation Architecture
A classic multi-database system, with
A protocol for adding/dropping acquaintances
LRM query processing (domain mapping logic) that can
cope with chains of acquaintances
Dynamic approach to materialized view creation
Tools to help a user establish an acquaintance (e.g. yellow
pages, defining domain relations & coord formulas, ...)
User Interface
Query Mgr
LRM layer
A Node
May 28, 2002
Update Mgr
Wrapper
P2P
Network
Local Information Source
P2P Databases
13
Summary
Why P2P databases are different
A P2P database scenario
A logic for P2P databases (LRM)
Coordination formulas and domain relations
Query semantics
Architecture and implementation issues
May 28, 2002
P2P Databases
14
Theoretical Results
Provide inference rules for coordination formulas
Prove that the rules are sound and complete.
Define a generalized relational theory as a theory with
domain closure, distinct domain values, and a finite number
of possible relation extensions (CWA).
Define relational multi-context system <T,R> as a family
of relational languages (one per database) with a
generalized relational theory (in T) and a set of
coordination formulas (in R) for each one.
Prove that for any relational multi-context system, there’s a
unique maximal relational space that satisfies it.
(Generalizes Reiter’s result on CWA.)
Other results on recursive queries and query evaluation.
May 28, 2002
P2P Databases
15