Implementing Database Coordination in P2P Networks *

Download Report

Transcript Implementing Database Coordination in P2P Networks *

Implementing Database
Coordination in P2P Networks *
Ilya Zaihrayeu
SemPGRID-04, 18 May 2004, New York, USA
*
work with Fausto Giunchiglia
Why P2P Databases
• P2P data sharing: files … relational data?
• File sharing: KaZaa + Morpheus = more than 460 million
downloads (download.com, May 2004)
• P2P databases: academia testbeds so far..
• Promises: large-scale fault-tolerant multi-database system
with low start-up and maintenance costs, and high
“output” for an individual party
• Difficulties: data integration solutions are not applicable
due to centralized nature
• Challenges: new methodologies, theories and algorithms,
models, mechanisms and tools need to be developed
Why P2P Databases, cont’d
• Application: non performance critical domains, where local
autonomy of each party is essential
• Medical care scenario
– John is going for skiing and suffers an accident
– John is taken to local clinic for treatment – doctors need to know whether
John has contraindication against some drugs
– John does not know these details, but his database layer has a link to family
doctor’s databases
• Cooperating real estate agents example
– Agents coordinate their data to push sales
– When on the site of a customer who wants to sell, agent updates his
database and makes data available for other agents
– When on the site of a customer who may want to buy, agent shows details
from his database, and may query other agent’s databases
• Other examples: scientific databases (genomic data), tourism, etc
Data Coordination Model
• Interest Groups – group of peers able to answer queries about a
certain topic
– e.g., group topic – “Tourism in Trentino”, “Real Estate in Scotland”, etc
– each Interest Group has group manager (GM) which helps in maintenance of
the group
• Acquaintances – “known” nodes that contribute data
– acquaintance query – a query over the relations of an acquaintance which
results satisfy some local relation
• Correspondence Rules – solve heterogeneity problem at instance
level
– semantic heterogeneity at structure level is solved by acquaintance queries
• Coordination Rules – coordinate data (queries and updates) with
acquaintances
Interest Groups
• Help to cope with large
number of nodes by clustering
the network
• Nodes self-organize into
interest groups
• A node may form a child
interest group
• One node may belong to
multiple groups
• Use schema matching to
monitor group constitution
• GM is to support group
constitution, “talk” to other
GMs and provide information
about the group to newcomers
All topics
Arts
Movies
… Music
Lyrics
…
Shopping
…
Publications Computers
Books
Acquaintance query
• Acquaintance query is a conjunctive query:
• q(X) :- r1(X1), …, rn(Xn)
– q(X) – head, refers to local relation;
– r1(X1), …, rn(Xn) – subgols of the body, refers to the relation of an
acquaintance; and comparison predicates
– X, X1,…, Xn – variables or constants;
• E.g., P1: films (title, year, genre) :- P2: movie (title, year, director);
genres (title, genre); year>1995
C :- E,F
E
A loop
1
3
F
2
D :- I,G
A
B
C
B :- C,D
D
I
I :- A,B
4
F :- G
G
Correspondence Rules and
Coordination Rules
• Correspondence rules define how constants from
the local domain are translated into constants in
the domain of an acquaintance (forward
translation) and vice versa (backward translation)
– not necessarily symmetric, e.g. currency translation
• Coordination Rules’ goal is data coordination with
acquaintances and acquainted nodes
– activated by user (user query) or from the network
(network query, results, update)
Algorithmic notes
• Query answering algorithm
– Use acquaintance queries and correspondence rules to translate queries and
data
– Propagate to acquaintances if acquaintance queries are relevant
– Compute only new tuples, reconcile results
– Process loops in query propagation, define termination point (no propagation
using acquaintance queries that have been already used)
• “Getting acquainted” protocol
– Retrieve database schemas and then apply a matching operator on them
– Based on the matching results, generate (with help of user) acquaintance
queries, correspondence rules, tune up coordination rules
• Updates handling (work with E. Franconi, G. Kuper, A. Lopatenko)
– Data may go through a loop more than once, define termination point
Implementing P2P databases on top
of JXTA
• Benefits
– system platform, networking protocol independence
– IP-independence (location independence)
– gives basic blocks for building P2P applications
• We implement Interest Groups and Acquaintances in JXTA
• We encode database related functionalities into a set of custom
JXTA services (DB-related services)
DB-related services
Node-level services
Queries
handler
…
Group-level services
DB
Screening
operations service
…
GM
service
Architecture
User
User Interface (UI)
Nodes on
the
network
User-1
Database Manager (DBM)
User-2
JXTA Layer
Wrapper
PDBMS
SS
Source Database (SDB)
A node
A P2P database
network
User-n
Architecture, cont’d
User Interface (UI)
DBM
Query Planner
P2P Management
Coordination Rules
Query Propagation
Updates Handler
Acquaintance queries
Acquaintances
Results Handler
Correspondence Rules
Services
GM in-pipe adv
DB-related services
JXTA Core
Services
JXTA Layer
Wrapper
Pipes
Peer Groups
Advertisements
Gr. topic
SS
Peer
Gr. Adv
Peer
Adv
Pipe
Adv
In
Out
Discovery
Demo: toy databases and topology
Rendezvous
peer
Relations:
(1)
(2)
(3)
(4)
[1,2]
Movie (title, year, genre)
Credits (name, title, role)
Movie2 (title, year, director)
Genre (title, genre)
(2:-2) [2]
1
(1:-1)
Q
0
[1,2]
(1:-3,4)
3
(2:-2)
(2:-2)
(4:-1)
(3:-3)
2
4
[2,3,4]
(4:-4)
Mediator
peer
[3]
[4]
5
Query example 1
“List titles of movies featuring Tom
Hanks”
[1,2]
(2:-2) [2]
1
Q(t) :- Credits (n,t,r); n=“Tom Hanks”
(1:-1)
Q
0
[1,2]
(1:-3,4)
3
(2:-2)
(2:-2)
(4:-1)
(2:-2)
(3:-3)
2
[3]
4
[2,3,4]
(4:-4)
[4]
5
Query example 2
“Titles of drama movies issued after
1995”
[1,2]
(2:-2) [2]
1
Q(t) :- Movie (t,y,g); g=“Drama”;
y>1995;
(1:-1)
Q
0
[1,2]
(1:-3,4)
3
(2:-2)
(2:-2)
(4:-1)
(3:-3)
2
[3]
4
[2,3,4]
(4:-4)
[4]
5
Query example 3
“Names of actors playing in action
movies in 2003”
[1,2]
(2:-2) [2]
1
Q(n) :- Movie (t,y,g); Credits (n,t,r);
r=“Actor”; g=“Action”; y=2003;
(1:-1)
Q
0
[1,2]
(1:-3,4)
3
(2:-2)
(2:-2)
(4:-1)
(3:-3)
2
[3]
4
[2,3,4]
(4:-4)
[4]
5
References
• F. Giunchiglia and I. Zaihrayeu. Making peer databases interact - a
vision for an architecture supporting data coordination. 6th
International Workshop on Cooperative Information Agents (CIA2002), Madrid, Spain, September 18 -20, 2002.
• P. Bernstein, F. Giunchiglia, A. Kementsietsidis, J. Mylopoulos, L.
Serafini, and I. Zaihrayeu, “Data management for peer-to-peer
computing: A vision,” WebDB, 2002.
• A. Halevy, Z. Ives, D. Suciu, and I. Tatarinov, “Schema mediation in
a peer data management system,” ICDE, 2003.
• V. Kantere, I. Kiringa, J. Mylopoulos, A. Kementsietsidis, and M.
Arenas, “Coordinating peer databases using ECA rules,” DBISP2P,
September 2003.
• Enrico Franconi, Gabriel Kuper, Andrei Lopatenko, Ilya Zaihrayeu
(2004). The coDB Robust Peer-to-Peer Database System. Proc. of
the 2nd Workshop on Semantics in Peer-to-Peer and Grid
Computing (SemPGrid'04), 2004
• JXTA project, see http://www.jxta.org
Announcement
Submission deadline: 30 June, 2004
www.p2pkm.org
Thank you