A GRID-distributed XML database Giorgio Ghelli Pisa University


A GRID-distributed XML database
Giorgio Ghelli
Pisa University
The project
• (Small) part of GRID.it
• Started very modestly: “let us generalize
the GRIS-GIIS structure with an XML data
model”
• Now looking a bit like P2P XML DB
• Still in a naive phase
• Feedback wanted!
GALT 03, Edinburgh
GRID XML Database
The vision
• A system where everybody builds their own
repository, decides to join a
community, connects to the GRID,
and it just works
• Canonical application: resource
description and discovery
• Challenges:
– No administrative burden
– Dynamism and autonomy
Assumptions
• Each piece of data belongs to a node
(may be replicated)
• Nodes come and go
• Node adherence to a known schema
is good enough
• Nodes are not malicious
• XML as a data model, subset of
XQuery as the language
Aims
• High node autonomy (except for the
protocol)
• No administration
• Scalability
• Resilience
The general idea
• An overlay network with a dynamic
hierarchical structure (peers and
super-peers)
• A peer receives a query, asks a
super-peer for an access plan, and executes
the access plan
• No answer from a node = empty
query result from that node
• (Updates are local)
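The rule "no answer = empty result" can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the access plan is modelled as a plain list of node identifiers, nodes are modelled as Python callables, and `execute_plan` is a hypothetical name, not part of the project.

```python
def execute_plan(plan, query, nodes):
    """Run `query` on every node listed in the access plan.

    `plan`  : list of node ids, as a super-peer might return it (assumed shape).
    `nodes` : dict mapping node id -> callable(query) -> list of results.
    A node that has left the network, times out, or refuses the connection
    contributes an empty result instead of failing the whole query.
    """
    results = []
    for node_id in plan:
        fetch = nodes.get(node_id)
        if fetch is None:
            continue  # node has left: treated as an empty answer
        try:
            results.extend(fetch(query))
        except (TimeoutError, ConnectionError):
            continue  # no answer: treated as an empty answer
    return results


# Hypothetical nodes: "a" answers, "b" never answers, "c" is gone.
def node_a(query):
    return ["<host name='a'/>"]


def node_b(query):
    raise TimeoutError("node b did not answer")


nodes = {"a": node_a, "b": node_b}
answers = execute_plan(["a", "b", "c"], "//host", nodes)
```

The query still completes with the answers of the reachable nodes, which is what lets nodes come and go without any administrative step.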
The challenge
• Query routing (with no central
schema)
– Broadcast: too many messages
– Sequential scan: too much time
– Distributed hashing: we prefer data to stay
where it belongs
• Peer clustering
We ignore, or postpone
• Forever:
– Schema integration
• For a while:
– Replication
– Security
Query routing
• Every node manages and publishes a
level-1 schema (synthetic representation
of a superset of its data: a type)
• Super-peers manage:
– A copy of the level-1 schemas of their sub-peers
– A summary level-2 schema
• Super-peers use level-i schemas to decide
who is involved in a query
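A rough sketch of schema-based routing follows. It uses a deliberate simplification that is not the formalism of the talk: each level-1 schema is reduced to a set of label paths, the level-2 summary is their plain union, and a query is a single label path. The names `summarize`, `may_answer`, and `route` are hypothetical.

```python
def summarize(level1_schemas):
    """Level-2 summary: here simply the union of all sub-peer path sets."""
    summary = set()
    for paths in level1_schemas.values():
        summary |= paths
    return summary


def may_answer(summary, query_path):
    """Can this super-peer's subtree contribute to the query at all?"""
    return query_path in summary


def route(query_path, level1_schemas):
    """Sub-peers whose level-1 schema admits the queried path."""
    return sorted(
        peer for peer, paths in level1_schemas.items() if query_path in paths
    )


# Hypothetical level-1 schemas published by three sub-peers.
sub_peers = {
    "p1": {"host/cpu", "host/memory"},
    "p2": {"host/cpu", "job/owner"},
    "p3": {"printer/queue"},
}
summary = summarize(sub_peers)
involved = route("host/cpu", sub_peers)
```

Because a schema describes a superset of a node's data, a match only means the node may hold answers; a miss means it certainly holds none, so the node can be skipped safely.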
Issues
• Schema formalism
• Schema management
• Super-peer communication protocol
Schema formalism
• XDuce-like, with intervals:
T ::= [v1…v2] | T,T | l[T] | T or T | X (guarded by l[…]) | T*
• Equivalent to unranked tree automata
• Subtyping and intersection emptiness are
decidable
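To make the grammar concrete, here is a small sketch that encodes the type constructors as Python dataclasses and checks whether a document belongs to a type. It only tests membership; it does not implement subtyping or intersection emptiness. The encoding is an assumption: documents are lists whose items are either integers (leaves) or `(label, children)` tuples, and type variables are resolved through an `env` dictionary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:  # [v1…v2]: a single integer leaf within the range
    lo: int
    hi: int


@dataclass(frozen=True)
class Seq:  # T,T
    first: object
    second: object


@dataclass(frozen=True)
class Elem:  # l[T]
    label: str
    content: object


@dataclass(frozen=True)
class Alt:  # T or T
    left: object
    right: object


@dataclass(frozen=True)
class Var:  # X, assumed to occur only under some l[…] in its definition
    name: str


@dataclass(frozen=True)
class Star:  # T*
    inner: object


def ends(t, forest, i, env):
    """Set of positions j such that forest[i:j] matches type t."""
    if isinstance(t, Interval):
        ok = i < len(forest) and isinstance(forest[i], int) and t.lo <= forest[i] <= t.hi
        return {i + 1} if ok else set()
    if isinstance(t, Elem):
        if i < len(forest) and isinstance(forest[i], tuple):
            label, children = forest[i]
            if label == t.label and len(children) in ends(t.content, children, 0, env):
                return {i + 1}
        return set()
    if isinstance(t, Seq):
        return {k for j in ends(t.first, forest, i, env)
                for k in ends(t.second, forest, j, env)}
    if isinstance(t, Alt):
        return ends(t.left, forest, i, env) | ends(t.right, forest, i, env)
    if isinstance(t, Star):
        reached, frontier = {i}, {i}
        while frontier:  # iterate until no new end position appears
            nxt = set()
            for j in frontier:
                nxt |= ends(t.inner, forest, j, env)
            frontier = nxt - reached
            reached |= frontier
        return reached
    if isinstance(t, Var):
        return ends(env[t.name], forest, i, env)
    raise TypeError(f"unknown type constructor: {t!r}")


def matches(t, forest, env):
    return len(forest) in ends(t, forest, 0, env)


# Hypothetical schema: Dir = dir[(size[[0…1000]] or Dir)*]
env = {"Dir": Elem("dir", Star(Alt(Elem("size", Interval(0, 1000)), Var("Dir"))))}
good = [("dir", [("size", [10]), ("dir", [("size", [999])])])]
bad = [("dir", [("size", [5000])])]
```

The guardedness condition is what keeps the recursion finite here: every unfolding of `Dir` first consumes a `dir` element and then continues on its strictly smaller list of children.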
Schema management
• Level-1 schemas either declared or
inferred from data
• Level-2 schemas have to be
synthesized:
– Take the union of level-1 schemas
– Simplify the union
– Trade off between precision and size?
• Management of schema freshness
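The precision/size trade-off can be illustrated on the interval part of the schemas alone. This is a sketch under assumptions, not the synthesis procedure of the project: a schema is reduced to a list of integer intervals, `union_intervals` takes the exact union, and `simplify` (a hypothetical name) shrinks the summary to at most `max_size` intervals by closing the smallest gaps first.

```python
def union_intervals(intervals):
    """Exact union of integer intervals, merging overlapping or adjacent ones."""
    merged = []
    for lo, hi in sorted(intervals):
        if merged and lo <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], hi))
        else:
            merged.append((lo, hi))
    return merged


def simplify(intervals, max_size):
    """Summary with at most `max_size` intervals that still covers the union.

    Each step closes the smallest gap between neighbouring intervals, so the
    result stays a superset of the data and only precision is lost.
    """
    if max_size < 1:
        raise ValueError("max_size must be at least 1")
    result = union_intervals(intervals)
    while len(result) > max_size:
        k = min(range(len(result) - 1),
                key=lambda j: result[j + 1][0] - result[j][1])
        result[k:k + 2] = [(result[k][0], result[k + 1][1])]
    return result


# Hypothetical level-1 intervals collected from several sub-peers.
level1 = [(1, 4), (3, 8), (20, 30), (100, 200)]
exact = union_intervals(level1)      # [(1, 8), (20, 30), (100, 200)]
coarse = simplify(level1, 2)         # [(1, 30), (100, 200)]
```

A smaller summary is cheaper to store and exchange, but it routes more queries to subtrees that turn out to hold no matching data.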
Super-peer communication protocol
• ?
Related work
• P2P systems: we assume each piece
of data lives in a fixed node
• Distributed DB / OGSA-DQP: we do
not want to assume that any node
knows the whole schema
Conclusions
• Are we tackling a meaningful
problem?
• So much work is still ahead…