What is P2P?
Download
Report
Transcript What is P2P?
Digital Enterprise Research Institute
Introduction to Peer-to-Peer
Networks
Manfred Hauswirth, Marcel Karnstedt
Copyright 2009 Digital Enterprise Research Institute. All rights reserved.
www.deri.ie
Goals of the Tutorial
Digital Enterprise Research Institute
Position the P2P paradigm
in the design space of
distributed systems
Get an overview of P2P
systems and the underlying
concepts
Understand the problem of
decentralized data
management in P2P
systems
www.deri.ie
What is P2P?
Digital Enterprise Research Institute
Clay Shirkey (The Accelerator Group):
“Peer-to-peer is a class of applications that take advantage of
resources—storage, cycles, content, human presence—
available at the edges of the Internet. Because accessing
these decentralized resources means operating in an
environment of unstable connectivity and unpredictable IP
addresses, peer-to-peer nodes must operate outside the DNS
and have significant or total autonomy of central servers.”
P2P “litmus test:”
– Does it allow for variable connectivity and temporary network
addresses?
– Does it give the nodes at the edges of the network significant
autonomy?
www.deri.ie
P2P ~ an application-level Internet on top of the Internet
P2P in a historical Context
Digital Enterprise Research Institute
www.deri.ie
The original Internet was designed as a P2P system
any 2 computers could send packets to each other
– no firewalls / no network address translation
– no asymmetric connections (V.90, ADSL, cable, etc.)
the back-then “killer apps” FTP and telnet are C/S but anyone
could telnet/FTP anyone else
servers acted as clients and vice versa
cooperation was a central goal and “value”: no spam or
exhaustive bandwidth consumption
Typical examples of “old-fashioned P2P”:
Usenet News
DNS
The emergence of P2P can be seen as a renaissance
of the original Internet model
What is P2P?
Digital Enterprise Research Institute
Every participating node acts as both
a client and a server (“servent”)
Every node “pays” its participation by
providing access to (some of) its
resources
Properties:
no central coordination
no central database
no peer has a global view of the
system
global behavior emerges from local
interactions
all existing data and services are
accessible from any peer
peers are autonomous
peers and connections are unreliable
www.deri.ie
Where is P2P – System layers ?
Digital Enterprise Research Institute
Users
uses
E-commerce systems can be P2P
or centralized
QoS
Application
Information management
User
Commerce and society is P2P
Application layer
www.deri.ie
Directories and databases can be
P2P or centralized
exploits
Information
Management
Networks are P2P
QoS
Internet
exploits
Qos
Network
Types of P2P Systems
Digital Enterprise Research Institute
E-commerce systems
File sharing systems
Napster, Gnutella, Freenet, …
Distributed Databases
eBay, B2B market places, B2B integration servers, …
Mariposa [Stonebraker96], …
Networks
Arpanet
Mobile ad-hoc networks
www.deri.ie
How much P2P is involved?
Digital Enterprise Research Institute
www.deri.ie
P2P user
interaction
P2P application P2P
information
management
eBay
yes
no
no
Napster
yes
yes
no
Gnutella,
Freenet
yes
yes
yes
Related Approaches
Digital Enterprise Research Institute
Related distributed information system approaches:
Event-based systems
Push systems
Mobile agents
Distributed databases
www.deri.ie
Event-based (publish/subscribe)
Digital Enterprise Research Institute
www.deri.ie
System model
Components (peers) interact by generating and receiving events
Components declare interest in receiving specific (patterns of)
events and are notified upon their occurrence
Supports a highly flexible interaction between loosely-coupled
components
XY
Subscribe to
X followed by Y
Y
X
Event-based vs. Peer-to-Peer
Digital Enterprise Research Institute
Common properties:
symmetric communication style
dynamic binding between producers and consumers
Subscription to events ~ “passive” queries
EB: notification
P2P: active discovery
Subscription language supports more sophisticated
queries and pattern matching (event patterns with
time dependencies)
Event-based systems typically have a specialized
event distribution infrastructure
EB: 2 node types, P2P: 1 node type
EB infrastructure must be deployed
www.deri.ie
Push Systems
Digital Enterprise Research Institute
A set of designated
broadcasters offer information
that is pre-grouped in channels
(weather, news, etc.)
Receivers subscribe to channels
of their interest and receive
channel information as it is
being “broadcast” (timely
distribution)
Receivers may have to pay prior
to receiving the information
(pay-per-view, flat fee, etc.)
Pull push
www.deri.ie
Push Systems vs. Peer-to-Peer
Digital Enterprise Research Institute
www.deri.ie
Asymmetric communication style (P2P: symmetric)
Focus is on timely data distribution not on discovery
Filtering may be deployed to reduce data
transmission requirements
Subscription to channels is prerequisite
Producer/consumer binding is static
Push systems require a specialized distribution
infrastructure
Push: 3 node types, P2P: 1 node type
Push infrastructure must be deployed
Mobile Agents
Digital Enterprise Research Institute
A mobile agent is a computational entity
that moves around in a network at its
own volition to accomplish a task on
behalf of its owner
can cooperate with other agents
“learns” (“Whom to visit next?”)
Mobility (heterogeneous network!)
Weak: code, data
Strong: code, data, execution Stack
www.deri.ie
Mobile Agents vs. Peer-to-Peer
Digital Enterprise Research Institute
www.deri.ie
Very similar in terms of search and navigation
P2P: the peers propagate requests (search, update)
MA: the nodes propagate the agents
Mobile agent ~ “active” query
Mobile agent systems require a considerably more
sophisticated environment
mobile code support (heavy)
security (protect the receiving node from malicious mobile
agents and vice versa)
In many domains P2P systems can take over
more apt for distributed data management
less requirements (sending code requires much
bandwidth, security, etc.)
Distributed Databases
Digital Enterprise Research Institute
www.deri.ie
Fragmenting large databases (e.g., relational) over
physically distributed nodes
Efficient processing of complex queries (e.g., SQL)
by decomposing them
Efficient update strategies (e.g., lazy vs. eager)
Consistent transactions (e.g., 2 phase commit)
Normally approaches rely on central coordination
Distributed Databases vs. Peer-to-Peer
Digital Enterprise Research Institute
Data distribution is a key issue for P2P systems
Approaches in distributed DB that address
scalability
LH* family of scalable hash index structures [Litwin97]
Snowball: scalable storage system for workstation clusters
[Vingralek98]
Fat-Btree: a scalable B-Tree for parallel DB [Yokota 9]
Approaches in distributed DB that address
autonomy (and scalability)
www.deri.ie
Mariposa: distributed relational DBMS based on an
underlying economic model [Stonebraker96]
P2P Data Management has to address both
scalability and autonomy
Usage Patterns to Position P2P
Digital Enterprise Research Institute
www.deri.ie
Discovering information is the predominant problem
Occasional discovery: search engines
P2P, MA
Notification: event-based systems
push
notification for (correlated) events (event patterns)
E.g., notify me when my stocks drop below a threshold
Systematic discovery: P2P systems
ad hoc requests, irregular
E.g., new town — where is the next car rental?
search engines, MA
find certain type of information on a regular basis
E.g., search for MP3 files of Jethro Tull regularly
Continuous information feed: push systems
event-based
subscription to a certain information type
E.g., sports channel, updates are sent as soon as available
The Interaction Spectrum
Digital Enterprise Research Institute
Event-based systems
Push systems
passive
www.deri.ie
Mobile agents
Peer-to-peer systems
active
Peer-to-Peer vs. C/S and Web
Digital Enterprise Research Institute
www.deri.ie
Client-Server
Peer-to-Peer
Sessionbased
Web-based
tight
loose
very loose
asymmetric
asymmetric
symmetric
Number of
Clients
moderate
(1000)
high
(1,000,000)
high (1,000,000)
Number of
Servers
few (10)
many
(100,000)
none (0)
Coupling
Comm.
Style
Coupling vs. Scalability
Digital Enterprise Research Institute
Coupling
www.deri.ie
session-based
push-based
event-based
web-based
peer-to-peer
Scalability
P2P System Models
Digital Enterprise Research Institute
www.deri.ie
Centralized model
global index held by a central authority
(single point of failure)
direct contact between requestors and providers
Example: Napster
Decentralized model
Examples: Freenet, Gnutella
no global index, no central coordination, global behavior emerges
from local interactions, etc.
direct contact between requestors and providers (Gnutella) or
mediated by a chain of intermediaries (Freenet)
Hierarchical model
introduction of “super-peers”
mix of centralized and decentralized model
Centralized Information Systems
Digital Enterprise Research Institute
Web search engine
www.deri.ie
Client
Global scale application
Client
Example: Google
150 Mio searches/day
1-2 Terabytes of data
Client
(April 2001)
Result
Find
home page of Karl Aberer …
"aberer"
Client
Client
1
2
Google
Server
Client
Client
Strengths
Global ranking
Fast response time
Weaknesses
Client
Client
Client
Google: 15000 servers
Infrastructure, administration, cost
A new company for every global application ?
(Semi-)Decentralized Information Systems
Digital Enterprise Research Institute
P2P Music file sharing
www.deri.ie
Global scale application
Peer
Example: Napster
1.57 Mio. Users
10 TeraByte of data
(2 Mio songs, 220 songs per user)
(February 2001)
Find
Request
and transfer
file
<title> "brick
in the wall"
Result
f.mp3
<artist>
you
find"pink
f.mp3floyd"
at peer x
from
peer
X
directly
<size> "1 MB"
schema
<category> "rock"
PeerX
Peer
3
Peer
Peer
1
2
Napster
Server
Peer
Peer
Peer
Peer
Peer
Napster: 100 servers
Lessons Learned from Napster
Digital Enterprise Research Institute
www.deri.ie
Strengths: Resource Sharing
Every node “pays” its participation by providing access to its resources
– physical resources (disk, network), knowledge (annotations), ownership (files)
Every participating node acts as both a client and a server (“servent”): P2P
global information system without huge investment
decentralization of cost and administration = avoiding resource bottlenecks
Weaknesses: Centralization
server is single point of failure
unique entity required for controlling the system = design bottleneck
copying copyrighted material made Napster target of legal attack
increasing degree of resource sharing and decentralization
Centralized
System
Decentralized
System
Fully Decentralized Information Systems
Digital Enterprise Research Institute
P2P file sharing
www.deri.ie
•
Strengths
Global scale application
Example: Gnutella
– Good response time, scalable
– No infrastructure, no administration
– No single point of failure
40.000 nodes, 3 Mio files • Weaknesses
(August 2000)
– High network traffic
– No structured search
– Free-riding
I have
Find
"brick_in_the_wall.mp3"
"brick in the wall"
….
Self-organizing System
Gnutella: no servers
Self-Organization
Digital Enterprise Research Institute
www.deri.ie
Self-organized systems well known from physics,
biology, cybernetics
distribution of control ( = decentralization = symmetry in
roles = P2P)
local interactions, information and decisions
emergence of global structures
failure resilience
P2P Architectures
Digital Enterprise Research Institute
www.deri.ie
Principle of self-organization can be applied at
different system layers
Networking
Layer
Internet
Routing
TCP/IP,
DNS
Data Access
Layer
Overlay
Networks
Resource
Location
Gnutella,
FreeNet
Service Layer
P2P
applications
Messaging,
Distributed
Processing
Napster,
Seti,
Groove
User Layer
User
Communities
Collaboration
eBay, Ciao
Original Internet designed as decentralized system:
P2P overlay networks ~ application-level Internet on top of
the Internet
support application-specific addresses
Resource Location in P2P Systems
Digital Enterprise Research Institute
www.deri.ie
Problem: Peers need to locate distributed information
Peers with address p store data items d that are identified by a key kd
Given a key kd (or a predicate on kd) locate a peer that stores d,
i.e. locate the index information (kd, p)
Thus, the data we have to manage consists of the key-value pairs (kd, p)
Can such a distributed database be maintained and accessed by a
set of peers without central control ?
P2
P3
P1
P8
kd="jingle-bells"
p="P8"
d="jingle-bells.mp3""
kd ="jingle" ?
P4
("jingle",P8)
P7
P5
P6
Resource Location Problem
Digital Enterprise Research Institute
Operations
search for a key at a peer: p->search(k)
update a key at a peer: p->update(k,p')
peers joining and leaving the network: p->join(p’)
Performance Criteria (for search)
search latency:
e.g. searchtime(query) Log(size(database))
message bandwidth, e.g. messages(query) Log(size(database))
messages(update) Log(size(database))
storage space used, e.g. storagespace(peer) Log(size(database))
resilience to failures (network, peers)
Qualitative Criteria
complex search predicates: equality, prefix, containment, similarity
search
use of global knowledge
peer autonomy
peer anonymity and trust
security (e.g. denial of service attacks)
www.deri.ie
Summary
Digital Enterprise Research Institute
What is a P2P System ?
What is emergence ?
At which layers can the P2P architecture occur ?
How do we define efficiency for a P2P resource
location system ?
www.deri.ie
Unstructured P2P Overlay Networks
Digital Enterprise Research Institute
No index information is used
www.deri.ie
i.e. the information (k, p) is only available directly from p
Simplest approach: Message Flooding (Gossiping)
send query message to C neighbors
messages have limited time-to-live TTL
messages have IDs to eliminate cycles
k="jingle-bells"
Example: C=3, TTL=2
Gnutella
Digital Enterprise Research Institute
Developed in a 14 days “quick hack” by Nullsoft (winamp)
Originally intended for exchange of recipes
Evolution of Gnutella
www.deri.ie
Published under GNU General Public License on the Nullsoft web server
Taken off after a couple of hours by AOL (owner of Nullsoft)
This was enough to “infect” the Internet
Gnutella protocol was reverse engineered from downloaded versions of
the original Gnutella software
Third-party clients were published and Gnutella started to spread
Based on message flooding
Typical values C=4, TTL=7
TTL
One request leads to 2 * C * (C 1)i 26,240 messages
i 0
Hooking up to the Gnutella systems requires that a new peer knows at
least one Gnutella host (gnutellahosts.com:6346; outside the Gnutella
protocol specification)
Neighbors are found using a basic discovery protocol (ping-pong
messages)
Gnutella: Protocol Message Types
Digital Enterprise Research Institute
Type
Ping
Pong
Query
QueryHit
Push
Description
www.deri.ie
Contained Information
Announce availability and probe for None
other servents
Response to a ping
IP address and port# of responding servent;
number and total kb of files shared
Search request
Minimum network bandwidth of responding
servent; search criteria
Returned by servents that have
IP address, port# and network bandwidth of
the requested file
responding servent; number of results and
result set
File download requests for
Servent identifier; index of requested file; IP
servents behind a firewall
address and port to send file to
Gnutella: Meeting Peers (Ping/Pong)
Digital Enterprise Research Institute
www.deri.ie
C
A
B
A’s ping
B’s pong
C’s pong
D’s pong
E’s pong
D
E
Gnutella: Searching (Query/QueryHit/GET)
Digital Enterprise Research Institute
GET X.mp3
www.deri.ie
X.mp3
X.mp3
C
A
B
A’s query (e.g., X.mp3)
C’s query hit
E’s query hit
D
E
X.mp3
Popularity of Queries
[Sripanidkulchai01]
Digital Enterprise Research Institute
Very popular documents are approximately equally popular
Less popular documents follow a Zipf-like distribution (i.e.,
the probability of seeing a query for the ith most popular
query is proportional to 1/(ialpha)
Access frequency of web documents also follows Zipf-like
distributions caching might work for Gnutella
www.deri.ie
Free-riding on Gnutella
[Adar00]
Digital Enterprise Research Institute
www.deri.ie
24 hour sampling period:
70% of Gnutella users share no files
50% of all responses are returned by top 1% of sharing hosts
A social problem not a technical one
Problems:
Degradation of system performance: collapse?
Increase of system vulnerability
“Centralized” (“backbone”) Gnutella copyright issues?
Verified hypotheses:
H1: A significant portion of Gnutella peers are free riders.
H2: Free riders are distributed evenly across domains
H3: Often hosts share files nobody is interested in (are not
downloaded)
Free-riding Statistics - 1
[Adar00]
Digital Enterprise Research Institute
H1: Most Gnutella users are free riders
Of 33,335 hosts:
22,084 (66%) of the peers share no files
24,347 (73%) share ten or less files
Top 1 percent (333) hosts share 37% (1,142,645) of total files shared
Top 5 percent (1,667) hosts share 70% (1,142,645) of total files shared
Top 10 percent (3,334) hosts share 87% (2,692,082) of total files shared
www.deri.ie
Free-riding Statistics - 2
[Adar00]
Digital Enterprise Research Institute
H3: Many servents share files nobody downloads
Of 11,585 sharing hosts:
Top 1% of sites provide nearly 47% of all answers
Top 25% of sites provide 98% of all answers
7,349 (63%) never provide a query response
www.deri.ie
Topology of Gnutella
[Jovanovic01]
Digital Enterprise Research Institute
Small-world properties verified (“find everything close by”)
Backbone + outskirts
www.deri.ie
Gnutella Backbone
Digital Enterprise Research Institute
[Jovanovic01]
www.deri.ie
Categories of Queries
Digital Enterprise Research Institute
Categorized top 20 queries
[Sripanidkulchai01]
www.deri.ie
Caching in Gnutella
[Sripanidkulchai01]
Digital Enterprise Research Institute
Average bandwidth consumption in tests: 3.5Mbps
Best case: trace 2 (73% hit rate = 3.7 times traffic
reduction)
www.deri.ie
Gnutella: Bandwidth Barriers
Digital Enterprise Research Institute
www.deri.ie
Clip2 measured Gnutella over 1 month:
typical query is 560 bits long (including TCP/IP headers)
25% of the traffic are queries, 50% pings, 25% other
on average each peer seems to have 3 other peers actively
connected
Clip2 found a scalability barrier with substantial
performance degradation if queries/sec > 10:
10 queries/sec
* 560 bits/query
*
4 (to account for the other 3 quarters of message traffic)
*
3 simultaneous connections
67,200 bps
10 queries/sec maximum in the presence of many dialup users
won’t improve (more bandwidth - larger files)
Gnutella: Summary
Digital Enterprise Research Institute
www.deri.ie
Completely decentralized
Hit rates are high
High fault tolerance
Adopts well and dynamically to changing peer populations
Protocol causes high network traffic (e.g., 3.5Mbps). For example:
4 connections C / peer, TTL = 7
1 ping packet can cause 2 * i 0 C * (C 1)i 26,240 packets
TTL
No estimates on the duration of queries can be given
No probability for successful queries can be given
Topology is unknown algorithms cannot exploit it
Free riding is a problem
Reputation of peers is not addressed
Simple, robust, and scalable (at the moment)
Modern Gnutella
Digital Enterprise Research Institute
Lots of improvements
Hybrid Super-Peer architecture
“Gnutella + DHT”
www.deri.ie
Improvements of Message Flooding
Digital Enterprise Research Institute
www.deri.ie
Expanding Ring
start search with small TTL (e.g. TTL = 1)
if no success iteratively increase TTL (e.g. TTL = TTL +2)
k-Random Walkers
forward query to one randomly chosen neighbor only, with large
TTL
start k random walkers
random walker periodically checks with requester whether to
continue
Discussion Unstructured Networks
Digital Enterprise Research Institute
www.deri.ie
Performance
Search latency: low (graph properties)
Message Bandwidth: high
– improvements through random walkers, but essentially the
whole network needs to be explored
Storage cost: low (only local neighborhood)
Update and maintenance cost: low (only local updates)
Resilience to failures good: multiple paths are explored
and data is replicated
Qualitative Criteria
search predicates: very flexible, any predicate is possible
global knowledge: none required
peer autonomy: high
Summary
Digital Enterprise Research Institute
www.deri.ie
How are unstructured P2P networks characterized ?
What is the purpose of the ping/pong messages in
Gnutella ?
Why is search latency in Gnutella low ?
Which are methods to reduce message bandwidth
in unstructured networks ?
Hierarchical P2P Overlay Networks
Digital Enterprise Research Institute
www.deri.ie
Servers provide index information, i.e. the
information (k, p) is available from dedicated
servers
Simplest Approach
one central server
user register files
service (file exchange) is organized
as P2P architecture
index server
k="jingle-bells"
Napster
Digital Enterprise Research Institute
www.deri.ie
Central (virtual) database which holds an index of offered
MP3/WMA files
Clients connect to this server, identify themselves (account)
and send a list of MP3/WMA files they are sharing (C/S)
Other clients can search the index and learn from which
clients they can retrieve the file (P2P)
Additional services at server (chat etc.)
Napster Server
register
(user, files)
A
“A has X.mp3”
Download X.mp3
B
Superpeer Networks
Digital Enterprise Research Institute
Improvement of Central Index Server (Morpheus, Kaaza)
multiple index servers build a P2P network
clients are associated with one (or more) superpeers
superpeers use message flooding to forward search requests
Experiences
redundant superpeers
are good
superpeers should have
high outdegree (>20)
TTL should be minimized
www.deri.ie
Discussion
Digital Enterprise Research Institute
www.deri.ie
Performance
Search latency: very low (index)
Message Bandwidth: low
– with superpeers flooding occurs, but the number of
superpeers is comparatively small
Storage cost: low at client, high at index server
Update cost: low (no replication)
Resilience to failures: bad (system has single-point of
failure)
Qualitative Criteria
search predicates: very flexible, any predicate is possible
global knowledge: server
peer autonomy: low
Summary
Digital Enterprise Research Institute
www.deri.ie
Which are the two levels of P2P networks in
superpeer networks, and to which functional layers
are they related ?
Which problem of distribution is avoided in
superpeer networks and addressed in structured
network ? What is the impact on the relation
between nodes and functional layers ?
Structured P2P Overlay Networks
Digital Enterprise Research Institute
Unstructured overlay networks – what we learned
simplicity (simple protocol)
robustness (almost impossible to “kill” – no central
authority)
Performance
search latency O(log n), n number of peers
update and maintenance cost low
Drawbacks
tremendous bandwidth consumption for search
free riding
Can we do better?
www.deri.ie
Efficient Resource Location
Digital Enterprise Research Institute
www.deri.ie
FULL REPLICATION
update cost
high
low
low
search cost
low
STRUCTURED P2P OVERLAY
NETWORKS
(e.g. prefix routing)
high
UNSTRUCTURED P2P
OVERLAY NETWORKS
(e.g. Gnutella)
high
SERVER
(e.g. Napster)
maximal bandwidth
Distribution of Index Information
Digital Enterprise Research Institute
www.deri.ie
Goal: provide efficient search using few messages without using
designated servers
Easy: distribution of index information over all peers, i.e. every peer
maintains and provides part of the index information (k, p)
Difficult: distributing the data access structure to support efficient
search
Search starts here
server
Where to start the search?
?
data access
structure
index information I
I1
I2
I3
I4
peers (storing data and index information)
peers (storing data)
Approaches
Digital Enterprise Research Institute
www.deri.ie
Different strategies
P-Grid: distributing a binary search tree
Chord: constructing a distributed hash table
CAN: Routing in a d-dimensional space
Freenet: caching index information along search paths
Commonalities
each peer maintains a small part of the index information
(routing table)
searches performed by directed message forwarding
Differences
performance and qualitative criteria
P-Grid
Digital Enterprise Research Institute
www.deri.ie
Search tree (prefix tree)
???
extra
data
?
101
0??
00?
000
001
1??
01?
010
011
N objects log2(N) steps
10?
100
?
101
!
101
101
?
101
11?
110
111
Scalable data access structures
Digital Enterprise Research Institute
Assume number of data objects >> storage of one
node
www.deri.ie
Distributed storage
Given a data access structure
Size of data access structure = number of data objects
Size of data access structure >> storage of one node
Problem: where to store?
Non-scalable Distribution of Search Tree
Digital Enterprise Research Institute
•
www.deri.ie
Distribute search tree over peers
bottleneck
???
0??
00?
000
001
peer 1
1??
01?
010
011
peer 2
10?
100
101
peer 3
11?
110
111
peer 4
Scalable Distribution of Search Tree
Digital Enterprise Research Institute
www.deri.ie
"Napster"
bottleneck
???
0??
00?
000
001
peer 1
1??
01?
010
011
peer 2
10?
100
101
peer 3
11?
110
111
peer 4
Scalable data access structures
Digital Enterprise Research Institute
www.deri.ie
Associate each peer with a complete path
???
0??
00?
000
001
peer 1
1??
01?
010
011
peer 2
10?
100
101
peer 3
11?
110
111
peer 4
Scalable data access structures
Digital Enterprise Research Institute
www.deri.ie
???
1??
peer 1 peer 2
knows more about
this part of the tree
10?
100
101
peer 3
peer 4
knows more about
this part of the tree
The result is P-Grid
Digital Enterprise Research Institute
www.deri.ie
101
?
Peers cooperate in search
???
101
?
1??
peer 1 peer 2
???
peer 1 peer 2
101
?
101 1??
?
10?
100
101
peer 3
Message
to peer 3
101 ?
peer 3
11?
110
111
peer 4
peer 4
! 101
Construction
Digital Enterprise Research Institute
www.deri.ie
Splitting Approach (P-Grid)
peers meet and decide whether to extend search tree by splitting
the data space
peers can perform load balancing considering their storage load
networks with different origins can merge, like Gnutella, Freenet
(loose coupling)
Node Insertion Approach (Chord, CAN, …)
peers determine their "leaf position" based on their IP address
nodes route from a gateway node to their node-id to populate
the routing table
network has to start from single origin (strong coupling)
Replication of data items and routing table entries is used to
increase failure resilience
P-Grid Discussion
Digital Enterprise Research Institute
Performance
Search latency: O(log n) (with high probability, provable)
Message Bandwidth: O(log n) (selective routing)
Storage cost: O(log n) (routing table)
Update cost: low (like search)
Qualitative Criteria
search predicates: prefix searches
global knowledge: key hashing
peer autonomy: peers can locally decide on their role
(splitting decision)
www.deri.ie
DHT example: Chord
Digital Enterprise Research Institute
Hashing of search keys AND peer addresses on binary keys of
length m
www.deri.ie
e.g. m=8, key("jingle-bells.mp3")=17, key(196.178.0.1)=3
Data keys are stored at next larger node key
peer with hashed identifier p,
data with hashed identifier k, then
k ] predecessor(p), p ]
p
k
predecessor
m=8
stored
32 keys
at
p2
p3
Search possibilities
1. every peer knows every other
O(n) routing table size
2. peers know successor
O(n) search cost
Routing Tables
Digital Enterprise Research Institute
www.deri.ie
Every peer knows m peers with exponentially
increasing distance
p p+1
p+2
Each peer p stores a routing table
First peer with hashed identifier si such that
si =successor(p+2i-1) for i=1,..,m
We write also si = finger(i, p)
p+4
s1, s2, s3
s5
p2
p+8
s4
p3
p4
p+16
i
si
1
p2
2
p2
3
p2
4
p3
5
p4
Search
O(log n) routing table size
Search
Digital Enterprise Research Institute
www.deri.ie
search(p, k)
find in routing table largest (i, p*) such that p* [p,k[
/* largest peer key smaller than the searched data key */
if such a p* exists then search(p*, k)
else return (successor(p)) // found
p p+1
p+2
p+4
s1, s2, s3
s5
k2
p2
p+8
s4
k1
p3
p4
p+16
Search
O(log n) search cost
RT with exp. increasing
distance O(log n) with high
probability
Node Insertion
Digital Enterprise Research Institute
www.deri.ie
New node q joining the network
q asks existing node p to find predecessor and fingers
cost: O(log2 n)
p p+1
p+2
q
p+4
p2
p+8
p3
p4
p+16
routing table
of p
routing table
of q
i
si
i
si
1
q
1
p2
2
q
2
p2
3
p2
3
p3
4
p3
4
p3
5
p4
5
p4
Load Balancing in Chord
Digital Enterprise Research Institute
Network size n=10^4
5 10^5 keys
uniform data distribution
50 keys per node?
NO, as IP addresses do
not map uniformly into
data key space.
www.deri.ie
Length of Search Paths
Digital Enterprise Research Institute
Network size n=2^12
100 2^12 keys
Path length ½ Log2(n)
RTs can be seen
as an embedding
of search trees
into the network
and thus search
starts at a randomly
selected tree depth
www.deri.ie
Chord Discussion
Digital Enterprise Research Institute
Performance
Search: like P-Grid
Node join/leave cost: O(log2 n)
Resilience to failures: replication to successor nodes
Qualitative Criteria
search predicates: equality of keys only
global knowledge: key hashing, network origin
peer autonomy: nodes have by virtue of their address a
specific role in the network
www.deri.ie
Topological Routing (CAN)
Digital Enterprise Research Institute
www.deri.ie
Based on hashing of keys into a d-dimensional space (a torus)
Each peer is responsible for keys of a subvolume of the space (a zone)
Each peer stores the adresses of peers responsible for the neighboring
zones for routing
Search requests are greedily forwarded to the peers in the closest zones
Assignment of peers to zones depends on a random selection made
by the peer
Network Search and Join
Digital Enterprise Research Institute
www.deri.ie
Node 7 joins the network by choosing a coordinate in the volume of 1
=> O(d) updates or RTs
CAN Refinements
Digital Enterprise Research Institute
www.deri.ie
Multiple Realities
We can have r different coordinate spaces
Nodes hold a zone in each of them
Creates r replicas of the (key, value) pairs
Increases robustness
Reduces path length as search can be continued in the
reality where the target is closest
Overloading zones
Different peers are responsible for the same zone
Splits are only performed if a maximum occupancy (e.g. 4)
is reached
Nodes know all other nodes in the same zone
But only one of the neighbors
CAN Path Length
Digital Enterprise Research Institute
www.deri.ie
CAN Discussion
Digital Enterprise Research Institute
Performance
www.deri.ie
Search latency: O(d n1/d), depends on choice of d (with high
probability, provable)
Message Bandwidth: O(d n1/d), (selective routing)
Storage cost: O(d) (routing table)
Update cost: low (like search)
Node join/leave cost: O(d n1/d)
Resilience to failures: realities and overloading
Qualitative Criteria
search predicates: spatial distance of multidimensional keys
global knowledge: key hashing, network origin
peer autonomy: nodes can decide on their position in the key
space
Dynamical Clustering (Freenet)
Digital Enterprise Research Institute
www.deri.ie
Freenet Background
P2P system which supports publication, replication, and retrieval
of data
Protects anonymity of authors and readers: infeasible to
determine the origin or destination of data
Nodes are not aware of what they store (keys and files are sent
and stored encrypted)
Uses an adaptive routing and caching strategy
Index information maintained at each peer (limited cache size)
Key
Data
Address
8e47683isdd0932uje89
456r5wero04d903iksd0
f3682jkjdn9ndaqmmxia
wen09hjfdh03uhn4218
712345jb89b8nbopledh
d0ui43203803ujoejqhh
ZT38hwe01h02hdhgdzu
Rhweui12340jhd091230
eqwe1089341ih0zuhge3
erwq038382hjh3728ee7
tcp/125.45.12.56:6474
tcp/67.12.4.65:4711
tcp/127.156.78.20:8811
tcp/78.6.6.7:2544
tcp/40.56.123.234:1111
tcp/128.121.89.12:9991
Freenet Routing
Digital Enterprise Research Institute
www.deri.ie
If a search request arrives
Either the data is in the table
Or the request is forwarded to the addresses with the most
similar keys (lexicographic similarity, edit distance) till an answer
is found or TTL reached (e.g. TTL = 500)
If an answer arrives
The key, address and data of the answer are inserted into the
table
The least recently used key and data is evicted
Quality of routing should improve over time
Node is listed under certain key in routing tables
Therefore gets more requests for similar keys
Therefore tends to store more entries with similar keys
(clustering) when receiving results and caching them
Dynamic replication of data
Freenet Routing
Digital Enterprise Research Institute
www.deri.ie
peer p has k
peer p' has k'
search k
response (k,p)
new link established
search k', k' similar to k
Freenet: Inserting Files
Digital Enterprise Research Institute
www.deri.ie
First a the key of the file is calculated
An insert message with this proposed key and a
hops-to-live value is sent to the neighbor with the
most similar key
Then every peer checks whether the proposed key
is already present in its local store
yes return stored file (original requester must propose
new key)
no route to next peer for further checking (routing uses
the same key similarity measure as searching)
continue until hops-to-live are 0 or failure
Freenet: Evolution of Path Length
Digital Enterprise Research Institute
1000 identical nodes
max 50 data
items/node
max 200
references/node
Initial references:
(i-1, i-2, i+1, i+2) mod n
each time-step:
-
randomly insert
-
TTL=20
every 100 time-steps:
300 requests
(TTL=500) from
random nodes and
measure actual path
length (failure=500).
www.deri.ie
median path length 500 6
Freenet Discussion
Digital Enterprise Research Institute
www.deri.ie
Performance
Search latency: low (small world property)
Message Bandwidth: low (selective routing)
Storage cost: relatively low (experimentally not validated !)
Update cost: low (like search)
– but a bootstrapping phase is required
Resilience to failures: good (high degree of replication of
data and keys)
Qualitative Criteria
search predicates: with encryption only equality of keys
global knowledge: none
peer autonomy: high (with encryption risk of storing
undesired data)
Comparison
Digital Enterprise Research Institute
www.deri.ie
Paradigm
Search Type
Gnutella
Breadth-first
String
search on graph comparison
Freenet
Depth-first
Equality
search on graph
Chord
CAN
P-Grid
Implicit binary
search trees
d-dimensional
space
Binary prefix
trees
Search Cost
(messages)
TTL
2* i 0 C *(C 1)i
O(Log n) ?
Equality
O(Log n)
Equality
O(d n^(1/d))
Prefix
O(Log n)
Small World Graphs
Digital Enterprise Research Institute
Each P2P system can be
interpreted as a directed graph
(overlay network)
peers correspond to nodes
routing table entries as directed
links
Task
Find a decentralized algorithm
(greedy routing) to route a
message from any node A to any
other node B with few hops
compared to the size of the graph
Requires the existence of short
paths in the graph
www.deri.ie
Milgram’s Experiment
Digital Enterprise Research Institute
Finding short chains of acquaintances
linking pairs of people in USA who didn’t
know each other;
Source person in Nebraska
Sends message with first name and location
Target person in Massachusetts.
Average length of the chains that were
completed was between 5 and 6 steps
“Six degrees of separation” principle
BIG QUESTION:
WHY there should be short chains of acquaintances
linking together arbitrary pairs of strangers???
www.deri.ie
Random Graphs
Digital Enterprise Research Institute
www.deri.ie
For many years typical explanation was - random
graphs
Low diameter: expected distance between two nodes is
logkN, where k is the outdegree and N the number of nodes
When pairs or vertices are selected uniformly at random
they are connected by a short path with high probability
But there are some inaccuracies
If A and B have a common friend C it is more likely that they
themselves will be friends! (clustering)
Many real world networks (social networks, biological
networks in nature, artificial networks – power grid, WWW)
exhibit this clustering property
Random networks are NOT clustered.
Clustering
Digital Enterprise Research Institute
Clustering measures the fraction of neighbors of a node that
are connected themselves
Regular Graphs have a high clustering coefficient
but also a high diameter
Random Graphs have a low clustering coefficient
www.deri.ie
but a low diameter
Both models do match the properties expected from real
networks!
Regular Graph (k=4)
Long paths
L ~ n/(2k)
Highly clustered
C~3/4
Random Graph (k=4)
Short path length
L~logkN
Almost no clustering
C~k/n
Small-World Networks
Digital Enterprise Research Institute
www.deri.ie
Random rewiring of regular graph (by Watts and
Strogatz)
With probability p rewire each link in a regular graph to a
randomly selected node
Resulting graph has properties, both of regular and
random graphs
– High clustering and short path length
Freenet has been shown to result in small world graphs
Flashback: Freenet Search Performance
Digital Enterprise Research Institute
www.deri.ie
Modifying routing tables in Freenet through
caching has a "rewiring effect"
Studies show that Freenet graphs have small-world
properties
Explains improving search performance
Regular graph:
n nodes, k nearest neighbors
path length ~ n/2k
4096/16 = 256
Rewired graph (1% of nodes):
path length ~ random graph
clustering ~ regular graph
Small World Graph
Random graph:
path length ~ log (n)/log(k)
~4
Search in Small World Graphs
Digital Enterprise Research Institute
BUT! Watts-Strogatz can provide a model for
the structure of the graph
existence
high
www.deri.ie
of short paths
clustering
It does not explain how the shortest paths
are found
also
Gnutella networks are small-world graphs
why
can search be efficient in Freenet?
P2P Overlay Networks as Graphs
Digital Enterprise Research Institute
Each P2P system can be
interpreted as a directed
graph …
peers correspond to nodes
routing table entries as directed
links
… embedded in some space
P-Grid: interval [0,1]
Chord: ring [0,1)
CAN: d-dimensional torus
Freenet: strings + lexicographical
distance
www.deri.ie
Kleinberg’s Small-World Model
Digital Enterprise Research Institute
www.deri.ie
Kleinberg’s Small-World’s model
Embed the graph into an r-dimensional grid
constant number p of short range links (neighborhood)
q long range links: choose long-range links such that the probability
to have a long range contact is proportional to 1/dr
Importance of r !
Decentralized (greedy) routing performs best iff. r = dimension of
space
r=2
Influence of “r” (1)
Digital Enterprise Research Institute
www.deri.ie
1
•
Each peer u has link to the peer v with probability proportional to d ( u ,v ) r
where d(u,v) is the distance between u and v.
•
Optimal value: r = dim = dimension of the space
•
•
•
If r < dim we tend to choose more far away neighbors (decentralized
algorithm can quickly approach the neighborhood of target, but then slows
down till finally reaches target itself).
If r > dim we tend to choose more close neighbors (algorithm finds quickly
target in it’s neighborhood, but reaches it slowly if it is far away).
When r = 0 – long range contacts are chosen uniformly. Random graph theory
proves that there exist short paths between every pair of vertices, BUT
there is no decentralized algorithm capable finding these paths
Influence of “r” (2)
Digital Enterprise Research Institute
www.deri.ie
Given node u if we can partition the remaining peers into sets A1, A2,
A3, … , AlogN , where Ai, consists of all nodes whose distance from u is
between 2i and 2i+1, i=0..log(N-1).
Then given r = dim each long range contact of u is nearly equally likely to
belong to any of the sets Ai
When q = log N – on average each node will have a link in each set of Ai
A1
A2
A
3
A
4
DHTs and Kleinberg model
Digital Enterprise Research Institute
P-Grid’s
model
Kleinberg’s
model
www.deri.ie
Conclusions from Kleinberg's Model
Digital Enterprise Research Institute
www.deri.ie
With respect to the Watts and Strogatz model
there is no decentralized algorithm capable performing effective search
in the class of SW networks constructed according to Watts and Strogatz
J. Kleinberg presented the infinite family of Small World networks that
generalizes the Watts and Strogatz model and shows that decentralized
search algorithms can find short paths with high probability
there exist only one unique model within that family for which
decentralized algorithms are effective.
With respect to overlay networks
Many of the structured P2P overlay networks are similar to Kleinberg’s
model (e.g. Chord, randomized version, q=log N, r=1)
Unstructured overlay networks also fit into the model (e.g. Gnutella q=5,
r=0)
Some variants of structured P2P overlay networks are having no
neighborhood lattice (e.g. P-Grid, p=0)
Extensions to spaces beyond regular grids are possible (e.g. arbitrary
metric spaces)
Summary
Digital Enterprise Research Institute
www.deri.ie
How can we characterize P2P overlay networks such that we
can study them using graph-theoretic approaches?
What is the main difference between a random graph and a
SW graph?
What is the main difference between the Watts/Strogatz and
the Kleinberg model?
What is the relationship between structured overlay networks
and small world graphs?
What are possible variations of the small world graph model?
Specific problems: Identity management
Digital Enterprise Research Institute
www.deri.ie
Definition:
Consistent mapping of a set of attributes onto an
identifier in a unique, deterministic, and secure way
Identification is an essential building block in
distributed (information) systems
Examples: directory services
DNS: symbolic host names IP address
X.500: distinguished name object (attributes)
UDDI: query web service specification
Identity management issues
Digital Enterprise Research Institute
www.deri.ie
Data management
Uniqueness of identifiers
Centralized vs. distributed data management
– Degree of decentralization (orthogonal to the distribution of
data!)
Update consistency
Security
Access permissions (+ management)
Requires unique identification
Resilience against attacks
Infrastructure
Third party?
Scalability, robustness, 24/7 availability, etc.
Use case: Dynamic IP addresses
Digital Enterprise Research Institute
www.deri.ie
Most computers on the Internet have a dynamic IP
address
Limited number of IP addresses
Dynamic Host Configuration Protocol (lease time)
Host mobility (physical mobility)
Problem for any system that builds a distributed
management structure on top of such networks
Use Case:
Management of dynamic IP addresses in structured
P2P systems (Chord, DKS, Pastry, P-Grid, etc.)
P2P systems and dynamic IP addresses
Digital Enterprise Research Institute
www.deri.ie
Structured P2P systems (Chord, Pastry, P-Grid, DKS,
etc.)
These systems construct a distributed index routing
tables
Dynamic IP addresses routing tables become
inconsistent system can break down
Unstructured (Gnutella) and hierarchical (FastTrack)
systems
Less of a problem
But, they pay with
–
–
–
–
high bandwidth consumption, or
single point of failures, or
high infrastructure costs, or
etc.
Problems to address
Digital Enterprise Research Institute
www.deri.ie
How to find out that an IP address has become invalid?
No response
– Network problem or did the peer get a new address?
Response
– Is it still the same peer? (authenticity, replay, man-in-themiddle attacks)
Frequency of address changes is crucial
Peers can join and leave at any moment
IP address can change at any moment
Security: DOS attacks are very simple
Assume peers report back their new IP address
EvilHacker.org participates in the overlay and thus finds
out IP addresses
EvilHacker.org reports all IP addresses it finds pointing to
random hosts or itself
A scalable and secure infrastructure is required
Problems of current P2P approaches
Digital Enterprise Research Institute
www.deri.ie
Third-party infrastructures are required
Very costly maintenance protocols
Maintenance protocols may compromise structural
properties (e.g., load-balancing)
Previous knowledge is lost (e.g., reputation of the
peer, QoS, etc.)
No current approach addresses security
Only the owner should be allowed to update the mapping
DOS, replay, man-in-the-middle, etc. are not addressed
IPv6 as an alternative
Digital Enterprise Research Institute
www.deri.ie
With IPv6 dynamic addresses or NAT are no longer
necessary
IPv6 address space is ~3,4 * 1038 (or 1030 addresses per
person on the planet)
IPv4 (current) address space is 232
IPsec (included in IPv6)
solves authentication problem
DOS attacks are more difficult
Mobility is addressed
IPv6: home/foreign address
IPv4: mobility extension but not supported on a large scale
Problem: IPv6 has not been deployed yet
DNS as an alternative
Digital Enterprise Research Institute
DNS extensions
Several RFCs extend the original DNS specification so that
DNS could support secure updates
Problems
– Very heavy-weight
– Configuration is very difficult and error-prone
– Not for the “normal user” (as in a P2P system)
www.deri.ie
DynDNS (and similar services)
Hosts maintain a consistent name/address mapping in a
special DNS domain via a special client
Problems
– Centralized scalability
– Service may go out of business
Basic idea
Digital Enterprise Research Institute
DNS
lookup IP
address
www.deri.ie
FQDN IP address
routing based on
FQDN (any overlay)
static
mapping
Index
P-Grid
Data
lookup IP
address
routing based on
logical identifier
P-Grid
logical identifier IP address
DYNAMIC mapping
Informal discussion
Digital Enterprise Research Institute
www.deri.ie
Use unique logical identifiers (UUIDs) instead of physical
identifiers (IP addresses):
Peer identifiers
Routing based on UUIDs
Use the overlay itself to securely maintain mappings between
the logical and the physical identifier
Self-referential approach
Rate of changes < self-healing rate
Dynamic equilibrium
Advantages:
General identification facility disentangling logical identifiers
from network structure
Tracking of chances (for example, reputation)
Maintenance and routing
Digital Enterprise Research Institute
www.deri.ie
Universal Unique Identifier (UUID) are generated
locally
Cryptographically secure hash function global
uniqueness
Index / routing tables: UUID
Peers maintain an up-to-date UUID-IP mappings in P-Grid
Routing
Peers cache known UUID-IP mappings
Mapping exists in cache identity of target peer is
checked before forwarding
No mapping is known query P-Grid for mapping
Security issues
Digital Enterprise Research Institute
Peer generates public/private key
UUID-public key P-Grid
Why is it secure?
www.deri.ie
Data in P-Grid is stored at a number of random replicas
Hard to attack
Request are
– signed (private key) authenticity & access permissions
– time-stamped no replay possible
Quorums are required for each request
Better security than PGP
Quorums
Independent, random paths avoid weakest link problem
Revoke and update of security relevant information is possible
113
Directory maintenance: self-healing
Digital Enterprise Research Institute
www.deri.ie
Repair strategies
Eager repair: repair each stale entry encountered immediately
Lazy repair: repair a routing table when all references at one level
become stale
DNS
lookup IP
address
routing based on
FQDN (any overlay)
lookup IP address
in case of failure
routing based on
logical identifier
maintain logical identifier IP address
mapping: eager or lazy
ID Presently online
query(01*) @ 7
…query(0101) @ 7 (for stale entry 5, cycle -> abort)
ID Presently offnline
…query(1110) @ 7 (for stale entry 14, forward to 12 or 13)
Digital Enterprise
Institute
StoresResearch
mappings
ID
…query(1110) @ 12 (is offline)
of peers
…query(1110) @ 13 (for stale entry 2)
Up-to-date cache ……query(0010) @ 13 (forward to 5)
……query(0010) @ 5 (forward to 7)
1 : 2 ,12
……query(0010) @ 7 (forward to 9)
……query(0010) @ 9 (new entry for 2 found !)
Stale cache
…query(1110) @ 2 (new entry for 14 found !)
query(01*) @ 14 (finally )
0
00
000
1
01
001
10
010
011
100
101
www.deri.ie
11
2
0 : 1,14
10 : 11,13
12
1
1
1 : 12, 13
01 : 5, 10
001: 9,4
7
1
1 : 12, 13
01 : 5,14
001: 9,4
9
2,3
1 : 8,2
01 : 3, 10
000: 1,7
4
2,3
1 : 6,13
01 :10,14
000: 1,7
14
4,5
1 : 2,12
00 : 9,4
011: 3,10
5
4,5
1 : 8, 13
00 : 7,9
011: 3,10
3
6,7
1 : 11,12
00 : 1,9
010: 5,14
10
6,7
1 : 6,8
00 : 1,7
010: 5,14
11
8,9
0 : 4,7
11 : 2,12
101: 8,13
6
8,9
1 : 1,3
11 : 2,12
101: 8,13
13
10,11
0 : 5,9
11 : 2,12
100: 6,11
8
10,11
0 : 4,9
11 : 2,12
100: 6,11
12,13,14
12,13,14
0 : 5,7
10 : 6,13
Eager repair strategy
Digital Enterprise Research Institute
www.deri.ie
System is in a dynamic equilibrium if the rate of changes due to
changing mappings and the rate of repairs is equal
Dynamic equilibrium equation
LHS
Rate at which repair of stale routing entries occur
rup changes per 1-rup queries
Nrec – 1 additional recursive queries
Repair makes sense only if the routing entry to be repaired
corresponds to an online peer
A repair is possible only if recursive query succeeds
RHS
Rate of entries turning stale
rup changes
1-pdyn probability of non-stale references (only these can turn stale)
r references at each peer for each of log 2n levels
Lazy repair strategy
Digital Enterprise Research Institute
www.deri.ie
Repair only if all references of a level are stale
Not all routing entries are treated uniformly
The number of stale entries for each routing level at each peer
defines the state of that level
Markovian model
0 ref
stale
ID
change
1 ref
stale
ID
change
2 ref
stale
ID
change
…
ID
change
r ref
stale
repairs
Dynamic equilibrium equation
inflow = outflow (for each state)
At dynamic equilibrium, the number of routing levels with
given number of stale entries over the whole system should
not change
N.B. We distinguish stale entries from offline peers
Lazy repair: Analytical vs.
simulation results
Digital Enterprise Research Institute
www.deri.ie
H
L
Number of messages vs. rate of change (N=128,256,512,1024,
replication factor is 8)
Msg
80
60
Lazy Rec., Mess. vs. r_up for p_on=1
sim,N=128
ana,N=128
sim,N=256
ana,N=256
sim,N=512
ana,N=512
sim,N=1024
ana,N=1024
n=N 8 ,
rup
40
20
0.025
0.05
0.075
0.1
0.125
0.15
r_up
Effect of pon on stability and
message overhead
Digital Enterprise Research Institute
www.deri.ie
In networks with more online peers the lazy strategy is advantageous
but collapses earlier
Msg
1200
1000
H HLL
Lazy vs. Eager rec. r_up= 0.2, lg_2 n =5
Directory "unstable"
Lazy rec
800
Eager rec
600
400
Directory "stable"
200
0.3
0.4
0.5
0.6
0.7
0.8
0.9
p_on
Summary
Digital Enterprise Research Institute
Decentralized, self-maintaining, light-weight, and
secure directory service
Robust and applicable in unreliable environments
Contributions
www.deri.ie
Logical independence of identity from network properties
General approach for identification
Structural properties are maintained
Existing knowledge is retained
Dynamic resilience of a P2P system under “churn”
GridVine: Peer Data Management
Digital Enterprise Research Institute
www.deri.ie
Searching semantically richer objects
in large scale heterogeneous networks
<xap:CreateDate>2001-12date?
19T18:49:03Z</xap:CreateDate>
<xap:ModifyDate>2001-1219T20:09:28Z</xap:ModifyDate>
<es:DofCreation> 05/08/2004 </es:DofCreation>
?
?
?
?
?
<myRDF:Date> Jan 1, 2005 </myRDF:Date>
➠ Lack of semantic interoperability
Information Heterogeneity
Digital Enterprise Research Institute
www.deri.ie
Syntactic discrepancies
ImageGUID
cDate
A0657B25
05.08.04
VS
<es:cDate> 05/08/2004 </es:cDate>
Semantic heterogeneity
Extensible standards (XML, RDF, XMP, PSA, WinFS...)
<rdf:Property rdf:ID="width">
<rdfs:label>Width</rdfs:label>
<rdfs:subPropertyOf rdf:resource="#length"/>
</rdf:Property>
VS
<rdf:Property rdf:ID=“Length-Y">
<rdfs:label>Length-Y</rdfs:label>
<rdfs:subPropertyOf rdf:resource="#length"/>
</rdf:Property>
100s of evolving schemas for one particular domain (e.g.,
protein information, picture metadata)
➠
Shared representation is not enough
Integrating Data in Distributed
Databases
Digital Enterprise Research Institute
The Wrapper-Mediator architecture
www.deri.ie
Date
Date
Integrating Data in the new Web
Ecology
Digital Enterprise Research Institute
Mediated Architectures
Large Scale Information Systems
(e.g., WWW))
Scale
Number of sources < 100
Number of sources > 1000
Uncertainty
Consistent Data
Uncertain Data
- Coordination
- Manually curated data
- Autonomy
- Semi-automatic creation of data
Schemas created by
administrators
Schemas created by end users
Relatively stable set of sources
Network churn
- stable mediator
- node failures
Sources known a priori
Unknown sources
Relational Data
Structured Schemas
Semi-structured data
Schematas
- Integrity constraints
- Few integrity constraints
Structured Queries
Simple S-P Queries
Dynamicity
Expressiveness
www.deri.ie
Peer Data Management Systems
Digital Enterprise Research Institute
www.deri.ie
date?
<es:cDate> 05/08/2004 </es:cDate>
<xap:CreateDate>2001-1219T18:49:03Z</xap:CreateDate>
<xap:ModifyDate>2001-1219T20:09:28Z</xap:ModifyDate>
es:cDate xap:CreateDate
<myRDF:Date> Jan 1, 2005
</myRDF:Date>
Pairwise mappings
Peer Data Management Systems (PDMS)
Local mappings overcome global heterogeneity
Iterative query reformulation
3-Tier Network
Digital Enterprise Research Institute
www.deri.ie
Semantic
Mediation
Layer
Overlay
Layer
Internet
Layer
Jupp /
P-Grid
DHTs
Data-Centric P2P Systems
Digital Enterprise Research Institute
Piazza / Hyperion
More expressive mapping languages
– LAV-style query reformulation in P2P settings?
Network-intensive
– Large-scale deployment?
Perfect mappings
PIER
Scales a relational engine on top of a DHT
Fixed schema
RDFPeers
Indexes RDF triples in a DHT
No schemas
www.deri.ie
Credits
Digital Enterprise Research Institute
Karl Aberer
Philippe Cudre-Mauroux
Anwitaman Datta
Roman Schmidt
www.deri.ie
Are you still awake?
Digital Enterprise Research Institute
www.deri.ie
Digital Enterprise Research Institute
Introduction to Peer-to-Peer
Networks – Part 2
Manfred Hauswirth, Marcel Karnstedt
Copyright 2009 Digital Enterprise Research Institute. All rights reserved.
www.deri.ie
Outline – Part 2
Digital Enterprise Research Institute
The Vision: A Universal Storage
for Web Data
A Distributed Universal Storage
Data & Query Model
Operators
Query Engine
Mappings & Query Expansion
The Praxis: Implementation
Summary & Outlook
133 of XYZ
www.deri.ie
Outline – Part 2
Digital Enterprise Research Institute
The Vision: A Universal Storage
for Web Data
A Distributed Universal Storage
Data & Query Model
Operators
Query Engine
Mappings & Query Expansion
The Praxis: Implementation
Summary & Outlook
134 of XYZ
www.deri.ie
Examples
Digital Enterprise Research Institute
135
www.deri.ie
Huge heterogeneous data sets, collaboration,
dynamic, structured, deep, linked, …
Clique project:
Integrated structrured storage
Data pre-processing, integration, restructuring, indexing
Data access API and query processor for complex queries
Marcel Karnstedt
IFIP Database Meeting
Nicosia, Cyprus, 2009 135
Public Data Management
Digital Enterprise Research Institute
www.deri.ie
....Semantic & social Web, encyclopedias,
recommender systems ... “The world is a database“
Datasets, which are
Maintained by large communities in a distributed way
Of public interest
„Homogenized“ database, extensible and flexible,
distributed, scalable, structured data and queries
136 of XYZ
Main Challenges
Digital Enterprise Research Institute
Data management
Scalability and robustness
Security, trust, fairness
Guarantees, consistency, integrity
CAP-Theorem [Gilbert et al. 2002], ACID vs. BASE
Query expressiveness
DB-like queries with advanced functionality
Support of IR queries and similarity is mandatory
Schema-unaware queries and/or queries on schema
Efficient processing
Efficient query operators
Cost awareness in changing situations
137 of XYZ
www.deri.ie
Approaches
Digital Enterprise Research Institute
www.deri.ie
Who pays the load?
Who owns the data?
views over 100.000
data sources?
Do we trust them?
Sindice, YARS
Jena, Oracle
SW-Store
138 of XYZ
Mediator
Efficient
query processing?
PIER, PeerDB
AmbientDB
RDFPeers
UniStore
Outline – Part 2
Digital Enterprise Research Institute
The Vision: A Universal Storage
for Web Data
A Distributed Universal Storage
Data & Query Model
Operators
Query Engine
Mappings & Query Expansion
The Praxis: Implementation
Summary & Outlook
139 of XYZ
www.deri.ie
Influences
Digital Enterprise Research Institute
www.deri.ie
Robustness,
self-organization,
scalability
Efficient
lookups
P2P paradigm
DHTs &
SDDS
Index structures
Sensor
networks
Data streams
140 of XYZ
PDMS
Distributed DBS
Transparency,
query processing
The Big Picture
Digital Enterprise Research Institute
www.deri.ie
DB
Who wrote an article for cool
movies?
Wikipedia(article,author)
“Pulp Fiction”,”MK”
Del.icio.us(bookmark,tag,creator)
“http://…pfiction”,”cool”,”MKa”
DBPedia(link,wikilink,category)
“Pulp Fictoon”,”Q. Tarantino”,”movie”
141 of XYZ
Layers of Processing
Digital Enterprise Research Institute
www.deri.ie
Scheduling,
Adaptation,
Costs
Processing Strategies
Similarity /
Approximate
Operators
Query Operators
Multicast,
Aggregation,
Range
Routing
142 of XYZ
Outline – Part 2
Digital Enterprise Research Institute
The Vision: A Universal Storage
for Web Data
A Distributed Universal Storage
Data & Query Model
Operators
Query Engine
Mappings & Query Expansion
The Praxis: Implementation
Summary & Outlook
143 of XYZ
www.deri.ie
Universal Relation Model
Digital Enterprise Research Institute
www.deri.ie
Since the eighties: Model for simplified retrieval in
(relational) databases
Universal relation containing all attributes
Simplifies navigation over multiple relations during
query formulation
SW-Store
A1
144 of XYZ
A2
A3
B1
B2
C1
...
Triple Store
Digital Enterprise Research Institute
www.deri.ie
Universal relation model
Storing each tuple as a set of triples (oid, attribute,
value)
Similar to RDF: subject, predicate, object
OID Car
Mileage HP
Price
232
34.000
28.000
Volvo V70
180
232
Car
Volvo V70
232
Mileage
34.000
232
HP
180
232
Price
28.000
145 of XYZ
SW-Store
RDFPeers
Sindice, YARS
...
Extensible
Flexible
Self-descriptive
No need for
representing null values
Indexing
Digital Enterprise Research Institute
www.deri.ie
Indexing of attributes = key for Hashing
Which attributes? All!
For tuple (oid, v1, v2, ...) of R(OID, A1, A2, ...)
232
Car
Volvo V70
232
Mileage
34.000
232
HP
180
232
Price
28.000
h(oid)
h(A1 || v1)
h(A2 || v2)
...
for object lookup
for Ai ≥ v
(prefix search)
...trade-off storage vs. performance
146 of XYZ
YARS
Hexastore
P-Grid : Range Queries
Digital Enterprise Research Institute
[Datta et al. 2005]
www.deri.ie
Outline – Part 2
Digital Enterprise Research Institute
The Vision: A Universal Storage
for Web Data
A Distributed Universal Storage
Data & Query Model
Operators
Query Engine
Mappings & Query Expansion
The Praxis: Implementation
Summary & Outlook
148 of XYZ
www.deri.ie
VQL
Digital Enterprise Research Institute
www.deri.ie
Query language VQL
Inspired by RDF query language SPARQL
Conjunctive queries
Enhanced by advanced operators
Operations both on instance and schema level
Basic query form
SELECT ?oid, ?val
WHERE { ?oid price ?val }
ORDER BY ...
LIMIT ...
SPARQL
149 of XYZ
Similarity Queries
Digital Enterprise Research Institute
www.deri.ie
WHERE { ?o attrib ?value
FILTER (edist(?value, v) < 2) }
Numerical similarity: value distance
String similarity:
Edit distance using (positional) q-Grams
[Gravano et al. 2001, Schallehn et al. 2004]
Requires additional key-value pairs in P-Grid
For each triple (oid, A, v)
–
–
LSH forest
SWAM
h(q-grami(A)) oid
h(q-grami(v)) oid
Approach for instance and schema level
150 of XYZ
String Similarity
Digital Enterprise Research Institute
www.deri.ie
0
0
0
1
...
1
...
...
00…0
1
11…1
“All values in distance d=1…”
Query range for attribute vs. query d+1 q-grams
[NetDB06]
151 of XYZ
More Operators
Digital Enterprise Research Institute
Similarity joins [NetDB06]
WHERE { ?o1 attr1 ?v1 . ?o2 attr2 ?v2
FILTER (edist(?v1, ?v2) < k) }
Ranking queries: top-k, skyline [DBRank07]
WHERE { ?o attr ?v }
ORDER BY ?v LIMIT k
WHERE { ?o attr ?v }
ORDER BY ?v NN “A String“ LIMIT k
WHERE { ?o attr1 ?x . ?o attr2 ?y}
SKYLINE OF ?x MIN, ?y MAX
152 of XYZ
www.deri.ie
String Similarity Joins
Digital Enterprise Research Institute
www.deri.ie
Doubled parallel
Doubled sequential
Cloud services
Parallel and sequential
153 of XYZ
Sequential and parallel
Skyline Queries
Digital Enterprise Research Institute
www.deri.ie
Objects that are not “dominated“ by other objects
Scoring function on multiple attributes, no weighting
dominated objects
price
mileage
154 of XYZ
Skylines: Basic Idea
Digital Enterprise Research Institute
“Frame Skyline“ algorithm
over 2 dimensions
Minimum of first
dimension defines
maximum for second
dimension
Minima/Maxima provide
a frame narrowing the
search space
155 of XYZ
www.deri.ie
DSL
Skyframe
Skylines: Processing
Digital Enterprise Research Institute
www.deri.ie
y
...
...
...
Min y
Min x
x
Find minimum in one selective dimension x
2. Use y value of min(x) to limit search range
3. Use range query routing to build local skylines
4. Always ship current skyline with query
5. Determine global skyline at one peer
6. Optionally: distributed range querying “on the
way“
156 of
XYZ to min(x)
1.
Skylines: More Dimensions
Digital Enterprise Research Institute
All projections to 2d
sub-spaces are skyline
candidates
Objects of the searched
frame can dominate
projections
Projections cannot
dominate objects of the
searched frame
157 of XYZ
www.deri.ie
Outline – Part 2
Digital Enterprise Research Institute
The Vision: A Universal Storage
for Web Data
A Distributed Universal Storage
Data & Query Model
Operators
Query Engine
Mappings & Query Expansion
The Praxis: Implementation
Summary & Outlook
158 of XYZ
www.deri.ie
Query Execution
Digital Enterprise Research Institute
www.deri.ie
Goal: stateless processing „Push“ approach
Messages containing both plan and intermediate results
(based on Mutant Query Plans [Papadimos et al. 2002])
Receiver peer is identified by applying the hash function
Multiple instances of the plan travel trough the network
(A)
p0
(A)
{(A,1),(A,2)}
p1
B
(A)
B
B
p2
{(A,3),(A,4)}
p3
{(A,5),(A,6)}
159 of XYZ
{(A,2)}
PIER, PeerDB
DARQ
B
{(B,2),(C,1),(B,2),(C,4)}
p4
{(A,2,B,2,C,1), (A,2,B,2,C,4)}
{(A,3),(A,4)}
B
p5
{(B,5),(B,6)}
{(A,5)}
B
p0
{(A,5),(B, 5)}
Cost-Based Planning
Digital Enterprise Research Institute
www.deri.ie
[NetDB06, P2P06, DBRank07]
Find all values of attribute A in max. distance d=1
0
1
01
001
p1
p5
Query all peers for A in parallel or
sequence
#msgs = ml + |A|-1
p2
0
h(A)
1
Query d+1 q-grams in parallel
01
#msgs = (d+1)*ml
001
p1
p5
p2
h(A#q-gram2)
h(A#q-gram1)
160 of XYZ
ObjectGlobe
DARQ
Guarantees: Completeness
Digital Enterprise Research Institute
Problem:
“Fire and forget“ query strategy doesn‘t guarantee
complete results
Goal:
Allow to estimate result completeness
– For the user (“98% of all possible answers“)
– For blocking query operators (aggregators, ranking-based
operators) in order to guarantee a certain level of
completeness
Idea:
Not feasible on data level, but perhaps on peer level?!
161 of XYZ
www.deri.ie
Completeness Estimation
Digital Enterprise Research Institute
www.deri.ie
Estimation on peer level
Based on routing graphs and routing methods
Support of probabilistic guarantees
Accuracy improved by Milestone messages (MiMes)
[CIKM08, WIDM08]
Seaweed
...
Join(A=B)
Extract(A)sequ
Extract(B)range
Query graph
P1B
P2B
P3B
...
PmB
P1A
P2A
P3A
...
PnA
P0
Routing graph
PxY
162 of XYZ
Routing
level
Routing
point
CERQ: Initial Estimation
Digital Enterprise Research Institute
www.deri.ie
0
00
000
0000
1
01
P1
001
0001
P4
P0
P2
P3
Example: P-Grid range query 00100-1101 at P0
Predict trie on information from:
1) local path: 0001
2) local routing table (at least one node per level/sub-tree)
=> estimates 8 (out of 10)
...the better, the more information is kept in each routing table
of[P2P07]
163
XYZ
CERQ: Estimation Refinement
Digital Enterprise Research Institute
The initial estimate might not be correct
www.deri.ie
Refinement by other (intermediate) peers
Piggy-back information
Query contains estimation of peers in sub-tree
Query replies can contain corrections
0
00
Sub-query P0 -> P3
Sub-tree 001*
Estimate: 3 peers
P3’s routing table
contains peer(s) of
sub-tree 0010*
164 of XYZ
000
0000
P4
1
01
P1
001
0001
P0
P2
P3
CERQ: Further Improvements
Digital Enterprise Research Institute
Use of structural replication
Peers have more than one entry per level
Each entry might have a different path
Every path allows peers to learn more about sub-trees
More entries mean better initial estimates
Use of caching
P2P networks are dynamic
Though the structure is likely to be stable
The learned structure can be cached for later estimates
165 of XYZ
www.deri.ie
CERQ: Other Overlays
Digital Enterprise Research Institute
SkipGraphs
Prefix hash tree (Chord)
Most similar to P-Grid
Routing information of multiple levels
Unknown number of peers in bucket layer
Peers build a tree-hierarchy
Only applicable if number of children is known
CAN and Mercury
Forwarding along neighbors
No estimation can be given
=> The idea can be mapped under certain conditions
166 of XYZ
www.deri.ie
Outline – Part 2
Digital Enterprise Research Institute
The Vision: A Universal Storage
for Web Data
A Distributed Universal Storage
Data & Query Model
Operators
Query Engine
Mappings & Query Expansion
The Praxis: Implementation
Summary & Outlook
167 of XYZ
www.deri.ie
Representing Mappings
Digital Enterprise Research Institute
www.deri.ie
Simple kind of attribute correspondences
subsumes
equiv
A1
A2
A3
A4
A5
Triple representation
(A4, equiv, A5)
(A3, subsumes, A6)
Extensible to ontologies and views
[Ideas08]
168 of XYZ
A6
PeerDB
SQPeer
Query Expansion
Digital Enterprise Research Institute
www.deri.ie
Unexpanded query
GridVine
PDMS
Map operators added
169 of XYZFirst
mapping
Expanded query
Outline – Part 2
Digital Enterprise Research Institute
The Vision: A Universal Storage
for Web Data
A Distributed Universal Storage
Data & Query Model
Operators
Query Engine
Mappings & Query Expansion
The Praxis: Implementation
Summary & Outlook
170 of XYZ
www.deri.ie
UniStore
Digital Enterprise Research Institute
[ICDE07]
www.deri.ie
Evaluation: Similarity Joins
Digital Enterprise Research Institute
c1:
c2:
c3:
c4:
c5:
seq & par/seq
par & par/seq
seq & par/par
par& par/par
par & local
172 of XYZ
www.deri.ie
Evaluation: CERQ
Digital Enterprise Research Institute
1 reference
references
www.deri.ie
3 references
5
Estimation is correct after a few corrections
More references lead to (as expected)
Better initial estimate
Less corrections
Smaller errors
Better estimates for smaller ranges (q1 and q2)
Replication factor 1 (similar results for factor 2)
173 of XYZ
Evaluation: Completeness
Digital Enterprise Research Institute
Min, max, avg 74 peers
174 of XYZ
Without MiMes
www.deri.ie
Min, max, avg 50 peers
With MiMes
Outline – Part 2
Digital Enterprise Research Institute
The Vision: A Universal Storage
for Web Data
A Distributed Universal Storage
Data & Query Model
Operators
Query Engine
Mappings & Query Expansion
The Praxis: Implementation
Summary & Outlook
175 of XYZ
www.deri.ie
Summary & Outlook
Digital Enterprise Research Institute
www.deri.ie
Web data is huge, heterogeneous, structured,
linked
Modern applications require a universal and flexible
storage
DB-like and RDF-liking
DHTs well-suited for large-scale data management
UniStore as one solution
Robust and scalable, universal and light-weight
Sophisticated query capabilities
Adaptive, cost-based, stateless and parallel QP
Guarantees, semantic layer
…and all on totally decentralised and self-organising P2P!!
Open issues
Privacy & Trust, reputation
176 of
XYZ
Guarantees, consistency, integrity
Acknowledgements
Digital Enterprise Research Institute
www.deri.ie
Kai-Uwe Sattler, Manfred Hauswirth, Katja Hose,
Roman Schmidt, Renault John, Brahmananda
Sapkota, Conor Hayes, ...
Students: Martin Richtarsky, Michael Haß, Jessica
Müller, Mario Wiegandt, Stefan Schwalm, Matthias
Marx, Thomas Kreyling, ...
Supported by the Science Foundation Ireland under
Grant No. SFI/08/CE/I1380 (Lion-2) and under
Grant No. 08/SRC/I1407 (Clique: Graph & Network
Analysis Cluster)
177 of XYZ
Related Systems
Digital Enterprise Research Institute
www.deri.ie
Sindice: Sindice. The semantic web index. http:// sindice.com/
YARS: A. Harth, J. Umbrich, A. Hogan, S. Decker, Yars2: A federated repository for
querying graph structured data from the web, in: Proc. of ISWC/ASWC, 2007.
Jena: Wilkinson, K., Sayers, C., Kuno, H., Reynolds, D.: Efficient RDF storage and retrieval
in Jena2. In: SWDB, pp. 131–150 (2003)
Oracle: Chong, E.I., Das, S., Eadon, G., Srinivasan, J.: An efficient SQL- based RDF
querying scheme. In: VLDB, pp. 1216–1227 (2005)
SW-Store: D. J. Abadi, A. Marcus, S. R. Madden, K. Hollenbach. SW-Store: a vertically
partitioned DBMS for Semantic Web data management. The VLDB Journal (2009) 18:385–
406
PIER: R. Huebsch, J. M. Hellerstein, N. Lanham, B. Thau Loo, S. Shenker, and I. Stoica.
Querying the Internet with PIER. In VLDB’03, pages 321–332, 2003.
RDFPeers: M. Cai and M. Frank. RDFPeers: a scalable distributed RDF repository based
on a structured peer-to-peer network. In WWW’04, pages 650–657, 2004.
PeerDB: W. S. Ng, B. Ch. Ooi, and K.-L. Tan. PeerDB: A P2P-based System for Distributed
Data Sharing. In ICDE ’03, pages 633–644, 2003.
AmbientDB: P. Boncz and C. Treijtel. AmbientDB: Relational Query Processing in a P2P
Network. In Workshop On Databases, Information Systems and P2P Computing,
(DBISP2P’03), pages 153–168, 2003.
Hexastore: Weiss, C., Karras, P., Bernstein, A.: Hexastore: sextuple indexing for
semantic web data management. VLDB, 2008
SPARQL: E. Prud’hommeaux and A. Seaborne. SPARQL Query Language for RDF, 2006.
W3C Candidate Recommendation.
178 of XYZ
Related Systems /2
Digital Enterprise Research Institute
www.deri.ie
LSH forest: M. Bawa, T. Condie, and P. Ganesan. LSH forest: self-tuning indexes for
similarity search. In WWW’05, pages 651–660, 2005.
SWAM: F. Banaei-Kashani and C. Shahabi. SWAM: a family of access methods for
similarity- search in peer-to-peer data networks. In CIKM’04, pages 304–313, 2004.
Cloud services: M. Brantner, D. Florescu, D. Graf, D. Kossmann, and T. Kraska. Building
a database on S3. In SIGMOD ’08, pages 251–264, 2008.
DSL: P. Wu, C. Zhan, Y. Feng, B. Zhao, D. Agrawal, and A. El Abbadi. Parallelizing skyline
queries for scalable distribution. In EDBT’06, pages 112–130, 2006.
Skyframe: S. Wang, Q. H. Vu, B. Ch. Ooi, A. K. H. Tung, and L. Xu. Skyframe: a
framework for skyline query processing in peer-to-peer systems. The VLDB Journal,
18(1):345–362, 2009.
DARQ: B. Quilitz and U. Leser. Querying Distributed RDF Data Sources with SPARQL. In
ESWC’08, pages 524–538, 2008.
ObjectGlobe: R. Braumandl, M. Keidl, A. Kemper, D. Kossmann, A. Kreutz, S. Seltzsam,
and K. Stocker. Ob jectGlobe: Ubiquitous query processing on the Internet. VLDB Journal,
10(1):48–71, 2001.
Seaweed: D. Narayanan, A. Donnelly, R. Mortier, and A. Rowstron. Delay aware querying
with seaweed. In VLDB’06, pages 727–738, 2006.
SQPeer: G. Kokkinidis, E. Sidirourgos, and V. Christophides. Query Processing in RDF/SBased P2P Database Systems. Semantic Web and Peer-to-Peer, chapter 4, pages 59–81.
Springer, 2006.
GridVine: K. Aberer, P. Cudr´e-Mauroux, M. Hauswirth, and T. Van Pelt. GridVine:
Building Internet-Scale Semantic Overlay Networks. In ISWC’04, pages 107–121, 2004.
PDMS: A. Y. Halevy, Z. G. Ives, P. Mork, and I. Tatarinov. Piazza: data management infrastructure for semantic web applications. In WWW’03, pages 556–567, 2003.
179 of XYZ
Thank you!
Digital Enterprise Research Institute
www.deri.ie
[CIKM08]: M. Karnstedt, K.Sattler, M. Haß, M. Hauswirth, B. Sapkota, R. Schmidt: Estimating the
Number of Answers with Guarantees for Structured Queries in P2P Databases, CIKM 2008,
Napa, USA.
[WIDM08]: M. Karnstedt, K. Sattler, M. Haß, M. Hauswirth, B. Sapkota, R. Schmidt:
Approximating Query Completeness by Predicting the Number of Answers in DHT-based Web
Applications, WIDM'08 icw CIKM'08, Napa, USA, 2008.
[Ideas08]: M. Karnstedt, K.Sattler, M. Hauswirth, B. Sapkota, R. Schmidt: Ad-hoc Integration and
Querying of Semantic Web Data, Ideas 2008, Coimbra, Portugal.
[DBRank07]: M. Karnstedt, J. Müller, K. Sattler, Cost-Aware Skyline Queries in Structured
Overlays, DBRank‘07@ICDE 2007, Istanbul, Turkey.
[ICDE07]: M. Karnstedt, K.Sattler, M. Richtarsky, J. Müller, M. Hauswirth, R. Schmidt, R. John:
UniStore: Querying a DHT-based Universal Storage, ICDE 2007 Demonstration.
[P2P07]: M. Karnstedt, K.Sattler, R. Schmidt: Completeness Estimation of Range Queries in
Structured Overlays, P2P 2007, Galway, Ireland.
[P2P06]: M. Karnstedt, K. Sattler, M. Hauswirth, R. Schmidt: Cost-Aware Processing of Similarity
Queries in Structured Overlays, P2P 2006, Cambridge, UK
[NetDB06]: M. Karnstedt, K. Sattler, M. Hauswirth, R. Schmidt: Similarity Queries on Structured
Data in Structured Overlays, NetDB'06 @ ICDE 2006, Atlanta, GA.
[Gilbert et al. 2002]: S. Gilbert and N. Lynch. Brewer’s conjecture and the feasibility of
consistent, available, partition-tolerant web services. SIGACT News, 33(2):51–59, 2002.
[Datta et al. 2005]: A. Datta, M. Hauswirth, R. Schmidt, R. John, and K. Aberer. Range queries in
trie- structured overlays. In P2P’05, pages 57–66, 2005.
[Papadimos et al. 2002]: V. Papadimos, D. Maier. Mutant Query Plans. Information and
Software Technology, 44(4):197–206, April 2002.
[Schallehn et al. 2004]: E. Schallehn, I. Geist, K. Sattler: Supporting Similarity Operations based
on Approximate String Matching on the Web, CoopIS 2004, Larnaca.
[Gravano et al. 2001]: L. Gravano et al.; Approximate String Joins in a Database (almost) for
VLDB 2001, Roma.
180Free,
of XYZ