What is P2P?

Transcript What is P2P?

Digital Enterprise Research Institute
Introduction to Peer-to-Peer
Networks
Manfred Hauswirth, Marcel Karnstedt
 Copyright 2009 Digital Enterprise Research Institute. All rights reserved.
www.deri.ie
Goals of the Tutorial
Digital Enterprise Research Institute

Position the P2P paradigm
in the design space of
distributed systems

Get an overview of P2P
systems and the underlying
concepts

Understand the problem of
decentralized data
management in P2P
systems
www.deri.ie
What is P2P?
Digital Enterprise Research Institute

Clay Shirkey (The Accelerator Group):

“Peer-to-peer is a class of applications that take advantage of
resources—storage, cycles, content, human presence—
available at the edges of the Internet. Because accessing
these decentralized resources means operating in an
environment of unstable connectivity and unpredictable IP
addresses, peer-to-peer nodes must operate outside the DNS
and have significant or total autonomy of central servers.”

P2P “litmus test:”
– Does it allow for variable connectivity and temporary network
addresses?
– Does it give the nodes at the edges of the network significant
autonomy?

www.deri.ie
P2P ~ an application-level Internet on top of the Internet
P2P in a historical Context
Digital Enterprise Research Institute

www.deri.ie
The original Internet was designed as a P2P system

any 2 computers could send packets to each other
– no firewalls / no network address translation
– no asymmetric connections (V.90, ADSL, cable, etc.)



the back-then “killer apps” FTP and telnet are C/S but anyone
could telnet/FTP anyone else

servers acted as clients and vice versa

cooperation was a central goal and “value”: no spam or
exhaustive bandwidth consumption
Typical examples of “old-fashioned P2P”:

Usenet News

DNS
The emergence of P2P can be seen as a renaissance
of the original Internet model
What is P2P?
Digital Enterprise Research Institute



Every participating node acts as both
a client and a server (“servent”)
Every node “pays” its participation by
providing access to (some of) its
resources
Properties:

no central coordination

no central database

no peer has a global view of the
system

global behavior emerges from local
interactions

all existing data and services are
accessible from any peer

peers are autonomous

peers and connections are unreliable
www.deri.ie
Where is P2P – System layers ?
Digital Enterprise Research Institute

Users


uses
E-commerce systems can be P2P
or centralized
QoS
Application
Information management


User
Commerce and society is P2P
Application layer


www.deri.ie
Directories and databases can be
P2P or centralized
exploits
Information
Management
Networks are P2P

QoS
Internet
exploits
Qos
Network
Types of P2P Systems
Digital Enterprise Research Institute

E-commerce systems


File sharing systems


Napster, Gnutella, Freenet, …
Distributed Databases


eBay, B2B market places, B2B integration servers, …
Mariposa [Stonebraker96], …
Networks

Arpanet

Mobile ad-hoc networks
www.deri.ie
How much P2P is involved?
Digital Enterprise Research Institute
www.deri.ie
P2P user
interaction
P2P application P2P
information
management
eBay
yes
no
no
Napster
yes
yes
no
Gnutella,
Freenet
yes
yes
yes
Related Approaches
Digital Enterprise Research Institute
Related distributed information system approaches:

Event-based systems

Push systems

Mobile agents

Distributed databases
www.deri.ie
Event-based (publish/subscribe)
Digital Enterprise Research Institute

www.deri.ie
System model

Components (peers) interact by generating and receiving events

Components declare interest in receiving specific (patterns of)
events and are notified upon their occurrence

Supports a highly flexible interaction between loosely-coupled
components
XY
Subscribe to
X followed by Y
Y
X
Event-based vs. Peer-to-Peer
Digital Enterprise Research Institute


Common properties:

symmetric communication style

dynamic binding between producers and consumers
Subscription to events ~ “passive” queries

EB: notification

P2P: active discovery
Subscription language supports more sophisticated
queries and pattern matching (event patterns with
time dependencies)
 Event-based systems typically have a specialized
event distribution infrastructure


EB: 2 node types, P2P: 1 node type

EB infrastructure must be deployed
www.deri.ie
Push Systems
Digital Enterprise Research Institute
A set of designated
broadcasters offer information
that is pre-grouped in channels
(weather, news, etc.)
 Receivers subscribe to channels
of their interest and receive
channel information as it is
being “broadcast” (timely
distribution)
 Receivers may have to pay prior
to receiving the information
(pay-per-view, flat fee, etc.)
 Pull  push

www.deri.ie
Push Systems vs. Peer-to-Peer
Digital Enterprise Research Institute
www.deri.ie
Asymmetric communication style (P2P: symmetric)
 Focus is on timely data distribution not on discovery
 Filtering may be deployed to reduce data
transmission requirements
 Subscription to channels is prerequisite
 Producer/consumer binding is static
 Push systems require a specialized distribution
infrastructure


Push: 3 node types, P2P: 1 node type

Push infrastructure must be deployed
Mobile Agents
Digital Enterprise Research Institute


A mobile agent is a computational entity
that moves around in a network at its
own volition to accomplish a task on
behalf of its owner

can cooperate with other agents

“learns” (“Whom to visit next?”)
Mobility (heterogeneous network!)

Weak: code, data

Strong: code, data, execution Stack
www.deri.ie
Mobile Agents vs. Peer-to-Peer
Digital Enterprise Research Institute



www.deri.ie
Very similar in terms of search and navigation

P2P: the peers propagate requests (search, update)

MA: the nodes propagate the agents

Mobile agent ~ “active” query
Mobile agent systems require a considerably more
sophisticated environment

mobile code support (heavy)

security (protect the receiving node from malicious mobile
agents and vice versa)
In many domains P2P systems can take over

more apt for distributed data management

less requirements (sending code requires much
bandwidth, security, etc.)
Distributed Databases
Digital Enterprise Research Institute





www.deri.ie
Fragmenting large databases (e.g., relational) over
physically distributed nodes
Efficient processing of complex queries (e.g., SQL)
by decomposing them
Efficient update strategies (e.g., lazy vs. eager)
Consistent transactions (e.g., 2 phase commit)
Normally approaches rely on central coordination
Distributed Databases vs. Peer-to-Peer
Digital Enterprise Research Institute



Data distribution is a key issue for P2P systems
Approaches in distributed DB that address
scalability

LH* family of scalable hash index structures [Litwin97]

Snowball: scalable storage system for workstation clusters
[Vingralek98]

Fat-Btree: a scalable B-Tree for parallel DB [Yokota 9]
Approaches in distributed DB that address
autonomy (and scalability)


www.deri.ie
Mariposa: distributed relational DBMS based on an
underlying economic model [Stonebraker96]
P2P Data Management has to address both
scalability and autonomy
Usage Patterns to Position P2P
Digital Enterprise Research Institute
www.deri.ie
Discovering information is the predominant problem
 Occasional discovery: search engines
P2P, MA



Notification: event-based systems




push
notification for (correlated) events (event patterns)
E.g., notify me when my stocks drop below a threshold
Systematic discovery: P2P systems


ad hoc requests, irregular
E.g., new town — where is the next car rental?
search engines, MA
find certain type of information on a regular basis
E.g., search for MP3 files of Jethro Tull regularly
Continuous information feed: push systems


event-based
subscription to a certain information type
E.g., sports channel, updates are sent as soon as available
The Interaction Spectrum
Digital Enterprise Research Institute
Event-based systems
Push systems
passive
www.deri.ie
Mobile agents
Peer-to-peer systems
active
Peer-to-Peer vs. C/S and Web
Digital Enterprise Research Institute
www.deri.ie
Client-Server
Peer-to-Peer
Sessionbased
Web-based
tight
loose
very loose
asymmetric
asymmetric
symmetric
Number of
Clients
moderate
(1000)
high
(1,000,000)
high (1,000,000)
Number of
Servers
few (10)
many
(100,000)
none (0)
Coupling
Comm.
Style
Coupling vs. Scalability
Digital Enterprise Research Institute
Coupling
www.deri.ie
session-based
push-based
event-based
web-based
peer-to-peer
Scalability
P2P System Models
Digital Enterprise Research Institute

www.deri.ie
Centralized model

global index held by a central authority
(single point of failure)



direct contact between requestors and providers

Example: Napster
Decentralized model

Examples: Freenet, Gnutella

no global index, no central coordination, global behavior emerges
from local interactions, etc.

direct contact between requestors and providers (Gnutella) or
mediated by a chain of intermediaries (Freenet)
Hierarchical model

introduction of “super-peers”

mix of centralized and decentralized model
Centralized Information Systems
Digital Enterprise Research Institute

Web search engine


www.deri.ie
Client
Global scale application
Client
Example: Google

150 Mio searches/day

1-2 Terabytes of data
Client
(April 2001)
Result
Find
home page of Karl Aberer …
"aberer"


Client
Client
1
2
Google
Server
Client
Client
Strengths

Global ranking

Fast response time
Weaknesses
Client
Client
Client
Google: 15000 servers

Infrastructure, administration, cost

A new company for every global application ?
(Semi-)Decentralized Information Systems
Digital Enterprise Research Institute

P2P Music file sharing


www.deri.ie
Global scale application
Peer
Example: Napster

1.57 Mio. Users

10 TeraByte of data
(2 Mio songs, 220 songs per user)
(February 2001)
Find
Request
and transfer
file
<title> "brick
in the wall"
Result
f.mp3
<artist>
you
find"pink
f.mp3floyd"
at peer x
from
peer
X
directly
<size> "1 MB"
schema
<category> "rock"
PeerX
Peer
3
Peer
Peer
1
2
Napster
Server
Peer
Peer
Peer
Peer
Peer
Napster: 100 servers
Lessons Learned from Napster
Digital Enterprise Research Institute

www.deri.ie
Strengths: Resource Sharing

Every node “pays” its participation by providing access to its resources
– physical resources (disk, network), knowledge (annotations), ownership (files)
Every participating node acts as both a client and a server (“servent”): P2P
 global information system without huge investment
 decentralization of cost and administration = avoiding resource bottlenecks


Weaknesses: Centralization
server is single point of failure
 unique entity required for controlling the system = design bottleneck
 copying copyrighted material made Napster target of legal attack

increasing degree of resource sharing and decentralization
Centralized
System
Decentralized
System
Fully Decentralized Information Systems
Digital Enterprise Research Institute

P2P file sharing


www.deri.ie
•
Strengths
Global scale application
Example: Gnutella

– Good response time, scalable
– No infrastructure, no administration
– No single point of failure
40.000 nodes, 3 Mio files • Weaknesses
(August 2000)
– High network traffic
– No structured search
– Free-riding
I have
Find
"brick_in_the_wall.mp3"
"brick in the wall"
….
Self-organizing System
Gnutella: no servers
Self-Organization
Digital Enterprise Research Institute

www.deri.ie
Self-organized systems well known from physics,
biology, cybernetics

distribution of control ( = decentralization = symmetry in
roles = P2P)

local interactions, information and decisions

emergence of global structures

failure resilience
P2P Architectures
Digital Enterprise Research Institute


www.deri.ie
Principle of self-organization can be applied at
different system layers
Networking
Layer
Internet
Routing
TCP/IP,
DNS
Data Access
Layer
Overlay
Networks
Resource
Location
Gnutella,
FreeNet
Service Layer
P2P
applications
Messaging,
Distributed
Processing
Napster,
Seti,
Groove
User Layer
User
Communities
Collaboration
eBay, Ciao
Original Internet designed as decentralized system:

P2P overlay networks ~ application-level Internet on top of
the Internet

support application-specific addresses
Resource Location in P2P Systems
Digital Enterprise Research Institute

www.deri.ie
Problem: Peers need to locate distributed information
Peers with address p store data items d that are identified by a key kd
 Given a key kd (or a predicate on kd) locate a peer that stores d,
i.e. locate the index information (kd, p)
 Thus, the data we have to manage consists of the key-value pairs (kd, p)


Can such a distributed database be maintained and accessed by a
set of peers without central control ?
P2
P3
P1
P8
kd="jingle-bells"
p="P8"
d="jingle-bells.mp3""
kd ="jingle" ?
P4
("jingle",P8)
P7
P5
P6
Resource Location Problem
Digital Enterprise Research Institute

Operations
search for a key at a peer: p->search(k)
 update a key at a peer: p->update(k,p')
 peers joining and leaving the network: p->join(p’)


Performance Criteria (for search)
search latency:
e.g. searchtime(query)  Log(size(database))
 message bandwidth, e.g. messages(query)  Log(size(database))
messages(update)  Log(size(database))
 storage space used, e.g. storagespace(peer)  Log(size(database))
 resilience to failures (network, peers)


Qualitative Criteria





complex search predicates: equality, prefix, containment, similarity
search
use of global knowledge
peer autonomy
peer anonymity and trust
security (e.g. denial of service attacks)
www.deri.ie
Summary
Digital Enterprise Research Institute

What is a P2P System ?

What is emergence ?

At which layers can the P2P architecture occur ?

How do we define efficiency for a P2P resource
location system ?
www.deri.ie
Unstructured P2P Overlay Networks
Digital Enterprise Research Institute

No index information is used


www.deri.ie
i.e. the information (k, p) is only available directly from p
Simplest approach: Message Flooding (Gossiping)

send query message to C neighbors

messages have limited time-to-live TTL

messages have IDs to eliminate cycles
k="jingle-bells"
Example: C=3, TTL=2
Gnutella
Digital Enterprise Research Institute

Developed in a 14 days “quick hack” by Nullsoft (winamp)


Originally intended for exchange of recipes
Evolution of Gnutella






www.deri.ie
Published under GNU General Public License on the Nullsoft web server
Taken off after a couple of hours by AOL (owner of Nullsoft)
This was enough to “infect” the Internet
Gnutella protocol was reverse engineered from downloaded versions of
the original Gnutella software
Third-party clients were published and Gnutella started to spread
Based on message flooding
Typical values C=4, TTL=7
TTL
 One request leads to 2 *  C * (C  1)i  26,240 messages
i 0
 Hooking up to the Gnutella systems requires that a new peer knows at
least one Gnutella host (gnutellahosts.com:6346; outside the Gnutella
protocol specification)
 Neighbors are found using a basic discovery protocol (ping-pong
messages)

Gnutella: Protocol Message Types
Digital Enterprise Research Institute
Type
Ping
Pong
Query
QueryHit
Push
Description
www.deri.ie
Contained Information
Announce availability and probe for None
other servents
Response to a ping
IP address and port# of responding servent;
number and total kb of files shared
Search request
Minimum network bandwidth of responding
servent; search criteria
Returned by servents that have
IP address, port# and network bandwidth of
the requested file
responding servent; number of results and
result set
File download requests for
Servent identifier; index of requested file; IP
servents behind a firewall
address and port to send file to
Gnutella: Meeting Peers (Ping/Pong)
Digital Enterprise Research Institute
www.deri.ie
C
A
B
A’s ping
B’s pong
C’s pong
D’s pong
E’s pong
D
E
Gnutella: Searching (Query/QueryHit/GET)
Digital Enterprise Research Institute
GET X.mp3
www.deri.ie
X.mp3
X.mp3
C
A
B
A’s query (e.g., X.mp3)
C’s query hit
E’s query hit
D
E
X.mp3
Popularity of Queries
[Sripanidkulchai01]
Digital Enterprise Research Institute



Very popular documents are approximately equally popular
Less popular documents follow a Zipf-like distribution (i.e.,
the probability of seeing a query for the ith most popular
query is proportional to 1/(ialpha)
Access frequency of web documents also follows Zipf-like
distributions  caching might work for Gnutella
www.deri.ie
Free-riding on Gnutella
[Adar00]
Digital Enterprise Research Institute




www.deri.ie
24 hour sampling period:

70% of Gnutella users share no files

50% of all responses are returned by top 1% of sharing hosts
A social problem not a technical one
Problems:

Degradation of system performance: collapse?

Increase of system vulnerability

“Centralized” (“backbone”) Gnutella  copyright issues?
Verified hypotheses:

H1: A significant portion of Gnutella peers are free riders.

H2: Free riders are distributed evenly across domains

H3: Often hosts share files nobody is interested in (are not
downloaded)
Free-riding Statistics - 1
[Adar00]
Digital Enterprise Research Institute


H1: Most Gnutella users are free riders
Of 33,335 hosts:

22,084 (66%) of the peers share no files

24,347 (73%) share ten or less files

Top 1 percent (333) hosts share 37% (1,142,645) of total files shared

Top 5 percent (1,667) hosts share 70% (1,142,645) of total files shared

Top 10 percent (3,334) hosts share 87% (2,692,082) of total files shared
www.deri.ie
Free-riding Statistics - 2
[Adar00]
Digital Enterprise Research Institute


H3: Many servents share files nobody downloads
Of 11,585 sharing hosts:

Top 1% of sites provide nearly 47% of all answers

Top 25% of sites provide 98% of all answers

7,349 (63%) never provide a query response
www.deri.ie
Topology of Gnutella
[Jovanovic01]
Digital Enterprise Research Institute


Small-world properties verified (“find everything close by”)
Backbone + outskirts
www.deri.ie
Gnutella Backbone
Digital Enterprise Research Institute
[Jovanovic01]
www.deri.ie
Categories of Queries
Digital Enterprise Research Institute

Categorized top 20 queries
[Sripanidkulchai01]
www.deri.ie
Caching in Gnutella
[Sripanidkulchai01]
Digital Enterprise Research Institute

Average bandwidth consumption in tests: 3.5Mbps

Best case: trace 2 (73% hit rate = 3.7 times traffic
reduction)
www.deri.ie
Gnutella: Bandwidth Barriers
Digital Enterprise Research Institute


www.deri.ie
Clip2 measured Gnutella over 1 month:

typical query is 560 bits long (including TCP/IP headers)

25% of the traffic are queries, 50% pings, 25% other

on average each peer seems to have 3 other peers actively
connected
Clip2 found a scalability barrier with substantial
performance degradation if queries/sec > 10:
10 queries/sec
* 560 bits/query
*
4 (to account for the other 3 quarters of message traffic)
*
3 simultaneous connections
67,200 bps
 10 queries/sec maximum in the presence of many dialup users
 won’t improve (more bandwidth - larger files)
Gnutella: Summary
Digital Enterprise Research Institute











www.deri.ie
Completely decentralized
Hit rates are high
High fault tolerance
Adopts well and dynamically to changing peer populations
Protocol causes high network traffic (e.g., 3.5Mbps). For example:

4 connections C / peer, TTL = 7

1 ping packet can cause 2 * i 0 C * (C  1)i  26,240 packets
TTL
No estimates on the duration of queries can be given
No probability for successful queries can be given
Topology is unknown  algorithms cannot exploit it
Free riding is a problem
Reputation of peers is not addressed
Simple, robust, and scalable (at the moment)
Modern Gnutella
Digital Enterprise Research Institute

Lots of improvements

Hybrid Super-Peer architecture

“Gnutella + DHT”
www.deri.ie
Improvements of Message Flooding
Digital Enterprise Research Institute

www.deri.ie
Expanding Ring
start search with small TTL (e.g. TTL = 1)
 if no success iteratively increase TTL (e.g. TTL = TTL +2)


k-Random Walkers
forward query to one randomly chosen neighbor only, with large
TTL
 start k random walkers
 random walker periodically checks with requester whether to
continue

Discussion Unstructured Networks
Digital Enterprise Research Institute

www.deri.ie
Performance

Search latency: low (graph properties)

Message Bandwidth: high
– improvements through random walkers, but essentially the
whole network needs to be explored


Storage cost: low (only local neighborhood)

Update and maintenance cost: low (only local updates)

Resilience to failures good: multiple paths are explored
and data is replicated
Qualitative Criteria

search predicates: very flexible, any predicate is possible

global knowledge: none required

peer autonomy: high
Summary
Digital Enterprise Research Institute
www.deri.ie

How are unstructured P2P networks characterized ?

What is the purpose of the ping/pong messages in
Gnutella ?

Why is search latency in Gnutella low ?

Which are methods to reduce message bandwidth
in unstructured networks ?
Hierarchical P2P Overlay Networks
Digital Enterprise Research Institute


www.deri.ie
Servers provide index information, i.e. the
information (k, p) is available from dedicated
servers
Simplest Approach

one central server

user register files

service (file exchange) is organized
as P2P architecture
index server
k="jingle-bells"
Napster
Digital Enterprise Research Institute




www.deri.ie
Central (virtual) database which holds an index of offered
MP3/WMA files
Clients connect to this server, identify themselves (account)
and send a list of MP3/WMA files they are sharing (C/S)
Other clients can search the index and learn from which
clients they can retrieve the file (P2P)
Additional services at server (chat etc.)
Napster Server
register
(user, files)
A
“A has X.mp3”
Download X.mp3
B
Superpeer Networks
Digital Enterprise Research Institute


Improvement of Central Index Server (Morpheus, Kaaza)

multiple index servers build a P2P network

clients are associated with one (or more) superpeers

superpeers use message flooding to forward search requests
Experiences

redundant superpeers
are good

superpeers should have
high outdegree (>20)

TTL should be minimized
www.deri.ie
Discussion
Digital Enterprise Research Institute

www.deri.ie
Performance

Search latency: very low (index)

Message Bandwidth: low
– with superpeers flooding occurs, but the number of
superpeers is comparatively small


Storage cost: low at client, high at index server

Update cost: low (no replication)

Resilience to failures: bad (system has single-point of
failure)
Qualitative Criteria

search predicates: very flexible, any predicate is possible

global knowledge: server

peer autonomy: low
Summary
Digital Enterprise Research Institute
www.deri.ie

Which are the two levels of P2P networks in
superpeer networks, and to which functional layers
are they related ?

Which problem of distribution is avoided in
superpeer networks and addressed in structured
network ? What is the impact on the relation
between nodes and functional layers ?
Structured P2P Overlay Networks
Digital Enterprise Research Institute




Unstructured overlay networks – what we learned

simplicity (simple protocol)

robustness (almost impossible to “kill” – no central
authority)
Performance

search latency O(log n), n number of peers

update and maintenance cost low
Drawbacks

tremendous bandwidth consumption for search

free riding
Can we do better?
www.deri.ie
Efficient Resource Location
Digital Enterprise Research Institute
www.deri.ie
FULL REPLICATION
update cost
high
low
low
search cost
low
STRUCTURED P2P OVERLAY
NETWORKS
(e.g. prefix routing)
high
UNSTRUCTURED P2P
OVERLAY NETWORKS
(e.g. Gnutella)
high
SERVER
(e.g. Napster)
maximal bandwidth
Distribution of Index Information
Digital Enterprise Research Institute



www.deri.ie
Goal: provide efficient search using few messages without using
designated servers
Easy: distribution of index information over all peers, i.e. every peer
maintains and provides part of the index information (k, p)
Difficult: distributing the data access structure to support efficient
search
Search starts here
server
Where to start the search?
?
data access
structure
index information I
I1
I2
I3
I4
peers (storing data and index information)
peers (storing data)
Approaches
Digital Enterprise Research Institute



www.deri.ie
Different strategies

P-Grid: distributing a binary search tree

Chord: constructing a distributed hash table

CAN: Routing in a d-dimensional space

Freenet: caching index information along search paths
Commonalities

each peer maintains a small part of the index information
(routing table)

searches performed by directed message forwarding
Differences

performance and qualitative criteria
P-Grid
Digital Enterprise Research Institute
www.deri.ie
Search tree (prefix tree)
???
extra
data
?
101
0??
00?
000
001
1??
01?
010
011
N objects  log2(N) steps
10?
100
?
101
!
101
101
?
101
11?
110
111
Scalable data access structures
Digital Enterprise Research Institute

Assume number of data objects >> storage of one
node



www.deri.ie
Distributed storage
Given a data access structure

Size of data access structure = number of data objects

Size of data access structure >> storage of one node
Problem: where to store?
Non-scalable Distribution of Search Tree
Digital Enterprise Research Institute
•
www.deri.ie
Distribute search tree over peers
bottleneck
???
0??
00?
000
001
peer 1
1??
01?
010
011
peer 2
10?
100
101
peer 3
11?
110
111
peer 4
Scalable Distribution of Search Tree
Digital Enterprise Research Institute
www.deri.ie
"Napster"
bottleneck
???
0??
00?
000
001
peer 1
1??
01?
010
011
peer 2
10?
100
101
peer 3
11?
110
111
peer 4
Scalable data access structures
Digital Enterprise Research Institute
www.deri.ie
Associate each peer with a complete path
???
0??
00?
000
001
peer 1
1??
01?
010
011
peer 2
10?
100
101
peer 3
11?
110
111
peer 4
Scalable data access structures
Digital Enterprise Research Institute
www.deri.ie
???
1??
peer 1 peer 2
knows more about
this part of the tree
10?
100
101
peer 3
peer 4
knows more about
this part of the tree
The result is P-Grid
Digital Enterprise Research Institute
www.deri.ie
101
?
Peers cooperate in search
???
101
?
1??
peer 1 peer 2
???
peer 1 peer 2
101
?
101 1??
?
10?
100
101
peer 3
Message
to peer 3
101 ?
peer 3
11?
110
111
peer 4
peer 4
! 101
Construction
Digital Enterprise Research Institute

www.deri.ie
Splitting Approach (P-Grid)
peers meet and decide whether to extend search tree by splitting
the data space
 peers can perform load balancing considering their storage load
 networks with different origins can merge, like Gnutella, Freenet
(loose coupling)


Node Insertion Approach (Chord, CAN, …)
peers determine their "leaf position" based on their IP address
 nodes route from a gateway node to their node-id to populate
the routing table
 network has to start from single origin (strong coupling)


Replication of data items and routing table entries is used to
increase failure resilience
P-Grid Discussion
Digital Enterprise Research Institute


Performance

Search latency: O(log n) (with high probability, provable)

Message Bandwidth: O(log n) (selective routing)

Storage cost: O(log n) (routing table)

Update cost: low (like search)
Qualitative Criteria

search predicates: prefix searches

global knowledge: key hashing

peer autonomy: peers can locally decide on their role
(splitting decision)
www.deri.ie
DHT example: Chord
Digital Enterprise Research Institute

Hashing of search keys AND peer addresses on binary keys of
length m


www.deri.ie
e.g. m=8, key("jingle-bells.mp3")=17, key(196.178.0.1)=3
Data keys are stored at next larger node key
peer with hashed identifier p,
data with hashed identifier k, then
k  ] predecessor(p), p ]
p
k
predecessor
m=8
stored
32 keys
at
p2
p3
Search possibilities
1. every peer knows every other
O(n) routing table size
2. peers know successor
O(n) search cost
Routing Tables
Digital Enterprise Research Institute

www.deri.ie
Every peer knows m peers with exponentially
increasing distance
p p+1
p+2
Each peer p stores a routing table
First peer with hashed identifier si such that
si =successor(p+2i-1) for i=1,..,m
We write also si = finger(i, p)
p+4
s1, s2, s3
s5
p2
p+8
s4
p3
p4
p+16
i
si
1
p2
2
p2
3
p2
4
p3
5
p4
Search
O(log n) routing table size
Search
Digital Enterprise Research Institute
www.deri.ie
search(p, k)
find in routing table largest (i, p*) such that p*  [p,k[
/* largest peer key smaller than the searched data key */
if such a p* exists then search(p*, k)
else return (successor(p)) // found
p p+1
p+2
p+4
s1, s2, s3
s5
k2
p2
p+8
s4
k1
p3
p4
p+16
Search
O(log n) search cost
RT with exp. increasing
distance  O(log n) with high
probability
Node Insertion
Digital Enterprise Research Institute

www.deri.ie
New node q joining the network

q asks existing node p to find predecessor and fingers

cost: O(log2 n)
p p+1
p+2
q
p+4
p2
p+8
p3
p4
p+16
routing table
of p
routing table
of q
i
si
i
si
1
q
1
p2
2
q
2
p2
3
p2
3
p3
4
p3
4
p3
5
p4
5
p4
Load Balancing in Chord
Digital Enterprise Research Institute
Network size n=10^4
5 10^5 keys
uniform data distribution
50 keys per node?
NO, as IP addresses do
not map uniformly into
data key space.
www.deri.ie
Length of Search Paths
Digital Enterprise Research Institute
Network size n=2^12
100 2^12 keys
Path length ½ Log2(n)
RTs can be seen
as an embedding
of search trees
into the network
and thus search
starts at a randomly
selected tree depth
www.deri.ie
Chord Discussion
Digital Enterprise Research Institute


Performance

Search: like P-Grid

Node join/leave cost: O(log2 n)

Resilience to failures: replication to successor nodes
Qualitative Criteria

search predicates: equality of keys only

global knowledge: key hashing, network origin

peer autonomy: nodes have by virtue of their address a
specific role in the network
www.deri.ie
Topological Routing (CAN)
Digital Enterprise Research Institute

www.deri.ie
Based on hashing of keys into a d-dimensional space (a torus)
Each peer is responsible for keys of a subvolume of the space (a zone)
 Each peer stores the adresses of peers responsible for the neighboring
zones for routing
 Search requests are greedily forwarded to the peers in the closest zones


Assignment of peers to zones depends on a random selection made
by the peer
Network Search and Join
Digital Enterprise Research Institute
www.deri.ie
Node 7 joins the network by choosing a coordinate in the volume of 1
=> O(d) updates or RTs
CAN Refinements
Digital Enterprise Research Institute


www.deri.ie
Multiple Realities

We can have r different coordinate spaces

Nodes hold a zone in each of them

Creates r replicas of the (key, value) pairs

Increases robustness

Reduces path length as search can be continued in the
reality where the target is closest
Overloading zones

Different peers are responsible for the same zone

Splits are only performed if a maximum occupancy (e.g. 4)
is reached

Nodes know all other nodes in the same zone

But only one of the neighbors
CAN Path Length
Digital Enterprise Research Institute
www.deri.ie
CAN Discussion
Digital Enterprise Research Institute

Performance







www.deri.ie
Search latency: O(d n1/d), depends on choice of d (with high
probability, provable)
Message Bandwidth: O(d n1/d), (selective routing)
Storage cost: O(d) (routing table)
Update cost: low (like search)
Node join/leave cost: O(d n1/d)
Resilience to failures: realities and overloading
Qualitative Criteria



search predicates: spatial distance of multidimensional keys
global knowledge: key hashing, network origin
peer autonomy: nodes can decide on their position in the key
space
Dynamical Clustering (Freenet)
Digital Enterprise Research Institute

www.deri.ie
Freenet Background
P2P system which supports publication, replication, and retrieval
of data
 Protects anonymity of authors and readers: infeasible to
determine the origin or destination of data
 Nodes are not aware of what they store (keys and files are sent
and stored encrypted)
 Uses an adaptive routing and caching strategy


Index information maintained at each peer (limited cache size)
Key
Data
Address
8e47683isdd0932uje89
456r5wero04d903iksd0
f3682jkjdn9ndaqmmxia
wen09hjfdh03uhn4218
712345jb89b8nbopledh
d0ui43203803ujoejqhh
ZT38hwe01h02hdhgdzu
Rhweui12340jhd091230
eqwe1089341ih0zuhge3
erwq038382hjh3728ee7
tcp/125.45.12.56:6474
tcp/67.12.4.65:4711
tcp/127.156.78.20:8811
tcp/78.6.6.7:2544
tcp/40.56.123.234:1111
tcp/128.121.89.12:9991
Freenet Routing
Digital Enterprise Research Institute

www.deri.ie
If a search request arrives
Either the data is in the table
 Or the request is forwarded to the addresses with the most
similar keys (lexicographic similarity, edit distance) till an answer
is found or TTL reached (e.g. TTL = 500)


If an answer arrives
The key, address and data of the answer are inserted into the
table
 The least recently used key and data is evicted


Quality of routing should improve over time
Node is listed under certain key in routing tables
 Therefore gets more requests for similar keys
 Therefore tends to store more entries with similar keys
(clustering) when receiving results and caching them
 Dynamic replication of data

Freenet Routing
Digital Enterprise Research Institute
www.deri.ie
peer p has k
peer p' has k'
search k
response (k,p)
new link established
search k', k' similar to k
Freenet: Inserting Files
Digital Enterprise Research Institute
www.deri.ie

First a the key of the file is calculated

An insert message with this proposed key and a
hops-to-live value is sent to the neighbor with the
most similar key

Then every peer checks whether the proposed key
is already present in its local store

yes  return stored file (original requester must propose
new key)

no  route to next peer for further checking (routing uses
the same key similarity measure as searching)

continue until hops-to-live are 0 or failure
Freenet: Evolution of Path Length
Digital Enterprise Research Institute

1000 identical nodes

max 50 data
items/node

max 200
references/node

Initial references:
(i-1, i-2, i+1, i+2) mod n

each time-step:

-
randomly insert
-
TTL=20
every 100 time-steps:
300 requests
(TTL=500) from
random nodes and
measure actual path
length (failure=500).
www.deri.ie
median path length 500  6
Freenet Discussion
Digital Enterprise Research Institute

www.deri.ie
Performance

Search latency: low (small world property)

Message Bandwidth: low (selective routing)

Storage cost: relatively low (experimentally not validated !)

Update cost: low (like search)
– but a bootstrapping phase is required


Resilience to failures: good (high degree of replication of
data and keys)
Qualitative Criteria

search predicates: with encryption only equality of keys

global knowledge: none

peer autonomy: high (with encryption risk of storing
undesired data)
Comparison
Digital Enterprise Research Institute
www.deri.ie
Paradigm
Search Type
Gnutella
Breadth-first
String
search on graph comparison
Freenet
Depth-first
Equality
search on graph
Chord
CAN
P-Grid
Implicit binary
search trees
d-dimensional
space
Binary prefix
trees
Search Cost
(messages)
TTL
2*  i 0 C *(C  1)i
O(Log n) ?
Equality
O(Log n)
Equality
O(d n^(1/d))
Prefix
O(Log n)
Small World Graphs
Digital Enterprise Research Institute

Each P2P system can be
interpreted as a directed graph
(overlay network)



peers correspond to nodes
routing table entries as directed
links
Task

Find a decentralized algorithm
(greedy routing) to route a
message from any node A to any
other node B with few hops
compared to the size of the graph

Requires the existence of short
paths in the graph
www.deri.ie
Milgram’s Experiment
Digital Enterprise Research Institute

Finding short chains of acquaintances
linking pairs of people in USA who didn’t
know each other;

Source person in Nebraska

Sends message with first name and location

Target person in Massachusetts.

Average length of the chains that were
completed was between 5 and 6 steps
“Six degrees of separation” principle

BIG QUESTION:


WHY there should be short chains of acquaintances
linking together arbitrary pairs of strangers???
www.deri.ie
Random Graphs
Digital Enterprise Research Institute


www.deri.ie
For many years typical explanation was - random
graphs

Low diameter: expected distance between two nodes is
logkN, where k is the outdegree and N the number of nodes

When pairs or vertices are selected uniformly at random
they are connected by a short path with high probability
But there are some inaccuracies

If A and B have a common friend C it is more likely that they
themselves will be friends! (clustering)

Many real world networks (social networks, biological
networks in nature, artificial networks – power grid, WWW)
exhibit this clustering property

Random networks are NOT clustered.
Clustering
Digital Enterprise Research Institute


Clustering measures the fraction of neighbors of a node that
are connected themselves
Regular Graphs have a high clustering coefficient


but also a high diameter
Random Graphs have a low clustering coefficient


www.deri.ie
but a low diameter
Both models do match the properties expected from real
networks!
Regular Graph (k=4)
Long paths

L ~ n/(2k)
Highly clustered

C~3/4
Random Graph (k=4)
Short path length

L~logkN
Almost no clustering

C~k/n
Small-World Networks
Digital Enterprise Research Institute

www.deri.ie
Random rewiring of regular graph (by Watts and
Strogatz)

With probability p rewire each link in a regular graph to a
randomly selected node

Resulting graph has properties, both of regular and
random graphs
– High clustering and short path length

Freenet has been shown to result in small world graphs
Flashback: Freenet Search Performance
Digital Enterprise Research Institute



www.deri.ie
Modifying routing tables in Freenet through
caching has a "rewiring effect"
Studies show that Freenet graphs have small-world
properties
Explains improving search performance
Regular graph:
n nodes, k nearest neighbors
 path length ~ n/2k
4096/16 = 256
Rewired graph (1% of nodes):
path length ~ random graph
clustering ~ regular graph
Small World Graph
Random graph:
path length ~ log (n)/log(k)
~4
Search in Small World Graphs
Digital Enterprise Research Institute

BUT! Watts-Strogatz can provide a model for
the structure of the graph
 existence
 high

www.deri.ie
of short paths
clustering
It does not explain how the shortest paths
are found
 also
Gnutella networks are small-world graphs
 why
can search be efficient in Freenet?
P2P Overlay Networks as Graphs
Digital Enterprise Research Institute


Each P2P system can be
interpreted as a directed
graph …

peers correspond to nodes

routing table entries as directed
links
… embedded in some space

P-Grid: interval [0,1]

Chord: ring [0,1)

CAN: d-dimensional torus

Freenet: strings + lexicographical
distance
www.deri.ie
Kleinberg’s Small-World Model
Digital Enterprise Research Institute

www.deri.ie
Kleinberg’s Small-World’s model
Embed the graph into an r-dimensional grid
 constant number p of short range links (neighborhood)
 q long range links: choose long-range links such that the probability
to have a long range contact is proportional to 1/dr


Importance of r !

Decentralized (greedy) routing performs best iff. r = dimension of
space
r=2
Influence of “r” (1)
Digital Enterprise Research Institute
www.deri.ie
1
•
Each peer u has link to the peer v with probability proportional to d ( u ,v ) r
where d(u,v) is the distance between u and v.
•
Optimal value: r = dim = dimension of the space
•
•
•
If r < dim we tend to choose more far away neighbors (decentralized
algorithm can quickly approach the neighborhood of target, but then slows
down till finally reaches target itself).
If r > dim we tend to choose more close neighbors (algorithm finds quickly
target in it’s neighborhood, but reaches it slowly if it is far away).
When r = 0 – long range contacts are chosen uniformly. Random graph theory
proves that there exist short paths between every pair of vertices, BUT
there is no decentralized algorithm capable finding these paths
Influence of “r” (2)
Digital Enterprise Research Institute

www.deri.ie
Given node u if we can partition the remaining peers into sets A1, A2,
A3, … , AlogN , where Ai, consists of all nodes whose distance from u is
between 2i and 2i+1, i=0..log(N-1).

Then given r = dim each long range contact of u is nearly equally likely to
belong to any of the sets Ai

When q = log N – on average each node will have a link in each set of Ai
A1
A2
A
3
A
4
DHTs and Kleinberg model
Digital Enterprise Research Institute
P-Grid’s
model
Kleinberg’s
model
www.deri.ie
Conclusions from Kleinberg's Model
Digital Enterprise Research Institute

www.deri.ie
With respect to the Watts and Strogatz model
there is no decentralized algorithm capable performing effective search
in the class of SW networks constructed according to Watts and Strogatz
 J. Kleinberg presented the infinite family of Small World networks that
generalizes the Watts and Strogatz model and shows that decentralized
search algorithms can find short paths with high probability
 there exist only one unique model within that family for which
decentralized algorithms are effective.


With respect to overlay networks
Many of the structured P2P overlay networks are similar to Kleinberg’s
model (e.g. Chord, randomized version, q=log N, r=1)
 Unstructured overlay networks also fit into the model (e.g. Gnutella q=5,
r=0)
 Some variants of structured P2P overlay networks are having no
neighborhood lattice (e.g. P-Grid, p=0)
 Extensions to spaces beyond regular grids are possible (e.g. arbitrary
metric spaces)

Summary
Digital Enterprise Research Institute
www.deri.ie

How can we characterize P2P overlay networks such that we
can study them using graph-theoretic approaches?

What is the main difference between a random graph and a
SW graph?

What is the main difference between the Watts/Strogatz and
the Kleinberg model?

What is the relationship between structured overlay networks
and small world graphs?

What are possible variations of the small world graph model?
Specific problems: Identity management
Digital Enterprise Research Institute
www.deri.ie
Definition:
Consistent mapping of a set of attributes onto an
identifier in a unique, deterministic, and secure way


Identification is an essential building block in
distributed (information) systems
Examples: directory services

DNS: symbolic host names  IP address

X.500: distinguished name  object (attributes)

UDDI: query  web service specification
Identity management issues
Digital Enterprise Research Institute

www.deri.ie
Data management

Uniqueness of identifiers

Centralized vs. distributed data management
– Degree of decentralization (orthogonal to the distribution of
data!)



Update consistency
Security

Access permissions (+ management)

Requires unique identification

Resilience against attacks
Infrastructure

Third party?

Scalability, robustness, 24/7 availability, etc.
Use case: Dynamic IP addresses
Digital Enterprise Research Institute


www.deri.ie
Most computers on the Internet have a dynamic IP
address

Limited number of IP addresses

Dynamic Host Configuration Protocol (lease time)

Host mobility (physical mobility)
Problem for any system that builds a distributed
management structure on top of such networks
Use Case:
Management of dynamic IP addresses in structured
P2P systems (Chord, DKS, Pastry, P-Grid, etc.)
P2P systems and dynamic IP addresses
Digital Enterprise Research Institute


www.deri.ie
Structured P2P systems (Chord, Pastry, P-Grid, DKS,
etc.)

These systems construct a distributed index  routing
tables

Dynamic IP addresses  routing tables become
inconsistent  system can break down
Unstructured (Gnutella) and hierarchical (FastTrack)
systems

Less of a problem

But, they pay with
–
–
–
–
high bandwidth consumption, or
single point of failures, or
high infrastructure costs, or
etc.
Problems to address
Digital Enterprise Research Institute

www.deri.ie
How to find out that an IP address has become invalid?

No response
– Network problem or did the peer get a new address?

Response
– Is it still the same peer? (authenticity, replay, man-in-themiddle attacks)

Frequency of address changes is crucial



Peers can join and leave at any moment
IP address can change at any moment
Security: DOS attacks are very simple



Assume peers report back their new IP address
EvilHacker.org participates in the overlay and thus finds
out IP addresses
EvilHacker.org reports all IP addresses it finds pointing to
random hosts or itself
 A scalable and secure infrastructure is required
Problems of current P2P approaches
Digital Enterprise Research Institute





www.deri.ie
Third-party infrastructures are required
Very costly maintenance protocols
Maintenance protocols may compromise structural
properties (e.g., load-balancing)
Previous knowledge is lost (e.g., reputation of the
peer, QoS, etc.)
No current approach addresses security

Only the owner should be allowed to update the mapping

DOS, replay, man-in-the-middle, etc. are not addressed
IPv6 as an alternative
Digital Enterprise Research Institute




www.deri.ie
With IPv6 dynamic addresses or NAT are no longer
necessary

IPv6 address space is ~3,4 * 1038 (or 1030 addresses per
person on the planet)

IPv4 (current) address space is 232
IPsec (included in IPv6)

solves authentication problem

DOS attacks are more difficult
Mobility is addressed

IPv6: home/foreign address

IPv4: mobility extension but not supported on a large scale
Problem: IPv6 has not been deployed yet
DNS as an alternative
Digital Enterprise Research Institute

DNS extensions

Several RFCs extend the original DNS specification so that
DNS could support secure updates

Problems
– Very heavy-weight
– Configuration is very difficult and error-prone
– Not for the “normal user” (as in a P2P system)

www.deri.ie
DynDNS (and similar services)

Hosts maintain a consistent name/address mapping in a
special DNS domain via a special client

Problems
– Centralized  scalability
– Service may go out of business
Basic idea
Digital Enterprise Research Institute
DNS
lookup IP
address
www.deri.ie
FQDN  IP address
routing based on
FQDN (any overlay)
static
mapping
Index
P-Grid
Data
lookup IP
address
routing based on
logical identifier
P-Grid
logical identifier  IP address
DYNAMIC mapping
Informal discussion
Digital Enterprise Research Institute

www.deri.ie
Use unique logical identifiers (UUIDs) instead of physical
identifiers (IP addresses):

Peer identifiers

Routing based on UUIDs

Use the overlay itself to securely maintain mappings between
the logical and the physical identifier

Self-referential approach


Rate of changes < self-healing rate

Dynamic equilibrium
Advantages:

General identification facility disentangling logical identifiers
from network structure

Tracking of chances (for example, reputation)
Maintenance and routing
Digital Enterprise Research Institute


www.deri.ie
Universal Unique Identifier (UUID) are generated
locally

Cryptographically secure hash function  global
uniqueness

Index / routing tables: UUID

Peers maintain an up-to-date UUID-IP mappings in P-Grid
Routing

Peers cache known UUID-IP mappings

Mapping exists in cache  identity of target peer is
checked before forwarding

No mapping is known  query P-Grid for mapping
Security issues
Digital Enterprise Research Institute

Peer generates public/private key
UUID-public key  P-Grid

Why is it secure?

www.deri.ie
Data in P-Grid is stored at a number of random replicas
 Hard to attack
 Request are

– signed (private key)  authenticity & access permissions
– time-stamped  no replay possible


Quorums are required for each request
Better security than PGP
Quorums
 Independent, random paths avoid weakest link problem
 Revoke and update of security relevant information is possible

113
Directory maintenance: self-healing
Digital Enterprise Research Institute

www.deri.ie
Repair strategies

Eager repair: repair each stale entry encountered immediately

Lazy repair: repair a routing table when all references at one level
become stale
DNS
lookup IP
address
routing based on
FQDN (any overlay)
lookup IP address
in case of failure
routing based on
logical identifier
maintain logical identifier  IP address
mapping: eager or lazy
ID Presently online
query(01*) @ 7
…query(0101) @ 7 (for stale entry 5, cycle -> abort)
ID Presently offnline
…query(1110) @ 7 (for stale entry 14, forward to 12 or 13)
Digital Enterprise
Institute
StoresResearch
mappings
ID
…query(1110) @ 12 (is offline)
of peers
…query(1110) @ 13 (for stale entry 2)
Up-to-date cache ……query(0010) @ 13 (forward to 5)
……query(0010) @ 5 (forward to 7)
1 : 2 ,12
……query(0010) @ 7 (forward to 9)
……query(0010) @ 9 (new entry for 2 found !)
Stale cache
…query(1110) @ 2 (new entry for 14 found !)
query(01*) @ 14 (finally )
0
00
000
1
01
001
10
010
011
100
101
www.deri.ie
11
2
0 : 1,14
10 : 11,13
12
1
1
1 : 12, 13
01 : 5, 10
001: 9,4
7
1
1 : 12, 13
01 : 5,14
001: 9,4
9
2,3
1 : 8,2
01 : 3, 10
000: 1,7
4
2,3
1 : 6,13
01 :10,14
000: 1,7
14
4,5
1 : 2,12
00 : 9,4
011: 3,10
5
4,5
1 : 8, 13
00 : 7,9
011: 3,10
3
6,7
1 : 11,12
00 : 1,9
010: 5,14
10
6,7
1 : 6,8
00 : 1,7
010: 5,14
11
8,9
0 : 4,7
11 : 2,12
101: 8,13
6
8,9
1 : 1,3
11 : 2,12
101: 8,13
13
10,11
0 : 5,9
11 : 2,12
100: 6,11
8
10,11
0 : 4,9
11 : 2,12
100: 6,11
12,13,14
12,13,14
0 : 5,7
10 : 6,13
Eager repair strategy
Digital Enterprise Research Institute

www.deri.ie
System is in a dynamic equilibrium if the rate of changes due to
changing mappings and the rate of repairs is equal 
Dynamic equilibrium equation

LHS
Rate at which repair of stale routing entries occur
 rup changes per 1-rup queries
 Nrec – 1 additional recursive queries
 Repair makes sense only if the routing entry to be repaired
corresponds to an online peer
 A repair is possible only if recursive query succeeds

RHS
Rate of entries turning stale
 rup changes
 1-pdyn probability of non-stale references (only these can turn stale)
 r references at each peer for each of log 2n levels
Lazy repair strategy
Digital Enterprise Research Institute

www.deri.ie
Repair only if all references of a level are stale
Not all routing entries are treated uniformly
 The number of stale entries for each routing level at each peer
defines the state of that level
 Markovian model

0 ref
stale
ID
change
1 ref
stale
ID
change
2 ref
stale
ID
change
…
ID
change
r ref
stale
repairs

Dynamic equilibrium equation
inflow = outflow (for each state)

At dynamic equilibrium, the number of routing levels with
given number of stale entries over the whole system should
not change
N.B. We distinguish stale entries from offline peers
Lazy repair: Analytical vs.
simulation results
Digital Enterprise Research Institute
www.deri.ie
H
L
Number of messages vs. rate of change (N=128,256,512,1024,
replication factor is 8)
Msg
80
60
Lazy Rec., Mess. vs. r_up for p_on=1
sim,N=128
ana,N=128
sim,N=256
ana,N=256
sim,N=512
ana,N=512
sim,N=1024
ana,N=1024
n=N 8 ,
rup
40
20
0.025
0.05
0.075
0.1
0.125
0.15
r_up
Effect of pon on stability and
message overhead
Digital Enterprise Research Institute
www.deri.ie
In networks with more online peers the lazy strategy is advantageous
but collapses earlier
Msg
1200
1000
H HLL
Lazy vs. Eager rec. r_up= 0.2, lg_2 n =5
Directory "unstable"
Lazy rec
800
Eager rec
600
400
Directory "stable"
200
0.3
0.4
0.5
0.6
0.7
0.8
0.9
p_on
Summary
Digital Enterprise Research Institute

Decentralized, self-maintaining, light-weight, and
secure directory service
Robust and applicable in unreliable environments

Contributions

www.deri.ie

Logical independence of identity from network properties

General approach for identification

Structural properties are maintained

Existing knowledge is retained

Dynamic resilience of a P2P system under “churn”
GridVine: Peer Data Management
Digital Enterprise Research Institute
www.deri.ie
Searching semantically richer objects
in large scale heterogeneous networks
<xap:CreateDate>2001-12date?
19T18:49:03Z</xap:CreateDate>
<xap:ModifyDate>2001-1219T20:09:28Z</xap:ModifyDate>
<es:DofCreation> 05/08/2004 </es:DofCreation>
?
?
?
?
?
<myRDF:Date> Jan 1, 2005 </myRDF:Date>
➠ Lack of semantic interoperability
Information Heterogeneity
Digital Enterprise Research Institute
www.deri.ie
Syntactic discrepancies
ImageGUID
cDate
A0657B25
05.08.04
VS
<es:cDate> 05/08/2004 </es:cDate>
Semantic heterogeneity
Extensible standards (XML, RDF, XMP, PSA, WinFS...)
<rdf:Property rdf:ID="width">
<rdfs:label>Width</rdfs:label>
<rdfs:subPropertyOf rdf:resource="#length"/>
</rdf:Property>
VS
<rdf:Property rdf:ID=“Length-Y">
<rdfs:label>Length-Y</rdfs:label>
<rdfs:subPropertyOf rdf:resource="#length"/>
</rdf:Property>
100s of evolving schemas for one particular domain (e.g.,
protein information, picture metadata)
➠
Shared representation is not enough
Integrating Data in Distributed
Databases
Digital Enterprise Research Institute
The Wrapper-Mediator architecture
www.deri.ie
Date 
Date
Integrating Data in the new Web
Ecology
Digital Enterprise Research Institute
Mediated Architectures
Large Scale Information Systems
(e.g., WWW))
Scale
Number of sources < 100
Number of sources > 1000
Uncertainty
Consistent Data
Uncertain Data
- Coordination
- Manually curated data
- Autonomy
- Semi-automatic creation of data
Schemas created by
administrators
Schemas created by end users
Relatively stable set of sources
Network churn
- stable mediator
- node failures
Sources known a priori
Unknown sources
Relational Data
Structured Schemas
Semi-structured data
Schematas
- Integrity constraints
- Few integrity constraints
Structured Queries
Simple S-P Queries
Dynamicity
Expressiveness
www.deri.ie
Peer Data Management Systems
Digital Enterprise Research Institute
www.deri.ie
date?
<es:cDate> 05/08/2004 </es:cDate>
<xap:CreateDate>2001-1219T18:49:03Z</xap:CreateDate>
<xap:ModifyDate>2001-1219T20:09:28Z</xap:ModifyDate>
es:cDate  xap:CreateDate
<myRDF:Date> Jan 1, 2005
</myRDF:Date>
Pairwise mappings
Peer Data Management Systems (PDMS)
Local mappings overcome global heterogeneity
Iterative query reformulation
3-Tier Network
Digital Enterprise Research Institute
www.deri.ie
Semantic
Mediation
Layer
Overlay
Layer
Internet
Layer
Jupp /
P-Grid
DHTs
Data-Centric P2P Systems
Digital Enterprise Research Institute
Piazza / Hyperion
More expressive mapping languages
– LAV-style query reformulation in P2P settings?
Network-intensive
– Large-scale deployment?
Perfect mappings
PIER
Scales a relational engine on top of a DHT
Fixed schema
RDFPeers
Indexes RDF triples in a DHT
No schemas
www.deri.ie
Credits
Digital Enterprise Research Institute
Karl Aberer
 Philippe Cudre-Mauroux
 Anwitaman Datta
 Roman Schmidt

www.deri.ie
Are you still awake?
Digital Enterprise Research Institute
www.deri.ie
Digital Enterprise Research Institute
Introduction to Peer-to-Peer
Networks – Part 2
Manfred Hauswirth, Marcel Karnstedt
 Copyright 2009 Digital Enterprise Research Institute. All rights reserved.
www.deri.ie
Outline – Part 2
Digital Enterprise Research Institute
The Vision: A Universal Storage
for Web Data
 A Distributed Universal Storage




Data & Query Model

Operators

Query Engine

Mappings & Query Expansion
The Praxis: Implementation
Summary & Outlook
133 of XYZ
www.deri.ie
Outline – Part 2
Digital Enterprise Research Institute
The Vision: A Universal Storage
for Web Data
 A Distributed Universal Storage




Data & Query Model

Operators

Query Engine

Mappings & Query Expansion
The Praxis: Implementation
Summary & Outlook
134 of XYZ
www.deri.ie
Examples
Digital Enterprise Research Institute


135
www.deri.ie
Huge heterogeneous data sets, collaboration,
dynamic, structured, deep, linked, …
Clique project:

Integrated structrured storage

Data pre-processing, integration, restructuring, indexing

Data access API and query processor for complex queries
Marcel Karnstedt
IFIP Database Meeting
Nicosia, Cyprus, 2009 135
Public Data Management
Digital Enterprise Research Institute
www.deri.ie
....Semantic & social Web, encyclopedias,
recommender systems ... “The world is a database“
 Datasets, which are



Maintained by large communities in a distributed way

Of public interest
„Homogenized“ database, extensible and flexible,
distributed, scalable, structured data and queries
136 of XYZ
Main Challenges
Digital Enterprise Research Institute
Data management
 Scalability and robustness
 Security, trust, fairness
 Guarantees, consistency, integrity




CAP-Theorem [Gilbert et al. 2002], ACID vs. BASE
Query expressiveness

DB-like queries with advanced functionality

Support of IR queries and similarity is mandatory

Schema-unaware queries and/or queries on schema
Efficient processing

Efficient query operators

Cost awareness in changing situations
137 of XYZ
www.deri.ie
Approaches
Digital Enterprise Research Institute
www.deri.ie
Who pays the load?
Who owns the data?
views over 100.000
data sources?
Do we trust them?
Sindice, YARS
Jena, Oracle
SW-Store
138 of XYZ
Mediator
Efficient
query processing?
PIER, PeerDB
AmbientDB
RDFPeers
UniStore
Outline – Part 2
Digital Enterprise Research Institute
The Vision: A Universal Storage
for Web Data
 A Distributed Universal Storage




Data & Query Model

Operators

Query Engine

Mappings & Query Expansion
The Praxis: Implementation
Summary & Outlook
139 of XYZ
www.deri.ie
Influences
Digital Enterprise Research Institute
www.deri.ie
Robustness,
self-organization,
scalability
Efficient
lookups
P2P paradigm
DHTs &
SDDS
Index structures
Sensor
networks
Data streams
140 of XYZ
PDMS
Distributed DBS
Transparency,
query processing
The Big Picture
Digital Enterprise Research Institute
www.deri.ie
DB
Who wrote an article for cool
movies?
Wikipedia(article,author)
“Pulp Fiction”,”MK”
Del.icio.us(bookmark,tag,creator)
“http://…pfiction”,”cool”,”MKa”
DBPedia(link,wikilink,category)
“Pulp Fictoon”,”Q. Tarantino”,”movie”
141 of XYZ
Layers of Processing
Digital Enterprise Research Institute
www.deri.ie
Scheduling,
Adaptation,
Costs
Processing Strategies
Similarity /
Approximate
Operators
Query Operators
Multicast,
Aggregation,
Range
Routing
142 of XYZ
Outline – Part 2
Digital Enterprise Research Institute
The Vision: A Universal Storage
for Web Data
 A Distributed Universal Storage




Data & Query Model

Operators

Query Engine

Mappings & Query Expansion
The Praxis: Implementation
Summary & Outlook
143 of XYZ
www.deri.ie
Universal Relation Model
Digital Enterprise Research Institute



www.deri.ie
Since the eighties: Model for simplified retrieval in
(relational) databases
Universal relation containing all attributes
Simplifies navigation over multiple relations during
query formulation
SW-Store
A1
144 of XYZ
A2
A3
B1
B2
C1
...
Triple Store
Digital Enterprise Research Institute
www.deri.ie
Universal relation model
 Storing each tuple as a set of triples (oid, attribute,
value)
 Similar to RDF: subject, predicate, object

OID Car
Mileage HP
Price
232
34.000
28.000
Volvo V70
180

232
Car
Volvo V70

232
Mileage
34.000

232
HP
180
232
Price
28.000
145 of XYZ

SW-Store
RDFPeers
Sindice, YARS
...
Extensible
Flexible
Self-descriptive
No need for
representing null values
Indexing
Digital Enterprise Research Institute



www.deri.ie
Indexing of attributes = key for Hashing
Which attributes? All!
For tuple (oid, v1, v2, ...) of R(OID, A1, A2, ...)
232
Car
Volvo V70
232
Mileage
34.000
232
HP
180
232
Price
28.000
h(oid)
 h(A1 || v1)
h(A2 || v2)
...


for object lookup
for Ai ≥ v
(prefix search)
...trade-off storage vs. performance
146 of XYZ
YARS
Hexastore
P-Grid : Range Queries
Digital Enterprise Research Institute

[Datta et al. 2005]
www.deri.ie
Outline – Part 2
Digital Enterprise Research Institute
The Vision: A Universal Storage
for Web Data
 A Distributed Universal Storage




Data & Query Model

Operators

Query Engine

Mappings & Query Expansion
The Praxis: Implementation
Summary & Outlook
148 of XYZ
www.deri.ie
VQL
Digital Enterprise Research Institute


www.deri.ie
Query language VQL

Inspired by RDF query language SPARQL

Conjunctive queries

Enhanced by advanced operators

Operations both on instance and schema level
Basic query form
SELECT ?oid, ?val
WHERE { ?oid price ?val }
ORDER BY ...
LIMIT ...
SPARQL
149 of XYZ
Similarity Queries
Digital Enterprise Research Institute
www.deri.ie
WHERE { ?o attrib ?value
FILTER (edist(?value, v) < 2) }


Numerical similarity: value distance
String similarity:



Edit distance using (positional) q-Grams
[Gravano et al. 2001, Schallehn et al. 2004]
Requires additional key-value pairs in P-Grid

For each triple (oid, A, v)
–
–

LSH forest
SWAM
h(q-grami(A)) oid
h(q-grami(v)) oid
Approach for instance and schema level
150 of XYZ
String Similarity
Digital Enterprise Research Institute
www.deri.ie
0
0


0
1
...

1
...
...
00…0
1
11…1
“All values in distance d=1…”
Query range for attribute vs. query d+1 q-grams
[NetDB06]
151 of XYZ
More Operators
Digital Enterprise Research Institute

Similarity joins [NetDB06]
WHERE { ?o1 attr1 ?v1 . ?o2 attr2 ?v2
FILTER (edist(?v1, ?v2) < k) }

Ranking queries: top-k, skyline [DBRank07]
WHERE { ?o attr ?v }
ORDER BY ?v LIMIT k
WHERE { ?o attr ?v }
ORDER BY ?v NN “A String“ LIMIT k
WHERE { ?o attr1 ?x . ?o attr2 ?y}
SKYLINE OF ?x MIN, ?y MAX
152 of XYZ
www.deri.ie
String Similarity Joins
Digital Enterprise Research Institute
www.deri.ie
Doubled parallel
Doubled sequential
Cloud services
Parallel and sequential
153 of XYZ
Sequential and parallel
Skyline Queries
Digital Enterprise Research Institute

www.deri.ie
Objects that are not “dominated“ by other objects

Scoring function on multiple attributes, no weighting
dominated objects
price
mileage
154 of XYZ
Skylines: Basic Idea
Digital Enterprise Research Institute
 “Frame Skyline“ algorithm


over 2 dimensions
Minimum of first
dimension defines
maximum for second
dimension
Minima/Maxima provide
a frame narrowing the
search space
155 of XYZ
www.deri.ie
DSL
Skyframe
Skylines: Processing
Digital Enterprise Research Institute
www.deri.ie
y
...
...
...
Min y
Min x
x
Find minimum in one selective dimension x
2. Use y value of min(x) to limit search range
3. Use range query routing to build local skylines
4. Always ship current skyline with query
5. Determine global skyline at one peer
6. Optionally: distributed range querying “on the
way“
156 of
XYZ to min(x)
1.
Skylines: More Dimensions
Digital Enterprise Research Institute
 All projections to 2d
sub-spaces are skyline
candidates
 Objects of the searched
frame can dominate
projections
 Projections cannot
dominate objects of the
searched frame
157 of XYZ
www.deri.ie
Outline – Part 2
Digital Enterprise Research Institute
The Vision: A Universal Storage
for Web Data
 A Distributed Universal Storage




Data & Query Model

Operators

Query Engine

Mappings & Query Expansion
The Praxis: Implementation
Summary & Outlook
158 of XYZ
www.deri.ie
Query Execution
Digital Enterprise Research Institute

www.deri.ie
Goal: stateless processing  „Push“ approach

Messages containing both plan and intermediate results
(based on Mutant Query Plans [Papadimos et al. 2002])

Receiver peer is identified by applying the hash function

Multiple instances of the plan travel trough the network
(A)
p0
(A)
{(A,1),(A,2)}
p1
B
(A)
B
B
p2
{(A,3),(A,4)}
p3
{(A,5),(A,6)}
159 of XYZ
{(A,2)}
PIER, PeerDB
DARQ
B
{(B,2),(C,1),(B,2),(C,4)}
p4
{(A,2,B,2,C,1), (A,2,B,2,C,4)}
{(A,3),(A,4)}
B
p5
{(B,5),(B,6)}
{(A,5)}
B
p0
{(A,5),(B, 5)}
Cost-Based Planning
Digital Enterprise Research Institute


www.deri.ie
[NetDB06, P2P06, DBRank07]
Find all values of attribute A in max. distance d=1
0
1
01
001
p1
p5
Query all peers for A in parallel or
sequence
#msgs = ml + |A|-1
p2
0
h(A)
1
Query d+1 q-grams in parallel
01
#msgs = (d+1)*ml
001
p1
p5
p2
h(A#q-gram2)
h(A#q-gram1)
160 of XYZ
ObjectGlobe
DARQ
Guarantees: Completeness
Digital Enterprise Research Institute

Problem:


“Fire and forget“ query strategy doesn‘t guarantee
complete results
Goal:

Allow to estimate result completeness
– For the user (“98% of all possible answers“)
– For blocking query operators (aggregators, ranking-based
operators) in order to guarantee a certain level of
completeness

Idea:

Not feasible on data level, but perhaps on peer level?!
161 of XYZ
www.deri.ie
Completeness Estimation
Digital Enterprise Research Institute



www.deri.ie
Estimation on peer level

Based on routing graphs and routing methods

Support of probabilistic guarantees
Accuracy improved by Milestone messages (MiMes)
[CIKM08, WIDM08]
Seaweed
...
Join(A=B)
Extract(A)sequ
Extract(B)range
Query graph
P1B
P2B
P3B
...
PmB
P1A
P2A
P3A
...
PnA
P0
Routing graph
PxY
162 of XYZ
Routing
level
Routing
point
CERQ: Initial Estimation
Digital Enterprise Research Institute
www.deri.ie
0
00
000
0000
1
01
P1
001
0001
P4
P0
P2
P3


Example: P-Grid range query 00100-1101 at P0
Predict trie on information from:
1) local path: 0001
2) local routing table (at least one node per level/sub-tree)
=> estimates 8 (out of 10)
...the better, the more information is kept in each routing table
 of[P2P07]
163
XYZ
CERQ: Estimation Refinement
Digital Enterprise Research Institute

The initial estimate might not be correct


www.deri.ie
Refinement by other (intermediate) peers
Piggy-back information


Query contains estimation of peers in sub-tree
Query replies can contain corrections
0

00
Sub-query P0 -> P3



Sub-tree 001*
Estimate: 3 peers
P3’s routing table
contains peer(s) of
sub-tree 0010*
164 of XYZ
000
0000
P4
1
01
P1
001
0001
P0
P2
P3
CERQ: Further Improvements
Digital Enterprise Research Institute


Use of structural replication

Peers have more than one entry per level

Each entry might have a different path

Every path allows peers to learn more about sub-trees

More entries mean better initial estimates
Use of caching

P2P networks are dynamic

Though the structure is likely to be stable

The learned structure can be cached for later estimates
165 of XYZ
www.deri.ie
CERQ: Other Overlays
Digital Enterprise Research Institute

SkipGraphs




Prefix hash tree (Chord)



Most similar to P-Grid
Routing information of multiple levels
Unknown number of peers in bucket layer
Peers build a tree-hierarchy
Only applicable if number of children is known
CAN and Mercury


Forwarding along neighbors
No estimation can be given
=> The idea can be mapped under certain conditions
166 of XYZ
www.deri.ie
Outline – Part 2
Digital Enterprise Research Institute
The Vision: A Universal Storage
for Web Data
 A Distributed Universal Storage




Data & Query Model

Operators

Query Engine

Mappings & Query Expansion
The Praxis: Implementation
Summary & Outlook
167 of XYZ
www.deri.ie
Representing Mappings
Digital Enterprise Research Institute

www.deri.ie
Simple kind of attribute correspondences
subsumes
equiv
A1



A2
A3
A4
A5
Triple representation
(A4, equiv, A5)
(A3, subsumes, A6)
Extensible to ontologies and views
[Ideas08]
168 of XYZ
A6
PeerDB
SQPeer
Query Expansion
Digital Enterprise Research Institute
www.deri.ie
Unexpanded query
GridVine
PDMS
Map operators added
169 of XYZFirst
mapping
Expanded query
Outline – Part 2
Digital Enterprise Research Institute
The Vision: A Universal Storage
for Web Data
 A Distributed Universal Storage




Data & Query Model

Operators

Query Engine

Mappings & Query Expansion
The Praxis: Implementation
Summary & Outlook
170 of XYZ
www.deri.ie
UniStore
Digital Enterprise Research Institute

[ICDE07]
www.deri.ie
Evaluation: Similarity Joins
Digital Enterprise Research Institute
c1:
 c2:
 c3:
 c4:
 c5:

seq & par/seq
par & par/seq
seq & par/par
par& par/par
par & local
172 of XYZ
www.deri.ie
Evaluation: CERQ
Digital Enterprise Research Institute
1 reference
references


www.deri.ie
3 references
5
Estimation is correct after a few corrections
More references lead to (as expected)



Better initial estimate
Less corrections
Smaller errors
Better estimates for smaller ranges (q1 and q2)
 Replication factor 1 (similar results for factor 2)

173 of XYZ
Evaluation: Completeness
Digital Enterprise Research Institute
Min, max, avg 74 peers
174 of XYZ
Without MiMes
www.deri.ie
Min, max, avg 50 peers
With MiMes
Outline – Part 2
Digital Enterprise Research Institute
The Vision: A Universal Storage
for Web Data
 A Distributed Universal Storage




Data & Query Model

Operators

Query Engine

Mappings & Query Expansion
The Praxis: Implementation
Summary & Outlook
175 of XYZ
www.deri.ie
Summary & Outlook
Digital Enterprise Research Institute

www.deri.ie
Web data is huge, heterogeneous, structured,
linked


Modern applications require a universal and flexible
storage
DB-like and RDF-liking
DHTs well-suited for large-scale data management
 UniStore as one solution







Robust and scalable, universal and light-weight
Sophisticated query capabilities
Adaptive, cost-based, stateless and parallel QP
Guarantees, semantic layer
…and all on totally decentralised and self-organising P2P!!
Open issues
Privacy & Trust, reputation
176 of
XYZ
Guarantees, consistency, integrity

Acknowledgements
Digital Enterprise Research Institute
www.deri.ie

Kai-Uwe Sattler, Manfred Hauswirth, Katja Hose,
Roman Schmidt, Renault John, Brahmananda
Sapkota, Conor Hayes, ...

Students: Martin Richtarsky, Michael Haß, Jessica
Müller, Mario Wiegandt, Stefan Schwalm, Matthias
Marx, Thomas Kreyling, ...

Supported by the Science Foundation Ireland under
Grant No. SFI/08/CE/I1380 (Lion-2) and under
Grant No. 08/SRC/I1407 (Clique: Graph & Network
Analysis Cluster)
177 of XYZ
Related Systems
Digital Enterprise Research Institute











www.deri.ie
Sindice: Sindice. The semantic web index. http:// sindice.com/
YARS: A. Harth, J. Umbrich, A. Hogan, S. Decker, Yars2: A federated repository for
querying graph structured data from the web, in: Proc. of ISWC/ASWC, 2007.
Jena: Wilkinson, K., Sayers, C., Kuno, H., Reynolds, D.: Efficient RDF storage and retrieval
in Jena2. In: SWDB, pp. 131–150 (2003)
Oracle: Chong, E.I., Das, S., Eadon, G., Srinivasan, J.: An efficient SQL- based RDF
querying scheme. In: VLDB, pp. 1216–1227 (2005)
SW-Store: D. J. Abadi, A. Marcus, S. R. Madden, K. Hollenbach. SW-Store: a vertically
partitioned DBMS for Semantic Web data management. The VLDB Journal (2009) 18:385–
406
PIER: R. Huebsch, J. M. Hellerstein, N. Lanham, B. Thau Loo, S. Shenker, and I. Stoica.
Querying the Internet with PIER. In VLDB’03, pages 321–332, 2003.
RDFPeers: M. Cai and M. Frank. RDFPeers: a scalable distributed RDF repository based
on a structured peer-to-peer network. In WWW’04, pages 650–657, 2004.
PeerDB: W. S. Ng, B. Ch. Ooi, and K.-L. Tan. PeerDB: A P2P-based System for Distributed
Data Sharing. In ICDE ’03, pages 633–644, 2003.
AmbientDB: P. Boncz and C. Treijtel. AmbientDB: Relational Query Processing in a P2P
Network. In Workshop On Databases, Information Systems and P2P Computing,
(DBISP2P’03), pages 153–168, 2003.
Hexastore: Weiss, C., Karras, P., Bernstein, A.: Hexastore: sextuple indexing for
semantic web data management. VLDB, 2008
SPARQL: E. Prud’hommeaux and A. Seaborne. SPARQL Query Language for RDF, 2006.
W3C Candidate Recommendation.
178 of XYZ
Related Systems /2
Digital Enterprise Research Institute
www.deri.ie
LSH forest: M. Bawa, T. Condie, and P. Ganesan. LSH forest: self-tuning indexes for
similarity search. In WWW’05, pages 651–660, 2005.

SWAM: F. Banaei-Kashani and C. Shahabi. SWAM: a family of access methods for
similarity- search in peer-to-peer data networks. In CIKM’04, pages 304–313, 2004.

Cloud services: M. Brantner, D. Florescu, D. Graf, D. Kossmann, and T. Kraska. Building
a database on S3. In SIGMOD ’08, pages 251–264, 2008.

DSL: P. Wu, C. Zhan, Y. Feng, B. Zhao, D. Agrawal, and A. El Abbadi. Parallelizing skyline
queries for scalable distribution. In EDBT’06, pages 112–130, 2006.

Skyframe: S. Wang, Q. H. Vu, B. Ch. Ooi, A. K. H. Tung, and L. Xu. Skyframe: a
framework for skyline query processing in peer-to-peer systems. The VLDB Journal,
18(1):345–362, 2009.

DARQ: B. Quilitz and U. Leser. Querying Distributed RDF Data Sources with SPARQL. In
ESWC’08, pages 524–538, 2008.

ObjectGlobe: R. Braumandl, M. Keidl, A. Kemper, D. Kossmann, A. Kreutz, S. Seltzsam,
and K. Stocker. Ob jectGlobe: Ubiquitous query processing on the Internet. VLDB Journal,
10(1):48–71, 2001.

Seaweed: D. Narayanan, A. Donnelly, R. Mortier, and A. Rowstron. Delay aware querying
with seaweed. In VLDB’06, pages 727–738, 2006.

SQPeer: G. Kokkinidis, E. Sidirourgos, and V. Christophides. Query Processing in RDF/SBased P2P Database Systems. Semantic Web and Peer-to-Peer, chapter 4, pages 59–81.
Springer, 2006.

GridVine: K. Aberer, P. Cudré-Mauroux, M. Hauswirth, and T. Van Pelt. GridVine:
Building Internet-Scale Semantic Overlay Networks. In ISWC’04, pages 107–121, 2004.

PDMS: A. Y. Halevy, Z. G. Ives, P. Mork, and I. Tatarinov. Piazza: data management infrastructure for semantic web applications. In WWW’03, pages 556–567, 2003.
179 of XYZ

Thank you!
Digital Enterprise Research Institute
www.deri.ie
[CIKM08]: M. Karnstedt, K.Sattler, M. Haß, M. Hauswirth, B. Sapkota, R. Schmidt: Estimating the
Number of Answers with Guarantees for Structured Queries in P2P Databases, CIKM 2008,
Napa, USA.

[WIDM08]: M. Karnstedt, K. Sattler, M. Haß, M. Hauswirth, B. Sapkota, R. Schmidt:
Approximating Query Completeness by Predicting the Number of Answers in DHT-based Web
Applications, WIDM'08 icw CIKM'08, Napa, USA, 2008.

[Ideas08]: M. Karnstedt, K.Sattler, M. Hauswirth, B. Sapkota, R. Schmidt: Ad-hoc Integration and
Querying of Semantic Web Data, Ideas 2008, Coimbra, Portugal.

[DBRank07]: M. Karnstedt, J. Müller, K. Sattler, Cost-Aware Skyline Queries in Structured
Overlays, DBRank‘07@ICDE 2007, Istanbul, Turkey.

[ICDE07]: M. Karnstedt, K.Sattler, M. Richtarsky, J. Müller, M. Hauswirth, R. Schmidt, R. John:
UniStore: Querying a DHT-based Universal Storage, ICDE 2007 Demonstration.

[P2P07]: M. Karnstedt, K.Sattler, R. Schmidt: Completeness Estimation of Range Queries in
Structured Overlays, P2P 2007, Galway, Ireland.

[P2P06]: M. Karnstedt, K. Sattler, M. Hauswirth, R. Schmidt: Cost-Aware Processing of Similarity
Queries in Structured Overlays, P2P 2006, Cambridge, UK

[NetDB06]: M. Karnstedt, K. Sattler, M. Hauswirth, R. Schmidt: Similarity Queries on Structured
Data in Structured Overlays, NetDB'06 @ ICDE 2006, Atlanta, GA.

[Gilbert et al. 2002]: S. Gilbert and N. Lynch. Brewer’s conjecture and the feasibility of
consistent, available, partition-tolerant web services. SIGACT News, 33(2):51–59, 2002.

[Datta et al. 2005]: A. Datta, M. Hauswirth, R. Schmidt, R. John, and K. Aberer. Range queries in
trie- structured overlays. In P2P’05, pages 57–66, 2005.

[Papadimos et al. 2002]: V. Papadimos, D. Maier. Mutant Query Plans. Information and
Software Technology, 44(4):197–206, April 2002.

[Schallehn et al. 2004]: E. Schallehn, I. Geist, K. Sattler: Supporting Similarity Operations based
on Approximate String Matching on the Web, CoopIS 2004, Larnaca.

[Gravano et al. 2001]: L. Gravano et al.; Approximate String Joins in a Database (almost) for
VLDB 2001, Roma.
180Free,
of XYZ
