Transcript Document

ALGORITHMS FOR PERFORMANCE AND
TRUST IN PEER-TO-PEER SYSTEMS
Hui Zhang
University of Southern California
Outline
• Introduction to Peer-to-Peer (P2P) systems:
three layers in system design.
• Algorithms for performance on the overlay
routing layer.
• Algorithms for performance on the underlying
network layer.
• Algorithms for trust at the application layer.
• Conclusion & future work.
7/16/2015
Ph.D. thesis defense
2
P2P Networked systems
• A collaborating group of Internet end-hosts
which overlay their own special-purpose network
atop the Internet.
• Examples
– file sharing
[Napster,Gnutella]
; - anonymous publishing
[Freenet]
;
– distributed storage [Dabek2001,Kubiatowicz2000,IVY,PAST];
– web caching [Iyer2002]; - DoS attack prevention[Keromytis2002];
– application-layer multicast [Castro2003,Ratnasamy2001,Zhuang2001];
– naming
[Cox2002,Balakrishnan2004]
– indirection services
7/16/2015
; - event notification[Scribe];
[Stoica2002]
, etc.
Ph.D. thesis defense
3
Challenges in P2P system design
• Allow rapid deployment through self organization;
• Scale with increasing network size;
• Adapt to dynamics from both the underlying
network and the application layer.
7/16/2015
Ph.D. thesis defense
4
Three layers in a P2P system design
Application layer request
Application-level
Overlay routing
Overlay-level
lookup
layer
Underlying network routing
layer
7/16/2015
Ph.D. thesis defense
5
Algorithmic issues in P2P system design
•
Overlay routing layer
–
•
Underlying network layer
–
•
Distributed hash table algorithms.
Exploiting network proximity and Internet topology.
Application layer
–
7/16/2015
Distributed and robust rating algorithms.
Ph.D. thesis defense
6
Outline
• Introduction to Peer-to-Peer (P2P) systems:
three layers in system design.
• Algorithms for performance on the overlay
routing layer.
• Algorithms for performance on the underlying
network layer.
• Algorithms for trust at the application layer.
• Conclusion & future works.
7/16/2015
Ph.D. thesis defense
7
Distributed Hash Tables
• Support a hash table-like functionality on
Internet-like scale
 A global key space: each data item is a key in the space,
and each node is responsible for a portion of the key
space.
 Given a key, map it onto a node.
• Examples: Freenet [Clarke et al. 2001], CAN[Ratnasamy et al. 2001],
Chord [Stoica et al. 2001], Pastry[Rowstron et al. 2001], Tapestry[Zhao
et al. 2001], Kademlia[Maymounkov et al. 2002], Viceroy[Malkhi et al.
2002], Koorde[Kaashoek et al. 2003], etc.
7/16/2015
Ph.D. thesis defense
8
Small-world model
[Kleinberg1999]
• Small-world graphs have
– short-distance clustering (like regular graph)
– long-distance shortcuts (result in short global path
length like random graph)
• How to find short paths in a distributed fashion?
Local contact
log2(N) average routing path length
Shortcut
Probability 1/j
i
i+j
An one-dimensional small-world network example
7/16/2015
Ph.D. thesis defense
9
Small-world Freenet
•
Files are identified by binary file keys obtained by
applying a hash function.
•
Each node maintains a datastore and a routing table
of <key,pointer> values.
•
Greedy forwarding search with backtracking.
•
Enhanced-clustering cache replacement
1.
Each node chooses a seed randomly when joining the network
2.
When a new key (file) u is to be cached, the node chooses in the
current datastore the key v farthest from the seed
7/16/2015
-
If Distance (u, seed) < Distance (v, seed), cache u and evict v with probability 1-P
(clustering)
-
If Distance (u, seed) > Distance (v, seed), cache u and evict v with probability P
(randomness).
Ph.D. thesis defense
10
Small-world Freenet - analysis
• The expected steps to deliver a message in the
idealized small-world Freenet is O(log N) if the
routing table size is (log2 N), where N is the
network size.
System
Avg. hops per Request
Avg. Routing table size
Kleinberg’s model
O(log2 N)
O(1)
CAN
O(dN1/d)
O(d)
CHORD
O(log N)
O(log N)
Tapestry
O(log N)
O(log N)
7/16/2015
Ph.D. thesis defense
11
Small-world Freenet – routing structure
•
When i has log(N) shortcuts, its routing table
resembles that of a Chord node in a probabilistic
way!
Prob[a shortcut falls into the range i+N/2+1 and i+N]
Prob[a shortcut falls into the range i+1 and i+N/2]
Prob[a shortcut falls into the range i+N/4+1 and i+N/2]
Prob[a shortcut falls into the range i+1 and i+N/4]
Distance
[1,
N/4] range[N/4+1,
[1, N/2]N/2]
log(N)
shortcuts
i
i+1
i+N/4

1
log(N )  1

1
log(N )  2
Distance range [N/2+1, N]
i+N/2
i+N
Node i’s shortcut distribution on the distance space
7/16/2015
Ph.D. thesis defense
12
Outline
• Introduction to Peer-to-Peer (P2P) systems:
three layers in system design.
• Algorithms for performance on the overlay
routing layer.
• Algorithms for performance on the underlying
network layer.
• Algorithms for trust at the application layer.
• Conclusion & future works.
7/16/2015
Ph.D. thesis defense
13
Frugal routing
•
A Distributed Hash Table routing scheme is frugal if
–
The search space for the key decreases by a constant factor after
each lookup hop;
–
The next hop after each intermediate node x in the route depends
only on x and the destination.
•
Examples: small-world Freenet, Chord, Pastry, Tapestry.
•
Latency stretch [Ratnasamy et al. 2001]
=
7/16/2015
The lookup latency on the overlay topology between two nodes
The unicast latency on the underlying topology
Ph.D. thesis defense
14
Latency stretch vs. latency expansion
•
Theorem 1
If the underlying topology G is drawn from a family of graphs
with exponential latency expansion, then the expected
latency stretch of any frugal routing scheme is (logN),
where N is the network size.
•
Theorem 2
If
(1) the underlying topology G is drawn from a family of
graphs with d-power-law latency expansion, and
(2) for each node u in a frugal routing network, it samples
(log N)d nodes in each range with uniform randomness and
keeps the pointer to the nearest node for future routing,
then the expected latency stretch of a request is O(1).
7/16/2015
Ph.D. thesis defense
15
Lookup-Parasitic Random Sampling
1. Recursive lookup.
2. Each intermediate hop appends its IP address
to the lookup message.
3. When the lookup reaches its target, the
target informs each listed hop of its identity.
4. Each intermediate hop then sends one (or a
small number) of pings to get a reasonable
estimate of the latency to the target, and
update its routing table accordingly.
5. When the target key is random to the initial
node, a sample on each range (of some node)
happens with the same probability 1/2.
7/16/2015
Ph.D. thesis defense
16
Internet latency expansion
•
The performance of many other networking algorithms relies on
the latency expansion characteristic of the underlying network.
•Request latency reduction in web cache systems [Plaxton et al. 1997 ].
•Nearest neighbor search [Karger et al. 2002] .
•Locality-aware DHT design [Abraham et al. 2004] .
•Gossip-based communication mechanisms [Kempe et al. 2002].
•
The Internet router-level topology has an exponential expansion
[Phillips et al. 1999].
•The expansion is defined on router-level hops
•How about the expansion in terms of latency?
7/16/2015
Ph.D. thesis defense
17
Internet latency expansion:
measurement methodology
•
Collected two router-level topologies.
- one in May 2002 with 328378 routers, and the other in November
2003 with 356648 routers.
•
Randomly sampled about 100,000 node pairs from each
topology and used their latency to estimate Internet
latency expansion.
•
Approximated link latency between any two nodes by the
accumulated geographic distance of the path between the
two nodes in shortest path routing.
- assign geo-locations to nodes using the Geotrack tool.
7/16/2015
Ph.D. thesis defense
18
Internet router-level topology:
latency expansion
•
The hop-count expansion of the Internet router-level
topology is exponential;
•
The latency expansion of the Internet router-level topology
is power-law and has an exponent between 1 and 2.
Internet latency expansion (in log-log)
7/16/2015
Comparison of the two topologies (in log-log)
Ph.D. thesis defense
19
Latency expansion at the city level
• Nu(x)  Cu(x)
– Nu(x): the latency expansion function defined on routers.
– Cu(x): the latency expansion function defined on cities.
The Internet link
topology has low
expansion rate after
being embedded into
the two-dimensional
geographical sphere.
Comparison of latency expansions on
router and city levels
7/16/2015
Ph.D. thesis defense
20
Outline
• Introduction to Peer-to-Peer (P2P) systems:
three layers in system design.
• Algorithms for performance on the overlay
routing layer.
• Algorithms for performance on the underlying
network layer.
• Algorithms for trust at the application layer.
• Conclusion & future works.
7/16/2015
Ph.D. thesis defense
21
Autonomous peers and selfish behaviors
•
Peers have many options to control their own
participation in an open P2P system.
–
•
Resource configuration parameters, file re-sharing decision,
on-off switch, ….
Peers are prone to show selfish behaviors when
there is no incentive to cooperate.
–
7/16/2015
Free-riding phenomenon [Adar et al. 2000][Saroiu et al. 2001].

Most files (e.g., 98% reported in [Adar et al. 2000]) belong to a small
percentage of the users (20\%, respectively).

A majority of the users are ``one-time, one-hour'' users.
Ph.D. thesis defense
22
Building social trust in P2P users
•
Reputation systems [Okita2003]
–
A means of describing social trust networks.
–
A rating system is used to produce a consensus
about the merit of any given member.
–
Effective at
–

Incentivizing user cooperation.

Isolating malicious users.

Adjudging node reliability, …
On-line examples:

7/16/2015
Livejournal, Friendster, eBay, Advogato.
Ph.D. thesis defense
23
Eigenvector-based reputation systems
•
•
A referential link structure
–
Nodes represent entities (users, merchants, authors of
blogs.)
–
Links represent endorsement of one user by another.
Eigenvector or stationary distribution based
rating schemes.
–
7/16/2015
HIST [Kleinberg1999], PageRank [Brin et al. 1998], etc.
Ph.D. thesis defense
24
PageRank
[Brin1998]
• A rating scheme to rank hypertext documents on
the WWW.
• An iterative algorithm to calculate the importance
of a web page based on the importance of its
parent pages.
• Can be applied to other systems than WWW.
7/16/2015
Ph.D. thesis defense
25
PageRank: random walk model
With prob. (1-), I
With prob. , I will
will continue the
restart the walk at
walk to a random
a random node.
successor node.
node
referential link
The walker
: resetting probability
X
1/2
Y
1/3
Z
• As time goes on, the expected percentage of
steps the walker is at each node v converges to
the PageRank weight PR(v).
7/16/2015
Ph.D. thesis defense
26
P2P rating schemes
• Design issues
– “distributed” rating
– “collusion-proof” rating
7/16/2015
Ph.D. thesis defense
27
A supervisor–based distributed
implementation of PageRank
•
In a supervisory directed graph
–
Each user v has a designated supervisor to calculate
PR(v).
–
Any user is random to its supervisors.
–
No small supervising loop exists.
–
There is a fast reactive approach for any user j to
deliver a message to any other user i’s supervisors,
and the path never includes i.
7/16/2015
Ph.D. thesis defense
28
A Chord supervising overlay
00
Network user
256
Supervising
Pointer
224
32
Each user has a
O(logN)-entry routing table
and a O(logN)-successor list.
All routing operations go clockwise
64
192
96
160
128
A Chord network with 8 users and 8-bit key space
7/16/2015
Ph.D. thesis defense
29
PageRank: is it collusion-proof?
• Can a node easily boost its rank by manipulating
its out-going (endorsing) links with others’?
I’m not colluding!
7/16/2015
I’m not colluding!
I’m not colluding!
Ph.D. thesis defense
I’m not colluding!
30
Amp(G): a metric on group collusion
: resetting probability
WG(G’) =PR(i)+PR(j)
real group weight
y
x
G
PR(x) PR(y)
4
3
G’
PR(x)
Win(G’) = 3 (1-) + PR(y) (1-)
2
2
(1-W(G’))
+
PR(y)
4
N
j
i
“actual” group weight
• In the system of node group G, for a subgroup G’,
W (G ' )
the amplification factor Amp(G’) =WG (G' )
– WG (G' ) 
– Win (G' ) 
7/16/2015
in
 PR(i)
i: iG '

( i , j ): iG ',
PR(i )
| G' |
(1   ) 
(1  WG (G' ))
|G |
i  j out(i )
jG ',
Ph.D. thesis defense
31
Answer for (1+1 = ?) in PageRank
• In the original PageRank system,
G '  G , Amp (G ' ) 
2

where  is the resetting probability.
Specifically ,
7/16/2015
when
| G' |
1
 1, Am p(G' ) 
|G |

Ph.D. thesis defense
32
Two experimental topologies
• W, a Web link topology
– Contains the link structure of upwards of 80 million URLs.
– Source: the Stanford WebBase.
• B, a weblog blogrolling topology
– Contains the blogrolling structure of upwards of 72,000
blogs.
– Source: www.blogstreet.com, the XML-RPC webblog
service.
7/16/2015
Ph.D. thesis defense
33
Experiment: Collusion200
• Model a small number of web pages simultaneously
colluding.
• Methodology:
– 100 colluding groups;
– Each colluding group has the circle topology consisting of
two nodes with adjacent ranks;
– Arbitrarily chose nodes originally ranked around 1000th,
2000th, …, 100000th.
–  = 0.15.
7/16/2015
Ph.D. thesis defense
34
Experiment result of Collusion200 (I)
W - Amplification factors of the 100
colluding groups in Collusion200.
7/16/2015
Ph.D. thesis defense
35
Experiment result of Collusion200 (III)
W – new PR rank after Collusion200.
7/16/2015
Ph.D. thesis defense
36
There is a long flat portion…
The PR weight distribution of 4 topologies.
7/16/2015
Ph.D. thesis defense
37
Next step: how to detect collusions?
• Theorem on group detection hardness.
Max G’G Amp(G’) is a NP-Hard problem.
7/16/2015
Ph.D. thesis defense
38
An observation on collusion behaviors
•
To increase their PR weight, i.e., the stationary
weight in the random walk, the colluding nodes will
stall the random walk.
G
G’
•
When the resetting probability  increases, the colluding
nodes must suffer a significant drop in PR weight.
•
Therefore, we expect the PR weight of colluding nodes
to be highly correlated with 1/  (the average walk
length), while that of non-colluding nodes is relatively
insensitive to the change in .
7/16/2015
Ph.D. thesis defense
39
Adaptive-resetting scheme
•
Part I – collusion detection:
Given the topology, calculate the PR vector under different  values.
 {] = {0.0375, 0.05, 0.075, 0.15, 0.3, 0.45, 0.6], default = 0.15.
Calculate the correlation coefficient between the curve of each node
x's PR weight and the curve of 1/ . Label it as co-co(x).
•
Part II –  personalization:
Calculate each node x's out-link personalized- = F(default, co-co(x)).
Exponential function FExp=
(1.0coco( x ))
.
 default
Linear function FLinear= default+(0.5-default)*co-co(x)
The final PR weight vector is calculated with these personalized
resetting values.
7/16/2015
Ph.D. thesis defense
40
Experiment result of Collusion200 (IV)
W - Amplification factors of the 100
colluding groups in Collusion200.
7/16/2015
Ph.D. thesis defense
41
Experiment result of Collusion200 (VI)
W – new PR rank after Collusion200.
7/16/2015
Ph.D. thesis defense
42
Topology analysis on W and B
a small loop of two
top nodes in W
7/16/2015
a star-like sub-graph in B
Ph.D. thesis defense
43
New top-25 URL list in W
7/16/2015
Ph.D. thesis defense
Dropped out
Dropping
New
44
Conclusion
• A set of algorithms for performance and trust
in P2P systems.
– Their technical merits were well acknowledged.
– They have or will been implemented in real applications.
– Small-world Freenet source code implemented and
tested in Freenet.
– LPRS-Chord in grid computing [Min et al. 2004].
– Collusion-robust reputation systems for Weblog
community.
7/16/2015
Ph.D. thesis defense
45
Future work
• Building dynamic virtual communities with P2P
techniques.
– An unified information management infrastructure to
accommodate and connect diverse virtual communities.
• Web study from algorithmic perspective.
– Web ranking
– Web community detection.
7/16/2015
Ph.D. thesis defense
46
Publications
–Improving Eigenvector-based Reputation Systems Against Collusion. With Ashish Goel, Ramesh
Govindan, Kahn Mason, and Benjamin Van Roy. Invited for Special issue of Journal of Internet
Mathematics for WAW04, under review.
–Improving Lookup Latency in Distributed Hash Table Systems using Random Sampling. With
Ashish Goel and Ramesh Govindan. To appear in ACM/IEEE Transactions on Networking.
–An Empirical Evaluation of Internet Latency Expansion. With Ashish Goel and Ramesh Govindan.
To appear in ACM SIGCOMM Computer Communication Review.
–Using the Small-World Model to Improve Freenet Performance. With Ashish Goel and Ramesh
Govindan. Computer Networks Journal. Volume 46, Issue 4, Page 555-574, November 2004.
–Advanced Query Techniques for Wide-Area Network Monitoring . With Xin Li, Fang Bian,
Christophe Diot, Ramesh Govindan, Wei Hong, and Gianluca Iannacoone. The first IEEE
International Workshop on Networking Meets Databases, 2005.
–Fast, Memory-Efficient Traffic Estimation by Coincidence Counting . With Fang Hao,
Muralidharan S. Kodialam, T. V. Lakshman. IEEE INFOCOM 2005.
–Making Eigenvector-based Reputation Systems Robust to Collusion . With Ashish Goel, Ramesh
Govindan, Kahn Mason, and Benjamin Van Roy. The third Workshop on Algorithms and Models for
the Web Graph, October 2004.
–The Design of A Distributed Rating Scheme for Peer-to-peer Systems . With Debojyoti Dutta,
Ashish Goel and Ramesh Govindan. The first Workshop on Economic Issues in Peer-to-Peer
Systems, Berkeley, CA (June 5-6, 2003).
–Incrementally Improving Lookup Latency in Distributed Hash Table Systems . With Ashish Goel
and Ramesh Govindan. appeared in ACM SIGMETRICS, 2003.
–Using the Small-World Model to Improve Freenet Performance. With Ashish Goel and Ramesh
Govindan. appeared in IEEE INFOCOM, 2002.
7/16/2015
Ph.D. thesis defense
47
Thanks!
7/16/2015
Ph.D. thesis defense
48
backup
7/16/2015
Ph.D. thesis defense
49
In a Freenet node
Node x
7/16/2015
datastore
Ph.D. thesis defense
routing table
50
In a Freenet node
File 22, with data source as node A
Node x
7/16/2015
datastore
Ph.D. thesis defense
routing table
51
In a Freenet node
Node x
datastore
File 22
7/16/2015
Ph.D. thesis defense
routing table
22
A
52
In a Freenet node
File 75, with data source as node B
Node x
datastore
File 22
7/16/2015
Ph.D. thesis defense
routing table
22
A
53
In a Freenet node
Node x
datastore
File 75
File 22
7/16/2015
Ph.D. thesis defense
routing table
75
22
B
A
54
In a Freenet node
File 10, with data source as node C
Node x
datastore
File 75
File 22
7/16/2015
Ph.D. thesis defense
routing table
75
22
B
A
55
In a Freenet node
Node x
datastore
File 10
File 75
File 22
7/16/2015
Ph.D. thesis defense
routing table
10
75
22
C
B
A
56
In a Freenet node
File 43, with data source as node D
Node x
datastore
File 10
File 75
File 22
7/16/2015
Ph.D. thesis defense
routing table
10
75
22
C
B
A
57
In a Freenet node
File 43, with data source as node D
Node x
datastore
File 10
File 75
7/16/2015
Ph.D. thesis defense
routing table
10
75
22
C
B
A
58
In a Freenet node
Node x
datastore
File 43
File 10
File 75
7/16/2015
Ph.D. thesis defense
routing table
43
10
75
22
D
C
B
A
59
In a Freenet node
request for file 70
Node x
datastore
File 43
File 10
File 75
7/16/2015
Ph.D. thesis defense
routing table
43
10
75
22
D
C
B
A
60
In a Freenet node
request for file 70
Node x
datastore
File 43
File 10
File 75
7/16/2015
Ph.D. thesis defense
routing table
43
10
75
22
D
C
B
A
61
In a Freenet node
Node B
request for file 70
Node x
datastore
File 43
File 10
File 75
7/16/2015
Ph.D. thesis defense
routing table
43
10
75
22
D
C
B
A
62
Latency stretch in frugal routing
• =
latency for each lookup on the overlay topology
average latency on the underlying topology
• In frugal routing, (logN) hops per lookup in average
– (logN) stretch with no optimization.
– Could it be done better, e.g., O(1) stretch, without much
change?
7/16/2015
Ph.D. thesis defense
63
Term definition: Latency expansion
• Let Nu(x) denote the number of nodes in the
network G that are within latency x of node u.
- Power-law latency expansion: Nu(x) grows (i.e. ``expands'‘)
proportionally to xd, for all nodes u.
Examples: ring (d=1), mesh (d=2).
- Exponential latency expansion: Nu(x) grows proportionally
to x for some constant  > 1.
Examples: random graphs.
7/16/2015
Ph.D. thesis defense
64
LPRS-Chord:
topology with power-law expansion
Ring Stretch
(at time 2logN)
7/16/2015
Ph.D. thesis defense
65
LPRS-Chord:
convergence time
Convergence Time
7/16/2015
Ph.D. thesis defense
66
LPRS-Chord on Internet subgraphs
Stretch on the router-level graph (at time 2logN)
7/16/2015
Ph.D. thesis defense
67