L27 - Cornell University
Download
Report
Transcript L27 - Cornell University
Peer to Peer Systems and File Sharing
Carl Lagoze – CS431 – Cornell University
May 3, 2004
Portions borrowed from sources
listed on next slide
Sources of this lecture
• J. Berkes, Decentralized Peer-to-Peer
Network Architecture: Gnutella and Freenet
• R. Morris, Chord+DHash+Ivy: Building
Principled Peer-to-Peer Systems
• S. Kamvar, M. Schlosser, H. Garcia Molina,
EigenRep: Reputation Management in P2P
Networks
• J. Golbeck, B. Paris, J. Hendler, Trust
Networks on the Semantic Web
Characteristics of P2P Network
• Sharing of computing resources by direct
exchange
• Blur between clients, servers, routers
• Nodes are autonomous
P2P Advantages
•
•
•
•
•
Efficient use of resources
Scalability
Reliability
Administrative simplicity
Democracy
P2P’s Political History
• Major basis in music sharing context
• Overshadows numerous applications
• Recent research is investigating generic
applicability
– DHTs
– Reputation and Trust
Small-World Phenomenon
• Milgram’s “six degrees of separation (1967)
• Forwarding of letters from Nebraska to Boston, MA
• Average chain 6 of six hops
Power Laws and Small Worlds
• Out-degree distribution is:
– 1/Kα where α > 0
• Characteristics of a variety of phenomenon
–
–
–
–
–
Web Graph
IMDb connection (acted in same movie)
Social interactions
P2P networks (Gnutella)
Epidemiology
Strength of Weak Ties
• Extension of power-law
phenomenon
• Short-cuts (between
cliques) critical to small
world phenomenon
Napster and P2P
• Not really P2P
• Central search index
• Direct interaction for
access (p2p)
• Central index was key to
litigation
Gnutella
• Fully P2P
• Flooded query
– Scalability problems
– TTL controls broadcast
– “Query Memory” controls
circularity
• Reliability problems
• But “whom to sue?”
Kazaa
• Hybrid of Napster model and Gnutella model
• Notion of a super peer
– Like a regionalized napster server
– Dynamically chosen by characteristics
• P2P relationship among super-peers
• Queries directed towards super-peers
“Free-riding”
• Definition: downloading but not sharing any
data
• On gnutella networks 15% of users contribute
94% of content
Freenet
• Goal: create an uncensorable and secure global
information store
– Anonymity and fault tolerance
• http://freenet.sourceforge.net/
• Three types of network messages:
– Advertise storage space to store unknown data
– Insert a file to the network
– Request a file (with a key) from the network
• Use of one-way secure hashes to identify files and
encryption to store files
– Node does not know what it is storing
• Non-traceability of messages
– A node can not determine where its message is stored
Freenet Request/Response Sequence
Distributed Hash Tables (DHT)
•
•
•
•
Overcoming the flooded search problem
Operationally like standard hash tables
Data is distributed around the network
Features
– Efficient:
• O(log N) messages per lookup
• Even distribution of keys among nodes
– Adaptable
• Network reconfiguration does not cascade to all nodes
– Robust: replication of tables provides survival to
node failures
Chord
• One implementation of DHT within a larger
P2P project
• http://www.pdos.lcs.mit.edu/chord/
• Algorithm properties
– Common hash function distributes node ID (IP) and
document ID uniformly
– Maps a content key to its node successor
Chord Key Mapping
Key ID Node ID
K100 N100
N10 K5, K10
Circular
ID Space
N32 K11, K30
K65, K70 N80
N60
K33, K40, K52
Robustness via each node remembering N successors and replicating table
at successors
Use of finger table to avoid linear
lookups
ith finger table
position points
to first node
that succeeds
n by at least
2i+1
Key location with finger table
• Use finger
table to find
furthest
node that
precedes
key
• O(logN)
hops leads
to target
From DHTs to P-trees
• DHTs only support equality queries
– Return the value of resources with ID1
• Need to support range queries
– OAI type query, find all nodes resources that were
changed between D1 and D2
• P-tree reuses aspects of fault-tolerant ring
of Chord with logarithmic search properties
for equality and range queries.
Pastry Project
• Factors in network locality as part of DHT
algorithm
• http://research.microsoft.com/~antr/Pastry/
Identity, Trust, Reputation
• Identity
– Who is making a statement
– Certificates, PKI
• Trust
– Can I believe the person who is making the
statement
– PGP Web of Trust
• Reputation
– What is the history of trust in the person making
the statement
– Reputation management
Reputation Issues
• Small world phenomenon makes web of trust
feasible
• Reputation is context specific
– I can be trusted with questions about OAI-PMH
– Can you trust me belaying for you?
Simple reputation network
A
C
B
•A knows and trusts B
•B knows and trusts C
•A can infer trust for C
Reputation Inference Algorithm
• Begin at source (node seeking a reputation)
• Poll each of neighbors whose reputation it
trusts
– Ignore neighbors with bad reputation
• Have each neighbor recursively find
reputation of sink (node for which reputation
is sought)
Accuracy of inferences
• Incorrect bad rating by a node has minimal effect
– Will be dropped from path in reputation seeking
– Will be overcome by “correct” good rating by another node.
• Incorrect good rating by a node can have cascading
effect
– Can cause ratings of good nodes to be ignored through lies
– Serious threat to network
• Good trust algorithm minimizes effect of “bad nodes”
From Golbeck and Hendler
Trellis
• http://trellis.semanticweb.org
• Semantic web based system for decision
making assessing reliability of information and
sources
• Decision maker can construct compound
statements justifying decision and providing
basis for others’ decisions
Trellis (cont.)
• Components
– Statements (Carl Lagoze is a bad teacher)
– Basis of statement
• http://cornellbigred.collegesports.com/sports/mcrew/mtt/kruse_william00.html
– Principal source of basis/statement
• William Kruse
– Qualifications to state certainty of component
Trellis compound statement
From Gil and Ratnaker
Advogato
• Trust metrics for open source software
developers
• http://www.advogato.org/
• Three levels of trust/certification
– Master
– Journeyer
– Apprentice
Advogato (cont)
• Graph structure of trust
– Domain of master is only master
– Domain of journeyer is master and journeyer
– Domain of apprentice is all
• Computation of trust is via network flow (well known
problem with efficient solutions)
– Hard-wired set of users from which all trust flows (gods of
the system)
– people reached by the flow are those accepted by the trust
metric
– With the three levels, the maxflow is computed three times:
• Robust (resistant to attack) and efficient
Eigentrust
• Algorithm for Reputation Management in P2P
Networks
• Kamvar, Schlosser, Garcia-Molia (Stanford)
• http://www.stanford.edu/~sdkamvar/research.html
Eigentrust Approach
• Goal: Identify sources of inauthentic files and bias
peers agains downloading from them
• Method: Give each peer a trust value based on its
previous behavior
• Trust values
– Local: open a peer has on another based on past experience
– Global: trust that entire system places in a peer
• Want latter computed from aggregate of former
• Dual goals
– Know all peers
– Perform minimal computation and store minimal data
Past History Approach
• Each peer biases its choice of downloads using
its own opinion vector
• Problems:
– Each peer has limited past experience
– Inertia – if a peer has good past experience with
another, it will be biased towards relying on it
Friends of friends approach
• Ask for opinions of the people who you trust
• Weigh their opinions by your trust in them
• Problems
– You have a lot of friends: too much to compute and
store
– Few friends: won’t have enough data
Eigentrust Approach
• Whole networks cooperates to store and
compute trust vector
• Each peer holds its own opinions
• Each peer holds its own global reputation
• Iterative algorithm that converges to
compute global trust ratings (in the nature of
PageRank)
More Eigentrust Issues
• Secure Score Management
– Voting among multiple score managers
– Peer score held by another peer
• Threat scenarios
– Malicious individuals (always bad)
– Malicious collectives (always bad, think highly of
each other)
– Camouflaged collectives (sometimes good to trick
people)
– Malicious spies (good all the time but friends with
bad folks)