FRIEND - Computer Science and Engineering

Download Report

Transcript FRIEND - Computer Science and Engineering

FRIENDS: File Retrieval In a
dEcentralized Network
Distribution System
Steven Huang, Kevin Li
Computer Science and Engineering
University of California, San Diego
FRIENDS





Motivation
Related works
Initial approaches
Implementation
Future work
Peer-to-peer File Sharing


Illegal file sharing has drawn negative
publicity
P2P systems have nice features



Low barrier of entry
Aggregates computation and storage on
large scale
Robust, secure
Early P2P File Distribution
Systems

Napster (http://www.napster.com)



Gnutella (http://www.gnutella.com)



Uses central database
reliability issues
Uses broadcast
Not scalable
Kazaa, Grokster, Morpheus



Hierarchy of nodes like DNS
Bandwidth is not well distributed
Reliability issues
More P2P File Distribution
Systems

Freenet [Clarke, 2000]


Focus on anonymity, files get lost
BitTorrent (http://www.bittorrent.com)


Focus on distributing file to many peers
Not fully decentralized, requires tracker
Related Works



Recent popularity has resulted in lots of
research in P2P
Focus is on scalability, balance, flexibility
Overlay Networks





CAN [RATNASAMY, 2001]
Chord [STOICA, 2001]
Pastry [ROWSTRON, 2001]
Tapestry [HILDRUM, 2002]
Object Integrity unchecked
Related Works



MD5
Replication
Reputation Management


Distributed EigenTrust [Kamvar, 2003]
Distributed Hash Tables
Research Question

How can a fully decentralized P2P file
distribution network support the
verification of object integrity in a
hostile environment?
Initial Approach

Global table of file names with hash values


Use time of object entry as indicator of file validity


Malicious nodes can tamper with time
Bootstrap to physically close node using IP address




Not decentralized, not scalable
Avoid multiple malicious nodes connecting to each other
Reduces overall bandwidth
Scalability issues, limited view of network
Use a single MD5 value per file

Waste a lot of bandwidth downloading 99% of an invalid file
FRIENDS System
Implementation

Built using MACEDON


Built protocol on top of Chord



MACEDON hides low level details allowing us to
focus on higher level research questions and
reducing code
Pros: Scalable, de-centralized, supports distributed
index
Cons: Overlay network lacks bandwidth
optimizations
Focus is on “proof of concept” rather than
performance
FRIENDS System Details


Verification needs to be done client size since anyone
can be an adversary
Each unique object in the system has a set of MD5
hash value associated with it




Files are broken up into 512kB sized chunks, with an
associated MD5 value
There is also an MD5 of all the MD5s which serves to reduce
bootstrapping costs
After downloading a portion of a file, run the object
through the given hash function to verify the data is
correct
Reputation system limits downloads from malicious
nodes
Distributed Hash Table

Use a Distributed Hash Table to store
the global table of object to hash value.


Upon entering the system, the node will
calculate the hash values of any new
objects it wishes to add to the system
Hashing “greenday basketcase” sends file
information to node for “greenday” and
node for “basketcase”
Can I trust you?

Malicious users can target the system in a
number of ways




Incorrect routing – checks can be made to ensure
progress
Hosting invalid files – MD5 hashes ensure that the
file being retrieved is the file desired
Corrupted Hash Table – solved with replication
across multiple nodes
Malicious nodes can still affect a cooperative
node by wasting its time and bandwidth
Blacklisting

Keep a rating of nodes that have hosted
downloaded objects.




Each successfully hashed object increases that
node’s rating (+1)
Failed hash decreases it (-5)
Filter potential sources against personal reputation
records
Friends trust friends


Increase rating of nodes trusted by trusted nodes
Exponential back-off
Future Work

Replication




Square-root replication [Lv, 2002]: Optimal
number of replications = 1/ρ (Σ√qi)2; where qi is
the request frequency of object I and ρ is the
average number of replicas per site
Bandwidth comparison tests using ModelNET
Improve file name indexing
Exploit locality to optimize bandwidth
Questions?