Transcript document

Peer-To-Peer Networks
BitTorrent vs. FreeNet
Presentation
By
Bishara Khleif
Comp 529
OVERVIEW
• P2P NETWORKS INTRODUCTION
• BITTORRENT
• FREENET
INTRODUCTION TO P2P
NETWORKS
 Peer To Peer distributed file system allows peers to communicate through its
routing protocol over the application layer.
 There are two designation of nodes: Client and Server
►Client node searches the database for a specific file to
download cost free, such as mp3.
►Server acts as an up loader when the it has the
requested file.
 Nodes have four generally distinguishable stages
►Newly joining node exchanges basic information with other
nodes and declares its available files. (neighbor finding)
►P2P location protocol helps locate queried files and routing protocol route
its messages to the appropriate nodes. (location and routing protocol)
►Upon searching success, it begins to download the requested file.
►When done the node announces departure and logs off.
P2P GRAPH
P2P PERFORMANCE
 Performance is divided according to three categorize:
1) Resilience is measured in reference to query success
to failure ratio. The query fails if the node operator is
down or the link to it fails.
2) Query efficiency is measured in reference to the
average of query numbers and average query path
length. And depending on which message forwarding
system used the message number, for flooding
based forwarding would be bigger than single route
message forwarding.
3) scalability is measured in how adaptable and expandable
the system to new nodes.
PERFORMANCE GRAPH
P2P ARCHITECTURE
 There are three type of P2P architecture:
1) Centralized structure such as Napster where all
objects are indexed in a central server. Peers looking
for certain file will find its location or information
about it from the central server, suffers from single
point failure.
2) Decentralized but unstructured such as Gnutella has
a random topology and queries are executed hop by
hop till success of failure.
3) Decentralized and structured system use Distributed
Hashing Table (DHT) to locate the queried object.
P2P COMPARISON GRAPH
SEARCH ALGORITHM
 Object searching algorithm divides into two section:
Random and Deterministic object search.
 When using random search, queries would be forwarded to randomly
selected neighboring nodes. The query keep on hoping from one node
to its next randomly selected neighboring node. There are two kinds of
random selection
1)Uniform random selects its neighboring node arbitrary
from within its neighboring group.
2)Proportional random selects the next neighboring hop in
proportion to its connection degree. So the bigger the connection
degree the larger the query hit ratio would be.
 When using Non-Random query search, the query’s next hop is
structury determined first.
BittTorrent Definition
 Webster dictionary defines a Torrent as a
violently flowing stream of water/lava
Concatenating Bit to Torrent one would get
a violently flowing stream of bits and that, I
think, why the founder decided on naming
it this way since its main purpose is to
exchange rapidly large files between
other peers at minimal cost.
Popularity
 It’s popular because it feeds off the users files
and storage.
►Traditional p2p file distribution system
such as Napster had a file server that
catered to its users that was costly and
when failed service was disrupted.
► Popularly requested files congested the
traffic and thus lacked effectiveness.
SOLUTION
 Create a system that duplicates a large file among its
users and have them upload & download to each other
simultaneously and freely.
 Make it effective so it can carry on a large amount of
users to facilitate a rapid exchange of shared files.
 Make it robust so that it does not have a single point of
failure and disrupt its operation.
 Make it user friendly so it can be adopted quickly by all
level users.
 Make it perform over an existing hardware that is easily
accessible to everyone
PEERS CLASSIFICATION
 Peers are categorized into two types
► Seeds which are peers that have already completed
downloading a copy of the file. Now they can help
replicate and distribute it among other active users.
► Leechers which are in the process of leeching the file
off the system.
BitTorrent Example
ARCHITECTURAL FEATURES
 An efficient swarming technique that BitTorrent
algorithm dictates manages to balance the rate of
exchange between peers.
► the file is divided into equal chunks and every
user declares which chunk he has.
► the chunks are downloaded in parallel at the
same time from different users.
► uses rarest first chunk so that chunks that do not have
high request rate get downloaded first to keep
users in exchanging mode.
► File chunks are further broken down into smaller pieces
to avoid transmission delay over TCP connection and
thus insuring proper sequence of delivery.
► Further, leechers will not get to download a new chunk
unless the whole sub-pieces of the previous chunk has
been received as to avoid arbitrary and unorganized
download of bits and pieces.
► If for some reason two peers experience slow upload
they can instead upload to each other while getting a
better downloading performance.
► enforces tit-for-tat (selflessness) policy where
information exchange is encouraged.
► if a user download without uploading he will
temporary be chocked, or denied uploading.
► The back bone of the system is based on a Tracker
that keeps track of all the active peers and their
activities without involvement in the file distribution
process.
► seeds usually limit simultaneous leechers service to only 4
but rotate every 10 seconds when needed.
► To further enhance downloading ability, seeds download to
leechers with high downloading rate so that they in turn
finish and become seeds to alleviate the system load.
► The older peer have a higher priority for downloading
since he would have downloaded more chunks of the file
than the newly comer and thus would have better chance
in completing the download.
PERFORMANCE
 The result of a five month study shows that on average
down loaders achieved above 500kb/s which provided
a clear understanding that most users have high
bandwidth connection such as ADSL.
 Although The chocking algorithm that advocates
cooperation between peers is not part of the technical
support of BitTorrent, it insures better reciprocating
performance for all users having either high or low
bandwidth.
GRAPH RESULT
EFFICIENCY
 When BitTorrent realizes that one peer is downloading
while not uploading, it will choke him and deny him service to
insure efficiency of the system.
 The goal of the algorithm is to have several active peers
working together in a unidirectional way replicating and
transferring files among themselves to insure rapid
completeness.
 The algorithm also adheres to globally optimize
reciprocating peers connection as a whole.
 It also allows users to suspend and resume downloading either by
choice or by necessity. Its done by the tracker log that identifies the
user by its session which consists of hashing his IP address with
current time of download.
BItTORRENT INTERFACE
 To start using bitTorrent you would download the latest
version from a web sites that advocates it (lots).
 Start the tracker and surf the web looking for a .torrent
file to leech and maybe also seed.
 The tracker exchange information between users over
regular HTTP or HTTPS.
 The tracker gets updated by each active user of its up
or down loading progress.
 The bitTorrent connections protocol between users is
done over TCP.
 Usually the one having the original copy is uploading
only.
BitTorrent Coding
 BitTorrent protocol uses bee encoding format between the Tracker and
Metainfo file correspondence.
 Message transfer that are be-encoded consist of nested dictionaries and
lists that include strings and integers.
► Strings are be-encoded with length-prefixed base
ten followed by a colon and the string
for example 4:spam means simply spam
► Integers are be-encoded by representing them with
an ‘i’ followed by a base ten number and an ‘e’.
There is no size limitation on integers.
for example i4e corresponds to number 4.
► Lists are be-encoded as an ‘l’ followed by their bencoded
elements and an ‘e’. For example 14:spam4:eggse=spam,eggs
► Dictionaries are bencoded by ‘d’ followed by a list of alternating keys,
their values and an ‘e’. For example, d3:cow3:moo4:spam4:eggse
translates to {cow, moo, spam, eggse}
FREENET INTRODUCTION
 FreeNet is another distributed P2P network that offers
its users a cheap interface to exchange information.
 However FreeNet advocates secrecy for it’s users that is
beyond the censoring agency's reach.
 Its protocol uses GUID (globally unique identifier) keys
to disguise the file location and the requester/provider
identity.
 The use of these keys also prevent the storer of the file
from accessing its content.
PRIVACY DRIVEN
 Governments around the world have been eavesdropping
and intercepting internet users messages.
They justified these action to investigate criminals
behaviors.
 They campaign to limit access to internet content they do
not approve off regardless of personal preference.
 The founders of this network argued that the postal mail
(anthrax) and/or payphones is another communication
tool without censorship.
 They also argued that the essence of any democratic
society is its freedom of speech, good or bad.
DESIGN PURPOSE
 To facilitate information exchange medium between
information producing peers, information consuming
peers and information holding peers without the prowling
eyes of any governmental or other nosy agency
regulation.
 To decentralize the traditional one point failure central
server for reliability and availability.
 To advocate adaptability, scalability, efficiency, security
and integrity in comparison to other P2P networks.
ACHITECTURE
 The network information is stored on its members allocated storage
devices.
 Information storage is maintained by a Least Recent Requested file.
► Old files get replaced by new ones based on
their frequency of use. If the allocated storage
capacity of the information storer reaches its
limit the least recently requested files get
bumped for the new one. However if for some reason the
file is suddenly being needed, the original provider could
another copy.
► More popular nodes would have more copies than the newly
added ones that are located further down the tree like
structure because they frequented more often and their
routing algorithm is better trained.
 There two factors that influence file distribution:
► The first one is called tree pruning which like a garden
tree with dying trunks get removed to enhance the
growth of the rest of the tree.
► The second one is called tree growth where a
popularly requested file gets copied to the specific
area of which improves query response time and
prevents overloading other nodes of that area.
GUIDE KEYS

There are two types of GUID keys CHK and SSK
► Content-Hash key (CHK) is created when hashing the
content of the file to be stored. So when users look for
a specific file the key gets the same hashed value and
a reference to its exact location.
► Signed-subspace key (SSK)- the subspace is created
by first generating a private-public key pair to identify
it and then hashing the public half of the subspace
key and the descriptive string independently before
concatenating then and hashing again. To insure integrity, the file
is signed with its private key so that other nodes verifies its
signature before accepting it.
► Only the creator of that particular file can modify it, the rest can only read
it which insures trustworthy virtual space.
PRIVACY
 Variation of Chaum’s mix-net scheme for private
information exchange is used to guaranty anonymity of
users.
► Messages hop from one node to the other while
encrypting its link to avoid destiny detection.
► No one node can have information on all of the nodes
but its immediate neighbor.
► Attackers would be prevented from corrupting the
file from its keepers.
ROUNTING ALGORITHM
 FreeNet message routing algorithm implements steepest-ascent hill-climbing search
to avoid problematic centralized design used by other P2P like Napster Or the one
used by Gnutella that broadcast its queries to every one connected.
►Nodes forward queries to others that have table
entry in close proximity to the files assigned key.
►Queries bounce back & forth between nodes till it
homes in the right one
►Nodes get trained and expand by collecting
information from others querying it.
►Nodes also store cluster of files having similar keys
because they search by specific key and not topic.
And since topics are assigned hash keys, topic similarities will
be dispersed among many nodes insuring against single point
failures and thus system robustness. Study shows a median path length of just 8
hops within a network of 10,000 nodes.
FREENET FUNCTIONALITY
 Each user maintains information about other users routing tables
and possible addresses they have.
 Every time a user receives a request for a file the users data base is
checked first. If the file is not found, the user forwards the query to
the next node having the closest possibility of having a similar key.
 Users avoid identifying their addresses by faking a reply to
themselves to deceive attackers.
 Queries are given Time-To-Live (TTL) so it does not look forever and
tie up system resources.
 New files are inserted according to their GUID key values
► The receiving node checks to see if it already have and if not, it
will facilitate storage for it and create a table entry for it.
 Data Encryption is not done automatically by the freeNet
network. It is the responsibility of information publisher to
do it.
 Information publishers provide GUID keys to end user
only to protect the information from being modified by
information holders that would not be able to descramble
the keys.
 Newly added nodes are identified by their public-private
keys. They get connected to other nodes by sending an
announcement message .
 The receiving node while adding it to its routing table
propagate the information to the next node and so on till
TTL expires. At that time the nodes new group assigns it
a unique identifying GUID randomly and region.
PERFORMANCE
 FreeNet is modeled after small world network that
exhibit scalability and fault tolerance where significantly
few nodes have huge connection to other nodes that
enhances performance and create short path connection
due to their vast connection.
 The network eliminates poorly connected nodes that fail
to respond to queries.
 Robustness is majored against failure and unless the
well connected nodes are taken out first the network will
recover from malicious attacks from within the users or
from an outsiders.
PERFORMANCE GRAPHS
FREENET SECURITY
 Files to be stored on the system need to be protected just as the participant
should be. If one impostor node can intercept and modify any files’ data,
the system should be able to weed it out and restore the files’ data.
 There are two levels of anonymity the system advocates
1) Senders and receivers should not be able to determine
the sending node identity or the receiving node identity
2) Ranges of anonymity the participants would use to achieve their goal;
absolute privacy – exposed.
 All sent Messages have:
1) A randomly generated 64bit transaction ID that is uniquely generated
and have a low collision rate over its live time.
2) A hops-to-live limit that is set by the sender and gets decremented
with each passing node. The message does not stop when reaching
its destination so to confuse a would be attacker and disguise the
receivers identity.
3) A depth counter gets incremented at each hop and is initialized to
small value to hide the messages’ true destination.
 There is no message protection between a sender and the
first node contacted, that might be an eavesdropper. The best
way around this problem is to own the first node on the edge
of the network to be used as an entry point to it.
 Messages can be repeatedly encrypted by its public key that
specifies the path to take. And when it reaches its destination
it get injected to the net as if it was coming from the receiver
end.
 Data originators have their identity hidden by the frequent
resetting of the data source field in the routing table replies to
requests. There would be no way to find out if the data holder
is the originator or simply a forwarding node.
 To defend against file grabbing by an attacker, content-hash
keys and subspace keys (CHK,SSK) should be used because
unless the attacker can decipher the signature he will have a
slim chance access it.
 Denial-of-service attacks (DoS) can happen but will not have a
significant effect because a scheme such as Hash Cash can
counter it.
 The network can also protect against such and attack by
subdividing the data base into two section where new insertion
do not get mingled with the old files.
QUESTIONS?