Gnutella - Montclair State University

Security in P2P Networks
A study of the Gnutella protocol and its weaknesses
By: Imran Qureshi
Date: December 9, 2004
Gnutella Security - Overview
- What is Gnutella? The history
- The topology of Gnutella
- no central server (de-centralized - second generation)
- direct peer connection
- Gnutella Protocol
- Gnutella Descriptors
- 5 descriptors - ping, pong, query, queryhit, push
- byte structure of the descriptors
- descriptor header - byte structure
- Communication in Gnutella
- Finding and connecting to other servents
- Downloading resources - offline
- Firewalled servents
Overview
- Security Risks
- Spamming
- Denial of service attacks
- Pong attack
- IP harvesting
- Spreading viruses through the push descriptor
- Man in the Middle attacks
- Solutions
- Validation
- Gnutella Proxy Server
Gnutella History
History of Gnutella
• Gnutella was developed at Nullsoft, a subsidiary of AOL, by Justin
Frankel and Tom Pepper.
• Justin Frankel, whom some call “the world’s most dangerous geek”,
created Winamp at the age of 18 and, a few years later, Gnutella while
working for AOL.
• Gnutella was released on 14 March 2000.
• At that time, Napster was facing lawsuits over the sharing of
copyrighted material. When people heard about Gnutella, a large
number of them downloaded it.
• AOL forced Nullsoft to take down all links to Gnutella from its
website since it promoted piracy. But in the short time that Gnutella
was available, about one day, a large number of people had already
obtained it.
• The protocol was then reverse engineered (the original source code was
never officially released), and now we have different programs using
the Gnutella protocol:
Gnutella Clients
Source: Peer-to-Peer Networks, by Prof. N.Vlajic
Gnutella Topology
Gnutella - Topology
• The Gnutella topology is known as a “decentralized topology”, meaning
that communication between two peers (users, or nodes) on the
network takes place directly. Each node acts as both a client and a server,
requesting resources from other nodes and allowing other nodes to
download its own resources.
• Famous P2P clients: Napster, Kazaa, Gnutella
• The total number of peers found on the Gnutella network during a
weekday is around 43,546, sharing approximately 1,843,549 files.
• The communication does not go through a central server, unlike
Napster.
• Each node or peer on the network is called a “servent”. The word
servent comes from:
Each peer = SERVer + cliENT = “SERVENT”
Gnutella – Topology (contd…)
Napster (central server)
Gnutella (no central server)
Gnutella Protocol
• The Gnutella protocol is a set of rules by which users communicate
over the network.
• All communication is done via “descriptors”.
• There are 5 basic descriptors, namely:
- Ping, Pong, Query, QueryHit and Push
• Each descriptor is preceded by a “descriptor header”.
• In the following slides, we will describe the purposes of the descriptors
and their byte structures.
Gnutella Protocol – Byte Structures
The Descriptors:
• When a peer talks to another peer, the communication is done via
descriptors.
• The byte structure of a typical message is as follows:
  Field:        Descriptor Header    Descriptor Payload
  Byte offset:  0 … 22               23 … (variable length, 0 … max)
• Note:
- All fields in the following structures are in little-endian byte order (least
significant byte is stored first), unless stated otherwise
- All IP addresses are in IPv4 format. For example, 208.17.50.4 is stored as:
  byte1   byte2   byte3   byte4
  0xD0    0x11    0x32    0x04
Gnutella Protocol – Byte Structures
Descriptor Header:
• Byte Structure
  Descriptor ID | Payload Descriptor | TTL | Hops | Payload Length
- Descriptor ID – Unique identifier for the descriptor on the network (16-byte string)
- Payload Descriptor – This value depends on the descriptor being sent:
  ping      0x00
  pong      0x01
  query     0x80
  queryhit  0x81
  push      0x40
- TTL (Time to live or Horizon) – The number of times that the descriptor will still be
forwarded. Each servent that receives a descriptor decrements the TTL before
forwarding it to the next peer. When the TTL reaches 0, the descriptor is no longer forwarded.
The TTL is the protocol’s main mechanism for limiting network traffic and preventing poor
performance.
Gnutella Protocol – Byte Structures
- Hops – Total number of times the descriptor has already been forwarded. The hop value is
incremented by each peer that forwards it, so that:
TTL(initial) = TTL(current) + Hops(current)
- Payload Length – The length in bytes of the payload that immediately follows this header.
Used to find the beginning of the next descriptor, which starts right after the payload.
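The header layout can be sketched in Python with the struct module. The field widths are an assumption taken from the Gnutella v0.4 specification rather than from these slides: a 16-byte ID, one byte each for payload descriptor, TTL and hops, and a 4-byte little-endian payload length, which adds up to the 23 bytes (offsets 0 to 22) of the header. The function names and the initial TTL of 7 are hypothetical.

```python
import struct
import uuid

# Assumed layout: 16-byte ID, 1-byte payload descriptor, 1-byte TTL,
# 1-byte hops, 4-byte little-endian payload length ("<" disables padding).
HEADER = struct.Struct("<16sBBBI")

# Payload descriptor values listed on the slide.
PING, PONG, PUSH, QUERY, QUERYHIT = 0x00, 0x01, 0x40, 0x80, 0x81


def pack_header(descriptor_id, payload_descriptor, ttl, hops, payload_length):
    """Serialize the five header fields into 23 bytes."""
    return HEADER.pack(descriptor_id, payload_descriptor, ttl, hops, payload_length)


def unpack_header(data):
    """Parse the first 23 bytes of a descriptor into a dict of fields."""
    names = ("descriptor_id", "payload_descriptor", "ttl", "hops", "payload_length")
    return dict(zip(names, HEADER.unpack(data[:HEADER.size])))


# A ping is just a header: payload descriptor 0x00 and payload length 0.
ping = pack_header(uuid.uuid4().bytes, PING, ttl=7, hops=0, payload_length=0)
fields = unpack_header(ping)

# The next descriptor in the stream starts right after this one's payload.
next_descriptor_offset = HEADER.size + fields["payload_length"]
```

Reading a stream of descriptors then amounts to reading 23 bytes, parsing them, and reading payload_length more bytes before the next header.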
• Right after the descriptor header comes the descriptor payload, which belongs to one of the five descriptors described below.
Ping
• A ping descriptor is used by a servent to find or search for other servents on the network.
• A servent that receives a ping descriptor responds with a pong.
• Pings have a payload length of 0, so they have no payload and no byte structure of their own.
• The descriptor header identifies a ping by having a value of 0x00 in the payload descriptor
field and a value of 0x00000000 in the payload length field
Gnutella Protocol – Byte Structures
Pong
  Port | IP Address | Number of files shared | Number of Kb shared
• Sent as a response to a ping
• Defining values:
- Port: the port on which the responding host can accept incoming connections
- IP Address: IP address of the responding host (big-endian format)
- Number of files shared: Total number of files the responding host is sharing on the network
(usually found in the “shared folder”)
- Number of Kb shared: Total number of kilobytes the responding host (with the given IP and
port) is sharing.
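A pong payload could be packed and parsed roughly as follows. This is a sketch, assuming the field widths of the v0.4 specification (2-byte port, 4-byte IPv4 address, two 4-byte counters, 14 bytes in total); the slides do not state the widths, and the function names are hypothetical.

```python
import socket
import struct

# Assumed widths: port (2, little-endian), IPv4 address (4 raw bytes),
# files shared (4, little-endian), kilobytes shared (4, little-endian).
PONG = struct.Struct("<H4sII")


def pack_pong(port, ip, files_shared, kb_shared):
    """Build a pong payload; inet_aton yields the address in big-endian order."""
    return PONG.pack(port, socket.inet_aton(ip), files_shared, kb_shared)


def unpack_pong(payload):
    """Return (port, dotted IPv4 string, files shared, kilobytes shared)."""
    port, raw_ip, files_shared, kb_shared = PONG.unpack(payload)
    return port, socket.inet_ntoa(raw_ip), files_shared, kb_shared


payload = pack_pong(6346, "208.17.50.4", 10, 2048)
```

Keeping the address as four raw bytes (`4s`) leaves it in big-endian order even though the rest of the structure is little-endian.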
Gnutella Protocol – Byte Structures
Query
  Minimum Speed | Search Criteria
• After a servent has the IP address and the port of other servents, it may search for particular
files using the query descriptor.
• Defining values:
- Minimum Speed: The minimum speed (in kb/s) of the servents that should respond to
this query. A query with a minimum speed of m (kb/s) should only
be answered with a queryhit by a servent whose speed is at least m.
- Search Criteria: A search string terminated by a null (0x00). The maximum length is
bounded by the payload length field of the descriptor header.
e.g.: “nameofthesong.mp3”
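As an illustration, a query payload can be built and parsed in a few lines. The 2-byte little-endian width of the minimum speed field is an assumption based on the v0.4 specification, and the helper names and the speed value 56 are hypothetical.

```python
import struct


def pack_query(min_speed_kbps, search):
    """Minimum speed (assumed 2 bytes, little-endian) + null-terminated string."""
    return struct.pack("<H", min_speed_kbps) + search.encode("ascii") + b"\x00"


def unpack_query(payload):
    """Return (minimum speed, search criteria) from a query payload."""
    (min_speed,) = struct.unpack_from("<H", payload, 0)
    # Everything after the speed field up to the first null byte is the search string.
    criteria = payload[2:].split(b"\x00", 1)[0].decode("ascii")
    return min_speed, criteria


payload = pack_query(56, "nameofthesong.mp3")
```

The length of this payload is what goes into the payload length field of the descriptor header.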
Gnutella Protocol – Byte Structures
QueryHit
  No. of Hits | Port | IP Address | Speed | Result Set | Servent Identifier
- No. of Hits: Total number of hits or matches for the query in the result set
- Port: the port on which the responding host can accept incoming connections
- IP Address: IP address of the responding host (big-endian format)
- Speed: Speed of the responding host
- Result Set: The set of No. of Hits responses to the corresponding query, that is, one entry for
each file in the shared folder of the responding host that met the search criteria. Each
entry has the following structure:
  File Index | File Size | File Name
- File Index: A number, assigned by the responding host, that identifies the file matching the query.
- File Size: Size of the file in bytes.
- File Name: Name of the file (double null terminated, 0x0000)
- Servent Identifier: Unique 16-byte string identifier of the responding servent on the
network.
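The nested structure of a queryhit is easier to see in code. The sketch below assumes the v0.4 field widths (1-byte hit count, 2-byte port, 4-byte IP address, 4-byte speed, and 4 bytes each for file index and file size); these widths, the function names and the sample values are not from the slides.

```python
import socket
import struct

HIT_HEAD = struct.Struct("<BH4sI")  # hits, port, IPv4 address, speed (assumed widths)
RESULT_HEAD = struct.Struct("<II")  # file index, file size (assumed widths)


def pack_queryhit(port, ip, speed, results, servent_id):
    """results is a list of (file_index, file_size, file_name) tuples."""
    out = HIT_HEAD.pack(len(results), port, socket.inet_aton(ip), speed)
    for index, size, name in results:
        # Each file name is terminated by a double null.
        out += RESULT_HEAD.pack(index, size) + name.encode("ascii") + b"\x00\x00"
    return out + servent_id  # the 16-byte servent identifier closes the payload


def unpack_queryhit(payload):
    hits, port, raw_ip, speed = HIT_HEAD.unpack_from(payload, 0)
    offset = HIT_HEAD.size
    results = []
    for _ in range(hits):
        index, size = RESULT_HEAD.unpack_from(payload, offset)
        offset += RESULT_HEAD.size
        end = payload.index(b"\x00\x00", offset)  # end of the file name
        results.append((index, size, payload[offset:end].decode("ascii")))
        offset = end + 2
    servent_id = payload[offset:offset + 16]
    return port, socket.inet_ntoa(raw_ip), speed, results, servent_id


hit = pack_queryhit(6346, "208.17.50.4", 56, [(2, 4661248, "song.mp3")], b"\x11" * 16)
```

The hit count tells the parser how many result entries to read before the trailing servent identifier.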
Gnutella Protocol – Byte Structures
Push
  Servent Identifier | File Index | IP Address | Port
• The basic purpose of a push descriptor is to reach a servent that is behind a firewall.
This topic is discussed in detail later on.
• Defining values:
- Servent Identifier: the unique 16-byte string identifier of the targeted (firewalled) servent, which is
being requested to push the file with index File Index
- File Index: index of the file to be pushed from the targeted servent’s shared folder.
- IP Address: IP address (big-endian format) of the servent to which the file should be pushed
- Port: the port on that servent to which the file should be pushed.
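A push payload follows the same pattern as the other descriptors. In this sketch the widths (16-byte servent identifier, 4-byte file index, 4-byte IP address, 2-byte port, 26 bytes in total) are assumed from the v0.4 specification, and the names and sample values are hypothetical.

```python
import socket
import struct

# Assumed widths: servent identifier (16), file index (4, little-endian),
# IPv4 address (4 raw bytes, big-endian), port (2, little-endian).
PUSH = struct.Struct("<16sI4sH")


def pack_push(servent_id, file_index, ip, port):
    """Ask the servent named by servent_id to push file_index to ip:port."""
    return PUSH.pack(servent_id, file_index, socket.inet_aton(ip), port)


def unpack_push(payload):
    servent_id, file_index, raw_ip, port = PUSH.unpack(payload)
    return servent_id, file_index, socket.inet_ntoa(raw_ip), port


push = pack_push(b"\xAA" * 16, 2, "120.168.10.2", 6346)
```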
Communication in Gnutella
Finding servents
• In order to connect to a Gnutella network and share files, a servent needs to run
one of the many Gnutella clients (e.g. BearShare, Morpheus, etc.).
• After the client is launched, this peer or node (let’s say A) will let its neighboring node
(let’s say B) know of its existence. (You need to know the DNS name
or IP address of some neighbor at the start.)
• A lets its neighbors know of its existence by sending out a ping
descriptor.
• B in turn forwards the ping to its neighbors, and the descriptor keeps
travelling through the network, letting the nodes know of A’s existence. In
this way the information is broadcast, and it keeps going to different nodes
on the network until the descriptor’s time-to-live (TTL) reaches 0.
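The forwarding rule can be sketched as a small function. This is a simplified model, assuming each servent decrements the TTL and increments the hop count before deciding whether to pass the descriptor on; the function names and the straight-chain topology are illustrative only.

```python
def forward(ttl, hops):
    """Apply one servent's forwarding step.

    Returns the updated (ttl, hops), or None when the TTL has reached 0
    and the descriptor must not be forwarded any further.
    """
    ttl, hops = ttl - 1, hops + 1
    return (ttl, hops) if ttl > 0 else None


def servents_reached(initial_ttl):
    """Count how many servents on a straight chain receive the descriptor."""
    reached = 0
    state = (initial_ttl, 0)
    while state is not None:
        reached += 1             # the next servent on the chain receives it
        state = forward(*state)  # and decides whether to pass it on
    return reached
```

With an initial TTL of 7, for example, the ping travels at most 7 hops from A, and TTL plus hops stays equal to 7 along the way.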
Communication in Gnutella
• Now A has become part of the network, and everyone within reach of its ping knows of its
existence.
• If a servent wants to acknowledge, it sends a pong descriptor back to A, letting it
know its IP address and the port on which it is accepting traffic.
• In this way, A builds up a file of the IP addresses and ports of all the servents that
responded with a pong descriptor.
Communication in Gnutella
Servent A announcing existence to peers
Source: Prof. Igor Ivkovic, Dept. of Compt. Science at Univ. of Waterloo
“Improving Gnutella Protocol”
Communication in Gnutella
Connecting to Servents
• Now that A has a file containing other servents’ addresses and ports, it will try to
connect to one of those servents (let’s say B).
• After a TCP session is established with B, A sends the following
command in ASCII:
GNUTELLA CONNECT/<protocol version string>\n\n
where the protocol version string is the current version of Gnutella (e.g. “0.4”).
• If B wants to connect, it responds by sending:
GNUTELLA OK\n\n
• Now there is a valid direct connection between A and B.
• If B responds with anything else, A knows that B is not willing
to accept the connection.
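The connection handshake can be sketched on top of an already connected TCP socket. The function names are hypothetical, and the sketch assumes B either sends at least as many bytes as the expected reply or closes the connection.

```python
import socket

OK_REPLY = b"GNUTELLA OK\n\n"


def connect_request(version="0.4"):
    """Build the ASCII connect string that A sends to B."""
    return f"GNUTELLA CONNECT/{version}\n\n".encode("ascii")


def handshake(sock: socket.socket, version="0.4"):
    """Send the connect string and report whether B accepted the connection."""
    sock.sendall(connect_request(version))
    reply = b""
    # recv may return fewer bytes than requested, so read until we have enough.
    while len(reply) < len(OK_REPLY):
        chunk = sock.recv(len(OK_REPLY) - len(reply))
        if not chunk:  # B closed the connection
            break
        reply += chunk
    return reply == OK_REPLY
```

After handshake() returns True, the same socket carries binary descriptors in both directions.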
Communication in Gnutella
• Now that this connection has been established, the communication
between A and B carries on using descriptors and descriptor
headers, as described before:
Ping, Pong, Query, QueryHit and Push
Communication in Gnutella
Downloading resources or files from other servents
Before downloading is done, we need to search for the files.
Searching for files
• Let’s again take our two servents A and B.
• Suppose that A wants to search for a file called “ushersong.mp3”.
• It will send out a query descriptor as follows, supposing that the minimum speed requirement is x:
  Minimum Speed: x | Search Criteria: ushersong.mp3
• If a servent has a file matching “ushersong.mp3” and has a
speed >= x (kb/s), it may choose to send a queryhit descriptor as follows:
Communication in Gnutella
  No. of Hits: 1 | Port: 30 | IP Address: 120.168.10.2 | Speed: >x | Result Set | Servent Identifier
Result Set:
  File Index: 2 | File Size: 4661248 bytes | File Name: “ushersong.mp3”
Communication in Gnutella
• A receives the queryhit descriptor and asks to download the file.
Downloading
• All searches on the Gnutella network are done “online” (through the network), while downloads are done
“offline” (outside the network, over a direct connection between the two servents).
• Hence, the two servents involved in a download communicate using HTTP commands.
• So, in our example, A creates a TCP connection with B and sends the following
command to download the file:
GET /get/<File Index>/<File Name>/ HTTP/1.0\r\n
Connection: Keep-Alive\r\n
Range: bytes=0-\r\n
User-Agent: Gnutella\r\n
\r\n
source: Mattias Jansson, “Gnutella” Feb 1, 2004
Communication in Gnutella
• For our example, the HTTP command will read:
GET /get/2/ushersong.mp3/ HTTP/1.0\r\n
Connection: Keep-Alive\r\n
Range: bytes=0-\r\n
User-Agent: Gnutella\r\n
\r\n
• A response to this could be:
HTTP 200 OK\r\n
Server: Gnutella\r\n
Content-type: application/binary\r\n
content-length: 4661248\r\n
\r\n
… data …
source: Mattias Jansson, “Gnutella” Feb 1, 2004
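The download request and the response head shown above can be produced and read with plain string handling. This is a minimal sketch; build_get and parse_response_head are hypothetical helper names, and real clients would also need to handle partial reads and errors.

```python
def build_get(file_index, file_name):
    """Build the HTTP request that A sends to download a file."""
    return (
        f"GET /get/{file_index}/{file_name}/ HTTP/1.0\r\n"
        "Connection: Keep-Alive\r\n"
        "Range: bytes=0-\r\n"
        "User-Agent: Gnutella\r\n"
        "\r\n"
    ).encode("ascii")


def parse_response_head(head):
    """Split a response head into its status line and a dict of headers."""
    status, *lines = head.decode("ascii").split("\r\n")
    headers = {}
    for line in lines:
        if not line:  # the empty line ends the header section
            break
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status, headers


request = build_get(2, "ushersong.mp3")
status, headers = parse_response_head(
    b"HTTP 200 OK\r\n"
    b"Server: Gnutella\r\n"
    b"Content-type: application/binary\r\n"
    b"content-length: 4661248\r\n"
    b"\r\n"
)
```

The content-length header tells A how many bytes of file data follow the empty line.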
Communication in Gnutella
Firewalled servents:
• If a targeted servent, from which a file needs to be downloaded, is behind a
firewall, it may not be possible to create a direct connection to it in order to download the
file.
• The firewall will not allow incoming connections to its Gnutella port.
• Hence, the requesting servent sends a push descriptor.
• Upon receiving the push request, the targeted servent tries to create a TCP/IP
connection to the requesting host. If this connection cannot be established, it most likely means
that both servents are behind a firewall, and the file cannot be transferred.
• Otherwise, the targeted servent sends the following command over the new connection:
GIV <File Index>:<Servent Identifier>/<File Name>\n\n
• After receiving this command, the requesting servent sends the following HTTP
GET request:
Communication in Gnutella
GET /get/<File Index>/<File Name>/ HTTP/1.0\r\n
Connection: Keep-Alive\r\n
Range: bytes=0-\r\n
User-Agent: Gnutella\r\n
\r\n
• The rest of the download process is similar to what I described before.
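On the requesting side, the GIV line has to be parsed before the GET request can be sent. The sketch below is an assumption-laden illustration: it accepts either a space or a slash after GIV, treats the servent identifier as any run of characters without a slash, and uses a hypothetical function name and sample identifier.

```python
import re

# "GIV", a separator, the file index, ":", the servent identifier, "/",
# the file name, and the terminating "\n\n".
GIV = re.compile(r"GIV[ /](\d+):([^/]+)/(.*)\n\n")


def parse_giv(line):
    """Return (file_index, servent_id, file_name), or None if it is not a GIV line."""
    match = GIV.fullmatch(line)
    if match is None:
        return None
    return int(match.group(1)), match.group(2), match.group(3)


parsed = parse_giv("GIV 2:0123456789ABCDEF0123456789ABCDEF/ushersong.mp3\n\n")
```

The file index and file name recovered here are the ones placed into the GET request that follows.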
Security Risks of Gnutella
Spamming and Denial of Service Attacks
• In email, spam messages can simply be deleted without further
harm.
• But if you accept a spammed query, the consequences can be severe:
you could unknowingly take part in a Denial of Service (DoS) attack.
• DoS attacks in Gnutella are very easy to mount.
• Suppose a user (A) is looking for a file and its query reaches another peer (B).
• Let’s say that B in our case is a malicious peer and is misbehaving on the
network.
• B receives the query from A, responds positively, and directs A to
download the file from C (the host under attack).
• Hence A will start requesting the file from C, without knowing that it is
actually contacting C rather than B.
Security Risks of Gnutella
• This way, the malicious B will direct many peers to download files from C and
hence create a denial of service attack.
• The important thing to understand is that anybody could be playing a
role in a DoS attack without knowing it.
• At some point, the load on C could become so high that it is unable to accept
connections from more peers and may even crash.
• It will also be very hard for anyone to identify who originated the attack, since the
requests to C come from many different IP addresses and many different
domains.
Security Risks of Gnutella
Pong Attack
• The concept behind a pong attack is the same as that of the DoS attack.
• When the malicious B receives a ping from A, it might reply with a pong
containing the IP address and port of C (the host under attack):
  Port | IP Address | Number of files shared | Number of Kb shared
• A believes that it is connecting to B and will start
forwarding queries, even though they are going to C.
Security Risks of Gnutella
IP Harvesting
• Hackers are always in search of people’s IP addresses.
• They continuously scan the internet in order to find them.
• Since most web servers sit behind highly protective firewalls, it is hard for hackers to
break through.
• But in Gnutella, IP addresses are easily obtained.
• P2P networks work in a way that requires you to advertise your IP address.
• A hacker could easily gather, or harvest, IP addresses and attack vulnerable users
on the network.
• This is less of a problem for people with dial-up connections, since their IP address keeps
changing.
• But people with static IP addresses (such as Montclair State University or other
“.edu” domains) are in trouble.
Security Risks of Gnutella
Transferring viruses through the push descriptor
• A typical push descriptor contains the IP address of the requesting host and
the port on which it is accepting traffic.
• When a user sends out a query to a peer, that peer might lie and say that it has
the file even though it doesn’t.
• Then the user will send a push request to the responding peer, and the
responding peer will create a TCP/IP connection with the user.
• Now the responding host can easily transfer any file to the user, since it has
already gained trust by lying.
• These files could be “.exe” files that transfer a virus to the user’s
computer.
Security Risks of Gnutella
Man in the Middle Attacks:
• I will describe this with the use of an example:
• We have three people:
A – searching for a file
B – has the file
C – malicious user
• A queries the network, searching for a file.
• B has the file and responds with a queryhit.
• Suppose C receives this queryhit on its way back, replaces B’s IP address and port with its own, and
forwards it to A.
• A, who gets the reply from C, creates the connection with C instead of B.
• C, on the other hand, downloads the original file from B, infects it with malicious
content, and then transfers it to A.
Solutions
1) Validation
2) Unique Network Identifier
3) Reduce Network traffic
Thank You For Your Attention
Questions or suggestions about any concepts discussed?