Hybrid Method for P2P Traffic Detection


Detection of Encrypted Traffic
in Peer-to-Peer Network
Mário M. Freire
Instituto de Telecomunicações
Departamento de Informática
Universidade da Beira Interior
([email protected])
Ciência 2010 – Encontro com a Ciência e a Tecnologia em Portugal (Meeting with Science and Technology in Portugal)
Lisbon, 4-7 July 2010
Overview
• Overview About Peer-to-Peer Systems
• Methods for P2P Traffic Classification
• Deep Packet Inspection
• Behaviour-based Methods
• Hybrid Method for P2P Traffic Detection
• P2P Traffic Detection Using a Behavioural
Method Based on Entropy
• Method for P2P Traffic Detection Using Deep
Packet Inspection
• Main Conclusions
Overview About Peer-to-Peer Systems
Main features
– Scalability
– Resiliency
– Redundancy
Advantages
– Less expensive;
– More fault tolerance;
– It is possible to put the
services in points of
the network where
they are more needed.
Disadvantages
– Security and legal issues.
Overview About Peer-to-Peer Systems
Functional classification of P2P applications
• Management and content sharing (e.g. BitTorrent)
• Distributed processing (e.g. SETI@home)
• Collaboration and communication (e.g. MSN)
Degree of Decentralization
• Purely decentralized systems (e.g. eMule, Gnutella)
• Partially decentralized systems (e.g. DirectConnect)
• Hybrid decentralized systems (e.g. BitTorrent)
• Centralized systems (e.g. Napster)
Structure of the Information System
• Unstructured systems (e.g. Gnutella)
• Structured systems (e.g. Chord, CAN, Pastry, BitTorrent)
• Loosely structured systems (e.g. Freenet)
Overview About Peer-to-Peer Systems
• This work focuses on the corporate perspective of P2P applications;
• The traffic generated by P2P file-sharing and P2P TV applications may compromise the performance of critical networked applications or network-based tasks in corporations/institutions.
Methods for P2P Traffic Classification
Traditional traffic classification:
– Based on port number (obsolete!)
Current methods for P2P traffic classification:
• Payload inspection, deep packet inspection or signature-based detection
• Methods based on flow traffic behaviour, or classification in the dark
Methods for P2P Traffic Classification
Traditional traffic classification:
– Based on port number (obsolete!)
This approach searches for traffic on the ports that are usually used by known peer-to-peer applications.
It is unable to classify:
– new or unknown protocols;
– applications that choose a random port number;
– applications that disguise their traffic using ports usually used by other protocols (80, 25, 110, …).
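The port-based approach, and the reason it is considered obsolete, can be illustrated with a minimal Python sketch. The port table is an assumption made for illustration (commonly cited default ports of a few clients); it is not taken from the presentation, and the function name is hypothetical.

```python
# Assumed default ports, for illustration only; real clients let users change them.
KNOWN_P2P_PORTS = {
    "BitTorrent": set(range(6881, 6890)),  # 6881-6889
    "eDonkey": {4662, 4672},
    "Gnutella": {6346, 6347},
}


def classify_by_port(src_port, dst_port):
    """Return the protocol whose usual ports include either endpoint, else None."""
    for protocol, ports in KNOWN_P2P_PORTS.items():
        if src_port in ports or dst_port in ports:
            return protocol
    return None


# A client on its default port is recognised.
print(classify_by_port(51000, 6881))
# The same client moved to port 80, or to a random port, is missed.
print(classify_by_port(51000, 80))
print(classify_by_port(51000, 49152))
```

The last two calls return None, which is exactly the failure mode listed above: random ports and ports borrowed from other protocols defeat the lookup.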
Deep Packet Inspection
Underlying approach: search for specific signatures (string series) in the payload of IP packets.
Most of the already known P2P protocols may be identified by patterns contained in the payload of an IP packet.
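As a rough illustration of signature search in a payload, the Python sketch below checks a payload against a small signature table. The two signatures (the plain BitTorrent handshake prefix and the Gnutella connection string) are well-known plaintext patterns chosen as examples; they are assumptions for this sketch and are not the signatures developed in this work.

```python
# Example plaintext signatures (assumed for illustration).
SIGNATURES = {
    "BitTorrent": b"\x13BitTorrent protocol",  # length byte 19 + protocol name
    "Gnutella": b"GNUTELLA CONNECT/",
}


def classify_by_payload(payload: bytes):
    """Return the first protocol whose signature occurs in the payload, else None."""
    for protocol, signature in SIGNATURES.items():
        if signature in payload:
            return protocol
    return None


handshake = b"\x13BitTorrent protocol" + bytes(8)
print(classify_by_payload(handshake))
print(classify_by_payload(b"GET / HTTP/1.1\r\n"))
```

Once the payload is encrypted, none of these byte strings survive, which is why plain signature matching needs either obfuscation-aware signatures or a complementary method.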
Deep Packet Inspection
Useful for:
– accurate protocol identification;
– well-known protocols;
– non-evasive applications;
– mechanisms for service charging systems.
Problems:
• new or unknown protocols;
• encrypted payloads;
• legal issues;
• heavy computation needed to process huge portions of traffic at very high bitrates and/or in low-latency communications.
Behaviour-based Methods
Underlying approach: identifying patterns in the traffic behaviour, without looking into the payload contents.
There are several mechanisms for traffic classification based on traffic behaviour. Patterns are identified in several traffic characteristics, such as:
– (IP, port) pairs;
– number of connections;
– TCP flags;
– inter-arrival times.
Behaviour-based Methods
Different mechanisms have been investigated to classify traffic using behaviour patterns:
• Statistical mechanisms. Statistical methods usually rely on flow- and packet-level properties of the traffic, such as the flow duration and size, inter-arrival times, IP addresses, TCP and UDP port numbers, TCP flags, packet size, etc.;
• Heuristics-based methods. Many behavioural mechanisms for traffic classification are based on a predefined set of heuristics. Typical heuristics include the source-destination IP pairs that use both TCP and UDP, the number of distinct addresses and ports a user is connected to, etc.;
• Machine learning techniques. A large part of the studies propose classification mechanisms based on different supervised or unsupervised ML techniques, such as Bayesian estimators or networks, clustering, and decision trees.
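One of the heuristics mentioned above, source-destination IP pairs that use both TCP and UDP, can be sketched in a few lines of Python. The flow record layout (source IP, destination IP, transport protocol) and the function name are assumptions made for this example.

```python
from collections import defaultdict


def pairs_using_tcp_and_udp(flows):
    """Return the IP pairs seen with both TCP and UDP flows.

    flows: iterable of (src_ip, dst_ip, protocol), protocol being "TCP" or "UDP".
    """
    protocols_seen = defaultdict(set)
    for src_ip, dst_ip, protocol in flows:
        pair = frozenset((src_ip, dst_ip))  # ignore flow direction
        protocols_seen[pair].add(protocol)
    return {pair for pair, seen in protocols_seen.items() if {"TCP", "UDP"} <= seen}


flows = [
    ("10.0.0.1", "10.0.0.2", "TCP"),
    ("10.0.0.2", "10.0.0.1", "UDP"),
    ("10.0.0.1", "10.0.0.3", "TCP"),
]
print(pairs_using_tcp_and_udp(flows))
```

Only the pair 10.0.0.1 / 10.0.0.2 is flagged here. Such a heuristic never looks at payloads, so it still applies when traffic is encrypted, at the cost of possible false positives.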
Behaviour-based Methods
Useful for:
– unknown protocols;
– encrypted traffic;
– public networks under data protection laws.
Disadvantages:
– lack of accuracy;
– unsuitable for service charging systems.
Hybrid Method for P2P Traffic Detection
The approaches proposed up to now may fail to identify peer-to-peer traffic when:
– traffic is encrypted;
– payload signatures for a new protocol are unknown;
– the aggregation point is under such a heavy load that it becomes infeasible to deeply inspect all the packets under high-speed and/or low-latency operation.
These limitations are the starting point for the Hybrid Method for P2P Traffic Detection.
Combines both strategies:
– Flow traffic behaviour, or classification in the dark
  • P2P traffic detection using a behavioural method based on entropy
  • More details, for instance, in:
    "Analysis of Peer-to-Peer Traffic Using a Behavioural Method Based on Entropy", Proc. IEEE Int. Performance, Computing and Communications Conf. (IPCCC 2008), pp. 201-208.
– Deep packet inspection (signature-based detection)
  • More details, for instance, in:
    David A. Carvalho, Manuela Pereira and Mário M. Freire, "Towards the Detection of Encrypted BitTorrent Traffic Through Deep Packet Inspection", in Security Technology, Communications in Computer and Information Science, CCIS 58, Springer-Verlag, Berlin Heidelberg, December 2009, ISBN: 978-3-642-10846-4, pp. 265-272 (Invited Paper).
    Mário M. Freire, David A. Carvalho and Manuela Pereira, "Detection of Encrypted Traffic in eDonkey Network Through Application Signatures", Proceedings of 2009 1st International Conference on Advances in P2P Systems (AP2PS 2009), Sliema, Malta, October 11-16, 2009, IEEE Computer Society Press, Los Alamitos, CA, ISBN: 978-0-7695-3831-0, pp. 174-179.
Hybrid Method for P2P Traffic Detection
[Figure: architecture of the hybrid method, hosted on the equipment responsible for monitoring the traffic (Intrusion Detection/Prevention System, Firewall, Gateway, etc.). Components shown: Internet; traffic coming from the Internet; classifier, containing a deep packet inspection module and a traffic characterization in the dark module (entropy analysis module and other traffic characterization in the dark modules); "classified" traffic; filter/control module; "filtered" traffic; private network; traffic classification through a specific user feedback mechanism. A second version of the diagram adds a data unit copying module.]
If the deep packet inspection module is not capable of classifying the traffic, the traffic is forwarded to the traffic characterization in the dark module.
Classification in the dark can be used:
– only if DPI methods are unable to classify the traffic;
– or, in every case, in cooperation with DPI methods.
The module for traffic classification in the dark can be used cooperatively with deep packet inspection techniques, concurrently or sequentially.
We consider classification in the dark not as an alternative, but as a complement to DPI techniques.
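The sequential mode of cooperation (DPI first, classification in the dark only for traffic that DPI cannot label) might be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation used in this work: the function names, the toy DPI signature, and the toy "in the dark" test based on the number of distinct packet sizes (a crude stand-in for the entropy analysis module, with a hypothetical threshold) are all illustrative.

```python
def hybrid_classify(payloads, dpi_module, dark_modules):
    """Try DPI first; fall back to 'in the dark' modules if DPI gives no label."""
    label = dpi_module(payloads)
    if label is not None:
        return label
    for module in dark_modules:
        label = module(payloads)
        if label is not None:
            return label
    return None


def toy_dpi(payloads):
    """Toy DPI module: one example plaintext signature (assumed)."""
    if any(p.startswith(b"\x13BitTorrent protocol") for p in payloads):
        return "BitTorrent"
    return None


def toy_dark(payloads, min_distinct_sizes=20):
    """Toy behavioural module: many distinct packet sizes suggest P2P.

    The threshold is hypothetical and only serves the example.
    """
    distinct_sizes = len({len(p) for p in payloads})
    return "P2P (suspected)" if distinct_sizes >= min_distinct_sizes else None


plain = [b"\x13BitTorrent protocol" + bytes(48)]
varied = [bytes(n) for n in range(40, 80)]   # 40 different packet sizes
uniform = [bytes(1400)] * 50                 # one packet size only

for window in (plain, varied, uniform):
    print(hybrid_classify(window, toy_dpi, [toy_dark]))
```

The concurrent mode would instead run every module on each window and combine their verdicts; the sequential form shown here saves the behavioural analysis for traffic that signatures cannot resolve.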
P2P Traffic Detection Using a
Behavioural Method Based on Entropy
One can say that entropy reflects the degree of certainty (or uncertainty) of a given variable.
From a rough perspective, and for the sake of simplicity, we will just say that it can also disclose the heterogeneity of a pool of sample values observed over a given period of time.
Entropy:
$$-\sum_{i=1}^{n} p(x_i) \ln p(x_i)$$
Maximum entropy value:
$$\ln(n)$$
where n is the size of the values pool.
High entropy corresponds to high heterogeneity; low entropy corresponds to low heterogeneity.
Source: J. Gomes, P. Inácio, M. Freire, M. Pereira, P. Monteiro, IEEE IPCCC 2008
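The entropy definition above can be computed directly from a pool of observed values, for instance packet sizes. The sketch below is illustrative: the function name and the sample pools are assumptions, and probabilities are estimated as relative frequencies within the pool.

```python
import math
from collections import Counter


def entropy(values):
    """Entropy of a pool of values, with p(x_i) taken as relative frequency."""
    counts = Counter(values)
    total = len(values)
    # p * ln(1/p) is the same term as -p * ln(p).
    return sum((c / total) * math.log(total / c) for c in counts.values())


homogeneous = [1500] * 100        # every packet has the same size
two_sizes = [40] * 50 + [1500] * 50
heterogeneous = list(range(100))  # 100 different sizes

print(entropy(homogeneous))       # no heterogeneity
print(entropy(two_sizes))         # ln(2)
print(entropy(heterogeneous))     # the maximum for this pool, ln(100)
```

For a pool of n = 100 values that are all different, the result reaches the maximum ln(100), which is approximately 4.61.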
P2P Traffic Detection Using a
Behavioural Method Based on Entropy
Peer-to-peer traffic presents greater heterogeneity in packet size values than other traffic classes.
[Figure: three histograms of probability versus packet size (bytes, from 0 to 1500), one per trace: a simple HTTP download from the Web; a VoIP call using Skype; download traffic using the eMule file sharing application.]
Source: J. Gomes, P. Inácio, M. Freire, M. Pereira, P. Monteiro, IEEE IPCCC 2008
P2P Traffic Detection Using a
Behavioural Method Based on Entropy
Several measures were tested:
– variance
– mean
– amplitude
However, entropy is the measure that best reflects the heterogeneity of the packet sizes.
For several traces containing traffic from different classes, we calculated the entropy value for a sliding window of 100 packets.
Traces containing peer-to-peer traffic were, almost perfectly, organized at the top of a table containing the average entropy value:
Trace                   Average entropy
Skype VoIP 1            3.729
Skype VoIP 2            3.698
MSN VoIP 1              3.260
Google Talk VoIP        2.855
eMule download 2        2.498
BitTorrent              2.273
Skype IM 2              2.153
eMule download 1        2.141
MSN IM 1                1.959
Google Talk IM 2        1.917
Skype IM 1              1.886
eMule upload 1          1.843
Google Talk IM 1        1.810
MSN VoIP 2              1.740
MSN IM 2                1.612
HTTP                    1.427
eMule upload 2          1.334
Live Streaming 1        1.278
sFTP download           1.004
Streaming download 1    0.772
Live Streaming 2        0.639
sFTP upload             0.552
Download from Web 4     0.352
Streaming download 2    0.282
Download from Web 3     0.175
Download from Web 2     0.073
Mail download           0.050
Download from Web 1     0.014
Source: J. Gomes, P. Inácio, M. Freire, M. Pereira, P. Monteiro, IEEE IPCCC 2008
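The per-trace average used to rank traces could be obtained along the following lines. This is a sketch under assumptions: the window is advanced one packet at a time (the slides do not state the step), the function names are hypothetical, and the two sample traces are synthetic rather than measured data.

```python
import math
from collections import Counter


def entropy(values):
    """Entropy of a pool of values, with probabilities as relative frequencies."""
    counts = Counter(values)
    total = len(values)
    return sum((c / total) * math.log(total / c) for c in counts.values())


def mean_window_entropy(packet_sizes, window=100):
    """Average entropy over all windows of `window` consecutive packets.

    Assumes the window slides by one packet per step.
    """
    if len(packet_sizes) < window:
        raise ValueError("trace is shorter than the observation window")
    last_start = len(packet_sizes) - window
    values = [entropy(packet_sizes[i:i + window]) for i in range(last_start + 1)]
    return sum(values) / len(values)


bulk_download = [1500] * 300          # synthetic: full-size packets only
mixed_sizes = list(range(100)) * 3    # synthetic: 100 sizes, repeating

print(mean_window_entropy(bulk_download))
print(mean_window_entropy(mixed_sizes))
```

Sorting traces by this average gives a ranking of the same kind as the table above: traces dominated by one packet size fall to the bottom, while traces with many different packet sizes rise to the top.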
P2P Traffic Detection Using a
Behavioural Method Based on Entropy
The approach was tested on traces containing mixed traffic from several applications.
The results were depicted in charts.
[Figure: packet size entropy for an observation window of 100 packets. Vertical axis: entropy value, with the maximum entropy value marked at 4.61. Horizontal axis: packet number (x 100). Labelled trace segments: Download from web; DC++ and Download from web; BitTorrent and Emule and DC++; Emule and DC++ and Download from web; BitTorrent and Emule; BitTorrent; BitTorrent and Emule and DC++ and Download from web.]
Source: J. Gomes, P. Inácio, M. Freire, M. Pereira, P. Monteiro, Submitted for publication
Method for P2P Traffic Detection Using
Deep Packet Inspection
• The methodology used for the detection of P2P traffic relies on Snort, an open-source and widely used intrusion detection system.
• The signatures associated with application packets were identified manually, by observing repetitive patterns in the payload of a sequence of packets generated by a P2P application, even with obfuscation (encryption of the payload).
• The payload signatures of the P2P applications to be identified are expressed as Snort rules.
• Using this methodology, we developed Snort rules for the detection of P2P traffic generated by the following applications: BitTorrent, Vuze, eMule, aMule, Limewire and GTK-Gnutella.
• A complete set of Snort rules for those P2P applications is available from the NMCG Lab at:
http://floyd.di.ubi.pt/nmcg/pdf/snortrules.pdf
• Particular attention is being paid to encrypted traffic.
Method for P2P Traffic Detection Using
Deep Packet Inspection
Protocol      Applications
BitTorrent    BitTorrent, Vuze
eDonkey       eMule, aMule
Gnutella      Limewire, Gtk-Gnutella
P2P TV        Livestation, TVUPlayer, Goalbit
Experimental Testbed for P2P Traffic
Detection
Characteristics of hardware and software used in the testbed for eMule traffic detection
Type         Operating System                CPU                   RAM     Software
Workstation  Fedora 9                        Core 2 Duo 2.66 GHz   1 GB    Snort, Wireshark, BASE, Barnyard, Gtk-Gnutella
Workstation  Windows XP SP3                  Pentium III 800 MHz   512 MB  eMule, Limewire
Laptop       Windows Vista SP1 / Fedora 10   Core 2 Duo 2.4 GHz    3 GB    Wireshark, eMule
Laptop       Mac OS X (10.5)                 PowerPC G4 1 GHz      769 MB  Livestation, TVUPlayer
Experimental Testbed for P2P Traffic
Detection
• In all lab experiments reported here, Snort also had to analyse network traffic other than P2P, such as HTTP, Windows Remote Desktop Connection (RDC), SSH, etc.
• This was worthwhile, since it allowed the testbed to run under circumstances similar to those of deployed P2P classifiers, which also have to deal with network traffic generated by a vast number of applications and correctly identify P2P traffic among it.
Experimental Testbed for P2P Traffic
Detection
• The experimental results presented in the next tables for the most triggered rules were obtained through the download of media objects such as the documentary “Inside the Space Shuttle”.
• The tables with experimental results show the effectiveness of the proposed Snort rules in detecting plain or encrypted traffic generated by eMule.
• Example of a Snort rule (rule 1000307):
    alert udp $EXTERNAL_NET any -> $HOME_NET any
    (msg:"LocalRule: P2P BitTorrent UDP - Incoming DHT for
    trackerless comunication request (d1:ad2:id20)";
    content:"d1:ad2:id20"; nocase; depth:11;
    classtype:policy-violation; sid:1000307; rev:3;)
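To make the payload options of rule 1000307 concrete, the following Python sketch mimics what `content`, `nocase` and `depth:11` ask for: a case-insensitive match of the 11-byte pattern within the first 11 bytes of the UDP payload. It only models the payload test and ignores the rule header (protocol, `$EXTERNAL_NET`, `$HOME_NET`). The function name and the sample payloads are assumptions for illustration, and the sketch is not how Snort is implemented.

```python
def matches_rule_1000307(udp_payload: bytes) -> bool:
    """Payload test equivalent to content:"d1:ad2:id20"; nocase; depth:11;"""
    pattern = b"d1:ad2:id20"
    depth = 11                                # search only the first 11 bytes
    searched = udp_payload[:depth].lower()    # nocase: compare in lower case
    return pattern.lower() in searched


# Illustrative bencoded DHT query starting with the pattern.
dht_query = b"d1:ad2:id20:abcdefghij0123456789e1:q4:ping1:t2:aa1:y1:qe"
print(matches_rule_1000307(dht_query))                  # pattern at offset 0
print(matches_rule_1000307(b"D1:AD2:ID20:" + bytes(20)))  # case is ignored
print(matches_rule_1000307(b"xx" + dht_query))          # pattern starts too late
```

Because the pattern is itself 11 bytes long, `depth:11` effectively requires the payload to begin with it, which keeps the rule cheap to evaluate.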
Experiments When Obfuscation Is Not Used
Starting  Ending  Number of  Volume
Time      Time    Packets    in Bytes  Alert    Count
14:15     14:23   8876       1096078   1000001  166
                                       1000065  2
                                       1000317  1
14:31     14:33   2725       487614    1000001  13
                                       1000067  2
                                       1000068  2
                                       1000069  3
                                       1000088  3
                                       1000090  6
                                       1000098  18
10:05     10:11   14452      1875946   1000001  486
                                       1000306  581
                                       1000307  287
                                       1000308  6
                                       2008581  1
Experiments Using Obfuscation
Starting  Ending  Number of  Volume      eMule Traffic  Traffic
Time      Time    Packets    in Bytes    Downloaded     Uploaded  Alert    Count
11:04     11:24   46138      28618596    10.83MB        133.94KB  1000019  4
                                                                  1000020  4
                                                                  1000024  4
                                                                  1000025  3
                                                                  1000090  5
                                                                  1000096  18
                                                                  1000098  12
                                                                  1000306  638
                                                                  1000307  303
                                                                  1000308  11
12:01     13:37   392168     21128650 3  60.73MB        22.86MB   1000005  58
                                                                  1000019  29
                                                                  1000020  21
                                                                  1000024  21
                                                                  1000025  21
                                                                  1000030  3
                                                                  1000068  4
                                                                  1000306  3489
                                                                  1000307  1711
                                                                  1000308  36
                                                                  1000309  6
                                                                  1000090  4
Main Conclusions
• We presented an overview of peer-to-peer networks and of approaches for the detection of P2P traffic.
• We proposed a new Hybrid Method for P2P Traffic Detection.
• Several lab experiments were carried out to validate the proposed method and to evaluate its accuracy.
Acknowledgements
FCT PTDC/EIA/73072/2006 TRAMANET Project:
Traffic and Trust Management in Peer-to-Peer
Networks
Thank you
for your attention!
Questions?