Towards Understanding Network Traffic through Whole Packet Analysis
Abdulrahman Hijazi
Hajime Inoue
Ashraf Matrawy
P.C. van Oorschot
Anil Somayaji
Agenda
• Introduction
• Project in a nutshell
• ADHIC
• NetADHICT
• Overview
• In progress
• Results
• Performance
• Multimedia & encrypted traffic
• P2P
• No-headers
• Limitations
• Applications
Introduction
• Complexity of modern computer networks
• Common network analysis strategies
• Predetermined classifiers (port, address, …)
• Protocol dissectors (wireshark, …)
• High-level view of network structure through packet
clustering
• Header information
• Payload
• Better distinguishes: p2p, worms, …
• Performance issue
Introduction
• We developed a packet clustering technique that:
• finds semantically interesting clusters
• adapts to the changing nature of traffic patterns
• does not require explicit a priori information
• does not rely on any specific fields in the packets
• can run in time sub-linear in packet length
• Two innovations:
• (p,n)-grams: n-byte substrings at byte offset p
• ADHIC (Approximate Divisive HIerarchical Clustering)
• Two key features:
• Network traffic redundancy
• Optimal clustering is not required
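A minimal sketch of the (p,n)-gram idea, assuming a packet is held as a Python bytes object. The names pn_grams and matches and the tuple representation are illustrative, not taken from NetADHICT.

```python
from typing import Iterator, Tuple

# Hypothetical representation: (byte offset p, the n bytes found there).
PNGram = Tuple[int, bytes]


def pn_grams(packet: bytes, n: int = 2) -> Iterator[PNGram]:
    """Yield every (p,n)-gram of a packet: the n bytes at byte offset p."""
    for p in range(len(packet) - n + 1):
        yield p, packet[p:p + n]


def matches(packet: bytes, gram: PNGram) -> bool:
    """Test one (p,n)-gram by reading only the n bytes at offset p."""
    p, pattern = gram
    return packet[p:p + len(pattern)] == pattern


packet = bytes.fromhex("00112233445566")
grams = list(pn_grams(packet, n=2))
```

Testing a gram touches only n bytes of the packet, which is one plausible reading of how classification can stay sub-linear in packet length.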
Project in a Nutshell
• NetADHICT: our implementation of ADHIC
• It can analyze data as it is received by a network
interface, or offline using libpcap files.
• Observed data is used to generate & update a
(p,n)-gram decision tree.
• This tree serves as a classifier tree reflecting the high-level structure of network traffic at a given time.
• The deduced structure:
• corresponds to the typical divisions of network traffic
(TCP vs. UDP; web vs. non-web), and
• is arrived at using automatically generated,
context-related (p,n)-grams.
ADHIC
• Using a sampled measure of similarity, ADHIC recursively
subdivides traffic into binary classes until the resulting traffic is:
• below a certain threshold, or
• too similar or too dissimilar
• Produced binary tree consists of:
• internal decision nodes with one (p,n)-gram per node
• leaf nodes that constitute final clusters
• Classification rule is based on matching (p,n)-grams.
• The traffic in each terminal cluster is described by a Boolean
expression constructed by following the path from root to
leaf.
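The tree walk described above can be sketched as follows. This is an illustration under assumptions: the Node class, the cluster names, and the two example grams (EtherType at offset 12, IPv4 protocol at offset 23 of an untagged Ethernet frame) are hypothetical and not the tree NetADHICT would generate.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Node:
    # Internal nodes carry one (p,n)-gram; leaves carry a cluster name.
    gram: Optional[Tuple[int, bytes]] = None
    match: Optional["Node"] = None      # branch taken when the gram matches
    no_match: Optional["Node"] = None   # branch taken otherwise
    cluster: Optional[str] = None


def classify(node: Node, packet: bytes) -> Tuple[str, List[str]]:
    """Walk from root to leaf; return the cluster and the Boolean terms on the path."""
    terms: List[str] = []
    while node.cluster is None:
        p, pattern = node.gram
        hit = packet[p:p + len(pattern)] == pattern
        terms.append(f"{'' if hit else 'NOT '}({p}, 0x{pattern.hex()})")
        node = node.match if hit else node.no_match
    return node.cluster, terms


# Hypothetical two-level tree: IPv4 vs. non-IP, then TCP vs. other IP.
tree = Node(
    gram=(12, b"\x08\x00"),                 # EtherType = IPv4
    match=Node(
        gram=(23, b"\x06"),                 # IPv4 protocol field = TCP
        match=Node(cluster="tcp"),
        no_match=Node(cluster="ip-other"),
    ),
    no_match=Node(cluster="non-ip"),
)

# Synthetic frame: 12 address bytes, EtherType, then protocol byte at offset 23.
tcp_frame = bytes(12) + b"\x08\x00" + bytes(9) + b"\x06" + bytes(10)
cluster, terms = classify(tree, tcp_frame)
print(cluster, "=", " AND ".join(terms))
```

Joining the collected terms with AND gives the Boolean expression for the leaf that the packet reaches.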
ADHIC
• ADHIC adapts to changing traffic by performing the
following two tree operations:
• Splitting, when:
• a leaf contains more than a preset threshold of
traffic and
• there is a (p,n)-gram that matches a percentage of that
traffic within a certain range (e.g. 40%-60%).
• Deletion, when:
• a subtree has not matched a minimum threshold of traffic
• Both of these statistics are measured over a preset
period of time called the maturation window.
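The split and deletion tests above can be sketched as below. Several details are assumptions made for illustration: counting every gram in the sample exhaustively, preferring the gram closest to a 50% match, and the parameter names split_threshold and min_share. Only the 40%-60% range comes from the slide.

```python
from collections import Counter
from typing import List, Optional, Tuple

Gram = Tuple[int, bytes]


def match_fraction(sample: List[bytes], gram: Gram) -> float:
    """Fraction of sampled packets that contain the gram at its offset."""
    p, pattern = gram
    hits = sum(pkt[p:p + len(pattern)] == pattern for pkt in sample)
    return hits / len(sample)


def find_split_gram(sample: List[bytes], n: int = 2,
                    low: float = 0.40, high: float = 0.60) -> Optional[Gram]:
    """Return a (p,n)-gram matched by a share of the sample within [low, high]."""
    counts: Counter = Counter()
    for pkt in sample:
        for p in range(len(pkt) - n + 1):
            counts[(p, pkt[p:p + n])] += 1   # each offset occurs once per packet
    in_range = [g for g, hits in counts.items()
                if low <= hits / len(sample) <= high]
    # Assumption: prefer the most balanced split (closest to 50%).
    return min(in_range, key=lambda g: abs(counts[g] / len(sample) - 0.5),
               default=None)


def should_split(leaf_share: float, split_threshold: float,
                 sample: List[bytes]) -> bool:
    """Split when the leaf is busy enough and a balanced gram exists."""
    return leaf_share > split_threshold and find_split_gram(sample) is not None


def should_delete(subtree_share: float, min_share: float) -> bool:
    """Delete a subtree whose matched traffic fell below the minimum."""
    return subtree_share < min_share
```

In this sketch the traffic shares are assumed to be measured over one maturation window before either test is applied.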
NetADHICT: Overview
• Licensed under GNU GPL
• It usually starts by separating IP from non-IP traffic;
lower nodes then isolate specific protocols.
• NetADHICT segregates packets by protocol and
other characteristics (e.g. length).
• (p,n)-grams corresponding to special header or
payload fields allow unconventional classification
measures.
• NetADHICT was tested against four week-long traces
from our CCSL lab.
NetADHICT: In progress
• Examples of interesting segregation through (p,n)-grams:
• (51, 0x00 0x00): part of ARP’s Ethernet frame
trailer
• (64, 0x00 0x0f): part of EIGRP’s non-IP header
• (22, 0x2c 0x06) and (54, 0x01 0x01): part of
IMAPS’s TTL & protocol ID and “NOP, NOP”
options field respectively
• (37, 0xc1 0x0c): HSRP’s 2nd byte of dest port & 1st
byte of UDP length
• (174, 0x00 0x00): part of NetBIOS-DGM’s payload
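As a worked check of the IMAPS example, the offsets 22 and 54 follow from standard header sizes, assuming an untagged Ethernet II frame (14 bytes), an IPv4 header without options (20 bytes), and a 20-byte base TCP header. The frame below is synthetic; its addresses, ports other than 993, and the timestamp option are made up for illustration.

```python
import struct

# Assumed header sizes: no VLAN tag, no IPv4 options.
ETH_LEN, IP_LEN, TCP_LEN = 14, 20, 20

eth = bytes(12) + b"\x08\x00"                      # MAC addresses, EtherType IPv4
ip = struct.pack(
    "!BBHHHBBH4s4s",
    0x45, 0, 52, 0, 0,                             # version/IHL, TOS, length, ID, flags
    0x2C, 6, 0,                                    # TTL = 44, protocol = TCP, checksum
    bytes([10, 0, 0, 1]), bytes([10, 0, 0, 2]),    # made-up addresses
)
options = b"\x01\x01" + b"\x08\x0a" + bytes(8)     # NOP, NOP, timestamp option
tcp = struct.pack(
    "!HHIIBBHHH",
    993, 40000, 0, 0,                              # source port 993 (IMAPS), dest port
    8 << 4, 0x10, 1024, 0, 0,                      # data offset 8 words, ACK flag
) + options
frame = eth + ip + tcp

ttl_offset = ETH_LEN + 8                           # TTL is byte 8 of the IPv4 header
options_offset = ETH_LEN + IP_LEN + TCP_LEN        # first TCP option byte

print(ttl_offset, frame[ttl_offset:ttl_offset + 2].hex())
print(options_offset, frame[options_offset:options_offset + 2].hex())
```

Under these assumptions the gram at offset 22 covers the TTL and protocol bytes, and the gram at offset 54 covers the first two TCP option bytes, matching the description on the slide.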
Results: Performance
Single-protocol cluster: a cluster that the traditional classifier
reports as containing packets of only one protocol.
Results: Performance
• NetADHICT does well with most traffic types.
• Structured packets (e.g. non-IP, UDP, …) are
segregated through header and/or payload
(p,n)-grams.
• Unstructured packets (e.g. TCP) are segregated
mostly through header (p,n)-grams, which cover
fields such as the five-tuple and others
(e.g. packet length, QoS field, TTL, options,
padding, …).
• NetADHICT also clusters same protocol packets
running on different port numbers together (e.g.
HTTP on 80 and 8080).
Results: Multimedia & Encrypted Traffic
• In addition: multimedia (e.g. MS-Streaming) &
encrypted (e.g. SSH, HTTPS, IMAPS) traffic are both:
• Segregated from unencrypted traffic: NetADHICT
either segregates them through header (p,n)-grams
or shunts them to default clusters
• Distinguished from each other: NetADHICT finds
suitable header (p,n)-grams to separate different
encrypted traffic from each other.
Results: P2P
• Many P2P applications use constantly changing,
non-standard port numbers within the same
network session.
• In all the experiments done, NetADHICT was able to:
• cluster the P2P UDP tracker packets together
through a non-IP-header (p,n)-gram.
• cluster all other related TCP packets (data and
control) to the tree’s global default cluster and its
adjacent cluster.
• Even when the running port of all the P2P packets
was maliciously changed to the standard HTTP port
number (i.e. 80), packets were clustered exactly like
before.
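One way to see why a port change need not affect clustering: if none of the grams on a packet's path covers the port bytes, rewriting the port leaves every match outcome unchanged. The sketch below assumes 14-byte Ethernet and 20-byte IPv4 headers (so the destination port sits at bytes 36-37); the grams and the synthetic packet are hypothetical.

```python
from typing import List, Tuple

Gram = Tuple[int, bytes]


def path(packet: bytes, grams: List[Gram]) -> Tuple[bool, ...]:
    """Match outcomes for a list of grams; a stand-in for a root-to-leaf path."""
    return tuple(packet[p:p + len(pat)] == pat for p, pat in grams)


def with_dest_port(packet: bytes, port: int) -> bytes:
    """Copy of the packet with the destination port (bytes 36-37) rewritten."""
    return packet[:36] + port.to_bytes(2, "big") + packet[38:]


# Hypothetical grams, none of which overlaps the port bytes 34-37.
grams = [(23, b"\x06"), (54, b"\x01\x01")]

# Synthetic 60-byte packet: protocol byte at 23, destination port 51413 at 36-37.
packet = bytes(23) + b"\x06" + bytes(12) + (51413).to_bytes(2, "big") + bytes(22)
disguised = with_dest_port(packet, 80)

print(path(packet, grams) == path(disguised, grams))
```

A tree that did place a gram on the port bytes would of course react to the change; the slide reports that NetADHICT rarely does so.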
Results: P2P
• Two observations:
• NetADHICT rarely uses ports to cluster traffic.
• NetADHICT managed to segregate P2P traffic by
characterizing other network traffic as having
patterns that were absent in the P2P traffic.
• Conclusion:
• So long as most well-behaved traffic can be
appropriately clustered, evasive protocols can
be identified.
Results: No-Headers
• NetADHICT can also do semantically meaningful
clustering even without looking at the IP header (first
38 bytes).
• Although performance is occasionally degraded,
decision trees made with no header information are
qualitatively similar to those done using all packet
information.
• The main difference is in NetADHICT’s inability to
separate different encrypted traffic when headers
are restricted.
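A header-restricted run can be pictured as limiting candidate grams to offsets at or beyond the skipped prefix. This is a sketch under assumptions: the function name payload_grams is illustrative, and keeping the original offsets (rather than re-basing them at zero) is a design choice made here, not something the slides specify.

```python
from typing import Iterator, Tuple

HEADER_BYTES = 38  # prefix skipped in the no-headers experiments, per the slide


def payload_grams(packet: bytes, n: int = 2,
                  skip: int = HEADER_BYTES) -> Iterator[Tuple[int, bytes]]:
    """Yield (p,n)-grams only for offsets p >= skip, keeping original offsets."""
    for p in range(skip, len(packet) - n + 1):
        yield p, packet[p:p + n]


packet = bytes(range(42))
restricted = list(payload_grams(packet))
```

Packets shorter than the skipped prefix yield no grams at all in this sketch, so they would fall through to a default cluster.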
Limitations
• Analysis challenge:
• Difficulty (work and time) in analyzing clusters
both manually and automatically
• Privacy issues:
• Our algorithm looks at both headers and
payloads
• Sophisticated design:
• Large configuration space, making it difficult to
choose an optimal set of parameters
Applications
• Network administration:
• understand overall structure of network traffic
and further assist in monitoring its changes.
• Network security:
• isolate malicious traffic from normal traffic
(with no outdated signatures, long training,
or false alarms).
• Quality of Service:
• actively manage bandwidth by giving each leaf
cluster an equal share of the bandwidth.
• Other applications:
• ADHIC has no built-in knowledge of networking!
Thank you