Packet Sampling for Worm and Botnet Detection in TCP Connections


Reporter: 林佳宜
2010/10/25
1
References

Lothar Braun, Gerhard Münz, Georg Carle. "Packet Sampling for Worm and Botnet Detection in TCP Connections." In Proc. of IEEE/IFIP Network Operations and Management Symposium (NOMS) 2010.
2
Outline





Introduction
Sampling algorithm
TCP connection
Bloom filters
Conclusion
3
Introduction

Signature-based network intrusion detection systems (NIDS)
 Example: Snort
 both packet headers and packet payload are checked

Network traffic is growing continuously
 a single system is often not sufficient to analyse the entire traffic
 random packet loss is then likely

The paper presents a new sampling algorithm
 selects the packets carrying the first N payload bytes of every TCP connection
 can be deployed in high-speed networks
 the required amount of memory is constant
4
Previous work

Detection based on the first 1000 bytes of payload
 Anomalous Payload-Based Network Intrusion Detection (RAID 2004)

Classifying a flow as belonging to a P2P application after looking at the first ten packets
 Accurate, Scalable In-Network Identification of P2P Traffic Using Application Signatures (WWW 2004)

Most application signatures appear within the first five packets of a flow
 A Hybrid Approach for Accurate Application Traffic Identification (E2EMon 2006)
5
TCP connections

For an NIDS performing session analysis
 the traffic volume in both directions is usually not symmetric
 separate limits for the two directions result in an earlier cut-off of one direction

The IDS cannot properly continue session analysis once one direction of the traffic is missing

The beginning of a flow, i.e. its first packets
 carries sufficient information to classify the entire flow as harmful or benign
6
Analysis is based on traffic

Traffic of 93 different worms and bots
 run in a controlled environment

The detection is done with two rule sets
 the rule set shipped with Snort 2.8.4.1
 the emergingthreats.net rule set (as of June 11th, 2009)

4030 alarms are raised
 3511 by TCP packets
 451 by UDP packets
 68 by ICMP packets

The TCP alarms constitute the majority of
alarms.
7
TCP Alarms
 95 alarms are raised by the rule set shipped with Snort
 3416 by the emergingthreats.net rule set

8
Calculate the position

We calculate the position within the TCP
connection at which an alarm is raised using
 TCP sequence
 acknowledgement numbers
 TCP payload length
30% of all alarms need only the handshake packets
 96% are found within less than 3 kB of payload
 The last alarms are found at about 682 kB and belong to a generic shellcode rule
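
The position calculation described on this slide can be sketched as follows. This is a minimal illustration for one direction of a connection, not the authors' tooling: the function name and its parameters are hypothetical, and it assumes the initial sequence number (ISN) from the SYN packet of that direction is known. Acknowledgement numbers, which the slide also lists, would be used in the same way to account for the opposite direction.

```python
SEQ_MOD = 2 ** 32  # TCP sequence numbers are 32 bits wide and wrap around


def alarm_position(isn: int, seq: int, payload_len: int) -> int:
    """Payload offset (in bytes) at the end of the packet that raised the alarm.

    isn         -- initial sequence number taken from the SYN of this direction
    seq         -- sequence number of the packet that raised the alarm
    payload_len -- TCP payload length of that packet
    """
    # The SYN consumes one sequence number, so the first payload byte
    # carries sequence number isn + 1.
    start = (seq - isn - 1) % SEQ_MOD
    return start + payload_len
```

An alarm raised by a handshake packet yields position 0, which matches the slide's observation that some alarms need no payload at all.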
9
Number of payload bytes
10
Accurate TCP Connection
Tracking

TCP connections are identified by the IP addresses and ports of both endpoints

The start and end of a regular TCP connection are defined
 by the SYN, FIN, and RST packets that are exchanged
 a buffer is needed to withhold a packet

Exceptional situations result in an increased number of TCP connections
 TCP port and network scans
 SYN flooding attacks
11
Simplified TCP Connection
Tracking[1/2]

The first simplification concerns the detection of connection establishments
 a SYN packet shortly followed by a second packet without SYN flag

Both packets have to be exchanged between the same endpoints
 identified by tuples of IP address and port number

The two packets have to be observed within a small time interval
 the authors determined three seconds to be an appropriate value
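
A minimal sketch of this simplified establishment check, assuming an exact dictionary in place of the probabilistic filters introduced later. The class and method names are hypothetical; only the rule itself (a SYN followed within three seconds by a packet without SYN flag between the same endpoints) comes from the slides.

```python
class HandshakeDetector:
    """Detects connection establishment with the simplified rule."""

    def __init__(self, timeout: float = 3.0):
        self.timeout = timeout  # three seconds, the value given on the slide
        self.syn_seen = {}      # endpoint pair -> time of the most recent SYN

    def observe(self, ts: float, src, dst, syn: bool) -> bool:
        """src and dst are (ip, port) tuples. Returns True when this packet
        completes the pattern 'SYN, then a packet without SYN flag'."""
        key = frozenset((src, dst))  # same key for both directions
        if syn:
            # A SYN/ACK also carries the SYN flag, so it refreshes the timestamp.
            self.syn_seen[key] = ts
            return False
        syn_ts = self.syn_seen.get(key)
        return syn_ts is not None and ts - syn_ts <= self.timeout
```

With this rule a lone SYN, as produced by port scans or SYN floods, never counts as an established connection.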
12
Simplified TCP Connection
Tracking[2/2]

The second simplification concerns the connection reassembly
 the algorithm neither reorders packets nor removes duplicated packets

These tasks are left to the subsequent packet analysis step
 e.g., an NIDS
13
Selecting and dropping packets
Sample those packets containing the first N bytes of payload
 The payload lengths are counted in the order of packet arrival
 Sequence numbers are not taken into account

In the presence of packet reordering and duplicates, this leads to
 selecting packets which should not be sampled (false positives)
 dropping packets which should be sampled (false negatives)
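
The effect can be illustrated with a small sketch. It assumes each packet is given as a hypothetical `(offset, length)` pair, where `offset` is the true position of the packet's first payload byte in the stream; the selector itself only looks at the lengths, as described above. The ground truth used here (a packet should be sampled if it starts within the first N bytes) is a simplifying assumption for the example.

```python
def select_by_arrival(packets, n):
    """Select packets until n payload bytes were counted, in arrival order."""
    counted = 0
    decisions = []
    for _offset, length in packets:
        decisions.append(counted < n)  # sequence numbers are ignored
        counted += length
    return decisions


def sampling_errors(packets, n):
    """Return (false_positives, false_negatives) of the arrival-order selector."""
    false_pos = false_neg = 0
    for (offset, _length), selected in zip(packets, select_by_arrival(packets, n)):
        wanted = offset < n  # packet starts within the first n payload bytes
        if selected and not wanted:
            false_pos += 1
        elif wanted and not selected:
            false_neg += 1
    return false_pos, false_neg
```

A reordered packet from beyond N that arrives early is selected by mistake, while a duplicate uses up part of the byte budget so that a later packet within the first N bytes is dropped.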
14
Bloom filters

A Bloom filter is a probabilistic data structure
 composed of a bit array and a set of hash functions
 initially, every bit in the array is set to zero

If a new element is to be inserted into the set
 all hash functions are calculated for this element
 all corresponding bits are then set to one

False positives are possible due to collisions in the hash functions
 a query returns true although the element is not in the set
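
A minimal sketch of a plain Bloom filter as described above. The choice of salted BLAKE2b from Python's `hashlib` as the family of hash functions is an assumption made for the example; the slides do not say which hash functions the authors use.

```python
import hashlib


class BloomFilter:
    def __init__(self, num_bits: int, num_hashes: int):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # One byte per bit for readability; every position starts at zero.
        self.bits = bytearray(num_bits)

    def _positions(self, item: bytes):
        # Derive num_hashes different hash functions by salting BLAKE2b.
        for i in range(self.num_hashes):
            digest = hashlib.blake2b(
                item, digest_size=8, salt=i.to_bytes(8, "little")
            ).digest()
            yield int.from_bytes(digest, "little") % self.num_bits

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item: bytes) -> bool:
        # True may be a false positive; False is always correct.
        return all(self.bits[pos] for pos in self._positions(item))
```

An inserted element is always reported as present, so the structure produces no false negatives; only the opposite error is possible.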
15
Memory consumption



The memory consumption is calculated depending on
 the number of hash functions used (l)
 the collision probability of the hash functions (p)
 the number of stored elements (k)
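
The formula shown on this slide is not part of the transcript. As a hedged stand-in, the sketch below uses the standard Bloom filter approximation p ≈ (1 - e^(-l·k/m))^l, solved for the array size m, and reads p as the tolerated false-positive probability. Whether this matches the authors' exact formula is an assumption; the function names are hypothetical.

```python
import math


def bloom_array_size(k: int, l: int, p: float) -> int:
    """Number of array cells m needed for k elements, l hash functions
    and a false-positive probability of at most p (standard approximation)."""
    return math.ceil(-l * k / math.log(1.0 - p ** (1.0 / l)))


def false_positive_rate(m: int, k: int, l: int) -> float:
    """Approximate false-positive probability of a filter with m cells."""
    return (1.0 - math.exp(-l * k / m)) ** l
```

For a plain Bloom filter each cell is one bit; for a variant that stores timestamps or counters, the result would be multiplied by the size of one cell.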
16
Bloom filter variants

The authors use two Bloom filter variants, summarized below

The first variant is called Time-out Bloom
filter[1]
 Its array is composed of timestamps instead of bits.

The second Bloom filter variant is called
Count-Min Sketch (CMS)[2]
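
Both variants can be sketched in a few lines. This is an illustration under assumptions, not the implementation from [1] or [2]: the hash functions are salted BLAKE2b from `hashlib`, the class and method names are hypothetical, and a Time-out Bloom filter entry is treated as present when all of its cells hold a timestamp younger than the time-out.

```python
import hashlib


def cell_indexes(key: bytes, size: int, count: int):
    """Indexes from `count` salted hash functions, each in range(size)."""
    return [
        int.from_bytes(
            hashlib.blake2b(key, digest_size=8, salt=i.to_bytes(8, "little")).digest(),
            "little",
        ) % size
        for i in range(count)
    ]


class TimeoutBloomFilter:
    """Bloom filter whose array holds timestamps instead of bits."""

    def __init__(self, size: int, num_hashes: int, timeout: float):
        self.cells = [None] * size
        self.num_hashes = num_hashes
        self.timeout = timeout

    def insert(self, key: bytes, now: float) -> None:
        for i in cell_indexes(key, len(self.cells), self.num_hashes):
            self.cells[i] = now

    def contains(self, key: bytes, now: float) -> bool:
        stamps = (self.cells[i] for i in cell_indexes(key, len(self.cells), self.num_hashes))
        return all(s is not None and now - s <= self.timeout for s in stamps)


class CountMinSketch:
    """One row of counters per hash function; a query takes the minimum."""

    def __init__(self, width: int, depth: int):
        self.width = width
        self.rows = [[0] * width for _ in range(depth)]

    def add(self, key: bytes, amount: int) -> None:
        for row, i in zip(self.rows, cell_indexes(key, self.width, len(self.rows))):
            row[i] += amount

    def estimate(self, key: bytes) -> int:
        idx = cell_indexes(key, self.width, len(self.rows))
        return min(row[i] for row, i in zip(self.rows, idx))
```

Entries in the Time-out Bloom filter expire on their own, so no explicit deletion is needed. With non-negative updates the Count-Min Sketch can overestimate a value because of collisions, but it never underestimates it.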
17
Sampling algorithm

Selects the first packets of a TCP connection until a maximum of N bytes of payload has been exported
 uses two Time-out Bloom filters and
 one CMS to store the required connection states

Each TCP connection is identified by
 source IP address (SA), destination IP address (DA), source port (SP), and destination port (DP)
18
The three filters

The first Time-out Bloom filter
 stores the timestamps of all observed SYN packets

The CMS
 stores the number of payload bytes which need to be
exported for an established TCP connection

The second Time-out Bloom filter
 stores the point in time at which sampling of a connection was stopped
 because the maximum number of payload bytes was reached or because a FIN or RST packet was observed
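
How the three filters interact can be sketched with exact dictionaries standing in for the probabilistic structures. Several details are assumptions made for this illustration and are not confirmed by the slides: handshake packets are exported, a single byte budget is shared by both directions, the packet that crosses the limit is still exported, and entries never expire. All names are hypothetical.

```python
class FirstNBytesSampler:
    def __init__(self, n: int, syn_timeout: float = 3.0):
        self.n = n
        self.syn_timeout = syn_timeout
        self.syn_times = {}  # stand-in for the first Time-out Bloom filter
        self.remaining = {}  # stand-in for the CMS: payload bytes still to export
        self.stopped = {}    # stand-in for the second Time-out Bloom filter

    def process(self, ts, src, dst, payload_len, syn=False, fin=False, rst=False):
        """src and dst are (ip, port) tuples. Returns True if the packet is sampled."""
        key = frozenset((src, dst))
        if syn:
            self.syn_times[key] = ts
            return True  # assumption: handshake packets are exported
        if key in self.stopped:
            return False  # sampling of this connection has already ended
        if key not in self.remaining:
            syn_ts = self.syn_times.get(key)
            if syn_ts is None or ts - syn_ts > self.syn_timeout:
                return False  # no recent SYN: not a tracked connection
            self.remaining[key] = self.n  # connection counts as established
        self.remaining[key] -= payload_len
        if self.remaining[key] <= 0 or fin or rst:
            del self.remaining[key]
            self.stopped[key] = ts  # remember when sampling was stopped
        return True
```

Replacing the dictionaries with fixed-size filters is what keeps the memory requirement constant, at the price of the collision-induced errors discussed in the evaluation.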
19
Traces Used in Evaluation

The evaluation uses two traffic traces, from
 a student residence at the University of Twente
 the authors' research group at the Technische Universität München

The two traces will be called
 Twente trace and Munich trace
20
Amount of sampled traffic


Twente trace: between 476,817 (N = 1 kB) and 512,334 (N = 25 kB) packets are sampled
Munich trace: between 314,063 and 326,742 TCP packets are sampled
21
Empirical Analysis of Sampling
Errors

The number of hash functions influences the
computing cost
 a small number of hash functions is desired

A large number of hash functions reduces
the probability of collisions

A good trade-off between computational complexity and collision probability
 was found with l = 3 hash functions
22
Sampling errors

Sampling of payload in the Twente trace
 N = 1 kB: only 30,130 (7.35%) and 30,694 (7.36%) errors
 N = 25 kB: only 59,789 (11.6%) and 73,925 (14.4%) errors

Using Bloom filters increases the sampling errors by 0.1% to 24%, depending on the traffic
23
Evading detection

An attacker can try to circumvent packet selection by
 placing the malicious part of the payload beyond the first N bytes of a TCP connection
 inserting packets with identical addresses and ports but invalid sequence numbers
 performing a TCP scan in order to poison the values stored in the filters
 sending empty TCP ACK packets

To make evasion more difficult
 N can be varied dynamically over time
24
Conclusion

We demonstrate the usability of our sampling strategy for worm and botnet detection

Thanks to a simplified TCP connection tracking mechanism and Bloom filters that store the required connection states, the algorithm can be efficiently implemented in software or hardware

It would be possible to develop a similar
sampling algorithm for UDP traffic
25
Questions
26
1. S. Kong, T. He, X. Shao, and X. Li, “Time-out Bloom Filter: A New Sampling Method for Recording More Flows,” in Proc. of International Conference on Information Networking (ICOIN) 2006, Sendai, Japan, Jan. 2006.
2. G. Cormode and S. Muthukrishnan, “An Improved Data Stream Summary: The Count-Min Sketch and its Applications,” Journal of Algorithms, vol. 55, no. 1, 2005.
27