Packet Sampling for Worm and Botnet Detection in TCP Connections
Presenter: 林佳宜
2010/10/25
1
References
Lothar Braun, Gerhard Münz, Georg Carle.
"Packet Sampling for Worm and Botnet
Detection in TCP Connections." In Proc. of
IEEE/IFIP Network Operations and Management
Symposium (NOMS) 2010
2
Outline
Introduction
Sampling algorithm
TCP connection
Bloom filters
Conclusion
3
Introduction
Signature-based network intrusion detection
systems (NIDS)
Example: Snort
Packet headers and packet payload are checked
Network traffic keeps growing
A single system is often not sufficient to analyse the entire traffic
Random packet loss is then likely
The paper presents a new sampling algorithm
It selects the packets carrying the first N payload bytes of every TCP
connection
It can be deployed in high-speed networks
The required amount of memory is constant
4
Previous work
Detection based on the first 1000 bytes of payload
Anomalous Payload-Based Network Intrusion Detection (RAID 2004)
A flow is classified as belonging to a P2P
application after looking at its first ten
packets
Accurate, Scalable In-Network Identification of P2P Traffic Using
Application Signatures (WWW 2004)
Most application signatures appear within
the first five packets of a flow
A Hybrid Approach for Accurate Application Traffic Identification
(E2EMon 2006)
5
TCP connections
For an NIDS performing session analysis
The traffic volume in the two directions is usually not
symmetric
Separate limits for the two directions
would result in an earlier cut-off of one direction
The IDS cannot properly continue session
analysis when one direction of traffic is missing
The first packets at the beginning of a flow
often carry sufficient information to classify the entire flow as
harmful or benign
6
The analysis is based on traffic from
93 different worms and bots
run in a controlled environment
The detection is done with two rule sets
the rule set shipped with Snort 2.8.4.1
the emergingthreats.net rule set (as of June 11th, 2009)
4030 alarms are raised
3511 by TCP packets
451 by UDP packets
68 by ICMP packets
The TCP alarms constitute the majority of
alarms.
7
TCP Alarms
95 alarms are raised by the rule set shipped with Snort
3416 by the emergingthreats rule set
8
Calculate the position
The position within the TCP
connection at which an alarm is raised is calculated using
TCP sequence numbers
acknowledgement numbers
TCP payload length
30% of all alarms need only the
handshake packets
96% are found within less than 3 kB of
payload
The last alarms are found at about 682 kB
They belong to a generic shellcode rule
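The position calculation can be sketched as follows. This is a minimal illustration rather than the authors' tooling: the function names and the way the two directions are combined are assumptions. It relies only on the fact that a SYN consumes one sequence number and that TCP sequence numbers are 32-bit values that wrap around.

```python
MOD = 2 ** 32  # TCP sequence numbers are 32-bit and wrap around


def payload_position(isn: int, seq: int, payload_len: int) -> int:
    """Bytes sent in one direction up to and including this segment.

    isn         -- initial sequence number from the SYN of this direction
    seq         -- sequence number of the segment that raised the alarm
    payload_len -- number of TCP payload bytes in that segment
    """
    # The SYN consumes one sequence number, so the first payload byte
    # carries sequence number isn + 1. The modulo handles wrap-around.
    first_byte_offset = (seq - isn - 1) % MOD
    return first_byte_offset + payload_len


def connection_position(isn: int, peer_isn: int, seq: int, ack: int,
                        payload_len: int) -> int:
    """Assumed combination of both directions: own bytes plus acknowledged peer bytes."""
    peer_bytes = (ack - peer_isn - 1) % MOD
    return payload_position(isn, seq, payload_len) + peer_bytes


print(payload_position(1000, 1001, 500))
print(connection_position(1000, 5000, 1001, 5301, 500))
```

An alarm raised by a handshake packet has position 0 under this definition, since no payload has been exchanged yet.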
9
Number of payload bytes
10
Accurate TCP Connection
Tracking
TCP connections are identified by the
IP addresses and ports of both endpoints
Start and end of a regular TCP connection
are defined by
the SYN, FIN, and RST packets that are exchanged
A buffer is needed to withhold packets
Exceptional situations result in an
increased number of TCP connections
TCP port and network scans
SYN flooding attacks
11
Simplified TCP Connection
Tracking [1/2]
The first simplification concerns the
detection of connection establishments
A SYN packet shortly followed by a second packet
without the SYN flag
Both packets have to be exchanged between
the same endpoints
identified by tuples of IP address and port number
The two packets have to be observed within
a small time interval
The authors determined three seconds to be an appropriate value
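The simplified establishment check could look like the sketch below. The class and method names are hypothetical, and an exact dictionary stands in for whatever state store is used in practice. Only the rule itself comes from the slide: a SYN followed within three seconds by a packet without SYN between the same endpoints.

```python
SYN_TIMEOUT = 3.0  # seconds, the interval reported as appropriate


class HandshakeTracker:
    """Detects connection establishment without a full TCP state machine."""

    def __init__(self, timeout: float = SYN_TIMEOUT):
        self.timeout = timeout
        self.syn_seen = {}  # endpoint pair -> arrival time of the last SYN

    @staticmethod
    def _key(src, dst):
        # src and dst are (ip, port) tuples; a frozenset ignores direction.
        return frozenset((src, dst))

    def observe(self, src, dst, syn: bool, now: float) -> bool:
        """Return True if this packet completes a connection establishment."""
        key = self._key(src, dst)
        if syn:
            # Both SYN and SYN/ACK carry the SYN flag and refresh the timestamp.
            self.syn_seen[key] = now
            return False
        syn_time = self.syn_seen.pop(key, None)
        return syn_time is not None and now - syn_time <= self.timeout


tracker = HandshakeTracker()
client, server = ("10.0.0.1", 1234), ("10.0.0.2", 80)
tracker.observe(client, server, syn=True, now=0.0)    # SYN
tracker.observe(server, client, syn=True, now=0.1)    # SYN/ACK
print(tracker.observe(client, server, syn=False, now=0.2))  # ACK completes it
```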
12
Simplified TCP Connection
Tracking [2/2]
The second simplification concerns the
connection reassembly:
No packet reordering is performed and
duplicated packets are not removed
These tasks are left to the subsequent
packet analysis step
e.g., an NIDS
13
Selecting and dropping packets
Sample those packets containing the first
N bytes of payload
The payload lengths are counted in the
order of packet arrival
without regard to sequence numbers
In the presence of packet reordering and
duplicates, this can lead to
selecting packets which should not be sampled (false
positives)
dropping packets which should be sampled (false
negatives)
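A small sketch shows how counting in arrival order produces a false negative when a duplicate arrives. The function name is hypothetical, and the rule "sample while fewer than N bytes have been counted" is one plausible reading of the slide.

```python
def sample_first_n(payload_lengths, n):
    """Decide per packet, in arrival order, whether it is sampled.

    A packet is sampled as long as fewer than n payload bytes have been
    counted before it. Sequence numbers are deliberately ignored.
    """
    counted = 0
    decisions = []
    for length in payload_lengths:
        decisions.append(counted < n)
        counted += length
    return decisions


# Three in-order segments of 400 bytes, N = 1000: all carry part of the
# first 1000 bytes and all are sampled.
print(sample_first_n([400, 400, 400], 1000))

# The second segment is retransmitted. The duplicate uses up budget, so the
# last packet (stream bytes 800-1199) is dropped although it should be sampled.
print(sample_first_n([400, 400, 400, 400], 1000))
```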
14
Bloom filters
A Bloom filter is a probabilistic data structure
composed of a bit array and a set of hash functions
Initially, every bit in the array is set to zero
If a new element is to be inserted into the
set
all hash functions must be calculated for this element
All corresponding bits are then set to one
False positives are possible due to collisions
in the hash functions
A query returns true although the element is not in the set
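A minimal Bloom filter following this description might look as below. The hash construction (SHA-256 with a one-byte index prefix) is an assumption chosen for readability, not the hash family used in the paper, and one byte per bit is used to keep the code short.

```python
import hashlib


class BloomFilter:
    def __init__(self, num_bits: int, num_hashes: int):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # every position starts at zero

    def _positions(self, item: bytes):
        # Derive num_hashes positions by prefixing the item with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item: bytes) -> bool:
        # True if all positions are set. This can be a false positive when
        # other elements happened to set the same positions.
        return all(self.bits[pos] for pos in self._positions(item))


bf = BloomFilter(num_bits=1024, num_hashes=3)
bf.add(b"10.0.0.1:1234-10.0.0.2:80")
print(b"10.0.0.1:1234-10.0.0.2:80" in bf)
```

An inserted element is always reported as present. Only the opposite error, reporting an element that was never inserted, can occur.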
15
Memory consumption
The memory consumption is calculated from
the number of hash functions used (l)
the collision probability of the hash functions
(p)
the number of stored elements (k)
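The formula itself did not survive on this slide. As an illustration only, the sketch below uses the standard textbook approximation for Bloom filters, p = (1 - e^(-l*k/m))^l, solved for the array size m. It assumes that p denotes the tolerated false-positive probability, and it may differ from the expression used in the paper. The function names are hypothetical.

```python
import math


def bloom_filter_size(l: int, p: float, k: int) -> int:
    """Array size m (in cells) for l hash functions, false-positive
    probability p, and k stored elements, from p = (1 - exp(-l*k/m))**l."""
    return math.ceil(-l * k / math.log(1.0 - p ** (1.0 / l)))


def false_positive_probability(l: int, m: int, k: int) -> float:
    """Inverse check: approximate false-positive probability for a given m."""
    return (1.0 - math.exp(-l * k / m)) ** l


m = bloom_filter_size(l=3, p=0.01, k=1000)
print(m, false_positive_probability(3, m, 1000))
```

For a plain Bloom filter each cell is one bit. A variant that stores timestamps or counters needs correspondingly more memory per cell.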
16
Bloom filter variants
Two Bloom filter variants are used,
summarized here
The first variant is called Time-out Bloom
filter [1]
Its array is composed of timestamps instead of bits.
The second Bloom filter variant is called
Count-Min Sketch (CMS) [2]
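Both variants can be sketched in a few lines. This is a simplified illustration, not the implementations from [1] and [2]: the helper `positions`, the class names, and the SHA-256 based hashing are assumptions. The Time-out Bloom filter keeps a timestamp per cell and treats an entry as present only if all of its cells were written recently. The Count-Min Sketch keeps counters and returns the minimum over its rows.

```python
import hashlib


def positions(item: bytes, num_hashes: int, size: int) -> list:
    """Hypothetical helper: one array position per hash function."""
    return [
        int.from_bytes(hashlib.sha256(bytes([i]) + item).digest()[:8], "big") % size
        for i in range(num_hashes)
    ]


class TimeoutBloomFilter:
    def __init__(self, size: int, num_hashes: int, timeout: float):
        self.size, self.num_hashes, self.timeout = size, num_hashes, timeout
        self.stamps = [None] * size  # None means the cell was never written

    def insert(self, item: bytes, now: float) -> None:
        for pos in positions(item, self.num_hashes, self.size):
            self.stamps[pos] = now

    def contains(self, item: bytes, now: float) -> bool:
        return all(
            self.stamps[pos] is not None and now - self.stamps[pos] <= self.timeout
            for pos in positions(item, self.num_hashes, self.size)
        )


class CountMinSketch:
    def __init__(self, width: int, num_hashes: int):
        self.width, self.num_hashes = width, num_hashes
        self.rows = [[0] * width for _ in range(num_hashes)]

    def add(self, item: bytes, amount: int) -> None:
        for row, pos in zip(self.rows, positions(item, self.num_hashes, self.width)):
            row[pos] += amount

    def estimate(self, item: bytes) -> int:
        # Collisions can only inflate a counter when amounts are non-negative,
        # so the minimum over all rows is the tightest estimate.
        return min(
            row[pos]
            for row, pos in zip(self.rows, positions(item, self.num_hashes, self.width))
        )


tbf = TimeoutBloomFilter(size=1024, num_hashes=3, timeout=3.0)
tbf.insert(b"conn", now=0.0)
print(tbf.contains(b"conn", now=2.0), tbf.contains(b"conn", now=10.0))
```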
17
Sampling algorithm
Selects the first packets of a TCP connection
until a maximum of N bytes of payload has
been exported
Uses two Time-out Bloom filters and
one CMS to store the required connection states
Each TCP connection is identified by its
source IP address (SA), destination IP address (DA),
source port (SP), and destination port (DP)
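Before the four values can be fed to the hash functions they have to be turned into a single key. The sketch below shows one way to do this. Making the key independent of the packet direction is an assumption made here so that both directions share one state entry. The slides do not say how the paper builds the key, and `connection_key` is a hypothetical name.

```python
import ipaddress
import struct


def connection_key(sa: str, da: str, sp: int, dp: int) -> bytes:
    """Build a byte string from (SA, DA, SP, DP) that is the same for
    packets of both directions of a connection."""
    a = (ipaddress.ip_address(sa).packed, sp)
    b = (ipaddress.ip_address(da).packed, dp)
    lo, hi = sorted((a, b))  # canonical endpoint order
    return lo[0] + struct.pack("!H", lo[1]) + hi[0] + struct.pack("!H", hi[1])


forward = connection_key("10.0.0.1", "10.0.0.2", 1234, 80)
reverse = connection_key("10.0.0.2", "10.0.0.1", 80, 1234)
print(forward == reverse)
```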
18
The three filters
The first Time-out Bloom filter
stores the timestamps of all observed SYN packets
The CMS
stores the number of payload bytes which need to be
exported for an established TCP connection
The second Time-out Bloom filter
stores the point in time at which sampling of a connection stopped,
either because the maximum number of payload bytes was
reached or because a FIN or RST packet was observed
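How the three stores might interact per packet is sketched below. This is an interpretation of the slide and not the authors' algorithm. Exact dictionaries stand in for the two Time-out Bloom filters and the CMS, so there are no collisions and no expiry of old entries. The class name, the parameter names, and the rule that SYN packets are always sampled are assumptions.

```python
class FirstNBytesSampler:
    def __init__(self, n_bytes: int, syn_timeout: float = 3.0):
        self.n_bytes = n_bytes
        self.syn_timeout = syn_timeout
        self.syn_times = {}   # stand-in for the first Time-out Bloom filter
        self.remaining = {}   # stand-in for the CMS: bytes left to export
        self.stop_times = {}  # stand-in for the second Time-out Bloom filter

    def process(self, key, now, payload_len, syn=False, fin=False, rst=False):
        """Return True if the packet should be sampled."""
        if syn:
            self.syn_times[key] = now
            return True  # assumption: handshake packets are always exported
        if self.remaining.get(key, 0) <= 0:
            # No byte budget yet: accept only if a SYN was seen recently and
            # sampling was not already stopped after that SYN.
            syn_time = self.syn_times.get(key)
            stop_time = self.stop_times.get(key)
            if syn_time is None or now - syn_time > self.syn_timeout:
                return False
            if stop_time is not None and stop_time >= syn_time:
                return False
            self.remaining[key] = self.n_bytes
        self.remaining[key] -= payload_len
        if self.remaining[key] <= 0 or fin or rst:
            self.remaining[key] = 0
            self.stop_times[key] = now
        return True


s = FirstNBytesSampler(n_bytes=1000)
trace = [(0.0, 0, True), (0.1, 0, True), (0.2, 0, False),
         (0.3, 600, False), (0.4, 600, False), (0.5, 600, False)]
print([s.process("conn", t, length, syn=syn) for t, length, syn in trace])
```

In this trace the handshake and the two segments that overlap the first 1000 bytes are sampled, and the final segment is dropped.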
19
Traces Used in Evaluation
Two traffic traces are used
one from a student residence at the University of Twente
one from the authors' research group at the Technische Universität
München
The two traces are called
Twente trace and Munich trace
20
Amount of sampled traffic
Twente trace: between 476,817 (N = 1 kB) and
512,334 (N = 25 kB) packets are sampled
Munich trace: between 314,063 and 326,742 TCP
packets are sampled
21
Empirical Analysis of Sampling
Errors
The number of hash functions influences the
computing cost
A small number of hash functions is desired
A large number of hash functions reduces
the probability of collisions
A good trade-off between computational
complexity and collision probability
was found with l = 3 hash functions
22
Sampling errors
Sampling of payload in the Twente trace
N = 1 kB: only 30,130 (7.35%) and 30,694 (7.36%) errors
N = 25 kB: only 59,789 (11.6%) and 73,925 (14.4%) errors
Bloom filters increase the sampling errors
by 0.1% to 24%, depending on the traffic
23
Evading detection
An attacker can try to circumvent packet selection by
placing the malicious part of the payload beyond the
first N bytes of a TCP connection
inserting packets with identical addresses and ports
but invalid sequence numbers
performing a TCP scan in order to poison the
values stored in the Bloom filters
sending empty TCP ACK packets
To make evasion more difficult
N can be varied dynamically over time
24
Conclusion
The authors demonstrate the usability of their
sampling strategy for worm and botnet detection
By using a simplified TCP connection
tracking mechanism and Bloom filters to
store the required connection states,
the algorithm can be efficiently implemented
in software or hardware
It would be possible to develop a similar
sampling algorithm for UDP traffic
25
Questions
26
[1] S. Kong, T. He, X. Shao, and X. Li, “Time-out Bloom Filter: A
New Sampling Method for Recording More Flows,” in Proc. of
International Conference on Information Networking (ICOIN) 2006,
Sendai, Japan, Jan. 2006.
[2] G. Cormode and S. Muthukrishnan, “An Improved Data Stream
Summary: The Count-Min Sketch and its Applications,” Journal of
Algorithms, vol. 55, no. 1, 2005.
27