Modeling Network Traffic as Images
Seong Soo Kim and A. L. Narasimha Reddy
Computer Engineering
Department of Electrical Engineering
Texas A&M University
{skim, reddy}@ee.tamu.edu
Contents
• Introduction and Motivation
• Network Traffic as Images
- Visual Representation
• Requirements for Representing Network Traffic as Images
- Sampling Rates
- Visual modeling Network Traffic as Images
  (normal traffic, semi-random attacks, random attacks)
• Image Processing for Network Traffic
- Validity of intra-frame DCT
- Inter-frame differential coding
• Conclusion
Seong Soo Kim and A. L. Narasimha Reddy
Texas A & M University
ICC 2005
Attack/Anomaly
• Bandwidth attacks/anomalies, flash crowds
• DoS (Denial of Service):
– UDP flooding, TCP SYN flooding, ICMP flooding
• Typical types:
- Single attacker (DoS)
- Multiple attackers (DDoS)
- Multiple victims (Worm)
• Aggregate packet header data as signals
• Signal/image based anomaly/attack detectors
Motivation (1)
• Previous studies looked at the behavior of individual flows
- Partial state
- RED-PD
- These become ineffective with DDoS → Aggregate
• Link speeds are increasing
- currently at Gb/s, soon to be at 10 to 100 Gb/s
- Need simple, effective mechanisms that can be implemented at line speed
• Look at aggregate information of traffic
- Use sampling to reduce the cost of processing
- Process aggregate data to detect anomalies
Motivation (2)
• Signature (rule)-based approaches are tailored to known attacks
- Look for packets with port number 1434 (SQL Slammer)
- Become ineffective when traffic patterns or attacks change
- New threats are constantly emerging
- Do not want to rely on attack-specific information
• Most current monitoring/policing tools operate off-line
- FlowScan, FlowAnalyzer, AutoFocus
- Quick identification of network anomalies is necessary to contain the threat
• Can we design generic (and generalized) mechanisms for attack detection and containment?
- Measurement (network)-based real-time detection
Packet Header
• Carries a rich set of information
- Data: packet counts, byte counts, number of flows
- Domain: source/destination address, source/destination port numbers, protocol numbers
- Image/video can represent each data type in each domain
• Image processing/video analysis deciphers the patterns of traffic
- single → multiple (Worm): horizontal lines
- multiple → single (DDoS): vertical lines
Domain size Reduction (1)
• Header fields may have large domain spaces
– IPv4 addresses 2^32, IPv6 addresses 2^128
• Need to minimize storage and processing complexity for real-time processing
• Employ “domain folding”
• For example: a 2-dimensional array count[i][j]
- Records the packet count for value j in the i-th byte field of the IP address
• Effects
- A 32-bit address is split into four 8-bit fields
- Smaller memory: 2^32 (4G) → 4*256 (1K) counters
- Running time O(n) to O(lg n)
- A form of hashing
• Advantages
- The hashing can be reversed, within limits, to identify the target IP address
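A minimal sketch of the count[i][j] structure described above, assuming IPv4 addresses given as dotted strings. The helper names `new_counters` and `record` are illustrative and do not come from the slides.

```python
import ipaddress

NUM_FIELDS = 4      # a 32-bit IPv4 address folds into four 8-bit fields
FIELD_SIZE = 256    # each field takes a value in 0..255


def new_counters():
    """Return a zeroed count[i][j] table holding 4 * 256 = 1024 counters."""
    return [[0] * FIELD_SIZE for _ in range(NUM_FIELDS)]


def record(count, address, amount=1):
    """Add `amount` to count[i][j], where j is the i-th byte of `address`."""
    for i, j in enumerate(ipaddress.IPv4Address(address).packed):
        count[i][j] += amount


count = new_counters()
record(count, "165.91.212.255")
record(count, "165.91.10.7", amount=2)

# The two addresses share bytes 0 and 1, so those counters accumulate.
print(count[0][165], count[1][91], count[2][212], count[3][7])  # 3 3 1 2
```

Each packet touches exactly four counters regardless of how many distinct addresses have been seen, which is what keeps the per-packet cost small.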
Data structure for reducing domain size (2)
• Simple example
IP 1 = 165.91.212.255, No. of Flows = 3
IP 2 = 64.58.179.230, No. of Flows = 2
IP 3 = 216.239.51.100, No. of Flows = 1
IP 4 = 211.40.179.102, No. of Flows = 10
IP 5 = 203.255.98.2, No. of Flows = 2
[Figure: four rows of counters, one per IP byte, each with a value axis from 0 to 255 (ticks at 0, 64, 128, 192, 255). Each address adds its flow count at its byte value in every row. Byte value 179 in the third row is shared by IP 2 and IP 4, so it accumulates 2 + 10 = 12.]
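The simple example can be replayed in a few lines. This is a sketch under the assumption that each address contributes its flow count to one counter per byte field. The last step, picking the heaviest byte in every field, is one illustrative way to reverse the folding and only works when a single target dominates the traffic.

```python
# Addresses and flow counts taken from the simple example on the slide.
flows = {
    "165.91.212.255": 3,
    "64.58.179.230": 2,
    "216.239.51.100": 1,
    "211.40.179.102": 10,
    "203.255.98.2": 2,
}

# count[i][j]: total flows whose i-th address byte equals j.
count = [[0] * 256 for _ in range(4)]
for address, n_flows in flows.items():
    for i, byte in enumerate(int(part) for part in address.split(".")):
        count[i][byte] += n_flows

# Bytes shared by several addresses accumulate in one counter.
print(count[2][179])  # 12

# Restricted reversal: take the heaviest byte value in each field.
heaviest = [max(range(256), key=row.__getitem__) for row in count]
target = ".".join(str(b) for b in heaviest)
print(target)  # 211.40.179.102
```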
Visual Representation
[Figure 2. The visualization of network traffic signal in IP address. (a) 1 dimension: each of IP bytes 0 to 3 of the source or destination address is drawn as a 16 x 16 grid of pixels, one pixel per byte value from 0 to 255. (b) 2 dimension: each of IP bytes 0 to 3 is drawn as a 256 x 256 grid whose pixels are indexed by the pair (source IP address byte, destination IP address byte), from (0, 0) to (255, 255).]
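The two layouts in Figure 2 can be sketched as follows. The function names and the choice of indexing the 2D image as [source byte][destination byte] are assumptions made for illustration.

```python
def to_grid(counters, width=16):
    """Panel (a): lay out 256 counters row by row as a 16 x 16 image."""
    return [counters[r:r + width] for r in range(0, len(counters), width)]


def record_pair(image2d, src, dst):
    """Panel (b): one 256 x 256 image per IP byte, indexed [src byte][dst byte]."""
    src_bytes = [int(p) for p in src.split(".")]
    dst_bytes = [int(p) for p in dst.split(".")]
    for i in range(4):
        image2d[i][src_bytes[i]][dst_bytes[i]] += 1


# 1D: byte value 17 lands in row 1, column 1 of the 16 x 16 grid.
counters = [0] * 256
counters[17] = 5
grid = to_grid(counters)

# 2D: four images of 256 x 256 pixels, one per byte position.
image2d = [[[0] * 256 for _ in range(256)] for _ in range(4)]
record_pair(image2d, "10.0.0.1", "165.91.212.255")
```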
Image based analysis
• Generating useful signals based on traffic image
• Treat the traffic data as images
• Apply image processing based analysis
• Enables applying image/video processing for the analysis of network traffic.
– Some attacks become clearly visible to the human eye.
– Video compression techniques lead to data reduction
– Scene change analysis leads to anomaly detection
– Motion prediction leads to attack prediction
– Pattern recognition leads to anomaly identification
Impacts of Design Factors for presenting Network traffic as Images (1)
• Sampling Rates
– For discriminating current traffic situation based on stationary property, we should select a sampling frequency for deriving the most stable images
– The periodicity of traffic
$$\mathrm{MSE} = \frac{1}{N^2} \sum_{i} \sum_{j} \left( I(i,j) - I'(i,j) \right)^2$$
– For intra-frame: I(i, j) is the original image and I'(i, j) is the reconstructed image
– For inter-frame: I(i, j) and I'(i, j) are consecutive images
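The MSE can be computed directly from two frames. This sketch assumes square N x N images stored as lists of rows. The function name `mse` is illustrative.

```python
def mse(image_a, image_b):
    """Mean squared error between two N x N images.

    Intra-frame use: original vs. reconstructed image.
    Inter-frame use: two consecutive images.
    """
    n = len(image_a)
    total = sum(
        (image_a[i][j] - image_b[i][j]) ** 2
        for i in range(n)
        for j in range(n)
    )
    return total / (n * n)


frame_a = [[1, 2], [3, 4]]
frame_b = [[1, 2], [3, 6]]
print(mse(frame_a, frame_b))  # 1.0
```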
Impacts of Design Factors for presenting Network traffic as Images (2)
• Sampling Rates
– In normal times the traffic is stationary, so the selection of the sampling period is not crucial.
– In attack times the traffic changes dynamically with time, so the sampling period is a crucial factor.
– 30 ~ 120 sec. sampling.
Flow-based Network Traffic Images
• Visual representation based on the number of flows
– The number of flows in the (source/destination) address domain
– The black dots/lines illustrate more concentrated traffic intensity.
– This analysis is effective for revealing flood types of attacks
• The image reveals the characteristics of traffic
– Normal behavior mode
– A single target (DoS)
– Semi-random target: a subnet is fixed and the other portion of the address is changed (prefix-based attacks)
– Random target: horizontal scan (Worm) and vertical scan (DDoS)
Network traffic as images – normal network traffic
• Standard deviation of most significant DCT coefficients of images
– energy distribution of number of flows over address domain.
• In the normal traffic state, this signal is at a middle level between those of the two anomalous cases presented next.
• Legitimate flows do not form any regular shape due to their random distribution over the address domain.
Network traffic as images – semi-random targeted attacks
• The difference between attackers (or victims) and legitimate users is remarkable
– higher variance than normal traffic
• The specific area of data structure is shown in a darker shade.
– traffic is concentrated on a (aggregated) single destination or a subnet.
Network traffic as images – random targeted attacks
• All of the addresses are exploited in hostscan attacks
– Uniform intensity → low variances
• Whole region of the image in uniform intensity.
• Horizontal/vertical lines indicate anomalies in 2D image
• Random (sequential, dictionary scan) attacks
- Horizontal scan: from the same source aimed at multiple targets -- Worm propagation
- Vertical scan: from several machines (in a subnet) to a single destination -- DDoS
• Worm propagation type attack
• DDoS propagation type attack
Summary of Visual representation of traffic data
• Worm attacks – horizontal line in 2D image
• DDoS attacks – vertical line in 2D image
→ Line detection algorithm
• Visual images look different in different traffic modes
• Motion prediction can lead to attack prediction
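The slides do not specify the line detection algorithm. As a hedged illustration, the sketch below uses a simple occupancy heuristic: a row (one source, many destinations) or a column (many sources, one destination) is reported when at least half of its pixels are non-zero. The function name, the threshold, and the image[source][destination] indexing are all assumptions.

```python
def detect_lines(image, min_fraction=0.5):
    """Return (rows, cols) in which at least `min_fraction` of pixels are non-zero."""
    n_rows, n_cols = len(image), len(image[0])
    rows = [
        r for r in range(n_rows)
        if sum(1 for v in image[r] if v) >= min_fraction * n_cols
    ]
    cols = [
        c for c in range(n_cols)
        if sum(1 for r in range(n_rows) if image[r][c]) >= min_fraction * n_rows
    ]
    return rows, cols


size = 16
image = [[0] * size for _ in range(size)]
for dst in range(size):            # source 3 contacts every destination (worm-like)
    image[3][dst] = 1
for src in range(0, size, 2):      # many sources hit destination 9 (DDoS-like)
    image[src][9] = 1

rows, cols = detect_lines(image)
print(rows, cols)  # [3] [9]
```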
Generation of useful Signal
Scene change analysis - DCT
• We can apply various image processing techniques
• From generated images, we can generate useful signals through DCT (Discrete Cosine Transform)
• DCT is effective for storage reduction and approximation of the energy distribution in image
• Standard deviation (square root of the variance) of leading DCT coefficients in 8-by-8 blocks
$$\sigma = \left[ \frac{1}{16} \sum_{k=1}^{16} \left( x_k - \bar{x} \right)^2 \right]^{1/2}, \quad \text{where } x_k \text{ are DCT coefficients and } \bar{x} = \frac{1}{16} \sum_{k=1}^{16} x_k$$
→ Instead of whole DCT coefficients, we can choose only the dominant coefficient
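A sketch of the signal above in plain Python. It assumes a 32 x 32 image split into sixteen 8 x 8 blocks, with the leading (DC) coefficient of each block taken as x_k. That reading of "leading coefficients" is an interpretation, and the function names are illustrative.

```python
import math

N = 8  # block size


def dct_matrix(n=N):
    """Orthonormal DCT-II matrix: A[i][k] = a_i * cos(pi * (2k + 1) * i / (2n))."""
    return [
        [
            (math.sqrt(1 / n) if i == 0 else math.sqrt(2 / n))
            * math.cos(math.pi * (2 * k + 1) * i / (2 * n))
            for k in range(n)
        ]
        for i in range(n)
    ]


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def dct2(block, a):
    """2-D DCT of a square block: A * block * A^T."""
    a_transposed = [list(col) for col in zip(*a)]
    return matmul(matmul(a, block), a_transposed)


def leading_coefficient_std(image):
    """Standard deviation of the leading (DC) coefficient of each 8 x 8 block."""
    a = dct_matrix()
    leading = []
    for r in range(0, len(image), N):
        for c in range(0, len(image[0]), N):
            block = [row[c:c + N] for row in image[r:r + N]]
            leading.append(dct2(block, a)[0][0])
    mean = sum(leading) / len(leading)
    return math.sqrt(sum((x - mean) ** 2 for x in leading) / len(leading))


flat = [[1] * 32 for _ in range(32)]       # uniform intensity
spike = [[0] * 32 for _ in range(32)]      # traffic concentrated in one block
for r in range(8):
    for c in range(8):
        spike[r][c] = 16

flat_std = leading_coefficient_std(flat)
spike_std = leading_coefficient_std(spike)
```

The uniform image gives a signal near zero, while the concentrated image gives a much larger one, which matches the low-variance and high-variance cases described for random and semi-random attacks.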
Impact of Selecting DCT coefficients (1)
• TCG (G_T): Transformation Coding Gain
– TCG measures the amount of energy packed in the low frequency (leading) coefficient
– DCT transform matrix:
$$[A]_{i,k} = a_i \cos\frac{\pi (2k+1)\, i}{2N}, \quad i, k = 0, \ldots, N-1, \quad a_0 = \sqrt{\frac{1}{N}}, \quad a_i = \sqrt{\frac{2}{N}} \ \text{for } i \neq 0$$
– $\sigma_n^2$ are the diagonal elements of $A \Sigma A^T$, where $\Sigma$ is the covariance matrix and $\rho$ is the correlation coefficient
$$G_T = \frac{\dfrac{1}{N} \sum_{n=0}^{N-1} \sigma_n^2}{\left( \prod_{n=0}^{N-1} \sigma_n^2 \right)^{1/N}}$$
– The higher TCG leads to smaller intra-frame MSE and higher compression
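The coding gain can be evaluated numerically as the ratio of the arithmetic mean to the geometric mean of the transform-domain variances. In the sketch below the covariance matrix is assumed to follow a first-order model with entries rho^|i-j|. That model and the helper names are assumptions for illustration, since the slide only states that rho is the correlation coefficient.

```python
import math


def dct_matrix(n):
    """Orthonormal DCT-II matrix with a_0 = sqrt(1/n), a_i = sqrt(2/n)."""
    return [
        [
            (math.sqrt(1 / n) if i == 0 else math.sqrt(2 / n))
            * math.cos(math.pi * (2 * k + 1) * i / (2 * n))
            for k in range(n)
        ]
        for i in range(n)
    ]


def first_order_cov(n, rho):
    """Assumed covariance model: entry (i, j) equals rho ** |i - j|."""
    return [[rho ** abs(i - j) for j in range(n)] for i in range(n)]


def coding_gain(cov):
    """G_T: arithmetic mean over geometric mean of the diagonal of A cov A^T."""
    n = len(cov)
    a = dct_matrix(n)
    variances = [
        sum(a[i][p] * cov[p][q] * a[i][q] for p in range(n) for q in range(n))
        for i in range(n)
    ]
    arithmetic = sum(variances) / n
    geometric = math.exp(sum(math.log(v) for v in variances) / n)
    return arithmetic / geometric


gain_uncorrelated = coding_gain(first_order_cov(8, 0.0))
gain_correlated = coding_gain(first_order_cov(8, 0.9))
```

With uncorrelated samples every coefficient carries the same variance and the gain is 1. Correlated samples concentrate energy in the leading coefficients and the gain rises above 1.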
Impacts of Selecting DCT coefficients (2)
• Intra-frame DCT
– Random traffic can be packed within fewer coefficients than semi-random traffic
– Using inter-frame differential coding, we can improve the TCG
– For an MSE of 0.3349, the number of required coefficients drops from 42 to 3
– TCG increases 2.6 times
Impacts of Design Factors for presenting Network traffic as Images
• Sampling rates on DCT coefficients
– A sampling period of 60 seconds maintains the minimum intra-frame MSE over the entire range of retained DCT coefficients
– We can choose 30 to 120 seconds as an appropriate sampling period.
Attack Estimation (1)
- Motion prediction
• Step 1: complexity reduction
– Pixels below a mean packet count
– Normalized absolute difference similarity between count[i][j][n] and its neighboring pixel in address j, normalized by count[i][j][n] and compared against a threshold of 1.0
• Step 2: to find a block of addresses
Attack Estimation (2)
- Motion prediction
• Step 3: to calculate the quantitative components
– Starting position
– Motion vector
• Step 4: compensating errors
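The slides give only an outline of the motion prediction steps, so the following is a loose sketch rather than the authors' procedure. It assumes a 1D frame of counters, keeps pixels at or above the frame mean, takes the longest contiguous run as the block of addresses, and uses the shift of its starting position between two frames as the motion vector. Error compensation (step 4) is omitted, and all names are hypothetical.

```python
def active_block(counts):
    """Return (start, length) of the longest run of non-zero pixels at or above the mean."""
    mean = sum(counts) / len(counts)
    best_start, best_len, run_start = 0, 0, None
    for j, value in enumerate(list(counts) + [0]):   # trailing 0 closes the last run
        active = j < len(counts) and value > 0 and value >= mean
        if active and run_start is None:
            run_start = j
        elif not active and run_start is not None:
            if j - run_start > best_len:
                best_start, best_len = run_start, j - run_start
            run_start = None
    return best_start, best_len


def predict_next_block(prev_frame, curr_frame):
    """Extrapolate the block position one frame ahead using the observed shift."""
    prev_start, _ = active_block(prev_frame)
    curr_start, length = active_block(curr_frame)
    motion_vector = curr_start - prev_start
    return curr_start + motion_vector, length


prev_frame = [0] * 256
curr_frame = [0] * 256
for j in range(10, 14):
    prev_frame[j] = 50     # block of scanned addresses at 10..13
for j in range(14, 18):
    curr_frame[j] = 50     # the block has moved to 14..17

predicted = predict_next_block(prev_frame, curr_frame)
print(predicted)  # (18, 4)
```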
Advantages
• Not looking for specific known attacks
• Generic mechanism
• Works in real-time
– Latencies of a few samples
– Simple enough to be implemented inline
Conclusion
• We studied the feasibility of analyzing packet header data through image and DCT analysis for detecting traffic anomalies.
• We evaluated the effectiveness of our approach by applying it to network traffic.
• Can rely on many tools from the signal/image processing area
– More robust offline analysis possible
– Concise for logging and playback
• Real-time resource accounting is feasible
• Real-time traffic monitoring is feasible
– Simple enough to be implemented inline
Thank you !!
Processing and memory complexity
• Two samples of packet header data: 2*P, where P is the size of the sample data
• Summary information (DCT coefficients etc.) over samples: S
• Total space requirement O(P+S)
• P is 2^32 → 4*256 = 1024 (1D), 2^64 → 256K (2D)
• S is 32*32 → 16
→ Memory requires 258K
• Processing O(P+S)
• Update 4 counters per domain
• Per-packet data-plane cost low.
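The sizes quoted for P and S can be checked with a few lines of arithmetic. The sketch assumes four byte fields per address, one 256 x 256 grid per field in the 2D case, and one summary value per 8 x 8 block of a 32 x 32 image. The last of these is an interpretation of "32*32 → 16".

```python
FIELDS = 4       # byte fields per IPv4 address
VALUES = 256     # possible values per byte field
BLOCK = 8        # DCT block size

p_1d = FIELDS * VALUES               # counters for one address domain
p_2d = FIELDS * VALUES * VALUES      # counters for (source, destination) pairs
s = (32 // BLOCK) * (32 // BLOCK)    # blocks in a 32 x 32 image

print(p_1d, p_2d // 1024, s)  # 1024 256 16
```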