Algorithms for Identification of Network Data Streams
Jun Li* and Peter Rabinovitch**
*Carleton University, **Bell Labs, Alcatel-Lucent
Supervisor: Dr. Yiqiang Q. Zhao (Carleton University)
INTRODUCTION
Background:
The volume of Internet traffic is enormous, and accurately identifying its
essential traits is a challenging problem. Existing techniques typically
rely on manually generated signatures specified in packet headers,
which keeps traffic identification tests relatively simple. However, this
approach lacks the flexibility required to deal with constant changes in
network traffic patterns.
Problems:
• How to continuously sense and detect changes in network traffic streams
• How to identify suspicious traffic streams without pre-specified
signatures
• Can we generate network traffic signatures automatically (i.e., without
relying on a network expert's effort)?
Fig. 2: Clustering 2-dimensional Data (data space with axes "flow length in packets" and "mean packet size", showing a cluster of typical traffic and a cluster of atypical traffic)
Fig. 6: Flow Clustering and Classification in S1
Fig. 5: Change Detection in S3
3. Signature Extraction
• A signature-based algorithm similar to Bro [2] and Snort [3], and based on [4]
• Only the cluster of atypical traffic is examined for extracting signatures
• Network resources are allocated only when needed
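The poster names the reference systems but not the extraction rule itself. As a minimal illustration only, assuming a signature is a contiguous byte string shared by every payload in the cluster of atypical traffic, the sketch below returns the longest such string. The function `longest_common_substring`, its exhaustive search, and the sample payloads are hypothetical; this is not the technique of [4].

```python
def longest_common_substring(payloads: list[bytes]) -> bytes:
    """Return the longest byte string that occurs in every payload (naive search)."""
    if not payloads:
        return b""
    shortest = min(payloads, key=len)
    # Try candidate lengths from longest to shortest, so the first match is maximal.
    for length in range(len(shortest), 0, -1):
        for start in range(len(shortest) - length + 1):
            candidate = shortest[start:start + length]
            if all(candidate in payload for payload in payloads):
                return candidate
    return b""


# Usage: three hypothetical payloads from the atypical cluster, each carrying the
# same synthetic worm string at a different offset.
atypical_payloads = [
    b"GET /a.html HTTP/1.1 <<WORM-7f3a>> tail1",
    b"POST /upload <<WORM-7f3a>> x",
    b"junk<<WORM-7f3a>>",
]
signature = longest_common_substring(atypical_payloads)  # b"<<WORM-7f3a>>"
```

This brute-force search is far too slow for line-rate traffic and only makes the idea concrete; a practical extractor would need a faster method, for example one built on suffix trees.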
Proposed Solution:
AutoImmune System: an Intelligent IP Service Infrastructure
AUTOIMMUNE SYSTEM ARCHITECTURE
Fig. 1 components: Router, Staleness Detector, Signature Factory, Purchased Signatures, New signatures. Labelled data: 5-tuple, packet size, …; alarms of signature changes; packets of changed signature.
As traffic flows through the Router, the Staleness Detector monitors the
characteristics of the traffic and triggers an alarm if its behavior has
changed significantly. The alarm starts a process on the Signature Factory,
which clusters the flows matching the alarmed signature into groups. The
new cluster is analyzed to extract a signature. The new signatures are
merged with the purchased signatures, and the resulting signature set is
then tested against a corpus of end-user traffic.
Fig. 1: AutoImmune Architecture
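The last step of this loop (merging new signatures with purchased ones and testing the result against end-user traffic) could look like the sketch below. It assumes signatures are byte strings matched by plain substring search, and it interprets "tested against a corpus of end-user traffic" as rejecting any new signature that flags too large a fraction of that corpus, which is assumed to be benign. The function `merge_signatures`, the `max_false_positive_rate` parameter, and the sample data are hypothetical.

```python
def merge_signatures(
    new_sigs: list[bytes],
    purchased_sigs: set[bytes],
    benign_corpus: list[bytes],
    max_false_positive_rate: float = 0.0,
) -> tuple[set[bytes], list[bytes]]:
    """Keep all purchased signatures; admit a new one only if it rarely
    matches the (assumed benign) end-user corpus."""
    merged = set(purchased_sigs)
    rejected = []
    for sig in new_sigs:
        # Fraction of end-user payloads this candidate would (wrongly) flag.
        hits = sum(1 for payload in benign_corpus if sig in payload)
        rate = hits / len(benign_corpus) if benign_corpus else 0.0
        if rate <= max_false_positive_rate:
            merged.add(sig)
        else:
            rejected.append(sig)
    return merged, rejected


# Usage with hypothetical data: the worm string passes, the generic HTTP token fails.
purchased = {b"EVIL-A"}
candidates = [b"<<WORM-7f3a>>", b"HTTP/1.1"]
end_user_traffic = [b"GET / HTTP/1.1", b"POST /form HTTP/1.1"]
merged, rejected = merge_signatures(candidates, purchased, end_user_traffic)
```

Returning the rejected candidates separately keeps them available for inspection instead of silently discarding them.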
RESULTS
In an implemented system, 20 computers are connected through the Router (shown
in Fig. 1) and exchange multimedia traffic. The Staleness Detector and the
Signature Factory are connected to the Router and run separately. The five types
of traffic flows are Web, Mix, SMTP, VoIP, and Video. The statistics of the
traffic flows are shown in Table 1.

Table 1: Five Types of Traffic Flows
Flow type | Avg. flow length (# of packets) | Std. flow length (# of packets) | Avg. packet size | Std. packet size
Web       | 6   | 2   | 1500 | 100
SMTP      | 3   | 2   | 1500 | 100
VoIP      | 200 | 50  | 200  | 100
Video     | 600 | 100 | 400  | 200
Mix       | 40  | 2   | 1000 | 100

Network speed is assumed to be 1 Gbps. At the beginning of the simulation, each
computer generates traffic without Mix flows. When the simulation enters steady
state, each computer starts generating Mix flows in the proportion shown in
Table 2. The payload of each Mix packet is injected with a synthetic worm. While
passing through the router, the injected Mix traffic is of Web type.

Table 2: Proportions of Traffic Flows
Simulation run | Web   | SMTP | VoIP | Video | Mix (or Malicious)
S1             | 45%   | 20%  | 20%  | 10%   | 5%
S2             | 49%   | 20%  | 20%  | 10%   | 1%
S3             | 49.8% | 20%  | 20%  | 10%   | 0.2%

Define the following parameters for each simulation run:
1) T -- Period from when malware (e.g., Mix traffic) starts until the new signature is obtained by the Router
2) N -- Number of items in the Cluster of atypical traffic
3) N' -- Number of items in the Cluster of atypical traffic that are NOT malware (i.e., not of Mix type)
4) MEAN -- Mean length (in bytes) of the packets in the Cluster of atypical traffic
5) L -- Length (in bytes) of the extracted signature

Table 3: Numerical Values of Parameters
Simulation run | T     | N   | N'  | MEAN (bytes) | L (bytes)
S1             | 0.25  | 679 | 48  | 1030.6       | 21
S2             | 0.679 | 859 | 80  | 1048.9       | 18
S3             | 3.14  | 789 | 117 | 1091.2       | 20

Fig. 7: Flow Clustering and Classification in S2
Fig. 8: Flow Clustering and Classification in S3

ALGORITHMS
1. Change Detection
The algorithm (in the Staleness Detector) keeps a dictionary of data elements that
are deemed useful for predicting future data elements. New data points that are
not well explained by this dictionary are signaled as alarms. For each new data
point:
• Compute the distance from this point to the points already in the dictionary
• If the point is very far, set a Red Alarm
• If it is somewhat far, set an Orange Alarm
• If it is close, set no alarm
• Periodically, evaluate Orange Alarms and clean up the dictionary
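The Staleness Detector's change-detection procedure can be made concrete with the sketch below. The poster does not specify the distance measure, the alarm thresholds, or how Orange Alarms are evaluated, so all of these are assumptions: Euclidean distance to the nearest dictionary point, two fixed thresholds, and a hypothetical `min_support` rule that promotes an Orange-Alarm point into the dictionary only when other Orange-Alarm points lie close to it (that is, when the change recurs rather than appearing once). The class name `StalenessDetector` only mirrors the component in Fig. 1; it is not the authors' code.

```python
import math
from dataclasses import dataclass, field

Point = tuple[float, ...]  # hypothetical feature vector, e.g. (flow length, mean packet size)


@dataclass
class StalenessDetector:
    """Sketch of a dictionary-based change detector (assumed design)."""

    red_threshold: float      # nearest-neighbour distance above which a Red Alarm is set
    orange_threshold: float   # distance above which an Orange Alarm is set
    min_support: int = 1      # hypothetical: close neighbours needed to confirm an Orange point
    max_size: int = 1000      # hypothetical cap applied when cleaning up the dictionary
    dictionary: list[Point] = field(default_factory=list)
    pending_orange: list[Point] = field(default_factory=list)

    def observe(self, point: Point) -> str:
        """Classify one new data point as 'red', 'orange', or 'none'."""
        if not self.dictionary:  # the first point simply seeds the dictionary
            self.dictionary.append(point)
            return "none"
        distance = min(math.dist(point, q) for q in self.dictionary)
        if distance > self.red_threshold:  # very far: Red Alarm
            return "red"
        if distance > self.orange_threshold:  # somewhat far: Orange Alarm, kept for review
            self.pending_orange.append(point)
            return "orange"
        return "none"  # close: already explained by the dictionary

    def periodic_maintenance(self) -> None:
        """Evaluate pending Orange Alarms, then clean up the dictionary."""
        confirmed = [
            p
            for i, p in enumerate(self.pending_orange)
            if sum(
                1
                for j, q in enumerate(self.pending_orange)
                if j != i and math.dist(p, q) <= self.orange_threshold
            ) >= self.min_support
        ]
        self.dictionary.extend(confirmed)  # recurring changes become "known" behaviour
        self.pending_orange.clear()        # isolated Orange points are dropped
        overflow = len(self.dictionary) - self.max_size
        if overflow > 0:                   # clean-up: discard the oldest entries
            del self.dictionary[:overflow]


# Usage: seed with Web/SMTP-like points, then feed new observations.
detector = StalenessDetector(
    red_threshold=50.0,
    orange_threshold=10.0,
    dictionary=[(6.0, 1500.0), (3.0, 1500.0)],
)
alarms = [detector.observe(p) for p in [(5.0, 1498.0), (40.0, 1000.0), (6.0, 1520.0), (6.0, 1522.0)]]
# alarms == ["none", "red", "orange", "orange"]
detector.periodic_maintenance()
after = detector.observe((6.0, 1521.0))  # "none": the recurring shift is now in the dictionary
```

Red-Alarm points are never added to the dictionary in this sketch, so a persistent shift keeps alarming until it is handled downstream; the related approach in [1] instead builds its dictionary within a kernel recursive least squares framework.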
CONCLUSION
AutoImmune addresses a more general traffic stream identification problem that
requires complex packet-payload-based membership tests without pre-specified
signature sets. We implemented AutoImmune by integrating the three developed
algorithms and tested the system against simulated data traffic. The system performs
well in various networking environments with non-stationary traffic streams. It
adapts automatically to changes in the characteristics of network traffic and
identifies new types of traffic patterns almost in real time (it takes less than 10
seconds in a Gbps communication network to obtain a new traffic pattern).
Simulation results showed that the system successfully identifies a new type of
network traffic that makes up as little as 0.2% of total network traffic. To the best
of our knowledge, the lowest worm detection rate reported in the literature is
1.1%, achieved by a worm detection system called DoWitcher. The smaller the
proportion of the new type of traffic, the longer it takes to identify its signature.
REFERENCES
[1] T. Ahmed, M. Coates, and A. Lakhina, "Multivariate online anomaly detection using kernel recursive
least squares," in Proc. IEEE INFOCOM, Anchorage, AK, May 2007.
[2] V. Paxson, "Bro: A System for Detecting Network Intruders in Real-Time," in Proc. 7th USENIX
Security Symposium, San Antonio, TX, Jan. 26-29, 1998.
[3] M. Roesch, "Snort - Lightweight Intrusion Detection for Networks," in Proc. USENIX LISA '99,
Seattle, WA, Nov. 7-12, 1999.
[4] F. Hao, M. S. Kodialam, T. V. Lakshman, and H. Zhang, "Fast Payload-Based Flow Estimation for
Traffic Monitoring and Network Security," in Proc. ANCS 2005, New Jersey, USA, Oct. 26-28, 2005.
A related study to our change detection algorithm is [1].
2. Data Clustering and Classification
The algorithm (in the Signature Factory) classifies test data points into two
clusters: typical and atypical traffic.
• The data space is split into small regions
• Two density estimates are obtained for each region:
1. The proportion of known observations
2. The proportion of test observations
• Observations in regions that have a nil (or very small) estimate under
typical traffic, but a relatively large estimate under test traffic, are
classified as atypical traffic.
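The density comparison can be sketched with a uniform grid as the "small regions" and relative frequencies as the two density estimates. The two features (flow length in packets, mean packet size) follow Fig. 2; the grid, the cell sizes, the cut-offs `nil_threshold` and `min_test_share`, and the sample points are hypothetical choices, not values from the poster.

```python
from collections import Counter

Point = tuple[float, float]  # (flow length in packets, mean packet size), as in Fig. 2


def region_of(point: Point, cell_sizes: tuple[float, float]) -> tuple[int, ...]:
    """Index of the small grid region that contains the point."""
    return tuple(int(x // size) for x, size in zip(point, cell_sizes))


def classify_atypical(
    known: list[Point],
    test: list[Point],
    cell_sizes: tuple[float, float],
    nil_threshold: float = 0.0,
    min_test_share: float = 0.05,
) -> list[Point]:
    """Return test points lying in regions (almost) empty under known traffic
    but relatively well populated under test traffic."""
    if not known or not test:
        raise ValueError("both known and test observations are required")
    known_counts = Counter(region_of(p, cell_sizes) for p in known)
    test_counts = Counter(region_of(p, cell_sizes) for p in test)
    atypical = []
    for p in test:
        region = region_of(p, cell_sizes)
        known_share = known_counts[region] / len(known)  # estimate 1: proportion of known observations
        test_share = test_counts[region] / len(test)     # estimate 2: proportion of test observations
        if known_share <= nil_threshold and test_share >= min_test_share:
            atypical.append(p)
    return atypical


# Usage: Web/SMTP/VoIP/Video-like known traffic, plus Mix-like test flows near (40, 1000).
known = [(6, 1500), (5, 1480), (3, 1510), (200, 210), (610, 400)]
test = [(6, 1500), (4, 1505), (200, 220), (40, 1000), (41, 1010), (39, 990)]
atypical = classify_atypical(known, test, cell_sizes=(10, 100))
# atypical == [(40, 1000), (41, 1010), (39, 990)]
```

Raising `min_test_share` trades sensitivity for robustness: with `min_test_share=0.2`, the lone point in its own region, `(39, 990)`, is no longer flagged.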
Fig. 3: Change Detection in S1
Fig. 4: Change Detection in S2
This research was supported in part by the MITACS Internship Program. The authors
would like to acknowledge the contributions made by Katrina Rogers-Stewart, Yihui Tang,
and Pin Yuan.