MultiClassifier: A combination of DPI and ML for application

Presenter: Muhammad Reza Zulman
Advisor: Dr. Kai-Wei Ke
Date: November 19th, 2015
1
• INTRODUCTION
• APPLICATION LAYER CLASSIFICATION
• SDN OVERVIEW
• MULTI-CLASSIFIER
• OVERALL ARCHITECTURE
• MULTI-CLASSIFIER STRUCTURE
• The Selector
• Reliability Threshold
• Optimization
• TESTBED AND EVALUATION
• CONCLUSION
• REFERENCES
2
• The variety of applications used on the Internet has been increasing: in addition to
‘traditional’ applications (e.g. email, web, or FTP), new applications have gained strong
momentum (e.g. streaming, gaming, or P2P).
• The ability to dynamically identify and classify flows according to their network
applications is highly beneficial.
3
• The benefits of traffic classification:
- Trend analysis: estimating the size and origins of capacity demand trends for network
planning.
- Adaptive, network-based marking of traffic that requires specific QoS, without direct client
application or end-host involvement.
- Dynamic access control: adaptive firewalls that can detect forbidden applications, DoS, or
other attacks.
- Lawful interception: enabling minimally invasive warrants and wiretaps based on
statistical summaries of traffic details.
- Intrusion detection: detecting suspicious activity related to security breaches caused by
malicious users or worms.
4
• The usual way to do application-layer classification is to use dedicated devices
that support it. Application-layer policies are set on these devices
to achieve fine-grained management.
• However, a network such as a campus network may contain devices from different vendors.
Different vendors’ devices may implement classification differently, which can lead to
different results even for the same flows.
• Thus, it is a challenge to set a policy that is applied and synchronized across the whole network.
5
• On the other hand, a traditional campus network is usually
divided into three layers: core, aggregation/distribution, and access.
• Switches in the access layer connect to diverse endpoints and usually cannot
do application-layer classification, so application-layer policies are difficult to apply there.
• To increase the granularity of management, we may have to deploy many devices
in different areas of the network.
• That increases the burden on network administrators and raises the
error rate.
6
• The SDN concept introduces an external, programmable network control plane
that creates a flexible, adaptable, and open interface to the network.
• In SDN, the control plane and the data plane are separated.
• The control plane is moved to a logically centralized network control plane,
simply called the “controller”.
7
• As a software entity, the controller can be
freely programmed according to the
operator’s needs.
• SDN controllers: OpenDaylight, Ryu, ONOS,
Floodlight, etc.
• The controller relies on protocols that
allow servers to tell switches where to send
packets.
• Currently, the most popular protocol for SDN
controllers is OpenFlow.
8
• OpenFlow is a set of specifications maintained by
the ONF.
• There are 12 fields in a flow table that can be
used for L2-L4 classification. Using these fields,
the controller is able to apply policies.
• However, OpenFlow is not effective for
application-layer classification.
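To make the limitation concrete, here is a minimal sketch of header-only matching. The slide does not name an OpenFlow version; version 1.0 and its `ofp_match` field names are assumed here, and the `Match` class, its `matches` method, and the sample header dictionaries are hypothetical, simplified stand-ins (a real switch uses a wildcard bitmask rather than `None`).

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass(frozen=True)
class Match:
    """Header fields of an OpenFlow 1.0 match (assumed version); None = wildcard."""
    in_port: Optional[int] = None      # switch ingress port
    dl_src: Optional[str] = None       # Ethernet source address
    dl_dst: Optional[str] = None       # Ethernet destination address
    dl_vlan: Optional[int] = None      # VLAN ID
    dl_vlan_pcp: Optional[int] = None  # VLAN priority
    dl_type: Optional[int] = None      # EtherType
    nw_tos: Optional[int] = None       # IP ToS bits
    nw_proto: Optional[int] = None     # IP protocol number
    nw_src: Optional[str] = None       # IP source address
    nw_dst: Optional[str] = None       # IP destination address
    tp_src: Optional[int] = None       # TCP/UDP source port
    tp_dst: Optional[int] = None       # TCP/UDP destination port

    def matches(self, headers: dict) -> bool:
        """True if every non-wildcard field equals the packet's header value."""
        return all(
            getattr(self, f.name) is None
            or headers.get(f.name) == getattr(self, f.name)
            for f in fields(self)
        )


# A rule for "TCP traffic to port 80" cannot tell which application uses the port.
port_80_rule = Match(dl_type=0x0800, nw_proto=6, tp_dst=80)
web_headers = {"dl_type": 0x0800, "nw_proto": 6, "tp_dst": 80}
other_app_headers = dict(web_headers)  # different payload, identical headers
```

Both header sets satisfy `port_80_rule`, because none of the fields looks at the payload; this is why the application layer needs a separate classifier.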
9
• There are two common approaches to application-layer classification: Deep Packet
Inspection (DPI) and Machine Learning (ML) based classification.
• DPI checks the packet payloads of a flow against predefined signatures to
find out which protocol the packet or flow belongs to.
• ML-based classification commonly uses a supervised machine
learning algorithm such as a Bayesian network, a decision tree, or another algorithm.
• ML uses network traffic that is already labeled as training data to learn the
characteristics of flows. A new flow is then compared against the trained model
to determine which protocol it belongs to.
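The DPI idea above can be sketched as a lookup of payload bytes against a signature table. This is only an illustration: the three regular expressions and the names `SIGNATURES` and `dpi_classify` are assumptions made for the example, not the signature database used in the presented work.

```python
import re
from typing import Optional

# Illustrative signatures only; a real DPI engine ships a much larger database.
SIGNATURES = {
    "http": re.compile(rb"^(GET|POST|HEAD|PUT|DELETE) \S+ HTTP/1\.[01]\r\n"),
    "ssh": re.compile(rb"^SSH-\d\.\d+-"),
    "bittorrent": re.compile(rb"^\x13BitTorrent protocol"),
}


def dpi_classify(payload: bytes) -> Optional[str]:
    """Return the first protocol whose signature matches, or None if unknown."""
    for protocol, pattern in SIGNATURES.items():
        if pattern.search(payload):
            return protocol
    return None
```

A payload that matches no signature, such as encrypted bytes, comes back as `None`, which corresponds to the "unknown" DPI result mentioned later in the slides.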
10
The comparison of DPI and ML:
DPI
• High accuracy
• Depends on the signatures in the database
• Not effective for encrypted traffic
• Consumes more CPU resources
ML
• Low accuracy
• Depends on the training data set
• Faster than DPI
11
• SDN was introduced to address the issues with proprietary devices, but another issue
then rises to the surface.
• In a software-defined network, the controller plays the core role in the whole network,
and its performance has a great impact on the network.
• If we run only the DPI classifier, it will consume too much of the controller’s
resources and reduce the controller’s throughput.
• However, if we use only ML as the classifier, the classification accuracy might
be low.
12
• To address this issue,
MultiClassifier is presented.
• It is an application-layer
classifier that combines DPI
and ML to do the classification.
• MultiClassifier takes advantage
of both classifiers to achieve
high speed while maintaining an
acceptable accuracy rate.
13
• Classifier Forwarding collects
flows for classification.
• Receive takes packet_in messages and distributes
them to the Selector.
• Selector chooses the classification
method.
• API Provider provides an API for other
modules.
• The REST API lets northbound applications
make use of the classification
results.
• Classifier consists of DPI and ML.
14
Classifier Forwarding sends flow
tables to the switches to do forwarding
and collects the different flows. Receive
gets the packets and sends them to
the Selector. The Selector takes the packets
and sends them to the Classifier to do
the classification, and the result is
returned to Multi-Classifier.
Multi-Classifier provides the flow
results to other modules or northbound
applications through the API Provider.
15
The Selector’s job is to choose between DPI and
ML, since each has its own advantages
and disadvantages.
How does the Selector decide which one to choose?
16
• Since ML is much faster than DPI, ML gets a higher priority.
• When a new flow arrives, the Selector first chooses ML to do the
classification.
• Depending on the ML result, the Selector decides whether or not to accept
that result.
• A threshold value is set for this decision, and it can also be
changed dynamically according to the ML results.
17
Rthr = Threshold value
Rml = Reliability of ML result
Step 1
• All packets are classified with ML.
• Proceed to step 2.
Step 2
• Check whether the ML result is reliable (if Rml > Rthr, the ML result is reliable).
• If not, proceed to step 3.
Step 3
• Classify the packets with DPI.
• If the DPI result is unknown, the ML result is chosen.
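The three steps above can be sketched as a single function. This is a minimal sketch under assumptions: the function name `select`, its parameters, and the convention that the ML classifier returns a `(label, reliability)` pair while the DPI classifier returns `None` for "unknown" are all hypothetical and only illustrate the control flow described on the slide.

```python
from typing import Any, Callable, Optional, Tuple

MlClassifier = Callable[[Any], Tuple[str, float]]  # returns (label, Rml)
DpiClassifier = Callable[[Any], Optional[str]]     # returns label or None


def select(flow: Any, ml_classify: MlClassifier,
           dpi_classify: DpiClassifier, r_thr: float) -> str:
    """Pick the final label for a flow following steps 1 to 3."""
    # Step 1: every flow is classified with ML first.
    ml_label, r_ml = ml_classify(flow)
    # Step 2: accept the ML result when its reliability exceeds the threshold.
    if r_ml > r_thr:
        return ml_label
    # Step 3: otherwise fall back to DPI; keep the ML label if DPI is unknown.
    dpi_label = dpi_classify(flow)
    return dpi_label if dpi_label is not None else ml_label
```

With stub classifiers, a flow whose ML reliability is above `r_thr` never reaches DPI, which is where the speed advantage comes from.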
18
• The reliability threshold (Rthr) has a great impact on the speed
and accuracy of the Multi-Classifier.
• If Rthr is too big, only a few flows will go through DPI. Classification might
be fast, but the accuracy would be low.
• If Rthr is too small, too many flows have to go through DPI, which
consumes a lot of CPU resources.
19
• To balance efficiency and accuracy, there is a policy for
updating Rthr.
• Assume that the DPI result is always correct.
• Let:
- Tm = number of ML classifications.
- Td = number of DPI classifications.
- Tr = number of classifications where both give the same result.
20
• Let:
- PmlAcc = accuracy rate of ML classification.
PmlAcc = Tr / Td
- PdpiRate = proportion of DPI among all classifications.
PdpiRate = Td / Tm
- PdpiThr = threshold for PdpiRate (constant)
- PmlAccThr = threshold for PmlAcc (constant)
21
• Algorithm for Rthr update:
If PdpiRate < PdpiThr && PmlAcc < PmlAccThr
    then decrease Rthr
End if
If PdpiRate > PdpiThr && PmlAcc < PmlAccThr
    then restudy
End if
If PdpiRate < PdpiThr && PmlAcc > PmlAccThr
    then do nothing
End if
If PdpiRate > PdpiThr && PmlAcc > PmlAccThr
    then increase Rthr
End if
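The update rule can be written as a small decision function. This is a sketch rather than the authors' implementation: the name `rthr_action` is hypothetical, the function returns the action label instead of adjusting Rthr because the slides give no step size, and values exactly equal to a threshold are treated as "not below" it, which is an assumption since the slide only uses strict inequalities. "restudy" is kept as written on the slide; it presumably means retraining the ML model.

```python
def rthr_action(t_m: int, t_d: int, t_r: int,
                p_dpi_thr: float, p_ml_acc_thr: float) -> str:
    """Return the action the slide's algorithm prescribes for Rthr."""
    if t_m <= 0 or t_d <= 0:
        raise ValueError("Tm and Td must be positive to compute the rates")
    p_ml_acc = t_r / t_d    # PmlAcc = Tr / Td
    p_dpi_rate = t_d / t_m  # PdpiRate = Td / Tm
    low_dpi = p_dpi_rate < p_dpi_thr
    low_acc = p_ml_acc < p_ml_acc_thr
    if low_dpi and low_acc:
        return "decrease Rthr"
    if low_acc:             # DPI rate at or above its threshold (assumed boundary)
        return "restudy"
    if low_dpi:             # accuracy at or above its threshold
        return "do nothing"
    return "increase Rthr"
```

For example, with Tm = 1000, Td = 200, and Tr = 150, PmlAcc is 0.75 and PdpiRate is 0.2; with thresholds of 0.8 and 0.3 both values are below their thresholds, so the rule says to decrease Rthr.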
22
• To save controller resources, classification is performed only on the
first 10 packets of a flow, to balance efficiency and accuracy.
• To accelerate classification, DPI is not used for encrypted
traffic, since it is not effective there. For example, the Selector will not choose DPI for
flows on port 443.
• In addition, as the controller runs on multi-core processors,
parallel processing performance is essential. Multi-classifier is
multithreaded to scale with CPU cores and uses a thread pool
to do the classification.
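The three optimizations can be sketched together as follows. All names here (`MAX_PACKETS_PER_FLOW`, `ENCRYPTED_PORTS`, `should_inspect`, `dpi_allowed`, `classify_all`) are hypothetical, and Python's standard `concurrent.futures` thread pool stands in for whatever thread pool the real controller uses.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, Optional

MAX_PACKETS_PER_FLOW = 10  # only the first 10 packets of a flow are classified
ENCRYPTED_PORTS = {443}    # ports on which DPI is skipped (example from the slide)


def should_inspect(packets_seen: int) -> bool:
    """True while fewer than 10 packets of the flow have been seen so far."""
    return packets_seen < MAX_PACKETS_PER_FLOW


def dpi_allowed(dst_port: int) -> bool:
    """DPI is not attempted on ports assumed to carry encrypted traffic."""
    return dst_port not in ENCRYPTED_PORTS


def classify_all(flows: Iterable, classify: Callable,
                 workers: Optional[int] = None) -> List:
    """Classify flows on a thread pool, preserving the input order."""
    with ThreadPoolExecutor(max_workers=workers or os.cpu_count()) as pool:
        return list(pool.map(classify, flows))
```

The sketch shows the structure only. In standard CPython the global interpreter lock limits the speed-up of CPU-bound work on threads, so a controller written in another language would benefit more directly from this design.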
23
The testbed runs on a server with the following
specification:
- 2 Intel Xeon processors with 6 cores, 2.0 GHz.
- 16 GB RAM.
- Ubuntu 12.04 LTS.
- Mininet.
• Mininet runs on a virtual machine that
is hosted on the server.
• A large amount of traffic was captured and saved into 5
data sets: 100 MB, 500 MB, 1 GB,
5 GB, and 10 GB.
• Each data set is then replayed from Host 1 to generate
“packet_in” messages to the controller.
24
Recognition or Accuracy Rate
Accuracy rate of ML and Multi-classifier:
- Remove all flows that DPI has not recognized. (Nf)
- Replay all packets to test ML.
- Nr is the number of packets that get the same result from both DPI and ML.
PmlAccResult = Nr / Nf
25
• DPI has a relatively high and stable recognition
rate, which means that the
DPI signature database contains
most of the signatures present in the test
data sets.
• ML has a relatively low accuracy rate
that varies between data sets, which
means that, given the training data, a particular
data set may contain too many packets that
ML recognizes falsely.
• Multi-classifier has better and more stable accuracy
than ML, because
DPI takes over when the ML reliability
is low.
26
Performance.
In general, ML runs faster than DPI, and both are
more stable than Multi-classifier,
because DPI and ML each have
more stable computation.
Multi-classifier performs poorly on the
500 MB data set because ML
accuracy is low there, so DPI must be performed.
However, Multi-classifier is still better than
DPI.
27
• SDN helps address the issue of proprietary device usage for
application-layer classification.
• By combining multiple application-layer classifiers, the system can achieve a much
higher classification speed while maintaining fairly accurate results.
• However, application-layer classification brings more traffic and computation
to the controller, which degrades its performance; as a result, the number
of switches managed by one controller cannot scale up.
• With the development of chips and distributed computing, this may change in the
future.
28
• Li Yungchun, Li Jingxuan, “MultiClassifier: A combination of DPI and ML for
application-layer classification in SDN”, 2nd International Conference on System
and Informatics, pp. 682-686, 2014.
• Qazi Z A, Lee J, Jin T, “Application-awareness in SDN”, Proceedings of the ACM
SIGCOMM 2013 conference on SIGCOMM. ACM, 2013: 478-448.
• Michael Jarschel, Florian Wamser, Thomas Höhn, Thomas Zinner, Phuoc Tran-Gia,
“SDN-based application-aware networking on the example of YouTube video
streaming”, Software Defined Networks (EWSDN), 2013 Second European
Workshop on. IEEE, 2013: 87-92.
29
End