Slides - IEEE CloudNet 2013

Classification of Applications in HTTP Tunnels
By Gajen Piraisoody, Changcheng Huang, Biswajit Nandy, Nabil Seddigh
Electrical and Computer Engineering
Carleton University
Ottawa, ON, Canada
12 November 2013
Outline
• Overview
• Motivation
• Problem Statement
• Contribution
• Approach to classification
• Evaluation
• Conclusion
Slide 2
Overview – HTTP Tunnel
➢ What is HTTP tunneled traffic?
• The HTTP port is normally used to carry web traffic
• Non-HTTP applications are wrapped in the HTTP protocol
• The HTTP port now tunnels email, chat, video, image, audio, file-transfer and
peer-to-peer traffic
➢ Why tunnel non-HTTP applications over HTTP?
• HTTP clients (browsers) are readily available and deployable
• Tunneling lets applications bypass restricted network connectivity
imposed by firewalls, proxies and NAT
Slide 3
Motivation
➢ HTTP Traffic Classification
• HTTP accounts for about 80% of the traffic in an entire network
• HTTP tunneled traffic is not identifiable by ports alone
• Tunneled traffic such as YouTube and Netflix is increasing in cloud networks
• Information on tunneled traffic helps cloud-centre management with planning,
provisioning and ensuring quality of service
➢ Why flow-based rather than DPI classification?
• Provides a scalable software solution (less CPU consumption)
• Can classify encrypted data
Slide 4
Problem Statement
➢ Given network traffic measured with NetFlow
➢ Find a way to classify HTTP tunneled traffic
• Audio (radio & music), video and file-transfer
➢ No training dataset is needed for the proposed algorithm
➢ Use only the information available from NetFlow
Slide 5
Contribution
➢ The proposed scheme classifies HTTP tunneled traffic: audio (radio
& music), video and file-transfer
➢ The proposed scheme helps audio classification by using the
‘occupancy’ feature
➢ The proposed scheme enhances classification performance by
including a flow-group found using flows from content
servers (subnet-masked IP of the long-flow)
Slide 6
Approach in detail
Identify long-flow HTTP traffic
Parameters: BPF
Classify radio traffic
Parameters: BPF, BPP, BPS, Occupancy
Classify music traffic
Parameters: BPF, BPP, BPS, Occupancy
Classify video traffic
Parameters: BPF, BPP, BPS, Flow-group
Classify file-transfer traffic
Parameters: BPF, BPP, BPS, Flow-group
Legend: BPS = bytes-per-second, BPF = bytes-per-flow, BPP = bytes-per-packet
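As a rough illustration of the three flow parameters named above, the sketch below derives BPF, BPP and BPS from a single flow record. The `Flow` class and its field names are hypothetical stand-ins for whatever a NetFlow export provides; they are not taken from the slides.

```python
from dataclasses import dataclass


@dataclass
class Flow:
    """Hypothetical NetFlow-style record; field names are illustrative only."""
    start: float   # flow start time, in seconds
    end: float     # flow end time, in seconds
    packets: int   # number of packets in the flow
    bytes: int     # number of bytes in the flow


def flow_features(flow: Flow) -> dict:
    """Return bytes-per-flow, bytes-per-packet and bytes-per-second for one flow."""
    duration = flow.end - flow.start
    return {
        "BPF": flow.bytes,
        # Guard against empty or zero-length flows instead of dividing by zero.
        "BPP": flow.bytes / flow.packets if flow.packets else 0.0,
        "BPS": flow.bytes / duration if duration > 0 else 0.0,
    }


features = flow_features(Flow(start=0.0, end=10.0, packets=1000, bytes=1_400_000))
```

Returning 0.0 for degenerate flows is a design choice made for this sketch; a real pipeline might prefer to drop such records.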
Slide 7
Approach to Classification
Identify Long-flow HTTP Traffic
Classify Audio Traffic
Classify Video & File-transfer Traffic
Slide 8
Identify Long-flow HTTP Traffic
➢ Identifying HTTP traffic
HTTP_PORTS: 80, 443, 1935, 8008, 8080, 8088, 8090
➢ A long-flow has a byte size larger than a threshold
➢ Audio, video and file-transfer flows are generally long-flows
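The port and size test described on this slide could be sketched as follows. The port list is the one shown above; the threshold value and the choice to check both source and destination ports are assumptions made for illustration, since the slide states neither.

```python
HTTP_PORTS = {80, 443, 1935, 8008, 8080, 8088, 8090}  # port list from the slide

# Assumed value for illustration only; the slide does not give the threshold.
LONG_FLOW_MIN_BYTES = 500_000


def is_long_http_flow(src_port, dst_port, byte_count,
                      min_bytes=LONG_FLOW_MIN_BYTES):
    """True if the flow touches an HTTP port and exceeds the byte threshold."""
    # Assumption: either endpoint may be the server side of the flow.
    uses_http_port = src_port in HTTP_PORTS or dst_port in HTTP_PORTS
    return uses_http_port and byte_count > min_bytes


# (src_port, dst_port, bytes) tuples as a stand-in for NetFlow records
flows = [(80, 51234, 2_000_000), (443, 40000, 12_000), (22, 50000, 9_000_000)]
long_http = [f for f in flows if is_long_http_flow(*f)]
```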
Slide 9
Classify Audio Traffic
99.4% of radio rates are between 20 and 320 Kbps (statistics from 3683 online radio web sites)
98% of online music rates are between 64 and 320 Kbps (statistics from >20 online music sites)
The 95% confidence interval of radio bytes-per-packet is 900 to 1470 (Samruay et al. [1])
The 95% confidence interval of music bytes-per-packet is 1260 to 1500 (Samruay et al. [1])
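One way to read these statistics is as range checks on a flow's rate and bytes-per-packet. The sketch below does that, assuming that 1 Kbps means 1000 bits per second, that the bounds are inclusive, and that a flow's average rate can be compared directly with the stream rates quoted above. None of these choices is stated on the slide.

```python
# Ranges quoted on the slide: rate in Kbps, bytes-per-packet (BPP)
RADIO = {"kbps": (20, 320), "bpp": (900, 1470)}
MUSIC = {"kbps": (64, 320), "bpp": (1260, 1500)}


def to_kbps(bytes_per_second):
    """Convert bytes per second to kilobits per second (1 Kbps = 1000 bit/s)."""
    return bytes_per_second * 8 / 1000


def audio_candidates(bps, bpp):
    """Return the set of audio labels whose rate and BPP ranges both match."""
    kbps = to_kbps(bps)
    matches = set()
    for label, ranges in (("radio", RADIO), ("music", MUSIC)):
        rate_ok = ranges["kbps"][0] <= kbps <= ranges["kbps"][1]
        bpp_ok = ranges["bpp"][0] <= bpp <= ranges["bpp"][1]
        if rate_ok and bpp_ok:
            matches.add(label)
    return matches
```

Because the two sets of ranges overlap, a flow can match both labels; these checks alone cannot separate radio from music.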
Slide 11
Classify Audio Traffic
➢ Behavioral analysis: an online audio listener typically listens to
audio for more than 5 minutes
➢ There are two distinct audio types: radio and music (songs)
[Figure: average download rate (Mbps), y-axis 0 to 6, for music (Grooveshark), radio (Hdradio) and video (CTV)]
➢ New concept: occupancy helps classify audio. Occupancy is the ratio of
the flow duration to the entire duration of a chunk of time.
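A minimal sketch of the occupancy ratio, assuming flows are given as (start, end) pairs in seconds and that overlapping flows are merged so that no instant is counted twice. The merging step is an assumption of this sketch; the slide only defines the ratio itself.

```python
def occupancy(flows, window_start, window_end):
    """Fraction of [window_start, window_end] covered by at least one flow.

    flows: iterable of (start, end) times in seconds.
    """
    window = window_end - window_start
    if window <= 0:
        raise ValueError("window must have positive length")

    # Clip each flow to the window and drop flows that fall entirely outside it.
    clipped = sorted(
        (max(s, window_start), min(e, window_end))
        for s, e in flows
        if e > window_start and s < window_end
    )

    covered = 0.0
    cur_start = cur_end = None
    for s, e in clipped:
        if cur_end is None or s > cur_end:
            # Gap before this flow: close the previous merged interval.
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = s, e
        else:
            # Overlapping or touching flow: extend the current interval.
            cur_end = max(cur_end, e)
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered / window
```

For example, back-to-back flows that span a whole 300-second window give an occupancy of 1.0, while two 30-second bursts in the same window give 0.2.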
Slide 12
Classify Audio Traffic
Difference between Radio & Music
Radio:
• Continuous: radio content appears to download every second of the flow
• The max/min size of a radio flow depends on the maximum flow-period configuration and the offered radio rates
• The 95% confidence interval of radio occupancy from DS-1, DS-2, SME-6, SME-7 and SME-8 is 82% to 100%
• Assumption: the minimum number of radio-flows is two (5 minutes at least)
• Assumption: the maximum radio-phase timeout is based on a flow-period (120 seconds)
Music:
• Dirac: songs in a playlist are downloaded & played one at a time
• The max/min size of a music flow depends on the max/min song duration and the offered online music rates
• The 95% confidence interval of music occupancy from DS-1, DS-2, SME-6, SME-7 and SME-8 is 0% to 55%
• Assumption: the minimum number of music-flows is two (5 minutes at least)
• The maximum music-phase timeout is based on the maximum song duration (382 seconds)
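The occupancy intervals above suggest a simple decision rule, sketched below. Treating the 95% confidence intervals as hard thresholds, and returning "unknown" for anything in between or with fewer than two flows, are assumptions of this sketch rather than details given on the slide.

```python
RADIO_OCCUPANCY = (0.82, 1.00)  # 95% confidence interval quoted for radio
MUSIC_OCCUPANCY = (0.00, 0.55)  # 95% confidence interval quoted for music
MIN_FLOWS = 2                   # minimum number of flows assumed on the slide


def label_audio_phase(occupancy, flow_count):
    """Label a group of audio flows as radio, music or unknown by occupancy."""
    if flow_count < MIN_FLOWS:
        return "unknown"
    if RADIO_OCCUPANCY[0] <= occupancy <= RADIO_OCCUPANCY[1]:
        return "radio"
    if MUSIC_OCCUPANCY[0] <= occupancy <= MUSIC_OCCUPANCY[1]:
        return "music"
    return "unknown"
```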
Slide 13
Background
• Multimedia Distribution (3 types)
[Diagram labels: Client, Server, Listening, 3) Metafile, Web Browser, HTTP Server, Media Player, CDN's Authoritative DNS Server, CDN_1, CDN_n]
Slide 15
Classify Video & File-transfer Traffic
➢ Video flow-attributes (bytes-per-packet, bytes-per-flow, download rates)
and the flow-group (FG) technique are used to classify video & file-transfers
➢ Flow-group (FG)
• A video flow is associated with meta-data, style sheets and advertisements
• Kei et al. [3] defined FG as the number of flows that occur within a
few seconds of the video-flow with the same destination-IP address
• Our expanded flow-group also includes flows that occur within a
longer duration and that have the same subnet-masked source-IP
address and the same destination-IP address
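The address test behind the expanded flow-group might look like the sketch below, which uses Python's standard `ipaddress` module. The /24 prefix length is an assumption chosen for illustration, since the slide does not say which subnet mask is applied, and the function names are hypothetical.

```python
import ipaddress


def masked_subnet(ip, prefix_len=24):
    """Return the network that contains ip after applying the subnet mask."""
    # strict=False lets a host address be given instead of a network address.
    return ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)


def same_expanded_group(vf_src, vf_dst, fg_src, fg_dst, prefix_len=24):
    """True if the sources share a masked subnet and the destinations match."""
    same_subnet = masked_subnet(vf_src, prefix_len) == masked_subnet(fg_src, prefix_len)
    return same_subnet and vf_dst == fg_dst
```

Masking the source address lets flows from neighbouring content servers count towards the same group even when their exact addresses differ.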
Slide 16
An Example
[Figure: Flow Size. y-axis: Log10(Bytes), 0 to 8; x-axis: flow-index, 1 to 36]
[Figure: Flow Duration. y-axis: Time (seconds), 0 to 90; x-axis: flow-index, 1 to 36]
Slide 17
Example cont'd
[Figure: Bytes-per-packet. y-axis: 0 to 1600; x-axis: flow-index, 1 to 36]
[Figure: Type of Flow. x-axis: flow-index, 0 to 36]
Slide 18
Classify Video & File-transfer Traffic
All flow-group statistics are estimated from datasets DS-4 and DS-5
Kei et al.'s flow-group: 98% of the flow-group is within 4 seconds
before the video-flow and 97.8% is within 1 second after the video-flow
[Figure: flow-group range (seconds) relative to the video-flow at 0, with marks at -60, -4, 1 and 10]
Improved flow-group: 94.4% of the flow-group is within 60 seconds
before the video-flow and 94.1% is within 10 seconds after the video-flow
- 92.6% of flow-group bytes-per-flow values are above 1000 and below 500000
- Almost 100% of flow-group bytes-per-packet values are above 200
Slide 19
Classify Video & File-transfer Traffic
Green is the original flow-group (FG); yellow is the improved flow-group. Both FGs are run:
Start
Gather potential V/F flows:
• flow > 0.5 MB
• & > 1260 bytes-per-pkt
• & > 128 Kbps
• & order by destination-IP and flow start time
For every potential V/F flow, gather potential flow-group (FG) flows when:
• FG flow > V/F start-time – 4
• & FG flow < V/F start-time + 1
• & FG flow and V/F have the same dest-IP
• & FG flow between 1000 B and 0.5 MB
• & FG flow between 200 and 1500 BPP
If FG == true:
  inc FG counter
For V/F-phase, gather potential FG flows:
• Same source IP address-subnet
• Same destination IP address
• & FG flow > V/F start-time – 60
• & FG flow < V/F start-time + 10
• & FG flow between 1000 B and 0.5 MB
• & FG flow between 200 and 1500 BPP
If FG == true:
  inc FG counter
If FG > 0:
  Label video
else:
  Label file-transfer
End
Slide 20
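The flowchart on this slide can be approximated in code as below. This is a sketch under stated assumptions, not the authors' implementation: the `Flow` record and its fields are hypothetical, 0.5 MB is taken as 500,000 bytes, the source subnet is a /24, and a companion flow that satisfies both the original and the expanded window is counted once rather than twice (only "FG > 0" affects the label).

```python
from dataclasses import dataclass
import ipaddress


@dataclass
class Flow:
    """Hypothetical flow record; field names are illustrative only."""
    src_ip: str
    dst_ip: str
    start: float     # seconds
    duration: float  # seconds
    packets: int
    bytes: int

    @property
    def bpp(self):
        return self.bytes / self.packets if self.packets else 0.0

    @property
    def kbps(self):
        return self.bytes * 8 / 1000 / self.duration if self.duration > 0 else 0.0


def is_vf_candidate(f):
    """Potential video/file-transfer flow: > 0.5 MB, > 1260 BPP, > 128 Kbps."""
    return f.bytes > 500_000 and f.bpp > 1260 and f.kbps > 128


def is_fg_sized(f):
    """Companion flow size limits: 1000 B to 0.5 MB and 200 to 1500 BPP."""
    return 1000 <= f.bytes <= 500_000 and 200 <= f.bpp <= 1500


def subnet(ip, prefix_len=24):  # /24 is an assumed mask
    return ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)


def count_flow_group(vf, flows):
    """Count companion flows around vf using both flow-group windows."""
    count = 0
    for f in flows:
        if f is vf or not is_fg_sized(f) or f.dst_ip != vf.dst_ip:
            continue
        offset = f.start - vf.start
        original = -4 < offset < 1  # original window, same destination IP
        expanded = -60 < offset < 10 and subnet(f.src_ip) == subnet(vf.src_ip)
        if original or expanded:
            count += 1
    return count


def classify(flows):
    """Map the index of each candidate flow to 'video' or 'file-transfer'."""
    labels = {}
    for i, f in enumerate(flows):
        if is_vf_candidate(f):
            labels[i] = "video" if count_flow_group(f, flows) > 0 else "file-transfer"
    return labels
```

A large flow preceded by a small flow from the same server subnet to the same client would be labelled video, while an isolated large flow would be labelled file-transfer.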
Evaluation
➢ Datasets used to test algorithms
➢ Accuracy measurement assessment
• Precision is the proportion of the system's predictions that are correct: precision = TP / (TP + FP)
• Recall is the proportion of actual positives that the system predicts correctly: recall = TP / (TP + FN)
• F-measure is the harmonic mean of recall and precision: F-measure = 2 * Precision * Recall / (Precision + Recall)
• Accuracy is the proportion of true results: accuracy = (TP + TN) / (TP + FP + FN + TN)
➢ Compare against other algorithms
• NaïveBayes
• SVM (Support Vector Machine)
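The four measures can be computed directly from confusion-matrix counts, as in this short sketch. Returning 0.0 when a denominator is zero is a convention chosen here, not something the slide specifies.

```python
def precision(tp, fp):
    """Correct positive predictions over all positive predictions."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Correct positive predictions over all actual positives."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def f_measure(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


def accuracy(tp, tn, fp, fn):
    """True results (TP + TN) over all results."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


# Example counts, invented purely for illustration
p = precision(tp=8, fp=2)
r = recall(tp=8, fn=2)
f = f_measure(p, r)
a = accuracy(tp=8, tn=8, fp=2, fn=2)
```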
Slide 21
Evaluation – Datasets
                     SME-6          SME-7          SME-8
Date                 1/7/2013       1/22/2013      1/23/2013
Duration (s)         24723          28207          13628
Start-time (GMT-5)   10:18:04       10:29:04       10:56:20
Flows                249822         287616         198409
Packets              13376109       15351639       10170693
Bytes                11158181285    13589511746    8728052938
HTTP Flows           75485          87181          63951
HTTP Packets         7346663        8814438        5628558
HTTP Bytes           10456335955    12545720613    7982629610
Slide 22
Evaluation – Results
[Figure: F-Measure of NaiveBayes, SVM and the Proposed Algorithm for SME6-Audio, SME6-File, SME6-Video, SME7-Audio, SME7-File, SME7-Video, SME8-Audio, SME8-File and SME8-Video. Bar labels as extracted: 94.2%, 93.6%, 93.0%, 89.7%, 84.9%, 79.7%, 72.9%, 86.6%, 85.1%, 82.5%, 70.8%, 66.5%, 59.5%, 60.8%, 64.0%, 60.4%, 56.1%, 49.1%, 42.6%, 39.4%, 43.1%, 40.4%, 27.5%, 23.2%, 21.6%, 16.8%, 12.5%]
Slide 23
Evaluation – Results
Accuracy
                     SME-6    SME-7    SME-8
NaiveBayes           39.1%    73.5%    71.4%
SVM                  17.8%    16.3%    42.0%
Proposed Algorithm   70.5%    89.9%    90.9%
Slide 24
Conclusion
• The proposed algorithm uses a flow-based approach and
classifies a high percentage of tunneled traffic: audio, video
and file-transfer
• Proposed audio algorithm:
  • Uses a concept called occupancy to classify radio & music traffic
• Proposed video & file-transfer algorithm:
  • Uses an improved flow-group method to help increase
  classification accuracy of video and file-transfer traffic
• The proposed scheme's F-measure is at least 10% higher than that of
NaiveBayes and SVM
Slide 25
References
[1] Samruay Kaoprakhon, Vasaka Visoottiviseth, "Classification of Audio and Video Traffic over HTTP Protocol," in Communications and Information
Technology, 2009. ISCIT 2009. 9th International Symposium on, Sept. 2009.
[2] M. Twardos, "The Information Diet," 2011. [Online]. Available: http://theinformationdiet.blogspot.ca/2011/11/probability-distribution-of-song-length.html.
[Accessed 2013].
[3] K. Takeshita, T. Kurosawa, M. Tsujino and M. Iwashita, "Evaluation of HTTP Video Classification Method Using Flow Group Information," in
Telecommunications Network Strategy and Planning Symposium (NETWORKS), 2010 14th International, Sept. 2010.
[4] H. Kim, K. Claffy, M. Fomenkov, D. Barman, M. Faloutsos, K. Lee, "Internet Traffic Classification Demystified: Myths, Caveats, and the Best Practices,"
in ACM CoNEXT, 2008.
[5] D. M. W. Powers, "Evaluation: From Precision, Recall and F-Measure to ROC, Informedness, Markedness & Correlation," in
Journal of Machine Learning Technologies, Volume 2, Issue 1, 2011, pp. 37-63.
Slide 26