Profiling Network Performance in Multi


RelSamp: Preserving Application Structure in Sampled Flow Measurements
Myungjin Lee, Mohammad Hajjat, Ramana Rao Kompella, Sanjay Rao

A plethora of Internet applications
1) Emergence of new applications
2) Measure/Monitor
3) Characterization

Objectives
Re-provision networks
Detect undesirable behaviors of applications
Better prepare networks for major application trends

Monitoring applications at an edge
Goal: Monitoring application behavior
[Figure: Sampled NetFlow at the edge router between the Internet and an enterprise network]

Current Solution: Sampled NetFlow
Estimates the number of flows
Estimates the number of packets
Supported by most modern routers
Key limitation: Application session structure gets distorted
Few flows captured per application session
Few packets captured per application session
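
Why the session structure gets distorted can be seen with a minimal sketch of uniform packet sampling. The flow key (source IP, destination IP, destination port) and the session below are hypothetical simplifications. At a 1% sampling rate, a 200-packet session yields about 2 sampled packets in expectation, so on average at most about 2 of its 20 flows can appear in the flow records.

```python
import random

def sampled_netflow(packets, p, seed=0):
    """Sketch of Sampled NetFlow: each packet is sampled independently with
    probability p; a sampled packet creates or updates its flow's record."""
    rng = random.Random(seed)
    records = {}  # flow key -> number of sampled packets
    for flow_key in packets:
        # random() is uniform on [0, 1), so this test succeeds with probability p
        if rng.random() < p:
            records[flow_key] = records.get(flow_key, 0) + 1
    return records

# Hypothetical application session: one host opens 20 flows of 10 packets each.
# Flow key is a simplified (src IP, dst IP, dst port) tuple, not a full 5-tuple.
session = [("10.0.0.1", "192.0.2.%d" % i, 6881) for i in range(20) for _ in range(10)]

records = sampled_netflow(session, p=0.01)
print(f"{len(records)} of 20 flows captured")
```

Because every packet is sampled independently of the others, nothing ties the sampled packets of one flow to the other flows of the same session.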

Preserving application structure in flow measurements
Benefit 1: Enables continuous monitoring of applications
Better understanding of communication patterns
Better understanding of characteristics (number of flows, packets)
Benefit 2: Application classification becomes easier
Statistical machine learning techniques: SVM, C4.5, etc.
Social behavior-based classifier: BLINC
Benefit 3: Detecting undesirable traffic patterns of an application

Contributions
Introduce the notion of related sampling
Flows belonging to the same application session are sampled with higher probability
Propose the RelSamp architecture for realizing related sampling
Uses three stages of sampling to preserve application structure
Show efficacy in preserving application structure
Captures more flows per application session
Significantly increases application classification accuracy

Related sampling
[Figure: flows of App1, App2, and App3 in the original application structure, compared with the flows kept by Sampled NetFlow and by related sampling]
Key idea: Sample more flows from fewer application sessions



Realizing related sampling
Question 1: How to sample an application session?
Question 2: How to sample packets within an application session?

Defining application session
A sequence of packets from an application on a given host with inter-arrival time ≤ τ seconds
Packets may belong to different flows going to different destinations
Example 1: BitTorrent connections to several destinations within a short span of time constitute one application session
Example 2: Web connections from a browser several seconds apart (i.e., with gaps larger than τ) constitute different application sessions
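
The session definition can be applied offline to a labeled trace. This sketch assumes every packet already carries an application label (for example from DPI-based ground truth), which is exactly what a router lacks online. The tuple layout, the labels, and τ = 2 s are hypothetical.

```python
from collections import defaultdict

def split_sessions(packets, tau):
    """Group packets into application sessions.

    packets: iterable of (timestamp, host, app, flow_key) tuples; the app label
    is assumed to be known (e.g., from offline DPI).
    A session is a maximal run of packets from the same (host, app) pair in
    which consecutive packets are at most tau seconds apart. Flows to
    different destinations can fall into the same session.
    """
    by_host_app = defaultdict(list)
    for ts, host, app, flow_key in packets:
        by_host_app[(host, app)].append((ts, flow_key))

    sessions = []  # list of (host, app, [(ts, flow_key), ...])
    for (host, app), pkts in by_host_app.items():
        pkts.sort(key=lambda pkt: pkt[0])
        current = [pkts[0]]
        for prev, cur in zip(pkts, pkts[1:]):
            if cur[0] - prev[0] <= tau:
                current.append(cur)      # gap <= tau: same session
            else:
                sessions.append((host, app, current))
                current = [cur]          # gap > tau: start a new session
        sessions.append((host, app, current))
    return sessions

# Hypothetical trace mirroring Examples 1 and 2 (tau = 2 s).
trace = [
    (0.0, "h1", "web", ("192.0.2.1", 80)),
    (0.5, "h1", "web", ("192.0.2.2", 80)),
    (10.0, "h1", "web", ("192.0.2.3", 80)),    # 9.5 s later: new web session
    (0.1, "h1", "bittorrent", ("198.51.100.1", 6881)),
    (0.3, "h1", "bittorrent", ("198.51.100.2", 6881)),
    (0.6, "h1", "bittorrent", ("198.51.100.3", 6881)),
]
for host, app, pkts in split_sessions(trace, tau=2.0):
    print(host, app, len({flow for _, flow in pkts}), "flows")
```

On this trace the function yields two web sessions and one BitTorrent session that spans three destinations.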

Sampling an application session
One possible approach: similar to Sampled NetFlow
Sample packets with some probability
Create an application session record if no record exists
Otherwise, update the application session record
Problem: Hard to do in an online fashion
No application session identifier (analogous to a flow key)
Need to know all flows that constitute an application session
DPI-based techniques are both difficult and incomplete

Our approach: sampling hosts
Observation: A host is a superset of its application sessions
Sample more flows from the same host
Flows originating at the same host closely in time typically belong to few application sessions
About 80% of hosts run fewer than 2 applications in our study
More details in the paper
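
A statistic such as "about 80% of hosts run fewer than 2 applications" is a per-host count of distinct application labels. This minimal sketch assumes the flows are already labeled (for example by a DPI tool) and come from a single observation window. The paper's actual window and labeling method are not given here, and the data below is hypothetical.

```python
from collections import defaultdict

def fraction_of_hosts_below(flows, k):
    """flows: iterable of (host, app_label) pairs, one per flow, assumed to come
    from one observation window. Returns the fraction of hosts running fewer
    than k distinct applications."""
    apps_per_host = defaultdict(set)
    for host, app in flows:
        apps_per_host[host].add(app)
    if not apps_per_host:
        return 0.0
    below = sum(1 for apps in apps_per_host.values() if len(apps) < k)
    return below / len(apps_per_host)

# Hypothetical labeled flows.
flows = [("h1", "web"), ("h1", "web"), ("h2", "web"), ("h2", "p2p"), ("h3", "dns")]
print(fraction_of_hosts_below(flows, k=2))
```

Here h1 and h3 each run one application and h2 runs two, so the result is 2/3.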

RelSamp design
Three-stage sampling process consisting of host, flow, and packet selection stages
Host stage: hash-based sampling
No state maintained on a per-application basis
Many application sessions of a given host may be sampled
Change hash function periodically to track different hosts
Flow and packet stages: random packet sampling
Control the fraction of flows sampled in an application session and the fraction of packets sampled in a flow
Post-processing: Flow records can be separated into application sessions using port-based/statistical classifiers
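
Host-stage selection can be sketched as hash-range sampling. The following are assumptions, not RelSamp's specified design: SHA-256 stands in for whatever hash a router would implement, salting the hash with an epoch number models "change hash function periodically", and the 300-second rotation period is hypothetical.

```python
import hashlib

HASH_SPACE = 2 ** 32  # hypothetical 32-bit hash space

def host_hash(src_ip, epoch):
    """Hypothetical host hash (SHA-256 as a stand-in): the epoch acts as a
    salt, so a new epoch means a new hash function and a new set of hosts."""
    digest = hashlib.sha256(f"{epoch}:{src_ip}".encode()).digest()
    return int.from_bytes(digest[:4], "big")  # value in [0, HASH_SPACE)

def host_selected(src_ip, ph, epoch):
    """Select a host iff H(SrcIP) falls in [0, selection_range).
    Since Ph = selection range / hash space, a host is selected w.p. ~Ph."""
    selection_range = int(ph * HASH_SPACE)
    return host_hash(src_ip, epoch) < selection_range

def current_epoch(now_s, period_s=300):
    """Rotate the hash function every period_s seconds (period is an assumption)."""
    return int(now_s // period_s)

# All packets of a selected host pass within an epoch; no per-application
# state is kept, so every application session of that host can be sampled.
epoch = current_epoch(1_000.0)
hosts = [f"10.0.{i // 256}.{i % 256}" for i in range(10_000)]
picked = [h for h in hosts if host_selected(h, ph=0.03, epoch=epoch)]
print(len(picked), "of", len(hosts), "hosts selected")
```

Because the decision depends only on the source IP and the current epoch, it needs no per-host or per-session table.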
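
RelSamp architecture
[Figure: a packet passes the host-level bias stage (Ph) and is then handled by the flow-level (Pf) and packet-level (Pp) bias stages, which share a flow memory; Ph, Pf, and Pp are tunable parameters]
Host-level bias stage: a packet passes if H(SrcIP) falls within the selection range of the hash space, with Ph = selection range / hash space
Flow-level bias stage: if (random no. ≤ Pf && no flow record) create a flow record
Packet-level bias stage: if (random no. ≤ Pp && flow record) update the flow record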
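
The three bias stages can be combined into one per-packet routine. This is a minimal sketch under stated assumptions. A packet that creates a flow record is not also counted by the packet-level stage, because the slide does not specify how the two copies are ordered. The creating packet initializes the record's count. The class name and the SHA-256 host hash are hypothetical.

```python
import hashlib
import random

HASH_SPACE = 2 ** 32  # hypothetical 32-bit hash space

class RelSampSketch:
    """Hypothetical sketch of RelSamp's three bias stages (host, flow, packet).

    Assumption: a packet that creates a flow record is not also counted by the
    packet-level stage, and the creating packet initializes the record's count.
    """

    def __init__(self, ph, pf, pp, seed=0):
        self.ph, self.pf, self.pp = ph, pf, pp
        self.rng = random.Random(seed)
        self.flow_memory = {}  # flow key -> number of sampled packets

    def _host_stage(self, src_ip):
        # Host-level bias: pass iff H(SrcIP) < Ph * hash space (SHA-256 stand-in).
        h = int.from_bytes(hashlib.sha256(src_ip.encode()).digest()[:4], "big")
        return h < int(self.ph * HASH_SPACE)

    def process(self, src_ip, flow_key):
        if not self._host_stage(src_ip):
            return
        # random() is uniform on [0, 1), so "< p" succeeds with probability p.
        if flow_key not in self.flow_memory:
            # Flow-level bias: create a record with probability Pf.
            if self.rng.random() < self.pf:
                self.flow_memory[flow_key] = 1
        elif self.rng.random() < self.pp:
            # Packet-level bias: update an existing record with probability Pp.
            self.flow_memory[flow_key] += 1

# Hypothetical session: host 10.0.0.1 opens 20 flows of 10 packets each.
# Ph = 1.0 here so the host is certainly selected.
sampler = RelSampSketch(ph=1.0, pf=0.76, pp=0.0001)
for i in range(20):
    for _ in range(10):
        sampler.process("10.0.0.1", ("10.0.0.1", f"192.0.2.{i}", 6881))
print(len(sampler.flow_memory), "of 20 flows have a record")
```

With Pf = 0.76, a 10-packet flow misses its record only with probability 0.24^10, so nearly all 20 flows of the selected host get a record. The tiny Pp then keeps the number of packet updates per flow low.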

Exploring parametric space
Router sampling budget Pe = f(Ph, Pf, Pp)
Trade-off between accuracy of flow statistics and number of flows per application session
Parameters can be tuned depending on the objective and the network environment
Examples of tuning parameters by objective
Application classification: low Ph, high Pf, low Pp
Application characterization: lower Ph, high Pf, high Pp
Flow statistics of all flows: Ph = Pf = Pp = Pe

Evaluation goals
Application characterization
Question 1: Is RelSamp effective for sampling more flows in an application session?
Question 2: Can RelSamp estimate statistics of an application session?
Application classification
Question 3: Is sampling more flows in an application session beneficial for application classification?

Experimental setup
Evaluation of effectiveness for capturing more flows
Trace 1: 1-hour packet trace collected at an edge
RelSamp configuration (other settings in paper): Capture more flows of each app session from many hosts
Ph = 0.03, Pf = 0.76, Pp = 0.0001 (Pe ≈ 0.001)
Evaluation of application classification accuracy
Trace 2: 13-hour full-payload trace captured at a dorm network
RelSamp setting: Similar setting, but Pf varies from 0.1 to 1.0
Classifiers: BLINC [SIGCOMM ’05], SVM, and C4.5
Ground truth is obtained using a DPI-based classifier (Tstat)

[Figure: CDF of flows per application session, measured as #captured flows / #total flows in an app session; annotated "More # of flows per app session"]
[Figure: Accuracy (%) of the BLINC classifier vs. sampling rate; annotated "~50% increase". Note: classification results on flows using non-standard ports]
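
The capture metric in the first figure (#captured flows / #total flows in an app session) and its CDF can be computed as follows. This is a minimal sketch that assumes ground-truth sessions are available, for example from offline sessionization of the full trace. It also assumes every session is included, even one with no sampled flow, because the slide does not say how such sessions are treated. The session IDs and flow keys are hypothetical.

```python
def capture_ratios(session_flows, sampled_flow_keys):
    """Fraction of flows captured per application session.

    session_flows: dict session id -> set of flow keys in that session
                   (assumed ground truth from the full trace).
    sampled_flow_keys: set of flow keys that received a sampled flow record.
    Sessions with zero captured flows are kept (ratio 0.0) by assumption.
    """
    return {sid: len(flows & sampled_flow_keys) / len(flows)
            for sid, flows in session_flows.items() if flows}

def empirical_cdf(values):
    """Return (x, F(x)) points of the empirical CDF of values."""
    xs = sorted(values)
    n = len(xs)
    return [(x, (i + 1) / n) for i, x in enumerate(xs)]

# Hypothetical sessions and sampled records.
sessions = {"s1": {"f1", "f2", "f3", "f4"}, "s2": {"f5", "f6"}, "s3": {"f7"}}
sampled = {"f1", "f2", "f5", "f7"}
ratios = capture_ratios(sessions, sampled)
print(ratios)  # {'s1': 0.5, 's2': 0.5, 's3': 1.0}
print(empirical_cdf(ratios.values()))
```

Plotting these CDF points for Sampled NetFlow and RelSamp on the same axes would give a comparison of the kind the figure annotation describes.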

Related work
Flow Sampling [ToN ’06]
Flow Slices [IMC ’05]
Focuses on controlling router resources (CPU and memory)
cSamp [NSDI ’08]
Samples flows once flow record is created
Supports sampling of all traffic by coordinating various vantage points in a network
FlexSample [IMC ’08]
Supports monitoring of traffic subpopulations, but needs to maintain extra state for approximate checking of predicates

Summary
Introduced the notion of related sampling
Samples more related flows in the same application session with higher probability
Proposed the RelSamp architecture
Preserves application structure in sampled flow records
Effective in preserving application session structure
5-10x more flows per application session compared to Sampled NetFlow
Up to 50% higher classification accuracy than Sampled NetFlow
Thank you! Questions?
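
Evaluation method of classification techniques
[Figure: the packet trace is fed to a DPI-based classifier (Tstat), which produces the ground truth, and to RelSamp, Sampled NetFlow, and Flow Sampling, which produce Flow Record1, Flow Record2, and Flow Record3; a classification algorithm (e.g., BLINC, SVM, C4.5) runs on each set of flow records and reports the # of accurately classified flows against the ground truth]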
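
The final scoring step compares classifier output on sampled flow records with the DPI ground truth. This is a minimal sketch under assumptions not stated on the slide. Accuracy is taken as a percentage of the sampled flow records. A flow the classifier leaves unclassified (marked `None`, a hypothetical convention) counts as incorrect. The labels are hypothetical.

```python
def classification_accuracy(predicted, ground_truth):
    """Score a classifier run on sampled flow records against DPI ground truth.

    predicted: dict flow key -> label from the classifier (None if unclassified,
               a hypothetical convention).
    ground_truth: dict flow key -> label from the DPI-based classifier.
    Returns (# accurately classified flows, accuracy in % of sampled flows);
    the choice of denominator is an assumption.
    """
    correct = sum(1 for key, label in predicted.items()
                  if label is not None and ground_truth.get(key) == label)
    accuracy = 100.0 * correct / len(predicted) if predicted else 0.0
    return correct, accuracy

# Hypothetical labels for three sampled flows.
pred = {"f1": "web", "f2": "p2p", "f3": None}
truth = {"f1": "web", "f2": "web", "f3": "dns"}
print(classification_accuracy(pred, truth))
```

The same function can be applied to each of the three flow-record sets, so the sampling methods are scored against one shared ground truth.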

Comparison with other solutions using BLINC
[Figure: BLINC classification results vs. sampling rate. Note: classification results on flows using non-standard ports]