RelSamp:
Preserving Application Structure
in Sampled Flow Measurements
Myungjin Lee, Mohammad Hajjat,
Ramana Rao Kompella, Sanjay Rao
A plethora of Internet applications
1) Emergence of new applications
2) Measure/Monitor
3) Characterization
Objectives
Re-provision networks
Detect undesirable behaviors of applications
Prepare network better against major application trends
Monitoring applications at an edge
Goal: Monitoring application behavior
[Figure: an edge router between the Internet and an enterprise network, running Sampled NetFlow]
Current Solution: Sampled NetFlow
Identify number of flows
Identify number of packets
Supported by most modern routers
Key limitation: Application session
structure gets distorted
Small # of flows per application session
Small # of packets per application session
Preserving application structure in flow
measurements
Benefit 1: Enables continuous monitoring of applications
Benefit 2: Application classification becomes easier
Better understanding about communication patterns
Better understanding of characteristics (# of flows, packets)
Statistical machine learning techniques: SVM, C4.5, etc.
Social behavior-based classifier: BLINC
Benefit 3: Detecting undesirable traffic patterns of an
application
Contributions
Introduce the notion of related sampling
Propose RelSamp architecture for realizing related sampling
Flows belonging to the same application session are sampled
with higher probability
Uses three stages of sampling to preserve application structure
Show efficacy in preserving application structure
Captures more flows per application session
Significantly increases application classification accuracy
Related sampling
[Figure: flows of App1, App2, and App3 shown under the original application structure, Sampled NetFlow, and related sampling]
Key idea: Sample more flows from fewer application sessions
Realizing related sampling
Question 1: How to sample an application session?
Question 2: How to sample packets within an application session?
Defining application session
A sequence of packets from an application on a given
host with inter-arrival time ≤ τ seconds
Packets may belong to different flows to different destinations
Example 1: BitTorrent connections to several destinations
within a short span of time constitute an application session
Example 2: Web connections from a browser several seconds
apart constitute different application sessions
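The definition above can be illustrated offline with a minimal sketch. It assumes each packet already carries an application label (for example from ground truth), which a router would not have online; `split_sessions` and the tuple layout are hypothetical names chosen for illustration.

```python
from collections import defaultdict

def split_sessions(packets, tau):
    """Split packets into application sessions.

    packets: iterable of (timestamp_seconds, host, app_label, flow_key).
    Returns {(host, app_label): [session, ...]}, where each session is the
    list of flow keys seen in arrival order (duplicates kept).
    """
    sessions = defaultdict(list)
    last_seen = {}
    for ts, host, app, flow in sorted(packets, key=lambda p: p[0]):
        key = (host, app)
        # Start a new session on the first packet or after a gap longer than tau.
        if key not in last_seen or ts - last_seen[key] > tau:
            sessions[key].append([])
        sessions[key][-1].append(flow)
        last_seen[key] = ts
    return dict(sessions)

# Three packets within one second, then a fourth after a long gap.
trace = [
    (0.0, "10.0.0.5", "bittorrent", "f1"),
    (0.4, "10.0.0.5", "bittorrent", "f2"),
    (0.9, "10.0.0.5", "bittorrent", "f1"),
    (30.0, "10.0.0.5", "bittorrent", "f3"),
]
result = split_sessions(trace, tau=1.0)
```

Here the first three packets fall into one session spanning two flows, and the late packet opens a second session.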
Sampling an application session
One possible approach: Similar to Sampled NetFlow
Sample packets with some probability
Create an application session record if no record exists
Update the application session record
Problem: Hard to do in an online fashion
No application session identifier (like flow key)
Need to know all flows that constitute an application session
DPI-based techniques are both difficult and incomplete
Our approach: sampling hosts
Observation: Host is a super-set of an application session
Sample more flows from the same host
Flows originating at the same host close together in time typically
belong to a few application sessions
About 80% of hosts run fewer than 2 applications in our study
More details in the paper
RelSamp design
Three-stage sampling process consisting of host, flow, and
packet selection stages
Host stage: hash-based sampling
Flow and packet stages: random packet sampling
No state maintained on a per-application basis
Many application sessions for a given host are possibly sampled
Change hash function periodically to track different hosts
Controls fraction of flows sampled in an application session
and packets sampled in a flow
Post processing: Can separate flow records into
application sessions using port-based/statistical classifiers
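The host stage can be sketched as follows. This is only an illustration: the choice of SHA-256, the 32-bit hash space, and the per-epoch salt used to model "changing the hash function periodically" are assumptions, not details taken from the slides, and `host_hash`, `host_selected`, and `epoch_salt` are hypothetical names.

```python
import hashlib

HASH_SPACE = 2 ** 32  # assumed size of the hash space

def epoch_salt(epoch: int) -> bytes:
    """Derive a salt from an epoch counter; a new epoch acts as a new hash function."""
    return epoch.to_bytes(8, "big")

def host_hash(src_ip: str, salt: bytes) -> int:
    """Map a source IP to a point in the hash space [0, HASH_SPACE)."""
    digest = hashlib.sha256(salt + src_ip.encode("ascii")).digest()
    return int.from_bytes(digest[:4], "big")

def host_selected(src_ip: str, salt: bytes, ph: float) -> bool:
    """Select the host if its hash falls in the first ph fraction of the hash space."""
    selection_range = int(ph * HASH_SPACE)
    return host_hash(src_ip, salt) < selection_range

hosts = [f"10.0.{i // 256}.{i % 256}" for i in range(10000)]
picked_e0 = {h for h in hosts if host_selected(h, epoch_salt(0), 0.1)}
picked_e1 = {h for h in hosts if host_selected(h, epoch_salt(1), 0.1)}
```

Because selection depends only on the source IP and the current salt, every packet from a selected host passes the stage without any per-host state, and moving to the next epoch shifts the selection to a different set of hosts.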
RelSamp architecture
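[Figure: RelSamp architecture. Packets pass through three bias stages with tunable parameters Ph, Pf, and Pp; sampled packets create or update records in the flow memory.]
Host-level bias stage (Ph): compute H(SrcIP) and select the host if the hash falls within the selection range
Ph = selection range / hash space
Flow-level bias stage (Pf):
if ( random no. ≤ Pf && no flow record)
create a flow record
Pkt-level bias stage (Pp):
if ( random no. ≤ Pp && flow record)
update the flow record
Tunable parameters: Ph, Pf, Pp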
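The three stages can be put together in a minimal sketch. It assumes that packets from unselected hosts are simply dropped, which the slides do not state, and it takes the host-selection test as a caller-supplied function; `ThreeStageSampler` and its field names are hypothetical.

```python
import random
from typing import Callable, Dict, Optional, Tuple

# (src_ip, dst_ip, src_port, dst_port, protocol)
FlowKey = Tuple[str, str, int, int, str]

class ThreeStageSampler:
    def __init__(self, select_host: Callable[[str], bool], pf: float, pp: float,
                 rng: Optional[random.Random] = None):
        self.select_host = select_host   # host-level stage (e.g. hash-based)
        self.pf = pf                     # flow-level sampling probability
        self.pp = pp                     # packet-level sampling probability
        self.rng = rng or random.Random()
        self.flow_memory: Dict[FlowKey, Dict[str, int]] = {}

    def observe(self, flow: FlowKey, size: int) -> None:
        # Host stage. Assumption: packets from unselected hosts are ignored.
        if not self.select_host(flow[0]):
            return
        record = self.flow_memory.get(flow)
        if record is None:
            # Flow stage: no record yet, create one with probability pf.
            if self.rng.random() <= self.pf:
                self.flow_memory[flow] = {"packets": 1, "bytes": size}
        elif self.rng.random() <= self.pp:
            # Packet stage: record exists, update it with probability pp.
            record["packets"] += 1
            record["bytes"] += size

flow = ("10.0.0.5", "192.0.2.9", 51000, 6881, "tcp")
sampler = ThreeStageSampler(select_host=lambda ip: True, pf=1.0, pp=1.0)
for _ in range(3):
    sampler.observe(flow, size=100)
```

With a high `pf` and a low `pp`, most flows of a selected host get a record while each record counts only a few packets, which matches the slide's point that the parameters control both the fraction of flows and the fraction of packets sampled.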
Exploring parametric space
Router sampling budget Pe = f(Ph, Pf, Pp)
Trade-off between accuracy of flow statistics and #
flows/application session
Parameters can be tuned depending on
Objective
Network environment
Examples of tuning parameters by objective
Application classification: low Ph, high Pf, low Pp
Application characterization: lower Ph, high Pf, high Pp
Flow statistics of all flows: Ph = Pf = Pp = Pe
Evaluation goals
Application characterization
Question 1: Is RelSamp effective at sampling more flows
in an application session?
Question 2: Can RelSamp estimate statistics of an application
session?
Application classification
Question 3: Is sampling more flows in an application session
beneficial for application classification?
Experimental setup
Evaluation of effectiveness for capturing more flows
Trace 1: 1 hour packet trace collected at an edge
RelSamp configuration (other settings in paper): Capture more
flows of app session from many hosts
Ph = 0.03, Pf = 0.76, Pp = 0.0001 (Pe ≈ 0.001)
Evaluation of application classification accuracy
Trace 2: 13-hour full-payload trace captured at a dorm network
RelSamp setting: Similar setting, but Pf varies from 0.1 to 1.0
Classifiers: BLINC [SIGCOMM ’05] , SVM, and C4.5
Ground truth is obtained using DPI-based classifier (tstat)
Flows per application session
[Figure: CDF of # captured flows / # total flows in an app session. Annotation: more flows per app session]
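The plotted metric, captured flows over total flows per application session, could be computed offline along these lines. The input layout (a mapping from session ID to its set of flow keys, plus the set of flow keys that ended up in sampled records) and the function names are assumptions made for illustration.

```python
def session_coverage(total_flows, captured_flows):
    """Return {session_id: fraction of that session's flows that were captured}."""
    return {
        sid: len(flows & captured_flows) / len(flows)
        for sid, flows in total_flows.items()
        if flows  # skip empty sessions to avoid division by zero
    }

def empirical_cdf(values):
    """Return sorted (value, cumulative fraction) pairs for plotting a CDF."""
    xs = sorted(values)
    n = len(xs)
    return [(x, (i + 1) / n) for i, x in enumerate(xs)]

total = {"s1": {"a", "b", "c", "d"}, "s2": {"e", "f"}}
captured = {"a", "e", "f"}
coverage = session_coverage(total, captured)
cdf = empirical_cdf(coverage.values())
```

In this toy input one of four flows is captured in session `s1` and both flows in `s2`, so the coverage values are 0.25 and 1.0.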
Accuracy of BLINC classifier
[Figure: accuracy (%) versus sampling rate. Annotation: ~50% increase]
Note: classification results are for flows using non-standard ports
Related work
Flow Sampling [ToN ’06]
Flow Slices [IMC ’05]
Focuses on controlling router resources (CPU and memory)
cSamp [NSDI ’08]
Samples flows once flow record is created
Supports sampling of all traffic by coordinating various vantage
points in a network
FlexSample [IMC ’08]
Supports monitoring of traffic subpopulations, but needs to
maintain extra state for approximate checking of predicates
Summary
Introduced the notion of related sampling
Proposed RelSamp architecture
Samples related flows in the same application session
with higher probability
Preserves application structure in sampled flow records
Effective at preserving application session structure
5-10x more flows per application session compared to
Sampled NetFlow
Up to 50% higher classification accuracy than Sampled
NetFlow
Thank you! Questions?
Evaluation method of classification
techniques
[Figure: the packet trace is fed to a DPI-based classifier (Tstat) to obtain ground truth, and to RelSamp, Sampled NetFlow, and Flow Sampling, which produce Flow Record 1, 2, and 3. Each set of flow records is passed to a classification algorithm (e.g., BLINC, SVM, C4.5).]
Report # of accurately classified flows
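The reporting step can be sketched as a comparison of classifier output against the DPI-derived labels. The dictionary layout (flow key to application label) and the function name are assumptions; flows without a ground-truth label are left out of the count here, which is one possible convention rather than the one used in the evaluation.

```python
def classification_accuracy(predicted, ground_truth):
    """Return (# accurately classified flows, accuracy in percent).

    predicted / ground_truth: dicts mapping flow key -> application label.
    Only flows present in both dicts are scored.
    """
    scored = [k for k in predicted if k in ground_truth]
    if not scored:
        return 0, 0.0
    correct = sum(1 for k in scored if predicted[k] == ground_truth[k])
    return correct, 100.0 * correct / len(scored)

truth = {"f1": "bittorrent", "f2": "bittorrent", "f3": "web", "f4": "web"}
guess = {"f1": "bittorrent", "f2": "web", "f3": "web", "f4": "web", "f9": "dns"}
n_correct, accuracy = classification_accuracy(guess, truth)
```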
Comparison with other solutions using
BLINC
[Figure: x-axis: sampling rate]
Note: classification results are for flows using non-standard ports