Queuing Cache - Washington University in St. Louis

Download Report

Transcript Queuing Cache - Washington University in St. Louis

New Directions in Traffic
Measurement and Accounting
Cristian Estan – UCSD
George Varghese - UCSD
Reviewed by
Michela Becchi
Discussion Leaders
Andrew Levine
Jeff Mitchell
Outline

Introduction

Cisco NetFlow

Sample and Hold & Multistage Filters

Analytical Evaluation

Comparison

Measurements

Conclusions
Michela Becchi - 7/8/2015
Introduction

Measuring and monitoring of network traffic for
Internet Backbones
» Long term traffic engineering (traffic rerouting and link
upgrade)
» Short term monitoring (hot spots and DOS attacks’
detection)
» Accounting (usage based pricing)

Scalability problem
» FixWest, MCI traces: ~million flows/hour between end host
pairs
Michela Becchi - 7/8/2015
Cisco NetFlow

Flow: unidirectional stream of data identified by
» Source IP address and port
» Destination IP address and port
» Protocol
» TOS byte
» Rx router interface

An entry in DRAM for each flow

Heuristics for end-of-flow detection

Flow data exported via UDP packets from routers
to collection server for processing
Michela Becchi - 7/8/2015
Cisco NetFlow - problems

Processing overhead
» Interfaces faster then OC3 (155Mbps) slowed down by
memory cache updates

Collection overhead
» Collection server
» Network connection

NetFlow Aggregation (based on IP prefixes, ASes,
ports)
» Extra “aggregation” cache
» Only aggregated data exported to collection server
» PB: High amount of aggregates
Michela Becchi - 7/8/2015
Sampled NetFlow

Sampling packets

Per flow information based on samples

Problems:
» Inaccurate (sampling and losses)
» Memory Intensive
» Slow (DRAM needed)
Michela Becchi - 7/8/2015
Idea

“A small percentage of flows accounts for a large
percentage of the traffic”
» Algorithms for identifying large flows

Use of SRAM instead of DRAM

Categorize algorithms depending on:
1.
2.
3.
4.
Memory size and memory references
False negatives
False positives
Expected error in traffic estimates
Michela Becchi - 7/8/2015
Algorithms

Sample and Hold
» Sample to determine flows to consider
» Update flow entry for every subsequent packet belonging
to the flow

Multistage Filters
» Use multiple tables of counters (stages) indexed by a
hash function computed on flow ID
» Different stages have independent hash functions
» For each packet and for each stage, compute hash on flow
ID and add the packet size to corresponding counter
» Consider counters in all stages for addition of packets to
flow memory
Michela Becchi - 7/8/2015
Sample and Hold
Sampled Packet (probability=1/3)
Entry created
Flow Memory
Entry updated
F1 312
F3 21
F1
F1
F2
F3
F2
Transmitted Packets
Michela Becchi - 7/8/2015
F4
F1
F3
F1
Multistage Filters
flow memory
Array of
counters
Hash(Pink)
Michela Becchi - 7/8/2015
Multistage Filters
flow memory
Array of
counters
Hash(Green)
Michela Becchi - 7/8/2015
Multistage Filters
flow memory
Array of
counters
Hash(Green)
Michela Becchi - 7/8/2015
Multistage Filters
flow memory
Michela Becchi - 7/8/2015
Multistage Filters
Collisions
are OK
Michela Becchi - 7/8/2015
flow memory
Multistage Filters
Reached
threshold
flow memory
stream1 1
Insert
Michela Becchi - 7/8/2015
Multistage Filters
flow memory
stream1 1
Michela Becchi - 7/8/2015
Multistage Filters
flow memory
stream1 1
stream2 1
Michela Becchi - 7/8/2015
Multistage Filters
Stage 1
flow memory
stream1 1
Stage 2
Michela Becchi - 7/8/2015
Parallel vs. Serial Multistage Filters

Threshold for serial filters: T/d (d = number of
stages)

Parallel filters perform better on traces of actual
traffic
Michela Becchi - 7/8/2015
Optimizations

Preserving entries
» Nearly exact measurement of long lived large flows
» Bigger flow memory required

Early removal
» Definition of a threshold R < T to determine which entries
added in the current interval to keep

Shielding
» Avoid to update counters for flows already in flow memory
» Reduction of false positives

Conservative update of the counters
» Update normally only the smallest counter
» No introduction of false negatives
» Reduction of false positive
Michela Becchi - 7/8/2015
Conservative update of counters
Gray = all prior packets
Michela Becchi - 7/8/2015
Conservative update of counters
Redundant
Redundant
Michela Becchi - 7/8/2015
Conservative update of counters
Michela Becchi - 7/8/2015
Analytical Evaluation

Sample and Hold
» Prob.(false negatives): (1-p)^T ~ e^(-O)
» Best estimate for flow size s: c+1/p
» Upper bound for flow memory size: O*C/T
– Preserving entries: 2O*C/T
– Early removal: O*C/T+C/R

Parallel Multistage Filters
» No false negatives
» Prob(false positives): f(1/k)^d
» Upper bound for flow size estimate error: f(T,1/k)
» Bound on memory requirement
Where
T: threshold, p:sample prob (O/T), c: number of bytes counted for flow,
C: link capacity, O: oversampling factor, d: filter depth,
k: stage strength (T*b/C)
Michela Becchi - 7/8/2015
Comparison w/ Memory Constraint

Assumptions:
» Memory Constraint M
» The considered flow produces traffic zC (e.g. z=0.01)

Observations and Conclusions:
» Mz ~ oversampling factor
» S&H and MF better accuracy but more memory accesses
» S&H and MF through SRAM, SNetflow through DRAM, as
long as x is larger than the ratio of a DRAM memory access
to an SRAM memory access
Michela Becchi - 7/8/2015
Comparison w/o Mem Constraint

Observations and Conclusions:
» Through preserving of entries, S&H and MF provide exact
estimation for long-lived large flows
» S&H and MF gain in accuracy by losing in memory bound
(u=zC/T)
» Memory access as in case of constrained memory
» S&H provides better accuracy for small measurement
intervals => faster detection of new large flows
» Increase in memory size => greater resource consumption
Michela Becchi - 7/8/2015
Dynamic threshold adaption

How to dimension the algorithms
» Conservative bounds vs. accuracy
» Missing a priori knowledge of flow distribution

Dynamical adaptation
» Keep decreasing the threshold below the conservative
estimate until the flow memory is nearly full
» “Target usage” of memory
» “Adjustment ratio” of threshold
» For stability purposes, adjustments made across 3
intervals

Netflow: fixed sampling rate
Michela Becchi - 7/8/2015
Measurement setup

3 unidirectional traces of
Internet traffic

3 flow definitions

Traces are between 13%
and 17% of link
capacities
Michela Becchi - 7/8/2015
Measurements

S&H (threshold 0.025%, oversampling 4)

MF (strength=3)
Michela Becchi - 7/8/2015

Differences between
analytical bounds and actual
behavior (lightly loaded
links)

Effect of preserving entries
and early removal
Measurements



Flow IDs: 5-tuple

MF always better than
S&H

SNetflow better for
medium flows, worse for
very large ones

AS: reduced number of
flows (~entries in flow
memory).
Flow IDs: destination IP
Flow IDs: ASes
Michela Becchi - 7/8/2015
Conclusions

Focus on identifying large flows which creates the
majority of network traffic

Proposal of two techniques
» Providing higher accuracy than Sampled Netflow
» Using limited memory resource (SRAM)

Mechanism to make the algorithms adaptable

Analytical Evaluation providing theoretical bounds

Experimental measurements showing the validity
of the proposed algorithms
Michela Becchi - 7/8/2015
Future works

Generalize algorithms to automatically extract flow
definitions for large flows

Deepen analysis, especially to cover discrepancy
between theory and experimental measurements

Explore the commonalities with other research
areas (e.g.: data mining, architecture, compilers)
where issues related to data volume and high
speed also hold
Michela Becchi - 7/8/2015
The End

Questions?
Michela Becchi - 7/8/2015
Zipf distribution

Characteristics:
» Few data “score” very high
» A medium number of elements have “medium score”
» Huge number of elements “score” very low

Examples
» Use of words in a natural language
» Web use (e.g.: www.sun.com website accesses)
Michela Becchi - 7/8/2015