Transcript Document

Fast, Memory-Efficient Traffic Estimation
by Coincidence Counting
Fang Hao1, Murali Kodialam1, T. V. Lakshman1, Hui Zhang2,
1Bell Labs, Lucent Technologies
2University of Southern California
Traffic flow measurement
flows
packets
Flow rate
statistics
router
network link
 Related work:
•Sample and hold [Estan et al. 2002], Smart Sampling [Duffield et al. 2004],
RATE [Kodialam et al. 2004], ACCEL-RATE[Hao et al. 2004], etc.
 Flow rate pf : the proportion of packets belonging to flow f during a
certain time period.
•Arrival rate rf = pf   :  - total arrival rate (packets / second)
2
Problem definition
•
What’s the traffic flow composition through a network link
during a certain time period, given an estimation accuracy
requirement (, )?
 - confidence percentile;
 - estimation error.
•
For any given flow f with its rate be pf, determine an estimate p’f
for pf such that
p’f  (pf - /2, pf +/2)
with probability greater than .
 e.g.,  = 0.9975,  = 0.0001.
3
One solution – naive counting
•
Directly counting the number of packets for each flow.
It’s simple;
It estimates the rate of the flows rapidly;
But it requires frequent access of a large amount of high-speed memory.
•There can be millions of active flows on backbone links [Duffield et al. 2001]
4
Arrival model
1.
Flow rates are stationary during the estimation period.

An arriving packet belongs to flow f with probability pf.
2. Packet arrivals are independent.

An arriving packet belongs to flow f independent of other
arrivals.
5
Performance metrics
•
Sample size (estimation time): the number of arrivals needed
to achieve the specified estimation accuracy.
•
Memory size: the number of flows that are tracked during the
estimation.
6
Can we …
•
By counting the exact number of arrivals for each flow, naïve
2
Z

accounting requires minimally 2 arrivals to meet the accuracy

requirement (, ).
 This will capture at least one packet in expectation for any flow f with
2
2
pf   (Z=3,  = 0.0001,  2 ??).
2
Z 2
•
2
!
Z2 2
ZZ
Can we design a scheme which runs as fast as naïve counting
but only catches “interesting” flows?
7
Problem re-definition
•
What’s the traffic flow composition through a network link
during a certain time period given an estimation accuracy
requirement (, , , )?
  - confidence percentile;  - estimation error;
  - threshold rate;
•
 - error relaxation factor.
For any given flow f with its rate be pf, determine an estimate p’f
for pf such that
p’f (pf - /2, pf +/2)
if pf  
p’f (pf - /2, pf + /2)
if pf  
with probability greater than .
 e.g.,  = 0.9975 (Z=3),  = 0.0001,  = 0.01,  = 10.
8
Our solution – CATE
•
Coincidence bAsed Traffic Estimation
 It’s simple;
 It estimates the size of the flows rapidly;
 It requires a small amount of memory.
•
A generalization of RATE scheme [Kodialam et al. 2004]
9
CATE – scheme description (I)
k-length
10
CATE – scheme description (II)
•
Estimation procedure
1.
Specify the estimation accuracy requirement (, , , );
2.
Calculate sample size N, memory size M, and k (the length of PT);
3.
Run and count the coincidences CC(f) for each flow f;
4.
At the end of the estimation, output the estimated flow rates as
a)
b)
cc( f ) ;
N K
For the rest of the flows, report 0 as the estimated rates.
For each flow f in CCT, p’(f) =
11
Intuition behind CATE
•
Counting coincidence dramatically amplifies (squares) the ratio of
catching probability between a large flow and a small flow.
Z2
 Good news: CATE sample size is still  ( 2 ) for the estimation accuracy

requirement (, , , ).
•
Multiple (i.e., k) comparisons increase the number of membership
testing to kN with N arrivals.
Good news: the reduction on estimation variance due to increase in testing
number is no less than the increase on estimation variance due to comparison
correlation.
12
CATE – sample size
• Given the accuracy requirement (, ) and k-length
predecessor table for CATE,
 The minimal sample size N 
Z
2
when k  1
2
 Specifically, for a flow f with pf 
1
2k  1
, the minimal sample size
2
3Z
N f  2
k
13
Theorem 1 – CATE sample size
•
Given an estimation accuracy requirement (, , , ), if
1
 1)2
 
2
3(  1)

(
then setting
k
1 1
(  1)
2 
gives
the sample size =
 = 99.9 %
 = 0.0001
 = 0.01   = 10  k = 50
 = 0.001   = 20  k = 500
…
3Z2
k 2
14
Theorem 2 – CATE memory size
•
Given the estimation setting in Theorem 1,
the maximum expected memory  1.11Z

Totally 100,000 flows, rate range [10-6, 1]
Z = 3
= 0.002
CATE will catch no more than 1650 flows.
Naïve counting will record all flows w.h.p.
15
CATE – experiment 1
•
Real IP traces from NLANR;
•
A size-1000 buffer between new arrival and PT.
(The memory size = 547)
(The memory size = 1464)
16
CATE – experiment 2 (I)
• methodology
Totally 1 million flows, synthetic traces;
5 “large” flows, rates uniformly distributed between 0.1 and 0.2 of the
entire traffic;
1000 “medium-sized” flows, rates uniformly distributed between
0.0001 and 0.0002 of the entire traffic;
All the rest are “small” flows, each with rate roughly 10-7;
Deliberately chose a short sample time (1 million packet time) to
illustrate the impact of k, the predecessor table length.
17
CATE – experiment 2 (II)
18
CATE – experiment 2 (III)
19
Conclusion & future work
•
CATE, a memory efficient traffic estimation scheme as fast as
naïve counting.
•
Future work:
 Extending CATE for byte rate estimation.
 Extending CATE to minimize the impact of arrival dependence
without (excessive) additional overhead.
20