The SmartNIC - CommNet@USF - University of South Florida

Download Report

Transcript The SmartNIC - CommNet@USF - University of South Florida

A Power Management Proxy with a New Best-of-N
Bloom Filter Design to Reduce False Positives
Miguel Jimeno
Ken Christensen
Department of Computer Science and Engineering
University of South Florida
Tampa, FL 33620
{mjimeno, christen}@cse.usf.edu
Computer Networks Seminar
Spring 2007
Outline







2
Introduction & Background
Research Problem
The SmartNIC
The new Design: Best-of-N Method
Analysis of Best-of-N Method
Numerical Results & Experiments Evaluation
Summary & Future Work
Computer Networks Seminar
Spring 2007
Introduction




The internet consumes 2% of all the electricity
consumed in the US.[1]
An average PC consumes 120 W when fully
powered-on.[10]
PCs could add 10% to the typical US residential
consumption.
P2P Applications make the PC remain “on the
net” all the time, (they are idle 99% of the time)
[1]K. Kawamoto, J. Koomey, B. Nordman, R. Brown, M. Piette, M. Ting, and A. Meier, “Electricity Used by
Office Equipment and Network Equipment in the U.S.: Detailed Report and Appendices,” Technical
Report LBNL-45917, Energy Analysis Department, Lawrence Berkeley National Laboratory, 2001.
3
Computer Networks Seminar
Spring 2007
Introduction



Can a P2P application can be run in small, lowpower microcontroller?
The PC could then be power managed.
The microcontroller can’t store large list of file
names.
Bloom Filters:
 Bloom filters are a well known probabilistic data
structure for representing a list of file name
strings.
4
Computer Networks Seminar
Spring 2007
Introduction
Bloom Filters:
 A group of hash functions are used to map
elements into an array of bits.

False negatives are not
possible, but there is a
probability of generating false
positives.
s
Pr[ false positive]   
m
k
where m = size of the Bloom filter in bits,
k = number of hash functions used to calculate
a Bloom filter, and s = number of bits set.
Figure 1. Bloom filter of size m
bits, and k = 4 hash functions.
Image Taken from [9]
5
Computer Networks Seminar
Spring 2007
Background



Bloom filters were first proposed by Bloom [2]
Kirsch et. al. proposed a way to calculate bloom
filter with less hashing [7]
Lumetta et. al. used the Power of Two Choices to
calculate the bloom filter [8]
[2] B. Bloom, “Space/Time Tradeoffs in Hash Coding with Allowable Errors,”
Communications of the ACM, Vol. 13, No. 7, pp. 422-426, 1970.
6
Computer Networks Seminar
Spring 2007
Outline







7
Introduction & Background
Research Problem
The SmartNIC
The new Design: Best-of-N Method
Analysis of Best-of-N Method
Numerical Results & Experiments Evaluation
Summary & Future Work
Computer Networks Seminar
Spring 2007
Research Problem


8
We investigated new methods for reducing the
probability of false positives for a Bloom filter for
fixed m and n.
The target is the implementation of this structure
in a power management proxy.
Computer Networks Seminar
Spring 2007
Outline







9
Introduction & Background
Research Problem
The SmartNIC
The new Design: Best-of-N Method
Analysis of Best-of-N Method
Numerical Results & Experiments Evaluation
Summary & Future Work
Computer Networks Seminar
Spring 2007
The SmartNIC




NICs support up to MAC layer, but can’t respond
to higher-layer packets.
A PC needs to be fully powered-on in order to
respond to packets.
Applications like P2P file sharing require the PC
to be fully powered-on all the time.
To manage power in PCs running P2P
applications:
- We are studying the idea of using small controller to
proxy for a sleeping PC.
10
Computer Networks Seminar
Spring 2007
The SmartNIC


This proxy will be able to maintain P2P TCP
connections and respond to query messages.
We are exploring locating the controller on the
NIC, so it’s a “SmartNIC”.
PC fully on
PC in sleep
Traffic flow
from/to PC
Traffic flow
from/to NIC
Internet
NIC (internal to PC)
Internet
Proxying is enabled
(a)
(b)
Figure 2. The SmartNIC with proxy capability
11
Computer Networks Seminar
Spring 2007
Outline







12
Introduction & Background
Research Problem
The SmartNIC
The new Design: Best-of-N Method
Analysis of Best-of-N Method
Numerical Results & Experiments Evaluation
Summary & Future Work
Computer Networks Seminar
Spring 2007
The New Design: Best-of-N method


Best-of-N method: N
instances of a Bloom
filter are generated and
the instance with the
least number of bits set
to 1 is selected.
The “winner” hash group
is used to test the bloom
filter.
Set of
Strings
Strings to be
inserted
N times
Generate a
Bloom filter
instance
New
next seeded
instance
group of hash
created
functions
Select instance
if number of
bits set is the
smallest
Final
Instance
1)
13
What improvement in
Pr[false positive] can be
achieved?
Bloom
Filter
Figure 3: Best-of-N Method for Bloom filter
2) What is the computational
cost to generate the
filter?
Computer
Networks Seminar
Spring 2007
The New Design: Best-of-N method


14
In order to compute N instances quickly, we
developed a new pseudo-hashing method called
“RNG hashing”.
This method, based on a Random Number
Generator, generates multiple hashes from one
initial “seed” hash.
Computer Networks Seminar
Spring 2007
Outline







15
Introduction & Background
Research Problem
The SmartNIC
The new Design: Best-of-N Method
Analysis of Best-of-N Method
Numerical Results & Experiments Evaluation
Summary & Future Work
Computer Networks Seminar
Spring 2007
Analysis of Best-of-N Method



We define S to be the random variable for the
number of bits set in a Bloom filter.
Using order statistics we can determine the
distribution of the minimum value of the
independent samples S1, S2, …, SN (selected as
Best-of-N).
For order statistics, if f(s) and F(s) are known,
then
Smin  S( 1 )  min( S1 , S2 ,..., S N )
16
Computer Networks Seminar
Spring 2007
Analysis of Best-of-N Method

For a continuous distribution,
f min ( s )  N ( 1  F( s ))N 1 f ( s )


The mean can be computed as
E [ Smin ] 
 sf
min
( s )ds


Based on heuristic and empirical evidence, the
distribution of S appears to be close to normal.
Now we have that
 s   
 1  
f min ( s )   N 1  N  erfc 
 
2  
  2 

N 1
 s   
 1

e 2
  2

2




where μ=E[S] and σ= σ[S]. We know that
kn
 

1


E [ S ]  m 1  1   
  m 


17
2
Computer Networks Seminar
Spring 2007
Analysis of Best-of-N Method
kn
kn
kn
 m 1
m2
2 m  2 
2  m 1 
 [ S ]  m
 m 
  m
 m 

 m 
 m 
 m 
 m 

We derive

The probability of false positive for our method
is then:
2
 E [ S min ] 
Pr[ false positive]  

m


k
where E[Smin] is computed by substituting
above.
18
Computer Networks Seminar
Spring 2007
kn
Outline







19
Introduction & Background
Research Problem
The SmartNIC
The new Design: Best-of-N Method
Analysis of Best-of-N Method
Numerical Results & Experiments Evaluation
Summary & Future Work
Computer Networks Seminar
Spring 2007
Numerical Results

For a given m and n where k is chosen
optimally, we study the probability of false
positive as a function of N.
Improvement factor
1.30
30%
m/n = 64
1.25
1.20
m/n = 32
1.15
m/n = 16
1.10
m/n = 8
1.05
1.00
0
10
20
30 40 50 60 70
Number of instances (N )
80
90 100
Figure 4. Improvement factor for various m /n
20
Computer Networks Seminar
Spring 2007
5.0E-04
2.4E-07
4.8E-04
2.2E-07
Pr[false positive]
Pr[false positive]
Numerical Results
4.6E-04
4.4E-04
4.2E-04
2.0E-07
1.8E-07
1.6E-07
1.4E-07
4.0E-04
0
10
20
30 40 50 60 70 80
Number of instances (N )
90 100
Figure 5. Probability of false positive for m /n = 16
0
10
20
30 40 50 60 70 80
Number of instances (N )
90 100
Figure 6. Probability of false positive for m /n = 32
For Figure 5, n = 1000 and m = 16,000. For Figure 6, same n,
but m = 32,000
21
Computer Networks Seminar
Spring 2007
Outline







22
Introduction & Background
Research Problem
The SmartNIC
The new Design: Best-of-N Method
Analysis of Best-of-N Method
Numerical Results & Experiments Evaluation
Summary & Future Work
Computer Networks Seminar
Spring 2007
Experiments Evaluation

Environment
- Dell OptiPlex GX620 PC (Pentium4, 3.4 Ghz, 2 MBytes
cache) with 1 GByte RAM.
- WindowsXP, gcc compiler (version 3.4.2 mingw-special
from Dev C++.
- A list of 25,000 strings of unique music file names was
obtained using Bearshare 5.2.

Response Variables
- Probability of false positive for the Bloom filter.
- Execution time to generate a Bloom filter.
23
Computer Networks Seminar
Spring 2007
Experiments Evaluation

Control variables
- Hashing method used.
• CRC32, Md5, RNG Method, Kirsch Method
- Bloom filter parameters m, n, and k.
- Best-of-N parameter N.
- Number of strings used in the string test set.

Experiments Description
- False Positive Exp 1: Vary N, measure Prob. of False
Positive.
- False Positive Exp 2: Vary N, measure False Pos.
- Run-time experiment: Collect CPU time for each N.
24
Computer Networks Seminar
Spring 2007
Experiments Evaluation
4.6E-04
6.0E-01
Pr[false positive]
4.5E-04
4.4E-04
4.3E-04
4.2E-04
4.1E-04
0
10
20 30 40 50 60 70
Number of instances (N )
80
25
Kirsch
and
RNG
3.0E-01
2.0E-01
1.0E-01
0
90 100
Figure 7. Results from false positive experiment #1

4.0E-01
0.0E+00
4.0E-04

MD5
CRC32
RNG method
Kirsch
5.0E-01
CPU Time (sec)
Analysis
MD5
CRC32
RNG method
Kirsch
10
20
30 40 50 60 70
Number of instances (N )
80
90 100
Figure 8. Results from run time experiment
The experimental results for probability of false
positive perfectly agree with the analysis.
CPU time results of RNG method were as good as
Kirsch method, and better than CRC32.
Computer Networks Seminar
Spring 2007
Outline







26
Introduction & Background
Research Problem
The SmartNIC
The new Design: Best-of-N Method
Analysis of Best-of-N Method
Numerical Results & Experiments Evaluation
Summary & Future Work
Computer Networks Seminar
Spring 2007
Summary & Future Work

Two Improvements to Bloom filters
- A new Best-of-N method that reduces the probability of
false positive by generating N instances of a Bloom
filter and selecting the best one.
- A new RNG hashing method that generates pseudo
hashes given a single seed hash.


27
Bloom filters could be implemented in a power
management proxy for P2P applications.
Savings of up to 85 Mill. could be obtained if 25%
of PCs running P2P applications use SmartNICs.
Computer Networks Seminar
Spring 2007
References
3.
4.
5.
6.
7.
8.
9.
10.
28
A. Broder and M. Mitzenmacher, “Network Applications of Bloom Filters: A Survey,”
Internet Mathematics, Vol. 1, No. 4, pp. 485-509, 2005.
Energy Information Administration, “U.S Household Electricity Report,” July 2005.
Available: http://www.eia.doe.gov/emeu/reps/enduse/er01_us.html.
L. Fan, P. Cao, and J. Almeida, “Bloom Filters - The Math,” 2000. Available:
http://www.cs.wisc.edu/~cao/ papers/summary-cache/node8.html.
A. Kirsch and M. Mitzenmacher, “Less Hashing, Same Performance: Building a
Better Bloom Filter,” Technical Report TR-02-5, Computer Science Group, Harvard
University, 2005.
S. Lumetta and M. Mitzenmacher, “Using the Power of Two Choices to Improve
Bloom Filters,” unpublished, 2006. Available:
http://www.eecs.harvard.edu/~michaelm/ postscripts/bftwo.ps.
A. Pagh, R. Pagh, and S. Rao, “An Optimal Bloom Filter Replacement,” Proceedings
of the 16th Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 823-829,
2005.
http://www.cs.wisc.edu/~cao/papers/summary-cache/node8.html
US Department of Energy, Energy Efficiency and Renewable Energy, “Estimating
Appliance and Home Electronic Energy Use,” 2005. Available:
http://www.eere.energy.gov/consumer/your_home/appliances/index.cfm/mytopic=10
040.
Computer Networks Seminar
Spring 2007
Thanks!
I’ll be happy to answer any questions.
29
Computer Networks Seminar
Spring 2007