Slides - INL: IP Networking Lab

Download Report

Transcript Slides - INL: IP Networking Lab

Bloom Filters
Benoit Donnet
November 30th, 2006
1
Context
• Introduced in 1970 ([bloom])
Set membership problem
- Trade-off between space and computing
complexity
- Lossy summary technique
Historical usage
Spell checking ([McIlroy])
Database ([Bratbergsengen])
-
•
2
Content
•
•
•
•
•
Bloom filters
Extensions
Networking applications
Conclusion
References
3
Bloom filters
4
Construction
5
Membership Query
6
False Positive
• A Bloom filter can suffer of false positives
-
•
The filter returns a positive answer for
some elements that do not belong to A
Can we evaluate a priori the impact of
false positives on a Bloom filter?
7
False Positive (2)
8
False Positive (3)
9
False Positive (4)
10
Extensions
11
Content
•
•
•
•
Compressed Bloom filters ([mitzenmacher])
Counting Bloom filters ([Fan et al.])
Dynamic Bloom filters ([Guo et al.])
Retouched Bloom filters ([Donnet et al.])
12
Compressed BF
• A Bloom filter can be used a message
exchanged by networked monitors
• New performance metric
- Bandwidth
• Transmission size can be affected by
compression
• Compressed Bloom filters ([mitzenmacher])
13
Compressed BF (2)
• Positive aspects:
Quantity of bit exchanged reduced
- False positive rate reduced
- Amount of computation per query
reduced
Cost:
- Internal memory increased
- Compression/decompression process
-
•
14
Compressed BF (3)
15
Counting BF
• The subset A is changing over time
Insertion
- Deletion
• How to perform deletion?
- Couting Bloom filters ([Fan et al.])
-
16
Counting BF (2)
17
Counting BF (3)
• Which size for the counter?
4 bits per counter are OK for most of the
applications
• What happens in case of an overflow?
-
18
Dynamic BF
• Statement:
During the execution of the application,
|A| can exceed its orignal size n
• Consequence:
- The false positive rate is not maintained
anymore
• Solution?
- Dynamic Bloom Filters ([Guo et al.])
-
19
Dynamic BF (2)
• It uses a matrix of s Bloom filters
Each Bloom filter uses m bits and k hash
functions
• It starts with s equals to 1.
• A new Bloom filter (i.e., a new row in the
matrix) is created when needed.
-
20
Dynamic BF (3)
• How to insert an element?
Check for an active Bloom filter
- If there is no active Bloom filter, create a
new one
- Add the element to the Bloom filter
How to query an element?
• If all s Bloom filters return false, the
element does not belong the DBF.
• If, at least, one Bloom filter returns true,
the element probably belongs to the
DBF.
21
-
•
Dynamic BF (4)
22
Retouched BF
• Statement:
Some false positives might be more
troublesome than others
- Some applications might tolerate a
small level of false negatives
• Question:
- Can we trade-off the false positives
against false negatives?
• Solution?
- Retouched Bloom filters ([Donnet et al. 06])
-
23
Retouched BF (2)
24
Retouched BF (3)
• Quid if we randomly reset s bits in the
vector?
- Eliminates the same proportion of false
positives as the proportion of false
negatives generated
- Randomized bit clearing.
• The process of removing selected false
positives is called selective clearing
25
Retouched BF (4)
26
Networking
Applications
27
Distributed Caching
• Proxies cooperate to exchange cache
information
• Instead of sharing URLs list, proxies
broadcast Bloom filters ([Fan et al.])
• A Bloom filter represents a proxy’s cache
content
28
Multicast
• A router maintains, for each multicast
address, a list of associated
interfaces/connections
• Replace the list by a Bloom filter
• Parallelization possible
• Deletion of an address can be achieved
([Grönvall])
with a counting Bloom filter
29
Measurement
• Topology discovery at the IP interface level
• Traceroute monitors exchange information
about what was previously discovered
- Doubletree
• This information shared can be encoded
as a Bloom filter
- Communication cost reduction ([Donnet et al.
05])
30
Conclusion
• A Bloom filter
• solves the set membership problem
• can generate false positives
• Extensions to standard Bloom filter were
presented
• A few networking applications were
discussed
31
•[Bloom]:
References
Space/Time Trade-Offs in Hash Coding with
Allowable Errors. In Communications of the ACM. vol. 13,
n°7.
•[McIlroy]: Development of a Spelling List.
In Transactions
on Communications. vol. 30, n° 1.
•[Mitzenmacher]:
Compressed Bloom
Transactions on Networking. vol. 10, n° 5.
Filters.
In
•[Fan et al.]:
Summary Cache: a Scalable Wide-Area Web
Cache Sharing Protocol. In Transactions on Networking.
vol. 8, n°3.
•[Guo et al.]:
Theory and Network Applications of Dynamic
Bloom Filters. In Proc. INFOCOM 2006.
32
References (2)
•[Bratbergsengen]: Hashing Methods and Relational Algebra
Operations. In Proc. ICVLD 1984.
•[Bruck et al.]: Weighted Bloom Filters. In Proc. ISIT 2006.
•[Donnet et al. 06]: Retouched Bloom Filters: Allowing
Networked Applications to Trade-Off Selected False
Positives Against False Negatives. In Proc. CoNEXT 2006.
•[Grönvall]:
Scalable Multicast Forwarding. In Proc. ACM
SIGCOMM 2001. Student Workshop.
•[Donnet
et al. 05]: Improved Algorithms for Network
Topology Discovery. In Proc. PAM 2005.
33