slides in ppt - Computer Engineering

Download Report

Transcript slides in ppt - Computer Engineering

Detection of “Hot Spots”
Paper Title: Joint Data Streaming and Sampling Techniques for
Detection of Super Sources and Destinations
Liang,Chao
Polytechnic University,ECE Department
1
Motivation
 “Hot spots” in the Internet
– Super Source (large fan-out)
• Infected hosts by worm (Slammer worm)
– Super Destination (large fan-in)
• DDoS victim
 Internet attacks increasing in severity
– Network security monitoring
 Challenges
• High packets arrival rate
• Speed requirement of RAM (DRAM vs SRAM)
• Impractical per-flow state maintenance
Polytechnic University,ECE Department
2
How to find the needle in the haystack
flow 1
flow 2
flow 3
 IP Flow
–
–
Abstraction: set of packets identified with same
address, ports, etc.
Flow label: Source-destination pair <pkt.src,pkt.dst>
 General Problem: Heavy distinct-hitters
–
–
Given a stream of flow label <pkt.src,pkt.dst> pairs,
find all the src that are paired with a large number of
distinct destination.
Detect super destination: Reverse the flow label
Polytechnic University,ECE Department
3
Weapons
 Previous Techniques
–
–
–
–
–
Flow state maintenance
Probabilistic counting
Bloom Filters
Multi-resolute bitmap
……
 This paper
 Sampling
 Network Data streaming
Polytechnic University,ECE Department
4
Paper
 Qi Zhao, Abhishek Kumar, Jun Xu, “Joint Data
Streaming and Sampling Techniques for Detection
of Super Sources and Destinations”, IMC 2005
Polytechnic University,ECE Department
5
Outline of the rest of the talk
 Introduction of one previous work
–
Traditional hash-based flow sampling
 Main approach
– Simple scheme
– Advanced scheme
 Evaluation
 Summary
Polytechnic University,ECE Department
6
Traditional hash-based flow sampling
 Flow sampling
–
Sample flows with a certain percentage p
•
•
Hash function maps flow label to a value uniformly distributed in
[0,1)
H (flow label)<p, then sample the flow
 Hash Table
–
HT1.Detect and discard duplicate ones
–
HT2.Count flow numbers
•
•
Access the element with index by hashing flow label
Element: list of flow label pairs
•
•
Access the element with index by hashing srcIP
Element: list of <srcIP,count> pairs
Polytechnic University,ECE Department
7
Traditional hash-based flow sampling
 Fan-out Calculation
–
–
Threshold Judge to report the super source
Estimation to compensate sampling
Ē=E*(1/p)
 Performance Analysis
–
Key Ineffective Reason - Low sampling rate
–
Performance bottleneck: Query of the first hash table
–
Result
•
•
•
•
The update cost of hash tables (In DRAM)
Elephant flows influence
The sampling rate p<< Hs / Tr
–
–
Hs: operating speed of hash table
Tr: arrival rate of traffic
p is too slow!
Estimation error scale by 1/p
Polytechnic University,ECE Department
8
Contribution of this paper
 Network Data Streaming
–
–
–
Process each and every incoming packet in real-time
Employ a small and fast memory
Maintain only the most pertinent information
 Two schemes
–
–
Simple scheme : filtering after sampling
Advanced scheme : separation of counting and
identity gathering
Include more information
Polytechnic University,ECE Department
9
Simple Scheme System
 Filtering after sampling System
 Data Streaming module
–
–
Replace the hash table
Final goal: improve the sampling rate
Polytechnic University,ECE Department
10
Simple Scheme – Data Streaming Module
 How to realize
– Employing bit array to label new flow
• Bit array G: w bits
• Hash function: maps to a value uniformly distributed in
[1,w]
– Employ SRAM (static random access memory)
packet
0
flow label
1
2
H( )
0
1
i
Polytechnic University,ECE Department
w-1
11
Simple Scheme - Estimation
 Hash collision in data streaming
–
–
Different flows have same index of G
Miss the update of the hash table
 Compensation of the collision
–
when the ith new flow arrival
–
Compensate the hash collisions by adding w/u
•
•
•
Variable u: to keep track of the number of “0” in G
Variable i : hash result of the new flow
P(G[i]=0) = u/w
 Unbiased Estimation of count
–
Hash table updated by K flows
Polytechnic University,ECE Department
12
Simple Scheme - Algorithm
Compensation
Calculation
Polytechnic University,ECE Department
13
Simple Scheme - Analysis
 Unbiased estimator of fan-out
 Saturation Avoidance
Number of ‘0’ element
Probability to be recorded
– Minimum of ‘0’ element typically set around w/2 (half full)
– Two sets of arrays and hash tables operated alternatively
 Sampling rate improved
–
Affordable SRAM
•
–
Little memory consumption to support high speed links
Streaming speed
•
•
•
Bottleneck!
Poisson alike update times of the hash table
Efficient hardware implementation of hash function
All operations in data streaming module can be finished in about 10ns
Polytechnic University,ECE Department
14
Advanced Scheme - System
Record flow
information
to array
in real-time
Record source
identity
(e.g.. source IP)
Use the source
identity(2) to look up
the array(1) to
estimate offline
Polytechnic University,ECE Department
15
Advanced Scheme – Streaming algorithm
 2D bit array A(m,n)
 Four hash functions
– One to get row number (range [1,m])
– Three to get column number (range [1,n])
this case k=3
Polytechnic University,ECE Department
16
Advanced Scheme – Streaming module
Row collision
Column collision
Why k=3?
Polytechnic University,ECE Department
17
The Linear-Time probabilistic counting algorithm

Idea from Database field: counting the number of unique
values in the presence of duplicates

Estimation of distinct flow number
–
–
–
–
m : column size
n : total number of flow
Aj : the jth element of column
Un: the number of element whose value is “0”
Polytechnic University,ECE Department
j
18
Joint relation calculation

The distinct values in the
join of two relations
–
–

AB=A+B-AUB
A->G1 B->G2
Estimate them by linear
counting D based on G
–
AB=D(G1)+D(G2)D(G1UG2)
Note: Cannot directly
calculate G1G2 cause
different space
AпB
G1пG2
AUB
G1UG2
Polytechnic University,ECE Department
19
Advanced Scheme – Estimation module
 Computing the join selectivity in three columns(k=3)
–
U: Bitwise-OR
 Avoid two sources both hashed to the same k
columns
–
–
S: total number distinct sources
n: column number
–
The probability of collision drop to 0.002
– When n=16,000, S=100,000, k=3
Polytechnic University,ECE Department
20
Advanced Scheme – Identity module
 Purpose
– Capture the identities of potential super sources
– Write data into DRAM in real-time
 Identity collection
– Estimate the corresponding fan-out as input data
 Why DRAM?
– Replace expensive hash table operation
– Sequential writes can be very fast
• 100% and 25% recording for OC-192 and OC-768
Polytechnic University,ECE Department
21
Evaluation
 Real internet traffic traces
– UNC(1 Gbps),USC,NLANR(IPKS+,IPKS-)(OC192 link)
Polytechnic University,ECE Department
22
Evaluation-Simple Scheme
 [UNC] Sampling rate:1/4
–
Area1:false positives
Bit array size:128Kb
Area II: false negative
Polytechnic University,ECE Department
23
Evaluation-Advanced Scheme

[UNC]2D Bit array A: 128KB(64*16,384)
Polytechnic University,ECE Department
sampling rate:1
24
Estimation Accuracy
Polytechnic University,ECE Department
25
Summary
 Monitoring at high speed is challenging
 Network Data Streaming
– Keep up with the line speed
– Include more pertinent information
 Employ other fields achievements
Polytechnic University,ECE Department
26
Q&A
Polytechnic University,ECE Department
27