Transcript Slide 1
ECE 526 – Network
Processing Systems Design
Network Security: string matching algorithm
Chapter 17: George Varghese
Goal
• Gain basic knowledge to improve network security from
network processing system design perspective
Ning Weng
ECE 526
2
Outline
• Signature-based IDSs
• String matching algorithms
─
─
─
─
─
Boyer-Moore
Aho-Corasic
Bloom Filter
Approximated Searching
Approximated Searching Based on Bloom Filters
• Summary
Ning Weng
ECE 526
3
Internet Security
• Internet lacking of security
─ Example?
• What is Internet Security
─ Confidentiality: data keeping private
─ Integrity: protected from modification or destruction
─ Availability: data or service accessible
• What are current approaches
─ Engineering?
─ non-engineering?
─ Intrusion Detection Systems (IDSs)
Ning Weng
ECE 526
4
Intrusion Detection Systems
• Two types of Intrusion Detection Systems (IDSs)
─ Signature detection: based on matching events to the signatures of
known attacks
─ Anomaly detection: based on statistical or learning theory to identify
aberrant events
• Three important tasks
─ String matching: searching suspicious strings in packet payloads
─ Traceback: to detect intruder who uses forged source address
─ Detect onset of new worm without prior knowledge
• The problems of current IDSs
─ Very slow
─ Have a high false-positive rate
─ false positive: answering membership query positively when member is
not in the set
Ning Weng
ECE 526
5
Snort Rule Example
• Snort:
─ one of lightweight detection system, open source
─ www.snort.org
• Snort rule example:
Alert tcp $BAD 80 -> $GOOD 90 \
(content: “perl.exe”; msg: “detected perl.exe”;)
─ Looking for string “perl.exe” contained in TCP packet from IP: $BAD,
Port: 80 to IP: $GOOD, Port: 90
─ Upon detection, generating alert with “detected perl.exe”
• Question: a packet coming, how to check it?
• Question: how about multiple rules?
• String matching is bottleneck
Ning Weng
ECE 526
6
String Searching: brute force
• Arbitrary string can be anywhere in the packet
• Naive approach
Input: String size: m; packet size: n (assuming n >m)
For i:=0 to n-m do
For j:=0 to m-1 do
Compare string[j] with packet[i+j]
If not equal exit the inner loop
• Complexity:
─ worst case O(m*n)
─ Best case O(n)
• Can we do better?
Ning Weng
ECE 526
7
Boyer-Moore: example
B A R N E
B
A
B
A
B
A
B A
B
A
R ….
(packet payload)
R
• Improving by skipping over a larger number of character
and by comparing last character first
• How to build the ship table?
Ning Weng
ECE 526
8
Boyer Moore: skip table
• How far to skip when the last character does not match.
• For example
─ pattern: CAB
─ Skip: 1 * 2 3 3…
─ Last A B C D E
• Care is needed with repeated letters
• For example
─ pattern: ABBA
─ Skip: * 1 4 4 4…
─ Last: A B C D E …
• Skip[c] = distance of last occurrence of c from end in
pattern
Ning Weng
ECE 526
9
Boyer Moore: algorithm
Input: pattern with size m; packet with size n
i: =0
While i<=n-m do
If pattern[m-1] = packet[i+m-1] then //last character first
For j:=0 to m – 1 do
Compare pattern[j] with packet[i+j] //one by one sequentially
i:=i+1
Else i:=i+skip[packet[i+m-1] //skip
• Complexity:
─ best case O(n/m)
─ worst case still O(nm)
Ning Weng
ECE 526
10
Aho-Corasic
B A B A B A R …. (packet payload)
B
A
B
B
A
R
BABAR
• Failure pointer
─ Prevent restarting at top of trie when failure occurring
─ New attempt made by shifting
• How about multiple strings?
Ning Weng
ECE 526
11
Multiple String Trie Construction
Example: P = {he, she, his, hers}
Initial State
Transition Function
State
Accepting State
h
h
h
2
h
h
s
8
s
9
Ning Weng
ECE 526
4
S h
7
h
h
i
6
S
3
i
S
r
s
S
1
e
0
e
h
r
S
S
5
h
S
12
Aho-Corasick: Searching
Matching String
h
h
h
S
2
r
h
h
i
6
s Sh
8
7
s
Input stream:
h x h e rs
s
S
1
e
0
S
3
h
h
i
S
4
h
r
e
S
5
h
S
9
• Scanning input stream only once
• Complexity: linear time
.
Ning Weng
ECE 526
13
Aho-Corasick: summary
• Pros:
─ Computation complexity: worst case O(n)
─ Can scan once and output all matches
• Cons:
─ Constructing a finite state machine
─ Failure pointers needed
─ Too big to be on chip
• Each node has maximum 256 pointers
Ning Weng
ECE 526
14
Hashing
•
One efficient set membership query mechanism
─ Programming trivial
─ Query complexity: O(n) best case (n: size of packet)
─ Query accuracy: possible false positive
•
However, to handle collision
─ Each hash entry containing a list of IDs of all elements share
the hash value
─ Storage minimal requirement: O(n*w) n: number of elements,
w: minimal width of each element
•
Question: can we trade accuracy for storage
requirement using hashing idea?
Ning Weng
ECE 526
15
Bloom Filter
• Data structured proposed by Burton Bloom
• Randomized data structure
─ Strings stored using multiple hash functions (programming)
─ Check string’s presence based on multiple bits (querying)
• Membership queries result in false positives
• Powerful tools for
─
─
─
─
Content networks
Route trace back
Network measurements
Intrusion Detection
Ning Weng
ECE 526
16
Bloom Filter Programming
• Instead using one hash function, k
independent hash functions
• Instead requiring n*w bit storage; m-bit
vector required
• Initially all bit are cleared
• Programming set bit based on each
hashing function
─ bit remaining set if two elements hashed to
same position
Ning Weng
ECE 526
17
Bloom Filter Querying
• Procedure:
String x is computed by k
hashing functions
Each hashing function
pinpointing one bit in m-bit
vector
All value in m-bit vector are
ANDed
If match ==0,
x is not a member
else
x is positive member
Ning Weng
ECE 526
18
Bloom Filter: false positive rate
• n: number of strings to be stored
• k: number of hash functions
• m: the size of bit array
• The false positive probability
─ f = (1/2)k
─ Optimal value hash functions k
• K = ln2 * m/n = 0.693*m/n
• False positive rate decreases exponentially with number
of hash functions & memory
Ning Weng
ECE 526
19
Counting Bloom Filters
• Member deletion
─ Deletion of a member requiring clearing all the related bits
─ A bit once set in the bit vector can not be deleted easily
• the bit can be set by multiple members
• Solution
─ Assuming member deletion rare case
─ Counting bloom filter
• Updating counter when element added or deleted
• Bit reset in m-bit vector when counter value is 0
Ning Weng
ECE 526
20
Approximate String Searching
• Using Bloom filter
Ning Weng
ECE 526
21
Approximate String Searching
John W. Lockwood and etc. “DEEP PACKET INSPECTION USING PARALLEL BLOOM FILTERS”
Ning Weng
ECE 526
22
Summary
Idea
Computation
Brute Force
Naïve
O(m*n)
Boyer-Moore
Skip
O(m*n) –worst
O(n/m) – best
0.1 MB
(10K Rules)
Shift table needed
Aho Corasick
Tire
O(n) – worst
case
50 MB
(1500 Rules)
Storage
demanding
Bloom-Filter
Approximate
searching
O(n)
0.1 MB
(10K Rules)
False positive
Ning Weng
ECE 526
Storage
Problem
slow
23
For Next Class
• Read Comer: chapter 6 and 9
• Final Project (option 1)
─ Project group finalized
• 9/19/07: group leader: email me your group members .
• each group no more than 3 members.
─ Project topic finalized.
• 9/28/07: Group leader: email me your topic.
•
Paper presentation + Final exam (Option 2)
• 9/19/07: group leader: email me your group members .
• each group no more than 2 members.
• based on assigned one or two papers (<20 min)
Ning Weng
ECE 526
24