3. Matching of Individual Patterns - CSIE -NCKU

Download Report

Transcript 3. Matching of Individual Patterns - CSIE -NCKU

Fast and Memory-Efficient
Regular Expression
Matching for Deep Packet
Inspection
Authors: Fang Yu, Zhifeng Chen, Yanlei Diao, T.V. Lakshman
and Randy H. Katz
Publisher: ANCS'06, December 3–5, 2006
Present: Yu-Tso Chen
Date: November, 6, 2007
Department of Computer Science and Information Engineering
National Cheng Kung University, Taiwan R.O.C.
1
Outline






1. Introduction
2. Definitions and problem description
3. Matching of Individual Patterns
4. Selective Grouping of Multiple
Patterns
5. Evaluation Result
6. Conclusion
2
Introduction

Three unique complex features
• 1) Large numbers of wildcards can cause DFA to
•
•
grow exponentially
2) Wildcard are used with length restriction(‘?’, ‘+’)
will increase the resource
3) Groups of characters are also commonly used
such interaction can result in highly complex state
machine(ex.”^220[\x09-]*ftp”)
3
Introduction (cont.)

Make following contributions
• 1) Analyze the computational and storage cost of
•
•
building individual DFAs
2) Two rewrite rules for specific regular
expressions
3) Combine multiple DFAs into a small number of
group
4
Outline






1. Introduction
2. Definitions and problem
description
3. Matching of Individual Patterns
4. Selective Grouping of Multiple
Patterns
5. Evaluation Result
6. Conclusion
5
Regular Expression Patterns

Compares the regular expressions used
in two networking applications (Snort,
Linux L-7 filter & XML filtering)
• 1)Both types of app. Use wildcards (‘.’,’?‘,’+’,’*’)
•
•
contain larger numbers of them
2) Classes of characters (“[ ]”) are used only
in packet scanning applications
3) High percentage of scanning app. Have
length restrictions on some of the classes or
wildcards
6
Regular Expression Patterns
7
Solution Space for Regular
Expression Matching



A single regular expression of length n can be
expressed as an NFA with O(n)
When the NFA is converted into a DFA, it may
n

O
generate   states
The processing complexity for each character
in the input is O(1) in DFA, but is O(n2) for an
NFA when all n states are active at the same
time
8
Solution Space for Regular
Expression Matching (cont.)

To handle m regular expressions, two
choices are possible:
• Processing them individually in m automata
• Compiling them into a single automaton
9
Problem Statement

DFA-based approaches in this paper
• Our goal is to achieve O(1) computation cost
• The focus of the study is to reduce memory
overhead of DFA

There are two sources of memory usage
in DFAs:states and transitions
• We consider the number of states as the
primary factor
10
Outline






1. Introduction
2. Definitions and problem description
3. Matching of Individual Patterns
4. Selective Grouping of Multiple
Patterns
5. Evaluation Result
6. Conclusion
11
Design Considerations

Define Completeness of Matching Results:
•
Exhaustive Matching:M(P,S)={substring S’ of S | S’
is accepted by the DFA of P}
• It is expensive and often unnecessary to report all
•
•
matching substrings
We propose a new concept, Non-overlapping Matching,
that relaxes the requirements of exhaustive matching
Non-overlapping Matching:
M ( P, S )  {substring Si of S | Si, Sj accepted by the DFA of P, Si  Sj  }
• Ex:ab* if input abbb non-overlapping matching will
•
report one match instead of three
Exhaustive Matching will report, ab, abb, abbb
12
Design Considerations (cont.)

Define DFA Execution Model for Substring
Matching:We focus on patterns without ‘^’
attached at the beginning
•
•
Repeater searches
One-pass search – this approach can truly achieve
O(1) computation cost per character
13
DFA Analysis for Individual Regular
Expressions

The study is based on the use of exhaustive
matching & one-pass search
14
Case 4:DFA of Quadratic Size

The DFA needs to remember the number
of Bs it has seen and their locations
15
Case 5:DFA of Exponential Size

An exponential number of states
(22+1)are needed to represent these two
wildcard characters
AAB(AABBCD) is different
from ABA(ABABCD)
because a subsequence
input BCD
16
Regular Expression Rewrites

Rewrite Rule(1)
• “^SEARCH\s+[^\n]{1024}” to
•
“^SEARCH\s [^\n]{1024}”
“^A+[A-Z]{j}” to “^A [A-Z]{j}”
• We can prove match “^A+[A-Z]{j}” also match “^A
[A-Z]{j}”
17
Regular Expression Rewrites (cont.)

Rewrite Rule(2)
•
We don’t need to keep track of the second AUTH\s
• If there is a ‘\n’ within the next 100 bytes, the return
•
•
character must also be within 100 bytes to the second
AUTH\s
If there is no ‘\n’ within the next 100 bytes, the first
already matched the pattern
“([^A]|A[^U]|AU[^T]|AUT[^H]|AUTH[^\s]|AUTH\s[^\n]{0,99
}\n)*AUTH\s[^\n]{100}”
18
Outline






1. Introduction
2. Definitions and problem description
3. Matching of Individual Patterns
4. Selective Grouping of Multiple
Patterns
5. Evaluation Result
6. Conclusion
19
Selective Grouping of Multiple
Patterns

The composite DFA may experience exponential
growth in size, although none of the individual DFA
has an exponential component
20
Regular Expressions Grouping
Algorithm

Definition of interaction:two patterns
interact with each other if their composite
DFA contains more states than the sum
of two individual ones
21
Grouping Algorithm
22
Outline






1. Introduction
2. Definitions and problem description
3. Matching of Individual Patterns
4. Selective Grouping of Multiple
Patterns
5. Evaluation Result
6. Conclusion
23
Evaluation Result

Effect of Rule Rewriting
24
Evaluation Result
(cont.)
25
Outline






1. Introduction
2. Definitions and problem description
3. Matching of Individual Patterns
4. Selective Grouping of Multiple
Patterns
5. Evaluation Result
6. Conclusion
26
Conclusion


Rewriting techniques –
memory-efficient DFA-based approaches are
possible
Selectively groups patterns together –
speed up the matching process
27