投影片 1 - CSIE -NCKU
Download
Report
Transcript 投影片 1 - CSIE -NCKU
Author:
Tzu-Fang Sheu ,Nen-Fu Huang and Hsiao-Ping Lee
Publisher:
IEEE Globecom , 2006
Presenter:
Tsung-Lin Hsieh
Date:
2012/05/16
1
Introduction
Related Work
A Cost-Effective String Matching Algorithm
Results
2
The Aho-Corasick algorithm (AC) had the best
worst-case computational time complexity(
compared to other algorithms [1] ~ [7]). But is also
needs a lot of memory.
Tuck et al modified the AC with a compressed data
structure, which reduced the memory size, but also
increased the processing time[10].
3
In this paper, we will propose a practical multiplepattern matching algorithm that has better worstcase performance as well as smaller required
memory.
The proposed novel scheme is based on the
property of Chinese Remainder Theorem and
contributes modifications to the AC.
4
Aho-Corasick Algorithm
each node – 1028 bytes
-> too much memory
5
Aho-Corasick Algorithm with Bitmap
It can reduce the memory to
only 44 bytes/per state
But it needs doing “popcount”
to calculate the offset of the
starting pointer -> too much time
6
Assume that we can find a
simple function so that the
input symbols {h,s,i} can be
mapped to {0,1,2}
Assume that there is a magic number X
7
8
For example, assume we have three valid symbols
{h, s, i} that have paths to the child state as shown in
Figure 3. Assign three prime numbers {2, 3, 5} for {h,
s, i} respectively.
We want to find X % 2 = 0 ,X % 3 = 1 ,X % 5 = 2.
According to CRT -> X = 22
Using “s” for test ,prime represent “s” is 3.
22 % 3 = 1 -> when input is “s” ,visit child 1.
9
52 bytes
The Multiple-Pattern Matching with a Magic
Number
Input “ish”
10
11
ACO-100 means penalty of external access is 100
cycles
12
As the required time and memory are usually tradeoff, to compare the overall costs of these three
algorithms, we define an evaluation function
C: C = CM × CT. (CM: total memory, CT: average time)
13