投影片 1 - CSIE -NCKU

Download Report

Transcript 投影片 1 - CSIE -NCKU

Author:
Tzu-Fang Sheu ,Nen-Fu Huang and Hsiao-Ping Lee
Publisher:
IEEE Globecom , 2006
Presenter:
Tsung-Lin Hsieh
Date:
2012/05/16
1

Introduction

Related Work

A Cost-Effective String Matching Algorithm

Results
2

The Aho-Corasick algorithm (AC) had the best
worst-case computational time complexity(
compared to other algorithms [1] ~ [7]). But is also
needs a lot of memory.

Tuck et al modified the AC with a compressed data
structure, which reduced the memory size, but also
increased the processing time[10].
3

In this paper, we will propose a practical multiplepattern matching algorithm that has better worstcase performance as well as smaller required
memory.

The proposed novel scheme is based on the
property of Chinese Remainder Theorem and
contributes modifications to the AC.
4

Aho-Corasick Algorithm
each node – 1028 bytes
-> too much memory
5

Aho-Corasick Algorithm with Bitmap

It can reduce the memory to
only 44 bytes/per state

But it needs doing “popcount”
to calculate the offset of the
starting pointer -> too much time
6

Assume that we can find a
simple function so that the
input symbols {h,s,i} can be
mapped to {0,1,2}

Assume that there is a magic number X
7
8

For example, assume we have three valid symbols
{h, s, i} that have paths to the child state as shown in
Figure 3. Assign three prime numbers {2, 3, 5} for {h,
s, i} respectively.



We want to find X % 2 = 0 ,X % 3 = 1 ,X % 5 = 2.
According to CRT -> X = 22
Using “s” for test ,prime represent “s” is 3.
22 % 3 = 1 -> when input is “s” ,visit child 1.
9
52 bytes

The Multiple-Pattern Matching with a Magic
Number

Input “ish”
10
11

ACO-100 means penalty of external access is 100
cycles
12

As the required time and memory are usually tradeoff, to compare the overall costs of these three
algorithms, we define an evaluation function
C: C = CM × CT. (CM: total memory, CT: average time)
13