Day11_RegularExpressions - Rose
Download
Report
Transcript Day11_RegularExpressions - Rose
MA/CSSE 474
Theory of Computation
Regular Expressions Intro
Your Questions?
• Previous class days'
material
• Reading Assignments
• HW5 problems
• Anything else
Still more
language
ambiguity!
Regular Languages
Describes
Regular Expression
Regular
Language
Accepts
Finite State
Machine
Regular Expressions
The regular expressions over an alphabet are the
strings that can be obtained as follows:
1. is a regular expression.
2. is a regular expression.
3. Every element of is a regular expression.
4. If , are regular expressions, then so is .
5. If , are regular expressions, then so is .
6. If is a regular expression, then so is *.
7. is a regular expression, then so is +.
8. If is a regular expression, then so is ().
#7 is here for convenience only (syntactic sugar);
many authors do not include + in the list of r.e. builders.
Regular Expression Examples
If = {a, b}, the following are regular expressions:
a
(a b)*
(abba )+(a bab)
Regular Expressions Define Languages
Define L, a semantic interpretation function for regular
expressions (Let and be arbitrary regular
expressions over alphabet ).
1. L() = .
2. L() = {}.
3. If c , L(c) = {c}.
4. L() = L() L().
5. L( ) = L() L().
6. L(*) = (L())*.
7. L(+) = L(*) = L() (L())*. If L() is equal to , then
L(+) is also equal to . Otherwise L(+) is the
language that is formed by concatenating together one
or more strings drawn from L().
8. L(()) = L().
The Role of the Rules
• Rules 1, 3, 4, 5, and 6 give the language its power to
define sets.
• Rule 8 has as its only role grouping other operators.
• Rules 2 and 7 appear to add functionality to the
regular expression language, but they don’t.
2. is a regular expression.
7. is a regular expression, then so is +.
Operator Precedence in Regular Expressions
Regular
Expressions
Highest
Lowest
Kleene star and +
Arithmetic
Expressions
exponentiation
concatenation
multiplication
union
addition
a b* c d*
x y2 + i j 2
Analyzing a Regular Expression
L((a b)*b) = L((a b)*) L(b)
= (L((a b)))* L(b)
= (L(a) L(b))* L(b)
= ({a} {b})* {b}
= {a, b}* {b}.
From English to reg exps
L = {w {a, b}*: |w| is even}
L = {w {0, 1}*: w is a binary representation of a
multiple of 4}
L = {w {a, b}*: w contains an odd number of a’s}
The Details Matter
a* b* (a b)*
(ab)* a*b*
More Regular Expression Examples
L ( (aa*) ) =
L ( (a )* ) =
L = {w {a, b}*: there is no more than one b in w}
L = {w {a, b}* : no two consecutive letters in w are the
same}
The Details Matter
L1 = {w {a, b}* : every a is immediately followed a b}
A regular expression for L1:
A FSM for L1:
L2 = {w {a, b}* : every a has a matching b somewhere}
A regular expression for L2:
A FSM for L2:
Kleene’s Theorem
Finite state machines and regular expressions define
the same class of languages.
To prove this, we must show:
Theorem: Any language that can be defined by a
regular expression can be accepted by some FSM
and so is regular.
Theorem: Every regular language (i.e., every language
that can be accepted by some DFSM) can be
defined with a regular expression.
For Every Regular Expression
There is a Corresponding FSM
We’ll show this by construction. An FSM for:
:
A single element c of :
:
Union
If is the regular expression and if both L() and
L() are regular:
Concatenation
If is the regular expression and if both L() and L()
are regular:
Kleene Star
If is the regular expression * and if L() is regular:
An Example
(b ab)*
An FSM for b
An FSM for ab:
An FSM for a
An FSM for b
An Example
(b ab)*
An FSM for (b ab):
An Example
(b ab)*
An FSM for (b ab)*:
For Every FSM There is a
Corresponding Regular Expression
• We’ll show this by construction.
The construction is different than the textbook's.
• Let M = ({q1, …, qn}, , , q1, A) be a DFSM.
Define Rijk to be the set of all strings x * such that
• (qi,x) |-M* (qj, ), and
• if (qi,y) |-M* (q𝓁, ), for any prefix y of x
(except y= and y=x), then 𝓁 k
• That is, Rijk is the set of all strings that take us from qi to
qj without passing through any intermediate states
numbered higher than k.
• In this case, "passing through" means both entering
and leaving.
• Note that either i or j (or both) may be greater than k.
Example: Rijk
• Rijk is the set of all strings that take us from qi to qj without
passing through any intermediate states numbered
higher than k.
• In this case, "passing through" means both entering
and leaving.
• Note that either i or j (or both) may be greater than k.
R000
R010
R011
R021
R022
R232
R233
DFAReg. Exp. construction
• Rijk is the set of all strings that take M from qi
to qj without passing through any
intermediate states numbered higher than k.
• Examples: Rijn is
• Also note that L(M) is the union of R1jn over
all qj in A.
• We will show that for all i,j{1, …, n} and
all k {0, …, n}, Rijk is defined by a regular
expression.
– We already know that the union of languages
defined by reg. exps. is defined by a reg. exp.
DFAReg. Exp. continued
• Rijk is the set of all strings that take M from qi to qj without
passing through any intermediate states numbered higher
than k.
It can be computed recursively:
• Base cases (k = 0):
– If i j, Rij0 = {a : (qi, a) = qj}
– If i = j, Rii0 = {a : (qi, a) = qi} {}
• Recursive case (k > 0):
Rijk is Rijk-1 Rikk-1(Rkkk-1)*Rkjk-1
• We show by induction that each Rijk is
defined by some regular expression rijk.
DFAReg. Exp. Proof pt. 1
• Base case definition (k = 0):
– If i j, Rij0 = {a : (qi, a) = qj}
– If i = j, Rii0 = {a : (qi, a) = qi} {}
• Base case proof:
Rij0 is a finite set of symbols, each of which is either
or a single symbol from .
So Rij0 can be defined by the reg. exp.
rij0 = a1a2…ap (or a1a2…ap if i=j),
where {a1, a2, …,ap} is the set of all symbols a such
that (qi, a) = qj.
• Note that if M has no direct transitions from qi to qj,
then rij0 is (it is if i=j and no "loop" on that state).
DFAReg. Exp. Proof pt. 2
• Recursive definition (k > 0):
Rijk is Rijk-1 Rikk-1(Rkkk-1)*Rkjk-1
• Induction hypothesis: For each 𝓁 and 𝓂,
there is a regular expression r𝓁𝓂k-1 such that
L(r𝓁𝓂k-1 )= R𝓁𝓂k-1.
• Induction step. By the recursive parts of the
definition of regular expressions and the
languages they define, and by the above
recursive defintion of Rijk :
Rijk = L(rijk-1 rikk-1(rkkk-1)*rkjk-1)
DFAReg. Exp. Proof pt. 3
• We showed by induction that each Rijk is
defined by some regular expression rijk.
• In particular, for all qjA, there is a regular
expression r1jn that defines R1jn.
• Then L(M) = L(r1j1n … r1jpn ),
where A = {qj1, …, qjp}
An Example
0
Start
q1
q2
0
1
r11k
r12k
r13k
r21k
r22k
r23k
r31k
r32k
r33k
k=0
0
1
0
1
01
k=1
0
1
0
00
1 01
01
1
q3
0,1
k=2
(00)*
0(00)*
0*1
0(00)*
(00)*
0*1
(0 1)(00)*0
(0 1)(00)*
(0 1)0*1