Day11_RegularExpressions - Rose

Download Report

Transcript Day11_RegularExpressions - Rose

MA/CSSE 474
Theory of Computation
Regular Expressions Intro
Your Questions?
• Previous class days'
material
• Reading Assignments
• HW5 problems
• Anything else
Still more
language
ambiguity!
Regular Languages
Describes
Regular Expression
Regular
Language
Accepts
Finite State
Machine
Regular Expressions
The regular expressions over an alphabet  are the
strings that can be obtained as follows:
1.  is a regular expression.
2.  is a regular expression.
3. Every element of  is a regular expression.
4. If  ,  are regular expressions, then so is .
5. If  ,  are regular expressions, then so is .
6. If  is a regular expression, then so is *.
7.  is a regular expression, then so is +.
8. If  is a regular expression, then so is ().
#7 is here for convenience only (syntactic sugar);
many authors do not include + in the list of r.e. builders.
Regular Expression Examples
If  = {a, b}, the following are regular expressions:


a
(a  b)*
(abba  )+(a  bab)
Regular Expressions Define Languages
Define L, a semantic interpretation function for regular
expressions (Let  and  be arbitrary regular
expressions over alphabet ).
1. L() = .
2. L() = {}.
3. If c   , L(c) = {c}.
4. L() = L() L().
5. L(  ) = L()  L().
6. L(*) = (L())*.
7. L(+) = L(*) = L() (L())*. If L() is equal to , then
L(+) is also equal to . Otherwise L(+) is the
language that is formed by concatenating together one
or more strings drawn from L().
8. L(()) = L().
The Role of the Rules
• Rules 1, 3, 4, 5, and 6 give the language its power to
define sets.
• Rule 8 has as its only role grouping other operators.
• Rules 2 and 7 appear to add functionality to the
regular expression language, but they don’t.
2.  is a regular expression.
7.  is a regular expression, then so is +.
Operator Precedence in Regular Expressions
Regular
Expressions
Highest
Lowest
Kleene star and +
Arithmetic
Expressions
exponentiation
concatenation
multiplication
union
addition
a b*  c d*
x y2 + i j 2
Analyzing a Regular Expression
L((a  b)*b) = L((a  b)*) L(b)
= (L((a  b)))* L(b)
= (L(a)  L(b))* L(b)
= ({a}  {b})* {b}
= {a, b}* {b}.
From English to reg exps
L = {w  {a, b}*: |w| is even}
L = {w  {0, 1}*: w is a binary representation of a
multiple of 4}
L = {w  {a, b}*: w contains an odd number of a’s}
The Details Matter
a*  b*  (a  b)*
(ab)*  a*b*
More Regular Expression Examples
L ( (aa*)   ) =
L ( (a  )* ) =
L = {w  {a, b}*: there is no more than one b in w}
L = {w  {a, b}* : no two consecutive letters in w are the
same}
The Details Matter
L1 = {w  {a, b}* : every a is immediately followed a b}
A regular expression for L1:
A FSM for L1:
L2 = {w  {a, b}* : every a has a matching b somewhere}
A regular expression for L2:
A FSM for L2:
Kleene’s Theorem
Finite state machines and regular expressions define
the same class of languages.
To prove this, we must show:
Theorem: Any language that can be defined by a
regular expression can be accepted by some FSM
and so is regular.
Theorem: Every regular language (i.e., every language
that can be accepted by some DFSM) can be
defined with a regular expression.
For Every Regular Expression
There is a Corresponding FSM
We’ll show this by construction. An FSM for:
:
A single element c of :
:
Union
If  is the regular expression    and if both L() and
L() are regular:
Concatenation
If  is the regular expression  and if both L() and L()
are regular:
Kleene Star
If  is the regular expression * and if L() is regular:
An Example
(b  ab)*
An FSM for b
An FSM for ab:
An FSM for a
An FSM for b
An Example
(b  ab)*
An FSM for (b  ab):
An Example
(b  ab)*
An FSM for (b  ab)*:
For Every FSM There is a
Corresponding Regular Expression
• We’ll show this by construction.
The construction is different than the textbook's.
• Let M = ({q1, …, qn}, , , q1, A) be a DFSM.
Define Rijk to be the set of all strings x  * such that
• (qi,x) |-M* (qj, ), and
• if (qi,y) |-M* (q𝓁, ), for any prefix y of x
(except y=  and y=x), then 𝓁  k
• That is, Rijk is the set of all strings that take us from qi to
qj without passing through any intermediate states
numbered higher than k.
• In this case, "passing through" means both entering
and leaving.
• Note that either i or j (or both) may be greater than k.
Example: Rijk
• Rijk is the set of all strings that take us from qi to qj without
passing through any intermediate states numbered
higher than k.
• In this case, "passing through" means both entering
and leaving.
• Note that either i or j (or both) may be greater than k.
R000
R010
R011
R021
R022
R232
R233
DFAReg. Exp. construction
• Rijk is the set of all strings that take M from qi
to qj without passing through any
intermediate states numbered higher than k.
• Examples: Rijn is
• Also note that L(M) is the union of R1jn over
all qj in A.
• We will show that for all i,j{1, …, n} and
all k {0, …, n}, Rijk is defined by a regular
expression.
– We already know that the union of languages
defined by reg. exps. is defined by a reg. exp.
DFAReg. Exp. continued
• Rijk is the set of all strings that take M from qi to qj without
passing through any intermediate states numbered higher
than k.
It can be computed recursively:
• Base cases (k = 0):
– If i  j, Rij0 = {a : (qi, a) = qj}
– If i = j, Rii0 = {a : (qi, a) = qi}  {}
• Recursive case (k > 0):
Rijk is Rijk-1  Rikk-1(Rkkk-1)*Rkjk-1
• We show by induction that each Rijk is
defined by some regular expression rijk.
DFAReg. Exp. Proof pt. 1
• Base case definition (k = 0):
– If i  j, Rij0 = {a : (qi, a) = qj}
– If i = j, Rii0 = {a : (qi, a) = qi}  {}
• Base case proof:
Rij0 is a finite set of symbols, each of which is either 
or a single symbol from .
So Rij0 can be defined by the reg. exp.
rij0 = a1a2…ap (or a1a2…ap if i=j),
where {a1, a2, …,ap} is the set of all symbols a such
that (qi, a) = qj.
• Note that if M has no direct transitions from qi to qj,
then rij0 is  (it is  if i=j and no "loop" on that state).
DFAReg. Exp. Proof pt. 2
• Recursive definition (k > 0):
Rijk is Rijk-1  Rikk-1(Rkkk-1)*Rkjk-1
• Induction hypothesis: For each 𝓁 and 𝓂,
there is a regular expression r𝓁𝓂k-1 such that
L(r𝓁𝓂k-1 )= R𝓁𝓂k-1.
• Induction step. By the recursive parts of the
definition of regular expressions and the
languages they define, and by the above
recursive defintion of Rijk :
Rijk = L(rijk-1  rikk-1(rkkk-1)*rkjk-1)
DFAReg. Exp. Proof pt. 3
• We showed by induction that each Rijk is
defined by some regular expression rijk.
• In particular, for all qjA, there is a regular
expression r1jn that defines R1jn.
• Then L(M) = L(r1j1n  … r1jpn ),
where A = {qj1, …, qjp}
An Example
0
Start
q1
q2
0
1
r11k
r12k
r13k
r21k
r22k
r23k
r31k
r32k
r33k
k=0

0
1
0

1

01

k=1

0
1
0
  00
1  01

01

1
q3
0,1
k=2
(00)*
0(00)*
0*1
0(00)*
(00)*
0*1
(0  1)(00)*0
(0  1)(00)*
 (0  1)0*1