Transcript q 1
15-453
FORMAL LANGUAGES,
AUTOMATA, AND
COMPUTABILITY
REGULAR EXPRESSIONS
THURSDAY Jan 21
WHAT DOES D LOOK LIKE?
D = { w | w has equal number of
occurrences of 01 and 10}
= { w | w = 1, w = 0, w = ε or
w starts with a 0 and ends with a 0 or
w starts with a 1 and ends with a 1 }
1 0 ε 0(01)*0 1(01)*1
REGULAR EXPRESSIONS
is a regexp representing {}
ε is a regexp representing {ε}
is a regexp representing
If R1 and R2 are regular expressions
representing L1 and L2 then:
(R1R2) represents L1 L2
(R1 R2) represents L1 L2
(R1)* represents L1*
PRECEDENCE
Tightest
Loosest
Star (“*”)
Concatenation (“.”, “”)
Union (“”, “+”, “|”)
EXAMPLE
R1*R2 R3 = ( ( R1* ) R2 ) R3
{ w | w has exactly a single 1 }
0*10*
What language does
* represent?
{ε}
{ w | w has length ≥ 3 and its 3rd symbol is 0 }
000(01)* 010(01)*
100(01)* 110(01)*
(01)(01)0(01)*
{ w | every odd position of w is a 1 }
1((01)1)*(01ε) ε
Also
(1(0 1))*(1 ε)
EQUIVALENCE
L can be represented by a regexp
L is regular
Induction on the length of R:
Given regular expression R, we show there
exists NFA N such that R represents L(N)
L can be represented by a regexp
L is regular
Base Cases (R has length 1):
Given regular expression R, we show there
exists
R NFA
= N such that R represents L(N)
Induction on the length of R:
R=ε
R=
Inductive Step:
Assume R has length k > 1,
and that every regular expression of length < k
represents a regular language
Three possibilities for R:
R = R1 R2
(Union Theorem!)
R = R1 R2
(Concatenation)
R = (R1)*
(Star)
Therefore: L can be represented by a regexp
L is regular
Have Shown
L can be represented by a regexp
L is a regular language
Give an NFA that accepts the
language represented by (1(0 1))*
ε
1
1,0
ε
L can be represented by a regexp
L is a regular language
Proof idea: Transform an NFA for L into a
regular expression by removing states and
re-labeling arrows with regular expressions
ε
ε
ε
NFA
ε
ε
Add While
uniquemachine
and distinct
start and
states
has more
thanaccept
2 states:
Pick an internal state, rip it out and
re-label the arrows with regexps,
to account for the missing state
0
01*0
1
0
ε
ε
GNFA
ε
ε
ε
While machine has more than 2 states:
In general:
R(q1,q3)
R(qR(q
)R(q
)*R(q
1,q1
2,q
2,q
2) 2,q2R(q
2,q
3) 3)
q2
q3
q1
R(q1,q3)
R(q2,q2)
a
q0
ε
a,b
(a*b)(ab)*
b
a*b
q1
q2
R(q0,q3) = (a*b)(ab)*
represents L(N)
ε
q3
Formally: Add qstart and qaccept to create G
Run CONVERT(G):
If #states = 2
(Outputs a regexp)
return the
If #states
> 2 expression on the arrow
going from qstart to qaccept
Formally: Add qstart and qaccept to create G
Run CONVERT(G):
(Outputs a regexp)
If #states > 2
select qripQ different from qstart and qaccept
define Q = Q – {qrip}
define R as:
}
Defines: G (GNFA)
R(qi,qj) = R(qi,qrip)R(qrip,qrip)*R(qrip,qj) R(qi,qj)
(R = the regexps for edges in G)
return CONVERT(G)
Claim: CONVERT(G) is equivalent to G
Proof by induction on k (number of states in G)
Base Case:
k=2
Inductive Step:
Assume claim is true for k-1 state GNFAs
We first note that G and G are equivalent
But, by the induction hypothesis, G is
equivalent to CONVERT(G)
Thus: CONVERT(G) equivalent to CONVERT(G)
QED
b
bb
a
q1
q2
a
ε
b
ε
b
a
q3
ε
b
bb
a
q1
ε
q2
a aba
b
b
ε
a
ε
b
bb (abb
ba)b*a
a
q1
ε
a ba
q2
b (a ba)b*ε
(bb (a ba)b*a)* (b (a ba)b*)
DFA
NFA
DEFINITION
Regular
Language
Regular
Expression
FLAC
Read Chapter 2.1 of the book for next time
www.cs.cmu.edu/~15453