Powerpoint - Universität Freiburg

Download Report

Transcript Powerpoint - Universität Freiburg

Applied Computer Science II
Chapter 1 : Regular Languages
Prof. Dr. Luc De Raedt
Institut für Informatik
Albert-Ludwigs Universität Freiburg
Germany
Overview
•
•
•
•
•
•
•
Deterministic finite automata
Regular languages
Nondeterministic finite automata
Closure operations
Regular expressions
Nonregular languages
The pumping lemma
Finite Automata
• An intuitive example : supermarket door controller
• Figures 1,2,3
•
Probabilistic counterparts exist
– Markov chains, bayesian nets, etc.
– Not in this course
A finite automaton
• Figure 1.4
• Formally
A finite automaton is a 5-tuple (Q, ,  , q0 , F )
1.Q is a finite set of states
States :q1 , q2 , q3
2.  is a finite set, the alphabet
3.  : Q    Q is the transition function
Startstate :q1
4. qo  Q is the start state
Acceptstate :q2
5. F  Q is the set of accept states
Transitions
Output : accept or reject
Describe M 1
Q  {q1 , q2 , q3}
A is the language of machine M
we write L( M )  A
A  {w | w contains at least one 1 and an
even number of 0s follow the last 1 }
  {0,1}
 defined by
0
q1 q1
q2 q3
q3 q2
q1 start state
F  {q2 }
1
q2
q2
q2
Other examples
• 7,8,9
Another example
A generalisation : Ai is the language of all strings where the sum of the numbers
is a multiple of i except that the sum is reset to 0 whenever the symbol reset appears
Automaton Bi =
1.Qi  {q0 ,..., qi }
2.  ={0,1,2, reset }
3. (q j , 0)  q j
 (q j ,1)  qk where k  ( j  1) mod i
 (q j , 2)  qk where k  ( j  2) mod i
 (q j , reset )  q0
4. qo  Q is start and accept state
Formal definition of
computation
Let M be a finite automaton (Q, ,  , q0 , F )
Let w  w1....wn be a string over 
M accepts w if a sequence of states r0 ,..., rn exists in Q such that
1.r0  q0
2. (ri , wi 1 )  ri 1 for all i  0,..., n  1
3.rn  F
M recognizes language A if A  {w | M accepts w}
A language is regular if some finite automaton recognizes it.
Designing finite automata
• Design automaton for language
consisting of binary strings with an
odd number of 1s
• Design first states
• Then transitions
• Accept and reject states
• Fig. 12
Another example
• Design an automaton to recognize the
language of binary strings containing
the string 001 as substring
• Fig 13
The regular operations
Let A and B be languages
We define :
Union :A  B  {x | x  A or x B}
Concatenation :A B  {xy | x  A and y B}
Star :A*  {x1 ,..., xn | n  0 and each xi  A}
note: always   A*
Example
A  {good , bad }
B  {boy , girl}
Regular languages are
closed under …
A set S is closed under an operation o if applying o on
elements of S yields elements of S .
Example: multiplication on natural numbers
Counterexample :division of natural numbers
Theorem 1.12
The class of the regular languages is closed under the union operation.
In other words, if A1 and A2 are regular languages, so is A1  A2
• Proof
M  (Q, ,  , q, F )
constructed from M 1  (Q1 , 1 , 1 , q1 , F1 ) and M 2  (Q2 ,  2 ,  2 , q2 , F2 )
Define
1. Q  {(r1 , r2 ) | r1  Q1 and r2  Q2 }
2.  1   2
3. ((r1 , r2 ), a)  (1 (r1 , a),  2 (r2 , a))
4.q  (q1 , q2 )
5.F  {(r1 , r2 ) | r1  F1 or r2  F2 }
Theorem 1.13
The class of the regular languages is closed under the concatenation operation.
In other words, if A1 and A2 are regular languages, so is A1 A2
Non deterministic finite
automata
• Deterministic
– One successor state
–  transitions not allowed
• Non deterministic
– Several successor states possible
–  transitions possible
• Figure 14
Deterministic versus non
deterministic computation
• Figure 15
Example
Every NFA has an
equivalent DFA
• Figures 17-18
Another NFA
Nondeterministic finite
automaton
A nondeterministic finite automaton is a 5-tuple (Q, ,  , q0 , F )
1.Q is a finite set of states
2.  is a finite set, the alphabet
3.  : Q    P(Q) is the transition function
4. qo  Q is the start state
5. F  Q is the set of accept states
 includes 
P(Q) the powerset of Q
Example
• Example 18
Formal definition of
computation
Let M be a nondeterministic finite automaton (Q, ,  , q0 , F )
Let w  w1....wn be a string over 
M accepts w if a sequence of states r0 ,..., rn exists in Q such that
1.r0  q0
2.ri 1   (ri , wi 1 ) for all i  0,..., n  1
3.rn  F
Equivalence NFA and DFA
Two machines are equivalent if they recognize the same language
Theorem 1.19
Every nondeterministic finite automaton has an equivalent finite automaton
Corollary 1.20
A language is regular if and only if some nondeterministic finite
automaton recognizes it.
Proof
An example
Closure under the regular
operations
Theorem 1.12/1.22
The class of the regular languages is closed under the union operation.
In other words, if A1 and A2 are regular languages, so is A1  A2
Theorem 1.23
The class of the regular languages is closed under the concatenation operation.
Theorem 1.24
The class of the regular languages is closed under the star operation.
Proof idea
Theorem 1.12/1.22
The class of the regular languages is closed under the union operation.
In other words, if A1 and A2 are regular languages, so is A1  A2
• INSERT FIG 1.24
• PROOF P 60 !
Proof idea
Theorem 1.23
The class of the regular languages is closed under the concatenation operation.
Proof idea
Theorem 1.24
The class of the regular languages is closed under the star operation.
Regular expressions
R 
R 
R 
R 
Applications
• Design of compilers
{, ,  }( DD*  DD* .D  D* .DD* )
where D  {0,...,9}
• awk, grep, vi … in unix (search for strings)
• Perl programming language
• Bioinformatics
– So called motifs (patterns occurring in
sequences, e.g. proteins)
Theorem1.28
A language is regular if and only if some regular expression describes it
Proof through :
Lemma1.29
If a language is described by some regular expression, then it is regular
Lemma 1.32
If a language is regular, then it is described by some regular expression
Lemma1.29
If a language is described by some regular expression, then it is regular
Lemma 1.32
If a language is regular, then it is described by some regular expression
• Two steps
– DFA into GNFA (generalized
nondeterministic finite automaton)
– GNFA into regular expression
GNFAs
• Two states q and r
are connected in
both directions
• Exception :
– Start state
– Accept state
– One direction only
• Labels are regular
expressions
A generalized nondeterministic finite automaton is a 5-tuple (Q, ,  , qstart , qaccept )
Formally
1.Q is a finite set of states
2.  is a finite set, the alphabet
3.  : (Q  {qaccept })  (Q  {qstart })  R is the transition function
4. qstart  Q is the start state
5. qaccept  Q the accept state
A GNFA accepts w  w1...wk where each wi  *
if a sequence of states r0 ,..., rn exists in Q such that
1.r0  qstart
2.rk  qaccept
3. for all i  0,..., n  1, we have that wi  L( Ri )
where Ri   (ri 1 , ri )
Convert DFA into GNFA
Add new start state, with  arrow to old start state
Add new accept state, with arrows from old accept states
If any arrows have multiple labels a and b, replace by a  b
Add arrows with label  between states where necessary
Convert GNFA into regular
expression
Induction Proof
Claim 1.34
For any GNFA G, Convert(G ) is equivalent to G
• Induction proof
Basis
k  2; straightforward
Induction
Assume claim holds for k  1 states
Suppose
Nonregular Languages
• Finite Automata have a finite memory
• Are the following languages regular ?
B  {0n1n | n  0}
C  {w | w has an equal number of 0s and 1s}
D  {w | w has an equal number of occurences of 01 and 10}
• Mathematical proof necessary
The pumping lemma
The pumping lemma
If A is regular, then there is a number p, for which
if s is any string in A of length at least p
then s may be divided into three pieces s  xyz
such that
1. for each i  0, xy i z  A
2. | y | 0
3. | xy | p
Note y  
Proof Idea
Nonregular languages
B  {0 1 | n  0}
n n
Choose s  0 p1p
1. string y consists only of 0s
2. string y consists only of 1s
3. string y consists of both 0s and 1s
C  {w | w has an equal number of 0s and 1s}
Choose s  0 p1p
Would seem possible without condition 3!
However, condition 3 of lemma states | xy | p
Thus y consists of 0s only
Then xyyz  C
Choice of s crucial. Consider s  (01) p
Alternative proof :
B is nonregular
If C were regular, then C  0*1*  B regular
Regular languages closed under intersection
F  {ww | w {0,1}*}
Choose s  0 p10 p1
Condition 3 of lemma states | xy | p
Thus y consists of 0s only
Then xyyz  F
0 p 0 p would not work !
D  {1 | n  0}
n2
Choose s  1p
2
Perfect squares : 0,1,4,9,16,25,36,49,...
Consider xy i z and xy i 1 z
Prove that for large i: xy i z and xy i 1 z cannot both be squares
which should be true according to pumping lemma
Let m  n 2 | xy i z |
Then ( n  1) 2  n 2  2n  1  2 m  1
Choose | y | 2 m  1  2 | xy i z |  1
by setting i  p 4
Indeed, observe
| y || s | p 2 
p4
<2 p 4  1
<2 | xy i z |  1
E  {w | 0i1 j where i  j}
p 1 p
Choose s  0 1
Condition 3 of lemma states | xy | p
Thus y consists of 0s only
Then xy 0 z  F
Summary
•
•
•
•
•
•
•
Deterministic finite automata
Regular languages
Nondeterministic finite automata
Closure operations
Regular expressions
Nonregular languages
The pumping lemma