Review - MIT Files

Download Report

Transcript Review - MIT Files

CS 3813: Introduction to Formal
Languages and Automata
Chapter 3
Regular languages and expressions
These class notes are based on material from our textbook, An
Introduction to Formal Languages and Automata, 3rd ed.,
by Peter Linz, published by Jones and Bartlett Publishers, Inc.,
Sudbury, MA, 2001. They are intended for classroom use only
and are not a substitute for reading the textbook.
Operations on formal languages
Let L1 = {10} and L2 = {011, 11}.
Union:
L1  L2 = {10, 011, 11}
Concatenation: L1 L2 = {10011, 1011}
Kleene Star:
L1* = {λ, 10, 1010, 101010, … }
Other operations: intersection, complement, difference
Definition Of Regular Languages
A regular language over an alphabet  is one
that contains either a single string of length 0
or 1, or strings which can be obtained by
using the operations of union, concatenation,
or Kleene* on strings of length 0 or 1.
Alternative definition of regular languages
The simplest possible regular languages are the empty set
and languages consisting of a single string that is either
the empty string or has length one. For example; if
 = {a,b}, the simplest languages are , {λ}, {a}, and {b}.
A regular language is a language that can be built from
these simple languages, by using the three operations
of union, concatenation, and Kleene star.
Regular Languages correspond to
Regular Expressions
•
•
•
•
•
•
L=Ø
L = {}
L = {a}
L = L1  L2
L = L1 L2
L = L1*
RE is Ø
RE = 
RE = a
RE = (r1 + r2)
RE = (r1r2)
RE = (r1*)
Regular expressions
A useful shorthand for describing regular languages.
Compare to arithmetic expressions, such as (x + 3)/2.
An arithmetic expression is constructed using arithmetic
operators, such as addition and division. A regular
expression is constructed using operations on languages,
such as concatenation, union, and Kleene star.
The value of an arithmetic expression is a number.
The value of a regular expression is a language.
Recursive definition of a regular expression
 is a regular expression corresponding to the language .
λ is a regular expression corresponding to the language {λ}.
For each symbol a  , a is a regular expression
corresponding to the language {a}.
For any regular expressions r and s, corresponding to
the regular languages L(r) and L(s), respectively,
each of the following is a regular expression:
(r + s) corresponds to the language L(r)  L(s)
(r · s) or (rs) corresponds to the language L(r)L(s)
(r*) corresponds to the language (L(r))*
Examples
a*+ b = {λ, a, b, aa, aaa, aaaa, aaaaa, … }
a*ba* = {w  * : w has exactly one b}
(a + b)* = any string of a’s and b’s
(a + b)*aa (a + b)* = {w  * : w contains aa}
(a + b)*aa (a + b)* + (a + b)*bb (a + b)* =
{w  * : w contains aa or bb}
(a + λ)b* = {abn : n  0} + {bn : n  0}
As with arithmetic expressions, there is an order of
precedence for operators -- unless you change it using
parentheses. The order is: star closure first, then
concatenation, then union.
More examples
All strings containing no more than two a’s:
(b + c)*(λ + a)(b + c)*(λ + a)(b + c)*
All strings containing no runs of a’s of length greater
than two:
(b + c)*(λ + a + aa)(b + c)*((b + c)(b + c)*(λ + a + aa)(b + c)*)*
All strings in which all runs of a’s have lengths that are
multiples of three:
(aaa + b + c)*
Hints for writing regular expressions
Assume  = {a, b, c}.
Zero or more a’s:
a*
One or more a’s:
aa*
Any string at all:
(a + b + c)*
Any nonempty string:
(a + b + c)(a + b + c)*
Any string that does not contain a:
(b + c)*
Any string containing exactly one a: (b + c)*a(b + c)*
Practice
Let  = {a,b,c}. Give a regular expression for the
following languages:
(a) all strings containing exactly two a’s
(b) all strings containing no more than three a’s
Practice
Let  = {a,b,c}. Give a regular expression for the
following languages:
(a) all strings containing exactly two a’s
(b + c)*a(b + c)*a(b + c)*
(b) all strings containing no more than three a’s
(b + c)*(λ + a)(b + c)*(λ + a)(b + c)*(λ + a)(b + c)*
Practice
What languages correspond to the following
regular expressions?
a*b
(aaa + bba)
(ab)*
More practice
Give regular expressions for the following languages,
where the alphabet is  = {a, b, c}.
--all strings ending in b
--all strings containing no more than two a’s
-- all strings of even length
More practice
Give regular expressions for the following languages,
where the alphabet is  = {0, 1}.
--all strings of one or more 0’s followed by a 1
--all strings of two or more symbols followed by three
or more 0’s
-- all strings that do not end with 01
Do these strings match the regular expression?
Regular expression
String
(01* + 1)
0101
(a + λ)b
b
(ab)*a*
λ
(a + b)(ab)
bb
Accepting (review)
Let M = (Q, , q0, d, A) be an FA.
• A string x   * is accepted by M if
d*(q0, x)  A
• The language accepted (or recognized) by M is the
set L(M) = {x   * | x is accepted by M}
• A language L over the alphabet  is regular iff
there is a Finite Automaton that accepts L.
Kleene’s theorem
1) For any regular expression r that represents language
L(r), there is a finite automaton that accepts that same
language.
2) For any finite automaton M that accepts language L(M),
there is a regular expression that represents the same
language.
Therefore, the class of languages that can be represented by
regular expressions is equivalent to the class of languages
accepted by finite automata -- the regular languages.
Kleene’s theorem part 1
NFA
proved
DFA
regular
expression
Kleene’s
Theorem part 2
Theorem 3.1
1st half of Kleene’s theorem: Let r be a regular
expression. Then there exists some nondeterministic
regular accepter that accepts L(r). Consequently, L(r) is
a regular language.
Proof strategy: for any regular expression,
we show how to construct an equivalent NFA.
Because regular expressions are defined recursively,
the proof is by induction.
Base step: Give a NFA that accepts each of the simple
or “base” languages, , {λ}, and {a} for each a  .
a
Inductive step: For each of the operations -- union,
concatenation and Kleene star -- show how to
construct an accepting NFA.
Closure under union:
M1
λ
λ
λ
M2
λ
Closure under concatenation:
M1
λ
M2
Closure under Kleene Star:
λ
λ
M1
λ
λ
Closure properties of Regular
Languages
• Union, concatenation, and Kleene star of two
regular languages will result in a regular
language, since we can write a regular
expression for them.
• Intersection and difference (complement) of two
regular languages will also produce a regular
language.
• The class of regular languages is said to be
closed under these operations. (More in Ch. 4.)
Exercise
Use the construction of the first half of Kleene’s
theorem to construct a NFA that accepts the language
L(ab*aa + bba*ab).
Exercise
Use the construction of the first half of Kleene’s
theorem to construct a NFA that accepts the language
L(ab*aa + bba*ab).
λ
FA accepting ab*aa
λ
q0
qf
λ
FA accepting bba*ab
λ
Exercise
Construct a NFA that accepts the language
corresponding to the regular expression:
((b(a+b)*a) + a)
Theorem 3.2
Kleene’s theorem part 2: Let L be a regular language.
Then there exists a regular expression r such that L =
L(r).
Any language accepted by a finite automaton can
be represented by a regular expression.
The proof strategy: For any DFA, we show
how create an equivalent regular expression. In other
words, we describe an algorithm for converting any
DFA to a regular expression.
Expression diagram
• A labeled directed graph (similar to a finite state
diagram) in which transitions are labeled by regular
expressions
• Has a single start state with no incoming transitions
• Has a single accepting state with no outgoing
transitions
ab
• Example:
(a+b)
a*
Algorithm for converting a DFA into an
equivalent regular expression
Initial step: Change every transition labeled a,b to (a+b).
Add a single start state with an outgoing λ-transition to
the current start state, and add a single final state with
incoming λ-transitions from every previous final state.
Main step: Until expression diagram has only two states
(initial state and final state), repeat the following:
-- pick some non-start, non-final state
-- remove it from the diagram and re-label transitions
with regular expressions so that the same language
is accepted
The key step is removing states and re-labeling
transitions with regular expressions. Here are
some examples of how to do this.
b
a
a
b
ab*a
ab*a
a
b
ab*b
a
b
a
b
ab*b
a
ab*a
Exercise
a,b
a
b
Continue ...
a
λ
(a+b)
b
λ
Exercise
a
λ
(a+b)
λ
b
(a+b)
a* b
λ
a*b (a+b)*
Exercise
Find a regular expression that corresponds to the
language accepted by the following DFA.
a
b
a
b
Exercise
a
b
a
λ
λ
b
a*bb*a
a*bb*
(a*bb*a)*a*bb*
λ
Homework
Find a regular expression that corresponds to the
language accepted by the following DFA.
0
q0
q1
1
0
1
q2
0
1
Applications of regular expressions
• Validation
– checking that an input string is in valid format
– example 1: checking format of email address on
WWW entry form
– example 2: UNIX regex command
• Search and selection
– looking for strings that match a certain pattern
– example: UNIX grep command
• Tokenization
– converting sequence of characters (a string) into
sequence of tokens (e.g., keywords, identifiers)
– used in lexical analysis phase of compiler
Grammar
• A grammar G = (V, T, S, P) consists of the
following quadruple:
– a set V of variables (non-terminal symbols),
including a starting symbol S  NT
– a set T of terminals (same as an alphabet, )
– A start symbol S  V
– a set P of production rules
• Example:
S  aS | A
A bA | λ
Derivation
• Strings are “derived” from a grammar
• Example of a derivation
S  aS  aaS  aaA  aabA  aab
• At each step, a nonterminal is replaced by
the sentential form on the right-hand side
of a rule (a sentential form can contain
nonterminals and/or terminals)
• Automata recognize languages; grammars
generate languages
Context-free grammar
• A grammar is said to be context-free if every
rule has a single non-terminal on the left-hand
side
• This means you can apply the rule in any
context. More complicated languages (such as
English) have context-dependent rules. A
language generated from a context-free
grammar is called a context-free language
Regular grammar
• A grammar is said to be right-linear if all
productions are of the form AxB or Ax,
where A and B are variables and x is a string of
terminals
• A grammar is said to be left-linear if all
productions are of the form ABx or Ax
• A regular grammar is either right-linear or leftlinear.
Linear grammar
• A grammar can be linear without being rightor left-linear.
• A linear grammar is a grammar in which at
most one variable can occur on the right side of
any production rule, without any restriction on
the position of the variable.
• Example:
S  aS | A
A Ab | λ
Another formalism for regular languages
• Every regular grammar generates a regular
language, and every regular language can be
generated by a regular grammar.
• A regular grammar is a simpler, specialcase of a context-free grammar
• The regular languages are a proper subset of
the context-free languages
Exercises
• Find a regular grammar that generates the
language on  = {a,b} consisting of all
strings with no more than three a’s
Exercises
• Find a regular grammar that generates the
language on  = {a,b} consisting of all
strings with no more than three a’s
S  bS | aA | λ
A  bA | aB | λ
B  bB | aC | λ
C  bC | λ
Exercises
• Find a regular grammar that generates the
language consisting of even-length strings
over {a,b}
Exercises
• Find a regular grammar that generates the
language consisting of even-length strings
over {a,b}
S  aaS | abS | baS | bbS | λ
Exercise
What language is generated by the following
context-free (but not regular) grammar?
S  aSa | bSb | a | b | λ
Exercise
What language is generated by the following
context-free grammar?
S  aSa | bSb | a | b | λ
The odd/even palindrome language:
L = {w(a+b+λ)wR}
Exercise
• Given a grammar, you should be able to say
what language it generates
• Use set notation to define the language
generated by the following grammars
1)
S  aaSB | λ
B  bB | b
2)
S  aSbb | A
A  cA | c
Exercise
S  aaSB | λ
B  bB | b
It helps to list some of the strings that can be formed:
S  aaSB  aaB  aab
S  aaSB  aaB  aabB  aabb
S  aaSB  aaB  aabB  aabbB  aabbb
S  aaSB  aaB  aabB  aabbB  aabbbB  aabbbb
S  aaSB  aaaaSBB  aaaaBB  aaaaBb  aaaabb
S  aaSB  aaaaSBB  aaaaBB  aaaaBbB  aaaaBbb 
aaaabbb
What is the pattern?
L = {(aa)nbnb*}
Exercise
• Given a language, you should be able give a
grammar that generates it.
• For example, give a regular (right-linear)
grammar for the language consisting of all
strings over {a, b, c} that begin with a,
contain exactly two b’s, and end with cc.
Exercise
• Give a regular (right-linear) grammar for
the language consisting of all strings over
{a, b, c} that begin with a, contain exactly
two b’s, and end with cc
S  aA
A  bB | aA | cA
B  bC | aB | cB
C  aC | cC | cD
Dc
Theorem 3.3
• Every language generated by a rightlinear grammar is regular.
• Proof:
– Specify a procedure for automatically
constructing an NFA that mimics the
derivations of a right-linear grammar.
Theorem 3.3
• Justification:
The sentential forms produced by a right linear
grammar have exactly one variable, which occurs
as the rightmost symbol.
Assume that our grammar has a production rule
D  dE
and that, during the derivation of a string, there is a
step wcD  wcdE
We can construct an NFA which has states D and E,
and an arc labeled d from D to E.
NFAs can be converted to DFAs.
All languages accepted by DFAs are regular.
Theorem 3.3
Construction:
For each variable Vi in the grammar there will be a state
in the automaton labeled Vi.
The initial state of the automaton will be labeled V0 and
will correspond to the S variable in the grammar.
For each production rule Vi  a1a2…amVj the
automaton will have transitions such that
δ*(Vi  a1a2…am) = Vj
For each production rule Vi  a1a2…am the automaton
will have transitions such that
δ*(Vi  a1a2…am) = Vfinal
Theorem 3.3
Construct an NFA that accepts the language generated
by the grammar:
S  aA
convert to: V0  aV1
A abS | b
V1  abV0 | b
a
V0
b
V1
a
b
Vf
Theorem 3.4
• Every regular language can be generated
by a right-linear grammar.
• Proof:
– Generate a DFA for the language.
– Specify a procedure for automatically
constructing a right-linear grammar from
the DFA.
Theorem 3.4
• Given a regular language L, let M = (Q, , δ,
q0, F) be a DFA that accepts L. Let Q = {q0,
q1, …, qn} and  = {a1, a2, …, am}.
• Construct the grammar G = (V, T, S, P) with:
V = {q0, q1, …, qn}
T = {a1, a2, …, am}
S = q0.
P = {} initially.
• P, the set of production rules, is constructed
as follows:
Theorem 3.4
• For each transition
δ(qi, aj) = qk
in the transition table of M, add to P the
production:
qi  ajqk
• If qk is in F, then add to P the production:
qk  λ
Theorem 3.4
• Example: Construct a right-linear grammar for
the language L = L(aab*a)
• First, build an NFA for L:
q0
a
q1
a
q2
b
a
qf
Theorem 3.4
q0
a
q1
a
q2
a
b
P = {} initially.
Add to P a rule for each transition in the NFA:
q0  aq1
q1  aq2
q2  bq2
q2  aqf
Since qf is in F, add to P the production:
qf  λ
qf
Theorem 3.4
q0
a
q1
a
q2
a
qf
b
Now P =
{q0  aq1
q1  aq2
q2  bq2
q2  aqf
qf  λ }
You can convert to normal grammar notation:
S  aA
A  aB
B  bB
B  aC
Cλ
Theorem 3.5
A language L is regular if and only if there exists a
left-linear grammar G such that L = L(G).
Proof:
The strategy here is a little tricky. We describe an
algorithm to construct a right-linear grammar that
generates the reverse of all the strings generated by the
left-linear grammar.
Theorem 3.5
Given any left-linear grammar we can construct from it an
right-linear grammar G’ by replacing productions of the
form:
A  Bv
with A  vRB
and
Av
with A  vR
Since L(G’) is generated by a right-linear grammar, it is
regular.
It can be demonstrated that L(G) = (L(G’))R.
It can be proven that the reverse of any regular language is
also regular (see exercise 12, section 2.3 in the Linz text).
Hence, L is regular.
Theorem 3.6
A language L is regular if and only if there exists a
regular grammar G such that L = L(G).
Proof:
Combine our definition of regular grammars,
which includes the statement, “A regular grammar is
either right-linear or left-linear”, with theorems 3.4 and
3.5
3 ways of specifying regular languages
Regular expressions
describe
DFA
NFA
accept
Regular languages
generate
Regular grammars