A, S - Xiaoyin Wang

Download Report

Transcript A, S - Xiaoyin Wang

Formal Languages and
Grammars
Xiaoyin Wang
CS 5363
Spring 2016
1
Last Class
 Compilers: Introduction
– Why Compilers?
– Input and Output
– Structure of Compilers
– Compiler Design
2
Today’s Class
 Formal Languages
 What are languages?
 What are grammars?
 Phrase-Structure Grammars
 Classification of Grammars (and corresponding
languages)
3
3
Intro to Languages
 English grammar tells us if a given
combination of words is a valid sentence.
 The syntax of a sentence concerns its
form while the semantics concerns its
meaning, e.g.
 the mouse wrote a poem
 From a syntax point of view this is a valid
sentence.
4
Intro to Languages
 From a semantics point of view not so
fast…perhaps in Disney land
 Natural languages (English, French, etc)
have very complex rules of syntax and not
necessarily well-defined.
 Long time no see.
 Here you are.
5
Formal Language
 Formal language
 well-defined
 finite set of rules
 The key problem:
 How to express infinite sentences with finite
set of rules?
 The answer is recursion
 Example: Natural Numbers {1, 2, 3, …}
N  {1}  {x  1 | x  N }
6
Grammars
 A formal grammar G is compact, precise
mathematical definition of a language L.
–
–
–
–
Not a list of examples
Finite
Fixed
Typically use recursion to resolve infinity
7
Grammars
 A grammar implies an algorithm that would
generate all legal sentences of the language.
– Often, it takes the form of a set of recursive
definitions.
– An example of recursive definition:
 <expression> -> <expression> + 1
 <expression> -> 1
 {1, 1+1, 1+1+1, 1+1+1+1, …}
8
What is Grammar?
 Answer two key questions:
 Is a combination of words a valid sentence in a
formal language?
 How can we generate the valid sentences of a
formal language?
 Grammars provide models for both natural
languages and programming languages.
9
Grammars
• Example: A grammar that generates a
subset of the English language
sentence  noun _ phrase
noun _ phrase  article
predicate
noun
predicate  verb
10
article  a
article  the
noun  boy
noun  dog
verb  runs
verb  sleeps
11
A derivation of “the boy sleeps”:
sentence  noun _ phrase
predicate
 noun _ phrase
verb
 article
verb
 the noun
noun
verb
 the boy verb
 the boy sleeps
12
A derivation of “a dog runs”:
sentence  noun _ phrase
predicate
 noun _ phrase
verb
 article
noun
verb
 a noun
verb
 a dog verb
 a dog runs
13
Language of the grammar
L = { “a boy runs”,
“a boy sleeps”,
“the boy runs”,
“the boy sleeps”,
“a dog runs”,
“a dog sleeps”,
“the dog runs”,
“the dog sleeps” }
Problems with
the Grammar?
14
Notation
noun  boy
noun  dog
Variable
or
Non-terminal
Production
rule
Terminal
Symbols of
the vocabulary
Symbols of
the vocabulary
15
Basic Terminology
 A vocabulary / alphabet, V is a finite nonempty set of elements called symbols.
 Example: V = {a, b, c, A, B, C, S}
 A word/sentence over V is a string of
finite length of elements of V.
 Example: Aba
 The empty/null string, ε is the string with
no symbols.
16
Basic Terminology
 V* is the set of all words over V.
 Example: V* = {Aba, BBa, bAA, cab …}
 A language over V is a subset of V*.
 We can give some criteria for a word to be in
a language.
17
Phrase-Structure Grammars
 A popular way to specify a grammar
recursively
 A phrase-structure grammar (abbr. PSG)
G = (V,T,S,P) is a 4-tuple, in which:
– V is a vocabulary (set of symbols)
 The “template vocabulary” of the language.
– T V is a set of symbols called terminals
 Actual symbols of the language.
 Also, N :≡ V − T is a set of special “symbols” called nonterminals. (Representing concepts like “noun”)
18
Phrase-Structure Grammars
– SN is a special non-terminal, the start symbol.
 in our example the start symbol was “sentence”.
– P is a set of productions (to be defined).
 Rules for substituting one sentence fragment for another
 Every production rule must contain at least one nonterminal on its left side.
19
Derivation
 Let G=(V,T,S,P)
grammar
be a phrase-structure
 Let w0=lz0r (the concatenation of l, z0, and
r) and w1=lz1r be strings over V
 If z0  z1 is a production of G we say that
w1 is directly derivable from w0 and we
write wo => w1
22
Derivation
• If w0, w1, …., wn are strings over V such that
w0 =>w1,w1=>w2,…, wn-1 => wn, then we say
that wn is derivable from w0, and write
w0=>*wn.
• The sequence of steps used to obtain wn from
wo is called a derivation.
23
Example
 Let G = (V, T, S, P)




V = {a, b, A, S}
T = {a, b}
S is a start symbol
P = {S → aA, A → aa, A → b}
 Process:




L(G)T = {S}
L(G)T = {S, aA}
L(G)T = {S, aA, aaa, ab}
L(G) = {aaa, ab}
27
Language
• Let G(V,T,S,P) be a phrase-structure grammar. The
language generated by G (or the language of G)
denoted by L(G) , is the set of all strings of terminals
that are derivable from the starting state S.
• L(G)= {w
 T* | S =>*w}
28
28
Languages of PSG
 EXAMPLE:





Let G = (V, T, S, P),
where V = {a, b, A, B, S}
T = {a, b},
S is a start symbol
P = {S → Ab | aB, A → aS, B → Sb, A → a, B → b}.
 G is a Phrase-Structure Grammar.
What sentences can be generated
with this grammar?
29
Languages of PSG
P = {S → Ab | aB, A → aS, B → Sb, A → a, B → b}.
 Language of the grammar:
S => Ab => ab
S => Ab => aSb => aAbb => aabb
…
S => aB => aSb => aaBb => aabb
…
L = {anbn, n>=1}
 How to prove?
 Mathematical Induction
30
Languages of PSG
P = {S → Ab | aB, A → aS, B → Sb, A → a, B → b}
L1 = {anbn, n>=1}
(1) Proving L1 L(P)
N  1 : ab  L( P)
N  x : a x b x  L( P )
 S  a xb x
 S  aB, B  Sb
 S  a x 1b x 1
31
Languages of PSG
P = {S → Ab | aB, A → aS, B → Sb, A → a, B → b}
L1 = {anbn, n>=1}
(2) Proving L(P)  L1
Using derivation steps as N
N  2 : L( P) 2  {ab}  L1
N  x : w S  x w, w  L1
Within x+2 steps, enumerating all productions,
we can only derive:
awb, w belongs to L1, so awb also belongs to L1
32
Another PSG Example – English
Fragment
 We have G = (V, T, S, P), where:
 V = {(sentence), (noun phrase), (verb phrase),
(article), (adjective), (noun), (verb), (adverb), a,
the, large, hungry, rabbit, mathematician, eats, hops,
quickly, wildly}
 T = {a, the, large, hungry, rabbit, mathematician,
eats, hops, quickly, wildly}
 S = (sentence)
 P = (see next slide)
33
Productions for our Language
P = { (sentence) → (noun phrase) (verb phrase)
(noun phrase) → (article) (adjective) (noun)
(noun phrase) → (article) (noun)
(verb phrase) → (verb) (adverb)
(verb phrase) → (verb)
(article) → a
(article) → the
(adjective) → large
(adjective) → hungry
(noun) → rabbit
(noun) → mathematician
(verb) → eats
(verb) → hops
(adverb) → quickly
(adverb) → wildly }
34
A Sample Sentence Derivation
(sentence)
(noun phrase)
(verb phrase)
(article) (adj.) (noun) (verb phrase)
(art.) (adj.) (noun) (verb) (adverb)
the
(adj.) (noun) (verb) (adverb)
the large (noun) (verb) (adverb)
the large
rabbit (verb) (adverb)
the
large rabbit hops (adverb)
the
large rabbit hops quickly
On each step,
we apply a
production to a
fragment of the
previous sentence
template to get a
new sentence
template. Finally,
we end up with a
sequence of
terminals (real
words), that is, a
sentence of our
language L.
35
Another Example
V
T
• Let G = ({a, b, A, B, S}, {a, b}, S,
P
• {S → ABa, A → BB, B → ab, AB → b}).
• One possible derivation in this grammar is:
• S  ABa  Aaba  BBaba  Bababa  abababa.
36
Defining the PSG Types
 Type 0: Phase-structure grammars
restrictions on the production rules
–
no
than
the
 Type 1: Context-Sensitive PSG:
– All after fragments are either longer
corresponding before fragments, or empty:
if b → a, then |b| < |a|

a= ε.
37
Defining the PSG Types
 Type 2: Context-Free PSG:
– All before fragments have length 1 and are nonterminals:
if b → a, then |b| = 1 (b  N).
 Type 3: Regular PSGs:
– All before fragments have length 1 and nonterminals
– All after fragments are either single terminals, or a
pair of a terminal followed by a nonterminal.
if b → a, then a  T  a  TN.
38
Types of Grammars Chomsky hierarchy of languages
• Venn Diagram of Grammar Types:
Type 0 – Phrase-structure Grammars
Type 1 –
Context-Sensitive
Type 2 –
Context-Free
Type 3 –
Regular
39
Examples: Regular Grammars
 Only the last symbol at the right hand side can be
non-terminal
 E.g. P = {S → aB, B → bB, B → a}
 Generates the language ab*a
 The restriction can be either on the right or left
 E.g., P = {S → Ba, B → Bb, B → a}
 Also called linear grammars
 Why regular language is not suitable for programming
language?
40
Examples: Context Free Grammars
 Only one non-terminal on the left
 E.g. P = {S → aSa, S → b}
 Generates the language anban, n>=1
 Widely used as programming languages
 Example of non-context-free language?
 {anbncn, n > 1}
41
Examples: Context Sensitive
Grammars
 Multiple non-terminal on the left
 P = {S → aSBc | abc, cB → Bc, bB → bb}
 Generates the language anbncn, n>=1
 Exemplar Process:
S => aSBc => aabcBc => aabBcc => aabbcc
How to generate anbncndn, n>=1?
P = {S → aBSCd | abcd, Ba → aB, dC →Cd, cC → cc, Bb → bb }
42
Examples: Context Sensitive
Grammars
 For the English Grammar





An apple and A book
<noun_phrase> → <article> <noun>
<noun> → <vowel_noun> | <other_noun>
<article> <vowel_noun> → an <vowel_noun>
<article> <other_noun> → a <vowel_noun>
43
Unrestrictive grammars
 No restrictions on the productions
 Belonging-ship can be undecidable
 Language acceptable by Turing machine
 Cannot be easily described in
mathematical formula
simple
44
Property of Grammars: Regular
Language
 Closure Properties





Intersection
Union
Complement
Concatenation
Kleene Star
 Decidable Properties
 Membership
 Emptiness
 Containment
45
Property of Grammars: Context
Free Language
Closure Properties
 Union
 Concatenation
 Kleene Star
Decidable Properties
 Membership (cubic to sentence length)
 Emptiness (linear to # productions)
46
Property of Grammars: Context
Sensitive Language
Closure Properties





Union
Intersection
Complement
Concatenation
Kleene Star
Decidable Properties
 Membership (P-Space complete)
47
Today’s Class
 Formal Languages
 What are languages?
 What are grammars?
 Phrase-Structure Grammars
 Classification of Grammars (and corresponding
languages)
48
48
Next Class
 Context Free Grammar
 Grammar Norms
 CYK Algorithm
 Derivation Tree and Ambiguity
49
49