A, S - Xiaoyin Wang
Download
Report
Transcript A, S - Xiaoyin Wang
Formal Languages and
Grammars
Xiaoyin Wang
CS 5363
Spring 2016
1
Last Class
Compilers: Introduction
– Why Compilers?
– Input and Output
– Structure of Compilers
– Compiler Design
2
Today’s Class
Formal Languages
What are languages?
What are grammars?
Phrase-Structure Grammars
Classification of Grammars (and corresponding
languages)
3
3
Intro to Languages
English grammar tells us if a given
combination of words is a valid sentence.
The syntax of a sentence concerns its
form while the semantics concerns its
meaning, e.g.
the mouse wrote a poem
From a syntax point of view this is a valid
sentence.
4
Intro to Languages
From a semantics point of view not so
fast…perhaps in Disney land
Natural languages (English, French, etc)
have very complex rules of syntax and not
necessarily well-defined.
Long time no see.
Here you are.
5
Formal Language
Formal language
well-defined
finite set of rules
The key problem:
How to express infinite sentences with finite
set of rules?
The answer is recursion
Example: Natural Numbers {1, 2, 3, …}
N {1} {x 1 | x N }
6
Grammars
A formal grammar G is compact, precise
mathematical definition of a language L.
–
–
–
–
Not a list of examples
Finite
Fixed
Typically use recursion to resolve infinity
7
Grammars
A grammar implies an algorithm that would
generate all legal sentences of the language.
– Often, it takes the form of a set of recursive
definitions.
– An example of recursive definition:
<expression> -> <expression> + 1
<expression> -> 1
{1, 1+1, 1+1+1, 1+1+1+1, …}
8
What is Grammar?
Answer two key questions:
Is a combination of words a valid sentence in a
formal language?
How can we generate the valid sentences of a
formal language?
Grammars provide models for both natural
languages and programming languages.
9
Grammars
• Example: A grammar that generates a
subset of the English language
sentence noun _ phrase
noun _ phrase article
predicate
noun
predicate verb
10
article a
article the
noun boy
noun dog
verb runs
verb sleeps
11
A derivation of “the boy sleeps”:
sentence noun _ phrase
predicate
noun _ phrase
verb
article
verb
the noun
noun
verb
the boy verb
the boy sleeps
12
A derivation of “a dog runs”:
sentence noun _ phrase
predicate
noun _ phrase
verb
article
noun
verb
a noun
verb
a dog verb
a dog runs
13
Language of the grammar
L = { “a boy runs”,
“a boy sleeps”,
“the boy runs”,
“the boy sleeps”,
“a dog runs”,
“a dog sleeps”,
“the dog runs”,
“the dog sleeps” }
Problems with
the Grammar?
14
Notation
noun boy
noun dog
Variable
or
Non-terminal
Production
rule
Terminal
Symbols of
the vocabulary
Symbols of
the vocabulary
15
Basic Terminology
A vocabulary / alphabet, V is a finite nonempty set of elements called symbols.
Example: V = {a, b, c, A, B, C, S}
A word/sentence over V is a string of
finite length of elements of V.
Example: Aba
The empty/null string, ε is the string with
no symbols.
16
Basic Terminology
V* is the set of all words over V.
Example: V* = {Aba, BBa, bAA, cab …}
A language over V is a subset of V*.
We can give some criteria for a word to be in
a language.
17
Phrase-Structure Grammars
A popular way to specify a grammar
recursively
A phrase-structure grammar (abbr. PSG)
G = (V,T,S,P) is a 4-tuple, in which:
– V is a vocabulary (set of symbols)
The “template vocabulary” of the language.
– T V is a set of symbols called terminals
Actual symbols of the language.
Also, N :≡ V − T is a set of special “symbols” called nonterminals. (Representing concepts like “noun”)
18
Phrase-Structure Grammars
– SN is a special non-terminal, the start symbol.
in our example the start symbol was “sentence”.
– P is a set of productions (to be defined).
Rules for substituting one sentence fragment for another
Every production rule must contain at least one nonterminal on its left side.
19
Derivation
Let G=(V,T,S,P)
grammar
be a phrase-structure
Let w0=lz0r (the concatenation of l, z0, and
r) and w1=lz1r be strings over V
If z0 z1 is a production of G we say that
w1 is directly derivable from w0 and we
write wo => w1
22
Derivation
• If w0, w1, …., wn are strings over V such that
w0 =>w1,w1=>w2,…, wn-1 => wn, then we say
that wn is derivable from w0, and write
w0=>*wn.
• The sequence of steps used to obtain wn from
wo is called a derivation.
23
Example
Let G = (V, T, S, P)
V = {a, b, A, S}
T = {a, b}
S is a start symbol
P = {S → aA, A → aa, A → b}
Process:
L(G)T = {S}
L(G)T = {S, aA}
L(G)T = {S, aA, aaa, ab}
L(G) = {aaa, ab}
27
Language
• Let G(V,T,S,P) be a phrase-structure grammar. The
language generated by G (or the language of G)
denoted by L(G) , is the set of all strings of terminals
that are derivable from the starting state S.
• L(G)= {w
T* | S =>*w}
28
28
Languages of PSG
EXAMPLE:
Let G = (V, T, S, P),
where V = {a, b, A, B, S}
T = {a, b},
S is a start symbol
P = {S → Ab | aB, A → aS, B → Sb, A → a, B → b}.
G is a Phrase-Structure Grammar.
What sentences can be generated
with this grammar?
29
Languages of PSG
P = {S → Ab | aB, A → aS, B → Sb, A → a, B → b}.
Language of the grammar:
S => Ab => ab
S => Ab => aSb => aAbb => aabb
…
S => aB => aSb => aaBb => aabb
…
L = {anbn, n>=1}
How to prove?
Mathematical Induction
30
Languages of PSG
P = {S → Ab | aB, A → aS, B → Sb, A → a, B → b}
L1 = {anbn, n>=1}
(1) Proving L1 L(P)
N 1 : ab L( P)
N x : a x b x L( P )
S a xb x
S aB, B Sb
S a x 1b x 1
31
Languages of PSG
P = {S → Ab | aB, A → aS, B → Sb, A → a, B → b}
L1 = {anbn, n>=1}
(2) Proving L(P) L1
Using derivation steps as N
N 2 : L( P) 2 {ab} L1
N x : w S x w, w L1
Within x+2 steps, enumerating all productions,
we can only derive:
awb, w belongs to L1, so awb also belongs to L1
32
Another PSG Example – English
Fragment
We have G = (V, T, S, P), where:
V = {(sentence), (noun phrase), (verb phrase),
(article), (adjective), (noun), (verb), (adverb), a,
the, large, hungry, rabbit, mathematician, eats, hops,
quickly, wildly}
T = {a, the, large, hungry, rabbit, mathematician,
eats, hops, quickly, wildly}
S = (sentence)
P = (see next slide)
33
Productions for our Language
P = { (sentence) → (noun phrase) (verb phrase)
(noun phrase) → (article) (adjective) (noun)
(noun phrase) → (article) (noun)
(verb phrase) → (verb) (adverb)
(verb phrase) → (verb)
(article) → a
(article) → the
(adjective) → large
(adjective) → hungry
(noun) → rabbit
(noun) → mathematician
(verb) → eats
(verb) → hops
(adverb) → quickly
(adverb) → wildly }
34
A Sample Sentence Derivation
(sentence)
(noun phrase)
(verb phrase)
(article) (adj.) (noun) (verb phrase)
(art.) (adj.) (noun) (verb) (adverb)
the
(adj.) (noun) (verb) (adverb)
the large (noun) (verb) (adverb)
the large
rabbit (verb) (adverb)
the
large rabbit hops (adverb)
the
large rabbit hops quickly
On each step,
we apply a
production to a
fragment of the
previous sentence
template to get a
new sentence
template. Finally,
we end up with a
sequence of
terminals (real
words), that is, a
sentence of our
language L.
35
Another Example
V
T
• Let G = ({a, b, A, B, S}, {a, b}, S,
P
• {S → ABa, A → BB, B → ab, AB → b}).
• One possible derivation in this grammar is:
• S ABa Aaba BBaba Bababa abababa.
36
Defining the PSG Types
Type 0: Phase-structure grammars
restrictions on the production rules
–
no
than
the
Type 1: Context-Sensitive PSG:
– All after fragments are either longer
corresponding before fragments, or empty:
if b → a, then |b| < |a|
a= ε.
37
Defining the PSG Types
Type 2: Context-Free PSG:
– All before fragments have length 1 and are nonterminals:
if b → a, then |b| = 1 (b N).
Type 3: Regular PSGs:
– All before fragments have length 1 and nonterminals
– All after fragments are either single terminals, or a
pair of a terminal followed by a nonterminal.
if b → a, then a T a TN.
38
Types of Grammars Chomsky hierarchy of languages
• Venn Diagram of Grammar Types:
Type 0 – Phrase-structure Grammars
Type 1 –
Context-Sensitive
Type 2 –
Context-Free
Type 3 –
Regular
39
Examples: Regular Grammars
Only the last symbol at the right hand side can be
non-terminal
E.g. P = {S → aB, B → bB, B → a}
Generates the language ab*a
The restriction can be either on the right or left
E.g., P = {S → Ba, B → Bb, B → a}
Also called linear grammars
Why regular language is not suitable for programming
language?
40
Examples: Context Free Grammars
Only one non-terminal on the left
E.g. P = {S → aSa, S → b}
Generates the language anban, n>=1
Widely used as programming languages
Example of non-context-free language?
{anbncn, n > 1}
41
Examples: Context Sensitive
Grammars
Multiple non-terminal on the left
P = {S → aSBc | abc, cB → Bc, bB → bb}
Generates the language anbncn, n>=1
Exemplar Process:
S => aSBc => aabcBc => aabBcc => aabbcc
How to generate anbncndn, n>=1?
P = {S → aBSCd | abcd, Ba → aB, dC →Cd, cC → cc, Bb → bb }
42
Examples: Context Sensitive
Grammars
For the English Grammar
An apple and A book
<noun_phrase> → <article> <noun>
<noun> → <vowel_noun> | <other_noun>
<article> <vowel_noun> → an <vowel_noun>
<article> <other_noun> → a <vowel_noun>
43
Unrestrictive grammars
No restrictions on the productions
Belonging-ship can be undecidable
Language acceptable by Turing machine
Cannot be easily described in
mathematical formula
simple
44
Property of Grammars: Regular
Language
Closure Properties
Intersection
Union
Complement
Concatenation
Kleene Star
Decidable Properties
Membership
Emptiness
Containment
45
Property of Grammars: Context
Free Language
Closure Properties
Union
Concatenation
Kleene Star
Decidable Properties
Membership (cubic to sentence length)
Emptiness (linear to # productions)
46
Property of Grammars: Context
Sensitive Language
Closure Properties
Union
Intersection
Complement
Concatenation
Kleene Star
Decidable Properties
Membership (P-Space complete)
47
Today’s Class
Formal Languages
What are languages?
What are grammars?
Phrase-Structure Grammars
Classification of Grammars (and corresponding
languages)
48
48
Next Class
Context Free Grammar
Grammar Norms
CYK Algorithm
Derivation Tree and Ambiguity
49
49