Grammars - Columbus State University

Download Report

Transcript Grammars - Columbus State University

Grammars
CPSC 5135
Formal Definitions
• A symbol is a character. It represents an
abstract entity that has no inherent meaning
• Examples:
a, A, 3, *, - ,=
Formal Definitions
• An alphabet is a finite set of symbols.
• Examples:
A = { a, b, c }
B = { 0, 1 }
Formal Definitions
• A string (or word) is a finite sequence of
symbols from a given alphabet.
• Examples:
S = { 0, 1 } is a alphabet
0, 1, 11010, 101, 111 are strings from S
A = { a, b, c ,d } is an alphabet
bad, cab, dab, d, aaaaa are strings from A
Formal Definitions
• A language is a set of strings from an
alphabet.
• The set can be finite or infinite.
• Examples:
A = { 0, 1}
L1 = { 00, 01, 10, 11 }
L2 = { 010, 0110, 01110,011110,
…}
Formal Definitions
• A grammar is a quadruple G = (V, Σ, R, S) where
1) V is a finite set of variables (non-terminals),
2) Σ is a finite set of terminals, disjoint from V,
3) R is a finite set of rules. The left side of each rule
is a string of one or more elements from V U Σ and
whose right side is a string of 0 or more elements
from V U Σ
4) S is an element of V and is called the start symbol
Formal Definitions
• Example grammar:
• G = (V, Σ, R, S)
V = { S, A }
Σ = { a, b }
R = { S → aA
A → bA
A→a }
Derivations
R = S → aA
A → bA
A→a
• A derivation is a sequence of replacements ,
beginning with the start symbol, and replacing a
substring matching the left side of a rule with the
string on the right side of a rule
S → aA
→ abA
→ abbA
→ abba
Derivations
• What strings can be generated from the
following grammar?
S → aBa
B → aBa
B→b
Formal Definitions
• The language generated by a grammar is
the set of all strings of terminal symbols
which are derivable from S in 0 or more steps.
• What is the language generated by this
grammar?
• S→a
S → aB
B → aB
B→a
Kleene Closure
• Let Σ be a set of strings. Σ* is called the
Kleene closure of Σ and represents the set of
all concatenations of 0 or more strings in Σ.
• Examples
Σ* = { 1 }* = { ø, 1, 11, 111, 1111, …}
Σ* = { 01 }* = { ø, 01, 0101, 010101, …}
Σ* = { 0 + 1 }* = set of all possible strings of 0’s
and 1’s. (+ means union)
Formal Definitions
• A grammar G = (V,Σ, R, S) is right-linear if all
rules are of the form:
A → xB
A→x
where A, B ε V and x ε Σ*
Right-linear Grammar
• G = { V, Σ, R, S }
V = { S, B }
Σ = { a, b }
R={
S → aS ,
S→B,
B → bB ,
}
What language is generated?
B→ε
Formal Definitions
• A grammar G = (V,Σ, R, S) is left-linear if all
rules are of the form:
A → Bx
A→x
where A, B ε V and x ε Σ*
Formal Definitions
• A regular grammar is one that is either right or left
linear.
• Let Q be a finite set and let Σ be a finite set of
symbols. Also let δ be a function from Q x Σ to
Q, let q0 be a state in Q and let A be a subset of Q.
We call each element of Q a state, δ the transition
function, q0 the initial state and A the set of
accepting states.
Then a deterministic finite automaton (DFA) is a 5tuple < Q , Σ , q0 , δ , A >
• Every regular grammar is equivalent to a DFA
Language Definition
• Recognition – a machine is constructed that
reads a string and pronounces whether the
string is in the language or not. (Compiler)
• Generation – a device is created to generate
strings that belong to the language.
(Grammar)
Chomsky Hierarchy
• Noam Chomsky (1950’s) described 4 classes
of grammars
1) Type 0 – unrestricted grammars
2) Type 1 – Context sensitive grammars
3) Type 2 – Context free grammars
4) Type 3 – Regular grammars
Grammars
• Context-free and regular grammars have
application in computing
• Context-free grammar – each rule or
production has a left side consisting of a
single non-terminal
Backus-Naur form (BNF)
• BNF was used to describe programming
language syntax and is similar to Chomsky’s
context free grammars
• A meta-language is a language used to
describe another language
• BNF is a meta-language for computer
languages
BNF
• Consists of nonterminal symbols, terminal
symbols (lexemes and tokens), and rules or
productions
• <if-stmt> → if <logical-expr> then <stmt>
• <if-stmt> → if <logical-expr> then <stmt>
else <stmt>
• <if-stmt> → if <logical-expr> then <stmt>
| if <logical-expr> then <stmt>
else <stmt>
A Small Grammar
<program>  begin <stmt_list> end
<stmt_list>  <stmt>
| <stmt> ; <stmt_list>
<stmt>  <var> = <expression>
<var>  A | B | C
<expression>  <var> + <var>
| <var> - <var>
| <var>
A Derivation
<program>  begin <stmt_list> end
 begin <stmt> end
begin <var> = <expression> end
begin A = <expression> end
begin A = <var> + <var> end
begin A = B + <var> end
begin A = B + C end
Terms
• Each of the strings in a derivation is called a
sentential form.
• If the leftmost non-terminal is always the one
selected for replacement, the derivation is a
leftmost derivation.
• Derivations can be leftmost, rightmost, or
neither
• Derivation order has no effect on the
language generated by the grammar
Derivations Yield Parse Trees
<program>  begin <stmt_list> end
 begin <stmt> end
begin <var> = <expression> end
begin A = <expression> end
begin A = <var> + <var> end
begin A = B + <var> end
begin A = B + C end
<Program>
begin
<stmt_list>
end
<stmt>
<var>
A
=
<expression>
<var> + <var>
B
C
Parse Trees
• Parse trees describe the hierarchical structure
of the sentences of the language they define.
• A grammar that generates a sentence for
which there are two or more distinct parse
trees is ambiguous.
An Ambiguous Grammar
<assign>  <id> = <expr>
<id>  A | B | C
<expr>  <expr> + <expr>
| <expr> * <expr>
| ( <expr> )
| <id>
Two Parse Trees – Same Sentence
<assign>
<id>
A
=
<assign>
<expr>
<expr>
<id>
B
+
<expr>
<expr> * <expr>
<id>
=
<expr>
A
<expr> *
<expr>
<expr> + <expr> <id>
<id>
<id>
<id>
<id>
C
A
B
C
A
Derivation 1
<assign>  <id> = <expr>
 A = <expr>
 A = <expr> + <expr>
 A = <id> + <expr>
 A = B + <expr>
 A = B + <expr> * <expr>
 A = B + <id> * <expr>
 A = B + C * <expr>
 A = B + C * <id>
A=B+C*A
Derivation 2
<assign>  <id> = <expr>
 A = <expr>
 A = <expr> * <expr>
 A = <expr> + <expr> * <expr>
 A = <id> + <expr> * <expr>
 A = B + <expr> * <expr>
 A = B + <id> * <expr>
 A = B + C * <expr>
 A = B + C * <id>
A=B+C*A
Ambiguity
• Parse trees are used to determine the
semantics of a sentence
• Ambiguous grammars lead to semantic
ambiguity - this is intolerable in a computer
language
• Often, ambiguity in a grammar can be
removed
Unambiguous Grammar
<assign>  <id> = <expr>
<id>  A | B | C
<expr>  <expr> + <term> | <term>
<term>  <term> * <factor> | <factor>
<factor>  ( <expr> ) | <id>
• This grammar makes multiplication take precedence
over addition
Associativity of Operators
<assign>  <id> = <expr>
<id>  A | B | C
<expr>  <expr> + <term> |
<term>
<term>  <term> * <factor> |
<factor>
<factor>  ( <expr> ) | <id>
<assign>
<id>
=
A
<expr>
<expr>
+
<expr> + <term>
Addition operators associate from
left to right
<term>
<factor>
<term>
<factor>
<id>
<factor>
<id>
A
<id>
B
C
BNF
• A BNF rule that has its left hand side
appearing at the beginning of its right hand
side is left recursive .
• Left recursion specifies left associativity
• Right recursion is usually used for associating
exponetiation operators
<factor>  <exp> ** <factor> | <exp>
<exp>  ( <expr> ) | <id>
Ambiguous If Grammar
<stmt>  <if_stmt>
<if_stmt>  if <logic_expr> then <stmt> |
if <logic_expr> then <stmt> else <stmt>
• Consider the sentential form:
if <logic_expr> then if <logic_expr> then <stmt> else <stmt>
Parse Trees for an If Statement
<if_stmt>
<if_stmt>
If <logic_expr> then <stmt> else <stmt>
If <logic_expr> then <stmt>
<if_stmt>
if <logic_expr> then <stmt>
<if_stmt>
if <logic_expr> then <stmt> else <stmt>
Unambiguous Grammar for If
Statements
<stmt>  <matched> | <unmatched>
<matched>  if <logic_expr> then <matched> else <matched>
| any non-if statement
<unmatched>  if <logic_expr> then <stmt>
| if <logic_expr> then <matched> else <unmatched>
Extended BNF (EBNF)
• Optional part denoted by […]
<selection>  if ( <expr> ) <stmt> [ else <stmt> ]
• Braces used to indicate the enclosed part can be repeated
indefinitely or left out
<ident_list>  <identifier> { , <identifier> }
• Multiple choice options are put in parentheses and separated
by the or operator |
<for_stmt>  for <var> := <expr> (to | downto) <expr> do <stmt>
BNF vs EBNF for Expressions
BNF:
<expr>  <expr> + <term>
| <expr> - <term>
| <term>
<term>  <term> * <factor>
| <term> / <factor>
| <factor>
EBNF:
<expr>  <term> { (+ | - ) <term> }
<term>  <factor> { ( * | / ) <factor>