Transcript Document

Course Overview
PART I: overview material
1
2
3
Introduction
Language processors (tombstone diagrams, bootstrapping)
Architecture of a compiler
PART II: inside a compiler
4
5
6
7
Syntax analysis
Contextual analysis
Runtime organization
Code generation
PART III: conclusion
8
9
Interpretation
Review
Syntax Analysis (Chapter 4)
1
Nullable, First sets (starter sets), and Follow sets
• A non-terminal is nullable if it derives the empty string
• First(N) or starters(N) is the set of all terminals that can
begin a sentence derived from N
• Follow(N) is the set of terminals that can follow N in
some sentential form
Next we will see algorithms to compute each of these.
Syntax Analysis (Chapter 4)
2
Algorithm for computing Nullable
For each terminal t
Nullable(t) = false
For each non-terminal N
Nullable(N) = is there a production N ::=  ?
Repeat
For each production N ::= x1 x2 x3 … xn
If Nullable(xi) for all of xi then set Nullable(N) to true
Until nothing new becomes Nullable
Syntax Analysis (Chapter 4)
3
Generalizing the definition of Nullable
Define Nullable(x1 x2 x3 … xn) as:
if n==0 then
true
else if !Nullable(x1) then
false
else
Nullable(x2 x3 … xn)
Syntax Analysis (Chapter 4)
4
Algorithm for computing First sets
For each terminal t
First(t) = { t }
For each non-terminal N
First(N) = { }
Repeat
For each production N ::= x1 x2 x3 … xn
First(N) = First(N)  First(x1)
For each i from 2 through n
If Nullable(x1 … xi-1), then
First(N) = First(N)  First(xi)
Until no First set changes
Syntax Analysis (Chapter 4)
5
Generalizing the definition of First sets
Define First(x1 x2 x3 … xn) as:
if !Nullable(x1) then
First(x1)
else
First(x1)  First(x2 x3 … xn)
Note: some textbooks add  (empty string) to First(N)
whenever N is nullable, so that First(N) is never { }
(empty set)
Syntax Analysis (Chapter 4)
6
Algorithm for computing Follow sets
Follow(S) = {$}
// the end-of-file symbol
For each non-terminal N other than S
Follow(N) = { }
Repeat
For each production N ::= x1 x2 x3 … xn
For each i from 1 through n-1
if xi is a non-terminal then
Follow(xi) = Follow(xi)  First(xi+1 … xn)
For each i from n downto 1
if xi is a non-terminal and Nullable(xi+1 … xn) then
Follow(xi) = Follow(xi)  Follow(N)
Until no Follow set changes
Syntax Analysis (Chapter 4)
7
Example of computing Nullable, First, Follow
S ::= TUVW | WVUT
T ::= aT | e
U ::= Ub | f
V ::= cV | 
W ::= Wd | 
S
Nullable?
false
First
{a, e, d, c, f}
Follow
{$}
T
false
{a, e}
{f, $}
U
false
{f}
{c, d, $, a, e, b}
V
true
{c} or {c, }
{d, $, f}
W
true
{d} or {d, }
{$, c, f, d}
Syntax Analysis (Chapter 4)
8
Parsing
We will now look at parsing.
Topics:
– Some terminology
– Different types of parsing strategies
• bottom up
• top down
– Recursive descent parsing
• What is it
• How to implement a parser given an EBNF specification
Syntax Analysis (Chapter 4)
9
Parsing: Some Terminology
• Recognition
To answer the question “does the input conform to the syntax of
the language”
• Parsing
Recognition + also determine structure of program (for example
by creating an AST data structure)
• Unambiguous grammar:
A grammar is unambiguous if there is only at most one way to
parse any input. (i.e. for syntactically correct program there is
precisely one parse tree)
Syntax Analysis (Chapter 4)
10
Different kinds of Parsing Algorithms
• Two big groups of algorithms can be distinguished:
– bottom up strategies
– top down strategies
• Example: parsing of “Micro-English”
Sentence
Subject
Object
Noun
Verb
::=
::=
::=
::=
::=
The cat sees the rat.
The rat sees me.
I like a cat.
Syntax Analysis (Chapter 4)
Subject Verb Object .
I | A Noun | The Noun
me | a Noun | the Noun
cat | bat | rat
like | is | see | sees
The rat like me.
I see the rat.
I sees a rat.
11
Bottom up parsing
The parse tree “grows” from the bottom (leafs) up to the top (root).
Sentence
Subject
The
Object
Noun
Verb
cat
sees
Syntax Analysis (Chapter 4)
Noun
a
rat
.
12
Top-down parsing
The parse tree is constructed starting at the top (root).
Sentence
Subject
Verb
Object
Noun
The
cat
Syntax Analysis (Chapter 4)
.
Noun
sees
a
rat
.
13
Quick review
• Syntactic analysis
– Lexical analysis
• Group letters into words (or group characters into tokens)
• Use regular expressions and deterministic FSM’s
– Grammar transformations
• Left-factoring
• Left-recursion removal
• Substitution
– Parsing = structural analysis of program
• Group words into sentences, paragraphs, and documents
(or tokens into expressions, commands, and programs)
• Top-Down and Bottom-Up
Syntax Analysis (Chapter 4)
14
Recursive Descent Parsing
• Recursive descent parsing is a straightforward top-down
parsing algorithm.
• We will now look at how to develop a recursive descent
parser from an EBNF specification.
• Idea: the parse tree structure corresponds to the recursive
calling structure of parsing functions that call each other.
Syntax Analysis (Chapter 4)
15
Recursive Descent Parsing
Sentence
Subject
Object
Noun
Verb
::=
::=
::=
::=
::=
Subject Verb Object .
I | A Noun | The Noun
me | a Noun | the Noun
cat | bat | rat
like | is | see | sees
Define a procedure parseN for each non-terminal N
private
private
private
private
private
void
void
void
void
void
Syntax Analysis (Chapter 4)
parseSentence( ) ;
parseSubject( );
parseObject( );
parseNoun( );
parseVerb( );
16
Recursive Descent Parsing
public class MicroEnglishParser {
private TerminalSymbol currentTerminal;
//Auxiliary methods will go here
...
//Parsing methods will go here
...
}
Syntax Analysis (Chapter 4)
17
Recursive Descent Parsing: Auxiliary Methods
public class MicroEnglishParser {
private TerminalSymbol currentTerminal;
private void accept (TerminalSymbol expected) {
if (currentTerminal matches expected)
currentTerminal = next input terminal ;
else
report a syntax error
}
...
}
Syntax Analysis (Chapter 4)
18
Recursive Descent Parsing: Parsing Methods
Sentence
::= Subject Verb Object .
private void parseSentence( ) {
parseSubject( );
parseVerb( );
parseObject( );
accept(‘.’);
}
Syntax Analysis (Chapter 4)
19
Recursive Descent Parsing: Parsing Methods
Subject
::= I | A Noun | The Noun
private void parseSubject( ) {
if (currentTerminal matches ‘I’)
accept(‘I’);
else if (currentTerminal matches ‘A’) {
accept(‘A’);
parseNoun( );
}
else if (currentTerminal matches ‘The’) {
accept(‘The’);
parseNoun( );
}
else
report a syntax error
}
Syntax Analysis (Chapter 4)
20
Recursive Descent Parsing: Parsing Methods
Noun
::= cat | bat | rat
private void parseNoun( ) {
if (currentTerminal matches ‘cat’)
accept(‘cat’);
else if (currentTerminal matches ‘bat’)
accept(‘bat’);
else if (currentTerminal matches ‘rat’)
accept(‘rat’);
else
report a syntax error
}
Syntax Analysis (Chapter 4)
21
Recursive Descent Parsing: Parsing Methods
Object
Verb
::= me | a Noun | the Noun
::= like | is | see | sees
private void parseObject( ) {
?
}
private void parseVerb( ) {
?
}
Test yourself: Can you complete parseObject( ) and parseVerb( ) ?
Syntax Analysis (Chapter 4)
22
Systematic Development of Rec. Descent Parser
(1) Express grammar in EBNF
(2) Grammar Transformations:
Left factorization and Left recursion elimination
(3) Create a parser class with
– private variable currentToken
– methods to call the scanner: accept and acceptIt
(4) Implement a public method for main function to call:
– public parse method that
• fetches the first token from the scanner
• calls parseS (where S is start symbol of the grammar)
• verifies that scanner next produces the end–of–file token
(5) Implement private parsing methods:
– add private parseN method for each non terminal N
Syntax Analysis (Chapter 4)
23