Transcript Lect7-Cup

Lecture 7
CUP: An LALR Parser
Generator for Java
1
1. Introduction and Example

CUP:
» a system for generating LALR parsers from
simple specifications.
» like yacc, but
» written in Java, uses specifications including
embedded Java code, and produces parser
code in Java.
2
Grammar File Format
in summary




preamble
» declarations to specify how the parser is to be
generated, and supply parts of the runtime
code.
» package_decl import_decl
» user_code parser_code init_code scan_code
Terminal and Nonterminal declarations (name
and type)
precedence and associativity of terminals
grammar rules
3
An example CUP Grammar
(1. preamble)
/* CUP specification for a simple expression
evaluator (no actions) */
import java_cup.runtime.*;
// Preliminaries to set up and use the scanner.
init with {: scanner.init(); :};
scan with {: return scanner.next_token(); :};
4
2. Declaration of terminals and
nonterminals
/* Terminals (tokens returned by the scanner). */
terminal PLUS, MINUS, TIMES, DIVIDE, MOD;
terminal UMINUS, LPAREN, RPAREN; SEMI,
terminal Integer NUMBER;
/* Non terminals */
non terminal expr_list, expr_part;
nonterminal Integer expr, term, factor;
// no type ==> no value associated with the symbol
5
3. Precedences and association
/* Precedences */
precedence left PLUS, MINUS;
precedence left TIMES, DIVIDE, MOD;
precedence left UMINUS;
// Rules:
1. Terminals appearing at the same line has the
same precedence.
2.
A < B iff the line A appears is above the line
that B occurs
3. possible associativity: left, right and nonassoc.
6
4. The production rules
/* The grammar without actions*/
expr_list ::= expr_list expr_part | expr_part;
expr_part ::= expr SEMI;
expr ::= expr PLUS expr
| expr MINUS expr
| expr TIMES expr
| expr DIVIDE expr
| expr MOD expr
| MINUS expr %prec UMINUS
| LPAREN expr RPAREN
| NUMBER ;
7
How to use CUP



prepare a grammar file (say, parser.cup)
invoke:
» java java_cup.Main <parser.cup or
» java java_cup.Main parser.cup
Then two files produced:
» sym.java // contains constant decls, one for
each terminal (and nonterminal); used by
scanner to refer to terminals.
» parser.java // implement the parser;
containing two classes.
8
Grammar rules with action codes
/* The grammar */
expr_list ::= expr_list expr_part
|
expr_part;
expr_part ::= expr:e
{:
System.out.println("= " + e);
SEMI
;
:}
9
Grammar rules with action codes
expr
::= expr:e1 PLUS expr:e2
{: RESULT = new Integer(e1.intValue() + e2.intValue()); :}
| expr:e1 MINUS expr:e2
{: RESULT = new Integer(e1.intValue() - e2.intValue()); :}
| expr:e1 TIMES expr:e2
{: RESULT = new Integer(e1.intValue() * e2.intValue()); :}
| expr:e1 DIVIDE expr:e2
{: RESULT = new Integer(e1.intValue() / e2.intValue()); :}
| expr:e1 MOD expr:e2
{: RESULT = new Integer(e1.intValue() % e2.intValue()); :}
| NUMBER:n
{: RESULT = n; :}
| MINUS expr:e {: RESULT =
new Integer(0 - e.intValue()); :} %prec UMINUS
| LPAREN expr:e RPAREN {: RESULT = e; :}
10
;
Variables available on the action code
RESULT : bound to the value of lhs node
 D : name assigned to each symbol on the rhs.
» Dleft, Dright : int variables for left and right
position of the phrase in the input stream.
 Example:
expr ::= expr:e1 PLUS expr:e2
{: RESULT = new Integer(
e1.intValue() + e2.intValue()); :}
// here e1left and e1rigth are both usable.

11
Relationship for related CUP classes
MyScanner
sym
$CUP$Parser$action
Scanner
Symbol
myParser
lr_parser
myPackage
(generated by CUP)
java_cup.runtime
12
Typical client code
/* create a parsing object */
parser parser_obj = new parser(new MyScanner(
new FileReader(“myFileName”));
/* open input files, etc. here */
Symbol parse_tree = null;
try { if (do_debug_parse)
parse_tree = parser_obj.debug_parse();
else parse_tree = parser_obj.parse();
} catch (Exception e) {
/* do cleanup here - - possibly rethrow e */ }
finally { /* do close out here */ }
13
The java_cup.runtime.Symbol class

public class Symbol {
» public Symbol(int id, int l, int r, Object o)
» public Symbol(int id, Object o)
» public Symbol(int id, int l, int r) …
public int sym; // kind of Symbol
public int parse_state; // used while staying in stack
boolean used_by_parser = false;
public int left, right; // left right position in input stream
public Object value; // filed for storing semantic value.
public String toString();
}
14
The scanner interface
package java_cup.runtime;
public interface Scanner {
/** Return the next token, or <code>null</code>
on end-of-file. */
public Symbol next_token() throws
java.lang.Exception;
}
15
java_cup.runtime.lr_parser
public abstract class lr_parser {
public lr_parser() { }
public lr_parser(Scanner s)
{ this(); setScanner(s)
}
private Scanner _scanner;
public void setScanner(Scanner s) { _scanner = s; }
public Scanner getScanner() { return _scanner; }
public void user_init() throws java.lang.Exception {}
// { %initWithCode } in subclass
16
public Symbol scan() throws java.lang.Exception {
Symbol sym = getScanner().next_token();
return (sym!=null) ? sym : new Symbol(EOF_sym());
}
public void report_fatal_error(String message, Object
info) throws java.lang.Exception …
public Symbol parse() throws java.lang.Exception
17
actual code of parse()
public Symbol parse() throws java.lang.Exception {
/* the current action code */
int act;
// the Symbol/stack element returned by a reduce
Symbol lhs_sym = null;
/* information about production being reduced with */
short handle_size, lhs_sym_num;
/* set up direct reference to tables to drive the parser */
production_tab = production_table();
action_tab = action_table();
reduce_tab = reduce_table();
18
/* initialize the action encapsulation object */
init_actions(); // { $generatedByCUP }
/* do user initialization */
user_init(); // { $initWithCode }
/* get the first token */
cur_token = scan();
/* push dummy Symbol with start state to get us underway */
stack.removeAllElements();
stack.push(new Symbol(0, start_state()));
tos = 0;
19
/* continue until we are told to stop */
for (_done_parsing = false; !_done_parsing ; ) {
/* Check current token for freshness. */
if (cur_token.used_by_parser) throw new Error( … );
/* current state is always on the top of the stack */
/* look up action out of the current state with the current
input */
act = get_action(((Symbol)stack.peek()).parse_state,
cur_token.sym);
20
/* decode the action -- > 0 encodes shift */
if (act > 0) { /* shift */
cur_token.parse_state = act-1;
cur_token.used_by_parser = true;
stack.push(cur_token); tos++;
/* advance to the next Symbol */
cur_token = scan();
}
/* if its less than zero, then it encodes a reduce action */
else if (act < 0) { /* reduce, first arg is action number */
21
else if (act < 0) { /* reduce, first arg is action number */
lhs_sym = do_action((-act)-1, this, stack, tos);
/* look up information about the production */
lhs_sym_num = production_tab[(-act)-1][0];
handle_size = production_tab[(-act)-1][1];
/* pop the handle off the stack */
for (int i = 0; i < handle_size; i++)
{ stack.pop(); tos--;
}
/* look up the state to go to from the one popped back to */
act = get_reduce(((Symbol)stack.peek()).parse_state,
lhs_sym_num);
/* shift to that state */
lhs_sym.parse_state = act;lhs_sym.used_by_parser = true;
stack.push(lhs_sym); tos++; }
22
/* finally if the entry is zero, we have an error */
else if (act == 0) {
/* call user syntax error reporting routine */
syntax_error(cur_token);
/* try to error recover, arg is debug or not */
if (!error_recovery(false)) {
/* if that fails give up with a fatal syntax error */
unrecovered_syntax_error(cur_token);
/* just in case that wasn't fatal enough, end parse */
done_parsing(); }
else { // continue to parse
lhs_sym = (Symbol)stack.peek();
}
} }
return lhs_sym;
}
23
Where the user codes are put in the generated class
public class parser
extends lr_parser{
$parserCode
public Symbol scan()
{ $scanWithCode }
problic user_init() {
$initWithCode }
…}
class CUP$Parser$action {
$actionCode
…
Symbol CUP$parser$do_action
{ … $actionCodeInRules …}
class lr_parser {
public Symbol parse() {
…
user_init() ;
…
// getFirstToken()
… scan() …
}
}
24
Usage of various user code
components




action code : // action code {: … :}
» declaration of additional variables used in grammar
rules, e.g., symbol table,..
» helper methods used in rules
parser code // parser with {: … :}
» for customizing the generated parser
» e.g., new error_report methods, scan(), …
user init code // init with {: … :}
» initialize your scanner, tables, data structures required
for semantic actions
scan code // scan with {: :}
» tell how to get a token from your scanner
25
Running CUP








-package name // default is the empty package
-parser name // name.java is the generated file
// default is parser.
-symbols name // default is sym
-interface
» Outputs the symbol constant code as an interface
rather than as a class.
-nonterms // include also noterminal constant symbols
-expect number // of conflicts to abort parsing
-compact_red // enable table compaction for reduction
-nowarn // warnings suppressed
26











-nosummary
-progress
-dump_grammar
-dump_states
-dump_tables
-dump
-time
-debug
-nopositions // => elefr, eright cause error
-noscanner // obsolete
-version
27
Scanner Interface
1. Your scanner must implement the Scanner interface:
package java_cup.runtime;
public interface Scanner {
public Symbol next_token() throws java.lang.Exception; }
2. lr_parser use scan() to fetch next token :
public Symbol scan() throws java.lang.Exception {
Symbol sym = getScanner().next_token();
return (sym!=null) ? sym : new Symbol(EOF_sym()); }

You can override scan() in your parser code (
$scanWithCode )
28
3. The generated parser also contains a constructor :
public parser(Scanner)
which calls setScanner() with it. In most cases, the
init with and scan with directives may be omitted.
 You can simply create the parser with a reference
to the desired scanner:
parser parser_obj = new parser(new my_scanner());

or set the scanner after the parser is created:
parser parser_obj = new parser();
parser_obj.setScanner(new my_scanner());
29
4. Integration with JLex
Put the following three lines in the directives region.
%implements java_cup.runtime.Scanner
%function next_token
%type java_cup.runtime.Symbol
 or simply:
%cup
 Invoking the parser with the JLex scanner is then simply:
parser parser_obj = new parser(
new Yylex( some_InputStream_or_Reader));
 Note CUP handles the JLex/JFlex convention of returning
null on EOF without a problem, so an %eofval directive is
not required in the JLex specification (added in CUP
0.10k).

30
Syntax Error Handling


Programs may contain errors at different levels:
» Lexical, such as misspelling an identifier, keyword, or
operator.
» Syntactic, such as an arithmetic expression with
unbalanced parentheses.
» Semantic, such as an operator applied to an
incompatible operand.
» Logical, such as an infinitely recursive call.
Goal of the error handler in a parser:
» Report the presence of errors clearly and accurately.
» Recover from each error quickly enough to be able to
detect subsequent errors.
» Do not significantly slow down the processing of
correct programs.
31
Error recovery strategies


Panic mode recovery.
» On discovering an error, discard input symbols one at
a time until one of a designated set of synchronizing
tokens (normally delimiters such as semicolon or
keyword end) is found.
» Simple but may skip a considerable amount of input.
Phrase level recovery.
» Perform a local correction on the remaining input:
replace a prefix of the remaining input by some string
that allows the parse to continue.
» For example, replace a comma by a semicolon, delete
an extra semicolon, or insert a missing semicolon.
» First used in the top town parsing.
32
Error Recovery Strategies


Error productions.
» Use the language grammar augmented by
some error production rules to detect common
errors. These extra rules generate erroneous
constructs.
Global correction.
» Still perform a local correction but with global
knowledge leading to smallest replacements.
33
Error recovery in LL(1) parsers



Use panic mode recovery.
When an error occurs looking for nonterminal A,
scan until an element of FOLLOW(A) is found,
then pop A and continue.
If we can't match a terminal on the top of stack:
» 1. pop the terminal
» 2. print a message saying the terminal was
inserted
» 3. continue the parse
34
Error recovery in shift-reduce
parsers


The problem
» encounter an invalid token
» bad pieces of tree hanging from stack
» incorrect entries in symbol table (type error… )
We want to parse the rest of the file
» Restarting the parser
» find a restartable state on the stack
» move to a consistent place in the input
» print an informative message (line number)
35
Error Recovery in CUP and YACC


Supports a special terminal symbol: error.
» need not be declared.
» can matches an erroneous input sequence.
» error show synchronization point
When an error is discovered
» pops the stack until error transition is legal
» skips input tokens until it matches a certain
number (3) of tokens
» error productions can have actions
36
Error recovery in yacc
stmtList ::= stmt SEMI
;
| stmtList stmt SEMI
can be augmented with error
stmtList ::= stmt SEMI
| stmtList stmt SEMI
error SEMI ;
 This should
» throw out the erroneous statement
» synchronize at SEMI.
» invoke syntax_error()
37
Example production 2:
stmt ::= expr SEMI | while_stmt SEMI | if_stmt SEMI | ...
| error SEMI ;
 If error occurs
» 1. syntax_error() reported, and
» 2. [recovery:] throw out the erroneous expr (or
while_statement or if_statement) statement
» 3. synchronize at SEMI.
» An error is considered to be recovered iff a sufficient
number of tokens past the error symbol can be
successfully parsed. (= error_sync_size() method of
the parser and defaults to 3).
 Other natural places for errors
» missing parentheses or brackets
» extra operator or missing operator

38
Error Recovery in CUP
S  error . SEMI
error+
SEMI
S  . expr SEMI
S . while_stmt SEMI
S . if_stmt SEMI
S . error SEMI
S  error SEMI .
action(s,a1) is undefined
Stack
s
X
pop() until error
transition legal
input
a1 a2 a3 ; …
match error
39