Transcript Lect7-Cup
Lecture 7
CUP: An LALR Parser
Generator for Java
1
1. Introduction and Example
CUP:
» a system for generating LALR parsers from
simple specifications.
» like yacc, but
» written in Java, uses specifications including
embedded Java code, and produces parser
code in Java.
2
Grammar File Format
in summary
preamble
» declarations to specify how the parser is to be
generated, and supply parts of the runtime
code.
» package_decl import_decl
» user_code parser_code init_code scan_code
Terminal and Nonterminal declarations (name
and type)
precedence and associativity of terminals
grammar rules
3
An example CUP Grammar
(1. preamble)
/* CUP specification for a simple expression
evaluator (no actions) */
import java_cup.runtime.*;
// Preliminaries to set up and use the scanner.
init with {: scanner.init(); :};
scan with {: return scanner.next_token(); :};
4
2. Declaration of terminals and
nonterminals
/* Terminals (tokens returned by the scanner). */
terminal PLUS, MINUS, TIMES, DIVIDE, MOD;
terminal UMINUS, LPAREN, RPAREN; SEMI,
terminal Integer NUMBER;
/* Non terminals */
non terminal expr_list, expr_part;
nonterminal Integer expr, term, factor;
// no type ==> no value associated with the symbol
5
3. Precedences and association
/* Precedences */
precedence left PLUS, MINUS;
precedence left TIMES, DIVIDE, MOD;
precedence left UMINUS;
// Rules:
1. Terminals appearing at the same line has the
same precedence.
2.
A < B iff the line A appears is above the line
that B occurs
3. possible associativity: left, right and nonassoc.
6
4. The production rules
/* The grammar without actions*/
expr_list ::= expr_list expr_part | expr_part;
expr_part ::= expr SEMI;
expr ::= expr PLUS expr
| expr MINUS expr
| expr TIMES expr
| expr DIVIDE expr
| expr MOD expr
| MINUS expr %prec UMINUS
| LPAREN expr RPAREN
| NUMBER ;
7
How to use CUP
prepare a grammar file (say, parser.cup)
invoke:
» java java_cup.Main <parser.cup or
» java java_cup.Main parser.cup
Then two files produced:
» sym.java // contains constant decls, one for
each terminal (and nonterminal); used by
scanner to refer to terminals.
» parser.java // implement the parser;
containing two classes.
8
Grammar rules with action codes
/* The grammar */
expr_list ::= expr_list expr_part
|
expr_part;
expr_part ::= expr:e
{:
System.out.println("= " + e);
SEMI
;
:}
9
Grammar rules with action codes
expr
::= expr:e1 PLUS expr:e2
{: RESULT = new Integer(e1.intValue() + e2.intValue()); :}
| expr:e1 MINUS expr:e2
{: RESULT = new Integer(e1.intValue() - e2.intValue()); :}
| expr:e1 TIMES expr:e2
{: RESULT = new Integer(e1.intValue() * e2.intValue()); :}
| expr:e1 DIVIDE expr:e2
{: RESULT = new Integer(e1.intValue() / e2.intValue()); :}
| expr:e1 MOD expr:e2
{: RESULT = new Integer(e1.intValue() % e2.intValue()); :}
| NUMBER:n
{: RESULT = n; :}
| MINUS expr:e {: RESULT =
new Integer(0 - e.intValue()); :} %prec UMINUS
| LPAREN expr:e RPAREN {: RESULT = e; :}
10
;
Variables available on the action code
RESULT : bound to the value of lhs node
D : name assigned to each symbol on the rhs.
» Dleft, Dright : int variables for left and right
position of the phrase in the input stream.
Example:
expr ::= expr:e1 PLUS expr:e2
{: RESULT = new Integer(
e1.intValue() + e2.intValue()); :}
// here e1left and e1rigth are both usable.
11
Relationship for related CUP classes
MyScanner
sym
$CUP$Parser$action
Scanner
Symbol
myParser
lr_parser
myPackage
(generated by CUP)
java_cup.runtime
12
Typical client code
/* create a parsing object */
parser parser_obj = new parser(new MyScanner(
new FileReader(“myFileName”));
/* open input files, etc. here */
Symbol parse_tree = null;
try { if (do_debug_parse)
parse_tree = parser_obj.debug_parse();
else parse_tree = parser_obj.parse();
} catch (Exception e) {
/* do cleanup here - - possibly rethrow e */ }
finally { /* do close out here */ }
13
The java_cup.runtime.Symbol class
public class Symbol {
» public Symbol(int id, int l, int r, Object o)
» public Symbol(int id, Object o)
» public Symbol(int id, int l, int r) …
public int sym; // kind of Symbol
public int parse_state; // used while staying in stack
boolean used_by_parser = false;
public int left, right; // left right position in input stream
public Object value; // filed for storing semantic value.
public String toString();
}
14
The scanner interface
package java_cup.runtime;
public interface Scanner {
/** Return the next token, or <code>null</code>
on end-of-file. */
public Symbol next_token() throws
java.lang.Exception;
}
15
java_cup.runtime.lr_parser
public abstract class lr_parser {
public lr_parser() { }
public lr_parser(Scanner s)
{ this(); setScanner(s)
}
private Scanner _scanner;
public void setScanner(Scanner s) { _scanner = s; }
public Scanner getScanner() { return _scanner; }
public void user_init() throws java.lang.Exception {}
// { %initWithCode } in subclass
16
public Symbol scan() throws java.lang.Exception {
Symbol sym = getScanner().next_token();
return (sym!=null) ? sym : new Symbol(EOF_sym());
}
public void report_fatal_error(String message, Object
info) throws java.lang.Exception …
public Symbol parse() throws java.lang.Exception
17
actual code of parse()
public Symbol parse() throws java.lang.Exception {
/* the current action code */
int act;
// the Symbol/stack element returned by a reduce
Symbol lhs_sym = null;
/* information about production being reduced with */
short handle_size, lhs_sym_num;
/* set up direct reference to tables to drive the parser */
production_tab = production_table();
action_tab = action_table();
reduce_tab = reduce_table();
18
/* initialize the action encapsulation object */
init_actions(); // { $generatedByCUP }
/* do user initialization */
user_init(); // { $initWithCode }
/* get the first token */
cur_token = scan();
/* push dummy Symbol with start state to get us underway */
stack.removeAllElements();
stack.push(new Symbol(0, start_state()));
tos = 0;
19
/* continue until we are told to stop */
for (_done_parsing = false; !_done_parsing ; ) {
/* Check current token for freshness. */
if (cur_token.used_by_parser) throw new Error( … );
/* current state is always on the top of the stack */
/* look up action out of the current state with the current
input */
act = get_action(((Symbol)stack.peek()).parse_state,
cur_token.sym);
20
/* decode the action -- > 0 encodes shift */
if (act > 0) { /* shift */
cur_token.parse_state = act-1;
cur_token.used_by_parser = true;
stack.push(cur_token); tos++;
/* advance to the next Symbol */
cur_token = scan();
}
/* if its less than zero, then it encodes a reduce action */
else if (act < 0) { /* reduce, first arg is action number */
21
else if (act < 0) { /* reduce, first arg is action number */
lhs_sym = do_action((-act)-1, this, stack, tos);
/* look up information about the production */
lhs_sym_num = production_tab[(-act)-1][0];
handle_size = production_tab[(-act)-1][1];
/* pop the handle off the stack */
for (int i = 0; i < handle_size; i++)
{ stack.pop(); tos--;
}
/* look up the state to go to from the one popped back to */
act = get_reduce(((Symbol)stack.peek()).parse_state,
lhs_sym_num);
/* shift to that state */
lhs_sym.parse_state = act;lhs_sym.used_by_parser = true;
stack.push(lhs_sym); tos++; }
22
/* finally if the entry is zero, we have an error */
else if (act == 0) {
/* call user syntax error reporting routine */
syntax_error(cur_token);
/* try to error recover, arg is debug or not */
if (!error_recovery(false)) {
/* if that fails give up with a fatal syntax error */
unrecovered_syntax_error(cur_token);
/* just in case that wasn't fatal enough, end parse */
done_parsing(); }
else { // continue to parse
lhs_sym = (Symbol)stack.peek();
}
} }
return lhs_sym;
}
23
Where the user codes are put in the generated class
public class parser
extends lr_parser{
$parserCode
public Symbol scan()
{ $scanWithCode }
problic user_init() {
$initWithCode }
…}
class CUP$Parser$action {
$actionCode
…
Symbol CUP$parser$do_action
{ … $actionCodeInRules …}
class lr_parser {
public Symbol parse() {
…
user_init() ;
…
// getFirstToken()
… scan() …
}
}
24
Usage of various user code
components
action code : // action code {: … :}
» declaration of additional variables used in grammar
rules, e.g., symbol table,..
» helper methods used in rules
parser code // parser with {: … :}
» for customizing the generated parser
» e.g., new error_report methods, scan(), …
user init code // init with {: … :}
» initialize your scanner, tables, data structures required
for semantic actions
scan code // scan with {: :}
» tell how to get a token from your scanner
25
Running CUP
-package name // default is the empty package
-parser name // name.java is the generated file
// default is parser.
-symbols name // default is sym
-interface
» Outputs the symbol constant code as an interface
rather than as a class.
-nonterms // include also noterminal constant symbols
-expect number // of conflicts to abort parsing
-compact_red // enable table compaction for reduction
-nowarn // warnings suppressed
26
-nosummary
-progress
-dump_grammar
-dump_states
-dump_tables
-dump
-time
-debug
-nopositions // => elefr, eright cause error
-noscanner // obsolete
-version
27
Scanner Interface
1. Your scanner must implement the Scanner interface:
package java_cup.runtime;
public interface Scanner {
public Symbol next_token() throws java.lang.Exception; }
2. lr_parser use scan() to fetch next token :
public Symbol scan() throws java.lang.Exception {
Symbol sym = getScanner().next_token();
return (sym!=null) ? sym : new Symbol(EOF_sym()); }
You can override scan() in your parser code (
$scanWithCode )
28
3. The generated parser also contains a constructor :
public parser(Scanner)
which calls setScanner() with it. In most cases, the
init with and scan with directives may be omitted.
You can simply create the parser with a reference
to the desired scanner:
parser parser_obj = new parser(new my_scanner());
or set the scanner after the parser is created:
parser parser_obj = new parser();
parser_obj.setScanner(new my_scanner());
29
4. Integration with JLex
Put the following three lines in the directives region.
%implements java_cup.runtime.Scanner
%function next_token
%type java_cup.runtime.Symbol
or simply:
%cup
Invoking the parser with the JLex scanner is then simply:
parser parser_obj = new parser(
new Yylex( some_InputStream_or_Reader));
Note CUP handles the JLex/JFlex convention of returning
null on EOF without a problem, so an %eofval directive is
not required in the JLex specification (added in CUP
0.10k).
30
Syntax Error Handling
Programs may contain errors at different levels:
» Lexical, such as misspelling an identifier, keyword, or
operator.
» Syntactic, such as an arithmetic expression with
unbalanced parentheses.
» Semantic, such as an operator applied to an
incompatible operand.
» Logical, such as an infinitely recursive call.
Goal of the error handler in a parser:
» Report the presence of errors clearly and accurately.
» Recover from each error quickly enough to be able to
detect subsequent errors.
» Do not significantly slow down the processing of
correct programs.
31
Error recovery strategies
Panic mode recovery.
» On discovering an error, discard input symbols one at
a time until one of a designated set of synchronizing
tokens (normally delimiters such as semicolon or
keyword end) is found.
» Simple but may skip a considerable amount of input.
Phrase level recovery.
» Perform a local correction on the remaining input:
replace a prefix of the remaining input by some string
that allows the parse to continue.
» For example, replace a comma by a semicolon, delete
an extra semicolon, or insert a missing semicolon.
» First used in the top town parsing.
32
Error Recovery Strategies
Error productions.
» Use the language grammar augmented by
some error production rules to detect common
errors. These extra rules generate erroneous
constructs.
Global correction.
» Still perform a local correction but with global
knowledge leading to smallest replacements.
33
Error recovery in LL(1) parsers
Use panic mode recovery.
When an error occurs looking for nonterminal A,
scan until an element of FOLLOW(A) is found,
then pop A and continue.
If we can't match a terminal on the top of stack:
» 1. pop the terminal
» 2. print a message saying the terminal was
inserted
» 3. continue the parse
34
Error recovery in shift-reduce
parsers
The problem
» encounter an invalid token
» bad pieces of tree hanging from stack
» incorrect entries in symbol table (type error… )
We want to parse the rest of the file
» Restarting the parser
» find a restartable state on the stack
» move to a consistent place in the input
» print an informative message (line number)
35
Error Recovery in CUP and YACC
Supports a special terminal symbol: error.
» need not be declared.
» can matches an erroneous input sequence.
» error show synchronization point
When an error is discovered
» pops the stack until error transition is legal
» skips input tokens until it matches a certain
number (3) of tokens
» error productions can have actions
36
Error recovery in yacc
stmtList ::= stmt SEMI
;
| stmtList stmt SEMI
can be augmented with error
stmtList ::= stmt SEMI
| stmtList stmt SEMI
error SEMI ;
This should
» throw out the erroneous statement
» synchronize at SEMI.
» invoke syntax_error()
37
Example production 2:
stmt ::= expr SEMI | while_stmt SEMI | if_stmt SEMI | ...
| error SEMI ;
If error occurs
» 1. syntax_error() reported, and
» 2. [recovery:] throw out the erroneous expr (or
while_statement or if_statement) statement
» 3. synchronize at SEMI.
» An error is considered to be recovered iff a sufficient
number of tokens past the error symbol can be
successfully parsed. (= error_sync_size() method of
the parser and defaults to 3).
Other natural places for errors
» missing parentheses or brackets
» extra operator or missing operator
38
Error Recovery in CUP
S error . SEMI
error+
SEMI
S . expr SEMI
S . while_stmt SEMI
S . if_stmt SEMI
S . error SEMI
S error SEMI .
action(s,a1) is undefined
Stack
s
X
pop() until error
transition legal
input
a1 a2 a3 ; …
match error
39