Parsing Quantum Chemistry Output files(Sudhakar_not_presented

Parsing Quantum Chemistry Output Files
Using JFlex and CUP
Sudhakar Pamidighantam
NCSA, University of Illinois
JFlex
• JFlex is available from http://jflex.de
• It is a lexical analyzer generator for Java.
• Lexical analysis is the process of taking an input string of
  characters (such as our quantum chemistry output from an
  application such as Gaussian) and producing a sequence of
  symbols called "lexical tokens", or just "tokens", which may
  be handled more easily by a parser.
• Tokens are symbols derived from regular expressions; the
  parser uses them for further action.
Typical Tokens from Gaussian Output Strings

String                                            Token (Symbol)
"Number of steps in this run"                     Found Iter
"Step number"                                     NSearch
"NUMERICALLY ESTIMATING GRADIENTS ITERATION"      NSearch
"CCSD(T)="                                        Energy
"SCF Done: E(RHF) ="                              Energy
"Maximum Force"                                   MaxGrad
"RMS Force"                                       RmsGrad
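As a rough illustration of the string-to-token table, the sketch below looks for each marker string in a line of output and returns the matching token name. It is plain Java, not JFlex-generated code. The class and method names are hypothetical, and the marker strings would have to match the spacing of the real Gaussian output.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class TokenTableSketch {
    // Marker strings and token names copied from the table; insertion order is kept.
    static final Map<String, String> TOKENS = new LinkedHashMap<>();
    static {
        TOKENS.put("Number of steps in this run", "Found Iter");
        TOKENS.put("Step number", "NSearch");
        TOKENS.put("NUMERICALLY ESTIMATING GRADIENTS ITERATION", "NSearch");
        TOKENS.put("CCSD(T)=", "Energy");
        TOKENS.put("SCF Done: E(RHF) =", "Energy");
        TOKENS.put("Maximum Force", "MaxGrad");
        TOKENS.put("RMS Force", "RmsGrad");
    }

    /** Returns the token name for the first marker found in the line, or null if none is found. */
    public static String tokenFor(String line) {
        for (Map.Entry<String, String> entry : TOKENS.entrySet()) {
            if (line.contains(entry.getKey())) {
                return entry.getValue();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(tokenFor("SCF Done: E(RHF) = -7.85284496695 A.U. after 8 cycles"));
        System.out.println(tokenFor("Maximum Force 0.000000 0.000450 YES"));
    }
}
```

A real lexer does more than this lookup: it also returns the values that follow the marker and keeps track of where it is in the file, which is what the lexical states are for.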
A Lexer Input File
Taken from the examples at http://jflex.de/manual.html#ExampleUserCode

Java Specifics
    import java_cup.runtime.*;
    %%

Options and Declarations
    /*
       The name of the class JFlex will create will be GoptfreqLexer.
       JFlex will write the code to the file GoptfreqLexer.java.
    */
    %class GoptfreqLexer
    %public
    %unicode
    %cup
    %cupdebug

• %unicode defines the set of characters the scanner will work on. For scanning text
  files, %unicode should always be used.
• %cup switches to CUP compatibility mode to interface with a CUP-generated parser.
• %cupdebug creates a main function in the generated class that expects the name of an
  input file on the command line and then runs the scanner on this input file. It prints line,
  column, matched text, and CUP symbol name for each returned token to standard
  out.
JFlex Input, Continued: Lexical States
%state ITER
%state INTVALUE
%state FLOATVALUE
%state ITER2
%state ITER3
%state FLOAT1
%state FLOAT2
%state IGNOREALL
%state INPUT
%state INPUTA
%state INPUTB
%state INPUTC
%state INPUTD
%state INPUTE
%state INPUTF
Each %state line declares a lexical state. A state is identified by a name and controls
which rules can or cannot match while the scanner is in it.
JFlex File Structure
The code included in %{...%} is copied verbatim into the generated lexer class source. Here you can
declare member variables and functions that are used inside scanner actions.

/* Macro Declarations: these declarations are regular expressions that will be used later
   in the Lexical Rules Section. */
LineTerminator       = \r|\n|\r\n
InputCharacter       = [^\r\n]
WhiteSpace           = {LineTerminator} | [ \t\f]
Comment              = {TraditionalComment} | {EndOfLineComment} | {DocumentationComment}
TraditionalComment   = "/*" [^*] ~"*/"
EndOfLineComment     = "//" {InputCharacter}* {LineTerminator}
DocumentationComment = "/**" {CommentContent} "*"+ "/"
CommentContent       = ( [^*] | \*+ [^/*] )*
/* adjust syntax font-coloring */
Identifier           = [:jletter:] [:jletterdigit:]*
dec_int_lit          = 0 | [1-9][0-9]*
dec_int_id           = [A-Za-z_][A-Za-z_0-9]*
/* inside [...] a | is an ordinary character, so a sign is written [+-] and not [+|-] */
DIGIT                = [0-9]
FLOAT                = [+-]?{DIGIT}+"."{DIGIT}+
INT                  = [+-]?{DIGIT}+
BOOL                 = [TF]
EQ                   = "="
STRING               = [A-Z]+
GRAB                 = [^(" "|\r|\n|\r\n| \t\f)]+
%%
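To check what the numeric macros accept, the DIGIT, INT and FLOAT macros can be written as java.util.regex patterns. This is only a sketch: the class and method names are hypothetical, and JFlex syntax is not identical to Java regex syntax, although a single | inside a character class should be an ordinary character in both.

```java
import java.util.regex.Pattern;

final class MacroRegexSketch {
    // Java equivalents of the DIGIT, INT and FLOAT macros, with the sign written as [+-].
    static final String DIGIT = "[0-9]";
    static final Pattern INT = Pattern.compile("[+-]?" + DIGIT + "+");
    static final Pattern FLOAT = Pattern.compile("[+-]?" + DIGIT + "+\\." + DIGIT + "+");
    // A class written as [+|-] accepts three characters: '+', '|' and '-'.
    static final Pattern SIGN_WITH_BAR = Pattern.compile("[+|-]");

    public static boolean isInt(String s) {
        return INT.matcher(s).matches();
    }

    public static boolean isFloat(String s) {
        return FLOAT.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isFloat("-7.85284496695"));            // true
        System.out.println(isFloat(".111062"));                   // false: no digit before the point
        System.out.println(isInt("7"));                           // true
        System.out.println(SIGN_WITH_BAR.matcher("|").matches()); // true
    }
}
```

Two things are worth noticing. A sign class written as [+|-] would also accept a literal |. And because FLOAT requires at least one digit before the decimal point, a coordinate printed as .111062 would not match it as a whole.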
JFlex File: Lexical Rules Section
/* ------------------------Lexical Rules Section---------------------- */
/*
   This section contains regular expressions and actions, i.e. Java
   code, that will be executed when the scanner matches the associated
   regular expression. */
/* YYINITIAL is the state at which the lexer begins scanning. So
   these regular expressions will only be matched if the scanner is in
   the start state YYINITIAL. */
<YYINITIAL> {
JFlex Symbol Generation
/* Return the token STPT declared in the class sym that was found. */
"-- Stationary point found"  { return symbol(Goptfreqsym.STPT); }
"Standard orientation:"      { return symbol(Goptfreqsym.GEOM); }

/* Print the token found that was declared in the class sym and
   then return it. */
"Standard orientation"       { System.out.print(" + "); return symbol(Goptfreqsym.GEOM); }
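The actions above call symbol(...), which is not a JFlex built-in. In the JFlex manual's CUP example it is a small helper declared inside the %{...%} block. The sketch below shows the shape of that helper in plain Java. So that it compiles without CUP on the classpath, Symbol and Goptfreqsym are stand-ins defined here (the real ones are java_cup.runtime.Symbol and the generated constants class), and the constant values are made up.

```java
final class SymbolHelperSketch {
    /** Stand-in for java_cup.runtime.Symbol: token id, position, optional value. */
    static final class Symbol {
        final int sym;
        final int left;
        final int right;
        final Object value;

        Symbol(int sym, int left, int right, Object value) {
            this.sym = sym;
            this.left = left;
            this.right = right;
            this.value = value;
        }
    }

    /** Stand-in for the generated constants class; the numbers are hypothetical. */
    static final class Goptfreqsym {
        static final int STPT = 2;
        static final int GEOM = 3;
    }

    // In a generated scanner JFlex maintains these when %line and %column are set.
    int yyline = 0;
    int yycolumn = 0;

    /** Token without a value, e.g. symbol(Goptfreqsym.GEOM). */
    Symbol symbol(int type) {
        return new Symbol(type, yyline, yycolumn, null);
    }

    /** Token carrying a value, e.g. a parsed number. */
    Symbol symbol(int type, Object value) {
        return new Symbol(type, yyline, yycolumn, value);
    }
}
```

In the real scanner the two symbol methods would sit between %{ and %} in the specification, so that JFlex copies them into the generated class.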
Scanner Methods and Fields Available in Actions
void yybegin(int lexicalState)   /* enters the lexical state lexicalState */
String yytext()                  /* returns the matched input text region */
... (see http://www.jflex.de/manual.html for more methods)

<YYINITIAL> {
  "Stationary point found" { yybegin(ITER); return new Symbol(FinalCoordSym.FOUNDITER); }
}
<ITER> {
  /* the spacing between X, Y and Z must match the Gaussian header line exactly */
  "X Y Z"                  { yybegin(INPUTF); return new Symbol(FinalCoordSym.INPUT1); }
  "THE_END_OF_FILE"        { yybegin(IGNOREALL); return new Symbol(FinalCoordSym.SCFDONE); }
  "Standard orientation:"  { yybegin(IGNOREALL); return new Symbol(FinalCoordSym.SCFDONE); }
  .|\n                     {}
}
Parsing for Geometry
<INPUTF> {
  "---------------------------------------------------------------------"
          { yybegin(INPUT); return new Symbol(FinalCoordSym.DASH1); }
}
<INPUT> {
  {INT}   { yybegin(INPUTA); return new Symbol(FinalCoordSym.INPUT2, new Integer(yytext())); }
  "---------------------------------------------------------------------"
          { yybegin(ITER); return new Symbol(FinalCoordSym.DASH2); }
}
<INPUTA> { {INT}   { yybegin(INPUTB); return new Symbol(FinalCoordSym.INPUT3, new Integer(yytext())); } }
<INPUTB> { {INT}   { yybegin(INPUTC); return new Symbol(FinalCoordSym.INPUT4, new Integer(yytext())); } }
<INPUTC> { {FLOAT} { yybegin(INPUTD); return new Symbol(FinalCoordSym.INPUT5, new Float(yytext())); } }
<INPUTD> { {FLOAT} { yybegin(INPUTE); return new Symbol(FinalCoordSym.INPUT6, new Float(yytext())); } }
<INPUTE> { {FLOAT} { yybegin(INPUT); return new Symbol(FinalCoordSym.INPUT7, new Float(yytext())); } }
<IGNOREALL> { .|\n {} }
.|\n {}

                         Standard orientation:
 ---------------------------------------------------------------------
 Center     Atomic     Atomic              Coordinates (Angstroms)
 Number     Number      Type              X           Y           Z
 ---------------------------------------------------------------------
    1          7          0            .000000     .111062     .000000
    2          1          0           -.931526    -.259200     .000000
    3          1          0            .465763    -.259119     .806530
    4          1          0            .465763    -.259119    -.806530
 ---------------------------------------------------------------------
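The way the lexical states step through one coordinate table can be imitated in plain Java. The sketch below is a hand-written stand-in for the generated scanner, not the scanner itself. It splits the text on whitespace, which JFlex does not do, and its float pattern also accepts values with no digit before the point (as in the sample rows), which is an assumption that departs from the FLOAT macro. The class name and the DASHES pattern are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

final class GeometryStateSketch {
    enum State { INPUTF, INPUT, INPUTA, INPUTB, INPUTC, INPUTD, INPUTE, ITER }

    static final Pattern INT = Pattern.compile("[+-]?[0-9]+");
    // Assumption: digits before the point are optional here, unlike the FLOAT macro.
    static final Pattern FLOAT = Pattern.compile("[+-]?[0-9]*\\.[0-9]+");
    static final Pattern DASHES = Pattern.compile("-{10,}");

    /** Walks the words through the states and returns the names of the tokens emitted. */
    public static List<String> scan(String text) {
        List<String> tokens = new ArrayList<>();
        State state = State.INPUTF; // the state entered after the "X Y Z" header
        for (String word : text.trim().split("\\s+")) {
            switch (state) {
                case INPUTF:
                    if (DASHES.matcher(word).matches()) { tokens.add("DASH1"); state = State.INPUT; }
                    break;
                case INPUT:
                    if (INT.matcher(word).matches()) { tokens.add("INPUT2"); state = State.INPUTA; }
                    else if (DASHES.matcher(word).matches()) { tokens.add("DASH2"); state = State.ITER; }
                    break;
                case INPUTA:
                    if (INT.matcher(word).matches()) { tokens.add("INPUT3"); state = State.INPUTB; }
                    break;
                case INPUTB:
                    if (INT.matcher(word).matches()) { tokens.add("INPUT4"); state = State.INPUTC; }
                    break;
                case INPUTC:
                    if (FLOAT.matcher(word).matches()) { tokens.add("INPUT5"); state = State.INPUTD; }
                    break;
                case INPUTD:
                    if (FLOAT.matcher(word).matches()) { tokens.add("INPUT6"); state = State.INPUTE; }
                    break;
                case INPUTE:
                    if (FLOAT.matcher(word).matches()) { tokens.add("INPUT7"); state = State.INPUT; }
                    break;
                default:
                    break; // ITER: the table is finished, nothing more is emitted in this sketch
            }
        }
        return tokens;
    }
}
```

Each row of the table takes the scanner once around the loop INPUT, INPUTA, INPUTB, INPUTC, INPUTD, INPUTE and back to INPUT, and the closing dashed line leaves the loop.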
CUP Parser Generator
• CUP is a parser generator for Java
• It generates LALR (look-ahead, left-to-right)
  parsers from simple specifications
• It is similar to YACC
• These tools are used to construct
  relationships from basic structures for
  compilers (and natural languages)
CUP File Structure
• 4 main parts
• Part 1
  Preliminary and miscellaneous declarations
  Imported code (classes)
  Initialization
  Invoking the scanner
  Getting tokens (lexical tokens)
• Part 2
  Declares terminals and non-terminals
  Associates object classes with the above
  Terminals are of no type or of type Integer
  Terminals are symbols with association to
  strings (non-terminals)
• Part 3
  Specification of precedence and
  associativity of terminals
• Part 4
  Grammar
CUP Usage
If the specification is in a file parser.cup, then
  java java_cup.Main < parser.cup
results in two Java source files:
sym.java
  The sym class contains a series of constant declarations, one for each
  terminal symbol. This is typically used by the scanner to refer to
  symbols (e.g. with code such as "return new Symbol(sym.SEMI);").
parser.java
  The parser class implements the parser itself.
CUP File Structure: Note
• To calculate and print values of each expression,
  we must embed Java code within the parser to
  carry out actions at various points.
• In CUP, actions are contained in code strings
  which are surrounded by delimiters of the form {:
  and :}. In general, the system records all
  characters within the delimiters, but does not try
  to check that they contain valid Java code.
Example finalcoord.cup
Part 1. Preliminaries / Initialization
import java_cup.runtime.*;
import javax.swing.*;
import java.util.*;
import java.io.*;

/* comment code
                         Standard orientation:
 ---------------------------------------------------------------------
 Center     Atomic     Atomic              Coordinates (Angstroms)
 Number     Number      Type              X           Y           Z
 ---------------------------------------------------------------------
    1          7          0            .000000     .111062     .000000
    2          1          0           -.931526    -.259200     .000000
    3          1          0            .465763    -.259119     .806530
    4          1          0            .465763    -.259119    -.806530
 ---------------------------------------------------------------------

OUTPUT FORMAT:____________________________________________________________
1NSERCH= 0
more text
SCF Done: E(RHF) = -7.85284496695 A.U. after 8 cycles
more text
Maximum Force    0.000000  0.000450  YES
RMS Force        0.000000  0.000300  YES
more text

TO MONITOR:____________________________________________________________
iteration, energy

MANUALLY ADD TO CUP-GENERATED CLASS IN SCFaParser.java:________________
//add to CUP$SCFaParser$actions
public ParseSCF2 parseSCF;
//add to the constructor of CUP$SCFaParser$actions
parseSCF = new ParseSCF2();
*/
Example finalcoord.cup
Part 1 continued…
action code {:
    //__________________________________
    public static boolean DEBUG = true;
    private static JTable table;
    private static final String tableLabel = "SCF Intermediate Results:";
    // private static String cycle = "0";

    public static JTable getTable() {
        return table;
    }

    public static String getTableLabel() {
        return tableLabel;
    }
    // }
:}
Example finalcoord.cup
Part 2 Terminal and Non-terminal Declarations
terminal         INPUT1, FOUNDITER, SCFDONE, DASH1, DASH2;
terminal Integer INPUT2, INPUT3, INPUT4, ITERATION;
terminal Float   ENERGY, INPUT5, INPUT6, INPUT7;
non terminal     startpt, scfintro, scfpat, scfcycle, cycle, grad1, grad2;
non terminal     inp2, inp3, inp5, inp6, inp7, cycle1, cycle2, cycle3;
Example finalcoord.cup
Part 3 Precedence and Associativity
This part is optional and is important for
ambiguous grammars.
It is not used here, as our parsing is
straightforward.
Example finalcoord.cup
Part 4 Grammar
// A production starts with a non-terminal (symbol/string), followed by ::= and a
// sequence of actions, terminals, non-terminals and an optional precedence, with a ; at the end
// Java code goes in between {: … :}
// Alternative productions are separated by |
startpt  ::= scfintro scfpat SCFDONE ;
scfintro ::= FOUNDITER
             {: if (DEBUG) System.out.println("CUP:Input: found the start of Iteration"); :} ;
scfpat   ::= scfpat scfcycle
             {: if (DEBUG) System.out.println("CUP:Input: in scfpat"); :}
           | scfcycle ;
scfcycle ::= INPUT1 DASH1 cycle1 DASH2 ;
cycle1   ::= cycle1 cycle2 | cycle2 ;
cycle2   ::= inp2 inp3 INPUT4 inp5 inp6 inp7 ;
Example finalcoord.cup
Grammar Continued
inp2 ::= INPUT2:in2
    {: //___________________________________________________________________
       if (DEBUG) System.out.println("CUP:Input: center number "+in2); :} ;

inp3 ::= INPUT3:in3
    {: //___________________________________________________________________
       if (DEBUG) System.out.println("CUP:Input: atomic number "+in3); :} ;

inp5 ::= INPUT5:in5
    {: //___________________________________________________________________
       if (DEBUG) System.out.println("CUP:Input: x coordinate "+in5); :} ;

inp6 ::= INPUT6:in6
    {: //___________________________________________________________________
       if (DEBUG) System.out.println("CUP:Input: y coordinate "+in6); :} ;

inp7 ::= INPUT7:in7
    {: //___________________________________________________________________
       if (DEBUG) System.out.println("CUP:Input: z coordinate "+in7); :} ;
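To see which token sequences the finalcoord grammar accepts, the productions can be turned into a small hand-written recognizer. This is only a sketch of the same language: CUP generates a table-driven LALR parser, not code like this, and the left-recursive rules (scfpat, cycle1) are written here as loops. The class and method names are hypothetical, and tokens are represented by their names as strings.

```java
import java.util.Arrays;
import java.util.List;

final class FinalCoordGrammarSketch {
    private final List<String> tokens;
    private int pos = 0;

    private FinalCoordGrammarSketch(List<String> tokens) {
        this.tokens = tokens;
    }

    /** Returns true if the token names form exactly one startpt. */
    public static boolean accepts(String... tokenNames) {
        FinalCoordGrammarSketch p = new FinalCoordGrammarSketch(Arrays.asList(tokenNames));
        return p.startpt() && p.pos == p.tokens.size();
    }

    private boolean peek(String name) {
        return pos < tokens.size() && tokens.get(pos).equals(name);
    }

    private boolean eat(String name) {
        if (peek(name)) { pos++; return true; }
        return false;
    }

    // startpt ::= scfintro scfpat SCFDONE, with scfintro ::= FOUNDITER
    private boolean startpt() {
        return eat("FOUNDITER") && scfpat() && eat("SCFDONE");
    }

    // scfpat ::= scfpat scfcycle | scfcycle   (one or more scfcycle)
    private boolean scfpat() {
        if (!scfcycle()) return false;
        while (peek("INPUT1")) {
            if (!scfcycle()) return false;
        }
        return true;
    }

    // scfcycle ::= INPUT1 DASH1 cycle1 DASH2
    private boolean scfcycle() {
        return eat("INPUT1") && eat("DASH1") && cycle1() && eat("DASH2");
    }

    // cycle1 ::= cycle1 cycle2 | cycle2   (one or more table rows)
    private boolean cycle1() {
        if (!cycle2()) return false;
        while (peek("INPUT2")) {
            if (!cycle2()) return false;
        }
        return true;
    }

    // cycle2 ::= inp2 inp3 INPUT4 inp5 inp6 inp7, where each inpN wraps the terminal INPUTN
    private boolean cycle2() {
        return eat("INPUT2") && eat("INPUT3") && eat("INPUT4")
            && eat("INPUT5") && eat("INPUT6") && eat("INPUT7");
    }
}
```

For example, FOUNDITER INPUT1 DASH1 followed by one or more rows of INPUT2 ... INPUT7, then DASH2 and SCFDONE is accepted, while a table without its closing DASH2 is rejected.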
CUP Customization
java java_cup.Main options < finalcoord.cup
• Options
  -package GridChem
  -symbols FinalCoordSym     (writes FinalCoordSym.java)
  -parser FinalCoordParser   (writes FinalCoordParser.java)
More options at
http://www.cs.princeton.edu/~appel/modern/java/CUP/manual.html#about
Bottom-Up Parser Architecture
[Figure: a bottom-up parser with a buffer of states visited, driven by a table of Action and Go To entries for each state]
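The pieces named in the figure (a buffer of states visited, action decisions and go-to decisions) can be seen in a toy shift-reduce parser. The sketch below handles only the grammar rows ::= rows ROW | ROW, which mirrors cycle1 ::= cycle1 cycle2 | cycle2 with each row collapsed into a single ROW token. The state numbers were worked out by hand for this toy grammar and the decisions are written as if-statements; CUP generates real tables, so this is an illustration of the idea and not CUP's output. All names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class ShiftReduceSketch {
    /** Returns the number of reductions performed, or -1 if the input is rejected. */
    public static int parse(String... input) {
        Deque<Integer> states = new ArrayDeque<>(); // the buffer (stack) of states visited
        states.push(0);
        int pos = 0;
        int reductions = 0;
        while (true) {
            String lookahead = pos < input.length ? input[pos] : "EOF";
            int state = states.peek();
            if (state == 0 && lookahead.equals("ROW")) {
                states.push(2); // action: shift, then state 2 (rows ::= ROW .)
                pos++;
            } else if (state == 1 && lookahead.equals("ROW")) {
                states.push(3); // action: shift, then state 3 (rows ::= rows ROW .)
                pos++;
            } else if (state == 1 && lookahead.equals("EOF")) {
                return reductions; // action: accept
            } else if (state == 2 || state == 3) {
                // action: reduce, popping one state per right-hand-side symbol
                int rhsLength = (state == 2) ? 1 : 2;
                for (int i = 0; i < rhsLength; i++) {
                    states.pop();
                }
                // go-to: state 0 is now on top, and goto(0, rows) = 1
                states.push(1);
                reductions++;
            } else {
                return -1; // no action defined: syntax error
            }
        }
    }
}
```

With three ROW tokens the parser shifts each token, reduces once with rows ::= ROW and twice with rows ::= rows ROW, and then accepts.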