No Slide Title

Download Report

Transcript No Slide Title

YACC no more
Integrating parsers, interpreters and
compilers into your application
Sriram Srinivasan (“Ram”)
Session # 2221
This is he
• Sriram Srinivasan
• One of the core engineers of the WebLogic
app server
– Wrote the first commercially available EJB
implementation
– Wrote the TP engine in the WLS
• Author: “Advanced Perl Programming”
(O’reilly)
Beginning
2
Session # 2221
Why this talk?
• Quest for higher level programming patterns
– More productive / faster / maintainable etc…
• Integrating compilers, parsers, interpreters
into your application
Beginning
3
Session # 2221
Embeddable Parsers
Case Study: Configuration Data
• JDK parsers for configuration data
– java.util.Properties, XML, regex library
• java.util.Properties
– Limited to “property = value” format
– Takes care of comments, multi-line values, quotes
#app server properties
connectionPoolName = testPool
numThreads = 10
…
p = new Properties().load(inputStream)
Middle
4
Session # 2221
XML parsers
• Good for structured, hierarchical data
• DOM (Document Object Model) parser
– Converts an entire XML document into a
corresponding tree of Nodes.
• SAX (Simple API for XML)
– Callback class extends DefaultHandler
– Supplies methods for startDocument(…),
startElement(…), endElement(…) etc.
Middle
5
Session # 2221
Adding code to data
• Problem: We want to add add macros and
expressions to our properties.
numThreads = numProcessors
# Ensure that connection pool is smaller than
# thread pool.
connectionPoolSize = min(numThreads – 2, 1)
• This requires an expression evaluator
Middle
6
Session # 2221
Embeddable interpreters
• Plethora of free, high quality interpreters
available
–
–
–
–
BeanShell (Java-like syntax)
Rhino (JavaScript)
Jython (Python in Java)
Kawa (Scheme in Java)
• When embedded, flow of control easily goes
from java to interpreter to back.
• Command-line shell always included
Middle
7
Session # 2221
BeanShell
• Expressions identical to java
• Types are inferred dynamically
add( a, b ) {
return a + b;
}
sum = add(1, 2);
// 3
str = add("Web", "Logic"); // "WebLogic"
Middle
8
Session # 2221
Embedding BeanShell
import bsh.Interpreter;
Interpreter i = new Interpreter();
i.set("foo", 5);
i.eval("bar = foo*10");
System.out.println("bar = "+ i.get("bar"));
• Instead of writing code to parse the properties file,
just eval it!
– Comments should be “// … ”, not “# …
– Each property definition line should end in “;”
i.eval(new FileReader("config.properties"));
Integer n = i.get("connectionPoolSize");
Middle
9
Session # 2221
BeanShell features
• Strict java expression syntax
– no class declarations
• Loose convenience syntax
b = new java.awt.Button();
b.label = "Yo" // eqvt. to b.setLabel("Yo")
h = new Hashtable();
h{"spud"} = "potato";
// Swing stuff
b = new JButton("My Button");
f = new JFrame("My Frame");
f.getContentPane().add(b, "Center");
f.pack();
f.show();
Middle
10
Session # 2221
Rhino
• Free ECMAScript interpreter from Mozilla
• Slightly more cumbersome to embed than
BeanShell
• Contains bytecode compiler that can be
called from within java
• Closures
• Regex support built-in. Good for text
manipulation
Middle
11
Session # 2221
Case study: Command pattern
• Undo/Redo in an editor
function insertCommand(text) {
this.pos = buf.pos
buf.insert(text)
this.len = text.length
this.undo = function () {
buf.moveTo(this.pos)
buf.erase(this.len);
}
undoStack.push(this);
}
new insertCommand("foo")
undoStack.pop().undo()
Middle
12
Session # 2221
Python
• Python (Java implementation is "Jython")
–
–
–
–
–
Middle
13
powerful high-level language
Compiles to bytecode.
True scripting language
Can extend java classes
Static compilation and standalone execution
Session # 2221
More case studies
• Embedded expressions
– Spreadsheet formulae
• Customizable GUIs
– Macro facility, keyboard mapping
• Remote agents
• Monitoring
• Performance through partial evaluation
Middle
14
Session # 2221
Case Study: Remote Agents
• Example: Test Agents
• Can upload script to each agent to launch
processes, control them locally.
– Jython is well-suited for this kind of task
• Example: Scriptable IMAP mail server
– "All messages that contain this regex, make
a copy in this folder"
Middle
15
Session # 2221
Case Study: Monitoring
• SNMP model: Obtain attributes from each
node over the network, do calculation
• Alternatively, upload script to each node,
and let it return the result
– Conserves network bandwidth
• Can insert any kind of probe
• Study application data structures
• Application-specific profiling
Middle
16
Session # 2221
Case Study: Performance
• Partial evaluation can yield substantial
performance benefits
• Object - RDBMS adaptors
– Code generator studies class and db
schema
– Omits unnecessary conversions, null checks
• Vector dot product
dp = a[0]*b[0] + a[1]*b[1] + a[2]*b[2];
// But if 'a' is fixed {16,0,4} …
dp = b[0] << 4 + b[2] << 2
Middle
17
Session # 2221
Generating java
• Moving from embedded interpreters to
generating java source
– Example: JSP.
• Convert template to java, compile and
dynamically load
• BEA/WebLogic's weblogic.dtdc
– Converts XML DTD to a high performance
SAX parser tuned to that DTD
Middle
18
Session # 2221
Generating code with Doclets
• javadoc is a general purpose parser
javadoc –doclet ListClass foo.java
• ListClass.start() called with a hierarchy of *Doc
nodes
import com.sun.javadoc.*;
public class ListClass {
public static boolean start(RootDoc root) {
ClassDoc[] classes = root.classes();
for (int i = 0; i < classes.length; ++i) {
System.out.println(classes[i]);
}
return true;
}
• Arbitrary tags can be introduced at any level
Middle
19
Session # 2221
Case study: iContract
• Pattern: doclet expressions converted to
annotated java code
/**
* Ensure that argument is always > 0
* @pre f >= 0.0
*
* Ensure that the function produces the sqrt
* within a
* @post Math.abs((return * return) - f) < 0.001
*/
public float sqrt(float f) { ... }
Middle
20
Session # 2221
Case Study: EJBGen
/**
* @ejbgen:entity
* ejb-name = AccountEJB-OneToMany
* data-source-name = demoPool
* table-name = Accounts
*/
abstract public class AccountBean
implements EntityBean {
/**
* @ejbgen:cmp-field column = acct_id
* @ejbgen:primkey-field
* @ejbgen:remote-method transaction-attribute
= Required
*/
abstract public String getAccountId();
Middle
21
Session # 2221
Generating bytecode
• Example: WebLogic RMI adaptors
• Sometimes, some facilities are available
only in bytecode (goto's!)
• Example: fast string matching
– Given a search string, encode the state
machine into bytecode
– Worth it if the same pattern is going to be
used many times
• Virus scanners
• Searching genome sequences
Middle
22
Session # 2221
Example: String matching
• Problem: match "10100"
– Convert to a state machine
– Each state encodes a succesful prefix match
0
1
0
S0
1
S1
1
0
S2
1
Middle
23
Session # 2221
1
S3
0
S4
0
S5
String matching (contd.)
• If only goto were allowed in java …
try { //buf is the buffer to be searched
int i = -1;
s0: i++; if (buf[i] != '1') goto s0;
s1: i++; if (buf[i] != '0') goto s1;
s2: i++; if (buf[i] != '1') goto s0;
s3: i++; if (buf[i] != '0') goto s1;
s4: i++; if (buf[i] != '0') goto s3;
s5: i++; return i-5;
} catch (ArrayIndexOutOfBoundsException e) {
return -1;
}
• But, goto's are allowed in bytecode!
Middle
24
Session # 2221
String matching (contd.)
• Using an assembler like jasmin
iconst_m1
istore_1
S0:
;; i++; if a[i] != '1' goto S0;
iinc 1 1
; i++
aload_0
; load a[i]
iload_1
caload
bipush 49
; load '1'
if_icmpne S0 ; if .. goto S0
S1:
;;
i++; if a[i] != '0' goto S1
iinc 1 1
aload_0
iload_1
caload
bipush 48
if_icmpne S1
Middle
25
Session # 2221
Custom languages
• Craft a language that fits the context you are
working in
– Avoid XML ugliness: SRML (Simple Rule Markup)
– Instead of "if s.purchaseAmount > 100 … "
<simpleCondition
className="ShoppingCart"
objectVariable="s">
<binaryExp operator="gt">
<field name="purchaseAmount"/>
<constant type="float" value="100"/>
</binaryExp>
</simpleCondition>
Middle
26
Session # 2221
Antlr Introduction
• Antlr: A recursive descent parser with configurable
lookahead (LL(k) parser)
• Much, much simpler than lex/yacc
– Yacc error messages are cryptic, tough for non-CS
types to understand
– Even generated code easy to understand
• Includes tree building and recognition
– No such facility in yacc
• Lexer, parser and tree recognizer phase have
similar syntax
Middle
27
Session # 2221
Antlr
• Example: hierarchical property list
– A list consists of name value pairs
– Names are identifiers, values are numbers or lists
( a 200
b (c 10
d 20)
)
Middle
28
Session # 2221
Antlr (contd.)
class LispLexer extends Lexer;
ID : ('a' .. 'z')+;
NUM: ('0' .. '9')+;
LP : '(';
RP : ')';
class LispParser extends Parser;
list
: LP (nameValuePair)+ RP;
nameValuePair : ID value ;
value
Middle
29
Session # 2221
: NUM | list;
Antlr (contd.)
• Adding code, arguments, return values
nameValuePair returns [NVP ret=null]
{Object v;}
: t:ID v=value
{ret = new NVP(t.getText(),v);}
;
value returns [Object ret=null]
: t:NUM
{ret=t.getText();}
| ret=list
;
Middle
30
Session # 2221
Way out there …
• Configurable hardware
– New circuits on the fly
• Intentional programming
– Code not represented as a stream of characters
Middle
31
Session # 2221
Summary
• Run-time evaluation gives you a lot of power
• Other languages add features (e.g. closures) to
java
• Lots of simple, free, quality parsers, interpreters
• Produce custom java source or byte code for
performance
• Roll your own domain-specific language with
ANTLR or javacc.
• Yacc No More.
End
32
Session # 2221
References
• Doclets
– Doclet tools: www.doclet.com
– EJBGen: www.beust.com, Cedric Beust
– Icontract: www.reliable-systems.com, Reto Kramer
• Languages, interpreters
–
–
–
–
–
Beanshell: www.beanshell.org
Rhino: www.mozilla.org/rhino
Python: www.python.org, www.jython.org
ANTLR: www.antlr.org
More … flp.cs.tu-berlin.de/~tolk/vmlanguages.html
• SRML: xml.coverpages.org/srml.html
End
33
Session # 2221
References (contd.)
• Bytecode manipulation:
– Jasmin: mrl.nyu.edu/~meyer/jasmin/
– Jikes Bytecode toolkit:
www.alphaworks.ibm.com/tech/jikesbt
– BCEL: bcel.sourceforge.net
• "Rapid" - Reconfigurable hardware
– www.cs.washington.edu/research
• "The death of computer languages, the birth of
intentional programming", Charles Simonyi
– research.microsoft.com/scripts/pubs/trpub.asp
– Microsoft tech report MSR-TR-95-52
• Thinking in Patterns with Java, Bruce Eckel
– www.mindview.net/Books/TIPatterns
End
34
Session # 2221