Polyglot An Extensible Compiler Framework for Java

Polyglot
An Extensible Compiler Framework for Java
Nathaniel Nystrom
Michael R. Clarkson
Andrew C. Myers
Cornell University
Language extension
• Language designers often create extensions
to existing languages
• e.g., C++, PolyJ, GJ, Pizza, AspectJ, Jif,
ArchJava, ESCJava, Polyphonic C#, ...
• Want to reuse existing compiler
infrastructure as much as possible
• Polyglot is a framework for writing compiler
extensions for Java
2
Requirements
• Language extension
• Modify both syntax and semantics of the base
language
• Not necessarily backward compatible
• Goals:
• Easy to build and maintain extensions
• Extensibility should be scalable
• No code duplication
• Compilers for language extensions should be
open to further extension
3
Rejected approaches
• In-place modification
[Figure: base compiler 1.0 is copied and modified to produce extension compiler 1.0; after bug fixes and upgrades yield base compiler 2.0, the copy-and-modify step and the bug fixes and upgrades must be repeated to produce extension compiler 2.0]
• Macro languages
• Limited to syntax extensions
• Semantic checks after macro expansion
4
Polyglot
• Base compiler is a complete Java front end
• 25K lines of Java
• Name resolution, inner class support, type
checking, exception checking, uninitialized
variable analysis, unreachable code analysis, ...
• Can reuse and extend through inheritance
5
Scalable extensibility
Changes to the compiler should be
proportional to changes in the language.
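• Most compiler passes are sparse: a pass needs node-specific behavior for only a few AST node types
[Figure: grid of AST nodes (+, if, x, e.f, =) against passes (name resolution, type checking, exception checking, constant folding)]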
6
Non-scalable approaches
Using                                   Easy to add or modify
                                        Passes    AST nodes
Visitors                                yes       no
Pass as AST node method (“naive OO”)    no        yes
Polyglot                                yes       yes
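The trade-off in the table can be sketched with a two-node AST. This is an illustration only: Node, Lit, Add, Visitor, and Printer are hypothetical names, not Polyglot classes. The eval() method plays the role of a pass written in the naive OO style, and Printer plays the role of a pass written as a visitor.

```java
// Hypothetical two-node AST used only to illustrate the trade-off.
interface Visitor<R> {
    R visitLit(Lit n);
    R visitAdd(Add n);   // a new node type forces a new method here and in every visitor
}

abstract class Node {
    abstract <R> R accept(Visitor<R> v);
    abstract int eval(); // naive OO: a new pass forces a new method in every node class
}

class Lit extends Node {
    final int value;
    Lit(int value) { this.value = value; }
    <R> R accept(Visitor<R> v) { return v.visitLit(this); }
    int eval() { return value; }
}

class Add extends Node {
    final Node lhs, rhs;
    Add(Node lhs, Node rhs) { this.lhs = lhs; this.rhs = rhs; }
    <R> R accept(Visitor<R> v) { return v.visitAdd(this); }
    int eval() { return lhs.eval() + rhs.eval(); }
}

// With visitors, a new pass is one new class and no node class changes.
class Printer implements Visitor<String> {
    public String visitLit(Lit n) { return Integer.toString(n.value); }
    public String visitAdd(Add n) {
        return "(" + n.lhs.accept(this) + " + " + n.rhs.accept(this) + ")";
    }
}
```

Adding a Mul node here would require editing Visitor and Printer; adding a second pass in the eval() style would require editing Lit and Add. Neither cost is proportional to the size of the change.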


7
Polyglot architecture
Base Polyglot compiler:
Java source → Java parser → AST rewriting passes → Code generator → Java target
Ext source → Ext parser → AST rewriting passes → Code generator → Java target
Ext2 source → Ext2 parser → AST rewriting passes → Code generator → Java target
8
Architecture details
• Parser written using PPG
• Adds grammar inheritance to Java CUP
• AST nodes constructed using a node factory
• Decouples node types from implementation
• AST rewriting passes:
• Each pass lazily creates a new AST
• From naive OO: traverse AST invoking a method
at each node
• From visitors: AST traversal factored out
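A minimal sketch of a rewriting pass that creates the new AST lazily, assuming immutable nodes. The names Node, Lit, Add, visitChildren, and FoldZero are hypothetical and are not Polyglot's API. Traversal is factored into visitChildren (the idea taken from visitors), and the pass logic runs as a call at each node (the idea taken from naive OO).

```java
import java.util.function.UnaryOperator;

// Hypothetical immutable AST; not Polyglot's real classes.
abstract class Node {
    // Rebuild this node from rewritten children; return `this` if nothing changed.
    abstract Node visitChildren(UnaryOperator<Node> pass);
}

final class Lit extends Node {
    final int value;
    Lit(int value) { this.value = value; }
    Node visitChildren(UnaryOperator<Node> pass) { return this; } // leaf: no children
}

final class Add extends Node {
    final Node lhs, rhs;
    Add(Node lhs, Node rhs) { this.lhs = lhs; this.rhs = rhs; }
    Node visitChildren(UnaryOperator<Node> pass) {
        Node l = pass.apply(lhs);
        Node r = pass.apply(rhs);
        // Lazy copy: allocate a new node only when a child actually changed.
        return (l == lhs && r == rhs) ? this : new Add(l, r);
    }
}

// Example pass: rewrite "e + 0" to "e".
final class FoldZero implements UnaryOperator<Node> {
    public Node apply(Node n) {
        Node m = n.visitChildren(this); // traversal is factored out of the pass
        if (m instanceof Add) {
            Add a = (Add) m;
            if (a.rhs instanceof Lit && ((Lit) a.rhs).value == 0) return a.lhs;
        }
        return m;
    }
}
```

A subtree that the pass does not change is returned as the same object, so unchanged parts of the AST are shared between the old and new trees.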
9
Example: PAO
• Primitive types as subclasses of Object
• Changes type system, relaxes Java syntax
• Implementation: insert boxing and unboxing
code where needed
PAO source:
    HashMap m;
    m.put("two", 2);
    int v = (int) m.get("two");
Translated Java:
    HashMap m;
    m.put("two", new Integer(2));
    int v = ((Integer) m.get("two")).intValue();
10
PAO implementation
• Modify parser and type-checking pass to
permit e instanceof int
• Parser changes with PPG:
    include "java.cup"
    drop { rel_expr ::= rel_expr INSTANCEOF ref_type }
    extend rel_expr ::= rel_expr:a INSTANCEOF type:b
        {: RESULT = node_factory.Instanceof(a, b); :}
• Add one new pass to insert boxing and
unboxing code
11
Implementing a new pass
• Want to extend Node interface with rewrite() method
• Default implementation: identity translation
• Specialized implementations: boxing and unboxing
• Mixin extensibility: extensions to a base class should be
inherited by subclasses
[Figure: class diagram. Node (typeCheck(), codeGen()) has subclasses If (cond, then, else) and Add (lhs, rhs), each with typeCheck() and codeGen(). In the desired result, rewrite() is added to Node and appears in If and Add as well.]
12
Inheritance is inadequate
[Figure: class diagram. Node (typeCheck(), codeGen()) has subclasses If (cond, then, else) and Add (lhs, rhs). Adding rewrite() by inheritance requires new classes PaoNode, PaoIf, and PaoAdd, each declaring typeCheck(), codeGen(), and rewrite().]
13
Inheritance is inadequate
[Figure: the Node and PaoNode class hierarchies drawn in full, with typeCheck(), codeGen(), and rewrite() repeated in every node class]
14
Extension objects
Use composition to mix methods and fields into
AST node classes
[Figure: Node (ext, typeCheck(), codeGen()) with subclasses If (ext, cond, then, else) and Add (ext, lhs, rhs). Each node's ext field points to a PaoExt object (ext, rewrite()) whose own ext field is null.]
PAO extension objects are installed into all nodes
by the node factory
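A minimal sketch of the composition idea, under stated assumptions: the classes below reuse names from the slide (Node, If, Add, PaoExt), but the bodies, the PaoAddExt class, the back-pointer field, and the Factory.attach helper are hypothetical and do not reproduce Polyglot's API.

```java
// Hypothetical sketch; class and method names are illustrative, not Polyglot's API.
class Ext {
    Node node;   // back-pointer to the node this extension is attached to
    Ext ext;     // further extension object, or null
}

class Node {
    Ext ext;     // extension object installed by the node factory
    String typeCheck() { return "base typeCheck"; }
    String codeGen()   { return "base codeGen"; }
}

class If extends Node { /* cond, then, else omitted */ }
class Add extends Node { /* lhs, rhs omitted */ }

// Mixes a rewrite() pass into every node class without subclassing any of them.
class PaoExt extends Ext {
    String rewrite() { return "identity rewrite of " + node.getClass().getSimpleName(); }
}

// Specialized behavior for one node type only.
class PaoAddExt extends PaoExt {
    @Override String rewrite() { return "box operands of Add"; }
}

class Factory {
    // Attach an extension object and set its back-pointer.
    static <N extends Node> N attach(N n, PaoExt e) {
        e.node = n;
        n.ext = e;
        return n;
    }
}
```

If and Add keep their base typeCheck() and codeGen() unchanged; only the attached extension object differs per node type, so no PaoIf or PaoAdd subclass is needed.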
15
Extension objects
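Each extension object has its own ext field, which leaves
the extension open to further extension
[Figure: Node (ext, typeCheck(), codeGen()) with subclasses If (ext, cond, then, else) and Add (ext, lhs, rhs). A node's ext field points to a PaoExt object (ext, rewrite()), whose ext field points to a second extension object (ext, typeCheck(), ext_type_info), whose ext field is null.]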
16
Method invocation
• A method may be implemented in the node or in
any one of several extension objects.
• Extension should call node.ext.ext.typeCheck()
• Base compiler should call: node.typeCheck()
• Cannot hardcode the calls
[Figure: Node (ext, typeCheck(), codeGen()) → PaoExt (ext, rewrite()) → second extension object (ext, typeCheck(), ext_type_info) → null]
17
Delegate objects
• Each node & extension object has a del field
• Delegate object implements same interface as node or ext
• Directs call to appropriate method implementation
• Ex: node.del.typeCheck()
• Ex: node.ext.del.rewrite()
• Run-time overhead < 2%
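JavaDel:
    typeCheck() { node.ext.ext.typeCheck() }
    codeGen() { node.codeGen() }
[Figure: Node (del, ext, typeCheck(), codeGen()) → PaoExt (del, ext, rewrite()) → second extension object (del, ext, typeCheck(), ext_type_info) → null; the del fields point to delegate objects such as JavaDel]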
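A minimal sketch of delegate dispatch, assuming a single overriding implementation. NodeOps, ExtTypeCheck, ExtDel, and Driver are hypothetical names introduced for illustration; only the del field and the typeCheck()/codeGen() methods come from the slide.

```java
// Hypothetical sketch; names mirror the slide but are not Polyglot's API.
interface NodeOps {
    String typeCheck();
    String codeGen();
}

class Node implements NodeOps {
    NodeOps del = this;          // by default a node is its own delegate
    public String typeCheck() { return "Java typeCheck"; }
    public String codeGen()   { return "Java codeGen"; }
}

// An extension's replacement implementation of one pass.
class ExtTypeCheck {
    String typeCheck() { return "extension typeCheck"; }
}

// Delegate: same interface as the node, routes each call to the right implementation.
class ExtDel implements NodeOps {
    final Node node;
    final ExtTypeCheck override;
    ExtDel(Node node, ExtTypeCheck override) { this.node = node; this.override = override; }
    public String typeCheck() { return override.typeCheck(); } // redirected
    public String codeGen()   { return node.codeGen(); }       // falls through to the node
}

class Driver {
    // Callers never hardcode where a method lives; they always go through del.
    static String check(Node n) { return n.del.typeCheck(); }
    static String emit(Node n)  { return n.del.codeGen(); }
}
```

The base compiler code in Driver stays the same whether or not an extension has replaced typeCheck(); only the object stored in del changes.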
18
Scalable extensibility
• To add a new pass:
• Use an extension object to mix in a default
implementation of the pass for the Node base class
• Use extension objects to mix in specialized
implementations as needed
• To change the implementation of an existing pass:
• Use a delegate object to redirect to the method providing
the new implementation
• To create an AST node type:
• Create a new subclass of Node
• Or mix new fields into an existing node using an extension
object
19
Polyglot family tree
[Figure: family tree rooted at Polyglot base (Java), with extensions PAO, parameterized types, Coffer, PolyJ, JMatch, covariant return, Jif, and Jif/split]
20
Results
• Can build small extensions in hours or days
• 10% of base code is interfaces and factories
Extension               # Tokens    % of Base
Polyglot base (Java)    166K        100
Jif                     129K        78
JMatch                  108K        65
Jif/split               99K         60
PolyJ                   79K         48
Coffer                  24K         14
PAO                     6.1K        3.6
parameterized types     3.2K        2
covariant return        1.6K        1
javac 1.1               132K        80
21
Related work
• Other extensible compilers
• e.g., CoSy, SUIF
• e.g., JastAdd, JaCo
• Macros
• e.g., EPP, Java Syntax Extender, Jakarta
• e.g., Maya
• Visitors
• e.g., staggered visitors, extensible visitors
22
Conclusions
• Several Java extensions have been
implemented with Polyglot
• Programmer effort scales well with size of
difference with Java
• Extension objects and delegate objects
provide scalable extensibility
• Download from:
http://www.cs.cornell.edu/projects/polyglot
23
Acknowledgments
Brandon Bray: JMatch
Michael Brukman: PPG
Steve Chong: Jif, Jif/split, covariant return
Matt Harren: JMatch
Aleksey Kliger: JLtools, PolyJ
Jed Liu: JMatch
Naveen Sastry: JLtools
Dan Spoonhower: JLtools
Steve Zdancewic: Jif, Jif/split
Lantian Zheng: Jif, Jif/split
http://www.cs.cornell.edu/projects/polyglot
24
Questions?
Mixin extensibility
Inheritance does not provide mixin extensibility:
when a base class is extended, subclasses should
inherit the changes
[Figure: class diagram. Node (typeCheck(), codeGen()) has subclasses If (cond, then, else) and Add (lhs, rhs). When rewrite() is added to Node, If and Add should gain rewrite() as well.]
26
Other Polyglot features
• Quasi-quoting library
• Useful for translation from extension language AST to
base language or intermediate language AST
    qqStmt("if (%e.relabelsTo(%e)) %s; else %s;",
           new Object[] { L, Li, then_body, else_body });
• Automatic separate compilation
• Serialize type information and store in the AST
• Encoded into the class file via javac
• Extracted from class file using reflection
• Data-flow analysis framework
27
PAO rewriting
• rewrite(ts) called for each AST node:
class PaoExt extends Ext {
    // Default: identity translation
    Node rewrite(PaoTypeSystem ts) { return node(); }
}
class PaoInstanceofExt extends PaoExt {
    Node rewrite(PaoTypeSystem ts) {
        Instanceof e = (Instanceof) node();
        Type rtype = e.compareType();
        // e.g., "e instanceof int" becomes "e instanceof Integer"
        if (rtype.isPrimitive())
            return e.compareType(ts.boxedType(rtype));
        else
            return e;
    }
}
28
Node factories
• Each extension has a node factory (nf)
• To create a node of type T, call method
nf.T()
• T() may return an extension-specific
subclass of T
• T() attaches the extension and delegate
objects to T via a call to extT()
• Mixin extensibility: if T is a subclass of S,
then extT() will call extS()
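A minimal sketch of the factory pattern described above. The nf.T() and extT() naming follows the slide, but BaseNodeFactory, PaoNodeFactory, the init helper, and all bodies are hypothetical and do not reproduce Polyglot's actual node factory classes.

```java
// Hypothetical sketch; names are illustrative, not Polyglot's actual node factory API.
class Ext { Node node; }
class Node { Ext ext; }
class Expr extends Node { }
class Instanceof extends Expr { }

class PaoExt extends Ext { String rewrite() { return "identity"; } }
class PaoInstanceofExt extends PaoExt {
    @Override String rewrite() { return "box compare type"; }
}

class BaseNodeFactory {
    // One creation method per node type T.
    Instanceof Instanceof() { return init(new Instanceof(), extInstanceof()); }
    Expr Expr()             { return init(new Expr(), extExpr()); }

    // extT() defaults to extS() for the supertype S, so an extension that
    // overrides only extNode() still reaches every node type.
    Ext extNode()       { return null; }
    Ext extExpr()       { return extNode(); }
    Ext extInstanceof() { return extExpr(); }

    static <N extends Node> N init(N n, Ext e) {
        if (e != null) e.node = n;
        n.ext = e;
        return n;
    }
}

class PaoNodeFactory extends BaseNodeFactory {
    @Override Ext extNode()       { return new PaoExt(); }
    @Override Ext extInstanceof() { return new PaoInstanceofExt(); }
}
```

PaoNodeFactory overrides two methods, yet Expr nodes still receive a PaoExt through the extExpr() to extNode() chain. This is the mixin behavior the last bullet describes.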
29
Results
• Can build small extensions in hours, days
• 10% of base compiler code is interfaces and
factories
Extension                         Lexer    Parser    Total    % of Base
javac 1.1 (excl. bytecode asm)    -        -         119K     72
Polyglot base                     2.7K     11K       166K     100
Jif                               2.7K     5.4K      129K     78
JMatch                            2.7K     14K       108K     65
30
Why not output bytecode?
• Wanted to be able to read the output
• The symmetry is satisfying
• Limitations of Java as a target language
• Scoping rules sometimes make it difficult to
output Java code, especially with inner classes
• Lack of goto can make generated control flow
inefficient
31
[Figure labels: name resolution, semantic checking, translation]
32