Building Domain Languages atop Java


Language Oriented Programming and
Language Workbenches: Building Domain
Languages atop Java
Neal Ford
Application Architect
[email protected]
Questions, Slides, and Samples
• Please feel free to ask questions anytime
• The slides and samples will be available at
• I’ll show that address again at the end
Dynamic Languages
• Look at the way experienced developers use dynamic
languages (like Lisp or Ruby)
• Developers build meta-languages on top of the dynamic language
• Sometimes they create layers of languages, each closer to the
problem domain and further from the language domain
• The Unix tradition of “little languages”
• Active Data Models
• The first on-line “build your own” ecommerce solution
created in Lisp
• Documented in Hackers and Painters by Paul Graham
• Implementing a PHP-like language in Lisp to create Lisp
HTML templates
• Created by Gene Michael Stover, Documented in Dr. Dobbs
Journal, Oct 2004
• Managed to recreate most of the functionality of PHP (with
some additions) in 200 lines of Lisp code
• Well stated in The Art of Unix Programming by Eric S. Raymond
• In Chapter 1, Section 1.6.14
• The Rule of Generation: Avoid hand-hacking; write programs to
write programs when you can
• The central idea is that programming languages reside
at too low a level
• This notion has been neglected in statically typed
modern languages
• But it’s still a good idea!
• Some terminology coined by Martin Fowler
• Domain Specific Language
• A limited form of computer language designed for a specific
class of problems.
• Some communities like to use DSL only for problem domain
languages, but I'm following the usage that uses DSL for any
limited domain.
• Language Oriented Programming
• The general style of development which operates about the idea
of building software around a set of domain specific languages
• Language Workbench
• A generic term for a new breed of tools that makes it easy to
perform language oriented programming
Building Domain Languages
• This session talks about building your own domain
languages in and around Java
• The motivation for building DSLs
• Internal DSLs
• Built using Java syntax
• External DSLs
• Built using “compiler compilers”
• Language Workbenches
Domain Languages Already Underfoot
• Two examples:
• The Front Controller in a J2EE web framework
• Reading a positional text file
• We’re talking about two types of code
• The web framework code; the Reader and Strategy classes
• Abstractions to enable functionality
• The properties file; the configureXXX() methods
• Configuration information, utilizing the abstraction framework
• Framework abstraction and configuration are 2 different kinds of code
• Why do you think there is so much XML in J2EE frameworks?
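The split between the two types of code can be sketched in Java as follows. This is a minimal illustration, not the code from the samples: the names RecordStrategy, RecordReaderConfig, addFieldExtractor, and configureServiceCall are hypothetical, and the field positions are assumed to be zero-based and inclusive.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Framework code: a reusable abstraction for reading positional records.
// All names here are hypothetical, chosen only for illustration.
final class RecordStrategy {
    private static final class FieldExtractor {
        final int start;
        final int end;
        final String name;

        FieldExtractor(int start, int end, String name) {
            this.start = start;
            this.end = end;
            this.name = name;
        }
    }

    private final String code;
    private final List<FieldExtractor> extractors = new ArrayList<>();

    RecordStrategy(String code) {
        this.code = code;
    }

    // start and end are zero-based, inclusive character positions.
    void addFieldExtractor(int start, int end, String name) {
        extractors.add(new FieldExtractor(start, end, name));
    }

    boolean appliesTo(String line) {
        return line.startsWith(code);
    }

    // Slices the line into named fields; assumes the line is long enough.
    Map<String, String> process(String line) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (FieldExtractor f : extractors) {
            fields.put(f.name, line.substring(f.start, f.end + 1).trim());
        }
        return fields;
    }
}

// Configuration code: uses the abstraction to describe one record type.
final class RecordReaderConfig {
    static RecordStrategy configureServiceCall() {
        RecordStrategy strategy = new RecordStrategy("SVCL");
        strategy.addFieldExtractor(4, 18, "CustomerName");
        strategy.addFieldExtractor(19, 23, "CustomerID");
        strategy.addFieldExtractor(24, 27, "CallTypeCode");
        strategy.addFieldExtractor(28, 35, "DateOfCallString");
        return strategy;
    }
}
```

The configure method contains no logic at all, only declarations, which is why this kind of code so often migrates into properties or XML files.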
Internal Domain Specific Languages
• These are languages built using the syntactic elements
of the underlying language
• In the case of Java, building a DSL using Java classes
and methods
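A minimal sketch of what "a DSL using Java classes and methods" can look like, assuming a fluent, method-chaining style. The names WorkoutEntry, swim, run, on, and forDistance are invented for illustration and are not taken from the sample code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical example: none of these names come from the original samples.
final class WorkoutEntry {
    private final String activity;
    private String day = "";
    private double amount;

    private WorkoutEntry(String activity) {
        this.activity = activity;
    }

    // Entry points read like verbs in the problem domain.
    static WorkoutEntry swim() { return new WorkoutEntry("swim"); }
    static WorkoutEntry run()  { return new WorkoutEntry("run"); }

    // Each method returns this, so calls chain into a sentence.
    WorkoutEntry on(String day) {
        this.day = day;
        return this;
    }

    WorkoutEntry forDistance(double amount) {
        this.amount = amount;
        return this;
    }

    String describe() {
        return activity + " on " + day + " for " + amount;
    }
}

final class FluentLogDemo {
    static List<String> sampleLog() {
        List<String> log = new ArrayList<>();
        log.add(WorkoutEntry.swim().on("WED").forDistance(2150).describe());
        log.add(WorkoutEntry.run().on("TUE").forDistance(6.5).describe());
        return log;
    }

    public static void main(String[] args) {
        sampleLog().forEach(System.out::println);
    }
}
```

The dots, parentheses, and quotes are the price of staying inside Java syntax; the sentence-like call chain is what makes it read as a language.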
Java Building Blocks
• Java (and other similar languages) makes it a little tougher
• That annoying strong typing
• The fairly rigid infrastructure required by the language
• (Not that there’s anything wrong with this)
• You can mitigate the shortcomings of Java by using
some coding standards
Standards: Composed Method
• Composed method (originally in Smalltalk Best Practice
Patterns by Kent Beck)
• Method names are long, readable descriptions of what
the method does
• Every method is as discrete as possible
• No method longer than 15 lines of code
• Some limit this to 5!
• All public methods read as outlines of tasks, implemented as
private methods
Composed Method
Allows you to create methods that mean something in
the problem domain
new Swim().
Each method should perform 1 (and only 1) application
to the problem domain
If the problem domain insists on multi-step operations,
model it as composed methods
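A small sketch of the composed method style, under the assumption of a hypothetical WeeklySummary class (not part of the samples). The public method reads as an outline; each private method does one thing.

```java
import java.util.List;

// Hypothetical workout-log class illustrating Composed Method.
final class WeeklySummary {
    private final List<Double> distances;

    WeeklySummary(List<Double> distances) {
        this.distances = distances;
    }

    // The public method reads as an outline of the task.
    String report() {
        double total = totalDistance();
        double longest = longestSession();
        return formatReport(total, longest);
    }

    // Each private method performs exactly one step.
    private double totalDistance() {
        double sum = 0;
        for (double d : distances) {
            sum += d;
        }
        return sum;
    }

    private double longestSession() {
        double max = 0;
        for (double d : distances) {
            max = Math.max(max, d);
        }
        return max;
    }

    private String formatReport(double total, double longest) {
        return "total=" + total + ", longest=" + longest;
    }
}
```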
Example: Workout Log
• This application demonstrates:
• Building your own domain language
• Infrastructure
• Application
• Interesting Classes
• Exercise (and sub-classes)
• Log
• MonthlyExerLog
• Improvements:
• Better robustness
• Less state dependent on object instances
What Would Make this Cleaner?
• Java doesn’t support operator overloading
• One of the few times it would actually come in handy!
• Java’s strong typing forces you to “cruft” up the code
just a bit
• Enumerations help
• TypeSafe enum is ugly in this context
• Java forces you to create a class with main()
• The compiler is very picky
• All good things…except when you are creating a new
language on top of Java
Another Example: jMock
• jMock is a library for testing Java code using mock objects
• Typical jMock code
import org.jmock.*;

class PublisherTest extends MockObjectTestCase {
    public void testOneSubscriberReceivesAMessage() {
        // set up
        Mock mockSubscriber = mock(Subscriber.class);
        Publisher publisher = new Publisher();
        publisher.add((Subscriber) mockSubscriber.proxy());
        final String message = "message";
        // expectations
        mockSubscriber.expects(once()).method("receive").with( eq(message) );
        // execute
        publisher.publish(message);
    }
}
• The concrete syntax of a language is its syntax in the
representation that we see
• Is this a legitimate way to write the reader configuration?
<Mapping Code = "SVCL" TargetClass = "dsl.ServiceCall">
  <Field name = "CustomerName" start = "4" end = "18"/>
  <Field name = "CustomerID" start = "19" end = "23"/>
  <Field name = "CallTypeCode" start = "24" end = "27"/>
  <Field name = "DateOfCallString" start = "28" end = "35"/>
</Mapping>
<Mapping Code = "USGE" TargetClass = "dsl.Usage">
  <Field name = "CustomerID" start = "4" end = "8"/>
  <Field name = "CustomerName" start = "9" end = "22"/>
  <Field name = "Cycle" start = "30" end = "30"/>
  <Field name = "ReadDate" start = "31" end = "36"/>
</Mapping>
An Easier to Read Version
• Here is the same information, in another syntax
mapping SVCL dsl.ServiceCall
  4-18: CustomerName
  19-23: CustomerID
  24-27 : CallTypeCode
  28-35 : DateOfCallString
mapping USGE dsl.Usage
  4-8 : CustomerID
  9-22: CustomerName
  30-30: Cycle
  31-36: ReadDate
• This is a domain specific language, suitable only for the
purpose of mapping fixed length fields into classes
• A classic example of a Unix “little language”
• The XML file and the DSL file have different concrete syntaxes
• Both share the same basic structure
• Multiple mappings
• Each with
• A code
• A target class name
• A set of fields
• This basic structure is the abstract syntax of the language
Domain Specific Languages
• All three representations of the abstract syntax are in
fact DSLs
• The Java reader configuration code
• The XML equivalent
• The simple text format
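To make the distinction concrete, here is a hedged sketch in Java: two small classes stand in for the abstract syntax (mappings with a code, a target class name, and fields), and a parser reads one concrete syntax, the simple text format. The class names FieldSpec, MappingSpec, and TextFormatParser are hypothetical, and the parser does only minimal error checking.

```java
import java.util.ArrayList;
import java.util.List;

// Abstract syntax: a mapping has a code, a target class name, and fields.
final class FieldSpec {
    final String name;
    final int start;
    final int end;

    FieldSpec(String name, int start, int end) {
        this.name = name;
        this.start = start;
        this.end = end;
    }
}

final class MappingSpec {
    final String code;
    final String targetClass;
    final List<FieldSpec> fields = new ArrayList<>();

    MappingSpec(String code, String targetClass) {
        this.code = code;
        this.targetClass = targetClass;
    }
}

// One concrete syntax for that structure: the simple text format.
final class TextFormatParser {
    static List<MappingSpec> parse(String source) {
        List<MappingSpec> mappings = new ArrayList<>();
        MappingSpec current = null;
        for (String raw : source.split("\n")) {
            String line = raw.trim();
            if (line.isEmpty()) {
                continue;
            }
            if (line.startsWith("mapping ")) {
                // A header line looks like "mapping SVCL dsl.ServiceCall"
                String[] parts = line.split("\\s+");
                current = new MappingSpec(parts[1], parts[2]);
                mappings.add(current);
            } else {
                if (current == null) {
                    throw new IllegalArgumentException("field before mapping: " + line);
                }
                // A field line looks like "4-18: CustomerName"
                String[] rangeAndName = line.split(":");
                String[] bounds = rangeAndName[0].trim().split("-");
                current.fields.add(new FieldSpec(
                        rangeAndName[1].trim(),
                        Integer.parseInt(bounds[0].trim()),
                        Integer.parseInt(bounds[1].trim())));
            }
        }
        return mappings;
    }
}
```

An XML reader or a set of Java configuration calls could populate the same MappingSpec objects; only the concrete syntax would change.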
The Obligatory Ruby Slide
• What about this version?
mapping('SVCL', ServiceCall) do
  extract 4..18, 'customer_name'
  extract 19..23, 'customer_ID'
  extract 24..27, 'call_type_code'
  extract 28..35, 'date_of_call_string'
end
mapping('USGE', Usage) do
  extract 9..22, 'customer_name'
  extract 4..8, 'customer_ID'
  extract 30..30, 'cycle'
  extract 31..36, 'read_date'
end
• This is one of the reasons people rave about Ruby
• Minimally intrusive syntax
• Literals for ranges
• Flexible runtime evaluation
• It’s DSL and Ruby code all rolled into one!
External Domain Specific Languages
• External DSLs
• Written in a different language than the main (host) language of
the application
• Transformed into it using some form of compiler or interpreter
• May include
• XML configuration files
• Plain text configuration files
• Full-blown languages
Building a Language with XML
• Another option is to use XML as the data container and
Java as the “active” parts
• Java classes read information from XML for data
• Java methods form the domain layer
• Example
• Every J2EE framework you’ve ever encountered!
• Examples too numerous to mention
• XML is ugly!
• Uglier than Java?
• Which would your end users like less?
• You can mitigate this with XSLT
• A little further away from the problem domain
• Now you have XML and Java cruft to deal with
The Ultimate External DSL
• Build your own language!
• You need 3 things
• A Grammar
• A Parser
• A Lexer
Using Antlr
• Antlr – ANother Tool for Language Recognition
• Antlr is an open source project that is used in many
other projects
• It provides a framework for constructing recognizers,
compilers, and translators from grammatical descriptions
• Supports tree construction, tree walking, and translation
Using Antlr: “Hello Whomever”
• First, specify what the language will look like with a grammar
• The grammar is placed in a Lexer
• The lexer is a scanner or tokenizer
• Responsible for breaking the input stream into a series of tokens
class L extends Lexer;

NAME
    : ( 'a'..'z'|'A'..'Z' )+ NEWLINE
    ;
NEWLINE
    : '\r' '\n'   // DOS
    | '\n'        // UNIX
    ;
• Next, you create a parser
• The parser applies the language grammar to the input stream
• It reports compilation errors
• Both the lexer and parser can reside in the same file
class P extends Parser;
startRule
    : n:NAME {System.out.println("Hi there, " + n.getText());}
    ;
Generate the Java Code
• Next, you run the Antlr tool to generate 2 classes, an interface,
and a text file: java antlr.Tool t.g
• ANTLR generates:
• L.java
The lexical analyzer (Lexer).
• P.java
The Parser.
• PTokenTypes.java
The token type definitions (integer constants). The Lexer breaks up the
input stream of characters into vocabulary symbols called tokens,
which are identified by token types.
• PTokenTypes.txt
A file containing all the token types that ANTLR can easily read back in
if a parser in another file wants to use the vocabulary, lexer, or parser
defined in t.g. You can look at this as a persistence file.
A Slightly More Interesting Example
• Let’s do math
• Consider this lexer
class IntAndIDLexer extends Lexer;

INT   : ('0'..'9')+ ;
ID    : ('a'..'z')+ ;
COMMA : ',' ;
NEWLINE
    : '\r' '\n'   // DOS
    | '\n'        // UNIX
    ;
Math: Parser
class SeriesParser extends Parser;

/* Match an element (INT or ID) with possibly a bunch of ", element"
   pairs to follow, matching comma-separated input */
series
    /* This is considered an initialization action and is done before
       recognition of this rule begins. These look like local variables
       to the resulting method. */
    { int n = 1; } // how many elements? At least 1
    : element (COMMA element {n++;})*
      { System.out.println("there were " + n + " elements"); }
    ;

/* Match either an INT or ID */
element
    : a:INT { System.out.println(a.getText()); }
    | b:ID  { System.out.println(b.getText()); }
    ;
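For comparison, here is a hand-written Java sketch of roughly what such a lexer and parser do. It is not the code Antlr generates; the class SeriesRecognizer and its methods are hypothetical, and unlike the grammar above it simply skips line endings instead of emitting a token for them.

```java
import java.util.ArrayList;
import java.util.List;

// Hand-written illustration only; Antlr's generated classes look different.
final class SeriesRecognizer {
    enum Kind { INT, ID, COMMA }

    static final class Token {
        final Kind kind;
        final String text;

        Token(Kind kind, String text) {
            this.kind = kind;
            this.text = text;
        }
    }

    // Lexer: break the character stream into tokens.
    static List<Token> lex(String input) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == ',') {
                tokens.add(new Token(Kind.COMMA, ","));
                i++;
            } else if (c >= '0' && c <= '9') {
                int start = i;
                while (i < input.length() && input.charAt(i) >= '0' && input.charAt(i) <= '9') i++;
                tokens.add(new Token(Kind.INT, input.substring(start, i)));
            } else if (c >= 'a' && c <= 'z') {
                int start = i;
                while (i < input.length() && input.charAt(i) >= 'a' && input.charAt(i) <= 'z') i++;
                tokens.add(new Token(Kind.ID, input.substring(start, i)));
            } else if (c == '\r' || c == '\n') {
                i++; // skip line endings in this sketch
            } else {
                throw new IllegalArgumentException("unexpected character: " + c);
            }
        }
        return tokens;
    }

    // Parser: element (COMMA element)*, returning the number of elements.
    static int countElements(List<Token> tokens) {
        int pos = 0;
        expectElement(tokens, pos++);
        int n = 1; // at least 1 element
        while (pos < tokens.size()) {
            if (tokens.get(pos).kind != Kind.COMMA) {
                throw new IllegalArgumentException("expected COMMA at token " + pos);
            }
            pos++;
            expectElement(tokens, pos++);
            n++;
        }
        return n;
    }

    private static void expectElement(List<Token> tokens, int pos) {
        if (pos >= tokens.size() || tokens.get(pos).kind == Kind.COMMA) {
            throw new IllegalArgumentException("expected INT or ID at token " + pos);
        }
    }
}
```

Writing this by hand for every little language gets old quickly, which is the argument for letting a tool generate it from a grammar.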
Abstract Syntax Trees
• To perform real work, you frequently must make multiple
passes over the source code
• Antlr makes this easy by facilitating the creation of
Abstract Syntax Trees
• ASTs are a structured representation of the text input
• You can either
• Walk them by hand
• Specify an Antlr tree grammar that describes the tree structure
(along with appropriate actions)
AST Example
class SeriesTreeParser extends TreeParser;

/* Match a flat tree (a list) of one or more INTs or IDs. This rule
   differs from the parser rule, which is in a different grammar. */
series
    : ( INT | ID )+
    ;

/* Sum up all the integers */
sumpass
    { int sum = 0; }
    : ( a:INT {sum += Integer.parseInt(a.getText());}
      | ID
      )+
      {System.out.println("sum is " + sum);}
    ;
AST Example
• When walking the tree, you can choose which target
you want to execute
• Handy for creating multi-pass languages
// Get the tree out of the parser
AST resultTree = parser.getAST();
// Make an instance of the tree parser
SeriesTreeParser treeParser = new SeriesTreeParser();
// walk AST once
treeParser.series(resultTree);
// walk AST again!!
treeParser.sumpass(resultTree);
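The multi-pass idea can also be sketched without Antlr. In the hypothetical Java below, the "tree" is just a flat list of nodes built once, and two independent passes walk it. SeriesAst, build, countPass, and sumPass are illustrative names, not part of the Antlr API.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: a flat tree built once and walked by separate passes.
final class SeriesAst {
    static final class Node {
        final boolean isInt;
        final String text;

        Node(boolean isInt, String text) {
            this.isInt = isInt;
            this.text = text;
        }
    }

    // Build the tree once from comma-separated input.
    static List<Node> build(String input) {
        List<Node> nodes = new ArrayList<>();
        for (String part : input.trim().split(",")) {
            String text = part.trim();
            nodes.add(new Node(text.matches("[0-9]+"), text));
        }
        return nodes;
    }

    // First pass: count every element.
    static int countPass(List<Node> tree) {
        return tree.size();
    }

    // Second pass: sum the integers and ignore the identifiers.
    static int sumPass(List<Node> tree) {
        int sum = 0;
        for (Node n : tree) {
            if (n.isInt) {
                sum += Integer.parseInt(n.text);
            }
        }
        return sum;
    }
}
```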
Domain Language with Antlr
• Here is a subset of the Exercise Log domain language,
implemented with Antlr
• Changes
• Weekdays instead of dates (because parsing dates is tougher)
• Simple summary target
• More “hard coded” values in the parse tree
• Could be made just as dynamic as the original, but at the expense
of complexity
The ExerLang Domain Language
• Sample “source” file
swim on WED for 2150
run on TUE for 6.5
bike on SAT for 50
• The interesting files:
• Exer.g
• Main
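To give a feel for what an interpreter for this kind of source file has to do, here is a rough Java sketch that uses a regular expression instead of an Antlr grammar. It is not the Exer.g implementation; ExerLangSketch and summarize are hypothetical names, and the accepted activities are assumed to be only the three shown in the sample.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Regex-based stand-in for the grammar; not the Antlr implementation.
final class ExerLangSketch {
    // Line shape: <activity> on <DAY> for <amount>
    private static final Pattern LINE = Pattern.compile(
            "(swim|run|bike)\\s+on\\s+([A-Z]{3})\\s+for\\s+(\\d+(?:\\.\\d+)?)");

    // Returns the total amount logged per activity, in source order.
    static Map<String, Double> summarize(String source) {
        Map<String, Double> totals = new LinkedHashMap<>();
        for (String raw : source.split("\n")) {
            String line = raw.trim();
            if (line.isEmpty()) {
                continue;
            }
            Matcher m = LINE.matcher(line);
            if (!m.matches()) {
                throw new IllegalArgumentException("cannot parse: " + line);
            }
            totals.merge(m.group(1), Double.parseDouble(m.group(3)), Double::sum);
        }
        return totals;
    }
}
```

A regex works for a one-line-per-statement language like this one; a real grammar starts to pay off once the language grows nesting or multi-line constructs.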
Other Antlr Examples
• Antlr comes with grammars for a huge number of languages
• HTML, Java, XML, etc.
• It has hooks for many other languages
External DSLs
• Advantages
• You are free to use any form you like
• Limited only by your ability to parse and lex the language
• Disadvantages
• You have to build the translator
• They lack symbolic integration
• You don’t get the same support for your language as you get for
your base language
• What are you going to edit it with? Emacs?
• A common objection is that DSLs lead to language cacophony
• Not if used correctly!
Internal DSLs
• Advantages
• You have the full power of the underlying language
• You have full access to sophisticated tools
• Disadvantages
• Hard to write in modern “curly brace” languages
• Easier in Ruby and Lisp
• Limited by the syntax and structure of the language
• You must understand the base language or you are in syntax
Language Workbenches
• A language workbench is a tool that supports Language
oriented programming
• A brand new software category
• Today’s language workbenches
• Intentional Software (developed by Charles Simonyi)
• Meta Programming System (developed by JetBrains)
• Software Factories (developed by Microsoft)
Elements of a Language Workbench
• Traditional compilation
Post-IntelliJ IDEs
• IntelliJ isn’t just a cool IDE
• They were the first mainstream tool that allowed you to
edit against the AST instead of text
• That’s how refactoring and other intelligent support works
• Now all the Java IDEs do this
Language Workbenches
• The editable representation is merely a projection of the
abstract representation
• There is no need for the editable representation to be complete
• Some aspects can be missing if they aren’t needed for the task at hand
• You can have multiple projections - each showing a different
aspect of the abstract representation
• The storage representation still has to be worked out
• The abstract representation has to be comfortable with
errors and ambiguities
Defining a DSL with a Workbench
• Three steps to designing a DSL
• Define the abstract syntax (the schema of the abstract representation)
• Define an editor to let people manipulate the abstract
representation through a projection
• Define a generator for the executable representation
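A loose analogy in plain Java, not how any actual workbench is implemented: the classes below play the role of the schema (the abstract representation), and two methods produce different textual projections of the same model. The names MappingModel, FieldModel, and Projections are hypothetical, and the XML output does no escaping.

```java
import java.util.ArrayList;
import java.util.List;

// Schema: the abstract representation, independent of any concrete syntax.
final class FieldModel {
    final String name;
    final int start;
    final int end;

    FieldModel(String name, int start, int end) {
        this.name = name;
        this.start = start;
        this.end = end;
    }
}

final class MappingModel {
    final String code;
    final String targetClass;
    final List<FieldModel> fields = new ArrayList<>();

    MappingModel(String code, String targetClass) {
        this.code = code;
        this.targetClass = targetClass;
    }

    MappingModel field(int start, int end, String name) {
        fields.add(new FieldModel(name, start, end));
        return this;
    }
}

// Generators: each produces one projection of the same abstract model.
final class Projections {
    static String asText(MappingModel m) {
        StringBuilder out = new StringBuilder();
        out.append("mapping ").append(m.code).append(' ').append(m.targetClass).append('\n');
        for (FieldModel f : m.fields) {
            out.append("  ").append(f.start).append('-').append(f.end)
               .append(": ").append(f.name).append('\n');
        }
        return out.toString();
    }

    // No XML escaping is performed; values are assumed to be plain identifiers.
    static String asXml(MappingModel m) {
        StringBuilder out = new StringBuilder();
        out.append("<Mapping Code=\"").append(m.code)
           .append("\" TargetClass=\"").append(m.targetClass).append("\">\n");
        for (FieldModel f : m.fields) {
            out.append("  <Field name=\"").append(f.name)
               .append("\" start=\"").append(f.start)
               .append("\" end=\"").append(f.end).append("\"/>\n");
        }
        out.append("</Mapping>\n");
        return out.toString();
    }
}
```

In a workbench the projection is editable and the model is the stored source; here the projections are one-way output only.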
JetBrains MPS
• Creating a Hello, World Language
• Define a schema
• Define an editor
• Define a generator
• Notice that the semantics are completely removed from
the implementation
• Schema is very loosely coupled to the generator
Summarizing the Tradeoffs in LOP
• The fundamental issue is the benefit of using DSLs
versus the cost of building the necessary infrastructure
• Using internal DSLs reduces the tool cost
• Adds serious constraints on the DSL
• An external DSL gives you the most potential but at a
higher cost
• This is why LOP hasn’t caught on yet
• The advent of Language Workbenches solves the external DSL
External DSLs and Non-Programmers
• One of the supposed strengths of COBOL was that you
could let non-programmers use it
• The COBOL Inference - most technologies that are supposed to
eliminate professional programmers do nothing of the sort
• The Holy Grail of Language oriented programming is to
create DSLs that non-programmers can read and write
• DSLs make everyone more productive (programmer
and non-programmer)
• If we create code that a business analyst can verify,
that’s a big win
• We don’t think that DSLs are the Holy Grail…but it’s
closer than COBOL or Java
• Language oriented programming has the potential to be
the next big revolution in programming paradigms
• Once the tools and techniques permeate the market and
developers’ minds
• The productivity of the tools starts making a big impact
• A level of encapsulation not possible in OOP
• Languages close to the problem domain, not just
Samples & slides at
Neal Ford
[email protected]