Building Domain Languages atop Java

Language Oriented Programming and Language Workbenches: Building Domain Languages atop Java
Neal Ford
Application Architect
ThoughtWorks
www.nealford.com
www.thoughtworks.com
[email protected]
Blog: memeagora.blogspot.com
Questions, Slides, and Samples
• Please feel free to ask questions anytime
• The slides and samples will be available at
www.nealford.com
• I’ll show that address again at the end
2
Dynamic Languages
• Look at the way experienced developers use dynamic
languages (like Lisp or Ruby)
• Developers build meta-languages on top of the dynamic
language
• Sometimes they create layers of languages, each closer to the
problem domain and further from the language domain
• The Unix tradition of “little languages”
• Active Data Models
3
Examples
• The first on-line “build your own” ecommerce solution
created in Lisp
• Documented in Hackers and Painters by Paul Graham
• Implementing a PHP-like language in Lisp to create Lisp
HTML templates
• Created by Gene Michael Stover; documented in Dr. Dobb’s
Journal, Oct 2004
• Managed to recreate most of the functionality of PHP (with
some additions) in 200 lines of Lisp code
4
Philosophy
• Well stated in The Art of Unix Programming by Eric
Raymond
• In Chapter 1, Section 1.6.14
• The Rule of Generation: Avoid hand-hacking; write programs to
write programs when you can
• The central idea is that programming languages reside
at too low a level
• This notion has been neglected in statically typed
modern languages
• But it’s still a good idea!
5
Nomenclature
• Some terminology coined by Martin Fowler
• Domain Specific Language
• A limited form of computer language designed for a specific
class of problems.
• Some communities like to use DSL only for problem domain
languages, but I'm following the usage that uses DSL for any
limited domain.
• Language Oriented Programming
• The general style of development which operates about the idea
of building software around a set of domain specific languages
• Language Workbench
• A generic term for a new breed of tools that makes it easy to
perform language oriented programming
6
Building Domain Languages
• This session talks about building your own domain
languages in and around Java
• The motivation for building DSLs
• Internal DSLs
• Built using Java syntax
• External DSLs
• Built using “compiler compilers”
• Language Workbenches
7
Domain Languages Already Underfoot
• Two examples:
• The Front Controller in a J2EE web framework
• Reading a positional text file
• We’re talking about two types of code
• The web framework code; the Reader and Strategy classes
• Abstractions to enable functionality
• The properties file; the configureXXX() methods
• Configuration information, utilizing the abstraction framework
• Framework abstraction and configuration are 2 different
things
• Why do you think there is so much XML in J2EE frameworks?
8
Internal Domain Specific Languages
• These are languages built using the syntactic elements
of the underlying language
• In the case of Java, building a DSL using Java classes
and methods
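The Reader/Strategy split mentioned earlier can be illustrated with a minimal sketch. It assumes hypothetical `RecordReader`, `ReaderStrategy`, and `FieldExtractor` classes (not from any real library), 0-based inclusive column positions, and Java 16+ for the record. The classes are the framework. The `configure...()` methods are the internal DSL, and they read almost like the configuration file they replace.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Framework (hypothetical): pulls one named field out of a fixed-length line.
// Positions are assumed to be 0-based and inclusive.
record FieldExtractor(int start, int end, String fieldName) {
    String extract(String line) {
        return line.substring(start, end + 1).trim();
    }
}

// Framework (hypothetical): all the extractors for one record type.
class ReaderStrategy {
    final String code;
    final String targetClass;
    final List<FieldExtractor> extractors = new ArrayList<>();

    ReaderStrategy(String code, String targetClass) {
        this.code = code;
        this.targetClass = targetClass;
    }

    void addFieldExtractor(int start, int end, String fieldName) {
        extractors.add(new FieldExtractor(start, end, fieldName));
    }
}

// Framework (hypothetical): the first four characters select the strategy.
class RecordReader {
    private final Map<String, ReaderStrategy> strategies = new LinkedHashMap<>();

    void addStrategy(ReaderStrategy strategy) {
        strategies.put(strategy.code, strategy);
    }

    Map<String, String> read(String line) {
        ReaderStrategy strategy = strategies.get(line.substring(0, 4));
        if (strategy == null) throw new IllegalArgumentException("Unknown record: " + line);
        Map<String, String> fields = new LinkedHashMap<>();
        for (FieldExtractor f : strategy.extractors) fields.put(f.fieldName(), f.extract(line));
        return fields;
    }
}

// Configuration: the internal DSL, written with ordinary Java calls.
class ReaderConfiguration {
    static void configure(RecordReader target) {
        target.addStrategy(configureServiceCall());
        target.addStrategy(configureUsage());
    }

    private static ReaderStrategy configureServiceCall() {
        ReaderStrategy result = new ReaderStrategy("SVCL", "dsl.ServiceCall");
        result.addFieldExtractor(4, 18, "CustomerName");
        result.addFieldExtractor(19, 23, "CustomerID");
        result.addFieldExtractor(24, 27, "CallTypeCode");
        result.addFieldExtractor(28, 35, "DateOfCallString");
        return result;
    }

    private static ReaderStrategy configureUsage() {
        ReaderStrategy result = new ReaderStrategy("USGE", "dsl.Usage");
        result.addFieldExtractor(4, 8, "CustomerID");
        result.addFieldExtractor(9, 22, "CustomerName");
        result.addFieldExtractor(30, 30, "Cycle");
        result.addFieldExtractor(31, 36, "ReadDate");
        return result;
    }
}
```

Changing the file layout means editing only `ReaderConfiguration`. The framework classes stay untouched, which is exactly the abstraction/configuration split.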
9
Java Building Blocks
• Java (and similar languages) make this a little tougher
• That annoying strong typing
• The fairly rigid infrastructure required by the language
• (Not that there’s anything wrong with this)
• You can mitigate the shortcomings of Java by using
some coding standards
10
Standards: Composed Method
• Composed method (originally in Smalltalk Best Practice
Patterns by Kent Beck)
• Method names are long, readable descriptions of what
the method does
• Every method is as discrete as possible
• No method longer than 15 lines of code
• Some limit this to 5!
• All public methods read as outlines of tasks, implemented as
private methods
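Here is a small sketch of Composed Method in Java, using a hypothetical `WeeklyDistance` class. The public method reads as an outline of the task. Each step is a short private method whose name says what it does.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical Composed Method example.
class WeeklyDistance {
    private final List<Double> entries;

    WeeklyDistance(List<Double> entries) {
        this.entries = new ArrayList<>(entries);
    }

    // Public method: an outline of the task, nothing more.
    double totalOfValidEntries() {
        List<Double> valid = discardNonPositiveEntries();
        return sumOf(valid);
    }

    // Step 1: keep only entries greater than zero.
    private List<Double> discardNonPositiveEntries() {
        List<Double> valid = new ArrayList<>();
        for (double d : entries) {
            if (d > 0) valid.add(d);
        }
        return valid;
    }

    // Step 2: add up whatever is left.
    private static double sumOf(List<Double> values) {
        double total = 0;
        for (double v : values) total += v;
        return total;
    }
}
```

For example, `new WeeklyDistance(List.of(2150.0, -1.0, 6.5, 0.0, 50.0)).totalOfValidEntries()` returns 2206.5.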
11
Composed Method
• Allows you to create methods that mean something in the problem domain
    febLog.add(
        new Swim().
            onDate("02/01/2005").
            forDistance(1250));
• Each method should perform one (and only one) operation in the problem domain
• If the problem domain insists on multi-step operations, model them as composed methods
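The `febLog.add(new Swim().onDate(...).forDistance(...))` snippet could be backed by a fluent interface like the one below. This is a sketch under the assumption that each chained method records a value and returns `this`. The class names follow the snippet and the workout log example, but the bodies of `Exercise`, `Swim`, `Run`, and `Log` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical base class: each "sentence part" records a value and
// returns this, so the calls chain into a readable phrase.
abstract class Exercise {
    private String date;
    private double distance;

    Exercise onDate(String date) {
        this.date = date;
        return this;
    }

    Exercise forDistance(double distance) {
        this.distance = distance;
        return this;
    }

    String date() { return date; }
    double distance() { return distance; }
    abstract String name();
}

class Swim extends Exercise {
    String name() { return "swim"; }
}

class Run extends Exercise {
    String name() { return "run"; }
}

// Hypothetical log that collects entries and answers simple questions.
class Log {
    private final List<Exercise> entries = new ArrayList<>();

    void add(Exercise exercise) { entries.add(exercise); }

    double totalDistance(String exerciseName) {
        double total = 0;
        for (Exercise e : entries) {
            if (e.name().equals(exerciseName)) total += e.distance();
        }
        return total;
    }
}
```

Because the chained methods live on the base class, the chain is typed as `Exercise` after the first call. That is fine here, because `Log.add` accepts any `Exercise`.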
12
Example: Workout Log
• This application demonstrates:
• Building your own domain language
• Infrastructure
• Application
• Interesting Classes
• Exercise (and sub-classes)
• Log
• MonthlyExerLog
• Improvements:
• Better robustness
• Less state dependent on object instances
13
What Would Make this Cleaner?
• Java doesn’t support operator overloading
• One of the few times it would actually come in handy!
• Java’s strong typing forces you to “cruft” up the code
just a bit
• Enumerations help
• The typesafe enum pattern is ugly in this context
• Java forces you to create a class with main()
• The compiler is very picky
• All good things…except when you are creating a new
language on top of Java
14
Another Example: jMock
• jMock is a library for testing Java code using mock
objects
• Typical jMock code
import org.jmock.*;
class PublisherTest extends MockObjectTestCase {
    public void testOneSubscriberReceivesAMessage() {
        // set up
        Mock mockSubscriber = mock(Subscriber.class);
        Publisher publisher = new Publisher();
        publisher.add((Subscriber) mockSubscriber.proxy());
        final String message = "message";
        // expectations
        mockSubscriber.expects(once()).method("receive").with(eq(message));
        // execute
        publisher.publish(message);
    }
}
15
Nomenclature
• The concrete syntax of a language is its syntax in the
representation that we see
• Is this a legitimate way to write the reader configuration?
<ReaderConfiguration>
  <Mapping Code = "SVCL" TargetClass = "dsl.ServiceCall">
    <Field name = "CustomerName" start = "4" end = "18"/>
    <Field name = "CustomerID" start = "19" end = "23"/>
    <Field name = "CallTypeCode" start = "24" end = "27"/>
    <Field name = "DateOfCallString" start = "28" end = "35"/>
  </Mapping>
  <Mapping Code = "USGE" TargetClass = "dsl.Usage">
    <Field name = "CustomerID" start = "4" end = "8"/>
    <Field name = "CustomerName" start = "9" end = "22"/>
    <Field name = "Cycle" start = "30" end = "30"/>
    <Field name = "ReadDate" start = "31" end = "36"/>
  </Mapping>
</ReaderConfiguration>
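With XML as the concrete syntax, Java does the reading. Here is a minimal sketch using the JDK's standard DOM parser (`javax.xml.parsers`). `XmlReaderConfiguration` and its one-line-per-field output format are hypothetical, chosen only to show the XML being turned back into structure.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Hypothetical loader: flattens the XML configuration to one line per field.
class XmlReaderConfiguration {
    static List<String> describe(String xml) {
        Document doc = parse(xml);
        List<String> result = new ArrayList<>();
        NodeList mappings = doc.getElementsByTagName("Mapping");
        for (int i = 0; i < mappings.getLength(); i++) {
            Element mapping = (Element) mappings.item(i);
            NodeList fields = mapping.getElementsByTagName("Field");
            for (int j = 0; j < fields.getLength(); j++) {
                Element field = (Element) fields.item(j);
                // e.g. "SVCL dsl.ServiceCall CustomerName 4-18"
                result.add(mapping.getAttribute("Code") + " "
                        + mapping.getAttribute("TargetClass") + " "
                        + field.getAttribute("name") + " "
                        + field.getAttribute("start") + "-" + field.getAttribute("end"));
            }
        }
        return result;
    }

    // Wraps the parser's checked exceptions so callers see one error type.
    private static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Bad reader configuration", e);
        }
    }
}
```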
16
An Easier to Read Version
• Here is the same information, in another syntax
mapping SVCL dsl.ServiceCall
    4-18: CustomerName
    19-23: CustomerID
    24-27: CallTypeCode
    28-35: DateOfCallString
mapping USGE dsl.Usage
    4-8: CustomerID
    9-22: CustomerName
    30-30: Cycle
    31-36: ReadDate
• This is a domain specific language, suitable only for the
purpose of mapping fixed length fields into classes
• A classic example of a Unix “little language”
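A little language this regular can be parsed with two regular expressions. Here is a hypothetical hand-written `MappingParser` (Java 16+ records), which accepts either `4-18:` or `24-27 :` spacing. It is a sketch, not a production parser: error reporting is minimal, and a field line before any `mapping` line is rejected.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical hand-written parser for the mapping little language.
class MappingParser {
    record Field(int start, int end, String name) {}
    record Mapping(String code, String targetClass, List<Field> fields) {}

    // "mapping <code> <target class>"
    private static final Pattern MAPPING = Pattern.compile("mapping\\s+(\\w+)\\s+([\\w.]+)");
    // "<start>-<end> : <field name>", with optional spaces around the colon
    private static final Pattern FIELD = Pattern.compile("(\\d+)-(\\d+)\\s*:\\s*(\\w+)");

    static List<Mapping> parse(String source) {
        List<Mapping> mappings = new ArrayList<>();
        Mapping current = null;
        for (String raw : source.split("\\R")) {
            String line = raw.trim();
            if (line.isEmpty()) continue;
            Matcher m = MAPPING.matcher(line);
            if (m.matches()) {
                current = new Mapping(m.group(1), m.group(2), new ArrayList<>());
                mappings.add(current);
                continue;
            }
            Matcher f = FIELD.matcher(line);
            if (f.matches() && current != null) {
                current.fields().add(new Field(Integer.parseInt(f.group(1)),
                        Integer.parseInt(f.group(2)), f.group(3)));
                continue;
            }
            throw new IllegalArgumentException("Cannot parse line: " + line);
        }
        return mappings;
    }
}
```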
17
Nomenclature
• The XML file and the DSL file have different concrete
syntaxes
• Both share the same basic structure
• Multiple mappings
• Each with
• A code
• A target class name
• A set of fields
• This basic structure is the abstract syntax of the
language
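The abstract syntax can be written down directly as types. This sketch uses hypothetical `FieldSpec`, `MappingSpec`, and `ConcreteSyntaxes` names (Java 16+ records). It holds only the structure listed above: mappings, each with a code, a target class name, and fields. It then prints that structure in two of the concrete syntaxes shown earlier.

```java
import java.util.List;

// Abstract syntax (hypothetical): structure only, no notation.
record FieldSpec(String name, int start, int end) {}
record MappingSpec(String code, String targetClass, List<FieldSpec> fields) {}

class ConcreteSyntaxes {
    // Concrete syntax 1: the XML form.
    static String toXml(List<MappingSpec> mappings) {
        StringBuilder sb = new StringBuilder("<ReaderConfiguration>\n");
        for (MappingSpec m : mappings) {
            sb.append("  <Mapping Code=\"").append(m.code())
              .append("\" TargetClass=\"").append(m.targetClass()).append("\">\n");
            for (FieldSpec f : m.fields()) {
                sb.append("    <Field name=\"").append(f.name())
                  .append("\" start=\"").append(f.start())
                  .append("\" end=\"").append(f.end()).append("\"/>\n");
            }
            sb.append("  </Mapping>\n");
        }
        return sb.append("</ReaderConfiguration>\n").toString();
    }

    // Concrete syntax 2: the little text language.
    static String toText(List<MappingSpec> mappings) {
        StringBuilder sb = new StringBuilder();
        for (MappingSpec m : mappings) {
            sb.append("mapping ").append(m.code()).append(' ').append(m.targetClass()).append('\n');
            for (FieldSpec f : m.fields()) {
                sb.append(f.start()).append('-').append(f.end()).append(": ").append(f.name()).append('\n');
            }
        }
        return sb.toString();
    }
}
```

Both printers walk the same objects. Only the notation differs, which is the distinction between abstract and concrete syntax.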
18
Domain Specific Languages
• All three representations of the abstract syntax are in
fact DSLs
• The Java reader configuration code
• The XML equivalent
• The simple text format
19
The Obligatory Ruby Slide
• What about this version?
mapping('SVCL', ServiceCall) do
  extract 4..18, 'customer_name'
  extract 19..23, 'customer_ID'
  extract 24..27, 'call_type_code'
  extract 28..35, 'date_of_call_string'
end
mapping('USGE', Usage) do
  extract 9..22, 'customer_name'
  extract 4..8, 'customer_ID'
  extract 30..30, 'cycle'
  extract 31..36, 'read_date'
end
• This is one of the reasons people rave about Ruby
• Minimally intrusive syntax
• Literals for ranges
• Flexible runtime evaluation
• It’s DSL and Ruby code all rolled into one!
20
External Domain Specific Languages
• External DSLs
• Written in a different language than the main (host) language of
the application
• Transformed into it using some form of compiler or interpreter
• May include
• XML configuration files
• Plain text configuration files
• Full-blown languages
21
Building a Language with XML
• Another option is to use XML as the data container and
Java as the “active” parts
• Java classes read information from XML for data
• Java methods form the domain layer
• Example
• Every J2EE framework you’ve ever encountered!
• Examples too numerous to mention
• XML is ugly!
• Uglier than Java?
• Which would your end users like less?
• You can mitigate this with XSLT
• A little further away from the problem domain
• Now you have XML and Java cruft to deal with
22
The Ultimate External DSL
• Build your own language!
• You need 3 things
• A Grammar
• A Parser
• A Lexer
23
Using Antlr
• Antlr – ANother Tool for Language Recognition
• http://www.antlr.org/
• Antlr is an open source project that is used in many
other projects
• It provides a framework for constructing recognizers,
compilers, and translators from grammatical
descriptions
• Supports tree construction, tree walking, and translation
24
Using Antlr: “Hello Whomever”
• First, specify what the language will look like with a
grammar
• The lexical rules of the grammar are placed in a Lexer
• The lexer is a scanner or tokenizer
• Responsible for breaking the input stream into a series of
tokens
class L extends Lexer;
// one or more letters followed by a newline
NAME
    :   ( 'a'..'z'|'A'..'Z' )+ NEWLINE
    ;
NEWLINE
    :   '\r' '\n'   // DOS
    |   '\n'        // UNIX
    ;
25
Parser
• Next, you create a parser
• The parser applies the language grammar to the input stream
• It reports compilation errors
• Both the lexer and parser can reside in the same file
class P extends Parser;
startRule
    :   n:NAME
        {System.out.println("Hi there, " + n.getText());}
    ;
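To see the division of labor between the generated `L` and `P` classes, here is a hypothetical hand-written equivalent in plain Java. It is not what Antlr generates. One difference: the action here returns its message instead of printing it, and it excludes the trailing newline from the name.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hand-written equivalent of the "Hello Whomever" grammar.
class HelloWhomever {
    // Lexer: NAME is one or more ASCII letters followed by a newline,
    // either "\r\n" (DOS) or "\n" (UNIX).
    static List<String> lex(String input) {
        List<String> names = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            int start = i;
            while (i < input.length() && isLetter(input.charAt(i))) i++;
            if (i == start) throw new IllegalArgumentException("Expected a letter at " + i);
            String name = input.substring(start, i);
            if (input.startsWith("\r\n", i)) i += 2;
            else if (input.startsWith("\n", i)) i += 1;
            else throw new IllegalArgumentException("Expected a newline at " + i);
            names.add(name);
        }
        return names;
    }

    private static boolean isLetter(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    // Parser: startRule matches one NAME token, then runs its "action".
    static String startRule(String input) {
        List<String> tokens = lex(input);
        if (tokens.isEmpty()) throw new IllegalArgumentException("Expected NAME");
        return "Hi there, " + tokens.get(0);
    }
}
```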
26
Generate the Java Code
• Next, you run the Antlr tool to generate two classes, an interface,
and a text file: java antlr.Tool t.g
• ANTLR generates:
• L.java
The lexical analyzer (Lexer).
• P.java
The Parser.
• PTokenTypes.java
The token type definitions (integer constants). The Lexer breaks up the
input stream of characters into vocabulary symbols called tokens,
which are identified by token types.
• PTokenTypes.txt
A file containing all the token types that ANTLR can easily read back in
if a parser in another file wants to use the vocabulary, lexer, or parser
defined in t.g. You can look at this as a persistence file.
27
A Slightly More Interesting Example
• Let’s do math
• Consider this lexer
class IntAndIDLexer extends Lexer;
INT     :   ('0'..'9')+ ;
ID      :   ('a'..'z')+ ;
COMMA   :   ',' ;
NEWLINE
    :   '\r' '\n'   // DOS
    |   '\n'        // UNIX
    ;
28
Math: Parser
class SeriesParser extends Parser;
/* Match an element (INT or ID) with possibly a bunch of
 * ", element" pairs to follow, matching input that looks
 * like 32,a,size,28923,i
 *
 * The { int n = 1; } block is an initialization action,
 * done before recognition of this rule begins. Its variables
 * look like local variables to the resulting method
 * SeriesParser.series().
 */
series
    { int n = 1; } // how many elements? At least 1
    :   element (COMMA element {n++;})*
        {System.out.println("there were " + n + " elements");}
    ;
/* Match either an INT or ID */
element
    :   a:INT {System.out.println(a.getText());}
    |   b:ID  {System.out.println(b.getText());}
    ;
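For comparison, here is a hypothetical hand-written recursive-descent version of the same lexer and parser, with one method per rule. It is not Antlr output. Instead of printing, the `element` "action" records each matched token in a list, and `series` returns `n`. Unlike the grammar, it also insists on consuming the whole input.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hand-written equivalent of IntAndIDLexer + SeriesParser.
class Series {
    // Lexer: INT = digits, ID = lowercase letters, COMMA = ','; newlines skipped.
    static List<String> lex(String input) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            int start = i;
            if (c == '\r' || c == '\n') {
                i++;
                continue;
            } else if (c == ',') {
                i++;
            } else if (isDigit(c)) {
                while (i < input.length() && isDigit(input.charAt(i))) i++;
            } else if (isLower(c)) {
                while (i < input.length() && isLower(input.charAt(i))) i++;
            } else {
                throw new IllegalArgumentException("Unexpected '" + c + "' at " + i);
            }
            tokens.add(input.substring(start, i));
        }
        return tokens;
    }

    private static boolean isDigit(char c) { return c >= '0' && c <= '9'; }
    private static boolean isLower(char c) { return c >= 'a' && c <= 'z'; }

    // series : element (COMMA element {n++;})* ; returns n, the element count.
    static int series(List<String> tokens, List<String> printed) {
        int n = 1;
        int pos = element(tokens, 0, printed);
        while (pos < tokens.size() && tokens.get(pos).equals(",")) {
            pos = element(tokens, pos + 1, printed);
            n++;
        }
        if (pos != tokens.size()) {
            throw new IllegalArgumentException("Unexpected token: " + tokens.get(pos));
        }
        return n;
    }

    // element : INT | ID ; records the matched text instead of printing it.
    private static int element(List<String> tokens, int pos, List<String> printed) {
        if (pos >= tokens.size() || tokens.get(pos).equals(",")) {
            throw new IllegalArgumentException("Expected INT or ID at token " + pos);
        }
        printed.add(tokens.get(pos));
        return pos + 1;
    }
}
```

On the input `32,a,size,28923,i`, `series` returns 5.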
29
Abstract Syntax Trees
• To perform real work, you frequently must make multiple
passes over the source code
• Antlr makes this easy by facilitating the creation of
Abstract Syntax Trees
• ASTs are a structured representation of the text input
• You can either
• Walk them by hand
• Specify an Antlr tree grammar that describes the tree structure
(along with appropriate actions)
30
AST Example
class SeriesTreeParser extends TreeParser;
/* Match a flat tree (a list) of one or more INTs or IDs.
 * This rule differs from SeriesParser.series(), which is
 * in a different grammar.
 */
series
    :   (   a:INT {System.out.println(a.getText());}
        |   b:ID  {System.out.println(b.getText());}
        )+
    ;
/* Sum up all the integers */
sumpass
    { int sum = 0; }
    :   (   a:INT {sum += Integer.parseInt(a.getText());}
        |   ID
        )+
        {System.out.println("sum is " + sum);}
    ;
31
AST Example
• When walking the tree, you can choose which target
you want to execute
• Handy for creating multi-pass languages
parser.series();
// Get the tree out of the parser
AST resultTree = parser.getAST();
// Make an instance of the tree parser
SeriesTreeParser treeParser = new SeriesTreeParser();
treeParser.series(resultTree);  // walk AST once
treeParser.sumpass(resultTree); // walk AST again!!
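The "build once, walk many times" idea does not depend on Antlr. This hypothetical `SeriesAst` (Java 16+ records) builds a flat tree of tagged leaves once. Two separate passes then mirror `series` and `sumpass`. To keep the sketch short, tokenizing is done inline with `split(",")`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal AST for the series language: a flat list of
// leaves tagged INT or ID, built once and then walked repeatedly.
class SeriesAst {
    enum Kind { INT, ID }
    record Node(Kind kind, String text) {}

    private final List<Node> children = new ArrayList<>();

    static SeriesAst parse(String input) {
        SeriesAst tree = new SeriesAst();
        for (String part : input.split(",")) {
            String text = part.trim();
            if (text.matches("[0-9]+")) tree.children.add(new Node(Kind.INT, text));
            else if (text.matches("[a-z]+")) tree.children.add(new Node(Kind.ID, text));
            else throw new IllegalArgumentException("Bad element: '" + text + "'");
        }
        return tree;
    }

    // Pass 1 (like series): visit every leaf.
    List<String> seriesPass() {
        List<String> seen = new ArrayList<>();
        for (Node node : children) seen.add(node.text());
        return seen;
    }

    // Pass 2 (like sumpass): add up only the INT leaves.
    int sumPass() {
        int sum = 0;
        for (Node node : children) {
            if (node.kind() == Kind.INT) sum += Integer.parseInt(node.text());
        }
        return sum;
    }
}
```

For `32,a,size,28923,i`, the second pass returns 28955 (32 + 28923) without re-reading the source text.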
32
Domain Language with Antlr
• Here is a subset of the Exercise Log domain language,
implemented with Antlr
• Changes
• Weekdays instead of dates (because parsing dates is tougher)
• Simple summary target
• More “hard coded” values in the parse tree
• Could be made just as dynamic as the original, but at the expense
of complexity
33
The ExerLang Domain Language
• Sample “source” file
    swim on WED for 2150
    run on TUE for 6.5
    bike on SAT for 50
    summary
• The interesting files:
• Exer.g
• Main
34
Other Antlr Examples
• Antlr comes with grammars for a huge number of
languages
• HTML, Java, XML, etc.
• It has hooks for many other languages
35
External DSLs
• Advantages
• You are free to use any form you like
• Limited only by your ability to parse and lex the language
• Disadvantages
• You have to build the translator
• They lack symbolic integration
• You don’t get the same support for your language as you get for
your base language
• What are you going to edit it with? Emacs?
• A common objection is that DSLs lead to language cacophony
• Not if used correctly!
36
Internal DSLs
• Advantages
• You have the full power of the underlying language
• You have full access to sophisticated tools
• Disadvantages
• Hard to write in modern “curly brace” languages
• Easier in Ruby and Lisp
• Limited by the syntax and structure of the language
• You must understand the base language or you are in syntax
trouble
37
Language Workbenches
• A language workbench is a tool that supports Language
oriented programming
• A brand new software category
• Today’s language workbenches
• Intentional Software (developed by Charles Simonyi)
• Meta Programming System (developed by JetBrains)
• Software Factories (developed by Microsoft)
38
Elements of a Language Workbench
• Traditional compilation
39
Post-IntelliJ IDEs
• IntelliJ isn’t just a cool IDE
• It was the first mainstream tool that allowed you to
edit against the AST instead of text
• That’s how refactoring and other intelligent support
works
• Now all the Java IDEs do this
40
Language Workbenches
41
Language Workbenches
• The editable representation is merely a projection of the
abstract representation
• There is no need for the editable representation to be complete
• Some aspects can be missing if they aren’t needed for the task at
hand
• You can have multiple projections - each showing a different
aspect of the abstract representation
• The storage representation still has to be worked out
• The abstract representation has to be comfortable with
errors and ambiguities
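To make these points concrete, here is a hypothetical sketch (Java 16+ records). `FieldModel` and its methods are invented for illustration and are not part of any workbench. It has one abstract representation and two partial projections. Problems are reported rather than thrown, so an incomplete model stays usable while it is being edited.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical abstract representation; "end" may be null while incomplete.
class FieldModel {
    record Field(String name, int start, Integer end) {}

    private final List<Field> fields = new ArrayList<>();

    void add(String name, int start, Integer end) {
        fields.add(new Field(name, start, end));
    }

    // Projection 1: names only, in declaration order.
    List<String> namesProjection() {
        List<String> names = new ArrayList<>();
        for (Field f : fields) names.add(f.name());
        return names;
    }

    // Projection 2: layout sorted by start position; unknown ends shown as "?".
    List<String> layoutProjection() {
        List<Field> sorted = new ArrayList<>(fields);
        sorted.sort(Comparator.comparingInt(Field::start));
        List<String> rows = new ArrayList<>();
        for (Field f : sorted) {
            rows.add(f.start() + "-" + (f.end() == null ? "?" : f.end()) + " " + f.name());
        }
        return rows;
    }

    // Errors are collected, not thrown, so editing can continue.
    List<String> problems() {
        List<String> out = new ArrayList<>();
        for (Field f : fields) {
            if (f.end() == null) out.add(f.name() + ": missing end position");
            else if (f.end() < f.start()) out.add(f.name() + ": end before start");
        }
        return out;
    }
}
```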
42
Defining a DSL with a Workbench
• Three steps to designing a DSL
• Define the abstract syntax (the schema of the abstract
representation)
• Define an editor to let people manipulate the abstract
representation through a projection
• Define a generator for the executable representation
43
JetBrains MPS
• Creating a Hello, World Language
• Define a schema
• Define an editor
• Define a generator
• Notice that the semantics are completely removed from
the implementation
• Schema is very loosely coupled to the generator
44
Summarizing the Tradeoffs in LOP
• The fundamental issue is the benefit of using DSLs
versus the cost of building the necessary infrastructure
• Using internal DSLs reduces the tool cost
• Adds serious constraints on the DSL
• An external DSL gives you the most potential but at a
higher cost
• This is why LOP hasn’t caught on yet
• The advent of Language Workbenches solves the external DSL
issue
45
External DSLs and Non-Programmers
• One of the supposed strengths of COBOL was that you
could let non-programmers use it
• The COBOL Inference - most technologies that are supposed to
eliminate professional programmers do nothing of the sort
• The Holy Grail of Language oriented programming is to
create DSLs that non-programmers can read and write
• DSLs make everyone more productive (programmer
and non-programmer)
• If we create code that a business analyst can verify,
that’s a big win
• We don’t think that DSLs are the Holy Grail… but they’re
closer than COBOL or Java
46
Summary
• Language oriented programming has the potential to be
the next big revolution in programming paradigms
• Once the tools and techniques permeate the market and
developers’ minds
• The productivity of the tools starts making a big impact
• A level of encapsulation not possible in OOP
• Languages close to the problem domain, not just
structures
47
Questions?
Samples & slides at www.nealford.com
Neal Ford
www.nealford.com
[email protected]
memeagora.blogspot.com