Matchete Paths through the Pattern Matching Jungle

Download Report

Transcript Matchete Paths through the Pattern Matching Jungle

Matchete
Paths through the Pattern Matching Jungle
Martin Hirzel
Nate Nystrom
Bard Bloom
Jan Vitek
QuickTime™ and a
TIFF (Uncompressed) decompressor
are needed to see this picture.
7+8 January 2008 PADL
1
What is Pattern Matching?
Examples:
–
–
–
–
–
–
Switch in C/Java
Exception handlers
ML-style patterns
Regular expressions
XPath patterns
Bit masks
Selection
– If match, then
execute handler
– E.g. is this a float?
22.341
Bindings
– Give names to parts
– E.g. integral part: 22,
fractional part: 341
2
Example: Lists
-- list multiplication
-1
4 )
0
mult( 3
4 )
0
= 3 * mult( -1
4 )
= 3 * -1 * mult( 0
= 3 * -1 * 0 * mult( 4 )
= 3 * -1 * 0 * 4 * mult(nil)
= 3 * -1 * 0 * 4 * 1 = 0
-- list construction
cons(3, cons(-1, cons(0, cons(4, null))))
-1
4
0
= 3
3
Matching Structured Terms
Selection
int mult(List ls) {
match(ls) {
cons~(0, _): return 0;
cons~(int h, List t): return h * mult(t);
null: return 1;
}
Bindings
return 1;
}
Central feature of ML, Haskell
QuickTime™ and a
TIFF (Uncompressed) decompressor
are needed to see this picture.
Hardly a jungle!
4
Less Structured Data
Data
Pattern
Language
Strings Regular expression
Perl
XML
XPath
XSLT
Raw bits
Binary pattern
Erlang
Major factor in success of
practical languages!
5
Why Unify?
• Given list of strings: sue 10 bob 15
• Given String variable: name
• Find name, extract int age
ann 11
• Match list  deconstructor pattern
cons~(…, …)
• Match string  nested RegExp
/([a-z]+) ([0-9]+)/(name, int age)
6
Matchete (Java Extension)
•
•
•
•
Integrates pattern sublanguages
Common set of primitive patterns
Nesting composite patterns
Simple uniform semantics
7
Primitive Patterns
Name
Wildcard
Examples
_
Value
22.341
x
tiger.stripes + spider.legs
Binder
int x
ScaryAnimal python
8
Composite Patterns
Name
Examples
Deconstructor
cons~(0, _)
Parameterized
re("([0-9]+)")~(int i)
Array
int[]{1, x, int y}
XPath
<bib/book>(NodeList n)
RegExp
/([a-z]) ([0-9]+)/(chr,int f)
BitLevel
[[(0x2cf9:16) 01 (int x:14)]]
9
Deconstructor Definition
Fields
Constructor
Deconstructor
class List {
private int head;
private List tail;
public List(int h, List t)
{ head = h; tail = t; }
public cons~(int h, List t)
{ h = head; t = tail; }
}
Match on receiver object
Out parameters = subjects for nested patterns
10
Nesting
sue 10
bob 15
ann 11
cons~(/([a-z]+) ([0-9]+)/(name, int age), _)
Deconstructor
cons
RegExp
([a-z]+) ([0-9]+)
Value
name
Binder
int age
Wildcard
_
11
Subjects flow to children
sue 10
bob 15
ann 11
Deconstructor
cons
sue 10
RegExp
([a-z]+) ([0-9]+)
sue
Value
name
bob 15
ann 11
10
Binder
int age
Wildcard
_
12
Decisions and bindings flow
to textual successor
Deconstructor
cons
RegExp
([a-z]+) ([0-9]+)
Value
name
Binder
int age
Wildcard
_
Handler
print(age)
13
Compilation
Matchete source code
Built on Rats!
Matchete compiler
parser generator
Other
Java source
Runtime
library
Generated
Java source
Java compiler
Debugging
information
Postprocessor
Java class files
14
Implemented Examples
•
•
•
•
Balance red-black tree
Process TCP/IP network packet
Pretty-print XML bibliography
… + smaller regression tests
15
Discussion: Typing
Matchete uses strong dynamic typing
– No runtime errors, just failed matches
– If Matchete compiler gives no error,
then Java compiler gives no error either
Why not (more) static typing?
– Data formats mismatch
– Test bed for a new scripting language
16
Discussion: Integration
Choice
Example
Advantage
Loose
integration
/a(b)c(d)/
(p,q)
Sublanguage
reuse
Tight
/a(p:b)c(q:d)/
integration
No need to
count
re("a(b)c(d)")
No
~(p,q)
integration
Simpler
language
Matchete choses tight integration for BitLevel,
loose integration for RegExp and XPath,
no integration for XML as terms
17
Related Work
• Structured terms
– Algebraic types: ML, Haskell, …
– Objects: Tom, OOMatch, JMatch, …
– Letting users define patterns: F#, Scala
• Strings: Perl; SNOBOL
• Bit-level data: Erlang; DataScript; PADS
• XML:
– As trees: XSLT, XJ (XPath)
– As terms: XDuce, HydroJ, …
18
Conclusions
• Pattern matching applies to
terms, strings, XML, and raw bits
• Matchete offers path to unification
19