Transcript Bracmat

Bracmat
A guided tour
Bart Jongejan
2013
The name
Applications
Core Methods
Why Bracmat?
Code examples
Documentation
Development
Download
Finale
Bracmat
(brachiat. – w. branches)
1741
Country on planet Nazar, inhabited by juniper
trees with good facilities for astronomy,
transcendental philosophy and mining.
Niels Klim, by Ludvig Holberg (1684-1754).
2013
Software for analysis and transformation of
uncharted and complex data.
Examples of Applications
HTML cleaning
validation of text corpora
extraction of tabular data from text
semantic analysis of text
automatic workflow creation
computer algebra
investigation of email chains
HTML cleaning
ensure standard header and footer
check links
add closing tags
warn if element not allowed in context
remove or translate disallowed attributes
translate deprecated elements (font, center)
remove redundant elements (small big)
Validation of text corpora
Dutch corpora (the Netherlands/Flanders):
CGN(2006), MWE (2007), D-COI (2008),
DPC (2010), Lassi (2011), SoNaR (2012)
XML wellformedness, tag usage, sampling,
visualisation for manual tasks, statistics,
tabular parts of reports.
Data extraction from text
Semantic analysis
"Skal jeg tisse mere af diabetes?"
(“Do I have to urinate more because of diabetes?”)
First: tokenizer, tagger (opennlp), parser
(mate-tools)
Then: using patterns, find relation and
concepts in parse tree. Result:
Polyuri
DUE TO
diabetes mellitus
Automatic Workflow Creation
Computer algebra
Face tracker: frame-by-frame video analysis
Head gestures: velocity, acceleration
y: pixel position
∝ muscle force
c: head
acceleration
x: frame #
a: head
position
b: head
velocity
Computer algebra
(
.
Solve three equations → acceleration
c
Bracmat solution
( -1*St2^3
Java code
+ 2*St*St2*St3
^
* (
+
+
+
+
+
)
)
+ St2*St4*period
+ -1*St^2*St4
+ -1*St3^2*period
)
-1
-1*Sh*St2^2
Sh*St*St3
St*St2*Sth
St2*St2h*period
-1*St3*Sth*period
-1*St^2*St2h
return
( Sh*(St*St3 - St2*St2)
+ Sth*(St*St2 - St3*period)
+ St2h*(St2*period - St*St)
)
/
( St2*(2*St*St3 - St2*St2 + St4*period)
- St*St*St4
- St3*St3*period
);
Email chains
Received: from [192.38.108.156]( "Bracmat,
(unknownGitHub"
[192.38.108.156])
by mailgate.sc.ku.dk (Postfix)
with
ESMTP;
, "Bart
Jongejan"
Thu, 11 Oct 2012 16:28:59 +0200 (CEST)
, 2012 10 11 14 28 59 200
From: [email protected]
, (.)
To: hans Keller <[email protected]>
Date: Thu, 11 Oct 2012 16:28:59) +0200
MIME-Version: 1.0
( Bracmat
Subject: Bracmat, GitHub
, "Bart Jongejan"
CC: [email protected]
, 2012 10 11 18 8 17 200
Message-ID: <[email protected]>
X-Confirm-Reading-To: [email protected]
,(
X-pmrqc: 1
. "Re: Bracmat"
Priority: normal
, "Hans
Keller"
X-mailer: Pegasus Mail for Windows
(4.63)
, 2012
10 12 6 45 15 200
Content-type: Multipart/Alternative;
boundary="Alt-Boundary-3336.1
--Alt-Boundary-3336.14371857
,(
. "Re: Bracmat"
Core methods
composition
normalization
pattern matching
procedural logic
Composition
Compose complex expressions from simpler
ones.
binary operator
complex expression
expression
another
expression
Normalization
Automatically derive canonical expressions
from unnormalized ones.
canonical
arbitrary expression
expression
Pattern matching
Deconstruct complex expressions into
simpler ones using pattern matching.
complex expression
pattern
?
?
simple expressions
Procedural logic
(
complex expression
pattern
?

&
)
|

?
WHY?
How does a test particle move, given a set of
basis vectors and a specific metric?
→ symbolic algebra
Symbolic manipulations easy, but MANY.
Pen and paper: doubts about correctness.
Computer: no errors.
1986: First version of Bracmat composes and
normalises algebraic expressions.
1988: Pattern matching and procedural logic
All Bracmat expressions are binary trees:
+
x
^
^
2
3
*
a
x ^ 2 + ( a * y )^ 3
y
Code examples
keyboard input
prompt {?} 1+2
{!} 3
answer
{?} a+a+a
“a” is a symbol,
answer
{!}
3*a
concise not a variable
follows
{?} b+a
non standard order
{!} a+b
canonical order
3, 3*a and a+b are canonical forms of
1+2, a+a+a and b+a, respectively.
Operators (initially):
*
multiplication
+
addition
^
exponentiation
\L
taking a logarithm
\D
taking a derivative
NO operators for subtraction and division:
a - b = a+(-1*b)
a / b = a*(b^-1)
Bracmat expressions autonomously seek
toward stable states.
Comparison: garbage falling on dump.
Small things slide down through the voids.
Chemicals interact.
Fumes disappear.
Finally all is quiet.
This is the
“Normal state”.
Landfill expression:
landfill=ashtray+5*bag+barbie+
12*bottle+9*cork+stone+television
Truck’s contents:
truck=apple+3*bag+paper+phone
Emptying the truck in the landfill:
(!landfill + !truck) : ?landfill
Landfill’s new stable state after a while:
apple+ashtray+8*bag+barbie+12*bottle
+9*cork+paper+phone+stone+television
Landfill: not nice, but unwieldy & repulsive.
Good News: there are gems in the landfill.
If Hengki wants to obtain gems, he needs to:
recognise valuable
items
and pick up those
valuable items
Jonathan McIntosh, 2004
most
doll
if you see
Hengki’s program
of it pattern doll, take it
!landfill:
?junk
+ ?n*((ken|barbie):?gem)
+ ?morejunk
scan the
landfill
after doll seen, go on with next step
& !HengkiStuff+!gem:?HengkiStuff
add doll to H.’s
possessions
&
|
and don’t return
it to the landfill
!junk+!morejunk+(!n+-1)*!gem
: ?landfill
if no doll seen, landfill and Hengki’s
possessions remain unchanged
Four new binary operators:
=
:
&
|
bind rhs to symbol on lhs
match lhs (subject) with rhs (pattern)
do rhs if lhs succeeds
do rhs if lhs fails
and two prefixes:
capture a value and bind it to
?
the adjacent symbol.
!
produce the value that is bound
to the adjacent symbol
= : & | and \D evaluated away (normally).
Dynamic forces that shake and break rubble.
, and . do always persist through evaluation.
Residual forces that keep things in place.
Whitespace + * ^ and \L can persist, e.g.:
y x → y x
y+x → x+y
y*x → x*y
But:
"" a, 0+a, 1*a → a, a, a
Examples of data structures that don’t change
when (re)evaluated.
x^2,y^2,100
3 algebraic expressions
separated by commas
(.1 0 0)
(.0 0 -1)
9 numbers in a matrix
(.0 1 0)
Lists built with
BUT:
(1 0 0)
whitespace, + and *
are always flattened!
(0 0 -1)
(0 1 0) → 1 0 0 0 0 -1 0 1 0
Because blank, comma and dot are
binary operators, this sentence is a
perfect Bracmat expression.
{?} Because blank, comma and dot are binary
operators, the sentence you are reading is a
perfect Bracmat expression.
{!}
Because blank
, comma and dot are binary operators
,
the
sentence
you
are
reading
is
a
perfect
Bracmat
expression
.
Logical expansion of application domain of
Bracmat as:
“Software for analysis and transformation of
uncharted and complex data.”
textual
Example:
Check sentence syntax with Bracmat patterns:
&
&
&
&
&
&
(S=!NP !VP)
non-terminals
(NP=!DET !N)
(VP=!V|!V !NP)
terminals
(DET=a|the)
(N=woman|man) rule application
(V=shoots|kisses)
(
a man kisses the woman:!S
& put$"That's grammatical!\n"
| put$"not grammatical\n"
)
screen output if screen output
if success
failure
Operator $ applies function to argument.
Only few built-in functions, e.g.:
get
get input from file, keyboard or string
put
write a result to file, screen or string
lst
serialize a variable to
file, screen or string
str
concatenate a tree into a single string
Function application:
str$(I m p l o d e) → Implode
Define your own functions.
E.g. syntax checker:
check=
= only evaluates lhs.
S NP VP DET N V
.
(S=!NP !VP)
before dot: declaration
& (NP=!DET !N)
of local variables
& (VP=!V|!V !NP)
& (DET=a|the)
after dot:
function body
& (N=woman|man)
& (V=shoots|kisses)
& !arg:!S
'check' succeeds if match ok
Call check with a sentence as argument:
{?}
{!}
{?}
{!}
check$(a woman shoots)&okay|no
okay
check$(a man a man shoots)&T|F
F
( ROOT
PARSE
TREE
.
(VERB.Skal.skal)
relation (‘attribute’)
(subj.PRON.jeg.jeg)
( vobj
.
(VERB.tisse.tisse)
( dobj
concept 1
.
(ADJ.mere.mere)
( pobj
.
(ADP.af.af)
(nobj.NOUN.diabetes.diabetes)
)
concept 2
)
)
(pnct.X."?"."?")
)
[…]
Relation
|
(its.hasTree)
$ ( !arg
Pattern
. (
Whatever
= (VERB.?.skal|skulle)… must also
matches?this …
match this.
( vobj
. ((VERB.?.?):?a)
( dobj
. ?b (pobj.(ADP.af.af) ?LC2)
)
)
location
of1
Combine
location
of
concept
)
concept 2
fragments
(fragmented)
Why
“!a”
?
)
& !a (dobj.!b):?LC1
)
relation (‘attribute’)
& "DUE TO"
Concept 1
pattern
(its.hasTree)
$ (!LC1
. (
=
(VERB.?.tisse)
(dobj.(ADJ.?.mere) ?)
)
)
concept 1
→ (28442001.Polyuri)
Concept 2
pattern
(its.hasLemma)
$ (!LC2.(=sukkersyge ?|diabetes ?))
concept 2
→ (73211009."diabetes mellitus ")
(attribute."DUE TO")
( concept1
. "Clinical Finding"
. 28442001.Polyuri
)
( concept2
. "Clinical Finding"
. 73211009."diabetes mellitus "
)
"Is sinning sincere?":?Mytext
& 0:?Bi Initialise bigram accumulator
any number of
& @( !Mytext
subject
pattern
bytes, even
none
:
?
( %?One %?Two ?
String
at least
pattern
& (!One !Two)+!Bi:?Bi
one byte
matching & ~
accumulate
)
)
embedded instructions
| lst$Bi
fail! (backtrack)
(Bi=
(" " i)
+ 2*(" " s)
+ (I s)
+ (c e)
+ (e "?")
+ (e r)
+ 3*(i n)
+ 2*(n " ")
+ (n c)
+ (r e)
+ (s " ")
+ 2*(s i));
0:?Bi
& "из фрагментов текстов":?Mytext
& @( !Mytext
:
?
(
(%?One & utf$!One)
(%?Two & utf$!Two)
?
& (!One !Two)+!Bi:?Bi
& ~
Require UTF-8
) character
)
| lst$Bi
Bi=
(" " т)
+ (" " ф)
+ (а г)
+ (в " ")
+ (г м)
+ (е к)
+ (е н)
+ (з " ")
+ (и з)
+ (к с)
+ (м е)
+ (н т)
+ 2*(о в)
+ (р а)
+ (с т)
+ (т е)
+ 2*(т о)
+ (ф р);
Parse 0n1n
Example of recursive pattern.
{?} AB= ( ""
left hand side of
| 0 !AB 1
| is ”nothing”.
So this matches
)
zero 0's and 1's.
recurse
{?} 0 0 1 1:!AB & good | bad
{!} good
Parse 0n1n2n
if zero 0's and 1's
then also zero 2's
{?} AB= ( "":?C
|
0 !AB 1
& 2 !C:?C
)
for each nested pair of
0 and 1, add a 2 to C
after parsing n 0's and n
1's, C contains n 2's
{?} 0 0 1 1 2 2:!ABC & good | bad
{!} good
{?} ABC=!AB !C
Documentation
http://jongejan.dk/bart/bracmat.html
Most complete documentation.
http://rosettacode.org/wiki/Category:Bracmat
Over 170 examples that can be compared
with implementations in other
programming languages.
Development
Evolution at moderate pace.
Great variety of snippets in valid.bra:
guards against unexpected and unwanted
behavioural changes and tests all C-code.
Behaviour described in file help (precursor
to bracmat.html).
Changes are logged.
Download
Open source since 3 June 2003 (GPL).
http://cst.dk/download/bracmat/
Source code spanning period 1986-2012.
https://github.com/BartJongejan/Bracmat
Always the latest source code.
Finale
garbage collection
most programming languages
gem collection
Bracmat
“Staten bruger dem imidlertid til at undersøge
Metalgruberne; thi ligesaa slet som de see
hvad der er oven paa Jorden, saa fortreffeligen
see de det der er inden i den.”
Finale
In Bracmat, trees are first class citizens.
Trees have autonomous behaviour.
Using pattern matching, the State controls trees.
The State itself consists of trees.
Finale
Ease of use, clarity and expressive power of
Bracmat’s patterns can compete with RE, SQL,
XQuery and Prolog.
Pattern matching as a primitive is strangely
absent in popular programming languages,
having died out with Snobol (1962 ~1990).
Modelling data in Bracmat is a small step away
from understanding and controlling it.
Contents Part 2
Bracmat expressions
Expression evaluation
Bracmat in use
Bracmat expressions
Code and Data
Numbers
Strings
Lists
Structures
Booleans
Functions
Special symbols
Arrays
Objects
Hash Tables
Code and Data
No sharp distinction:
"I’m stable"
997^1/2
(not so) stable
998^1/2
i*i:>0&pos|"not
→ "I’m stable"
→ 997^1/2
→ not so stable
→ 2^1/2*499^1/2
pos" → "not pos"
Numbers
Arbitrary-precision arithmetic.
Rational numbers.
No floating point!
2/3 + -1/6
→ 1/2
1/99+-1/100+1/101 → 10001/999900
2^216091+-1 → 746093103…815528447
(65050 digits, 31st Mersenne prime)
Strings
Strings cannot contain null-bytes, otherwise
no restrictions.
A
ру́
сский
"2^216091+-1"
"A string can
extend over
multiple lines"
Lists
Sums, products and sentences.
x + 4 + a + 8
p * q * z
This list has five elements
Lists inside lists:
a + 5 * e ^ (i * pi + x) + b
Lists (continued)
a + b : ?x + b + ?y
a * b : ?x * b * ?y
a
b : ?x
b
?y
{ y := 0 }
{ y := 1 }
{ y := ""}
0, 1 and the empty string "" are identity
(or neutral) elements in sums, products
and sentences respectively.
Structures
(uni . "Københavns Universitet")
(institut . CST)
(publications.(2011. pub1 pub2)
(2012. pub3 pub4)
(2013. pub5
pub6
pub7
)
)
Booleans
There is no separate boolean type.
Each node in a Bracmat expression has a
success/failure status flag.
a
~
1+2
a:b
success
failure
successful evaluation to 3
failing match operation
Functions
(=a b. !arg:(?a.?b) & (!b.!a))
local variables
$ (jeg.går)
parameter
returned value
argument
function application
result
→ går.jeg
Special symbols
e
i
pi
Euler’s constant, the basis of the
natural logarithm, 2.7182…
x \D (e ^ x)
→ e ^ x
unit imaginary number
i * i → -1
the ratio of a circle's circumference to
its diameter
e ^ (i * pi) → -1
Arrays
An array “A” is not a variable, it is the stack
of all variables named “A”.
tbl$(A,100)
117:?(41 $ A)
!(41$A)
tbl$(A,0)
create array A size 100
assign to element A[41]
inspect element A[41]
delete array A
Objects
Objects are the only Bracmat expressions
that can change.
(language=(iso639=) (spkrs=))
& new$language:?Danish
& da:?(Danish..iso639)
& 6M:?(Danish..spkrs)
Hash Tables
built-in object type
effectively store&search key-value pairs.
new$hash:?H
& (H..insert)$(Danmark.!Danish)
& (H..find)$Danmark
→ Danmark.(=(iso639=da) (spkrs=6M))
Expression Evaluation
Definition
Program flow
Pattern matching
Macro evaluation
λ – calculus
Normalization
Definition
Right hand side of = operator is not
evaluated.
pattern definition
J = ? ?#x ?;
function definition
double=.!arg+!arg; anonymous function
double$7
→ 14
(=.!arg+!arg)$7
→ 14
Program flow
Evaluation from left to right, depth first.
firstthis & thenthis
ifnotthis | thenthis
whl'(body)
fun$arg (or fun'arg)
!subroutine
Pattern matching
subject : pattern
Pattern matching evaluator
Normal evaluator
Pattern doesn’t evolve, side effects possible.
Primary result: subject/success or failure.
& operator: escape to normal evaluation.
@ prefix indicates string pattern matching.
string subject
side effect:
@( kabbatus
assignment
: ?
(?%x & rev$!x:?y)
!y
embedded pattern
?
escape to matching operation
normal
)
evaluation
normally
evaluated
string pattern matching
No regular expressions. Instead, use
@(string:pattern)
or
(tree:pattern)
nesting & recursion
regex pattern
no
yes
/yes
named variables
no
non-string subject
no
yes
yes/no
no
greedy
/yes
yes
Regex:
DO {stay in string world}
UNTIL (regex clear as mud)
THEN use other tool
Bracmat: A list, to begin with
XPath
XQuery
SQL
LINQ
CQL
tokenize input string → tree
WHILE(more and more interesting)
{ pattern match tree
& make more interesting tree
}
Macro evaluation
Replace variable by the value of the variable.
X=6;i=0;mltpls=;
Lhs: empty string
' ( whl
' ( !i+1:<10:?i
& !mltpls ""$X*!i:?mltpls
Lhs: empty string
)
)
: ?code;
Macro after evaluation.
lst$code;
(code=
' replaced by =
=
whl
' ( !i+1:<10:?i
& !mltpls 6 *!i:?mltpls
)
);
""$X replaced by 6
Macros
(1) speed up performance
tight loops
pattern matching over long lists,
(2) incrementally build the
”Greatest Common Pattern”
that matches a given set of
data structures. (Tree kernel)
(3) create or change code at run time
λ - calculus
The lambda abstraction
(λx.x)y
translates to
/('(x.$x))$y
For factorial and Fibonacci the hard way, see
http://rosettacode.org/wiki/Y_combinator#Bracmat
Normalization
,
WS
+
a
a
a
*
a
""
0
1
sort
a
a
combine
a
a
.
flatten
identity
^
\L
a
a
Normalization: flatten
(a,b,c),(d,e),f,g
,
,
,
a
,
,
,
b c
d e f g
→
a,b,c,d,e,f,g
,
→ a ,
b ,
c ,
d ,
e ,
f g
Because of associativity
Normalization: identity
"" a b c "" "" d "" → a b c d
0+a+b+c+0+0+d+0 → a+b+c+d
1*a*b*c*1*1*d*1 → a*b*c*d
Because "" 0 and 1 are neutral elements
Normalization: sort
f+a+b+g+c+e+d → a+b+c+d+e+f+g
f*a*b*g*c*e*d → a*b*c*d*e*f*g
e*a*b*c*d*f*g
3*b*c+a*b+a*e^(k+j)*i
↓
a*b+3*b*c+i*e^(j+k)*a
Because + and * are commutative
Normalization: combine
eat+the+the+zuppe → eat+2*the+zuppe
2*a*b+(a+-1*b)^2 → a^2+b^2
Because multiplication is
distributive over addition
Bracmat in use
Surface Appearance
Create a new program
Canonical lay-out
edit-save-reformat-reopen loop
Requirements
Build
Run code
Memory
Unicode support
SGML, XML, HTML support
Surface Appearance
Operators =.,|&: +*^\L\D'$_
white space!
Prefixes [~/#<>%@`?! and !!. Also Parentheses and quotes override default.
Comments: { yes, not stored in memory! }
Layout: free, convert to canonical with lst
Create a new program
(1) From scratch. Easy: no boiler plate.
(2) Wizzard project.bra creates skeleton:
description,
program stub,
facility to help you keep your code nicely
formatted.
Canonical lay-out
Principles:
- reasonable margins
- indented
- pair of parentheses aligned
(horizont|vertic)ally
- heading operator aligned w. parentheses
Some coding errors become conspicuous.
edit-save-reformat-reopen loop
Use combination of terminal window or
DOS-prompt and auto-reloading text
editor.
Edit your code, save, type !r in Bracmat,
(auto)reload program in editor.
{?} (diet =. !arg:? (fish|meat|chicken) ? & carnivore |
{1}
!arg:? (yoghurt|milk|cheese) ? & veggie
{1}
| !arg:? (fruit|nuts) ? & vegan | dontknow
{1}
)
{!} diet
S
0,00 sec
{?} lst$diet
(diet=
.
|
|
|
{!}
!arg:? (fish|meat|chicken) ?
& carnivore
!arg
: ? (yoghurt|milk|cheese) ?
& veggie
!arg:? (fruit|nuts) ?&vegan
dontknow);
diet
Requirements
C-compiler with standard libraries
32-bit or 64-bit OS
bracmat.c and xml.c (< 20 000 lines in all)
nothing more
Build
E.g. compile and link with GNU-C:
gcc -std=c99 -pedantic -Wall -O3 static -DNDEBUG -o bracmat
bracmat.c xml.c
Windows, dynamically linked, 32 bit: 100Kb
Linux, statically linked, 64 bit:
1 Mb
Run code
w/0 parameters: interactive (REPL)
each parameter is evaluated, left to right.
Bracmat as embedded s/w:
JNI – call Bracmat from Java
Linked with C or C++ code, Bracmat can
be called and can even call C-functions.
Memory
Nodes
Very compact, 4,8,12,16, … bytes per node
Reference counting
Structure sharing
Unreferenced nodes freed
Variables
dynamically scoped (lex.bra → lexical)
Unicode
UTF8 assumed, fall back to ISO-8859-1
Ø  ø in Unicode, UTF-8:
utf$low$chu$216 → 248
Ø  ø in ISO-8859-1:
asc$low$chr$216 → 248
upp$ру́
сский
→ РУ́
ССКИЙ
SGML, XML, HTML
Robust and fast parsing, done in C.
Transformation to Bracmat expression.
No nesting of elements. Separate step.
XML/HTML entities → UTF-8
get$("<p class='Q' enabled
>Sams&oslash;<br/</p>",MEM,HT,ML)
→ (p.(class.Q) (enabled.)) Samsø
(br.,) (.p.)