A Monadic-Memoized Solution for Left
Download
Report
Transcript A Monadic-Memoized Solution for Left
A Monadic-Memoized Solution for
Left-Recursion Problem of
Combinatory Parser
Rahmatullah Hafiz
60-520 Fall, 2005
Outline
Part1: Basic Concepts
Part2: Related Works
Part3: Our Solution
Parsing
Process of determining if input string can be
recognized by a set of rules for a specific language
Parser = Program that does parsing
exp
Input
Top-Down Parsing
“2*5+6”
Rules
digit
exp -> digit op Exp |digit
digit -> 0|1|..|9
Op-> *|+
2
op
digit
exp
op
exp
digit
*
5
+
6
Top-Down Parsing
Recognition of Input starts at root and
proceeds towards leaves
Left to Right recognition
Recursive-decent (back-tracking) parsing
If one rule fails then try another rule
recursively
Comparatively easy to construct
Exponential in worst-case
Combinatory Parser
Parsers written in functional languages
Lazy-Functional Languages
(Miranda, Haskell)
Can be used to parse
Natural Language (English)
Formal Language (Haskell)
Need to follow some rules
Context-Free Grammar
Why Lazy-Functional Language
Modular code and easy to implement
Higher-Order functions can represent
BNF notations of CFG
Functions are First-Class citizens
Suitable for Top-Down Recursive-Decent
fully backtracking parsing
Higher-Order functions
Input/Output arguments could be
another function(s)
*Frost and Launchbury (JFP, 1989)—NLI in Miranda
*Hutton (JFP, 1992 )— Uses of Combinatory Parser
Example
BNF notation of CFG
s ::= a s|empty
We can read it like
“s is either a then s or empty”
=> s = a then s or empty
**Possible to write parsers exactly like this in LFL
using-higher order functions
Example (cont.)
--s :: a s|empty
empty input
a (x:xs)
(p `or` q ) input
(p `then` q) input
=
=
=
=
[input]
if x==‘a’ then xs else []
p input ++ q input
if r == []
then []
else q r
where r = p input
s input = (a `then` s `or` empty) input
--*Main> s "aaa“
--["","a","aa","aaa"]
Problem with Left-Recursive Grammar
Left-recursive grammar
s :: s a|a
Input “aaa”
aaa s
aaa
aaa
s
a
a
s
a
aa
s
(“a”, “aa”)
s
a
a
aaa
s
Right-recursive grammar
s :: a s|a
Input “aaa”
s aaa
(“aa”, “a”)
a
Never
Terminates
Terminates
“”
a
(“aaa”, “”)
Example
--s :: s a|empty
empty input
a (x:xs)
(p `or` q ) input
(p `then` q) input
=
=
=
=
[input]
if x==‘a’ then xs else []
p input ++ q input
if r == []
then []
Never
else q r
Terminates
where r = p input
s input = (s `then` a `or` empty) input
--*Main> s "aaa“
--(*** Exception: stack overflow
Why bother for Left-Recursive Parsing?
Easy to implement
Very modular
In case of Natural Language Processing
Expected ambiguity can be achieved easily
Non-left Recursive grammars might not
generates all possible parses
As left recursive grammar’s root grows all the
way to the bottom level, it ensures all possible
parsing
Related Approaches
Approach leads to exponential time complexity
Lickman (1995) – Fixed point solution
Most of the approaches deals with
Transforming left-recursive grammar to non-left
recursive grammar
Frost (1992) – Guarding left-production with non- left
productions
Hutton (1992, 1996) – Grammar transformation
Related Approaches
Transforming left-recursive grammar to non-left recursive
grammar
violates semantic rules
structure of parse trees are completely different
Example
s::sa
s::bs`
[rule 1]
[rule ??]
|b
s`::as`|empty
s
s
[rule 2]
[rule ??]
a
s
s
b
S`
b
a
a
S`
a
empty
Our Approach: Monadic-Memoization
*Frost and Hafiz (2005)
The idea is simple
Let not the parse tree grow infinitely
Left-recursive grammar
s :: s a
|a
aaa
Input “aaa”
Parser “fails”
when
Depth >= length
aaa
s
aaa
s
a
s
a
a
s
Depth=1
Length=3
Depth=2
Length=3
Depth=3
Length=3
*Wadler (1985)-- How to replace failure by a list of successes
Monadic-Memoization (cont.)
aaa
Left-recursive grammar
s :: s a
|a
aaa
Input “aaa”
As the recognizer
fails, the
production rule
‘s::sa’ fails too
Control goes to
upper level with
‘failure’
‘backtracking’,
parser tries
‘alternatives’
aaa
s
s
a
Depth=1
Length=3
a
a
s
s
Depth=2
Length=3
Depth=3
Length=3
If Alternative rule succeeds
Control goes to upper level with left
un-recognized inputs
Recursive procedure
Monadic-Memoization (cont.)
Left-recursive grammar
s :: s a
|a
Input “aaa”
“”
aa
s
s [fail]
[fail]
s
s
a
Depth=1
Length=3
a
a
a
Depth=2
Length=3
Depth=3
Length=3
Monadic-Memoization (cont.)
This approach is applicable in Mixed-Environment
Grammar may contain
Left-recursive production
non Left-recursive production
s :: s a | a
a :: b a | b
During parsing execution of one rule for same
input may occur multiple time
Also need to keep track of input length and depth
Top-Down Back-trucking is Exponential
Monadic-Memoization (cont.)
Memoization is helpful
The idea is
Checking a ‘Memo’ table along with input to
each recursive call
‘Memo’ table contains
List of previous parsed outputs for any input,
paired with appropriate production
Length and depth of current parse
Before parsing any input
if “lookup to Memo table fails”
then “perform parsing & update the memo
table”
else “return the result from table”
Monadic-Memoization (cont.)
s
s
“aa” a
b
a
“aa”
=(["","a","aa"],[(“a",[("aa",["","a","aa"]),("a",["","a"]),(10,11)])])
Monadic-Memoization (cont.)
Memoization reduces worst-case time complexity
from exponential to O(n3)
The problem is
Lazy-functional languages don’t let variable
updating or keeping a global storage for the whole
program
Need to pass around the ‘Memo’ table so that
All recursive parsing calls access ‘Memo’ table
if ‘Memo’ table is used as the function arguments
Code gets messy and error-prone
Monadic-Memoization (cont.)
Or we can use ‘Monad’
*Derived from “Category Theory” -- Moggi (1989)
*S.E approaches for LFLs -- Wadler (1990)
State, exception, I/O etc of LFL
*Monadic Framework for Parsing –-Frost (2003)
Reusable
Complex tasks could be achieved
by adding/modifying existing monadic objects
Structured computation
Monadic-Memoization (cont.)
Monad is a triple (M, unit, bind)
‘M’ type constructor
memo = [([Char],[([Char],[[Char]])],[(Int,Int)])]
M inp = memo -> (inp, memo)
‘unit’ :: aM a
takes a value and returns the computation of the value
Works as a ‘container’
‘bind’ :: M a (a M b) M b
applies the computation ‘a M b’ to the computation ‘M
a’ and returns a computation ‘M b’
Ensures sequential computation
Monadic-Memoization (cont.)
The mental picture of ‘how monad works’
Monad is a triple (M, unit, bind)
M = Loader
unit = tray
bind = combiner
*Picture source Newbern (2003)
Monadic-Memoization (cont.)
Transform combinatory parsers into Monadic object
Example Original “Or” recognizer
(p `or` q ) inp
= p inp ++ q inp
Monadic version
(p `or` q) inp = p inp `bindS` f
where f m = q inp`bindS`g
where g n = unitS(nub(m ++ n))
Check LR
S::a or b
Input = “ab”
lookup Memo
a
b
Memo1
“ab”
update Memo
Monadic
Object
‘Or’
Parsed out
Memo2
Monadic-Memoization (cont.)
s
“aa”
s
“aa”
a
b
a
“aa”
Memo table propagation is ensured correctly => O(n3)
=(["","a","aa"],[(“a",[("aa",["","a","aa"]),("a",["","a"]),(10,11)])])
References
1.
2.
3.
4.
5.
6.
7.
8.
Frost, R. A. and Launchbury, E. J. (1989) Constructing natural language
interpreter in a lazy functional language. The computer Journal –
Special edition on Lazy functional Programming, 32(2) 3-4
Wadler, P. (1992) Monads for functional programming. Computer and
systems sciences, Volume 118
Hutton, G. (1992) Higher-order functions for parsing. Journal of
Functional Programming, 2(3):323-343, Cambridge University Press
Frost, R. A. (1992)Guarded attribute grammars: top down parsing and
left recursive productions. SIGPLAN Notices 27(6): 72-75
Lickman, P. (1995) Parsing With Fixed Points. Masters thesis. Oxford
University
Frost, R.A., Szydlowski, B. (1996) Memoizing Purely Functional TopDown Backtracking Language Processors. Sci. Comput. Program. 27(3):
263-288
Hutton, G. (1998) Monadic parsing in Haskell. Journal of Functional
Programming, 8(4):437-444, Cambridge University Press
Frost, R.A.(2003) Monadic Memoization towards Correctness-Preserving
Reduction of Search. Canadian Conference on AI 2003: 66-80