A Monadic-Memoized Solution for Left

Download Report

Transcript A Monadic-Memoized Solution for Left

A Monadic-Memoized Solution for
Left-Recursion Problem of
Combinatory Parser
Rahmatullah Hafiz
60-520 Fall, 2005
Outline
Part1: Basic Concepts
Part2: Related Works
Part3: Our Solution
Parsing
Process of determining if input string can be
recognized by a set of rules for a specific language
Parser = Program that does parsing
exp
Input
Top-Down Parsing
“2*5+6”
Rules
digit
exp -> digit op Exp |digit
digit -> 0|1|..|9
Op-> *|+
2
op
digit
exp
op
exp
digit
*
5
+
6
Top-Down Parsing
 Recognition of Input starts at root and
proceeds towards leaves
 Left to Right recognition
 Recursive-decent (back-tracking) parsing
If one rule fails then try another rule
recursively
 Comparatively easy to construct
 Exponential in worst-case
Combinatory Parser
Parsers written in functional languages
Lazy-Functional Languages
(Miranda, Haskell)
Can be used to parse
Natural Language (English)
Formal Language (Haskell)
Need to follow some rules
 Context-Free Grammar
Why Lazy-Functional Language
Modular code and easy to implement
Higher-Order functions can represent
BNF notations of CFG
Functions are First-Class citizens
Suitable for Top-Down Recursive-Decent
fully backtracking parsing
Higher-Order functions
Input/Output arguments could be
another function(s)
*Frost and Launchbury (JFP, 1989)—NLI in Miranda
*Hutton (JFP, 1992 )— Uses of Combinatory Parser
Example
BNF notation of CFG
s ::= a s|empty
We can read it like
“s is either a then s or empty”
=> s = a then s or empty
**Possible to write parsers exactly like this in LFL
using-higher order functions
Example (cont.)
--s :: a s|empty
empty input
a (x:xs)
(p `or` q ) input
(p `then` q) input
=
=
=
=
[input]
if x==‘a’ then xs else []
p input ++ q input
if r == []
then []
else q r
where r = p input
s input = (a `then` s `or` empty) input
--*Main> s "aaa“
--["","a","aa","aaa"]
Problem with Left-Recursive Grammar
Left-recursive grammar
s :: s a|a
Input “aaa”
aaa s
aaa
aaa
s
a
a
s
a
aa
s
(“a”, “aa”)
s
a
a
aaa
s
Right-recursive grammar
s :: a s|a
Input “aaa”
s aaa
(“aa”, “a”)
a
Never
Terminates
Terminates
“”
a
(“aaa”, “”)
Example
--s :: s a|empty
empty input
a (x:xs)
(p `or` q ) input
(p `then` q) input
=
=
=
=
[input]
if x==‘a’ then xs else []
p input ++ q input
if r == []
then []
Never
else q r
Terminates
where r = p input
s input = (s `then` a `or` empty) input
--*Main> s "aaa“
--(*** Exception: stack overflow
Why bother for Left-Recursive Parsing?
 Easy to implement
 Very modular
 In case of Natural Language Processing
 Expected ambiguity can be achieved easily
 Non-left Recursive grammars might not
generates all possible parses
 As left recursive grammar’s root grows all the
way to the bottom level, it ensures all possible
parsing
Related Approaches
 Approach leads to exponential time complexity
Lickman (1995) – Fixed point solution
Most of the approaches deals with
Transforming left-recursive grammar to non-left
recursive grammar
Frost (1992) – Guarding left-production with non- left
productions
Hutton (1992, 1996) – Grammar transformation
Related Approaches
Transforming left-recursive grammar to non-left recursive
grammar

violates semantic rules

structure of parse trees are completely different
Example
s::sa
s::bs`
[rule 1]
[rule ??]
|b
s`::as`|empty
s
s
[rule 2]
[rule ??]
a
s
s
b
S`
b
a
a
S`
a
empty
Our Approach: Monadic-Memoization
*Frost and Hafiz (2005)
The idea is simple
Let not the parse tree grow infinitely
Left-recursive grammar
s :: s a
|a
aaa
Input “aaa”
Parser “fails”
when
Depth >= length
aaa
s
aaa
s
a
s
a
a
s
Depth=1
Length=3
Depth=2
Length=3
Depth=3
Length=3
*Wadler (1985)-- How to replace failure by a list of successes
Monadic-Memoization (cont.)
aaa
Left-recursive grammar
s :: s a
|a
aaa
Input “aaa”
As the recognizer
fails, the
production rule
‘s::sa’ fails too
Control goes to
upper level with
‘failure’
‘backtracking’,
parser tries
‘alternatives’
aaa
s
s
a
Depth=1
Length=3
a
a
s
s
Depth=2
Length=3
Depth=3
Length=3
If Alternative rule succeeds
Control goes to upper level with left
un-recognized inputs
Recursive procedure
Monadic-Memoization (cont.)
Left-recursive grammar
s :: s a
|a
Input “aaa”
“”
aa
s
s [fail]
[fail]
s
s
a
Depth=1
Length=3
a
a
a
Depth=2
Length=3
Depth=3
Length=3
Monadic-Memoization (cont.)
 This approach is applicable in Mixed-Environment
 Grammar may contain
 Left-recursive production
 non Left-recursive production
s :: s a | a
a :: b a | b
 During parsing execution of one rule for same
input may occur multiple time
 Also need to keep track of input length and depth
Top-Down Back-trucking is Exponential
Monadic-Memoization (cont.)
 Memoization is helpful
 The idea is
Checking a ‘Memo’ table along with input to
each recursive call
 ‘Memo’ table contains
List of previous parsed outputs for any input,
paired with appropriate production
Length and depth of current parse
 Before parsing any input
if “lookup to Memo table fails”
then “perform parsing & update the memo
table”
else “return the result from table”
Monadic-Memoization (cont.)
s
s
“aa” a
b
a
“aa”
=(["","a","aa"],[(“a",[("aa",["","a","aa"]),("a",["","a"]),(10,11)])])
Monadic-Memoization (cont.)
 Memoization reduces worst-case time complexity
from exponential to O(n3)
 The problem is
Lazy-functional languages don’t let variable
updating or keeping a global storage for the whole
program
 Need to pass around the ‘Memo’ table so that
 All recursive parsing calls access ‘Memo’ table
if ‘Memo’ table is used as the function arguments
Code gets messy and error-prone
Monadic-Memoization (cont.)
Or we can use ‘Monad’
*Derived from “Category Theory” -- Moggi (1989)
*S.E approaches for LFLs -- Wadler (1990)
 State, exception, I/O etc of LFL
*Monadic Framework for Parsing –-Frost (2003)
 Reusable
 Complex tasks could be achieved
by adding/modifying existing monadic objects
 Structured computation
Monadic-Memoization (cont.)
Monad is a triple (M, unit, bind)
 ‘M’ type constructor
memo = [([Char],[([Char],[[Char]])],[(Int,Int)])]
M inp = memo -> (inp, memo)
 ‘unit’ :: aM a
takes a value and returns the computation of the value
Works as a ‘container’
 ‘bind’ :: M a  (a  M b)  M b
applies the computation ‘a  M b’ to the computation ‘M
a’ and returns a computation ‘M b’
Ensures sequential computation
Monadic-Memoization (cont.)
The mental picture of ‘how monad works’
Monad is a triple (M, unit, bind)
M = Loader
unit = tray
bind = combiner
*Picture source Newbern (2003)
Monadic-Memoization (cont.)
 Transform combinatory parsers into Monadic object
Example Original “Or” recognizer
(p `or` q ) inp
= p inp ++ q inp
Monadic version
(p `or` q) inp = p inp `bindS` f
where f m = q inp`bindS`g
where g n = unitS(nub(m ++ n))
Check LR
S::a or b
Input = “ab”
lookup Memo
a
b
Memo1
“ab”
update Memo
Monadic
Object
‘Or’
Parsed out
Memo2
Monadic-Memoization (cont.)
s
“aa”
s
“aa”
a
b
a
“aa”
Memo table propagation is ensured correctly => O(n3)
=(["","a","aa"],[(“a",[("aa",["","a","aa"]),("a",["","a"]),(10,11)])])
References
1.
2.
3.
4.
5.
6.
7.
8.
Frost, R. A. and Launchbury, E. J. (1989) Constructing natural language
interpreter in a lazy functional language. The computer Journal –
Special edition on Lazy functional Programming, 32(2) 3-4
Wadler, P. (1992) Monads for functional programming. Computer and
systems sciences, Volume 118
Hutton, G. (1992) Higher-order functions for parsing. Journal of
Functional Programming, 2(3):323-343, Cambridge University Press
Frost, R. A. (1992)Guarded attribute grammars: top down parsing and
left recursive productions. SIGPLAN Notices 27(6): 72-75
Lickman, P. (1995) Parsing With Fixed Points. Masters thesis. Oxford
University
Frost, R.A., Szydlowski, B. (1996) Memoizing Purely Functional TopDown Backtracking Language Processors. Sci. Comput. Program. 27(3):
263-288
Hutton, G. (1998) Monadic parsing in Haskell. Journal of Functional
Programming, 8(4):437-444, Cambridge University Press
Frost, R.A.(2003) Monadic Memoization towards Correctness-Preserving
Reduction of Search. Canadian Conference on AI 2003: 66-80