schemeInPython
Download
Report
Transcript schemeInPython
Scheme in
Python
Overview
We’ll look at how to implement a simple
scheme interpreter in Python
This is based on the Scheme in Scheme
interpreter we studied before
We’ll look at pyscheme 1.6, which was
implemented by Danny Yoo as an
undergraduate at Berkeley
Since Python doesn’t optimize for tail
recursion, he uses trampolining, which we’ll
introduce
What we need to do
A representation for Scheme’s native data
structures
• Pairs (aka, cons cells), symbols, strings,
numbers,Booleans
A reader that converts a stream of characters
into a stream of s-expressions
• We’ll introduce an intervening step reading
characters and converting to tokens
Implement various built-ins
• e.g., cons, car, +, …
What we won’t need to do
We can rely on Python for a number of very
useful things
• Representing numbers and strings
• Garbage collection
• Low level I/O
atoms
Atoms include strings, number, and symbols
We’ll use Python’s native representation for
string and numbers
Symbols in Scheme are interned – there is a
unique object for each symbol read
This is how they differ from strings,
which are not interned
• Note: some Lisp implementations intern small integers
Symbols
# A global dictionary that contains all known symbols
__INTERNED_SYMBOLS = {}
class __Symbol(str):
"""A symbol is just a special kind of string"""
def __eq__(self, other): return self is other
def symbol(s):
""""Returns symbol given string, creating new ones if needed”””
global __interned_symbols
if s not in __INTERNED_SYMBOLS:
__INTERNED_SYMBOLS[s] = __Symbol(s)
return __INTERNED_SYMBOLS[s]
# Here are definitions of symbols that we should know
scheme_false = symbol("#f")
scheme_true = symbol("#t")
__empty_symbol = Symbol("")
def isSymbol(s): return type(s) == type(__empty_symbol)
GCing Unused Symbols
If the only reference to a symbol is from the
global list of interned symbols, it can be
garbage collected
We’ll use Python’s weakref’s for this
A weak reference is a reference that doesn’t
protect an object from garbage collection
Objects referenced only by weak
references are considered unreachable
(or "weakly reachable") and may be
collected at any time
using weakrefs
import weakref
from UserString import UserString as __UserStr
…
__INTERNED_SYMBOLS =
weakref.WeakValueDictionary({})
…
class __Symbol(__UserStr):
…
if s not in __INTERNED_SYMBOLS:
# make a temp strong reference
newSymbol = __Symbol(s)
__INTERNED_SYMBOLS[s] = newSymbol
return __INTERNED_SYMBOLS[s]
Representing pairs
The core of scheme only has one kind of data
structure – lists– and it is made up out of pairs
What Python types should we use?
• A user defined class, Pair
• Lists
• Tuples
• Dictionary
• Closures
Aside: pairs as closures
Functions are very powerful
We can use them to represent cons cells or
pairs
We don’t want to do this in practice
But it shows the power of programming with
functions
(define (mycons theCar theCdr)
;; mycons returns a closure that takes a 2-arg function and
applies
;; it to the two remembered vlue's, i.e., the pair's car and cdr.
(lambda (f) (f theCar theCdr)))
(define (mycar cell)
;; mycar takes a pair closure and feeds it a 2-arg function that
;; just returns the first arg
(cell (lambda (theCar theCdr) theCar)))
(define (mycdr cell)
;; mycdr takes a pair closure and feeds it a 2-arg function that
;; just returns the first arg
(cell (lambda (theCar theCdr) theCdr)))
(define myempty
;; the empty list is just a function that always returns true.
(lambda (f) #t))
(define (mynull? cell)
;; a pair is not the empty list
(eq? cell myempty))
example
> (define p1 (mycons 1 (mycons 2 myempty)))
> p1
#<procedure:.../scheme/pairs.ss:1:31>
> (mycar p1)
1
> (mycdr p1)
#<procedure:.../scheme/pairs.ss:1:31>
> (mycar (mycdr p1))
2
> (mycdr (mycdr p1))
#<procedure:myempty>
Representing pairs
We’ll define a subclass of list to represent a pair
Class Pair(list) : pass
The cons functions creates a new cons cell with
a given car and cdr
def cons(car, cdr):
return Pair([car, cdr])
Defining built-in functions for pairs will be easy
def car(p): return p[0]
def cdr(p): return p[1]
def cadr(p): return car(cdr(p))
def set_car(p,x): p[0] = x
Lexical Analyzer
Consume a string of characters, identify
tokens, throw away comments and
whitespace, and return a list of remaining
tokens
Each token will be a (<type>, <token>) tuple
like (‘number’, ‘3.145’) or (‘comment’, ‘;; foo’)
Recognize tokens using regular expressions
We won’t worry about efficiency
Token regular expressions
PATTERNS = [
('whitespace', re.compile(r'(\s+)')),
('comment', re.compile(r'(;[^\n]*)')),
('(', re.compile(r'(\()')),
(')', re.compile(r'(\))')),
('dot', re.compile(r'(\.\s)')),
('number', re.compile(r'([+\]?(?:\d+\.\d+|\d+\.|\.\d+|\d+))')),
('symbol', re.compile(r'([a-zAZ\+\=\?\!\@\#\$\%\^\&\*\\/\.\>\<][\w\+\=\?\!\@\#\$\%\^\&\*\-\/\.\>\<]*)')),
('string', re.compile(r'"(([^\"]|\\")*)"')),
('\'', re.compile(r'(\')')),
('`', re.compile(r'(`)')),
(',', re.compile(r'(,)')) ]
Lex Examples
>>> from lex import *
>>> tokenize("")
[(None, None)]
>>> tokenize(" 1 2. .3 1.3 -4")
[('number', '1'), ('number', '2.'), ('number',
'.3'), ('number', '1.3'), ('number', '-4'),
(None, None)]
>>> tokenize('foo 12.3foo +')
[('symbol', 'foo'), ('number', '12.3'),
('symbol', 'foo'), ('symbol', '+'), (None,
None)]
>>> tokenize('(foo (bar ()))')
[('(', '('), ('symbol', 'foo'), ('(', '('),
('symbol', 'bar'), ('(', '('), (')', ')'),
(')', ')'), (')', ')'), (None, None)]
Raw string notation
>>> s = ‘\nfoo\n’
>>> s
'\nfoo\n'
>>> print s
foo
>>> s = r'\nfoo\n'
>>> s
'\\nfoo\\n'
>>> print s
\nfoo\n
tokenize()
def tokenize(s):
toks = []
found = True
while s and found:
found = False
for type, regex in PATTERNS:
match_obj = regex.match(s)
if match_obj:
if type not in ('whitespace', 'comment'):
toks.append((type, match_obj.group(1)))
s = s[match_obj.span()[1] :]
found = True
break
if not found:
print "\nNo match'", s, ”’ – tokenize”
toks.append(EOF_TOKEN)
return tokens
tokenize() examples
>>> from lex import *
>>> tokenize('(a 1.0)')
[('(', '('), ('symbol', 'a'), ('number',
'1.0'), (')', ')'), (None, None)]
>>> tokenize('(define (add1 x)(+ x 1))')
[('(', '('), ('symbol', 'define'), ('(', '('),
('symbol', 'add1'), ('symbol', 'x'), (')',
')'), ('(', '('), ('symbol', '+'), ('symbol',
'x'), ('number', '1'), (')', ')'), (')',
')'), (None, None)]
parse
Consume a sequence of tokens and produce
a sequence of s-expressions
Use a recursive descent parser
We’ll handle just a few special cases, namely
quote and backquote and dotted pairs
Peeking and eating
def peek(tokens):
"""Take a quick glance at the first token
in our tokens list.""”
if len(tokens) == 0:
raise ParserError, "While peeking:
ran out of tokens.”
return tokens[0]
Peeking and eating
def eat(tokens, desired_type):
"""If the type of the next token is desired_type,
pop it from the list and return it, else return
False”””
if len(tokens) == 0:
raise ParserError, 'No tokens left,
seeking ' + desired_type
return tokens.pop(0)
if tokens[0][0] == desired_type
else False
Peeking and eating
def eat_safe(tokens, tokenType):
"""Digest the first token in our tokens list, making
sure that we're
biting on the right tokenType of thing.""”
if len(tokens) == 0:
raise ParserError, "While trying to eat %s: ran
out of tokens." % tokenType )
if tokens[0][0] != tokenType:
raise ParserError, "Seeking %s got %s" %
(tokenType, tokens[0])
return tokens.pop(0)
def parseExpression(tokens):
if eat(tokens, '\''):
return cons(symbol('quote'),
cons(parseExpression(tokens), NIL))
if eat(tokens, '`'):
return cons(symbol('quasiquote'),
cons(parseExpression(tokens), NIL))
elif eat(tokens, ','):
return cons(symbol('unquote'),
cons(parseExpression(tokens), NIL))
elif eat(tokens, '('):
return parse_list_members(tokens)
elif peek(tokens)[0] in
('number’,'symbol’,'string'):
return parse_atom(tokens)
else:
raise ParserError, ”Parsing: no alternatives"
parse
parse_list_members()
def parse_list_members(tokens):
if eat(tokens, 'dot'):
final = parseExpression(tokens)
eat_safe(tokens, ')')
return final
if peek(tokens)[0] in ('\'’,'`’,',’,'(’,
'number’,'symbol’,'string'):
return cons(parseExpression(tokens),
parse_list_members(tokens))
if eat(tokens, ')'):
return NIL
raise ParserError, "Can't finish list” + tokens
Recursive descent parsing
Remember one problem with recursive
descent parsing is that the grammar has to be
right recursive
Another potential problem is recursing too
deeply and exceeding the limit on the stack
But maybe we can use tail recursion, which an
interpreter or compiler can recognize and
execute as iteration?
Not in Python
Python doesn’t optimize tail recursion
def fact0(n):
# iterative facorial
result = 1
while n>1:
result *= n
n -= 1
return result
def fact1(n):
# simple recursive factorial
return 1 if n==1 else n*fact2(n - 1)
def fact2(n, result=1):
# tail recursive factorial
return result if n==1 else fact2(n-1, n*result)
Try this
http://www.csee.umbc.edu/331/fall08/0101/code/python/
pyscheme-1.7/src/fact.py
Default limit is 999
fact2(1000) and fact3(1000) both die
>>> fact2(1000)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "fact.py", line 17, in fact2
return result if n==1 else fact2(n-1, n*result)
File "fact.py", line 17, in fact2
…
File "fact.py", line 17, in fact2
return result if n==1 else fact2(n-1, n*result)
RuntimeError: maximum recursion depth exceeded
How to solve this?
You can set the maximum recursion depth
higher
>>> import sys
>>> sys.getrecursionlimit()
1000
>>> sys.setrecursionlimit(10000)
>>> fact2(1100)
53437084880926377034242155 ... 00000000L
But this is not a general solution
And Guido is on the record as not wanting to
optimize tail recursion
• http://www.artima.com/forums/flat.jsp?forum=106&thread=147358
Trampoline Style
A trampoline is a loop that iteratively invokes
thunk-returning functions
A thunk is just a a piece of code to perform a
delayed computation (e.g., a closure)
A single trampoline can express all control
transfers of a program
Converting a program to trampolined style is
trampolining
This is kind of continuation passing style of
programming
Trampolined functions can do tail recursive
function calls in stack-oriented languages
Trampolining is one answer
A way to program using CPS, Continuation
Passing Style
CPS is a style of programming where control is
passed explicitly as continuations
Trampolining is a simple way to
eliminate recursion
We’ll use a simple kind of trampolining
Instead of making a recursive call, a
procedure can bounce back up to its
caller with a continuation, which can
be called to proceed with the computation
Pogo
from pogo import pogo, land, bounce
def fact3(n):
# factorial in a trampolined style
return pogo(fact_tramp(n))
def fact_tramp(n, result=1):
return land(result) if n==1 else
bounce(fact_tramp, n-1, n*result)
Variable length argument lists
>>> def foo(*args):
print "Number of arguments:", len(args)
print "Arguments are: ", args
>>> foo(1,2,3,'d',5)
Number of arguments: 5
Arguments are: (1, 2, 3, 'd', 5)
>>> def bar(arg1, *rest):
print …
pogo.py
def bounce(function, *args):
"""Returns new trampolined value that
continues bouncing"""
return ('bounce', function, args)
def land(value):
"""Returns new trampolined value that
lands off trampoline"""
return ('land', value)
It works
>>> sys.setrecursionlimit(10)
>>> fact3(100)
93326215443944152681699238856266700490715
9682643816214685929638952175999932299156
0894146397615651828625369792082722375825
1185210916864000000000000000000000000L
>>> fact3(1000)
4023872600770937735...00000000000000L
pogo.py
def pogo(bouncer):
try:
while True:
if bouncer[0] == 'land’:
return bouncer[1]
elif bouncer[0] == 'bounce':
bouncer = bouncer[1](*bouncer[2])
else:
traceback.print_exc()
raise TypeError, "not a bouncer”
except TypeError:
traceback.print_exc()
raise TypeError, "not a bouncer”
See pyscheme1.6
Pyscheme1.6 is written in trampoline style
Which was done by hand, as opposed to
using an automatic trampoliner
And which I’ve been undoing by hand
def eval(exp, env):
return pogo.pogo(teval(exp, env, pogo.land))
def teval(exp, env, cont):
if expressions.isIf(exp):
return evalIf(exp, env, cont)
…
def evalIf(exp, env, cont):
def c(predicate_val):
if isTrue(predicate_val):
return teval(ifConsequent(exp), env, cont)
else:
return teval(ifAlternative(exp), env, cont)
return teval(expressions.ifPredicate(exp), env, c)
eval
def eval(exp, env):
if exp.isSelfEvaluating(exp): return exp
if exp.isVariable(exp):
return env.lookupVariableValue(exp, env)
if exp.isQuoted(exp): return evalQuoted(exp, env)
if exp.isAssignment(exp):
return evalAssignment(exp, env)
if exp.isDefinition(exp):
return evalDefinition(exp, env)
if exp.isIf(exp): return evalIf(exp, env)
if exp.isLambda(exp):
return exp.makeProcedure(exp.lambdaParameters(exp),
exp.lambdaBody(exp), env)
if exp.isBegin(exp):
return evalSequence(exp.beginActions(exp), env)
if exp.isApplication(exp):
return evalApplication(exp, env)
raise SchemeError, "Unknown expr, eval " + str(exp)
apply
def apply(procedure, arguments, env):
if exp.isPrimitiveProcedure(procedure):
return applyPrimProc(procedure, arguments, env)
if exp.isCompoundProcedure(procedure):
newEnv = env.extendEnvironment(
exp.procedureParameters(procedure),
arguments,
exp.procedureEnvironment(procedure))
return evalSequence(exp.procedureBody(procedure),
newEnv)
raise SchemeError, "Unknown proc - apply " +
str(procedure)
Environments
An environment will be a list of frames
Each frame will be a Python dictionary with
the variable names as keys and their values
as values
THE_EMPTY_ENVIRONMENT = []
env
def enclosingEnvironment(env): return env[1:]
def firstFrame(env): return env[0]
def extendEnvironment(var_pairs, val_pairs, base):
new_frame = {}
vars = toPythonList(var_pairs)
vals = toPythonList(val_pairs)
if len(vars) != len vals:
raise SchemeError, "Mismatched vals and vars"
for (var, val) in zip(vars, vals):
new_frame[var] = val
return new_frame + base_env
Lookup a Variable Value
def lookupVariableValue(var, env):
while True:
if env == THE_EMPTY_ENVIRONMENT:
raise SchemeError,"Unbound var “+var
frame = firstFrame(env)
if frame.has_key(var):
return frame[var]
env = enclosingEnvironment(env)
Define/Set a Variable
def defineVariable(var, val, env):
firstFrame(env)[var] = val
def setVariableValue(var, val, env):
while True:
if env == THE_EMPTY_ENVIRONMENT:
raise SchemeError, "Unbound variable -- SET! " + var
top = firstFrame(env)
if top.has_key(var):
top[var] = val
return
env = enclosingEnvironment(env)
Builtins
We’ll define a Python function to handle each of
the primitive Scheme functions
Many List functions take any number of args:
• (+ 1 2) => 3
• (+ 1 2 3 4 5) => 15
• (+ ) => 0
We can takuse Python’s (new) syntax for
functions that take any number or args, e.g.:
• If the last parameter in a function’s parameter list is
preceded by a *, it’s bound to a list of the remaining args
• def add (*args): sum(args)
Builtins
def allNumbers(numbers):
for n in numbers:
if type(n) not in (types.IntType, types.LongType,
types.FloatType):
return 0
return 1
def schemeAdd(*numbers):
if not allNumbers(numbers):
raise SchemeError, "prim + - non-numeric arg”
return sum(numbers)
Setting up the initial environment
def setupEnvironment():
PRIME_PROCEDURES = [ ["car", pair.car],
["cdr", pair.cdr], ["+", schemeAdd], ... ]
init_env = env.extendEnvironment(
pair.NIL, pair.NIL, env.THE_EMPTY_ENVIRONMENT)
for name, proc in PRIME_PROCEDURES:
p = cons(symbol("primitive"), cons(proc, NIL))
defineVariable(symbol(name), p, env)
defineVariable(symbol("#t"),symbol("#t"), init_env)
defineVariable(symbol("#f"), symbol("#f"), init_env)
return initial_environment