lisp-interpreter

A bare-bones lisp parser & interpreter. A demo of the interpreter compiled to WASM can be found at marcusklaas.nl/lisp.

Features

Parses and evaluates simple lisp-like statements. Its features include lambdas, closures and currying. All data is immutable and the only available types are unsigned integers, booleans, functions and lists. The interpreter simulates its own stack, so recursion is not bounded by the interpreter's native stack size.

Available built-in functions:

function name    arguments         output type
add1             int               int
sub1             positive int      int
zero?            int               bool
car              non-empty list    *
cdr              non-empty list    list
null?            list              bool
cons             *, list           list
define           name, *           empty list
lambda           list, *           function
cond             bool, *, *        *
list             *                 list
int?             *                 bool
bool?            *                 bool
list?            *                 bool
fun?             *                 bool

Further, the main binary introduces some convenience functions, including add, mult, map, filter, >, sort, append, not and and. These are defined in terms of the built-in functions above.

Example evaluations:

> (add1 (add1 3))
5
> (define sub (lambda (x y) (cond (zero? y) x (sub (sub1 x) (sub1 y)))))
()
> (list sub)
(func[2 -> t(func[cond] (func[zero?] $1) $0 t(sub (func[sub1] $0) (func[sub1] $1)))])
> (sub 5 3)
2
> (sub 10)
func[1 -> t(func[2 -> t(func[cond] (func[zero?] $1) $0 t(sub (func[sub1] $0) (func[sub1] $1)))] 1 $0)]
> (:last 1)
9
> (sub 1 5)
Evaluation error: SubZero

Installation

Make sure you have rust installed:

$ curl https://sh.rustup.rs -sSf | sh

Clone the repo

$ git clone https://github.com/marcusklaas/lisp-interpreter.git

Build using cargo

$ cd lisp-interpreter
$ cargo build --release

To start the command-line REPL, run

$ cd yalp-repl
$ cargo run --release

Execute the test suite using

$ cargo test

Some technical details

This lisp interpreter has a fairly simple design. It goes through the usual interpretation steps: it parses input strings into a sequence of tokens, builds an AST, does some (light) analysis on this AST to produce a "finalized" AST, and finally compiles this into bytecode. The unit of compilation and execution is the function. Execution is done by keeping a stack of values, onto which function arguments and return values are pushed and popped. Every time a function is called, a new stack reference is created, which contains a pointer to the function's bytecode, an instruction pointer, and the position of its arguments on the stack. When a function calls another function, its own stack reference is pushed onto the reference stack (unless analysis showed that this call is a tail call) and the current stack reference is replaced by that of the callee. Whenever a function returns, the stack reference of the calling function is popped off the reference stack and execution continues there.
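
The following is a rough, self-contained sketch of that execution model. None of these names come from the crate; Value, Instr, StackRef and run are invented here for illustration, and the instruction set is cut down to the bare minimum.

use std::rc::Rc;

// Illustrative sketch of the value stack plus reference stack described above.
#[derive(Clone)]
enum Value {
    Int(u64),
    Bool(bool),
}

#[derive(Clone)]
enum Instr {
    PushInt(u64),
    AddOne,
    // A call carries the callee's bytecode, its argument count and whether
    // analysis marked it as a tail call.
    Call { code: Rc<Vec<Instr>>, arg_count: usize, tail: bool },
    Return,
}

// One stack reference per active call: a pointer to the function's bytecode,
// an instruction pointer, and the position of its arguments on the value stack.
struct StackRef {
    code: Rc<Vec<Instr>>,
    ip: usize,
    arg_offset: usize,
}

fn run(main: Rc<Vec<Instr>>) -> Option<Value> {
    let mut values: Vec<Value> = Vec::new();
    let mut refs: Vec<StackRef> = Vec::new();
    let mut current = StackRef { code: main, ip: 0, arg_offset: 0 };

    loop {
        let instr = current.code[current.ip].clone();
        current.ip += 1;
        match instr {
            Instr::PushInt(n) => values.push(Value::Int(n)),
            Instr::AddOne => {
                if let Some(Value::Int(n)) = values.pop() {
                    values.push(Value::Int(n + 1));
                }
            }
            Instr::Call { code, arg_count, tail } => {
                let callee = StackRef { code, ip: 0, arg_offset: values.len() - arg_count };
                // A tail call replaces the current stack reference instead of saving it.
                if !tail {
                    refs.push(current);
                }
                current = callee;
            }
            Instr::Return => match refs.pop() {
                // Resume the caller where it left off.
                Some(frame) => current = frame,
                // The outermost call finished; its result is on top of the value stack.
                None => return values.pop(),
            },
        }
    }
}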

To keep the design simple, this interpreter does not use a garbage collector. Functions are reference counted and all other values are cloned or moved. Values cannot be mutated by programs, although the interpreter does mutate values internally at execution time as an optimization. There is a single environment that holds definitions. Definitions cannot be overwritten.
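
Concretely, only the function variant of the value type needs a reference count; everything else is cloned or moved. The definition below is an assumption made for illustration, not the crate's real value type.

use std::rc::Rc;

// Sketch: functions are shared via Rc, all other values are cloned or moved.
#[derive(Clone)]
enum Value {
    Int(u64),
    Bool(bool),
    List(Vec<Value>),     // cloning a list copies its elements
    Func(Rc<CustomFunc>), // cloning a function only bumps a reference count
}

struct CustomFunc {
    arg_count: usize,
    // body and bytecode elided
}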

Because the set of built-in functions is so sparse, writing performant code for this interpreter is generally not possible. However, it does perform elementary operations relatively quickly. For example, the prelude function add, which recursively adds 1 to the first argument and subtracts 1 from the second until the second argument is zero, is about twice as fast as the following loop in PHP 7.1.8:

function add(int $a, int $b): int {
    while($b > 0) {
        $a += 1;
        $b -= 1;
    }
    return $a;
}

However, it is about five times slower than a similar loop in V8.

Code that heavily relies on list operations should be reasonably fast, as lists are internally represented by vectors rather than the linked lists common to many other interpreters. Again, since there are few built-in functions, common operations like appending lists are not optimized (appending does a list reversal as an intermediate step!) and are much slower than they could be with a better optimizing compiler.
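
For illustration, here is a rough Rust sketch of how such a Vec-backed list behaves, under the assumption (see the "Reverse lists" issue below) that the head of a list sits at the back of the vector, so cons and car are O(1) push/pop operations, and of why an append built only from these primitives goes through an intermediate reversal. This is not the crate's code, just a model of the cost.

// Assumed representation: the head of the lisp list is the *last* element of the Vec.
type List = Vec<u64>;

fn cons(x: u64, mut list: List) -> List {
    list.push(x); // O(1): the new head goes at the back
    list
}

fn car(list: &List) -> u64 {
    *list.last().expect("car of empty list")
}

fn cdr(mut list: List) -> List {
    list.pop(); // O(1): drop the head
    list
}

// Append written purely in terms of the primitives above, the way a prelude
// function has to: reverse the first list, then cons its elements onto the
// second. The intermediate reversal is the extra cost mentioned above.
fn append(a: List, b: List) -> List {
    let mut reversed = Vec::new();
    let mut rest = a;
    while !rest.is_empty() {
        reversed = cons(car(&rest), reversed);
        rest = cdr(rest);
    }
    let mut out = b;
    let mut rest = reversed;
    while !rest.is_empty() {
        out = cons(car(&rest), out);
        rest = cdr(rest);
    }
    out
}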

Adding loop analysis and mathy substitution rules (repeated increments = addition, repeated addition = multiplication, etc.) would be a cool project that could significantly speed up common operations.

lisp-interpreter's Issues

Unreverse bytecode execution order

We tried this before and decided against it because of an inexplicable performance hit, but we should reconsider. This has the potential to significantly clean up the instruction generation code and should (theoretically) perform as well as the reversed execution order.

Add FinalizedExpr type

A fully processed expression that is ready to be compiled. It should have no variable references any more. Tail/self calls should be resolved. Each macro should have its own type.

We need this because we're doing the same checks multiple times when processing expressions.
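
A hedged sketch of what the type could look like; the variant names below are guesses based on this description and the debug output elsewhere in the issues, not an existing definition.

// `Value` is a stand-in for the interpreter's runtime value type.
enum Value { Int(u64), Bool(bool) }

// A fully processed expression, ready for compilation: variable references
// are already resolved to argument offsets and every call is already
// classified, so later passes never have to repeat those checks.
enum FinalizedExpr {
    Value(Value),
    Argument { stack_offset: usize, scope: usize },
    Cond(Box<FinalizedExpr>, Box<FinalizedExpr>, Box<FinalizedExpr>),
    FunctionCall {
        callee: Box<FinalizedExpr>,
        args: Vec<FinalizedExpr>,
        is_tail_call: bool,
        is_self_call: bool,
    },
    Lambda { arg_count: usize, body: Box<FinalizedExpr> },
}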

Reverse lists

cons and car currently take off the last element of lists instead of the first. This is very unorthodox in lisps. We should fix this.

use library for parsing

Writing a parser from scratch has been fun, but using a parser combinator or parser generator may be more maintainable. Try nom or lalrpop.

Church numerals do not work

(((lambda (n f x) (f (n f x))) (lambda (f x) x)) add1 0) results in a NonFunctionApplication error when it should return 1.

It tries to apply a false to some value, which is probably an indication that we've messed up the move analysis.

add a list function

So we can have things like (x y a) always represent function application.

Add move value instruction

Swap a dummy value into the argument's stack slot and push the real value. Only do this when the argument is used no more than once in a function!
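
The idea sketched below (placeholder names, not the crate's code) is to move the argument out of its slot instead of cloning it, leaving a cheap dummy behind.

use std::mem;

#[derive(Clone)]
enum Value { Int(u64), Bool(bool), List(Vec<u64>) }

// Clone-based argument access: potentially expensive for large lists.
fn push_argument_clone(stack: &mut Vec<Value>, arg_offset: usize) {
    let arg = stack[arg_offset].clone();
    stack.push(arg);
}

// Move-based access: swap a dummy into the argument's slot and push the real
// value. Only valid when the argument is used at most once in the function,
// since any later use would see the dummy instead.
fn push_argument_move(stack: &mut Vec<Value>, arg_offset: usize) {
    let arg = mem::replace(&mut stack[arg_offset], Value::Bool(false));
    stack.push(arg);
}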

Add recurse instruction

We're spending a lot of time getting and cloning functions - there's no need for that when we're straight up recursing.

implement trampolining

We run out of stack space pretty quickly, so we'll have to emulate our own stack on the heap and run evaluation in a loop? Not sure of the implementation details yet. We should probably have some state machine execute evaluations instead of regular functions? Not sure if that would prevent stack growth.
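
A generic sketch of the trampolining technique, independent of this crate's types: evaluation returns either a finished value or the next piece of work, and a flat loop drives it, so nesting depth no longer consumes host stack.

// Generic trampoline sketch: instead of eval() calling itself recursively,
// each step returns either a final value or a boxed continuation, and a
// loop drives evaluation with constant host-stack depth.
enum Step {
    Done(u64),
    More(Box<dyn FnOnce() -> Step>),
}

fn trampoline(mut step: Step) -> u64 {
    loop {
        match step {
            Step::Done(value) => return value,
            Step::More(next) => step = next(),
        }
    }
}

// Example: a countdown that would overflow the stack as plain recursion.
fn countdown(n: u64) -> Step {
    if n == 0 {
        Step::Done(0)
    } else {
        Step::More(Box::new(move || countdown(n - 1)))
    }
}

fn main() {
    // Ten million "recursive" steps, constant stack usage.
    assert_eq!(trampoline(countdown(10_000_000)), 0);
}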

Do list splits more efficiently

Expressions that both car and cdr the same list are common. This split can be done more efficiently than it is currently done (the current implementation uses at least one clone).
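
For an owned, Vec-backed list the split needs no clone at all; a minimal sketch, assuming (as in the "Reverse lists" issue above) that the head of the list is stored at the back of the Vec:

// Split an owned list into (car, cdr) without cloning: pop the head off the
// backing Vec and reuse the remainder as the cdr. Returns None for the empty list.
fn split(mut list: Vec<u64>) -> Option<(u64, Vec<u64>)> {
    list.pop().map(|head| (head, list))
}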

Move through instructions instead of pushing/ popping

The current approach works OK, but it has drawbacks.

Firstly, the instruction stack grows very large for deeply recursive calls. This is because we now push the instructions for both branches of a condition.

Further, we clone a vector onto the instruction stack for each function call. Since instructions are not Copy, this is pretty expensive.

It's not immediately clear what the ideal solution here is. We cannot precompile the bytecode for all functions and add them to a large vector since new functions can (only!) be made during runtime. Just pushing to a big vector whenever a function is created at runtime may work, but this may overflow when many one-shot functions are created, for example in (map (lambda (x) (lambda (y) (list x y))) (range 1 10000)).

Ideally, we don't ever clone instructions and yet throw away whatever we know won't run any more. One way to do this is by introducing (another!) stack of Rc<Vec<Instr>> and usize pairs, where the usize points to the current instruction in the instruction vector. Instead of creating two more vectors, these values should probably be combined with the stack of value stack pointers into one vector of structs, each containing an instruction pointer, a value pointer and the current instruction vector.
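
A hedged sketch of that combined frame struct (the field names are assumptions):

use std::rc::Rc;

// Stand-in for the interpreter's instruction type.
enum Instr { Nop }

// One frame per active call, replacing the separate instruction and
// value-pointer stacks with a single Vec<Frame>.
struct Frame {
    instructions: Rc<Vec<Instr>>, // shared instruction vector; cloning only bumps a refcount
    instr_pointer: usize,         // index of the current instruction
    value_pointer: usize,         // where this call's values start on the value stack
}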

Introduce more efficient instruction storage

Instruction vectors are now Rc<RefCell<Vec<Instr>>>, which is inefficient since it may take as many as three indirections to reach an instruction, plus some runtime checking (because of the RefCell). We should probably introduce a new type which combines the properties of Rc, RefCell and Vec. It should be possible to have only a single level of indirection and no runtime checks other than the bounds check.
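
If the interior mutability turns out not to be needed once a function's bytecode is finalized, one step in that direction that requires no custom type is a shared slice; this is only a suggestion under that assumption.

use std::rc::Rc;

// Stand-in for the real instruction type.
enum Instr { Nop }

// Rc<[Instr]> keeps the reference counts and the instructions in a single
// allocation: one indirection to reach an instruction, no RefCell, and no
// runtime check beyond the bounds check.
fn finalize(instructions: Vec<Instr>) -> Rc<[Instr]> {
    Rc::from(instructions) // moves the instructions into the shared allocation
}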

list only lisp?

Could we write a lisp that has no integers, but only lists? Where we would internally represent ints as ordinals maybe?

We would have to do the recursion on the heap instead of the stack, though, because it would probably overflow very quickly.

Add almost-compiled bytecode

Basically this would be a version of bytecode that can have references to variables and arguments, which can be substituted to create closures.

Bug in argument recursion copy elision

Test case:

(define popn (lambda (l n) (cond (zero? n) l (popn (cdr l) (sub1 n)))))
(popn (list 1 2 3) 2)

returns an ArgumentTypeMismatch error.

This happens because we do partial copy elision from the back instead of the front and that won't work (I think).

Ternary expressions won't return curried functions?

(cond #f 0 (add 1)) evaluates to 0. How weird is that? This expression compiles to

Return,
PushValue(Integer(0)),
EvalFunction(1, Some(0)),
PushValue(Function(Custom(CustomFunc(InnerCustomFunc { arg_count: 2, body: Cond((FunctionCall(Value(Function(BuiltIn(CheckZero))), [Argument(StackOffset(1), Scope(0), NeedFull)], false, false), Argument(StackOffset(0), Scope(0), Unconstrained), FunctionCall(Variable(InternedString(3)), [FunctionCall(Value(Function(BuiltIn(AddOne))), [Argument(StackOffset(0), Scope(0), Unconstrained)], false, false), FunctionCall(Value(Function(BuiltIn(SubOne))), [Argument(StackOffset(1), Scope(0), Unconstrained)], false, false)], true, true)), false, true, CanTailCall), returns: false, byte_code: UnsafeCell })))),
PushValue(Integer(1)),
CondJump(3),
PushValue(Boolean(false))]

The EvalFunction is probably not treated as a tail-call during execution. Or it should be followed by a Return instruction.

Return analysis?

So we have pretty efficient argument copy elision for tail calls and recursive calls, but nothing yet for non-tail calls. We could analyse, per end-point of a function, whether or not all arguments have been moved. If this is the case, we could replace the regular way of moving arguments (swapping them with a false) by simply copying the bytes to the top of the stack. When the function returns, we would then splice the function's arguments from the value stack without dropping them. Profiling has shown that a lot of time is spent dropping items from the value stack, even though they have usually been replaced and therefore no cleanup is required. We could gain a lot of time here.

This would entail introducing a Return instruction, which encapsulates whether or not the arguments need dropping. We would no longer need to check the instruction pointer on every loop, which would also be a performance win.

Downsides are mostly implementation costs: this will almost certainly entail some unsafe code. Worse, the current infrastructure cannot really accommodate additional late-stage analysis. To make this feasible, we should probably introduce another intermediate expression type, between the FinalizedExpr and Vec<Instr>. It will be a lot of work.
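
A sketch of what the instruction could carry (the shape is an assumption, not a design decision):

// Sketch: a Return instruction that records, per function end-point, whether
// the arguments still need to be dropped from the value stack.
enum Instr {
    // ...other instructions elided...
    Return {
        arg_count: usize,
        // false when analysis proved every argument was already moved out,
        // so returning can splice the stack without running any destructors.
        drop_arguments: bool,
    },
}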

Fix shadowing issues

(lambda (x) (lambda (x) x)) should return the identity function regardless of input variable.
Test case:

(define f (lambda (x) (lambda (x) x)))
(list ((f #t) 3) ((f (list)) #f))
(3 #f)

Figure out proper implementation of CustomFunc

Maybe it should be

struct CustomFunc(Rc<RefCell<InnerCustomFunc>>);

where

enum InnerCustomFunc {
    Body(LispExpr),
    Finalized {
        body: FinalizedExpr,
        instr: Rc<Vec<Instr>>,
        specializations: HashMap<SmallVec<ArgType>, Rc<Vec<Instr>>>,
    }
}

?

Would that let us add new specializations when multiple copies of the function are in expressions somewhere and possibly have instructions on the stack? I believe so.

Do note that getting/setting specializations is pretty expensive in this setup. It may be necessary to get the flexibility we need, though.

Add range function to prelude

I feel like this should work, but it doesn't :-(

(define range (lambda (start end) (cond (> end start) (cons end (range start (sub1 end))) (list start))))
