lisp-interpreter

A bare-bones lisp parser & interpreter. A demo of the interpreter compiled to WASM can be found at marcusklaas.nl/lisp.

Features

Parses and evaluates simple lisp-like statements. Its features include lambdas, closures and currying. All data is immutable and the only available types are unsigned integers, booleans, functions and lists. The interpreter simulates its own stack, so recursion is not bounded by the interpreter's native stack size.

Available built-in functions:

function name    arguments         output type
add1             int               int
sub1             positive int      int
zero?            int               bool
car              non-empty list    *
cdr              non-empty list    list
null?            list              bool
cons             *, list           list
define           name, *           empty list
lambda           list, *           function
cond             bool, *, *        *
list             *                 list
int?             *                 bool
bool?            *                 bool
list?            *                 bool
fun?             *                 bool

Further, the main binary introduces some convenience functions, including add, mult, map, filter, >, sort, append, not and and. These are defined in terms of the built-in functions above.

Example evaluations:

> (add1 (add1 3))
5
> (define sub (lambda (x y) (cond (zero? y) x (sub (sub1 x) (sub1 y)))))
()
> (list sub)
(func[2 -> t(func[cond] (func[zero?] $1) $0 t(sub (func[sub1] $0) (func[sub1] $1)))])
> (sub 5 3)
2
> (sub 10)
func[1 -> t(func[2 -> t(func[cond] (func[zero?] $1) $0 t(sub (func[sub1] $0) (func[sub1] $1)))] 1 $0)]
> (:last 1)
9
> (sub 1 5)
Evaluation error: SubZero

Installation

Make sure you have rust installed:

$ curl https://sh.rustup.rs -sSf | sh

Clone the repo

$ git clone https://github.com/marcusklaas/lisp-interpreter.git

Build using cargo

$ cd lisp-interpreter
$ cargo build --release

To start the command-line REPL, run

$ cd yalp-repl
$ cargo run --release

Execute the test suite using

$ cargo test

Some technical details

This lisp interpreter has a fairly simple design. It goes through the usual interpretation steps: it parses input strings into a sequence of tokens, builds an AST, does some (light) analysis on this AST to produce a "finalized" AST, and finally compiles this into bytecode. The unit of compilation and execution is the function. Execution is done by keeping a stack of values, onto which function arguments and return values are pushed and popped. Every time a function is called, a new stack reference is created, which contains a pointer to the function's bytecode, an instruction pointer, and the position of its arguments on the stack. When a function calls another function, its own stack reference is pushed onto the reference stack (unless analysis showed that this call is a tail call) and the current stack reference is replaced by that of the callee. Whenever a function returns, the stack reference of the calling function is popped off the reference stack and execution continues there.
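
The following is a rough, self-contained sketch of that execution model. None of these names come from the crate; Value, Instr, StackRef and run are invented here for illustration, and the instruction set is cut down to the bare minimum.

use std::rc::Rc;

// Illustrative sketch of the value stack plus reference stack described above.
#[derive(Clone)]
enum Value {
    Int(u64),
    Bool(bool),
}

#[derive(Clone)]
enum Instr {
    PushInt(u64),
    AddOne,
    // A call carries the callee's bytecode, its argument count and whether
    // analysis marked it as a tail call.
    Call { code: Rc<Vec<Instr>>, arg_count: usize, tail: bool },
    Return,
}

// One stack reference per active call: a pointer to the function's bytecode,
// an instruction pointer, and the position of its arguments on the value stack.
struct StackRef {
    code: Rc<Vec<Instr>>,
    ip: usize,
    arg_offset: usize,
}

fn run(main: Rc<Vec<Instr>>) -> Option<Value> {
    let mut values: Vec<Value> = Vec::new();
    let mut refs: Vec<StackRef> = Vec::new();
    let mut current = StackRef { code: main, ip: 0, arg_offset: 0 };

    loop {
        let instr = current.code[current.ip].clone();
        current.ip += 1;
        match instr {
            Instr::PushInt(n) => values.push(Value::Int(n)),
            Instr::AddOne => {
                if let Some(Value::Int(n)) = values.pop() {
                    values.push(Value::Int(n + 1));
                }
            }
            Instr::Call { code, arg_count, tail } => {
                let callee = StackRef { code, ip: 0, arg_offset: values.len() - arg_count };
                // A tail call replaces the current stack reference instead of saving it.
                if !tail {
                    refs.push(current);
                }
                current = callee;
            }
            Instr::Return => match refs.pop() {
                // Resume the caller where it left off.
                Some(frame) => current = frame,
                // The outermost call finished; its result is on top of the value stack.
                None => return values.pop(),
            },
        }
    }
}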

To keep the design simple, this interpreter does not use a garbage collector. Functions are reference counted and all other values are cloned or moved. Values cannot be mutated by programs, although the interpreter does mutate values internally at execution time as an optimization. There is a single environment that holds definitions. Definitions cannot be overwritten.
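
Concretely, only the function variant of the value type needs a reference count; everything else is cloned or moved. The definition below is an assumption made for illustration, not the crate's real value type.

use std::rc::Rc;

// Sketch: functions are shared via Rc, all other values are cloned or moved.
#[derive(Clone)]
enum Value {
    Int(u64),
    Bool(bool),
    List(Vec<Value>),     // cloning a list copies its elements
    Func(Rc<CustomFunc>), // cloning a function only bumps a reference count
}

struct CustomFunc {
    arg_count: usize,
    // body and bytecode elided
}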

Because the set of built-in functions is so sparse, writing performant code for this interpreter is generally not possible. However, it does perform elementary operations relatively quickly. For example, the prelude function add, which recursively adds 1 to the first argument and subtracts 1 from the second until the second argument is zero, is about twice as fast as the following loop in PHP 7.1.8:

function add(int $a, int $b): int {
    while($b > 0) {
        $a += 1;
        $b -= 1;
    }
    return $a;
}

However, it is about five times slower than a similar loop in V8.

Code that heavily relies on list operations should be reasonably fast, as lists are internally represented by vectors rather than the linked lists common to many other interpreters. Again, since there are few built-in functions, common operations like appending lists are not optimized (appending does a list reversal as an intermediate step!) and are much slower than they could be with a better optimizing compiler.
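
For illustration, here is a rough Rust sketch of how such a Vec-backed list behaves, under the assumption (see the "Reverse lists" issue below) that the head of a list sits at the back of the vector, so cons and car are O(1) push/pop operations, and of why an append built only from these primitives goes through an intermediate reversal. This is not the crate's code, just a model of the cost.

// Assumed representation: the head of the lisp list is the *last* element of the Vec.
type List = Vec<u64>;

fn cons(x: u64, mut list: List) -> List {
    list.push(x); // O(1): the new head goes at the back
    list
}

fn car(list: &List) -> u64 {
    *list.last().expect("car of empty list")
}

fn cdr(mut list: List) -> List {
    list.pop(); // O(1): drop the head
    list
}

// Append written purely in terms of the primitives above, the way a prelude
// function has to: reverse the first list, then cons its elements onto the
// second. The intermediate reversal is the extra cost mentioned above.
fn append(a: List, b: List) -> List {
    let mut reversed = Vec::new();
    let mut rest = a;
    while !rest.is_empty() {
        reversed = cons(car(&rest), reversed);
        rest = cdr(rest);
    }
    let mut out = b;
    let mut rest = reversed;
    while !rest.is_empty() {
        out = cons(car(&rest), out);
        rest = cdr(rest);
    }
    out
}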

Adding loop analysis and mathy substitution rules (repeated increments = addition, repeated addition = multiplication, etc.) would be a cool project that could significantly speed up common operations.

lisp-interpreter's Issues

Unreverse bytecode execution order

We tried this before and decided against it because of an inexplicable performance hit, but we should reconsider. This has the potential to significantly clean up the instruction generation code and should (theoretically) perform as well as the reversed execution order.

Add FinalizedExpr type

A fully processed expression that is ready to be compiled. It should have no variable references any more. Tail/self calls should be resolved. Each macro should have its own type.

We need this because we're doing the same checks multiple times when processing expressions.
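
A hedged sketch of what the type could look like; the variant names below are guesses based on this description and the debug output elsewhere in the issues, not an existing definition.

// `Value` is a stand-in for the interpreter's runtime value type.
enum Value { Int(u64), Bool(bool) }

// A fully processed expression, ready for compilation: variable references
// are already resolved to argument offsets and every call is already
// classified, so later passes never have to repeat those checks.
enum FinalizedExpr {
    Value(Value),
    Argument { stack_offset: usize, scope: usize },
    Cond(Box<FinalizedExpr>, Box<FinalizedExpr>, Box<FinalizedExpr>),
    FunctionCall {
        callee: Box<FinalizedExpr>,
        args: Vec<FinalizedExpr>,
        is_tail_call: bool,
        is_self_call: bool,
    },
    Lambda { arg_count: usize, body: Box<FinalizedExpr> },
}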

Reverse lists

cons and car currently take off the last element of lists instead of the first. This is very unorthodox in lisps. We should fix this.

use library for parsing

Writing a parser from scratch has been fun, but using a parser combinator or parser generator may be more maintainable. Try nom or lalrpop.

Church numerals do not work

(((lambda (n f x) (f (n f x))) (lambda (f x) x)) add1 0) results in a NonFunctionApplication error when it should return 1.

It tries to apply a false to some value, which is probably an indication that we've messed up the move analysis.

add a list function

So we can have things like (x y a) always represent function application.

Add move value instruction

Swap a dummy value into the argument's stack slot and push the real value. Only do this when the argument is used no more than once in a function!
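
The idea sketched below (placeholder names, not the crate's code) is to move the argument out of its slot instead of cloning it, leaving a cheap dummy behind.

use std::mem;

#[derive(Clone)]
enum Value { Int(u64), Bool(bool), List(Vec<u64>) }

// Clone-based argument access: potentially expensive for large lists.
fn push_argument_clone(stack: &mut Vec<Value>, arg_offset: usize) {
    let arg = stack[arg_offset].clone();
    stack.push(arg);
}

// Move-based access: swap a dummy into the argument's slot and push the real
// value. Only valid when the argument is used at most once in the function,
// since any later use would see the dummy instead.
fn push_argument_move(stack: &mut Vec<Value>, arg_offset: usize) {
    let arg = mem::replace(&mut stack[arg_offset], Value::Bool(false));
    stack.push(arg);
}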

Add recurse instruction

We're spending a lot of time getting and cloning functions - there's no need for that when we're straight up recursing.

implement trampolining

We run out of stack space pretty quickly, so we'll have to emulate our own stack on the heap and run evaluation in a loop? Not sure of the implementation details yet. We should probably have some state machine execute evaluations instead of regular functions? Not sure if that would prevent stack growth.
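
A generic sketch of the trampolining technique, independent of this crate's types: evaluation returns either a finished value or the next piece of work, and a flat loop drives it, so nesting depth no longer consumes host stack.

// Generic trampoline sketch: instead of eval() calling itself recursively,
// each step returns either a final value or a boxed continuation, and a
// loop drives evaluation with constant host-stack depth.
enum Step {
    Done(u64),
    More(Box<dyn FnOnce() -> Step>),
}

fn trampoline(mut step: Step) -> u64 {
    loop {
        match step {
            Step::Done(value) => return value,
            Step::More(next) => step = next(),
        }
    }
}

// Example: a countdown that would overflow the stack as plain recursion.
fn countdown(n: u64) -> Step {
    if n == 0 {
        Step::Done(0)
    } else {
        Step::More(Box::new(move || countdown(n - 1)))
    }
}

fn main() {
    // Ten million "recursive" steps, constant stack usage.
    assert_eq!(trampoline(countdown(10_000_000)), 0);
}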

Do list splits more efficiently

Expressions that both car and cdr the same list are common. This split can be done more efficiently than it is currently done (the current implementation uses at least one clone).
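
For an owned, Vec-backed list the split needs no clone at all; a minimal sketch, assuming (as in the "Reverse lists" issue above) that the head of the list is stored at the back of the Vec:

// Split an owned list into (car, cdr) without cloning: pop the head off the
// backing Vec and reuse the remainder as the cdr. Returns None for the empty list.
fn split(mut list: Vec<u64>) -> Option<(u64, Vec<u64>)> {
    list.pop().map(|head| (head, list))
}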

Move through instructions instead of pushing/ popping

The current approach works OK, but it has drawbacks.

Firstly, the instruction stack grows very large for deeply recursive calls. This is because we now push the instructions for both branches of a condition.

Further, we clone a vector onto the instruction stack for each function call. Since instructions are not Copy, this is pretty expensive.

It's not immediately clear what the ideal solution here is. We cannot precompile the bytecode for all functions and add them to a large vector since new functions can (only!) be made during runtime. Just pushing to a big vector whenever a function is created at runtime may work, but this may overflow when many one-shot functions are created, for example in (map (lambda (x) (lambda (y) (list x y))) (range 1 10000)).

Ideally, we don't ever clone instructions and yet throw away whatever we know won't run any more. One way to do this is by introducing (another!) stack of Rc<Vec<Instr>> and usize pairs, where the usize points to the current instruction in the instruction vector. Instead of creating two more vectors, these values should probably be combined with the stack of value stack pointers into one vector of structs, each containing an instruction pointer, a value pointer and the current instruction vector.
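
A hedged sketch of that combined frame struct (the field names are assumptions):

use std::rc::Rc;

// Stand-in for the interpreter's instruction type.
enum Instr { Nop }

// One frame per active call, replacing the separate instruction and
// value-pointer stacks with a single Vec<Frame>.
struct Frame {
    instructions: Rc<Vec<Instr>>, // shared instruction vector; cloning only bumps a refcount
    instr_pointer: usize,         // index of the current instruction
    value_pointer: usize,         // where this call's values start on the value stack
}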

Introduce more efficient instruction storage

Instruction vectors are now Rc<RefCell<Vec<Instr>>>, which is inefficient since it may take as many as three indirections to reach an instruction, plus some runtime checking (because of the RefCell). We should probably introduce a new type which combines the properties of Rc, RefCell and Vec. It should be possible to have only a single level of indirection and no runtime checks other than the bounds check.
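
If the interior mutability turns out not to be needed once a function's bytecode is finalized, one step in that direction that requires no custom type is a shared slice; this is only a suggestion under that assumption.

use std::rc::Rc;

// Stand-in for the real instruction type.
enum Instr { Nop }

// Rc<[Instr]> keeps the reference counts and the instructions in a single
// allocation: one indirection to reach an instruction, no RefCell, and no
// runtime check beyond the bounds check.
fn finalize(instructions: Vec<Instr>) -> Rc<[Instr]> {
    Rc::from(instructions) // moves the instructions into the shared allocation
}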

list only lisp?

Could we write a lisp that has no integers, but only lists? Where we would internally represent ints as ordinals maybe?

We would have to do the recursion on the heap instead of the stack, though, because it would probably overflow very quickly.

Add almost-compiled bytecode

Basically this would be a version of bytecode that can have references to variables and arguments, which can be substituted to create closures.

Bug in argument recursion copy elision

Test case:

(define popn (lambda (l n) (cond (zero? n) l (popn (cdr l) (sub1 n)))))
(popn (list 1 2 3) 2)

returns an ArgumentTypeMismatch error.

This happens because we do partial copy elision from the back instead of the front and that won't work (I think).

Ternary expressions won't return curried functions?

(cond #f 0 (add 1)) evaluates to 0. How weird is that? This expression compiles to

Return,
PushValue(Integer(0)),
EvalFunction(1, Some(0)),
PushValue(Function(Custom(CustomFunc(InnerCustomFunc { arg_count: 2, body: Cond((FunctionCall(Value(Function(BuiltIn(CheckZero))), [Argument(StackOffset(1), Scope(0), NeedFull)], false, false), Argument(StackOffset(0), Scope(0), Unconstrained), FunctionCall(Variable(InternedString(3)), [FunctionCall(Value(Function(BuiltIn(AddOne))), [Argument(StackOffset(0), Scope(0), Unconstrained)], false, false), FunctionCall(Value(Function(BuiltIn(SubOne))), [Argument(StackOffset(1), Scope(0), Unconstrained)], false, false)], true, true)), false, true, CanTailCall), returns: false, byte_code: UnsafeCell })))),
PushValue(Integer(1)),
CondJump(3),
PushValue(Boolean(false))]

The EvalFunction is probably not treated as a tail-call during execution. Or it should be followed by a Return instruction.

Return analysis?

So we have pretty efficient argument copy elision for tail calls and recursive calls, but nothing yet for non-tail calls. We could analyse, per end-point of a function, whether or not all arguments have been moved. If this is the case, we could replace the regular way of moving arguments (swapping them with a false) by simply copying the bytes to the top of the stack. When the function returns, we would then splice the function's arguments from the value stack without dropping them. Profiling has shown that a lot of time is spent dropping items from the value stack, even though they have usually been replaced and therefore no cleanup is required. We could gain a lot of time here.

This would entail introducing a Return instruction, which encapsulates whether or not the arguments need dropping. We would no longer need to check the instruction pointer on every loop, which would also be a performance win.

Downsides are mostly implementation costs: this will almost certainly entail some unsafe code. Worse, the current infrastructure cannot really accommodate additional late-stage analysis. To make this feasible, we should probably introduce another intermediate expression type, between the FinalizedExpr and Vec<Instr>. It will be a lot of work.
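
A sketch of what the instruction could carry (the shape is an assumption, not a design decision):

// Sketch: a Return instruction that records, per function end-point, whether
// the arguments still need to be dropped from the value stack.
enum Instr {
    // ...other instructions elided...
    Return {
        arg_count: usize,
        // false when analysis proved every argument was already moved out,
        // so returning can splice the stack without running any destructors.
        drop_arguments: bool,
    },
}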

Fix shadowing issues

(lambda (x) (lambda (x) x)) should return the identity function regardless of input variable.
Test case:

(define f (lambda (x) (lambda (x) x)))
(list ((f #t) 3) ((f (list)) #f))
(3 #f)

Figure out proper implementation of CustomFunc

Maybe it should be

struct CustomFunc(Rc<RefCell<InnerCustomFunc>>);

where

enum InnerCustomFunc {
    Body(LispExpr),
    Finalized {
        body: FinalizedExpr,
        instr: Rc<Vec<Instr>>,
        specializations: HashMap<SmallVec<ArgType>, Rc<Vec<Instr>>>,
    }
}

?

Would that let us add new specializations when multiple copies of the function are in expressions somewhere and possibly have instructions on the stack? I believe so.

Do note that getting/setting specializations is pretty expensive in this setup. It may be necessary to get the flexibility we need, though.

Add range function to prelude

I feel like this should work, but it doesn't :-(

(define range (lambda (start end) (cond (> end start) (cons end (range start (sub1 end))) (list start))))
