sfultong / stand-in-language

a simple total pure functional language, eventually to have powerful static checking and optimization

License: Apache License 2.0

Haskell 87.55% Makefile 0.08% C 7.34% CMake 0.14% C++ 2.95% Nix 0.84% Scheme 1.10%

language pure-function

stand-in-language's People

gitter-badger mkloczko cocreature jeltsch tluyben

stand-in-language's Issues

Fix partial evaluation

Uncomment and debug the code in Eval.hs.

LLVM: calculate the size of pairheap necessary to execute grammar

Right now there's a constant called heapSize, which is ugly. When the type of a SIL expression is data, then we should be able to calculate exactly how many pairs will be generated by running the expression. Then we can get rid of heapSize.
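
As a rough illustration of what "how many pairs" means for a pure data value, here is a minimal sketch. It assumes a hypothetical two-constructor `DataValue` type (not the project's actual expression type) and only counts the pairs in an already evaluated value. Predicting that number from an unevaluated expression is the open problem this issue describes.

```haskell
-- Hypothetical stand-in for a fully evaluated SIL data value.
-- This is an assumption for illustration, not the project's real type.
data DataValue
  = Zero
  | Pair DataValue DataValue
  deriving (Eq, Show)

-- Number of pair cells needed to store a value: one per Pair node.
pairCount :: DataValue -> Int
pairCount Zero       = 0
pairCount (Pair a b) = 1 + pairCount a + pairCount b

-- The value written {{0,0},0} in SIL surface syntax.
exampleValue :: DataValue
exampleValue = Pair (Pair Zero Zero) Zero
```

If a bound like `pairCount` could be derived statically from a data-typed expression, it could in principle replace the fixed heapSize constant.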

Formal description of operational semantics

It would be nice to have a formal description (in the form of some text document) of the operational semantics of SIL that can be used as a reference instead of having to look at the interpreters and compilers scattered throughout the codebase.

make parser capable of treating AST instructions as first-class symbols

E.g. you should be able to pass AST instructions as arguments, like app (pair zero zero) left.

Right now, in a .sil file, we could have a line like
test = left {0,0}

left is a keyword, and the parser always expects it to be followed by an argument.

So if I wanted to use it in a function, I'd have to do something like
test = (\f -> f {0,0}) (\x -> left x)

I want to be able to do this instead:
test = (\f -> f {0,0}) left

So we can parse left as a regular function, but it should also generate the same grammar as before when parsing old syntax (e.g. left {0,0})
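
One possible way to get both behaviours is sketched below. It assumes hypothetical `Surface` and `Core` syntax trees (neither is the project's real AST) and works as a desugaring step rather than a full parser: an applied `left` produces the same node as before, while a bare `left` is eta-expanded into a lambda.

```haskell
-- Hypothetical surface syntax, where the keyword may appear on its own.
data Surface
  = SVar String
  | SZero
  | SPair Surface Surface
  | SLam String Surface
  | SApp Surface Surface
  | SLeftKeyword
  deriving (Eq, Show)

-- Hypothetical core syntax, where Left always carries its argument.
data Core
  = CVar String
  | CZero
  | CPair Core Core
  | CLam String Core
  | CApp Core Core
  | CLeft Core
  deriving (Eq, Show)

desugar :: Surface -> Core
-- Old syntax, e.g. left {0,0}: same output as before.
desugar (SApp SLeftKeyword arg) = CLeft (desugar arg)
-- Bare left: eta-expand to \x -> left x so it can be passed around.
desugar SLeftKeyword            = CLam "x" (CLeft (CVar "x"))
desugar (SVar name)             = CVar name
desugar SZero                   = CZero
desugar (SPair a b)             = CPair (desugar a) (desugar b)
desugar (SLam name body)        = CLam name (desugar body)
desugar (SApp f arg)            = CApp (desugar f) (desugar arg)
```

The order of the first two equations matters: the applied form has to be matched before the general application case, otherwise `left {0,0}` would become an application of the eta-expanded lambda instead of the original node.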

optimize two forms of lambda

There are two forms of lambda in the code: (Pair x Env), the regular kind, and (Pair x Zero), which I call a "complete lambda". The former binds its outside environment before a value is applied to it; the latter does not.

I have put the wrong form in certain places (and this will eventually show up as errors).

The complete lambdas should only be used when the type of 'x' contains no functions.

We can parse into the intermediate parse AST that only uses regular lambdas, then in an optimization pass convert all eligible lambdas to complete lambdas.

mini-repl doesn't resolve names for type inference

When I do :t succ in the repl, I would expect to get the same thing as :t (\n -> (n,0))

remove Twiddle instruction

This instruction is redundant.

Remove if/then/else from builtin parsing, make ITE function in Prelude

This might actually be a bad idea, but I generally don't like syntactic sugar.

research simpler formulation of Church numerals

This might not produce anything

Fix whitespace/indent parsing

Memoization

implement hashing of terms

define llvm functions by their hashes (so automatic deduplication)

Have some sort of memoization store with garbage collection (garbage collection will be run on each iteration of evalLoop or similar "frame")
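
The first two points could look roughly like the sketch below. Everything in it is an assumption for illustration: the `Term` type is a hypothetical subset of instructions, the hash is a simple FNV-style mix rather than a scheme chosen by the project, and hash collisions are ignored. Garbage collection of the store is not covered.

```haskell
import           Data.Bits       (xor)
import qualified Data.Map.Strict as Map
import           Data.Word       (Word64)

-- Hypothetical subset of terms, not the project's real instruction set.
data Term
  = TZero
  | TPair Term Term
  | TLeft Term
  | TRight Term
  deriving (Eq, Show)

-- FNV-1a style mixing step; Word64 arithmetic wraps on overflow.
combine :: Word64 -> Word64 -> Word64
combine h x = (h `xor` x) * 1099511628211

-- Structural hash: a constructor tag mixed with the hashes of the children.
hashTerm :: Term -> Word64
hashTerm term = case term of
  TZero     -> combine seed 0
  TPair a b -> combine (combine (combine seed 1) (hashTerm a)) (hashTerm b)
  TLeft a   -> combine (combine seed 2) (hashTerm a)
  TRight a  -> combine (combine seed 3) (hashTerm a)
  where
    seed = 14695981039346656037

-- Store keyed by hash, so structurally equal terms share one entry.
-- Collisions between different terms are not handled in this sketch.
type TermStore = Map.Map Word64 Term

intern :: Term -> TermStore -> TermStore
intern t = Map.insertWith (\_new old -> old) (hashTerm t) t
```

Naming generated functions after `hashTerm` values would give the deduplication mentioned above, since two identical terms map to the same key.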

Track down type checking bug

Running the line unitTestQC "DataTypedCorrectlyTypeChecks" 1000 quickcheckDataTypedCorrectlyTypeChecks usually freezes the test suite. This is most likely caused by a type checking bug, which should be fixed.

Memory management

I thought a bit about how memory management in SIL could work, so let’s use this issue to track ideas.

1. The first option would be to use garbage collection. Integrating a conservative garbage collector (e.g. Boehm GC) with LLVM seems doable. However, integrating a precise garbage collector (or even writing a simple one from scratch) is probably a lot of work (and I’m not really familiar with that).
2. There is some cool work called ASAP: As Static As Possible memory management which basically boils down to doing as much static memory management as you can infer using various static analyses and generating code to handle the rest at runtime. In terms of complexity, this is definitely non-trivial but I would say it is far less work than getting precise garbage collection right. However, it’s worth pointing out that this is fairly new research and I’ve only read part of the thesis and not attempted to implement anything yet.
3. Try to stick somewhat close to the current approach but try to improve it in various ways. This ties in with #7. Having thought about #7 for a bit, I haven’t come up with a way to calculate the size that would be significantly cheaper than just executing the code. That makes me somewhat pessimistic about going down that route since at runtime that would not be useful and at compile time you might as well perform constant folding. You have definitely thought more about this so if you have some idea on how this would be beneficial, let me know!

Personally, I think the two most promising options are the following:

1. A conservative garbage collector. This has the advantage of being the simplest option to implement, and I think conservative garbage collection would also work reasonably well for our use case.
2. Going down the ASAP route. This would definitely be the most interesting option from a research perspective, while probably still being simpler than precise garbage collection.

Reconsider merging zero pair types

SIL’s type system currently has a special case where PairType ZeroType ZeroType is merged to ZeroType. As far as I understand (please correct me if I’m wrong!) the main motivation for this is to have a single type for integers (which are represented as left-associated, nested pairs with 0 elements). IMHO this approach is somewhat problematic and worth reconsidering:

It is confusing, as evidenced by the fact that both @mkloczko and I were confused by this.
It is pretty uncommon; I have never seen such a case in any other type system. Obviously that doesn’t necessarily mean it’s a bad idea, but it’s at least something worth keeping in mind. It might also be worth trying to do a type safety proof for this if you want to stick to the current approach. At least to me it’s not obvious that this doesn’t break in some weird way (but I haven’t come up with a concrete example so far).
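
To make the special case concrete, here is a minimal sketch of how such a merge could behave. The constructor names `ZeroType` and `PairType` come from the description above; `ArrType`, `mergePairs` and `numeralType` are hypothetical names introduced for this example, and the actual type checker may implement the rule differently.

```haskell
-- ZeroType and PairType are named in this issue; ArrType is an assumed
-- function type constructor, included only to show the recursion.
data SilType
  = ZeroType
  | PairType SilType SilType
  | ArrType SilType SilType
  deriving (Eq, Show)

-- Bottom-up merge: any pair whose components both merge to ZeroType
-- collapses to ZeroType itself.
mergePairs :: SilType -> SilType
mergePairs ZeroType       = ZeroType
mergePairs (PairType a b) =
  case (mergePairs a, mergePairs b) of
    (ZeroType, ZeroType) -> ZeroType
    (a', b')             -> PairType a' b'
mergePairs (ArrType a b)  = ArrType (mergePairs a) (mergePairs b)

-- Unmerged type of the numeral n, built as n nested applications of
-- \x -> (x, 0) to 0.
numeralType :: Int -> SilType
numeralType n
  | n <= 0    = ZeroType
  | otherwise = PairType (numeralType (n - 1)) ZeroType
```

Under this rule every numeral type collapses to `ZeroType`, which gives integers a single type, but so does any other pure data structure built only from zeros and pairs. That loss of information is part of why the behaviour is surprising.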

If my understanding of the motivation for the current behavior is correct, then I propose that we remove this special case in the type system and instead add primitive integers (either fixed size or arbitrary-sized integers) to IExpr and the type system. This would also help with compile time performance and memory usage and remove the need for reconstructing integers in the intermediate representation.

Regardless of whether you want to stick to the current approach or not, I would very much like to see a formalization of the type system (just in textual form, e.g. typing rules in LaTeX). I believe that would make it significantly easier to understand SIL and help new contributors get started. (I’m happy to work on this, if you want me to).

Fix LLVM memory use

I want to be able to run all unit tests without running out of memory. Let's say that running them all should take no more than 16 GB of memory.

Cannot build because of missing library “jumper”

When trying to build SIL using cabal new-build, I get the following error message:

cabal: Missing dependency on a foreign library:
* Missing (or bad) C library: jumper

Unfortunately, I have no clue what that jumper library is and from where to get it.

This technique should also be used for let bindings.

If it makes sense, also fix parsing so that declarations don't have to come before they are used, and make sure recursive definitions are disallowed.