
purec's Introduction

Pure-C

An alternative backend for the PureScript programming language that targets C.

Introduction

PureScript is a high-level, statically typed, functionally pure programming language that enables various backends using its intermediate representation of elaborated programs.

PureC provides a backend that transpiles to the C programming language, thus enabling native, ahead-of-time compilation of PureScript programs. With a dead simple FFI to C, performance-critical sections can easily be implemented outside of PureScript when the need arises.

For an example of how PureC could be used, take a look at purec-uv, a project implementing both purescript-aff and bindings to libuv on top of PureC.

Features

  • Write native programs in PureScript
  • Simple FFI to C
  • Generates readable, human- and machine-debuggable C
  • Simple and efficient reference counted GC
  • Opt-in tracing GC (Boehm–Demers–Weiser garbage collector)

Use cases

  • Scrap your node.js
  • Write small, easy to distribute command line utilities
  • No JS parsing, instant start up times, native performance

Development

These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.

For now, the default makefiles use the clang compiler toolchain. So either install clang (if not installed already), or open a PR adding support for other compilers.

Prerequisites

PureC is written in PureScript and is currently not self-hosting; a full node.js runtime is required. With node.js installed (any recent version will work), install the node.js dependencies needed to build purec.js: npm install.

Additionally, we require

  • make
  • libcmocka-dev
  • valgrind

Build the purec utility

Run npm run build to build the purec.js utility.

Running the tests

npm t

Contributing

Please read CONTRIBUTING.md for details on our code of conduct, and the process for submitting pull requests to us.

Versioning

This project is alpha quality and will likely remain alpha quality for a while. That means for now there's one version, and that's origin/HEAD.

Authors

See also the list of contributors who participated in this project.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Acknowledgments

  • uthash - A hash table for C structures
  • vec - A type-safe dynamic array implementation for C
  • ccan - The C Code Archive Network
  • purescript-native - An experimental C++11/native compiler backend for PureScript

purec's People

Contributors

felixschl, lupino

purec's Issues

Fix compile errors for examples on macOS

On macOS High Sierra,

$ make examples
[...]
usage:  ar -d [-TLsv] archive file ...
	ar -m [-TLsv] archive file ...
	ar -m [-abiTLsv] position archive file ...
	ar -p [-TLsv] archive [file ...]
	ar -q [-cTLsv] archive file ...
	ar -r [-cuTLsv] archive file ...
	ar -r [-abciuTLsv] position archive file ...
	ar -t [-TLsv] archive [file ...]
	ar -x [-ouTLsv] archive [file ...]
make[3]: *** [libpurec.a] Error 1
make[2]: *** [../../libpurec.a] Error 2
make[1]: *** [main] Error 2
make: *** [examples] Error 2

Compile dependencies prior to running test suite

Also see #12

We vendored the tests that come with the purescript compiler (for now). These tests are discovered, built, and executed in https://github.com/pure-c/pure-c/blob/c578687bff74370b92bdf95976b38c8adcc7b10d/test/Main.purs#L31-L59.

All tests use this makefile that is generated exactly once per pulp test:

https://github.com/pure-c/pure-c/blob/c578687bff74370b92bdf95976b38c8adcc7b10d/test/Main.purs#L61-L83

This means that all tests share the same build directory and thus can re-use shared build artifacts. Hence, the first test to be built takes the longest, as it has to build many of the required dependencies. That first test will usually time out.

Instead, we should create a dummy target (or choose a test we consider infallible) that is built before the actual tests run.

Error due to redefinition of local variables

PureScript test "passing/1991.purs" is not passing because the currently generated C code redefines local variables (#12).

Without the inlineVariables optimization pass in the PureScript JS codegen, it spits out code looking similar to this:

var foldMap = function ($copy_dictSemigroup) {
    return function ($copy_f) {
        return function ($copy_v) {
            var $tco_var_dictSemigroup = $copy_dictSemigroup;
            var $tco_var_f = $copy_f;
            var $tco_done = false;
            var $tco_result;
            function $tco_loop(dictSemigroup, f, v) {
                var $2 = f;
                var $3 = v;
                var f1 = $2;
                if ($3.length === 5) {
                    var $4 = $3[0];
                    var a = $4;
                    var $5 = $3[1];
                    var b = $5;
                    var $6 = $3[2];
                    var c = $6;
                    var $7 = $3[3];
                    var d = $7;
                    var $8 = $3[4];
                    var e = $8;
                    $tco_done = true;
                    return Data_Semigroup.append(dictSemigroup)(f1(a))(Data_Semigroup.append(dictSemigroup)(f1(b))(Data_Semigroup.append(dictSemigroup)(f1(c))(Data_Semigroup.append(dictSemigroup)(f1(d))(f1(e)))));
                };
                var f1 = $2;
                var xs = $3;
                $tco_var_dictSemigroup = dictSemigroup;
                $tco_var_f = f1;
                $copy_v = xs;
                return;
            };
            while (!$tco_done) {
                $tco_result = $tco_loop($tco_var_dictSemigroup, $tco_var_f, $copy_v);
            };
            return $tco_result;
        };
    };
};

JavaScript doesn't care that e.g. f1 gets redefined at will, but we're not so fortunate in C. I think we should be able to generate unoptimized code that still compiles. We could forward declare variable introductions on a per-block level and then assign to them later. This rewrite pass would run on the initially generated AST.
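The forward-declaration idea can be sketched in plain C as follows. This is only an illustration under assumptions: `value_t` and `pick` are hypothetical stand-ins, not purec's actual types or generated output. Every name introduced in a block is declared once at the top, and each later introduction becomes a plain assignment.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the runtime's value type. */
typedef struct value { int n; } value_t;

/* Mirrors the shape of $tco_loop above: `f1` is introduced twice.
 * Declaring it once up front and assigning later means the second
 * introduction is no longer a redefinition. */
static const value_t *pick(const value_t *f, const value_t *v, int len) {
    /* forward declarations for every variable introduced in this block */
    const value_t *f1 = NULL;
    const value_t *xs = NULL;

    f1 = f;               /* was: var f1 = $2; */
    if (len == 5) {
        return f1;
    }
    f1 = f;               /* was: var f1 = $2; (the second introduction) */
    xs = v;               /* was: var xs = $3; */
    return xs;
}
```

Hoisting the declarations to the top of the block keeps the rewrite purely syntactic: the pass does not need to know which of the repeated introductions is reached at runtime.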

WebAssembly

WebAssembly seems like one obvious use case of this project so I was surprised that there wasn't any discussion about this here already. Forgive me if I'm overlooking something that already exists.

I'm wondering whether you've given this much thought or tried it out, and what challenges/limitations you've encountered or foresee.

Translate corefn.json in parallel

Update the mk/target.mk file to have a one-to-one mapping from corefn.json to a .c file, something like: %.c: %/corefn.json. That way, Make can compile the corefn.json files into C files in parallel using its -j flag.

Fork and implement FFI for Prelude

Fork: https://github.com/purescript-c/purescript-prelude
Modules:

  • Prelude
  • Control.Applicative
  • Control.Apply
  • Control.Bind
  • Control.Category
  • Control.Monad
  • Control.Semigroupoid
  • Data.Boolean
  • Data.BooleanAlgebra
  • Data.Bounded (incl. FFI)
  • Data.CommutativeRing
  • Data.DivisionRing
  • Data.Eq (incl. FFI)
  • Data.EuclideanRing (incl. FFI)
  • Data.Field
  • Data.Function
  • Data.Functor (incl. FFI)
  • Data.HeytingAlgebra (incl. FFI)
  • Data.Monoid
  • Data.NaturalTransformation
  • Data.Ord (incl. FFI)
  • Data.Ordering
  • Data.Ring (incl. FFI)
  • Data.Semigroup (incl. FFI)
  • Data.Semiring (incl. FFI)
  • Data.Show (incl. FFI)
  • Data.Symbol (incl. FFI)
  • Data.Unit (incl. FFI)
  • Data.Void
  • Data.Monoid.Additive
  • Data.Monoid.Conj
  • Data.Monoid.Disj
  • Data.Monoid.Dual
  • Data.Monoid.Endo
  • Data.Monoid.Multiplicative
  • Data.Ord.Unsafe (incl. FFI)
  • Data.Semigroup.First
  • Data.Semigroup.Last
  • Data.Record.Unsafe (incl. FFI)
  • Type.Data.Row
  • Type.Data.RowList

Move main function into `Effect Int`

Either:

  • mandate this for all main functions,
  • or detect at compile time based on the type of main (if that information is even available),
  • or by checking the return value at runtime: if NULL (implying Unit), return 0, else return purs_any_get_int(...) of result.
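The third option, the runtime check, could look roughly like the sketch below. The `any_t` type and the `any_get_int` and `exit_code_of` helpers are hypothetical stand-ins for purs_any_t and purs_any_get_int, so this only illustrates the control flow, not the real runtime API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for purs_any_t; the real runtime type differs. */
typedef struct any { int int_value; } any_t;

/* Stand-in for purs_any_get_int. */
static int any_get_int(const any_t *v) { return v->int_value; }

/* Map the result of running `main` to a process exit code:
 * NULL is taken to mean Unit, anything else is read as an Int. */
static int exit_code_of(const any_t *result) {
    if (result == NULL) {
        return 0;
    }
    return any_get_int(result);
}
```

The C entry point would then return `exit_code_of(...)` applied to whatever the PureScript main produced.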

Copy C FFI files with purs compile

It would be preferable if the files could be copied as part of purs compile so that we can have incremental rebuilds of single modules. Currently the only way to invalidate a module is by producing a modified corefn.

Clang chokes on generated code for Data.Either.Nested

Clang appears to choke on compiling the generated code for Data.Either.Nested. It consumes all of my system's memory until I have to interrupt it manually. The generated C code is about 3692 lines long, but that is before macro expansion. I am not sure how to diagnose this further at this point.

Implement reference counting

Experimental: Implement reference counting as an alternative or as a complement to GC.

  • Add a refcount int to purs_any_t
  • Add a retain function to increment refcount (we can worry about atomicity later)
  • Add a release function to decrement the refcount
    • If refcount drops to zero clean up the contained purs_value_t, and free the pointer
  • Update purs_vec_t to properly increment/decrement the reference count
  • Update purs_record_t to properly increment/decrement the reference count
  • Keep retain-ed references in scope structs
  • Keep retain-ed references in _FFI scopes (and release after BODY)
  • Emit release calls for allocated resources before returning from functions
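A minimal sketch of the retain/release pair from the first few items is shown below. The `any_t` layout, the `any_new`, `any_retain`, and `any_release` names, the `freed_count` counter, and the choice to start the count at 1 are all assumptions made for illustration; they are not purec's actual definitions.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified value type; purec's purs_any_t has more fields. */
typedef struct any {
    int refcount;
    int *payload;   /* stands in for the contained purs_value_t */
} any_t;

/* Counts freed values so the behaviour can be observed. */
static int freed_count = 0;

static any_t *any_new(int n) {
    any_t *v = malloc(sizeof *v);
    if (v == NULL) return NULL;
    v->payload = malloc(sizeof *v->payload);
    if (v->payload == NULL) { free(v); return NULL; }
    *v->payload = n;
    v->refcount = 1;   /* assumption: the creator holds the first reference */
    return v;
}

/* Increment the count; deliberately not atomic, as noted in the list. */
static any_t *any_retain(any_t *v) {
    if (v != NULL) v->refcount++;
    return v;
}

/* Decrement the count; at zero, clean up the payload and free the pointer. */
static void any_release(any_t *v) {
    if (v == NULL) return;
    assert(v->refcount > 0);
    if (--v->refcount == 0) {
        free(v->payload);
        free(v);
        freed_count++;
    }
}
```

Containers such as vectors and records would call the retain function when storing an element and the release function when dropping it, which is what the purs_vec_t and purs_record_t items describe.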

Figure out tooling for building projects with purec

What could be a nice workflow for users wanting to use purec? Install the purec utility from npm and somehow combine pulp with purec (maybe using pulp --then?). That would buy users file watching for free (although it wouldn't pick up on .c and .h files). We need to figure out how to generate the Makefiles and how to bring in the runtime (#13).

Figure out how to find FFI files

Currently the compilation goes from some sources into purs -g corefn -o <output-dir>, which will spit them into <output-dir>. We then iterate over the corefn.json files found in <output-dir>, but those files give no hint back to where they originated from, making it necessary to remember their origin somehow.

Redundant variable assignments and indirect values

When looking into #40, I discovered this:

    const purs_any_t * b$$$ =
      NULL;;
    b$$$ =
      ({
        const purs_any_t * * $ivalue =
          purs_indirect_value_new();;
        const purs_any_t * $value =
          NULL;;
        purs_indirect_value_assign(
          $ivalue,
          $value
        );
        $value;
      });
    b$$$ =
      ({
        const purs_any_t * * $ivalue =
          purs_indirect_value_new();;
        const purs_any_t * $value =
          NULL;;
        purs_indirect_value_assign(
          $ivalue,
          $value
        );
        $value;
      });
    b$$$ =
      ({
        const purs_any_t * * $ivalue =
          purs_indirect_value_new();;
        const purs_any_t * $value =
          NULL;;
        purs_indirect_value_assign(
          $ivalue,
          $value
        );
        $value;
      });
    b$$$ =
      ({
        const purs_any_t * * $ivalue =
          purs_indirect_value_new();;
        const purs_any_t * $value =
          NULL;;
        purs_indirect_value_assign(
          $ivalue,
          $value
        );
        $value;
      });

First, why is b'' ("b$$$") reassigned over and over again, and second, what is going on with the indirect values here?

This does not appear to break anything, but is messy, causes unnecessary heap allocations, and should be removed.

Improve compiler performance

Currently, compiling Data.Either.Nested takes about 10 seconds and emits about 84k lines of code! So we should (a) investigate optimization passes that reduce the number of lines emitted and (b) make the compiler run fast even when generating files that large.

Explore not using Blocks for scopes

Currently we rely on a clang extension called "Blocks" contributed to the compiler project by Apple that essentially gives us closures in C. It would be interesting to see if we can go without this extension in order to be able to compile without clang and to avoid the blocks runtime.

It should be possible to inspect each function's scope at compile time, create a struct for it, and model functions like this:

f x =
  let r y =
   in x

typedef const purs_any_t * (* purs_fun_fn) (const void * ctx, const purs_any_t * arg);
typedef struct purs_fun {
  const void * ctx;
  purs_fun_fn fn;
} purs_fun_t;
typedef struct f_ctx { /* empty */ } f_ctx_t; // just NULL?
typedef struct f_r_ctx { const purs_any_t * x; } f_r_ctx_t;
const purs_any_t * f (const void * ctx, const purs_any_t * x) {
  const f_ctx_t * f_ctx = (const f_ctx_t *) ctx;
  purs_any_t * out = GC_NEW(purs_any_t);
  purs_fun_t * fun = GC_NEW(purs_fun_t);
  f_r_ctx_t * f_r_ctx = GC_NEW(f_r_ctx_t);
  f_r_ctx->x = x; // capture x for the inner function r
  fun->fn = f_r;  // f_r: the C function generated for r (not shown)
  fun->ctx = f_r_ctx;
  out->tag = FUNCTION;
  out->value.fn = fun;
  return (const purs_any_t *) out;
}

Consider copying boxed integers and numbers

Heap allocating boxes for integers, doubles, and chars seems very redundant when it should be cheaper to just copy around the box itself. This needs to be properly investigated.

I just found myself writing this FFI code:

PURS_FFI_FUNC_4(Control_Monad_ST_Internal_for, _lo, _hi, f, _, {
	int lo = purs_any_get_int(_lo);
	int hi = purs_any_get_int(_hi);
	for (int i = lo; i < hi; i++) {
		purs_any_app(purs_any_app(f, purs_any_int_new(i) /* ouch! */), NULL);
	}
	return NULL;
});
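For comparison, here is a sketch of what passing the box by value could look like. The tagged-union layout and every name in it (`any_t`, `any_tag_t`, `any_int`, `any_get_int`, `sum_range`) are hypothetical and chosen for illustration only; whether this is actually cheaper in purec still needs the investigation mentioned above.

```c
#include <assert.h>

/* Hypothetical by-value representation; not purec's actual layout. */
typedef enum { TAG_INT, TAG_NUM, TAG_CHAR } any_tag_t;

typedef struct any {
    any_tag_t tag;
    union { int i; double n; unsigned int c; } value;
} any_t;

/* Builds the box on the stack; no heap allocation involved. */
static any_t any_int(int i) {
    any_t v;
    v.tag = TAG_INT;
    v.value.i = i;
    return v;
}

static int any_get_int(any_t v) {
    assert(v.tag == TAG_INT);
    return v.value.i;
}

/* A loop in the style of the FFI code above: one small struct copy per
 * iteration instead of one allocation per iteration. */
static int sum_range(int lo, int hi) {
    int total = 0;
    for (int i = lo; i < hi; i++) {
        total += any_get_int(any_int(i));
    }
    return total;
}
```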

Create structs for type classes

The current dictionary-based approach has poor performance because the dictionaries are frequently allocated and released, meaning that strings (keys) are frequently allocated and released as well. Considering that the dictionary is completely known a priori we might as well emit matching structs or use some purs_any_t ** and access them by some known pointer offset by key.
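As a rough illustration of the struct-based alternative, the sketch below models a Semigroup-like dictionary as a struct with one field per class member. All names (`any_t`, `semigroup_dict_t`, `int_append`, `semigroup_int`, `append3`) are hypothetical, and the value type is a simplified stand-in for purs_any_t.

```c
#include <assert.h>

/* Hypothetical stand-in for purs_any_t. */
typedef struct any { int i; } any_t;

/* Because the members of a class are known at compile time, a dictionary
 * can be a struct with one field per member instead of a string-keyed map. */
typedef struct semigroup_dict {
    any_t (*append)(any_t a, any_t b);
} semigroup_dict_t;

static any_t int_append(any_t a, any_t b) {
    any_t r = { a.i + b.i };
    return r;
}

/* One statically allocated instance: nothing to allocate or release. */
static const semigroup_dict_t semigroup_int = { int_append };

/* Member access is a fixed-offset field load, not a key lookup. */
static any_t append3(const semigroup_dict_t *d, any_t a, any_t b, any_t c) {
    return d->append(d->append(a, b), c);
}
```

With this shape, no key strings are allocated at all, and instances that do not depend on other dictionaries can live in static storage.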

GC issues under clang compile time optimizations

Branch bug/clang-opts demonstrates a minimal program that reproduces GC issues caused by compile-time optimization (any level of optimization appears to trigger this). Note that the various seemingly random pure units need to be there to trigger the issue. I have a hunch it's related to type classes (or hash maps, more generally speaking), hence the monomorphic versions of bind, pure, and discard for Effect.

On my machine, for example, the example program prints "test: append" instead of "test: quark" when compiled with -O3.

Run against all passing tests in PureScript compiler test suite

All of these tests should eventually pass using this backend: https://github.com/purescript/purescript/tree/master/tests/purs/passing.

The directory structure is set up such that every (NAME).purs file is a test with additional supporting modules optionally residing in a folder named (NAME)/ in the same directory.

The logic for this already exists for the most part: https://github.com/felixSchl/purescript-c/blob/5ccdd0584d856584259096eadd11d316375f476d/test/Main.purs#L49-L57.

  • Resolve dependencies (do all tests just rely on prelude?)
  • Compile Makefiles (#13)
  • Generate test suite using purescript-spec (each test case should become an it block)

Write a transform that turns expressions into statements

The current output is heavily based on expressions, meaning that we apply functions and feed the result into other functions etc. It would be interesting to explore a transform that would turn these expressions into statements by generating fresh variable names in the current block scope and then perform the wiring up.

For example:

Turn this:

purs_any_app(f, purs_any_app(g, purs_any_int_new(100)))

into this:

const ANY * $value0 = purs_any_int_new(100);
const ANY * $value1 = purs_any_app(g, $value0);
const ANY * $value2 = purs_any_app(f, $value1);
return $value2;

At least one benefit of this is that it makes it easier to set breakpoints in generated programs. I am currently performing these transformations by hand when debugging generated output.

Inline common operators

Implement an equivalent for the inlineCommonOperators pass in PureScript.

https://github.com/purescript/purescript/blob/master/src/Language/PureScript/CoreImp/Optimizer/Inliner.hs#L109

I think a good few hours can be sunk into this one alone, but it will be worth it. We should be able to apply the extra optimization of unboxing literals where applicable, or even computing results statically where applicable. For reference, see: https://github.com/pure-c/pure-c/blob/cb4c6ffad5af96928c5934ea29138fbb9ae708a3/src/Language/PureScript/CodeGen/C/Optimizer/Inliner.purs#L222-L242
