
compiler's Introduction

The Draco programming language compiler


What is this?

This is the repository for the work-in-progress compiler for the Draco programming language, a new .NET programming language under development. If you want further details on the language itself or want to contribute ideas, head over to the language suggestions repository.

Try it out

You can either use the online playground, or you can play with it locally; for the latter, take a look at the Getting started guide.

Roadmap

  • Syntax analysis
    • Lexing
    • Parsing
    • Red-green trees
  • Semantic analysis
    • Symbol resolution
    • Type checking
    • Type inference
    • Dataflow analysis
  • Codegen
    • AST
    • Lowering
    • Custom IR
    • Writing IL
    • Writing PE
  • Optimization
    • TCO
    • Vectorization

compiler's People

Contributors

binto86, jl0pd, kuinox, lpeter1997, lucyelle, reflectronic, terria-k, thinker227


compiler's Issues

Add an Option<T> type

It's pretty often needed, ideally with a functional API that works with both reference and value types.
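A minimal sketch of the shape this could take (the member names here are just an illustration, not a decided API); a readonly struct keeps it allocation-free for both reference and value types:

using System;

// Sketch of a possible Option<T>: a readonly struct that works uniformly
// for reference and value types, with a couple of functional combinators.
public readonly struct Option<T>
{
    public static Option<T> None => default;
    public static Option<T> Some(T value) => new(value);

    private readonly T value;

    public bool IsSome { get; }

    private Option(T value)
    {
        this.value = value;
        this.IsSome = true;
    }

    // Transform the contained value, if any.
    public Option<TResult> Map<TResult>(Func<T, TResult> map) =>
        this.IsSome ? Option<TResult>.Some(map(this.value)) : Option<TResult>.None;

    // Unwrap with a fallback for the None case.
    public T GetOr(T fallback) => this.IsSome ? this.value : fallback;
}

Usage would then look like Option<int>.Some(1).Map(x => x + 1).GetOr(0).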

Make the query engine garbage-collect old values

The query engine versions results, but it does not collect and throw out old, irrelevant values yet. In an incremental context this means infinitely growing memory consumption. We should come up with an API to define a GC strategy.
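One possible shape for such an API (purely a sketch; the interface and the revision-based rule below are made up, not an existing design):

// Hypothetical sketch: the engine periodically asks the strategy which
// cached entries it may evict, based on how stale they are.
public interface IQueryGcStrategy
{
    // True if a cached value last verified at lastVerifiedRevision may be
    // evicted when the engine is currently at currentRevision.
    bool ShouldEvict(int lastVerifiedRevision, int currentRevision);
}

// Example strategy: keep only values verified within the last N revisions.
public sealed class KeepRecentRevisionsStrategy : IQueryGcStrategy
{
    private readonly int window;

    public KeepRecentRevisionsStrategy(int window) => this.window = window;

    public bool ShouldEvict(int lastVerifiedRevision, int currentRevision) =>
        currentRevision - lastVerifiedRevision > this.window;
}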

Remove OS-dependent command invocations

Currently our package.json for the VS Code client has the following scripts defined:

  "scripts": {
    "copy-grammar": "(mkdir syntaxes || true) && (cp ../Draco.SyntaxHighlighting/draco.tmLanguage.json ./syntaxes/draco.tmLanguage.json)",
    "vscode:prepublish": "(npm run copy-grammar) && (npm run compile)",
    "compile": "tsc -p ./",
    "watch": "tsc -watch -p ./",
    "lint": "eslint src --ext ts"
  },

Sadly, the copy is done with OS-dependent shell commands (cp, for one, is not available in the default Windows shell), so the script won't work on every OS. We can replace it with a dev-dependency from npm, or even write a Node script inline.
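For example, the inline Node variant could look something like this (just a sketch, reusing the paths from the script above):

  "scripts": {
    "copy-grammar": "node -e \"const fs=require('fs');fs.mkdirSync('syntaxes',{recursive:true});fs.copyFileSync('../Draco.SyntaxHighlighting/draco.tmLanguage.json','syntaxes/draco.tmLanguage.json')\"",
    "vscode:prepublish": "(npm run copy-grammar) && (npm run compile)"
  },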

Display diagnostic messages in the CLI

There is this lovely-looking thing with a really nice API that can help us: https://github.com/spectreconsole/errata

Integration should be fairly easy as well: we just need to write a converter from our own diagnostics to the Errata format (examples here: https://github.com/spectreconsole/errata/blob/main/examples/Demo/Program.cs), then ask the report to display itself on the console.

Integrating the sources is fairly easy too: we just need to implement an ISourceRepository (example here: https://github.com/spectreconsole/errata/blob/main/examples/Shared/EmbeddedResourceRepository.cs).

Parser hangs

For the following code the parser will hang indefinitely:

WriteLine("""
    \{capitalize(bottles(i))} of beer on the wall, \{bottles(i)} of beer.
    Take one down, pass it around, \{bottles(i - 1)} of beer on the wall.
    """);

The reason is very simple: the synchronize function doesn't consume a single token here, so the parser never makes progress. We need to figure out what to do; maybe keep a context stack and consume based on that?
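As a minimal safeguard, synchronization could be forced to make progress; a sketch (the member names are made up, not the actual parser API):

// Sketch: if error recovery decides to skip nothing, force progress by
// consuming one token so the parser can't loop forever on the same input.
private void Synchronize()
{
    var before = this.tokenIndex;

    // ... existing synchronization logic that skips tokens until a
    // "safe" point (statement start, closing brace, ...) ...

    if (this.tokenIndex == before && !this.IsAtEnd)
    {
        // No progress was made; eat one token unconditionally.
        this.Advance();
    }
}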

Implement a parse tree visitor

The parser branch already has a parse tree, but we will need a visitor base class to make recursion schemes simpler to write.
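Roughly what the generated base could look like (a sketch with placeholder node types; the real tree has many more):

using System.Collections.Generic;

// Placeholder node types just for the sketch; the real tree is generated.
public abstract record ParseTree;
public sealed record FuncDecl(ParseTree Body) : ParseTree;
public sealed record BlockExpr(IReadOnlyList<ParseTree> Statements) : ParseTree;

public abstract class ParseTreeVisitor<T>
{
    // Central dispatch; in practice one branch per node type would be generated.
    public virtual T Visit(ParseTree node) => node switch
    {
        FuncDecl f => this.VisitFuncDecl(f),
        BlockExpr b => this.VisitBlockExpr(b),
        _ => default!,
    };

    // Default behavior: just visit the children, so derived visitors only
    // override the node kinds they actually care about.
    public virtual T VisitFuncDecl(FuncDecl node) => this.Visit(node.Body);

    public virtual T VisitBlockExpr(BlockExpr node)
    {
        foreach (var stmt in node.Statements) this.Visit(stmt);
        return default!;
    }
}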

Authenticate GitHub API requests in the CI

Currently we pull resources from GitHub to build the web editor, and we are getting throttled (most likely because we share IPs with other users), so a lot of builds fail because of that.

We should inject a read-only access token in the CI and update the build script to use it.

Implement a query system

What is a query system?

A query system is an optimizing system that caches the results of computations but invalidates them when one of their dependencies changes. This is very similar to a build system that tries to perform the least work possible by only rebuilding parts that went out of date.

Example

Let's say we define the following computations we want to perform:

  • Read a list of numbers from a file, called read_numbers(file)
  • Multiply a list of numbers, called product(numbers)

Now let's say we have two files, A.txt and B.txt:

// A.txt
2 3 4 5

// B.txt
1 3 5 7

Let's say we want to multiply all the numbers from A.txt and B.txt. We'd do that by:

a_nums := read_numbers('A.txt')
b_nums := read_numbers('B.txt')

a_prod := product(a_nums)
b_prod := product(b_nums)

result := a_prod * b_prod

Then let's change the contents of A.txt to something else, like

9 8 7 6

If we want to recompute the product of all numbers in A.txt and B.txt, the algorithm could go something like this:

// Re-read the numbers from A.txt, the contents changed
a_nums := read_numbers('A.txt')

// We can just recall the last result of read_numbers('B.txt'), it didn't change
b_nums := read_numbers('B.txt')

// We need to perform this again, the numbers changed
a_prod := product(a_nums)

// We can just recall the old results again, the input did not change
b_prod := product(b_nums)

result := a_prod * b_prod

This means that two of the five operations in this algorithm didn't truly execute; they could just return their cached results!

We call these computations queries.
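To make the caching idea concrete, here is a toy sketch in C# (purely illustrative, not the planned API): a query result is stored together with the revision of the input it was computed from, and is only recomputed when that revision changes.

using System;

// Toy sketch: cache a query result together with the input revision it was
// computed from; recompute only when the revision has moved on.
public sealed class CachedQuery<TInput, TResult>
{
    private readonly Func<TInput, TResult> compute;
    private int cachedRevision = -1;
    private TResult? cachedResult;

    public CachedQuery(Func<TInput, TResult> compute) => this.compute = compute;

    public TResult Get(TInput input, int inputRevision)
    {
        if (inputRevision != this.cachedRevision)
        {
            // The input changed since we last ran: recompute and remember.
            this.cachedResult = this.compute(input);
            this.cachedRevision = inputRevision;
        }
        // Otherwise just hand back the cached value, like read_numbers('B.txt') above.
        return this.cachedResult!;
    }
}

The real system of course also has to track dependencies between queries, not just raw inputs; this only shows the memoize-and-invalidate core.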

Observations

There are some key things/rules to take away from the example:

  • There are inputs to the system; these will be the root dependencies of the computations
  • The computations have to be side-effect free; you cannot cache the results of a computation that has side effects
  • Computations can depend on each other, but they should never form a cycle
  • The parameters and results of the computations have to have value-based equality

Why do we need a query system?

Syntax analysis is relatively easy to make incremental without any fancy systems, because the theory for it has existed for a long time. The problem is that we are now past that: we need to make the rest of the frontend incremental, and that is not trivial by any means. A system like this could make it somewhat easier, because we would mostly have to focus on the actual logic; the incrementality would be provided by the system.

Existing systems/literature

Designing such a system is hard, so it might be worth looking at some existing work for inspiration. I have collected a few resources:

Internals

If you are interested in how such a system can work, you could watch this video by one of the Salsa authors (I promise it's interesting!), or you can take a peek at the pseudo-code I wrote that extracts the core algorithm.

Designs I currently have

An old system inspired by the old Salsa framework

I have implemented a system like this, heavily inspired by an old version of Salsa, which you can find here (usage is detailed in the README there). The gist of it is that you define the queries in interfaces, then register them in a DI framework with a special method. That method aliases the originally injected interface with a proxy that the framework generates; this proxy does all the versioning and caching behind the scenes. When the DI container injects the service, it injects the proxy instead.

I really dislike this solution: it's easy to forget to call a method through the proxy if it's on the same instance, and the heavy dependency on source generators makes it really closed off, hard to fix bugs in, and hard to configure.
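For reference, the rough shape of that design (an illustrative sketch; the names are made up and are not the actual API of the linked library):

// Illustrative sketch only; the names below are made up.
// Queries are declared on an interface:
public interface INumberQueries
{
    Sequence<int> ReadNumbers(string file);
    int Product(Sequence<int> numbers);
}

// Registration (hypothetical helper name) replaces the implementation with a
// generated proxy that does the versioning and caching:
//
//     services.AddQueryGroup<INumberQueries, NumberQueries>();
//
// Anything that asks the DI container for INumberQueries then receives the proxy.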

A new, also unattractive design I have

At the other extreme, I have a very dynamic API idea that requires no code generation, but it's not type-safe or even method-safe at all! Implementing the initial example with it would look something like this:

// NOTE: Sequence<T> is just a type that compares by values
static Sequence<int> ReadNumbers(Database db, string file) =>
    db.Get<string>("file_text", file).Split(' ').Select(int.Parse).ToSequence();

static int Product(Database db, Sequence<int> ns) =>
    ns.Aggregate((a, b) => a * b);

static int ProductOfNumbersInFiles(Database db)
{
    var aNums = db.Get<Sequence<int>>("ReadNumbers", "A.txt");
    var bNums = db.Get<Sequence<int>>("ReadNumbers", "B.txt");

    var aProd = db.Get<int>("Product", aNums);
    var bProd = db.Get<int>("Product", bNums);

    return aProd * bProd;
}

The pros of this system:

  • The Database type is exposed for easy configuration; it's not some magic internal type.
  • It's easier to extend; you don't need to modify an interface to add a query.
  • It works with static functions.

The cons of this system:

  • You reference the queries by some value key, which implies a few other points on this list
  • The DB has no type-safe info about what the called query returns
  • The calls to other queries are severely uglified

An ideal implementation would look like so (only showing the last query):

static int ProductOfNumbersInFiles(Database db)
{
    var aNums = ReadNumbers(db, "A.txt");
    var bNums = ReadNumbers(db, "B.txt");

    var aProd = Product(db, aNums);
    var bProd = Product(db, bNums);

    return aProd * bProd;
}

The key differences are that calls to other queries actually look like calls to queries: no references through some value key, and full type safety. The problem is that this is currently not possible; Source Generators can't wrap up existing methods to allow for it.

Implement red-green trees

Currently the compiler uses a strictly parent->child (immutable) structure. We would like to keep immutability, but have parent pointers.
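The usual shape of this idea (a sketch, not the planned implementation): immutable "green" nodes store only their children and widths, and lightweight "red" wrappers are created on demand to add parent pointers and absolute positions.

using System.Collections.Generic;

// Sketch: the green node is the immutable, parent-free tree that is cheap to
// reuse and rebuild; the red node wraps a green node lazily, adding a parent
// pointer and a position without giving up immutability.
public abstract class GreenNode
{
    public abstract int Width { get; }
    public abstract IReadOnlyList<GreenNode> Children { get; }
}

public sealed class RedNode
{
    public GreenNode Green { get; }
    public RedNode? Parent { get; }
    public int Position { get; }

    public RedNode(GreenNode green, RedNode? parent, int position)
    {
        this.Green = green;
        this.Parent = parent;
        this.Position = position;
    }

    // Children are wrapped on demand, so parents exist without mutating the green tree.
    public IEnumerable<RedNode> Children()
    {
        var offset = this.Position;
        foreach (var child in this.Green.Children)
        {
            yield return new RedNode(child, this, offset);
            offset += child.Width;
        }
    }
}

Because red nodes are only created lazily from their parent, the green tree stays freely shareable between versions of the source.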

Add a parse-tree transformer

For formatting and other transformations it might be handy to generate a transformer base. The visitor generator can be copied and adapted easily.
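Roughly what the transformer base could look like (a sketch with the same placeholder node types as the visitor sketch above):

using System.Collections.Generic;
using System.Linq;

// Placeholder node types just for the sketch.
public abstract record ParseTree;
public sealed record BlockExpr(IReadOnlyList<ParseTree> Statements) : ParseTree;

// Sketch: the default transformation rebuilds each node from its transformed
// children, so derived transformers only override the node kinds they change.
public abstract class ParseTreeTransformer
{
    public virtual ParseTree Transform(ParseTree node) => node switch
    {
        BlockExpr b => this.TransformBlockExpr(b),
        _ => node,
    };

    public virtual ParseTree TransformBlockExpr(BlockExpr node) =>
        new BlockExpr(node.Statements.Select(this.Transform).ToList());
}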

Test the lexer

Once the lexer is implemented, we should write unit tests for it.
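A test could be as small as this (a sketch using xUnit; the Lexer entry point and the TokenType names are assumptions, not the real API):

using Xunit;

public sealed class LexerTests
{
    [Fact]
    public void LexesSimpleDeclaration()
    {
        // Hypothetical API: Lexer.Lex returns the token list for a source text.
        var tokens = Lexer.Lex("val x = 42");

        Assert.Equal(TokenType.KeywordVal, tokens[0].Type);
        Assert.Equal(TokenType.Identifier, tokens[1].Type);
        Assert.Equal(TokenType.Assign, tokens[2].Type);
        Assert.Equal(TokenType.LiteralInteger, tokens[3].Type);
    }
}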

Change the internal parse-tree to be regular classes

They are record-classes currently, which is dangerous, given that they are not written with value-semantics in mind. For that we can provide comparers later, but referential equality should be fine.

We can keep the records if there's a simple way to reintroduce referential equality for all nodes. Otherwise, we'll need classes, which means explicit constructors.
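One simple way to keep the records could be to define equality on a shared base record as reference equality (a sketch, assuming all nodes share a common base record):

using System.Runtime.CompilerServices;

// Sketch: user-defined Equals/GetHashCode on the base record turn equality into
// reference equality; the generated members of derived records call into this
// base Equals, so nodes only ever compare equal to themselves.
public abstract record ParseTreeNode
{
    public virtual bool Equals(ParseTreeNode? other) => ReferenceEquals(this, other);
    public override int GetHashCode() => RuntimeHelpers.GetHashCode(this);
}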

Set up CI

Since we have tests, we should definitely run them on the compiler in CI, for example with GitHub Actions.
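A minimal workflow could look something like this (a sketch; the SDK version and project layout are assumptions):

# Sketch of .github/workflows/ci.yml: build and test the compiler on every push and PR.
name: CI
on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-dotnet@v4
        with:
          dotnet-version: 8.0.x   # whichever SDK version the compiler targets
      - run: dotnet test          # assumes the solution/test projects sit at the repo root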
