The compiler from draco-lang

Parser hangs

func main() {

Disallow newline in interpolation of line-strings

parser adds expected semicolon if `if` statement or `while` statement have no body

with the code

func main(){
  while(x < 5)
}

the parser adds unexpected token after the while declaration, but then there is missing semicolon node which is wrong

Add an Option<T> type

It's pretty often needed, ideally with a functional API that works with both reference and value types.

Make the query engine garbage-collect old values

The query engine versions results, but it does not collect and throw out old, irrelevant values yet. In an incremental context this means infinitely growing memory consumption. We should come up with an API to define a GC strategy.

Fuzz test the lexer and parser

Implement a lexer

The specs already defines plenty of tokens that can be lexed.

Add a Result type

Remove OS-dependent command invocations

Currently our package.json for the VSC client has the following scripts defined:

  "scripts": {
    "copy-grammar": "(mkdir syntaxes || true) && (cp ../Draco.SyntaxHighlighting/draco.tmLanguage.json ./syntaxes/draco.tmLanguage.json)",
    "vscode:prepublish": "(npm run copy-grammar) && (npm run compile)",
    "compile": "tsc -p ./",
    "watch": "tsc -watch -p ./",
    "lint": "eslint src --ext ts"
  },

Sadly the copying is a Windows-only command, so won't work on other OSes. We can replace with a dev-dependency from NPM or even write a node script inline.

Write more tests for the parser

Set up README

The README is super bare-bones compared to the language suggestions repo. We should at least set up the logo, the badge and a short description.

Parser hangs

func main() {
    ,

Generate a visitor for the red tree

This is a bit of a complication, as the red tree itself is generated and we can't guarantee ordering between SGs.

Display diagnostic messages in the CLI

There is this lovely-looking thing with a really nice API that can help us: https://github.com/spectreconsole/errata

Integration should be fairly easy as well, we just need to write a converter from our own diags to the Errata format (examples here: https://github.com/spectreconsole/errata/blob/main/examples/Demo/Program.cs), then ask the report to display itself on the console.

Integrating the sources is fairly easy as well, we just need to implement an ISourceRepository. (example here: https://github.com/spectreconsole/errata/blob/main/examples/Shared/EmbeddedResourceRepository.cs)

Add `InvalidHexadecimalUnicodeCodepoint` error

Right now, it throws UnclosedUnicodeCodepoint where it should throw InvalidHexadecimalUnicodeCodepoint.

Parser hangs

For the following code the parser will hang indefinitely:

WriteLine(""""""
    \{capitalize(bottles(i))} of beer on the wall, \{bottles(i)} of beer.
    Take one down, pass it around, \{bottles(i - 1)} of beer on the wall.
    """""");

The very simple reason is, the synchronize function doesn't eat a single token here. We need to figure out what to do here. Maybe keep a context stack and consume based on that?

Make QueryDatabase threadsafe

Currently the QueryDatabase type and all of its datastructures are not thread-safe.

Implement a parse tree visitor

The parser branch already has a parse tree, but we will need a visitor base class to be able to write recursion schemes simpler

Authenticate github API request in the CI

Currently we pull ressources from github to build the webeditor and we are getting throttled (most likely because we share IPs with other users) and lot of builds fails due to that.

We should inject a readonly access token in the CI and update the build script to use it."

Implement a language server that can show diagnostics

We can add a language client in our VS Code extension to test it

Add Type property to the public ParseTree.Token

The public token doesn't have any type property

Implement a query system

What is a query system?

A query system is an optimizing system, that is able to cache the results of computations, but invalidate them in case one of the dependencies change. This is very similar to a build system, that tries to perform the least work possible, by only rebuilding parts that went out of date.

Example

Let's say we define the following computations we want to perform:

Read a list of numbers from a file, called read_numbers(file)
Multiply a list of numbers, called product(numbers)

Now let's say we have two files, A.txt and B.txt:

// A.txt
2 3 4 5

// B.txt
1 3 5 7

Let's say we want to multiply all the numbers from A.txt and B.txt. We'd do that by:

a_nums := read_numbers('A.txt')
b_nums := read_numbers('B.txt')

a_prod := product(a_nums)
b_prod := product(b_nums)

result := a_prod * b_prod

Then let's change the contents of A.txt to something else, like

9 8 7 6

If we want to recompute the product of all numbers in A.txt and B.txt, the algoritm could go something like this:

// Re-read the numbers from A.txt, the contents changed
a_nums := read_numbers('A.txt')

// We can just recall the last result of read_numbers('B.txt'), it didn't change
b_nums := read_numbers('B.txt')

// We need to perform this again, the numbers changed
a_prod := product(a_nums)

// We can just recall the old results again, the input did not change
b_prod := product(b_nums)

result := a_prod * b_prod

Which means that two operations didn't truly execute out of the 5 we had in this algorithm, they could just return their cached results!

We call these computations queries.

Observations

There are some key things/rules to take away from the example:

There are inputs to the system, that will be the root dependencies of computations
The computations have to be side-effect free, you can not cache results of a computation that has side-effects
Computations can depend on each other, but they should never form a cycle
Parameters and the results of the computations have to have value-based equality

Why do we need a query system?

Syntax analysis is relatively easy to make incremental without any fancy systems, because the theory for them has existed for a long time. The problem is that we are now past that, we need to make the rest of the frontend incremental, and that is not trivial by any means. A system like this could make it somewhat easier, because we would mostly have to focus on the actual logic, the incrementality would be given by this system.

Existing systems/literature

Designing such a system is hard, so it might be worth to look at some inspiration. I have collected out a few resources:

Internals

If you are interested about how such a system can work, you could watch this video by one of the Salsa authors (I promise it's interesting!), or you can take a peek at the pseudo-code I wrote that extracts out the core algorithm.

Designs I currently have

An old system inspired by the old Salsa framework

I have implemented a system like this, heavily inspired by an old version of Salsa that you can find here (usage is detailed in the README right there!). The gist of it is that you define the queries in interfaces, then register them in a DI framework with a special method. The method aliases the originally injected interface with a proxy that the framework generates. This proxy does all the versioning and caching behind the scenes. When the DI injects a service, it will be the proxy of it.

I really dislike this solution: it's easy to forget to call a method through the proxy if it's on the same instance, and the heavy dependency on SGs makes it really closed off, hard to fix bugs in it, hard to configure.

A new, also unattractive design I have

On the other extreme, I have a very dynamic API idea that requires no code generation, but it's not type-safe or even method-safe at all! Implementing the initial example with it would look something like so:

// NOTE: Sequence<T> is just a type that compares by values
static Sequence<int> ReadNumbers(Database db, string file) =>
    db.Get<string>("file_text", file).Split(' ').Select(int.Parse).ToSequence();

static int Product(Database db, Sequence<int> ns) =>
    ns.Aggregate((a, b) => a * b);

static int ProductOfNumbersInFiles(Database db)
{
    var aNums = db.Get<Sequence<int>>("ReadNumbers", "A.txt");
    var bNums = db.Get<Sequence<int>>("ReadNumbers", "B.txt");

    var aProd = db.Get<int>("Product", aNums);
    var bProd = db.Get<int>("Product", bNums);

    return aProd * bProd;
}

The pros of this system:

The Database type is exposed for easy configuration, it's not some magic internal type.
It's easier to extend, you don't need to modify an interface to add a query
It works with static functions

The cons of this system:

You reference the queries by some value key, which implies a few other points on this list
The DB has no type-safe info about what the called query returns
The calls to other queries are severely uglified

An ideal implementation would look like so (only showing the last query):

static int ProductOfNumbersInFiles(Database db)
{
    var aNums = ReadNumbers(db, "A.txt");
    var bNums = ReadNumbers(db, "B.txt");

    var aProd = Product(db, aNums);
    var bProd = Product(db, bNums);

    return aProd * bProd;
}

Key differences are that calls to other queries actually look like calls to queries, no key references through some value, type safety. The problem is that currently, this is not possible, Source Generators can't wrap up existing methods to allow for this.

Incorrect lexer behavior on illegal escape

Compiler/src/Draco.Compiler/Syntax/Lexer.cs

Line 630 in eb746ef

++offset;

We should only eat a character here, if it's a non-control character, otherwise the error can become confusing. In that case, the returned escaped character should be a whitespace for example.

Add syntax highlight for the Web IDE

Replace fonts emoji with ours.

Create an AST rewriter

We need to be able to rewrite the AST to implement lowering.

Use process worker so user code can't deadlock the UI

Implement red-green trees

Currently the compiler uses a strictly parent->child (immutable) structure. We would like to keep immutability, but have parent pointers.

Lazy Load Runtime DLLs for user code runtime

Create the Parse Tree structure

Define symbols for semantic checking

Implement symbol resolution

Detect circular references in the query system

Circular computations should be illegal, but can be easily made by accident. We should build in a safety check, at least for debug mode.

Write tests for the parser

Remove async over sync in QueryTask.

Expose diagnostics in the public API

Parser incorectly regognizes tokens after malformed string input

for this input with unclosed interpolation

func main(){
  val x = """
            Hello, 
            \{"World!"
            """;
}

it recognizes missing interpolation end, but doesn't recognize multiline string end.

"hello \{
var x = 0;

"hello \{"bye
var x = 0;

"hello \{"bye\{
var x = 0;

The lexer should pop off all modes to be on a non-line-string and non-line-string-interpolation state.

draco-lang / compiler Goto Github PK

compiler's Introduction

The Draco programming language compiler

What is this?

Try it out

Roadmap

compiler's People

Contributors

Stargazers

Watchers

Forkers

compiler's Issues

What is a query system?

Example

Observations

Why do we need a query system?

Existing systems/literature

Internals

Designs I currently have

An old system inspired by the old Salsa framework

A new, also unattractive design I have

Recommend Projects

Recommend Topics

Recommend Org