mab's Introduction

Mab, a Lossless Lua Parser in Rust


No Longer Maintained

Check out Full Moon instead, mab's successor!


This is a work-in-progress lossless Lua 5.1+ parser written in Rust. It aims to preserve all comments, whitespace, and style distinctions such as quote style and escaping.

I want to use Mab for a number of future projects:

  • Static analysis tool to replace luacheck
  • Style checker and reformatter like gofmt or rustfmt (maybe named "Stylua")
  • Static typing like TypeScript or Flow
  • Documentation parser/generator like Rustdoc, more robust than LDoc
  • A tool like Google's Rerast for Rust or Facebook's codemod

Goals

  • PUC-Rio Lua 5.1+, LuaJIT 2.0+ support
    • Optionally validate against specific versions of Lua by casting between ASTs
  • 100% style and whitespace preservation
    • You should be able to read and overwrite your entire project and have zero changes
  • Foundation for static analysis and strong typing
  • Support for language extensions without breaking existing tools
    • The AST should be able to cast to any normal version of Lua
    • The project should either:
      • Leverage Rust's type system (non-exhaustive patterns, especially) to guarantee that tools can be recompiled with forks of this project with zero changes.
      • Or, use a technique similar to an Entity Component System to implement extended tokens and AST nodes.

Contributing

Contributions are welcome -- there is a lot of work to be done!

Mab supports Rust 1.26 stable and newer.

There is already a fairly sizable test suite implemented as a "parse by example" system. Test file inputs are located in parse_examples/source.

The test runner (cargo test) reads, tokenizes, and parses these source files, then compares the output against the last-known-good results in parse_examples/results.

If you're making changes that modify the parser's AST, delete the corresponding serialized token list and AST JSON files. The next test run will regenerate them; review the generated files manually and submit them alongside your change. Git's diff viewer helps confirm that every difference is intentional.
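The flow described above can be sketched roughly as follows. This is not Mab's actual test runner: `check_snapshot`, `SnapshotOutcome`, and the idea of passing the serialized output as a string are assumptions made for illustration, using only the standard library.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// What happened when comparing fresh output against a stored result (hypothetical type).
#[derive(Debug, PartialEq)]
enum SnapshotOutcome {
    /// No stored result existed, so one was written for manual review.
    Created,
    /// The stored result matches the fresh output.
    Matched,
    /// The stored result differs; the test should fail.
    Mismatched,
}

/// Compares `actual` with the file at `result_path`, creating the file if it is missing.
fn check_snapshot(result_path: &Path, actual: &str) -> io::Result<SnapshotOutcome> {
    match fs::read_to_string(result_path) {
        Ok(expected) if expected == actual => Ok(SnapshotOutcome::Matched),
        Ok(_) => Ok(SnapshotOutcome::Mismatched),
        Err(err) if err.kind() == io::ErrorKind::NotFound => {
            // Deleted (or never generated) results are regenerated here.
            if let Some(parent) = result_path.parent() {
                fs::create_dir_all(parent)?;
            }
            fs::write(result_path, actual)?;
            Ok(SnapshotOutcome::Created)
        }
        Err(err) => Err(err),
    }
}
```

Under this sketch, deleting a result file turns the next run into a `Created` outcome, which is the point at which the regenerated file would be reviewed and committed.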

Be careful with line endings when developing on Windows. The repository has a .editorconfig file as well as a .gitattributes file to try to guarantee that all Lua files have LF line endings as opposed to CRLF. Checking in a parse by example token list with CRLF line endings baked into it will cause CI to fail.

License

This project is available under the terms of The Mozilla Public License, version 2.0. Details are available in LICENSE.

mab's People

Contributors

amaranthinecodices, andyfriesen, lpghatguy


mab's Issues

Style preservation in AST

We probably want to hold off on jamming style information into the AST until we have complete grammar coverage.

How can this be done in a way that isn't too cumbersome?

A sample block, with all the places we care about whitespace being marked:

<HERE>if <expression><HERE>then
    <chunk>
<HERE>elseif <expression><HERE>then
    <chunk>
<HERE>else
    <chunk>
<HERE>end

Since we always consume whitespace before a token, any whitespace before <expression> or <chunk> belongs to those AST nodes instead, which simplifies things a bit.
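A minimal sketch of the "whitespace belongs to the token that follows it" model, under heavy simplification: tokens here are just whitespace-delimited words, and `Token`, `tokenize`, and `emit` are hypothetical names rather than Mab's API. It shows that emitting each prefix followed by its token reproduces the source exactly.

```rust
/// A token plus the whitespace consumed before it (hypothetical shape).
#[derive(Debug, PartialEq)]
struct Token<'a> {
    /// Whitespace that appeared before the token.
    prefix: &'a str,
    /// The token's own source text.
    text: &'a str,
}

/// Splits `source` into whitespace-delimited words, attaching leading
/// whitespace to each. Returns the tokens and any trailing whitespace.
fn tokenize(source: &str) -> (Vec<Token<'_>>, &str) {
    let mut tokens = Vec::new();
    let mut rest = source;
    loop {
        let ws_len = rest.len() - rest.trim_start().len();
        let (prefix, after_ws) = rest.split_at(ws_len);
        if after_ws.is_empty() {
            // Whitespace at the end of input has no token to attach to.
            return (tokens, prefix);
        }
        let word_len = after_ws
            .find(char::is_whitespace)
            .unwrap_or(after_ws.len());
        let (text, remaining) = after_ws.split_at(word_len);
        tokens.push(Token { prefix, text });
        rest = remaining;
    }
}

/// Rebuilds the original source from tokens and trailing whitespace.
fn emit(tokens: &[Token<'_>], trailing: &str) -> String {
    let mut out = String::new();
    for token in tokens {
        out.push_str(token.prefix);
        out.push_str(token.text);
    }
    out.push_str(trailing);
    out
}
```

The separately returned trailing whitespace is the awkward part of this model: something has to own whitespace that follows the last token.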

Remove 'prefix' field in favor of Token-level whitespace/comments?

Needing an EndOfFile token just to carry trailing whitespace is awkward. It might make more sense if the parser rules were a little smarter and comments and whitespace appeared in the token stream, next to the tokens they surround.

Tokens would have another layer added to them, something like:

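use std::borrow::Cow;

// `ActualToken` and `Comment` stand in for types defined elsewhere in the parser.
enum Token<'a> {
    Semantic(ActualToken<'a>), // a token the grammar cares about
    Whitespace(Cow<'a, str>),  // a run of whitespace, kept verbatim
    Comment(Comment<'a>),      // a single-line or multi-line comment
}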
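To make the trade-off concrete, here is a simplified, self-contained sketch of in-stream trivia. It replaces `ActualToken`, `Cow`, and `Comment` with plain string slices, and `next_semantic` and `emit` are hypothetical helpers. Parser rules would skip trivia when looking for the next meaningful token, while the emitter simply concatenates everything.

```rust
/// Simplified stand-in for the proposed token layering.
#[derive(Debug, PartialEq)]
enum Token<'a> {
    Semantic(&'a str),
    Whitespace(&'a str),
    Comment(&'a str),
}

/// Finds the next semantic token at or after `pos`, skipping whitespace and
/// comments. Returns its text and the position just past it.
fn next_semantic<'a>(tokens: &[Token<'a>], mut pos: usize) -> Option<(&'a str, usize)> {
    while let Some(token) = tokens.get(pos) {
        pos += 1;
        if let Token::Semantic(text) = token {
            return Some((*text, pos));
        }
    }
    None
}

/// Rebuilds the source by concatenating every token, trivia included.
fn emit(tokens: &[Token<'_>]) -> String {
    tokens
        .iter()
        .map(|token| match token {
            Token::Semantic(s) | Token::Whitespace(s) | Token::Comment(s) => *s,
        })
        .collect()
}
```

In this layout, trailing whitespace is just the last element of the stream, so no end-of-file token is needed to hold it.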

Create version of AST for each version of Lua

It should be possible to validate that the given AST is valid for a given version of Lua by casting between ASTs.

For example:

let ast = parse(source);
let ast_5_1 = ast.to_version_5_1().expect("Source was not valid Lua 5.1! (maybe it used goto?)");
let new_ast = ast_5_1.to_version_agnostic(); // What do we call this?

assert_eq!(ast, new_ast);
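One possible shape for these conversions, sketched with a toy AST of three statement kinds. Every type here (`Statement`, `Statement51`, `Ast`, `Ast51`) is an assumption for illustration, and returning `Result<_, String>` is just one option that still works with `.expect(...)` as in the example above.

```rust
/// Version-agnostic statements (toy subset).
#[derive(Debug, Clone, PartialEq)]
enum Statement {
    Break,
    Return,
    Goto(String),  // Lua 5.2+
    Label(String), // Lua 5.2+
}

/// Statements that exist in Lua 5.1 (toy subset).
#[derive(Debug, Clone, PartialEq)]
enum Statement51 {
    Break,
    Return,
}

#[derive(Debug, Clone, PartialEq)]
struct Ast(Vec<Statement>);

#[derive(Debug, Clone, PartialEq)]
struct Ast51(Vec<Statement51>);

impl Ast {
    /// Fails on the first statement that Lua 5.1 cannot express.
    fn to_version_5_1(&self) -> Result<Ast51, String> {
        self.0
            .iter()
            .map(|statement| match statement {
                Statement::Break => Ok(Statement51::Break),
                Statement::Return => Ok(Statement51::Return),
                Statement::Goto(name) => Err(format!("goto {} requires Lua 5.2+", name)),
                Statement::Label(name) => Err(format!("label ::{}:: requires Lua 5.2+", name)),
            })
            .collect::<Result<Vec<_>, _>>()
            .map(Ast51)
    }
}

impl Ast51 {
    /// Widening back to the version-agnostic AST cannot fail.
    fn to_version_agnostic(&self) -> Ast {
        Ast(self
            .0
            .iter()
            .map(|statement| match statement {
                Statement51::Break => Statement::Break,
                Statement51::Return => Statement::Return,
            })
            .collect())
    }
}
```

Because the 5.1 enum has no `Goto` or `Label` variant, code that consumes `Ast51` never has to handle them, which is the validation guarantee the cast is meant to provide.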

Create parse rule for local declaration

I had some conversations about how to structure parsing of local declarations, and I think there should be one rule that branches to both local variables and local functions.

I'm on mobile from bed so I can't go into details, but I wanted to get this down before I forgot! 😅
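A rough sketch of what a single branching rule could look like, assuming tokens are plain string slices. `LocalDeclaration`, `is_name`, and `parse_local` are hypothetical, and the sketch stops after the declared names (no function body, no `= explist`).

```rust
/// The two forms a `local` declaration can take (hypothetical node).
#[derive(Debug, PartialEq)]
enum LocalDeclaration<'a> {
    Function { name: &'a str },
    Variables { names: Vec<&'a str> },
}

/// Very rough identifier check; a real one would reject every Lua keyword.
fn is_name(token: &str) -> bool {
    let mut chars = token.chars();
    match chars.next() {
        Some(c) if c.is_ascii_alphabetic() || c == '_' => {}
        _ => return false,
    }
    chars.all(|c| c.is_ascii_alphanumeric() || c == '_')
        && token != "function"
        && token != "local"
}

/// Parses `local function <name>` or `local <name> {',' <name>}` from the
/// front of `tokens`. Returns the declaration and the unconsumed tokens.
fn parse_local<'a, 'b>(
    tokens: &'b [&'a str],
) -> Option<(LocalDeclaration<'a>, &'b [&'a str])> {
    let (first, rest) = tokens.split_first()?;
    if *first != "local" {
        return None;
    }

    // Branch once on the token after `local`.
    let (next, after) = rest.split_first()?;
    if *next == "function" {
        let (name, after_name) = after.split_first()?;
        if !is_name(name) {
            return None;
        }
        return Some((LocalDeclaration::Function { name: *name }, after_name));
    }

    // Otherwise expect a comma-separated list of names.
    let mut names = Vec::new();
    let mut remaining = rest;
    loop {
        let (name, after_name) = remaining.split_first()?;
        if !is_name(name) {
            return None;
        }
        names.push(*name);
        remaining = after_name;
        match remaining.split_first() {
            Some((comma, after_comma)) if *comma == "," => remaining = after_comma,
            _ => break,
        }
    }
    Some((LocalDeclaration::Variables { names }, remaining))
}
```

Having one rule own the `local` keyword means the keyword is consumed exactly once, and the choice between the two forms is a single lookahead check.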

Test coverage?

Right now test coverage for Rust is kind of a mess, but it's possible to get it working on Linux, which we're building tests on. This thread has a tutorial on collecting test coverage; the first reply has a .travis.yml snippet for integrating kcov, travis, and coveralls.io. It might be nice to get some coverage metrics working early, so we can keep track of what needs to be tested and what's okay.

Benchmarks

Once the grammar is finished, we should start tracking how quick the parser is.

I hope it's fast?

Emitter

I'll keep this issue open until it's done, and start filling out missing features once the emitter can do anything at all.

ExpressionList can match trailing commas

This is a departure from Lua syntax, where a trailing comma in an expression list is a parse error:

print(1, 2,)

ParseExpressionList does not reject this right now; it accepts a trailing comma without reporting an error.
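One way to get the stricter behavior is to require an expression after every comma. The sketch below is not the project's ParseExpressionList: it treats each non-comma, non-`)` token as a complete expression, and `ListError`, `is_expression`, and `parse_expression_list` are hypothetical names.

```rust
#[derive(Debug, PartialEq)]
enum ListError {
    /// An expression was required at this token index but not found.
    ExpectedExpression { position: usize },
}

/// Stand-in for a real expression parser: any token other than `,` or `)`.
fn is_expression(token: &str) -> bool {
    token != "," && token != ")"
}

/// Parses `exp {',' exp}` from the front of `tokens`.
/// Returns the expressions and the number of tokens consumed.
fn parse_expression_list<'a>(
    tokens: &[&'a str],
) -> Result<(Vec<&'a str>, usize), ListError> {
    let mut expressions = Vec::new();
    let mut pos = 0;
    loop {
        match tokens.get(pos) {
            Some(token) if is_expression(token) => expressions.push(*token),
            _ => return Err(ListError::ExpectedExpression { position: pos }),
        }
        pos += 1;
        if tokens.get(pos) != Some(&",") {
            return Ok((expressions, pos));
        }
        // Consume the comma; the next iteration must find an expression,
        // which is what turns a trailing comma into an error.
        pos += 1;
    }
}
```

With this structure, the argument tokens of `print(1, 2,)` fail at the `)` that follows the last comma instead of parsing successfully.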

Document test harness usage

Right now, the test system is a little bit unintuitive. For contributors, it would be useful to document the expected flow when working on the parser.

Pick cool name

I don't know what to call this project. Suggestions? Something moon related?

Error reporting

Error reporting coverage is pretty spotty, and the idea probably needs to be revisited.

Finish grammar

In rough order of importance:

  • Precedence (I've been putting this off -- simple explanation of precedence climbing/Pratt parsing)
  • Assignment (depends on varlist)
  • Function Declarations
    • Correct function name declaration (currently only identifiers are allowed)
    • Vararg
    • Anonymous function expression
  • String literals
    • Regular single and double quote forms
    • Long-form strings
  • Comments
    • Infrastructure in tokens for comments
    • Single line comments
    • Multi-line comments
  • Generic For (for-in)
  • Remaining binary operators
  • Function call without parentheses
  • Semicolon as statement separator
  • ParenExpression ('(' expression ')')
  • Lua 5.2 syntax
    • Goto and labels
  • Lua 5.3 syntax
    • Bitwise operators
