
tree-sitter-d's Introduction

tree-sitter-d

This repository hosts a tree-sitter grammar for the D programming language.

About

The grammar is generated in several steps. The full process is as follows.

  1. The origin of the grammar described here is the official specification of the D programming language.

    Though it can be perused online, we use the source code, which is written in DDoc (the D documentation macro processor) and is maintained in the dlang/dlang.org GitHub repository.

    The generated/dlang.org submodule points to the copy that is used by this repository, which may contain some fixes (whether to make it more machine-readable or to more accurately describe the language) which have not been upstreamed yet.

  2. The grammar is then consumed by a custom program which attempts to automatically convert it as much as feasible into a tree-sitter grammar. This program and its output are located in the generated branch.

    The first step of processing the grammar is to parse it. Thus, the grammar specification above is parsed into a DOM representing the document structure, with one node per DDoc macro.

    Though the canonical way to consume DDoc documents is to specify a file with custom macro definitions and to run DMD's DDoc macro processor using it, the approach used here was to implement a simple DDoc parser instead (which also helped validate our assumptions about DDoc syntax).
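As a rough illustration of the "one node per DDoc macro" idea, here is a minimal sketch of a DDoc-like parser. It is not the generator's actual parser: the function name `parseDDoc`, the node shape `{ macro, children }`, and the sample macro names are assumptions made for this example, and escapes, macro definitions and argument splitting are ignored.

```javascript
// Minimal DDoc-like parser: turns "$(NAME body)" macros into nested nodes.
// Hypothetical sketch; the node shape { macro, children } is an assumption.
function parseDDoc(text) {
  let pos = 0;

  function parseNodes(insideMacro) {
    const nodes = [];
    let buffer = "";
    let openParens = 0; // plain "(" not yet closed within this macro body

    const flush = () => {
      if (buffer !== "") nodes.push(buffer);
      buffer = "";
    };

    while (pos < text.length) {
      if (text.startsWith("$(", pos)) {
        flush();
        pos += 2;
        const nameStart = pos;
        while (pos < text.length && /[A-Za-z0-9_]/.test(text[pos])) pos++;
        const macro = text.slice(nameStart, pos);
        if (text[pos] === " ") pos++; // skip the separator after the name
        nodes.push({ macro, children: parseNodes(true) });
        continue;
      }
      const ch = text[pos++];
      if (ch === "(") {
        openParens++;
      } else if (ch === ")") {
        if (openParens > 0) {
          openParens--;
        } else if (insideMacro) {
          flush();
          return nodes; // this ")" closes the enclosing macro
        }
      }
      buffer += ch;
    }
    flush();
    return nodes;
  }

  return parseNodes(false);
}

// Sample input in the style of a grammar production (macro names are illustrative).
const sampleDom = parseDDoc("$(GNAME ImportList): $(GLINK Import) $(D ,) $(GLINK ImportList)");
console.log(JSON.stringify(sampleDom, null, 2));
```

Plain text between macros is kept as string nodes, so a later conversion step can walk the tree and decide which macros carry grammar meaning.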

  3. The DDoc DOM is then converted to the initial grammar definition, which roughly corresponds to tree-sitter grammar structure. The conversion is done in the parser module.

  4. After conversion, the grammar passes through a few preprocessing steps. These mold the grammar into a shape that is better suited to typical tree-sitter applications.

    Two main preprocessing steps are:

    • De-recursion, which converts definitions for lists of things from a recursive definition to one using explicit repetition.
      (Example: ImportList)

    • Body extraction, which splits some definitions into two, in which one is the definition "body" containing the operation actually described by the definition's name, and the other is a hidden rule which resolves either to the body or to the next operation with higher precedence.
      (Example: OrOrExpression)
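De-recursion can be sketched on a toy rule representation. The node shapes (`SYMBOL`, `SEQ`, `CHOICE`, `STRING`, `REPEAT`), the helper `deRecurse`, and the rule names `item` and `item_list` are hypothetical and chosen for this example; the generator's real data model and the actual ImportList production may differ.

```javascript
// Toy rule constructors (assumed shapes, not the generator's data model).
const sym = (name) => ({ type: "SYMBOL", name });
const str = (value) => ({ type: "STRING", value });
const seq = (...members) => ({ type: "SEQ", members });
const choice = (...members) => ({ type: "CHOICE", members });
const repeat = (content) => ({ type: "REPEAT", content });

const sameNode = (a, b) => JSON.stringify(a) === JSON.stringify(b);

// Rewrites   name := X | X sep... name
// into       name := X (sep... X)*
// and returns the rule unchanged if it does not have that shape.
function deRecurse(name, rule) {
  if (rule.type !== "CHOICE" || rule.members.length !== 2) return rule;
  const [base, recursive] = rule.members;
  if (recursive.type !== "SEQ" || recursive.members.length < 2) return rule;

  const head = recursive.members[0];
  const tail = recursive.members[recursive.members.length - 1];
  const separators = recursive.members.slice(1, -1);

  const isListShape =
    sameNode(head, base) && tail.type === "SYMBOL" && tail.name === name;
  if (!isListShape) return rule;

  return seq(base, repeat(seq(...separators, base)));
}

// item_list := item | item "," item_list
const recursiveList = choice(
  sym("item"),
  seq(sym("item"), str(","), sym("item_list"))
);
console.log(JSON.stringify(deRecurse("item_list", recursiveList)));
```

The repeated form yields a flat list of siblings in the syntax tree instead of a chain of nested list nodes, which is usually easier for queries to consume.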

    The grammar is then optimized to reduce redundancies manifested during preprocessing.
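The body extraction step from the list above can be sketched in the same toy style. The helper `extractBody` and the node shapes are hypothetical; the `_maybe_` prefix mirrors hidden rule names such as `_maybe_or_or_expression` that appear in the grammar's conflicts list, but the exact output of the real generator is not shown here and may differ.

```javascript
// Toy rule constructors (assumed shapes, not the generator's data model).
const sym = (name) => ({ type: "SYMBOL", name });
const str = (value) => ({ type: "STRING", value });
const seq = (...members) => ({ type: "SEQ", members });
const choice = (...members) => ({ type: "CHOICE", members });

// Replaces every reference to symbol `from` with a reference to `to`.
function renameSymbol(node, from, to) {
  if (node.type === "SYMBOL") return node.name === from ? sym(to) : node;
  if (Array.isArray(node.members)) {
    return { ...node, members: node.members.map((m) => renameSymbol(m, from, to)) };
  }
  return node;
}

// Splits   name := next | <operation involving name>
// into a visible "body" rule holding only the operation, and a hidden
// rule that resolves to either the body or the higher-precedence rule.
function extractBody(name, rule) {
  if (rule.type !== "CHOICE") return { [name]: rule };
  const fallthrough = rule.members.find((m) => m.type === "SYMBOL");
  const bodies = rule.members.filter((m) => m !== fallthrough);
  if (!fallthrough || bodies.length === 0) return { [name]: rule };

  const hidden = `_maybe_${name}`;
  const renamed = bodies.map((b) => renameSymbol(b, name, hidden));
  return {
    [name]: renamed.length === 1 ? renamed[0] : choice(...renamed),
    [hidden]: choice(sym(name), fallthrough),
  };
}

// or_or_expression := and_and_expression
//                   | or_or_expression "||" and_and_expression
const orOr = choice(
  sym("and_and_expression"),
  seq(sym("or_or_expression"), str("||"), sym("and_and_expression"))
);
console.log(JSON.stringify(extractBody("or_or_expression", orOr), null, 2));
```

With this split, an `or_or_expression` node appears in the tree only where a `||` operation is actually present, rather than wrapping every expression.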

  5. The grammar is now ready to be saved to grammar.js, the tree-sitter definition of the grammar.

    The latest version of this generated file can be found in the root of the generated branch.

  6. The generated file is not quite ready to be used, and requires some manual fixups.

    For this purpose, the master branch holds these fixes on top of the generated branch (which is merged into master regularly).

    You can see all manual fixes by comparing the two branches.

    The master branch also hosts the test suite, as well as the custom scanner, which implements D-specific syntax which cannot be described using the declarative tree-sitter grammar, such as nested comments or delimited string literals.
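To show why nested comments need hand-written scanning logic, here is a small depth-counting sketch for D's nesting block comments (`/+ ... +/`). It is an illustration in JavaScript only: the function name `scanNestingComment` is hypothetical, and the repository's real scanner is written against tree-sitter's external scanner interface rather than like this.

```javascript
// Returns the index just past the "+/" that closes the nesting comment
// starting at `start`, or -1 if there is no comment there or it is unterminated.
function scanNestingComment(source, start) {
  if (!source.startsWith("/+", start)) return -1;
  let depth = 0;
  let i = start;
  while (i < source.length) {
    if (source.startsWith("/+", i)) {
      depth++; // a nested comment opens
      i += 2;
    } else if (source.startsWith("+/", i)) {
      depth--; // the innermost open comment closes
      i += 2;
      if (depth === 0) return i;
    } else {
      i++;
    }
  }
  return -1; // ran out of input while still inside a comment
}

const code = "/+ outer /+ inner +/ still outer +/ int x;";
const end = scanNestingComment(code, 0);
console.log(code.slice(0, end)); // prints "/+ outer /+ inner +/ still outer +/"
```

The counter is the part a regular expression cannot express: the scanner must remember how many comments are open in order to find the matching close.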

  7. From this point, grammar.js is ready to be passed on to tree-sitter's build process, so the steps below simply describe how any tree-sitter grammar is compiled.

    tree-sitter-cli is used to generate the parser C source code from grammar.js. If installed via npm (i.e. npm install), this can be done by running:

    ./node_modules/.bin/tree-sitter generate
    

    This will populate the src directory, as well as create additional build files.

  8. Finally, the C source code is compiled into a loadable shared library, which can be directly used by a tree-sitter-enabled application.

    This step happens automatically when running tree-sitter test. Alternatively, invoking tree-sitter build-wasm builds a WebAssembly module instead of a native shared object.

Contributing

If you would like to help, please have a look at the list of open issues.

If you spot an error in the grammar or the way it behaves and would like to fix it, the first step would be to identify the correct place to perform the fix.

  • If the problem is due to an incorrect grammar definition, and the error is also present in the official specification, then please fix and send a pull request there.

  • Otherwise, if you believe that the problem is due to a translation error between the official grammar and the generated grammar.js file, then it may be due to a bug in the generator program.

  • Finally, if the problem is tree-sitter specific or cannot be fixed through the above avenues, then the fix should be applied to grammar.js on the master branch.

If you are having trouble with anything, please don't hesitate to open an issue.

tree-sitter-d's People

Contributors

cybershadow

tree-sitter-d's Issues

case_statement does not include the first comment line in scope_statement_list

Hi,

In the code below, the comment is not included in the "scope_statement_list:" of "case_statement:" which makes indentation incorrect for this line.

override bool handleAction(const Action a) {
    switch (a.id) {
        case IDEActions.EditPreferences:
            //showPreferences();
            return true;
        default:
            return super.handleAction(a);
    }
    return false;
}

case_statement:
  argument_list:
    postfix_expression:
      primary_expression:
        identifier:
      identifier:
  line_comment:
  scope_statement_list:
    statement_list_no_case_no_default:
      return_statement:
        expression:
          primary_expression:

All lines after a "case" should be included in the scope of this "case" even if it is a comment.

Want parser.c and parser.h in default master branch

Using this module today from an editor such as Helix requires cloning the repository and then running the final generation step to produce parser.c. This makes it hard to use, and it took me a while to figure out that this step was needed.

Make use of what we have so far

Can a D grammar be added to the Emacs package tree-sitter-langs even though the grammar is not 100 percent correct, so that we can start to benefit in Emacs from what we have so far?

Go through `conflicts` and ensure they are correct

Currently our grammar has a very long list of conflicts declarations. These were all stuffed there semi-automatically to get the grammar to compile in the simplest way possible.

However, many of these are likely wrong. For each entry, the correct fix can be any one of:

  • a conflicts declaration (i.e. what we have now)
  • a prec call to indicate precedence
  • a prec.left or prec.right call to indicate associativity
  • a fix in the official language spec, if the root of the problem is there.

In any case, each fix should be associated with a test (or several) which verifies that our tree-sitter grammar parses code correctly in this circumstance.
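To make the associativity choice concrete, the sketch below builds the two possible trees for a chain of operands joined by one binary operator. The helper `foldOperands` is hypothetical and is not part of tree-sitter; in grammar.js the same decision is expressed by wrapping the rule in prec.left or prec.right, as listed above.

```javascript
// Builds a nested [lhs, operator, rhs] tree from a flat operand list.
// "left" groups as ((a op b) op c); "right" groups as (a op (b op c)).
function foldOperands(operands, operator, associativity) {
  if (operands.length === 0) throw new Error("at least one operand is required");
  if (associativity === "left") {
    return operands
      .slice(1)
      .reduce((tree, next) => [tree, operator, next], operands[0]);
  }
  return operands
    .slice(0, -1)
    .reduceRight((tree, prev) => [prev, operator, tree], operands[operands.length - 1]);
}

console.log(JSON.stringify(foldOperands(["a", "b", "c"], "-", "left")));
// prints [["a","-","b"],"-","c"]
console.log(JSON.stringify(foldOperands(["a", "b", "c"], "^^", "right")));
// prints ["a","^^",["b","^^","c"]]
```

A test for such a fix would then assert the expected nesting for an input with at least three operands, since two operands cannot distinguish the two groupings.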

Currently, there are the following unresolved conflict clauses:

  • [$.qualified_identifier],
  • [$.storage_class, $.attribute],
  • [$.storage_class, $.enum_declaration],
  • [$.storage_class, $.function_attribute_kwd],
  • [$.storage_class, $.type_ctor],
  • [$.storage_class, $.type_ctor, $.attribute],
  • [$.storage_class, $.attribute, $.synchronized_statement],
  • [$.attribute, $.static_destructor],
  • [$.attribute, $.static_constructor],
  • [$.attribute, $.synchronized_statement],
  • [$._decl_def, $._declaration],
  • [$.alias_assignment, $.qualified_identifier],
  • [$.template_instance, $.mixin_qualified_identifier],
  • [$.auto_assignment, $.qualified_identifier, $.auto_func_declaration],
  • [$.var_declarator, $.func_declarator],
  • [$.primary_expression, $.template_instance],
  • [$.type_ctor, $.variadic_arguments_attribute],
  • [$.in_out, $.variadic_arguments_attribute],
  • [$.parameter_attributes],
  • [$.storage_class, $.synchronized_statement],
  • [$._declaration, $._non_empty_statement_no_case_no_default], // ???
  • [$.qualified_identifier, $.template_sequence_parameter],
  • [$.qualified_identifier, $.template_type_parameter],
  • [$.qualified_identifier, $.template_instance],
  • [$.unary_expression, $.parameter],
  • [$.new_expression],
  • [$.type],
  • [$.attribute, $.return_statement],
  • [$.primary_expression, $.postblit, $.constructor, $.constructor_template],
  • [$._decl_def, $.declaration_statement],
  • [$._decl_def, $._non_empty_statement_no_case_no_default],
  • [$.foreach_type_list, $.range_foreach], // TODO fix grammar
  • [$.foreach_type_attributes],
  • [$.parameter, $.template_value_parameter],
  • [$.conditional_declaration], // TODO else precedence
  • [$.or_or_expression, $.and_and_expression], // TODO precedence
  • [$.and_and_expression, $.or_expression], // TODO precedence
  • [$.or_expression, $.xor_expression],
  • [$.xor_expression, $.and_expression],
  • [$._cmp_expression, $.rel_expression, $.shift_expression],
  • [$._cmp_expression, $.rel_expression],
  • [$._cmp_expression, $.identity_expression, $.in_expression],
  • [$._cmp_expression, $.equal_expression],
  • [$._cmp_expression, $.in_expression],
  • [$._cmp_expression, $.identity_expression],
  • [$.shift_expression, $.add_expression],
  • [$.shift_expression, $.cat_expression],
  • [$.add_expression, $.mul_expression],
  • [$._maybe_pow_expression, $.postfix_expression],
  • [$._maybe_pow_expression, $.index_expression, $.slice_expression],
  • [$._maybe_pow_expression, $.pow_expression],
  • [$.super_class_or_interface, $.interface],
  • [$.postfix_expression, $.template_instance],
  • [$.argument_list, $.slice],
  • [$.primary_expression, $.symbol_tail],
  • [$.type_suffix, $.unary_expression],
  • [$.specified_function_body, $.missing_function_body],
  • [$.primary_expression, $.destructor],
  • [$.declaration_block, $.block_statement],
  • [$.parameter_with_member_attributes, $.deallocator],
  • [$.conditional_statement],
  • [$.static_constructor, $.missing_function_body],
  • [$.postblit, $.missing_function_body],
  • [$.array_initializer, $.array_literal],
  • [$.exp_initializer, $.argument_list],
  • [$.array_member_initialization, $.key_expression],
  • [$.struct_initializer, $.block_statement],
  • [$.mixin_type, $.mixin_expression],
  • [$.qualified_identifier, $.symbol_tail],
  • [$.asm_rel_exp, $.asm_shift_exp],
  • [$.mixin_expression, $.mixin_statement],
  • [$.primary_expression, $.synchronized_statement],
  • [$.try_statement],
  • [$.type_suffix, $.array_literal],
  • [$.type_suffix, $.argument_list],
  • [$.shared_static_constructor, $.missing_function_body],
  • [$.static_destructor, $.missing_function_body],
  • [$.parameter, $.template_value_parameter_default],
  • [$.unary_expression, $.template_instance],
  • [$.rel_expression, $.shift_expression],
  • [$.equal_expression, $.shift_expression],
  • [$.in_expression, $.shift_expression],
  • [$.identity_expression, $.shift_expression],
  • [$.cat_expression, $.mul_expression],
  • [$.slice],
  • [$.asm_log_or_exp, $.asm_log_and_exp],
  • [$.asm_log_and_exp, $.asm_or_exp],
  • [$.asm_or_exp, $.asm_xor_exp],
  • [$.asm_xor_exp, $.asm_and_exp],
  • [$.asm_and_exp, $.asm_equal_exp],
  • [$.asm_equal_exp, $.asm_rel_exp],
  • [$.asm_shift_exp, $.asm_add_exp],
  • [$.asm_add_exp, $.asm_mul_exp],
  • [$.asm_mul_exp, $.asm_br_exp],
  • [$.mixin_declaration, $.mixin_expression, $.mixin_statement],
  • [$.shared_static_destructor, $.missing_function_body],
  • [$.if_statement],
  • [$.mixin_declaration, $.mixin_statement],
  • [$.alt_declarator_suffix, $.qualified_identifier],
  • [$.storage_classes],
  • [$.type_ctors],
  • [$.function_contracts],
  • [$.decl_defs],
  • [$.type_suffixes],
  • [$.statement_list_no_case_no_default],
  • [$.catches],
  • [$.pragma_statement, $.empty_statement],
  • [$.empty_declaration, $.empty_statement],
  • [$.static_foreach_declaration],
  • [$.debug_condition],
  • [$.scope_block_statement, $.function_literal_body],
  • [$.labeled_statement],
  • [$._scope_statement, $.function_literal_body],
  • [$._no_scope_statement, $.function_literal_body],
  • [$._no_scope_non_empty_statement, $.function_literal_body],
  • [$.struct_member_initializer, $.labeled_statement],
  • [$.packages],
  • [$.qualified_identifier, $.primary_expression],
  • [$.basic_type, $.primary_expression],
  • [$.qualified_identifier, $.primary_expression, $.symbol_tail],
  • [$._maybe_assign_expression, $.assign_expression],
  • [$._maybe_conditional_expression, $.or_or_expression],
  • [$._maybe_conditional_expression, $.conditional_expression],
  • [$._maybe_or_or_expression, $.and_and_expression],
  • [$._maybe_and_and_expression, $.or_expression],
  • [$._maybe_or_expression, $.xor_expression],
  • [$._maybe_xor_expression, $.and_expression],
  • [$._maybe_shift_expression, $.add_expression],
  • [$._maybe_shift_expression, $.cat_expression],
  • [$.exp_initializer, $._maybe_comma_expression],
  • [$._maybe_asm_rel_exp, $.asm_shift_exp],
  • [$._maybe_asm_exp, $.asm_log_or_exp],
  • [$._maybe_asm_exp, $.asm_exp],
  • [$._maybe_asm_log_or_exp, $.asm_log_and_exp],
  • [$._maybe_asm_log_and_exp, $.asm_or_exp],
  • [$._maybe_asm_or_exp, $.asm_xor_exp],
  • [$._maybe_asm_xor_exp, $.asm_and_exp],
  • [$._maybe_asm_and_exp, $.asm_equal_exp],
  • [$._maybe_asm_equal_exp, $.asm_rel_exp],
  • [$._maybe_asm_shift_exp, $.asm_add_exp],
  • [$._maybe_asm_add_exp, $.asm_mul_exp],
  • [$._maybe_asm_mul_exp, $.asm_br_exp],
  • [$.storage_class, $.attribute, $.shared_static_constructor, $.shared_static_destructor],
  • [$.import_declaration, $.attribute],
  • [$._maybe_storage_class, $._maybe_at_attribute],
  • [$._maybe_storage_class, $._maybe_attribute],
  • [$._module_attribute, $._maybe_attribute],
  • [$._module_attribute, $._maybe_at_attribute],
  • [$._maybe_basic_type, $.primary_expression],
  • [$.type_ctors, $._maybe_in_out],
  • [$.enum_members, $._maybe_anonymous_enum_member],
  • [$._maybe_basic_type, $.basic_type],
  • [$._maybe_attribute, $.pragma_statement],
  • [$._decl_def, $._declaration, $._non_empty_statement_no_case_no_default],
  • [$.var_declarator_identifier, $.alt_declarator_identifier],
  • [$._maybe_add_expression, $.mul_expression],
  • [$.primary_expression, $.with_statement, $.symbol_tail],
  • [$.alt_declarator_inner, $.qualified_identifier, $.primary_expression],
  • [$.alt_declarator_inner, $.primary_expression],
  • [$.alt_declarator, $.qualified_identifier, $.primary_expression],
  • [$.primary_expression, $.template_value_parameter_default],
  • [$.asm_primary_exp],

Impossible to indent a 'then_statement' without block

Hi

It is impossible to indent the line below the second if in the following example:

override bool onMouseEvent(MouseEvent event) {
    bool result = super.onMouseEvent(event);
    if (event.doubleClick) {
        if (onItemDoubleClick !is null)
            onItemDoubleClick(this, selectedItemIndex);
    }
    return result;
}

Indeed this 'then_statement' has no child in the tree to distinguish it as a node to indent:

if_statement:
  expression:
    identity_expression:
      primary_expression:
        identifier:
      primary_expression:
  then_statement:
    expression_statement:
      expression:

I analyzed your grammar and I noticed that you did not allow the generation of the 'NonEmptyStatementNoCaseNoDefault' node.
It is this node that allows you to distinguish an if without a block:

IfStatement:
    if ( IfCondition ) ThenStatement
    if ( IfCondition ) ThenStatement else ElseStatement

ThenStatement:
    ScopeStatement

ScopeStatement:
    NonEmptyStatement
    BlockStatement

NonEmptyStatement:
    NonEmptyStatementNoCaseNoDefault
    CaseStatement
    CaseRangeStatement
    DefaultStatement

incorrect alias_assign instead of assign_expression

Hi,

I am testing syntax highlighting with tree-sitter under emacs (tree-sitter + tree-sitter-langs emacs packages).
I made a test with the following file: dmledit.d

I found several incorrect alias_assign declarations:

52        popupMenu = editPopupItem;
...
131        tb = res.getOrAddToolbar("Standard");
...
134        tb = res.getOrAddToolbar("Edit");
...
156        _filename = filename;
...
291            msg = replaceFirst(msg, " near `", "\nnear `");
...
337        _editor = new DMLSourceEdit();
...
368            auto widgetClassName = widgetsList.selectedItem;
...
374        _preview = new ScrollWidget();

It seems that a rule concerning alias_assign has been overlooked. As mentioned at https://dlang.org/spec/template.html#TemplateDeclaration, an 'AliasAssign' must be declared in a template:

The AliasAssign and its corresponding AliasDeclaration must both be declared in the same [TemplateDeclaration].

Upstream our grammar fixes to `dlang/dlang.org`

When tree-sitter-d does not behave correctly due to an error in the language specification, the workflow for fixing the problem is as follows:

  • Make the fix in the dlang.org fork/branch used to generate the tree-sitter grammar
  • Re-run the conversion program, re-creating the grammar.js file on the generated branch, and commit any changes plus the submodule update
  • Merge the generated branch into the master branch
  • Observe the effect of the change (test suite fallout)
  • Amend the merge commit with any necessary fix-ups, such as conflicts and test-suite updates
  • If the grammar change has the desired effect, submit it upstream.

Previously, all upstream submissions have been using a linear history. However, as the complexity of the fixes has continued to rise from trivial DDoc syntax fixes to non-trivial grammar changes, reviewing a linear history has become increasingly impractical. For this reason, individual grammar fixes ought to be submitted upstream as individual pull requests.

As our fixes are no longer contained within a single PR, this issue will be used to track the submission status of our grammar fixes:

You can help by reviewing or assisting reviewers of the upstream PRs.

Ensure top community D projects parse successfully

With #2 closed, we now successfully parse all files in DMD's compilable test suite. However, there is still lots of valid code out there which tree-sitter-d fails to parse successfully.

The following is a list of errors which currently result from attempting to parse D source files from the list of D projects used by the community project tester.

Many of these are simply due to implementation bugs in our parser, but some of these may be due to underspecified grammar. Furthermore, since the problem was not detected testing against the DMD compilable test suite, each occurrence represents an opportunity to improve the coverage of the DMD suite by adding a reduced sample of the unparseable code to it.

For each of the problems below, we should:

  • Fix it in the correct place;
  • Add a (reduced) test to our test suite, and/or:
  • If applicable, add a sample of the problem to the DMD test suite, to improve its coverage as well.

Ensure all DMD `test/compilable` tests parse successfully

The test suite for DMD (the reference D compiler) has a category for source files which are expected to compile successfully.

Accordingly, they should also parse without errors using our tree-sitter grammar. However, this is currently not the case.

Here are the files that have parse errors right now:

These should be made to parse, either by updating the spec to match the compiler's behavior, or by adding a grammar fix-up (or, in the unlikely case that the test really shouldn't compile, by updating the test suite and fixing the compiler to reject the program).
