Code Monkey home page Code Monkey logo

Comments (11)

CyberShadow avatar CyberShadow commented on August 21, 2024

Hmm, both interpretations are correct... What about:

case 1: // comment
case 2:

Here, there is no StatementList, only a LineComment.

What do you think?

How do other grammars solve this problem?

from tree-sitter-d.

WebFreak001 avatar WebFreak001 commented on August 21, 2024

in libdparse

case 1: // comment
case 2:
	foo();
	break;
case 3:
	bar();
	goto default;
default:
	baz();

would be

CaseStatement 1 []
CaseStatement 2 [foo(), BreakStatement]
CaseStatement 3 [bar(), GotoStatement]
DefaultStatement [baz()]
  • a bunch of fluff around each ast node. Basically libdparse differentiates between parseStatements and parseStatementsNoCaseNoDefault. The latter is used inside cases and defaults to stop at case/default statements.

from tree-sitter-d.

CyberShadow avatar CyberShadow commented on August 21, 2024

Well, that's not the problem - the question is what to do with the comment node.

Could this be handled by an indentation rule instead of changing the tree-sitter grammar?

from tree-sitter-d.

WebFreak001 avatar WebFreak001 commented on August 21, 2024

we changed to what Roslyn does, they describe it in a bunch of detail. Basically the lexer first lexes all the tokens, then all comment and whitespace tokens, called "trivia", are moved into adjacent "source tokens". AST nodes then contain a slice of their source tokens, which in turn reference the trivia tokens, so you can reconstruct source code exactly as it was input, mostly without caring for whitespace or comments when moving stuff around.

I don't think it's a good idea to include comments inside the AST as dedicated nodes because they can basically appear anywhere and would make AST processing a whole lot more painful.

from tree-sitter-d.

CyberShadow avatar CyberShadow commented on August 21, 2024

Thank you for the insight. This repository hosts (and this issue discusses) an implementation of a D grammar for tree-sitter. As such, we need to have comments as separate nodes in order to be able to syntax-highlight (or, here, indent) them. I may have misunderstood and I apologize if I did, but your suggestion is inapplicable to the issue at hand.

The D grammar (and therefore this tree-sitter grammar, which is generated from the D grammar) does allow comments to appear anywhere. In the D grammar this is done by describing this informally, but in tree-sitter a special feature is used ("extras"). The feature still does cause comments to appear as nodes in the tree for the reasons above.

from tree-sitter-d.

FabArd avatar FabArd commented on August 21, 2024

To define the context of the problem, we have two entities:

  • the d language and its syntax which is used by its compiler
  • tree-sitter

The objectives of each are different:

  • the syntax of the D language is to allow its compiler to generate machine code
  • tree-sitter allows the programmer to write programs to be compiled by the compiler

Both objectives are as essential as each other.

Indeed a comment is totally useless for a compiler: it will not generate any code.
On the other hand, for tree-sitter a comment is very important because it must be highlighted by the
coloring and indented so that the programmer can take it into account.

It seems to me that the problem is not the comment but where it should be placed in the AST :

As WebFreak001 mentioned each "case" should have a block even if it is empty:

CaseStatement 1 []
CaseStatement 2 [foo(), BreakStatement]
CaseStatement 3 [bar(), GotoStatement]
DefaultStatement [baz()]

So in my opinion (correct me if I'm wrong), the solution to this problem would be to systematically have a
"scope_statement_list:" node after each "case".

For the following code :

switch (X) {
    case test1: // Comment1
    case test2:
        // Comment2
    case test3:
        // Comment3;
        return true;
    default:
        return false;
}

We should have :

switch_statement:
  expression:
    primary_expression:
      identifier:
  block_statement:
    statement_list:
      case_statement:  --> case test1:
        argument_list:
          primary_expression:
            identifier:
        scope_statement_list:
          line_comment:  -- > //comment 1
      case_statement:  --> case test2:
        argument_list:
          primary_expression:
            identifier:
        scope_statement_list:
          line_comment:  -- > //comment 2
      case_statement:  --> case test3:
        argument_list:
          primary_expression:
            identifier:
        scope_statement_list:
          line_comment: -- > //comment 3
            return_statement:
              expression:
                primary_expression:
      default_statement:  --> default case
        scope_statement_list:
          return_statement:
            expression:
              primary_expression:

We have a similar problem with the 'if' without block.
But I'll open a new issue to not complicate this one.

from tree-sitter-d.

CyberShadow avatar CyberShadow commented on August 21, 2024

So in my opinion (correct me if I'm wrong), the solution to this problem would be to systematically have a
"scope_statement_list:" node after each "case".

Indeed, but I am uneasy about this approach because 1) it deviates from the formal language grammar, and 2) it changes the meaning of the code. The presence of a statement list is significant, due to things like fallthrough and case ranges. Consider e.g.:

switch (expr)
{
// Indent this where?
case 1:
// Indent this where?
..
// Indent this where?
case 3:
// Indent this where?
case 4:
// Indent this where?
    break;

// Indent this where? This could be a description of case 5 below.
case 5:

I think an indentation rule would make more sense here, would it be possible to do that instead?

from tree-sitter-d.

FabArd avatar FabArd commented on August 21, 2024

Why did you say : "it deviates from the formal language grammar" :

11.12.1 No Implicit Fall-Through

A ScopeStatementList must either be empty, or be ended with a ContinueStatement, BreakStatement,
ReturnStatement, GotoStatement, ThrowExpression or assert(0) expression unless this is the last case.

It is clearly written : "must either be empty"
so it does not changes the meaning of the code.

Logically everything that is written after the sign ":" of a case means that they belong to the scope of this case.
In other words, the scope of a "case" starts at ":" and ends at the next case, default or "}" of the switch.

In you example the indentation should be (whith an indentation length of 4):

switch (expr)
{
    // this comment belong to the switch scope
    case 1:
        // this comment belong to case 1 scope
        ..
        // this comment belong to case 1 scope also
    case 3:
        // this comment belong to case 3 scope
    case 4:
        // this comment belong to case 4 scope
        break;

        // this comment belong to case 4 scope also
    case 5:
...

You can also have :

switch (expr)
{
    case 1: //this comment belong to case 1 scope
        //this comment belong to case 1 scope also
    case 2:
    ...
}

from tree-sitter-d.

CyberShadow avatar CyberShadow commented on August 21, 2024

Because, depending on how you look at it, an empty statement list is not the same as the absence of a statement list. Otherwise, the grammar is redundant / ambiguous / erroneous in allowing both an empty ScopeStatementList and the absence of a ScopeStatementList, and we should fix it there.

from tree-sitter-d.

CyberShadow avatar CyberShadow commented on August 21, 2024

Looking at this again, the above can't work because tree-sitter does not support potentially-empty rules (rules that match the empty string) and the D grammar doesn't have any such rules either.

Honestly I don't see how it's even possible to fix it in the tree-sitter (or D) grammar. The comment rules are in extras. We can't make ScopeStatementList non-optional. We don't have a mechanism of telling tree-sitter where to put the comment nodes. The only way to fix this would be using an indentation rule. If we had a mechanism that would allow moving extras nodes like comments around after the tree is built, such a fix could go there, but there is no such mechanism.

As such, as far as I can see, this is not possible to fix by changing anything in this repository. If someone would like to contribute an initial set of indentation rules, we can look at fixing it there, but currently there is nothing to fix... so I'm going to close this issue for now.

from tree-sitter-d.

CyberShadow avatar CyberShadow commented on August 21, 2024

@FabArd Would you like to submit the work you've done for integrating this grammar into editors so far? We can then iterate to improve it here.

from tree-sitter-d.

Related Issues (10)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.