
Comments (4)

maxbrunsfeld commented on June 12, 2024

> Basically, when there's an error - use indentation level as a heuristic to resume at the next declaration.

Yeah, Tree-sitter doesn't do this. I don't think that would be a helpful approach in this particular case anyway, because the error would not be detected until the middle of the second declaration, unless I'm missing something.

from tree-sitter-haskell.

maxbrunsfeld commented on June 12, 2024

Hey @dpren, it'd be great to have another set of 👀 on this parser. Glad you're trying it out!

Could you explain the behavior you expect in a bit more detail? Are you looking for better error recovery?

It looks like right now, the parser can't detect an error until the end of file, because it sees the second = token as a valid variable_operator.

@rewinfrey Is it correct that a single = can be used as an operator? I think we might want to restrict the definition of variable_symbol so that this won't happen.
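To make the suggested restriction concrete, here is a minimal sketch in Python. It is not the grammar's actual definition of variable_symbol; it only assumes the lexical rules of the Haskell 2010 report, in which `=` is a reserved operator and therefore not a valid variable operator by itself. The function name `is_variable_operator` is hypothetical, and Unicode symbols are ignored for simplicity.

```python
# ASCII symbol characters that may appear in a Haskell operator.
ASCII_SYMBOLS = set("!#$%&*+./<=>?@\\^|-~:")

# Reserved operators from the Haskell 2010 report; none of these is a
# valid variable operator on its own.
RESERVED_OPS = {"..", ":", "::", "=", "\\", "|", "<-", "->", "@", "~", "=>"}


def is_variable_operator(text):
    """Return True if `text` could be a variable operator (simplified check)."""
    if not text or any(ch not in ASCII_SYMBOLS for ch in text):
        return False
    if text[0] == ":":
        # Operators starting with a colon are constructor operators.
        return False
    if text in RESERVED_OPS:
        return False
    if len(text) >= 2 and set(text) == {"-"}:
        # Two or more dashes start a line comment.
        return False
    return True
```

With a check like this, a lone `=` is rejected while `==` or `<$>` is still accepted, which is the property that lets the parser notice the problem at the second = rather than at the end of the file.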

If I hack that change in, then the parser can detect the error a bit earlier, at the second = instead of at the EOF. We then get a better recovery:

(module [0, 0] - [3, 0]
  (function_declaration [0, 0] - [2, 9]
    (variable_identifier [0, 0] - [0, 5])
    (ERROR [0, 6] - [2, 5]
      (function_application [0, 9] - [2, 5]
        (parenthesized_expression [0, 9] - [0, 12]
          (integer [0, 10] - [0, 11]))
        (variable_identifier [2, 0] - [2, 5])))
    (function_body [2, 6] - [2, 9]
      (integer [2, 8] - [2, 9]))))

Basically, we treat the = ((1) test as an error and parse the remainder as a single declaration. That's now a pretty reasonable recovery, seeing as how the error cannot theoretically be detected until the second =.


maxbrunsfeld commented on June 12, 2024

@dpren For some context on Tree-sitter's error recovery algorithm: it currently works by skipping one or more tokens and inserting at most one token, in some sequence that overlaps or is adjacent to the error detection point. Tree-sitter tries several ways of recovering from the error and chooses the one that minimizes a pre-defined 'cost' metric.

Because it uses LR parsing, Tree-sitter always detects an error at the earliest possible point in the file, which in this case is the second = sign. So it can only perform sequences of skipping / insertion operations that touch that token. That's why it can't simply skip the redundant open-paren or insert a closing paren to match it: those operations are not adjacent to the point of error detection, so they would be prohibitively expensive to consider.
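As a rough illustration of the idea described above, here is a toy sketch in Python. It is not Tree-sitter's implementation: the "language" is just balanced parentheses, and the names `error_point` and `recover`, the cost constants, and the `max_skip` window are all assumptions made for the example. It tries every repair that skips a run of tokens touching the error detection point and inserts at most one token, then keeps the cheapest repair that yields valid input.

```python
SKIP_COST = 1    # assumed cost per skipped token
INSERT_COST = 2  # assumed cost of the single inserted token


def error_point(tokens):
    """Index where a left-to-right scan first fails; len(tokens) means EOF.

    Returns None when the parentheses are balanced.
    """
    depth = 0
    for i, tok in enumerate(tokens):
        if tok == "(":
            depth += 1
        elif tok == ")":
            if depth == 0:
                return i
            depth -= 1
    return len(tokens) if depth else None


def recover(tokens, max_skip=3):
    """Return (cost, repaired_tokens) for the cheapest repair, or None."""
    tokens = list(tokens)
    e = error_point(tokens)
    if e is None:
        return 0, tokens
    best = None
    # The skipped window [start, end) must overlap or touch the error point.
    for start in range(max(0, e - max_skip), e + 1):
        for end in range(e, min(len(tokens), start + max_skip) + 1):
            for inserted in (None, "(", ")"):
                extra = [inserted] if inserted else []
                candidate = tokens[:start] + extra + tokens[end:]
                if error_point(candidate) is not None:
                    continue  # this repair does not produce valid input
                cost = SKIP_COST * (end - start) + (INSERT_COST if inserted else 0)
                if best is None or cost < best[0]:
                    best = (cost, candidate)
    return best
```

For the input `( ( 1 )`, the error is only detected at end of input, so the sketch inserts a closing paren there at cost 2. Skipping one of the opening parens would cost only 1, but that repair is never tried because it does not touch the detection point, which is the same limitation described above.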


dpren commented on June 12, 2024

@maxbrunsfeld Thanks for the detailed reply! I'm looking for better recovery. For context, I'm considering tree-sitter for an IDE-focused parser for the Elm language.

The last section, "Recovery", of this HN comment sums up what I have in mind: https://news.ycombinator.com/item?id=13918175

Basically, when there's an error - use indentation level as a heuristic to resume at the next declaration.

I'm guessing tree-sitter can't support this sort of thing? I could always re-parse sections manually, but at that point I wonder if I should just be using a different tool or going handwritten.
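For illustration, the manual re-parsing approach could look roughly like the Python sketch below. It is independent of Tree-sitter and makes two loud assumptions: every top-level declaration starts on a non-blank line at column 0, and a paren-balance check (`parens_balanced`) stands in for a real per-declaration parser. All function names are hypothetical.

```python
def split_declarations(source):
    """Split source into (start_line, text) chunks at column-0 lines."""
    chunks = []
    for number, line in enumerate(source.splitlines()):
        starts_decl = bool(line) and not line[0].isspace()
        if starts_decl or not chunks:
            chunks.append((number, [line]))
        else:
            # Blank or indented lines belong to the current declaration.
            chunks[-1][1].append(line)
    return [(start, "\n".join(lines)) for start, lines in chunks]


def parens_balanced(text):
    """Stand-in for a real parser: only checks that parentheses balance."""
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0


def check(source):
    """Check each declaration separately so one error cannot spread."""
    return [(start, parens_balanced(chunk))
            for start, chunk in split_declarations(source)]
```

On an input such as `"test1 = ((1)\n\ntest2 = 2\n"`, the first declaration is flagged as broken and the second is still checked cleanly, because checking resumes at the next line that starts at column 0.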
