Comments (4)
Basically, when there's an error - use indentation level as a heuristic to resume at the next declaration.
Yeah, Tree-sitter doesn't do this. I don't think that would be a helpful approach in this particular case anyway, because the error would not be detected until the middle of the second declaration, unless I'm missing something.
from tree-sitter-haskell.
Hey @dpren, it'd be great to have another set of 👀 on this parser. Glad you're trying it out!
Could you explain the behavior you expect in a bit more detail? A better error recovery?
It looks like right now, the parser can't detect an error until the end of file, because it sees the second `=` token as a valid `variable_operator`.
@rewinfrey Is it correct that a single `=` can be used as an operator? I think we might want to restrict the definition of `variable_symbol` so that this won't happen.
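For reference, the Haskell 2010 report lists `=` among the reserved operators, so it is not a valid user-definable operator on its own. Here is a minimal Python sketch of that lexical check. The helper name `is_variable_operator` is hypothetical and this is not the grammar's actual rule; it also assumes ASCII symbols only and ignores Unicode operators.

```python
# Reserved operators per the Haskell 2010 report (lexical structure).
RESERVED_OPS = {"..", ":", "::", "=", "\\", "|", "<-", "->", "@", "~", "=>"}

# ASCII symbol characters that may appear in an operator.
SYMBOL_CHARS = set("!#$%&*+./<=>?@\\^|-~:")


def is_variable_operator(token: str) -> bool:
    """Return True if `token` could be a user-definable variable operator.

    Hypothetical helper for illustration; ASCII-only simplification.
    """
    if not token or any(ch not in SYMBOL_CHARS for ch in token):
        return False
    if token.startswith(":"):
        # Operators starting with ':' are constructor operators.
        return False
    if token in RESERVED_OPS:
        # Reserved operators such as a lone '=' are excluded.
        return False
    if len(token) >= 2 and set(token) == {"-"}:
        # Two or more dashes begin a line comment, not an operator.
        return False
    return True
```

With a restriction like this, `==` and `>>=` still count as operators, while a lone `=` does not.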
If I hack that change in, then the parser can detect the error a bit earlier, at the second `=` instead of at the EOF. We then get a better recovery:
(module [0, 0] - [3, 0]
  (function_declaration [0, 0] - [2, 9]
    (variable_identifier [0, 0] - [0, 5])
    (ERROR [0, 6] - [2, 5]
      (function_application [0, 9] - [2, 5]
        (parenthesized_expression [0, 9] - [0, 12]
          (integer [0, 10] - [0, 11]))
        (variable_identifier [2, 0] - [2, 5])))
    (function_body [2, 6] - [2, 9]
      (integer [2, 8] - [2, 9]))))
Basically, we treat the `= ((1) test` as an error and parse the remainder as a single declaration. That's now a pretty reasonable recovery, seeing as how the error cannot theoretically be detected until the second `=`.
@dpren For some context on Tree-sitter's error recovery algorithm: it currently works by skipping one or more tokens and inserting at most one token, in some sequence that overlaps or is adjacent to the error detection point. Tree-sitter tries several ways of recovering from the error and chooses the one that ends up minimizing a pre-defined 'cost' metric.
Because it uses LR parsing, Tree-sitter always detects an error at the earliest possible point in the file, which in this case is the second `=` sign. So it can only perform sequences of skipping / insertion operations that touch that token. That's why it can't simply skip the redundant open-paren or insert a closing paren to match it: those operations are not adjacent to the point of error detection, so they would be prohibitively expensive to consider.
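To make the idea concrete, here is a toy Python sketch of cost-based recovery. It is not Tree-sitter's implementation: the cost constants, the `recover` function, and the `balanced` stand-in for a parser are all assumptions made for illustration. It tries skipping a short run of tokens that touches the error point, optionally inserts one token, and keeps the cheapest candidate that the toy parser accepts.

```python
from typing import Callable, List, Optional, Sequence, Tuple

SKIP_COST = 1    # assumed cost per skipped token
INSERT_COST = 2  # assumed cost for the single inserted token


def balanced(tokens: Sequence[str]) -> bool:
    """Toy stand-in for a parser: accept only balanced parentheses."""
    depth = 0
    for tok in tokens:
        if tok == "(":
            depth += 1
        elif tok == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0


def recover(
    tokens: List[str],
    error_at: int,
    accepts: Callable[[Sequence[str]], bool] = balanced,
    insertable: Sequence[str] = ("(", ")"),
    max_skip: int = 3,
) -> Optional[Tuple[int, List[str]]]:
    """Return (cost, repaired_tokens) for the cheapest repair, or None.

    Only windows [start, end) that contain or touch `error_at` are tried.
    """
    best: Optional[Tuple[int, List[str]]] = None
    for start in range(max(0, error_at - max_skip), error_at + 1):
        last_end = min(len(tokens), start + max_skip)
        for end in range(error_at, last_end + 1):
            for ins in (None, *insertable):
                if end == start and ins is None:
                    continue  # neither skipping nor inserting: not a repair
                inserted = [] if ins is None else [ins]
                candidate = tokens[:start] + inserted + tokens[end:]
                if not accepts(candidate):
                    continue
                cost = SKIP_COST * (end - start)
                if ins is not None:
                    cost += INSERT_COST
                if best is None or cost < best[0]:
                    best = (cost, candidate)
    return best
```

For `["(", "1", ")", ")"]` with the error at index 3, the cheapest repair skips a single `)`. For `["(", "(", "(", "(", "1"]` with the error at the end of input, no repair near the error point works, so the function returns `None`; the fix would have to happen far from where the error was detected.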
@maxbrunsfeld Thanks for the detailed reply! I'm looking for better recovery. For context, I'm considering tree-sitter for an IDE-focused parser for the Elm language.
The last section, "Recovery", of this HN comment sums up what I have in mind: https://news.ycombinator.com/item?id=13918175
Basically, when there's an error - use indentation level as a heuristic to resume at the next declaration.
I'm guessing tree-sitter can't support this sort of thing? I could always re-parse sections manually, but at that point I wonder if I should just be using a different tool or going handwritten.
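As a rough illustration of the "re-parse sections manually" option, here is a minimal Python sketch that uses indentation to split a file into top-level chunks, so each chunk could be handed to a parser on its own and an error in one would not swallow the next. The function name `split_declarations` is hypothetical, and the heuristic is an assumption: any non-blank line starting in column 0 begins a new declaration.

```python
from typing import List, Tuple


def split_declarations(source: str) -> List[Tuple[int, str]]:
    """Split `source` into chunks that each begin at a column-0 line.

    Returns (start_line, text) pairs with 0-based line numbers.
    Blank and indented lines are attached to the preceding chunk.
    """
    chunks: List[Tuple[int, str]] = []
    current: List[str] = []
    start = 0
    for i, line in enumerate(source.splitlines()):
        # A declaration starts at a non-empty line with no leading whitespace.
        starts_decl = bool(line) and not line[0].isspace()
        if starts_decl and current:
            chunks.append((start, "\n".join(current)))
            current = []
        if not current:
            start = i
        current.append(line)
    if current:
        chunks.append((start, "\n".join(current)))
    return chunks
```

On a made-up input such as `"test1 = ((1)\n\ntest2 = 2\n"`, this yields one chunk starting at line 0 and another at line 2, so the unclosed paren stays confined to the first chunk. The heuristic is crude: column-0 comments, type signatures, and each clause of a multi-clause function all become separate chunks, so a real tool would need to merge or filter them.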