Comments (10)
there is a generic for it
I'm a little unclear on what you mean here. Are you referring to the generics in the ParseError definition, L, T, E? Or do you mean the ParseError::User field?
The ParseError generics are somewhat described here: https://docs.rs/lalrpop-util/0.12.1/lalrpop_util/enum.ParseError.html
The User field is described pretty thoroughly here.
There's of course always room for documentation improvement, but I'm not really clear on what you want to see improved here. Can you be more specific about what info you feel is missing from the docs that would be helpful?
from lalrpop.
Oh, oops. I'm talking about fully custom parse errors. Like instead of this returning ParseError::UnrecognizedToken:
let a = (3;
you can use the error recovery or something similar to make it UserError::MissedBracket.
from lalrpop.
I think I'm getting closer to understanding you. Perhaps you're talking about one of these things?
- The grammar statement at the top of a .lalrpop file can take arbitrary parameters that become part of the generated parse functions and can be accessed in the grammar. This is (among other things) used for error recovery. I do think this is fairly poorly documented right now, and could probably have improved documentation.
- The ErrorRecovery struct provided by lalrpop wraps the normal ParseError enum. The ParseError enum has a User field that can be used for custom errors to your heart's content. That sounds more like what you're looking for. But I do think it's pretty thoroughly documented here. That functionality can do what you describe in terms of returning user errors. If that's what you're looking for, then I still could use more specifics about what is missing from the linked docs.
from lalrpop.
I mean that, instead of how it's documented (errors in the action code), you could raise errors when the parser itself fails, so you can have more descriptive errors.
Also it'd be nice to report the rule name instead of a list of tokens, since it's normally more descriptive and readable for the average user than a bunch of regexes.
from lalrpop.
I've looked through the source and in the ParseError there is a generic for it,
Can you point me at what specifically in the source you're looking at? I'm not finding whatever you see here.
I mean that, instead of how it's documented (errors in the action code), you could raise errors when the parser itself fails, so you can have more descriptive errors.
So this is fundamentally a feature request for an ability for users to control ParseErrors? How do you envision this working?
Also it'd be nice to report the rule name instead of a list of tokens, since it's normally more descriptive and readable for the average user than a bunch of regexes.
Unless I'm missing something, this sounds like a separate unrelated request, in which case it should get its own issue to keep discussions straight.
from lalrpop.
Can you point me at what specifically in the source you're looking at? I'm not finding whatever you see here.
ParseError<L, T, E>
So this is fundamentally a feature request for an ability for users to control ParseErrors? How do you envision this working?
Each rule can add an attribute to the error with a more specific kind, depicting things like the rule it errored on, etc.
Unless I'm missing something, this sounds like a separate unrelated request, in which case it should get its own issue to keep discussions straight.
Alright, got it
from lalrpop.
ParseError<L, T, E>
Okay, that generic is only used in the User field, which is fully documented in the "fallible actions" section of the book I've linked a few times. It sounds like your issue is that you don't want to use the User field via action code as it is documented, you'd like to augment it with more information about the parser grammar.
Perhaps someone with more LR(1) knowledge than me will chime in with a better answer, but I think the fundamental issue around things like "which rule did it error out on?" is that the parser isn't necessarily parsing a particular rule. It has a stack of tokens encountered, and a collection of possible rules it could match. I'm guessing you're envisioning a situation like:
MyRule:
TokA TokB TokC TokD
where we've seen TokA followed by TokB, but then we see TokE instead of TokC.
The problem is that, in the general case, we could have:
MyRule:
TokA TokB TokC TokD
MyOtherRule:
TokA TokB TokF TokG
MyThirdRule:
TokA TokB TokH TokI
etc. So if we've seen TokA followed by TokB and now we see TokE, which rule are we parsing? We have been potentially parsing all three, until the unexpected token fails to match any of them. Since we don't know the rule we're parsing, it's not clear how to apply some custom rule-specific error handling, since there are three rules, which could each have their own error handling.
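To make that concrete, here is a toy sketch in Rust. It is not how LALRPOP's generated LR(1) tables work; it just treats each rule as a flat token sequence and asks which rules are still consistent with the tokens seen so far. The `Rule` struct, `RULES` table, and `candidates` function are names invented for this illustration.

```rust
/// Toy model: a rule is just a name plus a flat sequence of token names.
/// (A real LR(1) parser tracks item sets in states, not rule lists.)
struct Rule {
    name: &'static str,
    body: &'static [&'static str],
}

const RULES: &[Rule] = &[
    Rule { name: "MyRule", body: &["TokA", "TokB", "TokC", "TokD"] },
    Rule { name: "MyOtherRule", body: &["TokA", "TokB", "TokF", "TokG"] },
    Rule { name: "MyThirdRule", body: &["TokA", "TokB", "TokH", "TokI"] },
];

/// Returns every rule whose body starts with `seen`, paired with the
/// token that rule would expect next.
fn candidates(seen: &[&str]) -> Vec<(&'static str, &'static str)> {
    RULES
        .iter()
        .filter(|r| r.body.len() > seen.len() && r.body[..seen.len()] == *seen)
        .map(|r| (r.name, r.body[seen.len()]))
        .collect()
}
```

With `["TokA", "TokB"]` as the prefix, `candidates` returns all three rules (expecting TokC, TokF and TokH respectively). Once TokE is appended it returns nothing, so at the moment of failure there is no single rule to attach a rule-specific error to.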
I suspect the cleanest answer here is to augment your parser so that you have cases for the expected cases you would like to have special errors for, and then use the fallible actions functionality to make special errors there.
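As a rough sketch of that idea, written as plain Rust rather than a .lalrpop grammar: the parser gets an explicit case for the failure it wants to name ("opening bracket and a number, but no closing bracket") and returns a user error from that case, much like a fallible action (`=>?`) would. Everything here is an assumption for illustration: `ParseError` is a cut-down stand-in for `lalrpop_util::ParseError` (the real enum has more variants and fields), `UserError::MissedBracket` is the name suggested earlier in this thread, and `parse_paren_num` works on pre-split tokens.

```rust
// Simplified stand-in for lalrpop_util::ParseError<L, T, E>.
// Only the shape of `User { error }` is meant to match the real enum.
#[derive(Debug, PartialEq)]
enum ParseError<E> {
    UnrecognizedToken { expected: Vec<String> },
    User { error: E },
}

// Hypothetical user error type, named after the example in this thread.
#[derive(Debug, PartialEq)]
enum UserError {
    MissedBracket,
}

fn number(tok: &str) -> Result<i64, ParseError<UserError>> {
    tok.parse::<i64>().map_err(|_| ParseError::UnrecognizedToken {
        expected: vec!["number".to_string()],
    })
}

/// Hand-written equivalent of a rule with two alternatives:
///   "(" Num ")"  succeeds with the number
///   "(" Num      is the anticipated mistake and yields a user error
fn parse_paren_num(tokens: &[&str]) -> Result<i64, ParseError<UserError>> {
    match tokens {
        ["(", n, ")"] => number(n),
        // Anticipated failure: bracket opened, number read, no ")" follows.
        ["(", n, ..] => {
            number(n)?;
            Err(ParseError::User { error: UserError::MissedBracket })
        }
        // Anything else falls back to the generic error.
        _ => Err(ParseError::UnrecognizedToken {
            expected: vec!["(".to_string()],
        }),
    }
}
```

For the tokens `(`, `3`, `;` this returns the `MissedBracket` user error instead of a generic unrecognized-token error. The cost is that every special error needs its own case in the parser.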
from lalrpop.
This conversation reminded me of http://moscova.inria.fr/~fpottier/menhir/manual.pdf#section.11. There has been some cool work on Menhir which I think addresses this issue. Maybe there is a simplified version of what they do, though (some statically known error messages which over-approximate the set of possible errors that lead to them). I would imagine this would be a fair amount of work.
from lalrpop.
How would I do what @emm312 suggested, which is supposed to be possible:
you can use the error recovery or something similar to make it UserError::MissedBracket.
In this code for example:
DefWindow: Statement = {
"(defwindow" <n:Symbol> <a:DefWindowArg*> ")" => {
DefWindow::new(n, a).into()
},
}
I want to detect if the name (n:Symbol) failed, and somehow get a custom error (one that can be recovered from, so that other "defwindow" forms can continue to be validated) saying that the name is invalid or missing.
I tried creating a "DefWindowName" rule:
DefWindowName: Result<Symbol, crate::ast::ParseError> = {
Symbol => Ok(<>),
! => {
println!("{:?}", <>);
},
};
But the issue here is that it gets called multiple times. Take this input: (defwindow :geometry "x"). It would get called once for the actual problem token, :, and once for the "x".
I'm assuming this is because LALRPOP considers the error to be recovered from in DefWindowName (but that doesn't really explain to me why it tries to parse another name...). Thus, I also tried to return it like this:
DefWindowName: Result<Symbol, crate::ast::ParseError> = {
Symbol => Ok(<>),
! =>? {
Err(ParseError::User { error: MyError })
},
};
But the issue with that code is that I can't recover from it in the parent, e.g. DefWindow, and the whole parsing exits.
from lalrpop.
Here is how we do it in Nickel, not sure if that would fit your use-case, but it does seem so:
- We have an error node in the AST. So that we can recover from a local error (for example in the LSP), and still continue to parse other parts of the source file: https://github.com/tweag/nickel/blob/d899cf5848e6ac0958062376696aa39389de580e/core/src/term/mod.rs#L220-L223
- In the grammar, instead of bailing out on the first error - as you said, this would abort the parsing, which we don't want - we have a global parameter where we put all the errors we encountered: https://github.com/tweag/nickel/blob/d899cf5848e6ac0958062376696aa39389de580e/core/src/parser/grammar.lalrpop#L75
- Whenever we encounter an error, we push the error to the global error list, and return an AST error node without using a fallible action (so, from the point of view of LALRPOP, there's in fact no error at all): https://github.com/tweag/nickel/blob/d899cf5848e6ac0958062376696aa39389de580e/core/src/parser/grammar.lalrpop#L438-L447
Here is an example of using this Error rule: https://github.com/tweag/nickel/blob/d899cf5848e6ac0958062376696aa39389de580e/core/src/parser/grammar.lalrpop#L249-L251
Doing so, you accumulate all the errors, and produce a valid AST as well. Then you can decide what to do with it. In theory you don't even need to accumulate the errors, since you can reconstruct the list by walking the AST. However, if the AST is huge this can be costly (and there might not even be a single error), so it's easier to just keep a duplicate list that is easily accessible.
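The pattern can be sketched outside of LALRPOP in a few lines of Rust. This is not Nickel's code, just a minimal illustration of the same shape: an `Error` variant in the AST, plus a shared error list that plays the role of the grammar parameter. The `Node`, `MyError`, `parse_form` and `parse_all` names are made up for this example, and forms are assumed to be already split into tokens.

```rust
#[derive(Debug, PartialEq)]
enum Node {
    DefWindow { name: String },
    // Placeholder left in the tree wherever a form could not be parsed.
    Error,
}

#[derive(Debug, PartialEq)]
enum MyError {
    InvalidName { found: String },
}

fn is_symbol(s: &str) -> bool {
    !s.is_empty() && s.chars().all(|c| c.is_alphanumeric() || c == '-' || c == '_')
}

/// Parses one form. On failure it records the problem in `errors` and
/// still returns a node, so the caller never sees a hard parse error.
fn parse_form(form: &[&str], errors: &mut Vec<MyError>) -> Node {
    match form {
        ["defwindow", name, ..] if is_symbol(name) => Node::DefWindow {
            name: name.to_string(),
        },
        _ => {
            let found = form.get(1).copied().unwrap_or("<nothing>").to_string();
            errors.push(MyError::InvalidName { found });
            Node::Error
        }
    }
}

/// Parses every form, returning both the (always complete) AST and the
/// accumulated error list.
fn parse_all(forms: &[&[&str]]) -> (Vec<Node>, Vec<MyError>) {
    let mut errors = Vec::new();
    let nodes = forms.iter().map(|f| parse_form(f, &mut errors)).collect();
    (nodes, errors)
}
```

Given a first form `defwindow :geometry "x"` and a second form `defwindow main`, `parse_all` yields `Node::Error` followed by a valid `DefWindow`, with one `InvalidName` entry in the error list. A bad name in one form does not stop later forms from being checked.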
from lalrpop.