Code Monkey home page Code Monkey logo

Comments (10)

dburgener avatar dburgener commented on June 11, 2024

there is a generic for it

I'm a little unclear on what you mean here. Are you referring to the generics in the ParseError definition, L, T, E? Or do you mean the ParseError::User field?

The ParseError generics are somewhat described here: https://docs.rs/lalrpop-util/0.12.1/lalrpop_util/enum.ParseError.html

The User field is described pretty thoroughly here.

There's of course always room for documentation improvement, but I'm not really clear on what you want to see improved here. Can you be more specific about what info you feel is missing from the docs that would be helpful?

from lalrpop.

emm312 avatar emm312 commented on June 11, 2024

Oh, oops. I'm talking about fully custom parse errors. Like instead of this returning ParseError::UnrecognisedToken:

let a = (3;

you can use the error recovery or something similar to make it UserError::MissedBracket.

from lalrpop.

dburgener avatar dburgener commented on June 11, 2024

I think I'm getting closer to understanding you. Perhaps you're talking about one of these things?

  • The grammar statement at the top of a .lalrpop file can take arbitrary parameters, that become part of the generated parse functions, and can be accessed in the grammar. This is (among other things) used for error recovery. I do think this is fairly poorly documented right now, and could probably have improved documentation.
  • The ErrorRecovery struct provided by lalrpop wraps the normal ParseError enum. The ParseError enum has a User field that can be used for custom errors to your hearts content. That sounds more like what you're looking for. But I do think it's pretty thoroughly documented here. That functionality can do what you describe in terms of returning user errors. If that's what you're looking for, then I still could use more specifics about what is missing from the linked docs.

from lalrpop.

emm312 avatar emm312 commented on June 11, 2024

I mean like instead of how its documented (errors on the action code) you can make errors if the parser fails, so you can have more descriptive errors.
Also it'd be nice to say the rule name instead of a list of tokens, since its normally more descriptive and readable for the average user over a bunch of regex-s.

from lalrpop.

dburgener avatar dburgener commented on June 11, 2024

Ive looked through the source and in the ParseError there is a generic for it,

Can you point me at what specifically in the source you're looking at? I'm not finding whatever you see here.

I mean like instead of how its documented (errors on the action code) you can make errors if the parser fails, so you can have more descriptive errors.

So this is fundamentally a feature request for an ability for users to control ParseErrors? How do you envision this working?

Also it'd be nice to say the rule name instead of a list of tokens, since its normally more descriptive and readable for the average user over a bunch of regex-s.

Unless I'm missing something, this sounds like a separate unrelated request, in which case it should get its own issue to keep discussions straight.

from lalrpop.

emm312 avatar emm312 commented on June 11, 2024

Can you point me at what specifically in the source you're looking at? I'm not finding whatever you see here.

ParseError<L, T, E>

So this is fundamentally a feature request for an ability for users to control ParseErrors? How do you envision this working?

Each rule can add an attribute to the error with a more specific kind, depicting things like the rule it errored on, etc.

Unless I'm missing something, this sounds like a separate unrelated request, in which case it should get its own issue to keep discussions straight.

Alright, got it

from lalrpop.

dburgener avatar dburgener commented on June 11, 2024

ParseError<L, T, E>

Okay, that generic is only used in the User field, which is fully documented in the "fallible actions" section of the book I've linked a few times. It sounds like your issue is that you don't want to use the User field via action code as it is documented, you'd like to augment it with more information about the parser grammar.

Perhaps someone with more LR(1) knowledge than me will chime in with a better answer, but I think the fundamental issue around things like "which rule did it error out on?" is that the parser isn't necessarily parsing a particular rule. It has a stack of tokens encountered, and a collection of possible rules it could match. I'm guessing you're envisioning a situation like:

MyRule:
    TokA TokB TokC TokD

where we've seen TokA followed by TokB, but then we see TokE instead of TokC

The problem is that in the general case, could have:

MyRule:
    TokA TokB TokC TokD

MyOtherRule:
    TokA TokB TokF TokG

MyThirdRule:
    TokA TokB TokH TokI

etc. So if we've seen TokA followed by TokB and now we see TokE, which rule are we parsing? We have been potentially parsing all three, until the unexpected token fails to match any of them. Since we don't know the rule we're parsing, it's not clear how to apply some custom rule-specific error handling, since there are three rules, which could each have their own error handling.

I suspect the cleanest answer here is to augment your parser so that you have cases for the expected cases you would like to have special errors for, and then use the fallible actions functionality to make special errors there.

from lalrpop.

Pat-Lafon avatar Pat-Lafon commented on June 11, 2024

This conversation reminded me of http://moscova.inria.fr/~fpottier/menhir/manual.pdf#section.11. There has been some cool work on menhir which I think addresses this issue. Maybe there is a simplified version of what they do though(Some staticly known error messages which over approximate the set of possible errors that lead to it). I would imagine this would be a fair amount of work.

from lalrpop.

Elias-Graf avatar Elias-Graf commented on June 11, 2024

How would I do what @emm312 suggested, and is supposed to be possible:

you can use the error recovery or something similar to make it UserError::MissedBracket.

In this code for example:

DefWindow: Statement = {
    "(defwindow" <n:Symbol> <a:DefWindowArg*> ")" => {
        DefWindow::new(n, a).into()
    },
}

I want to detect if the name (n:Symbol) failed, and somehow get a custom error (one that could be recovered from so other "defwindow" could be contined to be validated), that the name is invalid / missing.

I tried creating a "DefWindowName" rule:

DefWindowName: Result<Symbol, crate::ast::ParseError> = {
    Symbol => Ok(<>),
    ! => {
        println!("{:?}", <>);
    },
};

But the issue here is, that it get's called multiple times. Let's take this input: (defwindow :geometry "x")". It would get called once for the actual problem token :, and once for the "x".
I'm assuming this is, because larlpop considers the error to be recovered from in the DefWindowName (but doesn't really explain to me why it ties to parse another name...). Thus, I also tried to return it like this:

DefWindowName: Result<Symbol, crate::ast::ParseError> = {
    Symbol => Ok(<>),
    ! =>? {
        Err(ParseError::User { error: MyError }
    },
};

But the issue with that code is, that I can't recover from it in the parent, e.g. DefWindow, and the whole parsing exits.

from lalrpop.

yannham avatar yannham commented on June 11, 2024

Here is how we do it in Nickel, not sure if that would fit your use-case, but it does seem so:

  1. We have an error node in the AST. So that we can recover from a local error (for example in the LSP), and still continue to parse other parts of the source file: https://github.com/tweag/nickel/blob/d899cf5848e6ac0958062376696aa39389de580e/core/src/term/mod.rs#L220-L223
  2. In the grammar, instead of bailing out on the first error - as you said, this would abort the parsing, which we don't want - we have a global parameter where we put all the errors we encountered: https://github.com/tweag/nickel/blob/d899cf5848e6ac0958062376696aa39389de580e/core/src/parser/grammar.lalrpop#L75
  3. Whenever we encounter an error, we push the error to the global error lists, and return an AST error node without using a fallible action (so, from the point of view of LALRPOP, there's in fact to error at all): https://github.com/tweag/nickel/blob/d899cf5848e6ac0958062376696aa39389de580e/core/src/parser/grammar.lalrpop#L438-L447

Here is an example of using this Error rule: https://github.com/tweag/nickel/blob/d899cf5848e6ac0958062376696aa39389de580e/core/src/parser/grammar.lalrpop#L249-L251

Doing so, you accumulate all the errors, and produce a valid AST as well. Then you can decide what to do with it. In theory you don't even need to accumulate the errors and can reconstruct the list from walking the AST, however if the AST is huge this can be costly to do so (and there might not even be a single error), so it's easier to just keep a duplicate list that is easily accessible.

from lalrpop.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.