
Comments (11)

mrkkrp commented on September 28, 2024

"a:x,b:err or" resulted in a better error message because the sepEndBy1 combinator did not match the separator "," after fully consuming the second key-value pair. This way sepEndBy1 as a whole succeeded and then you probably have eof somewhere, which is actually what produces the error that you see.

from megaparsec.

bristermitten commented on September 28, 2024

I wrote my own combinator

sepEndBy' :: Parser a -> Parser sep -> Parser [a]
sepEndBy' p sep = do
    x <- try p
    xs <- many (try (sep >> p))
    pure (x : xs)

and it seems to work, but I'm not sure whether this is the optimal way to do it.


mrkkrp commented on September 28, 2024

Can you show declaration? I think the issue here is that when declaration fails it completely backtracks (I don't know why, do you wrap the whole thing with a try by chance?) What you want is to "commit" early enough to parsing a declaration, that is, you need your declaration parser to consume input so that it cannot backtrack anymore. This way sepEndBy won't be able to succeed in the way you are observing and you should get a more helpful and localized error message. Your custom combinator sepEndBy' forces at least one occurrence of p (in that it is similar to sepEndBy1) and so it appears to solve the problem for the first declaration in your input, however you may find that the second and later declarations still suffer from the same problem.
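To make the difference between committing and backtracking concrete, here is a minimal sketch. It uses a hypothetical toy grammar ("def <name>;"), not the actual declaration parser from this thread; the Parser alias, the decl parser and the firstErrorOffset helper are all assumptions made for illustration.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main (main) where

import qualified Data.List.NonEmpty as NE
import Data.Void (Void)
import Text.Megaparsec
import Text.Megaparsec.Char (char, letterChar)

type Parser = Parsec Void String

-- Hypothetical toy declaration "def <name>;" (an assumption, not the Elara grammar).
decl :: Parser String
decl = do
    _ <- "def "               -- once the keyword is consumed, this branch is committed
    name <- some letterChar
    _ <- char ';'
    pure name

-- decl fails after consuming input, so many cannot quietly stop early.
committed :: Parser [String]
committed = many decl <* eof

-- try rewinds the failed decl, many then succeeds, and eof reports the error instead.
backtracking :: Parser [String]
backtracking = many (try decl) <* eof

-- Offset of the first reported error, or Nothing if the parse succeeded.
firstErrorOffset :: Parser a -> String -> Maybe Int
firstErrorOffset p input =
    case parse p "" input of
        Left bundle -> Just (errorOffset (NE.head (bundleErrors bundle)))
        Right _ -> Nothing

main :: IO ()
main = do
    let input = "def a;def 1;" :: String
    print (firstErrorOffset committed input)
    print (firstErrorOffset backtracking input)
```

With the committed version the error should point at the malformed name in the second declaration, while the version wrapped in try should report it at the start of that declaration, where eof was expected.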


bristermitten commented on September 28, 2024

That explanation makes sense, thanks for your response. How could I make it commit?

declaration currently looks like this:

declaration :: ModuleName -> Parser (Declaration Frontend)
declaration = liftA2 (<|>) defDec letDec

defDec :: ModuleName -> Parser (Declaration Frontend)
defDec modName = do
  symbol "def"
  name <- NVarName <$> lexeme varName
  symbol ":"
  ty <- type'
  let annotation = TypeAnnotation name ty
  let declBody = ValueTypeDef (Just annotation)
  pure (Declaration modName name declBody)

letDec :: ModuleName -> Parser (Declaration Frontend)
letDec modName = do
  (name, patterns, e) <- L.nonIndented sc letRaw
  let value = Value e patterns Nothing
  pure (Declaration modName (NVarName name) value)

letRaw :: Parser (MaybeQualified VarName, [Frontend.Pattern], Frontend.Expr)
letRaw = do
    ((name, patterns), e) <- optionallyIndented letPreamble element
    pure (name, patterns, e)
  where
    letPreamble = do
        symbol "let"
        name <- lexeme varName
        patterns <- sepBy (lexeme pattern') sc
        symbol "="
        pure (name, patterns)

This allows the syntax def <name> : <type> and let <name> = <body>, with some messy indentation handling ;)
Let me know if you need to see any more code. Thanks!


mitchellwrosen commented on September 28, 2024

@knightzmc Check out the headed-megaparsec package for some inspiration. The technique used there, which I quite like and often copy around to various projects, is to have a parser either return a value as normal (with the usual alternative semantics) or return another parser (which effectively "commits" to the current branch, and returns the parser to run on the remainder of the input).

However, I do wonder whether another package is even necessary. Could megaparsec itself just expose a commit/cut combinator which throws away the failure continuations of a parser? (Or perhaps there's a different way of committing that's already present?)


bristermitten commented on September 28, 2024

@mitchellwrosen I had a look at that package, seems to solve my problem! But I agree that there should be an easier way of doing this which doesn't require external libraries (and a lot of boilerplate converting to and from Parsec and HeadedParsec everywhere)


mrkkrp commented on September 28, 2024

From what I see in your code it should work as it is, but it is still incomplete, so I cannot be sure. @knightzmc Can you provide a repository with complete source code and an example input? I could then give it a try and it would perhaps be clearer what is going on there.


BlueNebulaDev commented on September 28, 2024

I'm having a similar issue.

I'd like to parse either a comma-separated list of key-value pairs (each key separated from its value by :), or a single value. For instance:

  • a:x,b:y should be parsed as a list of two key-value pairs [("a","x"), ("b","y")].
  • x should be parsed as the single value "x".

Here is a minimal, simplified grammar. It doesn't support whitespace, and both keys and values can only be identifiers. But it's enough to show the issue.

type KeyValue = (String, String)
data Val = KeyValList [KeyValue] | Val String deriving Show

ident :: Parser String
ident = some letterChar

keyVal :: Parser KeyValue
keyVal = do
    k <- ident
    ":"
    v <- ident
    pure (k, v)

prog :: Parser Val
prog = try (KeyValList <$> keyVal `sepEndBy1` ",") <|> (Val <$> ident)

This correctly handles the two examples I showed above, and reports the correct error for invalid inputs like "a:x,b:err or" or "err or".

However, the error it reports for an input like "a:x,b:y,c" or "a:x,b:y,err or,c:z" is not what I would like to see:

Input: "a:x,b:y"
Output: KeyValList [("a","x"),("b","y")]
✅

Input: "x"
Output: Val "x"
✅

Input: "a:x,b:err or"
1:10:
  |
1 | a:x,b:err or
  |          ^
unexpected space
expecting ',', end of input, or letter
✅


Input: "err or"
1:4:
  |
1 | err or
  |    ^
unexpected space
expecting end of input or letter
✅


Input: "a:x,b:y,c"
1:2:
  |
1 | a:x,b:y,c
  |  ^
unexpected ':'
expecting end of input or letter
❌

Input: "a:x,b:y,err or,c:z"
1:2:
  |
1 | a:x,b:y,err or,c:z
  |  ^
unexpected ':'
expecting end of input or letter
❌

When the input is "a:x,b:y,c" or "a:x,b:y,err or,c:z", I would like to see an error after the last , that parsed correctly.

Is there any way to get Megaparsec to report the error I wish to see?


mrkkrp commented on September 28, 2024

@BlueNebulaDev with your definition of keyVal it cannot backtrack, since once it consumes at least one letter of ident it is already committed. Therefore somewhere after that final , it indeed fails, like you expect, but because you also have try around (KeyValList <$> keyVal `sepEndBy1` ",") that whole part backtracks to the very beginning of the input. Next, the Val <$> ident alternative gets a chance to run: it consumes the leading a, and parsing then stops at the :, which is the error that you observe. You need to remove try from your definition of prog and instead think about when it makes sense for keyVal to commit. It looks like as soon as we see : we can be sure that it is a key-value pair, so:

keyVal :: Parser KeyValue
keyVal = do
    -- try covers only the key and the colon; once ':' is consumed, keyVal is committed
    k <- try (ident <* ":")
    v <- ident
    pure (k, v)

Try this and it should behave as you want in all four cases.
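Put together, the suggestion might look like the following self-contained sketch. The Parser alias, the OverloadedStrings pragma, the Eq instance on Val, the trailing eof and the small main harness are assumptions, since the thread does not show them.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main (main) where

import Data.Void (Void)
import Text.Megaparsec
import Text.Megaparsec.Char (letterChar)

-- Assumed parser type: no custom errors, String input.
type Parser = Parsec Void String

type KeyValue = (String, String)
data Val = KeyValList [KeyValue] | Val String deriving (Show, Eq)

ident :: Parser String
ident = some letterChar

-- Backtracking is limited to the key and the colon.
keyVal :: Parser KeyValue
keyVal = do
    k <- try (ident <* ":")
    v <- ident
    pure (k, v)

-- No try around the list alternative any more.
prog :: Parser Val
prog = (KeyValList <$> keyVal `sepEndBy1` ",") <|> (Val <$> ident)

main :: IO ()
main = mapM_ run ["a:x,b:y", "x", "a:x,b:y,c"]
  where
    run :: String -> IO ()
    run input = case parse (prog <* eof) "" input of
        Left err -> putStr (errorBundlePretty err)
        Right v -> print v
```

Because try now wraps only ident <* ":", a failure later in the list can no longer rewind prog to the start of the input, so the reported position should stay near where the list actually went wrong.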

Still waiting for a complete example from @knightzmc, which should be fixable without resorting to any third-party libraries, but I'm happy to be proven wrong.


BlueNebulaDev commented on September 28, 2024

Thanks! After following your suggestion, the errors my parser is reporting are much better. I'm not sure I understand why a:x,b:err or gave the expected error even without this change though. I'll try to play with dbg to understand exactly what happens.
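For anyone wanting to do the same, a minimal sketch of tracing with dbg from Text.Megaparsec.Debug could look like this. It uses the original keyVal (without try); the labels, the Parser alias, the pairs helper and the main harness are assumptions for illustration.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main (main) where

import Data.Void (Void)
import Text.Megaparsec
import Text.Megaparsec.Char (letterChar)
import Text.Megaparsec.Debug (dbg)

type Parser = Parsec Void String

-- dbg prints the input seen and the outcome each time the wrapped parser runs.
ident :: Parser String
ident = dbg "ident" (some letterChar)

keyVal :: Parser (String, String)
keyVal = dbg "keyVal" $ do
    k <- ident
    _ <- ":"
    v <- ident
    pure (k, v)

-- Hypothetical helper: just the list part of the grammar.
pairs :: Parser [(String, String)]
pairs = keyVal `sepEndBy1` ","

main :: IO ()
main = parseTest (pairs <* eof) "a:x,b:err or"
```

The trace is written to stderr and does not change the parse result, so it can be left in temporarily while investigating which parser consumed input before a failure.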


bristermitten commented on September 28, 2024

Hey, apologies for the delay. This code is pretty outdated now, but if you're still happy to take a look, I'd appreciate it!

https://github.com/ElaraLang/elara/blob/3de6a66f82a86a45726dc3b7aa1286bd7aaa6209/src/Elara/Parse/Declaration.hs

