Hi! I encountered a problem when using oneOf. To make it easier

When I have done operator parsing, I usually do it by Parsing a <code class="notransla

oneOf doesn't match although one parser in the list matches about cats-parse HOT 7 CLOSED

FloWi commented on May 28, 2024

oneOf doesn't match although one parser in the list matches

from cats-parse.

Comments (7)

FloWi commented on May 28, 2024 1

Hi!
I rewrote my parser to do what I want without using oneOf like this:

  def removeBracesForOrExpression(rule: String): String = {
    //  (a|b)c ==> ac|bc
    //  (a|b)  ==> a|b
    // c(a|b)  ==> ca|cb

    val term = P.charsWhile(_.isLetter)
    val orTerm = P.repSep(term, 2, P.char('|'))
    val bracedOrTerm = orTerm.between(P.char('('), P.char(')'))

    val parser = (term.? ~ bracedOrTerm ~ term.?).map { case ((maybeLeft, orTerms), maybeRight) =>
      orTerms
        .map(orTerm => maybeLeft.getOrElse("") + orTerm + maybeRight.getOrElse(""))
        .toList
        .mkString("|")
    }
    val result = parser.parse(rule)

    result.fold(error => throw new RuntimeException(error.toString), _._2)

  }

  test("removeBracesForOrExpression") {
    assertEquals(removeBracesForOrExpression("(a|b)c"), "ac|bc")
    assertEquals(removeBracesForOrExpression("c(a|b)"), "ca|cb")
    assertEquals(removeBracesForOrExpression("(a|b)"), "a|b")
  }

I struggle to write documentation for that stuff, since I only started using parsers yesterday. So, I guess I have to leave it to the parser (p|b)ro's. ;-)

from cats-parse.

johnynek commented on May 28, 2024

Thanks for filing the issue and trying the library.

We have tests in the repo that verify the property we guarantee but it is always possible there is still some bug.

That said, my guess is this has to do with backtracking. This library does not backtrack by default. So, oneOf can only succeed if none of the items in the oneOf partially parse before failing.

If you want to check this: call .backtrack on each parser you pass into oneOf. If my guess is right, things should work.

Not backtracking by default requires you to think a bit more about how to write your parsers, but the payoff is the parse errors point to the correct offset of the parse failure and you generally get much better performance than parsers that overuse backtracking.

Lastly keep in mind oneOf is order sensitive due to this property. It may be that you can fix your problem just by reordering the arguments so that parsers you can check if they match without consuming input come first.

Hope this helps.

from cats-parse.

FloWi commented on May 28, 2024

Thanks for filing the issue and trying the library.

My pleasure. I like the design of it. Feels a bit more lightweight than Haoyi's fastparse.

Adding backtrack to each of the parsers works indeed. Would you mind elaborating that a bit?
Does it mean the individual parsers that go into oneOf are not independent of each other without calling backtrack on them? I'm trying to think of a scenario where one doesn't want the backtracking behaviour in a oneOf. Do you have an example? Please don't take it as criticism, I trying to understand what the reason for this design is and where it might be useful.

I have a suggestion: The outdated README got me a bit confused, since the method-names didn't match my autocomplete-suggestions. I found out that you changed the naming of a lot of combinators between 0.2.0 and 0.3.x, but you still list 0.2.0 as the version to use in the libraryDependencies section of the README. It'd be cool if the code example would match the syntax of the suggested library-version.

from cats-parse.

johnynek commented on May 28, 2024

As an example, consider parsing Json. Each value is either a null, true, false, a number, string, array or object. By looking at a single character, you can determine which one. You don't need any backtracking. See:

cats-parse/bench/src/main/scala/cats/parse/bench/self.scala

Line 61 in 07a5ed6

P.oneOf(str :: num :: list :: obj :: bool :: pnull :: Nil)

You can very often structure parsers this way. See the theory around LL(1) or LL(k) in general: https://en.wikipedia.org/wiki/LL_parser

The problem with backtracking by default is that you have no way to distinguish "this doesn't match" e.g. expecting an array in json but finding a ", from "this could never match" which is something like seeing $ in the stream where you expect a json value.

If you are careful, you can usually structure your parsers to not require lookahead, or require a bounded amount of lookahead (which you can model in this library by using either .soft or .backtrack in a minimal number of places).

It would be really nice to have more complete document that goes into some common issues when writing recursive parsers, how to parse comments, examples using s-expressions, json, etc... If you have any interest in sending some examples and documentation on your example, that would be great. The documentation is compiled using mdoc so we can verify it compiles.

Lastly, 0.3.x hasn't been released yet, and that's why the README hasn't been updated. We plan to do that before we release.

from cats-parse.

FloWi commented on May 28, 2024

Ah, that makes sense. Unfortunately I can't determine the parser by only looking ahead one character.
Thanks for taking the time to explain it.

I can write an example usecase "remove the braces from an expression" that illustrates the behavior. Using cats-parse to parse and modify a grammar for a parser is a bit meta for a readme ;-)

I'll try to create a PR tomorrow and then we can see if the example makes sense.

(4+2)*6 ==> 4*6+2*6
6*(4+2) ==> 6*4+6*2
(4+2)   ==> 4+2

from cats-parse.

FloWi commented on May 28, 2024

I'm reading a book right now and they mentioned the build year of a house was written in roman numbers. That could be another (better?) example where backtracking might be necessary.

III ==> 3
 IV ==> 4
XIX ==> 19
 XX ==> 20

Let me know what you think. I'm a parser n00b and don't know if I come up with a good parser for that.

from cats-parse.

johnynek commented on May 28, 2024

When I have done operator parsing, I usually do it by Parsing a List[(Operator, Term)], and then I can do

val parser: Parser[Term] = {
  ...
  val nonOp: Parser[Term] = ... // this is anything BUT operators
  
  val withOp: Parser[Term] =
    (nonOp ~ (ws.soft ~ opTerm).rep)
      .map {
        case (t, Nil) => t // no operator
        case  (t, ops) => applyPrecedence(t, ops) // a function to correctly apply precedence
    }

  withOp
}

As to your question, I think it is related. You can consider additional characters in a roman number to be an operator on the previous. There is some peeking involved to see the next character is <= the current character or not.

BTW: if you want to use backtracking, it's not immoral or something. :) I'm not trying to stop you, but usually you can structure things in a way to minimize it, and doing so will usually improve performance and the pointers you can give users into where things failed.

I'm sure you've seen a parse error that points to the totally wrong place in the code because it got confused and fell back onto trying some other thing to parse. Backtracking often makes that kind of problem worse.

from cats-parse.

oneOf doesn't match although one parser in the list matches about cats-parse HOT 7 CLOSED

Comments (7)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent