Code Monkey home page Code Monkey logo

Comments (7)

FloWi avatar FloWi commented on May 28, 2024 1

Hi!
I rewrote my parser to do what I want without using oneOf like this:

  def removeBracesForOrExpression(rule: String): String = {
    //  (a|b)c ==> ac|bc
    //  (a|b)  ==> a|b
    // c(a|b)  ==> ca|cb

    val term = P.charsWhile(_.isLetter)
    val orTerm = P.repSep(term, 2, P.char('|'))
    val bracedOrTerm = orTerm.between(P.char('('), P.char(')'))

    val parser = (term.? ~ bracedOrTerm ~ term.?).map { case ((maybeLeft, orTerms), maybeRight) =>
      orTerms
        .map(orTerm => maybeLeft.getOrElse("") + orTerm + maybeRight.getOrElse(""))
        .toList
        .mkString("|")
    }
    val result = parser.parse(rule)

    result.fold(error => throw new RuntimeException(error.toString), _._2)

  }

  test("removeBracesForOrExpression") {
    assertEquals(removeBracesForOrExpression("(a|b)c"), "ac|bc")
    assertEquals(removeBracesForOrExpression("c(a|b)"), "ca|cb")
    assertEquals(removeBracesForOrExpression("(a|b)"), "a|b")
  }

I struggle to write documentation for that stuff, since I only started using parsers yesterday. So, I guess I have to leave it to the parser (p|b)ro's. ;-)

from cats-parse.

johnynek avatar johnynek commented on May 28, 2024

Thanks for filing the issue and trying the library.

We have tests in the repo that verify the property we guarantee but it is always possible there is still some bug.

That said, my guess is this has to do with backtracking. This library does not backtrack by default. So, oneOf can only succeed if none of the items in the oneOf partially parse before failing.

If you want to check this: call .backtrack on each parser you pass into oneOf. If my guess is right, things should work.

Not backtracking by default requires you to think a bit more about how to write your parsers, but the payoff is the parse errors point to the correct offset of the parse failure and you generally get much better performance than parsers that overuse backtracking.

Lastly keep in mind oneOf is order sensitive due to this property. It may be that you can fix your problem just by reordering the arguments so that parsers you can check if they match without consuming input come first.

Hope this helps.

from cats-parse.

FloWi avatar FloWi commented on May 28, 2024

Thanks for filing the issue and trying the library.

My pleasure. I like the design of it. Feels a bit more lightweight than Haoyi's fastparse.

Adding backtrack to each of the parsers works indeed. Would you mind elaborating that a bit?
Does it mean the individual parsers that go into oneOf are not independent of each other without calling backtrack on them? I'm trying to think of a scenario where one doesn't want the backtracking behaviour in a oneOf. Do you have an example? Please don't take it as criticism, I trying to understand what the reason for this design is and where it might be useful.

I have a suggestion: The outdated README got me a bit confused, since the method-names didn't match my autocomplete-suggestions. I found out that you changed the naming of a lot of combinators between 0.2.0 and 0.3.x, but you still list 0.2.0 as the version to use in the libraryDependencies section of the README. It'd be cool if the code example would match the syntax of the suggested library-version.

from cats-parse.

johnynek avatar johnynek commented on May 28, 2024

As an example, consider parsing Json. Each value is either a null, true, false, a number, string, array or object. By looking at a single character, you can determine which one. You don't need any backtracking. See:

P.oneOf(str :: num :: list :: obj :: bool :: pnull :: Nil)

You can very often structure parsers this way. See the theory around LL(1) or LL(k) in general: https://en.wikipedia.org/wiki/LL_parser

The problem with backtracking by default is that you have no way to distinguish "this doesn't match" e.g. expecting an array in json but finding a ", from "this could never match" which is something like seeing $ in the stream where you expect a json value.

If you are careful, you can usually structure your parsers to not require lookahead, or require a bounded amount of lookahead (which you can model in this library by using either .soft or .backtrack in a minimal number of places).

It would be really nice to have more complete document that goes into some common issues when writing recursive parsers, how to parse comments, examples using s-expressions, json, etc... If you have any interest in sending some examples and documentation on your example, that would be great. The documentation is compiled using mdoc so we can verify it compiles.

Lastly, 0.3.x hasn't been released yet, and that's why the README hasn't been updated. We plan to do that before we release.

from cats-parse.

FloWi avatar FloWi commented on May 28, 2024

Ah, that makes sense. Unfortunately I can't determine the parser by only looking ahead one character.
Thanks for taking the time to explain it.

I can write an example usecase "remove the braces from an expression" that illustrates the behavior. Using cats-parse to parse and modify a grammar for a parser is a bit meta for a readme ;-)

I'll try to create a PR tomorrow and then we can see if the example makes sense.

(4+2)*6 ==> 4*6+2*6
6*(4+2) ==> 6*4+6*2
(4+2)   ==> 4+2

from cats-parse.

FloWi avatar FloWi commented on May 28, 2024

I'm reading a book right now and they mentioned the build year of a house was written in roman numbers. That could be another (better?) example where backtracking might be necessary.

III ==> 3
 IV ==> 4
XIX ==> 19
 XX ==> 20

Let me know what you think. I'm a parser n00b and don't know if I come up with a good parser for that.

from cats-parse.

johnynek avatar johnynek commented on May 28, 2024

When I have done operator parsing, I usually do it by Parsing a List[(Operator, Term)], and then I can do

val parser: Parser[Term] = {
  ...
  val nonOp: Parser[Term] = ... // this is anything BUT operators
  
  val withOp: Parser[Term] =
    (nonOp ~ (ws.soft ~ opTerm).rep)
      .map {
        case (t, Nil) => t // no operator
        case  (t, ops) => applyPrecedence(t, ops) // a function to correctly apply precedence
    }

  withOp
}

As to your question, I think it is related. You can consider additional characters in a roman number to be an operator on the previous. There is some peeking involved to see the next character is <= the current character or not.

BTW: if you want to use backtracking, it's not immoral or something. :) I'm not trying to stop you, but usually you can structure things in a way to minimize it, and doing so will usually improve performance and the pointers you can give users into where things failed.

I'm sure you've seen a parse error that points to the totally wrong place in the code because it got confused and fell back onto trying some other thing to parse. Backtracking often makes that kind of problem worse.

from cats-parse.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.