Comments (7)
Hi!
I rewrote my parser to do what I want without using oneOf like this:
def removeBracesForOrExpression(rule: String): String = {
// (a|b)c ==> ac|bc
// (a|b) ==> a|b
// c(a|b) ==> ca|cb
val term = P.charsWhile(_.isLetter)
val orTerm = P.repSep(term, 2, P.char('|'))
val bracedOrTerm = orTerm.between(P.char('('), P.char(')'))
val parser = (term.? ~ bracedOrTerm ~ term.?).map { case ((maybeLeft, orTerms), maybeRight) =>
orTerms
.map(orTerm => maybeLeft.getOrElse("") + orTerm + maybeRight.getOrElse(""))
.toList
.mkString("|")
}
val result = parser.parse(rule)
result.fold(error => throw new RuntimeException(error.toString), _._2)
}
test("removeBracesForOrExpression") {
assertEquals(removeBracesForOrExpression("(a|b)c"), "ac|bc")
assertEquals(removeBracesForOrExpression("c(a|b)"), "ca|cb")
assertEquals(removeBracesForOrExpression("(a|b)"), "a|b")
}
I struggle to write documentation for that stuff, since I only started using parsers yesterday. So, I guess I have to leave it to the parser (p|b)ro's. ;-)
from cats-parse.
Thanks for filing the issue and trying the library.
We have tests in the repo that verify the property we guarantee but it is always possible there is still some bug.
That said, my guess is this has to do with backtracking. This library does not backtrack by default. So, oneOf can only succeed if none of the items in the oneOf partially parse before failing.
If you want to check this: call .backtrack on each parser you pass into oneOf. If my guess is right, things should work.
Not backtracking by default requires you to think a bit more about how to write your parsers, but the payoff is the parse errors point to the correct offset of the parse failure and you generally get much better performance than parsers that overuse backtracking.
Lastly keep in mind oneOf is order sensitive due to this property. It may be that you can fix your problem just by reordering the arguments so that parsers you can check if they match without consuming input come first.
Hope this helps.
from cats-parse.
Thanks for filing the issue and trying the library.
My pleasure. I like the design of it. Feels a bit more lightweight than Haoyi's fastparse.
Adding backtrack to each of the parsers works indeed. Would you mind elaborating that a bit?
Does it mean the individual parsers that go into oneOf
are not independent of each other without calling backtrack on them? I'm trying to think of a scenario where one doesn't want the backtracking behaviour in a oneOf. Do you have an example? Please don't take it as criticism, I trying to understand what the reason for this design is and where it might be useful.
I have a suggestion: The outdated README got me a bit confused, since the method-names didn't match my autocomplete-suggestions. I found out that you changed the naming of a lot of combinators between 0.2.0 and 0.3.x, but you still list 0.2.0
as the version to use in the libraryDependencies
section of the README. It'd be cool if the code example would match the syntax of the suggested library-version.
from cats-parse.
As an example, consider parsing Json. Each value is either a null, true, false, a number, string, array or object. By looking at a single character, you can determine which one. You don't need any backtracking. See:
You can very often structure parsers this way. See the theory around LL(1) or LL(k) in general: https://en.wikipedia.org/wiki/LL_parser
The problem with backtracking by default is that you have no way to distinguish "this doesn't match" e.g. expecting an array in json but finding a "
, from "this could never match" which is something like seeing $
in the stream where you expect a json value.
If you are careful, you can usually structure your parsers to not require lookahead, or require a bounded amount of lookahead (which you can model in this library by using either .soft or .backtrack in a minimal number of places).
It would be really nice to have more complete document that goes into some common issues when writing recursive parsers, how to parse comments, examples using s-expressions, json, etc... If you have any interest in sending some examples and documentation on your example, that would be great. The documentation is compiled using mdoc so we can verify it compiles.
Lastly, 0.3.x hasn't been released yet, and that's why the README hasn't been updated. We plan to do that before we release.
from cats-parse.
Ah, that makes sense. Unfortunately I can't determine the parser by only looking ahead one character.
Thanks for taking the time to explain it.
I can write an example usecase "remove the braces from an expression" that illustrates the behavior. Using cats-parse to parse and modify a grammar for a parser is a bit meta for a readme ;-)
I'll try to create a PR tomorrow and then we can see if the example makes sense.
(4+2)*6 ==> 4*6+2*6
6*(4+2) ==> 6*4+6*2
(4+2) ==> 4+2
from cats-parse.
I'm reading a book right now and they mentioned the build year of a house was written in roman numbers. That could be another (better?) example where backtracking might be necessary.
III ==> 3
IV ==> 4
XIX ==> 19
XX ==> 20
Let me know what you think. I'm a parser n00b and don't know if I come up with a good parser for that.
from cats-parse.
When I have done operator parsing, I usually do it by Parsing a List[(Operator, Term)]
, and then I can do
val parser: Parser[Term] = {
...
val nonOp: Parser[Term] = ... // this is anything BUT operators
val withOp: Parser[Term] =
(nonOp ~ (ws.soft ~ opTerm).rep)
.map {
case (t, Nil) => t // no operator
case (t, ops) => applyPrecedence(t, ops) // a function to correctly apply precedence
}
withOp
}
As to your question, I think it is related. You can consider additional characters in a roman number to be an operator on the previous. There is some peeking involved to see the next character is <= the current character or not.
BTW: if you want to use backtracking, it's not immoral or something. :) I'm not trying to stop you, but usually you can structure things in a way to minimize it, and doing so will usually improve performance and the pointers you can give users into where things failed.
I'm sure you've seen a parse error that points to the totally wrong place in the code because it got confused and fell back onto trying some other thing to parse. Backtracking often makes that kind of problem worse.
from cats-parse.
Related Issues (20)
- A parser string interpolator
- Backtracking with context HOT 1
- Rep same char as surrounded by HOT 5
- idea for safe repetition on Parser0
- RadixNode and stringIn are a bit slow
- Is it possible to support Scala Native? HOT 6
- alternate design for voiding
- use java.util.BitSet on scalajs
- `p1.? ~ p2` != `(p1 ~ p2) | p2` for error reporting purposes HOT 9
- Remove isScalaJs/isScalaJvm on minor version update
- identifier parser HOT 2
- flakey test: X cats.parse.ParserTest.a.flatMap(b) composes as expected parser00
- add json string parsing
- add a withString combinator
- Should `Parser` have a typeclass? HOT 2
- set up github CI to run benchmarks on release
- investigate scalacheck failure 20230625T112756
- Link errors when crossbuilding for scalaJS or scalaNative HOT 3
- Recursion with `Parser0` HOT 2
- investigate scalacheck failure 20231113
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from cats-parse.