
cats-parse's People

Contributors

ankitson, armanbilge, colin-m-davis, denisnovac, ghostdogpr, hugo-vrijswijk, i10416, johnynek, larsrh, lenguyenthanh, m-combinator, martijnhoekstra, masseguillaume, mio-19, mpilquist, non, odomontois, oguzhanunlu, regadas, rossabaker, satabin, scala-steward, slakah, stephenjudkins, typelevel-steward[bot], vasilmkd, vlachjosef, xuwei-k, zmccoy, zsluedem


cats-parse's Issues

1's everywhere are cumbersome

This issue may be a bit premature/pretentious to file, but I figured it would be good to get it out early. The safety that Parser1 and its combinators provide is great. So great that, while trying out the library, I find I use them pretty much all the time. Using a Parser or a method that creates a Parser is the exception.

Has doing it the other way around been considered: renaming Parser to Parser0, and Parser1 to Parser, along with all the 1 methods on object Parser? That would make the potentially non-consuming Parsers the exception not only in practice but also in naming.

add a FAQ entry on using fix to parse recursive structures

see:

https://matrix.to/#/!qrcPEbYoUyqhEvxImO:gitter.im/$AoSsgoWnlpnJXPAnAOyk9j49y2CFbAhUB4Ed9PmxCwo?via=gitter.im&via=matrix.org

The main issue is that if you use this pattern:

Defer[P1].fix[Ast] { self =>
  ...

  P.oneOf1(a :: b :: c ...
}

then you have to make sure that all of a, b, c, etc... can make some progress without first using self.

For instance, you need to put all your constants first in that list. Parsing operators is its own item in the FAQ. Generally what I like to do is parse a list of (Operator, Item), so you do: (parseAtom ~ postOp.rep).map { case (a, fs) => fs.foldLeft(a) { case (a0, (op, a1)) => addOp(a0, op, a1) } }

keeping in mind operator precedence, etc... that's a whole other faq...
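A minimal sketch of the "make progress before using self" rule, written with Parser.recursive (the same helper used in a later issue on this page) and the current Parser/Parser0 naming. The Ast, Num and Lst types and the nested-list grammar are made up for illustration; they are not part of cats-parse.

```scala
import cats.parse.{Numbers, Parser => P}

// Hypothetical AST, for illustration only.
sealed trait Ast
final case class Num(value: Int) extends Ast
final case class Lst(items: List[Ast]) extends Ast

object RecursiveList {
  val ast: P[Ast] = P.recursive[Ast] { self =>
    // A constant: it never touches `self`, so it goes first.
    val num: P[Ast] = Numbers.digits.map(s => Num(s.toInt))
    // Consumes '[' before recursing, so this branch always makes progress.
    val lst: P[Ast] =
      (P.char('[') *> self.repSep0(P.char(',')) <* P.char(']')).map(Lst(_))
    P.oneOf(num :: lst :: Nil)
  }
}
```

Neither branch can reach self without first consuming a character, which is what keeps the recursion from looping.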

test cats laws

I think we are currently testing all the laws these typeclasses require, but we aren't using the cats laws package.

It might be nice to use those just to be 100% sure everything is fully lawful.

Different way of combining Parser0 and Parser

Hi all!
I'm just trying to port scala-uri from parboiled2 to cats-parse.
One thing I find rather confusing is the with1 method to make a Parser0 behave like a Parser.
I'm wondering if a different way to encode this could work, e.g.

trait LowPriorityImplicits {

  implicit class RichParser0(parser: Parser0) {
    def ~(other: Parser0): Parser0 = ???
  }
}

object Parser0 extends LowPriorityImplicits {
  implicit class EvenRicherParser(parser: Parser0) {
    def ~(other: Parser): Parser = ???
  }
}

val foo0: Parser0 = null
val foo: Parser = null
val concatted: Parser = (foo0 ~ foo)
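For comparison, a small sketch of how with1 reads today, assuming the current Parser/Parser0 naming. The signed-number grammar is only an illustration.

```scala
import cats.parse.{Numbers, Parser0, Parser => P}

object With1Example {
  // An optional sign may consume nothing, so it is a Parser0.
  val sign: Parser0[Option[Unit]] = P.char('-').?
  // Plain `sign ~ digits` would be typed as Parser0; with1 keeps the
  // guarantee that the combined parser consumes input.
  val signed: P[(Option[Unit], String)] = sign.with1 ~ Numbers.digits
}
```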

Consistency of return types

Spun off from an http4s issue.

  • char returns a Parser1[Unit]. We know what it captured, so we don't return it.
  • charIn returns a Parser1[Char]. It captured one of a set of characters, and we want to know which.
  • ignoreChar returns a Parser1[Unit]. Arguably like charIn in that we don't know what it captured, but the docs tell us to call .string if we need the result.
  • string1 returns a Parser1[Unit]. Seems consistent with char.

Did we get this right, or should everything return what it captured?
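To make the comparison concrete, here is a small sketch using the later naming (Parser rather than Parser1). It only shows what each call returns and how .string recovers the matched text.

```scala
import cats.parse.{Parser => P}

object ReturnTypes {
  // char knows what it matched, so the result is Unit.
  val unitResult: Either[P.Error, Unit] = P.char('a').parseAll("a")
  // .string recovers the matched input when it is needed.
  val stringResult: Either[P.Error, String] = P.char('a').string.parseAll("a")
  // charIn returns the character that was actually matched.
  val charResult: Either[P.Error, Char] = P.charIn('a', 'b').parseAll("b")
}
```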

oneOf doesn't match although one parser in the list matches

Hi!

I encountered a problem when using oneOf.
To make it easier to communicate I created a test that fails. You can find it here:
https://github.com/FloWi/cats-parse/blob/oneOfError/core/shared/src/test/scala/cats/parse/OneOfTest.scala

When you run the failing test, you see that the last parser in the list succeeds, but the oneOf-parser, that uses those parsers, still fails.
sbt "testOnly *OneOfTest*"
I tried to write a parser that simplifies an expression of a grammar.

    // (a|b)a --> aa|ba
    // a(a|b) --> aa|ab
    //  (a|b) -->  a|b

I'm quite new to parsers and have no idea whether this is a bug or whether I did something wrong. AdventOfCode brought me down this rabbit hole :)

Add a way to make a fully general parser

users should be able to give us (String, Int) => Either[Error, (Int, A)] for cases where they can't express their parsing in terms of the core combinators.

Then at runtime we would check whether the Int is >= the input offset. If it is, we return; otherwise we report an InvariantViolationError or something similar for the parser (which I guess would be an epsilon error), or potentially throw an exception, since users should not be able to recover from errors like that...

An alternative is just let all bets be off, and not check that the returned Index makes sense, and just let users live with the consequences.

This is a can of worms, and maybe we should avoid adding such a function.

Add a fluent API for repeating with various options.

There are various ways to repeat stuff. With or without a separator, with a minimum number of repetitions, allowing 0 repetitions or not, gathering into some accumulator. There is also a ticket open to add a maximum: #97

That gives rise to a lot of different combinations for repeating parser constructors, with opportunity for inconsistency and not having some specific combinations of concerns.

What do you think about adding a fluent API starting from rep or rep0, and then adding combinators for min, max, separator and accumulator?

Problem with Maven RC2 release?

I got the following message trying to open Rfc5234 in IntelliJ

Error reading TASTy file: /Users/hjs/Library/Caches/Coursier/v1/https/repo1.maven.org/maven2/org/typelevel/cats-parse_3.0.0-RC2/0.3-3-d801d0a/cats-parse_3.0.0-RC2-0.3-3-d801d0a.jar!/cats/parse/Rfc5234.tasty

I tried other versions of cats-parse for RC2 from maven central with the same result.

As this is a bit bleeding edge it could be IntelliJ or the deployed library. Just thought I'd ping you here.

Parser0[Option[_]] ~ Parser[_] could not get (Option(_), _)

  import cats.parse.{Parser0, Parser => P, Numbers}
  private def name:P[String] = P.charIn(('a' to 'z')).rep.string
  private def alias: P[String] = name <* P.char(':')
  (alias.? ~ name).parse("abc")

The code returns Left(Error(3,NonEmptyList(InRange(3,:,:)))), but it should be Right(_, (None,abc)).
I tried (alias.?.backtrack ~ name) and (alias.?.soft ~ name); neither works.
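As far as I can tell, the failure happens inside alias itself: name consumes "abc" before ':' fails, and ? only recovers from failures that consumed nothing. A sketch of one possible workaround is to put the backtrack on alias rather than on alias.? (the names below mirror the snippet above):

```scala
import cats.parse.{Parser => P}

object OptionalAlias {
  val name: P[String] = P.charIn('a' to 'z').rep.string
  // backtrack rewinds when ':' is missing, so `?` sees a failure
  // that consumed no input and can fall back to None.
  val alias: P[String] = (name <* P.char(':')).backtrack
  val aliased = alias.? ~ name
}
```

The cost is that name is parsed twice when there is no alias.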

Bug with length restrictions on `rep`?

I am implementing RFC 8941 and noticed while testing my code that length restrictions don't quite seem to work as I expected.

val signedDecIntegral: P[String] = 
   (P.char('-').?.with1 ~ digits.rep(1,12)).map { 
    case (min, i) =>
        min.map(_ => "-").getOrElse("")+i.toList.mkString
  }
val decFraction: P[String] = digits.rep(1,3).string
val sfDecimal: P[(String,String)] =
   (signedDecIntegral ~ (P.char('.') *> decFraction)).map { 
      case (dec: String,frac: String) =>
        (dec,frac.toList.mkString) //todo: keep the non-empty list?
     }

It seems like the length restrictions don't cause the expected errors

scala> decFraction.parse("12312323")
val res17: Either[cats.parse.Parser.Error,(String, String)] = Right((,12312323))
scala> sfDecimal.parseAll("12345678901234567890.22222")
val res18: Either[cats.parse.Parser.Error,(String, String)] = Right((12345678901234567890,22222))
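A sketch of how I would expect a bounded rep to behave, assuming a cats-parse release where rep(min, max) is available: the maximum only limits how much is consumed, so a longer run of digits is not an error by itself. Rejecting it needs parseAll or an explicit check that no further digit follows.

```scala
import cats.parse.{Numbers, Parser => P}

object BoundedRep {
  val digit: P[Char] = Numbers.digit
  // Consumes at most three digits; anything after that is left over.
  val upTo3: P[String] = digit.rep(1, 3).string
  // Additionally fails when a fourth digit follows.
  val strictUpTo3: P[String] = upTo3 <* P.not(digit)
}
```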

Bug? - surroundedBy with optional whitespace fails to parse

/**
 * Demonstrates a possible bug with cats-parse.
 *
 * Parses a parenthesized word list (foo, bar, x, y),
 * but fails to parse if a space precedes the final ')'.
 *
 * cats-parse version 0.3.2
 * Scala version      3.0.0-RC2
 */

import cats.parse.{Parser => P}

@main def main(): Unit =

  // Specific types of characters.
  val whitespace = P.charIn(" \r\t\n")
  val letter = P.charIn('a' to 'z')
  val comma = P.char(',')
  val lParen = P.char('(')
  val rParen = P.char(')')

  // For testing, a lowercase word.
  val word = letter.rep.string

  // Allow optional spaces around the list characters -  ( , )
  val whitespaces0 = whitespace.rep0.void
  val listStart = lParen.surroundedBy(whitespaces0).void
  val listEnd = rParen.surroundedBy(whitespaces0).void
  val listSeparator = comma.surroundedBy(whitespaces0).void

  // Define a parenthesized list of words ... eg. (foo, bar, x, y)
  val wordList = listStart ~ word.repSep0(listSeparator) ~ listEnd

  // This wordlist parses fine.
  val result1 = wordList.parseAll("(foo, bar, x, y)")
  assert(result1.isRight)

  // PROBLEM: If a space precedes the final ')', then it fails.
  val result2 = wordList.parseAll("(foo, bar, x, y )")
  assert(result2.isRight)
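My reading of the failure, offered as a sketch rather than a confirmed diagnosis: after "y" the separator consumes the trailing space and then fails on ')', and a failure that has already consumed input stops the repetition with an error. Adding backtrack to the separator lets the repetition end cleanly instead.

```scala
import cats.parse.{Parser => P}

object WordList {
  val whitespaces0 = P.charIn(' ', '\r', '\t', '\n').rep0.void
  val word = P.charIn('a' to 'z').rep.string
  // backtrack: if the comma is missing, give back the whitespace
  // that was already consumed instead of failing mid-separator.
  val listSeparator = P.char(',').surroundedBy(whitespaces0).backtrack
  val listStart = P.char('(').surroundedBy(whitespaces0)
  val listEnd = P.char(')').surroundedBy(whitespaces0)
  val wordList = listStart *> word.repSep0(listSeparator) <* listEnd
}
```

An alternative would be to attach whitespace only to one side of each token, so that no two adjacent parsers compete for the same spaces.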

rep with min and max

Related to #52, a question was raised about parsers that repeat between a min and a max number of times. Here's a real-world case from RFC 7231, which has a precision to the thousandths:

     qvalue = ( "0" [ "." 0*3DIGIT ] )
            / ( "1" [ "." 0*3("0") ] )

It's not real common, but they're out there.
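For reference, a sketch of how this rule might look once rep accepts both bounds (assuming a release that has rep(min, max)). The 0*3 repetition is written here as an optional rep(1, 3).

```scala
import cats.parse.{Numbers, Parser => P}

object QValue {
  private val digit = Numbers.digit
  // "0" [ "." 0*3DIGIT ]
  private val zero = P.char('0') ~ (P.char('.') ~ digit.rep(1, 3).?).?
  // "1" [ "." 0*3("0") ]
  private val one = P.char('1') ~ (P.char('.') ~ P.char('0').rep(1, 3).?).?
  // Return the matched text rather than the nested tuples.
  val qvalue: P[String] = P.oneOf(zero.string :: one.string :: Nil)
}
```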

Why is Parser0#repSep not possible?

Hi all,

I'm trying to port scala-uri to cats-parse.
One difficulty I'm facing is with parsing path parts including empty ones:

a/b/c
/a/b/c
a//c
/a//c

What I would like to do is something like this (simplified)

  def _path_segment: Parser0[String] = Parser.until0(charIn("/?#"))
  def _path: Parser[String] = (Parser.char('/').? ~ _path_segment.repSep0(char('/'))).string

but this doesn't work, as Parser0 doesn't have repSep or repSep0 defined.
I understand that Parser0#rep is problematic, as one could easily run the empty parser infinitely, but in my naive thinking this problem shouldn't exist with repSep, right?

Or is there a nicer way to do this in cats-parse?
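One possible workaround, sketched under the assumption that collecting segments into a list is acceptable: move the separator into the repeated part, so every iteration consumes a '/' even when the segment itself is empty.

```scala
import cats.parse.{Parser0, Parser => P}

object PathSegments {
  // A segment may be empty, so this is a Parser0.
  val segment: Parser0[String] = P.charsWhile0(c => !"/?#".contains(c))
  // Each repetition consumes a '/', so the loop always makes progress.
  val path: Parser0[List[String]] =
    (segment ~ (P.char('/') *> segment).rep0).map { case (head, tail) => head :: tail }
}
```

A leading '/' shows up as an empty first segment, which may or may not be the representation scala-uri wants.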

add a short note about how to do a release

each scala repo has a somewhat bespoke way to do a release.

Since many of us contribute on many repos, it is easy to lose track of how each works. Let's add a short md doc that explains the steps in a list.

There is the #16 plugin that drafts releases, and there is the auto publishing, and then there is the question of setting version numbers. Lastly, some repos need the MiMa versions they compare against to be updated. I'm not 100% sure what we need in this repo.

cc @mpilquist

implement some standard utility parsers

things like BigInt, Int, Long, Float, etc...

Also things like standard whitespace, bracketed lists, etc...

It should be possible to write a JSON parser with a pretty minimal combination of these. This helps people learn patterns but also lets us really optimize some basic blocks that people will almost always need.

Add some common cats methods directly

.as from Functor and maybe .replicateA from Applicative are useful enough that adding those methods on Parser and Parser1 might help discovery.

Since users with IDEs often rely on autocomplete, this can be useful for them.

design for scoping parsers for error reporting

Currently, when a parser fails you get a nonempty list of offsets and failed expectations.

We could also add a "scope" wrapper, like: number.scope("number").orElse(str.scope("string")). How this would work is that the mutable State would have a stack of these scopes, and we would have:

case class ScopeParser[A](parser: Parser[A], scope: String) extends Parser[A]

in parseMut we would push the current scope onto the stack, then parse with parser, then pop it off the stack.

When we error, we take a snapshot of the current scope stack.

So, if we do this, users can have an easier time labeling parts of their parsers and seeing where things went wrong.

Fastparse does something similar.

What do you think of this design @mpilquist @non @rossabaker and really anyone who cares to comment.
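For readers arriving later: as far as I know, newer cats-parse releases ship a withContext combinator that covers part of this idea. The sketch below assumes such a release and assumes the label is reported through Parser.Expectation.WithContext; it is not the scope-stack design described above.

```scala
import cats.parse.{Numbers, Parser => P}

object Labeled {
  val number: P[String] = Numbers.digits.withContext("number")
  // On failure, collect the labels attached to the expectations.
  val contexts: List[String] = number.parse("x") match {
    case Left(err) =>
      err.expected.toList.collect { case P.Expectation.WithContext(ctx, _) => ctx }
    case Right(_) => Nil
  }
}
```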

No default min value on rep(1)sep

Repsep is 0 or more and rep1sep is 1 or more, but you still have to pass in a minimum, which feels unnatural, especially for repsep where the minimum is implied to be 0 anyway. Default values of 0 or 1, or overloads to the same effect would be useful.

Detect start of line

I'm trying to parse a format where different parts are separated by <start_of_line>#### fragment and so, I would like to be able to detect the <start_of_line>.

IMHO the logic should be similar to P.start | <prev_char = '\n'>.

I'm not sure if that matters, but I'm trying to parse Intellij HTTP client file format with an explicit requirement of supporting ### in the first line, so for example

###
// A basic request
http://example.com/a/

###

// A second request using the GET method
http://example.com:8080/api/html/get?id=123&value=content
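A sketch of one way to approximate this with existing combinators, assuming it is acceptable to treat the preceding newline as part of the separator: the separator is either "###" at offset 0 (via Parser.start) or "\n###" anywhere else. The names separator, body and requests are mine.

```scala
import cats.parse.{Parser0, Parser => P}

object HttpFile {
  // "###" at offset 0, or "###" directly after a newline.
  val separator: P[Unit] =
    P.oneOf((P.start.with1 *> P.string("###")) :: P.string("\n###") :: Nil)
  // Everything up to the next separator (or the end of input).
  val body: Parser0[String] = P.until0(separator)
  val requests: P[List[String]] = (separator *> body).rep.map(_.toList)
}
```

This does not give a reusable "start of line" parser, and a line that starts with more than three '#' characters is also accepted as a separator.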

Parser.oneOf could use expanded documentation of when order matters

It appears that order matters for Parser.oneOf, in cases where one parser accepts a subset of another parser.

This makes sense, however it might be good to explicitly mention in the docs the implication this has for generating parsers for a set of String values - specifically that they should be reverse sorted according to length, because it's very easy to create inconsistent parsers if the input isn't correctly prepared.

For example, parsing a truthy value for true will consistently work (or fail) depending on the order of the parsers:

import cats.parse.{Parser => P}

val buggy: P[Boolean] =
  P.oneOf(List("1", "t", "tru", "yes", "true").map(P.string(_)))
    .void
    .as[Boolean](true)

val works: P[Boolean] =
  P.oneOf(List("true", "yes", "tru", "t", "1").map(P.string(_)))
    .void
    .as[Boolean](true)

def sort(strings: List[String]): List[String] =
  strings
    .map(str => (str.length, str))
    .sorted
    .reverse
    .map(_._2)

val sorted: P[Boolean] =
  P.oneOf(sort(List("1", "t", "tru", "yes", "true")).map(P.string(_)))
    .void
    .as[Boolean](true)

List("1", "t", "tru", "true", "y")
  .foreach { input =>
    println {
      """|%-6s => %6s => %s
         |%-6s => %6s => %s
         |%-6s => %6s => %s
         |""".stripMargin.format(
           s"<$input>", "buggy", buggy.parseAll(input),
           "", "works", works.parseAll(input),
           "", "sorted", sorted.parseAll(input)
         )
    }
  }

This is particularly troublesome because the error looks like an unexpected end of string, rather than an expected end of string that didn't happen:

Left(Error(1,NonEmptyList(EndOfString(1,3))))
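For a fixed set of strings, a sketch of an alternative that avoids the ordering problem altogether: Parser.stringIn matches the longest alternative regardless of the order in which the strings are listed.

```scala
import cats.parse.{Parser => P}

object Truthy {
  // stringIn picks the longest alternative that matches,
  // whatever the order of the list.
  val truthy: P[Boolean] =
    P.stringIn(List("1", "t", "tru", "yes", "true")).map(_ => true)
}
```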


Add a way to modify error messages

Some of the error messages produce results that are either hard to render, or not particularly clear. This could be improved by providing the ability to replace or map over the error.

For example, if we have this parser (which is equivalent to -\s-):

import cats.parse.{Parser => P}
val parser = P.charWhere(_.isWhitespace).surroundedBy(P.char('-'))

List(
  "- -",
  "-t-",
  "--"
).foreach { input =>
  println("%-10s \t => %s".format(s""""$input"""", parser.parseAll(input)))
}

We get errors that aren't terribly readable:

"- -"      	 => Right( )
"-t-"      	 => Left(Error(1,NonEmptyList(InRange(1,	,
), InRange(1,, ), InRange(1, , ), InRange(1,᠎,᠎), InRange(1, , ), InRange(1, , ), InRange(1,
,
), InRange(1, , ), InRange(1, , ))))
"--"       	 => Left(Error(1,NonEmptyList(InRange(1,	,
), InRange(1,, ), InRange(1, , ), InRange(1,᠎,᠎), InRange(1, , ), InRange(1, , ), InRange(1,
,
), InRange(1, , ), InRange(1, , ))))

It would be handy to do something like fastparse's opaque:

parser.opaque("whitespace")

Or a lower level map over the errors:

parser.leftMap { error =>
  case InRange(index, _) => FailWith(index, "whitespace")
  case unexpected => unexpected
}

Which could produce errors like this:

"- -"      	 => Left(Error(1,NonEmptyList(FailWith(1,whitespace))))
"-t-"      	 => Left(Error(1,NonEmptyList(FailWith(1,whitespace))))
"--"       	 => Left(Error(1,NonEmptyList(FailWith(1,whitespace))))

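Until something like opaque exists, a rough workaround is to relabel the error after parsing. This is only a sketch: parseLabeled is a hypothetical helper, and it labels the parser as a whole, so it cannot tell a missing whitespace character from a missing closing '-'.

```scala
import cats.parse.{Parser => P}

object Relabel {
  val parser: P[Char] = P.charWhere(_.isWhitespace).surroundedBy(P.char('-'))
  // Hypothetical helper: collapse all expectations into one readable message.
  def parseLabeled(input: String, label: String): Either[String, Char] =
    parser.parseAll(input).left.map { err =>
      s"expected $label at offset ${err.failedAtOffset}"
    }
}
```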
Failing test cases in main

I'm seeing two tests failing:

  • cats.parse.ParserTest.voided only changes the result
  • cats.parse.ParserTest.with1 *> and with1 <* work as expected

The errors have the form:

values are not the same
=> Diff (- obtained, + expected)
           upper = 'ﷷ'
+        ),
+        Fail(
+          offset = 0
         )

To reproduce add the following to ParserTest:

override val scalaCheckInitialSeed = "SDzb3fKPxR67aeO2sgq4BlvTm5NphF9OM4j-dSIS9RD="

I haven't dug into this -- just figured I should report it ASAP.

It's hard to run targeted tests

Because ParserTest contains so many tests, making isolated changes and quickly testing with testOnly or testQuick is slower than you'd ideally want it to be.

Splitting up ParserTest would enable a tighter test loop. Are you OK with that?

How to write recursive parsers?

How do I write recursive parsers? The following fails with a StackOverflowError. I assume somewhere I don't have something tail-recursive, but I'm not sure how to write this any differently. (Both op and condexp fail similarly below.)

package foo

import cats.parse.{Parser0, Parser, Numbers}
import cats.syntax.all._
import scala.language.postfixOps

sealed class Expr
case class Lit(x: Int) extends Expr
case class Op(left: Expr, op: String, right: Expr) extends Expr
case class Cond(cond: Expr, tr: Expr, fl: Expr) extends Expr

object testrecurse {
    import Parser._

    def expr: Parser[Expr] = recursive[Expr] { recurse =>
        def subexpr = recurse.between(char('('), char(')'))
        def lit = Numbers.digits.map(_.toInt).map(Lit(_))
//        def condexp = ((recurse <* char('?')) ~ recurse ~ (char(':') *> recurse))
//            .map { case ((cond, tr), fl) => Cond(cond, tr, fl) }           
        def op = (recurse, stringIn(List("+", "-", "*", "/")), recurse)
            .mapN(Op(_, _, _))

        oneOf(subexpr :: op :: lit :: Nil)
    }

    def main(args: Array[String]): Unit = {
        //val expr = "1?(5):2"
        val expr = "(5+3)/2"
        println(testrecurse.expr.parse(expr))
    }
}
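A sketch of one way around the overflow, assuming the cause is left recursion: op starts with recurse without consuming anything, so the parser re-enters itself at the same offset forever. Parsing one atom first and then folding a list of (operator, atom) pairs avoids that. Operator precedence is ignored here, and condexp is left out.

```scala
import cats.parse.{Numbers, Parser => P}

sealed trait Expr
final case class Lit(x: Int) extends Expr
final case class Op(left: Expr, op: String, right: Expr) extends Expr

object NoLeftRecursion {
  val expr: P[Expr] = P.recursive[Expr] { recurse =>
    // Both atoms consume a character before any recursion happens.
    val subexpr: P[Expr] = recurse.between(P.char('('), P.char(')'))
    val lit: P[Expr] = Numbers.digits.map(s => Lit(s.toInt))
    val atom: P[Expr] = P.oneOf(subexpr :: lit :: Nil)
    // Zero or more (operator, operand) pairs following the first atom.
    val rest = (P.stringIn(List("+", "-", "*", "/")) ~ atom).rep0
    (atom ~ rest).map { case (first, pairs) =>
      // Left-associative fold, all operators at the same precedence.
      pairs.foldLeft(first) { case (left, (op, right)) => Op(left, op, right) }
    }
  }
}
```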

CPU time spent in hashcode computation

Hi,
We are currently trying to replace fastparse by cats-parse and ran into the issue that parsing became 3 to 30x slower than it was with fastparse. Using a profiler, I saw that most of the CPU is actually spent on calculating hashcode, triggered by the use of .distinct in oneOf. See the following screenshots:

Screen Shot 2021-04-10 at 3 07 25 PM

Screen Shot 2021-04-10 at 3 09 53 PM

Do you think this is caused by incorrect use of cats-parse, or is it something that can be improved in cats-parse itself? Parser code can be found here if it helps.

add ability to parse substrings

Currently, we can only parse from entire strings. It would be nice to be able to parse a string at a given offset and only up to a given length.

This would allow you to parse the inside of a string that might be provided by another process without having to copy.

Should be as simple as updating State.

Is it possible to implement a non-greedy repeat?

I'm trying to implement a parser that repeats parser p1 until the rest of the string matches parser p2.
My current solution is this, but it's not really elegant and needs the input string to work.

  def repeatUntil2ndParserMatches(input: String, p1: P[String], p2: P[String], maxRepetitions: Int = 100): P[String] = {
    import cats.syntax.applicative._
    LazyList
      .range(1, maxRepetitions)
      .flatMap { i =>
        println(s"trying p1 $i times")
        val newP1 = p1.backtrack.replicateA(i)
        val p1Result = newP1.parse(input)
        p1Result match {
          case Left(_) =>
            List(P.fail)

          case Right((rest, _)) =>
            val p2Result = p2.backtrack.parse(rest)
            p2Result match {
              case Left(_) => List.empty
              case Right(_) =>
                List((newP1 ~ p2).map { case (list, res2) => list.appended(res2) }.map(_.mkString))
            }
        }
      }
      .headOption match {
        case Some(value) => value
        case None        => P.fail
    }
  }

Can this be done more elegantly?
I don't know if that is a common use case in the parser world, but I do know that regex groups can be made non-greedy.
I saw #128 where repetition is being discussed - maybe that is something others might find useful.

Maybe it'd be helpful if there was a combinator similar to flatMap that provides the tuple of (remainder, matched).
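A sketch of how this might be expressed without the input string, using negative lookahead: repeat p1 only while p2 does not match at the current position, then run p2. repeatUntil is a hypothetical helper name. To require that p2 matches the whole rest of the input, p2 can end with Parser.end, as in the example.

```scala
import cats.parse.{Parser => P}

object RepeatUntil {
  // Repeat p1 while p2 does not match here, then run p2.
  def repeatUntil[A, B](p1: P[A], p2: P[B]): P[(List[A], B)] =
    ((P.not(p2).with1 *> p1).rep ~ p2).map { case (as, b) => (as.toList, b) }

  val letter: P[String] = P.charIn('a' to 'z').string
  // "end" counts only when it is the very last thing in the input.
  val suffix: P[String] = (P.string("end") <* P.end).string
  val example: P[(List[String], String)] = repeatUntil(letter, suffix)
}
```

I believe newer cats-parse releases also offer repUntil and repUntil0 along these lines, but I have not checked which version introduced them.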
