
better-parse's Introduction

About me

val h0tk3y = developer {
    about {
        name = "Sergey"
        role = Developer
    }
    tech("Kotlin", "Gradle", "Java", "JVM", "compilers")
    links {
        twitter = "h07k3y"
        vk = "h0tk3y"
    }
}

better-parse's People

Contributors

commandertvis, h0tk3y, infectedbytes, itegulov, krizzmp, orangy, thomvis


better-parse's Issues

implementing and

How would I design the and of a parser combinator so that I do not need to create an infinite number of classes for it?
Example (probably incorrect):

/*
val A = F1("1")
val B = F1("2")

val C: F1AndF1 = A.and(B)
val D: F1AndF1 = B.and(A)

val E: F1AndF1AndF2AndF2 = C.and(D)

// ...

*/
class F1(left: Any) {
    // F1 And F1
    infix fun and(right: F1) = F1AndF1(this, right)
}
class F1AndF1(left: F1, right: F1) {
    // F1AndF1 And F1AndF1
    infix fun and(right: F1AndF1) = F1AndF1AndF2AndF2(this, right)
}
class F1AndF1AndF2AndF2(left: F1AndF1, right: F1AndF1) {
    // F1AndF1AndF2AndF2 And F1AndF1AndF2AndF2
    infix fun and(right: F1AndF1AndF2AndF2) = F1AndF1AndF2AndF2AndF3AndF3AndF4AndF4(this, right)
}
// and so on forever
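
One way to avoid one class per arity (a sketch in plain Kotlin, not better-parse's actual implementation, which generates TupleN helper types and and overloads) is a single generic combinator that nests Pairs:

// A minimal sketch: one generic class covers every arity by nesting Pairs,
// so no F1AndF1, F1AndF1AndF2AndF2, ... hierarchy is needed.
interface P<out T> {
    fun parse(input: String, pos: Int): Pair<T, Int>? // parsed value and next position, or null
}

class Lit(private val s: String) : P<String> {
    override fun parse(input: String, pos: Int) =
        if (input.startsWith(s, pos)) s to pos + s.length else null
}

class AndP<A, B>(private val left: P<A>, private val right: P<B>) : P<Pair<A, B>> {
    override fun parse(input: String, pos: Int): Pair<Pair<A, B>, Int>? {
        val (a, afterA) = left.parse(input, pos) ?: return null
        val (b, afterB) = right.parse(input, afterA) ?: return null
        return (a to b) to afterB
    }
}

infix fun <A, B> P<A>.and(other: P<B>): P<Pair<A, B>> = AndP(this, other)

fun main() {
    val abc = Lit("a") and Lit("b") and Lit("c") // P<Pair<Pair<String, String>, String>>
    println(abc.parse("abc", 0))                 // (((a, b), c), 3)
}

The price is nested Pair types instead of flat tuples, which is exactly what better-parse's generated andN functions exist to avoid.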

RegexToken broken with latest js backend

Using the latest Kotlin version (1.6.20), better-parse doesn't work with the JS backend.
Using the example BooleanGrammar, parsing always results in an exception:

kotlin_com_github_h0tk3y_betterParse_better_parse.js?6388:1744 Uncaught TypeError: Cannot read properties of undefined (reading 'source')
    at preprocessRegex$outlinedJsCode$ (kotlin_com_github_h0tk3y_betterParse_better_parse.js?6388:1744:1)
    at preprocessRegex (kotlin_com_github_h0tk3y_betterParse_better_parse.js?6388:1691:1)
    at RegexToken_init_$Init$ (kotlin_com_github_h0tk3y_betterParse_better_parse.js?6388:1705:1)
    at RegexToken_init_$Create$ (kotlin_com_github_h0tk3y_betterParse_better_parse.js?6388:1709:1)
    at regexToken (kotlin_com_github_h0tk3y_betterParse_better_parse.js?6388:908:1)
    at regexToken$default (kotlin_com_github_h0tk3y_betterParse_better_parse.js?6388:913:1)
    at new BooleanGrammar (CharacterCompanion.js?f640:512:1)
    at BooleanGrammar_getInstance (CharacterCompanion.js?f640:683:1)

The problem is caused by the inline JS code of preprocessRegex.
It tries to access _nativePattern/nativePattern_0 of the Kotlin JS Regex class, but they are undefined.
Changing it to nativePattern_1 solves the issue.

Building parsers for fixed-width fields

Let's say I have a String containing numbers that I want to convert into a data class. The fields are of fixed width. How should I go about building a grammar that can achieve this without having the different tokens cause issues with each other?

At the moment I'm seeing a MismatchedToken exception.

Reproducible sample case

data class Line(
    val fieldA: Int,
    val fieldB: Int,
    val fieldC: Int
)

object LineGrammar : Grammar<Line>() {
    private val fieldA by token("[0-9]{2}")
    private val fieldB by token("[0-9]{5}")
    private val fieldC by token("[0-9]{3}")

    private val line =
        fieldA and fieldB and fieldC

    override val rootParser: Parser<Line> =
        line.map { (a, b, c) ->
            Line(
                fieldA = a.text.toInt(),
                fieldB = b.text.toInt(),
                fieldC = c.text.toInt()
            )
        }
}

val line = "1234567891"
val result = LineGrammar.parseToEnd(line)
// result should be Line(fieldA = 12, fieldB = 34567, fieldC = 891)
//
// Instead seeing the following exception
// Could not parse input: MismatchedToken(expected=fieldB ([0-9]{5}), found=fieldA for "34" at 2 (1:3))
// com.github.h0tk3y.betterParse.parser.ParseException: Could not parse input: MismatchedToken(expected=fieldB ([0-9]{5}), found=fieldA for "34" at 2 (1:3))
// 	at com.github.h0tk3y.betterParse.parser.ParserKt.toParsedOrThrow(Parser.kt:70)
// 	at com.github.h0tk3y.betterParse.parser.ParserKt.parseToEnd(Parser.kt:30)
// 	at com.github.h0tk3y.betterParse.grammar.GrammarKt.parseToEnd(Grammar.kt:69)
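
One workaround, sketched here under the assumption that each record can be lexed as a single token (this is not an official recommendation from the maintainers): capture the whole digit run with one token and slice it by field width in the mapper.

import com.github.h0tk3y.betterParse.combinators.*
import com.github.h0tk3y.betterParse.grammar.*
import com.github.h0tk3y.betterParse.lexer.*
import com.github.h0tk3y.betterParse.parser.Parser

data class Line(val fieldA: Int, val fieldB: Int, val fieldC: Int)

object FixedWidthLineGrammar : Grammar<Line>() {
    // One token for the whole 10-digit record; the field widths are applied afterwards.
    private val record by regexToken("[0-9]{10}")

    override val rootParser: Parser<Line> = record use {
        Line(
            fieldA = text.substring(0, 2).toInt(),
            fieldB = text.substring(2, 7).toInt(),
            fieldC = text.substring(7, 10).toInt()
        )
    }
}

fun main() {
    println(FixedWidthLineGrammar.parseToEnd("1234567891"))
    // Line(fieldA=12, fieldB=34567, fieldC=891)
}

This sidesteps the tokenizer having to choose between three overlapping digit tokens at every position.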

Make it buildable in jitpack

Because better-parse doesn't build well on JitPack, I'm forced to use the Bintray version. For that I always need to add an additional repository declaration to my Gradle config. Because I'm building a library that depends on better-parse, all my users ALSO have to add that additional repository to their configs.
This is awkward.

Guys, please.

Patterns with flags are broken (again)

With the changes introduced in 0.3.3 (and 0.3.4), patterns created with flags are broken again (similar to #7), because they are combined using their serialized form.

Example:

private val newCommentLine by token("^\\s*--".toRegex(RegexOption.MULTILINE), ignore = true)

If patterns are transported using a serialized form, that string form must contain their flags, e.g.:

(?m:^\s*--)

Otherwise, as I said in #7, the Token factories taking a pattern would be better removed altogether.

If you decide to further support Tokens with Patterns, could you please add a test for patterns that have flags like MULTILINE or such, to prevent such breakage in the future?
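
For reference, inline flag groups are part of the standard JVM regex syntax, so embedding the flag in the serialized pattern string does preserve the behaviour; a quick standalone check:

fun main() {
    val withFlag = "^\\s*--".toRegex(RegexOption.MULTILINE)
    val inlineFlag = Regex("(?m:^\\s*--)") // the flag survives inside the pattern string
    val input = "select 1\n  -- a comment"

    println(withFlag.containsMatchIn(input))         // true
    println(inlineFlag.containsMatchIn(input))       // true
    println(Regex("^\\s*--").containsMatchIn(input)) // false: the flag was lost
}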

"null cannot be cast to non-null type" when parsing generates > 16 token matches

This might be a duplicate of #29. But there's no solution in that issue, and it's almost three years old.

Using Kotlin 1.8.20, and better-parse 0.4.4, this code (which is supposed to parse an Android resource qualifier string, like values-en-rGB-land-v28) has a runtime error:

data class MobileCodes(val mcc: String, val mnc: String? = null)

data class Locale(val lang: String, val region: String? = null, val script: String? = null)

data class ConfigurationQualifier(val mobileCodes: MobileCodes? = null, val locale: Locale? = null)

/**
 * Parse an Android `values-*` resource directory name and extract the configuration qualifiers
 *
 * Directory name has the following components in a specific order, listed in
 * https://developer.android.com/guide/topics/resources/providing-resources#table2
 */
class ValuesParser : Grammar<ConfigurationQualifier>() {
    // Tokenizers
    private val values by literalToken("values")
    private val sep by literalToken("-")
    private val mobileCodes by regexToken("(?i:mcc\\d+)(?i:mnc\\d+)?")
    private val locale by regexToken("(?i:[a-z]{2,3})(?i:-r([a-z]{2,3}))?(?=-|$)")
    private val bcpStartTag by regexToken("(?i:b\\+[a-z]{2,3})")
    private val bcpSubtag by regexToken("(?i:\\+[a-z]+)")

    private val layoutDirection by regexToken("(?i:ldrtl|ldltr)")
    private val smallestWidth by regexToken("(?i:sw\\d+dp)")
    private val availableDimen by regexToken("(?i:[wh]\\d+dp)")
    private val screenSize by regexToken("(?i:small|normal|large|xlarge)")
    private val screenAspect by regexToken("(?i:long|notlong)")
    private val roundScreen by regexToken("(?i:round|notround)")
    private val wideColorGamut by regexToken("(?i:widecg|nowidecg)")
    private val highDynamicRange by regexToken("(?i:highdr|lowdr)")
    private val screenOrientation by regexToken("(?i:port|land)")
    private val uiMode by regexToken("(?i:car|desk|television|appliance|watch|vrheadset)")
    private val nightMode by regexToken("(?i:night|notNight)")
    private val screenDpi by regexToken("(?i:(?:l|m|h|x|xx|xxx|no|tv|any|\\d+)dpi)")
    private val touchScreen by regexToken("(?i:notouch|finger)")
    private val keyboardAvailability by regexToken("(?i:keysexposed|keyshidden|keyssoft)")
    private val inputMethod by regexToken("(?i:nokeys|qwerty|12key)")
    private val navKeyAvailability by regexToken("(?i:naxexposed|navhidden)")
    private val navMethod by regexToken("(?i:nonav|dpad|trackball|wheel)")
    private val platformVersion by regexToken("(?i:v\\d+)")

    // Parsers
    private val mobileCodesParser by mobileCodes use {
        val parts = this.text.split("-")
        MobileCodes(mcc = parts[0], mnc = parts.getOrNull(1))
    }

    private val localeParser by locale use {
        val parts = this.text.split("-r".toRegex(), 2)
        Locale(lang = parts[0], region = parts.getOrNull(1))
    }

    private val bcpLocaleParser = bcpStartTag and zeroOrMore(bcpSubtag) use {
        Locale(
            lang = this.t1.text.split("+")[1],
            script = this.t2.getOrNull(0)?.text?.split("+")?.get(1),
            region = this.t2.getOrNull(1)?.text?.split("+")?.get(1)
        )
    }

    private val qualifier = skip(values) and
        optional(skip(sep) and mobileCodesParser) and
        optional(skip(sep) and (localeParser or bcpLocaleParser)) and
        optional(skip(sep) and layoutDirection) and
        optional(skip(sep) and smallestWidth) and
        optional(skip(sep) and availableDimen) and
        optional(skip(sep) and screenSize) and
        optional(skip(sep) and screenAspect) and
        optional(skip(sep) and roundScreen) and
        optional(skip(sep) and wideColorGamut) and
        optional(skip(sep) and highDynamicRange) and
        optional(skip(sep) and screenOrientation) and
        optional(skip(sep) and uiMode) and
        optional(skip(sep) and nightMode) and
        optional(skip(sep) and screenDpi) and
        optional(skip(sep) and touchScreen) and
        optional(skip(sep) and keyboardAvailability) and
        optional(skip(sep) and inputMethod) and
        optional(skip(sep) and navKeyAvailability) and
        optional(skip(sep) and navMethod) and
        optional(skip(sep) and platformVersion)

    private val qualifierParser by qualifier use {
        // Here, the type of `this` is 
        // Tuple5<Tuple16<MobileCodes?, Locale?, TokenMatch?, TokenMatch?, TokenMatch?, TokenMatch?, TokenMatch?, TokenMatch?, TokenMatch?, TokenMatch?, TokenMatch?, TokenMatch?, TokenMatch?, TokenMatch?, TokenMatch?, TokenMatch?>, TokenMatch?, TokenMatch?, TokenMatch?, TokenMatch?>.`<anonymous>`(): ConfigurationQualifier
        ConfigurationQualifier(
            mobileCodes = this.t1.t1,
            locale = this.t1.t2
        )
    }

    override val rootParser by qualifierParser
}

The error is:

null cannot be cast to non-null type com.github.h0tk3y.betterParse.utils.Tuple16<app.tusky.mklanguages.MobileCodes?, app.tusky.mklanguages.Locale?, com.github.h0tk3y.betterParse.lexer.TokenMatch?, com.github.h0tk3y.betterParse.lexer.TokenMatch?, com.github.h0tk3y.betterParse.lexer.TokenMatch?, com.github.h0tk3y.betterParse.lexer.TokenMatch?, com.github.h0tk3y.betterParse.lexer.TokenMatch?, com.github.h0tk3y.betterParse.lexer.TokenMatch?, com.github.h0tk3y.betterParse.lexer.TokenMatch?, com.github.h0tk3y.betterParse.lexer.TokenMatch?, com.github.h0tk3y.betterParse.lexer.TokenMatch?, com.github.h0tk3y.betterParse.lexer.TokenMatch?, com.github.h0tk3y.betterParse.lexer.TokenMatch?, com.github.h0tk3y.betterParse.lexer.TokenMatch?, com.github.h0tk3y.betterParse.lexer.TokenMatch?, com.github.h0tk3y.betterParse.lexer.TokenMatch?>
java.lang.NullPointerException: null cannot be cast to non-null type com.github.h0tk3y.betterParse.utils.Tuple16<app.tusky.mklanguages.MobileCodes?, app.tusky.mklanguages.Locale?, com.github.h0tk3y.betterParse.lexer.TokenMatch?, com.github.h0tk3y.betterParse.lexer.TokenMatch?, com.github.h0tk3y.betterParse.lexer.TokenMatch?, com.github.h0tk3y.betterParse.lexer.TokenMatch?, com.github.h0tk3y.betterParse.lexer.TokenMatch?, com.github.h0tk3y.betterParse.lexer.TokenMatch?, com.github.h0tk3y.betterParse.lexer.TokenMatch?, com.github.h0tk3y.betterParse.lexer.TokenMatch?, com.github.h0tk3y.betterParse.lexer.TokenMatch?, com.github.h0tk3y.betterParse.lexer.TokenMatch?, com.github.h0tk3y.betterParse.lexer.TokenMatch?, com.github.h0tk3y.betterParse.lexer.TokenMatch?, com.github.h0tk3y.betterParse.lexer.TokenMatch?, com.github.h0tk3y.betterParse.lexer.TokenMatch?>
	at app.tusky.mklanguages.ValuesParser$special$$inlined$and4$2.invoke(andFunctions.kt:42)
	at app.tusky.mklanguages.ValuesParser$special$$inlined$and4$2.invoke(andFunctions.kt:41)
	at com.github.h0tk3y.betterParse.combinators.AndCombinator.tryParse(AndCombinator.kt:72)
	at com.github.h0tk3y.betterParse.combinators.MapCombinator.tryParse(MapCombinator.kt:14)
	at com.github.h0tk3y.betterParse.parser.ParserKt.tryParseToEnd(Parser.kt:18)
	at com.github.h0tk3y.betterParse.parser.ParserKt.parseToEnd(Parser.kt:29)
	at com.github.h0tk3y.betterParse.grammar.GrammarKt.parseToEnd(Grammar.kt:70)
	at app.tusky.mklanguages.ValuesParserTest$ParseLocale.returns the expected locale(ValuesParserTest.kt:54)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:727)
	at org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60)
	at org.junit.jupiter.engine.execution.InvocationInterceptorChain$ValidatingInvocation.proceed(InvocationInterceptorChain.java:131)
	at org.junit.jupiter.engine.extension.TimeoutExtension.intercept(TimeoutExtension.java:156)
	at org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestableMethod(TimeoutExtension.java:147)
	at org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestTemplateMethod(TimeoutExtension.java:94)
	at org.junit.jupiter.engine.execution.InterceptingExecutableInvoker$ReflectiveInterceptorCall.lambda$ofVoidMethod$0(InterceptingExecutableInvoker.java:103)
	at org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.lambda$invoke$0(InterceptingExecutableInvoker.java:93)
	at org.junit.jupiter.engine.execution.InvocationInterceptorChain$InterceptedInvocation.proceed(InvocationInterceptorChain.java:106)
	at org.junit.jupiter.engine.execution.InvocationInterceptorChain.proceed(InvocationInterceptorChain.java:64)
	at org.junit.jupiter.engine.execution.InvocationInterceptorChain.chainAndInvoke(InvocationInterceptorChain.java:45)
	at org.junit.jupiter.engine.execution.InvocationInterceptorChain.invoke(InvocationInterceptorChain.java:37)
	at org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.invoke(InterceptingExecutableInvoker.java:92)
	at org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.invoke(InterceptingExecutableInvoker.java:86)
	at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.lambda$invokeTestMethod$7(TestMethodTestDescriptor.java:217)
	at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
	at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.invokeTestMethod(TestMethodTestDescriptor.java:213)
	at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:138)
	at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:68)
	at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$6(NodeTestTask.java:151)
	at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
	at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$8(NodeTestTask.java:141)
	at org.junit.platform.engine.support.hierarchical.Node.around(Node.java:137)
	at org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$9(NodeTestTask.java:139)
	at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
	at org.junit.platform.engine.support.hierarchical.NodeTestTask.executeRecursively(NodeTestTask.java:138)
	at org.junit.platform.engine.support.hierarchical.NodeTestTask.execute(NodeTestTask.java:95)
	at org.junit.platform.engine.support.hierarchical.ForkJoinPoolHierarchicalTestExecutorService$ExclusiveTask.compute(ForkJoinPoolHierarchicalTestExecutorService.java:202)
	at java.base/java.util.concurrent.RecursiveAction.exec(RecursiveAction.java:189)
	at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290)
	at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1020)
	at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1656)
	at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1594)
	at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:183)

If the number of parsers is kept to 16 or below, it works, e.g. by making the following modifications to the bottom half of the file:

// Replace `qualifier` and `qualifierParser` with:

    private val qualifier = skip(values) and
        optional(skip(sep) and mobileCodesParser) and
        optional(skip(sep) and (localeParser or bcpLocaleParser)) and
        optional(skip(sep) and layoutDirection) and
        optional(skip(sep) and smallestWidth) and
        optional(skip(sep) and availableDimen) and
        optional(skip(sep) and screenSize) and
        optional(skip(sep) and screenAspect) and
        optional(skip(sep) and roundScreen) and
        optional(skip(sep) and wideColorGamut) and
        optional(skip(sep) and highDynamicRange) and
        optional(skip(sep) and screenOrientation) and
        optional(skip(sep) and uiMode) and
        optional(skip(sep) and nightMode) and
        optional(skip(sep) and screenDpi) and
        optional(skip(sep) and touchScreen) and
        optional(skip(sep) and keyboardAvailability) // and
//        optional(skip(sep) and inputMethod) and
//        optional(skip(sep) and navKeyAvailability) and
//        optional(skip(sep) and navMethod) and
//        optional(skip(sep) and platformVersion)

    private val qualifierParser by qualifier use {
        ConfigurationQualifier(
            mobileCodes = this.t1,
            locale = this.t2
        )
    }

then the string values-en-rGB-land successfully parses.

Tokenizer regression in 0.4.0

I'm trying to upgrade better-parse in https://github.com/mipt-npm/kmath/tree/extended-grammar from 0.4.0-alpha-3 to 0.4.0.

The tokens are defined as shown in a screenshot attached to the original issue (not reproduced here).

Before the update, the string "2+2*(2+2)" was lexed as:

[num for "2" at 0 (1:1), plus for "+" at 1 (1:2), num for "2" at 2 (1:3), mul for "*" at 3 (1:4), lpar for "(" at 4 (1:5), num for "2" at 5 (1:6), plus for "+" at 6 (1:7), num for "2" at 7 (1:8), rpar for ")" at 8 (1:9)]

After the update, the same string is lexed as:

[num@1 for "2" at 0 (1:1), num@2 for "+2" at 1 (1:2), num@3 for "*(2" at 3 (1:4), num@4 for "+2" at 6 (1:7), rpar@5 for ")" at 8 (1:9)]

A + being matched as part of the num token, whose regex is "[\\d.]+(?:[eE]-?\\d+)?".toRegex(), makes no sense, so this is a regression.

By the way, replacing the token function with regexToken produces the same wrong behavior.

How should Grammar objects be composed?

The goal

I have a file that looks like this:

Header
Line
Line
Line
Footer

And I would like to parse it into the following container:

data class File(
  val header: Header,
  val lines: List<Line>,
  val footer: Footer
)

The problem

I have written a grammar for each of the different variants, i.e. Grammar<Header>, Grammar<Line>, and Grammar<Footer>. There is some similarity in how these different lines look. How can I compose these grammars together? The variants are split on a newline \n character.

Any help would be much appreciated.
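
Not an authoritative answer, but one way this is often worked around (a sketch with made-up Header/Line/Footer token patterns) is to keep a single Grammar and express the variants as plain parsers inside it, so they all share one token set:

import com.github.h0tk3y.betterParse.combinators.*
import com.github.h0tk3y.betterParse.grammar.*
import com.github.h0tk3y.betterParse.lexer.*

data class Header(val text: String)
data class Line(val text: String)
data class Footer(val text: String)
data class File(val header: Header, val lines: List<Line>, val footer: Footer)

object FileGrammar : Grammar<File>() {
    // Illustrative tokens only; real header/line/footer patterns would go here.
    private val headerTok by regexToken("Header[^\\n]*")
    private val footerTok by regexToken("Footer[^\\n]*")
    private val lineTok by regexToken("Line[^\\n]*")
    private val newline by regexToken("\\n", ignore = true)

    private val header = headerTok use { Header(text) }
    private val line = lineTok use { Line(text) }
    private val footer = footerTok use { Footer(text) }

    override val rootParser by (header and zeroOrMore(line) and footer)
        .map { (h, ls, f) -> File(h, ls, f) }
}

fun main() {
    println(FileGrammar.parseToEnd("Header\nLine\nLine\nLine\nFooter"))
}

Reusing actual Grammar objects is harder precisely because each Grammar carries its own token list; see also the "Using Grammars as Parsers seems to fail" and "Discussion About Separation Between Tokens and Parsers" issues below.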

Discussion About Separation Between Tokens and Parsers

I've tried to use parser combinator libraries across multiple languages, and I've never seen the kind of hard distinction between tokens and parsers that this library has. Perhaps I wasn't paying attention (this library is actually the one I've been the least frustrated with), but it's interesting and comes with a set of advantages and disadvantages. I'm interested in why this decision was made. I'd also like to get feedback on my own understanding of the concepts. This might also help you write good docs, or I'd be willing to write them and submit a PR if my understanding is good enough. Feel free to close and ignore if neither of these discussions interests you.

The advantage is that it's very, very clear (after a bit of conceptual learning) what each piece of a grammar is for. Tokens are specifically about character sequence recognition, while parsers are about token sequence recognition and mapping. Once you get the distinction, it's easy to write grammars.

I see two main disadvantages.

  1. Being forced to declare tokens separately from parsers feels redundant. Consider val id by regexToken("\\w+") use { text }. This creates a token and a parser, but only registers the parser. The solution of val idToken by regexToken... val idParser by idToken use { text } is fine, but feels very clunky.

  2. It's not very easy to combine grammars, or to reuse grammars as parsers in a parent grammar, specifically because tokens are separate entities. Consider two grammars A and B. If I want a third grammar C that expresses A or B, simply doing the obvious thing of setting the rootParser to this expression is insufficient, because C doesn't have the tokens defined in A and B, and in fact has no tokens at all. This problem gets worse with more grammars and deeper nesting. It's also not clear from your docs how such a merge operation should work.

Are these assessments fair? Am I missing something?
Thanks.

Library is not visible in multiplatform project (specifically js and common modules)

I have a related question on stackoverflow.

I noticed that adding the library to the jsMain module is not working, while in jvmMain a separate dependency works:

sourceSets {
        val commonMain by getting {
            dependencies {
                implementation("com.github.h0tk3y.betterParse:better-parse:0.4.4")
            }
        }
        val commonTest by getting {
            dependencies {
                implementation(kotlin("test"))
            }
        }
        val jvmMain by getting {
            dependencies {
                implementation("com.github.h0tk3y.betterParse:better-parse-jvm:0.4.4")
            }
        }
        val jvmTest by getting
        val jsMain by getting {
            dependencies {
                implementation("com.github.h0tk3y.betterParse:better-parse-js:0.4.4")
            }
        }
        val jsTest by getting
    }

Using Grammars as Parsers seems to fail.

Seeing that Grammar<T> extends Parser<T>, I figured I should be able to delegate to a Grammar<T>, such as:

val exp: Parser<Exp> by ExpGrammar() // where ExpGrammar is a Grammar<Exp>

However, it doesn't seem to behave as expected. The following is a small SSCCE to demonstrate:

package com.example

import com.github.h0tk3y.betterParse.combinators.and
import com.github.h0tk3y.betterParse.combinators.map
import com.github.h0tk3y.betterParse.combinators.separatedTerms
import com.github.h0tk3y.betterParse.combinators.skip
import com.github.h0tk3y.betterParse.grammar.Grammar
import com.github.h0tk3y.betterParse.grammar.parseToEnd
import com.github.h0tk3y.betterParse.lexer.TokenMatch
import com.github.h0tk3y.betterParse.lexer.literalToken
import com.github.h0tk3y.betterParse.lexer.regexToken
import com.github.h0tk3y.betterParse.parser.Parser

data class Inner(
    val names: List<String>,
)

data class Outer(
    val name: String,
    val inner: Inner,
)

abstract class TestGrammarBase<T> : Grammar<T>()
{
    val idToken by regexToken("\\w+")

    val spaceToken by regexToken("\\s*", true)

    val commaToken by literalToken(",")

    val lBraceToken by literalToken("{")

    val rBraceToken by literalToken("}")
}

class InnerTestGrammar : TestGrammarBase<Inner>()
{
    override val rootParser: Parser<Inner> by separatedTerms(idToken, commaToken, true) map inner@{ tokenMatches ->
        return@inner Inner(
            names = tokenMatches.map(TokenMatch::text),
        )
    }
}

class OuterTestGrammar : TestGrammarBase<Outer>()
{
    val innerTestParser by InnerTestGrammar()

    override val rootParser: Parser<Outer> by idToken and skip(lBraceToken) and innerTestParser and skip(rBraceToken) map outer@{ (tokenMatch, inner) ->
        return@outer Outer(
            name = tokenMatch.text,
            inner = inner,
        )
    }
}

fun main()
{
    val innerTest1 = "X, Y, Z"
    val outerTest1 = "A { }"
    val outerTest2 = "A { X, Y, Z }"

    val innerTestGrammar = InnerTestGrammar()
    val outerTestGrammar = OuterTestGrammar()

    innerTestGrammar.parseToEnd(innerTest1).also(::println)
    outerTestGrammar.parseToEnd(outerTest1).also(::println)
    outerTestGrammar.parseToEnd(outerTest2).also(::println)
}

And the output:

Inner(names=[X, Y, Z])
Outer(name=A, inner=Inner(names=[]))
Exception in thread "main" com.github.h0tk3y.betterParse.parser.ParseException: Could not parse input: MismatchedToken(expected=rBraceToken (}), found=idToken@5 for "X" at 4 (1:5))
	at com.github.h0tk3y.betterParse.parser.ParserKt.toParsedOrThrow(Parser.kt:92)
	at com.github.h0tk3y.betterParse.parser.ParserKt.parseToEnd(Parser.kt:29)
	at com.github.h0tk3y.betterParse.grammar.GrammarKt.parseToEnd(Grammar.kt:70)
	at com.example.__langKt.main(__lang.kt:68)
	at com.example.__langKt.main(__lang.kt)

As you can see, the third attempt to parse the grammar-combined input of A { X, Y, Z } errors out. InnerTestGrammar and OuterTestGrammar, having extended TestGrammarBase, can see the shared member tokens/parsers, but seem to get confused (or perhaps I'm the one who is confused).

Is this not an intended use of Grammar?

Skip/Ignore/Optional parts

Hi,

I have some questions about regexTokens and not parsing some parts of an input.

See file at bottom for reference.

Ignore

In main() there is a line to parse:

 rule: StartBonus = GRAPH 1:entrance(x=2, y=3)

But StartBonus may have a constructor like the one in entrance, which may be empty, as in

rule: StartBonus() = GRAPH 1:entrance(x=2, y=3)

Or it can have things inside, but I do not want to parse them. I want to ignore them.

rule: StartBonus(foo) = GRAPH 1:entrance(x=2, y=3)

But when I try to add a regexp like ([^)].*), it messes up other matching, for example the lpar token.

regexTokens

I think it's related.

At the start of the input file there is:

version: 0.5f

A regexp for 0.5f (not present in the file) would match a lot of things.

So the next line,

alphabetFile: missionAlphabet.xpr

would also be matched by the 0.5f regexp.

Sample (runnable) code:

import com.github.h0tk3y.betterParse.combinators.*
import com.github.h0tk3y.betterParse.grammar.Grammar
import com.github.h0tk3y.betterParse.grammar.parseToEnd
import com.github.h0tk3y.betterParse.grammar.parser
import com.github.h0tk3y.betterParse.lexer.literalToken
import com.github.h0tk3y.betterParse.lexer.regexToken

interface Line

data class Version(val value: String) : Line
data class AlphabetFile(val file: String) : Line

sealed class Expression
data class TileMap(val width: Int, val height: Int) : Expression()
data class Graph(val symbols: List<Symbol>) : Expression()

sealed class Symbol
data class Node(val id: Int, val name: String, val constructor: Constructor) : Symbol()
//data class Edge(val id: String, val name: String, val source: String, val target: String) : Symbol

// Assignment or BooleanExpression
sealed class Argument
data class Assignment(val variable: String, val value: Any) : Argument()

data class Constructor(val arguments: List<Argument>)

data class Start(val expression: Expression) : Line
data class Rule(val name: String, val type: Expression) : Line

data class GGrammar(
        val version: Version,
        val alphabetFile: AlphabetFile,
        val start: Start,
        val rule: List<Rule>) {

    override fun toString(): String {
        return listOf(version, alphabetFile, start, rule.joinToString("\n")).joinToString("\n")
    }
}

object GrammarParser2 : Grammar<GGrammar>() {

    // literal first
    private val versionLiteral by literalToken("version: ")
    private val version by literalToken("0.5f") // bad
    private val alphabetFileLiteral by literalToken("alphabetFile: ")
    private val alphabet by literalToken("missionAlphabet.xpr") // bad
    private val startLiteral by literalToken("start: ")
    private val graphLiteral by literalToken("GRAPH")
    private val tileMapLiteral by literalToken("TILEMAP")
    private val rule by literalToken("rule: ")

    private val colon by literalToken(":")
    private val lpar by literalToken("(")
    private val rpar by literalToken(")")
    private val set by literalToken("=")
    private val coma by literalToken(", ")

    // main grammar parsers
    val versionParser by (versionLiteral and parser(this::version)) use { Version(t2.text) }
    val alphabetParser by (alphabetFileLiteral and parser(this::alphabet)) use { AlphabetFile(t2.text) }

    val tileMapParser by (tileMapLiteral and -parser(this::ws) and parser(this::integer) and -parser(this::ws) and parser(this::integer)) use { TileMap(t2.text.toInt(), t3.text.toInt()) }

    val assignmentParser by (parser(this::word) and -set and (parser(this::word) or parser(this::integer))) use { Assignment(t1.text, t2.text) }

    val constructorParser by (lpar and separatedTerms(assignmentParser, coma) and rpar) use { Constructor(t2) }

    val nodeParser by (parser(this::integer) and -colon and parser(this::word) and constructorParser) use { Node(t1.text.toInt(), t2.text, t3 ) }

    val graphParser by (graphLiteral and -parser(this::ws) and nodeParser) use { Graph(listOf(t2)) }

    val expressionParser = (tileMapParser or graphParser)

    val startParser by (startLiteral and expressionParser ) map { Start(it.t2) }
    val ruleParser by separatedTerms((-rule and parser(this::word) and set and expressionParser) use { Rule(t1.text, t3) }, parser(this::NEWLINE))

    // regex last
    private val NEWLINE by regexToken("\n")

    private val integer by regexToken("\\d+")
    private val word by regexToken("\\w+")

    private val ws by regexToken("\\s+", ignore = true)

    override val rootParser by (
                    versionParser   * -NEWLINE *
                    alphabetParser  * -NEWLINE *
                    startParser     * -NEWLINE *
                    ruleParser
            ).map {
                GGrammar(it.t1, it.t2, it.t3, it.t4)
            }
}


fun main() {

    val text = """
            version: 0.5f
            alphabetFile: missionAlphabet.xpr
            start: GRAPH 0:Start(var=12, y=20)
            rule: StartBonus = GRAPH 1:entrance(x=2, y=3)
        """.trimIndent()

    println(GrammarParser2.parseToEnd(text))
}

How to fix the build?

Hello! I'm enjoying this library very much.

However, building it gives this:

% ./gradlew clean

> Task :buildSrc:compileKotlin FAILED
e: java.lang.NoClassDefFoundError: Could not initialize class org.jetbrains.kotlin.com.intellij.pom.java.LanguageLevel
        at org.jetbrains.kotlin.com.intellij.core.CoreLanguageLevelProjectExtension.<init>(CoreLanguageLevelProjectExtension.java:26)
        at org.jetbrains.kotlin.com.intellij.core.JavaCoreProjectEnvironment.<init>(JavaCoreProjectEnvironment.java:42)
        at org.jetbrains.kotlin.cli.jvm.compiler.KotlinCoreProjectEnvironment.<init>(KotlinCoreProjectEnvironment.kt:26)
        at org.jetbrains.kotlin.cli.jvm.compiler.KotlinCoreEnvironment$ProjectEnvironment.<init>(KotlinCoreEnvironment.kt:118)
        at org.jetbrains.kotlin.cli.jvm.compiler.KotlinCoreEnvironment$Companion.createForProduction(KotlinCoreEnvironment.kt:420)
        at org.jetbrains.kotlin.cli.jvm.K2JVMCompiler.createCoreEnvironment(K2JVMCompiler.kt:226)
        at org.jetbrains.kotlin.cli.jvm.K2JVMCompiler.doExecute(K2JVMCompiler.kt:152)
        at org.jetbrains.kotlin.cli.jvm.K2JVMCompiler.doExecute(K2JVMCompiler.kt:52)
        at org.jetbrains.kotlin.cli.common.CLICompiler.execImpl(CLICompiler.kt:88)
        at org.jetbrains.kotlin.cli.common.CLICompiler.execImpl(CLICompiler.kt:44)
        at org.jetbrains.kotlin.cli.common.CLITool.exec(CLITool.kt:98)
        at org.jetbrains.kotlin.incremental.IncrementalJvmCompilerRunner.runCompiler(IncrementalJvmCompilerRunner.kt:371)
        at org.jetbrains.kotlin.incremental.IncrementalJvmCompilerRunner.runCompiler(IncrementalJvmCompilerRunner.kt:105)
        at org.jetbrains.kotlin.incremental.IncrementalCompilerRunner.compileIncrementally(IncrementalCompilerRunner.kt:249)
        at org.jetbrains.kotlin.incremental.IncrementalCompilerRunner.access$compileIncrementally(IncrementalCompilerRunner.kt:38)
        at org.jetbrains.kotlin.incremental.IncrementalCompilerRunner$compile$2.invoke(IncrementalCompilerRunner.kt:80)
        at org.jetbrains.kotlin.incremental.IncrementalCompilerRunner.compile(IncrementalCompilerRunner.kt:92)
        at org.jetbrains.kotlin.daemon.CompileServiceImplBase.execIncrementalCompiler(CompileServiceImpl.kt:602)
        at org.jetbrains.kotlin.daemon.CompileServiceImplBase.access$execIncrementalCompiler(CompileServiceImpl.kt:93)
        at org.jetbrains.kotlin.daemon.CompileServiceImpl.compile(CompileServiceImpl.kt:1644)
        at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:104)
        at java.base/java.lang.reflect.Method.invoke(Method.java:577)
        at java.rmi/sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:360)
        at java.rmi/sun.rmi.transport.Transport$1.run(Transport.java:200)
        at java.rmi/sun.rmi.transport.Transport$1.run(Transport.java:197)
        at java.base/java.security.AccessController.doPrivileged(AccessController.java:712)
        at java.rmi/sun.rmi.transport.Transport.serviceCall(Transport.java:196)
        at java.rmi/sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:598)
        at java.rmi/sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:844)
        at java.rmi/sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.lambda$run$0(TCPTransport.java:721)
        at java.base/java.security.AccessController.doPrivileged(AccessController.java:399)
        at java.rmi/sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:720)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
        at java.base/java.lang.Thread.run(Thread.java:833)
Caused by: java.lang.ExceptionInInitializerError: Exception java.lang.ExceptionInInitializerError [in thread "RMI TCP Connection(2)-127.0.0.1"]
        at org.jetbrains.kotlin.com.intellij.pom.java.LanguageLevel.<clinit>(LanguageLevel.java:25)
        ... 35 more

Can this either be fixed, or could you document what extra setup steps I need to do? I'd appreciate it!

Make it usable in Android project

I'm just learning parser combinators and found your library.
I can use the library after changing the target JVM to 1.6 (my minimum Android API level is 15).
Do you plan to make the library work for Android too?

java.lang.IllegalArgumentException: The tokens list should not be empty

#19 notwithstanding, I have built what should be the simplest working Grammar class, alongside the JSON benchmark, which seems to work well.

However, I am having no luck.

See https://gist.github.com/jnorthrup/dd88da7d4715379bf357788e6a6ca9e3, which includes a test case and a POM to recreate the project.

java.lang.IllegalArgumentException: The tokens list should not be empty

at com.github.h0tk3y.betterParse.lexer.DefaultTokenizer.<init>(DefaultTokenizer.kt:9)
at com.github.h0tk3y.betterParse.grammar.Grammar$tokenizer$2.invoke(Grammar.kt:31)
at com.github.h0tk3y.betterParse.grammar.Grammar$tokenizer$2.invoke(Grammar.kt:17)
at kotlin.SynchronizedLazyImpl.getValue(LazyJVM.kt:74)
at com.github.h0tk3y.betterParse.grammar.Grammar.getTokenizer(Grammar.kt)
at com.github.h0tk3y.betterParse.grammar.GrammarKt.tryParseToEnd(Grammar.kt:63)
at com.fnreport.sentenceTest$1.invoke(sentenceTest.kt:23)
at com.fnreport.sentenceTest$1.invoke(sentenceTest.kt:18)
at io.kotlintest.Spec$runTest$callable$1$1.invoke(Spec.kt:124)
at io.kotlintest.Spec$runTest$callable$1$1.invoke(Spec.kt:15)
at io.kotlintest.Spec$runTest$initialInterceptor$1$1.invoke(Spec.kt:116)
at io.kotlintest.Spec$runTest$initialInterceptor$1$1.invoke(Spec.kt:15)
at io.kotlintest.Spec.interceptTestCase(Spec.kt:78)
at io.kotlintest.Spec$runTest$initialInterceptor$1.invoke(Spec.kt:116)
at io.kotlintest.Spec$runTest$initialInterceptor$1.invoke(Spec.kt:15)
at io.kotlintest.Spec$runTest$callable$1.call(Spec.kt:124)

How are delegates selected?

I'm finding that the order in which I declare my delegates in a parser grammar affects whether or not it parses. I have a grammar like the following:

internal class Parser: Grammar<List<Command>>() {
    internal val comments by regexToken("#.*\n", true)
    internal val str by regexToken("\".*\"")
    internal val queryType by regexToken("[A-Z]+(?:_[A-Z]+)*")
    internal val word by regexToken("[A-Za-z]+")
    internal val LPAR by literalToken("(")
    internal val RPAR by literalToken(")")
    internal val COLON by literalToken(":")
    internal val LBRACE by literalToken("{")
    internal val RBRACE by literalToken("}")
    internal val equals by literalToken("=")
    internal val ws by regexToken("\\s+",true)
    internal val newline by regexToken("[\r\n]+",true)
    internal val comma by literalToken(",")
    internal val param: Parser<ValueMetadata> by (word and -COLON and word) map { (p, t) ->
        ValueMetadata(p.text, Type.valueOf(t.text))
    }
    val params by -LPAR and separatedTerms(param, comma, true) and -RPAR
    val outputs by -LPAR and separatedTerms(param, comma, true) and -RPAR
    val cmdParser by ( -LBRACE  and queryType and -equals and str and -RBRACE )
    val funcParser: Parser<Command> by (word and params and -COLON and params and cmdParser) map {
        (name, inputs, outputs, cmdFunc) ->
        val (type,cmd) = cmdFunc
        Command(name.text,
                inputs,
                outputs,
                cmd.text.subSequence(1, cmd.length - 1).toString(),
                QueryType.valueOf(type.text)
        )
    }

    override val rootParser: Parser<List<Command>> by zeroOrMore(funcParser)
}

that's meant to parse

# Documentation that should be ignored
findFoo(test:String,entity:String):(foo:String,bar:Int) {
  SQL_QUERY = "select foo,bar from baz where z = :name and y = :entity"
}
# Documentation that should be ignored
findBar(test:String,entity:String):(foo:String,bar:Int) {
  SQL_QUERY = "select foo,bar from baz where z = :name and y = :entity"
}

into a list of Commands. By just switching the order of str, queryType, and word, the parse will fail or pass on different test cases, with errors like Could not parse input: UnparsedRemainder(startsWith=word@2 for "findFoo" at 39 (2:1))
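
As far as I understand, the DefaultTokenizer tries the tokens in declaration order and the first pattern that matches at the current position wins, so more specific tokens (like queryType) generally have to be declared before more general ones (like word). A minimal standalone illustration of the ordering effect:

import com.github.h0tk3y.betterParse.lexer.DefaultTokenizer
import com.github.h0tk3y.betterParse.lexer.regexToken

fun main() {
    // Both patterns match "SQL"; declaration order decides which token is produced.
    val keyword = regexToken("[A-Z]+")   // more specific
    val word = regexToken("[A-Za-z]+")   // more general

    val specificFirst = DefaultTokenizer(listOf(keyword, word))
    val generalFirst = DefaultTokenizer(listOf(word, keyword))

    println(specificFirst.tokenize("SQL")[0]!!.type == keyword) // true: keyword wins
    println(generalFirst.tokenize("SQL")[0]!!.type == word)     // true: word wins
}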

add in maven 0.4.0

Add version 0.4.0 to Maven; it's not there. There are already problems with JCenter.

Provide a Token and Parser at the same time

It would be great if you could provide both a Token and Parser at the same time. For example, if you want to re-use elements between multiple Grammars - or you just want to simplify the file. Something like this:

// Standard reusable token for parsing an int
val ints = Token("\\d+") use { text.toInt() }

Grammar<Any>() {
  val ws by token("\\s+|,", ignore = true)
  override val rootParser by separatedTerms(ints, ws)
}

Right now it doesn't seem possible to do this since the only way to add any type of Parser is using the by keyword. I'm not sure of the best way to do this, given how the constructor works.

Perhaps a compound type, like the following:

data class Compound<T>(val token: Token, val parser: Parser<T>)

Grammar<Any>() {
  val num by compound("\\d+", { text.toInt() })
}

When delegated to in a Grammar constructor, both the token and the parser would be registered. This would allow you to create an external type, which can be reused via val xyz by <other>.

Help with parser

Hi, cool and useful library, thanks!

I need help writing a parser for the following grammar (it's a subset of the CMake grammar):

FIRST(abc efg)

S_E_C_O_N_D()

third(
    abc
    efg
)

FOURTH(
    abc.efg
)

Here FIRST, S_E_C_O_N_D, third, and FOURTH are macros, and abc, efg, and abc.efg are macro arguments (zero or more). I wrote a parser, but I don't understand how to separate the tokenization of macros and arguments.

class CMake : Grammar<Any?>() {
    val LINE_COMMENT by token("#.*", ignore = true)
    val NL by token("[\r\n]+", ignore = true)
    val WS by token("\\s+", ignore = true)
    val O_PAREN by token("\\(")
    val C_PAREN by token("\\)")
    val IDENT by token("[A-Za-z_][A-Za-z0-9_]*")
    val ARG by token("[^\\s()#]+")
    val macros by IDENT * -O_PAREN * optional(parser(this::arguments)) * -C_PAREN
    val arguments by separated(parser(this::ARG), WS, true)
    override val rootParser by oneOrMore(macros)
}

This parser throws the exception Could not parse input: MismatchedToken(expected=C_PAREN (\)), found=IDENT for "abc" at 6 (1:7)), even though abc is an ARG, not an IDENT.

Thanks for any help!

Rules dependent on each other

Is it possible to define the following grammar?

val singleton by -lbrace and cmd and -rbrace // singleton -> cmd
val expr by number or funccall or ... or singleton // expr -> singleton
val cmd by expr and -semicol // cmd -> expr
// the dependency is a cycle: cmd -> expr -> singleton -> cmd

so that this grammar accepts

{ 1; }

as an expression. I can't write the grammar in the naive way, because Kotlin rejects this code, complaining "Variable 'cmd' must be initialized". I tried lazy, but the properties seem to be null, not properly initialized.

edit: I added lazy to every parser, but it ended up in infinite recursion.
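
The way cyclic references are usually broken in better-parse (as used elsewhere in this tracker) is the lazy parser(this::...) reference, which defers resolution until parse time. A simplified, runnable sketch for the { 1; } example; the token set and the String results are made up for illustration:

import com.github.h0tk3y.betterParse.combinators.*
import com.github.h0tk3y.betterParse.grammar.*
import com.github.h0tk3y.betterParse.lexer.*
import com.github.h0tk3y.betterParse.parser.Parser

object CyclicGrammar : Grammar<String>() {
    private val number by regexToken("\\d+")
    private val lbrace by literalToken("{")
    private val rbrace by literalToken("}")
    private val semicol by literalToken(";")
    private val ws by regexToken("\\s+", ignore = true)

    // singleton -> cmd, referenced lazily so the cycle does not matter at construction time
    private val singleton: Parser<String> by (-lbrace and parser(this::cmd) and -rbrace)
        .map { "{ $it }" }
    private val expr: Parser<String> by (number use { text }) or singleton // expr -> singleton
    private val cmd: Parser<String> by (expr and -semicol).map { "$it;" }  // cmd -> expr

    override val rootParser by expr
}

fun main() = println(CyclicGrammar.parseToEnd("{ 1; }")) // prints: { 1; }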

Looks like bug in description

interface Item
class Number(val value: Int) : Item
class Variable(val name: String) : Item

object ItemsParser : Grammar<Item>() {
    val num by token("\\d+")
    val word by token("[A-Za-z]")
    val comma by token(",\\s+")

    val numParser = num use { Number(text.toInt()) }
    val varParser = word use { Variable(text) }

    override val <!PROPERTY_TYPE_MISMATCH_ON_OVERRIDE!>rootParser<!> = separatedTerms(numParser or varParser, comma)
}

val result: List<Item> = ItemsParser.<!TYPE_INFERENCE_EXPECTED_TYPE_MISMATCH!>parseToEnd("one, 2, three, 4, five")<!>

Some errors reported on code copied from README

Does not parse across multiple Lines

I have to apologize in advance: I have no idea about lexing and parsing.

When I try a simple Parser and feed it content with multiple lines, the tokenizer fails:

object : Grammar<String>() {
        val singleToken by token(""".+""")
        override val rootParser: Parser<String> by zeroOrMore(singleToken) map { it.joinToString("#") }
}.parseToEnd("fuu \nbar")

com.github.h0tk3y.betterParse.parser.ParseException: Could not parse input: UnparsedRemainder(startsWith=no token matched for "bar" at 4 (1:5))
at com.github.h0tk3y.betterParse.parser.ParserKt.toParsedOrThrow(Parser.kt:66)
at com.github.h0tk3y.betterParse.parser.ParserKt.parseToEnd(Parser.kt:26)

Somehow, after the new line the \G in the wrapping allInOnePattern of the DefaultTokenizer (Tokenizer.kt#L42) does not match anymore.
What am I doing wrong here?
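
Not an authoritative answer, but the usual fix I've seen is to make sure every character of the input is covered by some token: . does not match line breaks, so adding an ignored line-break (or whitespace) token lets the tokenizer continue past the \n. A sketch using the current regexToken API:

import com.github.h0tk3y.betterParse.combinators.*
import com.github.h0tk3y.betterParse.grammar.*
import com.github.h0tk3y.betterParse.lexer.*
import com.github.h0tk3y.betterParse.parser.Parser

object JoinLines : Grammar<String>() {
    val singleToken by regexToken(".+")                      // does not cross line breaks
    val lineBreak by regexToken("[\\r\\n]+", ignore = true)  // covers the characters '.' skips
    override val rootParser: Parser<String> by zeroOrMore(singleToken) map { it.joinToString("#") }
}

fun main() = println(JoinLines.parseToEnd("fuu \nbar")) // fuu #bar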

Question: Comment Parsing

Is there a suggested approach to doing something like matching all line comments ("// ... \n") or block comments ("/* ... */")? For example, a default parser which applies to the whole document?

I want to be able to process any and all comments and, for example, create a CommentNode with position and content for each occurrence, without having to add it to each and every line definition (if that makes sense).
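
Not an authoritative answer, but the usual first step is to declare the comment patterns as ignored tokens, so no rule ever has to mention them; a sketch:

import com.github.h0tk3y.betterParse.combinators.*
import com.github.h0tk3y.betterParse.grammar.*
import com.github.h0tk3y.betterParse.lexer.*

object CommentAwareGrammar : Grammar<List<String>>() {
    // Ignored tokens are skipped by every parser, so the rules stay comment-free.
    private val lineComment by regexToken("//[^\\r\\n]*", ignore = true)
    private val blockComment by regexToken("/\\*[\\s\\S]*?\\*/", ignore = true)
    private val ws by regexToken("\\s+", ignore = true)
    private val word by regexToken("\\w+")

    override val rootParser by zeroOrMore(word use { text })
}

fun main() {
    println(CommentAwareGrammar.parseToEnd("alpha // trailing comment\n/* block */ beta")) // [alpha, beta]
}

Capturing the comments as CommentNodes with positions would still need a separate pass (for example over the tokenizer output), since ignored tokens are not seen by the parsers.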

repo fail

I'm not even a little bit experienced in fixing MPP or Gradle. This is an MPP project in the latest IntelliJ, with no modifications yet.

(Screenshot attached to the original issue; not reproduced here.)

Parse without allowing ignored tokens

I want to allow whitespace most of the time, so I have token("\\s+", true). However, in a few scenarios, I want to disallow whitespace between tokens/subexpressions. A relatively easy way to do this would be nice.

The best fit for my use would be a version of * / and that doesn't ignore anything.

[Feature] Lexical modes

It would be nice if the library supported lexical modes (at least that's what ANTLR4 calls them: https://github.com/antlr/antlr4/blob/master/doc/lexer-rules.md#lexical-modes), where the lexer keeps a stack of modes and produces different tokens based on the current mode.

For example, I have a small mini language that basically specifies boolean expressions. Now I would like to use these expressions inside a larger document language, where they can occur only in certain positions in the header of the document (after a "@require" token until the end of the line). It would easily be possible to reference the root parser of the mini-language inside the document-grammar, but since the tokens are not compatible, this fails.
I can workaround this by specifying a token that matches the whole line, and parse this separately in a second step. This, of course, is very hacky, and does not work in all situations.

It would be much nicer if I could switch the lexer to the mini-language mode when encountering a @require token, and then popping back to the default mode when encountering the EOL token.
Ideally, I could "import" the mode from another Grammar. Perhaps it might even make sense to make this the preferred way to specify different modes.

Let me know if I should create a minimal example.

Advanced rules

I wonder if it is possible to implement advanced rules, something like if (condition), then (this rule can be applied).

I'm new to both Kotlin and parser combinators, so maybe my understanding of the problem is completely wrong.

As a concrete example, what I want to do is very similar to the arithmetic evaluator example:
https://github.com/h0tk3y/better-parse/blob/master/demo/src/main/kotlin/com/example/ArithmeticsEvaluator.kt
except that I have many different levels of precedence on operators (not just 3: sum, sub < mul, div < pow).
(Small note here: exponentiation is right-associative, not left-associative (a^b^c = a^(b^c)).)
Suppose that, instead of collecting the same-level operations into the same rule (like the example does), a single general rule is the only way possible, so as to be more general and open to the addition of new operators.

val digits by token("\\d+")
val plus by token("+")
val minus
// val ...
val operator = plus or minus or ...
// this won't work because it will match addition before others
val operation: Parser<Tree> = digits and operator and digits
// it must match an operation that has the maximum value in the chain
val operation: Parser<Tree> = not(operator with higher precedence) and digits and operator and digits and not(operator with higher precedence)
// or you must associate the digits with the operator having higher precedence
val assignLeft: Parser<Tree> = (operator with higher precedence) and digits and (operator with lower precedence)
val rightAssign: Parser<Tree> = (operator with lower precedence) and digits and (operator with higher precedence)

Is this possible, or does this approach make sense?
PS: this library is awesome, and the infix notation makes the rules very readable compared with other libraries and languages. Good job, thank you!
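
Not a definitive answer, but for the precedence-ladder part specifically, the leftAssociative and rightAssociative combinators (which, as far as I remember, the ArithmeticsEvaluator demo also uses) let each level be one line, so adding an operator level is cheap. A sketch, assuming those combinators and a Double-valued grammar:

import com.github.h0tk3y.betterParse.combinators.*
import com.github.h0tk3y.betterParse.grammar.*
import com.github.h0tk3y.betterParse.lexer.*
import com.github.h0tk3y.betterParse.parser.Parser
import kotlin.math.pow

object Calc : Grammar<Double>() {
    private val num by regexToken("\\d+(\\.\\d+)?")
    private val powOp by literalToken("^")
    private val mul by literalToken("*")
    private val div by literalToken("/")
    private val plus by literalToken("+")
    private val minus by literalToken("-")
    private val ws by regexToken("\\s+", ignore = true)

    private val term: Parser<Double> by num use { text.toDouble() }

    // Highest precedence first; '^' is right-associative, the others left-associative.
    private val powLevel: Parser<Double> by rightAssociative(term, powOp) { l, _, r -> l.pow(r) }
    private val mulLevel: Parser<Double> by leftAssociative(powLevel, mul or div use { type }) { l, op, r ->
        if (op == mul) l * r else l / r
    }
    private val addLevel: Parser<Double> by leftAssociative(mulLevel, plus or minus use { type }) { l, op, r ->
        if (op == plus) l + r else l - r
    }

    override val rootParser by addLevel
}

fun main() {
    println(Calc.parseToEnd("2 ^ 3 ^ 2"))     // 512.0 (right-associative)
    println(Calc.parseToEnd("1 + 2 * 3 - 4")) // 3.0
}

The negative-lookahead style rules (not(operator with higher precedence)) shouldn't be needed in this formulation, because each level only ever sees the level directly above it.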

Indentation and Dedentation

Hi, I want to make a grammar with Python-style indentation. In Python:

def stuff():
    if True:
        if True:
            for i in range(2):
                print(i)
    print("YAY")

How would I make INDENT and DEDENT tokens using better-parse? I couldn't find any better-parse examples or tests showcasing how to do this.

Note that this library is not thread safe

In grammars on the JVM, RegexTokens (at least) use stateful matchers, and therefore aren't thread-safe. It would be nice if the library either noted this, or was thread-safe (by creating a new matcher as needed- at a glance this doesn't seem to be an expensive JVM operation, I think that's the creation of the Pattern).

Support kotlin 1.4.0

Could you provide a version that compiles against the Kotlin 1.4.0 release? The 1.4-M2 version is rejected by the compiler as "compiled by a pre-release version".
Thanks!

Broken multiplatform deploy on 0.4.0-alpha-3

Currently, the multiplatform artifact is not resolved from common. Probably something went wrong with the deploy, or the older Gradle metadata is not resolved by Gradle 6+. The workaround is to declare platform-specific dependencies, including the metadata artifact:

    commonMain {
        dependencies {
            implementation("com.github.h0tk3y.betterParse:better-parse-multiplatform:0.4.0-alpha-3")
            implementation("com.github.h0tk3y.betterParse:better-parse-multiplatform-metadata:0.4.0-alpha-3")
        }
    }
    jvmMain{
        dependencies{
            implementation("com.github.h0tk3y.betterParse:better-parse-jvm:0.4.0-alpha-3")
        }
    }
    jsMain{
        dependencies{
            implementation("com.github.h0tk3y.betterParse:better-parse-js:0.4.0-alpha-3")
        }
    }

Misreported range in SyntaxTree for optional parser

When an optional parser parses nothing, its range becomes 0..0, because of the fallback here. This error will bubble up, e.g. when the optional is inside an "and" parser.

Example:

val integerExpression by (optional(MINUS) * INT)

If there is no - to parse, the start of the range of the SyntaxTree corresponding to a match of integerExpression will always be 0, regardless of where the match happens. So if the input were Hello World! 5 (and imagine integerExpression is part of a larger parser that can also parse words), the actual range would be 0..14, while the expected range would be 13..14.

Do you have any pointers on how this could best be resolved?

Parser works in JS development mode but not in JS production mode

I've made a toy language: https://github.com/avwie/borklang

Everything works correctly on the JVM and when running jvmTest and jsTest. Also my web project is working correctly when I use browserDevelopmentRun.

However, when I use browserDistributionRun or make a web-distributable, the tokenizer/parsers don't work correctly.

For instance:

fn fib = (n) ->  {
  if (n < 2) n else fib(n - 1) + fib(n - 2)
}

fib(10)

This returns 55 in development mode, but in production mode I get:
Invalid number format: 'fi'

I found no way to debug this, because I lose a lot of information with productionRun.

[BUG] literal token with DefaultTokenizer can't find a match - 'noneMatched'

I am trying to figure out how this library works, but I simply can't get past the simple literal token example.
Here is my test code:

    @Test
    fun `should match foo`() {
        val literal = literalToken("foo")
        val tokenizer = DefaultTokenizer(listOf(literal))
        val sequence: TokenMatchesSequence = tokenizer.tokenize("my foo")

        assertThat(sequence.toList()).isNotEmpty() // success
        assertThat(sequence[0]!!.input).isEqualTo("my foo") // success
        assertThat(sequence[0]!!.type).isNotEqualTo(noneMatched) // failure
        assertThat(sequence[0]!!.text).isEqualTo("foo") // never reached
    }

This fails with assertThat(sequence[0]!!.type).isNotEqualTo(noneMatched) since there is no match for the simple input of my foo and the literal token looking for foo.

Any idea what I am doing wrong?

Infos:

implementation("com.github.h1tk3y.betterParse:better-parse:0.4.4")

Greetings!
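
For anyone hitting the same wall: as far as I can tell, the tokenizer needs some token to match at every position of the input, so the "my " prefix has to be covered too, for example by a catch-all word token plus an ignored whitespace token. A sketch:

import com.github.h0tk3y.betterParse.lexer.DefaultTokenizer
import com.github.h0tk3y.betterParse.lexer.literalToken
import com.github.h0tk3y.betterParse.lexer.regexToken

fun main() {
    val foo = literalToken("foo")                 // declared first so "foo" is not lexed as a word
    val word = regexToken("\\w+")                 // covers "my"
    val space = regexToken("\\s+", ignore = true) // covers the blank

    val tokenizer = DefaultTokenizer(listOf(foo, word, space))
    // Every character of "my foo" is now covered, so no noneMatched entry appears.
    println(tokenizer.tokenize("my foo").toList())
}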

Negative lookahead?

Hi there, I have a grammar with some negative lookahead rules. From what I can tell, some other PEG parsers have a ! operator that can help with this, but it appears this library doesn't. Is this something that would be easy to implement "in userspace"? I'm not a parsing expert. Thanks!

ClassCastException when combining a Tuple2 parser

Hello,

The following code:

val typePrefix by (type * -znws * parser(this::referenceParser) * -znws * -assign)

// ...

val traitFunction    by parser(this::functionSignature) use { CyanTraitDeclaration.Element.Function(CyanFunctionDeclaration(this, null), span) }
val traitProperty    by structProperty                  use { CyanTraitDeclaration.Element.Property(ident, type, span) }
val traitElement     by (traitFunction or traitProperty)
val traitDeclaration by (typePrefix * -znws * -trait * -znws * -lcur * -znws * (separatedTerms(traitElement, newLine)
    .use { toTypedArray() }) * -znws * -rcur) use { CyanTraitDeclaration(t2, t3, span(t1, t3.last())) }

Yields the following exception:

com.github.h0tk3y.betterParse.utils.Tuple2 cannot be cast to com.github.h0tk3y.betterParse.lexer.TokenMatch
java.lang.ClassCastException: com.github.h0tk3y.betterParse.utils.Tuple2 cannot be cast to com.github.h0tk3y.betterParse.lexer.TokenMatch
	at cyan.compiler.parser.CyanModuleParser$$special$$inlined$and2Operator$2.invoke(andFunctions.kt:9)
	at cyan.compiler.parser.CyanModuleParser$$special$$inlined$and2Operator$2.invoke(andFunctions.kt)
	at com.github.h0tk3y.betterParse.combinators.AndCombinator.tryParse(AndCombinator.kt:60)
	at com.github.h0tk3y.betterParse.combinators.MapCombinator.tryParse(MapCombinator.kt:14)
	at com.github.h0tk3y.betterParse.combinators.OrCombinator.tryParse(OrCombinator.kt:13)
	at com.github.h0tk3y.betterParse.combinators.AndCombinator.tryParse(AndCombinator.kt:40)
	at com.github.h0tk3y.betterParse.combinators.AndCombinator.tryParse(AndCombinator.kt:40)
	at com.github.h0tk3y.betterParse.combinators.SeparatedCombinator.tryParse(Separated.kt:16)
	at com.github.h0tk3y.betterParse.combinators.MapCombinator.tryParse(MapCombinator.kt:14)
	at com.github.h0tk3y.betterParse.combinators.MapCombinator.tryParse(MapCombinator.kt:14)
	at com.github.h0tk3y.betterParse.combinators.AndCombinator.tryParse(AndCombinator.kt:40)
	at com.github.h0tk3y.betterParse.combinators.MapCombinator.tryParse(MapCombinator.kt:14)
	at com.github.h0tk3y.betterParse.parser.ParserKt.tryParseToEnd(Parser.kt:18)
	at com.github.h0tk3y.betterParse.parser.ParserKt.parseToEnd(Parser.kt:29)
	at com.github.h0tk3y.betterParse.grammar.GrammarKt.parseToEnd(Grammar.kt:65)
	at cyan.compiler.parser.StatementsTest.doTest(StatementsTest.kt:29)
	at cyan.compiler.parser.StatementsTest.access$doTest(StatementsTest.kt:24)
	at cyan.compiler.parser.StatementsTest$TypeDeclarations.trait with one function(StatementsTest.kt:34)
        <...>

From a quick look into the code, it seems that this function in andFunctions.kt is called for the first * operator in typePrefix * -znws * -trait:

@JvmName("and2") inline infix fun <reified T1, reified T2, reified T3>
    AndCombinator<Tuple2<T1, T2>>.and(p3: Parser<T3>) =
    AndCombinator(consumers + p3, {
        Tuple3(it[0] as T1, it[1] as T2, it[2] as T3)
    })

But looking at the debugger, I see values packed into a tuple and not "flat mapped" (I don't know if that makes sense, couldn't think of a better way to put it):

(Debugger screenshot attached to the original issue; not reproduced here.)

Here, t1 of element 0 (which is the result of typePrefix) is the match for token type, and t2 is from parser(this::referenceParser). Yet the code thinks t1 is element 0 and t2 is element 1, while they are in fact both inside element 0. This obviously causes a ClassCastException when it tries to cast element 0 into what I assume the reified type parameter T1 is (TokenMatch).

I couldn't figure out which part of the library laid out those elements like that, which is where I'm stuck.

I hope this all made sense, and thanks in advance.

NoClassDefFound when used as a peer dependency in Android project

I have a library that depends on better-parse.
When I use this library in an Android project, it doesn't build.
The library has a lot of other dependencies and they don't cause any problems. Also, better-parse is not exposed to the project: all calls are wrapped in my library.

Exception in thread "main" java.lang.NoClassDefFoundError: com/github/h0tk3y/betterParse/grammar/Grammar

The library: https://github.com/seniorjoinu/candid-kt
The project: https://github.com/seniorjoinu/candid-kt-example

[Feature] Bind Combinator

Is there a reason there's no BindCombinator? I tried to add one but Parser.remainder is internal. Also, is there a reason Parser.remainder is internal? I'd prefer the ability to add combinators

reuse parser result in other delegate

Say I have a synthetic situation: a number n says how many symbols x will appear after it. After that, I want to drop all other xs.

For example:

val digit by regexToken("\\d")
val n by literalToken("n")
val found by digit map { it.text.toInt() }
val result by found and (found times n) and -zeroOrMore(n)
//                       ^ can't do

How to do recursion?

I'm failing to do recursion:

    val group : Parser<OdlStruct> by (groupStart and zeroOrMore(group) and groupEnd).map {  // line 58
            (name1 : String, nested : List<OdlStruct>, name2 : String ) ->
        val result = OdlStruct(name1)
        result.nested.addAll(nested)
        result
    }

on input like:

GROUP=SwathStructure
    GROUP=SWATH_1
          GROUP=Dimension
          END_GROUP=Dimension
         ...
    END_GROUP=SWATH_1
END_GROUP=SwathStructure

I get error:

Caused by: java.lang.NullPointerException: Parameter specified as non-null is null: method com.github.h0tk3y.betterParse.grammar.Grammar.getValue, parameter <this>
	at com.github.h0tk3y.betterParse.grammar.Grammar.getValue(Grammar.kt)
	at com.sunya.netchdf.parser.OdlParser.getGroup(OdlParser.kt:58)
	at com.sunya.netchdf.parser.OdlParser.<clinit>(OdlParser.kt:58)

presumably from "zeroOrMore(group)"

I guess I need to instantiate a new Parser instead of trying to reference group(?), but I'm stymied as to how to do that.
Maybe it's something simple I'm missing?

Thanks for your help and the cool library.
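
For reference, the usual way to refer to a parser from inside its own definition is the lazy parser(this::...) reference, the same trick used in other issues here. A simplified, runnable sketch of the GROUP/END_GROUP case; the token patterns are made up and only handle this shape of input:

import com.github.h0tk3y.betterParse.combinators.*
import com.github.h0tk3y.betterParse.grammar.*
import com.github.h0tk3y.betterParse.lexer.*
import com.github.h0tk3y.betterParse.parser.Parser

data class OdlStruct(val name: String, val nested: MutableList<OdlStruct> = mutableListOf())

object OdlSketch : Grammar<OdlStruct>() {
    private val groupStart by regexToken("GROUP=\\w+")
    private val groupEnd by regexToken("END_GROUP=\\w+")
    private val ws by regexToken("\\s+", ignore = true)

    // parser(this::group) defers the lookup until parse time, avoiding the NPE during <clinit>.
    private val group: Parser<OdlStruct> by (groupStart and zeroOrMore(parser(this::group)) and groupEnd)
        .map { (start, nested, _) ->
            OdlStruct(start.text.removePrefix("GROUP=")).also { it.nested.addAll(nested) }
        }

    override val rootParser by group
}

fun main() {
    val input = """
        GROUP=SwathStructure
            GROUP=SWATH_1
                GROUP=Dimension
                END_GROUP=Dimension
            END_GROUP=SWATH_1
        END_GROUP=SwathStructure
    """.trimIndent()
    println(OdlSketch.parseToEnd(input))
}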

Indentation as in YAML

Hello,
This probably isn't the place, but I'm struggling with an indentation-based grammar as in YAML.

If someone could provide a simple solution for how you would do this as in YAML, I could get my head around it with an example.

Token patterns with flags are broken

Since 0e6eb41, Tokens defined by a Pattern or RegEx object are converted to a Token defined by a pattern String.

This breaks use-cases where the pattern object was created with flags set
(e.g. token("^--".toRegex(RegexOption.MULTILINE)))

Currently, this can only be worked around by specifying the flags inside the pattern string
(e.g. token("(?m)^--"))
