gregros / parjs

JavaScript parser-combinator library

License: MIT License

TypeScript 99.58% JavaScript 0.42%

parser-combinators javascript typescript parser text parse parsing functional-programming

parjs's Issues

Compile bundles

Compile JavaScript bundles and type definitions for users who don't use npm.

readme example fails on 0.6.0 on node

Just as an initial exploration of the lib, I tried to fire up the example at the start of the readme, and it threw, in relevant part:

TypeError: Cannot read property 'q' of undefined
    at ParjsParser.between ([PATH-TO-MY-REPO]/node_modules/parjs/dist/internal/instance.js:31:28)

on node 8.4.0, parjs 0.6.0 installed via npm, and literally just running the code from the first example in the readme, i.e.:

var Parjs = require('parjs').Parjs;
let tupleElement = Parjs.float();
let paddedElement = tupleElement.between(Parjs.spaces);
let separated = paddedElement.manySepBy(Parjs.string(","));
let surrounded = separated.between(Parjs.string("("), Parjs.string(")"));
console.log(surrounded.parse("(1,  2 , 3 )"));

In particular, it seems to be throwing on line 31 of internal/instance.js at
bet = preceding.q.then(this).then(preceding.q);

Improve regexp basic parser

The regexp basic parser (PrsRegexp) should be improved to work with sticky regexps. Right now it's a bit of a hack.

More testing is also in order.
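For context, here is a rough sketch of what sticky matching offers a position-based parser. It shows plain JavaScript RegExp behaviour only, not the PrsRegexp implementation, and `matchAt` is a hypothetical helper introduced for illustration.

```typescript
// Hypothetical helper: try to match `pattern` exactly at `position` in `input`.
// The sticky flag (y) anchors the match at lastIndex instead of searching ahead.
function matchAt(pattern: RegExp, input: string, position: number): string | null {
    const flags = pattern.flags.includes("y") ? pattern.flags : pattern.flags + "y";
    const sticky = new RegExp(pattern.source, flags);
    sticky.lastIndex = position;
    const match = sticky.exec(input);
    return match === null ? null : match[0];
}

const digits = /\d+/;
console.log(matchAt(digits, "abc123", 3)); // "123"
console.log(matchAt(digits, "abc123", 0)); // null, because no digits start at index 0
```

With a sticky regexp there is no need to slice the input or prepend `^` to the pattern to anchor the match at the current position.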

.pipe() function causes vitest to hang infinitely

First of all, LOVE the library! This is the only combinator library in TS that scratches the nom itch.

Normally I wouldn't put this straight into "issues"; I'd post it in the discussion section, since this is definitely a vitest issue and not a parjs issue. But I thought y'all might have more insight into this than me.

When running the example from the readme, it seems like any line that includes .pipe() causes a vitest test to run forever.

Here's a repo showing off the weirdness.

I have very few guesses as to what the problem could be, but I'm pretty sure it's nothing to do with TS compilation because the same problem happens when I change the test file to be JS.

Write a new math example

The existing math example is broken (see #27; thanks @mrcosta!) so I removed it, but I think it's an important example to have.

Circular dependencies

(!) Circular dependencies
node_modules/parjs/internal/parser.js -> node_modules/parjs/internal/combinators/combinator.js -> node_modules/parjs/internal/scalar-converter.js -> node_modules/parjs/internal/parsers/string.js -> node_modules/parjs/internal/parser.js
node_modules/parjs/internal/parser.js -> node_modules/parjs/internal/combinators/combinator.js -> node_modules/parjs/internal/scalar-converter.js -> node_modules/parjs/internal/parsers/string.js -> /Users/feichao/Develop/log-viewer/node_modules/parjs/internal/parser.js?commonjs-proxy -> node_modules/parjs/internal/parser.js
node_modules/parjs/internal/parser.js -> node_modules/parjs/internal/combinators/combinator.js -> node_modules/parjs/internal/scalar-converter.js -> node_modules/parjs/internal/parsers/regexp.js -> node_modules/parjs/internal/parser.js

putState?

Is there a built-in parser, analogous to putState (formerly, setState) in Parsec that sets the user state and returns nothing? Is there a mechanism to replace user state without performing an explicit mutation/side effect (e.g., userState.tags.push(...) like in the examples in the readme)?

docs: create documentation explaining how to create new parsers

As a user of parjs, I want to create my own parsers to solve my real world parsing problems.

Add documentation explaining

how to write your own parser
how to test your parser

I think the documentation should prefer typescript but it could have a javascript example, as well.

This came up in the discussion in #64 (comment)

Where should this library go next?

Which direction should this library go next?

I can do benchmarking against other libraries and/or regular expressions and make performance improvements.
I can try to improve debugging.
I can write implementations of real parsers.

docs: add a friendly guide for users

This came up in the discussion in #64 (comment)

The thing that got me into parser combinators was FParsec, and specifically its comprehensive user’s guide, which explains how you actually use them to write parsers.

Until this exists, the library can only be used by people who already know what a parser combinator is and how to use them. This is a shame, because parser combinators are a good solution for lots of real world parsing tasks.

The `or` combinators aren't grouping the error messages.

type Op =
  | [op: "center"]
  | [op: "penup"]
  | [op: "pendown"]
  | [op: "forward", length: number]
  | [op: "backward", length: number]
  | [op: "turnleft", degree: number]
  | [op: "turnright", degree: number]
  | [op: "direction", degree: number]
  | [op: "gox", x: number]
  | [op: "goy", y: number]
  | [op: "penwidth", width: number]
  | [op: "go", data: { x: number, y: number }]
  | [op: "pencolor", data: { x: number, y: number, z: number }]


const pFloat = () => float().pipe(thenq(whitespace()))
const pComma = () => string(",").pipe(thenq(whitespace()))

const pTemplate = <T extends string>(key: T) => whitespace().pipe(qthen(key)).pipe((thenq(whitespace()))).pipe(mapConst(key))
const pZero = <T extends string>(key: T) => pTemplate(key).pipe(map((x): [T] => [x]))
const pCenter = () => pZero("center")
const pPenUp = () => pZero("penup")
const pPenDown = () => pZero("pendown")

const pOne = <T extends string>(key: T) => pTemplate(key).pipe(then(pFloat()))
const pForward = () => pOne("forward")
const pBackward = () => pOne("backward")
const pLeft = () => pOne("turnleft")
const pRight = () => pOne("turnright")
const pDirection = () => pOne("direction")
const pGoX = () => pOne("gox")
const pGoY = () => pOne("goy")
const pPenWidth = () => pOne("penwidth")

const pFloatComma = () => pFloat().pipe(thenq(pComma()))
const pXY = () => pFloatComma().pipe(then(pFloat())).pipe(map(([x, y]) => ({ x, y })))
const pXYZ = () => pFloatComma().pipe(then(pFloatComma(), pFloat())).pipe(map(([x, y, z]) => ({ x, y, z })))
const pGo = () => pTemplate("go").pipe(then(pXY()))
const pPenColor = () => pTemplate("pencolor").pipe(then(pXYZ()))

const pStatement = (): Parjser<Op> =>
  pCenter().pipe(or(pPenUp(), pPenDown())) // 0
    .pipe(or(pForward(), pBackward(), pLeft(), pRight())) // 1
    .pipe(or(pDirection(), pGoX(), pGoY(), pPenWidth())) // 1
    .pipe(or(pGo())) // XY
    .pipe(or(pPenColor())) // XYZ

I have a parser for a small language, shown above. The problem is that once every or alternative fails, the only error message reported is expected 'pencolor'. Am I using the parser incorrectly, so that the other error messages get lost?

FParsec, for example, gathers the error messages in its choice combinator to make them more informative.

Hard failure when soft expected

Problem

I have an expr like:

const m = p1.pipe(maybe(), then(p2));
m.parse(onlySatisfiesP2); // Hard failure from p1, when Success expected

Reproduction

minimal demo: https://github.com/cdaringe/email-parser/blob/main/issue.ts#L56
clone and npm install (or pnpm install, which I'm using)
run the module: node -r ts-node/register issue.ts or bun issue.ts

Discussion

It seems that in some circumstances Soft is promoted to Hard, but it's not clear when. I expected a soft failure above, which maybe() would catch, followed by a successful parse using then(expr). I've tried debugging, and it's a bit wonky to debug, but I see the promotion happening in the Then instance if the Soft failure isn't the first failure. I'm trying to write an email parser with many optional leading expressions in the grammar. Any tips would be welcome. Much appreciated!

uniUpper function is not defined.

Is it not yet implemented?
Also uniDigit.

.maybe() should allow falsy alternative values.

Currently using a falsy alternative value (e.g. 0, null, or empty string) in a .maybe() call throws an exception ("Uncaught Error: the inner parser must be quiet if an alternative value is not supplied.") I would argue that .maybe(0), .maybe(null) or .maybe("") should be allowable alternate values.
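The symptom is consistent with a truthiness check on the alternative value, although that is only a guess about the cause. A minimal sketch of the distinction follows; both helper functions are hypothetical and are not taken from parjs.

```typescript
// Hypothetical check: treats every falsy value as "no alternative supplied".
function suppliedByTruthiness(alt?: unknown): boolean {
    return Boolean(alt);
}

// Hypothetical check: looks at whether an argument was passed at all,
// so 0, null and "" all count as supplied alternatives.
function suppliedByArgumentCount(...args: unknown[]): boolean {
    return args.length > 0;
}

console.log(suppliedByTruthiness(0));       // false
console.log(suppliedByArgumentCount(0));    // true
console.log(suppliedByArgumentCount(null)); // true
console.log(suppliedByArgumentCount(""));   // true
console.log(suppliedByArgumentCount());     // false
```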

A good way to wrap a parser within a ParjsAction

Suppose I want to parse comments in a language and return the comment text, without the initial %, as the value.

I thought I might wrap a complex parser in a ParjsAction, and only mess around with the return value. But I find it difficult to write such a class:

function isOk<T>(res: Reply<T>): res is SuccessReply<T> {
    return res.kind === "OK";
}

class CommentParseAction extends ParjsAction {
    isLoud = true;
    expecting = `a comment line`;

    protected _apply(ps: ParsingState) {
        const res = Parjs.regexp(/^%[^\n]*/);
    
        if (isOk(res)) {
            ps.position = ps.input.length;
            ps.value = res.value;
        } else {
            ps.position = res.trace.position;
            ps.value = "";
        }
        ps.kind = res.kind;
    }
}

Of course, in this simple case I could increment the position myself and do not need to wrap parsers. But for complex patterns that would be more difficult. Is there a canonical way to solve this?

refactor: replace overloaded functions with mapped types

Currently parjs has some functions that are overloaded, such as this:

export function pipe<T, T1, T2, T3, T4, T5, T6>(
    source: ImplicitParjser<T>,
    cmb1: ParjsCombinator<T, T1>,
    cmb2: ParjsCombinator<T1, T2>,
    cmb3: ParjsCombinator<T2, T3>,
    cmb4: ParjsCombinator<T3, T4>,
    cmb5: ParjsCombinator<T4, T5>,
    cmb6: ParjsCombinator<T5, T6>
): Parjser<T6>;

The goals:

see if it's possible to get rid of overloading and replace it with something else, such as a mapped type or a tuple type

type MapParjsers<Vs extends any[]> = {
	[K in keyof Vs]: ImplicitParjser<Vs[K]>
}

export function then<Head, Tail extends any[]>(first: ImplicitParjser<Head>, ...following: MapParjsers<Tail>): ParjsCombinator<[Head, ...Tail]>

make sure to preserve the type inference! Calling these must be type safe, i.e. the return type must be perfectly inferred (right now it's good)
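To show the idea in isolation, here is a self-contained sketch of a variadic function typed with a mapped tuple type instead of overloads. `MiniParser`, `MapParsers` and `sequence` are stand-ins invented for this example; they are not parjs types, and whether the same approach preserves inference for the real `pipe` and `then` signatures would still need to be checked.

```typescript
// Stand-in for a parser type; only the output type matters for this sketch.
interface MiniParser<T> {
    run(input: string): T;
}

// Maps a tuple of output types to a tuple of parsers producing them.
type MapParsers<Vs extends unknown[]> = {
    [K in keyof Vs]: MiniParser<Vs[K]>;
};

// A single variadic signature instead of one overload per arity.
function sequence<Vs extends unknown[]>(...parsers: MapParsers<Vs>): MiniParser<Vs> {
    return {
        run(input: string): Vs {
            // Each parser sees the same input here; a real combinator would thread a position.
            return parsers.map((p: MiniParser<unknown>) => p.run(input)) as unknown as Vs;
        }
    };
}

const asLength: MiniParser<number> = { run: s => s.length };
const asUpper: MiniParser<string> = { run: s => s.toUpperCase() };

// The result type is inferred as MiniParser<[number, string]>.
const both = sequence(asLength, asUpper);
const [len, upper] = both.run("abc");
console.log(len, upper); // 3 ABC
```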

This came up in #64 (comment)

Example of custom parser from `README.md` doesn't work

I am trying to write a custom parser, following https://github.com/GregRos/parjs#creating-the-parser .

# test.ts
import { Parjser } from "parjs";
import { ParjserBase, ParsingState } from "parjs/internal";

export function caseString<T extends string>(str: T): Parjser<T> {
	return new (class ParseString extends ParjserBase<T> {
		expecting = `message`;
		type = "string";
		_apply(ps: ParsingState): void {
      // ...
		}
	})();
}

but I am getting the following compilation error:

$ ts-node test.ts
node_modules/ts-node/src/index.ts:859
    return new TSError(diagnosticText, diagnosticCodes, diagnostics);
           ^
TSError: ⨯ Unable to compile TypeScript:
test.ts:5:13 - error TS2351: This expression is not constructable.
  Type 'typeof ParseString' has no construct signatures.

  5  return new (class ParseString extends ParjserBase<T> {
                ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  6   expecting = `message`;
    ~~~~~~~~~~~~~~~~~~~~~~~~
... 
 10   }
    ~~~
 11  })();
    ~~~
test.ts:5:40 - error TS2315: Type 'ParjserBase' is not generic.

5  return new (class ParseString extends ParjserBase<T> {
                                         ~~~~~~~~~~~~~~

    at createTSError (node_modules/ts-node/src/index.ts:859:12)
    at reportTSError (node_modules/ts-node/src/index.ts:863:19)
    at getOutput (node_modules/ts-node/src/index.ts:1077:36)
    at Object.compile (node_modules/ts-node/src/index.ts:1433:41)
    at Module.m._compile (node_modules/ts-node/src/index.ts:1617:30)
    at Module._extensions..js (node:internal/modules/cjs/loader:1245:10)
    at Object.require.extensions.<computed> [as .ts] (node_modules/ts-node/src/index.ts:1621:12)
    at Module.load (node:internal/modules/cjs/loader:1069:32)
    at Function.Module._load (node:internal/modules/cjs/loader:904:12)
    at Function.executeUserEntryPoint [as runMain] (node:internal/modules/run_main:81:12) {
  diagnosticCodes: [ 2351, 2315 ]
}

FYI, I am using the following versions:

$ ./node_modules/ts-node/dist/bin.js -vvv
ts-node v10.9.1 node_modules/ts-node
node v16.19.1
compiler v5.3.3 node_modules/typescript/lib/typescript.js

Request: a manySepBy variant that captures the separators

Currently, the manySepBy combinator returns the separated items and discards the separators. I would like to propose a variant of this combinator that returns a projection that includes the separators, perhaps satisfying this kind of interface:

interface Alternation<Term, Separator> {
    firstTerm: Term;
    subsequentTerms: SubsequentTerm<Term, Separator>[]
}

interface SubsequentTerm<Term, Separator> {
   separator: Separator;
   term: Term; // on the RHS of the separator.
}
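As an illustration of how such a result could be assembled, the sketch below folds a first term and a list of (separator, term) pairs into the proposed shape. `toAlternation` is a hypothetical helper, not an existing parjs function, and the interfaces are repeated only to keep the example self-contained.

```typescript
interface SubsequentTerm<Term, Separator> {
    separator: Separator;
    term: Term; // on the RHS of the separator
}

interface Alternation<Term, Separator> {
    firstTerm: Term;
    subsequentTerms: SubsequentTerm<Term, Separator>[];
}

// Hypothetical helper: build an Alternation from the first term and the
// (separator, term) pairs that follow it.
function toAlternation<Term, Separator>(
    first: Term,
    rest: Array<[Separator, Term]>
): Alternation<Term, Separator> {
    return {
        firstTerm: first,
        subsequentTerms: rest.map(([separator, term]) => ({ separator, term }))
    };
}

// Represents the input "1+2-3".
const sum = toAlternation(1, [["+", 2], ["-", 3]]);
console.log(sum.subsequentTerms[1].separator); // "-"
```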

Not (not) capturing

I'd like to match any character that isn't a newline.

This works as expected:

> newline().pipe(not()).parse('\n')
ParjsFailure {
  trace: {
    userState: ParserUserState {},
    position: 0,
    reason: 'not expecting: expecting newline',
    input: '\n',
    location: [Getter],
    stackTrace: [ [Not] ],
    kind: 'Soft'
  }
}

In the following example, no input is consumed:

> newline().pipe(not()).parse('a')
ParjsFailure {
  trace: {
    userState: ParserUserState {},
    position: 0,
    reason: 'parsers did not consume all input',
    input: 'a',
    location: [Getter],
    stackTrace: [],
    kind: 'Soft'
  }
}

Is the intended way to write something like this using then?

>  newline().pipe(not()).pipe(then(anyChar())).parse('a')

docs: add performance comparison to other parsing techniques

As a user who has not decided on using parjs yet, I would like to see how it compares to other parsing techniques so that I can see whether it's a good fit for my use case.

Expose TypeScript typings in the npm-published package

Title says much of it. The project advertises using TypeScript, but installing the package doesn't expose a definitions file for projects with strict typing enabled.

Parser Combinator as list

Hi there.

The TypeScript typings of the then combinator allow passing up to four parsers to the combinator.
Is it possible to pass five, or to pass a list of parsers, to then?

Parjs.whitespaces doesn't parse tab characters

Hi,

Running

Parjs.whitespaces.parse("\t")

will fail, even though the docs state that it should parse tabs. I think the root cause is this:

parjs/src/lib/internal/implementation/functions/char-indicators.ts

Line 20 in c92628b

export const tab = 0x0008;

For some inexplicable reason, tab is defined as 0x08 instead of 0x09.
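The code points can be checked directly in JavaScript; this snippet only confirms the character codes and does not touch parjs.

```typescript
// Code points of the two control characters involved.
const backspaceCode = "\b".charCodeAt(0);
const tabCode = "\t".charCodeAt(0);

console.log(tabCode === 0x0009);       // true: horizontal tab is U+0009
console.log(backspaceCode === 0x0008); // true: U+0008 is backspace
```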

Unable to use parjs

I've tried running the basic example from the docs, but my TypeScript environment is unable to locate the parjs module after installation. I've attached the basic example; just unzip it and run npm i to install its dependencies. After that, any attempt to compile the project with a command-line tsc call gives

Here is the basic example zip file:
parjs-example.zip

It looks like a bug or am I wrong?...

Float parser: allow implicit exponentiation sign

Currently, something like 1e7 would fail the float parser where it could be interpreted as 1e+7. FWIW, JavaScript allows 1e7 so it would be nice to have this baked into the standard parser.
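The snippet below shows how JavaScript treats the two spellings, plus one possible pattern with an optional exponent sign. `floatPattern` is a hypothetical regular expression written for this example; it is not how the parjs float parser is implemented.

```typescript
// JavaScript itself treats a missing exponent sign as positive.
console.log(Number("1e7") === Number("1e+7")); // true

// Hypothetical pattern with an optional sign after the exponent marker.
const floatPattern = /^[+-]?\d+(?:\.\d+)?(?:[eE][+-]?\d+)?$/;
console.log(floatPattern.test("1e7"));  // true
console.log(floatPattern.test("1e+7")); // true
console.log(floatPattern.test("1e"));   // false: an exponent needs digits
```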

Write Markdown parser

Write a markdown parser using the CommonMark specification.

[math example] Expected precedence not matching

Hi, I'm trying to use the math parser in a project and I noticed that some examples don't work as expected (or maybe I'm using it incorrectly). I'm trying to fix the issue myself but I could use some help.

minimum example:

initial expression: 1 + (2) / 2
expected expression: +(/(2, 2), 1)
current result: /(+(1, 2), 2)

This also happens with the one written in the documentation: ( 1 + 2 ) * 2 * 2+( 3 * 5 ) / 2 / 2 + 0.25

Let me know if something is wrong on my side or if I can help somehow (and thanks a lot for the library).
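For reference, the conventional precedence can be sketched with a tiny hand-written recursive-descent parser. This is not the parjs math example and does not use parjs at all; `toPrefix` and its grammar are assumptions made for illustration, with no error handling. It prints operands left to right, so the operand order differs from the notation above, but division still binds tighter than addition.

```typescript
// Grammar assumed for this sketch:
//   expr   := term (("+" | "-") term)*
//   term   := factor (("*" | "/") factor)*
//   factor := number | "(" expr ")"
function toPrefix(source: string): string {
    const tokens: string[] = source.match(/\d+(?:\.\d+)?|[-+*/()]/g) ?? [];
    let index = 0;

    function factor(): string {
        const token = tokens[index++];
        if (token === "(") {
            const inner = expr();
            index++; // skip the closing ")"
            return inner;
        }
        return token;
    }

    function term(): string {
        let left = factor();
        while (tokens[index] === "*" || tokens[index] === "/") {
            const op = tokens[index++];
            left = `${op}(${left}, ${factor()})`;
        }
        return left;
    }

    function expr(): string {
        let left = term();
        while (tokens[index] === "+" || tokens[index] === "-") {
            const op = tokens[index++];
            left = `${op}(${left}, ${term()})`;
        }
        return left;
    }

    return expr();
}

console.log(toPrefix("1 + (2) / 2")); // +(1, /(2, 2))
```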

Regexp parser error

Each of these variants leads to an error:

const { regexp } = require('parjs');

let TEST = regexp(new RegExp('a'));
// let TEST = regexp(/a/);
// let TEST = regexp('a');
TEST.parse('a');

this.reason = `expecting input matching /${origRegexp.source}/`;
                ^
TypeError: Cannot set property 'reason' of undefined
    at regexp (/home/***/www/node_modules/parjs/internal/parsers/regexp.js:19:17)
    at Object.<anonymous> (/home/***/www/parser.js:14:14)
    at Module._compile (internal/modules/cjs/loader.js:1138:30)
    at Object.Module._extensions..js (internal/modules/cjs/loader.js:1158:10)
    at Module.load (internal/modules/cjs/loader.js:986:32)
    at Function.Module._load (internal/modules/cjs/loader.js:879:14)
    at Function.executeUserEntryPoint [as runMain] (internal/modules/run_main.js:71:12)
    at internal/main/run_main_module.js:17:47

Docs

Hi - great library!

…but is there any place with complete docs ?

`thenPick` Automatically Assumes `source` Consumed Input

I have a parser that starts with state().thenPick( ... followed by a parser that will either soft-fail if what it encounters doesn't fit a certain format, or hard-fail if what it finds does fit, but is already stored in the state (the whole thing should read some HTML-style headers and crash if there are duplicate keys). Anyway, thenPick is written so that, when its source succeeds but the next parser fails softly, it converts the failure to hard. So when my parser encounters the end of the header list (\r\n), the header parser fails softly, but the whole thing fails hard because of thenPick, even though state hadn't consumed any input.

I can work around it in this case by first reading the input and then using state afterward. All other such cases should be fixable by using backtrack, but it still seems wrong to me that it would behave this way. Isn't hard failure supposed to be strictly equivalent to "consumed input and then failed"? Sorry if I got it wrong, but for instance Parsec doesn't use the phrases "hard" and "soft", it just literally says "if p failed while consuming input...", so I figure that's what it means.
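The "consumed input and then failed" reading can be modelled with a toy sequencing combinator. Everything in this sketch (`StepParser`, `noConsume`, `literal`, `thenConsumedAware`) is hypothetical and independent of parjs; it only illustrates the rule being argued for, not how thenPick is implemented.

```typescript
type Outcome = { kind: "OK" | "Soft" | "Hard"; position: number };
type StepParser = (input: string, position: number) => Outcome;

// Succeeds without consuming input, like a parser that only reads user state.
const noConsume: StepParser = (_input, position) => ({ kind: "OK", position });

// Matches a literal; fails softly without consuming input.
const literal = (text: string): StepParser => (input, position) =>
    input.startsWith(text, position)
        ? { kind: "OK", position: position + text.length }
        : { kind: "Soft", position };

// A soft failure of `second` becomes hard only if `first` moved the position.
const thenConsumedAware = (first: StepParser, second: StepParser): StepParser =>
    (input, position) => {
        const a = first(input, position);
        if (a.kind !== "OK") return a;
        const b = second(input, a.position);
        if (b.kind === "Soft" && a.position > position) {
            return { kind: "Hard", position: b.position };
        }
        return b;
    };

const header = literal("Key:");
console.log(thenConsumedAware(noConsume, header)("\r\n", 0).kind);     // "Soft"
console.log(thenConsumedAware(literal("\r"), header)("\r\n", 0).kind); // "Hard"
```

Under this rule, a leading parser that consumes nothing leaves the following soft failure soft, which is the behaviour expected in the report above.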

Improve debugging

Debugging should be improved.

For example, expecting fields should be rewritten to be more informative.

How to parse fixed part followed by several optional parts?

I am trying to parse a string that starts with a fixed part (a list of points followed by a type) and then contains several optional parts.

I came up with this code:

import {
    Parjser,
    anyStringOf,
    float,
    noCharOf,
    regexp,
    spaces1,
} from "parjs";
import {
    between,
    many,
    manySepBy,
    map,
    maybe,
    or,
    stringify,
    then,
} from "parjs/combinators";

type Point = { x: number; y: number; z: number };
type Line = {
    type: string;
    points: Point[];
    quotedString?: string;
    leftOrRight?: "left" | "right";
    percentage?: string;
};

const maybeParse = <T>(
    cmb: Parjser<T>,
    input?: string,
    def?: T
): T | undefined => {
    if (input === undefined || input === null) {
        return def;
    }
    const { isOk, value } = cmb.parse(input);
    return isOk ? value : def;
};

const notSpaces1 = regexp(/\S+/).pipe(map(([completeMatch]) => completeMatch));

const quotedString = noCharOf('"').pipe(many(), between('"'), stringify());

const percentage = regexp(/\d+%/).pipe(stringify());

const point = regexp(/(-?\d\d)(-?\d\d)(\d\d)?/).pipe(
    or(regexp(/(-?\d\d+)\.(-?\d\d+)(?:\.(\d\d+))?/)),
    map(
        ([_, x, y, z]): Point => ({
            x: maybeParse(float(), x),
            y: maybeParse(float(), y),
            z: maybeParse(float(), z, 0),
        })
    )
);

// match: /^(-?\d\d-?\d\d(?:\d\d)?(?:--?\d\d-?\d\d(?:\d\d)?)+)\s+(\S+)\s*(?:["“](.+)["”])?\s*(left|right)?\s*(\d+%)?/
//    or: /^(-?\d\d+\.-?\d\d+(?:\.\d\d+)?(?:--?\d\d+\.-?\d\d+(?:\.\d\d+)?)+)\s+(\S+)\s*(?:["“](.+)["”])?\s*(left|right)?\s*(\d+%)?/
const line = point
    .pipe(
        manySepBy("-"),
        then(spaces1(), notSpaces1),
        map(([points, _, type]): Line => ({ type, points }))
    )
    .pipe(
        then(spaces1(), quotedString),
        maybe(),
        map(([line, _, quotedString]): Line => ({ ...line, quotedString }))
    )
    .pipe(
        then(spaces1(), anyStringOf("left", "right")),
        maybe(),
        map(
            ([line, _, leftOrRight]): Line => ({
                ...line,
                leftOrRight: leftOrRight === "left" ? "left" : "right",
            })
        )
    )
    .pipe(
        then(spaces1(), percentage),
        maybe(),
        map(([line, _, percentage]): Line => ({ ...line, percentage }))
    );

export const parse = (input: string) => line.parse(input);

But it fails with:

> p.parse("1111-2222 asht")
ParjsFailure {
  trace: {
    userState: ParserUserState {},
    position: 14,
    reason: 'failed to fulfill a predicate',
    input: '1111-2222 asht',
    location: [Getter],
    stackTrace: [
      [Must],
      [Str],
      [Then],
      [MaybeCombinator],
      [Map],
      [Then],
      [MaybeCombinator],
      [Map],
      [Then],
      [MaybeCombinator],
      [Map]
    ],
    kind: 'Hard'
  }
}

Following the example at https://gregros.github.io/parjs/index.html#md:%F0%9F%98%AC-hard-failure, I also tried replacing maybe with recover(() => ({ kind: ResultKind.SoftFail })), but then the value of line gets lost between the steps, TypeScript loses the type annotations, and when I run it I get TypeError: undefined is not iterable (cannot read property Symbol(Symbol.iterator)).

How would I build the parser correctly?

Benchmarking

The library should be benchmarked against other libraries.

Provide es module version

I'm struggling with importing your library in a vite project. I mostly just get the error

Uncaught SyntaxError: import not found: default

Debugging feature: a zero key

The problem is that unlike, say, C#, JS doesn’t have a way of customizing how an object is visualized in the debugger. So if you want to figure out what an object is exactly you have to rummage around in various properties or using the console.

A cool trick I’ve been using at work is to put a textual description of an object into the 0 key on that object (without a matching TS signature, it’s just for runtime). It has to be a value key, so usually I generate it in the constructor for immutable objects.

The 0 key is the first key that will be printed and shown by debuggers, so it lets developers immediately see what object they’re dealing with. The trick works on lots of different IDEs and debuggers. Here is an example from JetBrains:

The description that goes into this key would be the single line description.
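The trick relies on JavaScript listing integer-like property keys before string keys, regardless of insertion order. A small sketch, using a hypothetical `describeParser` factory rather than any parjs class:

```typescript
// Hypothetical factory: the "0" key exists only as a debugging aid.
function describeParser(expecting: string) {
    return {
        expecting,
        children: [] as string[],
        0: `parser expecting ${expecting}` // added last, but enumerated first
    };
}

const described = describeParser("a digit");
console.log(Object.keys(described)); // keys in order: 0, expecting, children
console.log(described[0]);           // "parser expecting a digit"
```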

This came up in the discussion in #64

Run the tests on precompiled code in CI

jasmine is around 5 years old and could be upgraded from ^3.3.0 -> ^5.0.0.

If you give the 👌🏻 to work on this, I might be able to contribute this change.

I noticed in order to run the tests, I need to build the code first. This is a bit unexpected as usually I work with test frameworks that don't have this extra step (chai/mocha, jest, cypress).
Maybe there is a way to get rid of this extra step? What is your testing workflow like?

feature: Case insensitive string parsing

I encountered a use case where I would need a parser that could consume a string, but insensitive to the case of the letters it contains.

For example, I would like a parser that matches "foo", "Foo", "fOo", etc.

import { string } from "parjs";

const p = string("foo");

p.parse("foo");
// ParjsSuccess { value: 'foo', kind: 'OK' }

p.parse("Foo");
// ParjsFailure {
//   trace: {
//     userState: ParserUserState {},
//     position: 0,
//     reason: "expecting 'foo'",
//     input: 'Foo',
//     location: [Getter],
//     stackTrace: [Array],
//     kind: 'Soft'
//   }
// }
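The requested behaviour can be sketched outside parjs as a plain function. `matchIgnoreCase` is a hypothetical helper, and comparing lower-cased slices is an assumption that is adequate for ASCII but may not hold for characters whose case mapping changes the string length.

```typescript
// Hypothetical helper: compare a slice of the input to `expected`, ignoring case.
// Returns the text as it appeared in the input, or null when it does not match.
function matchIgnoreCase(input: string, position: number, expected: string): string | null {
    const slice = input.slice(position, position + expected.length);
    return slice.toLowerCase() === expected.toLowerCase() ? slice : null;
}

console.log(matchIgnoreCase("Foo bar", 0, "foo")); // "Foo"
console.log(matchIgnoreCase("a fOo", 2, "foo"));   // "fOo"
console.log(matchIgnoreCase("bar", 0, "foo"));     // null
```

Returning the original slice rather than the expected string keeps the input's casing available to the caller; returning the expected string instead would be an equally reasonable design.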

Strange behavior with "between" combinator

Something strange going on with the between combinator. Here's a small example:

let arg = P.regexp(/[a-zA-Z_]/);

let argInParens = arg.between(P.string("("), P.string(")"));

P.string("f").then(argInParens).parse("f(x)");
    // SuccessReply { value: [ 'f', [ 'f' ] ], kind: 'OK' };

I expected the reply to be ['f', ['x']] or similar. Where did the x go?

If I try to match more than one letter in arg, I get a hard fail:

let arg = P.regexp(/[a-zA-Z_]+/);

let argInParens = arg.between(P.string("("), P.string(")"));

P.string("f").then(argInParens).parse("f(xx)");
    // SuccessReply { value: [ 'f', [ 'f' ] ], kind: 'OK' };

Here's the trace:

{
  "trace": {
    "userState": {},
    "position": 3,
    "reason": "')'",
    "input": "f(xx)",
    "location": {
      "row": 0,
      "column": 3
    },
    "stackTrace": [
      {
        "isLoud": true,
        "str": ")",
        "expecting": "')'",
        "displayName": "string"
      },
      {
        "inner": {
          "isLoud": true,
          "str": ")",
          "expecting": "')'",
          "displayName": "string"
        },
        "isLoud": false,
        "expecting": "')'",
        "displayName": "quiet"
      },
      {
        "parsers": [
          {
            "inner": {
              "parsers": [
                {
                  "inner": {
                    "isLoud": true,
                    "str": "(",
                    "expecting": "'('",
                    "displayName": "string"
                  },
                  "isLoud": false,
                  "expecting": "'('",
                  "displayName": "quiet"
                },
                {
                  "isLoud": true,
                  "regexp": {},
                  "expecting": "input matching '[a-zA-Z]+'",
                  "displayName": "regexp"
                }
              ],
              "isLoud": true,
              "expecting": "'('"
            },
            "isLoud": true,
            "expecting": "'('",
            "displayName": "then"
          },
          {
            "inner": {
              "isLoud": true,
              "str": ")",
              "expecting": "')'",
              "displayName": "string"
            },
            "isLoud": false,
            "expecting": "')'",
            "displayName": "quiet"
          }
        ],
        "isLoud": true,
        "expecting": "'('"
      },
      {
        "inner": {
          "parsers": [
            {
              "inner": {
                "parsers": [
                  {
                    "inner": {
                      "isLoud": true,
                      "str": "(",
                      "expecting": "'('",
                      "displayName": "string"
                    },
                    "isLoud": false,
                    "expecting": "'('",
                    "displayName": "quiet"
                  },
                  {
                    "isLoud": true,
                    "regexp": {},
                    "expecting": "input matching '[a-zA-Z]+'",
                    "displayName": "regexp"
                  }
                ],
                "isLoud": true,
                "expecting": "'('"
              },
              "isLoud": true,
              "expecting": "'('",
              "displayName": "then"
            },
            {
              "inner": {
                "isLoud": true,
                "str": ")",
                "expecting": "')'",
                "displayName": "string"
              },
              "isLoud": false,
              "expecting": "')'",
              "displayName": "quiet"
            }
          ],
          "isLoud": true,
          "expecting": "'('"
        },
        "isLoud": true,
        "expecting": "'('",
        "displayName": "between"
      },
      {
        "parsers": [
          {
            "isLoud": true,
            "str": "f",
            "expecting": "'f'",
            "displayName": "string"
          },
          {
            "inner": {
              "parsers": [
                {
                  "inner": {
                    "parsers": [
                      {
                        "inner": {
                          "isLoud": true,
                          "str": "(",
                          "expecting": "'('",
                          "displayName": "string"
                        },
                        "isLoud": false,
                        "expecting": "'('",
                        "displayName": "quiet"
                      },
                      {
                        "isLoud": true,
                        "regexp": {},
                        "expecting": "input matching '[a-zA-Z]+'",
                        "displayName": "regexp"
                      }
                    ],
                    "isLoud": true,
                    "expecting": "'('"
                  },
                  "isLoud": true,
                  "expecting": "'('",
                  "displayName": "then"
                },
                {
                  "inner": {
                    "isLoud": true,
                    "str": ")",
                    "expecting": "')'",
                    "displayName": "string"
                  },
                  "isLoud": false,
                  "expecting": "')'",
                  "displayName": "quiet"
                }
              ],
              "isLoud": true,
              "expecting": "'('"
            },
            "isLoud": true,
            "expecting": "'('",
            "displayName": "between"
          }
        ],
        "isLoud": true,
        "expecting": "'f'",
        "displayName": "then"
      }
    ],
    "kind": "HardFail"
  }
}

Of course, it's possible (likely?) I'm misreading the docs...

Reconsider “quiet” parsers

Ages ago I had an interesting system where you could tell a parser to discard its output, which changed its compile-time type to QuietParser. These were called quiet parsers.

This means that combinators remain very flexible without forcing the user to deal with arrays of multiple outputs when they don’t want to.

For instance, one use case was to communicate to a combinator such as manySepBy whether you want the separator’s result to be included in the output. If you did, you’d use a regular (loud) parser, while if you didn’t you’d give it a quiet one. This was an issue in #35.

This means that the result yielded by manySepBy(parser1, parser2) depended on the types of parser1 and parser2. It could be LoudParser<[Output1, Output2]> if neither was quiet, but it could also be Parser or Parser, as well as QuietParser if all inputs were quiet.

This kind of thing requires pretty powerful type system features to implement, or else code generation. I managed to make it work when the combinators were instance methods, but when I moved to using the .pipe interface it didn’t seem possible anymore.

This came up in #64 (comment)

refactor: Get rid of namespaces

Namespaces are kind of a legacy concept. They should be removed and replaced with functions in normal modules. There really isn’t any call for using them here.

ScalarConverter
StringHelpers
NumHelpers

test: add type level tests

Since there is now a new feature for creating parsers with a constant string type (#78), this feature should also have tests specifying that the correct type is returned.

There are likely many other cases where type level testing would be beneficial.

Is there like a chain combinator?

I want to use a combinator like chain, but does this library have such a function?
https://github.com/jneen/parsimmon/blob/master/API.md#parserchainnewparserfunc

If there is another way, please let me know.

How to set custom "expecting" messages?

Hi, it would be very useful to have some utility when creating parsers with the combinators, to set the .expecting message to something custom.
That way when I see my trace, I could have really useful messages in the context of my domain instead of the technical messages I see right now.

Example of current messages:

I'd like them to read expecting: "a header definition" or expecting: "an email address" etc.
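
A minimal sketch of the requested behaviour, assuming a simplified parser model. The withExpecting helper and the Result shape are hypothetical and only illustrate the idea of replacing a technical message with a domain-specific one.

```typescript
// Hypothetical result and parser shapes; not the parjs API.
type Result<T> = { ok: true; value: T } | { ok: false; expecting: string };
type Parser<T> = (input: string) => Result<T>;

// A building-block parser with a technical failure message.
const regexp = (pattern: RegExp): Parser<string> => (input) => {
    const m = pattern.exec(input);
    return m
        ? { ok: true, value: m[0] }
        : { ok: false, expecting: `input matching ${pattern}` };
};

// Wraps a parser and replaces its failure message with a custom one.
function withExpecting<T>(parser: Parser<T>, message: string): Parser<T> {
    return (input) => {
        const r = parser(input);
        return r.ok ? r : { ok: false, expecting: message };
    };
}

const email = withExpecting(regexp(/^[^@\s]+@[^@\s]+$/), "an email address");
const failure = email("not-an-email");
```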

Debugging feature: textual descriptions for parsers

I think it should be possible to generate a textual description of each parser. This will help when debugging, since you can print the description together with an error, and it can also be used to pretty print the parsers and generally understand what’s happening in a complex program.

I was planning on doing this using emojis like in the README. I think they do a good job of separating different parts of the description visually.

The 🍕 emoji would indicate a building block parser (I think it’s a cute joke, and I couldn’t find anything better), and ⚙️ might indicate combinators. Then an arrow ➜ could be used to further separate combinator inputs.

The idea is not to have a parseable format, just a way of pretty printing. So humans should be able to figure it out, but it can be a little ambiguous.

I wasn’t able to develop the format fully and I don’t know how it will look for complex parsers.

In fact, ideally, there should be several formats.

A full format which prints the whole structure of the parser.
A concise single-line format.

This came up in the discussion in #64 (comment)
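
One possible shape for the two formats, sketched under the assumption that each parser can expose a small description tree. The Description type and both functions are hypothetical, and the output layout is only one of many options.

```typescript
// Hypothetical description tree; parjs does not necessarily store parsers this way.
type Description =
    | { kind: "basic"; name: string }
    | { kind: "combinator"; name: string; inputs: Description[] };

// Concise single-line format, with ➜ separating combinator inputs.
function describeShort(d: Description): string {
    if (d.kind === "basic") return `🍕${d.name}`;
    return `⚙️${d.name}(${d.inputs.map(describeShort).join(" ➜ ")})`;
}

// Full format: one node per line, indented by nesting depth.
function describeFull(d: Description, depth = 0): string {
    const pad = "  ".repeat(depth);
    if (d.kind === "basic") return `${pad}🍕 ${d.name}`;
    const children = d.inputs.map((input) => describeFull(input, depth + 1));
    return [`${pad}⚙️ ${d.name}`, ...children].join("\n");
}

const tuple: Description = {
    kind: "combinator",
    name: "manySepBy",
    inputs: [
        { kind: "basic", name: "float" },
        { kind: "basic", name: 'string(",")' },
    ],
};

const shortText = describeShort(tuple);
const fullText = describeFull(tuple);
```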

Main documentation describes an upcoming version instead of the current one

The master branch is currently in an inconsistent state. The readme describes features that are not currently released. Right now it's not possible to release either (#95) and this may be confusing to users.

Let's create an easy way to view the documentation for the current release by default.

Publishing new versions is broken on the master branch

I think I have resolved most build related errors. However, publishing seems broken and in a somewhat unexpected state:

as far as I can see, the yarn do-publish script was added in mikavilpas@8748b3b
it used src/publish.ts to do some source map related magic, but that script was removed in 657f9b3

Compilation error in typescript from basic use

Relevant code:

import {Parjs} from 'parjs';

let name = Parjs.anyChar();

Error:

> tsc --lib esnext lambda.ts
../node_modules/parjs/loud.d.ts:257:23 - error TS2744: Type parameter defaults can only reference previously declared type parameters.

257     then<S1 = T, S2 = S2, S3 = S3>(parsers: [ImplicitLoudParser<S1>, ImplicitLoudParser<S2>, ImplicitLoudParser<S3>]): LoudParser<[T, S1, S2, S3]>;
                          ~~

../node_modules/parjs/loud.d.ts:257:32 - error TS2744: Type parameter defaults can only reference previously declared type parameters.

257     then<S1 = T, S2 = S2, S3 = S3>(parsers: [ImplicitLoudParser<S1>, ImplicitLoudParser<S2>, ImplicitLoudParser<S3>]): LoudParser<[T, S1, S2, S3]>;
                                   ~~

lambda.ts:91:12 - error TS2349: Cannot invoke an expression whose type lacks a call signature. Type 'LoudParser<string>' has no compatible call signatures.

91 let name = Parjs.anyChar();
              ~~~~~~~~~~~~~~~


Found 3 errors.

> tsc -v
Version 3.5.1

> npm list parjs
[email protected] /home/ab/code/javascript/lambda
└── [email protected]

Releasing a new version is having challenges

Here is a sample of the current issues:

https://github.com/GregRos/parjs/actions/runs/7531667407/job/20500724390

The goal with this issue is to release the next version (1.0.0)

0.12.3 version was published without packaging

Related: #22

As per #22 (comment), it seems that 0.12.3 was incorrectly published and can't be used; downgrading to 0.12.2 fixes this issue.

Sorry for the issue duplication but I wanted to shed more light on the issue and bring this to your attention. Could you possibly pack and push a new 0.12.4 version? We'd be very grateful 🙇‍♂️

In the meantime, thank you for your hard work and for the awesome package! 🌟

test: assert parser positions after combinator failures

It's currently not clear when different combinators should "rewind" the parser state back after a failure.
This means it may be possible to accidentally break existing functionality when refactoring.

Let's fix this by adding test cases for these parser positions after failures.
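
A minimal sketch of the kind of assertion such tests could make, using a toy parsing state rather than the parjs internals. All names here (State, sequence, rewinding, positionAfter) are hypothetical.

```typescript
// Hypothetical mutable parsing state; not the parjs internals.
interface State { input: string; position: number; ok: boolean; }
type Parser = (state: State) => void;

// Matches a literal; on failure it leaves the position where it started.
const literal = (text: string): Parser => (s) => {
    s.ok = s.input.startsWith(text, s.position);
    if (s.ok) s.position += text.length;
};

// Runs parsers in order and stops at the first failure WITHOUT rewinding.
const sequence = (...parsers: Parser[]): Parser => (s) => {
    for (const p of parsers) {
        p(s);
        if (!s.ok) return;
    }
};

// Wraps a parser so that a failure restores the starting position.
const rewinding = (parser: Parser): Parser => (s) => {
    const start = s.position;
    parser(s);
    if (!s.ok) s.position = start;
};

// Test helper: run a parser from position 0 and report where it ended up.
function positionAfter(parser: Parser, input: string): number {
    const state: State = { input, position: 0, ok: true };
    parser(state);
    return state.position;
}

const ab = sequence(literal("a"), literal("b"));
const plainPosition = positionAfter(ab, "ac");              // 1: "a" stays consumed
const rewoundPosition = positionAfter(rewinding(ab), "ac"); // 0: state restored
```

Pinning both numbers down in tests makes the intended behaviour of each combinator explicit, so a refactor that changes it fails loudly.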
