build: investigate adopting `unbuild`

unbuild looks very similar to what I have in my custom build/compile script and the setup relying on Rollup in general, so it should be easy to add it to the project and get everything up and running.

Would be great if I could do the same with release script...

docs: automagically build sidebar from the `content` directory structure

Right now sidebar's implementation is kinda clunky and annoying in that it is hardcoded. It would be much better to build the sidebar from the content directory, mirroring its structure and leveraging entries' front-matter.

feat: context-aware parsing

Provide means for creating context-aware (chained) parsers.

test: rewrite and refactor to use uvu test runner

Rewrite tests to use uvu test runner instead of jest.
The provided uvu/assert assertions should be enough for our purposes, but it's worth looking for alternatives.

NB. Avoid pointless duplication. Use helpers, these can be used as a base for most cases:

sigma/tests/@helpers/index.ts

Lines 39 to 55 in e76249b

    
           export function testFailure<P extends () => Parser<unknown>>(input: string, parser: P) { 
        
             const actual = run(parser(), input) 
        
             const expected = result('failure', actual.kind === 'failure' ? actual.expected : actual.value) 
        
             should.matchState(actual, expected) 
        
           } 
        
           export function testSuccess<T, P extends () => Parser<unknown>>( 
        
             input: string, 
        
             value: T, 
        
             parser: P 
        
           ) { 
        
             const actual = run(parser(), input) 
        
             const expected = result('success', value) 
        
             should.matchState(actual, expected) 
        
           }

sepBy mutates position even on no match

Describe the bug

When sepBy fails to match at all, it still updates the cursor position. This causes subsequent parsers to fail due to skipped input.

In contrast, many – which also always succeeds – does not update cursor position when it fails to match at all.

Reproduction

I can't figure out how to get past the pre-commit hooks, so here's a diff of the test and fix instead of a PR:

The test & fix

diff --git a/src/__tests__/combinators/sepBy.spec.ts b/src/__tests__/combinators/sepBy.spec.ts
index ca348b9..0cda6a1 100644
--- a/src/__tests__/combinators/sepBy.spec.ts
+++ b/src/__tests__/combinators/sepBy.spec.ts
@@ -1,4 +1,4 @@
-import { sepBy, sepBy1 } from '@combinators'
+import { sepBy, sepBy1, sequence } from '@combinators'
 import { string } from '@parsers'
 import { run, result, should, describe, it } from '@testing'

@@ -26,6 +26,17 @@ describe('sepBy', () => {

     should.matchState(actual, expected)
   })
+
+  it('should successfully continue if nothing matched', () => {
+    const parser = sequence(
+      sepBy(string('hello'), string('?')),
+      sepBy(string('bye'), string('?')),
+    )
+    const actual = run(parser, 'bye?bye?')
+    const expected = result(true, [[], ['bye', 'bye']])
+
+    should.matchState(actual, expected)
+  })
 })

 describe('sepBy1', () => {
diff --git a/src/combinators/sepBy.ts b/src/combinators/sepBy.ts
index 51b81b1..f42bc32 100644
--- a/src/combinators/sepBy.ts
+++ b/src/combinators/sepBy.ts
@@ -14,6 +14,7 @@ import type { Parser } from '@types'
 export function sepBy<T, S>(parser: Parser<T>, sep: Parser<S>): Parser<Array<T>> {
   return {
     parse(input, pos) {
       // Run the parser once to get the first value.
       const resultP = parser.parse(input, pos)

@@ -37,8 +38,8 @@ export function sepBy<T, S>(parser: Parser<T>, sep: Parser<S>): Parser<Array<T>>

       return {
         isOk: true,
-        span: [pos, resultP.pos],
-        pos: resultP.pos,
+        span: [pos, pos],
+        pos,
         value: []
       }
     }

And here's the test again for visibility, which fails on the current version (3.6.2):

  it('should successfully continue if nothing matched', () => {
    const parser = sequence(
      sepBy(string('hello'), string('?')),
      sepBy(string('bye'), string('?')),
    )
    const actual = run(parser, 'bye?bye?')
    const expected = result(true, [[], ['bye', 'bye']])

    should.matchState(actual, expected)
  })

test: test types with `expect-type`

Test utility and other types with expect-type, because we have tricky ones, with recursion and conditionals. It's (almost) never a bad thing to write a couple more tests.

docs: explain how to write custom parsers and combinators

Explain how to write custom parsers and combinators.

feat: error recovery

Provide means to recover from errors.

feat: add deno bundle

Maybe. Probably.

feat: spans

After working with some rust parser combinator libraries like chumsky, I feel like it would be really handy to provide capabilities to either produce spans by default, or allow to map with spans.

A span is simply a pair of numbers, a tuple like [start: number, end: number], which points to some range in the source code we are parsing or parsed. That is actually a must for quality error reporting and diagnostics.

Docs: `run` and `tryRun` aren't parsers

I had trouble finding the run and tryRun functions, which are located in the "Parsers" section.

I would suggest adding a "Core" section at the top of the menu, and document the core functions of the library API there.

Probably moving the code itself into a "core" folder would make sense as well?

Let me know if you'd like a PR.

docs: rewrite using VitePress

VitePress is still in alpha, but from what I've seen and actually played around, it's already okay and provides all essential and usefull stuff out-of-the-box.

Feature: grammar helper

What problem does this feature solve?

This is mainly for convenience - but it does solve the problem with the existing defer function, which relies initialization separate from creation. My proposed grammar helper would be statically type-checked, and wouldn't need an error-handler.

Describe the solution

If we look at the example for defer:

sigma/docs/docs/content/parsers/defer.md

Lines 33 to 64 in 0d96737

    
           interface NumberNode { 
        
             type: 'number' 
        
             span: Span 
        
             value: number 
        
           } 
        
           interface ListNode { 
        
             type: 'list' 
        
             span: Span 
        
             value: Array<NumberNode | ListNode> 
        
           } 
        
           const TupleList = defer<ListNode>() 
        
           const TupleNumber = defer<NumberNode>() 
        
           TupleNumber.with( 
        
             map( 
        
               integer(), 
        
               (value, span) => ({ type: 'number', span, value }) 
        
             ) 
        
           ) 
        
           TupleList.with( 
        
             map( 
        
               takeMid( 
        
                 string('('), 
        
                 sepBy(choice(TupleList, TupleNumber), string(',')), 
        
                 string(')') 
        
               ), 
        
               (value, span) => ({ type: 'list', span, value }) 
        
             ) 
        
           )

Here is that example implemented with the grammar helper:

  const tupleGrammar = grammar({
    tupleNumber(): Parser<NumberNode> {
      return map(
        integer(),
        (value, span) => ({ type: 'number', span, value })
      )
    },
    tupleList(): Parser<ListNode> {
      return map(
        takeMid(
          string('('),
          sepBy(choice(this.tupleList, this.tupleNumber), string(',')),
          string(')')
        ),
        (value, span) => ({ type: 'list', span, value })
      )
    },
  });

Here is a test demonstrating how to use the resulting grammar:

  is.equal(
    run(tupleGrammar.tupleList).with('(1,2,(3,(4,5)))'),
    {
      isOk: true,
      span: [ 0, 15 ],
      pos: 15,
      value: {
        type: 'list',
        span: [ 0, 15 ],
        value: [
          { type: 'number', span: [ 1, 2 ], value: 1 },
          { type: 'number', span: [ 3, 4 ], value: 2 },
          {
            type: 'list',
            span: [ 5, 14 ],
            value: [
              { type: 'number', span: [ 6, 7 ], value: 3 },
              {
                type: 'list',
                span: [ 8, 13 ],
                value: [
                  { type: 'number', span: [ 9, 10 ], value: 4 },
                  { type: 'number', span: [ 11, 12 ], value: 5 }
                ]
              }
            ]
          }
        ]
      }
    }
  );

And here is my preliminary implementation:

import { Parser } from "@nrsk/sigma";

type Grammar<T> = {
  [P in keyof T]: T[P] extends () => any ? ReturnType<T[P]> : never;
};

type GrammarInit<T> = T & ThisType<Grammar<T>>;

type GrammarType = {
  [name: string]: () => Parser<any>;
};

export function grammar<T extends GrammarType>(init: GrammarInit<T>): Grammar<T> {
  const grammar = {} as { [key: string]: Parser<any> };

  const initialized = {} as { [key: string]: true };
  
  for (const key in init) {
    grammar[key] = {
      parse(input, pos) {
        if (! initialized[key]) {
          initialized[key] = true;

          grammar[key] = (init[key] as any).apply(grammar);
        }
        
        return grammar[key].parse(input, pos);
      },
    } as Parser<any>;
  }

  return grammar as Grammar<T>;
}

Here is a screenshot demonstrating IDE support:

As you can see, this works with the circular references, which is possible with the magical ThisType in TS.

It's doing basically the same thing as defer for each member, so of course this works with circular references at run-time as well.

I didn't benchmark it against defer, and it might need some optimization, and the types could probably use a little work.

But what do you think, would you welcome a PR for this feature? 🙂

Dependents

What problem does this feature solve?

sigma is so well written, feature packed, and to a high level of perfection.
How come that no one uses it?
https://www.npmjs.com/package/@nrsk/sigma

feat: optional spaces and whitespaces

What problem does this feature solve?

Currently the whitespace parser is strict and requires at least one character to be matched. There are many cases where I need to wrap it into optional parser e.g. the spaces before and after argument braces are optional.

function main() {...}
function main (){...}

Describe the solution

I think aligning this parser with other API would be a little bit more convenient by matching the current pairs such as many and many1, sepBy and sepBy1:

whitespaces1 - requires a single or multiple characters, works as the current whitespace
whitespaces - requires zero or more whitespaces to match, works as optional(whitespace)
(optional) wrapWhitespaces and wrapWhitespaces1 - that should prevent writing the common pattern of surrounding whitespaces over and over again:

sequence(
    functionKeyword,
    takeMid(
        optional(whitespace),
        functionName,
        optional(whitespace)
   )
)

/* could be turned into */

sequence(
    functionKeyword,
    wrapWhitespaces(functionName)
)

docs(vitepress): explore automatic code snippet imports for signatures

Right now every signature for every parser/combinator is copy-pasted from the source code. This could be "automated" to some extent by leveraging code snippet imports in VitePress.

Optimizations

I was curious to see how Sigma would stack up against other parsers - the biggest benchmark I know is Chevrotain's, so I added Sigma's JSON parser example to it:

mindplay-dk/chevrotain@d8fd236

although it is 4 times slower than Chevrotain, Sigma is definitely in the lead 🙂

Chevrotain is the fastest JS parser library I know of, so it would probably be difficult to beat.

and of course, this is without making any attempts to optimize Sigma's implementation of the JSON parser at all.

I did a quick profile, and sequence looks like the biggest bottleneck at the moment:

it might be worth optimizing and benchmarking further - Chevrotain might be worth referencing for optimizations as well, I know the author put a lot of work into that.

as previously mentioned, performance is not the main reason I picked this library - but I do think it's important, and if there are any "easy wins", it might be worth while investigating this a bit further.

I might take a closer look at some point - just leaving this here for now. 🙂

feat: add common parsers

To avoid reinventing the wheel every time, several vital parsers could be implemented and provided out-of-the-box, such as:

whitespace and optional whitespace.
letter (single) and letters (multiple), probably should be added along with #3.
char could be an ASCII char code or Unicode code point probably.
float and int (both signed and unsigned) with support for scientific form.
...

feat: negative/positive look-ahead/behind, non-consuming surrounding context parsers, negation parsers

There are multiple things described in this issue, but I think it's better to keep them close.

I'm doing some free text parsing migrating from the regular expressions based solution. I faced an issue that I couldn't fully specify the surrounding context. when is the only parser that could work with context but documentation doesn't say anything whether it consumes input or not (I assume it does which raises a question how it is different from sequence). Also when allows to specify preceding context only and I'm not sure what should I use to specify the context after the target parser.

It would be nice to have non-consuming parser (context) and negation parser (not) so it would be possible to specify surrounding context e.g.

// context(before, target, optional after)

const helloWorld = string('hello world');
const strictHelloWorld = context(
    not(letter()),
    helloWorld,
    not(whole())
);

const parser1 = sequence(any(), strictHelloWorld, any());
const parser2 = sequence(any(), helloWorld, any());

run(parser1).with('1hello worldA'); // result.isOk = true, value = ['1', 'hello world', 'A'];
run(parser1).with('Ahello world1'); // result.isOk = false

run(parser2).with('1hello worldA'); // result.isOk = true, value = ['1', 'hello world', 'A'];
run(parser2).with('Ahello world1'); // result.isOk = true, value = ['A', 'hello world', '1'];

Also it would be nice to have some default non-consuming parsers such as wordBoundary. What is more this will require polyfilling JS \b since it's not unicode friendly.

More examples can be found in the codesandbox.

test: investigate adopting `vitest` instead of `uvu`

feat: add unicode support

Pretty self-explanatory.

feat: find a match in free text, get consumed input and leftovers

I'm currently migrating parser from regular expressions based solution. I faced an issue that I couldn't just run parser over free text and I need to specify everything around this parser somehow. What is more one of the requirement that I have is to get the matched string out of the input (it is used to remove it from the free text later).

I'm currently using something like that to get the desired output:

import * as s from '@nrsk/sigma';

interface ExtractedResult<T> {
    consumed: string;
    rest: string;
    value: T;
}

export function extract<T>(input: string, targetParser: s.Parser<T>): ExtractedResult<T> {
    const wrappedParser = s.map(
        s.sequence(
            s.takeUntil(s.any(), targetParser),
            s.rest(),
        ),
        ([[before, value], after]) => ([
            before.join(''),
            value,
            after,
        ] as const)
    );

    const result = s.run(wrappedParser).with(input);
    if (result.isOk) {
        const [before, value, after] = result.value;
        const consumed = input.replace(before, '').replace(after, '');
        const rest = input.replace(consumed, '');

        return {
            value,
            consumed,
            rest,
        };
    } else {
        throw new Error(result.expected);
    }
};

What can be done instead?

While consumed and rest could be covered by span feature, there is stil a need to make parser work with additional input like regular expressions do:

const parser = freeText(string('hello world'));
const re = /hello world/;

run(parser).with('RANDOM TEXT hello world RANDOM TEXT') // result.isOk = true, value = 'hello world'
re.exec('RANDOM TEXT hello world RANDOM TEXT') // matches: ['hello world']

run(parser).with('hello world') // result.isOk = true, value = 'hello world'
re.exec('hello world') // matches: ['hello world']

docs(vitepress): automate sidebar construction

Right now, every time we want to add a new parser/combinator and write a documentation, we need to manually add an entry to the sidebar:

sigma/docs/content/.vitepress/config.ts

Lines 135 to 186 in 9dc2b1c

    
           return [ 
        
             Sidebar.group('Introduction', '/introduction', [ 
        
               Sidebar.item('Getting started', '/getting-started') 
        
             ]), 
        
             Sidebar.group('Guides', '/guides', [ 
        
               Sidebar.item('Primitives and composites', '/primitives-and-composites') 
        
             ]), 
        
             Sidebar.group('Combinators', '/combinators', [ 
        
               Sidebar.item('chainl', '/chainl'), 
        
               Sidebar.item('choice', '/choice'), 
        
               Sidebar.item('error', '/error'), 
        
               Sidebar.item('many', '/many'), 
        
               Sidebar.item('many1', '/many1'), 
        
               Sidebar.item('map', '/map'), 
        
               Sidebar.item('mapTo', '/mapTo'), 
        
               Sidebar.item('optional', '/optional'), 
        
               Sidebar.item('sepBy', '/sepBy'), 
        
               Sidebar.item('sepBy1', '/sepBy1'), 
        
               Sidebar.item('sequence', '/sequence'), 
        
               Sidebar.item('skipUntil', '/skipUntil'), 
        
               Sidebar.item('takeLeft', '/takeLeft'), 
        
               Sidebar.item('takeMid', '/takeMid'), 
        
               Sidebar.item('takeRight', '/takeRight'), 
        
               Sidebar.item('takeSides', '/takeSides'), 
        
               Sidebar.item('takeUntil', '/takeUntil'), 
        
               Sidebar.item('when', '/when') 
        
             ]), 
        
             Sidebar.group('Parsers', '/parsers', [ 
        
               Sidebar.item('any', '/any'), 
        
               Sidebar.item('binary', '/binary'), 
        
               Sidebar.item('defer', '/defer'), 
        
               Sidebar.item('eof', '/eof'), 
        
               Sidebar.item('eol', '/eol'), 
        
               Sidebar.item('float', '/float'), 
        
               Sidebar.item('hex', '/hex'), 
        
               Sidebar.item('integer', '/integer'), 
        
               Sidebar.item('letter', '/letter'), 
        
               Sidebar.item('letters', '/letters'), 
        
               Sidebar.item('noneOf', '/noneOf'), 
        
               Sidebar.item('nothing', '/nothing'), 
        
               Sidebar.item('octal', '/octal'), 
        
               Sidebar.item('oneOf', '/oneOf'), 
        
               Sidebar.item('regexp', '/regexp'), 
        
               Sidebar.item('rest', '/rest'), 
        
               Sidebar.item('run', '/run'), 
        
               Sidebar.item('string', '/string'), 
        
               Sidebar.item('tryRun', '/tryRun'), 
        
               Sidebar.item('ustring', '/ustring'), 
        
               Sidebar.item('whitespace', '/whitespace'), 
        
               Sidebar.item('whole', '/whole') 
        
             ]) 
        
           ]

This is at the very least inconvenient. Scanning specific directories and extracting titles from .md files' frontmatter would be enough, since the structure is pretty flat and simple.

feat(combinators/not): add `not` combinator

docs: tsdoc annotations

TSDoc annotations for better DX.

bug: `choice` incorrectly infers a type if given a spreaded array of parsers

When choice is given a spreaded array, e.g.:

choice(...['one', 'two'].map(string))

it incorrectly infers Parser<never> instead of Parser<string>. This is the problem with ToUnion type helper:

// Ok
type U1 = [Parser<string>, Parser<number>, Parser<boolean>]
type R1 = ToUnion<U1> // type R1 = string | number | boolean

// Wrong
type U2 = Array<Parser<string>>
type R2 = ToUnion<U2> // type R2 = never

This is why we need #48...

Result model: span vs pos

Hey,

I was looking over these types:

sigma/src/types/library.ts

Lines 22 to 36 in 261f2f8

    
           /** Represents failed execution. */ 
        
           export type Failure = { 
        
             readonly isOk: false 
        
             readonly span: Span 
        
             readonly pos: number 
        
             readonly expected: string 
        
           } 
        
           /** Represents successful execution. */ 
        
           export type Success<T> = { 
        
             readonly isOk: true 
        
             readonly span: Span 
        
             readonly pos: number 
        
             readonly value: T 
        
           }

I see that spans were added on later.

Maybe I'm missing something, but I was wondering:

In a Success, isn't pos always going to be identical to span[0]?
In a Failure, isn't span[0] always going to be identical to span[1] as well as to pos?

A failure always happens at one specific position, does it not? When would the span mean anything?

And a success always has both a start and an end - even if these are identical, that would signify a zero-length match, so both values are always meaningful, right?

Any particular reason you wouldn't just deprecate or remove pos?

Or just have plain start and end properties for Success, and pos for Failure? Is there any practical advantage to having those tuples? maybe for source maps or something? Does it matter if those properties have the same names/types?

Just wondering.

This library looks amazing btw. 😄

ustring() docs/questions

What is the ustring parser for? Assuming you load a valid unicode text file (which in this day and age is every text file) wouldn't it just match everything?

Best guess, this is for something like validating correct binary encoding of JSON files? But is it actually possible to load a non-valid unicode text file into a string in JavaScript?

I was expecting I'd use this for, say, keywords.

But then the "success" example in the documentation says:

Note that the index is 12, which is correct, since every hieroglyph here takes 3 bytes.

String operations in JS generally operate in ranges of code points:

So these numbers aren't useful for error reporting, or any subsequent string operation in JS really.

Text editors usually measure positions in code points as well:

The documentation itself explains:

This parser is very similar to the string parser, except it takes a bit hacky (though performant) approach, that is based on counting length of the given match string in bytes. It then subslices and compares string slice with that match string.

"hacky though performant", but it seems like this is doing a lot of unnecessary work to figure out a string position that isn't useful for most common use cases, like just matching a keyword or symbol, isn't it?

~~What I was expecting was a simpler parser that would use String.prototype.includes, which ought to be the fastest native way to check for a specific string at a specific offset, I think?~~

EDIT: oh, whoops, now I get it! I avoided string, because it it specifically says this will match "ASCII", which is incorrect. It would in fact match whatever Unicode characters you put in the string. Looks like a documentation problem.

~~But I don't see any other parser for simple strings - and nothing relevant in the codebase calling includes.~~

EDIT: looks like maybe there is room for a small optimization here to avoid copying.

~~It's also difficult to think of a name for such a parser, now that string is taken.~~ 😅

(I know I'm submitting a lot of feedback! I am already somewhat invested in this lovely library, and I do want to help out - if you want me to submit PRs for anything, let me know.)

EDIT: let me know if you'd like me to correct the documentation and/or try the minor optimization/simplification with includes in the string parser.

feat: error recovery, handling and mapping

Right now there're no "errors" per se, i.e. all sigma provides users with is text messages and ability to re-map those messages to something custom. This is a shame and should be improved.

Implementation of spans in #34 should help a bit, but we will also need to extend Parser<T> signature with a second generic parameter E, so parsers could bear error type information. All parsers and combinators will be changed accordingly, although I'm pretty sure there'll be hurdles here and there.

Additionally, there should be added two combinators:

mapErr(parser, fn) - this combinator will map error from E to some other type using given fn.
mapOrElse(parser, okFn, errFn) - this combinator will conditionally apply okFn or errFn depending on the parser's result.

It also makes sense to implement error recovery along with the stuff above. Hopefully, it will be enough to provide a single combinator:

recovery(parser, fn) - this combinator takes a parser and a recovery function fn that should produce another parser; similar to when combinator, but it acts only on failures.

Making parsers named (adding name property to Parser<T>) wouldn't hurt as well.

refactor: stabilize public API & restructure sources

Get rid of aliases for combinators and parsers.
Rename some combinators to match parsec (or parser-combinators), and purescript-parsing conventions.
Restructure sources and lift everything up from the internal directory.
Adjust exports and re-export everything in the package's entry point (index.ts).

feat: make regexp parser throw an error if the global flag is missing

In this issue I've learnt that the g flag must be used in order to make regexp parser work correctly.

Should regexp parser check itself whether the g flag was used? It would be a minor improvement but it will prevent many mistakes. I think there is always a chance to forget adding this flag even if you know about this rule.

Maybe the paragraph in the documentation explaining this requirement should be emphasized to make this moment more obvious. But I don't think that documentation itself would be enough and this rule should be validated programmatically.

Adding this check somewhere would be a great addition:

if (!re.global) throw new Error('regexp parser must use g flag to process input correctly');

sigma/src/parsers/regexp.ts

Line 17 in 567b6f9

export function regexp(re: RegExp, expected: string): Parser<string> {

I can submit a PR myself if you agree with me on this point!

Postinstall fails on 3.6.3

Describe the bug

npm install
npm WARN EBADENGINE Unsupported engine {
npm WARN EBADENGINE   package: '@nrsk/[email protected]',
npm WARN EBADENGINE   required: { node: '>=18.16.0 <=20' },
npm WARN EBADENGINE   current: { node: 'v18.15.0', npm: '9.5.0' }
npm WARN EBADENGINE }
npm ERR! code 1
npm ERR! path /Users/redacted/project/node_modules/@nrsk/sigma
npm ERR! command failed
npm ERR! command sh -c npm run install:benchmarks && npm run install:docs
npm ERR! > @nrsk/[email protected] install:benchmarks
npm ERR! > cd benchmarks && npm i
npm ERR! sh: line 0: cd: benchmarks: No such file or directory

Repro steps

Install 3.6.3 locally. I originally discovered the bug on trying to deploy to render since it caught the latest version due to my package.json having ^3.6.2 for sigma.

System information

macOS Ventura 13.2.1
Shell: zsh 5.8.1

perf: explore further performance improvements

Explore memoizing techniques.
Explore LL edge cases.

docs(types): thoroughly document user-facing types

Library types:

Failure and Success, as mentioned in #82;
Parser<T> union variants.

Check utility types' docs.

docs: add `consuming` and `non-consuming` labels/badges

Something similar to primitive and composite badges. These should be added to tsdocs as well.

perf: add benchmarks

Add benchmarks and comparison with other similar libraries.

docs: standalone documentation

Standalone documentation built with any SSG that will be deployed with Vercel.
Create a subdomain: https://sigma.vm.codes

feat: add esm bundle

Reject ancient tech, embrace modern stuff.

feat: poc

Proof of concept implementation with basic combinators and capabilities.

feat: throwing runner

Add tryRun helper, that will, unlike run, throw in case of failure.

defer() should error?

Just wondering, I noticed this:

sigma/src/parsers/defer.ts

Lines 72 to 77 in 0d96737

    
           return { 
        
             isOk: false, 
        
             span: [pos, pos], 
        
             pos, 
        
             expected: `Deferred parser wasn't initialized.` 
        
           }

I'm not sure this makes sense as a parser error?

It's not that parsing failed - it's that there is something wrong with your code.

So I think maybe it would be more appropriate to throw an error here?

You might have had some sort of reason for this - it looks a little off to me, so I figured I'd ask. 🙂

Feature: error printer

What do you think about including a formatError function?

I wrote a function that takes the input string and a Failure instance, and prints out the surrounding +/- n lines with line-numbers etc.

I don't know if it's wise to include this in the library, since parser errors aren't the only type of errors you'd want to print - though, on the other hand, if the function was just something like formatError(input: string, offset: number, expectation: string), then this could be used to inject whatever expectation you want, to print out semantic errors and so on.

Not 100% sure if this is a good idea, as error formats can be quite different in different projects - but on the other hand, it might be nice to have something you can use just to get your project started and get something useful on the screen without doing a lot of ground work?

Up to you, but I do have something I could PR, if you'd like. 🙂

feat: number parsers

What should be done

Add a new parser: hexadecimal. Should parse a positive whole number in the hexadecimal system. The number should be prefixed with 0x or 0X. Ex: 0xDF, 0X1F.
Add a new parser: octal. Should parse a positive whole number in the octal system. The number should be prefixed with 0o or 0O. Ex: 0o230, 0O11.
Add a new parser: binary. Should parse a positive whole number in the binary system. The number should be prefixed with 0b or 0B. Ex: 0b1111, 0B1111.
Add a new parser: whole. Should parse a positive whole number in the decimal system. Ex: 0, 1, 2.
Rename int to integer. The same as whole, except it can be prefixed with a minus - sign.

Notes

All parsers return a parsed string, parsing into actual numbers should be done in the userland on demand.
If possible, do not use regular expressions for parsing numbers, do benchmarks. Regular expressions are probably faster than comparing chars one-by-one.
Do not use factory functions like here to reduce the number of calls.

	export function testFailure<P extends () => Parser<unknown>>(input: string, parser: P) {
	const actual = run(parser(), input)
	const expected = result('failure', actual.kind === 'failure' ? actual.expected : actual.value)

	should.matchState(actual, expected)
	}

	export function testSuccess<T, P extends () => Parser<unknown>>(
	input: string,
	value: T,
	parser: P
	) {
	const actual = run(parser(), input)
	const expected = result('success', value)

	should.matchState(actual, expected)
	}

	interface NumberNode {
	type: 'number'
	span: Span
	value: number
	}

	interface ListNode {
	type: 'list'
	span: Span
	value: Array<NumberNode \| ListNode>
	}

	const TupleList = defer<ListNode>()
	const TupleNumber = defer<NumberNode>()

	TupleNumber.with(
	map(
	integer(),
	(value, span) => ({ type: 'number', span, value })
	)
	)

	TupleList.with(
	map(
	takeMid(
	string('('),
	sepBy(choice(TupleList, TupleNumber), string(',')),
	string(')')
	),
	(value, span) => ({ type: 'list', span, value })
	)
	)

	return [
	Sidebar.group('Introduction', '/introduction', [
	Sidebar.item('Getting started', '/getting-started')
	]),
	Sidebar.group('Guides', '/guides', [
	Sidebar.item('Primitives and composites', '/primitives-and-composites')
	]),
	Sidebar.group('Combinators', '/combinators', [
	Sidebar.item('chainl', '/chainl'),
	Sidebar.item('choice', '/choice'),
	Sidebar.item('error', '/error'),
	Sidebar.item('many', '/many'),
	Sidebar.item('many1', '/many1'),
	Sidebar.item('map', '/map'),
	Sidebar.item('mapTo', '/mapTo'),
	Sidebar.item('optional', '/optional'),
	Sidebar.item('sepBy', '/sepBy'),
	Sidebar.item('sepBy1', '/sepBy1'),
	Sidebar.item('sequence', '/sequence'),
	Sidebar.item('skipUntil', '/skipUntil'),
	Sidebar.item('takeLeft', '/takeLeft'),
	Sidebar.item('takeMid', '/takeMid'),
	Sidebar.item('takeRight', '/takeRight'),
	Sidebar.item('takeSides', '/takeSides'),
	Sidebar.item('takeUntil', '/takeUntil'),
	Sidebar.item('when', '/when')
	]),
	Sidebar.group('Parsers', '/parsers', [
	Sidebar.item('any', '/any'),
	Sidebar.item('binary', '/binary'),
	Sidebar.item('defer', '/defer'),
	Sidebar.item('eof', '/eof'),
	Sidebar.item('eol', '/eol'),
	Sidebar.item('float', '/float'),
	Sidebar.item('hex', '/hex'),
	Sidebar.item('integer', '/integer'),
	Sidebar.item('letter', '/letter'),
	Sidebar.item('letters', '/letters'),
	Sidebar.item('noneOf', '/noneOf'),
	Sidebar.item('nothing', '/nothing'),
	Sidebar.item('octal', '/octal'),
	Sidebar.item('oneOf', '/oneOf'),
	Sidebar.item('regexp', '/regexp'),
	Sidebar.item('rest', '/rest'),
	Sidebar.item('run', '/run'),
	Sidebar.item('string', '/string'),
	Sidebar.item('tryRun', '/tryRun'),
	Sidebar.item('ustring', '/ustring'),
	Sidebar.item('whitespace', '/whitespace'),
	Sidebar.item('whole', '/whole')
	])
	]

	/** Represents failed execution. */
	export type Failure = {
	readonly isOk: false
	readonly span: Span
	readonly pos: number
	readonly expected: string
	}

	/** Represents successful execution. */
	export type Success<T> = {
	readonly isOk: true
	readonly span: Span
	readonly pos: number
	readonly value: T
	}

	return {
	isOk: false,
	span: [pos, pos],
	pos,
	expected: `Deferred parser wasn't initialized.`
	}

norskeld / sigma Goto Github PK

sigma's Issues

Describe the bug

Reproduction

What problem does this feature solve?

Describe the solution

What problem does this feature solve?

What problem does this feature solve?

Describe the solution

What can be done instead?

Describe the bug

Repro steps

System information

What should be done

Notes

Recommend Projects

Recommend Topics

Recommend Org