
sigma's Introduction

𝝨 sigma


TypeScript parser combinator library for building fast and convenient parsers.

Features

  • Capable of parsing LL grammars using recursive descent with backtracking.
  • Ergonomic API with excellent TypeScript support.
  • Zero dependencies. Supports tree shaking.
  • Performant enough to beat similar parser combinator libraries.

All-in-all, Sigma is easy to use and extend, reasonably fast and convenient, but a bit limited regarding what types of grammars it can parse.

Docs

You can find the documentation here. If you want to contribute, feel free to check out the source code.

Installation

Node

Just use your favorite package manager.

npm i @nrsk/sigma

Deno

You can import the library via Skypack (note the ?dts query parameter, this is to pull types):

import { ... } from 'https://cdn.skypack.dev/@nrsk/sigma?dts'
import { ... } from 'https://cdn.skypack.dev/@nrsk/sigma/parsers?dts'
import { ... } from 'https://cdn.skypack.dev/@nrsk/sigma/combinators?dts'

Example

Below is an example of parsing nested tuples like (1, 2, (3, 4)) into an AST.

import { choice, map, optional, sepBy, sequence, takeMid } from '@nrsk/sigma/combinators'
import { defer, integer, run, string, whitespace } from '@nrsk/sigma/parsers'
import type { Span } from '@nrsk/sigma'

/* AST. */

interface NumberNode {
  type: 'number'
  span: Span
  value: number
}

interface ListNode {
  type: 'list'
  span: Span
  value: Array<NumberNode | ListNode>
}

/* Mapping functions to turn parsed string values into AST nodes. */

function toNumber(value: number, span: Span): NumberNode {
  return {
    type: 'number',
    span,
    value
  }
}

function toList(value: Array<NumberNode | ListNode>, span: Span): ListNode {
  return {
    type: 'list',
    span,
    value
  }
}

/* Parsers. */

const OpenParen = string('(')
const CloseParen = string(')')
const Space = optional(whitespace())
const Comma = sequence(Space, string(','), Space)

const TupleNumber = defer<NumberNode>()
const TupleList = defer<ListNode>()

TupleNumber.with(
  map(
    integer(),
    toNumber
  )
)

TupleList.with(
  map(
    takeMid(
      OpenParen,
      sepBy(choice(TupleList, TupleNumber), Comma),
      CloseParen
    ),
    toList
  )
)

Then we simply run the root parser, feeding it with text:

run(TupleList).with('(1, 2, (3, 4))')

And in the end we get the following output with the AST, which can then be manipulated if needed:

{
  isOk: true,
  span: [ 0, 14 ],
  pos: 14,
  value: {
    type: 'list',
    span: [ 0, 14 ],
    value: [
      { type: 'number', span: [ 1, 2 ], value: 1 },
      { type: 'number', span: [ 4, 5 ], value: 2 },
      {
        type: 'list',
        span: [ 7, 13 ],
        value: [
          { type: 'number', span: [ 8, 9 ], value: 3 },
          { type: 'number', span: [ 11, 12 ], value: 4 }
        ]
      }
    ]
  }
}

Development

Fork, clone, then instead of npm install run:

npm run install:all

Note

This will install dependencies for the package itself, and also for docs and benchmarks packages. This is due to limitations of the current repository setup and needed to avoid problems with eslint that runs on pre-commit hook.

This project follows the conventional commits spec and uses a slightly modified commitlint preset for automatic linting commits and generating changelog.

License

MIT.

sigma's People

Contributors

mindplay-dk, norskeld, semantic-release-bot, thefedaikin

sigma's Issues

feat: make regexp parser throw an error if the global flag is missing

In this issue I've learnt that the g flag must be used in order to make regexp parser work correctly.

Should regexp parser check itself whether the g flag was used? It would be a minor improvement but it will prevent many mistakes. I think there is always a chance to forget adding this flag even if you know about this rule.

Maybe the paragraph in the documentation explaining this requirement should be emphasized to make this moment more obvious. But I don't think that documentation itself would be enough and this rule should be validated programmatically.

Adding this check somewhere would be a great addition:

if (!re.global) throw new Error('regexp parser must use g flag to process input correctly');

export function regexp(re: RegExp, expected: string): Parser<string> {

I can submit a PR myself if you agree with me on this point!
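
To illustrate why the check matters, here is a minimal sketch outside the library. `matchAt` is a hypothetical helper, not sigma's actual `regexp` implementation; it assumes the parser positions the regular expression through `lastIndex`, which `exec` only honours when the `g` (or `y`) flag is set.

```typescript
// Hypothetical helper, not sigma's actual `regexp` implementation.
function matchAt(re: RegExp, input: string, pos: number): string | null {
  // Without the `g` flag, `exec` ignores `lastIndex` and always searches from 0.
  if (!re.global) {
    throw new Error('regexp parser must use g flag to process input correctly')
  }

  re.lastIndex = pos
  const match = re.exec(input)

  // With `g`, `exec` may find a match further ahead; accept only one that starts at `pos`.
  return match !== null && match.index === pos ? match[0] : null
}
```

Throwing at call time turns a silent mismatch into an immediate, explainable failure.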

feat: find a match in free text, get consumed input and leftovers

I'm currently migrating a parser from a regular-expression-based solution. The issue I ran into is that I can't just run a parser over free text; I need to specify everything around the target parser somehow. On top of that, one of my requirements is to get the matched string out of the input (it is later removed from the free text).

I'm currently using something like this to get the desired output:

import * as s from '@nrsk/sigma';

interface ExtractedResult<T> {
    consumed: string;
    rest: string;
    value: T;
}

export function extract<T>(input: string, targetParser: s.Parser<T>): ExtractedResult<T> {
    const wrappedParser = s.map(
        s.sequence(
            s.takeUntil(s.any(), targetParser),
            s.rest(),
        ),
        ([[before, value], after]) => ([
            before.join(''),
            value,
            after,
        ] as const)
    );

    const result = s.run(wrappedParser).with(input);
    if (result.isOk) {
        const [before, value, after] = result.value;
        const consumed = input.replace(before, '').replace(after, '');
        const rest = input.replace(consumed, '');

        return {
            value,
            consumed,
            rest,
        };
    } else {
        throw new Error(result.expected);
    }
};

What can be done instead?

While consumed and rest could be covered by the span feature, there is still a need to make a parser work with surrounding input the way regular expressions do:

const parser = freeText(string('hello world'));
const re = /hello world/;

run(parser).with('RANDOM TEXT hello world RANDOM TEXT') // result.isOk = true, value = 'hello world'
re.exec('RANDOM TEXT hello world RANDOM TEXT') // matches: ['hello world']

run(parser).with('hello world') // result.isOk = true, value = 'hello world'
re.exec('hello world') // matches: ['hello world']
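
One possible shape for this is sketched below. `freeText` is the hypothetical combinator from the request above, and `Parser`, `Result` and `str` are simplified stand-ins for sigma's types and its `string` parser so that the example is self-contained. It simply retries the target parser at each offset, then derives consumed and rest from the resulting span.

```typescript
// Minimal stand-ins for sigma's types so the sketch is self-contained.
type Span = [number, number]
type Result<T> =
  | { isOk: true; span: Span; pos: number; value: T }
  | { isOk: false; span: Span; pos: number; expected: string }

interface Parser<T> {
  parse(input: string, pos: number): Result<T>
}

// Stand-in for sigma's `string` parser.
function str(match: string): Parser<string> {
  return {
    parse(input, pos) {
      const end = pos + match.length
      return input.startsWith(match, pos)
        ? { isOk: true, span: [pos, end], pos: end, value: match }
        : { isOk: false, span: [pos, pos], pos, expected: match }
    }
  }
}

// Hypothetical `freeText`: retry the target parser at each offset until it succeeds.
function freeText<T>(target: Parser<T>): Parser<T> {
  return {
    parse(input, pos) {
      let last = target.parse(input, pos)
      for (let at = pos + 1; !last.isOk && at <= input.length; at++) {
        last = target.parse(input, at)
      }
      return last
    }
  }
}

const input = 'RANDOM TEXT hello world RANDOM TEXT'
const found = freeText(str('hello world')).parse(input, 0)

// The span of the match is enough to recover both the consumed text and the leftovers.
const consumed = found.isOk ? input.slice(found.span[0], found.span[1]) : ''
const rest = found.isOk ? input.slice(0, found.span[0]) + input.slice(found.span[1]) : input
```

Retrying at every offset is quadratic in the worst case, so a real implementation would probably want something smarter.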

feat: poc

Proof of concept implementation with basic combinators and capabilities.

feat: negative/positive look-ahead/behind, non-consuming surrounding context parsers, negation parsers

There are multiple things described in this issue, but I think it's better to keep them close.

I'm doing some free-text parsing while migrating from a regular-expression-based solution, and I couldn't fully specify the surrounding context. when is the only parser that can work with context, but the documentation doesn't say whether it consumes input (I assume it does, which raises the question of how it differs from sequence). Also, when only allows specifying the preceding context, and I'm not sure what I should use to specify the context after the target parser.

It would be nice to have a non-consuming parser (context) and a negation parser (not), so that the surrounding context could be specified, e.g.:

// context(before, target, optional after)

const helloWorld = string('hello world');
const strictHelloWorld = context(
    not(letter()),
    helloWorld,
    not(whole())
);

const parser1 = sequence(any(), strictHelloWorld, any());
const parser2 = sequence(any(), helloWorld, any());

run(parser1).with('1hello worldA'); // result.isOk = true, value = ['1', 'hello world', 'A'];
run(parser1).with('Ahello world1'); // result.isOk = false

run(parser2).with('1hello worldA'); // result.isOk = true, value = ['1', 'hello world', 'A'];
run(parser2).with('Ahello world1'); // result.isOk = true, value = ['A', 'hello world', '1'];

It would also be nice to have some default non-consuming parsers such as wordBoundary. This would require polyfilling the JS \b, since it isn't Unicode-friendly.

More examples can be found in the codesandbox.

feat: spans

After working with some Rust parser combinator libraries like chumsky, I feel it would be really handy either to produce spans by default or to allow mapping with spans.

A span is simply a pair of numbers, a tuple like [start: number, end: number], which points to some range in the source code we are parsing or parsed. That is actually a must for quality error reporting and diagnostics.
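
As a small illustration of what spans enable, the sketch below slices the source with a span and converts an offset into a line/column pair for diagnostics. `sliceSpan` and `lineColumn` are hypothetical helpers, not part of sigma, and the sketch assumes the end of a span is exclusive.

```typescript
type Span = [start: number, end: number]

// Return the source text a span points to (the end offset is treated as exclusive).
function sliceSpan(source: string, span: Span): string {
  return source.slice(span[0], span[1])
}

// Convert an offset into a 1-based line/column pair for diagnostics.
function lineColumn(source: string, offset: number): { line: number; column: number } {
  let line = 1
  let column = 1
  for (let i = 0; i < offset && i < source.length; i++) {
    if (source[i] === '\n') {
      line++
      column = 1
    } else {
      column++
    }
  }
  return { line, column }
}

const source = '(1, 2, (3, 4))'
const inner = sliceSpan(source, [7, 13]) // '(3, 4)'
```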

docs(vitepress): automate sidebar construction

Right now, every time we want to add a new parser/combinator and write its documentation, we need to manually add an entry to the sidebar:

return [
  Sidebar.group('Introduction', '/introduction', [
    Sidebar.item('Getting started', '/getting-started')
  ]),
  Sidebar.group('Guides', '/guides', [
    Sidebar.item('Primitives and composites', '/primitives-and-composites')
  ]),
  Sidebar.group('Combinators', '/combinators', [
    Sidebar.item('chainl', '/chainl'),
    Sidebar.item('choice', '/choice'),
    Sidebar.item('error', '/error'),
    Sidebar.item('many', '/many'),
    Sidebar.item('many1', '/many1'),
    Sidebar.item('map', '/map'),
    Sidebar.item('mapTo', '/mapTo'),
    Sidebar.item('optional', '/optional'),
    Sidebar.item('sepBy', '/sepBy'),
    Sidebar.item('sepBy1', '/sepBy1'),
    Sidebar.item('sequence', '/sequence'),
    Sidebar.item('skipUntil', '/skipUntil'),
    Sidebar.item('takeLeft', '/takeLeft'),
    Sidebar.item('takeMid', '/takeMid'),
    Sidebar.item('takeRight', '/takeRight'),
    Sidebar.item('takeSides', '/takeSides'),
    Sidebar.item('takeUntil', '/takeUntil'),
    Sidebar.item('when', '/when')
  ]),
  Sidebar.group('Parsers', '/parsers', [
    Sidebar.item('any', '/any'),
    Sidebar.item('binary', '/binary'),
    Sidebar.item('defer', '/defer'),
    Sidebar.item('eof', '/eof'),
    Sidebar.item('eol', '/eol'),
    Sidebar.item('float', '/float'),
    Sidebar.item('hex', '/hex'),
    Sidebar.item('integer', '/integer'),
    Sidebar.item('letter', '/letter'),
    Sidebar.item('letters', '/letters'),
    Sidebar.item('noneOf', '/noneOf'),
    Sidebar.item('nothing', '/nothing'),
    Sidebar.item('octal', '/octal'),
    Sidebar.item('oneOf', '/oneOf'),
    Sidebar.item('regexp', '/regexp'),
    Sidebar.item('rest', '/rest'),
    Sidebar.item('run', '/run'),
    Sidebar.item('string', '/string'),
    Sidebar.item('tryRun', '/tryRun'),
    Sidebar.item('ustring', '/ustring'),
    Sidebar.item('whitespace', '/whitespace'),
    Sidebar.item('whole', '/whole')
  ])
]

This is at the very least inconvenient. Scanning specific directories and extracting titles from .md files' frontmatter would be enough, since the structure is pretty flat and simple.
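
A rough sketch of the title-extraction part might look like this. The `SidebarItem` shape and both function names are assumptions made for illustration (the real `Sidebar.group` / `Sidebar.item` helpers may differ), and reading the files from disk is left out so that the logic stays pure.

```typescript
// Hypothetical item shape; the real `Sidebar.item` helper may produce something else.
interface SidebarItem {
  text: string
  link: string
}

// Pull `title: ...` out of a leading `---` frontmatter block, if present.
function frontmatterTitle(markdown: string): string | null {
  const block = /^---\r?\n([\s\S]*?)\r?\n---/.exec(markdown)
  if (block === null) return null

  const title = /^title:\s*(.+)$/m.exec(block[1])
  return title === null ? null : title[1].trim().replace(/^['"]|['"]$/g, '')
}

// Build sorted sidebar items from a map of file name -> file contents.
function buildItems(files: Record<string, string>): SidebarItem[] {
  return Object.entries(files)
    .filter(([name]) => name.endsWith('.md'))
    .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0))
    .map(([name, content]) => {
      const slug = name.slice(0, -3) // drop the `.md` extension
      // Fall back to the file name when a page has no frontmatter title.
      return { text: frontmatterTitle(content) ?? slug, link: `/${slug}` }
    })
}
```

Feeding `buildItems` with the contents of each docs directory would then replace the hand-written lists.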

feat: error recovery, handling and mapping

Right now there're no "errors" per se, i.e. all sigma provides users with is text messages and ability to re-map those messages to something custom. This is a shame and should be improved.

Implementation of spans in #34 should help a bit, but we will also need to extend Parser<T> signature with a second generic parameter E, so parsers could bear error type information. All parsers and combinators will be changed accordingly, although I'm pretty sure there'll be hurdles here and there.

Additionally, there should be added two combinators:

  • mapErr(parser, fn) - this combinator will map error from E to some other type using given fn.
  • mapOrElse(parser, okFn, errFn) - this combinator will conditionally apply okFn or errFn depending on the parser's result.

It also makes sense to implement error recovery along with the stuff above. Hopefully, it will be enough to provide a single combinator:

  • recovery(parser, fn) - this combinator takes a parser and a recovery function fn that should produce another parser; similar to when combinator, but it acts only on failures.

Making parsers named (adding name property to Parser<T>) wouldn't hurt as well.
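
To make the proposal more concrete, here is a minimal sketch of `mapErr` and `recovery` over a simplified model. The `Result<T, E>` and `Parser<T, E>` types below are assumptions for illustration and omit spans; they are not sigma's current types.

```typescript
// Simplified result and parser types carrying an error type `E`.
type Result<T, E> =
  | { isOk: true; pos: number; value: T }
  | { isOk: false; pos: number; error: E }

interface Parser<T, E> {
  parse(input: string, pos: number): Result<T, E>
}

// mapErr: transform the error of a failed parse, leave successes untouched.
function mapErr<T, E, F>(parser: Parser<T, E>, fn: (error: E) => F): Parser<T, F> {
  return {
    parse(input, pos) {
      const result = parser.parse(input, pos)
      return result.isOk ? result : { isOk: false, pos: result.pos, error: fn(result.error) }
    }
  }
}

// recovery: on failure, build a fallback parser from the error and run it at the same position.
function recovery<T, E>(parser: Parser<T, E>, fn: (error: E) => Parser<T, E>): Parser<T, E> {
  return {
    parse(input, pos) {
      const result = parser.parse(input, pos)
      return result.isOk ? result : fn(result.error).parse(input, pos)
    }
  }
}

// Example parsers: a single decimal digit, and a fallback that always yields '0'.
const digit: Parser<string, string> = {
  parse(input, pos) {
    const char = input.charAt(pos)
    return char >= '0' && char <= '9'
      ? { isOk: true, pos: pos + 1, value: char }
      : { isOk: false, pos, error: 'digit' }
  }
}

const zero: Parser<string, string> = {
  parse: (_input, pos) => ({ isOk: true, pos, value: '0' })
}

const structured = mapErr(digit, (expected) => ({ code: 'E_EXPECTED', expected }))
const lenient = recovery(digit, () => zero)
```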

docs: rewrite using VitePress

VitePress is still in alpha, but from what I've seen while playing around with it, it's already okay and provides all the essential and useful stuff out of the box.

Feature: error printer

What do you think about including a formatError function?

I wrote a function that takes the input string and a Failure instance, and prints out the surrounding +/- n lines with line-numbers etc.

I don't know if it's wise to include this in the library, since parser errors aren't the only type of errors you'd want to print - though, on the other hand, if the function was just something like formatError(input: string, offset: number, expectation: string), then this could be used to inject whatever expectation you want, to print out semantic errors and so on.

Not 100% sure if this is a good idea, as error formats can be quite different in different projects - but on the other hand, it might be nice to have something you can use just to get your project started and get something useful on the screen without doing a lot of ground work?

Up to you, but I do have something I could PR, if you'd like. 🙂
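
For reference, a function with the signature floated above could look roughly like the sketch below. The `context` parameter (number of surrounding lines) and the exact output layout are assumptions for illustration, not the implementation mentioned in the issue.

```typescript
// Hypothetical formatter: prints the lines around `offset` with line numbers and a caret.
function formatError(input: string, offset: number, expectation: string, context = 2): string {
  const lines = input.split('\n')
  const at = Math.max(0, Math.min(offset, input.length))

  // Locate the 0-based line containing `at` and the offset at which that line starts.
  let line = 0
  let lineStart = 0
  while (line < lines.length - 1 && lineStart + lines[line].length < at) {
    lineStart += lines[line].length + 1 // +1 for the newline character
    line++
  }
  const column = at - lineStart

  const first = Math.max(0, line - context)
  const last = Math.min(lines.length - 1, line + context)
  const width = String(last + 1).length

  const out: string[] = []
  for (let i = first; i <= last; i++) {
    out.push(`${String(i + 1).padStart(width)} | ${lines[i]}`)
    if (i === line) {
      // Point at the failing column and state what was expected there.
      out.push(`${' '.repeat(width)} | ${' '.repeat(column)}^ expected ${expectation}`)
    }
  }
  return out.join('\n')
}
```

Because it only needs an offset and a message, the same function could print semantic errors that do not originate from a parser failure.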

sepBy mutates position even on no match

Describe the bug

When sepBy fails to match at all, it still updates the cursor position. This causes subsequent parsers to fail due to skipped input.

In contrast, many, which also always succeeds, does not update the cursor position when it fails to match at all.

Reproduction

I can't figure out how to get past the pre-commit hooks, so here's a diff of the test and fix instead of a PR:

The test & fix
diff --git a/src/__tests__/combinators/sepBy.spec.ts b/src/__tests__/combinators/sepBy.spec.ts
index ca348b9..0cda6a1 100644
--- a/src/__tests__/combinators/sepBy.spec.ts
+++ b/src/__tests__/combinators/sepBy.spec.ts
@@ -1,4 +1,4 @@
-import { sepBy, sepBy1 } from '@combinators'
+import { sepBy, sepBy1, sequence } from '@combinators'
 import { string } from '@parsers'
 import { run, result, should, describe, it } from '@testing'

@@ -26,6 +26,17 @@ describe('sepBy', () => {

     should.matchState(actual, expected)
   })
+
+  it('should successfully continue if nothing matched', () => {
+    const parser = sequence(
+      sepBy(string('hello'), string('?')),
+      sepBy(string('bye'), string('?')),
+    )
+    const actual = run(parser, 'bye?bye?')
+    const expected = result(true, [[], ['bye', 'bye']])
+
+    should.matchState(actual, expected)
+  })
 })

 describe('sepBy1', () => {
diff --git a/src/combinators/sepBy.ts b/src/combinators/sepBy.ts
index 51b81b1..f42bc32 100644
--- a/src/combinators/sepBy.ts
+++ b/src/combinators/sepBy.ts
@@ -14,6 +14,7 @@ import type { Parser } from '@types'
 export function sepBy<T, S>(parser: Parser<T>, sep: Parser<S>): Parser<Array<T>> {
   return {
     parse(input, pos) {
       // Run the parser once to get the first value.
       const resultP = parser.parse(input, pos)

@@ -37,8 +38,8 @@ export function sepBy<T, S>(parser: Parser<T>, sep: Parser<S>): Parser<Array<T>>

       return {
         isOk: true,
-        span: [pos, resultP.pos],
-        pos: resultP.pos,
+        span: [pos, pos],
+        pos,
         value: []
       }
     }

And here's the test again for visibility, which fails on the current version (3.6.2):

  it('should successfully continue if nothing matched', () => {
    const parser = sequence(
      sepBy(string('hello'), string('?')),
      sepBy(string('bye'), string('?')),
    )
    const actual = run(parser, 'bye?bye?')
    const expected = result(true, [[], ['bye', 'bye']])

    should.matchState(actual, expected)
  })

test: rewrite and refactor to use uvu test runner

Rewrite tests to use uvu test runner instead of jest.
The provided uvu/assert assertions should be enough for our purposes, but it's worth looking for alternatives.

NB. Avoid pointless duplication. Use helpers, these can be used as a base for most cases:

export function testFailure<P extends () => Parser<unknown>>(input: string, parser: P) {
  const actual = run(parser(), input)
  const expected = result('failure', actual.kind === 'failure' ? actual.expected : actual.value)

  should.matchState(actual, expected)
}

export function testSuccess<T, P extends () => Parser<unknown>>(
  input: string,
  value: T,
  parser: P
) {
  const actual = run(parser(), input)
  const expected = result('success', value)

  should.matchState(actual, expected)
}

feat: add common parsers

To avoid reinventing the wheel every time, several vital parsers could be implemented and provided out-of-the-box, such as:

  • whitespace and optional whitespace.
  • letter (single) and letters (multiple), probably should be added along with #3.
  • char could be an ASCII char code or Unicode code point probably.
  • float and int (both signed and unsigned) with support for scientific form.
  • ...

bug: `choice` incorrectly infers a type if given a spreaded array of parsers

When choice is given a spreaded array, e.g.:

choice(...['one', 'two'].map(string))

it incorrectly infers Parser<never> instead of Parser<string>. This is the problem with ToUnion type helper:

// Ok
type U1 = [Parser<string>, Parser<number>, Parser<boolean>]
type R1 = ToUnion<U1> // type R1 = string | number | boolean

// Wrong
type U2 = Array<Parser<string>>
type R2 = ToUnion<U2> // type R2 = never

This is why we need #48...

feat: optional spaces and whitespaces

What problem does this feature solve?

Currently the whitespace parser is strict and requires at least one character to be matched. There are many cases where I need to wrap it in the optional parser, e.g. when the spaces before and after the argument parentheses are optional.

function main() {...}
function main (){...}

Describe the solution

I think aligning this parser with other API would be a little bit more convenient by matching the current pairs such as many and many1, sepBy and sepBy1:

  • whitespaces1 - requires one or more whitespace characters, works as the current whitespace
  • whitespaces - matches zero or more whitespace characters, works as optional(whitespace())
  • (optional) wrapWhitespaces and wrapWhitespaces1 - these should save writing the common pattern of surrounding whitespace over and over again:

sequence(
    functionKeyword,
    takeMid(
        optional(whitespace()),
        functionName,
        optional(whitespace())
    )
)

/* could be turned into */

sequence(
    functionKeyword,
    wrapWhitespaces(functionName)
)

Postinstall fails on 3.6.3

Describe the bug

npm install
npm WARN EBADENGINE Unsupported engine {
npm WARN EBADENGINE   package: '@nrsk/[email protected]',
npm WARN EBADENGINE   required: { node: '>=18.16.0 <=20' },
npm WARN EBADENGINE   current: { node: 'v18.15.0', npm: '9.5.0' }
npm WARN EBADENGINE }
npm ERR! code 1
npm ERR! path /Users/redacted/project/node_modules/@nrsk/sigma
npm ERR! command failed
npm ERR! command sh -c npm run install:benchmarks && npm run install:docs
npm ERR! > @nrsk/[email protected] install:benchmarks
npm ERR! > cd benchmarks && npm i
npm ERR! sh: line 0: cd: benchmarks: No such file or directory

Repro steps

Install 3.6.3 locally. I originally discovered the bug on trying to deploy to render since it caught the latest version due to my package.json having ^3.6.2 for sigma.

System information

  • macOS Ventura 13.2.1
  • Shell: zsh 5.8.1

Docs: `run` and `tryRun` aren't parsers

I had trouble finding the run and tryRun functions, which are located in the "Parsers" section.

I would suggest adding a "Core" section at the top of the menu, and document the core functions of the library API there.

Probably moving the code itself into a "core" folder would make sense as well?

Let me know if you'd like a PR.

ustring() docs/questions

What is the ustring parser for? Assuming you load a valid unicode text file (which in this day and age is every text file) wouldn't it just match everything?

Best guess, this is for something like validating correct binary encoding of JSON files? But is it actually possible to load a non-valid unicode text file into a string in JavaScript?

I was expecting I'd use this for, say, keywords.

But then the "success" example in the documentation says:

Note that the index is 12, which is correct, since every hieroglyph here takes 3 bytes.

String operations in JS generally operate on ranges of UTF-16 code units (which coincide with code points for characters like these), not bytes:

image

So these numbers aren't useful for error reporting, or any subsequent string operation in JS really.

Text editors usually measure positions in code points as well:

image

The documentation itself explains:

This parser is very similar to the string parser, except it takes a bit hacky (though performant) approach, that is based on counting length of the given match string in bytes. It then subslices and compares string slice with that match string.

"hacky though performant", but it seems like this is doing a lot of unnecessary work to figure out a string position that isn't useful for most common use cases, like just matching a keyword or symbol, isn't it?

What I was expecting was a simpler parser that would use String.prototype.includes, which ought to be the fastest native way to check for a specific string at a specific offset, I think?

EDIT: oh, whoops, now I get it! I avoided string because its documentation specifically says it will match "ASCII", which is incorrect. It would in fact match whatever Unicode characters you put in the string. Looks like a documentation problem.

But I don't see any other parser for simple strings - and nothing relevant in the codebase calling includes.

EDIT: looks like maybe there is room for a small optimization here to avoid copying.

It's also difficult to think of a name for such a parser, now that string is taken. 😅

(I know I'm submitting a lot of feedback! I am already somewhat invested in this lovely library, and I do want to help out - if you want me to submit PRs for anything, let me know.)

EDIT: let me know if you'd like me to correct the documentation and/or try the minor optimization/simplification with includes in the string parser.
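
One note on the suggested optimization, as a sketch rather than a claim about sigma's internals: `String.prototype.includes(search, position)` reports a match anywhere at or after `position`, whereas `String.prototype.startsWith(search, position)` checks for a match exactly at `position`, so the latter is probably the closer fit for a parser. `matchesAt` below is a hypothetical helper name.

```typescript
// Check whether `match` occurs in `input` exactly at `pos`, without slicing a copy.
// Positions are measured in UTF-16 code units, like every other JS string index.
function matchesAt(input: string, match: string, pos: number): boolean {
  return input.startsWith(match, pos)
}

const text = 'let x'

const anywhere = text.includes('x', 0)     // true: 'x' occurs somewhere at or after 0
const exactly = matchesAt(text, 'x', 0)    // false: position 0 holds 'l'
const atOffset = matchesAt(text, 'x', 4)   // true: 'x' sits exactly at position 4
```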

Result model: span vs pos

Hey,

I was looking over these types:

/** Represents failed execution. */
export type Failure = {
  readonly isOk: false
  readonly span: Span
  readonly pos: number
  readonly expected: string
}

/** Represents successful execution. */
export type Success<T> = {
  readonly isOk: true
  readonly span: Span
  readonly pos: number
  readonly value: T
}

I see that spans were added on later.

Maybe I'm missing something, but I was wondering:

  1. In a Success, isn't pos always going to be identical to span[1]?
  2. In a Failure, isn't span[0] always going to be identical to span[1] as well as to pos?

A failure always happens at one specific position, does it not? When would the span mean anything?

And a success always has both a start and an end - even if these are identical, that would signify a zero-length match, so both values are always meaningful, right?

Any particular reason you wouldn't just deprecate or remove pos?

Or just have plain start and end properties for Success, and pos for Failure? Is there any practical advantage to having those tuples, maybe for source maps or something? Does it matter if those properties have the same names/types?

Just wondering.

This library looks amazing btw. 😄

Feature: grammar helper

What problem does this feature solve?

This is mainly for convenience, but it does solve the problem with the existing defer function, which relies on initialization separate from creation. My proposed grammar helper would be statically type-checked, and wouldn't need an error-handler.

Describe the solution

If we look at the example for defer:

interface NumberNode {
  type: 'number'
  span: Span
  value: number
}

interface ListNode {
  type: 'list'
  span: Span
  value: Array<NumberNode | ListNode>
}

const TupleList = defer<ListNode>()
const TupleNumber = defer<NumberNode>()

TupleNumber.with(
  map(
    integer(),
    (value, span) => ({ type: 'number', span, value })
  )
)

TupleList.with(
  map(
    takeMid(
      string('('),
      sepBy(choice(TupleList, TupleNumber), string(',')),
      string(')')
    ),
    (value, span) => ({ type: 'list', span, value })
  )
)

Here is that example implemented with the grammar helper:

  const tupleGrammar = grammar({
    tupleNumber(): Parser<NumberNode> {
      return map(
        integer(),
        (value, span) => ({ type: 'number', span, value })
      )
    },
    tupleList(): Parser<ListNode> {
      return map(
        takeMid(
          string('('),
          sepBy(choice(this.tupleList, this.tupleNumber), string(',')),
          string(')')
        ),
        (value, span) => ({ type: 'list', span, value })
      )
    },
  });

Here is a test demonstrating how to use the resulting grammar:

  is.equal(
    run(tupleGrammar.tupleList).with('(1,2,(3,(4,5)))'),
    {
      isOk: true,
      span: [ 0, 15 ],
      pos: 15,
      value: {
        type: 'list',
        span: [ 0, 15 ],
        value: [
          { type: 'number', span: [ 1, 2 ], value: 1 },
          { type: 'number', span: [ 3, 4 ], value: 2 },
          {
            type: 'list',
            span: [ 5, 14 ],
            value: [
              { type: 'number', span: [ 6, 7 ], value: 3 },
              {
                type: 'list',
                span: [ 8, 13 ],
                value: [
                  { type: 'number', span: [ 9, 10 ], value: 4 },
                  { type: 'number', span: [ 11, 12 ], value: 5 }
                ]
              }
            ]
          }
        ]
      }
    }
  );

And here is my preliminary implementation:

import { Parser } from "@nrsk/sigma";

type Grammar<T> = {
  [P in keyof T]: T[P] extends () => any ? ReturnType<T[P]> : never;
};

type GrammarInit<T> = T & ThisType<Grammar<T>>;

type GrammarType = {
  [name: string]: () => Parser<any>;
};

export function grammar<T extends GrammarType>(init: GrammarInit<T>): Grammar<T> {
  const grammar = {} as { [key: string]: Parser<any> };

  const initialized = {} as { [key: string]: true };
  
  for (const key in init) {
    grammar[key] = {
      parse(input, pos) {
        if (! initialized[key]) {
          initialized[key] = true;

          grammar[key] = (init[key] as any).apply(grammar);
        }
        
        return grammar[key].parse(input, pos);
      },
    } as Parser<any>;
  }

  return grammar as Grammar<T>;
}

Here is a screenshot demonstrating IDE support:

image

As you can see, this works with the circular references, which is possible with the magical ThisType in TS.

It's doing basically the same thing as defer for each member, so of course this works with circular references at run-time as well.

I didn't benchmark it against defer, and it might need some optimization, and the types could probably use a little work.

But what do you think, would you welcome a PR for this feature? 🙂

Optimizations

I was curious to see how Sigma would stack up against other parsers - the biggest benchmark I know is Chevrotain's, so I added Sigma's JSON parser example to it:

mindplay-dk/chevrotain@d8fd236

image

Although it is 4 times slower than Chevrotain, Sigma is definitely in the lead. 🙂

Chevrotain is the fastest JS parser library I know of, so it would probably be difficult to beat.

And of course, this is without making any attempts to optimize Sigma's implementation of the JSON parser at all.

I did a quick profile, and sequence looks like the biggest bottleneck at the moment:

image

It might be worth optimizing and benchmarking further. Chevrotain might be worth referencing for optimizations as well; I know the author put a lot of work into that.

As previously mentioned, performance is not the main reason I picked this library, but I do think it's important, and if there are any "easy wins", it might be worthwhile to investigate this a bit further.

I might take a closer look at some point; just leaving this here for now. 🙂

defer() should error?

Just wondering, I noticed this:

return {
  isOk: false,
  span: [pos, pos],
  pos,
  expected: `Deferred parser wasn't initialized.`
}

I'm not sure this makes sense as a parser error?

It's not that parsing failed - it's that there is something wrong with your code.

So I think maybe it would be more appropriate to throw an error here?

You might have had some sort of reason for this; it looks a little off to me, so I figured I'd ask. 🙂
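
For comparison, a throwing variant could be sketched as follows. The `Parser`, `Result` and `Deferred` types are simplified stand-ins written for this example; this is not sigma's actual `defer` source.

```typescript
// Simplified stand-ins for sigma's types.
type Span = [number, number]
type Result<T> =
  | { isOk: true; span: Span; pos: number; value: T }
  | { isOk: false; span: Span; pos: number; expected: string }

interface Parser<T> {
  parse(input: string, pos: number): Result<T>
}

interface Deferred<T> extends Parser<T> {
  with(parser: Parser<T>): void
}

function defer<T>(): Deferred<T> {
  let inner: Parser<T> | null = null

  return {
    with(parser) {
      inner = parser
    },
    parse(input, pos) {
      if (inner === null) {
        // A programming mistake rather than a parse failure, so fail loudly.
        throw new Error(`Deferred parser wasn't initialized.`)
      }
      return inner.parse(input, pos)
    }
  }
}
```

The trade-off is that callers of `run` would then see an exception instead of a result with `isOk: false`.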

feat: number parsers

What should be done

  • Add a new parser: hexadecimal. Should parse a positive whole number in the hexadecimal system. The number should be prefixed with 0x or 0X. Ex: 0xDF, 0X1F.

  • Add a new parser: octal. Should parse a positive whole number in the octal system. The number should be prefixed with 0o or 0O. Ex: 0o230, 0O11.

  • Add a new parser: binary. Should parse a positive whole number in the binary system. The number should be prefixed with 0b or 0B. Ex: 0b1111, 0B1111.

  • Add a new parser: whole. Should parse a positive whole number in the decimal system. Ex: 0, 1, 2.

  • Rename int to integer. The same as whole, except it can be prefixed with a minus - sign.

Notes

  • All parsers return a parsed string, parsing into actual numbers should be done in the userland on demand.
  • If possible, do not use regular expressions for parsing numbers, do benchmarks. Regular expressions are probably faster than comparing chars one-by-one.
  • Do not use factory functions like here to reduce the number of calls.
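
As a starting point for the benchmarks, here is a sketch of a regex-free hexadecimal parser that compares character codes one by one and returns the matched string. The `Parser` and `Result` types are simplified stand-ins, and the error message is an assumption.

```typescript
// Simplified stand-ins for sigma's types.
type Span = [number, number]
type Result<T> =
  | { isOk: true; span: Span; pos: number; value: T }
  | { isOk: false; span: Span; pos: number; expected: string }

interface Parser<T> {
  parse(input: string, pos: number): Result<T>
}

// True for the character codes of 0-9, a-f and A-F.
function isHexDigit(code: number): boolean {
  return (code >= 48 && code <= 57) || (code >= 97 && code <= 102) || (code >= 65 && code <= 70)
}

// `0x` or `0X` followed by one or more hex digits; the value stays a string.
function hexadecimal(): Parser<string> {
  return {
    parse(input, pos) {
      const marker = input.charCodeAt(pos + 1)
      // 48 is '0', 120 is 'x', 88 is 'X'.
      const hasPrefix = input.charCodeAt(pos) === 48 && (marker === 120 || marker === 88)

      let end = pos + 2
      while (hasPrefix && end < input.length && isHexDigit(input.charCodeAt(end))) {
        end++
      }

      if (!hasPrefix || end === pos + 2) {
        return { isOk: false, span: [pos, pos], pos, expected: 'hexadecimal number' }
      }

      return { isOk: true, span: [pos, end], pos: end, value: input.slice(pos, end) }
    }
  }
}
```

The octal and binary parsers would differ only in the prefix character and the digit predicate.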
