jazzleware / jazzle-parser

relatively small and ridiculously fast parser for all versions of ECMAScript/Javascript, written in the greatest common divisor of all versions of ECMAScript

License: MIT License

JavaScript 100.00%

parser performance alternative esprima-alternative embedded fast low-power acorn-alternative ecmascript ecmascript-syntax

jazzle-parser's Introduction

jazzle (a.k.a. jsRube)

A small, simple, and ridiculously fast parser for all versions of ECMAScript/Javascript, written in plain ECMAScript3. I have been working on it on and off since September 2015, under the codename 'lube'.

A bug in v8 (and consequently in node) made it very difficult to run on node versions 5 and below. The bug has been resolved, and it now runs smoothly (and faster than any other parser I know of) in node v6.2.0+. Please bear this notice in mind while trying to use this parser.

#Features

It always records the location data, range data, and raw value of every node, and it still parses jQuery-1.4.2 2x faster than esprima 2.7.2 when esprima records no locations/ranges, and 3.5x faster when it does. Funnily enough, it does all of the above while keeping track of as many early errors as I could find in the spec.

It is almost completely esprima-compatible (except when things get annoying, in which case it is acorn-compatible).

#Future

cleaner source
tolerant parsing
even lighter weight
descriptive errors
more comments
finer grained control over parsing (via more options, possibly)
a demo website
standalone regex verifier (currently, regex verification is accomplished by means of the underlying engine's RegExp constructor, which, while not a defensible approach, is the most straightforward; needless to mention, it's also currently the sole approach)

#Using in the browser

Include the file ./dist/jazzle.js in a <script> tag. It exposes the Parser constructor and the parse utility function. One use case could be:

var code = 'sample(code);';
var result;

result = new Parser(code, false).parseProgram();

// or alternatively
result = parse(code, false)

NOTE: in ES versions before ES2015, any given source was treated as a 'script'; in ES2015 and above, this is no longer the case -- a source can be parsed either as a script or as a module. You have to explicitly tell the parser that you want your code parsed as a 'module' rather than a 'script' by passing the value true as the second argument to the Parser constructor or the parse function:

var code = 'import * as a from "l"';
// please note the `true` there; it tells the parser to treat the code as module code,
// because `import`s are module-specific source elements
var result = new Parser(code, /*-->*/true/*<--*/).parseProgram();

#Building

In the jazzle repository's root, run the build script, i.e., ./builder/run.js:

node ./builder/run.js

It bundles the sources under the 'src' directory into a single file, written to dist/jazzle.js. It also runs a self-test after bundling is complete; the parser should only be used if the test stage passes without any errors.

#Quick Testing

Even though a thorough test is performed during the build process (that is, while building via ./builder/run.js), quick tests occasionally come in handy. To run quick tests, do:

node ./test/run.js

#Benchmarking

Before running a benchmark, make sure you have the 'esprima', 'acorn', and 'benchmark' packages installed; if not, install them this way:

npm install esprima@latest
npm install acorn@latest
npm install benchmark@latest

Then run the actual benchmarking facility this way:

node ./bench/run.js

This feeds the corpus located under sources into each parser, asks each one to parse every file while recording node location data, collects the timings for each parser, and prints the results.
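The loop described above can be sketched roughly as follows. This is not the contents of ./bench/run.js: `timeParsers`, the `parsers` map, and the in-memory `corpus` are hypothetical names, and plain `Date.now()` timing stands in for the 'benchmark' package.

```javascript
// Hypothetical sketch: time each parser over every source in a corpus.
// `parsers` maps a display name to a function(source) that parses it.
function timeParsers(parsers, corpus, rounds) {
  var results = {};
  for (var name in parsers) {
    if (!Object.prototype.hasOwnProperty.call(parsers, name)) continue;
    var parseFn = parsers[name];
    var start = Date.now();
    for (var r = 0; r < rounds; r++) {
      for (var i = 0; i < corpus.length; i++) {
        parseFn(corpus[i]);
      }
    }
    results[name] = {
      totalMs: Date.now() - start,
      files: corpus.length,
      rounds: rounds
    };
  }
  return results;
}

// Stand-in "parsers" so the sketch runs without any packages installed.
var demoResults = timeParsers(
  {
    json: function (src) { return JSON.parse(src); },
    split: function (src) { return src.split(','); }
  },
  ['[1,2,3]', '{"a":1}'],
  100
);
console.log(demoResults);
```

To compare real parsers, each entry in `parsers` would wrap one library's parse call with location recording switched on.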

#Using jazzle via npm

First,

npm install jazzle

Then:

var jazzle = require( 'jazzle' );
console.log( jazzle.parse('var v = "hi !";') );

jazzle-parser's Issues

send as few arguments to `err` as possible

this.err currently receives more info via errParams than the baseline error reporter needs.
This was meant for the upcoming tolerant mode, but the tolerant subsystem is going to be implemented in a totally new way, so sending those extra params will no longer be necessary.

Typo at `src/[email protected]#366`

While working on the 0.6 version I noticed what I think is a typo at src/[email protected]#366:

this.err('switch.has.no.opening.curly', startc, stratLoc));

I suppose that the third argument should be startLoc instead of stratLoc.

is there a more straightforward (and possibly more lightweight) approach for tracking the so-called "tricky" nodes?

Hi
Before we begin, let's go over a few definitions.

A "potpat" is a node that has the potential of becoming an assignment- and/or a binding-pattern -- these nodes are MemberExpression, AssignmentExpression, ObjectExpression, Property (almost), Identifier, and ArrayExpression; please note, though, that while [a] is a potpat, -[a] is not, because it can't appear at the left of an assignment.

A "parpos" node is a node that can be an arrow parameter. In ([a, { b = [[e], l] = 12}]), a and b are parpos nodes; in -([a]), there is no parpos, because the - before the ( makes it impossible for the paren to serve as the parameter list of an arrow function.

A "tricky" situation (as I have no other name for it) is, broadly, a situation in which the node is probably an error, but its erroneousness can not be ascertained yet.

consider the following cases:

"use strict";
[ eval, // not yet an error, but as a part of a "potpat", it might well turn into an error.
  arguments = 12 // an error, but may not be the first one, so it will not immediately throw
]
; // <-- now we are sure about `arguments = 12` being an actual error

[ eval,
  arguments = 12
]
= // <-- now we are sure about `eval` being an actual error; `arguments = 12` is no longer the first error
12;

[{ a= b }, // possible error #1
 [ arguments ] = 12, // possible error #2
]
= // raise #2
12;

(
   12, // surprisingly, it is not considered tricky, because it is not potpat
   [(a)], // possible error #1 -- if it is a parpos, `(a)` will be an invalid parameter
   e * 12, // this is not tricky either
   {a=b} // possible error #2 -- if it isn't a parpos, it will be an unsatisfied assignment
)
; // raise #2

(
    [(a)], // possible error #1
   12,
    e * 12, 
    {a=b} // possible error #2
) => /* <-- raise #1 */ 12
// please note that, in the case above, if `12` had come first, it'd have been the error finally raised --
// "possible" errors, like their name suggests, are raised only if no error has happened before them.

function* l() {
   "use strict";
   (a=yield, // possible error #1-- it is a parpos node, and if it turns out to be an actual param, 
                 // it is not allowed to contain a yield expression
    arguments=12 // possible error #2
    )
    ; // raise #2

    (a=yield, // possible error #1
     arguments = 12 // possible error #2
    ) => /* <-- raise #1 */ 'l';
}

There is also an extreme case; considering we are in a generator, what should the error for the following code be?

(yield)=>12

Rather simple -- it should be "yield can not be an arrow parameter when in a generator".

But what about this one (again considering we are inside a generator)?

(yield = 12) => 12

This one should raise the same error as above; but this means we should actually postpone an outright error -- a syntactic one (rather than a semantic one, which would've been easier to deal with) -- until the ) is reached.
But of course things are not that hard, and the "syntactical" error we are worrying about is only contextually considered a syntax error, since in a non-generator, non-strict context, yield = 12 is indeed allowed.

The case is dismissed though -- yield is an actual keyword inside a generator, so (yield)=>12 and (yield=12)=>12 will be just as erroneous as (while)=>12 and (while=12)=>12.

That makes for the foreword.

Jazzle is currently using multiple variables (firstYS, parenYS, firstElemWithYS, firstParen, firstUnassignable, firstNonTailRest, firstEA, firstEAContainer 💦) to track all the issues above, and while it does a decent job tracking all these tricky cases, it still seems to be doing it in a more complicated fashion than it actually should.
I believe a more straightforward (and more lightweight) approach has got to be found.

As an analogy, there are two ways of counting sheep.

One is to count the hooves, and divide the result by 4; this approach needs top-notch counting skills.
The other is to count the heads; even I can do it.

But it looks like jazzle in its current state is counting the hooves.
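One possible way of "counting the heads" is sketched below, purely as an illustration and not as jazzle's actual code. `PendingErrors`, the `when` tags, and the offsets are all hypothetical, and the sketch assumes every possible error can be tagged with the single interpretation under which it becomes a real one, which glosses over overlaps between patterns and parameters.

```javascript
// Hypothetical sketch: one list of pending ("possible") errors instead of
// several first* variables. Each entry records where it occurred and under
// which final interpretation of the enclosing node it becomes a real error:
//   'expr'    -> only if the node ends up a plain expression
//   'pattern' -> only if it ends up an assignment/binding pattern
//   'param'   -> only if it ends up an arrow parameter list
//   'always'  -> an error either way, but possibly not the first one
function PendingErrors() {
  this.list = [];
}

PendingErrors.prototype.add = function (offset, when, message) {
  this.list.push({ offset: offset, when: when, message: message });
};

// Called once the parser knows what the node really is (at `;`, `=` or `=>`).
// Returns the earliest error that applies, or null if there is none.
PendingErrors.prototype.resolve = function (kind) {
  var first = null;
  for (var i = 0; i < this.list.length; i++) {
    var e = this.list[i];
    if (e.when !== 'always' && e.when !== kind) continue;
    if (first === null || e.offset < first.offset) first = e;
  }
  this.list = [];
  return first;
};

// "use strict"; [ eval, arguments = 12 ]
function strictArrayCase() {
  var pending = new PendingErrors();
  pending.add(2, 'pattern', 'eval can not be an assignment target');
  pending.add(8, 'always', 'arguments can not be assigned to');
  return pending;
}

// ( [(a)], 12, {a=b} )
function parenCase() {
  var pending = new PendingErrors();
  pending.add(3, 'param', '(a) is not a valid parameter');
  pending.add(8, 'param', '12 is not a valid parameter');
  pending.add(12, 'expr', '{a=b} is an unsatisfied assignment');
  return pending;
}

var asExpr = strictArrayCase().resolve('expr');       // followed by `;`
var asPattern = strictArrayCase().resolve('pattern'); // followed by `=`
var parenAsExpr = parenCase().resolve('expr');        // followed by `;`
var parenAsParams = parenCase().resolve('param');     // followed by `=>`
console.log(asExpr.message, '|', asPattern.message);
console.log(parenAsExpr.message, '|', parenAsParams.message);
```

With this shape, the decision about which error to raise is made in one place, at the token that settles the node's role, rather than being spread across several flags.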

the submodules for `export/import` need some re-evaluation

aside from cleaning them up, early errors are getting neglected in the meantime (duplicate exports, etc.)
also check whether they are indeed appearing as module-level constructs.

bugs, missing features and spec violation

@icefapper

Import and export are broken: early errors are not handled. E.g.:

var a, b; export default a; export { b as default };
export { a, b as c }
export default 1; export default 2;
export { a as default }
export { a }

All of these fail. See also #12
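A minimal sketch of the duplicate-export check follows, assuming the parser can collect the exported names of a module in source order. `findDuplicateExport` is a hypothetical helper, not something that exists in jazzle.

```javascript
// Hypothetical sketch: report the first exported name that is exported twice.
// `exportedNames` lists the exported names in source order, e.g. for
//   export default a; export { b as default };
// the list is ['default', 'default'].
function findDuplicateExport(exportedNames) {
  var seen = {};
  for (var i = 0; i < exportedNames.length; i++) {
    // The prefix keeps names such as __proto__ from clashing with
    // built-in object properties.
    var key = '%' + exportedNames[i];
    if (seen[key] === true) return exportedNames[i];
    seen[key] = true;
  }
  return null;
}

// export default a; export { b as default };
console.log(findDuplicateExport(['default', 'default'])); // prints "default"
// export { a, b as c }
console.log(findDuplicateExport(['a', 'c'])); // prints null
```

A non-null result would correspond to an early error reported at the second occurrence of the name.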

Async parses for ES6
Exponent parses for ES6
ESTree violation. The author calls it "extra information".

And does not handle

({a({e: a.b}){}})
(function* ({e: a.b}) {})
(function ({e: a.b}) {})

And most of the todos in the code haven't been fixed for months

Build system introduces redundant boilerplate

I noticed that the build system has created a lot of boilerplate and indirection in the bundle. It's effectively a concatenation script, but with some special semantics that are plainly very unidiomatic, and introduce numerous otherwise-unnecessary closures at load time.

To give an example, here's a snippet from your src/[email protected]

this.err = function(errorType, errParams) {
  errParams = this.normalize(errParams);
  return this.errorListener.onErr(errorType, errParams);
};

You could just as easily do this instead:

Parser.prototype.err = function(errorType, errParams) {
  errParams = this.normalize(errParams);
  return this.errorListener.onErr(errorType, errParams);
};

Doing this would let you just make your build script a glorified cat program, that just happens to wrap everything in an IIFE.

This would be much simpler to write and maintain, even if you decided to keep the naming convention. In fact, UglifyJS2 already does something similar.
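As a rough illustration of the suggestion, a concatenating bundler could look like the sketch below. `bundle` and its arguments are hypothetical; a real build script would read the files under src and write the result to dist/jazzle.js instead of evaluating it in place.

```javascript
// Hypothetical sketch of a "glorified cat" bundler: concatenate the source
// texts in order and wrap the result in a single IIFE that returns
// whatever `exportExpr` names.
function bundle(sources, exportExpr) {
  return '(function () {\n"use strict";\n' +
    sources.join('\n') +
    '\nreturn ' + exportExpr + ';\n})()';
}

var bundled = bundle([
  'function Parser(src) { this.src = src; }',
  'Parser.prototype.err = function (errorType) {' +
    ' return errorType + " in " + this.src; };'
], 'Parser');

// Evaluating the bundle text yields whatever the IIFE returns.
var BundledParser = new Function('return ' + bundled)();
console.log(new BundledParser('a.js').err('oops')); // prints "oops in a.js"
```

Because every source file assigns to Parser.prototype directly, the order of concatenation only matters for the file that defines the constructor.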

accept options in relevant locations

that is, the exported parse function and the Parser constructor; the options ought to be esprima/acorn compatible, making jazzle a transparent drop-in.
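One way this could look is sketched below, as an assumption rather than a settled design. `normalizeOptions` is a hypothetical helper that accepts either the current boolean second argument or an options object using esprima-style (`loc`, `range`) or acorn-style (`locations`, `ranges`) names.

```javascript
// Hypothetical sketch: reduce the second argument of parse/Parser to one
// internal shape, whether it is the current boolean `isModule` flag or an
// esprima/acorn-style options object.
function normalizeOptions(arg) {
  var out = { sourceType: 'script', loc: false, range: false };
  if (arg === true) {
    out.sourceType = 'module';
    return out;
  }
  if (arg === null || typeof arg !== 'object') return out;
  if (arg.sourceType === 'module') out.sourceType = 'module';
  out.loc = !!(arg.loc || arg.locations);     // esprima: loc, acorn: locations
  out.range = !!(arg.range || arg.ranges);    // esprima: range, acorn: ranges
  return out;
}

console.log(normalizeOptions(true));
console.log(normalizeOptions({ sourceType: 'module', locations: true }));
```

Keeping the boolean form working means existing calls such as parse(code, false) would not need to change.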

take the website online

jazzle needs its own website, among other things, in order to get recognized.
jazzle.org has to go online as soon as possible.

location issues

@icefapper Hi! I tried this module. Awesome work!

But is there any way I can turn off this misleading location stuff? In Acorn and Esprima it is off by default, and can be activated through options.

And I also noticed that the location is not compatible with either Acorn or Esprima. Is this something that will be fixed?

Missing features

@icefapper

object spread (Acorn has had this for a year at least)
dynamic import
new template features
JSX (a must-have)

`CTX_PAT|CTX_NO_SIMPLE_ERR` hasn't been used in certain locations where it must

it has got to be sent to every parseExpr and parseNonSeqExpr that is not a sub-expr -- top-level exprs in other words.
otherwise there are going to be inconsistencies while parsing them, especially while tracking the so-called "tricky" cases.

About 0.6-dev

@icefapper I noticed the effort you put into splitting the codebase into smaller components but unfortunately, in my opinion, it's not completely on the mark yet.

For now you should only think about node's environment and focus on making the best use of its module system. A practical example of this would be what I've done in the parser folder where:

Prototype functions are defined one-per-file in parser/parse and parser/util.
Shorthands that are meant to be used by only a function (like #asArrowFuncArgList and #asArrowFuncArg being called only by Parser#parseArrow) are now bound to that function module's scope.
The constructor is a function defined in parser/constructor.js.
The parser's entry point is parser/index.js which automatically binds the prototype to its constructor.

Pros & Cons

Pros

Hierarchical organization of modules
No global space pollution (like with the _class)
Better scope management thanks to modules (you expose only what you need)
Natively supported by node (no need for compilation/transpilation of any sort)

Cons

The browser version has to be generated via browserify

Additional notes

While the use of browserify might look a bit restrictive, it is actually very advantageous. Why? It can generate UMD builds! This way jsRube would work as intended regardless of how it is loaded in the browser (be it a src tag, an AMD module loader, a CommonJS module loader, or whatever).

By the way, if you want to discuss privately, I'm always available at [email protected].

`new` submodule requires a serious scrutiny

almost all other submodules have been rewritten. this one requires something along those lines too -- things like CONTEXT_UNASSIGNABLE_CONTAINER are still in there even though they've been swept out elsewhere for quite some time.

What should the error reported by `function *l() { (yield=12)=>12; }` be?

Hello
as it currently stands, this results in an 'unexpected ='; I know it is being rather too pedantic, and that it might require extra work to achieve, but I believe it should be 'invalid parameter name: yield' instead.

Tests are going to fail without being adjusted first

Hello
Just wanted to say that jsRube's ASTs are slightly different from those of esprima; for example, while jsRube keeps nodes' start and end locations ('loc') in 'start' and 'end', respectively, esprima keeps them in 'ranges'. The code at the beginning of the function 'compare' in module './util.js' is actually the 'adjuster' code. Thanks a lot for reading this far.