jcoglan / canopy

A parser compiler for Java, JavaScript, Python, Ruby

License: Mozilla Public License 2.0

Ruby 9.95% HTML 0.59% JavaScript 60.12% Makefile 0.60% SCSS 1.60% Java 15.78% Python 11.35%

canopy's Issues

Labeled element not being passed to action handler as property of element

I have this rule that I would expect to provide 'Expression' as the property 'expression' on the appropriate element passed to the action handler, however it isn't showing up. Plenty of other rules have property labels on them and they show up just fine -- is there something I've missed here?

  ExpressionStatement    <- !("{" / FunctionToken) expression:Expression EOS %makeExpressionStatement

Java parser throws exceptions like crazy

Taking your url.peg and…

import url.*;

public class UrlTest {
    public static void main(String[] argv) {
            try {
                    URL.parse(argv[0]);
                    System.out.println("parsed ok: " + argv[0]);
            } catch (ParseError e) {
                    System.out.println("parse error (" + e + "): " + argv[0]);
            }
    }
}

… and calling it as…

java UrlTest foo

… throws:

Exception in thread "main" java.lang.StringIndexOutOfBoundsException: begin 0, end 4, length 3
    at java.base/java.lang.String.checkBoundsBeginEnd(String.java:3319)
    at java.base/java.lang.String.substring(String.java:1874)
    at url.Grammar._read_scheme(Grammar.java:135)
[…]

There are many unchecked String.substring() uses, for starters…

Fails with "invalid options argument"

Running canopy results in an error:

canopy mygrammar.peg --lang js
invalid options argument
TypeError: invalid options argument
    at optsArg (MY_DIR/node_modules/canopy/node_modules/mkdirp/lib/opts-arg.js:13:11)
    at mkdirp (MY_DIR/node_modules/canopy/node_modules/mkdirp/index.js:11:10)
    at write (MY_DIR/node_modules/canopy/bin/canopy:32:5)
    at Object.<anonymous> (MY_DIR/node_modules/canopy/bin/canopy:37:3)
    at Module._compile (internal/modules/cjs/loader.js:1147:30)
    at Object.Module._extensions..js (internal/modules/cjs/loader.js:1167:10)
    at Module.load (internal/modules/cjs/loader.js:996:32)
    at Function.Module._load (internal/modules/cjs/loader.js:896:14)
    at Function.executeUserEntryPoint [as runMain] (internal/modules/run_main.js:71:12)
    at /usr/local/lib/node_modules/npm/node_modules/libnpx/index.js:268:14

$ npm --version
6.13.7
$ node --version
v13.11.0
$ head node_modules/canopy/node_modules/mkdirp/package.json
{
  "_args": [
    [
      "[email protected]",
...

My assumption is that mkdirp has a breaking change in 1.0.0, and canopy has no version enforced in the package.json.

Python ParseError reported location should account for special characters such as '\t'

Line 3: expected [A-Za-z0-9], "\t"
                ifindex 3
           ^

Because there are two '\t' characters before ifindex, the actual error location is further along the line than the caret indicates.
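One way to keep the caret aligned is sketched below in Python. It assumes the error reporter knows the offending line and a zero-based column; the helper name `caret_line` is hypothetical and not part of Canopy. The idea is to build the padding from the line's own whitespace so that tabs stay tabs:

```python
def caret_line(line, column):
    """Build the marker line that puts '^' under line[column].

    Hypothetical helper, not Canopy's API. Tabs in the prefix are copied
    through unchanged so the terminal expands them exactly as it does in
    the source line printed above; every other character becomes a space.
    """
    prefix = line[:column]
    padding = "".join(ch if ch == "\t" else " " for ch in prefix)
    return padding + "^"


if __name__ == "__main__":
    source = "\t\tifindex 3"
    print(source)
    print(caret_line(source, 2))
```

Because the padding reuses the original tab characters, the caret lands under the right character whatever tab width the terminal uses.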

Feasibility of augmenting actions with enter/exit functions for each node

It would be great if, like in a visitor for babeljs, there was an option to write an enter/exit function for each node. I wonder if that'd be a simple matter or a complicated affair to implement.

My first thought is that it wouldn't be able to do it directly as it traverses the tree because PEG works by matching all the way to the end of the present rule, thereby returning the parsed result starting at the deepest node.

But I wonder how difficult it would be to retain the tree after parse and then apply a depth first traversal of actions in a manner similar to babel.

And as a side note, I wonder if an api similar to babel's path api would map onto a traversal process like that.
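The "retain the tree, then traverse it" idea can be sketched in a few lines of Python. This is only an illustration: the `Node` class is a stand-in that assumes parse-tree nodes expose `text`, `offset` and `elements`, and the `walk`, `enter` and `leave` names are made up for this sketch.

```python
class Node:
    """Stand-in for a parse-tree node (assumed shape: text, offset, elements)."""

    def __init__(self, text, offset=0, elements=None):
        self.text = text
        self.offset = offset
        self.elements = elements or []


def walk(node, enter=None, leave=None):
    """Depth-first traversal: call enter before the children, leave after."""
    if enter is not None:
        enter(node)
    for child in node.elements:
        walk(child, enter, leave)
    if leave is not None:
        leave(node)


if __name__ == "__main__":
    tree = Node("1+2", 0, [Node("1", 0), Node("+", 1), Node("2", 2)])
    walk(tree,
         enter=lambda n: print("enter", repr(n.text)),
         leave=lambda n: print("leave", repr(n.text)))
```

Since the traversal runs after parsing has finished, it does not matter that the parser itself builds nodes from the deepest match outwards.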

Discarding white-space (or any other matched rule) by not creating a node.

Consider a simple grammar for a language which expects white-space as a delimiter.
e.g. let @x = 5 let @q= 10

# for illustration only!
grammar SimpleLang
   script          <- kw+
   kw              <- ws* let ws*
   let             <- "let" ws+ var ws* "=" ws* value
   var             <- "@" [A-Za-z] [A-Za-z0-9]* 
   value           <-  string / number / var / objname
   string          <- "\"" ("\\" . / [^"])* "\""
   number          <- [1-9] [0-9]*
   objname         <- [A-Za-z] [A-Za-z0-9]*
   ws              <- [\s]

In order to reduce the number of white-space TreeNodes to one per white-space sequence we can create rules for white-space as:

ws        <- [\s]*
wsp       <- [\s]+

However, the white-space itself is of no importance to the parser, it just uses memory and consumes time being ignored when executing nodes.

Is it wise to add a symbol on the parser rule which means consume but do not add a TreeNode for this match?
e.g.

# "@" means consume but do not add a parser node.
@ws        <- [\s]*
@wsp       <- [\s]+

Another use may be for language specific annotation.

# an action which adds an annotation in a separate programmer defined structure but does not add a parser node.
@annotate  <- "@@" comment %make_annotation

Note: this is different from using the current muting '@', as that mutes parts of an expression rather than an entire rule. Yes, it would be possible to add muting to all uses of ws or wsp, but that makes the grammar appear messy.

Thoughts?

I was unable to test code for url.peg

I tried going through the examples of javascript usage here:
http://canopy.jcoglan.com/langs/javascript.html

but I get the following error:

?> canopy url.peg --lang js
invalid options argument
TypeError: invalid options argument
    at optsArg (/usr/local/lib/node_modules/canopy/node_modules/mkdirp/lib/opts-arg.js:13:11)
    at mkdirp (/usr/local/lib/node_modules/canopy/node_modules/mkdirp/index.js:11:10)
    at write (/usr/local/lib/node_modules/canopy/bin/canopy:32:5)
    at Object.<anonymous> (/usr/local/lib/node_modules/canopy/bin/canopy:37:3)
    at Module._compile (internal/modules/cjs/loader.js:1139:30)
    at Object.Module._extensions..js (internal/modules/cjs/loader.js:1159:10)
    at Module.load (internal/modules/cjs/loader.js:988:32)
    at Function.Module._load (internal/modules/cjs/loader.js:896:14)
    at Function.executeUserEntryPoint [as runMain] (internal/modules/run_main.js:71:12)
    at internal/main/run_main_module.js:17:47

Can anyone tell me what the issue is here?

choice_expression as actionable_expression

Is there a reason why a choice_expression is not an actionable_expression?

`make test` does not work as described in the installation instructions

I tried to follow the installation instructions, and the make test step did not work, so I had to manually copy src to lib.

rubtsov$ make test
npm test
npm ERR! Missing script: "test"
npm ERR!
npm ERR! To see a list of scripts, run:
npm ERR!   npm run

npm ERR! A complete log of this run can be found in:
npm ERR!     /Users/rubtsov/.npm/_logs/2021-11-28T20_35_43_434Z-debug.log
make: *** [test] Error 1

Project status / roadmap

What's the status of this project?

There has been no release and almost no commits since July 2015 - is that because it's "done" (e.g. feature-complete and working) or because you abandoned it?

I'm looking at this as an alternative to PEG.js, which I dislike because of the hard coupling to JS and inline JS statements - what you're doing makes more sense to me, and the project appears to be fairly complete?

If so, perhaps consider tagging a 1.0 release?

Also, have you benchmarked this? e.g. your JSON parser vs PEG.js's JSON parser? It's really important to have some idea about how something like this performs before committing to it.

I'm thinking about building a simple language workbench for this, a bit like the one PEG.js has, but maybe with the addition of pass/fail test-cases, along the lines of regex101.com.

For the record, I don't really know what I'm doing, I'm still pretty new to parsers in general - have been playing with PEG.js and PetitParser (which I ported to PHP and partially to JS as well) and also experimented with my own implementation of a parser combinator library. I really like the idea of combinators and skipping the compile-step, but they appear to perform horribly, so now I'm interested in this :-)

[java] interface declaration and call arguments mismatch

Hi.

Failing grammar excerpt:

true_           <- "true"  %makeTrue

Grammar workaround:

true_           <- "true" sub_elements_workaround %makeTrue
sub_elements_workaround <- "%DUMMY%"?

Action implementation:

    public TrueTreeNode makeTrue(String input, int start, int end, List<TreeNode> elements) {

Without sub_elements_workaround compilation fails because the interface (public interface Actions) says the action method parameters include input, start, end and elements while the generated parser calls the action only with input, start and end.

Installation via npm does not work

Hello, I tried to install canopy via npm install -g canopy about a week ago and got an app that does not work. The build from the git sources worked well for me. Could you please update the npm package?

Develop/debug grammar online

With the grammar shown below, which is based on https://github.com/jcoglan/canopy/blob/main/src/meta_grammar.peg and adapted to be accepted by https://github.com/yhirose/cpp-peglib and its online playground https://yhirose.github.io/cpp-peglib/, we can develop/debug a canopy grammar with instant feedback.

Copy and paste the grammar shown below into the Grammar editor at https://yhirose.github.io/cpp-peglib/, then copy and paste your canopy grammar into the Source code editor and click the Parse button in the upper right corner.

# From https://github.com/jcoglan/canopy/blob/main/src/meta_grammar.peg
# grammar Canopy.MetaGrammar

# ==============================================================================

grammar               <-  _* grammar_name (_* rule)+ _*  #%grammar

grammar_name          <-  'grammar' ":"? _+ object_identifier

rule                  <-  identifier assignment parsing_expression #%rule

assignment            <-  _+ "<-" _+

# ==============================================================================

~_                     <-  [ \t\n\r] / comment

comment               <-  "#" [^\n]*

object_identifier     <-  <identifier ("." identifier)*>

identifier            <-  <[a-zA-Z_] [a-zA-Z0-9_]*>

integer               <-  <[1-9] [0-9]*>

# ==============================================================================

parsing_expression    <-  choice
                       /  choice_part

choice_part           <-  action_expression
                       /  typed_expression
                       /  sequence
                       /  sequence_element

sequence_element      <-  predicated_atom
                       /  repeated_atom
                       /  maybe_atom
                       /  atom

atom                  <-  reference
                       /  terminal
                       /  paren_expression

terminal              <-  literal_string
                       /  ci_string
                       /  char_class
                       /  any_char

# ==============================================================================

action_expression     <-  actionable _+ action_tag #%action

actionable            <-  sequence
                       /  repeated_atom
                       /  maybe_atom
                       /  terminal
                       /  "(" _* actionable _* ")" #%paren_expr

action_tag            <-  "%" identifier

# ==============================================================================

typed_expression      <-  typable _+ type_tag #%extension

typable               <-  sequence
                       /  sequence_element

type_tag              <-  "<" object_identifier ">"

# ==============================================================================

choice                <-  choice_part (_* "/" _* choice_part)+ #%choice

# ==============================================================================

sequence              <-  sequence_part (_+ sequence_part)+ #%sequence

sequence_part         <-  mute? label? sequence_element #%sequence_part

mute                  <-  "@"

label                 <-  identifier ":"

# ==============================================================================

repeated_atom         <-  atom _* quantifier #%repeat

quantifier            <-  "*" / "+" / "{" _* numeric_quantifier _* "}"

numeric_quantifier    <-  integer (_* "," _* integer?)?

# ==============================================================================

paren_expression      <-  "(" _* parsing_expression _* ")" #%paren_expr

predicated_atom       <-  ("&" / "!") _* atom #%predicate

maybe_atom            <-  atom _* "?" #%maybe

reference             <-  identifier !assignment #%reference

literal_string        <-  '"' ("\\" . / [^"])* '"' #%string
                       /  "'" ("\\" . / [^'])* "'" #%string

ci_string             <-  "`" ("\\" . / [^`])* "`" #%ci_string

char_class            <-  "[" "^"? ("\\" . / [^\]])+ "]" #%char_class

any_char              <-  "." #%any_char

When running canopy I get TypeError: mkdirp is not a function

node v20.0.0
npm v9.6.4

MacOS 13.3.1

Canopy installed with 'npm install -g canopy' - from lib/package.json: "version":"0.4.0",

% canopy url.peg --lang python
mkdirp is not a function
TypeError: mkdirp is not a function

If I make either of these changes:

1: const mkdirp = require('mkdirp')
   to
   const {mkdirp} = require('mkdirp')

2: await mkdirp(dirname(name))
   to
   await mkdirp.mkdirp(dirname(name))

all is well.

I'm sure this is a js question, not really a canopy question, but I'd like to understand what's going on!

Cheers

Is it possible to match more general regular expressions?

Is there any support for more generic regular expressions than the ones mentioned in the documentation? For example,

EOL <- \s*((#[^\f\r\n]*)?[\f\r\n])\s*

might be used to parse end-of-lines with optional comments following a # character, or

FLOAT <- (\.[0-9]+|[0-9]+\.[0-9]*)([Ee]-?[0-9]+)?

might be used to parse floating-point numbers.

Problems declaring identifier syntax

I have a PEG grammar with a rule:

identifier   <- "`" [a-zA-Z_][a-zA-Z0-9_.]* "`"

Basically, an identifier can't start with a digit or period. But when I try to compile it I get:

Line 4: expected "?", "*", "+", [\s], "#", [a-zA-Z_]
identifier   <- "`" [a-zA-Z_][a-zA-Z0-9_.]* "`"

Now, I can insert a space to make it a sequence, but I don't want whitespace in my identifiers:

identifier   <- "`" [a-zA-Z_] [a-zA-Z0-9_.]* "`"

Same with the backticks, actually. I wish I could simply specify a regex without having to make it a sequence at all. It's a lexical element...

Comments

Is there a way to add comments to a PEG file?

Question: how to match on end of string?

How can I match on the end of input (or end of input string)?

In my .peg file I have end_of_string <- [$]+, but it does not seem to match the end of the string.

Is there a better way? (Sorry to open an issue; I did look for the answer in the docs and on Stack Overflow first.)

Parsing spaces in a string with Choices

Hi, I love your work and I have a question about parsing a string that contains spaces using Choices.
For example, I have created a grammar:

grammar Weather
      root  <-  'the weather is' ('good' / 'nice')

This will not correctly parse the sentence "the weather is good" because there is a space between "the weather is" and "good". I have to add a space after 'the weather is' in the grammar in order to parse the sentence.

grammar Weather
      root  <-  'the weather is ' ('good' / 'nice')

I'm wondering if there is a better way of doing this.

[java] StringIndexOutOfBoundsException

Hi.

Offending generated code:

            if (offset < inputSize) {
                chunk0 = input.substring(offset, offset + 4);
            }

The code above throws StringIndexOutOfBoundsException when the parser is towards the end of input.

I guess it should be if (offset + 4 < ...) ...

This happens when trying to match a literal string in the grammar.

Might help:

groovy:000> "abc".substring(1,2)
===> b
groovy:000> "abc".substring(1,20)
ERROR java.lang.StringIndexOutOfBoundsException:
String index out of range: 20
        at java_lang_String$substring$2.call (Unknown Source)
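The guard can be modelled in Python to pin down the boundary condition. This is a sketch of the arithmetic only, not the generated Java: `read_chunk` is a hypothetical name, and because Python slices silently truncate where Java's substring() throws, the explicit check stands in for the Java guard. As far as the arithmetic goes, the comparison wants to be `offset + length <= inputSize` rather than a strict `<`, otherwise a literal that ends exactly at the end of input would never be read.

```python
def read_chunk(text, offset, length):
    """Return text[offset:offset + length], or None when fewer than
    `length` characters remain.

    Models the guard `offset + length <= inputSize`; with a strict `<`
    a literal sitting at the very end of the input could not match.
    """
    end = offset + length
    if end <= len(text):
        return text[offset:end]
    return None


if __name__ == "__main__":
    print(read_chunk("foo", 0, 4))     # too short: no chunk is read
    print(read_chunk("http", 0, 4))    # ends exactly at end of input
```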

Support composing/importing grammars

One of the advantages of PEG-based grammars is that they are composable.
It would be very nice if canopy supported either composing or importing grammars.

I have a start of this implementation at https://github.com/byteit101/canopy/tree/multi_grammar but I wasn't sure how to expose actions for Java (multiple interfaces?), but otherwise it should "work". Importing should also be doable, but I haven't implemented that yet.

Add -h and --help flags to the command line tool

I was trying to figure out what options the command line tool has, and my first instinct was to add "-h" or "--help" as most command line tools support. However, this just gives a very uninformative error message.

More generally, it would be nice to display a short usage message when incorrect command line arguments are supplied.

$ canopy -h           
The "to" argument must be of type string. Received undefined
TypeError [ERR_INVALID_ARG_TYPE]: The "to" argument must be of type string. Received undefined
    at new NodeError (node:internal/errors:372:5)
    at validateString (node:internal/validators:120:11)
    at relative (node:path:1192:5)
    at resolvePaths (/usr/local/lib/node_modules/canopy/bin/canopy:27:15)
    at main (/usr/local/lib/node_modules/canopy/bin/canopy:39:33)
    at run (/usr/local/lib/node_modules/canopy/bin/canopy:64:11)
    at Object.<anonymous> (/usr/local/lib/node_modules/canopy/bin/canopy:72:1)
    at Module._compile (node:internal/modules/cjs/loader:1099:14)
    at Object.Module._extensions..js (node:internal/modules/cjs/loader:1153:10)
    at Module.load (node:internal/modules/cjs/loader:975:32)

$ canopy --help
The "to" argument must be of type string. Received undefined
TypeError [ERR_INVALID_ARG_TYPE]: The "to" argument must be of type string. Received undefined
    at new NodeError (node:internal/errors:372:5)
    at validateString (node:internal/validators:120:11)
    at relative (node:path:1192:5)
    at resolvePaths (/usr/local/lib/node_modules/canopy/bin/canopy:27:15)
    at main (/usr/local/lib/node_modules/canopy/bin/canopy:39:33)
    at run (/usr/local/lib/node_modules/canopy/bin/canopy:64:11)
    at Object.<anonymous> (/usr/local/lib/node_modules/canopy/bin/canopy:72:1)
    at Module._compile (node:internal/modules/cjs/loader:1099:14)
    at Object.Module._extensions..js (node:internal/modules/cjs/loader:1153:10)
    at Module.load (node:internal/modules/cjs/loader:975:32)

Question about the license of the generated code

First off, I'm pretty impressed with your tool. However, my employer is somewhat scared off by the GPL license of Canopy itself, and it is unclear to me whether the code generated by Canopy is also licensed under the GPL or something else.

Could you please clarify this for me?

Error when trying to assign makeProgram action to Program grammar rule

Why would I encounter this error when I try to assign an action onto my program node?

SyntaxError: Line 166: expected [\s], "#", "<-", [a-zA-Z_], "(", "&", "!", '"', "'", "`", "[", ".", "<", "/"
  Program         <- SourceElements %makeProgram

Grammar up to that point is as follows

  Start <- (__) Program
  Program <- SourceElements %makeProgram (my action) 
  SourceElements <- Statement
  Statement <- ExpressionStatement
  .... so on until an Identifier. is the rest relevant?

add commonjs support

If you aren't supporting commonjs already, could you add support for it? I'd like to use your parser/generator on a node project

Indented parsing

Hi, I was wondering if it's possible to also do indented parsing with canopy? Thanks a lot.

Feature: discard operator for terminals?

Played around with this on Friday, and the biggest distraction while trying to develop a grammar was whitespace and comment nodes, which generate a lot of noise.

It's pretty common to need to allow optional whitespace (and/or comments) in or around various statements and blocks, and I suppose there could be other types of terminals (commas or other separators, string delimiters, etc.) which could be usefully discarded.

How about adding a discard operator?

Maybe something like a prefix operator @ to "mute" an expression - here's a small portion of JSON grammar with whitespace and commas muted:

grammar CanopyJson

document  <-  @__ (object / array) @__

object    <-  "{" pair (@"," pair)* "}" / "{" @__ "}"

...

__        <-  [\s]*

Would that work?

Don't throw a bounds exception when reporting Java errors at EOF.

This fixes an off-by-one for the Java sources. If the condition is <=, the body will read after the end of the array and throw a bounds exception. This only occurs in the situation of printing an exception that has occurred at exactly the last character.

diff --git a/node_modules/canopy/templates/java/Parser.java b/node_modules/canopy/templates/java/Parser.java
index 8d38b7a..46563ed 100644
--- a/node_modules/canopy/templates/java/Parser.java
+++ b/node_modules/canopy/templates/java/Parser.java
@@ -27,7 +27,7 @@ public class {{name}} extends Grammar {
         String[] lines = input.split("\n");
         int lineNo = 0, position = 0;

-        while (position <= offset) {
+        while (position < offset) {
             position += lines[lineNo].length() + 1;
             lineNo += 1;
         }
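For comparison, here is an alternative way to derive the line and column from an offset that never indexes past the last line. It is a Python sketch of the idea, not the patch above and not Canopy's template; `line_and_column` is a hypothetical name.

```python
def line_and_column(text, offset):
    """Return 1-based (line, column) for a 0-based offset.

    Valid for 0 <= offset <= len(text), so an error reported exactly at
    end of input is handled without reading past the final line.
    """
    if not 0 <= offset <= len(text):
        raise ValueError("offset out of range")
    line = text.count("\n", 0, offset) + 1
    # rfind returns -1 when no newline precedes the offset, giving start 0
    line_start = text.rfind("\n", 0, offset) + 1
    return line, offset - line_start + 1


if __name__ == "__main__":
    print(line_and_column("ab\ncd", 5))  # offset at end of input
```

Counting newlines before the offset avoids stepping through an array of lines, so there is no loop condition to get wrong at the boundary.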

Error thrown for some rules with single elements

These work:

grammar test
number <-  [0-9]? "foo" %action

grammar test
number <-  [0-9]+ %action

grammar test
number <-  [0-9]?

But this fails with a grammar parsing error:

grammar test
number <-  [0-9]? %action

$ canopy grammar.peg
Line 2: expected [\s], "#", [a-zA-Z_], "(", "&", "!", '"', "'", "`", "[", ".", "<", "/"
number <-  [0-9]? %action
                  ^
SyntaxError: Line 2: expected [\s], "#", [a-zA-Z_], "(", "&", "!", '"', "'", "`", "[", ".", "<", "/"
number <-  [0-9]? %action
                  ^
    at Parser.parse (/usr/lib/node_modules/canopy/lib/canopy.js:2788:11)
    at Object.parse (/usr/lib/node_modules/canopy/lib/canopy.js:2794:19)
    at Canopy.Compiler.parseTree (/usr/lib/node_modules/canopy/lib/canopy.js:4527:20)
    at Canopy.Compiler.toSource (/usr/lib/node_modules/canopy/lib/canopy.js:4539:10)
    at Object.compile (/usr/lib/node_modules/canopy/lib/canopy.js:19:21)
    at Object.<anonymous> (/usr/lib/node_modules/canopy/bin/canopy:25:24)
    at Module._compile (internal/modules/cjs/loader.js:689:30)
    at Object.Module._extensions..js (internal/modules/cjs/loader.js:700:10)
    at Module.load (internal/modules/cjs/loader.js:599:32)
    at tryModuleLoad (internal/modules/cjs/loader.js:538:12)

Access to rule name from nodes?

E.g. how do I know what type of an element was matched if the parse tree does not contain the matched rule name?

Let's take a hypothetical programming language, which I'm trying to parse:

class Dog
  property dog_name

  def bark()
    puts "Woof!"
  end

  def sit()

  end
end

And when parsing the class definition, class can contain 0..N (propertyDefinition OR methodDefinition).

When I get the parse tree and am handling the class definition node, its sub-elements are (ideally):

property definition
method definition
method definition

Sure, I could detect the type of node by seeing if the object has key "property_name" (=> it's a property definition) or if it has key "method_name" (=> it's a method definition), but that's just ugly.

I noticed there is _types field in the parser and in parse(..., options), but it seems to be unimplemented.

Any advice on this?
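One possible workaround, sketched in Python, is to attach an action to each rule and have it return a node of a dedicated class, so the caller can dispatch on type instead of probing for keys. Everything named here is hypothetical: the classes, the `%make_property` / `%make_method` action tags the grammar would need, and the assumption that actions are called as (input, start, end, elements). To keep the sketch self-contained, the names are pulled out of the matched text rather than out of `elements`.

```python
from dataclasses import dataclass


@dataclass
class PropertyDef:
    name: str


@dataclass
class MethodDef:
    name: str


class Actions:
    """Hypothetical action handlers; each returns a typed node."""

    def make_property(self, input, start, end, elements):
        # "property dog_name" -> "dog_name"
        return PropertyDef(name=input[start:end].split()[-1])

    def make_method(self, input, start, end, elements):
        # "def bark()" -> "bark"
        return MethodDef(name=input[start:end].split()[1].split("(")[0])


def describe(node):
    """Dispatch on the node's class rather than on which keys it has."""
    if isinstance(node, PropertyDef):
        return "property " + node.name
    if isinstance(node, MethodDef):
        return "method " + node.name
    return "unknown"


if __name__ == "__main__":
    actions = Actions()
    src = "property dog_name"
    print(describe(actions.make_property(src, 0, len(src), [])))
```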

Rules unable to use lookahead only

When defining a rule such as:

end_of_statement <- ';' / eol / &']'

Then the parser (in python and javascript, at least) will hang forever. It only seems to do this when it actually encounters that character, however. It also occurs with negative lookaheads.

TypeError when invoking canopy

Hi -

I installed canopy using npm. I tried one of the simple examples to make sure it compiled to Python. I created the digits.peg file in the working directory and got the following error.

Any ideas?
Thanks.

F:\Users\otheruser\Documents\Python Projects\canopy_practice>canopy digits.peg --lang python
invalid options argument
TypeError: invalid options argument
    at optsArg (C:\Users\username\AppData\Roaming\npm\node_modules\canopy\node_modules\mkdirp\lib\opts-arg.js:13:11)
    at mkdirp (C:\Users\username\AppData\Roaming\npm\node_modules\canopy\node_modules\mkdirp\index.js:11:10)
    at write (C:\Users\username\AppData\Roaming\npm\node_modules\canopy\bin\canopy:32:5)
    at Object.<anonymous> (C:\Users\username\AppData\Roaming\npm\node_modules\canopy\bin\canopy:37:3)
    at Module._compile (internal/modules/cjs/loader.js:776:30)
    at Object.Module._extensions..js (internal/modules/cjs/loader.js:787:10)
    at Module.load (internal/modules/cjs/loader.js:653:32)
    at tryModuleLoad (internal/modules/cjs/loader.js:593:12)
    at Function.Module._load (internal/modules/cjs/loader.js:585:3)
    at Function.Module.runMain (internal/modules/cjs/loader.js:829:12)

Case Ignore or multiple string options are not working

I am trying to evaluate a boolean expression using the library. So I am supporting either AND or and or & or &&.

But when defining the peg file, i is not supported for case-insensitivity, and neither is "AND" / "and" / "&" / "&&".

Even this does not work: when multiple options are given, parsing fails.

Unable to assign action to choice nodes?

Hi,

It seems like the following doesn't parse in the meta grammar:

rule1 <- option1 / option2 %my_action

option1 <- "one"

option2 <- "two"

Is this by design or by accident? ;)

I looked at the parser code, but since I've only started looking at it today, I'm not really confident I understand it yet.

Cheers,

ast

Publish new release to npm

Hi!

The last npm release was published 6 years ago, and quite a few things have been fixed since then. Could you please publish a new release?

Thanks for the awesome library!

ParseError with wildcard characters

This isn't really a bug/issue. More of a question:

I have a bunch of legal text (first few lines):

84R1861 MK-D By: Whitmire S.B. No. 107 A BILL TO BE ENTITLED AN ACT relating to the removal, discipline, or transfer of a public school student. BE IT ENACTED BY THE

Then my grammar:

grammar TXBILL

txbill <- .* `A BILL TO BE ENTITLED` .*

I'd expect this to capture 'A BILL TO BE ENTITLED' and then everything to the left and right of it, yet I get a parsing error on the last line of the text.

For reference, here's that last line:

If this Act does not receive the vote necessary for immediate effect, this Act takes effect September 1, 2015.

I might just be doing something stupid... Any guidance would be super appreciated though :)
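A likely explanation, offered as an assumption rather than a confirmed diagnosis: in a PEG, repetition is greedy and never gives input back, so the leading .* consumes the whole text and the literal then has nothing left to match. A common PEG idiom is to guard the wildcard with a negative lookahead, along the lines of (!`A BILL TO BE ENTITLED` .)*. The Python model below imitates both behaviours; the function names are made up for illustration, and it compares exact case for simplicity.

```python
MARKER = "A BILL TO BE ENTITLED"


def any_star(text, pos):
    """Model of PEG `.*`: consume every remaining character, no backtracking."""
    return len(text)


def literal(text, pos, expected):
    """Match `expected` at pos; return the new position, or None on failure."""
    if text.startswith(expected, pos):
        return pos + len(expected)
    return None


def guarded_star(text, pos, stop):
    """Model of PEG `(!stop .)*`: consume characters until `stop` comes next."""
    while pos < len(text) and not text.startswith(stop, pos):
        pos += 1
    return pos


def parse_greedy(text):
    """Models `.* MARKER .*`; True when the whole text is accepted."""
    pos = literal(text, any_star(text, 0), MARKER)
    return pos is not None and any_star(text, pos) == len(text)


def parse_guarded(text):
    """Models `(!MARKER .)* MARKER .*`; True when the whole text is accepted."""
    pos = literal(text, guarded_star(text, 0, MARKER), MARKER)
    return pos is not None and any_star(text, pos) == len(text)


if __name__ == "__main__":
    sample = "S.B. No. 107 A BILL TO BE ENTITLED AN ACT"
    print(parse_greedy(sample), parse_guarded(sample))
```

In the greedy model the failure is only detected at the end of the input, which would match an error being reported on the last line of the text.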

Pass cross-references / named nodes to actions

the doc says that cross-references / named nodes are an easier way to traverse the tree than using the elements array, but they are not available within %actions. Perhaps they could be passed as an extra argument to the action e.g.:

action(input, start, end, elements, node) {
   // node.Identifier 
}

alternatively, the action could receive the current TreeNode as this:

action(input, start, end, elements) {
   // this.Identifier 
}

[python] adding methods to nodes fails with alternatives

Basing this on the example grammar

grammar Choice
  root  <-  "a" "b" "c" <Foo> / "1" "2" "3" <Bar>

no matter what Foo and Bar define, this will fail on input "123" with:

  File "grammar.py", line 95, in _read_root
    address0.__class__ = type(cls0.__name__ + 'Foo', (cls0, self._types.Foo), {})
TypeError: Cannot create a consistent method resolution
order (MRO) for bases Foo, object

Looking at the generated grammar, the problem seems to be that it tries to set address0.__class__ before checking for FAILURE on the match. In this case the match on "abc" fails, so a Foo node should not be created, but the code still tries to assign that type. Testing for FAILURE before attempting to set the class seems to work.
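The guard described above can be reproduced in isolation. The sketch below is a standalone Python model, not the generated parser: `FAILURE`, `TreeNode` and `extend` are stand-ins written for this illustration, with `FAILURE` assumed to be a plain `object()` sentinel (which would explain why `object` shows up among the bases in the error message).

```python
FAILURE = object()  # stand-in sentinel, assumed to mirror the parser's marker


class TreeNode:
    def __init__(self, text):
        self.text = text


class Foo:
    def shout(self):
        return self.text.upper()


def extend(node, mixin):
    """Mix `mixin` into the node's class, but only for successful matches.

    Without the FAILURE check, the class of the sentinel is `object`, and
    building a type with bases (object, Foo) raises the MRO TypeError.
    """
    if node is FAILURE:
        return node
    cls = node.__class__
    node.__class__ = type(cls.__name__ + mixin.__name__, (cls, mixin), {})
    return node


if __name__ == "__main__":
    print(extend(FAILURE, Foo) is FAILURE)
    print(extend(TreeNode("abc"), Foo).shout())
```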

TypeError: Invalid options argument

Hi, I was trying to run canopy words.peg --lang js on a file I made containing

grammar Words
  root  <-  first:"foo" second:"bar" <Extension>

But every time I run it I get this error

TypeError: invalid options argument
    at optsArg (/usr/lib/node_modules/canopy/node_modules/mkdirp/lib/opts-arg.js:13:11)
    at mkdirp (/usr/lib/node_modules/canopy/node_modules/mkdirp/index.js:11:10)
    at write (/usr/lib/node_modules/canopy/bin/canopy:32:5)
    at Object.<anonymous> (/usr/lib/node_modules/canopy/bin/canopy:37:3)
    at Module._compile (node:internal/modules/cjs/loader:1101:14)
    at Object.Module._extensions..js (node:internal/modules/cjs/loader:1153:10)
    at Module.load (node:internal/modules/cjs/loader:981:32)
    at Function.Module._load (node:internal/modules/cjs/loader:822:12)
    at Function.executeUserEntryPoint [as runMain] (node:internal/modules/run_main:79:12)
    at node:internal/main/run_main_module:17:47

My javascript file looks like this

var words = require('./words');

var types = {
  Extension: {
    convert: function() {
      return this.first.text + this.second.text.toUpperCase();
    }
  }
};

words.parse('foobar', {types: types}).convert()
// -> 'fooBAR'

It's the exact same as the example at the end of the page http://canopy.jcoglan.com/langs/javascript.html

What I tried:

A different .peg file (same error)
Reinstalling canopy from npm (same error)

Add comment to generated parsers: "Generated from .peg file using Canopy"

I just started a project with a generated parser. I didn't know initially that it wasn't hand-written.

It would be very helpful for the generated parser file to start with a comment like:

# Generated from `foo.peg` using Canopy: http://canopy.jcoglan.com/

Question about Improving a grammar

Hi there! So I'm really new to grammars and to canopy as well. I wrote this PEG file based on several things that I read and on online resources. However, it seems that the final result (I'm using Ruby as the output) is not performing very well. Does anyone know if there is a way for me to improve its performance?

https://gist.github.com/crashtech/721a2258335d0d8f7e5a399a58ebc4aa

Can we have predicates?

I love these. They are useful to parse, for example, indentation based languages.

In Peggy.js these are the &{...} and &(...) syntaxes.

Nice project, your docs are great.

Javascript RangeError: Maximum call stack size exceeded

Used Grammar:

query	        <-	term (operation_ term)* _
term	        <-	field _ "(" searchterm ")"
searchterm	<-	word ( cword / word)*

cword	        <-	operation "(" searchterm ")"
word	        <-	not string
operation	<-	(operation)?
operation_	<-	_ ("AND" / "OR" / "&&" / "||") __ not
not	        <-	_ "NOT "? _
field	        <-	not "text:" / "category:" / "requestID:"
string	        <-	_ [a-zA-Z0-9]+ [a-zA-Z0-9*?]* _
_	        <-	[ \t\r\n]*
__	        <-	[ \t\r\n]+

Query to parse

text: (			# invalid to grammar, but check runs
text: ( anychar		# "RangeError: Maximum call stack size exceeded" is raised (chrome)

I have no idea why there should be a recursion problem at this point. If the "searchterm" is removed within the "cword" then it works again, but this should not be a problem.
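One possible cause, stated as an assumption about the grammar as quoted rather than a confirmed diagnosis: the rule operation <- (operation)? refers to itself before consuming any input. A recursive-descent parser that expands such a rule calls itself at the same position indefinitely, and the rule is only reached once a word has matched, which would fit the first query running and the second overflowing. The Python model below shows the effect; `read_operation` and `overflows` are hypothetical names.

```python
def read_operation(text, pos):
    """Models `operation <- (operation)?`: the first step is the rule itself."""
    inner = read_operation(text, pos)  # same rule, same position, no progress
    return inner if inner is not None else pos


def overflows(text):
    """Report whether expanding the rule exhausts the call stack."""
    try:
        read_operation(text, 0)
    except RecursionError:
        return True
    return False


if __name__ == "__main__":
    print(overflows("anychar"))
```

Python raises RecursionError where Chrome reports "Maximum call stack size exceeded"; the mechanism in the model is the same unbounded self-call.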

Allow Java actions to throw ParserError?

There's currently no way for Java actions to throw an error to signal failure, which has been problematic in a larger project I'm implementing with this great piece of software.

I've worked around this by patching the action invocations to throw. (Edited for the newest version.)

diff --git a/node_modules/canopy/lib/builders/java.js b/node_modules/canopy/lib/builders/java.js
index 9a446c4..bc32f0c 100644
--- a/node_modules/canopy/lib/builders/java.js
+++ b/node_modules/canopy/lib/builders/java.js
@@ -132,7 +132,7 @@ class Builder extends Base {

   method_ (name, args, block) {
     this._newline()
-    this._line('TreeNode ' + name + '() {', false)
+    this._line('TreeNode ' + name + '() throws ParseError {', false)
     this._scope(block)
     this._line('}', false)
   }
diff --git a/node_modules/canopy/templates/java/Actions.java b/node_modules/canopy/templates/java/Actions.java
index 74fd824..1f9aa07 100644
--- a/node_modules/canopy/templates/java/Actions.java
+++ b/node_modules/canopy/templates/java/Actions.java
@@ -2,6 +2,6 @@ import java.util.List;

 public interface Actions {
 {{#each actions}}
-    public TreeNode {{this}}(String input, int start, int end, List<TreeNode> elements);
+    public TreeNode {{this}}(String input, int start, int end, List<TreeNode> elements) throws ParseError;
 {{/each}}
 }

This has worked very well: the actions can throw nested errors, and the top level wraps them in a ParseError to pass to the caller.

Since existing implementations without the throws clause are more specific than the interface, this will not affect existing code.

If you'd like any other changes to merge a throwable into the mainline, please let me know what I can do!

Regenerating code from a parse tree?

Pretty simple, I have generated a parse tree and made a few edits to the leaves of the parse tree within the valid ranges of the grammar. Now I want to regenerate the code from the parse tree.

Examples of other parser generators that have reverse functionality:
https://github.com/tjvr/nearley-reverse
https://github.com/lark-parser/lark
Is this possible with Canopy? If it's not implemented yet, I'd appreciate help in tackling this. Thanks.

Make the package web-friendly

Just out of boredom, I started building a playground for Canopy.

Unfortunately, the package isn't "web friendly", so it's not really possible at this time.

The trouble is the Base class, which is hard-coded with dependencies on Node.js's fs and path. Ideally, a package shouldn't depend on Node.js (or on any specific JS environment, if possible).

Some ways around this dependency:

1. Convert templates to .js and require them (a bit clumsy to maintain because of \ escapes.)
2. Convert templates to .js at build-time (but you don't currently have any scripts in package.json)
3. Introduce a compiler (Parcel or Rollup, etc.)

The templates are an implementation detail, so they probably should just be built-in?

You also might consider using modern string templates - this is kind of option 1, but this would enable you to get rid of the template engine dependency as well, and formalize the template arguments, which might be the simplest, most elegant solution.

So instead of templates like this:

if (typeof require === 'function' && typeof exports === 'object') {
  Object.assign(exports, exported);
} else {
  var ns = (typeof this === 'undefined') ? window : this;
{{#each namespace}}
  ns = ns.{{this}} = ns.{{this}} || {};
{{/each}}
  ns.{{name}} = exported;
}

Instead, you'd have an actual function, like this:

module.exports = ({ name, namespaces }) => `
if (typeof require === 'function' && typeof exports === 'object') {
  Object.assign(exports, exported);
} else {
  var ns = (typeof this === 'undefined') ? window : this;
  ${namespaces
    .map(namespace => `ns = ns.${namespace} = ns.${namespace} || {};`)
    .join("\n  ")}
  ns.${name} = exported;
}
`

The templates, with extensions like .js, .rb and .java, aren't valid code in the first place, because of the placeholders.

You could probably make these functions compute/return the filename as well. That way it's all neatly tucked away with no dependency on the file system. Only the CLI front-end would depend on the file system, which is appropriate, since that's specifically designated for Node only, while the library code would be free from any dependencies on more than the standard JS run-time.

Thoughts? 🙂

A suggestion regarding %make node builders

First off, thanks a lot for sharing this project. It is the best I have found for this purpose, and the fact that it supports several target environments is really awesome.
I have a suggestion: it could be a good idea, and quite a simple one, to allow for a generic %make function that would receive the node type as an argument. For the moment, we need to define as many %make_xxxx functions as there are node types to be handled, but all my %make_xxxx functions do exactly the same thing: they instantiate a custom Node object, the only difference being the node type. So it's like:

class Actions(object):

    def make_number(self, input, start, end, elements):
        return ParseNode('number', input, start, end, elements)

    def make_float(self, input, start, end, elements):
        return ParseNode('float', input, start, end, elements)

    def make_variable(self, input, start, end, elements):
        return ParseNode('variable', input, start, end, elements)

    def make_expression(self, input, start, end, elements):
        return ParseNode('expression', input, start, end, elements)

    # and so on, for about thirty node types.

I suspect it will be quite common to instantiate the same object type, with the node type as an argument. So if the node type were among the arguments of the %make functions, we could define a single %make function.
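Until something like that exists, a possible workaround on the Python side is to synthesize the `make_xxxx` methods on demand with `__getattr__`. This is only a sketch: it assumes the generated parser looks actions up as plain attributes on the actions object and calls them with `(input, start, end, elements)`, as in the class above. `GenericActions` is a hypothetical name, and `ParseNode` here is a stand-in for the custom node class from the post.

```python
class ParseNode:
    """Stand-in for the custom node class; only stores its arguments."""

    def __init__(self, node_type, input, start, end, elements):
        self.node_type = node_type
        self.text = input[start:end]
        self.start = start
        self.end = end
        self.elements = elements


class GenericActions:
    PREFIX = "make_"

    def __getattr__(self, name):
        # __getattr__ runs only when normal lookup fails, so any make_*
        # method defined explicitly on a subclass still takes priority.
        if not name.startswith(self.PREFIX):
            raise AttributeError(name)
        node_type = name[len(self.PREFIX):]

        def make(input, start, end, elements):
            # The node type is recovered from the action name itself.
            return ParseNode(node_type, input, start, end, elements)

        return make


actions = GenericActions()
node = actions.make_number("42 + x", 0, 2, [])
print(node.node_type, node.text)
```

The design choice is to derive the node type from the action name, so the grammar keeps using `%make_number`, `%make_float` and so on without any change to Canopy itself.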

Incorrect call to java.lang.String#substring()

In the Java grammar, the generated code will try to cut a substring as in the following pseudocode:

if (offset < input.length)
    result := input.substring(offset, offset + xxx);

substring may clearly be called even when offset + xxx exceeds the input's length. In Java, substring throws an exception in that case, which aborts the parse.

The condition is met trivially by a grammar that expects, at the end of the file, a string longer than the remaining input.

I've patched this in canopy.js as shipped in the npm module, but am not sure how to do this for the source.

diff --git a/node_modules/canopy/lib/canopy.js b/node_modules/canopy/lib/canopy.js
index a575820..ae3c9a9 100644
--- a/node_modules/canopy/lib/canopy.js
+++ b/node_modules/canopy/lib/canopy.js
@@ -3186,7 +3186,7 @@ Canopy.extend(Canopy, {

     chunk_: function(length) {
       var chunk = this.localVar_('chunk', this.null_()), input = 'input', of = 'offset';
-      this.if_(of + ' < inputSize', function(builder) {
+      this.if_(of + ' + ' + length + ' <= inputSize', function(builder) {
         builder._line(chunk + ' = ' + input + '.substring(' + of + ', ' + of + ' + ' + length + ')');
       });
       return chunk;

In this patch, the end of the requested chunk (offset plus length) is checked against the input size before substring is called. This fixes the generated code for me in several grammars.
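The difference between the two guards can be modelled in a few lines of Python. This is a sketch, not the generated parser: Python slices never raise, so `java_substring` is a hypothetical helper that imitates the bounds checks of Java's `String.substring`, and both `read_chunk_*` functions are illustrative names.

```python
def java_substring(text, begin, end):
    # Imitates Java's String.substring(begin, end), which rejects
    # out-of-range indices instead of truncating like a Python slice.
    if begin < 0 or end > len(text) or begin > end:
        raise IndexError(f"begin {begin}, end {end}, length {len(text)}")
    return text[begin:end]


def read_chunk_unpatched(text, offset, length):
    chunk = None
    # Original guard: only checks that the chunk starts inside the input.
    if offset < len(text):
        chunk = java_substring(text, offset, offset + length)
    return chunk


def read_chunk_patched(text, offset, length):
    chunk = None
    # Patched guard: checks that the whole chunk fits inside the input.
    if offset + length <= len(text):
        chunk = java_substring(text, offset, offset + length)
    return chunk


print(read_chunk_patched("foo", 0, 4))       # chunk does not fit
print(read_chunk_patched("http://x", 0, 4))  # chunk fits
```

With the unpatched guard, `read_chunk_unpatched("foo", 0, 4)` raises, which mirrors the exception Java throws when the expected string is longer than the remaining input.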

installation via npm is broken

the 0.3.0 version available on NPM is currently broken. the issue is that its package.json puts no version bounds on mkdirp, so it gets mkdirp==1.0.x, which had a breaking API change from mkdirp==0.5.x, the version canopy 0.3.0 used. this was actually fixed in fc11abe, but never made it into a release.

as a workaround, i put up a mirror that installs correctly at lubieowoce/canopy-compat, but it'd be nice to have an official release that can be installed via the usual channels :)

How to designate which nodes are created?

Aha - one should use actions to pass their desired node back to the parent tree.
