jcoglan / canopy
A parser compiler for Java, JavaScript, Python, Ruby
Home Page: http://canopy.jcoglan.com
License: Mozilla Public License 2.0
I have this rule that I would expect to provide 'Expression' as the property 'expression' on the appropriate element passed to the action handler, however it isn't showing up. Plenty of other rules have property labels on them and they show up just fine -- is there something I've missed here?
ExpressionStatement <- !("{" / FunctionToken) expression:Expression EOS %makeExpressionStatement
Taking your url.peg and…
import url.*;
public class UrlTest {
    public static void main(String[] argv) {
        try {
            URL.parse(argv[0]);
            System.out.println("parsed ok: " + argv[0]);
        } catch (ParseError e) {
            System.out.println("parse error (" + e + "): " + argv[0]);
        }
    }
}
… and calling it as…
java UrlTest foo
… throws:
Exception in thread "main" java.lang.StringIndexOutOfBoundsException: begin 0, end 4, length 3
at java.base/java.lang.String.checkBoundsBeginEnd(String.java:3319)
at java.base/java.lang.String.substring(String.java:1874)
at url.Grammar._read_scheme(Grammar.java:135)
[…]
There are many unchecked String.substring() uses, for starters…
Running canopy results in an error:
canopy mygrammar.peg --lang js
invalid options argument
TypeError: invalid options argument
at optsArg (MY_DIR/node_modules/canopy/node_modules/mkdirp/lib/opts-arg.js:13:11)
at mkdirp (MY_DIR/node_modules/canopy/node_modules/mkdirp/index.js:11:10)
at write (MY_DIR/node_modules/canopy/bin/canopy:32:5)
at Object.<anonymous> (MY_DIR/node_modules/canopy/bin/canopy:37:3)
at Module._compile (internal/modules/cjs/loader.js:1147:30)
at Object.Module._extensions..js (internal/modules/cjs/loader.js:1167:10)
at Module.load (internal/modules/cjs/loader.js:996:32)
at Function.Module._load (internal/modules/cjs/loader.js:896:14)
at Function.executeUserEntryPoint [as runMain] (internal/modules/run_main.js:71:12)
at /usr/local/lib/node_modules/npm/node_modules/libnpx/index.js:268:14
$ npm --version
6.13.7
$ node --version
v13.11.0
$ head node_modules/canopy/node_modules/mkdirp/package.json
{
"_args": [
[
"[email protected]",
...
My assumption is that mkdirp has a breaking change in 1.0.0, and canopy has no version enforced in the package.json.
Line 3: expected [A-Za-z0-9], "\t"
ifindex 3
^
Because there are 2 '\t' characters before ifindex, the actual location of the error is somewhere later than the caret indicates.
It would be great if, like in a visitor for babeljs, there was an option to write an enter/exit function for each node. I wonder if that'd be a simple matter or a complicated affair to implement.
My first thought is that it wouldn't be able to do it directly as it traverses the tree because PEG works by matching all the way to the end of the present rule, thereby returning the parsed result starting at the deepest node.
But I wonder how difficult it would be to retain the tree after parse and then apply a depth first traversal of actions in a manner similar to babel.
And as a side note, I wonder if an API similar to babel's path API would map onto a traversal process like that.
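One way to picture the retained-tree idea: once parsing has finished, walk the resulting tree depth-first and call optional enter/exit hooks for each node. The Python sketch below is only an illustration under stated assumptions. The Node class, its type field, and the layout of the visitor dictionary are hypothetical stand-ins and are not part of Canopy's API.

```python
class Node:
    """Hypothetical stand-in for a parse-tree node kept after parsing."""

    def __init__(self, type, text, elements=()):
        self.type = type
        self.text = text
        self.elements = list(elements)


def traverse(node, visitor, parent=None):
    """Depth-first walk calling optional enter/exit hooks keyed by node type."""
    hooks = visitor.get(node.type, {})
    enter = hooks.get("enter")
    if enter:
        enter(node, parent)
    for child in node.elements:
        traverse(child, visitor, node)
    exit_ = hooks.get("exit")
    if exit_:
        exit_(node, parent)


# Usage: record the order in which the hooks fire.
log = []
tree = Node("sum", "1+2", [Node("number", "1"), Node("number", "2")])
traverse(tree, {
    "sum": {
        "enter": lambda node, parent: log.append("enter sum"),
        "exit": lambda node, parent: log.append("exit sum"),
    },
    "number": {
        "enter": lambda node, parent: log.append("number " + node.text),
    },
})
```

Because the walk happens after parsing, enter hooks run top-down and exit hooks run bottom-up, which is the ordering a babel-style visitor expects.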
Consider a simple grammar for a language which expects white-space as a delimiter.
e.g. let @x = 5 let @q= 10
# for illustration only!
grammar SimpleLang
script <- kw+
kw <- ws* let ws*
let <- "let" ws+ var ws* "=" ws* value
var <- "@" [A-Za-z] [A-Za-z0-9]*
value <- string / number / var / objname
string <- "\"" ("\\" . / [^"])* "\""
number <- [1-9] [0-9]*
objname <- [A-Za-z] [A-Za-z0-9]*
ws <- [\s]
In order to reduce the number of white-space TreeNodes to one per white-space sequence we can create rules for white-space as:
ws <- [\s]*
wsp <- [\s]+
However, the white-space itself is of no importance to the parser; it just uses memory and consumes time being ignored when executing nodes.
Would it be wise to add a symbol to a parser rule that means "consume, but do not add a TreeNode for this match"?
e.g.
# "@" means consume but do not add a parser node.
@ws <- [\s]*
@wsp <- [\s]+
Another use may be for language specific annotation.
# an action which adds an annotation in a separate programmer defined structure but does not add a parser node.
@annotate <- "@@" comment %make_annotation
Note: this is different to using the current muting '@', as that mutes parts of an expression rather than an entire rule. Yes, it would be possible to add muting to all uses of ws or wsp, but that makes the grammar appear messy.
Thoughts?
I tried going through the examples of javascript usage here:
http://canopy.jcoglan.com/langs/javascript.html
but I get the following error:
?> canopy url.peg --lang js
invalid options argument
TypeError: invalid options argument
at optsArg (/usr/local/lib/node_modules/canopy/node_modules/mkdirp/lib/opts-arg.js:13:11)
at mkdirp (/usr/local/lib/node_modules/canopy/node_modules/mkdirp/index.js:11:10)
at write (/usr/local/lib/node_modules/canopy/bin/canopy:32:5)
at Object.<anonymous> (/usr/local/lib/node_modules/canopy/bin/canopy:37:3)
at Module._compile (internal/modules/cjs/loader.js:1139:30)
at Object.Module._extensions..js (internal/modules/cjs/loader.js:1159:10)
at Module.load (internal/modules/cjs/loader.js:988:32)
at Function.Module._load (internal/modules/cjs/loader.js:896:14)
at Function.executeUserEntryPoint [as runMain] (internal/modules/run_main.js:71:12)
at internal/main/run_main_module.js:17:47
Can anyone tell me what the issue is here?
Is there a reason why a choice_expression is not an actionable_expression?
I tried to follow the installation instructions, but the make test step did not work, so I had to manually copy src to lib.
rubtsov$ make test
npm test
npm ERR! Missing script: "test"
npm ERR!
npm ERR! To see a list of scripts, run:
npm ERR! npm run
npm ERR! A complete log of this run can be found in:
npm ERR! /Users/rubtsov/.npm/_logs/2021-11-28T20_35_43_434Z-debug.log
make: *** [test] Error 1
What's the status of this project?
There has been no release and almost no commits since July 2015 - is that because it's "done" (e.g. feature-complete and working) or because you abandoned it?
I'm looking at this as an alternative to PEG.js, which I dislike because of the hard coupling to JS and inline JS statements - what you're doing makes more sense to me, and the project appears to be fairly complete?
If so, perhaps consider tagging a 1.0 release?
Also, have you benchmarked this? e.g. your JSON parser vs PEG.js's JSON parser? It's really important to have some idea about how something like this performs before committing to it.
I'm thinking about building a simple language workbench for this, a bit like the one PEG.js has, but maybe with the addition of pass/fail test-cases, along the lines of regex101.com.
For the record, I don't really know what I'm doing, I'm still pretty new to parsers in general - have been playing with PEG.js and PetitParser (which I ported to PHP and partially to JS as well) and also experimented with my own implementation of a parser combinator library. I really like the idea of combinators and skipping the compile-step, but they appear to perform horribly, so now I'm interested in this :-)
Hi.
Failing grammar excerpt:
true_ <- "true" %makeTrue
Grammar workaround:
true_ <- "true" sub_elements_workaround %makeTrue
sub_elements_workaround <- "%DUMMY%"?
Action implementation:
public TrueTreeNode makeTrue(String input, int start, int end, List<TreeNode> elements) {
Without sub_elements_workaround, compilation fails because the interface (public interface Actions) says the action method parameters include input, start, end and elements, while the generated parser calls the action only with input, start and end.
Hello, I tried to install canopy via npm install -g canopy about a week ago and got an app that does not work. The build from the git sources worked well for me. Could you please update the npm package?
With the grammar shown below, which is based on https://github.com/jcoglan/canopy/blob/main/src/meta_grammar.peg and adapted to be accepted by https://github.com/yhirose/cpp-peglib and its online playground https://yhirose.github.io/cpp-peglib/, we can develop/debug a canopy grammar with instant feedback.
Copy and paste the grammar shown below into the Grammar editor at https://yhirose.github.io/cpp-peglib/, then copy and paste your canopy grammar into the Source code editor and click the Parse button in the upper right corner.
# From https://github.com/jcoglan/canopy/blob/main/src/meta_grammar.peg
# grammar Canopy.MetaGrammar
# ==============================================================================
grammar <- _* grammar_name (_* rule)+ _* #%grammar
grammar_name <- 'grammar' ":"? _+ object_identifier
rule <- identifier assignment parsing_expression #%rule
assignment <- _+ "<-" _+
# ==============================================================================
~_ <- [ \t\n\r] / comment
comment <- "#" [^\n]*
object_identifier <- <identifier ("." identifier)*>
identifier <- <[a-zA-Z_] [a-zA-Z0-9_]*>
integer <- <[1-9] [0-9]*>
# ==============================================================================
parsing_expression <- choice
/ choice_part
choice_part <- action_expression
/ typed_expression
/ sequence
/ sequence_element
sequence_element <- predicated_atom
/ repeated_atom
/ maybe_atom
/ atom
atom <- reference
/ terminal
/ paren_expression
terminal <- literal_string
/ ci_string
/ char_class
/ any_char
# ==============================================================================
action_expression <- actionable _+ action_tag #%action
actionable <- sequence
/ repeated_atom
/ maybe_atom
/ terminal
/ "(" _* actionable _* ")" #%paren_expr
action_tag <- "%" identifier
# ==============================================================================
typed_expression <- typable _+ type_tag #%extension
typable <- sequence
/ sequence_element
type_tag <- "<" object_identifier ">"
# ==============================================================================
choice <- choice_part (_* "/" _* choice_part)+ #%choice
# ==============================================================================
sequence <- sequence_part (_+ sequence_part)+ #%sequence
sequence_part <- mute? label? sequence_element #%sequence_part
mute <- "@"
label <- identifier ":"
# ==============================================================================
repeated_atom <- atom _* quantifier #%repeat
quantifier <- "*" / "+" / "{" _* numeric_quantifier _* "}"
numeric_quantifier <- integer (_* "," _* integer?)?
# ==============================================================================
paren_expression <- "(" _* parsing_expression _* ")" #%paren_expr
predicated_atom <- ("&" / "!") _* atom #%predicate
maybe_atom <- atom _* "?" #%maybe
reference <- identifier !assignment #%reference
literal_string <- '"' ("\\" . / [^"])* '"' #%string
/ "'" ("\\" . / [^'])* "'" #%string
ci_string <- "`" ("\\" . / [^`])* "`" #%ci_string
char_class <- "[" "^"? ("\\" . / [^\]])+ "]" #%char_class
any_char <- "." #%any_char
node v20.0.0
npm v9.6.4
MacOS 13.3.1
Canopy installed with 'npm install -g canopy' - from lib/package.json: "version":"0.4.0",
% canopy url.peg --lang python
mkdirp is not a function
TypeError: mkdirp is not a function
If I change either:
1: const mkdirp = require('mkdirp')
to
const {mkdirp} = require('mkdirp')
or
2: await mkdirp(dirname(name))
to
await mkdirp.mkdirp(dirname(name))
all is well.
I'm sure this is a js question, not really a canopy question, but I'd like to understand what's going on!
Cheers
Is there any support for more generic regular expressions than the ones mentioned in the documentation? For example,
EOL <- \s*((#[^\f\r\n]*)?[\f\r\n])\s*
might be used to parse end-of-lines with optional comments following a # character, or
FLOAT <- (\.[0-9]+|[0-9]+\.[0-9]*)([Ee]-?[0-9]+)?
might be used to parse floating-point numbers.
I have a PEG grammar with a rule:
identifier <- "`" [a-zA-Z_][a-zA-Z0-9_.]* "`"
Basically, an identifier can't start with a digit or period. But when I try to compile it I get:
Line 4: expected "?", "*", "+", [\s], "#", [a-zA-Z_]
identifier <- "`" [a-zA-Z_][a-zA-Z0-9_.]* "`"
Now, I can insert a space to make it a sequence, but I don't want whitespace in my identifiers:
identifier <- "`" [a-zA-Z_] [a-zA-Z0-9_.]* "`"
Same with the backticks, actually. I wish I could simply specify a regex without having to make it a sequence at all. It's a lexical element...
Is there a way to add comments to a PEG file?
How can I match on the end of input (or end of input string)?
In my .peg file I have end_of_string <- [$]+, but it does not seem to match the end of the string.
Is there a better way? (Sorry to make an issue; I did look for the answer in the docs and on Stack Overflow first.)
Hi, I love your work, and I have a question about parsing a string that contains spaces using choices.
For example, I have created a grammar:
grammar Weather
root <- 'the weather is' ('good' / 'nice')
This will not correctly parse the sentence "the weather is good" because there is a space between "the weather is" and "good". I have to add a space after "the weather is" in the grammar in order to parse the sentence.
grammar Weather
root <- 'the weather is ' ('good' / 'nice')
I'm wondering if there is a better way of doing this.
Hi.
Offending generated code:
if (offset < inputSize) {
    chunk0 = input.substring(offset, offset + 4);
}
The code above throws StringIndexOutOfBoundsException when the parser is towards the end of input.
I guess the condition should be if (offset + 4 <= ...) ...
This happens when trying to match a literal string in the grammar.
Might help:
groovy:000> "abc".substring(1,2)
===> b
groovy:000> "abc".substring(1,20)
ERROR java.lang.StringIndexOutOfBoundsException:
String index out of range: 20
at java_lang_String$substring$2.call (Unknown Source)
One of the advantages of PEG-based grammars is that they are composable.
It would be very nice if canopy supported either composing or importing grammars.
I have a start of this implementation at https://github.com/byteit101/canopy/tree/multi_grammar but I wasn't sure how to expose actions for Java (multiple interfaces?), but otherwise it should "work". Importing should also be doable, but I haven't implemented that yet.
I was trying to figure out what options the command line tool has, and my first instinct was to add "-h" or "--help" to it, as most command line tools support. However, this just gives a very uninformative error message.
More generally, it would be nice to display a short usage message when incorrect command line arguments are supplied.
$ canopy -h
The "to" argument must be of type string. Received undefined
TypeError [ERR_INVALID_ARG_TYPE]: The "to" argument must be of type string. Received undefined
at new NodeError (node:internal/errors:372:5)
at validateString (node:internal/validators:120:11)
at relative (node:path:1192:5)
at resolvePaths (/usr/local/lib/node_modules/canopy/bin/canopy:27:15)
at main (/usr/local/lib/node_modules/canopy/bin/canopy:39:33)
at run (/usr/local/lib/node_modules/canopy/bin/canopy:64:11)
at Object.<anonymous> (/usr/local/lib/node_modules/canopy/bin/canopy:72:1)
at Module._compile (node:internal/modules/cjs/loader:1099:14)
at Object.Module._extensions..js (node:internal/modules/cjs/loader:1153:10)
at Module.load (node:internal/modules/cjs/loader:975:32)
$ canopy --help
The "to" argument must be of type string. Received undefined
TypeError [ERR_INVALID_ARG_TYPE]: The "to" argument must be of type string. Received undefined
at new NodeError (node:internal/errors:372:5)
at validateString (node:internal/validators:120:11)
at relative (node:path:1192:5)
at resolvePaths (/usr/local/lib/node_modules/canopy/bin/canopy:27:15)
at main (/usr/local/lib/node_modules/canopy/bin/canopy:39:33)
at run (/usr/local/lib/node_modules/canopy/bin/canopy:64:11)
at Object.<anonymous> (/usr/local/lib/node_modules/canopy/bin/canopy:72:1)
at Module._compile (node:internal/modules/cjs/loader:1099:14)
at Object.Module._extensions..js (node:internal/modules/cjs/loader:1153:10)
at Module.load (node:internal/modules/cjs/loader:975:32)
First off, I'm pretty impressed with your tool. However, my employer is somewhat scared off by the GPL license of Canopy itself, and it is unclear to me whether the code generated by Canopy is also licensed under the GPL or something else.
Could you please clarify this for me?
Why would I encounter this error when I try to assign an action onto my program node?
SyntaxError: Line 166: expected [\s], "#", "<-", [a-zA-Z_], "(", "&", "!", '"', "'", "`", "[", ".", "<", "/"
Program <- SourceElements %makeProgram
Grammar up to that point is as follows
Start <- (__) Program
Program <- SourceElements %makeProgram (my action)
SourceElements <- Statement
Statement <- ExpressionStatement
... and so on until an Identifier. Is the rest relevant?
If you aren't supporting CommonJS already, could you add support for it? I'd like to use your parser/generator on a Node project.
Hi, I was wondering if it's possible to also do indentation-based parsing with canopy? Thanks a lot.
Played around with this on Friday, and the biggest distraction while trying to develop a grammar was whitespace and comment nodes, which generate a lot of noise.
It's pretty common to need to allow optional whitespace (and/or comments) in or around various statements and blocks, and I suppose there could be other types of terminals (commas or other separators, string delimiters, etc.) which could be usefully discarded.
How about adding a discard operator?
Maybe something like a prefix operator @ to "mute" an expression - here's a small portion of JSON grammar with whitespace and commas muted:
grammar CanopyJson
document <- @__ (object / array) @__
object <- "{" pair (@"," pair)* "}" / "{" @__ "}"
...
__ <- [\s]*
Would that work?
This fixes an off-by-one for the Java sources. If the condition is <=, the body will read after the end of the array and throw a bounds exception. This only occurs in the situation of printing an exception that has occurred at exactly the last character.
diff --git a/node_modules/canopy/templates/java/Parser.java b/node_modules/canopy/templates/java/Parser.java
index 8d38b7a..46563ed 100644
--- a/node_modules/canopy/templates/java/Parser.java
+++ b/node_modules/canopy/templates/java/Parser.java
@@ -27,7 +27,7 @@ public class {{name}} extends Grammar {
String[] lines = input.split("\n");
int lineNo = 0, position = 0;
- while (position <= offset) {
+ while (position < offset) {
position += lines[lineNo].length() + 1;
lineNo += 1;
}
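The effect of the comparison can be reproduced outside Java. The Python sketch below mirrors the loop from the patch. The function name line_number and its inclusive flag are made up for this illustration, and the triggering input (an error offset at the very end of text that ends in a newline) is an assumed scenario, chosen because Java's split("\n") drops the trailing empty string much as splitlines() does here.

```python
def line_number(text, offset, inclusive):
    """Count lines up to `offset`, using either `<=` or `<` as the loop guard."""
    lines = text.splitlines()  # like Java's split("\n"): no trailing empty entry
    line_no, position = 0, 0
    while (position <= offset) if inclusive else (position < offset):
        # With the inclusive guard this index can run past the last line.
        position += len(lines[line_no]) + 1
        line_no += 1
    return line_no


text = "abc\n"
end_of_input = len(text)

# Patched comparison: stops after the single line.
patched = line_number(text, end_of_input, inclusive=False)

# Original comparison: tries to read a second, non-existent line.
try:
    line_number(text, end_of_input, inclusive=True)
    original_failed = False
except IndexError:
    original_failed = True
```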
These work:
grammar test
number <- [0-9]? "foo" %action
grammar test
number <- [0-9]+ %action
grammar test
number <- [0-9]?
But this fails with a grammar parsing error
grammar test
number <- [0-9]? %action
$ canopy grammar.peg
Line 2: expected [\s], "#", [a-zA-Z_], "(", "&", "!", '"', "'", "`", "[", ".", "<", "/"
number <- [0-9]? %action
^
SyntaxError: Line 2: expected [\s], "#", [a-zA-Z_], "(", "&", "!", '"', "'", "`", "[", ".", "<", "/"
number <- [0-9]? %action
^
at Parser.parse (/usr/lib/node_modules/canopy/lib/canopy.js:2788:11)
at Object.parse (/usr/lib/node_modules/canopy/lib/canopy.js:2794:19)
at Canopy.Compiler.parseTree (/usr/lib/node_modules/canopy/lib/canopy.js:4527:20)
at Canopy.Compiler.toSource (/usr/lib/node_modules/canopy/lib/canopy.js:4539:10)
at Object.compile (/usr/lib/node_modules/canopy/lib/canopy.js:19:21)
at Object.<anonymous> (/usr/lib/node_modules/canopy/bin/canopy:25:24)
at Module._compile (internal/modules/cjs/loader.js:689:30)
at Object.Module._extensions..js (internal/modules/cjs/loader.js:700:10)
at Module.load (internal/modules/cjs/loader.js:599:32)
at tryModuleLoad (internal/modules/cjs/loader.js:538:12)
How do I know what type of element was matched if the parse tree does not contain the matched rule name?
Let's take a hypothetical programming language, which I'm trying to parse:
class Dog
  property dog_name
  def bark()
    puts "Woof!"
  end
  def sit()
  end
end
And when parsing the class definition, class can contain 0..N (propertyDefinition OR methodDefinition).
When I get the parse tree and am handling the class definition node, its sub-elements are (ideally):
Sure, I could detect the type of node by seeing if the object has the key "property_name" (=> it's a property definition) or the key "method_name" (=> it's a method definition), but that's just ugly.
I noticed there is a _types field in the parser and in parse(..., options), but it seems to be unimplemented.
Any advice on this?
When defining a rule such as:
end_of_statement <- ';' / eol / &']'
Then the parser (in python and javascript, at least) will hang forever. It only seems to do this when it actually encounters that character, however. It also occurs with negative lookaheads.
Hi -
I installed canopy using npm. I tried one of the simple examples to make sure it compiled to Python. I created the digits.peg file in the working directory and got the following error.
Any ideas?
Thanks.
F:\Users\otheruser\Documents\Python Projects\canopy_practice>canopy digits.peg --lang python
invalid options argument
TypeError: invalid options argument
at optsArg (C:\Users\username\AppData\Roaming\npm\node_modules\canopy\node_modules\mkdirp\lib\opts-arg.js:13:11)
at mkdirp (C:\Users\username\AppData\Roaming\npm\node_modules\canopy\node_modules\mkdirp\index.js:11:10)
at write (C:\Users\username\AppData\Roaming\npm\node_modules\canopy\bin\canopy:32:5)
at Object.<anonymous> (C:\Users\username\AppData\Roaming\npm\node_modules\canopy\bin\canopy:37:3)
at Module._compile (internal/modules/cjs/loader.js:776:30)
at Object.Module._extensions..js (internal/modules/cjs/loader.js:787:10)
at Module.load (internal/modules/cjs/loader.js:653:32)
at tryModuleLoad (internal/modules/cjs/loader.js:593:12)
at Function.Module._load (internal/modules/cjs/loader.js:585:3)
at Function.Module.runMain (internal/modules/cjs/loader.js:829:12)
I am trying to evaluate a boolean expression using the library, so I am supporting AND, and, & and &&.
But while defining the peg file, i is not supported for case-insensitive matching, and neither is "AND" / "and" / "&" / "&&".
Even this is not working: when multiple options are given, parsing fails.
Hi,
It seems like the following doesn't parse in the meta grammar:
rule1 <- option1 / option2 %my_action
option1 <- "one"
option2 <- "two"
Is this by design or by accident? ;)
I looked at the parser code, but since I've only started looking at it today, I'm not really confident I understand it yet.
Cheers,
ast
Hi!
The last npm release was published 6 years ago, and quite a few things have been fixed since then. Could you please publish a new release?
Thanks for the awesome library!
This isn't really a bug/issue. More of a question:
I have a bunch of legal text (first few lines):
84R1861 MK-D By: Whitmire S.B. No. 107 A BILL TO BE ENTITLED AN ACT relating to the removal, discipline, or transfer of a public school student. BE IT ENACTED BY THE
Then my grammar:
grammar TXBILL
txbill <- .* `A BILL TO BE ENTITLED` .*
I'd expect this to capture 'A BILL TO BE ENTITLED' and then everything to the left and right of it, yet I get a parsing error on the last line of the text.
For reference, here's that last line:
If this Act does not receive the vote necessary for immediate effect, this Act takes effect September 1, 2015.
I might just be doing something stupid... Any guidance would be super appreciated though :)
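A likely explanation, offered as an assumption rather than a confirmed diagnosis: in a PEG, .* is greedy and never gives characters back, so the first .* consumes the whole input and the literal that follows has nothing left to match, which would explain an error reported at the end of the text. A common PEG idiom is to guard the repetition with a negative lookahead, for example (!`A BILL TO BE ENTITLED` .)* in place of the first .* in the rule. The Python sketch below imitates both behaviours with hand-written helpers. None of these function names come from Canopy, and matching is case-sensitive here for simplicity.

```python
def any_star(text, pos):
    """PEG `.*`: consume every remaining character, with no backtracking."""
    return len(text)


def until(text, pos, literal):
    """PEG `(!literal .)*`: consume characters until `literal` comes next."""
    while pos < len(text) and not text.startswith(literal, pos):
        pos += 1
    return pos


def literal_at(text, pos, literal):
    """Match `literal` at `pos`; return the new position, or None on failure."""
    return pos + len(literal) if text.startswith(literal, pos) else None


def parse_greedy(text, marker):
    # txbill <- .* marker .*
    pos = any_star(text, 0)
    pos = literal_at(text, pos, marker)
    return None if pos is None else any_star(text, pos)


def parse_guarded(text, marker):
    # txbill <- (!marker .)* marker .*
    pos = until(text, 0, marker)
    pos = literal_at(text, pos, marker)
    return None if pos is None else any_star(text, pos)


bill = "S.B. No. 107 A BILL TO BE ENTITLED AN ACT relating to ..."
marker = "A BILL TO BE ENTITLED"
greedy_result = parse_greedy(bill, marker)    # None: the marker is never reached
guarded_result = parse_guarded(bill, marker)  # end position of a full match
```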
The doc says that cross-references / named nodes are an easier way to traverse the tree than using the elements array, but they are not available within %actions. Perhaps they could be passed as an extra argument to the action, e.g.:
action(input, start, end, elements, node) {
// node.Identifier
}
Alternatively, the action could receive the current TreeNode as this:
action(input, start, end, elements) {
// this.Identifier
}
Basing this on the example grammar
grammar Choice
root <- "a" "b" "c" <Foo> / "1" "2" "3" <Bar>
no matter what Foo and Bar define, this will fail on input "123" with:
File "grammar.py", line 95, in _read_root
address0.__class__ = type(cls0.__name__ + 'Foo', (cls0, self._types.Foo), {})
TypeError: Cannot create a consistent method resolution
order (MRO) for bases Foo, object
Looking at the generated grammar, the problem seems to be that it sets address0.__class__ before checking for FAILURE on the match. In this case the match on "abc" fails, so a Foo node should not be created, but the code still tries to assign that type. Testing for FAILURE before attempting to set the class seems to work.
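The guard described above can be sketched in isolation. This is a minimal Python illustration, assuming FAILURE is a bare object() sentinel (which the "bases Foo, object" error message suggests). TreeNode, Foo, and both helper functions are hypothetical stand-ins and are not the generated parser's actual code.

```python
FAILURE = object()  # sentinel, mirroring the name used in the issue


class TreeNode:
    """Hypothetical stand-in for a successfully matched node."""

    def __init__(self, text):
        self.text = text


class Foo:
    """Hypothetical mixin, playing the role of the <Foo> type."""

    def shout(self):
        return self.text.upper()


def extend(node, mixin):
    """Attach `mixin` to `node`, but only if the match succeeded."""
    if node is FAILURE:
        return node
    cls = type(node)
    node.__class__ = type(cls.__name__ + mixin.__name__, (cls, mixin), {})
    return node


def extend_unguarded(node, mixin):
    """Same as extend() without the FAILURE check, reproducing the error."""
    cls = type(node)
    node.__class__ = type(cls.__name__ + mixin.__name__, (cls, mixin), {})
    return node
```

With the sentinel, type(node) is object itself, so the unguarded version asks Python to build a class whose bases are (object, Foo). No consistent method resolution order exists for that pair, hence the TypeError.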
Hi, I was trying to run canopy words.peg --lang js on a file I made containing
grammar Words
root <- first:"foo" second:"bar" <Extension>
But every time I run it I get this error
TypeError: invalid options argument
at optsArg (/usr/lib/node_modules/canopy/node_modules/mkdirp/lib/opts-arg.js:13:11)
at mkdirp (/usr/lib/node_modules/canopy/node_modules/mkdirp/index.js:11:10)
at write (/usr/lib/node_modules/canopy/bin/canopy:32:5)
at Object.<anonymous> (/usr/lib/node_modules/canopy/bin/canopy:37:3)
at Module._compile (node:internal/modules/cjs/loader:1101:14)
at Object.Module._extensions..js (node:internal/modules/cjs/loader:1153:10)
at Module.load (node:internal/modules/cjs/loader:981:32)
at Function.Module._load (node:internal/modules/cjs/loader:822:12)
at Function.executeUserEntryPoint [as runMain] (node:internal/modules/run_main:79:12)
at node:internal/main/run_main_module:17:47
My javascript file looks like this
var words = require('./words');
var types = {
Extension: {
convert: function() {
return this.first.text + this.second.text.toUpperCase();
}
}
};
words.parse('foobar', {types: types}).convert()
// -> 'fooBAR'
It's the exact same as the example at the end of the page http://canopy.jcoglan.com/langs/javascript.html
What I tried:
I just started a project with a generated parser. I didn't know initially that it wasn't hand-written.
It would be very helpful for the generated parser file to start with a comment like:
# Generated from `foo.peg` using Canopy: http://canopy.jcoglan.com/
Hi there! So I'm really new to grammars and to canopy as well. I wrote this PEG file based on several things that I read and on online resources. However, it seems that the final result (I'm using Ruby as the output) is not performing very well. Does anyone know if there is a way for me to improve its performance?
https://gist.github.com/crashtech/721a2258335d0d8f7e5a399a58ebc4aa
I love these. They are useful to parse, for example, indentation based languages.
In Peggy.js these are the &{...} and &(...) syntaxes.
Nice project, your docs are great.
Used Grammar:
query <- term (operation_ term)* _
term <- field _ "(" searchterm ")"
searchterm <- word ( cword / word)*
cword <- operation "(" searchterm ")"
word <- not string
operation <- (operation)?
operation_ <- _ ("AND" / "OR" / "&&" / "||") __ not
not <- _ "NOT "? _
field <- not "text:" / "category:" / "requestID:"
string <- _ [a-zA-Z0-9]+ [a-zA-Z0-9*?]* _
_ <- [ \t\r\n]*
__ <- [ \t\r\n]+
Query to parse
text: ( # invalid to grammar, but check runs
text: ( anychar # "RangeError: Maximum call stack size exceeded" is raised (chrome)
I have no idea why there should be a recursion problem at this point. If the "searchterm" is removed from "cword" then it works again, but this should not be a problem.
There's currently no way for Java actions to throw an error to signal failure, which has been problematic in a larger project I'm implementing with this great piece of software.
I've worked around this by patching the action invocations to throw. (Edited for the newest version.)
diff --git a/node_modules/canopy/lib/builders/java.js b/node_modules/canopy/lib/builders/java.js
index 9a446c4..bc32f0c 100644
--- a/node_modules/canopy/lib/builders/java.js
+++ b/node_modules/canopy/lib/builders/java.js
@@ -132,7 +132,7 @@ class Builder extends Base {
method_ (name, args, block) {
this._newline()
- this._line('TreeNode ' + name + '() {', false)
+ this._line('TreeNode ' + name + '() throws ParseError {', false)
this._scope(block)
this._line('}', false)
}
diff --git a/node_modules/canopy/templates/java/Actions.java b/node_modules/canopy/templates/java/Actions.java
index 74fd824..1f9aa07 100644
--- a/node_modules/canopy/templates/java/Actions.java
+++ b/node_modules/canopy/templates/java/Actions.java
@@ -2,6 +2,6 @@ import java.util.List;
public interface Actions {
{{#each actions}}
- public TreeNode {{this}}(String input, int start, int end, List<TreeNode> elements);
+ public TreeNode {{this}}(String input, int start, int end, List<TreeNode> elements) throws ParseError;
{{/each}}
}
This has worked very well in having the actions throw nested errors, with the top-level wrapping in a ParseError to pass to the caller.
Since existing implementations without the throws clause are more specific than the interface, this will not affect existing code.
If you'd like any other changes to merge a throwable into the mainline, please let me know what I can do!
Pretty simple, I have generated a parse tree and made a few edits to the leaves of the parse tree within the valid ranges of the grammar. Now I want to regenerate the code from the parse tree.
Examples of other parser generators that have reverse functionality:
https://github.com/tjvr/nearley-reverse
https://github.com/lark-parser/lark
Is this possible with Canopy? If it's not implemented yet, I'd appreciate help in tackling this. Thanks.
Just out of boredom, I started building a playground for Canopy.
Unfortunately, the package isn't "web friendly", so it's not really possible at this time.
The trouble is the Base class, which is hard-coded with dependencies on NodeJS's fs and path - ideally, a package shouldn't have dependencies on NodeJS (or any specific JS environment, if possible).
Some ways around this dependency:
1. … .js and require them (a bit clumsy to maintain because of \ escapes.)
2. … .js at build-time (but you don't currently have any scripts in package.json)
The templates are an implementation detail, so they probably should just be built-in?
You also might consider using modern string templates - this is kind of option 1, but this would enable you to get rid of the template engine dependency as well, and formalize the template arguments, which might be the simplest, most elegant solution.
So instead of templates like this:
if (typeof require === 'function' && typeof exports === 'object') {
Object.assign(exports, exported);
} else {
var ns = (typeof this === 'undefined') ? window : this;
{{#each namespace}}
ns = ns.{{this}} = ns.{{this}} || {};
{{/each}}
ns.{{name}} = exported;
}
Instead, you'd have an actual function, like this:
module.exports = ({ name, namespaces }) => `
if (typeof require === 'function' && typeof exports === 'object') {
Object.assign(exports, exported);
} else {
var ns = (typeof this === 'undefined') ? window : this;
${namespaces
.map(namespace => `ns = ns.${namespace} = ns.${namespace} || {};`)
.join("\n ")}
ns.${name} = exported;
}
`
The templates, with extensions like .js, .rb and .java, aren't valid code in the first place, because of the placeholders.
You could probably make these functions compute/return the filename as well - that way it's all neatly tucked away with no dependency on the file-system; only the CLI front-end would depend on the filesystem, which is appropriate, since that's specifically designated for Node only, while the library code would be free from any dependencies on more than the standard JS run-time.
Thoughts? 🙂
First off, thanks a lot for sharing this development; it is the best I have found for this purpose, and the fact that it addresses various target environments is really awesome.
I have a suggestion: it could be a good idea, and quite a simple one, to allow for a generic %make function that would receive the node type as an argument. For the moment, we need to define as many %make_xxxx functions as there are node types to be handled, but all my %make_xxxx functions do exactly the same thing: they instantiate a custom Node object, the only difference being the node type. So it's like:
class Actions(object):
    def make_number(self, input, start, end, elements):
        return ParseNode('number', input, start, end, elements)
    def make_float(self, input, start, end, elements):
        return ParseNode('float', input, start, end, elements)
    def make_variable(self, input, start, end, elements):
        return ParseNode('variable', input, start, end, elements)
    def make_expression(self, input, start, end, elements):
        return ParseNode('expression', input, start, end, elements)
    # and so on, for about thirty node types.
I suspect it will be quite common to instantiate the same object type, with the node type as an argument. So if the node type were among the arguments of the %make functions, we could define a single %make function.
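Until something like that exists, plain Python can approximate it on the user's side. The sketch below assumes the generated parser looks actions up by attribute name and calls them as action(input, start, end, elements), as the methods above suggest. The fields of ParseNode and the generic make method are guesses made for illustration.

```python
from functools import partial


class ParseNode:
    """Stand-in for the custom node class mentioned above."""

    def __init__(self, kind, input, start, end, elements):
        self.kind = kind
        self.text = input[start:end]
        self.elements = elements


class Actions:
    """Resolves any make_<kind> lookup to one generic factory."""

    def make(self, kind, input, start, end, elements):
        return ParseNode(kind, input, start, end, elements)

    def __getattr__(self, name):
        # __getattr__ only runs when normal attribute lookup fails,
        # so explicitly defined actions still take precedence.
        if name.startswith("make_"):
            return partial(self.make, name[len("make_"):])
        raise AttributeError(name)


actions = Actions()
node = actions.make_number("x = 42", 4, 6, [])
```

Here node.kind is "number" without a make_number method ever being written; any action that needs special handling can still be defined explicitly on the class.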
In the Java grammar, the generated code will try to cut a substring as in the following pseudocode:
if (offset < input.length)
result := input.substring(offset, offset + xxx);
The substring may clearly be called if offset + xxx exceeds the input's length. In Java, substring will throw an exception if this is the case, which will cause the code to abort if this condition is met.
The condition is met trivially by having a grammar with an expected string larger than the given one at the end of the file.
I've patched this in canopy.js as in the npm module, but am not sure how to do this for the source.
diff --git a/node_modules/canopy/lib/canopy.js b/node_modules/canopy/lib/canopy.js
index a575820..ae3c9a9 100644
--- a/node_modules/canopy/lib/canopy.js
+++ b/node_modules/canopy/lib/canopy.js
@@ -3186,7 +3186,7 @@ Canopy.extend(Canopy, {
chunk_: function(length) {
var chunk = this.localVar_('chunk', this.null_()), input = 'input', of = 'offset';
- this.if_(of + ' < inputSize', function(builder) {
+ this.if_(of + ' + ' + length + ' <= inputSize', function(builder) {
builder._line(chunk + ' = ' + input + '.substring(' + of + ', ' + of + ' + ' + length + ')');
});
return chunk;
In this patch, the full string size is tested before taking the conditional. This fixes the generated code for me in several grammars.
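The difference between the two guards can be modelled in a few lines of Python. Python slices never raise, so the sketch uses a small helper that imitates the bounds checking of Java's String.substring. Both java_substring and read_chunk are names invented for this illustration and do not appear in Canopy.

```python
def java_substring(s, begin, end):
    """Mimic java.lang.String.substring, which rejects out-of-range indices."""
    if begin < 0 or end > len(s) or begin > end:
        raise IndexError(
            "begin %d, end %d, length %d" % (begin, end, len(s)))
    return s[begin:end]


def read_chunk(text, offset, length, patched):
    """Read `length` characters at `offset`, or return None if the guard fails."""
    if patched:
        guard = offset + length <= len(text)  # patched: whole chunk must fit
    else:
        guard = offset < len(text)            # original: only the start must fit
    return java_substring(text, offset, offset + length) if guard else None


# A 4-character literal tested against the 3-character input "foo".
safe = read_chunk("foo", 0, 4, patched=True)
try:
    read_chunk("foo", 0, 4, patched=False)
    original_failed = False
except IndexError:
    original_failed = True
```

With the patched guard the chunk is simply left empty and the literal fails to match, which is the ordinary parse-error path instead of an uncaught exception.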
The 0.3.0 version available on NPM is currently broken. The issue is that its package.json puts no version bounds on mkdirp, so it gets mkdirp==1.0.x, which had a breaking API change from mkdirp==0.5.x, the version canopy 0.3.0 used. This was actually fixed in fc11abe, but never made it into a release.
As a workaround, I put up a mirror that installs correctly at lubieowoce/canopy-compat, but it'd be nice to have an official release that can be installed via the usual channels :)
Aha - one should use actions to pass their desired node back to the parent tree.
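A minimal Python sketch of that idea, assuming actions are called as action(input, start, end, elements). The node classes and the make_property / make_method actions are hypothetical, and a real action would normally pull the name out of elements instead of taking the whole matched span.

```python
class PropertyDefinition:
    def __init__(self, name):
        self.name = name


class MethodDefinition:
    def __init__(self, name):
        self.name = name


class Actions:
    """Hypothetical action handlers; each returns its own node class."""

    def make_property(self, input, start, end, elements):
        return PropertyDefinition(input[start:end])

    def make_method(self, input, start, end, elements):
        return MethodDefinition(input[start:end])


def describe(member):
    """Dispatch on the node's class instead of probing for keys."""
    if isinstance(member, PropertyDefinition):
        return "property " + member.name
    if isinstance(member, MethodDefinition):
        return "method " + member.name
    raise TypeError("unexpected node: %r" % (member,))


actions = Actions()
source = "dog_name bark"
members = [
    actions.make_property(source, 0, 8, []),
    actions.make_method(source, 9, 13, []),
]
summary = [describe(m) for m in members]
```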