hoaproject / compiler
The Hoa\Compiler library.
Home Page: https://hoa-project.net/
In Lexer.php, preg_match could return false in case of a regex error.
The false return should be tested and explained at https://github.com/hoaproject/Compiler/blob/master/Llk/Lexer.php#L270
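A minimal sketch of what testing the false return could look like. The helper name matchAt is hypothetical and is not part of Hoa\Compiler; it only relies on preg_match and preg_last_error from PHP itself, and it separates "no match" (0) from "engine error" (false).

```php
<?php
/**
 * Hypothetical helper (not part of Hoa\Compiler): run preg_match and
 * distinguish "no match" (0) from "regex engine error" (false).
 */
function matchAt(string $regex, string $text, int $offset = 0): ?string
{
    $result = preg_match($regex, $text, $matches, 0, $offset);

    if (false === $result) {
        // preg_last_error() returns one of the PREG_*_ERROR constants.
        throw new RuntimeException(
            sprintf('PCRE error %d while matching %s.', preg_last_error(), $regex)
        );
    }

    // 1 means a match was found, 0 means the regex is fine but did not match.
    return 1 === $result ? $matches[0] : null;
}

// Invalid UTF-8 input combined with the "u" modifier makes preg_match return false.
try {
    matchAt('/\w+/u', "\xff");
} catch (RuntimeException $e) {
    echo $e->getMessage(), "\n";
}
```

A lexer could raise its own exception type at that point instead of silently treating the error as "token not matched".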
When using multiline input, the arrow indicating position of invalid token is not positioned correctly:
/**
* @Foo(hello
*/
When the error is the ( (i.e. its matching ) is missing), the exception message currently contains:
/**
* @Foo(hello
*/
↑
What would be expected instead:
/**
* @Foo(hello
↑
*/
or:
/**
* @Foo(hello
↑
Take the following grammar:
%skip s \s
%token a a
%token b b
%token c c
#root:
( <a> ) ( <b> #other )* <c>
echo 'abbc' | hoa compiler:pp Grammar.pp 0 -v dump
> #root
> > token(a, a)
> > token(b, b)
> > token(b, b)
> > token(c, c)
Because we have ( <a> ) and not only <a>, we have the #root ID and not #other as expected.
There's currently no way of locating an erroneous token within source during semantic analysis.
Refer to: http://discourse.hoa-project.net/t/hoa-compiler-locate-nodes-in-source/256
Writing a grammar can be tricky.
Of course, pp and Hoa\Compiler ease the job, but a dev / debug mode that helps the developer understand the grammar in some special cases, and that gives verbose traces, would be a real plus.
While attempting to run the suite locally with the lowest dependencies (Composer's --prefer-lowest), it crashes right away:
$ vendor/bin/hoa test:run -a
PHP Deprecated: The (unset) cast is deprecated in /tmp/Compiler/vendor/hoa/consistency/Prelude.php on line 73
Deprecated: The (unset) cast is deprecated in /tmp/Compiler/vendor/hoa/consistency/Prelude.php on line 73
Error: The (unset) cast is deprecated in /tmp/Compiler/vendor/hoa/test/.bootstrap.atoum.php at line 7
$ php -v | head -n 1
PHP 7.2.3 (cli) (built: Mar 12 2018 20:39:08) ( NTS )
When using Compiler as a dependency and installing with --prefer-lowest, it produces lots of errors regarding the deprecated cast, but also regarding the non-existent class Hoa\Iterator\Buffer. See the Travis log here: https://travis-ci.org/schmittjoh/serializer/jobs/371673251#L580
Compiler must, as a library, require the lowest versions of its dependencies that it actually works with.
Example code with an error on the last line:
type A {
some(arg: String = "😺 😸 😹 😻 😼 😽 🙀 😿 😾"): Any
}
😿
Expected (line 5 and column 1):
Unrecognized token "😿" at line 5 and column 1:
😿
↑ <- arrow here
Actual (line 1 and column 89):
Unrecognized token "😿" at line 1 and column 89:
type A {
some(arg: String = "😺 😸 😹 😻 😼 😽 🙀 😿 😾"): Any
}
😿some
↑ <- arrow here
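A rough sketch of how a line and column could be derived from a byte offset in multi-line, multi-byte input. The function name locate is hypothetical and this is not how Hoa\Compiler currently computes positions; it assumes UTF-8 input, "\n" line endings, and the mbstring extension.

```php
<?php
/**
 * Hypothetical helper: convert a byte offset into a 1-based line and a
 * 1-based column counted in characters (not bytes).
 */
function locate(string $text, int $byteOffset): array
{
    $before = substr($text, 0, $byteOffset);

    // Each newline before the offset moves the position one line down.
    $line = substr_count($before, "\n") + 1;

    // The current line starts right after the last newline, if any.
    $lastNewline = strrpos($before, "\n");
    $lineStart   = false === $lastNewline ? 0 : $lastNewline + 1;

    // Count characters, so that one emoji advances the column by one.
    $column = mb_strlen(substr($before, $lineStart), 'UTF-8') + 1;

    return ['line' => $line, 'column' => $column];
}

$source = "a\nb\n😿x";
print_r(locate($source, strpos($source, 'x')));
```

The arrow in the error message can then be indented by column - 1 characters under the offending line only, rather than under the whole input.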
For example, 5 abcde would be parsed into
token(number, 5)
token(fixedlengthstring, abcde)
with the constraint that the length of the 2nd string token "abcde" is equal to the value of the first number token 5.
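To make the constraint concrete, here is a hand-written sketch outside of Hoa\Compiler. The function name readLengthPrefixed and the array shape of the result are assumptions made for illustration; the point is only that the second token's length comes from the value of the first.

```php
<?php
/**
 * Hypothetical helper: read "<length> <data>" where <data> must be exactly
 * <length> bytes long. Returns two [name, value] pairs.
 */
function readLengthPrefixed(string $input): array
{
    if (1 !== preg_match('/^(\d+) /', $input, $m)) {
        throw new InvalidArgumentException('Expected "<length> <data>".');
    }

    $length = (int) $m[1];
    // Take exactly $length bytes right after the "<length> " prefix.
    $data = substr($input, strlen($m[0]), $length);

    if (strlen($data) !== $length) {
        throw new InvalidArgumentException('Data is shorter than the announced length.');
    }

    return [['number', $m[1]], ['fixedlengthstring', $data]];
}

print_r(readLengthPrefixed('5 abcde'));
```

A regular-expression token cannot express this on its own, because the number of bytes to consume is only known after the previous token has been read.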
Here are 3 examples that would need this feature (maybe the Compiler is not the right tool for the job, in which case this issue can be closed):
1️⃣ parsing the response of an IMAP FETCH command
* 1 FETCH (BODY[HEADER]<0> {100}
The first 100 byte literals of the headers would be here)
In this example, the size of the next data token is included not far before the data to parse.
2️⃣ Creating an AST from a PDF file:
5 0 obj
<< /Length 42 >>
stream
This (possibly encoded) stream contains 42 bytes of dataendstream
endobj
This one is a bit more tricky since the size of the data stream (42 bytes) is contained in the previous "Dictionary Node" (from an AST point of view), thus requiring knowledge of previous nodes already emitted.
3️⃣ Other protocols use this as well:
Content-Length header (HTTP), same as 2️⃣
payload len in a websocket frame, though this one is even more challenging because it would in addition require being able to read a streamed input
picked from http://www.gazelle-parser.org/docs/manual.html#_repetition
The *(sep) modifier specifies 0 or more occurrences of the previous component, where each occurrence is separated by sep. It is a more straightforward way of writing (component (sep component)*)?. sep can be any valid component (or in unusual cases, expression of components) that can appear on a right-hand-side of a rule.
The +(sep) modifier specifies 1 or more occurrences of the previous component, where each occurrence is separated by sep. It is a more straightforward way of writing component (sep component)*. sep can be any valid component (or in unusual cases, expression of components) that can appear on a right-hand-side of a rule.
Hi, I tried this code https://keynote.hoa-project.net/PHPTour14/Demonstration/Generation_exhaustive.php to show all possible compiler output. Is it the same as paths? But my CPU just goes to 100% and nothing gets printed.
I'm trying to diagnose why I get a PHP Fatal error: Uncaught Hoa\Compiler\Llk\Parser::parse(): (0) Unexpected token ... blabla
There is no indication of which token is expected instead, which alternatives were tried, or what it backtracked on. For this I tried the code in the guide https://hoa-project.net/En/Literature/Hack/Compiler.html#Traces
Also, a question on the side about the hoa tool: I cannot find it. I was expecting it to be in vendor/bin but it's not there. How can I get this tool?
In some grammars, we need the lexer to not be in Unicode mode, for instance in Hoa\Json. So we need to introduce pragmas, and the first one would aim to disable Unicode. Something like:
%pragma unicode false
At some point, we will need to implement the following one too:
%pragma backtrack_limit 3
(for an LL(3) compiler)
Maybe the names are not correct. I need your help here!
In my task I need to know information about some tokens that have been marked as "%skip".
For example, here is the code:
%skip T_COMMENT \/\*.*\*\/
/** Docblock */
class /** skipped */ Test /** skipped */
{
/** skipped */
}
Output:
#Class
#Name
token(T_NAME, Test)
But I need something like this:
#Class
#Docblock
token(T_COMMENT, /** Docblock */)
#Name
token(T_NAME, Test)
I can replace the %skip with a %token. But then I will have to make it (%token) optional after every active token in the grammar file.
Any ideas on how to implement this?
Hi,
I'd like to use your library, but I want to automatically inline the parent class \Hoa\Compiler\Llk\Parser and minimize the generated class. So far it looks like
class FooParser extends \Hoa\Compiler\Llk\Parser
{
public function __construct()
{
// ...
}
}
The point is to decouple the final product from the compiler.
I want to ask whether that is acceptable to you. Of course, the generated class would have a reference to your library, e.g.
/**
* This parser was autogenerated by merging Foo-related stuff and Hoa Compiler Parser into this one.
* In other words, all the credits regarding generic Parser and Lexer goes to maintainers of https://github.com/hoaproject/Compiler.
* But rights on the Foo-related part of the software (describing formal grammar, etc.) belongs to Santa-Claus.
*
* Generated on Wednesday, November 7, 2018 (UTC) using Hoa Compiler "3.17.08.08".
*/
final class FooParser
{
// ...
}
The text of your license states that I also have to include the whole text. Okay, but does it mean that my MIT-licensed product (for example) becomes a product with a mixed license?
What do you think?
Hi, I have a problem with some pp syntax. I think the wrong repetition is getting precedence, but I don't understand why. See below for relevant info.
notation under test:
Ab ::= Cd
pp
%skip S [\x09\x0A\x0D\x20]
%token Production_a ::=
%token NameStartChar \w
%token Choice_a \|
%token SequenceOrDifference_a -
%token Item_a [\?\*\+]
#Production:
NCName() ::Production_a:: Choice()
#NCName:
::NameStartChar:: (::NameStartChar::)*
#Choice:
SequenceOrDifference() (::Choice_a:: SequenceOrDifference())*
#SequenceOrDifference:
(NCName() (::SequenceOrDifference_a:: NCName() | NCName()*))?
Trace:
# namespace token name token value offset
-------------------------------------------------------------------------------
0 default NameStartChar A 0
1 default NameStartChar b 1
2 default Production_a ::= 3
3 default NameStartChar C 7
4 default NameStartChar d 8
5 default EOF 9
> enter Production (#Production)
> > enter NCName
token NameStartChar, consumed A
> > > enter 17
> > > > enter 16 (#NCName)
token NameStartChar, consumed b
< < < < ekzit 16
< < < ekzit 17
< < ekzit NCName
token Production_a, consumed ::=
> > enter Choice
> > > enter SequenceOrDifference
> > > > enter 12
> > > > > enter NCName
token NameStartChar, consumed C
> > > > > > enter 17
< < < < < < ekzit 17
< < < < < ekzit NCName
> > > > > enter 11
> > > > > > enter 10 (#SequenceOrDifference)
> > > > > > > enter 9
> > > > > > > > enter NCName
token NameStartChar, consumed d
> > > > > > > > > enter 17
< < < < < < < < < ekzit 17
< < < < < < < < ekzit NCName
< < < < < < < ekzit 9
< < < < < < ekzit 10
< < < < < ekzit 11
< < < < ekzit 12
< < < ekzit SequenceOrDifference
> > > enter 5
< < < ekzit 5
< < ekzit Choice
< ekzit Production
> #Production
> > #NCName
> > #Choice
> > > #SequenceOrDifference
> > > > #NCName
> > > > #NCName
At:
token NameStartChar, consumed C
> > > > > > enter 17
Expect:
> > > > > > > enter 16 (#NCName)
But get:
< < < < < < ekzit 17
I think there is a preference for the second NCName() here:
#SequenceOrDifference:
(NCName() (::SequenceOrDifference_a:: NCName() | >>> NCName()* <<< ))?
Instead of trying to match another ::NameStartChar:: here:
#NCName:
::NameStartChar:: ( >>> ::NameStartChar:: <<<)*
Please note that I put >>> <<< just to indicate the flow of the grammar; refer to case.pp for the actual grammar.
If support for token namespaces is removed, the Lexer can be sped up significantly. This amounts to a complete rewrite of the algorithm.
Benchmarks:
Stand:
Sources: Original Lexer
AVG: 29.0824s (408 token/s)
Sources: Rewritten Lexer
AVG: 30.9218s (383.7 token/s)
Sources: FastLexer
AVG: 0.2091s (56752.7 token/s)
Would it make sense to adapt it for Hoa and, in those cases when the user's grammar contains only the default namespace, to use the FastLexer implementation?
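The issue does not show how FastLexer works. One common way to get this kind of speed-up when there is a single namespace, which may or may not be what FastLexer does, is to merge all token definitions into one regular expression with named groups, so that each token costs one preg_match call instead of one call per definition. The function name lexAll and the token table below are assumptions for illustration.

```php
<?php
/**
 * Hypothetical single-regex lexer. $tokens maps a token name to a regex
 * fragment; names must be valid PCRE group names and fragments must not
 * contain the "~" delimiter.
 */
function lexAll(array $tokens, string $source): array
{
    $parts = [];
    foreach ($tokens as $name => $regex) {
        $parts[] = '(?<' . $name . '>' . $regex . ')';
    }
    // "A" anchors the match at the given offset, "u" enables UTF-8 mode.
    $pattern = '~' . implode('|', $parts) . '~Au';

    $out    = [];
    $offset = 0;
    $length = strlen($source);

    while ($offset < $length) {
        if (1 !== preg_match($pattern, $source, $m, 0, $offset) || '' === $m[0]) {
            throw new RuntimeException("Unrecognized token at byte offset $offset.");
        }
        // Exactly one alternative matched; find the non-empty named group.
        foreach ($tokens as $name => $_) {
            if (isset($m[$name]) && '' !== $m[$name]) {
                $out[] = [$name, $m[$name], $offset];
                break;
            }
        }
        $offset += strlen($m[0]);
    }

    return $out;
}

print_r(lexAll(['T_DIGIT' => '\d+', 'T_WORD' => '\w+', 'T_WHITESPACE' => '\s+'], '2 a'));
```

As with separate definitions, the order of the alternatives decides which token wins when several could match at the same position.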
The third argument of the Token rule is not needed: it is not required at runtime or for anything else. We can get rid of it.
new Token($id, $name, $nodeId, $unificationId, $kept);
// ^^^^^^^ - excess
There are suspicions that the backtrack method does not work correctly and does not find the previous correct chain of rules, but I have not found the reason yet :\
%skip T_WHITESPACE \s+
%token T_DIGIT \d+
%token T_WORD \w+
#grammar:
digits() | words()
#digits:
<T_DIGIT>*
#words:
<T_WORD>*
2 3 4 a b c
> #grammar
> > #digits
> > > token(T_DIGIT, 2)
> > > token(T_DIGIT, 3)
> > > token(T_DIGIT, 4)
> > #words
> > > token(T_WORD, a)
> > > token(T_WORD, b)
> > > token(T_WORD, c)
Hoa\Compiler\Exception\UnexpectedToken : Unexpected token "a" (T_WORD) at line 1 and column 7:
2 3 4 a b c
↑
~/vendor/hoa/compiler/Llk/Parser.php:1
While working on TML I needed to write a second grammar for testing purposes. This grammar is almost the same as the original language grammar but with some rules overloaded.
I played a bit with the compiler and came up with an %import directive. It works quite well in my case and I was wondering if you would want such a feature in the compiler. Here is what it looks like:
// src/tml.pp
%token T_FN \.[a-zA-Z_][a-zA-Z0-9_]*
%token T_VAR @[a-zA-Z_][a-zA-Z0-9_]*
%token T_NUMBER \-?[1-9][0-9]*
#tml:
( fn() | expr() | assign() | str() )+
#expr:
( <T_NUMBER> | rvar() ) ( operator() expr() )?
// ...
// tests/tml.pp
%import ../src/tml.pp
#tml:
expr()
| assign()
| str()
#expr:
<T_NUMBER> ( operator() expr() | division() )?
// Avoids division by zero
division:
<T_NUMBER[0]> <T_OP_DIVIDE> <T_NUMBER[0]>
operator:
<T_OP_PLUS>
| <T_OP_MINUS>
| <T_OP_MULTI>
Basically, tests/tml.pp will be loaded (Compiler\Llk\Llk::load) and parsed. The %import directive will be reached and the src/tml.pp grammar will in turn be loaded and parsed. Then we continue with tests/tml.pp, thus overloading rules and tokens from src/tml.pp.
The %import directive can be used at the top of the file to overload the imported grammar, or anywhere else to produce different results. Grammars are imported relative to the file importing them. If an imported grammar in turn imports another grammar, the import will be relative to the file where the %import directive is written.
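The path-resolution rule described above can be sketched as a plain textual expansion. This is only an illustration under assumptions: the function name expandImports is hypothetical, it is not the author's implementation, and it says nothing about how the loader then treats a rule that is defined twice.

```php
<?php
/**
 * Hypothetical helper: textually expand %import directives. Each path is
 * resolved relative to the file that contains the directive.
 */
function expandImports(string $path, array $stack = []): string
{
    $real = realpath($path);
    if (false === $real) {
        throw new RuntimeException("Cannot read grammar: $path");
    }
    if (in_array($real, $stack, true)) {
        throw new RuntimeException("Circular import: $real");
    }
    $stack[] = $real;

    $out = '';
    foreach (file($real, FILE_IGNORE_NEW_LINES) as $line) {
        if (1 === preg_match('/^%import\s+(\S+)\s*$/', $line, $m)) {
            // Relative to the importing file, not to the working directory.
            $out .= expandImports(dirname($real) . '/' . $m[1], $stack);
        } else {
            $out .= $line . "\n";
        }
    }

    return $out;
}
```

Calling expandImports('tests/tml.pp') would return the content of src/tml.pp followed by the rest of tests/tml.pp, with nested imports resolved from the directory of the file that declares them.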
What do you think ?
After the PP file is read, how can the grammar nodes be accessed / traversed programmatically, before reading in any actual input for the parser?
@jubianchi did a good POC with it. It is a great idea. We should do that. Maybe @jubianchi could explain more.
In some cases, it can be useful to come back to the previous namespace (i.e. “the NS we were in before the current one”). Such a feature could be expressed with 2 syntaxes (for example; one can imagine others).
The following examples illustrate these 2 syntaxes on the json.pp file (look at the %token string:_quote token):
%skip space \s
// Scalars.
%token true true
%token false false
%token null null
// Strings.
%token quote_ " -> string
%token string:string [^"]+
%token string:_quote " -> __PREVIOUS_NS__
// Objects.
%token brace_ {
%token _brace }
// Arrays.
%token bracket_ \[
%token _bracket \]
// Rest.
%token colon :
%token comma ,
%token number \d+
value:
<true> | <false> | <null> | string() | object() | array() | number()
string:
::quote_:: <string> ::_quote::
number:
<number>
#object:
::brace_:: pair() ( ::comma:: pair() )* ::_brace::
#pair:
string() ::colon:: value()
#array:
::bracket_:: value() ( ::comma:: value() )* ::_bracket::
where __PREVIOUS_NS__ is an alias for __PREVIOUS_1_NS__, and the general form is __PREVIOUS_#_NS__ (# is the number of namespaces the compiler should go back)
%skip space \s
// Scalars.
%token true true
%token false false
%token null null
// Strings.
%token quote_ " -> string
%token string:string [^"]+
%token string:_quote " <-
// Objects.
%token brace_ {
%token _brace }
// Arrays.
%token bracket_ \[
%token _bracket \]
// Rest.
%token colon :
%token comma ,
%token number \d+
value:
<true> | <false> | <null> | string() | object() | array() | number()
string:
::quote_:: <string> ::_quote::
number:
<number>
#object:
::brace_:: pair() ( ::comma:: pair() )* ::_brace::
#pair:
string() ::colon:: value()
#array:
::bracket_:: value() ( ::comma:: value() )* ::_bracket::
where <- means “leave current NS” (implying “go back to the previous one”), with <- being allowed several times (for example, <- <- <- goes 3 NS back)
$nodeId and $defaultId in grammar rules?
(new Concatenation($id, $children, $nodeId))->setDefaultId($defaultId);
// ^^^^^^^ - There ------- ^^^^^^^^^^
$nodeId in tokens?
new Token($id, $name, $nodeId, $unificationId, $kept);
// ^^^^^^^ - excess?
That is, we can have completely numeric indexes in the rules array, and whether or not the rule appears in the AST will be determined by the presence of the name in the "Entry" element of the trace. This should speed up initialization and fetching (PHP 7+ optimizes arrays with numeric indexes).
Hi, as the lexer uses mb_strlen, I think that the composer.json should require ext-mbstring or symfony/polyfill-mbstring, don't you? :)
Maybe it's just a PhpStorm bug, but I really couldn't find the root Exception class.
use Hoa\Exception as HoaException;
What is that? Aliasing a sub-namespace or aliasing the class Exception in the namespace Hoa?
If the first, how does this work:
class Exception extends HoaException
?
If the second, I see no class Exception in the namespace Hoa (only Hoa\Exception\Exception).
It looks like black magic. What's the purpose?
And aliasing is still supported quite badly by the most popular IDE (PhpStorm), e.g. find usages ignores aliased classes. I know that it's not an argument ("they should fix it"), but it also leads to confusion when reading code. Is it possible to resolve all aliases?
And I still don't get the purpose of Consistency::flexEntity.
In matchesLexem (https://github.com/hoaproject/Compiler/blob/master/Llk/Lexer.php#L185-L199), if a token allows an empty string to match (for example [a-z]*), then strpos (https://github.com/hoaproject/Compiler/blob/master/Llk/Lexer.php#L191) will report an error because the needle is an empty string.
Either Hoa should complain about a token allowing an empty match (and the doc should say it's forbidden),
or Hoa should support empty matches (but I'm not sure it makes sense to have empty tokens…)
👋 The lexer always reports Line 1 when it encounters an unrecognized token:
https://github.com/hoaproject/Compiler/blob/master/Llk/Lexer.php#L151
This is particularly problematic as the file being parsed has hundreds of lines. I guess this is also related to #97
Hello :-),
Supported editors:
Should we add these resources in this library or in Hoa\Devtools, @hoaproject/hoackers?
PHP Warning: strrpos(): Offset is greater than the length of haystack string in ~/vendor/hoa/compiler/Llk/Parser.php on line 198
Grammar.pp:
%skip T_IGNORE [\xfeff\x20\x09\x0a\x0d]+
%token T_COLON :
%token T_BRACE_OPEN {
%token T_BRACE_CLOSE }
%token T_NAME ([_A-Za-z][_0-9A-Za-z]*)
%token T_SCHEMA_DEFINITION schema
%token T_TYPE_DEFINITION type
%token T_ENUM_DEFINITION enum
%token T_UNION_DEFINITION union
%token T_INTERFACE_DEFINITION interface
%token T_SCALAR_INTEGER Int
%token T_SCALAR_FLOAT Float
%token T_SCALAR_STRING String
%token T_SCALAR_BOOLEAN Boolean
%token T_SCALAR_ID ID
#Document:
TypeDefinition()*
#TypeDefinition:
::T_TYPE_DEFINITION:: <T_NAME>? ::T_BRACE_OPEN:: Fields()* ::T_BRACE_CLOSE::
#Fields:
<T_NAME> ::T_COLON:: FieldValue()
#FieldValue:
<T_SCALAR_INTEGER> | <T_SCALAR_FLOAT> | <T_SCALAR_STRING> | <T_SCALAR_BOOLEAN> | <T_SCALAR_ID> | <T_NAME>
Source code for parsing:
type Test {
}
Both the JSON grammar and the generated data can be updated since recent commits on Hoa\Compiler and Hoa\Json. It must be updated.
When running this grammar https://github.com/hamlet-framework/type/blob/master/src/Reader/grammar.pp
against this test set: https://github.com/hamlet-framework/type/blob/master/tests/Reader/ParserTest.php
the output is:
\Hamlet\Cast\Type
> token(id, \Hamlet\Cast\Type)
array
> token(array, array)
int
> token(built_in, int)
array<string, array<string, array{DateTime}>>
> token(array, array)
array|null|false|1|1.1
> token(array, array)
array{id:int|null,name?:string|null}
> token(array, array)
('a'|'b'|'c')
> token(string:string, a)
'a'|'b'
> token(string:string, a)
string[][]
> token(built_in, string)
(1|false)[]
> token(int_number, 1)
(A::FOO | A::BAR)
> token(id, A)
int[]
> token(built_in, int)
callable(('a'|'b'), int):(string|array{\DateTime}|callable():void)
> token(callable, callable)
array{0: string, 1: string, foo: stdClass, 28: false}
> token(array, array)
A::class|B::class
> token(id, A)
Closure(bool):int
> token(id, Closure)
array<string,\DateTime>
> token(array, array)
Generator<T0, int, mixed, T0>
> token(id, Generator)
Generator<T0, int, mixed, T0> & object
> token(id, Generator)
Why is the output not the complete tree but just the first token?
llk.php disallows a lexeme matching an empty value with the following code
if('' === $matches[0])
throw new \Hoa\Compiler\Exception\Lexer(
'A lexeme must not match an empty value, which is the ' .
'case of "%s" (%s).', 3, array($lexeme, $regex));
But if this lexeme changes the current NS, it can permit a new match in the new NS. So this has to be allowed (maybe only in certain circumstances).
In fact, I have no idea if such a pattern would be useful, but I just found this piece of code theoretically wrong.
see http://discourse.hoa-project.net/t/hoa-compiler-feedback-help/205
As I said, for me the documentation needs some improvements:
Honestly, those were the two points that were missing and for which I needed help.
Since https://github.com/hoaproject/Compiler/blob/master/Bin/Pp.php#L238 uses printf, Unicode strings are not handled correctly. As pointed out in http://stackoverflow.com/questions/16003505/php-sprintf-with-foreign-characters printf is not UTF-8-aware, and @geraldcroes suggests in an answer:
You can do the trick by doing :
utf8_encode(sprintf('format', utf8_decode($yourstring)));
... Of course you'll have to check every arguments if many are given.
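Note that utf8_encode and utf8_decode only cover characters that exist in ISO-8859-1 and are deprecated as of PHP 8.2. An alternative sketch, assuming the mbstring extension is available, is to pad by character count instead of relying on printf widths. The helper name padRightUtf8 is hypothetical.

```php
<?php
/**
 * Hypothetical helper: right-pad a UTF-8 string with spaces up to $width
 * characters. printf('%-Ns') pads up to N bytes instead.
 */
function padRightUtf8(string $value, int $width): string
{
    $missing = max(0, $width - mb_strlen($value, 'UTF-8'));

    return $value . str_repeat(' ', $missing);
}

// "✓" is 1 character but 3 bytes, so sprintf sees it as already 3 wide.
echo sprintf('%-3s|', '✓'), "\n";
echo padRightUtf8('✓', 3), "|\n";
```

The same idea applies to every string argument of a format, which is the "check every argument" caveat from the quoted answer.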
Hi,
I would like to match MATCH with a space at the end. When using regular expressions, a space just remains a space. Can the token be enclosed in quotes somehow?
Thanks for support
When the first token is unexpected, https://github.com/hoaproject/Compiler/blob/master/Llk/Parser.php#L216 will fail because strrpos's offset will be greater than $text's length.
Hi!
This library looks really awesome, so I'm playing with it but I'm facing with a "basic" issue and I can't figure out if it's a limitation, a bug, or my mistake… Can you help me?
Here a minimalist grammar to illustrate my situation:
%token a a
%token word \w+
#root:
<word> | <a>
I want to match all words, but a is a special keyword, and I want to match it distinctly. The problem comes when I try to parse "ab": a is recognized as a token and then the parser is stuck on the b character with an UnexpectedToken exception. In my understanding, the parser should backtrack, discard the choice of the token a and follow with the token word… Am I wrong?
ℹ️
"a" input is identified as a word 👎
%token word a\w+|[^a]\w* at first rule but… looks very weird and hard to maintain IMHO
a, matching words only and use AST to identify my specific keywords, but I think it's the role of the syntax analyzer, isn't it?
Thanks in advance for your help, and your nice work on this library :) 👍
Hello,
It seems that when a rule is recursive or has a circular dependency with other rules, the parser goes into an infinite loop. During the execution of Hoa\Compiler\Llk\Parser::unfold, instead of decreasing, Hoa\Compiler\Llk\Parser::_todo keeps growing, so the while loop never stops.
I have this behavior with hoa/compiler 3.17.08.08
Example to reproduce:
<?php
use Hoa\Compiler\Llk\Llk;
use Hoa\Compiler\Visitor\Dump;
use Hoa\File\ReadWrite;
$file = new ReadWrite('php://temp');
$file->writeString(<<<PP
%skip whitespace \s
%token and &&
%token integer \d+
%token foo_ \(
%token _foo \)
rule:
_rule() | ::foo_:: _rule() ::_foo::
_rule:
(::integer:: | rule()) ::and:: (::integer:: | rule())
PP
);
$ast = Llk::load($file)->parse(<<<CODE
1 && (2 && 3) && 4
CODE
);
echo (new Dump())->visit($ast);
In addition to syntax analysis (lex), there is also a semantic analysis of code. In the case of semantic errors, one needs to know exactly where the error occurred. For example:
function a(int $b) {}
a(null);
// Error: Integer required in XXX file on line 2 and offset 3, but null given.
// a(null);
// ^
The only way to get this information is the TreeNode instance:
class TreeNode {
+public function getOffset(): int;
}
I tried to implement this in the form of Pull Request, but the source code is pretty confusing =\
It seems Hoa\Compiler cannot parse off-side rule languages.
Maybe it would be sufficient to have the compiler automatically add INDENT (respectively UNINDENT) tokens each time the indentation increases (respectively decreases) by 1.
The tricky part seems to be the mapping between spaces, tabs, and indent length…
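A minimal sketch of the INDENT / UNINDENT idea as a pre-processing pass, written outside of Hoa\Compiler. The function name indentTokens, the LINE(...) placeholder, and the choice that one indentation level equals $unit spaces (a tab counting as one level) are all assumptions; a real implementation would have to decide how to treat inconsistent indentation.

```php
<?php
/**
 * Hypothetical pre-pass: emit INDENT / UNINDENT markers whenever the
 * indentation level changes between two non-empty lines.
 */
function indentTokens(string $source, int $unit = 4): array
{
    $tokens = [];
    $level  = 0;

    foreach (explode("\n", $source) as $line) {
        if ('' === trim($line)) {
            continue; // Blank lines do not change the indentation level.
        }

        // Assumption: a tab is worth one indentation unit.
        $expanded = str_replace("\t", str_repeat(' ', $unit), $line);
        $width    = strlen($expanded) - strlen(ltrim($expanded, ' '));
        $newLevel = intdiv($width, $unit);

        for (; $level < $newLevel; $level++) {
            $tokens[] = 'INDENT';
        }
        for (; $level > $newLevel; $level--) {
            $tokens[] = 'UNINDENT';
        }

        $tokens[] = 'LINE(' . trim($line) . ')';
    }

    // Close every block that is still open at the end of the input.
    for (; $level > 0; $level--) {
        $tokens[] = 'UNINDENT';
    }

    return $tokens;
}

print_r(indentTokens("a:\n    b\n        c\nd"));
```

With such markers in the token stream, the grammar can treat INDENT and UNINDENT like opening and closing braces.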
Here is a grammar I extracted from Hoa\Math
%skip space \s
%token number (0|[1-9]\d*)(\.\d+)?([eE][\+\-]?\d+)?
%token plus \+
%token minus \-|−
exp:
<number>
(( ::plus:: #add | ::minus:: #sub ) exp())?
I changed it into this
%skip space \s
%token number (0|[1-9]\d*)(\.\d+)?([eE][\+\-]?\d+)?
%token plus \+
%token minus \-|−
exp:
<number>
exp2()?
exp2:
( ::plus:: #add | ::minus:: #sub ) exp()
I was hoping it would work as the previous grammar did, but here are the ASTs
First grammar
#sub
token(number, 1)
#add
token(number, 2)
token(number, 3)
Second grammar
token(number, 1)
Is it normal that the rule exp2() is not handled at all (even if the trace shows that it goes through it)?
Thank you
Dependabot can't resolve your PHP dependency files.
As a result, Dependabot couldn't update your dependencies.
The error Dependabot encountered was:
Your requirements could not be resolved to an installable set of packages.
Problem 1
- hoa/math 1.16.01.15 requires hoa/compiler ~3.0 -> satisfiable by hoa/compiler[3.16.01.11, 3.16.01.14, 3.16.08.15, 3.16.10.24, 3.17.01.10, 3.17.08.08].
- hoa/math 1.16.01.29 requires hoa/compiler ~3.0 -> satisfiable by hoa/compiler[3.16.01.11, 3.16.01.14, 3.16.08.15, 3.16.10.24, 3.17.01.10, 3.17.08.08].
- hoa/math 1.16.05.22 requires hoa/compiler ~3.0 -> satisfiable by hoa/compiler[3.16.01.11, 3.16.01.14, 3.16.08.15, 3.16.10.24, 3.17.01.10, 3.17.08.08].
- hoa/math 1.16.08.29 requires hoa/compiler ~3.0 -> satisfiable by hoa/compiler[3.16.01.11, 3.16.01.14, 3.16.08.15, 3.16.10.24, 3.17.01.10, 3.17.08.08].
- hoa/math 1.17.01.13 requires hoa/compiler ~3.0 -> satisfiable by hoa/compiler[3.16.01.11, 3.16.01.14, 3.16.08.15, 3.16.10.24, 3.17.01.10, 3.17.08.08].
- hoa/math 1.17.05.16 requires hoa/compiler ~3.0 -> satisfiable by hoa/compiler[3.16.01.11, 3.16.01.14, 3.16.08.15, 3.16.10.24, 3.17.01.10, 3.17.08.08].
- Can only install one of: hoa/compiler[3.16.01.14, No version set (parsed as 1.0.0)].
- Can only install one of: hoa/compiler[3.16.08.15, No version set (parsed as 1.0.0)].
- Can only install one of: hoa/compiler[3.16.10.24, No version set (parsed as 1.0.0)].
- Can only install one of: hoa/compiler[3.17.01.10, No version set (parsed as 1.0.0)].
- Can only install one of: hoa/compiler[3.17.08.08, No version set (parsed as 1.0.0)].
- hoa/compiler 3.16.01.11 requires hoa/file ~0.0 -> satisfiable by hoa/file[0.14.09.16, 0.14.09.17, 0.14.09.23, 0.14.11.09, 0.14.11.26, 0.14.12.10, 0.15.02.19, 0.15.05.12, 0.15.05.27, 0.15.11.09] but these conflict with your requirements or minimum-stability.
- hoa/math 1.16.01.14 requires hoa/zformat ~0.0 -> no matching package found.
- Installation request for hoa/compiler No version set (parsed as 1.0.0) -> satisfiable by hoa/compiler[No version set (parsed as 1.0.0)].
- Installation request for hoa/math ~1.0 -> satisfiable by hoa/math[1.16.01.14, 1.16.01.15, 1.16.01.29, 1.16.05.22, 1.16.08.29, 1.17.01.13, 1.17.05.16].
Potential causes:
- A typo in the package name
- The package is not available in a stable-enough version according to your minimum-stability setting
see <https://getcomposer.org/doc/04-schema.md#minimum-stability> for more details.
- It's a private package and you forgot to add a custom repository to find it
Read <https://getcomposer.org/doc/articles/troubleshooting.md> for further common problems.
If you think the above is an error on Dependabot's side please don't hesitate to get in touch - we'll do whatever we can to fix it.
You can mention @dependabot in the comments below to contact the Dependabot team.
Hi, I have some questions about the compiler, since I'm considering implementing an existing Domain Specific Language (DSL) in PHP and would need a compiler for this. But then I noticed that most already defined languages are defined in Backus–Naur Form (BNF) or Extended Backus–Naur Form (EBNF). From here certain questions come to mind:
For completeness here is a list of resources I found on PHP, Compilers and (E)BNF
EBNF & some other stuff
http://karmin.ch/ebnf/index
http://sourceforge.net/projects/lime-php/
https://github.com/hafriedlander/php-peg
http://php.comsci.us/syntax/statement/ebnf.php
http://php.comsci.us/syntax/statement/bnf.php
http://marc.info/?l=php-internals&m=129387252319019
https://github.com/ferno/loco/blob/master/ebnf.php
BNF
http://www.garshol.priv.no/download/text/bnf.html
http://lxr.php.net/xref/PHP_TRUNK/Zend/zend_language_parser.y
http://lxr.php.net/xref/PHP_TRUNK/Zend/zend_language_scanner.l
http://www.icosaedro.it/articoli/php-syntax-ebnf.txt
http://www.icosaedro.it/articoli/php-syntax-yacc.txt
http://www.phpclasses.org/package/7142-PHP-Parse-language-source-with-a-BNF-grammar-syntax.html
https://github.com/ferno/loco/blob/master/bnf.php
http://code.google.com/p/pragmatic-parser/source/browse/trunk/parser.class.php?r=2
When I read the following topic http://discourse.hoa-project.net/t/hoa-compiler-keyword-identifier-clash/252 I understood that namespaces make it possible to avoid collisions between tokens but also to isolate each namespace (correct?).
So it could be interesting to add a section in the namespace documentation about that, on why to use namespaces and their purpose, with a good example ;)
It has been noted in #93 (comment) that hoa/compiler depends on ext/ctype just because of a single call to ctype_digit in a non-critical path. I think it's a good idea to remove this dependency :-).
While manipulating $offset for #6, I came across the following: if you use UTF-8 chars with pp, error messages (at least) can be wrong. Consider, for example, the following pp
%skip SPACE \s
%token FOO foo
%token BAR bar
%token BAZ baz
%token CHECK ✓
#doc:
<FOO> <BAR> <CHECK> <BAZ>
with the following string
foo bar ✓ pouet baz
error message will be
Hoa\Compiler\Llk\Lexer::lexMe(): (0) Unrecognized token "p" at line 1 and column 13:
foo bar ✓ pouet baz
↑
in /media/Data/Matthieu/Documents/hoa/Libs/Compiler/Llk/Lexer.php at line 1.%
but p is effectively at column 11, not 13
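The two column numbers can be reproduced with plain PHP string functions, which suggests (as an assumption about the cause, not a confirmed diagnosis) that the reported column is counted in bytes while the expected one is counted in characters. The sketch assumes the source file is saved as UTF-8 and that the mbstring extension is available.

```php
<?php
$text = 'foo bar ✓ pouet baz';

// Byte offset of the first "p": "✓" alone takes 3 bytes in UTF-8.
$byteOffset = strpos($text, 'p');

// Column counted in bytes (1-based).
$byteColumn = $byteOffset + 1;

// Column counted in characters (1-based).
$charColumn = mb_strlen(substr($text, 0, $byteOffset), 'UTF-8') + 1;

echo "bytes: $byteColumn, characters: $charColumn\n"; // bytes: 13, characters: 11
```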
Note: I suppose used char
How to reproduce:
Grammar.pp
%skip space \s
// Strings.
%token quote_ " -> string
%token string:string [a-z]+
%token string:_quote " -> default
value:
string()
string:
::quote_:: <string> ::_quote::
Run script:
$sampler = new Hoa\Compiler\Llk\Sampler\Coverage(
Hoa\Compiler\Llk\Llk::load(
new Hoa\File\Read('Grammar.pp')
),
new Hoa\Regex\Visitor\Isotropic(
new Hoa\Math\Sampler\Random()
)
);
foreach($sampler as $i => $value)
echo $value;
Will output some thing like:
" gvqjcd "
But in JSON there is no space after the default:quote_ and string:string tokens.
The spaces were added by Sampler::generateToken() after each token.
I've fixed it temporarily with a little patch, but I don't know how it behaves with other languages...
protected function generateToken ( \Hoa\Compiler\Llk\Rule\Token $token ) {
$toNamespace = $this->completeToken($token);
$this->setCurrentNamespace($toNamespace);
$string = $this->_tokenSampler->visit(
$token->getAST()
);
if ( ( 'quote_' != $token->getTokenName()
|| 'default' != $token->getNamespace())
&& ( 'string' != $token->getTokenName()
|| 'string' != $token->getNamespace()))
$string .= ' '; // @todo: use skip token.
return $string;
}
I think we must add a PP keyword (or something else) to specify that no space token must be added after a token.
PS: By the way, Json\Grammar.pp <escaped> does not produce valid JSON.
I have the token \ and I need to escape it once because pp uses regexes, so it becomes:
%token token62 \\
When this is written as a PHP class, the backslash is only escaped once, like so:
'token62' => '\\\',
It doesn't matter if there are more surrounding characters (I have a few other rules which suffer from the same bug).
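For reference, PHP's own var_export produces a correctly escaped single-quoted literal for such a value. This is only a sketch of the expected escaping, not a description of how Hoa\Compiler generates its classes; the variable names are illustrative.

```php
<?php
// The regex from the grammar: two characters, backslash backslash.
$regex = '\\\\';

// var_export() doubles each backslash, so the literal round-trips safely.
$literal = var_export($regex, true);

echo "'token62' => ", $literal, ",\n";
```

Here the literal contains four backslashes between the quotes, which PHP reads back as the original two-character regex.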