I need to force-include characters and keywords (e.g. +, -, ...) in my language. Do I

<a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="/us

Alternatives are called <a href="https://pest.rs/book/grammars/peg.html#ordered-choice

<a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="/us

How to force-include a char?,about badicsalex/peginator

Comments (15)

ethindp commented on June 12, 2024 1

@badicsalex I'm trying to write a compiler for it. Right now I just want to parse stuff; I'll work on minimizing it and making it easier to analyze later.

from peginator.

badicsalex commented on June 12, 2024

I'm not sure what you mean. Can you post a quick example of 1-2 strings and how you'd like it to be parsed?

from peginator.

ethindp commented on June 12, 2024

@badicsalex Okay, so, as background, I'm writing an Ada parser.

This is my first time working with peginator, so I don't know how the AST would look, but as an example, from my understanding, terminals are normally discarded in productions (which makes sense: you (usually) don't need them). However, take the following Ada code (from the Arithmetic-Geometric Mean on RosettaCode):

with Ada.Text_IO, Ada.Numerics.Generic_Elementary_Functions;

procedure Arith_Geom_Mean is

   type Num is digits 18; -- the largest value gnat/gcc allows
   package N_IO is new Ada.Text_IO.Float_IO(Num);
   package Math is new Ada.Numerics.Generic_Elementary_Functions(Num);

   function AGM(A, G: Num) return Num is
      Old_G: Num;
      New_G: Num := G;
      New_A: Num := A;
   begin
      loop
         Old_G := New_G;
         New_G := Math.Sqrt(New_A*New_G);
         New_A := (Old_G + New_A) * 0.5;
         exit when (New_A - New_G) <= Num'Epsilon;
         -- Num'Epsilon denotes the relative error when performing arithmetic over Num
      end loop;
      return New_G;
   end AGM;

begin
   N_IO.Put(AGM(1.0, 1.0/Math.Sqrt(2.0)), Fore => 1, Aft => 17, Exp => 0);
end Arith_Geom_Mean;

In lines 16-18 and 25, we have operators that I need to keep (*, +, <=, ...). As I noted in my original comment, I could define each simple and compound operator like this:

@char
LessThan = '<';
// ... Others...

But I'm wondering if just doing @:'<=', for example, would work instead.

from peginator.

badicsalex commented on June 12, 2024

No need for @char, that's just a microoptimization anyway. Even if the @: operator worked with "immediate" right side (it doesn't, the part after : has to be a rule name), @: would just alias the LessThan syntax tree element to String, but always put the same string in it. Better to leave it empty.

You can just:

Relational = LessOrEqual | GreaterOrEqual | NotEqual | LessThan | GreaterThan | Equal;
LessOrEqual = "<=";
GreaterOrEqual = ">=";
NotEqual = "/=";
LessThan = "<";
GreaterThan = ">";
Equal = "=";

This will give you a nice enum named Relational, with empty members that you can use in later steps.

Just be sure careful with the order in your operator unions. An incorrect order:

Relational = LessThan |GreaterThan | LessOrEqual | GreaterOrEqual | NotEqual;
...

This would parse a <= b as the a token, then just <, and depending on the rest of the rules, either fail, or parse = and then b.

from peginator.

badicsalex commented on June 12, 2024

If the question was "can I put arbitrary expressions, or at least string literals after :, or at least after @:", then unfortunately the answer is no.

I was thinking about creating an anonymous rule feature (the expression part after : would create and immediately use an anonymous rule) to do this, but it was never actually needed; it always just made the grammar harder to understand.

from peginator.

ethindp commented on June 12, 2024

@badicsalex Thanks for the help; I hope I do the parsing for expressions and such right (I did define all the keywords like And = i'and', so I'm good there). I have one more question - how does the parser handle diverging alternatives? (I'm unsure what else to call it.) E.g.:

@export
IteratedElementAssociation = For loop_spec:LoopParameterSpecification [Use key_expr:Expression] "=>" expr:Expression
| For iterator_spec:IteratorSpecification [Use key_expr:Expression] "=>" expr:Expression;

This also happens in expressions:

@export
Expression = relations:Relation {and:And relations:Relation} | Relations {and:And then:Then relations:Relation}
| relations:Relation {or:Or relations:Relation} | relations:Relation {or:Or else:Else relations:Relation}
| relations:Relation {xor:Xor relations:Relation};

Problem of course is that I think this is really messy. I might collapse the operators (and, and then, ...) into parenthesized blocks to make the tree easier to handle, but I was wondering how the parser handled things like this (the Ada grammar has things like this all over the place).

from peginator.

badicsalex commented on June 12, 2024

Alternatives are called "ordered choice" in PEGs, i.e. the first one that successfully parses is chosen, and parsing of the alternatives stop there.

I'd rewrite your first rule like this:

@export
IteratedElementAssociation = For (spec:LoopParameterSpecification | spec:IteratorSpecification) [Use key_expr:Expression] "=>" expr:Expression;

This way the spec field will be an enum that contains either a LoopParameterSpecification or an IteratorSpecification, and you can call match on it in your later logic.

And this is how I would do expressions:

@export
Expression = e:RelationExpression { op:LogicOperator e:RelationExpression};
RelationExpression = e:AddSubExpression { op:RelationOperator e:AddSubExpression};
AddSubExpression = e:UnaryExpression { op:AddSubOperator e:UnaryExpression};
...

LogicOperator = And | Or | Xor;
And = i'and';
Or = i'or';
Xor = i'or';
...

This would take care of operator precedence. See the calculator example, and also search for "operator precedence in PEG grammars"

from peginator.

ethindp commented on June 12, 2024

@badicsalex Operator precidence is handled directly in the grammar of Ada; I don't think that's necessarily something that I need to worry about since the grammar encodes it directly. I suppose I could rewrite it to be something like:

@export
Expression = relations:Relation {op:(And | (And Then) | Or | (Or Else) | Xor) relations:Relation};

Not sure how this would be represented though. The reason I asked about ordered choice was because Ada does things like what I provided previously a lot, even when the items are entirely different, e.g.:

@export
FullTypeDeclaration = (Type ident:DefiningIdentifier [discriminant:KnownDiscriminantPart] Is definition:TypeDefinition
[aspects:AspectSpecification] ";")
| task_type_declaration:TaskTypeDeclaration
| protected_type_declaration:ProtectedTypeDeclaration;

Or:

@export
IterationScheme = While Condition
| For LoopParameterSpecification
| For IteratorSpecification
| [Parallel [AspectSpecification]]
For ProceduralIterator
| Parallel ["(" ChunkSpecification ")"] [AspectSpecification]
For LoopParameterSpecification
| Parallel ["(" ChunkSpecification ")"] [AspectSpecification]
For IteratorSpecification;

I assume the general strategy for this would be something like (using the first example):

@export
FullTypeDeclaration = normal_type_declaration:(Type ident:DefiningIdentifier [discriminant:KnownDiscriminantPart] Is definition:TypeDefinition
[aspects:AspectSpecification] ";")
| task_type_declaration:TaskTypeDeclaration
| protected_type_declaration:ProtectedTypeDeclaration;

from peginator.

badicsalex commented on June 12, 2024

Operator precidence is handled directly in the grammar of Ada

Ah, I didn't know that! I see there's a published context-free grammar. Be careful, | is very different in context-free and PEG grammars.

So unfortunately while field:("whatever" IDunno) would be intuitive, it's not supported. But you can always rewrite it by putting the part in the brackets in a separate rule: field:Whatever + Whatever = "whatever" IDunno.

One special thing you can do is if you'd have a rule like this:

MultiThing = "hmmm" thing:(Something | SomethingElse) "alrite";

You can instead write:

MultiThing = "hmmm" (thing:Something | thing:SomethingElse)  "alrite";

peginator will notice that you used thing in two different alternatives with two different types, and create an enum type just for the thing field, and will set it accordingly.

By the way 1

If you write

MultiConti = "multi:" (num:Number | spec:Special | txt:Text);

peginator will generate the following structure:

struct MultiConti{
    num: Option<Number>,
    spec: Option<Special>,
    txt: Option<Text>,
}

and always set exactly one of the options. This will lead to error-prone parsing steps later, because you might forget a field or something.

If you do this instead:

MultiConti = "multi:" (m:Number | m:Special | m:Text);

peginator will create something like this:

struct MultiConti{
    m: MultiConti_m,
}
enum MultiConti_m{
    Number(Number),
    Special(Special),
    Text(Text),
}

And in later code you can just match (conti.m) { MultiConti_m:Number(n) => ...}.

Alternatively:

MultiConti = "multi:" (@:Number | @:Special | @:Text);

enum MultiConti{
    Number(Number),
    Special(Special),
    Text(Text),
}

FullTypeDeclaration

Based on the above, I'd write:

FullTypeDeclaration = @:NormalTypeDeclaration
| @:TaskTypeDeclaration
| @:ProtectedTypeDeclaration;

NormalTypeDeclaration = Type ident:DefiningIdentifier [discriminant:KnownDiscriminantPart] Is definition:TypeDefinition
[aspects:AspectSpecification] ";";

By the way 2

By the way 3

You only need to @export rules where you want to directly use the Rule::parse() function in rust. This is usually the root rule (I guess compilation in case of Ada?)

from peginator.

ethindp commented on June 12, 2024

@badicsalex Thank you for that! I'll go through the grammar and modify it to add (pseudo)? rules to make the structure simpler. (I'll have to inline a lot of rules anyway -- there are lots of rules that are just alternatives, and I don't want an over-complicated AST later, but I'll worry about that later.)

from peginator.

badicsalex commented on June 12, 2024

By the way 4: peginator was designed so that it gives you very fine grained control over the resulting parse tree structure, so that later steps are easy. You should always think about what you will do with the parse tree when designing (or in this case, rewriting) the grammar. If you are looking to just syntax check Ada files or something, peginator may not be the tool of choice.

E.g. You can use @: rather liberally to reduce the depth of the AST.

EDIT: After reading the above comment, I see you understand this :D

from peginator.

ethindp commented on June 12, 2024

The fine-grained control and really good syntax is what made me want to try this library. I've got one last question about this and then I'll go try to refactor this a bit.

The Ada grammar has a rule called name. This Rule is the really "ambiguous" rule in the grammar and why I may need to do post-processing of the parse tree: most of the alternatives in that rule can match. As an example, the indexed_component rule is:

IndexedComponent = prefix:Prefix "(" exprs:Expression {"," exprs:Expression} ")";

But the Slice, FunctionCall, and GeneralizedIndexing rules also have similar syntaxes:

Slice = prefix:Prefix"(" range:DiscreteRange")";

GeneralizedIndexing = prefix:Prefix parameters:ActualParameterPart;

# ...

FunctionCall =FunctionName
| FunctionPrefix ActualParameterPart;

ActualParameterPart ="(" ParameterAssociation {"," ParameterAssociation}")";

ParameterAssociation =[FormalParameterSelectorName "=>" ] ExplicitActualParameter;

ExplicitActualParameter = Expression | VariableName;

So I might need to do some rewriting of the tree. (Or I could be crazy and go "Hey, let's build a symbol table with check functions..."). Any recommendations for resolving this? (I'm not too worried about it right now, but in future I do want the parser to try to figure this stuff out on it's own. I'd refactor the grammar but whenever I do that I always worry that I might end up changing the language in some way I don't understand right away.)

from peginator.

badicsalex commented on June 12, 2024

Ooof, I looked at the link and this looks pretty complicated. I think you won't be able to distinguish between these cases in the parsing step. Take A(1)(2)... Is it a two dimensional array? An array of functions? A function returning an array? You will only know if you know what A is.

What I'd do is store it in some relatively generic form in the parsing step (like storing the prefix and the expression in the brackets), and then during a later step you check the type of A (which you should know by then), and act accordingly. So instead of directly match-ing on the syntax node enum, you do a smarter, dynamic dispatch and error handling.

I wouldn't exactly do "post-processing of the tree", as in "create a new tree where these are disambiguated, but everything else stays the same", but you will have to do some lowering step that completely rewrites the tree anyway, e.g. to get to SSA form, and you can just shove this dynamic dispatch there.

I don't think there's a way to have a list of parsed functions during parsing, because the checking functions (even the custom ones) have to be stateless because of the backtracking.

from peginator.

ethindp commented on June 12, 2024

Ooooof.... I'll try to merge it all into one construct, but it might pose a bit of a challenge. I wish I could do this in the parsing stage but I don't mind having to disambiguate it later. Thanks for all your assistance!

from peginator.

badicsalex commented on June 12, 2024

No problem, have fun :)

from peginator.

How to force-include a char? about peginator HOT 15 CLOSED

Comments (15)

By the way 1

FullTypeDeclaration

By the way 2

By the way 3

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent