Code Monkey home page Code Monkey logo

Comments (9)

yhirose avatar yhirose commented on August 14, 2024

Hi Jerry,

You might be interested in the AstOptimizer in peglint.h. (Here is a usage. Or here)

AstOptimizer is a general purpose AST tree optimizer, which makes a regular AST output much simpler. You can taste it with peglint with --ast --opt. What it does is to reduce redundant nodes. (If a node contains only one child node, it merges the two nodes into a node.) It worked pretty well in many cases even though it's simple.

But if we want to gain the best possible optimization result, I believe it would be better to write a specific tree re-writer for the particular situation instead of using the general purpose re-writer.

Also this stackoverflow item is worth to read. We can do the same with cpp-peglib's enter/exit handlers.

Hope you will find the best solution for your situation!

from cpp-peglib.

g40 avatar g40 commented on August 14, 2024

Hi Yuji

So I've been running the optimizer over the ASTs all the time. Turns out the optimize flag was set to false though. Fixing that helps optimize things quite a lot!

A really quick win would be analogous to the ignore operator.

Right now '~' means nodes do not get added to the AST, but neither do their children.

So how about using something like '!' to direct the build to add any child nodes to the parent of the current node.

_!_LBRACE # skip

        RESULT          <- (NAMED_RESULT / ANON_RESULT)
        NAMED_RESULT    <- IDENTIFIER (ASSIGNMENT_OP/HASH_OP) (STRING_LITERAL / LBRACE (RESULT_LIST)* RBRACE / LBRACK (RESULT_LIST)* RBRACK)
        ANON_RESULT     <- (STRING_LITERAL / LBRACE (RESULT_LIST)* RBRACE / LBRACK (RESULT_LIST)* RBRACK)
        ~ASSIGNMENT_OP  <- < '=' >
        ~HASH_OP            <- < '#' >
# skip this node. add any child nodes to the parent of LBRACE/RBRACE ...
        !LBRACE         <- < '{' >
        !RBRACE         <- < '}' >

Possibly more advanced but I'm not sure really more useful at this stage.

Thinking out loud here, if the grammar was tagged with a prefix ':' combination:

r0:RESULT_LIST      <- r1:RESULT (COMMA r2:RESULT)*

then you could describe the AST itself independently.

%AST
r0 = +(r1,r2)

from cpp-peglib.

yhirose avatar yhirose commented on August 14, 2024

Hi Jerry

Right now '~' means nodes do not get added to the AST, but neither do their children.

I probably don't fully understand what you mean with it. But I'll show you what I actually got with ~. The following is an example of lbrace (RESULT_LIST)* rbrace, and lbrace and rbrace are declared with ~.

ROOT            <- RESULT_LIST*
RESULT_LIST     <- RESULT (comma RESULT)*
RESULT          <- STRING_LITERAL / lbrace (RESULT_LIST)* rbrace

STRING_LITERAL  <- '"' (('\\"' / '\\t' / '\\n') / (!["] .))* '"'

~lbrace         <- '{'
~rbrace         <- '}'
~comma          <- ','

%whitespace     <-  [ \t\r]*

The following is the results from peglint with --ast --opt (optimized AST mode).

$ peglint a.peg '"aaa", {"bbb", "ccc"}, "ddd"' --ast --opt
+ ROOT[RESULT_LIST]
  - RESULT[STRING_LITERAL] ("aaa")
  + RESULT[RESULT_LIST]
    - RESULT[STRING_LITERAL] ("bbb")
    - RESULT[STRING_LITERAL] ("ccc")
  - RESULT[STRING_LITERAL] ("ddd")

As you can see in the above result, ~lbrace and ~rbrace didn't affect their children ("bbb" and "ccc"). The children have successfully added to the parent (RESULT[RESULT_LIST]). Please let me know if I miss something.

r0:RESULT_LIST <- r1:RESULT (COMMA r2:RESULT)*
%AST
r0 = +(r1,r2)

Yes! It would be nice to have such 'tree construction grammar'. I was actually thinking of making something like ANTLR3's tree construction grammar and tree pattern matching grammar. They are very powerful and flexible, but a bit complicated to me. Anyway, I'll research more about the idea.

Thanks,
Yuji

from cpp-peglib.

g40 avatar g40 commented on August 14, 2024

Hi Yuji,

Apologies. I did not explain that very well. The 'ignore' works as you say for terminals but not for non-terminals. Here's a better example.

        RESULT_LIST     <- RESULT (',' RESULT)*
        RESULT          <- (NAMED_RESULT / ANON_RESULT)
        NAMED_RESULT    <- IDENTIFIER (ASSIGNMENT_OP/HASH_OP) ANON_RESULT
        ANON_RESULT     <- (STRING_LITERAL / BRACE_LIST / BRACK_LIST)
        BRACE_LIST      <- '{' (RESULT_LIST)* '}'
        BRACK_LIST      <- '[' (RESULT_LIST)* ']'
        ASSIGNMENT_OP   <- < '=' >
        HASH_OP         <- < '#' >

Now with this grammar we currently get this optimized tree:

    + ANON_RESULT (RESULT_LIST)
     + RESULT (NAMED_RESULT)
      - IDENTIFIER: 'number'
      - ASSIGNMENT_OP: '='
      - ANON_RESULT (STRING_LITERAL): '"1"'
     + RESULT (NAMED_RESULT)
      - IDENTIFIER: 'type'
      - ASSIGNMENT_OP: '='
      - ANON_RESULT (STRING_LITERAL): '"breakpoint"'

but the result nodes are superfluous, what we really want (I believe) is this

    + ANON_RESULT (RESULT_LIST)
      - IDENTIFIER: 'number'
      - ASSIGNMENT_OP: '='
      - ANON_RESULT (STRING_LITERAL): '"1"'
      - IDENTIFIER: 'type'
      - ASSIGNMENT_OP: '='
      - ANON_RESULT (STRING_LITERAL): '"breakpoint"'

which would be easy to achieve during the parse if you could annotate the grammar with the ! operator. This would tell the tree builder to suppress generating a new node, and if it had any children to add them to the new parent RESULT_LIST.

Does this make more sense?

        RESULT_LIST     <- RESULT (',' RESULT)*
        !RESULT         <- (NAMED_RESULT / ANON_RESULT)
        !NAMED_RESULT   <- IDENTIFIER (ASSIGNMENT_OP/HASH_OP) ANON_RESULT
        !ANON_RESULT        <- (STRING_LITERAL / BRACE_LIST / BRACK_LIST)
        !BRACE_LIST     <- '{' (RESULT_LIST)* '}'
        !BRACK_LIST     <- '[' (RESULT_LIST)* ']'
        ASSIGNMENT_OP   <- < '=' >
        HASH_OP         <- < '#' >

from cpp-peglib.

yhirose avatar yhirose commented on August 14, 2024

Hi Jerry,

Thanks for the full explanation about what you are trying to do. (You suggests a feature to pull up descendant nodes, right?)

I think ideal AST structure totally depends on users. The result AST by using ! operator could suitable for certain situations, but may not for other situations. So I would rather keep the 'tree re-writing` separated from the PEG parser itself, would like to provide a general solution that could give more control for generating AST to users. But I fully agree with you about the need for flexible tree re-writing feature.

Your suggestions at #15, #16 are very interesting to me, because it's a more general approach that can be applied to many situations. Please give time to ponder over it. (I would like to make a good and long-lasting design decision. 😄)

I have also been checking ANTLR way about the tree construction. I mentioned that ANTLR3 had the good 'tree construction grammar in the previous comment, but the latest ANTLR4 doesn't come with the feature anymore. ANTLR4 only creates the parser tree only and provides 'listener/visitor' interface for three re-writing. So users cannot specify the tree construction rule in their grammar files any more.

ANTLR 4 automatically constructs parse trees for you and abstract syntax tree (AST) construction is no longer an option. See also What if I need ASTs not parse trees for a compiler, for example?

I guess they don't want to include any information that is not related to BNF grammar, so that the grammar file could remain clean and readable. It makes sense to me, but less convenient for AST handling.

Anyway, it's a very interesting topic to thing about!
Yuji

from cpp-peglib.

g40 avatar g40 commented on August 14, 2024

Hi Yuji,

Yes that is it. I've got a work-around but it makes the grammars look a little odd. No matter as I am quite happy to wait and see how things develop.

Personally I much prefer the (apparent) generality of #16 as you can (I think) generate almost any kind of tree as a result.

from cpp-peglib.

yhirose avatar yhirose commented on August 14, 2024

Combine #15 and #16 here.

from cpp-peglib.

yhirose avatar yhirose commented on August 14, 2024

https://theantlrguy.atlassian.net/wiki/spaces/~admin/blog/2012/12/08/524353/Tree+rewriting+in+ANTLR+v4

from cpp-peglib.

yhirose avatar yhirose commented on August 14, 2024

Close it for now.

from cpp-peglib.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.