Code Monkey home page Code Monkey logo

Comments (10)

skyrich62 avatar skyrich62 commented on June 5, 2024 2

Hi,
In the case of (alice, bob, carol) vs. (alice, bob, dan), you should re-factor your grammar such that prefix to rules are separated like this:

struct prefix : seq<alice, bob> { };
struct group: seq< prefix, sor< carol, dan> > { };

This way, backtracking is eliminated. It is also possible to use the at<> rule to do infinite look-ahead, which doesn't invoke actions.

from pegtl.

skyrich62 avatar skyrich62 commented on June 5, 2024 1

It seems that "star<Sep, Rule>" is insufficient for the expected purposes. Perhaps a bit of look-ahead could be used to avoid calling the action of Sep, if not followed by Rule?

star< at<Sep, Rule>, Sep, Rule > ??

I don't know if it's the best way, but it does work, (at least in this case) consider:

#include <tao/pegtl.hpp>
using namespace tao::pegtl;

template <typename Rule, typename Sep>
using list_tail2 = seq< Rule, star< at< Sep, Rule>, Sep, Rule>, opt< Sep > >;

struct path_sep : one<'/'> {};
struct path : list_tail2<identifier, path_sep> {};

template <class Rule>
struct path_action : nothing<Rule> {};

template <>
struct path_action<path> {
    template <class Input>
    static void apply(Input const &in) {
        printf("  - path: '%s' @ %s\n", in.string().c_str(), to_string(in.position()).c_str());
    }
};

template <>
struct path_action<path_sep> {
    template <class Input>
    static void apply(Input const &in) {
        printf("  - separator: '%s' @ %s\n", in.string().c_str(), to_string(in.position()).c_str());
    }
};

void test() {
    for (std::string test_str: {"no/trailing", "yes/trailing/"}) {
        printf("parse '%s' {\n", test_str.c_str());
        parse<path, path_action>(string_input<> { test_str, "<code>" });
        printf("}\n");
    }
}

int main()
{
    test();
    return 0;
}
parse 'no/trailing' {
  - separator: '/' @ <code>:1:3
  - path: 'no/trailing' @ <code>:1:1
}
parse 'yes/trailing/' {
  - separator: '/' @ <code>:1:4
  - separator: '/' @ <code>:1:13
  - path: 'yes/trailing/' @ <code>:1:1
}

--Rich

from pegtl.

ColinH avatar ColinH commented on June 5, 2024 1

@gitamohr Deferring all actions until after a successful parsing run is something that I had looked into briefly as it could be useful while developing a grammar or for debugging, but it's not quite as easy as it appears at first.

from pegtl.

ColinH avatar ColinH commented on June 5, 2024 1

This is now fixed on the main branch. It required a new form of star called star_partial which is now used inside of list_tail. There might be some related-but-different future changes to list and rep_opt that will be discussed elsewhere.

from pegtl.

ColinH avatar ColinH commented on June 5, 2024

The list_tail rule is implemented as

template< typename Rule, typename Sep >
using list_tail = seq< Rule, star< Sep, Rule >, opt< Sep > >;

so it's clear why the action for a trailing separator is called twice.

(Once when Sep matches during the final attempt of the implicit seq< Sep, Rule > inside of star, and once in opt< Sep > after backtracking to before the Sep in the seq.)

@d-frey I suppose we should consider this as undesired behaviour and therefore a bug...

from pegtl.

gitamohr avatar gitamohr commented on June 5, 2024

Thank you! Your explanation makes sense, but I'm curious how users handle this in general. Say I am using actions to build up a data structure while parsing. How would I know to "undo" actions that were invoked as part of a match that was later back-tracked? I tried an example with the following grammar:

template <char c> struct person : one<c> {};

struct alice : person<'a'> {};
struct bob   : person<'b'> {};
struct carol : person<'c'> {};
struct dan   : person<'d'> {};

struct carol_tail : seq<alice, bob, carol> {};
struct dan_tail   : seq<alice, bob, dan> {};
struct group      : sor<carol_tail, dan_tail> {};

And as you described, if I parse the group "abd", the actions invoked are alice, bob, alice, bob, dan. How would I know that the first "alice" and "bob" actions are moot?

Reading a bit more about PEG, it seems like some options are:

  1. Invoke actions only when parsing is complete and the final parse is known; this way no backtracking can occur.
  2. Invoke actions even for possible matches that might be backtracked. This requires that actions be totally idempotent (no counting occurrences, for example!)
  3. Fancier memoziation/caching of action results, packrat-parser style.

PEGTL seems to be number 2. But I would ask that you consider adding an optional mode like number 1. That provides a lot of freedom for action writers to build data structures (e.g. vector::push_back, or count occurrences, etc). It would have a real cost I'm sure, since the full parse tree would need to be stored in order to do a final action invocation traversal. But for small simple expression languages, trading that cost for the freedom to count or otherwise mutate state in actions could be a real win.

from pegtl.

ColinH avatar ColinH commented on June 5, 2024

What @skyrich62 said is what we usually go for, factor out common prefixes to remove, or at least reduce, the problem.

That said, in the case of list_tail it's still a bug because the double invocation is a side-effect of how the rule is implemented, and not the documented behaviour that a user should expect.

Looking through the PEGTL rule definitions I found a few other rules that might need to be treated for the same problem:

list.hpp:   using list = seq< Rule, star< Sep, Rule > >;  // Is it ok to call an action for Sep when it's not followed by Rule?
list_tail.hpp:   using list_tail = seq< Rule, star< Sep, Rule >, opt< Sep > >;  // This issue.
list_tail_pad.hpp:   using list_tail_pad = seq< Rule, star< pad< Sep, Pad >, Rule >, opt< star< Pad >, Sep > >;  // Same as list_tail<>.
rep_min.hpp:   using rep_min = seq< rep< Min, Rule, Rules... >, star< Rule, Rules... > >;  // Same as list<>

from pegtl.

gitamohr avatar gitamohr commented on June 5, 2024

Ah I see, thank you both!

from pegtl.

gitamohr avatar gitamohr commented on June 5, 2024

Thanks again @skyrich62 and @ColinH ! Here's a probably naive question if you'll indulge me. It seems like something that works like at<whole_grammar> could in principle build a list of the actions to apply as it parses, without applying them, deleting those for rules that are backtracked. Then they could be applied once at the end, without invoking any actions for rules that were backtracked. Does that make sense? Is something like that possible today in PEGTL?

Writing the grammar in a way that minimizes backtracking would still be important, and there would be overhead associated with building the list. But for smaller languages & inputs it seems appealing, at least from my green perspective. Cheers!

from pegtl.

gitamohr avatar gitamohr commented on June 5, 2024

Cool, thank you!

from pegtl.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.