Comments (6)

mqnc commented on September 17, 2024

Thanks, I really appreciate your support and how quickly you implement feature requests!
I haven't tested the capture scopes yet, but your tests look good. I think you can close the issue, and if I find something, I can reopen it.

I'm making progress with my project, you can see it here if you are interested:
https://github.com/mqnc/c-sick

from cpp-peglib.

yhirose commented on September 17, 2024

@mqnc, thank you for the feedback. Since the CustomDelimiter is determined at run-time, the C++11 raw string literal cannot be handled with just the original PEG grammar. In order to support it, we need something like the regex capturing/backreferences feature.

I have just implemented the capturing/backreferences feature with this pull request. It can handle the C++11 raw string literal. As you can see, we can now capture text with $name<rules> and back-reference it later with $name.
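
To make the idea concrete without the library, here is a minimal hand-written sketch of the same capture/back-reference logic for a raw string literal. The function name raw_string_body is hypothetical and not part of cpp-peglib; it assumes, like the grammar in the test cases, that the delimiter consists only of letters, and it ignores any input after the closing quote.

```cpp
#include <cassert>
#include <cctype>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helper (not part of cpp-peglib): returns the body of a C++11
// raw string literal such as R"foo(...)foo", or nullopt if it is malformed.
// Assumption: the delimiter is restricted to [a-zA-Z]*, as in the grammar.
std::optional<std::string> raw_string_body(std::string_view src) {
    // Opening part: R" delimiter (
    if (src.substr(0, 2) != "R\"") return std::nullopt;
    std::size_t pos = 2;
    while (pos < src.size() &&
           std::isalpha(static_cast<unsigned char>(src[pos]))) {
        ++pos;
    }
    if (pos >= src.size() || src[pos] != '(') return std::nullopt;

    // "Capture": the delimiter is only known once the input has been read.
    const std::string delim(src.substr(2, pos - 2));
    const std::size_t body_begin = pos + 1;

    // "Back-reference": the closing sequence is built from the captured text.
    const std::string closing = ")" + delim + "\"";
    const std::size_t body_end = src.find(closing, body_begin);
    if (body_end == std::string_view::npos) return std::nullopt;

    assert(body_end >= body_begin);  // find() never returns an earlier index
    return std::string(src.substr(body_begin, body_end - body_begin));
}
```

The closing sequence cannot be written down ahead of time because it depends on text read from the input, which is why a fixed grammar without captures cannot express it.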

Here are the test cases for your reference.

TEST_CASE("Back reference test", "[back reference]")
{
    parser parser(R"(
        START  <- _ LQUOTE < (!RQUOTE .)* > RQUOTE _
        LQUOTE <- 'R"' $delm< [a-zA-Z]* > '('
        RQUOTE <- ')' $delm '"'
        ~_     <- [ \t\r\n]*
    )");

    std::string token;
    parser["START"] = [&](const SemanticValues& sv) {
        token = sv.token();
    };

    {
        token.clear();
        auto ret = parser.parse(R"delm(
            R"("hello world")"
        )delm");

        REQUIRE(ret == true);
        REQUIRE(token == "\"hello world\"");
    }

    {
        token.clear();
        auto ret = parser.parse(R"delm(
            R"foo("(hello world)")foo"
        )delm");

        REQUIRE(ret == true);
        REQUIRE(token == "\"(hello world)\"");
    }

    {
        token.clear();
        auto ret = parser.parse(R"delm(
            R"foo("(hello world)foo")foo"
        )delm");

        REQUIRE(ret == false);
        REQUIRE(token == "\"(hello world");
    }

    {
        token.clear();
        auto ret = parser.parse(R"delm(
            R"foo("(hello world)")bar"
        )delm");

        REQUIRE(ret == false);
        REQUIRE(token.empty());
    }
}

TEST_CASE("Invalid back reference test", "[back reference]")
{
    parser parser(R"(
        START  <- _ LQUOTE (!RQUOTE .)* RQUOTE _
        LQUOTE <- 'R"' $delm< [a-zA-Z]* > '('
        RQUOTE <- ')' $delm2 '"'
        ~_     <- [ \t\r\n]*
    )");

    REQUIRE_THROWS_AS(
        parser.parse(R"delm(
            R"foo("(hello world)")foo"
        )delm"),
        std::runtime_error);
}

mqnc commented on September 17, 2024

Oh, thank you! I will check it out.

mqnc commented on September 17, 2024

I played around with it a bit. It works great with C++ multiline string literals, which is what I need it for.
However, it does not support recursion, right? As far as I understood, you write the captures into a map and if you encounter one of the captured strings in a back reference, it matches, right?
So you cannot check correct HTML with this, right?

<b>bold<u>and underlined</u></b>
<b>bold<u>and underlined</b></u>

Except if you create different rules for different nesting levels I suppose...

(I insert all these "right?" not to make fun of you but because I am new to the whole parsing business, this is highly complicated for me and I want to convey my insecurity)

I think there might be another problem: if you set a capture in a branch that then fails to match, the parser backtracks to before the branch started, and a later backreference to that capture still matches even though the capture came from the failed branch. Let me illustrate:

parser parser(R"(
    TREE           <- WRONG_BRANCH / CORRECT_BRANCH
    WRONG_BRANCH   <- BRANCH THAT IS_capture WRONG
    CORRECT_BRANCH <- BRANCH THAT IS_backref CORRECT
    BRANCH         <- 'branch'
    THAT           <- 'that'
    IS_capture     <- $ref<..>
    IS_backref     <- $ref
    WRONG          <- 'wrong'
    CORRECT        <- 'correct'
)");

This incorrectly matches parser.parse(R"(branchthatiscorrect)", val);

I think this is not a problem for me, since C++ string literals are not nested. But it could lead to unexpected behavior for other people.

EDIT:

Maybe a way to deal with this would be to allow backreferences only inside the rule where they were captured and store the capture in that rule. This would break your test, as you would have to place it all into a single rule, but maybe that's the only way. I don't know.

yhirose commented on September 17, 2024

You are completely right! I'll try to fix the problems by introducing a 'capture scope' concept. I'll get back to you when I finish the implementation.

yhirose commented on September 17, 2024

The backreference feature now supports recursion.

TEST_CASE("Nested capture test", "[backreference]")
{
    parser parser(R"(
        ROOT      <- CONTENT
        CONTENT   <- (ELEMENT / TEXT)*
        ELEMENT   <- $(STAG CONTENT ETAG)
        STAG      <- '<' $tag< TAG_NAME > '>'
        ETAG      <- '</' $tag '>'
        TAG_NAME  <- 'b' / 'u'
        TEXT      <- TEXT_DATA
        TEXT_DATA <- ![<] .
    )");

    REQUIRE(parser.parse("This is <b>a <u>test</u> text</b>."));
    REQUIRE(!parser.parse("This is <b>a <u>test</b> text</u>."));
    REQUIRE(!parser.parse("This is <b>a <u>test text</b>."));
}

As you can see above, $( ... ) creates a new capture scope.
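
One way to picture a capture scope is as an entry on a stack: each opened element gets its own map of captures, and a back-reference only looks at the innermost one. The sketch below is a library-independent illustration under that assumption; CaptureScope and well_nested are hypothetical names, this is not how cpp-peglib is necessarily implemented, and unlike the grammar above it accepts any tag name rather than only 'b' and 'u'.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical illustration of capture scopes; not cpp-peglib's internals.
using CaptureScope = std::map<std::string, std::string>;

// Returns true if every </name> closes the most recently opened <name>.
bool well_nested(std::string_view input) {
    std::vector<CaptureScope> scopes;  // innermost scope is at the back
    std::size_t pos = 0;
    while (pos < input.size()) {
        if (input[pos] != '<') {  // plain text character
            ++pos;
            continue;
        }
        const std::size_t close = input.find('>', pos);
        if (close == std::string_view::npos) return false;
        assert(close > pos);  // input[pos] is '<', so '>' lies after it

        const bool is_end_tag = pos + 1 < close && input[pos + 1] == '/';
        if (is_end_tag) {
            const std::string name(input.substr(pos + 2, close - pos - 2));
            // Back-reference: compare against the capture stored in the
            // innermost scope only, then leave that scope.
            if (scopes.empty() || scopes.back().at("tag") != name) return false;
            scopes.pop_back();
        } else {
            // Start tag: enter a new scope and capture the tag name in it.
            CaptureScope scope;
            scope["tag"] = std::string(input.substr(pos + 1, close - pos - 1));
            scopes.push_back(std::move(scope));
        }
        pos = close + 1;
    }
    return scopes.empty();  // every opened element must have been closed
}
```

With a single flat map, the inner <u> capture would overwrite the outer <b> capture, so the final </b> could no longer be checked. Keeping one map per nesting level avoids that.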

I also fixed the problem with the prioritized choice that you mentioned. If a branch fails to match, all the captures in the branch will be cleared out.

TEST_CASE("Backreference with Prioritized Choice test", "[backreference]")
{
    parser parser(R"(
        TREE           <- WRONG_BRANCH / CORRECT_BRANCH
        WRONG_BRANCH   <- BRANCH THAT IS_capture WRONG
        CORRECT_BRANCH <- BRANCH THAT IS_backref CORRECT
        BRANCH         <- 'branch'
        THAT           <- 'that'
        IS_capture     <- $ref<..>
        IS_backref     <- $ref
        WRONG          <- 'wrong'
        CORRECT        <- 'correct'
    )");

    REQUIRE_THROWS_AS(parser.parse("branchthatiscorrect"), std::runtime_error);
}
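
The effect of clearing captures on a failed branch can be reproduced with a few hand-written combinators. This is only a sketch under stated assumptions: State, Rule, lit, capture, backref, seq, choice and parse_tree are all hypothetical names, not cpp-peglib APIs, and an undefined back-reference simply fails here instead of throwing std::runtime_error as in the test above. The restore_captures flag switches between the old behavior and the fixed one.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical mini-matcher; not cpp-peglib's implementation.
struct State {
    std::string_view input;
    std::size_t pos = 0;
    std::map<std::string, std::string> captures;
};
using Rule = std::function<bool(State&)>;

// Matches a fixed piece of text.
Rule lit(std::string text) {
    return [text](State& s) {
        assert(s.pos <= s.input.size());
        if (s.input.substr(s.pos, text.size()) != text) return false;
        s.pos += text.size();
        return true;
    };
}

// Consumes the next n characters and stores them under `name`.
Rule capture(std::string name, std::size_t n) {
    return [name, n](State& s) {
        if (s.input.size() - s.pos < n) return false;
        s.captures[name] = std::string(s.input.substr(s.pos, n));
        s.pos += n;
        return true;
    };
}

// Matches the text stored under `name`; fails if nothing was captured.
Rule backref(std::string name) {
    return [name](State& s) {
        const auto it = s.captures.find(name);
        if (it == s.captures.end()) return false;
        return lit(it->second)(s);
    };
}

// Matches all rules one after another.
Rule seq(std::vector<Rule> rules) {
    return [rules](State& s) {
        for (const auto& rule : rules) {
            if (!rule(s)) return false;
        }
        return true;
    };
}

// Prioritized choice. A failed alternative always resets the position;
// it resets the captures only when restore_captures is true.
Rule choice(std::vector<Rule> alternatives, bool restore_captures = true) {
    return [alternatives, restore_captures](State& s) {
        const std::size_t saved_pos = s.pos;
        const auto saved_captures = s.captures;
        for (const auto& alternative : alternatives) {
            if (alternative(s)) return true;
            s.pos = saved_pos;
            if (restore_captures) s.captures = saved_captures;
        }
        return false;
    };
}

// Mirrors the TREE grammar: WRONG_BRANCH / CORRECT_BRANCH.
bool parse_tree(std::string_view input, bool restore_captures) {
    const Rule wrong = seq({lit("branch"), lit("that"), capture("ref", 2), lit("wrong")});
    const Rule correct = seq({lit("branch"), lit("that"), backref("ref"), lit("correct")});
    State s{input};
    return choice({wrong, correct}, restore_captures)(s) && s.pos == input.size();
}
```

Calling parse_tree("branchthatiscorrect", false) accepts the input because the capture from the failed first branch leaks into the second one, while passing true rejects it.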

If you find any other potential problems, please let me know, otherwise I'll close this issue.
Thanks for your help!
