Comments (6)
Thanks, I really appreciate your support and how quickly you implement feature requests!
I didn't test the capture scopes yet. But your tests look good. I think you can close the issue and if I find something, I can reopen it.
I'm making progress with my project, you can see it here if you are interested:
https://github.com/mqnc/c-sick
from cpp-peglib.
@mqnc, thank you for the feedback. Since the CustomDelimiter is determined at run-time, the C++11 raw string literal cannot be handled with just the original PEG grammar. In order to support it, we need something like the regex capturing/backreferences feature.
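To illustrate why the delimiter has to be captured while parsing, here is a minimal hand-written sketch that uses only the standard library. It is not cpp-peglib code: `parse_raw_string` is a hypothetical helper, and it restricts the delimiter to letters, the same simplification as the `[a-zA-Z]*` used in the test grammar.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper, not part of cpp-peglib: parses one C++11 raw string
// literal at the start of `in`. On success it stores the literal's body in
// `body` and returns true.
bool parse_raw_string(const std::string& in, std::string& body) {
    // Opening part: R" delimiter (
    if (in.compare(0, 2, "R\"") != 0) return false;
    std::size_t open = in.find('(', 2);
    if (open == std::string::npos) return false;

    // "Capture": the delimiter is only known once the input is being read.
    std::string delim = in.substr(2, open - 2);
    for (char c : delim) {
        // Assumption: letters only, to mirror [a-zA-Z]* in the test grammar.
        if (!std::isalpha(static_cast<unsigned char>(c))) return false;
    }

    // "Back-reference": the closing sequence is built from the captured text.
    std::string close = ")" + delim + "\"";
    std::size_t end = in.find(close, open + 1);
    if (end == std::string::npos) return false;

    body = in.substr(open + 1, end - open - 1);
    return true;
}
```

The closing sequence depends on text read earlier in the same input, which is exactly what a fixed grammar rule cannot express without some form of capture.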
I have just implemented the capturing/backreferences feature with this pull request. It can handle the C++11 raw string literal. As you see, we can now capture text with $name<rules> and back-reference it with $name later.
Here are the test cases for your reference.
TEST_CASE("Back reference test", "[back reference]")
{
    parser parser(R"(
        START <- _ LQUOTE < (!RQUOTE .)* > RQUOTE _
        LQUOTE <- 'R"' $delm< [a-zA-Z]* > '('
        RQUOTE <- ')' $delm '"'
        ~_ <- [ \t\r\n]*
    )");
    std::string token;
    parser["START"] = [&](const SemanticValues& sv) {
        token = sv.token();
    };
    {
        token.clear();
        auto ret = parser.parse(R"delm(
            R"("hello world")"
        )delm");
        REQUIRE(ret == true);
        REQUIRE(token == "\"hello world\"");
    }
    {
        token.clear();
        auto ret = parser.parse(R"delm(
            R"foo("(hello world)")foo"
        )delm");
        REQUIRE(ret == true);
        REQUIRE(token == "\"(hello world)\"");
    }
    {
        token.clear();
        auto ret = parser.parse(R"delm(
            R"foo("(hello world)foo")foo"
        )delm");
        REQUIRE(ret == false);
        REQUIRE(token == "\"(hello world");
    }
    {
        token.clear();
        auto ret = parser.parse(R"delm(
            R"foo("(hello world)")bar"
        )delm");
        REQUIRE(ret == false);
        REQUIRE(token.empty());
    }
}
TEST_CASE("Invalid back reference test", "[back reference]")
{
    parser parser(R"(
        START <- _ LQUOTE (!RQUOTE .)* RQUOTE _
        LQUOTE <- 'R"' $delm< [a-zA-Z]* > '('
        RQUOTE <- ')' $delm2 '"'
        ~_ <- [ \t\r\n]*
    )");
    REQUIRE_THROWS_AS(
        parser.parse(R"delm(
            R"foo("(hello world)")foo"
        )delm"),
        std::runtime_error);
}
Oh, thank you! I will check it out.
I played around with it a bit. It works great with C++ multiline string literals, which is what I need it for.
However, it does not support recursion, right? As far as I understood, you write the captures into a map and if you encounter one of the captured strings in a back reference, it matches, right?
So you cannot check correct HTML with this, right?
<b>bold<u>and underlined</u></b>
<b>bold<u>and underlined</b></u>
Except if you create different rules for different nesting levels I suppose...
(I insert all these "right?" not to make fun of you but because I am new to the whole parsing business, this is highly complicated for me and I want to convey my insecurity)
I think there might be another problem: if a capture is set inside a branch that later fails to match, the parser backtracks to before that branch, but the capture stays behind. A backreference encountered afterwards then incorrectly matches the text captured in the failed branch. Let me illustrate:
parser parser(R"(
    TREE <- WRONG_BRANCH / CORRECT_BRANCH
    WRONG_BRANCH <- BRANCH THAT IS_capture WRONG
    CORRECT_BRANCH <- BRANCH THAT IS_backref CORRECT
    BRANCH <- 'branch'
    THAT <- 'that'
    IS_capture <- $ref<..>
    IS_backref <- $ref
    WRONG <- 'wrong'
    CORRECT <- 'correct'
)");
With this grammar, parser.parse(R"(branchthatiscorrect)", val); incorrectly succeeds.
So I think this is not a problem for me since C++ string literals are not nested. But it may lead to unexpected behavior for other users.
EDIT:
Maybe a way to deal with this would be to allow backreferences only inside the rule where they were captured and store the capture in that rule. This would break your test, as you would have to place it all into a single rule, but maybe that's the only way. I don't know.
You are completely right! I'll try to fix the problems by introducing 'capture scope' concept. I'll get back to you when I finish the implementation.
The backreference feature now supports recursion.
TEST_CASE("Nested capture test", "[backreference]")
{
    parser parser(R"(
        ROOT <- CONTENT
        CONTENT <- (ELEMENT / TEXT)*
        ELEMENT <- $(STAG CONTENT ETAG)
        STAG <- '<' $tag< TAG_NAME > '>'
        ETAG <- '</' $tag '>'
        TAG_NAME <- 'b' / 'u'
        TEXT <- TEXT_DATA
        TEXT_DATA <- ![<] .
    )");
    REQUIRE(parser.parse("This is <b>a <u>test</u> text</b>."));
    REQUIRE(!parser.parse("This is <b>a <u>test</b> text</u>."));
    REQUIRE(!parser.parse("This is <b>a <u>test text</b>."));
}
As you can see above, $( ... ) creates a new capture scope.
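One way to picture a capture scope is as a stack of capture maps, with one map pushed for each active $( ... ). The following is a hand-written sketch of the grammar above under that assumption. It is not how cpp-peglib is implemented, and `TagMatcher` and `well_nested` are hypothetical names.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical hand-written matcher for the nested-tag grammar; not cpp-peglib code.
struct TagMatcher {
    typedef std::map<std::string, std::string> Scope;

    std::string in;
    std::size_t pos;
    std::vector<Scope> scopes;  // innermost scope is at the back

    explicit TagMatcher(std::string s) : in(std::move(s)), pos(0), scopes(1) {}

    bool literal(const std::string& s) {
        if (in.compare(pos, s.size(), s) != 0) return false;
        pos += s.size();
        return true;
    }

    // STAG <- '<' $tag< TAG_NAME > '>'
    bool stag() {
        if (!literal("<")) return false;
        if (pos >= in.size() || (in[pos] != 'b' && in[pos] != 'u')) return false;
        scopes.back()["tag"] = std::string(1, in[pos]);  // capture into the current scope
        ++pos;
        return literal(">");
    }

    // ETAG <- '</' $tag '>'
    bool etag() {
        Scope::const_iterator it = scopes.back().find("tag");
        if (it == scopes.back().end()) return false;
        return literal("</") && literal(it->second) && literal(">");
    }

    // ELEMENT <- $(STAG CONTENT ETAG): $( ... ) becomes a push/pop of a scope.
    bool element() {
        std::size_t start = pos;
        scopes.push_back(Scope());
        bool ok = stag();
        if (ok) {
            content();
            ok = etag();
        }
        scopes.pop_back();
        if (!ok) pos = start;  // backtrack on failure
        return ok;
    }

    // CONTENT <- (ELEMENT / TEXT)*
    void content() {
        while (pos < in.size()) {
            if (element()) continue;
            if (in[pos] == '<') break;  // TEXT_DATA <- ![<] .
            ++pos;
        }
    }

    // ROOT <- CONTENT, and the whole input must be consumed.
    bool parse() {
        content();
        return pos == in.size();
    }
};

bool well_nested(const std::string& s) { return TagMatcher(s).parse(); }
```

Because every element writes its tag name into its own map, the inner capture of "u" does not overwrite the outer capture of "b". With a single shared map, the closing tag of the outer element would be compared against the inner tag name.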
I also fixed the problem with the prioritized choice that you mentioned. If a branch fails to match, all the captures in the branch will be cleared out.
TEST_CASE("Backreference with Prioritized Choice test", "[backreference]")
{
    parser parser(R"(
        TREE <- WRONG_BRANCH / CORRECT_BRANCH
        WRONG_BRANCH <- BRANCH THAT IS_capture WRONG
        CORRECT_BRANCH <- BRANCH THAT IS_backref CORRECT
        BRANCH <- 'branch'
        THAT <- 'that'
        IS_capture <- $ref<..>
        IS_backref <- $ref
        WRONG <- 'wrong'
        CORRECT <- 'correct'
    )");
    REQUIRE_THROWS_AS(parser.parse("branchthatiscorrect"), std::runtime_error);
}
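The effect of clearing captures from a failed branch can be sketched with a small hand-written matcher for the TREE grammar. This is an illustration under assumptions, not the library's implementation: `ChoiceMatcher` and its `rollback` flag are hypothetical, and the flag exists only so that both behaviors can be compared side by side.

```cpp
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical matcher for the TREE grammar; not cpp-peglib code.
struct ChoiceMatcher {
    std::string in;
    std::size_t pos;
    std::map<std::string, std::string> captures;
    bool rollback;  // discard captures made by a failed branch?

    ChoiceMatcher(std::string s, bool rb)
        : in(std::move(s)), pos(0), rollback(rb) {}

    bool literal(const std::string& s) {
        if (in.compare(pos, s.size(), s) != 0) return false;
        pos += s.size();
        return true;
    }

    // IS_capture <- $ref<..>
    bool capture_two() {
        if (in.size() - pos < 2) return false;
        captures["ref"] = in.substr(pos, 2);
        pos += 2;
        return true;
    }

    // IS_backref <- $ref; an undefined name is reported as an error,
    // mirroring the REQUIRE_THROWS_AS expectation in the test above.
    bool backref() {
        std::map<std::string, std::string>::const_iterator it = captures.find("ref");
        if (it == captures.end()) throw std::runtime_error("undefined back reference: ref");
        return literal(it->second);
    }

    bool wrong_branch() {
        return literal("branch") && literal("that") && capture_two() && literal("wrong");
    }

    bool correct_branch() {
        return literal("branch") && literal("that") && backref() && literal("correct");
    }

    // TREE <- WRONG_BRANCH / CORRECT_BRANCH, and the whole input must be consumed.
    bool tree() {
        std::size_t start = pos;
        std::map<std::string, std::string> saved = captures;  // snapshot before the branch
        if (wrong_branch()) return pos == in.size();
        pos = start;                     // ordinary PEG backtracking
        if (rollback) captures = saved;  // also drop captures from the failed branch
        return correct_branch() && pos == in.size();
    }
};
```

With `rollback` set to false, the stale capture "is" from the failed branch lets "branchthatiscorrect" match. With `rollback` set to true, the back reference is undefined when the second branch runs and the sketch throws std::runtime_error.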
If you find any other potential problems, please let me know, otherwise I'll close this issue.
Thanks for your help!