
pegtl's Introduction

Welcome to the PEGTL


The Parsing Expression Grammar Template Library (PEGTL) is a zero-dependency C++ header-only parser combinator library for creating parsers according to a Parsing Expression Grammar (PEG).

During development of a new major version, the main branch can go through incompatible changes. For a stable experience, please download the latest release rather than using the main branch.

Documentation

Contact

For questions and suggestions regarding the PEGTL, success or failure stories, and any other kind of feedback, please feel free to open a discussion, an issue or a pull request, or contact the authors at taocpp(at)icemx.net.

Introduction

Grammars are written as regular C++ code, created with template programming (not template meta programming), i.e. nested template instantiations that naturally correspond to the inductive definition of PEGs (and other parser-combinator approaches).

A comprehensive set of parser rules that can be combined and extended by the user is included, as are mechanisms for debugging grammars and for attaching user-defined actions to grammar rules. Here is an example of how a parsing expression grammar rule is implemented as a C++ class with the PEGTL.

// PEG rule for integers consisting of a non-empty
// sequence of digits with an optional sign:

// sign ::= '+' / '-'
// integer ::= sign? digit+

// The same parsing rule implemented with the PEGTL:

#include <tao/pegtl.hpp>

using namespace tao::pegtl;

struct sign : one< '+', '-' > {};
struct integer : seq< opt< sign >, plus< digit > > {};
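To illustrate how nested template instantiations can act as a grammar, here is a toy, self-contained re-implementation of just the rules used above. It is a sketch under simplifying assumptions (the input is a std::string, and every rule exposes a static match( in, pos ) that advances pos on success) and is not how the PEGTL itself is implemented; the library's real one<>, opt<>, plus<>, seq<> and digit take different template and function parameters.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <initializer_list>
#include <string>

// Toy stand-ins for one<>, digit, opt<>, plus<> and seq<>: each rule is a class
// with a static match() that advances pos on success and leaves it unchanged on failure.
template< char... Cs >
struct one {
   static bool match( const std::string& in, std::size_t& pos ) {
      if( pos < in.size() ) {
         for( char c : { Cs... } ) {
            if( in[ pos ] == c ) { ++pos; return true; }
         }
      }
      return false;
   }
};

struct digit {
   static bool match( const std::string& in, std::size_t& pos ) {
      if( pos < in.size() && std::isdigit( static_cast< unsigned char >( in[ pos ] ) ) ) { ++pos; return true; }
      return false;
   }
};

template< typename R >
struct opt {
   static bool match( const std::string& in, std::size_t& pos ) {
      R::match( in, pos );  // a failed R leaves pos unchanged, and opt succeeds anyway
      return true;
   }
};

template< typename R >
struct plus {
   static bool match( const std::string& in, std::size_t& pos ) {
      if( !R::match( in, pos ) ) return false;  // at least one repetition
      while( R::match( in, pos ) ) {}           // then as many as possible (R must not match empty)
      return true;
   }
};

template< typename... Rs >
struct seq {
   static bool match( const std::string& in, std::size_t& pos ) {
      const std::size_t start = pos;
      // C++17 fold: evaluates sub-rules left to right, stopping at the first failure.
      const bool ok = ( Rs::match( in, pos ) && ... );
      if( !ok ) pos = start;  // rewind so a failed seq consumes nothing
      return ok;
   }
};

// The grammar from above, unchanged in shape.
struct sign : one< '+', '-' > {};
struct integer : seq< opt< sign >, plus< digit > > {};

// Hypothetical helper: true only if Rule matches the entire input.
template< typename Rule >
bool matches_fully( const std::string& s ) {
   std::size_t pos = 0;
   return Rule::match( s, pos ) && pos == s.size();
}
```

matches_fully additionally requires the whole input to be consumed; a PEG rule on its own only needs to match a prefix, so integer alone would accept the "12" at the start of "12a".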

PEGs are superficially similar to Context-Free Grammars (CFGs); however, the more deterministic nature of PEGs gives rise to some very important differences. The included grammar analysis finds several typical errors in PEGs, including left recursion.
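The difference from CFGs shows up most clearly in ordered choice. In the CFG A -> "a" | "ab", the input "ab" is accepted, but a PEG choice commits to the first alternative that succeeds and never revisits it, so the order of alternatives changes the language. The toy sketch below demonstrates this with hypothetical str<>, sor<>, seq<> and eof rules that use a simplified match( in, pos ) interface; they are not the PEGTL's own rules.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Toy literal: matches the characters Cs... in order, consuming nothing on failure.
template< char... Cs >
struct str {
   static bool match( const std::string& in, std::size_t& pos ) {
      const std::string lit{ Cs... };
      if( in.compare( pos, lit.size(), lit ) == 0 ) { pos += lit.size(); return true; }
      return false;
   }
};

// Toy ordered choice: tries alternatives left to right and commits to the first success.
template< typename... Rs >
struct sor {
   static bool match( const std::string& in, std::size_t& pos ) {
      return ( Rs::match( in, pos ) || ... );
   }
};

// Toy sequence with rewind on failure.
template< typename... Rs >
struct seq {
   static bool match( const std::string& in, std::size_t& pos ) {
      const std::size_t start = pos;
      const bool ok = ( Rs::match( in, pos ) && ... );
      if( !ok ) pos = start;
      return ok;
   }
};

// Succeeds only at the end of the input.
struct eof {
   static bool match( const std::string& in, std::size_t& pos ) { return pos == in.size(); }
};

// sor commits to "a", eof then fails on "ab", and "ab" is never retried.
struct short_first : seq< sor< str< 'a' >, str< 'a', 'b' > >, eof > {};
// Putting the longer alternative first accepts both inputs.
struct long_first : seq< sor< str< 'a', 'b' >, str< 'a' > >, eof > {};

template< typename Rule >
bool accepts( const std::string& s ) {
   std::size_t pos = 0;
   return Rule::match( s, pos );
}
```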

Design

The PEGTL is designed to be "lean and mean": the core library consists of approximately 6000 lines of code. Emphasis is on simplicity and efficiency, preferring a well-tuned simple approach over complicated optimisations.

The PEGTL is mostly concerned with parsing combinators and grammar rules, and with giving the user of the library (the possibility of) full control over all other aspects of a parsing run. Whether/which actions are taken, and whether/which data structures are created during a parsing run, is entirely up to the user.

Included are some examples for typical situations such as unescaping escape sequences in strings, building a generic JSON data structure, and on-the-fly evaluation of arithmetic expressions.

Through the use of template programming and template specialisations it is possible to write a grammar once, and use it in multiple ways with different (semantic) actions in different (or the same) parsing runs.
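The idea of writing a grammar once and attaching different actions can be sketched with an action class template that is specialised per rule and passed to the parse function as a template template parameter. In this toy version, all names (nothing, number, parse, collect, sum, parse_list) are illustrative and only mirror the general pattern; the PEGTL's actual action interface and parse() signature differ.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Default action: does nothing for any rule.
template< typename Rule >
struct nothing {
   template< typename... States >
   static void apply( const std::string&, States&... ) {}
};

// Toy rule: one or more digits.
struct number {
   static bool match( const std::string& in, std::size_t& pos ) {
      const std::size_t start = pos;
      while( pos < in.size() && std::isdigit( static_cast< unsigned char >( in[ pos ] ) ) ) ++pos;
      return pos > start;
   }
};

// Toy driver: match Rule, then call Action< Rule >::apply with the matched text.
template< typename Rule, template< typename > class Action, typename... States >
bool parse( const std::string& in, std::size_t& pos, States&... st ) {
   const std::size_t start = pos;
   if( !Rule::match( in, pos ) ) return false;
   Action< Rule >::apply( in.substr( start, pos - start ), st... );
   return true;
}

// Toy grammar  number ( ',' number )*  written once, usable with any action set.
template< template< typename > class Action, typename... States >
bool parse_list( const std::string& in, States&... st ) {
   std::size_t pos = 0;
   if( !parse< number, Action >( in, pos, st... ) ) return false;
   while( pos < in.size() && in[ pos ] == ',' ) {
      ++pos;
      if( !parse< number, Action >( in, pos, st... ) ) return false;
   }
   return pos == in.size();
}

// Action set 1: collect the matched numbers as strings.
template< typename Rule > struct collect : nothing< Rule > {};
template<> struct collect< number > {
   static void apply( const std::string& m, std::vector< std::string >& out ) { out.push_back( m ); }
};

// Action set 2: sum the numeric values.
template< typename Rule > struct sum : nothing< Rule > {};
template<> struct sum< number > {
   static void apply( const std::string& m, long& total ) { total += std::stol( m ); }
};
```

The grammar (parse_list) never mentions collect or sum; the choice of behaviour is made entirely at the call site.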

With the PEG formalism, the separation into lexer and parser stages is usually dropped -- everything is done in a single grammar. The rules are expressed in C++ as template instantiations, and it is the compiler's task to optimise PEGTL grammars.

Status

Each commit is automatically tested with multiple architectures, operating systems, compilers, and versions thereof.

Each commit is checked with the GCC and Clang sanitizers, Clang's Static Analyzer, and clang-tidy. Additionally, we use CodeQL to scan for (security) issues.

Code coverage is automatically measured and the unit tests cover 100% of the core library code (for releases).

Releases are done in accordance with Semantic Versioning. Incompatible API changes are only allowed to occur between major versions.

Thank You

In appreciation of all contributions, here are the people who have directly contributed to the PEGTL and/or its development.

amphaal anand-bala andoma barbieri bjoe bwagner cdiggins clausklein delpinux dkopecek gene-hightower irrequietus jedelbo joelfrederico johelegp jovermann jubnzv kelvinhammond kneth kuzmas lambdafu lichray michael-brade mkrupcale newproggie obiwahn ohanar pauloscustodio pleroux0 quadfault quarticcat ras0219 redmercury robertcampion samhocevar sanssecours sgbeal skyrich62 studoot svenjo wickedmic wravery zhihaoy

The Art of C++

The PEGTL is part of The Art of C++.

colinh d-frey uilianries

License


Copyright (c) 2007-2023 Daniel Frey and Dr. Colin Hirsch

The PEGTL is certified Open Source software. It is licensed under the terms of the Boost Software License, Version 1.0 reproduced here.

Boost Software License - Version 1.0 - August 17th, 2003

Permission is hereby granted, free of charge, to any person or organization obtaining a copy of the software and accompanying documentation covered by this license (the "Software") to use, reproduce, display, distribute, execute, and transmit the Software, and to prepare derivative works of the Software, and to permit third-parties to whom the Software is furnished to do so, all subject to the following:

The copyright notices in the Software and this entire statement, including the above license grant, this restriction and the following disclaimer, must be included in all copies of the Software, in whole or in part, and all derivative works of the Software, unless such copies or derivative works are solely in the form of machine-executable object code generated by a source language processor.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, TITLE AND NON-INFRINGEMENT. IN NO EVENT SHALL THE COPYRIGHT HOLDERS OR ANYONE DISTRIBUTING THE SOFTWARE BE LIABLE FOR ANY DAMAGES OR OTHER LIABILITY, WHETHER IN CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

pegtl's People

Contributors

baconpaul bjoe colinh cropi d-frey delpinux gitamohr jbomanson jedelbo joelfrederico johelegp kelvinhammond kuzmas lambdafu lichray michael-brade mkrupcale pauloscustodio pleroux0 quadfault ras0219 redmercury robertcampion samhocevar sanssecours striezel studoot uilianries wravery zhihaoy


pegtl's Issues

Need hierarchical action handling

I'm writing a parser for the Verilog language using PEGTL and wanted to share a problem I'm running into and a possible solution. This is my first time using PEG-style parsing, although I am far from a novice at parsing, having done work for my PhD thesis on approximate parsing in a network security context (google "flowsifter").

I can appreciate the simplicity and elegance (and incredible run-time efficiency) of having actions associated with each rule run when that rule is matched; this combines with the control system to make a very powerful way to write code that gets triggered by parsing. But I'm particularly annoyed at how unnecessarily difficult it is to write side-effect heavy code that works correctly without these pieces of code having a hierarchical context to work in.

In the best parsers I've worked with (and written), when part of the grammar matches, it can return a value to the next level up, and that value can be used as part of constructing the result of that higher-level rule's parse result. This ends up resulting in a sequence of function calls that are nested in exactly the same structure as the parse tree of the text being parsed. In PEGTL, I can decide exactly what code runs when a particular rule matches, but I get no nesting of function calls, I only get a flat space of code executions. Of course it's possible to implement that hierarchy by using an explicit stack and pushing and popping from the stack as rules are matched, but this has a high degree of complexity and is prone to user error. The expression parser example in this repository even goes as far as using a stack of stacks to handle parentheses in expressions.

There's got to be a reasonable way to stitch together rules and actions in such a way that actions can return a value and the action for a rule receives/can access values produced by child rules. Maybe this will require each child rule to have an action that returns a value; that seems a reasonable price to pay for this feature.
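The explicit-stack workaround mentioned in the issue can be sketched as follows: flat per-rule actions push child values onto a stack in the shared state, and the action for the enclosing construct pops them and pushes its own result. The parser is a hand-written stand-in for a grammar like sum ::= number ( '+' number )*, and the state type and action names are hypothetical, not PEGTL API.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical state shared by all actions: a value stack that emulates
// "child rules return values to their parent".
struct state {
   std::vector< long > stack;
};

// Flat action for a number: push its value.
void on_number( const std::string& text, state& s ) {
   s.stack.push_back( std::stol( text ) );
}

// Flat action for "lhs + rhs": pop both child values, push the combined result.
void on_add( state& s ) {
   const long rhs = s.stack.back(); s.stack.pop_back();
   const long lhs = s.stack.back(); s.stack.pop_back();
   s.stack.push_back( lhs + rhs );
}

// Toy parser for  sum ::= number ( '+' number )*  that fires the flat actions
// in child-before-parent order.
bool parse_sum( const std::string& in, state& s ) {
   std::size_t pos = 0;
   auto number = [ & ]() {
      const std::size_t start = pos;
      while( pos < in.size() && std::isdigit( static_cast< unsigned char >( in[ pos ] ) ) ) ++pos;
      if( pos == start ) return false;
      on_number( in.substr( start, pos - start ), s );
      return true;
   };
   if( !number() ) return false;
   while( pos < in.size() && in[ pos ] == '+' ) {
      ++pos;
      if( !number() ) return false;
      on_add( s );  // both operands are now on the stack
   }
   return pos == in.size();
}
```

The cost the issue points out is visible even here: the number of pops in on_add has to be kept in sync with the grammar by hand.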

REQ: return error information when parsing fails

When a rule uses must, the exception reveals error information about the position and rule where parsing failed. However, there seems to be no way to get similar information for rules without must. If it doesn't conflict with the current design of the PEGTL, it would be a valuable addition to provide some way to obtain error information whenever the user needs it.
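One common heuristic for reporting errors without exceptions is to record the farthest input position at which any terminal failed, together with what was expected there. The toy sketch below is independent of the PEGTL; error_info, lit and group are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical error record: the farthest offset where a terminal failed,
// and the characters that were expected at that offset.
struct error_info {
   std::size_t farthest = 0;
   std::string expected;
};

// Toy terminal: matches one character C and records failures.
template< char C >
bool lit( const std::string& in, std::size_t& pos, error_info& err ) {
   if( pos < in.size() && in[ pos ] == C ) { ++pos; return true; }
   if( pos > err.farthest ) { err.farthest = pos; err.expected.clear(); }  // new farthest point
   if( pos == err.farthest ) err.expected += C;                            // another alternative there
   return false;
}

// Toy grammar:  '(' ( 'a' / 'b' ) ')'
bool group( const std::string& in, error_info& err ) {
   std::size_t pos = 0;
   return lit< '(' >( in, pos, err )
       && ( lit< 'a' >( in, pos, err ) || lit< 'b' >( in, pos, err ) )
       && lit< ')' >( in, pos, err )
       && pos == in.size();
}
```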

compile issue on OSX

Hi,
I pulled down the code and tried to build it on OS X 10.9.5.
It failed as shown below. I didn't have time to look into it further,
but I thought it might be useful to know.
Regards
Jiri

System details:

[2015.01.19:11.11][Jiri@jirimbp:~/lib/cpp/PEGTL]$  g++ --version
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 6.0 (clang-600.0.56) (based on LLVM 3.5svn)
Target: x86_64-apple-darwin13.4.0
Thread model: posix

It fails with the following message:

c++ -I. -std=c++11 -pedantic -Wall -Wextra -Werror -O3 source/modulus_match.cc -o build/source/modulus_match
In file included from source/modulus_match.cc:4:
In file included from ./pegtl.hh:8:
In file included from ./pegtl/parse.hh:14:
In file included from ./pegtl/internal/rule_match_help.hh:11:
In file included from ./pegtl/internal/rule_match_impl.hh:10:
./pegtl/internal/rule_match_call.hh:17:88: error: 'match' following the 'template' keyword does not
      refer to a template
  ...auto match( Input & in, States && ... st ) -> decltype( Rule::template match< E, Action, Con...
                                                             ~~~~~~         ^
./pegtl/internal/rule_match_impl.hh:43:18: note: in instantiation of template class
      'pegtl::internal::rule_match_call<modulus::my_rule<3, 0>, 1, nothing, normal>' requested here
            if ( rule_match_call< Rule, error_mode::THROW, Action, Control >::match( in, st ...
                 ^
./pegtl/internal/rule_match_help.hh:20:173: note: in instantiation of function template
      specialization 'pegtl::internal::rule_match_impl<modulus::my_rule<3, 0>, 1, nothing, normal,
      1>::match<pegtl::input>' requested here
  ...Action< Rule > >::value ? apply_here::NOTHING : apply_here::ACTION >::template match( in, s...
                                                                                    ^
./pegtl/internal/rule_conjunction_impl.hh:31:20: note: in instantiation of function template
      specialization 'pegtl::internal::rule_match_help<modulus::my_rule<3, 0>, 1, nothing, normal,
      pegtl::input>' requested here
            return rule_match_help< Rule, E, Action, Control >( in, st ... ) && rule_conjunc...
                   ^
./pegtl/internal/until.hh:47:88: note: in instantiation of function template specialization
      'pegtl::internal::rule_conjunction_impl<modulus::my_rule<3, 0> >::match<1, nothing, normal,
      pegtl::input>' requested here
  ...if ( in.empty() || ! rule_conjunction_impl< Rule, Rules ... >::template match< E, Action, Co...
                                                                             ^
./pegtl/internal/rule_match_call.hh:19:35: note: in instantiation of function template
      specialization 'pegtl::internal::until<pegtl::ascii::eolf, modulus::my_rule<3, 0> >::match<1,
      nothing, normal, pegtl::input>' requested here
            return Rule::template match< E, Action, Control >( in, st ... );
                                  ^
./pegtl/internal/rule_match_impl.hh:43:79: note: in instantiation of function template
      specialization 'pegtl::internal::rule_match_call<modulus::grammar, 1, nothing,
      normal>::match<pegtl::input>' requested here
            if ( rule_match_call< Rule, error_mode::THROW, Action, Control >::match( in, st ...
                                                                              ^
./pegtl/internal/rule_match_help.hh:20:173: note: in instantiation of function template
      specialization 'pegtl::internal::rule_match_impl<modulus::grammar, 1, nothing, normal,
      1>::match<pegtl::input>' requested here
  ...Action< Rule > >::value ? apply_here::NOTHING : apply_here::ACTION >::template match( in, s...
                                                                                    ^
./pegtl/parse.hh:21:17: note: in instantiation of function template specialization
      'pegtl::internal::rule_match_help<modulus::grammar, 1, nothing, normal, pegtl::input>'
      requested here
      internal::rule_match_help< Rule, error_mode::THROW, Action, Control >( in, st ... );
                ^
./pegtl/parse.hh:28:7: note: in instantiation of function template specialization
      'pegtl::parse<modulus::grammar, nothing, normal>' requested here
      parse< Rule, Action, Control >( in, st ... );
      ^
source/modulus_match.cc:33:14: note: in instantiation of function template specialization
      'pegtl::parse<modulus::grammar, nothing, normal>' requested here
      pegtl::parse< modulus::grammar >( 1, argv );
             ^
1 error generated.
make: *** [build/source/modulus_match] Error 1

Is there an easy way to obtain the position *range* of a matched PEG expression?

Hi,
Currently, I have action<> specializations with apply() method to acquire the matched text, eg:

template<> struct action<number>
{
    template <typename Input>
    static void apply(const Input &in, SomeState &state)
    {
         // here I capture the text via "in.string()", and the starting position via "in.position()"
         ...
    }
};

Is there an easy way to obtain the position at the end of the matched text?
Right now, I iterate over the string, incrementing byte and byte_in_line unless I see a newline, in which case I bump the line counter and reset the byte_in_line counter. It occurs to me that the parser must already have this information somewhere, and I'm just wasting CPU cycles. :(

Also, I have a small "calculator" project which creates an AST from the input, then evaluates the AST to perform the calculations. Let me know if you're interested in seeing it.
--Rich
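The manual workaround described in the question, rescanning the matched text to find where it ends, can be written as follows. The position struct here is hypothetical (1-based line, 0-based byte within the line); the PEGTL's own position type and its conventions depend on the version.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical position: byte offset, 1-based line, 0-based byte within the line.
struct position {
   std::size_t byte = 0;
   std::size_t line = 1;
   std::size_t byte_in_line = 0;
};

// Given the start position of a match and the matched text, compute the
// position just past the end of the match ('\n' starts a new line).
position end_of_match( position p, const std::string& matched ) {
   for( char c : matched ) {
      ++p.byte;
      if( c == '\n' ) { ++p.line; p.byte_in_line = 0; }
      else ++p.byte_in_line;
   }
   return p;
}
```

This costs one extra pass over every matched string, which is exactly the overhead the question hopes to avoid.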

Build fails with MinGW

Using CMake and MinGW to build the library, I get the following error message:

Scanning dependencies of target abnf2pegtl
[ 80%] Building CXX object src/example/pegtl/CMakeFiles/abnf2pegtl.dir/abnf2pegtl.cpp.obj
D:\Users\Daniel\Programming\PEGTL-master\src\example\pegtl\abnf2pegtl.cpp: In lambda function:
D:\Users\Daniel\Programming\PEGTL-master\src\example\pegtl\abnf2pegtl.cpp:177:98: error: '::strcasecmp' has not been declared
          return std::find_if( rbegin, rules.rend(), [&]( const rules_t::value_type& p ) { return ::strcasecmp( p.first.c_str(), v.c_str() ) == 0; } );
                                                                                                  ^~
In file included from c:\mingw\lib\gcc\mingw32\6.3.0\include\c++\bits\stl_algobase.h:71:0,
                 from c:\mingw\lib\gcc\mingw32\6.3.0\include\c++\algorithm:61,
                 from D:\Users\Daniel\Programming\PEGTL-master\src\example\pegtl\abnf2pegtl.cpp:4:
c:\mingw\lib\gcc\mingw32\6.3.0\include\c++\bits\predefined_ops.h: In instantiation of 'bool __gnu_cxx::__ops::_Iter_pred<_Predicate>::operator()(_Iterator) [with _Iterator = std::reverse_iterator<__gnu_cxx::__normal_iterator<std::pair<std::__cxx11::basic_string<char>, std::__cxx11::basic_string<char> >*, std::vector<std::pair<std::__cxx11::basic_string<char>, std::__cxx11::basic_string<char> > > > >; _Predicate = abnf2pegtl::data::find_rule(const string&, const reverse_iterator&)::<lambda(const value_type&)>]':
c:\mingw\lib\gcc\mingw32\6.3.0\include\c++\bits\stl_algo.h:120:14:   required from '_RandomAccessIterator std::__find_if(_RandomAccessIterator, _RandomAccessIterator, _Predicate, std::random_access_iterator_tag) [with _RandomAccessIterator = std::reverse_iterator<__gnu_cxx::__normal_iterator<std::pair<std::__cxx11::basic_string<char>, std::__cxx11::basic_string<char> >*, std::vector<std::pair<std::__cxx11::basic_string<char>, std::__cxx11::basic_string<char> > > > >; _Predicate = __gnu_cxx::__ops::_Iter_pred<abnf2pegtl::data::find_rule(const string&, const reverse_iterator&)::<lambda(const value_type&)> >]'
c:\mingw\lib\gcc\mingw32\6.3.0\include\c++\bits\stl_algo.h:161:23:   required from '_Iterator std::__find_if(_Iterator, _Iterator, _Predicate) [with _Iterator = std::reverse_iterator<__gnu_cxx::__normal_iterator<std::pair<std::__cxx11::basic_string<char>, std::__cxx11::basic_string<char> >*, std::vector<std::pair<std::__cxx11::basic_string<char>, std::__cxx11::basic_string<char> > > > >; _Predicate = __gnu_cxx::__ops::_Iter_pred<abnf2pegtl::data::find_rule(const string&, const reverse_iterator&)::<lambda(const value_type&)> >]'
c:\mingw\lib\gcc\mingw32\6.3.0\include\c++\bits\stl_algo.h:3817:28:   required from '_IIter std::find_if(_IIter, _IIter, _Predicate) [with _IIter = std::reverse_iterator<__gnu_cxx::__normal_iterator<std::pair<std::__cxx11::basic_string<char>, std::__cxx11::basic_string<char> >*, std::vector<std::pair<std::__cxx11::basic_string<char>, std::__cxx11::basic_string<char> > > > >; _Predicate = abnf2pegtl::data::find_rule(const string&, const reverse_iterator&)::<lambda(const value_type&)>]'
D:\Users\Daniel\Programming\PEGTL-master\src\example\pegtl\abnf2pegtl.cpp:177:149:   required from here
c:\mingw\lib\gcc\mingw32\6.3.0\include\c++\bits\predefined_ops.h:234:11: error: void value not ignored as it ought to be  { return bool(_M_pred(*__it)); }
           ^~~~~~~~~~~~~~~~~~~~
src\example\pegtl\CMakeFiles\abnf2pegtl.dir\build.make:62: recipe for target 'src/example/pegtl/CMakeFiles/abnf2pegtl.dir/abnf2pegtl.cpp.obj' failed
mingw32-make[2]: *** [src/example/pegtl/CMakeFiles/abnf2pegtl.dir/abnf2pegtl.cpp.obj] Error 1
CMakeFiles\Makefile2:3260: recipe for target 'src/example/pegtl/CMakeFiles/abnf2pegtl.dir/all' failed
mingw32-make[1]: *** [src/example/pegtl/CMakeFiles/abnf2pegtl.dir/all] Error 2
Makefile:139: recipe for target 'all' failed
mingw32-make: *** [all] Error 2

<cstring> seems to be included in the source, so I don't know why strcasecmp cannot be resolved...
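POSIX declares strcasecmp in <strings.h>, and it is not part of standard C or C++, so including <cstring> does not guarantee that it is available. A portable replacement for the ASCII case-insensitive comparison in the lambda might look like this (iequals is a hypothetical helper name):

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Portable ASCII case-insensitive equality, avoiding the POSIX-only ::strcasecmp.
bool iequals( const std::string& a, const std::string& b ) {
   if( a.size() != b.size() ) return false;
   for( std::string::size_type i = 0; i < a.size(); ++i ) {
      // Cast to unsigned char first: passing a negative char to std::tolower is undefined.
      const int x = std::tolower( static_cast< unsigned char >( a[ i ] ) );
      const int y = std::tolower( static_cast< unsigned char >( b[ i ] ) );
      if( x != y ) return false;
   }
   return true;
}
```

The predicate could then return iequals( p.first, v ) instead of ::strcasecmp( p.first.c_str(), v.c_str() ) == 0.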

Include paths problem on OSX

Hi !

I was unable to compile on OS X because the compiler could not find the type_traits header, and I found that it needs the -stdlib=libc++ option to look in the right places.

I suggest adding it to the Makefile, as I've done, if it doesn't break compiling on other OSs.

REQ: stop parsing from action

I understand that it's possible to abort parsing by throwing an exception, but it would be nice to have the ability to stop parsing depending on an action's result.
For example, instead of the current signature, an action could also return a result:

template<> struct action< SOME >
{
  template <class T, class... ARGS>
  static bool apply(T const& in, ARGS&&...)
  {
    if (CONDITION) return false;
    return true;
  }
};
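What the request describes can be sketched in C++17 by inspecting the return type of apply at compile time: a bool result is propagated and false is treated as a failed match, while a void apply always succeeds. This is a toy dispatcher, not the PEGTL's own mechanism; call_apply, check_limit and record are hypothetical names. Note that a local match failure lets a parser backtrack to other alternatives, which is not the same as aborting the whole parse.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Toy dispatcher: propagate a bool result from apply, or report success for void.
template< typename Action, typename... States >
bool call_apply( const std::string& matched, States&... st ) {
   if constexpr( std::is_same_v< decltype( Action::apply( matched, st... ) ), bool > ) {
      return Action::apply( matched, st... );  // false means "treat the match as failed"
   } else {
      Action::apply( matched, st... );
      return true;
   }
}

// Hypothetical action that rejects values above a limit.
struct check_limit {
   static bool apply( const std::string& m, int& limit ) { return std::stoi( m ) <= limit; }
};

// Hypothetical action that only records and never rejects.
struct record {
   static void apply( const std::string& m, int& last ) { last = std::stoi( m ); }
};
```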

enable actions with at

Is there any chance to enable actions in the at rule?
I have tried to use

    enable< at<my_rule> >

and

   at< enable<my_rule> >

but for both grammar expressions the corresponding action is never called...

Add possibility for pegtl::control to not throw exception on global error

Many, many thanks for your brilliant code!
I just discovered it, and it seems to be a killer solution for my task.
However, I need to parse messages and can't afford to wrap decoding in try/catch blocks, since that is costly. Getting true/false from parse would be enough. Perhaps I missed it in the docs, but providing a custom control doesn't help, since the PEGTL code calls std::abort if no exception is thrown.

Could you please consider some control policy to inhibit exceptions?

Q: raise failure from action?

Is it possible to raise a global error from an action?
For example, a grammar accepts 1 to 3 digits, but there is an additional constraint on their value: a TTL can be 0..255, but not 256 or 999, both of which are valid according to the grammar itself.
In the action I can add a range check, but how do I trigger an error?
I tried to throw tao::pegtl::parse_error, but it doesn't compile because:
const class tao::pegtl::internal::action_input<tao::pegtl::lf_crlf_eol, (tao::pegtl::tracking_mode)1u>' has no member named 'position'
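The semantic check itself can be kept separate from the grammar. A minimal sketch, assuming the action receives the matched digits as a string: parse_ttl (a hypothetical helper) accepts what the grammar accepts and then enforces the 0..255 range. How to report the failure from inside the action, for example by throwing, is left open here.

```cpp
#include <cassert>
#include <cctype>
#include <optional>
#include <string>

// Grammar: 1..3 digits. Semantic constraint: value in 0..255.
std::optional< unsigned > parse_ttl( const std::string& text ) {
   if( text.empty() || text.size() > 3 ) return std::nullopt;  // what the grammar rejects
   unsigned value = 0;
   for( char c : text ) {
      if( !std::isdigit( static_cast< unsigned char >( c ) ) ) return std::nullopt;
      value = value * 10 + static_cast< unsigned >( c - '0' );
   }
   if( value > 255 ) return std::nullopt;  // valid syntax, invalid value (e.g. 256, 999)
   return value;
}
```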

internal::file_reader fails on multi-lined Windows files

Hi, I appear to have found a slight bug in internal::file_reader.

If a file with multiple newlines is created on Windows, file_reader fails with "unable to fread() file size errno 0". Creating the same file on Linux, or converting with 'dos2unix', causes file_reader to successfully parse the file. Conversely, a single line Windows file will also parse successfully. The core of the problem seems to be due to Windows using "\r\n" to indicate newlines, or more specifically, the different ways fseek/ftell and fread count the "\r\n" sequence.

In file_reader::read(), fread is given the number of characters it is expected to read. That number is calculated in file_reader::size(), which uses fseek/ftell to count the number of characters in the file (counting "\r\n" as two characters). However, when the file is read in text mode on Windows, std::fread converts every "\r\n" to '\n', so it actually reads fewer characters than it was told to. Since it cannot read the specified number of characters, it returns 0, causing the if check to fail and throwing the error shown above. In all other respects, the read completed successfully.

Also, if you stop execution right before the throw and look at the constructed string, the last N characters of the string are '\0', with N being the exact number of newlines in the file.
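The mismatch goes away if the file is opened in binary mode ("rb"), where no newline translation takes place and the size reported by fseek/ftell equals the number of bytes fread can deliver. A minimal sketch (read_whole_file is a hypothetical name, not the PEGTL's file_reader):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <stdexcept>
#include <string>

// Reads a whole file in binary mode so that "\r\n" is kept as two bytes
// and the byte count from ftell matches what fread returns.
std::string read_whole_file( const char* path ) {
   std::FILE* f = std::fopen( path, "rb" );
   if( !f ) throw std::runtime_error( "unable to open file" );
   std::fseek( f, 0, SEEK_END );
   const long size = std::ftell( f );
   std::fseek( f, 0, SEEK_SET );
   if( size < 0 ) { std::fclose( f ); throw std::runtime_error( "unable to determine file size" ); }
   std::string data( static_cast< std::size_t >( size ), '\0' );
   const std::size_t got = data.empty() ? 0 : std::fread( &data[ 0 ], 1, data.size(), f );
   std::fclose( f );
   if( got != data.size() ) throw std::runtime_error( "short read" );
   return data;
}
```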

Compilation issue with Visual Studio 2015 CTP

The VS2015 technology preview is the only version that almost manages to build PEGTL. However, it fails to understand the decltype+comma SFINAE trick in rule_match_call.hh:

1>c:\users\sam\pegtl\pegtl\internal\rule_match_call.hh(26): error C2535: 'unknown-type
    pegtl::internal::rule_match_call<Rule,E,Action,Control>::match(Input &,States &&...)':
    member function already defined or declared
1>c:\users\sam\pegtl\pegtl\internal\rule_match_call.hh(17): note: see declaration of
    'pegtl::internal::rule_match_call<Rule,E,Action,Control>::match'
1>c:\users\sam\pegtl\pegtl\internal\rule_match_call.hh(27): note: see reference to class
    template instantiation 'pegtl::internal::rule_match_call<Rule,E,Action,Control>' being
    compiled

Even though this is a compiler bug, do you think a workaround is possible? I have tried a few things but failed to come up with something that works.
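One commonly used way to keep two SFINAE-constrained overloads distinct is to make their parameter lists differ through a priority tag, instead of relying only on the decltype in the return type. The sketch below has not been tried with the VS2015 CTP; call_match, the tag types and the two rules are hypothetical.

```cpp
#include <cassert>
#include <string>

// Priority tags: priority_high converts to priority_low, so the first overload
// wins when viable and the second is used only if the first is removed by SFINAE.
struct priority_low {};
struct priority_high : priority_low {};

// Chosen when Rule::match( in, st... ) is well-formed.
template< typename Rule, typename Input, typename... States >
auto call_match( priority_high, Input& in, States&... st ) -> decltype( Rule::match( in, st... ) ) {
   return Rule::match( in, st... );
}

// Fallback for rules whose match() takes only the input.
template< typename Rule, typename Input, typename... States >
bool call_match( priority_low, Input& in, States&... ) {
   return Rule::match( in );
}

template< typename Rule, typename Input, typename... States >
bool match( Input& in, States&... st ) {
   return call_match< Rule >( priority_high{}, in, st... );
}

// Two hypothetical rules with different match() signatures.
struct takes_state {
   static bool match( std::string& in, int& counter ) { ++counter; return !in.empty(); }
};
struct input_only {
   static bool match( std::string& in ) { return in.empty(); }
};
```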

missing expression.cc example?

Hi,

First, wow! Kudos to you, this is very cool.

Your documentation says that there should be an examples/expression.cc, which is the same as the calculator example but builds a parse tree and operates on that. Either I'm incredibly blind, or it's not in the examples directory.

I'd really like to see a good example of how to build a parse tree with this parser.

Incorrect memory_input parameters in string_input.hpp

Hi !
First and foremost, it's a real pleasure to use PEGTL, thank you !

When using string_input like:

std::string s{ "something to parse" };
tao::pegtl::string_input<> in(s);

a compiler error occurs.

Obviously in string_input.hpp:42 the second parameter to memory_input is not what is expected.

Changing it to data.data() + data.size() fixes the problem.

Hope this helps.

Best.

binary data and range<C,D>

For parsing binary data, it would be useful to have range<C, D> for unsigned char. Otherwise, using a range above 0x7f fails to match properly due to signed conversion (e.g., (char)3 is not matched by range<0, (char)0xbf>).

Should be a straightforward application of internal range with an appropriate peek function.

Of course, there is a workaround by writing a custom rule.
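On platforms where plain char is signed, (char)0xbf is -65, so range<0, (char)0xbf> describes the empty interval [0, -65] and nothing matches. The custom-rule workaround boils down to comparing bytes as unsigned char; a minimal sketch (in_byte_range is a hypothetical name, not a PEGTL rule):

```cpp
#include <cassert>

// Compares a byte as unsigned char, so bounds above 0x7f behave as expected
// whether or not plain char is signed on the target platform.
template< unsigned char Lo, unsigned char Hi >
bool in_byte_range( char c ) {
   const auto u = static_cast< unsigned char >( c );
   return Lo <= u && u <= Hi;
}
```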

opening files with non-ANSI filenames on Windows

Thank you for maintaining PEGTL.
As I understand it, file_input and read_input currently don't work with Unicode filenames on Windows, because internal::file_reader uses fopen(_s) rather than _wfopen(_s).
If that's true, this is a feature request to add support for Unicode filenames.

pegtl-2.1.3 doesn't build with gcc-7.1

[ 22%] Building CXX object src/test/pegtl/CMakeFiles/internal_file_opener.dir/internal_file_opener.cpp.o
cd /var/tmp/portage/dev-libs/pegtl-2.1.3/work/pegtl-2.1.3_build/src/test/pegtl && /usr/lib64/ccache/bin/x86_64-pc-linux-gnu-g++   -I/var/tmp/portage/dev-libs/pegtl-2.1.3/work/PEGTL-2.1.3/include   -DNDEBUG -O2 -pipe -march=native -fomit-frame-pointer   -pedantic -Wall -Wextra -Wshadow -Werror -std=c++11 -o CMakeFiles/internal_file_opener.dir/internal_file_opener.cpp.o -c /var/tmp/portage/dev-libs/pegtl-2.1.3/work/PEGTL-2.1.3/src/test/pegtl/internal_file_opener.cpp
/var/tmp/portage/dev-libs/pegtl-2.1.3/work/PEGTL-2.1.3/src/test/pegtl/contrib_raw_string.cpp:59:42: error: declaration of template parameter ‘Rule’ shadows template parameter
       template< typename Rule, template< typename Rule > class Action, unsigned M, unsigned N >
                                          ^~~~~~~~
/var/tmp/portage/dev-libs/pegtl-2.1.3/work/PEGTL-2.1.3/src/test/pegtl/contrib_raw_string.cpp:59:17: note: template parameter ‘Rule’ declared here
       template< typename Rule, template< typename Rule > class Action, unsigned M, unsigned N >
                 ^~~~~~~~
make[2]: *** [src/test/pegtl/CMakeFiles/contrib_raw_string.dir/build.make:63: src/test/pegtl/CMakeFiles/contrib_raw_string.dir/contrib_raw_string.cpp.o] Error 1

Details here: build.txt
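The error comes from reusing the name Rule for the parameter of the template template parameter Action, which GCC 7 rejects as shown. Since that inner name is never used, leaving it unnamed is enough. A minimal sketch of the corrected declaration pattern; everything except Rule, Action, M and N is a hypothetical name:

```cpp
#include <cassert>

// The inner parameter of Action is left unnamed, so it no longer redeclares Rule.
template< typename Rule, template< typename > class Action, unsigned M, unsigned N >
struct raw_string_case {
   using action = Action< Rule >;
   static constexpr unsigned total = M + N;
};

// Hypothetical arguments used only to instantiate the template.
template< typename > struct my_action {};
struct my_rule {};
```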

The README hello world is not up-to-date with the source

When running the hello world in the README.md I get:

/Users/rfonseca/Documents/workspace-CPP/SeqScan/src/testpegtl.cc:84:35: error: no type named 'input' in namespace 'pegtl'
  static void apply( const pegtl::input & in, std::string & name )
                           ~~~~~~~^

input wrapping issue

There seems to be some internal wrapping mechanism when working with large input strings. I see several lines in the trace which look like this:

pegtl: success flags 1 rule    1 nest   1 at 15,23 expression ...
pegtl: start   flags 2 rule    1 nest   1 at 1,1 expression ...

I'm occasionally experiencing a bug where a rule which works otherwise doesn't work immediately following one of these discontinuities. I can try to work up a minimal example if that would help.

Win32 compile errors

Hi, great to see the new project structure & hosting! Have you any interest in making pegtl work on win32? Here's a quick sample of some of the issues I found:

Warnings:

  • not all control paths return a value (unhex_char, unescape_c)
  • a number of warnings about buffer overruns (see the end for a sample)

Most other issues seem to be related to partial c++11 support.

  • inline namespace not supported (ascii.hh)
  • pegtl\examples\json_build_one.cc(93): error C2783: 'std::shared_ptr<_Ty> std::make_shared(_Types &&...)' : could not deduce template argument for '_Ty'

11>s:\source\pegtl\pegtl\internal\rule_conjunction.hh(21): warning C4789: buffer '' of size 4 bytes will be overrun; 1 bytes will be written starting at offset 4
11>s:\source\pegtl\pegtl\internal\rule_conjunction.hh(21): warning C4789: buffer '' of size 4 bytes will be overrun; 1 bytes will be written starting at offset 5
11>s:\source\pegtl\pegtl\internal\sor.hh(26): warning C4789: buffer '' of size 4 bytes will be overrun; 1 bytes will be written starting at offset 4
11>s:\source\pegtl\pegtl\internal\sor.hh(26): warning C4789: buffer '' of size 4 bytes will be overrun; 1 bytes will be written starting at offset 4
11>s:\source\pegtl\pegtl\internal\rule_conjunction.hh(21): warning C4789: buffer '' of size 4 bytes will be overrun; 1 bytes will be written starting at offset 4
11>s:\source\pegtl\pegtl\internal\rule_conjunction.hh(21): warning C4789: buffer '' of size 4 bytes will be overrun; 1 bytes will be written starting at offset 5
11>s:\source\pegtl\pegtl\internal\rule_conjunction.hh(21): warning C4789: buffer '' of size 4 bytes will be overrun; 1 bytes will be written starting at offset 6

Anonymous rules must_err<>

I think it might be a good idea, in terms of convenience, to extend the available match rules with must_err<R, ERR>.
The idea behind this is to define an error message for an anonymous set of rules.
The implementation of must<R> is described as:

sor< R, raise< R > >

Having a scenario as following:

struct foo : seq< Spacing, string<'v', 'a', 'r'>, must< Spacing, Identifier, Spacing > > {};

I would need to define an error_control for this anonymous segment, or else name this sequence, which I would really like to avoid. It is also ambiguous, as Spacing, Identifier, Spacing is a very generic construct, which may need very different error messages depending on where it appears inside a rule.
What I would suggest instead is:

struct foo : seq< Spacing, string<'v', 'a', 'r'>, must_err< seq< Spacing, Identifier, Spacing >, ERR_MSG("var must be followed by an Identifier") > > {};

Where ERR_MSG( "..." ) is a macro like TAO_PEGTL_STRING( "..." ), just returning something simpler than string<> (without a match method).

must_err<> then raises a tao::pegtl::parse_error with the message ERR_MSG( "..." ). Besides replacing the content of the message with the custom text (together with filename, line number, etc.), I would suggest that parse_error gets some more methods/members that give access to:

  • Filename (if any)
  • Line number
  • Character position
  • the "pure" error message
  • maybe some stuff I didn't think of right now

I tried it out but as always my template programming skills are very limited.
For the sake of simplicity I extended the given string<> template with a static data() method and created my own rule must_err<>.

         template< char... Cs >
         struct string
         {
           ... 
            static std::string data() {
                return std::string{ Cs... };  // build the string directly from the character pack
            }
         };

I also copied and modified the must<> rule:

         template< typename Rule, class ERR_MSG = string<'f', 'o', 'o'> >
         struct must_err
         {
            using analyze_t = typename Rule::analyze_t;

            template< apply_mode A,
                      rewind_mode,
                      template< typename... > class Action,
                      template< typename... > class Control,
                      typename Input,
                      typename... States >
            static bool match( Input& in, States&&... st )
            {
               // Like must<>: match Rule, but on failure raise a parse_error
               // carrying the custom message instead of returning false.
               if( !Control< Rule >::template match< A, rewind_mode::DONTCARE, Action, Control >( in, st... ) ) {
                  throw parse_error( ERR_MSG::data(), in );
               }
               return true;
            }
         };

Looking forward to your thoughts on this topic.
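The two pieces proposed above, a message carried as a type without a match method and a parse error that keeps its parts separately, can be sketched without PEGTL. This is a minimal, hypothetical sketch: err_msg and detailed_parse_error are illustrative names, not part of the PEGTL API, and the accessors only mirror the list of fields requested above.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical message carrier: like string<Cs...>, but with no match() method,
// so it cannot accidentally be used as a grammar rule.
template< char... Cs >
struct err_msg
{
   static std::string data() { return std::string{ Cs... }; }
};

// Hypothetical error type: what() keeps the usual "source:line:column: message"
// form, while the individual parts stay accessible on their own.
class detailed_parse_error : public std::runtime_error
{
public:
   detailed_parse_error( std::string source, std::size_t line, std::size_t column, std::string message )
      : std::runtime_error( source + ':' + std::to_string( line ) + ':' + std::to_string( column ) + ": " + message ),
        m_source( std::move( source ) ),
        m_line( line ),
        m_column( column ),
        m_message( std::move( message ) )
   {}

   const std::string& source() const noexcept { return m_source; }
   std::size_t line() const noexcept { return m_line; }
   std::size_t column() const noexcept { return m_column; }
   const std::string& message() const noexcept { return m_message; }  // the "pure" message

private:
   std::string m_source;
   std::size_t m_line;
   std::size_t m_column;
   std::string m_message;
};
```

In a must_err<>-style rule, the message part would then simply be ERR_MSG::data().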

Transforming siblings with parse_tree

I want to write a parser that translates the syntax introduced in Bryan Ford's paper to PEGTL, as I find this syntax very readable. To do so, I am trying to use the parse_tree.hpp provided with this library. My approach was to use a custom node struct that implements a to_String method like this:

struct node {
		std::vector< std::unique_ptr< node > > children;
		std::string str_before, str_after;

		std::string to_String()
		{
			std::string result = str_before;
			for(auto it = children.begin(); it != children.end();) {
				std::string child_str = it->get()->to_String();
				result += child_str;
				if(++it != children.end() && child_str != "") {
					result += ", ";
				}
			}
			return result += str_after;
		}
...
};

To make the actual magic happen, I thought I would use the transform function like so:

	template< typename > struct store : std::false_type {
	};

	template<> struct store< Sequence > : std::true_type {
		static void transform(std::unique_ptr< PEG2PEGTL::node >& n)
		{
			n.get()->str_before = "sor< ";
			n.get()->str_after = " >";
		}
	};

This works for me until it gets a little tricky, since I would need to nest siblings inside each other when I have a rule like this:

struct Element : seq< opt< Prefix >, Primary, opt< Suffix > > {
	};

If only I could access Element's children from Prefix and/or Suffix and move them inside itself. I suppose I could do this by using the node's id_ and moving the logic into Element's transform function. Is that the intended usage? How far off the intended usage am I with this approach?
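The second idea, doing the work in the parent, can be sketched as a standalone function: when an Element node is processed, it inspects its own children (an optional Prefix, the Primary, an optional Suffix) and builds the nested PEGTL text itself, so no child ever needs to reach into its siblings. This is hypothetical: node, leaf and fold_element are illustrative, not PEGTL's parse_tree API, and it assumes Ford's operators map to PEGTL's at<>, not_at<>, opt<>, star<> and plus<>.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical minimal tree node: a kind tag plus the matched text.
struct node
{
   std::string kind;  // "Prefix", "Primary", "Suffix", ...
   std::string text;  // matched text, e.g. "&", "Identifier", "*"
   std::vector< std::unique_ptr< node > > children;
};

std::unique_ptr< node > leaf( std::string kind, std::string text )
{
   auto n = std::make_unique< node >();
   n->kind = std::move( kind );
   n->text = std::move( text );
   return n;
}

// Map a PEG prefix/suffix operator to the PEGTL rule that wraps its operand.
std::string wrapper_for( const std::string& op )
{
   if( op == "?" ) return "opt";
   if( op == "*" ) return "star";
   if( op == "+" ) return "plus";
   if( op == "&" ) return "at";
   if( op == "!" ) return "not_at";
   return "";
}

// Element <- Prefix? Primary Suffix?   becomes   prefix< suffix< primary > >
// The suffix binds tighter than the prefix, so it is applied first.
std::string fold_element( const node& element )
{
   std::string prefix, primary, suffix;
   for( const auto& child : element.children ) {
      if( child->kind == "Prefix" ) prefix = wrapper_for( child->text );
      else if( child->kind == "Suffix" ) suffix = wrapper_for( child->text );
      else primary = child->text;
   }
   std::string result = primary;
   if( !suffix.empty() ) result = suffix + "< " + result + " >";
   if( !prefix.empty() ) result = prefix + "< " + result + " >";
   return result;
}
```

For &Identifier* this yields at< star< Identifier > >, and a bare Identifier stays as it is.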

unexpected behaviour of discard (or bug?)

Hi,

I found that discard breaks the input seen by the action. Is this a bug? Below you can find a test case that shows two behaviours; I'd have expected both to work.

The guts of it is in this rule:

struct word1 : seq<discard, bytes<4>> {};

In the action I try to read the four-byte input string, but find that the discard (which should throw away the three preceding characters) throws away the first three characters of bytes<4> instead.

$ ./discard-test
Unexpected:
INPUT: 3/3: one
INPUT: 4/4: vase
WORD: e
INPUT: 3/3: two
INPUT: 4/4: pots
WORD: s
INPUT: 0/3: 

Expected:
INPUT: 3/3: one
INPUT: 4/4: vase
WORD: vase
INPUT: 3/3: two
INPUT: 4/4: pots
WORD: pots
INPUT: 0/3: 

#include <tao/pegtl.hpp>
#include <functional>
#include <iostream>
#include <sstream>
#include <string>

using namespace tao::pegtl;

struct word1 : seq<discard, bytes<4>> {};
struct grammar1 : star<discard, bytes<3>, word1> {};

struct word2 : seq<bytes<4>> {};
struct grammar2 : star<discard, bytes<3>, discard, word2> {};

template <typename Rule>
struct xaction : nothing<Rule> {};

template <>
struct xaction<word1> {
  template <typename Input>
  static void apply(const Input& in) {
    // in.string() is the text matched by word1 (or word2)
    std::string str = in.string();
    std::cerr << "WORD: " << str << "\n";
  }
};

template <>
struct xaction<word2> : xaction<word1> {};

int main(int argc, char* argv[]) {
  std::stringstream data;
  data << "onevasetwopots";
  using reader_t =
      std::function<std::size_t(char* buffer, const std::size_t length)>;

  auto reader = [&data](char* buffer, const std::size_t length) mutable {
    std::streamsize sz = data.read(buffer, length).gcount();
    std::cerr << "INPUT: " << sz << "/" << length << ": ";
    std::cerr.write(buffer, sz);
    std::cerr << "\n";
    return sz;
  };
  std::cerr << "Unexpected:\n";
  buffer_input<reader_t> input1("reader", 1024, reader);
  parse<grammar1, xaction>(input1);

  std::cerr << "\nExpected:\n";
  data.clear();
  data.seekg(0, data.beg);
  buffer_input<reader_t> input2("reader", 1024, reader);
  parse<grammar2, xaction>(input2);
  return 0;
}
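The printed words can be reproduced with a simplified model of a compacting buffer. This is a toy, not PEGTL's buffer_input: it assumes that discard moves the unconsumed bytes to the front of the buffer, and that the action for word1 reads from a start position saved when word1 began. Under those assumptions, the saved position goes stale as soon as the inner discard runs.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Toy model of a buffered input whose discard() compacts the buffer.
struct toy_buffer
{
   std::string buf;          // buffered bytes; index 0 is the start of the buffer
   std::size_t current = 0;  // read position inside buf

   void bump( std::size_t n ) { current += n; }

   // Drop the consumed bytes by moving the rest to the front of the buffer.
   void discard()
   {
      buf.erase( 0, current );
      current = 0;
   }

   // Text between a saved start position and the current position.
   std::string since( std::size_t start ) const { return buf.substr( start, current - start ); }
};

// One iteration of grammar1: discard, bytes<3>, word1 = seq< discard, bytes<4> >.
std::string word_from_grammar1( toy_buffer& in )
{
   in.discard();
   in.bump( 3 );                          // bytes<3>
   const std::size_t start = in.current;  // start of word1, saved for its action
   in.discard();                          // moves the data; 'start' now indexes the wrong bytes
   in.bump( 4 );                          // bytes<4>
   return in.since( start );
}

// One iteration of grammar2: discard, bytes<3>, discard, word2 = seq< bytes<4> >.
std::string word_from_grammar2( toy_buffer& in )
{
   in.discard();
   in.bump( 3 );                          // bytes<3>
   in.discard();                          // runs before word2 starts, nothing saved yet
   const std::size_t start = in.current;
   in.bump( 4 );                          // bytes<4>
   return in.since( start );
}
```

The model yields "e" and "s" for grammar1 and "vase" and "pots" for grammar2, matching the output above. That suggests the problematic combination is a discard inside a rule whose action still needs the start of its match.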

Add CSV parser example

Hey Colin, great library! I'm looking forward to using it. I actually got here by googling "C++ CSV parser", which led me to a place where you wrote:

CSV isn't a precisely defined format; on multiple occasions I slapped together a simple CSV parser for whatever file format was thrown my way with our PEGTL parser library (PEG based, C++11, header-only, production quality, small and light - and with documentation).

Unfortunately, I didn't see a CSV parser in your examples folder, and I was hoping you could add one so that I could see how it would be done idiomatically with PEGTL.

stream parsing support

Is it possible to parse true (unseekable, possibly infinite) streams with PEGTL?

For example, could the calculator demo be modified to parse stdin as a stream, with 'quit' being a parser action?

In my particular case, I have a growing number (1000s) of large (~200GB) files. The files are stored compressed and accessed over a network. I would like to decompress these and pipe the uncompressed stream directly to the parser, to avoid the time penalty associated with decompressing to temporary files just so that they can be parsed. Sadly, I can't decompress entire files into memory and parse them that way, and "chunking" them is not as simple as splitting on newlines (as in the calculator demo).
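The discard test case on this page feeds buffer_input through a reader callback with the signature std::size_t( char* buffer, const std::size_t length ). The same shape can pull from any std::istream, including std::cin fed by a decompressor through a pipe. This is a minimal sketch that assumes the callback contract is "fill up to length bytes, return the count, return 0 at end of input"; make_stream_reader is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <istream>
#include <sstream>

using reader_t = std::function< std::size_t( char* buffer, const std::size_t length ) >;

// Hypothetical helper: wraps a std::istream in a reader callback.
// Reads at most `length` bytes per call and returns how many were read;
// 0 means the stream is exhausted.
inline reader_t make_stream_reader( std::istream& is )
{
   return [ &is ]( char* buffer, const std::size_t length ) -> std::size_t {
      is.read( buffer, static_cast< std::streamsize >( length ) );
      return static_cast< std::size_t >( is.gcount() );
   };
}
```

For a pipe, the call would be make_stream_reader( std::cin ); the stream must outlive the reader, since it is captured by reference.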

'template argument 3 is invalid'

Using a modified version of the 'ID and sum comma-separated digits' example:

peg_test.cpp:29:68: error: template argument 3 is invalid
                            seq< plus< D >, opt< dot, star< D > > > > {};
                                                                    ^
In file included from ../inst/include/pegtl/internal/action.hpp:9:0,
                 from ../inst/include/pegtl/internal/rules.hpp:7,
                 from ../inst/include/pegtl/ascii.hpp:11,
                 from ../inst/include/pegtl.hpp:10,
                 from peg_test.cpp:3:

C++11 via GCC 4.9.3; works fine on clang, however!

Accessing line length from file_input<>

Today I tried to use the file_input<> class to find out some information about the characters surrounding the byte position I got from a thrown parse_error.
My goal is to create an error message like this:

[Error] ./test/testfile.js:2:14(58): Unexpected character in variable statement
>>> var Bar = 0xE var Foo;
                  ^

With the parse_error thrown I've got access to:

  • byte
  • byte_in_line
  • line
  • source

Do you have any suggestions on how to get the length of the line where the error occurred?
The only thing I can imagine is using in.bump() with the error's offset, and then bumping by 1 until eof or until in.position().line != e.positions.front().line, but I can't seem to find a reset operator. Basically, I'm not able to iterate through the input. What I have tried so far is:

auto err_pos = e.positions.front();
std::cout << "[ERROR] " << e.what() << std::endl;
std::cout << ">>> " << std::string(in.current(), err_pos.byte - err_pos.byte_in_line, err_pos.byte_in_line + 1 ) << std::endl;
std::cout << std::string(4 + err_pos.byte_in_line, ' ') << '^'<< std::endl;

But it obviously ends at the character where the error occurred.
Any suggestions?
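If the whole input is available as contiguous text (for example the file contents in a std::string), the line can be recovered from byte and byte_in_line alone: the line starts at byte - byte_in_line and ends at the next newline. This is a minimal standalone sketch that assumes the full text is at hand and that byte_in_line counts bytes from the start of the line; error_line and caret_message are hypothetical helper names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Return the line containing byte offset `byte`;
// `byte_in_line` is that byte's offset from the start of its line.
std::string error_line( const std::string& text, std::size_t byte, std::size_t byte_in_line )
{
   const std::size_t begin = byte - byte_in_line;
   std::size_t end = text.find( '\n', byte );
   if( end == std::string::npos ) end = text.size();
   std::string line = text.substr( begin, end - begin );
   if( !line.empty() && line.back() == '\r' ) line.pop_back();  // tolerate CRLF input
   return line;
}

// Build the ">>> line" plus caret display from the report.
std::string caret_message( const std::string& text, std::size_t byte, std::size_t byte_in_line )
{
   std::string out = ">>> " + error_line( text, byte, byte_in_line ) + '\n';
   out += std::string( 4 + byte_in_line, ' ' );  // 4 = width of ">>> "
   out += '^';
   return out;
}
```

The caret lands under the failing character because ">>> " is exactly four characters wide.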

Parse tree generating duplicates

I have been using parse_tree.hpp to generate ASTs, but some rules always get duplicated in the child nodes: two consecutive nodes represent exactly the same match.

Is that normal? Am I doing something wrong?

Example:

std::unique_ptr<parse_tree::node> ast = parse_tree::parse<pop::grammar, pop::store>(in);
print_node( *ast );

Output (summarized):

ROOT
  pop::cpp_function at /LevyProblem.pop:15:0(220)
    pop::function_keyword "minimize" at /LevyProblem.pop:15:0(220) to /LevyProblem.pop:15:8(228)
    pop::var_name "levyFunction" at /LevyProblem.pop:15:9(229) to /LevyProblem.pop:15:21(241)
    pop::cpp_brackets "{ expression; }" at /LevyProblem.pop:15:24(244) to /LevyProblem.pop:38:1(948)
  pop::cpp_function at /LevyProblem.pop:15:0(220)
    pop::function_keyword "minimize" at /LevyProblem.pop:15:0(220) to /LevyProblem.pop:15:8(228)
    pop::var_name "levyFunction" at /LevyProblem.pop:15:9(229) to /LevyProblem.pop:15:21(241)
    pop::cpp_brackets "{ expression; }" at /LevyProblem.pop:15:24(244) to /LevyProblem.pop:38:1(948)

RFE: Memoization

It would be nice if PEGTL supported (configurable) memoization to improve performance in situations where backtracking occurs.
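Memoization (packrat parsing) caches the outcome of each rule at each input position, so when an ordered choice backtracks and retries a rule at a position already tried, the cached result is reused instead of re-running the rule. This is a minimal standalone sketch of the idea, not PEGTL code: the grammar expr <- digits '+' digits / digits, the packrat struct and the rule id 0 are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>

// Rules return the end position of their match, or std::string::npos on failure.
struct packrat
{
   const std::string& s;
   std::map< std::pair< int, std::size_t >, std::size_t > memo;  // (rule id, start) -> end
   int digits_runs = 0;  // how often the body of digits actually executed

   // digits <- [0-9]+   (memoized under rule id 0)
   std::size_t digits( std::size_t pos )
   {
      const auto key = std::make_pair( 0, pos );
      const auto it = memo.find( key );
      if( it != memo.end() ) return it->second;
      ++digits_runs;
      std::size_t p = pos;
      while( p < s.size() && s[ p ] >= '0' && s[ p ] <= '9' ) ++p;
      const std::size_t r = ( p > pos ) ? p : std::string::npos;
      memo.emplace( key, r );
      return r;
   }

   // expr <- digits '+' digits / digits
   std::size_t expr( std::size_t pos )
   {
      const std::size_t a = digits( pos );
      if( a != std::string::npos && a < s.size() && s[ a ] == '+' ) {
         const std::size_t b = digits( a + 1 );
         if( b != std::string::npos ) return b;
      }
      return digits( pos );  // second alternative: answered from the memo table
   }
};
```

On "12+" the first alternative fails after matching "12", and the retry of digits at position 0 is a table lookup rather than a second scan. The trade-off is memory proportional to the number of rules times the number of input positions.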

O_CLOEXEC is not defined on RHEL5

pegtl/internal/file_opener.hpp now uses O_CLOEXEC.
According to the Linux man page for open(2):

O_CLOEXEC (since Linux 2.6.23)

I was just told that it doesn't compile on CentOS 5.
Perhaps it could be wrapped in #ifdefs?
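One way to wrap it, sketched under the assumption of a POSIX system: use O_CLOEXEC when the headers define it, and otherwise open the file normally and set FD_CLOEXEC afterwards with fcntl(). open_readonly is a hypothetical helper, not PEGTL's file_opener. Note that the fallback is not atomic, so a fork/exec in another thread between the two calls could still inherit the descriptor.

```cpp
#include <cassert>
#include <fcntl.h>
#include <unistd.h>

// Hypothetical helper: open a file read-only with close-on-exec set.
inline int open_readonly( const char* path )
{
#if defined( O_CLOEXEC )
   // Headers know O_CLOEXEC: set close-on-exec atomically at open time.
   return ::open( path, O_RDONLY | O_CLOEXEC );
#else
   // Older headers without O_CLOEXEC: open first, then set FD_CLOEXEC.
   const int fd = ::open( path, O_RDONLY );
   if( fd >= 0 ) {
      const int flags = ::fcntl( fd, F_GETFD );
      if( flags >= 0 ) {
         ::fcntl( fd, F_SETFD, flags | FD_CLOEXEC );
      }
   }
   return fd;
#endif
}
```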

Example abnf2pegtl2.cpp does not throw custom errors.

I tried out the abnf2pegtl2.cpp example after I couldn't find out at which point the error_control is passed through tao::pegtl::parse_tree::parse to tao::pegtl::parse, and it seems it just isn't. Therefore custom error messages are not thrown.

The 'sor' combinator doesn't parse the second rule. contrib/http

Hi, everyone! Thank you very much for the absolutely amazing library!

I have encountered a problem:

I have this simple code:

#include <iostream>
#include <tao/pegtl.hpp>
#include <tao/pegtl/contrib/http.hpp>
#include <tao/pegtl/contrib/tracer.hpp>

using namespace std::string_literals;
using namespace tao::pegtl;

namespace rule {
using grammar = must<http::start_line>;
}

int main() {
  auto response =
      "HTTP/1.1 206 Partial content\r\n"s;

  string_input<> input(response, "test");

  try {
    parse<rule::grammar, nothing, tracer>(input);

  } catch (const std::exception &e) {
    std::cerr << "\nERROR: " << e.what() << std::endl;
  }

  return 0;
}

It fails like this:

test:1:0(0)  start  tao::pegtl::must<tao::pegtl::http::start_line>
test:1:0(0)  start  tao::pegtl::http::start_line
test:1:0(0)  start  tao::pegtl::http::request_line
test:1:0(0)  start  tao::pegtl::http::method
test:1:0(0)  start  tao::pegtl::http::tchar
test:1:0(0)  start  tao::pegtl::abnf::ALPHA
test:1:1(1) success tao::pegtl::abnf::ALPHA
test:1:1(1) success tao::pegtl::http::tchar
test:1:1(1)  start  tao::pegtl::http::tchar
test:1:1(1)  start  tao::pegtl::abnf::ALPHA
test:1:2(2) success tao::pegtl::abnf::ALPHA
test:1:2(2) success tao::pegtl::http::tchar
test:1:2(2)  start  tao::pegtl::http::tchar
test:1:2(2)  start  tao::pegtl::abnf::ALPHA
test:1:3(3) success tao::pegtl::abnf::ALPHA
test:1:3(3) success tao::pegtl::http::tchar
test:1:3(3)  start  tao::pegtl::http::tchar
test:1:3(3)  start  tao::pegtl::abnf::ALPHA
test:1:4(4) success tao::pegtl::abnf::ALPHA
test:1:4(4) success tao::pegtl::http::tchar
test:1:4(4)  start  tao::pegtl::http::tchar
test:1:4(4)  start  tao::pegtl::abnf::ALPHA
test:1:4(4) failure tao::pegtl::abnf::ALPHA
test:1:4(4)  start  tao::pegtl::abnf::DIGIT
test:1:4(4) failure tao::pegtl::abnf::DIGIT
test:1:4(4)  start  tao::pegtl::ascii::one<(char)33, (char)35, (char)36, (char)37, (char)38, (char)39, (char)42, (char)43, (char)45, (char)46, (char)94, (char)95, (char)96, (char)124, (char)126>
test:1:4(4) failure tao::pegtl::ascii::one<(char)33, (char)35, (char)36, (char)37, (char)38, (char)39, (char)42, (char)43, (char)45, (char)46, (char)94, (char)95, (char)96, (char)124, (char)126>
test:1:4(4) failure tao::pegtl::http::tchar
test:1:4(4) success tao::pegtl::http::method
test:1:4(4)  start  tao::pegtl::abnf::SP
test:1:4(4) failure tao::pegtl::abnf::SP

ERROR: test:1:4(4): parse error matching tao::pegtl::abnf::SP

As far as I understand, it tries to parse the first alternative of struct start_line : sor< request_line, status_line > {};, which is request_line, but I'm actually parsing a status_line. If I'm not mistaken, sor should try the second rule when the first one fails, because of its "OR" nature, but it just fails at the first one.

If my understanding is wrong, what's the problem with that code?
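Ordered choice only moves on to its second alternative when the first one returns failure. An error raised from inside the first alternative (as must does, per the must< R > = sor< R, raise< R > > definition earlier on this page) propagates straight through sor. If request_line commits once method has matched, "HTTP" being accepted as a method would explain the trace ending at abnf::SP. The toy combinators below are a hypothetical model, not PEGTL's implementation, and show the difference between a backtracking and a committed first alternative.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <functional>
#include <stdexcept>
#include <string>

// A toy rule: on success it advances p and returns true; on failure it returns false.
using rule = std::function< bool( const std::string&, std::size_t& ) >;

rule lit( std::string t )  // match a literal string
{
   return [ t ]( const std::string& s, std::size_t& p ) {
      if( s.compare( p, t.size(), t ) != 0 ) return false;
      p += t.size();
      return true;
   };
}

rule letters()  // one or more ASCII letters, standing in for http::method
{
   return []( const std::string& s, std::size_t& p ) {
      std::size_t q = p;
      while( q < s.size() && std::isalpha( static_cast< unsigned char >( s[ q ] ) ) ) ++q;
      if( q == p ) return false;
      p = q;
      return true;
   };
}

rule seq( rule a, rule b )  // backtracking: rewind and report failure
{
   return [ a, b ]( const std::string& s, std::size_t& p ) {
      const std::size_t saved = p;
      if( a( s, p ) && b( s, p ) ) return true;
      p = saved;
      return false;
   };
}

rule if_must( rule cond, rule rest )  // committed: after cond matched, failure throws
{
   return [ cond, rest ]( const std::string& s, std::size_t& p ) {
      const std::size_t saved = p;
      if( !cond( s, p ) ) {
         p = saved;
         return false;
      }
      if( !rest( s, p ) ) throw std::runtime_error( "parse error at byte " + std::to_string( p ) );
      return true;
   };
}

rule sor( rule a, rule b )  // b is tried only if a *returns* false
{
   return [ a, b ]( const std::string& s, std::size_t& p ) {
      const std::size_t saved = p;
      if( a( s, p ) ) return true;
      p = saved;
      return b( s, p );
   };
}
```

With sor( if_must( letters(), lit( " " ) ), status_line ) on "HTTP/1.1 206", letters() accepts "HTTP", the space check fails and the exception escapes sor, so status_line is never tried; with seq instead of if_must, sor falls through to status_line.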

Allow literal strings for pegtl::string

Allowing

pegtl::string<"Hello, ">

would obviously be simpler than

pegtl::string<'H', 'e', 'l', 'l', 'o', ',', ' '>

but implementing this is a chore. I am not versed enough in C++ to know if upcoming standard versions make it simpler, but for now I found a workaround with boost. This issue is a placeholder so others can find it until pegtl supports it natively (so feel free to close the issue if it is not a priority for now).

  #include <boost/metaparse/string.hpp>

  template<typename T>
  struct literal_string {};

  template<char... Cs>
  struct literal_string<boost::metaparse::string<Cs...> > : pegtl::string<Cs...> {};

  #define STRING(str) literal_string< BOOST_METAPARSE_STRING(str) >

  struct prefix
     : STRING("Hello, ")
  {};

clang 4.0.0 using gcc 7.1.1 build error with `-std=c++1z`

Not sure whether this is an upstream clang bug or a bug in the library, but either way it doesn't build.

Sysinfo

System: ArchLinux CURRENT

$ clang++ -v

clang version 4.0.0 (tags/RELEASE_400/final)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /usr/bin
Found candidate GCC installation: /usr/bin/../lib/gcc/x86_64-pc-linux-gnu/7.1.1
Found candidate GCC installation: /usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/7.1.1
Found candidate GCC installation: /usr/lib/gcc/x86_64-pc-linux-gnu/7.1.1
Found candidate GCC installation: /usr/lib64/gcc/x86_64-pc-linux-gnu/7.1.1
Selected GCC installation: /usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/7.1.1
Candidate multilib: .;@m64
Candidate multilib: 32;@m32
Selected multilib: .;@m64
$ gcc -v

Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-pc-linux-gnu/7.1.1/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: /build/gcc-multilib/src/gcc/configure --prefix=/usr --libdir=/usr/lib --libexecdir=/usr/lib --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=https://bugs.archlinux.org/ --enable-languages=c,c++,ada,fortran,go,lto,objc,obj-c++ --enable-shared --enable-threads=posix --enable-libmpx --with-system-zlib --with-isl --enable-__cxa_atexit --disable-libunwind-exceptions --enable-clocale=gnu --disable-libstdcxx-pch --disable-libssp --enable-gnu-unique-object --enable-linker-build-id --enable-lto --enable-plugin --enable-install-libiberty --with-linker-hash-style=gnu --enable-gnu-indirect-function --enable-multilib --disable-werror --enable-checking=release
Thread model: posix
gcc version 7.1.1 20170528 (GCC) 

Test code

// test.cpp
#include <tao/pegtl.hpp>

int main() {
  return 0;
}

Compilers results

-std= g++ clang++
c++11
c++14
c++1z
c++17 (N/A)

Output

clang++ -std=c++1z test.cpp:

In file included from test.cpp:1:
In file included from PEGTL/include/tao/pegtl.hpp:10:
In file included from PEGTL/include/tao/pegtl/ascii.hpp:8:
In file included from PEGTL/include/tao/pegtl/eol.hpp:28:
In file included from PEGTL/include/tao/pegtl/internal/eol.hpp:11:
In file included from PEGTL/include/tao/pegtl/internal/../analysis/generic.hpp:9:
In file included from PEGTL/include/tao/pegtl/internal/../analysis/grammar_info.hpp:7:
In file included from /usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/7.1.1/../../../../include/c++/7.1.1/map:60:
In file included from /usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/7.1.1/../../../../include/c++/7.1.1/bits/stl_tree.h:72:
In file included from /usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/7.1.1/../../../../include/c++/7.1.1/bits/node_handle.h:39:
/usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/7.1.1/../../../../include/c++/7.1.1/optional:1032:27: error: use of class template 'optional' requires template arguments
  template <typename _Tp> optional(_Tp) -> optional<_Tp>;
                          ^
/usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/7.1.1/../../../../include/c++/7.1.1/optional:451:11: note: template is declared here
    class optional
          ^
/usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/7.1.1/../../../../include/c++/7.1.1/optional:1032:40: error: expected ';' at end of declaration
  template <typename _Tp> optional(_Tp) -> optional<_Tp>;
                                       ^
/usr/bin/../lib64/gcc/x86_64-pc-linux-gnu/7.1.1/../../../../include/c++/7.1.1/optional:1032:41: error: cannot use arrow operator on a type
  template <typename _Tp> optional(_Tp) -> optional<_Tp>;
                                        ^
3 errors generated.

examples/double.hh: dot is not optional

The example grammar in double.hh requires a dot in the input, so it won't parse 2 or 2e0.
(Is this an intentional limitation? The numeral rule from lua53_parse.cc handles it correctly.)
Here is what the sum.cc program shows:

$ ./sum 
Give me a comma separated list of numbers.
The numbers are added using the PEGTL.
Type [q or Q] to quit

2.0
parsing OK; sum = 2
2.
parsing OK; sum = 2
2.e0
parsing OK; sum = 2
2e0
parsing failed
2
parsing failed
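For comparison, a grammar in which the fractional part is optional accepts all five inputs above. In PEGTL notation the mantissa would be roughly sor< seq< one< '.' >, plus< digit > >, seq< plus< digit >, opt< one< '.' >, star< digit > > > >. The standalone recognizer below mirrors such a grammar by hand; it is a sketch, not the double.hh grammar, and only covers decimal numbers with an optional sign and exponent.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// PEG mirrored by hand (dot optional):
//   number   <- sign? ( '.' digit+ / digit+ ( '.' digit* )? ) exponent?
//   exponent <- [eE] sign? digit+
//   sign     <- [+-]
namespace {

// Consume a run of ASCII digits starting at p; return how many were consumed.
std::size_t digits( const std::string& s, std::size_t& p )
{
   const std::size_t start = p;
   while( p < s.size() && s[ p ] >= '0' && s[ p ] <= '9' ) ++p;
   return p - start;
}

void opt_sign( const std::string& s, std::size_t& p )
{
   if( p < s.size() && ( s[ p ] == '+' || s[ p ] == '-' ) ) ++p;
}

}  // namespace

// True if the whole string matches `number`.
bool is_number( const std::string& s )
{
   std::size_t p = 0;
   opt_sign( s, p );
   if( p < s.size() && s[ p ] == '.' ) {
      ++p;
      if( digits( s, p ) == 0 ) return false;  // '.' digit+
   }
   else {
      if( digits( s, p ) == 0 ) return false;  // digit+
      if( p < s.size() && s[ p ] == '.' ) {     // ( '.' digit* )?
         ++p;
         digits( s, p );
      }
   }
   if( p < s.size() && ( s[ p ] == 'e' || s[ p ] == 'E' ) ) {
      const std::size_t saved = p;  // exponent? : rewind unless it matches completely
      ++p;
      opt_sign( s, p );
      if( digits( s, p ) == 0 ) p = saved;
   }
   return p == s.size();
}
```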

Add cmake option for disabling testing & examples

Hello,
I am using PEGTL as a submodule in my own repository and include it with CMake's add_subdirectory (which works perfectly fine). The annoying part is that this adds all of the PEGTL examples and tests to my own project, and currently there is no CMake option to disable this. I'd recommend something like this:
option(BUILD_TESTS "Whether or not to build PEGTL test cases" TRUE),
which would build the tests unless explicitly disabled.

MAP_FILE

PEGTL uses the MAP_FILE compatibility flag, which according to the Linux mmap(2) man page is ignored. Is it worth removing?

should buffer_input keep track of absolute offset?

I noticed that position() on a buffer_input returns the offset relative to the last discard, which is fair I guess, but it wasn't obvious from the documentation.
Often the user will be interested in the absolute offset from the start of parsing. It's easy enough to keep track of that in the read callback, but I wonder if this is something you want to provide in PEGTL.
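Counting in the read callback can be done with a small wrapper that forwards to the real reader and adds up the bytes it hands out. This is a minimal sketch, assuming the reader signature from the discard test case on this page; counting_reader is a hypothetical name. The counter lives outside the wrapper so it stays visible even if the input object keeps its own copy of the reader. Note that it counts bytes delivered into the buffer, which can run ahead of the parser's current position; turning it into an absolute position still needs the number of bytes buffered but not yet consumed.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical wrapper around a reader callback of the form
// std::size_t( char* buffer, const std::size_t length ).
template< typename Reader >
struct counting_reader
{
   Reader reader;       // the real data source
   std::size_t* total;  // bytes delivered so far, shared by all copies

   std::size_t operator()( char* buffer, const std::size_t length )
   {
      const std::size_t n = reader( buffer, length );
      *total += n;
      return n;
   }
};
```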

Problem compiling s_expression_2 example

Hello again,

I get an error while compiling the s_expression_2 example that I don't understand:

c++ -I. -std=c++11 -stdlib=libc++ -pedantic -Wall -Wextra -Werror -O3 examples/s_expression_2.cc -o build/examples/s_expression_2
examples/s_expression_2.cc:74:32: error: no matching member function for call to 'parse'
read_parser( fn, in ).parse< main, action >( f2 );
~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~
./pegtl/read_parser.hh:39:12: note: candidate template ignored: invalid explicitly-specified argument
for template parameter 'Action'
void parse( States && ... st )
^
1 error generated.
make: *** [build/examples/s_expression_2] Error 1

Clang-cl (on Windows) 5.0.1 64-bit bug

static pair_t peek( Input& in, const std::size_t o = 0 ) noexcept( noexcept( in.peek_char( 0 ) ) )

static void transform( std::unique_ptr< Node >& n, States&&... st ) noexcept( noexcept( n->remove_content( st... ) ) )

Clang-cl has an issue with these two lines. It gets confused and seems to think that the code calls the constructor of the enclosing class (peek_char and remove_content, respectively) rather than the member function of the variable in or n.

I don't know how you'd like to handle this (if at all). I'd appreciate it if the bug were fixed, but that is up to you of course. You could fix it in a few ways:

  • Don't try to mark these functions noexcept.
  • Rename the function or the class so that they're different (so clang-cl doesn't get confused).
  • #ifdefs to not try to use noexcept for this compiler.

FYI, I've created a minimal test case that reproduces the bug, which I will submit to LLVM once they've given me an account. If you're interested, it's attached.

main.cpp.txt
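The third option could look like the following sketch. clang-cl defines both _MSC_VER and __clang__, so that combination can select a fallback that drops the conditional noexcept. The macro name HYPOTHETICAL_NOEXCEPT_IF and the types below are illustrative only; they mirror the reported situation, where a static function inside a class named peek_char calls a member function that is also named peek_char.

```cpp
#include <cassert>
#include <cstddef>

// Drop the conditional noexcept only for clang in MSVC mode (clang-cl).
#if defined( _MSC_VER ) && defined( __clang__ )
#define HYPOTHETICAL_NOEXCEPT_IF( ... )
#else
#define HYPOTHETICAL_NOEXCEPT_IF( ... ) noexcept( __VA_ARGS__ )
#endif

// A toy input type with a member function named peek_char.
struct toy_input
{
   char peek_char( const std::size_t o ) const noexcept { return "ab"[ o ]; }
};

// The enclosing class shares its name with the member function called on `in`.
struct peek_char
{
   template< typename Input >
   static char peek( const Input& in, const std::size_t o = 0 ) HYPOTHETICAL_NOEXCEPT_IF( noexcept( in.peek_char( 0 ) ) )
   {
      return in.peek_char( o );
   }
};
```

On other compilers the conditional noexcept is kept, so the exception specification is unchanged there.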

star<ws> fails on more than 15 repetitions

Hi,
I am currently trying to find out why my seemingly correct grammar won't parse. Here's the first issue that I don't understand: take the JSON grammar as an example and start the input with 16 spaces:

             {
}

It fails with:

     1      1 source:1:0(0)  start  tao::pegtl::disable<tao::pegtl::json::text>
     2      2 source:1:0(0)  start  tao::pegtl::json::text
     3      3 source:1:0(0)  start  tao::pegtl::star<tao::pegtl::json::ws>
     4      4 source:1:0(0)  start  tao::pegtl::json::ws
     5      4 source:1:0(0) failure tao::pegtl::json::ws
     6      3 source:1:0(0) success tao::pegtl::star<tao::pegtl::json::ws>
     7      5 source:1:0(0)  start  tao::pegtl::json::value
     8      6 source:1:0(0)  start  tao::pegtl::sor<tao::pegtl::json::string, tao::pegtl::json::number, tao::pegtl::json::object, tao::pegtl::json::array, tao::pegtl::json::false_, tao::pegtl::json::true_, tao::pegtl::json::null>
     9      7 source:1:0(0)  start  tao::pegtl::json::string
    10      8 source:1:0(0)  start  tao::pegtl::ascii::one<(char)34>
    11      8 source:1:0(0) failure tao::pegtl::ascii::one<(char)34>
    12      7 source:1:0(0) failure tao::pegtl::json::string
    13      9 source:1:0(0)  start  tao::pegtl::json::number
    14     10 source:1:0(0)  start  tao::pegtl::opt<tao::pegtl::ascii::one<(char)45> >
    15     11 source:1:0(0)  start  tao::pegtl::ascii::one<(char)45>
    16     11 source:1:0(0) failure tao::pegtl::ascii::one<(char)45>
    17     10 source:1:0(0) success tao::pegtl::opt<tao::pegtl::ascii::one<(char)45> >
    18     12 source:1:0(0)  start  tao::pegtl::json::int_
    19     13 source:1:0(0)  start  tao::pegtl::ascii::one<(char)48>
    20     13 source:1:0(0) failure tao::pegtl::ascii::one<(char)48>
    21     14 source:1:0(0)  start  tao::pegtl::json::digits
    22     15 source:1:0(0)  start  tao::pegtl::abnf::DIGIT
    23     15 source:1:0(0) failure tao::pegtl::abnf::DIGIT
    24     14 source:1:0(0) failure tao::pegtl::json::digits
    25     12 source:1:0(0) failure tao::pegtl::json::int_
    26      9 source:1:0(0) failure tao::pegtl::json::number
    27     16 source:1:0(0)  start  tao::pegtl::json::object
    28     17 source:1:0(0)  start  tao::pegtl::json::begin_object
    29     18 source:1:0(0)  start  tao::pegtl::ascii::one<(char)123>
    30     18 source:1:0(0) failure tao::pegtl::ascii::one<(char)123>
    31     17 source:1:0(0) failure tao::pegtl::json::begin_object
    32     16 source:1:0(0) failure tao::pegtl::json::object
    33     19 source:1:0(0)  start  tao::pegtl::json::array
    34     20 source:1:0(0)  start  tao::pegtl::json::begin_array
    35     21 source:1:0(0)  start  tao::pegtl::ascii::one<(char)91>
    36     21 source:1:0(0) failure tao::pegtl::ascii::one<(char)91>
    37     20 source:1:0(0) failure tao::pegtl::json::begin_array
    38     19 source:1:0(0) failure tao::pegtl::json::array
    39     22 source:1:0(0)  start  tao::pegtl::json::false_
    40     22 source:1:0(0) failure tao::pegtl::json::false_
    41     23 source:1:0(0)  start  tao::pegtl::json::true_
    42     23 source:1:0(0) failure tao::pegtl::json::true_
    43     24 source:1:0(0)  start  tao::pegtl::json::null
    44     24 source:1:0(0) failure tao::pegtl::json::null
    45      6 source:1:0(0) failure tao::pegtl::sor<tao::pegtl::json::string, tao::pegtl::json::number, tao::pegtl::json::object, tao::pegtl::json::array, tao::pegtl::json::false_, tao::pegtl::json::true_, tao::pegtl::json::null>
    46      5 source:1:0(0) failure tao::pegtl::json::value
    47      2 source:1:0(0) failure tao::pegtl::json::text
    48      1 source:1:0(0) failure tao::pegtl::disable<tao::pegtl::json::text>

Note entry 5, "5 4 source:1:0(0) failure tao::pegtl::json::ws": it already fails at the first space! If you delete one space, it works.

minor bug in abnf2pegtl

~> cat q.abnf
quoted-pair = "\" (%x00-09 / %x0B-0C / %x0E-7F)

~> ./abnf2pegtl q.abnf
struct quoted_pair : pegtl::seq< pegtl::one< '\' >, pegtl::sor< pegtl::range< 0x00, 0x09 >, pegtl::range< 0x0B, 0x0C >, pegtl::range< 0x0E, 0x7F > > > {};

Note that "\" from the ABNF was converted to '\' with just one backslash, which is a malformed character literal; it should be '\\'.
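The generator needs to escape characters that cannot appear bare inside a C++ character literal. Below is a minimal sketch of such a helper; the name cpp_char_literal is hypothetical and this is not abnf2pegtl's actual code. It escapes the backslash and the single quote, and falls back to a hexadecimal integer for non-printable bytes, matching the 0x.. style the generated ranges already use.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper: render one byte as valid C++ source for a character.
std::string cpp_char_literal( char c )
{
   switch( c ) {
      case '\\': return "'\\\\'";  // backslash becomes '\\'
      case '\'': return "'\\''";   // single quote becomes '\''
      default: break;
   }
   const unsigned char u = static_cast< unsigned char >( c );
   if( u >= 0x20 && u < 0x7F ) {
      return std::string( "'" ) + c + "'";  // printable ASCII: plain literal
   }
   char buf[ 8 ];
   std::snprintf( buf, sizeof( buf ), "0x%02X", static_cast< unsigned >( u ) );  // non-printable: hex
   return buf;
}
```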
