
lexy's Introduction

lexy


lexy is a parser combinator library for C++17 and onwards. It allows you to write a parser by specifying it in a convenient C++ DSL, which gives you all the flexibility and control of a handwritten parser without any of the manual work.

Documentation: lexy.foonathan.net

IPv4 address parser
namespace dsl = lexy::dsl;

// Parse an IPv4 address into a `std::uint32_t`.
struct ipv4_address
{
    // What is being matched.
    static constexpr auto rule = []{
        // Match a sequence of (decimal) digits and convert it into a std::uint8_t.
        auto octet = dsl::integer<std::uint8_t>;

        // Match four of them separated by periods.
        return dsl::times<4>(octet, dsl::sep(dsl::period)) + dsl::eof;
    }();

    // How the matched output is being stored.
    static constexpr auto value
        = lexy::callback<std::uint32_t>([](std::uint8_t a, std::uint8_t b, std::uint8_t c, std::uint8_t d) {
            return (a << 24) | (b << 16) | (c << 8) | d;
        });
};
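
Driving the grammar then only takes a couple of lines. A minimal sketch (the headers and the lexy_ext::report_error callback mirror the snippets quoted in the issues further down this page):

#include <cstdio>
#include <lexy/action/parse.hpp>
#include <lexy/input/string_input.hpp>
#include <lexy_ext/report_error.hpp>

int main()
{
    // 127.0.0.1 == 0x7F000001 == 2130706433
    auto input  = lexy::zstring_input("127.0.0.1");
    auto result = lexy::parse<ipv4_address>(input, lexy_ext::report_error);
    if (result.has_value())
        std::printf("%u\n", unsigned(result.value()));
    return result ? 0 : 1;
}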

Features

Full control
  • Describe the parser, not some abstract grammar: Unlike parser generators that use some table driven magic for parsing, lexy’s grammar is just syntax sugar for a hand-written recursive descent parser. The parsing algorithm does exactly what you’ve instructed it to do — no more ambiguities or weird shift/reduce errors!

  • No implicit backtracking or lookahead: It will only backtrack when you say it should, and only lookahead when and how far you want it. Don’t worry about rules that have side-effects, they won’t be executed unnecessarily thanks to the user-specified lookahead conditions. Try it online.

  • Escape hatch for manual parsing: Sometimes you want to parse something that can’t be expressed easily with lexy’s facilities. Don’t worry, you can integrate a hand-written parser into the grammar at any point. Try it online.

  • Tracing: Figure out why the grammar isn’t working the way you want it to. Try it online.

Easily integrated
  • A pure C++ DSL: No need to use an external grammar file; embed the grammar directly in your C++ project using operator overloading and functions.

  • Bring your own data structures: You can directly store results into your own types and have full control over all heap allocations.

  • Fully constexpr parsing: You want to parse a string literal at compile-time? You can do so.

  • Minimal standard library dependencies: The core parsing library only depends on fundamental headers such as <type_traits> or <cstddef>; no big includes like <vector> or <algorithm>.

  • Header-only core library (by necessity, not by choice — it’s constexpr after all).

Designed for text
  • Unicode support: parse UTF-8, UTF-16, or UTF-32, and access the Unicode character database to query char classes or perform case folding. Try it online.

  • Convenience: Built-in rules for parsing nested structures, quotes and escape sequences. Try it online.

  • Automatic whitespace skipping: No need to manually handle whitespace or comments. Try it online.

Designed for programming languages
  • Keyword and identifier parsing: Reserve a set of keywords that won’t be matched as regular identifiers. Try it online.

  • Operator parsing: Parse unary/binary operators with different precedences and associativity, including chained comparisons a < b < c. Try it online.

  • Automatic error recovery: Log an error, recover, and continue parsing! Try it online.

Designed for binary input
  • Bytes: Rules for parsing N bytes or N-bit big/little-endian integers.

  • Bits: Rules for parsing individual bit patterns.

  • Blobs: Rules for parsing TLV formats.

FAQ

Why should I use lexy over XYZ?

lexy is closest to other PEG parsers. However, they usually do more implicit backtracking, which can hurt performance, and you need to be very careful with rules that have side effects. This is not the case for lexy, where backtracking is controlled using branch conditions. lexy also gives you a lot of control over error reporting, supports error recovery, has special support for operator precedence parsing, and offers other advanced features.
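
To illustrate what a branch condition looks like, a short sketch (the hex rule is lifted from an issue further down this page; dsl::else_ is lexy's unconditional fallback branch):

// `>>` separates the condition from the rest of the branch: the hex alternative
// is only entered once "0x" has matched, so nothing is ever re-parsed.
auto hex_integer = LEXY_LIT("0x") >> dsl::integer<int, dsl::hex>;
auto decimal     = dsl::else_ >> dsl::integer<int>;
auto number      = hex_integer | decimal;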

Boost.Spirit

The main difference: it is not a Boost library. In addition, Boost.Spirit is quite old and doesn’t support e.g. non-common ranges as input. Boost.Spirit also eagerly creates attributes from the rules, which can lead to nested tuples/variants, while lexy uses callbacks, which enables zero-copy parsing directly into your own data structure. However, lexy’s grammar is more verbose, and it is designed to parse bigger grammars rather than the small one-off rules that Boost.Spirit is good at.

PEGTL

PEGTL is very similar and was a big inspiration. The biggest difference is that lexy uses an operator based DSL instead of inheriting from templated classes as PEGTL does; depending on your preference this can be an advantage or disadvantage.

Hand-written Parsers

Writing a parser by hand is more manual work and more error-prone. lexy automates that away without sacrificing control. You can use it to quickly prototype a parser and then replace more and more of it with a handwritten parser over time; mixing a hand-written parser and a lexy grammar works seamlessly.

How bad are the compilation times?

They’re not as bad as you might expect (in debug mode, that is).

The example JSON parser compiles in about 2s on my machine. If we remove all the lexy-specific parts and just benchmark the time it takes for the compiler to process the data structures (and stdlib includes), that takes about 700ms. If we only validate JSON instead of parsing it (removing the data structures and keeping only the lexy-specific parts), we’re looking at about 840ms.

Keep in mind that you can fully isolate lexy in a single translation unit that only needs to be touched when you change the parser. You can also split a lexy grammar into multiple translation units using the dsl::subgrammar rule.
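
A sketch of the single-translation-unit setup (file names and the wrapper function are illustrative, not part of lexy):

// ipv4.hpp - the rest of the project sees only this; no lexy headers leak out.
#include <cstdint>
#include <optional>
#include <string_view>
std::optional<std::uint32_t> parse_ipv4(std::string_view str);

// ipv4.cpp - the only file that includes lexy, and the only one that needs
// recompiling when the grammar changes.
std::optional<std::uint32_t> parse_ipv4(std::string_view str)
{
    auto input  = lexy::string_input(str);
    auto result = lexy::parse<ipv4_address>(input, lexy::noop); // noop: discard errors
    if (!result.has_value())
        return std::nullopt;
    return result.value();
}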

How bad are the C++ error messages if you mess something up?

They’re certainly worse than the error messages lexy gives you. The big problem is that the first line gives you the error, followed by dozens of template instantiations ending at your lexy::parse call. Besides providing an external tool to filter those error messages, there is nothing I can do about that.

How fast is it?

Benchmarks are available in the benchmarks/ directory. A sample result of the JSON validator benchmark which compares the example JSON parser with various other implementations is available here.

Why is it called lexy?

I previously had a tokenizer library called foonathan/lex. I tried adding a parser to it, but found that the line between pure tokenization and parsing became increasingly blurred. lexy is a re-imagining of the parser I added to foonathan/lex, and I’ve simply kept a similar name.

Documentation

The documentation, including tutorials, reference documentation, and an interactive playground can be found at lexy.foonathan.net.

A minimal CMakeLists.txt that uses lexy can look like this:

CMakeLists.txt
project(lexy-example)

include(FetchContent)
FetchContent_Declare(lexy URL https://lexy.foonathan.net/download/lexy-src.zip)
FetchContent_MakeAvailable(lexy)

add_executable(lexy_example)
target_sources(lexy_example PRIVATE main.cpp)
target_link_libraries(lexy_example PRIVATE foonathan::lexy)


lexy's Issues

example-or-fix-request: delimited-dsl where closing delimiter is also an escape leader

How to write a delimited-dsl closing delimiter that is the same as an escape leader?

Suppose, for the sake of illustration, that in the object language to be parsed, strings are delimited by %.
Then I would want to match delimited strings as follows:

  • "%one two%" --> "one two"
  • "%one %% two%" --> "one % two"

In Pascal, for instance, the basic syntax of a string is delimited with single quotes.
If you want to escape a single quote inside such a string, you type it twice.

What is the recommended way to handle this?

Here is a snippet that did not work:

struct pct_quoted {
  static constexpr auto quote_sym = lexy::symbol_table<char>.map<'%'>('%');
  static constexpr auto rule = [] {
    auto delim  = dsl::percent_sign;
    auto escape = dsl::escape(delim).symbol<quote_sym>();
    auto pct_q  = dsl::delimited(delim, delim);
    return pct_q(dsl::ascii::character, escape);
  }();
};

void test_pct_quot() {
  auto input = lexy::zstring_input("%one%%two%");

  lexy::trace<pct_quoted>(stdout, input);
}
 1:  1: pct_quoted:
 1:  1: - literal: %
 1:  2: - token: one
 1:  5: - literal: %
 1:  6: - finish

Thank you for this nice library.

benchmark compile errors

Hi,

Would you mind making the benchmarks compilable from the main CMake build, so that all dependencies and libraries are found?

Otherwise there are many errors.

Best

Assertion failed when using lexy_ext::report_error

I stumbled upon a failing assertion that can be triggered when using lexy_ext::shell and lexy_ext::report_error as in the shell example (with slight modifications).

Here is how you can reproduce it. First, modify the echo rule in such a way that it expects some kind of literal token after the initial keyword. Any of the following has the same effect:

struct cmd_echo
{
    static constexpr auto rule  = LEXY_KEYWORD("echo", identifier) >> LEXY_LIT("abc") + dsl::p<argument>;
    // static constexpr auto rule  = LEXY_KEYWORD("echo", identifier) >> dsl::lit_c<'a'> + dsl::p<argument>;
    // static constexpr auto rule  = LEXY_KEYWORD("echo", identifier) >> dsl::round_bracketed( dsl::p<argument>);
    static constexpr auto value = lexy::new_<shell::cmd_echo, shell::command>;
};

Now start the shell and enter only the keyword echo. Then press Ctrl+D. This sends the current input on to parsing and makes lexy consume the initial keyword. Thus, it expects the literal token next. Now end the input completely by pressing Ctrl+D again. Parsing the command fails because of the missing literal token, and lexy_ext::report_error is called.

Writing the annotation for the expected token then causes the assertion to fail. The + 1 in error.index() + 1 causes the iterator to be moved past its end.

    // Write the main annotation.
    if constexpr (std::is_same_v<Tag, lexy::expected_literal>)
    {
        auto string = lexy::_detail::make_literal_lexeme<typename Reader::encoding>(error.string());

        out = writer.write_annotation(out, annotation_kind::primary, location,
                                      lexy::_detail::next(error.position(), error.index() + 1), // <----- HERE
                                      [&](OutputIt out, lexy::visualization_options opts) {
                                          out = lexy::_detail::write_str(out, "expected '");
                                          out = lexy::visualize_to(out, string, opts);
                                          out = lexy::_detail::write_str(out, "'");
                                          return out;
                                      });
    }

The following assertion fails:

LEXY_PRECONDITION(_idx != _buffer->read_size());

Removing the + 1 and just leaving it as error.index() actually fixes the problem. Error messages still look good for the cases I could test. But I am not sure if this is the best way to fix it, as I don't really know how important the + 1 is there.

Checkout jannikw@617f524 for reproducing this and jannikw@5e4afce for the potential fix.

lexy_ext::report_error on Windows 10 (MSVC)

Running https://lexy.foonathan.net/learn/warmup/ using lexy::string_input

int main()
{
    auto str = "#FFX";
    auto input = lexy::string_input(std::string_view{ str });

    auto result = lexy::parse<grammar::color>(input, lexy_ext::report_error);
    if (result.has_value())
    {
        auto color = result.value();
        std::printf("#%02x%02x%02x\n", color.r, color.g, color.b);
    }

    return result ? 0 : 1;
}

The error contains garbage characters

error: while parsing channel
     ├óÔÇØÔÇÜ
   1 ├óÔÇØÔÇÜ #FFX
     ├óÔÇØÔÇÜ    ^ expected digit.hex

[Question] dsl::integer and negative numbers

Hi,

I notice that dsl::integer does not parse negative numbers. Is that the intended behavior?

I am not sure how to address this. This is my sample code:

#include <lexy/action/parse.hpp>
#include <lexy/callback.hpp>
#include <lexy/dsl.hpp>
#include <lexy_ext/report_error.hpp>
#include <lexy/input/string_input.hpp>

namespace dsl = lexy::dsl;

struct production
{
  static constexpr auto whitespace = dsl::ascii::space;

  static constexpr auto rule =
      [] {
        auto integer = dsl::integer<int64_t>;
        auto hex_integer = LEXY_LIT("0x") >> dsl::integer<int, dsl::hex>;

        return (hex_integer | integer) + dsl::eof;
      }();
  static constexpr auto value = lexy::forward<int>;
};

int main()
{
  const char* strings[] = {"7", "0x78", "-7"};
  for( const auto& str: strings )
  {
    auto input  = lexy::zstring_input(str);
    auto result = lexy::parse<production>(input, lexy_ext::report_error);
    if (result.has_value())
    {
      std::printf("%d\n", result.value());
    }
  }
  return 0;
}

Output:

7
120
error: while parsing Integer
     |
   1 | -7
     | ^ exhausted choice
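
For reference, the 128-bit integer issue further down this page handles the sign explicitly, and the same pattern applies here; a sketch:

// dsl::integer only matches digits; an explicit dsl::sign in front plus
// lexy::as_integer combines the optional sign with the parsed magnitude.
struct number : lexy::token_production
{
    static constexpr auto rule  = dsl::sign + dsl::integer<std::int64_t>(dsl::digits<>);
    static constexpr auto value = lexy::as_integer<std::int64_t>;
};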

Context sensitive optimization

Let's say I want to parse a list of predicates of the form:
A(x,y,...)
and I have an external map from predicate names to argument types (so that I know how to parse the predicates' arguments). With lexy, I can write a sink that takes an immutable parse state (in this case my map from names to lists of types) and parses the arguments into the correct types. Then I would parse predicates like this:

struct predicate {
    static constexpr auto rule  = dsl::p<predicate_name> + dsl::p<argument_list>;
    static constexpr auto value = my_sink(....);
};

where the value of argument_list is a vector of strings.

The problem I have is that I have to parse argument_list into a vector of strings. If I could first parse predicate_name and add this information to the parse state so that I know the name of the predicate while computing the value of argument_list, I wouldn't have to delay the decision of how to parse the arguments. Is there a way to achieve this with lexy?

Incoming Breaking Change: Error Recovery

To implement error recovery (i.e. to allow the parser to continue after a parse error), I need to make some changes to the interface of the parse functions (lexy::parse(), lexy::validate(), etc.) as parsing can now return multiple errors in addition to a value.

  • lexy::result<T, E> is going to be deprecated.
  • The return type of lexy::read_file() changes to a new lexy::read_file_result. This one also acts as a Reader directly, so there is no need to call .value() afterwards.
  • The return type of lexy::validate() and lexy::parse_as_tree() changes to a new lexy::validate_result. It contains zero or more errors.
  • The return type of lexy::parse() changes to a new lexy::parse_result. It contains an optional value and zero or more errors.
  • The error callback must now return void or needs to be a sink to allow multiple invocation. A new callback lexy::collect<Container>(callback) can be used for easy migration.

Tutorial example fails to compile

I'm compiling with MSVC, which fails to compile the project for many reasons, but here's one error that stood out to me.

From what I gather, this conditional enabler tries to detect a static method name() on the type, so that it can produce a type name:

if constexpr (_detail::is_detected<_detect_name_f, T>)
    return string_view(T::name());

However, in the tutorial, there's a type grammar::name, which, when plugged into this function here:
auto name_field = make_field(LEXY_LIT("name"), LEXY_MEM(name) = dsl::p<name>);

It evaluates to grammar::name::name(). This invokes the constructor, and is thus a valid expression, but obviously it doesn't return a string. Perhaps the conditional enabler could exclude expressions for which T::name yields a type name?

Question: using bind_sink and opt_list together

I'm trying to use bind_sink to provide an allocator to a sink, and I've come across an issue that I can't see a way around and that feels like a bug (or I've missed how to do it properly!).

Given a production that parses a parenthesised list of integers, like:

struct production
{
    static constexpr auto whitespace = dsl::ascii::space;

    static constexpr auto rule = [] {
        auto sep = dsl::trailing_sep(dsl::lit_c<','>);
        auto digits = dsl::digits<>.sep(dsl::digit_sep_tick).no_leading_zero();
        return dsl::round_bracketed.opt_list(dsl::integer<int>(digits), sep);
    }();
};

A value member along the lines of the following will capture the list of integers; this works as expected:

    static constexpr auto value = lexy::as_list<std::pmr::vector<int>>;

Now, to allocate this list with an allocator, bind_sink is used:

static constexpr auto value = lexy::bind_sink(lexy::as_list<std::pmr::vector<int>>, lexy::parse_state);

With the above production this fails with a static assert about missing overloads (full example in Compiler Explorer). Changing the opt_list to a plain list works as expected.

I think this is down to the fact that as_list/_list in callback/container.h implements a callback for lexy::nullopt to handle use with opt_list. The implementation of bind_sink does not delegate operator() to the wrapped sink, so a bound sink cannot be used with an opt_list. Should _bound_sink implement an operator() that delegates to the wrapped sink?

Happy to investigate/raise a PR if this isn't intentional!

Example for Callback lexy::new_ in documentation appears to be incorrect.

The documentation for lexy::new_ on your website appears to show the wrong example.

Currently it shows this:

Example 2. Construct a point on the heap and returns a std::unique_ptr

struct point
{
    int x, y;
};

struct production
{
    static constexpr auto whitespace = dsl::ascii::space;

    static constexpr auto rule = [] {
        auto integer = dsl::integer<int>;
        return dsl::twice(integer, dsl::sep(dsl::comma));
    }();
    static constexpr auto value = lexy::construct<point>;
};

The code here is identical to the example for lexy::construct.
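
Presumably the intended example constructs the point on the heap instead. A sketch of what the value callback likely should read, using the two-argument form of lexy::new_ that also appears in the shell issue above:

    static constexpr auto value = lexy::new_<point, std::unique_ptr<point>>;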

Infinite compilation caused by dsl::times with separator and automatic whitespace

Minimal repro case: https://godbolt.org/z/44hTdPzza
Reproduces on clang 13, clang 14 and gcc 12

This doesn't happen if either of the following is true:

  • automatic whitespace isn't enabled
  • no separator is specified for dsl::times

Later edit: In the meantime I discovered lexy::token_production, and that makes things a bit better, but the general problem remains: if automatic whitespace is enabled, then chaining too many rules makes the compilation times explode. For instance, a simple

struct production {
    static constexpr auto whitespace = dsl::ascii::space;
    static constexpr auto rule = LEXY_LIT("why") + LEXY_LIT("oh") + LEXY_LIT("why") + LEXY_LIT("is") + LEXY_LIT("this") + LEXY_LIT("taking") + LEXY_LIT("so") + LEXY_LIT("so") + LEXY_LIT("so") + LEXY_LIT("oh") + LEXY_LIT("so") + LEXY_LIT("long");
};

seems to never finish compiling.
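
For reference, the lexy::token_production mitigation mentioned above looks roughly like this (a sketch; inside a token production, whitespace is not skipped between the individual literals, which keeps the generated rule types small):

struct words : lexy::token_production
{
    static constexpr auto rule
        = LEXY_LIT("why") + LEXY_LIT("oh") + LEXY_LIT("why") + LEXY_LIT("is") + LEXY_LIT("this");
};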

is `operator/` useless? `operator|` offers same semantics

lexy::dsl::operator| (lexy/dsl/choice.hpp):
    branch  | branch  : Rule
    pattern | pattern : Pattern

lexy::dsl::operator/ (lexy/dsl/alternative.hpp):
    pattern / pattern : Pattern

I read the docs and code; is operator/ useless? Both seem to offer the same semantics...

128 bit integer parsing

Very nice library by the way. I'm having some fun learning it. Thanks!

128-bit integers on gcc and clang may be helpful.

A compiler-specific adjustment that seems to work, at dsl/integer.hpp:76:

        else if constexpr (std::is_same_v<T, unsigned long long>)
            return ULLONG_MAX;
+        else if constexpr (std::is_same_v<T, __int128_t>)
+           return __int128_t(LLONG_MAX) << 64;
+        else if constexpr (std::is_same_v<T, __uint128_t>)
+            return __uint128_t(ULLONG_MAX) << 64;
        else
            static_assert(_detail::error<T>);

Though specialising integer_traits via a concept worked, even if reproducing max_digit_count, add_digit_unchecked, and add_digit_checked was a bit anti-DRY.


As an aside I note this blew up:

    template<typename T>
    struct number : lexy::token_production {
        static constexpr auto rule  = dsl::sign + dsl::integer<T>(dsl::digits<>);
        static constexpr auto value = lexy::as_integer<T>;
    };

but this is fine:

    struct number : lexy::token_production {
        static constexpr auto rule  = dsl::sign + dsl::integer<int64_t>(dsl::digits<>);
        static constexpr auto value = lexy::as_integer<int64_t>;
    };

    struct number128 : lexy::token_production {
        static constexpr auto rule  = dsl::sign + dsl::integer<int128_t>(dsl::digits<>);
        static constexpr auto value = lexy::as_integer<int128_t>;
    };

I can specialise the config to call separate number and number128 tokens:

    template< typename T>
    struct config {
        static constexpr auto whitespace = dsl::ascii::blank;
        static constexpr auto rule = dsl::p<number>;
        static constexpr auto value = lexy::forward<T>;
    };

    template<Integer128 T>
    struct config<T> {
        static constexpr auto whitespace = dsl::ascii::blank;
        static constexpr auto rule = dsl::p<number128>;
        static constexpr auto value = lexy::forward<int128_t>;
    };

Thanks again for the fun library, --Matt.

Including lexy in CMake project forces use of C++20

In src/CMakeLists.txt:123, the lexy CMake project checks whether the C++ compiler supports C++20 and, if so, unconditionally enables it. This in turn gets propagated as part of the INTERFACE properties of the lexy targets, which means that even if the user has specified C++17 for the top-level project, it gets overridden by lexy.

The CMake docs for 3.8 suggest that using any of the cxx_std_?? compile features guarantees at least that standard version will be used for compilation, so it should be fine to just use cxx_std_17 in the compile features and let the user opt into C++20 on their own. More recent docs make this more explicit.
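
The suggested change, sketched in CMake (assuming the core library is the header-only INTERFACE target named lexy):

# Guarantee at least C++17 without forcing C++20 onto consumers;
# a consumer can still opt into a newer standard themselves.
target_compile_features(lexy INTERFACE cxx_std_17)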

Incoming Breaking Change: Error Tag Customization

There are currently many ways to customize the tag of a lexy::error if a rule failed to parse:

  • .error<Tag>() for tokens.
  • <Tag> for dsl::require, dsl::combination(), and others.
  • dsl::error<Tag> if you want to raise an error out-right.

I plan on merging these into one consistent syntax everywhere (probably .error<Tag>, but I'll see).
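
For reference, the token spelling today looks like this (a sketch; expected_version and version_keyword are hypothetical names):

struct expected_version
{
    static constexpr auto name = "expected version"; // message used by error reporting
};

struct version_keyword
{
    static constexpr auto rule = LEXY_LIT("version").error<expected_version>;
};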

parse without a file

It is my understanding that currently lexy::parse only accepts a file returned by lexy::read_file. Is it possible to let lexy::parse accept a std::string?
I know this is not strictly necessary: this is a parser after all. But this feature will prove useful for unit testing.
Thanks!
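
As several issues on this page already demonstrate, in-memory input works via lexy::string_input/lexy::zstring_input; a minimal sketch:

auto input  = lexy::zstring_input("1 + 2");  // or lexy::string_input(sv) for a std::string_view
auto result = lexy::parse<production>(input, lexy_ext::report_error);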

Incoming Breaking Change: dsl::value, dsl::label, dsl::id

One of lexy's design principles is to separate between what is being parsed (the rules) and where the result is stored.
To that end, you can use the same grammar with lexy::parse(), lexy::validate(), or lexy::parse_as_tree() to perform different things.

The dsl::value_* rules (for producing a constant value) and dsl::label/dsl::id (for getting information about which branch was taken) always felt like they violate that principle: they only matter when you use lexy::parse() and for nothing else. For that reason, I have decided to remove them.

  • The dsl::value_* rules will be replaced by some sort of mechanism to easily specify constants as part of the callback, not as part of the rule. The callbacks determine what value is produced by a production, so it is their job to control that, not the rule itself.
  • The dsl::label/dsl::id rules will be replaced by something like dsl::which(a | b | c) which produces the index a branch took. This is what they are meant to answer, and the resulting interface is cleaner than what I currently have.
  • There will be some other syntax to replace .lit_c<'n'>(dsl::value_c<'\n'>) in the escape sequence DSL; I'm not sure which one.

I will keep dsl::nullopt (also produced by dsl::opt()) and dsl::position: the former is the equivalent of the new dsl::which() for a single branch, and the latter cannot be replicated via callbacks.

Ignore whitespace at beginning of input

I am trying to ignore whitespace between my tokens/productions. Adding a member like

static constexpr auto whitespace
        = dsl::ascii::space | LEXY_LIT("//") >> dsl::until(dsl::eol);

only skips whitespace if it comes after the first actual token in the input. Meaning: if the input starts with a few blank lines or a comment, for example, the parser fails. This is somewhat unexpected, as the docs state:

Use whitespace to skip optional whitespace at the beginning of the input.

My workaround is replacing the entry production's rule like this:

static constexpr auto rule //
        = dsl::whitespace(whitespace) + (old rule here);

This makes it work the way I want (ignoring whitespace at the beginning of the input), but I think this should not be necessary if whitespace skipping were working correctly.

Playground for reproducing: https://lexy.foonathan.net/playground/?id=Pzo5jx88o&mode=trace

Unicode code-point ranges

Is it possible to specify Unicode code-point ranges in a rule? For example, using lexy::utf8_encoding, can lexy model this?
identifier-head -> U+0100-U+02FF, U+0370-U+167F, U+1681-U+180D, or U+180F-U+1DBF

PS Awesome library!
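
A sketch of one plausible spelling: dsl::code_point.range<Low, High>() follows lexy's char class documentation, but treat the exact name as an assumption to verify:

// identifier-head as a union (`/`) of code point ranges; requires a Unicode
// encoding such as lexy::utf8_encoding.
constexpr auto identifier_head
    = dsl::code_point.range<0x0100, 0x02FF>()
    / dsl::code_point.range<0x0370, 0x167F>()
    / dsl::code_point.range<0x1681, 0x180D>()
    / dsl::code_point.range<0x180F, 0x1DBF>();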

dsl for real numbers

I am new to the library so I apologize if I am missing something.

Exploring the examples, I can't find anything similar to dsl::integer for real numbers.
Of course, I understand that parsing real numbers is complicated because there are a lot of different representations, but I would need something simple such as 5.67 to start with.

My application is an AST similar to the calculator example.
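
Nothing on this page shows a built-in dsl::real, but a simple form like 5.67 can be captured as a single token and converted in the callback; a sketch (dsl::capture and dsl::token are both referenced elsewhere on this page):

struct real_number
{
    // Capture the raw characters "digits.digits" as one token...
    static constexpr auto rule
        = dsl::capture(dsl::token(dsl::digits<> + dsl::period + dsl::digits<>));

    // ...then let the standard library do the conversion.
    static constexpr auto value = lexy::callback<double>([](auto lexeme) {
        return std::stod(std::string(lexeme.begin(), lexeme.end()));
    });
};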

Compiler error on MSVC

The use of #warning in include/lexy/error_location.hpp at line 13 is not supported on MSVC and results in a fatal error:

...\lexy\include\lexy\error_location.hpp(13): fatal error C1021: invalid preprocessor command 'warning'

This issue can be easily replicated in Compiler Explorer:
https://godbolt.org/z/7nbe8TjKh

dsl::try_ still seems to fail on errors

I'm not sure if this is a bug or if I'm just misunderstanding how try_ is supposed to work, but using this sample program on current master:

#include <lexy/dsl.hpp>
#include <lexy/parse.hpp>
#include <lexy/input/string_input.hpp>
#include <lexy_ext/report_error.hpp>

#include <string>

namespace dsl = lexy::dsl;

struct foo {
	static constexpr auto rule = dsl::try_(dsl::lit_c<'t'>, dsl::return_);
};

int main() {
	std::string source_line = "x";
	auto input = lexy::string_input<lexy::utf8_encoding>(source_line);
	lexy::validate<foo>(input, lexy_ext::report_error);
}

I'd expect foo to accept any input. However, instead I get an error because the first rule in the try_ block fails:

$ ./test
error: while parsing foo
     | 
 1: 1| x
     | ^ expected 't'
------

Add cmake install target

I wanted to add lexy to vcpkg, but this requires an install target. Alternatively, I could copy all files manually, but I think an install target is a cleaner solution.
Nice project! :)

Infinite loop in error message

When parsing this file and displaying an error message related to the string err-😂 via lexy_ext::report_error, the output is 12 │ err-⟨U+????⟩⟨U+????⟩, but with the ⟨U+????⟩ repeated for, at the very least, a long time.

I've also tried it with a few other unicode characters and gotten the same result.

ice on gcc10

I'm trying to compile the commit 174bc6b on ubuntu 20.10 with gcc 10.2.0 and I'm encountering an internal compiler error

lexy/engine/trie.hpp:172:74: internal compiler error: Errore di segmentazione
  172 |                         ? (reader.bump(), result = transition<Transitions>::match(reader), true)

("Errore di segmentazione" is Italian for "segmentation fault".) The same doesn't happen with gcc 9 or clang 11 (quick test here).
The funny thing is that I compiled the same repo this morning with an older Ubuntu (probably 20.04) and gcc 10 and was successful, so I think it must be some weird latent bug in gcc.

Have you already encountered something like this?

Planned restructuring of rules

Hey all, this is an FYI that I'm planning to restructure how rules, patterns, and branches work.
The new model will be easier to understand, more consistent and more powerful.

Phase 1: Tokens and Branches (done)

  • Replace the Atom concept with a new Token concept. Conceptually, it represents an atomic parse unit. This is then used to implement all atomic rules, like dsl::lit.
  • Introduce a dsl::token() rule that takes an arbitrary rule and turns it into a token, replacing dsl::match().
  • Implement rules such as dsl::try_, dsl::capture() and operator! on top of tokens only, instead of arbitrary rules/patterns.
  • Add a new Branch concept that generalizes the existing use of Pattern as branches. This also allows turning rules like dsl::capture() into branches.
  • Eliminate the concept of Pattern as it is now unnecessary.

The advantages:

  • The token interface is simpler and does not require the use of LEXY_FORCE_INLINE as the function names are shorter. This should speed up compilation.
  • There is no longer an (at times) inconsistent distinction between rules and patterns. By default, all branch conditions are essentially LL(k).
  • The way Branch separates matching and value production allows the branch condition to produce actual values. They are only produced after the branch has been taken.
  • Ability to generate parse trees once a .token() hook has been added to a parser.

The breaking changes:

  • Some rules are no longer usable as branch conditions, but they were before. This can be fixed using dsl::token().
  • Some rules previously worked with arbitrary patterns but now require tokens. This can also be fixed using dsl::token().

Phase 2: Whitespace (done)

The goal is to move away from explicitly specifying whitespace at every point to just tell the grammar what whitespace is and have it be skipped automatically. Note that this is optional: if there are few and very explicit cases where whitespace should be skipped, it can still be done manually.

  • Improve lexy::parse_context to remember more metadata about the current production.
  • Add dsl::whitespace(rule) that explicitly skips rule as whitespace. It is not a branch.
  • Add ability to specify whitespace in the entry production and a way for certain productions to opt-out.
  • Add dsl::whitespace that automatically skips the computed whitespace.
  • Implicitly parse dsl::whitespace after every token and production that opted-out of whitespace parsing.
  • Remove operator[] overloads.

Windows compilation failure

I'm getting the following error in Windows builds without any change other than updating to the latest version of lexy (Linux builds work fine).

C:\GitLab-Runner\builds\bmwinger\fluent-cpp\build\lexy\include\lexy_ext/report_error.hpp(28,1): error C2535: 'lexy_ext::_detail::find_cp_boundary::<lambda_ab72e20f5f5a353016f1d321fd476f7c>::operator unknown-type(void) noexcept const': member function already defined or declared [C:\GitLab-Runner\builds\bmwinger\fluent-cpp\build\fluent.vcxproj]
C:\GitLab-Runner\builds\bmwinger\fluent-cpp\build\lexy\include\lexy_ext/report_error.hpp(28): message : see declaration of 'lexy_ext::_detail::find_cp_boundary::<lambda_ab72e20f5f5a353016f1d321fd476f7c>::operator unknown-type' [C:\GitLab-Runner\builds\bmwinger\fluent-cpp\build\fluent.vcxproj]

Full Log: https://gitlab.com/bmwinger/fluent-cpp/-/jobs/1663302422

It seems to be introduced somewhere between 914f9ea and f58d971

Implement Top level Language Parser

Right now I'm trying to implement a simple language parser for a toy language I made up.

Looks something like:

1 + 2; func test(a, b) { return a + b; }
I want to generate a list (i.e. a vector) of Exprs, which is defined very similarly to how json_value is defined in the JSON example.
So the first element in the parsed list would be the 1 + 2 and the second would be the function definition.

Right now I have dsl::list(dsl::p<top_level_production>, dsl::sep(dsl::newline / dsl::semicolon)) as the rule for the final production in the grammar.
And the value field is lexy::as_list<ast::ExprList>

The result I am currently getting contains only the first element parsed: just 1 + 2, or func test(...) if that's what comes first.
How can I modify it so that it collects every top-level production/AST node into a list?

Production is not a branch rule

Hi,

Somehow I got my grammar into a state I can't get around. I know what causes it, but I can't figure out a fix.

godbolt link https://godbolt.org/z/b78d3sznG

That doesn't compile; the reason is lines 10 and 17:
static constexpr auto whitespace = dsl::ascii::space;

Commenting out those two lines makes it compile, only now I don't have whitespace skipping.

I got into this because I was writing tests for each production and when I added the root one it stopped compiling.

Thank you very much,
Nelson

Recurse tree

Hello!

A simple list:

(1a)(22b)(333c)(4444d)

And a simple parser:

namespace test {
  struct g_tree {
    std::string content;
  };
}

namespace grammar {
  namespace dsl = lexy::dsl;

  struct g_tree {
    static constexpr auto rule = [] {
      auto quoted = dsl::delimited(dsl::lit_c<'('>, dsl::lit_c<')'>);
      return quoted(dsl::code_point);
    }();

    static constexpr auto value = lexy::as_string<std::string>;
  };

  struct collection {
    static constexpr auto whitespace = dsl::ascii::space / dsl::ascii::newline;
    static constexpr auto rule = dsl::whitespace + dsl::list(dsl::p<g_tree>) + dsl::eof;
    static constexpr auto value = lexy::as_list<std::vector<test::g_tree>>;
  };
}

But I need a recursive tree:

(1a(1-1aa)(1-2aa))(22b)(333c(333-1c(333-2c)))(4444d)

So I am trying:

namespace test {
  struct g_tree {
    std::string content;
    std::vector<g_tree> tree;
  };
}

namespace grammar {
  namespace dsl = lexy::dsl;

  struct g_tree {
    static constexpr auto rule = [] {
      auto quoted = dsl::delimited(dsl::lit_c<'('>, dsl::lit_c<')'>);

      //error C2338: list() without a separator requires a branch condition
      //error C2338: opt() requires a branch condition
      //error C2607: static assertion failed
      return quoted(dsl::code_point + dsl::opt(dsl::list(dsl::recurse<g_tree>)));
    }();

    static constexpr auto value = lexy::as_string<std::string>;
  };

  struct collection {
    static constexpr auto whitespace = dsl::ascii::space / dsl::ascii::newline;
    static constexpr auto rule = dsl::whitespace + dsl::list(dsl::p<g_tree>) + dsl::eof;
    static constexpr auto value = lexy::as_list<std::vector<test::g_tree>>;
  };
}

But unsuccessfully... Or maybe I'm approaching this completely wrong?

Thanks...

LEXY_CHAR_CLASS does not match arbitrary code points

Hi there, it's me again :)
I need to be able to parse parts of the ISO 8859-1 charset, but since lexy does not currently support that encoding, I figured that I've got to define it myself. After looking around a bit, I found LEXY_CHAR_CLASS, which seems to be exactly what I need. So I defined my own character class, including just the code points I need:

struct quoted {
	static constexpr auto latin1 = LEXY_CHAR_CLASS("char.ISO-8859-1",
	                                               dsl::ascii::alpha / dsl::ascii::punct / dsl::ascii::digit /
	                                                   (dsl::ascii::space - dsl::ascii::newline) /
	                                                   LEXY_LIT("\t") /
	                                                   LEXY_LIT("\xC3") / LEXY_LIT("\xD5") / LEXY_LIT("\xDC") /
	                                                   LEXY_LIT("\xE4") / LEXY_LIT("\xF6") / LEXY_LIT("\xFC"));

	static constexpr auto name = "string";
	static constexpr auto rule = dsl::quoted(latin1);
	static constexpr auto value = lexy::as_string<std::string>;
};

Sadly, it doesn't work:

error: while parsing string
     │
  54 │ ⇨text[1]⇨⇨=⇨"Gothic muss f⟨0xFC⟩r diese Option neu gestartet werden!"; // Kommentar
     │                           ^^^^^^ expected char.ISO-8859-1
error: while parsing string
     │
  90 │ ⇨text[0]⇨⇨= "Aufl⟨0xF6⟩sung";
     │                  ^^^^^^ expected char.ISO-8859-1

From my understanding of this paragraph from the documentation, however, it should work:

Otherwise, if the char class contains non-ASCII characters, matches and consumes all code points that form a code point in this encoding. For ASCII and UTF-32, this is always a single code unit, for UTF-8, this is up to 4 code units, and for UTF-16, this is up to 2 code units. Checks if that code point is part of the char class.

Am I misunderstanding the documentation on this? My input encoding is lexy::ascii_encoding, btw.

lexy doesn't compile in Visual Studio 2022

Build started...
1>------ Build started: Project: LangParserTest, Configuration: Debug x64 ------
1>LangParser.cpp
1>The contents of <bit> are available only with C++20 or later.
1>C:\Users\admin\source\repos\LangParserTest\deps\include\lexy\dsl\sequence.hpp(24,69): error C2938: 'lexy::parser_for' : Failed to specialize alias template
1>C:\Users\admin\source\repos\LangParserTest\deps\include\lexy\dsl\sequence.hpp(24): message : see reference to alias template instantiation 'lexy::parser_for<lexyd::_must_dsl<Branch>,lexyd::token_base<lexyd::_lit<CharT,102,117,110,99>,lexyd::branch_base>::p<NextParser>>' being compiled
1>        with
1>        [
1>            Branch=lexyd::ascii::_alpha,
1>            CharT=char
1>        ]
1>C:\Users\admin\source\repos\LangParserTest\deps\include\lexy\dsl\sequence.hpp(33): message : see reference to class template instantiation 'lexyd::_seq_impl<R,S>' being compiled
1>        with
1>        [
1>            R=lexyd::_must_dsl<lexyd::ascii::_alpha>,
1>            S=lexyd::_lit<char,102,117,110,99>
1>        ]
1>C:\Users\admin\source\repos\LangParserTest\deps\include\lexy\dsl\sequence.hpp(33): message : see reference to alias template instantiation 'lexy::parser_for<lexyd::_seq_impl<R,S>,NextParser>' being compiled
1>        with
1>        [
1>            R=lexyd::_must_dsl<lexyd::ascii::_alpha>,
1>            S=lexyd::_lit<char,102,117,110,99>
1>        ]
1>C:\Users\admin\source\repos\LangParserTest\deps\include\lexy\dsl\sequence.hpp(39): message : see reference to class template instantiation 'lexyd::_seq<R,S>' being compiled
1>        with
1>        [
1>            R=lexyd::_must_dsl<lexyd::ascii::_alpha>,
1>            S=lexyd::_lit<char,102,117,110,99>
1>        ]
1>C:\Users\admin\source\repos\LangParserTest\LangParserTest\LangParser.h(12): message : see reference to function template instantiation 'auto lexyd::operator +<lexyd::_must_dsl<Branch>,lexyd::_lit<CharT,102,117,110,99>>(R,S)' being compiled
1>        with
1>        [
1>            Branch=lexyd::ascii::_alpha,
1>            CharT=char,
1>            R=lexyd::_must_dsl<lexyd::ascii::_alpha>,
1>            S=lexyd::_lit<char,102,117,110,99>
1>        ]
1>C:\Users\admin\source\repos\LangParserTest\deps\include\lexy\dsl\sequence.hpp(24,49): error C2938: 'lexy::parser_for' : Failed to specialize alias template
1>C:\Users\admin\source\repos\LangParserTest\deps\include\lexy\dsl\sequence.hpp(24): message : see reference to alias template instantiation 'lexy::parser_for<lexyd::_must_dsl<Branch>,lexyd::token_base<lexyd::_lit<CharT,102,117,110,99>,lexyd::branch_base>::p<NextParser>>' being compiled
1>        with
1>        [
1>            Branch=lexyd::ascii::_alpha,
1>            CharT=char
1>        ]
1>C:\Users\admin\source\repos\LangParserTest\deps\include\lexy\dsl\base.hpp(88): message : see reference to alias template instantiation 'lexyd::_seq_impl<R,S>::p<NextParser>' being compiled
1>        with
1>        [
1>            R=lexyd::_must_dsl<lexyd::ascii::_alpha>,
1>            S=lexyd::_lit<char,102,117,110,99>
1>        ]
1>C:\Users\admin\source\repos\LangParserTest\deps\include\lexy\dsl\sequence.hpp(33,49): error C2938: 'lexy::parser_for' : Failed to specialize alias template
1>C:\Users\admin\source\repos\LangParserTest\deps\include\lexy\dsl\identifier.hpp(239,25): error C2607: static assertion failed
1>C:\Users\admin\source\repos\LangParserTest\LangParserTest\LangParser.h(14): message : see reference to function template instantiation 'auto lexyd::identifier<lexyd::_seq<R,S>>(CharClass)' being compiled
1>        with
1>        [
1>            R=lexyd::_must_dsl<lexyd::ascii::_alpha>,
1>            S=lexyd::_lit<char,102,117,110,99>,
1>            CharClass=lexyd::_seq<lexyd::_must_dsl<lexyd::ascii::_alpha>,lexyd::_lit<char,102,117,110,99>>
1>        ]
1>Main.cpp
1>The contents of <bit> are available only with C++20 or later.
1>(the same C2938 and C2607 errors repeat for Main.cpp as for LangParser.cpp)
1>Generating Code...
1>Done building project "LangParserTest.vcxproj" -- FAILED.
========== Build: 0 succeeded, 1 failed, 0 up-to-date, 0 skipped ==========
#include"LangParser.h"
int main() {
	LangParser* langparser = new LangParser();
	langparser->ParseStr("#func", "!#");
}
// LangParser.h
#pragma once
//TODO:Convert to lexy
#include<string>
#include<lexy/dsl.hpp>
#include<lexy/callback.hpp>

//#include<pegtl.hpp>
class LangParser
{
private:
	static constexpr auto grammmer = [] {
		auto funcdecl = lexy::dsl::must(lexy::dsl::ascii::alpha) + LEXY_LIT("func");

		return lexy::dsl::identifier(funcdecl);
	}();
	static constexpr auto val = lexy::as_string<std::string>;
	/*struct FunDecl : tao::pegtl::one<'func', 'fun'> {};
	struct FunReturnType : tao::pegtl::alpha {};
	struct FuncImpl : tao::pegtl::must<FunReturnType>, tao::pegtl::opt<FunDecl> {};
	struct Grammer: tao::pegtl::must<tao::pegtl::must<tao::pegtl::shebang>,FunDecl>{};*/
public:
	
	LangParser();
	void ParseStr(std::string str_dat, std::string str_in);
	void ParseFile(std::string path);
};
#include "LangParser.h"
#include <iostream>

LangParser::LangParser()
{
	
}

void LangParser::ParseStr(std::string str_dat, std::string str_in)
{
	/*
	try {
		tao::pegtl::string_input str_input(str_dat, str_in);
		tao::pegtl::parse<Grammer>(str_input);
			std::cout << str_input.begin();
		
	
	}
	catch (tao::pegtl::parse_error e) {
		std::cout << e.what() << std::endl;
	}
	*/
	
}

void LangParser::ParseFile(std::string path)
{
	/*
	tao::pegtl::file_input file_input(path);
	*/
}

error downloading doctest: "Couldn't resolve host name"

Hi there, I found lexy and wanted to try it. To do so, I wrote an ebuild for my Gentoo system, but configuration fails with this error:

 Emerging (1 of 1) dev-util/lexy-9999::testing
 * dev-util/lexy will not be compiled with PGO.
>>> Unpacking source...
Initialized empty Git repository in /mnt/distfiles/git3-src/foonathan_lexy.git/
 * Repository id: foonathan_lexy.git
 * To override fetched repository properties, use:
 *   EGIT_OVERRIDE_REPO_FOONATHAN_LEXY
 *   EGIT_OVERRIDE_BRANCH_FOONATHAN_LEXY
 *   EGIT_OVERRIDE_COMMIT_FOONATHAN_LEXY
 *   EGIT_OVERRIDE_COMMIT_DATE_FOONATHAN_LEXY
 * 
 * Fetching https://github.com/foonathan/lexy.git ...
git fetch https://github.com/foonathan/lexy.git +HEAD:refs/git-r3/HEAD
remote: Enumerating objects: 374, done.
remote: Counting objects: 100% (374/374), done.
remote: Compressing objects: 100% (200/200), done.
remote: Total 2953 (delta 187), reused 292 (delta 143), pack-reused 2579
Receiving objects: 100% (2953/2953), 574.83 KiB | 900.00 KiB/s, done.
Resolving deltas: 100% (2004/2004), done.
From https://github.com/foonathan/lexy
 * [new ref]                    -> refs/git-r3/HEAD
git symbolic-ref refs/git-r3/dev-util/lexy/0/__main__ refs/git-r3/HEAD
 * Checking out https://github.com/foonathan/lexy.git to /mnt/Volume_3/Gentoo/temp/portage/dev-util/lexy-9999/work/lexy-9999 ...
git checkout --quiet refs/git-r3/HEAD
GIT NEW branch -->
   repository:               https://github.com/foonathan/lexy.git
   at the commit:            3003acc03fd3ae5c2830de4de20c6da09ab32054
>>> Source unpacked in /mnt/Volume_3/Gentoo/temp/portage/dev-util/lexy-9999/work
>>> Preparing source in /mnt/Volume_3/Gentoo/temp/portage/dev-util/lexy-9999/work/lexy-9999 ...
>>> Source prepared.
>>> Configuring source in /mnt/Volume_3/Gentoo/temp/portage/dev-util/lexy-9999/work/lexy-9999 ...
>>> Working in BUILD_DIR: "/mnt/Volume_3/Gentoo/temp/portage/dev-util/lexy-9999/work/lexy-9999_build"
cmake -C /mnt/Volume_3/Gentoo/temp/portage/dev-util/lexy-9999/work/lexy-9999_build/gentoo_common_config.cmake -G Ninja -DCMAKE_INSTALL_PREFIX=/usr -DCMAKE_BUILD_TYPE=Gentoo -DCMAKE_TOOLCHAIN_FILE=/mnt/Volume_3/Gentoo/temp/portage/dev-util/lexy-9999/work/lexy-9999_build/gentoo_toolchain.cmake  /mnt/Volume_3/Gentoo/temp/portage/dev-util/lexy-9999/work/lexy-9999
loading initial cache file /mnt/Volume_3/Gentoo/temp/portage/dev-util/lexy-9999/work/lexy-9999_build/gentoo_common_config.cmake
-- The CXX compiler identification is GNU 10.2.0
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/x86_64-pc-linux-gnu-g++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
CMake Error at tests/CMakeLists.txt:19 (message):
  error downloading doctest: "Couldn't resolve host name"Could not resolve
  host: raw.githubusercontent.com

  Closing connection 0



-- Configuring incomplete, errors occurred!
See also "/mnt/Volume_3/Gentoo/temp/portage/dev-util/lexy-9999/work/lexy-9999_build/CMakeFiles/CMakeOutput.log".
 * ERROR: dev-util/lexy-9999::testing failed (configure phase):
 *   cmake failed
 * 
 * Call stack:
 *     ebuild.sh, line  125:  Called src_configure
 *   environment, line 2770:  Called cmake-utils_src_configure
 *   environment, line 1188:  Called die
 * The specific snippet of code:
 *       "${CMAKE_BINARY}" "${cmakeargs[@]}" "${CMAKE_USE_DIR}" || die "cmake failed";

When I clone and build lexy manually, there is no problem. Any pointers on what I should fix?
Thank you!

Add bounded integer parsing

I'm trying out lexy with a re-implementation of a parser I've already got which, in part, parses register names of the form r<0..31> and f<0..15> - i.e. there are 32 general purpose registers (named r0, r1, ... r31) and 16 floating point registers.

I'd like to verify that the register number is in range when parsed, so I'd like to have something like lexy::code_point, which has traits that check the parsed integer is in range. I've come up with the following generic bounded integer struct (kind of modelled on unbounded).

template <typename T, T Min, T Max>
struct bounded
{
};
template <typename T, T Min, T Max>
struct lexy::integer_traits<bounded<T, Min, Max>>
{
   using type                       = typename integer_traits<T>::type;
   static constexpr auto is_bounded = true;

   template <int Radix>
   static constexpr void add_digit_unchecked(type& result, unsigned digit)
   {
      integer_traits<T>::template add_digit_unchecked<Radix>(result, digit);
   }
   template <int Radix>
   static constexpr std::size_t max_digit_count = lexy::_digit_count(Radix, Max);

   template <int Radix>
   static constexpr bool add_digit_checked(type& result, unsigned digit)
   {
      return integer_traits<T>::template add_digit_checked<Radix>(result, digit) && result >= Min && result <= Max;
   }
};

This can be used like this:

namespace grammar
{
   namespace dsl = lexy::dsl;

   template <char Type, uint8_t RegCount>
   struct reg_t
   {
      using reg_num_t = bounded<uint8_t, 0, RegCount-1>;
      static constexpr auto rule = dsl::ascii::case_folding(dsl::lit_c<Type>)
                                 + dsl::integer<reg_num_t>(dsl::digits<dsl::decimal>);
      static constexpr auto value = lexy::forward<std::uint8_t>;
   };
   using r_reg = reg_t<'r', 32>;
   using f_reg = reg_t<'f', 16>;
} // namespace grammar

Is this something that would make a useful addition to lexy? If so, I'm happy enough to create a PR based on this...

as_collection has a hidden dependency on value_type

From here:

void operator()(const typename T::value_type& obj)
{
    _result.insert(obj);
}
void operator()(typename T::value_type&& obj)
{
    _result.insert(LEXY_MOV(obj));
}

This can cause issues with collections that do not define a value_type when it does not make sense, such as a trie. An alternative would be to optionally detect value_type, or just defer to emplace, since it can accept value_type itself or construct it in place.

Also, this function can end up invoking initializer_list overloads for collections that support them, which can result in a compiler error:

constexpr auto sink() const
{
    return _sink{};
}

An alternative would be to simply construct it as _sink().

Assertion failure during trace

test_main: /home/benjamin/workspace/fluent-cpp/build/lexy/include/lexy_ext/input_location.hpp:155:
constexpr lexy_ext::input_location_finder<Input, TokenColumn, TokenLine>::location
lexy_ext::input_location_finder<Input, TokenColumn, TokenLine>::find(iterator, const location&) const
[with Input = lexy::string_input<lexy::utf8_encoding>; TokenColumn = lexy_ext::_unchecked_code_unit;
TokenLine = lexyd::_nl; iterator = const char8_t*]: Assertion `false' failed.

Can be reproduced using the input file crlf.ftl and fluent-cpp at the same commit.

Note that to get the trace you can use the test_main executable, with #define DEBUG_PARSER added to src/parser.cpp.

cpp_comment example in playground is incomplete

Current cpp_comment example implementation:

struct production
{
    // Note that `dsl::eol` matches EOF as well, so `.or_eof()` is implied.
    static constexpr auto rule = LEXY_LIT("//") + dsl::until(dsl::eol);
};

The current implementation doesn't take into account the backslash (\):

// in C++ \
the second line will be also recognized as comment, because of backslash, \
but cpp_comment example doesn't handle that. (this line is also a comment)

Add support for binary parsers

This is a feature request to add support for binary parsers similar to Spirit X3's binary parsers. I have been keeping an eye on this library for quite a while now and really enjoy working with it, especially because of the parser error handling, which is just nonexistent in most other parser libraries.

A major roadblock to adopting lexy for me is the lack of utilities for working with binary input. How would I go about adding building blocks to the DSL for that, e.g., a literal in the DSL representing a known integer encoded as a 16-bit big-endian value? E.g., I have a binary format that starts with a version prefix that is one of a set of supported values. Some form of dsl::big_word_lit<1> | dsl::big_word_lit<2> | dsl::error<unknown_version>.
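
With the "Designed for binary input" rules from the feature list above, such a version prefix might now be expressible along these lines; the rule names here (dsl::lit_b, dsl::error) follow lexy's documentation but should be double-checked before use:

struct unknown_version
{
    static constexpr auto name = "unknown version";
};

struct version_prefix
{
    // Match the raw big-endian byte sequences 00 01 or 00 02,
    // or fail outright with a custom error.
    static constexpr auto rule = [] {
        auto v1 = dsl::lit_b<0x00, 0x01>;
        auto v2 = dsl::lit_b<0x00, 0x02>;
        return v1 | v2 | dsl::error<unknown_version>;
    }();
};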
