
Comments (10)

lambdafu avatar lambdafu commented on May 16, 2024

Or maybe range<> can just be fixed to work in this case, as arguably may be what one expects.

from pegtl.

d-frey avatar d-frey commented on May 16, 2024

I see the problem, but I don't think fixing the current range would be a good idea. It is in the (inline) namespace for ASCII, which is only guaranteed to work for 7 bit values. But one could argue that range< A, B > should be equivalent to !range< B, A > when A > B. Although I'm not sure if casting 0xBF to char is even allowed (and clearly defined) by the standard.

We should support binary data properly, but how? C++17 gave us std::byte, we have a C++11 implementation over at taocpp/json which could be moved to the PEGTL. We could then use tao::byte (which is just an alias for std::byte if the latter is available). But defining the ranges, etc. would probably require more boilerplate code (typing std::byte(0xBF) instead of just 0xBF, ...). OTOH, using unsigned char might also be an option. What do you think? Would std::byte be the right choice?
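To make the signedness concern concrete, here is a minimal sketch of a byte range check that does not depend on whether plain char is signed. The helper name in_byte_range is hypothetical and is not part of the PEGTL; it only illustrates the unsigned char option mentioned above.

```cpp
// Hypothetical helper, not a PEGTL rule: converts the input to unsigned char
// before comparing, so the result is the same whether plain char is signed
// or unsigned on the target platform.
constexpr bool in_byte_range( const char c, const unsigned char lo, const unsigned char hi )
{
    return static_cast< unsigned char >( c ) >= lo && static_cast< unsigned char >( c ) <= hi;
}

// For comparison, the naive check on plain char. Where char is signed,
// a bound such as 0xBF typically becomes negative, so a range like
// 0x20..0xBF ends up with lo > hi and can never match.
constexpr bool in_char_range( const char c, const char lo, const char hi )
{
    return c >= lo && c <= hi;
}
```

With the unsigned variant, in_byte_range( c, 0x20, 0xBF ) accepts a byte with value 0x90 regardless of how the platform defines char, while the result of the naive variant is platform-dependent.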


d-frey avatar d-frey commented on May 16, 2024

Also, if we don't support range< A, B > when A > B I should add a static_assert(). This would at least catch those issues at compile time rather than leading to subtle bugs. I'll commit this for now, we can still assign a meaning to these cases later if we decide to support it.
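A compile-time check of this kind could look roughly like the following. This is only a sketch with a hypothetical checked_range template, not the actual PEGTL implementation or the committed change.

```cpp
// Hypothetical stand-in for a range rule. The static_assert rejects
// reversed bounds at compile time instead of silently never matching.
template< char Lo, char Hi >
struct checked_range
{
    static_assert( Lo <= Hi, "invalid range: Lo must not be greater than Hi" );

    // Returns true when c lies within the inclusive interval [ Lo, Hi ].
    static constexpr bool test( const char c )
    {
        return ( Lo <= c ) && ( c <= Hi );
    }
};
```

Instantiating checked_range< 'z', 'a' > then fails to compile with the message above, whereas checked_range< 'a', 'z' >::test( 'm' ) evaluates to true.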


lambdafu avatar lambdafu commented on May 16, 2024

The static assert will definitely be helpful. The rules reference doc says "The ASCII rules operate on single bytes, without restricting the range of values to 7 bits. They are compatible with input with the 8th bit set in the sense that nothing breaks in their presence." That's maybe why I got the impression I could just use it.

Personally, I use uint8_t. Tradition would probably suggest unsigned char, but technically that's different (it could be 9 bits wide, and has been in the past). I didn't know about std::byte until now, but I just read up on it and it is not an arithmetic type (it's defined as an enum class with unsigned char as its underlying type). Sounds painful to use in practice :) but at least for std::byte(0xBF) one could use a user-defined literal suffix, so it's not all bad.

It is pretty simple to roll your own rules, and when working with binary bit streams, there are some other things that are useful, for example tests of individual bits or groups of bits. Currently I am using my own bin_range<C,D> and mask_cmp<Mask, From, To=0> rules. In the latter, To defaults to From and the check is (byte & Mask) >= From && (byte & Mask) <= To.
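The masked comparison described above can be sketched as a standalone template. This is not the code from my parser and not a PEGTL rule: the name mask_cmp_sketch is made up, it works on a plain std::uint8_t instead of a PEGTL input, and it expresses "To defaults to From" directly as a default template argument.

```cpp
#include <cstdint>

// Hypothetical, simplified version of the masked comparison:
// matches when ( byte & Mask ) lies within the inclusive interval [ From, To ].
template< std::uint8_t Mask, std::uint8_t From, std::uint8_t To = From >
struct mask_cmp_sketch
{
    static_assert( From <= To, "invalid bounds: From must not be greater than To" );

    static constexpr bool test( const std::uint8_t byte )
    {
        return ( ( byte & Mask ) >= From ) && ( ( byte & Mask ) <= To );
    }
};
```

For example, mask_cmp_sketch< 0xC0, 0x80 > looks only at the top two bits and matches any byte whose top bits are 10, while mask_cmp_sketch< 0x3F, 0x00, 0x0F > matches when the low six bits hold a value between 0 and 15.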


ColinH avatar ColinH commented on May 16, 2024

One issue is that the C++ standard does not specify whether char is signed or unsigned; of course when dealing with binary data you usually want your bytes to behave like a uint8_t and not like an int8_t.

Which is why class tao::pegtl::memory_input also has a peek_byte() method; I have used the PEGTL for binary data on a couple of occasions, but not yet needed binary parsing rules like range<>.

I'll look into whether we can simply add tao::pegtl::internal::peek_byte as basis for binary range<> etc. rules as soon as I have time for it... What other rules are you using for binary? Is it possible to see your source?


lambdafu avatar lambdafu commented on May 16, 2024

@ColinH You can find the current code here: https://github.com/das-labor/neopg/blob/master/lib/parser/openpgp.cpp
It implements a simple TLV parser for the OpenPGP packet format (without interpreting the content of each packet), which is complicated by the fact that it is using bitfields and two versions of tags (one version includes a length type specifier, too).
I will write parsers for each individual packet type, too, on top of that. Because some packets can be very long (even indefinitely long), and OpenPGP is traditionally a streaming format, I decided to use buffered input. Luckily, OpenPGP does not require any real backtracking (except for peeking ahead into the bitfields).


ColinH avatar ColinH commented on May 16, 2024

@lambdafu Thanks, that will help to see what's going on. We also just added a few rules for binary data, including for ranges but without masking (yet), that work with 8/16/32/64 bit values, see commit e04dcbe for the details, it might not be finished yet which is also why it's not documented yet.


ColinH avatar ColinH commented on May 16, 2024

We now also added masked versions of the binary rules, see documentation, and please let us know if there is anything fundamental missing.


d-frey avatar d-frey commented on May 16, 2024

Wrong issue-number in commit message above, please ignore :)


ColinH avatar ColinH commented on May 16, 2024

I just fixed a bug in tao::pegtl::uint8::mask_ranges<>: it didn't actually apply the mask. I hope it didn't create any issues. Note that tao::pegtl::uint8::mask_range<> did not have this bug.
