Comments (10)
Or maybe range<> can just be fixed to work in this case, as that is arguably what one expects.
from pegtl.
I see the problem, but I don't think fixing the current range would be a good idea. It is in the (inline) namespace for ASCII, which is only guaranteed to work for 7 bit values. But one could argue that range< A, B > should be equivalent to !range< B, A > when A > B. Although I'm not sure if casting 0xBF to char is even allowed (and clearly defined) by the standard.
We should support binary data properly, but how? C++17 gave us std::byte, and we have a C++11 implementation over at taocpp/json which could be moved to the PEGTL. We could then use tao::byte (which is just an alias for std::byte if the latter is available). But defining the ranges, etc. would probably require more boilerplate code (typing std::byte(0xBF) instead of just 0xBF, ...). OTOH, using unsigned char might also be an option. What do you think? Would std::byte be the right choice?
from pegtl.
Also, if we don't support range< A, B > when A > B, I should add a static_assert(). This would at least catch those issues at compile time rather than leading to subtle bugs. I'll commit this for now; we can still assign a meaning to these cases later if we decide to support it.
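As a rough illustration of the compile-time check described above, here is a minimal sketch. The name byte_range and its interface are hypothetical and are not the PEGTL's actual range<> implementation; the only point is that reversed bounds fail to compile instead of silently never matching.

```cpp
// Hypothetical rule-like type for illustration only; not a PEGTL rule.
template< unsigned char Lo, unsigned char Hi >
struct byte_range
{
   // Reversed bounds are rejected at compile time.
   static_assert( Lo <= Hi, "byte_range< Lo, Hi > requires Lo <= Hi" );

   // Returns true when the given byte lies within [ Lo, Hi ].
   static constexpr bool match( const unsigned char b ) noexcept
   {
      return ( Lo <= b ) && ( b <= Hi );
   }
};
```

With this sketch, instantiating byte_range< 0xBF, 0x80 > produces a compiler error carrying the message above, while byte_range< 0x80, 0xBF > compiles and matches as expected.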
from pegtl.
The static assert will definitely be helpful. The rules reference doc says "The ASCII rules operate on single bytes, without restricting the range of values to 7 bits. They are compatible with input with the 8th bit set in the sense that nothing breaks in their presence." That's maybe why I got the impression I could just use it.
Personally, I use uint8_t. Tradition would probably suggest unsigned char, but technically that's different (it could be 9 bit, and has been in the past). I didn't know about std::byte until now, but I just read up on it and it is not an arithmetic type (it's defined as an enum class with unsigned char as its underlying type). Sounds painful to use in practice :) but at least for std::byte(0xBF) one could use a user-defined literal suffix, so it's not all bad.
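The literal-suffix idea could look roughly like the sketch below. It assumes a C++17 compiler, and the suffix name _b is made up for this example; it is not part of the PEGTL or the standard library.

```cpp
#include <cstddef>

// Hypothetical user-defined literal suffix that produces a std::byte.
// No range check is performed; the value is converted to unsigned char first.
constexpr std::byte operator""_b( const unsigned long long value ) noexcept
{
   return static_cast< std::byte >( value );
}
```

With that in scope, 0xBF_b can be written wherever std::byte(0xBF) would otherwise be needed.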
It is pretty simple to roll your own rules, and when working with binary bit streams, there are some other things that are useful, for example tests of individual bits or groups of bits. Currently I am using my own bin_range<C,D> and mask_cmp<Mask, From, To=0> rules. In the latter, To defaults to From and the check is (byte & Mask) >= From && (byte & Mask) <= To.
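For readers who want to see the masked comparison spelled out, here is a minimal sketch of the check as described. It is a standalone function and not the actual rule from the code mentioned above, and it writes the default for To directly as From, which is an assumption about how the described default is meant to behave.

```cpp
#include <cstdint>

// Hypothetical helper mirroring the described check:
// ( byte & Mask ) >= From && ( byte & Mask ) <= To, with To defaulting to From.
template< std::uint8_t Mask, std::uint8_t From, std::uint8_t To = From >
constexpr bool mask_cmp( const std::uint8_t byte ) noexcept
{
   return ( ( byte & Mask ) >= From ) && ( ( byte & Mask ) <= To );
}
```

For example, mask_cmp< 0xC0, 0xC0 >( b ) tests whether the top two bits of b are both set, and mask_cmp< 0x0F, 0x02, 0x04 >( b ) tests whether the low nibble of b lies between 2 and 4.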
from pegtl.
One issue is that the C++ standard does not specify whether char is signed or unsigned; of course when dealing with binary data you usually want your bytes to behave like a uint8_t and not like an int8_t.
Which is why class tao::pegtl::memory_input also has a peek_byte() method; I have used the PEGTL for binary data on a couple of occasions, but not yet needed binary parsing rules like range<>.
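To make the signedness point concrete, here is a small sketch of comparing input bytes as unsigned values. The helper names as_byte and in_byte_range are hypothetical and only illustrate the idea; they are not the PEGTL's peek_byte() or any of its rules.

```cpp
#include <cstdint>

// Hypothetical helper: view a char from the input as an unsigned byte.
constexpr std::uint8_t as_byte( const char c ) noexcept
{
   return static_cast< std::uint8_t >( c );
}

// The range check is done on the unsigned value, so the result is the same
// whether plain char is signed or unsigned on the platform.
constexpr bool in_byte_range( const char c, const std::uint8_t lo, const std::uint8_t hi ) noexcept
{
   return ( lo <= as_byte( c ) ) && ( as_byte( c ) <= hi );
}
```

Comparing the plain char directly against 0x80 and 0xBF would fail on platforms where char is signed, because a byte with the 8th bit set is then a negative value.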
I'll look into whether we can simply add tao::pegtl::internal::peek_byte as basis for binary range<> etc. rules as soon as I have time for it... What other rules are you using for binary? Is it possible to see your source?
from pegtl.
@ColinH You can find the current code here: https://github.com/das-labor/neopg/blob/master/lib/parser/openpgp.cpp
It implements a simple TLV parser for the OpenPGP packet format (without interpreting the content of each packet), which is complicated by the fact that it is using bitfields and two versions of tags (one version includes a length type specifier, too).
I will write parsers for each individual packet type, too, on top of that. Because some packets can be very long (even indefinitely long), and OpenPGP is traditionally a streaming format, I decided to use buffered input. Luckily, OpenPGP does not require any real backtracking (except for peeking ahead into the bitfields).
from pegtl.
@lambdafu Thanks, that will help to see what's going on. We also just added a few rules for binary data, including for ranges but without masking (yet), that work with 8/16/32/64 bit values. See commit e04dcbe for the details; it might not be finished yet, which is also why it's not documented yet.
from pegtl.
We now also added masked versions of the binary rules, see documentation, and please let us know if there is anything fundamental missing.
from pegtl.
Wrong issue-number in commit message above, please ignore :)
from pegtl.
I just fixed a bug in tao::pegtl::uint8::mask_ranges<>: it didn't actually apply the mask. I hope it didn't create any issues. Note that tao::pegtl::uint8::mask_range<> did not have the bug.
from pegtl.