MiniPB

This is minipb, a small header-only protobuf writer/reader for C++11 that relies only on the STL.

Why?

Because all protobuf libraries I found fall into two categories:

  • High performance but bloated and complicated
  • Embedded (no dynamic allocations, fixed string widths)

MiniPB tries to hit a middle ground. It is implemented in a single header with no dependencies except the STL. The generated code is relatively compact (~4 lines of C++ per protobuf line), and the compiled binary is small enough to be considered noise in any reasonably sized C++ app. It's not the fastest protobuf library out there, but I have written it with performance in mind, so I hope it's not the slowest either. It uses dynamic memory where needed (allocating strings and submessages) but avoids it otherwise.

It doesn't implement every feature protobuf supports (e.g. enums, oneof, ...), but everything important is supported: scalars, messages, string and bytes, as well as repeated and packed variants of them. As far as I know, it is fully standards compliant for the features it supports.

Compiling proto files

Minipb comes with a plugin for protoc, similar to the one gRPC provides, which can be used to generate implementation files for a proto source.

Example

Given the following proto file:

syntax = "proto3";

message my_message {
    string field1 = 1;
    my_message field2 = 2;
    repeated float field3 = 3;
}

Minipb will generate the following C++ type in the header file:

struct my_message {
  size_t estimate_size() const noexcept;
  ::minipb::result encode(::minipb::msg_builder& b) const noexcept;
  ::minipb::result decode(::minipb::msg_parser& p) noexcept;

  std::string field1{};
  std::unique_ptr<my_message> field2{};
  std::vector<float> field3{};
};

Generated code

Minipb also generates an additional C++ source file with the implementations of those functions:

// Header contents are repeated here
size_t my_message::estimate_size() const noexcept {
  size_t size {0};
  size += this->field1.size();
  if(this->field2) size += this->field2->estimate_size() + 10 + 1;
  size += 5 * (this->field3.size());
  size += 11;
  return size;
}

::minipb::result my_message::encode(::minipb::msg_builder& b) const noexcept {
  b.string_field(1, this->field1);
  { if(this->field2) b.message_field(2, *this->field2); }
  b.packed_fixed32_field(3, this->field3);
  return b.last_error();
}

::minipb::result my_message::decode(::minipb::msg_parser& p) noexcept {
  minipb::result res = p.next_field();
  while (res == minipb::result::ok) {
    switch (p.field_id()) {
      case 1: res = p.string_field(this->field1); break;
      case 2: {
        if(!this->field2) this->field2 = std::make_unique<my_message>();
        p.message_field(*this->field2);
      } break;
      case 3: res = p.repeated_float_field(this->field3); break;
      default: res = p.skip_field(); break;
    }
    if (p.is_eof()) break;
    res = p.next_field();
  }
  return res;
}

Note that there are no virtual functions or inheritance. Since all needed information is available at compile time, there is no need for them.

The function estimate_size() returns a worst-case estimate of the serialized size of the message with its current contents and is used for all messages to support serialization. The function can (and usually will) return more than the message actually needs, but the message is guaranteed to fit in the returned space. You can therefore use it to (stack-)allocate a buffer to hold the serialized message. If you consider implementing it yourself, you can simply return 0 and it will still work (but might increase the serialized size).
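To see why a worst-case estimate overshoots, it helps to look at how a length-delimited field is framed on the wire: a tag, a varint length, then the payload. The sketch below is not minipb code. `varint_size`, `exact_field_size` and `worst_case_field_size` are hypothetical helpers, and reading the `+ 10 + 1` in the generated estimate_size() as "maximum varint length plus a one-byte tag" is an assumption, not something the library documents.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Bytes a protobuf varint needs for `value` (7 payload bits per byte).
inline std::size_t varint_size(std::uint64_t value) noexcept {
    std::size_t n = 1;
    while (value >= 0x80) {
        value >>= 7;
        ++n;
    }
    return n;
}

// Exact framing of a length-delimited field whose field number is below 16,
// so that the tag fits in a single byte.
inline std::size_t exact_field_size(std::size_t payload) noexcept {
    return 1 + varint_size(payload) + payload;
}

// Pessimistic framing: always budget the 10 bytes a 64-bit varint can take.
inline std::size_t worst_case_field_size(std::size_t payload) noexcept {
    const std::size_t bound = 1 + 10 + payload;
    assert(bound >= exact_field_size(payload)); // the bound never undershoots
    return bound;
}
```

For a 5-byte payload the exact framing is 7 bytes while the pessimistic bound is 16, which is the kind of slack an estimate of this shape trades for not having to compute varint lengths up front.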

The function encode() can be used to - surprise - encode the message into its serialized form. It can only fail if there's not enough space left in the provided buffer or if the backing container failed to reallocate (out of memory). Assuming you provided at least estimate_size() bytes, this function is not supposed to fail.

The last function, decode(), is used to decode a serialized message and fill the struct with its information. Assuming the provided buffer contains a complete and valid protobuf message, this should not fail; however, it might in case of a schema mismatch or otherwise bad input data. If an error is returned, the function might leave the message struct in a partially filled state.
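If a partially filled struct is a problem for the caller, one way to cope is to decode into a scratch object and commit only on success. This is a minimal sketch of that pattern, not part of minipb: `decode_or_keep` is a hypothetical helper, and the `result` enum is a stand-in for `minipb::result` (only `ok` is known from the generated code; `bad_input` is made up for illustration).

```cpp
#include <utility>

namespace sketch {

// Stand-in for minipb::result; `bad_input` is a hypothetical error value.
enum class result { ok, bad_input };

// Decodes into a default-constructed scratch message and moves it into
// `target` only if decoding succeeded, so `target` is never left half-filled.
// `Message` needs a decode(Parser&) member that returns `result`.
template <class Message, class Parser>
result decode_or_keep(Message& target, Parser& parser) {
    Message scratch{};
    const result r = scratch.decode(parser);
    if (r == result::ok) {
        target = std::move(scratch);
    }
    return r;
}

} // namespace sketch
```

The cost is one extra default construction and a move per decode, so this only makes sense where the caller needs the old contents to survive a failed decode.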

Extending

By default, the library can use both preallocated raw arrays and selected STL containers for input and output. However, you can add a custom implementation to support whatever datatype/device you need.

class input_stream {
protected:
	~input_stream() = default;
public:
	virtual result read(void* data, size_t data_size) noexcept = 0;
	virtual result skip(size_t data_size) noexcept = 0;
	virtual size_t peek(void*, size_t) noexcept { return 0; }
	virtual size_t bytes_available() const noexcept = 0;
};

read() is used to read the given amount of data into the buffer. If the requested amount exceeds the available data, an error should be returned. Minipb will then bubble up the error all the way to user code. skip() is similar to read() but discards the data.

peek() can be implemented if the data provider supports it and looking ahead without actually reading the data is cheap (e.g. in a contiguous array). Peeked data is not removed from the stream and must be returned by a subsequent read() or be skipped. Peeks do not accumulate, meaning that two calls to peek() without an intermediate read() or skip() should return the same data. If peek() is not supported by the backend, return 0; otherwise return the size of the data read. If peek() returns less than data_size but not 0, the library assumes EOF after the returned size. If 0 is returned, minipb will fall back to doing (potentially lots of) single-byte reads. A pattern that can be seen often is the library doing a peek for 10 bytes (the maximum size of a varint) followed by a skip of the actual varint size. This avoids having to do up to 10 single-byte reads.

bytes_available() should return the number of bytes left in the serialized message. Because protobuf has no indication of record end, minipb will try to parse data until bytes_available() is 0.
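As an illustration of these rules, here is a minimal sketch of an input stream over a caller-owned contiguous buffer, together with the peek-then-skip varint pattern. The interface is repeated so the example compiles on its own. `array_input_stream` and `read_varint` are hypothetical names, and the `result` enum is a stand-in for minipb's type (only `ok` appears in the generated code; `out_of_data` is an assumption).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

namespace sketch {

// Stand-in for minipb's result type; `out_of_data` is a hypothetical error.
enum class result { ok, out_of_data };

// Interface repeated from the README so the sketch is self-contained.
class input_stream {
protected:
    ~input_stream() = default;
public:
    virtual result read(void* data, std::size_t data_size) noexcept = 0;
    virtual result skip(std::size_t data_size) noexcept = 0;
    virtual std::size_t peek(void*, std::size_t) noexcept { return 0; }
    virtual std::size_t bytes_available() const noexcept = 0;
};

// Reads from a contiguous buffer that the caller keeps alive.
class array_input_stream final : public input_stream {
    const std::uint8_t* cur_;
    const std::uint8_t* end_;
public:
    array_input_stream(const void* data, std::size_t size) noexcept
        : cur_(static_cast<const std::uint8_t*>(data)), end_(cur_ + size) {
        assert(data != nullptr || size == 0);
    }
    result read(void* data, std::size_t n) noexcept override {
        if (n > bytes_available()) return result::out_of_data;
        if (n != 0) std::memcpy(data, cur_, n);
        cur_ += n;
        return result::ok;
    }
    result skip(std::size_t n) noexcept override {
        if (n > bytes_available()) return result::out_of_data;
        cur_ += n;
        return result::ok;
    }
    // Copies up to n bytes without consuming them; repeated peeks agree.
    std::size_t peek(void* data, std::size_t n) noexcept override {
        const std::size_t got = std::min(n, bytes_available());
        if (got != 0) std::memcpy(data, cur_, got);
        return got;
    }
    std::size_t bytes_available() const noexcept override {
        return static_cast<std::size_t>(end_ - cur_);
    }
};

// Peek up to 10 bytes, decode the varint, then skip exactly its length.
// The single-byte-read fallback for peek() == 0 is left out of this sketch.
inline result read_varint(input_stream& in, std::uint64_t& out) noexcept {
    std::uint8_t buf[10];
    const std::size_t got = in.peek(buf, sizeof buf);
    std::uint64_t value = 0;
    for (std::size_t i = 0; i < got; ++i) {
        value |= static_cast<std::uint64_t>(buf[i] & 0x7F) << (7 * i);
        if ((buf[i] & 0x80) == 0) {
            out = value;
            return in.skip(i + 1);
        }
    }
    return result::out_of_data; // truncated or over-long varint
}

} // namespace sketch
```

Feeding the bytes `AC 02 07` through `read_varint` yields 300 and leaves one byte in the stream, since only the two varint bytes are skipped after the peek.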

class output_stream {
protected:
	~output_stream() = default;
public:
	virtual size_t position() const noexcept = 0;
	virtual result write(const void* data, size_t data_size) noexcept = 0;
	virtual result write_at(size_t pos, const void* data, size_t data_size) noexcept = 0;
};

The output stream is used when serializing a message to a byte stream. It consists of three functions, all of which need to be implemented. The position() function should return the total number of bytes written using this stream. It is used to calculate sizes and as a marker for calling write_at(). The function write() is called whenever the library wants to emit some data. It should write all the provided data or return an error if it can't. write_at() is used to overwrite a previously written block of data. This is needed to patch the length field in the header, since the exact size of a submessage is not known until after it has been serialized.

Protobuf supports an alternate method of delimiting messages called "groups", which would allow us to skip the patching. However, groups had already been deprecated when protobuf was publicly released, and as a result many implementations don't support them. I might provide an option to use them instead of length-delimited messages in the future, allowing for true bufferless, forward-only serialization at the cost of compatibility.
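The following sketch shows one way a vector-backed output stream and the write-then-patch sequence could look. It is an illustration under assumptions, not minipb's implementation: `vector_output_stream` and `write_delimited` are hypothetical names, the `result` enum is a stand-in with made-up error values, and the length is restricted to a single byte (payloads below 128 bytes) to keep the placeholder size fixed. How minipb itself reserves space for the length is not described here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

namespace sketch {

// Stand-in for minipb's result type; the error values are hypothetical.
enum class result { ok, out_of_range, out_of_memory };

// Interface repeated from the README so the sketch is self-contained.
class output_stream {
protected:
    ~output_stream() = default;
public:
    virtual std::size_t position() const noexcept = 0;
    virtual result write(const void* data, std::size_t data_size) noexcept = 0;
    virtual result write_at(std::size_t pos, const void* data, std::size_t data_size) noexcept = 0;
};

// Appends to a caller-owned vector; positions are relative to where this
// stream started writing.
class vector_output_stream final : public output_stream {
    std::vector<std::uint8_t>& buf_;
    std::size_t start_;
public:
    explicit vector_output_stream(std::vector<std::uint8_t>& buf) noexcept
        : buf_(buf), start_(buf.size()) {}
    std::size_t position() const noexcept override { return buf_.size() - start_; }
    result write(const void* data, std::size_t n) noexcept override {
        assert(data != nullptr || n == 0);
        const std::uint8_t* p = static_cast<const std::uint8_t*>(data);
        try { buf_.insert(buf_.end(), p, p + n); }
        catch (...) { return result::out_of_memory; }
        return result::ok;
    }
    // Overwrites bytes that were already written; never grows the buffer.
    result write_at(std::size_t pos, const void* data, std::size_t n) noexcept override {
        if (pos > position() || n > position() - pos) return result::out_of_range;
        if (n != 0) std::memcpy(buf_.data() + start_ + pos, data, n);
        return result::ok;
    }
};

// Emits tag, a one-byte length placeholder, then whatever `body` writes,
// and finally patches the placeholder with the real length.
template <class Body>
result write_delimited(output_stream& out, std::uint8_t tag, Body&& body) {
    const std::uint8_t placeholder = 0;
    result r = out.write(&tag, 1);
    if (r != result::ok) return r;
    const std::size_t len_pos = out.position();
    r = out.write(&placeholder, 1);
    if (r != result::ok) return r;
    const std::size_t start = out.position();
    r = body(out);
    if (r != result::ok) return r;
    const std::size_t size = out.position() - start;
    if (size >= 128) return result::out_of_range; // needs a multi-byte varint
    const std::uint8_t len = static_cast<std::uint8_t>(size);
    return out.write_at(len_pos, &len, 1);
}

} // namespace sketch
```

Here position() serves both purposes named above: it marks where the placeholder sits for the later write_at() call, and the difference between two positions gives the payload size.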
