pb-jelly

pb-jelly is a protobuf code generation framework for the Rust language developed at Dropbox.

History

This implementation was initially written in 2016 to satisfy the need to shuffle large amounts of bytes in Dropbox's Storage System (Magic Pocket). Previously, we were using rust-protobuf (and therefore the generated APIs are exactly the same, to make migration easy), but serializing Rust structs to proto messages and then serializing them again in our RPC layer meant multiple copies (and the same thing in reverse on the parsing stack). Taking control of this implementation and integrating it in our RPC stack end-to-end helped avoid these extra copies.

Over the years, the implementation has grown and matured and is currently used in several parts of Dropbox, including our Sync Engine, and the aforementioned Magic Pocket.

Other implementations exist in the Rust ecosystem (e.g. prost and rust-protobuf); we wanted to share ours as well.



Features

  • Functional "Rust-minded" proto extensions, e.g. [(rust.box_it)=true]
  • Scalable - Generates separate crates per module, with option for crate-per-directory
    • Autogenerates Cargo.toml
  • Support for Serde (not compliant with the JSON protobuf specification)
  • Zero-copy deserialization with Bytes via a proto extension [(rust.zero_copy)=true]
  • Automatically boxes messages if it finds a recursive message definition
  • Retains comments on proto fields
  • Supports proto2 and proto3 syntaxes
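
To illustrate why recursive message definitions need boxing, here is a minimal sketch in plain Rust. The `TreeNode` struct and the `depth` helper are hypothetical and only approximate the shape of generated code; they are not taken from pb-jelly's actual output.

```rust
// Hypothetical shape of a recursive proto message such as:
//   message TreeNode { optional int32 value = 1; optional TreeNode child = 2; }
// Without `Box`, `TreeNode` would contain itself by value and have infinite
// size, which rustc rejects (error E0072). Boxing the recursive field gives
// the struct a fixed size.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct TreeNode {
    pub value: Option<i32>,
    pub child: Option<Box<TreeNode>>,
}

/// Counts the nodes in the chain by following the boxed `child` pointers.
pub fn depth(node: &TreeNode) -> usize {
    let mut count = 1;
    let mut current = node;
    while let Some(next) = current.child.as_deref() {
        count += 1;
        current = next;
    }
    count
}
```

Removing the `Box` from `child` makes the struct fail to compile, which is the situation the automatic boxing avoids.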

Extensions

| Extension | Description | Type | Example |
|---|---|---|---|
| (rust.zero_copy)=true | Generates field type of Lazy<bytes::Bytes> for proto bytes fields to support zero-copy deserialization | Field | zero_copy |
| (rust.box_it)=true | Generates a Box<Message> field type | Field | box_it |
| (rust.type)="type" | Generates a custom field type | Field | custom_type |
| (rust.preserve_unrecognized)=true | Preserves unrecognized proto fields into an _unrecognized struct field | Field | TODO |
| (rust.nullable_field)=false | Generates non-nullable field types | Field | TODO |
| (rust.nullable)=false | Generates oneofs as non-nullable (fail on deserialization) | Oneof | non_optional |
| (rust.err_if_default_or_unknown)=true | Generates enums as non-zeroable (fail on deserialization) | Enum | non_optional |
| (rust.closed_enum)=true | Generates only a "closed" enum which will fail deserialization for unknown values, but is easier to work with in Rust | Enum | TODO |
| (rust.serde_derive)=true | Generates serde serializable/deserializable messages | File | serde |
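
The trade-off behind (rust.closed_enum) can be sketched in plain Rust. This is an illustration under assumptions, not pb-jelly's generated code: `OpenColor` and `ClosedColor` are hypothetical names, and the "open" representation as a wrapper over the wire value is one plausible design.

```rust
use std::convert::TryFrom;

// "Open" enum: a thin wrapper over the wire value, so values that this
// build does not know about still survive decoding.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct OpenColor(pub i32);

impl OpenColor {
    pub const RED: OpenColor = OpenColor(0);
    pub const GREEN: OpenColor = OpenColor(1);
}

// "Closed" enum: an ordinary Rust enum. `match` on it is exhaustive,
// but an unknown wire value has no representation and must be an error.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ClosedColor {
    Red,
    Green,
}

impl TryFrom<i32> for ClosedColor {
    type Error = String;

    fn try_from(raw: i32) -> Result<Self, Self::Error> {
        match raw {
            0 => Ok(ClosedColor::Red),
            1 => Ok(ClosedColor::Green),
            other => Err(format!("unknown ClosedColor value: {}", other)),
        }
    }
}
```

The closed form is more pleasant to match on, at the cost of rejecting messages written by a newer schema that added a variant.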

Using pb-jelly in your project

Multiple crates, multiple languages, my oh my!

Essential Crates

There are only two crates you'll need: pb-jelly and pb-jelly-gen.

pb-jelly

Contains all of the important traits and structs that power our generated code, e.g. Message and Lazy. Include this as a dependency, e.g.

[dependencies]
pb-jelly = "0.0.16"

pb-jelly-gen

A framework for generating Rust structs and implementations for .proto files. In order to use pb-jelly, you need to add the pb-jelly-gen as a plugin to your protoc invocation.

We added some code here to handle the protoc invocation if you choose to use it. You'll need to add a generation crate (see examples_gen for an example). Include pb-jelly-gen as a dependency of your generation crate, and cargo run to invoke protoc for you.

[dependencies]
pb-jelly-gen = "0.0.16"

Eventually, we hope to eliminate the need for a generation crate, and simply have generation occur inside a build.rs with pb-jelly-gen as a build dependency. However rust-lang/cargo#8709 must be resolved first.

Note that you can always invoke protoc on your own (for example if you are already doing so to generate for multiple languages) with --rust_out=codegen.py as a plugin for rust.

Generating Rust Code

  1. Install protoc, the protobuf compiler.

To generate with pb-jelly-gen

  1. Create an inner (build-step) crate which depends on pb-jelly-gen. Example
  2. cargo run in the directory of the inner generation crate

To generate manually with protoc

  1. cargo build in pb-jelly-gen
  2. protoc --plugin=protoc-gen-jellyrust=pb-jelly-gen/target/debug/protoc-gen-jellyrust --jellyrust_out=generated/ input.proto
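
The manual invocation in step 2 can also be assembled from Rust with `std::process::Command`. This is a minimal sketch: `protoc_command` is a hypothetical helper, not part of pb-jelly-gen, and it only builds the command without running it.

```rust
use std::process::Command;

/// Builds (but does not run) the protoc invocation from step 2.
/// `plugin_path` is the compiled protoc-gen-jellyrust binary, `out_dir` is
/// where generated crates are written, and `proto` is the input file.
pub fn protoc_command(plugin_path: &str, out_dir: &str, proto: &str) -> Command {
    let mut cmd = Command::new("protoc");
    cmd.arg(format!("--plugin=protoc-gen-jellyrust={}", plugin_path))
        .arg(format!("--jellyrust_out={}", out_dir))
        .arg(proto);
    cmd
}
```

Calling `.status()` or `.output()` on the returned command would run protoc, which must be installed and on the PATH.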

Example

Take a look at the examples crate to see how we leverage pb-jelly-gen and build.rs to get started using protobufs in Rust!


Non-essential Crates

  • pb-test contains integration tests and benchmarks. You don't need to worry about this one unless you want to contribute to this repository!
  • examples contains some examples to help you get started

A Note On Scalability

We mention "scalability" as a feature; what does that mean? We take an opinionated stance that every module should be a crate, as opposed to generating Rust files 1:1 with proto files. We take this stance because rustc is parallel across crates, but not yet totally parallel within a crate. When we had all of our generated Rust code in a single crate, that single crate was often the one that took the longest to compile. The solution to these long compile times was creating many crates!


The Name

pb-jelly is a shoutout to the jellyfish known for its highly efficient locomotion. This library is capable of highly efficient locomotion of deserialized data. It is also a shoutout to the ability of the jellyfish to have substantial increases in population. This library handles generating a very large number of proto modules with complex dependencies, by generating to multiple crates.

We also like the popular sandwich.

Contributing

First, contributions are greatly appreciated and highly encouraged. For legal reasons all outside contributors must agree to Dropbox's CLA. Thank you for your understanding.



Upcoming

Some of the features here require additional tooling to be useful, which is not yet public.

  • Spec.toml is a stripped down templated Cargo.toml - which you can script convert into Cargo.toml in order to get consistent dependency versions in a multi-crate project. Currently, the script to convert Spec.toml -> Cargo.toml isn't yet available.

Closed structs with public fields

  • Adding fields to a proto file will lead to compiler errors. This can be a benefit in that it allows the compiler to identify all callsites that may need to be visited. However, it can make updating protos with many callsites a bit tedious. We opted to go this route to make it easier to add a new field and update all callsites with assistance from the compiler.
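
A small sketch of how the compiler helps here, using a hypothetical `Address` struct that stands in for a generated message. Both functions name every field, so adding a field to the struct turns each of them into a compile error until it is updated.

```rust
// Hypothetical stand-in for a generated message with public fields.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct Address {
    pub street: String,
    pub city: String,
}

/// Exhaustive construction: every field is listed, so a newly added field
/// makes this struct literal fail to compile ("missing field").
pub fn new_address(street: &str, city: &str) -> Address {
    Address {
        street: street.to_string(),
        city: city.to_string(),
    }
}

/// Exhaustive destructuring (no `..`): a newly added field makes this
/// pattern fail to compile, flagging the call site for review.
pub fn format_address(addr: &Address) -> String {
    let Address { street, city } = addr;
    format!("{}, {}", street, city)
}
```

Call sites that prefer not to break can write `Address { street, ..Default::default() }` instead, trading the compiler check for less churn.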

Service Generation

  • Generating stubs for gRPC clients and servers

Running the pbtest unit tests

  1. Clone Repo.
  2. Install Dependencies / Testing Dependencies. Use the appropriate package manager for your system.
    • protoc - part of Google's protobuf tools
      • macos: brew install protobuf
      • Linux (Fedora/CentOS/RHEL): dnf install protobuf protobuf-devel
      • Linux (Ubuntu): apt install protobuf-compiler
  3. pb-jelly currently uses an experimental test framework that requires a nightly build of rust.
    • rustup default nightly
  4. cd pb-test
  5. ( cd pb_test_gen ; cargo run ) ; cargo test

Contributors

Dropboxers [incl former]

Non-Dropbox

Similar Projects

rust-protobuf - Rust implementation of Google protocol buffers
prost - PROST! a Protocol Buffers implementation for the Rust Language
quick-protobuf - A rust implementation of protobuf parser
serde-protobuf
protokit

pb-jelly's People

Contributors

benjaminp, cyang1, ddeville, goffrie, grahamking, isho, nipunn1313, parkertimmerman, parkmycar, rsmuthu


pb-jelly's Issues

pb-jelly-gen does not work on Native Windows

Creating a -gen crate using pb-jelly-gen does not work on Windows due to unix-specific imports.

Output of cargo run --bin proto-gen (proto-gen is the gen crate in a cargo workspace):

   Compiling pb-jelly-gen v0.0.4
error[E0433]: failed to resolve: could not find `unix` in `os`
  --> <home dir>\.cargo\registry\src\github.com-1ecc6299db9ec823\pb-jelly-gen-0.0.4\src\lib.rs:42:9
   |
42 |     os::unix::fs::PermissionsExt,
   |         ^^^^ could not find `unix` in `os`

error[E0599]: no method named `set_mode` found for struct `std::fs::Permissions` in the current scope
   --> <home dir>\.cargo\registry\src\github.com-1ecc6299db9ec823\pb-jelly-gen-0.0.4\src\lib.rs:273:29
    |
273 |                 permissions.set_mode(0o777);
    |                             ^^^^^^^^ method not found in `std::fs::Permissions`

error: aborting due to 2 previous errors

Some errors have detailed explanations: E0433, E0599.
For more information about an error, try `rustc --explain E0433`.
error: could not compile `pb-jelly-gen`.
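
One common way to address errors like these is to gate the Unix-only code behind `#[cfg(unix)]`. The sketch below is an assumption about how such a fix could look, not the change that was made in pb-jelly-gen; `make_executable` is a hypothetical helper.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// On Unix, sets the mode bits to 0o777 as in the failing line above.
#[cfg(unix)]
pub fn make_executable(path: &Path) -> io::Result<()> {
    // The import lives inside the cfg-gated function, so it is never
    // compiled on platforms that lack `std::os::unix`.
    use std::os::unix::fs::PermissionsExt;

    let mut permissions = fs::metadata(path)?.permissions();
    permissions.set_mode(0o777);
    fs::set_permissions(path, permissions)
}

/// On other platforms there are no Unix mode bits to set, so this only
/// checks that the file exists.
#[cfg(not(unix))]
pub fn make_executable(path: &Path) -> io::Result<()> {
    fs::metadata(path).map(|_| ())
}
```

Callers use `make_executable` unconditionally and the right variant is selected at compile time.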

Example honoring required and optional

It is very confusing to me.

  • Command::default() allows creating an object with everything missing
  • Command { ts: 1 } complains that I will have to pass all values.
  • I have to always pass a Some (regardless required or optional) for each field (this is actually the real issue)

Is there no way I could initialize Command where ts is not Option?

Code (proto 2)

message Command {
    required uint64 ts = 1;
    required CommandType command = 2;
    optional uint64 trace_id = 3;
    optional Service service = 4;
}
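
For reference, the behaviour described above matches a struct where every field is an `Option`. The sketch below is a hand-written approximation, not actual generated output: the `i32` fields stand in for the `CommandType` and `Service` types, and `command_at` is a hypothetical helper.

```rust
// Hand-written approximation of the struct described in this issue:
// both `required` and `optional` fields show up as `Option`.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct Command {
    pub ts: Option<u64>,
    pub command: Option<i32>, // stand-in for CommandType
    pub trace_id: Option<u64>,
    pub service: Option<i32>, // stand-in for Service
}

/// Sets only `ts` and fills the remaining fields from `Default`, which
/// avoids the "missing fields" error from `Command { ts: ... }` alone.
pub fn command_at(ts: u64) -> Command {
    Command {
        ts: Some(ts),
        ..Default::default()
    }
}
```

The (rust.nullable_field)=false extension listed in the Extensions table is described as generating non-nullable field types, which may be the relevant option for fields like `ts`.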

Migrate benchmarks to criterion

Given the current benchmark testing feature requires the nightly compiler, we should consider migrating to criterion.

Whoever handles this task should investigate if the unstable test feature is going to be stabilized any time soon (soon is maybe < 6 months?) and if it isn't, then we should migrate to criterion.

Cleanup `pb-gen`

The API for pb-gen was written kind of hastily. We should remove the logging that is part of the implementation, and expose a properly documented builder struct so the code generation can have more options.

Investigate performance of copy during deserialization

In the benchmark tests under pb-test we create a proto message VecData that has a single field data that is a Vec<u8>. The performance of our deserialization compared to that of PROST! is much worse. In my benchmarking it's about 4,000,000 ns vs 350,000 ns. We should figure out where the slowdown is and fix it.

Display errors from `protoc` in a more friendly way in `pb-gen`

Currently if protoc fails then gen_protos(...) from pb-gen silently succeeds. The only sign that code generation failed is your generated files are gone because we clean the directories before calling protoc. You can see the error protoc emits by running cargo build -vv and scrolling through the output, but this isn't very friendly.

The task is to update pb-gen such that when protoc fails then the build step should fail too with the error from protoc.
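
A minimal sketch of the requested behaviour, assuming the generator runs protoc through `std::process::Command`. `check_exit` and `run_checked` are hypothetical helpers, not part of pb-gen.

```rust
use std::process::Command;

/// Turns a finished process into a `Result`, keeping stderr in the error
/// so the caller can show what protoc complained about.
pub fn check_exit(success: bool, code: Option<i32>, stderr: &[u8]) -> Result<(), String> {
    if success {
        return Ok(());
    }
    Err(format!(
        "protoc failed (exit code {:?}):\n{}",
        code,
        String::from_utf8_lossy(stderr)
    ))
}

/// Runs the command and fails if it cannot be launched or exits non-zero.
pub fn run_checked(cmd: &mut Command) -> Result<(), String> {
    let output = cmd
        .output()
        .map_err(|e| format!("failed to launch {:?}: {}", cmd.get_program(), e))?;
    check_exit(output.status.success(), output.status.code(), &output.stderr)
}
```

A generation crate could then call `run_checked(...).expect("code generation failed")` so that a protoc error aborts the build step with the message attached.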

Migrate codegen.py to Python3

Python2 is dead, long live python2!

codegen.py in the pb-jelly-gen crate is a protoc plugin that powers the Rust code generation. We should migrate this to python3

Run CI in windows

Make sure our tests pass on windows + ensure library continues to work on windows

codegen.py assumes proto files in a subdirectory

protos/myproto.proto fails with a confusing error message (see below), but moving it into protos/myproto/myproto.proto will codegen successfully. The comment here right above the error sort of suggests it's not expected.

[/home/jordi/.cargo/registry/src/github.com-1ecc6299db9ec823/pb-jelly-gen-0.0.4/src/lib.rs:169] output.status.code() = Some(
    1,
)
stdout=
stderr=Traceback (most recent call last):
  File "/tmp/codegenMIlOse/codegen.py", line 1783, in <module>
    generate_code(request, response)
  File "/tmp/codegenMIlOse/codegen.py", line 1768, in generate_code
    generate_single_crate(ctx, "", to_generate, response)
  File "/tmp/codegenMIlOse/codegen.py", line 1683, in generate_single_crate
    mod_name = mod_parts[-1]
IndexError: list index out of range
--rust_out: protoc-gen-rust: Plugin failed with status code 1.

From discussion over in #77:

Options:

  • Provide a better error message, telling folks to move the proto file into a directory.
  • Support this case, perhaps picking a crate name to match the file name?

I haven't really thought it through to decide which case to prefer.
Feel free to fix it! It seems like it would be hard to add a unit test in our testing framework for error situations (first option), but we could certainly test the second option.
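
The second option (falling back to the file name) could look roughly like the following. This is a sketch written in Rust for illustration only: the real logic lives in codegen.py, its naming rules are not reproduced here, and `crate_name_for` is a hypothetical function that takes a path relative to the proto root.

```rust
use std::path::Path;

/// Picks a name from the last directory of `proto_path`. If the file sits
/// directly in the proto root (no directory part), falls back to the file
/// stem instead of failing.
pub fn crate_name_for(proto_path: &str) -> Result<String, String> {
    let path = Path::new(proto_path);

    // Directory components, e.g. ["myproto"] for "myproto/myproto.proto"
    // and [] for "myproto.proto".
    let dirs: Vec<String> = path
        .parent()
        .map(|p| {
            p.components()
                .map(|c| c.as_os_str().to_string_lossy().into_owned())
                .collect::<Vec<String>>()
        })
        .unwrap_or_default();

    if let Some(last) = dirs.last() {
        return Ok(last.clone());
    }

    // No directory: use the file name without its extension, or report a
    // readable error rather than an index-out-of-range crash.
    path.file_stem()
        .map(|s| s.to_string_lossy().into_owned())
        .ok_or_else(|| format!("cannot derive a crate name from {:?}", proto_path))
}
```

Because the fallback returns a value rather than an error, it is straightforward to cover with an ordinary unit test, which is the advantage noted above.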

Create a `requirements.txt`

Our codegen script is Python2, so developers need to install a few packages before they can run it. To make that easier, we should add a requirements.txt.

Open source service generation

Currently there's some non-OSS service generation code at DBX which relies on this - which can come out as well (pb_service)

Refactor pb-rs into a workspace

After playing around with adding an example of how to use pb-rs in #7, I think making pb-rs a workspace with three crates, pb-rs, pb-gen, pb-tests would make it more usable/portable.

  1. pb-rs would contain pretty much everything currently under /src aka only the traits and structs necessary to support the generated code. This would cargo build with stable Rust and would require no other dependencies, this is what users would include in a Cargo.toml as a dependency.

  2. pb-gen would contain /codegen and a Rust wrapper around/replacement of gen_protos.sh. This is what users would include in a Cargo.toml as a build-dependency. To start, this would require Rust, Python, Go, and protoc because of the way our gen_protos.sh works. We could remove the Go dependency relatively easily (I think), but building a more batteries-included codegen is a different task.

  3. pb-tests would contain the more integration like tests between Rust and Go, and would also contain benchmarks. We could keep this using nightly Rust with the #![feature(test)] or compile it with stable by using criterion instead, but since this is only required for development, and the dev instructions are well documented, slimming the dependencies here is a low priority IMO.

Let me know what you think, would love some comments!

Drop Python2 Compatibility

  • Inline the mypy annotations
  • Remove reference to six library

We should wait until python2 end-of-lifes before completing this task!

Document grpc_slices/zero_copy/blob

Currently, codegen supports three different forms of ZC codegen

rust.grpc_slices=true
rust.zero_copy=true
rust.blob=true

Analyze/understand why these are here and what they are, and potentially simplify.

Too lengthy naming

I have a proto file called enums.proto

Is there a way not to use it like:

use proto_enums::proto_enums::Error;

but instead generate it with one of the following (or another suggestion)

use proto::enums::Error;
use proto_enums::enums::Error;

Update BUILD.bazel codegen to include `edition = "2018"`

This is pretty simple: in the codegen script we need to update the template for the rust_library bazel rule to include edition = "2018", e.g. the rule should be:

rust_library(
    name = "{crate}",
    crate_type = "lib",
    edition = "2018",
)
