pure-protobuf

Python implementation of Protocol Buffers data types.

Dataclasses

pure-protobuf allows you to take advantage of the standard dataclasses module to define message types. It is preferred over the legacy interface for new projects. The dataclasses interface is available in Python 3.6 and higher.

The legacy interface is deprecated but still available via pure_protobuf.legacy.

This guide describes how to use pure-protobuf to structure your data. It tries to follow the standard developer guide. It also assumes that you're familiar with Protocol Buffers.

Defining a message type

Let's look at a simple example. Here's what it looks like in proto3 syntax:

syntax = "proto3";

message SearchRequest {
  string query = 1;
  int32 page_number = 2;
  int32 result_per_page = 3;
}

And this is how you define it with pure-protobuf:

# Python 3.6+

from dataclasses import dataclass

from pure_protobuf.dataclasses_ import field, message
from pure_protobuf.types import int32


@message
@dataclass
class SearchRequest:
    query: str = field(1, default='')
    page_number: int32 = field(2, default=int32(0))
    result_per_page: int32 = field(3, default=int32(0))
   

assert SearchRequest(
    query='hello',
    page_number=int32(1),
    result_per_page=int32(10),
).dumps() == b'\x0A\x05hello\x10\x01\x18\x0A'

Keep in mind that the @message decorator must always be placed above @dataclass.
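
To see where the bytes in the assertion above come from, here is a minimal sketch of the Protocol Buffers wire format that uses only the standard library. The helpers encode_varint and encode_tag are illustrative names, not part of pure-protobuf, and the varint helper handles non-negative integers only.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_tag(field_number: int, wire_type: int) -> bytes:
    """A field key is the varint of (field_number << 3) | wire_type."""
    return encode_varint((field_number << 3) | wire_type)


VARINT, LENGTH_DELIMITED = 0, 2

query = 'hello'.encode('utf-8')
encoded = (
    encode_tag(1, LENGTH_DELIMITED) + encode_varint(len(query)) + query  # query = 1
    + encode_tag(2, VARINT) + encode_varint(1)                           # page_number = 2
    + encode_tag(3, VARINT) + encode_varint(10)                          # result_per_page = 3
)

assert encoded == b'\x0A\x05hello\x10\x01\x18\x0A'
```

Each field is written as a key followed by its value; strings additionally carry a length prefix.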

Serializing

Each class wrapped with @message gets two methods attached:

  • dumps() -> bytes to serialize message into a byte string
  • dump(io: IO) to serialize message into a file-like object

Deserializing

Each class wrapped with @message gets two class methods attached:

  • loads(bytes_: bytes) -> TMessage to deserialize a message from a byte string
  • load(io: IO) -> TMessage to deserialize a message from a file-like object

These methods are also available as standalone functions in pure_protobuf.dataclasses_:

  • load(cls: Type[T], io: IO) -> T
  • loads(cls: Type[T], bytes_: bytes) -> T

Specifying field types

In pure-protobuf, types are specified with type hints. The native Python float, str, bytes and bool types are supported. Since other Protocol Buffers types don't exist as native Python types, the package uses NewType to define them. They're available via pure_protobuf.types and are named the same way as their Protocol Buffers counterparts.
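
As a rough illustration of what a NewType-based definition implies, the sketch below declares a stand-in int32 with the standard typing module. It is an assumption about the general shape, not a copy of the definitions in pure_protobuf.types.

```python
from typing import NewType

# Illustrative stand-in; the real definitions live in pure_protobuf.types.
int32 = NewType('int32', int)

value = int32(150)

# NewType only exists for type checkers: at runtime the value is a plain int.
assert value == 150
assert type(value) is int
```

This is why a default such as int32(0) behaves exactly like 0 at runtime while still carrying the wire type in the annotation.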

Assigning field numbers

Field numbers are provided via the metadata parameter of the field function: field(..., metadata={'number': number}). However, to improve readability and save some characters, pure-protobuf provides a helper function pure_protobuf.dataclasses_.field, which accepts the field number as its first positional parameter and passes it on to the standard field function.
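
A helper of this kind can be sketched with the standard library alone. The function numbered_field below is a hypothetical stand-in for pure_protobuf.dataclasses_.field; the real helper may differ in details.

```python
import dataclasses
from typing import Any


def numbered_field(number: int, **kwargs: Any) -> Any:
    """Hypothetical helper: store the field number in the dataclass metadata."""
    metadata = dict(kwargs.pop('metadata', None) or {})
    metadata['number'] = number
    return dataclasses.field(metadata=metadata, **kwargs)


@dataclasses.dataclass
class SearchRequest:
    query: str = numbered_field(1, default='')
    page_number: int = numbered_field(2, default=0)


# The numbers can be read back through the standard dataclasses introspection API.
numbers = {f.name: f.metadata['number'] for f in dataclasses.fields(SearchRequest)}
assert numbers == {'query': 1, 'page_number': 2}
```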

Specifying field rules

typing.List and typing.Iterable annotations are automatically converted to repeated fields. Repeated fields of scalar numeric types use packed encoding by default:

# Python 3.6+

from dataclasses import dataclass
from typing import List

from pure_protobuf.dataclasses_ import field, message
from pure_protobuf.types import int32


@message
@dataclass
class Message:
    foo: List[int32] = field(1, default_factory=list)
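
The difference between packed and non-packed encoding can be sketched by hand with the standard library. The helper names below are illustrative, not part of pure-protobuf, and only non-negative values are handled.

```python
from typing import Iterable


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def encode_packed(field_number: int, values: Iterable[int]) -> bytes:
    """Packed: a single length-delimited record holding all the varints."""
    payload = b''.join(encode_varint(v) for v in values)
    key = encode_varint((field_number << 3) | 2)  # wire type 2: length-delimited
    return key + encode_varint(len(payload)) + payload


def encode_unpacked(field_number: int, values: Iterable[int]) -> bytes:
    """Non-packed: the key is repeated before every element."""
    key = encode_varint((field_number << 3) | 0)  # wire type 0: varint
    return b''.join(key + encode_varint(v) for v in values)


assert encode_packed(1, [1, 2, 3]) == b'\x0A\x03\x01\x02\x03'
assert encode_unpacked(1, [1, 2, 3]) == b'\x08\x01\x08\x02\x08\x03'
```

Packed encoding writes the key once, which saves one key per element for longer lists.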

It's also possible to wrap a field type with typing.Optional. If None is assigned to an Optional field, then the field will be skipped during serialization.

Default values

In pure-protobuf, it's the developer's responsibility to take care of default values. If an encoded message does not contain a particular element, the corresponding field stays unassigned. This means that the standard default and default_factory parameters of the field function work as usual:

# Python 3.6+

from dataclasses import dataclass
from typing import Optional

from pure_protobuf.dataclasses_ import field, message
from pure_protobuf.types import int32


@message
@dataclass
class Foo:
    bar: int32 = field(1, default=42)
    qux: Optional[int32] = field(2, default=None)


assert Foo().dumps() == b'\x08\x2A'
assert Foo.loads(b'') == Foo(bar=42)

In fact, the pattern qux: Optional[int32] = field(2, default=None) is so common that there's a convenience function optional_field to define an Optional field with None value by default:

# Python 3.6+

from dataclasses import dataclass
from typing import Optional

from pure_protobuf.dataclasses_ import optional_field, message
from pure_protobuf.types import int32


@message
@dataclass
class Foo:
    qux: Optional[int32] = optional_field(2)


assert Foo().dumps() == b''
assert Foo.loads(b'') == Foo(qux=None)
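
The behaviour described in this section can be modelled with plain dataclasses: a decoder that passes only the fields present in the input to the constructor leaves every other field at its declared default. The sketch below is a simplified stand-in under that assumption, not pure-protobuf's actual decoder; it handles varint fields only, and FIELD_NAMES and the loads function are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


def read_varint(data: bytes, pos: int) -> Tuple[int, int]:
    """Read one varint starting at pos; return (value, next position)."""
    result = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def decode_varint_fields(data: bytes) -> Dict[int, int]:
    """Decode a message made only of varint fields into {field_number: value}."""
    fields: Dict[int, int] = {}
    pos = 0
    while pos < len(data):
        key, pos = read_varint(data, pos)
        if key & 0x07 != 0:
            raise ValueError('this sketch only handles wire type 0 (varint)')
        fields[key >> 3], pos = read_varint(data, pos)
    return fields


@dataclass
class Foo:
    bar: int = 42
    qux: Optional[int] = None


FIELD_NAMES = {1: 'bar', 2: 'qux'}  # hypothetical number-to-name mapping


def loads(data: bytes) -> Foo:
    # Only fields present in the input are passed; the rest keep their defaults.
    decoded = decode_varint_fields(data)
    return Foo(**{FIELD_NAMES[number]: value for number, value in decoded.items()})


assert loads(b'') == Foo(bar=42, qux=None)
assert loads(b'\x08\x07') == Foo(bar=7, qux=None)
```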

Enumerations

Subclasses of the standard IntEnum class are supported:

# Python 3.6+

from dataclasses import dataclass
from enum import IntEnum

from pure_protobuf.dataclasses_ import field, message


class TestEnum(IntEnum):
    BAR = 1


@message
@dataclass
class Test:
    foo: TestEnum = field(1)


assert Test(foo=TestEnum.BAR).dumps() == b'\x08\x01'
assert Test.loads(b'\x08\x01') == Test(foo=TestEnum.BAR)

Using other message types

Embedded messages are defined the same way as normal dataclasses:

# Python 3.6+

from dataclasses import dataclass

from pure_protobuf.dataclasses_ import field, message
from pure_protobuf.types import int32


@message
@dataclass
class Test1:
    a: int32 = field(1, default=0)


@message
@dataclass
class Test3:
    c: Test1 = field(3, default_factory=Test1)


assert Test3(c=Test1(a=int32(150))).dumps() == b'\x1A\x03\x08\x96\x01'
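
The expected bytes above follow from how the wire format nests messages: the inner message is serialized first, then written as a length-delimited field of the outer one. A hand-rolled sketch using only the standard library (encode_varint is an illustrative helper, not part of pure-protobuf):

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


# Test1 { a = 150 }: key for field 1 with wire type 0, then the varint 150.
inner = encode_varint((1 << 3) | 0) + encode_varint(150)
assert inner == b'\x08\x96\x01'

# Test3 { c = Test1 }: key for field 3 with wire type 2, length, then the inner bytes.
outer = encode_varint((3 << 3) | 2) + encode_varint(len(inner)) + inner
assert outer == b'\x1A\x03\x08\x96\x01'
```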

Well-known message types

pure_protobuf.types.google also provides built-in definitions for the following well-known message types:

Annotation    pure_protobuf.types.google    .proto
datetime      Timestamp                     Timestamp
timedelta     Duration                      Duration
typing.Any    Any_                          Any

They're handled automatically: all you have to do is use them normally in type hints:

# Python 3.6+

from dataclasses import dataclass
from datetime import datetime
from typing import Optional

from pure_protobuf.dataclasses_ import field, message


@message
@dataclass
class Test:
    timestamp: Optional[datetime] = field(1, default=None)
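
For orientation, the Timestamp well-known type stores whole seconds since the Unix epoch plus a nanosecond remainder. The sketch below shows one way to split a timezone-aware datetime into those two numbers. The function name is hypothetical, and how pure-protobuf itself performs the conversion (including its treatment of naive datetimes) is not covered here.

```python
from datetime import datetime, timezone
from typing import Tuple

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_seconds_and_nanos(value: datetime) -> Tuple[int, int]:
    """Split a timezone-aware datetime into (seconds, nanos) since the Unix epoch."""
    delta = value - EPOCH
    # timedelta keeps seconds and microseconds non-negative, so nanos is
    # non-negative too, even for moments before the epoch.
    seconds = delta.days * 86400 + delta.seconds
    nanos = delta.microseconds * 1000
    return seconds, nanos


moment = datetime(1970, 1, 1, 0, 0, 42, 500000, tzinfo=timezone.utc)
assert to_seconds_and_nanos(moment) == (42, 500000000)
```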

Since pure-protobuf is not able to download or parse .proto definitions, it provides a limited implementation of the Any message type. That is, you still have to define all message classes in the usual way. Then, pure-protobuf will be able to import and instantiate an encoded value:

# Python 3.6+

from dataclasses import dataclass
from typing import Any, Optional

from pure_protobuf.dataclasses_ import field, message
from pure_protobuf.types.google import Timestamp


@message
@dataclass
class Message:
    value: Optional[Any] = field(1)


# Here `Timestamp` is used just as an example; in principle, any importable user type works.
message = Message(value=Timestamp(seconds=42))
assert Message.loads(message.dumps()) == message

Legacy interface

The legacy interface is deprecated and stays in "maintenance mode" for Python 2 users. It will be removed one day. New projects should strongly consider using the dataclasses interface.

Assume you have the following definition:

message Test2 {
  string b = 2;
}

This is how you can create a message and get it serialized:

from io import BytesIO

from pure_protobuf.legacy import MessageType, Unicode

# Create the type instance and add the field.
type_ = MessageType()
type_.add_field(2, 'b', Unicode)

message = type_()
message.b = 'testing'

# Dump into a string.
assert message.dumps() == b'\x12\x07testing'

# Dump into a file-like object.
fp = BytesIO()
message.dump(fp)

# Load from a string.
assert type_.loads(message.dumps()) == message

# Load from a file-like object.
fp.seek(0)
assert type_.load(fp) == message

Required field

To add a required field, pass an additional flags parameter to add_field like this:

from pure_protobuf.legacy import Flags, MessageType, Unicode

type_ = MessageType()
type_.add_field(2, 'b', Unicode, flags=Flags.REQUIRED)

message = type_()
message.b = 'hello, world'

assert type_.dumps(message)

If you do not fill in a required field, a ValueError is raised during serialization.

Repeated field

from pure_protobuf.legacy import Flags, MessageType, UVarint

type_ = MessageType()
type_.add_field(1, 'b', UVarint, flags=Flags.REPEATED)

message = type_()
message.b = (1, 2, 3)

assert type_.dumps(message)

The value of a repeated field can be any iterable object. The loaded value will always be a list.

Packed repeated field

from pure_protobuf.legacy import Flags, MessageType, UVarint

type_ = MessageType()
type_.add_field(4, 'd', UVarint, flags=Flags.PACKED_REPEATED)

message = type_()
message.d = (3, 270, 86942)

assert type_.dumps(message)

Embedded messages

message Test1 {
  int32 a = 1;
}

message Test3 {
  required Test1 c = 3;
}

To create an embedded field, wrap the inner type with EmbeddedMessage:

from pure_protobuf.legacy import EmbeddedMessage, MessageType, UVarint

inner_type = MessageType()
inner_type.add_field(1, 'a', UVarint)
outer_type = MessageType()
outer_type.add_field(3, 'c', EmbeddedMessage(inner_type))

message = outer_type()
message.c = inner_type()
message.c.a = 150

assert outer_type.dumps(message)

Data types

Type       Python   Description
UVarint    int      unsigned integer (variable length)
Varint     int      signed integer (variable length)
Bool       bool     boolean
Fixed64    bytes    8-byte string
UInt64     int      C 64-bit unsigned long long
Int64      int      C 64-bit long long
Float64    float    C double
Fixed32    bytes    4-byte string
UInt32     int      C 32-bit unsigned int
Int32      int      C 32-bit int
Float32    float    C float
Bytes      bytes    byte string
Unicode    str      unicode string
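
The fixed-width rows of the table correspond to C types that the standard struct module can pack. The mapping below is an assumption made for illustration (the format codes are not taken from the pure-protobuf source); the little-endian byte order is what the Protocol Buffers wire format specifies for fixed-width values.

```python
import struct

# Assumed struct format codes for the fixed-width types in the table.
# "<" selects little-endian byte order with standard sizes.
FORMATS = {
    'UInt64': '<Q',
    'Int64': '<q',
    'Float64': '<d',
    'UInt32': '<I',
    'Int32': '<i',
    'Float32': '<f',
}


def pack_fixed(type_name: str, value) -> bytes:
    """Pack a Python value into the fixed-width layout of the named type."""
    return struct.pack(FORMATS[type_name], value)


assert pack_fixed('UInt64', 1) == b'\x01\x00\x00\x00\x00\x00\x00\x00'
assert pack_fixed('Int32', -2) == b'\xfe\xff\xff\xff'
assert pack_fixed('Float32', 1.0) == b'\x00\x00\x80\x3f'
```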

Some techniques

Streaming messages

The Protocol Buffers format is not self-delimiting, but you can wrap your message type with EmbeddedMessage and write or read messages sequentially.
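
The general idea behind this technique is length-prefixed framing: each serialized message is preceded by its size as a varint, so a reader knows where one message ends and the next begins. The sketch below shows that framing with the standard library only. The function names are hypothetical, and it operates on already serialized payloads rather than on EmbeddedMessage itself.

```python
from io import BytesIO
from typing import BinaryIO, Iterator, Optional


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def read_varint(stream: BinaryIO) -> Optional[int]:
    """Read one varint; return None on a clean end of stream."""
    result = shift = 0
    while True:
        byte = stream.read(1)
        if not byte:
            if shift == 0:
                return None
            raise EOFError('truncated varint')
        result |= (byte[0] & 0x7F) << shift
        if not byte[0] & 0x80:
            return result
        shift += 7


def write_delimited(stream: BinaryIO, payload: bytes) -> None:
    """Write the payload size as a varint, then the payload itself."""
    stream.write(encode_varint(len(payload)))
    stream.write(payload)


def read_delimited(stream: BinaryIO) -> Iterator[bytes]:
    """Yield payloads until the stream is exhausted."""
    while True:
        length = read_varint(stream)
        if length is None:
            return
        payload = stream.read(length)
        if len(payload) != length:
            raise EOFError('truncated message')
        yield payload


buffer = BytesIO()
for payload in (b'\x12\x07testing', b'\x08\x01'):
    write_delimited(buffer, payload)
buffer.seek(0)
assert list(read_delimited(buffer)) == [b'\x12\x07testing', b'\x08\x01']
```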

add_field chaining

add_field returns the message type itself, so calls can be chained:

from pure_protobuf.legacy import EmbeddedMessage, MessageType, UVarint

MessageType().add_field(1, 'a', EmbeddedMessage(MessageType().add_field(1, 'a', UVarint)))
