microavia / messgen

License: MIT License

CMake 1.50% Python 34.72% C++ 17.59% Go 0.76% Makefile 0.45% JavaScript 44.99%

messgen's Introduction

Messgen

Lightweight and fast message serialization library. Generates message classes/structs from a YAML schema.

Features:

Embedded-friendly
Fixed size arrays
Dynamic size arrays
Nested messages
Messages metadata
Supported languages: C++, Go, JavaScript

Dependencies

Python 3.x

On Linux:

sudo apt install python3

On Windows 10:

Download https://bootstrap.pypa.io/get-pip.py
Execute python3 get-pip.py
Execute pip3 install pyyaml

Generate messages

Each protocol should be placed in the directory base_dir/vendor/protocol, where:

base_dir is the base directory for message definitions (multiple base directories may be specified).
vendor is the protocol vendor. It is used as a namespace in generated messages, which avoids conflicts between protocols from different vendors used in one application.
protocol is the protocol name. Each protocol has a protocol ID, which allows multiple protocols to be used on a single connection, e.g. bootloader and application protocols.

Message generator usage:

python3 generate.py -b <base_dir> -m <vendor>/<protocol> -l <lang> -o <out_dir> [-D variable=value]

For some languages, certain variables must be specified using the -D option.

Generated messages are placed in the out_dir directory.

Go

Example for Go messages generation:

python3 generate.py -b ./base_dir -m my_vendor/my_protocol -l go -o out/go -D messgen_go_module=example.com/path/to/messgen

The messgen_go_module variable must point to the messgen Go module (port/go/messgen) so that the necessary imports are added to the generated messages.

C++

Example for C++ messages generation:

python3 generate.py -b ./base_dir -m my_vendor/my_protocol -l cpp -o out/cpp

The variable metadata_json=true can be passed (via -D) to generate metadata in JSON format rather than in the legacy format.

JS/TS

Example for JS messages generation:

python3 generate.py -b ./base_dir -m my_vendor/my_protocol -l json -o out/json

This command generates JSON messages.

TypeScript types for these messages can be generated as follows:

python3 generate.py -b ./base_dir -m my_vendor/my_protocol -l ts -o out/ts

To generate typed arrays for TS, pass the flag -D typed_arrays=true:

python3 generate.py -b ./base_dir -m my_vendor/my_protocol -l ts -o out/ts -D typed_arrays=true

MD

Example for protocol documentation generation:

python3 generate.py -b ./base_dir -m my_vendor/my_protocol -l md -o out/md

messgen's Issues

Add optional flag BITS into constants description

This would allow generating an enum class in C++ for constants that are not bitmasks, improving type safety.

Ownership of dynamic values

Sometimes it is useful to be able to store a message with dynamic fields. The proposal is to write a wrapper class around the message and provide specializations of the serialization and parsing functions:

template <class T>
class Storage {
    T msg;                // the stored message
    MemoryAllocator mem;  // memory backing the dynamic fields of msg
};

// Overload for Storage<T>; function templates cannot be partially specialized.
template <class T>
int parse(Storage<T> &msg, MessageInfo &info);

I have not thought through the concrete interface yet, but I plan to explore it.

A few subtle points worth thinking about:

In this case there may be no point in passing the allocator/memory as separate arguments, since it will be stored inside Storage.
Forbid calling parse on dynamic messages unless Storage is used. The two call variants could be separated somehow, so that messages without dynamic fields do not carry extra arguments.
When the allocator gets cleared. Currently it is constructed when parse is called and destroyed on exit, so it never has to be reset manually. With Storage it may be possible to avoid both creating the object every time and manual control. One idea: when parse is called on a Storage, call a reset function that resets the allocator.

@ygorshkov @DrTon what do you think? Are there other options?

Remove `size` from messgen header

The 4-byte size field in the message header is redundant and could be eliminated.
For messages without dynamic fields, the size is known beforehand.
For dynamic messages, the minimum static size is also known beforehand, and the size of every dynamic field is contained within the message payload.

Dump markdown formatting

Make dump generation optional, enabled by a command-line flag
Make it possible to generate docs only, without messages
Format dump.txt as markdown
In the generated dump, add a list of used IDs with message names

Make it easy to find the next free message ID

When generating messages, the next n free IDs for each protocol could be written to dump.txt. Adding messages to our manet is already becoming very difficult.
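
A minimal sketch of how the generator could compute the free IDs to write into the dump. The function name next_free_ids and the assumption that IDs start at 0 are illustrative, not part of messgen:

```python
from itertools import count, islice


def next_free_ids(used_ids, n, start=0):
    """Return the n smallest IDs >= start that are not in used_ids."""
    used = set(used_ids)
    return list(islice((i for i in count(start) if i not in used), n))


# Hypothetical protocol that already uses these message IDs.
print(next_free_ids([0, 1, 2, 4, 7], 3))  # [3, 5, 6]
```

If a protocol has an upper bound on message IDs, the caller would also need to check the result against it.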

enum class for C++ constants

Using enum class instead of enum will allow clearer constant names without the risk of collisions.

Changing messages comments/descriptions affects protocol version

Here is what the protocol version calculator should take into account:

Message IDs and names
Order of message fields and their types
Field names (e.g. renaming a single field from "uint32 capacity_liters" to "uint32 capacity_ml" must change the hash)

Field comments and message descriptions must not affect the protocol version.
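
One possible way to satisfy these rules is to hash a canonical form of the parsed definitions with the documentation keys removed. The sketch below assumes the definitions are already loaded into plain dicts keyed by message name; the function names and the "comment" key are assumptions ("descr" appears in the examples in this document), and md5 is used only because it is the hash suggested elsewhere in these issues:

```python
import hashlib
import json

# Keys assumed to carry documentation only.
DOC_KEYS = {"descr", "comment"}


def strip_docs(node):
    """Recursively drop documentation-only keys; list order is preserved."""
    if isinstance(node, dict):
        return {k: strip_docs(v) for k, v in node.items() if k not in DOC_KEYS}
    if isinstance(node, list):
        return [strip_docs(v) for v in node]
    return node


def protocol_version(messages, length=10):
    """Hash message names, IDs, and ordered fields, ignoring descriptions."""
    canonical = json.dumps(
        {name: strip_docs(body) for name, body in messages.items()},
        sort_keys=True,  # dict key order is irrelevant, field list order is kept
        separators=(",", ":"),
    )
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()[:length]


field_l = [{"name": "capacity_liters", "type": "uint32"}]
field_ml = [{"name": "capacity_ml", "type": "uint32"}]
a = {"tank": {"id": 1, "descr": "old text", "fields": field_l}}
b = {"tank": {"id": 1, "descr": "new text", "fields": field_l}}
c = {"tank": {"id": 1, "descr": "old text", "fields": field_ml}}

print(protocol_version(a) == protocol_version(b))  # True: only the description changed
print(protocol_version(a) == protocol_version(c))  # False: a field was renamed
```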

Convenient string handling in C

Working with strings is currently inconvenient. We need to think through an interface for them.

There are two options:

Make string an alias for a byte array in the code, and additionally check for the terminator when parsing. Then it would be possible to write:

const auto *str = reinterpret_cast<const char *>(msg.my_string.ptr);

Make string a fully separate type and handle it separately as well. This seems more correct, but fitting it in cleanly will take some thought: it is neither plain nor an embedded struct, yet it is also dynamic.

Another suggestion is to always transmit a null-terminated string. If the terminator is missing, the generated parsing code returns an error.

Can't parse arrays whose total size exceeds 65535 bytes

Parser.h accepts the len argument as uint32_t, but then it passes it to detail::Parser<T>::parse which accepts it as uint16_t:

messgen/port/cpp/messgen/Parser.h, line 40 in ea97568:

    static int parse(const uint8_t *buf, uint32_t len, MemoryAllocator &allocator, T &value) {

messgen/port/cpp/messgen/Dynamic.h, line 57 in ea97568:

    static int parse(const uint8_t *buf, uint16_t len, MemoryAllocator &allocator, Dynamic<T, false> &dynamic) {

This implicit uint32_t -> uint16_t conversion silently truncates len. The bug is undocumented and very hard to spot without deep debugging.
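
To illustrate the effect, the narrowing can be reproduced outside C++. Converting an unsigned 32-bit value to 16 bits keeps only the low 16 bits, which the sketch below mimics with a mask. The helper name as_uint16 is made up for this illustration:

```python
def as_uint16(length):
    """Mimic passing a uint32_t length to a parameter declared as uint16_t."""
    return length & 0xFFFF  # unsigned narrowing is reduction modulo 2**16


print(as_uint16(65535))  # 65535: the largest length that survives intact
print(as_uint16(65536))  # 0: the parser sees an empty buffer
print(as_uint16(70000))  # 4464: the parser sees a much shorter buffer
```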

Golang port nested struct

Add support for nested structs in golang port.

Add support for generic type

Allow fields with dynamic types.
Keep backwards-compatibility.

Generate PROTOCOL_VERSION

For each generator, add PROTOCOL_VERSION calculator. This will make it easy to compare the protocol versions between the two peers.

Workaround

Here's a workaround we use now in our project:

The .yaml files are in a separate repo. Backend and frontend use this repo as a submodule.
Upon build, the hash of the last commit in the submodule is taken as the protocol version.
On the first handshake request, Backend sends its protocol version to Frontend.
Frontend checks the hash of the last commit of its submodule and compares the two.

Workaround limitations

It is limited to one messgen protocol per submodule. If multiple protocols share a repo, they also share the last commit hash.
Unrelated changes in the submodule (e.g. editing the README) also change the protocol version.
When you change your .yaml files without committing them, the protocol is considered unchanged.

Suggested approach

For each protocol, its version must be calculated as md5 of all .yaml files in protocol directory. Changing any .yaml file will change protocol md5. Using the truncated md5 (e.g. first 10 chars only) should also be fine.

For the cpp generator, here's an example of what it should generate in messages.h:

struct ProtoInfo {
    static constexpr uint8_t ID = 1;
    static constexpr uint32_t MAX_MESSAGE_SIZE = 24;
    static constexpr const char* VERSION = "i2e8nb0a";
};
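
On the generator side, the suggested md5 could be computed roughly as follows. This is a sketch only: the function name protocol_md5 is hypothetical, and mixing the file names into the digest is a choice made here so that renaming a message file also changes the version:

```python
import hashlib
from pathlib import Path


def protocol_md5(protocol_dir, length=10):
    """Return a truncated md5 over all .yaml files in protocol_dir."""
    digest = hashlib.md5()
    # Sort so the result does not depend on directory listing order.
    for path in sorted(Path(protocol_dir).glob("*.yaml")):
        digest.update(path.name.encode("utf-8"))
        digest.update(b"\0")
        digest.update(path.read_bytes())
        digest.update(b"\0")
    return digest.hexdigest()[:length]
```

A call such as protocol_md5("base_dir/my_vendor/my_protocol") would then supply the string emitted as VERSION. Files other than .yaml (e.g. a README) do not influence the result, and uncommitted edits do.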

C++ can't parse empty message

The message::parse method returns the number of parsed bytes. For an empty message that number is 0, so the messgen::parse function returns -1 and parsing fails. message::parse should return a signed integer. This is closely related to #11.

Obscure error messages when dir/proto not found

When you execute this command

python3 generate.py -b ./base_dir -m my_vendor/my_protocol -l cpp -o out/cpp

and a directory is not found (e.g. base_dir/my_vendor/my_protocol), this is the output:

Traceback (most recent call last):
  File "/tmp/test/messgen/generate.py", line 148, in <module>
    main()
  File "/tmp/test/messgen/generate.py", line 132, in main
    data_types_map = data_types_preprocessor.create_types_map(modules_map)
  File "/tmp/test/messgen/messgen/data_types_preprocessor.py", line 60, in create_types_map
    self.__create_lookup_messages_set(modules_map)
  File "/tmp/test/messgen/messgen/data_types_preprocessor.py", line 85, in __create_lookup_messages_set
    if module["proto_id"] >= self.MAX_PROTO_ID:
TypeError: '>=' not supported between instances of 'NoneType' and 'int'

This traceback gives no indication of what the cause of the problem might be.

It would also be nice to print a success message when generation completes successfully.
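
A check along the following lines, run before any definitions are loaded, would turn the TypeError into a readable message. It is a sketch and not the code in generate.py; the function name find_protocol_dir and the message wording are assumptions:

```python
import sys
from pathlib import Path


def find_protocol_dir(base_dirs, module):
    """Return the first existing <base_dir>/<vendor>/<protocol> directory.

    Raises FileNotFoundError listing every path that was tried.
    """
    tried = [Path(base) / module for base in base_dirs]
    for candidate in tried:
        if candidate.is_dir():
            return candidate
    searched = ", ".join(str(p) for p in tried)
    raise FileNotFoundError(f"protocol '{module}' not found; searched: {searched}")


try:
    find_protocol_dir(["./no_such_base_dir"], "my_vendor/my_protocol")
except FileNotFoundError as err:
    print(f"error: {err}", file=sys.stderr)
```

The same place is a natural spot to print a short success line once generation has finished.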

Create cpp to js serialization test

Add complex type undefined notification for js

Now it looks like this, when type is undefined.

I suggest enhancing the notification with more details.

Tidy up the values returned from functions in C++

This question was raised once before, but it was happily forgotten.

The serialize method returns > 0 on success and 0 on failure, whereas parse and get message info return -1 on error and 0 on success. It would be good to make them consistent.
size_t and int are arguably not the best way to return values. I would switch to fixed-width types such as uint32_t and int32_t.

JS messgen parser fails silently with improperly ordered complex message IDs

In the JS messgen parser, if messages are defined so that the ID of a complex message is less than the ID of one of its nested messages, the parser does not work and does not report any error. Example:

bottle.yaml:

id: 67
fields:
  - { name: name,   type: string }
  - { name: liquid, type: liquid }

liquid.yaml:

id: 68
descr: "I am inside the bottle"
fields:
  - { name: density,      type: uint64 }
  - { name: is_coca_cola, type: uint8 }

Messgen generates the JS files without any warnings and successfully parses every message. However, when a "bottle" message is received, it has the following properties:

offset: NaN
isComplex: false (which is wrong)
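
One way to make generation independent of ID order is to sort messages by their type dependencies and to fail loudly on an unknown type. The sketch below is not the actual messgen implementation; generation_order and BUILTIN_TYPES are hypothetical names, and the set of built-in types is limited to those used in the example above:

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Scalar type names taken from the example; the real generator supports more.
BUILTIN_TYPES = {"string", "uint8", "uint64"}


def generation_order(messages):
    """Order message names so every nested type precedes the messages using it.

    Raises KeyError for a field type that is neither built in nor defined.
    """
    sorter = TopologicalSorter()
    for name, msg in messages.items():
        deps = []
        for field in msg["fields"]:
            ftype = field["type"]
            if ftype in BUILTIN_TYPES:
                continue
            if ftype not in messages:
                raise KeyError(f"{name}.{field['name']}: undefined type '{ftype}'")
            deps.append(ftype)
        sorter.add(name, *deps)
    return list(sorter.static_order())


messages = {
    "bottle": {"id": 67, "fields": [
        {"name": "name", "type": "string"},
        {"name": "liquid", "type": "liquid"},
    ]},
    "liquid": {"id": 68, "fields": [
        {"name": "density", "type": "uint64"},
        {"name": "is_coca_cola", "type": "uint8"},
    ]},
}
print(generation_order(messages))  # ['liquid', 'bottle']
```

With this ordering, liquid is processed before bottle even though its ID is larger, and a cyclic definition surfaces as graphlib.CycleError instead of a silent NaN offset.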

Fixed the paths of nested types
Fixed the generation order of messages that use nested types