messgen's Introduction

Messgen

Lightweight and fast message serialization library. Generates message classes/structs from a YAML schema.

Features:

  • Embedded-friendly
  • Fixed size arrays
  • Dynamic size arrays
  • Nested messages
  • Messages metadata
  • Supported languages: C++, Go, JavaScript

Dependencies

  • python 3.X

On Linux:

sudo apt install python3

On Windows 10:

  1. Download https://bootstrap.pypa.io/get-pip.py
  2. Execute python3 get-pip.py
  3. Execute pip3 install pyyaml

Generate messages

Each protocol should be placed in the directory base_dir/vendor/protocol:

  • base_dir is the base directory for message definitions. Multiple base directories may be specified.
  • vendor is the protocol vendor. It is used as a namespace in generated messages, which avoids conflicts between protocols from different vendors used in one application.
  • protocol is the protocol name. Each protocol has a protocol ID, which allows multiple protocols to be used on a single connection, e.g. bootloader and application protocols.

Message generator usage:

python3 generate.py -b <base_dir> -m <vendor>/<protocol> -l <lang> -o <out_dir> [-D variable=value]

For some languages, certain variables must be specified with the -D option.

Generated messages are placed in the out_dir directory.

Go

Example for Go messages generation:

python3 generate.py -b ./base_dir -m my_vendor/my_protocol -l go -o out/go -D messgen_go_module=example.com/path/to/messgen

The messgen_go_module variable must point to the messgen Go module (port/go/messgen), so that the necessary imports are added to the generated messages.

C++

Example for C++ messages generation:

python3 generate.py -b ./base_dir -m my_vendor/my_protocol -l cpp -o out/cpp

The variable metadata_json=true can be passed to generate metadata in JSON format rather than the legacy format.

JS/TS

Example for JS messages generation:

python3 generate.py -b ./base_dir -m my_vendor/my_protocol -l json -o out/json

This command generates JSON messages.

The types of these messages for TS can be generated as follows:

python3 generate.py -b ./base_dir -m my_vendor/my_protocol -l ts -o out/ts

If typed arrays are needed for TS, pass the flag -D typed_arrays=true:

python3 generate.py -b ./base_dir -m my_vendor/my_protocol -l ts -o out/ts -D typed_arrays=true

MD

Example for protocol documentation generation:

python3 generate.py -b ./base_dir -m my_vendor/my_protocol -l md -o out/md

messgen's People

Contributors

alexpantyukhin13, awskii, drton, meded7, meded90, misterjulian, nameofuser1, onokonem, pavletto, roman-, sl-ru, zemledelec

messgen's Issues

Ownership of dynamic values

Sometimes it is useful to be able to store a message with dynamic fields. The proposal is to write a wrapper class around the message and provide specializations of the serialization and parsing functions:

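template <class T>
struct Storage {
    T msg;                // the parsed message
    MemoryAllocator mem;  // owns the memory behind msg's dynamic fields
};

// Overload of parse for stored messages; the body is not designed yet.
template <class T>
int parse(Storage<T> &storage, MessageInfo &info);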

I have not thought through a concrete interface yet, but I plan to explore it.

A few subtle points to think about:

  1. In that case there is probably no point in passing the allocator/memory as separate arguments, since it will be stored inside Storage.
  2. Forbid calling parse on dynamic messages unless Storage is used. The two call variants could be separated somehow, so that messages without dynamic fields do not drag along extra arguments.
  3. When the allocator is cleared. Currently it is constructed when parse is called and destroyed on exit, so it never has to be reset manually. With Storage it may be possible to avoid both creating an object on every call and manual control. One idea: when parse is called on a Storage, call a reset function that resets the allocator.

@ygorshkov @DrTon what do you think? Are there other options?

Remove `size` from messgen header

The 4-byte size field in the message header is redundant and could be eliminated.
For messages without dynamic fields, the size is known beforehand.
For dynamic messages, the minimum static size is also known beforehand, and the size of every dynamic field is contained within the message payload.
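
To illustrate the argument, here is a minimal Python sketch. The wire layout is an assumption made up for this example (a 4-byte static part followed by one dynamic byte array prefixed with a 2-byte little-endian element count); it is not taken from the messgen format. Under that assumption, the total message size can be recovered from the payload alone, without a size field in the header.

```python
import struct

# Hypothetical layout, for illustration only (not the real messgen format):
#   uint32 counter          -- static part, always 4 bytes
#   uint16 n + n * uint8    -- one dynamic byte array with a length prefix
STATIC_SIZE = 4
LEN_PREFIX_SIZE = 2


def message_size(payload: bytes) -> int:
    """Return the total size of the message at the start of payload."""
    if len(payload) < STATIC_SIZE + LEN_PREFIX_SIZE:
        raise ValueError("payload shorter than the minimum static size")
    (count,) = struct.unpack_from("<H", payload, STATIC_SIZE)
    total = STATIC_SIZE + LEN_PREFIX_SIZE + count
    if len(payload) < total:
        raise ValueError("payload truncated inside the dynamic field")
    return total


# Two messages back to back in one stream, with no size field in any header.
stream = struct.pack("<IH3s", 7, 3, b"abc") + struct.pack("<IH", 8, 0)
first = message_size(stream)
second = message_size(stream[first:])
```

With this layout, the receiver can split the stream into messages by reading the static part and the length prefix of each one.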

Dump markdown formatting

  • Make dump generation optional, enabled by a command-line flag
  • Make it possible to generate docs only, without messages
  • Format dump.txt as markdown
  • In the generated dump, add a list of used IDs with message names

Changing messages comments/descriptions affects protocol version

Here is what the protocol version calculator should take into account:

  • Message IDs and names
  • Order of message fields and their types
  • Field names (e.g. changing a single field name from "uint32 capacity_liters" to "uint32 capacity_ml" must change the hash)

Field comments and message descriptions must not affect the protocol version.
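
The rules above could be sketched in Python roughly as follows. This is an illustration rather than the generator's actual code: the function name protocol_version, the "name" and "comment" keys, the choice of md5 and the 10-character truncation are all assumptions. The input is a list of already parsed message definitions.

```python
import hashlib


def protocol_version(messages: list[dict]) -> str:
    """Hash only the parts of the messages that affect the wire format."""
    h = hashlib.md5()
    # Sort by ID so that the order in which files are loaded is irrelevant.
    for msg in sorted(messages, key=lambda m: m["id"]):
        h.update(f"{msg['id']}:{msg['name']}\n".encode())
        # Field order matters, so fields are hashed in declaration order.
        for field in msg.get("fields", []):
            h.update(f"{field['type']} {field['name']}\n".encode())
        # "descr" and per-field "comment" keys are deliberately ignored.
    return h.hexdigest()[:10]


tank = [{"id": 1, "name": "tank", "descr": "Tank state",
         "fields": [{"name": "capacity_liters", "type": "uint32", "comment": "liters"}]}]
reworded = [{"id": 1, "name": "tank", "descr": "State of the tank",
             "fields": [{"name": "capacity_liters", "type": "uint32", "comment": "volume"}]}]
renamed = [{"id": 1, "name": "tank", "descr": "Tank state",
            "fields": [{"name": "capacity_ml", "type": "uint32", "comment": "liters"}]}]
```

Here tank and reworded hash to the same version, while renamed produces a different one.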

Convenient string handling in C

Working with strings is currently inconvenient. We need to think about an interface for them.

There are two options:

  1. Make string an alias for a byte array in the code, and additionally check for the terminator when parsing. Then it would be possible to write:
const auto *str = reinterpret_cast<const char *>(msg.my_string.ptr);
  2. Make string a fully separate type and handle it separately as well. This seems more correct, but fitting it in cleanly will take some thought: it is neither plain nor an embedded struct, yet it is also dynamic.

Another proposal is to always transmit a null-terminated string. If the terminator is missing, the generated parsing code returns an error.
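
The terminator check from the last proposal can be sketched as follows. It is written in Python for brevity rather than as generated C++; parse_cstring is a hypothetical helper name, and raising an exception stands in for returning an error code.

```python
def parse_cstring(buf: bytes) -> str:
    """Decode a string field, requiring a trailing NUL terminator."""
    if not buf or buf[-1] != 0:
        raise ValueError("string field is not null-terminated")
    # Drop the terminator before decoding the payload.
    return buf[:-1].decode("utf-8")
```

A field holding b"abc\x00" decodes to "abc", while b"abc" is rejected at parse time.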

Can't parse arrays whose total size exceeds 65535 bytes

Parser.h accepts the len argument as uint32_t, but then it passes it to detail::Parser<T>::parse which accepts it as uint16_t:

static int parse(const uint8_t *buf, uint32_t len, MemoryAllocator & allocator, T& value) {

static int parse(const uint8_t* buf, uint16_t len, MemoryAllocator& allocator, Dynamic<T, false>& dynamic) {

This implicit uint32_t -> uint16_t conversion silently truncates len. The resulting bug is undocumented and very hard to spot without deep debugging.
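
The effect of the narrowing conversion can be reproduced outside C++. A small Python sketch using ctypes follows; as_uint16 is a hypothetical helper name used only for this illustration.

```python
import ctypes


def as_uint16(length: int) -> int:
    """Mimic the implicit uint32_t -> uint16_t conversion in C++."""
    # ctypes integer types do no overflow checking: the value wraps modulo 65536.
    return ctypes.c_uint16(length).value


ok = as_uint16(65535)       # largest length that survives the conversion
wrapped = as_uint16(65536)  # wraps around to 0
large = as_uint16(70000)    # wraps around to 70000 - 65536
```

Any length above 65535 reaches the inner parse call as a much smaller value, so the parser sees a shorter buffer than the caller provided.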

Generate PROTOCOL_VERSION

For each generator, add PROTOCOL_VERSION calculator. This will make it easy to compare the protocol versions between the two peers.

Workaround

Here's a workaround we use now in our project:

  • the .yaml files are in a separate repo. Backend and frontend use this repo as a submodule
  • upon build, the hash of the last commit in the submodule is considered the protocol version
  • on the first handshake request, Backend sends its protocol version to Frontend
  • Frontend checks the hash of the last commit of its submodule and compares the two

Workaround limitations

  • it is limited to one messgen protocol per submodule. If multiple protocols share a repo, they also share the last commit hash
  • unrelated changes in the submodule (e.g. editing the README) also change the protocol version
  • when you make changes in your .yaml files without committing them, the protocol is considered unchanged

Suggested approach

For each protocol, the version must be calculated as the md5 of all .yaml files in the protocol directory. Changing any .yaml file will change the protocol md5. Using a truncated md5 (e.g. the first 10 chars only) should also be fine.

For the cpp generator, here's an example of what it should generate in messages.h:

struct ProtoInfo {
    static constexpr uint8_t ID = 1;
    static constexpr uint32_t MAX_MESSAGE_SIZE = 24;
    static constexpr const char* VERSION = "i2e8nb0a";
};
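
The md5 calculation suggested above could look roughly like this on the generator side. It is a sketch under assumptions: protocol_md5 is a hypothetical function name, files are hashed in sorted name order, and the file names are mixed into the digest.

```python
import hashlib
from pathlib import Path


def protocol_md5(protocol_dir: str, length: int = 10) -> str:
    """Return a truncated md5 over all .yaml files in protocol_dir."""
    h = hashlib.md5()
    # Sorted order keeps the result independent of directory listing order.
    for path in sorted(Path(protocol_dir).glob("*.yaml")):
        h.update(path.name.encode())  # assumption: renaming a file counts as a change
        h.update(b"\0")
        h.update(path.read_bytes())
    return h.hexdigest()[:length]
```

Files without the .yaml extension, such as a README, do not affect the result, and uncommitted edits to .yaml files do.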

C++ can't parse empty message

The message::parse method returns the number of parsed bytes. For an empty message this is 0, so the messgen::parse function returns -1 and parsing fails. message::parse should return a signed integer. This is strongly related to #11.
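
The intended convention can be sketched in a language-neutral way. The Python below is an illustration, not the messgen C++ API: parse_body and parse are hypothetical names, and body_size stands for the expected size of the message body.

```python
def parse_body(buf: bytes, body_size: int) -> int:
    """Return the number of bytes consumed, or -1 on error."""
    if len(buf) < body_size:
        return -1
    return body_size  # 0 is a valid result for an empty message


def parse(buf: bytes, body_size: int) -> bool:
    """Treat only negative results as errors, so empty messages succeed."""
    return parse_body(buf, body_size) >= 0
```

With a signed result, a zero-length message and a parse failure are no longer indistinguishable.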

Obscure error messages when dir/proto not found

When you execute this command

python3 generate.py -b ./base_dir -m my_vendor/my_protocol -l cpp -o out/cpp

and one of the directories (base_dir/my_vendor/my_protocol) is not found, this is the output:

Traceback (most recent call last):
  File "/tmp/test/messgen/generate.py", line 148, in <module>
    main()
  File "/tmp/test/messgen/generate.py", line 132, in main
    data_types_map = data_types_preprocessor.create_types_map(modules_map)
  File "/tmp/test/messgen/messgen/data_types_preprocessor.py", line 60, in create_types_map
    self.__create_lookup_messages_set(modules_map)
  File "/tmp/test/messgen/messgen/data_types_preprocessor.py", line 85, in __create_lookup_messages_set
    if module["proto_id"] >= self.MAX_PROTO_ID:
TypeError: '>=' not supported between instances of 'NoneType' and 'int'

This traceback gives no indication of what the cause of the problem may be.

Furthermore, it would be nice to also print a success message when generation completes successfully.
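
One possible shape for such a check, run before any files are parsed, is sketched below. check_protocol_dir is a hypothetical helper, not a function from generate.py. It takes the values passed with -b and -m.

```python
from pathlib import Path


def check_protocol_dir(base_dirs: list[str], module: str) -> Path:
    """Return the first existing <base_dir>/<vendor>/<protocol> directory."""
    tried = []
    for base in base_dirs:
        candidate = Path(base) / module
        if candidate.is_dir():
            return candidate
        tried.append(str(candidate))
    # Name the protocol and every location that was checked.
    raise FileNotFoundError(
        f"protocol '{module}' not found; looked in: {', '.join(tried)}"
    )
```

The error then names the missing protocol and the paths that were searched, instead of failing later with a TypeError.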

Tidy up function return values in C++

This question was raised once before, but it was then forgotten.

  1. The serialize method returns > 0 on success and 0 on failure, whereas parse and get message info return -1 on error and 0 on success. It would be good to make them consistent.
  2. size_t and int do not seem to be the best way to return values. I would switch to fixed-width types such as uint32_t and int32_t.

JS messgen parser fails silently with improperly ordered complex message IDs

In the JS messgen parser, if messages are listed so that the ID of a complex message is lower than the ID of one of its nested messages, the parser does not work and does not report any error. Example:

bottle.yaml:

id: 67
fields:
  - { name: name,   type: string }
  - { name: liquid, type: liquid }

liquid.yaml:

id: 68
descr: "I am inside the bottle"
fields:
  - { name: density,      type: uint64 }
  - { name: is_coca_cola, type: uint8 }

Messgen would generate JS files without any warnings and successfully parse every message. However, when message "bottle" is received, it will have the following properties:

offset: NaN
isComplex: false (which is wrong)
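
One way to avoid depending on ID order is to sort messages by their type dependencies before processing them. The sketch below uses Python's graphlib for brevity and is not the JS parser's code: generation_order and BUILTIN_TYPES are hypothetical names, and the message dictionaries mirror the two YAML files above.

```python
from graphlib import TopologicalSorter

BUILTIN_TYPES = {"string", "uint8", "uint64"}  # subset, for this example only


def generation_order(messages: dict[str, dict]) -> list[str]:
    """Order message names so nested types come before their users."""
    graph = {}
    for name, msg in messages.items():
        deps = {f["type"] for f in msg.get("fields", [])} - BUILTIN_TYPES
        unknown = deps - set(messages)
        if unknown:
            # Report the problem instead of failing silently.
            raise ValueError(f"{name}: unknown types {sorted(unknown)}")
        graph[name] = deps
    return list(TopologicalSorter(graph).static_order())


messages = {
    "bottle": {"id": 67, "fields": [{"name": "name", "type": "string"},
                                    {"name": "liquid", "type": "liquid"}]},
    "liquid": {"id": 68, "fields": [{"name": "density", "type": "uint64"},
                                    {"name": "is_coca_cola", "type": "uint8"}]},
}
order = generation_order(messages)
```

Here liquid is placed before bottle regardless of their IDs, and a reference to an undefined type raises an error.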

YAML formatter

We need a script that parses all YAML files in a specified directory and reformats them in a clean, consistent way.

Fix embedded types in javascript

Currently, messages that use nested types are generated incorrectly.

DoD:

  1. Paths of nested types are fixed
  2. The generation order of messages that use nested types is fixed
