cddl-codegen is a library to generate Rust, WASM and JSON code from CDDL specifications
See docs here
Codegen serialization logic for CBOR automatically from a CDDL specification
License: MIT License
Currently, in parse_control_operator, we don't support explicit ranges as defined in https://www.rfc-editor.org/rfc/rfc8610#section-2.2.2.1
Note that this has some overlap with .size, which we already support (https://www.rfc-editor.org/rfc/rfc8610#section-3.8.1), since we already use RangeOp inside .size.
However, it currently fails with 'should not expose Fixed type in member, only needed for serializaiton: Fixed(Uint(0))'
which sounds like this is blocked on #43
It was added to the cddl crate in version 0.9 as well
https://datatracker.ietf.org/doc/html/draft-ietf-cbor-cddl-control-01#section-3
It was added to the cddl crate in version 0.9 as well
https://datatracker.ietf.org/doc/html/draft-ietf-cbor-cddl-control-01#section-2.2
Suppose you have the following CDDL:
protocolMagic = u32
This generates (I think this happens for anything that's a primitive?):
type ProtocolMagic = u32;
instead of something like:
#[derive(Clone, Debug, Eq, Ord, PartialEq, PartialOrd, serde::Serialize, serde::Deserialize, JsonSchema)]
pub struct ProtocolMagic(pub u32);
EDIT: my bad, this was my mistake instead of a cddl-codegen mistake
Comes from byron.cddl
u8 = uint .lt 256
Supporting this kind of control operator may be slightly tricky because uint is included in aliases
and therefore not even supported as a primitive
I also believe you can specify types larger than u64 in these range operators, so that might also be tricky to handle if that's the case
Additionally, maybe we should support this as part of our prelude?
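As a sketch of what a generated range check for `u8 = uint .lt 256` could look like (the function name and error text here are purely illustrative, not actual cddl-codegen output):

```rust
// Hypothetical generated validation for `u8 = uint .lt 256`.
// The name and error message are illustrative only.
fn validate_u8(value: u64) -> Result<u64, String> {
    // uint .lt 256: reject anything >= 256
    if value < 256 {
        Ok(value)
    } else {
        Err(format!("{} out of range for uint .lt 256", value))
    }
}
```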
For some reason, all structs are generated as pub struct,
but all enums are generated as just enum
(not public)
I understand that non-C-style Rust enums can't be exported via WASM, but this shouldn't be an issue here because these enums aren't marked with #[wasm_bindgen]
anyway
Note: this may already be solved by the codegen v2 in latest master since I'm using an old version at the moment
README.md
is quite outdated and could use an update. It hasn't been updated since before many major features such as the split wasm/rust crate structure, preserve-encodings, etc.
Low priority, but we could simplify type/group choice serialization in some cases by looking at the CBOR types and branching on that instead of just trying every variant until one works. This could potentially make things like metadata parsing (e.g. cip25) faster.
We could also write similar code for c-style enums.
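As a rough sketch of the idea (assumed helper names, not actual cddl-codegen output): the initial byte of a CBOR data item encodes the major type in its top 3 bits, which is often enough to rule out most variants before attempting any deserialization.

```rust
// Sketch: branch on the CBOR major type (top 3 bits of the initial byte)
// instead of trying each variant's deserializer in turn.
fn major_type(initial_byte: u8) -> u8 {
    initial_byte >> 5
}

#[derive(Debug, PartialEq)]
enum VariantKind {
    Uint,
    Text,
    Array,
    Map,
}

fn pick_variant(initial_byte: u8) -> Option<VariantKind> {
    match major_type(initial_byte) {
        0 => Some(VariantKind::Uint),  // major type 0: unsigned integer
        3 => Some(VariantKind::Text),  // major type 3: text string
        4 => Some(VariantKind::Array), // major type 4: array
        5 => Some(VariantKind::Map),   // major type 5: map
        _ => None,
    }
}
```

This only works when the variants have distinct CBOR types on the wire; otherwise we'd still have to fall back to try-each-variant for the ambiguous subset.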
Currently, it's possible for somebody to use .size or .lt to try and encode primitive types, such as uint .lt 256 (aka u8).
We do have these as part of the prelude, but maybe we should also alias these types to the prelude types if we ever see them
We could even decide not to panic in the RustIdent
constructor if somebody has a type that ends up being exactly equivalent to the prelude type
In the generate_serialize
function, there is a serializer_name_overload
argument to change which variable is used to represent the serialization variable.
Despite this, there are two calls in the function that don't use this variable:
ser_loop.line("serializer.write_raw_bytes(&key_bytes)?;");
opt_block.line(&format!("None => serializer.write_special(CBORSpecial::Null),"));
Not sure if these are expected
Currently cddl-codegen
does not support .cbor
or .cborseq
It would be nice to support .cbor
since it's used in multiple places in both the Byron and Babbage specs
I think the logic for .cbor
is fairly similar -- just generate the same code as we do now but wrap the generated code in a (de)serialization of a cbor wrapper for the data.
More information: https://datatracker.ietf.org/doc/html/rfc8610#section-3.8.4
I would expect the following definition
addrdistr =
[ bootstrapEraDistr
// singleKeyDistr
]
bytes .cbor addrdistr
to generate this kind of serialization code
impl cbor_event::se::Serialize for AddrdistrEnum {
    fn serialize<'se, W: Write>(&self, serializer: &'se mut Serializer<W>) -> cbor_event::Result<&'se mut Serializer<W>> {
        // note: this wrapping in `se` is what was added for the `.cbor` wrapper
        let mut se = Serializer::new_vec();
        match self {
            AddrdistrEnum::BootstrapEraDistr(x) => x.serialize(&mut se),
            AddrdistrEnum::SingleKeyDistr(x) => x.serialize(&mut se),
        }?;
        serializer.write_bytes(&se.finalize())
    }
}
and this kind of deserialization code
impl Deserialize for AddrdistrEnum {
    fn deserialize<R: BufRead + Seek>(raw: &mut Deserializer<R>) -> Result<Self, DeserializeError> {
        (|| -> Result<_, DeserializeError> {
            // note: this line here is what was added for the .cbor wrapper
            let mut inner = &mut Deserializer::from(std::io::Cursor::new(raw.bytes()?));
            let len = inner.array()?;
            let mut read_len = CBORReadLen::new(len);
            let initial_position = inner.as_mut_ref().seek(SeekFrom::Current(0)).unwrap();
            match (|inner: &mut Deserializer<_>| -> Result<_, DeserializeError> {
                Ok(BootstrapEraDistr::deserialize_as_embedded_group(inner, len)?)
            })(inner)
            {
                Ok(variant) => return Ok(AddrdistrEnum::BootstrapEraDistr(variant)),
                Err(_) => inner.as_mut_ref().seek(SeekFrom::Start(initial_position)).unwrap(),
            };
            match (|inner: &mut Deserializer<_>| -> Result<_, DeserializeError> {
                Ok(SingleKeyDistr::deserialize_as_embedded_group(inner, len)?)
            })(inner)
            {
                Ok(variant) => return Ok(AddrdistrEnum::SingleKeyDistr(variant)),
                Err(_) => inner.as_mut_ref().seek(SeekFrom::Start(initial_position)).unwrap(),
            };
            match len {
                cbor_event::Len::Len(_) => read_len.finish()?,
                cbor_event::Len::Indefinite => match inner.special()? {
                    CBORSpecial::Break => read_len.finish()?,
                    _ => return Err(DeserializeFailure::EndingBreakMissing.into()),
                },
            }
            Err(DeserializeError::new("AddrdistrEnum", DeserializeFailure::NoVariantMatched.into()))
        })().map_err(|e| e.annotate("AddrdistrEnum"))
    }
}
I think there are three main ways to implement this:
1. pub struct Outer(Inner);. Keeping track of this is kind of complicated when you also think about having to support structs and enums - notably because handling structs would mean we have to keep track of whether or not something is cbor-in-cbor inside the intermediate representation. We can't have it in RustField since it's possible that only one choice of a field is cbor-in-cbor. We can't put it in anything that relies solely on Type2 because the operator lives in Type1.
2. pub struct Outer(Vec<u8>), but in the (de)serialization logic we properly add the cbor-in-cbor logic. Additionally, we can provide From<...> implementations to convert the bytes to the inner type.
3. Cbor<Inner>, making the struct pub struct Outer(Cbor<Inner>). This way we can keep everything in Outer just working with bytes and put all the cbor-in-cbor logic inside the Cbor<..> wrapper logic and have some generic From<...> implementation. This is, I think, the simplest solution to cover all cases, but the UX is kind of ugly. Also, this solution doesn't work for type aliases that map to primitives (ex: Vec<u8>) as they don't implement (de)serialize traits (and there isn't a good workaround for that because of #44). Ignoring that issue, the best place for this in the intermediate representation would probably be to change the new_type function to either add extra information to the Alias case or create a new CborInCbor case.
Also, these approaches are much harder on anonymous types, so it's probably best to only support foo .cbor ident instead of allowing anonymous types in place of ident.
Lastly, I also didn't check the relation between this feature and #6.24 tagging.
Not sure about cbor support for these types, but these two primitives exist in Rust
We may want to consider 256- and 512-bit integers as well, even though they aren't native Rust types, since they are common in cryptography
CIP25 for example uses CDDL to specify a set of keys that can be present, but allows people to extend the base specification with whichever key they need. This doesn't work with the current generated code as a UnknownKey
error will be thrown. There should be a way to avoid this error either by
* any => any as a key in the map
If we have to do (2), it may be that we finally have to implement some DSL that parses comment blocks to get information about how to generate the code for a specific type (which would help with #44 anyway)
Here is JS code that reproduces the error, but you could easily write a Rust test for the same thing
const metadata = wasm.Metadata.from_bytes(
Buffer.from("a365636f6c6f72672345433937423665696d616765783a697066733a2f2f697066732f516d557662463273694846474752745a357a613156774e51387934396262746a6d59664659686745383968437132646e616d656a426572727920416c6261", "hex")
);
console.log(metadata);
Currently we only support a single cddl file as input. It would be nice to support multiple CDDL files as input.
Note this depends on #93
I'm not sure if there is a good way to codegen this, but the current codegen logic will not explicitly generate serde::Serialize
& serde::de::Deserialize
traits for types even if they are used as keys in maps, which causes a crash on to_json calls
For example, see dcSpark/cardano-multiplatform-lib#115
Currently, when a required field is missing we generate code that looks like this
let key_0 = match key_0 {
    Some(x) => x,
    None => return Err(DeserializeFailure::MandatoryFieldMissing(Key::Uint(0)).into()),
};
I believe for .default
, all the code is generated exactly as it is now except we change this error to instead return the default value.
Additionally, on serialization, if the field is equal to the default value we just omit it from the cbor notation
This is used in one place in the byron CDDL spec (although byron.cddl
does not mention it)
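A minimal sketch of how .default could slot into the generated code, with hypothetical names (KEY_0_DEFAULT and the function names are illustrative, not real cddl-codegen output):

```rust
// Illustrative sketch for a field declared as `? 0 : uint .default 5`.
const KEY_0_DEFAULT: u64 = 5;

// Deserialization: where we currently return MandatoryFieldMissing,
// fall back to the default value instead.
fn resolve_key_0(parsed: Option<u64>) -> u64 {
    parsed.unwrap_or(KEY_0_DEFAULT)
}

// Serialization: skip writing the field entirely when it equals the default.
fn should_write_key_0(value: u64) -> bool {
    value != KEY_0_DEFAULT
}
```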
As of #103 we have resolved dependency ordering (#93), but this, as before, still leads to recursive structures potentially being directly stack-allocated within the generated Rust structs. This is not allowed in Rust (since the type's size would be unbounded), so we've previously had to go in and insert some Box
's in some spots (e.g. where they weren't already heap-allocated e.g. in a Vec
or BTreeMap
).
This is mostly here to document this limitation, but it's possible we could figure out where to insert these Box
s in a reasonable manner at code-gen time to avoid having to hand-edit it later.
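A minimal Rust illustration of the limitation: a directly self-referential struct has infinite size, so some indirection such as Box is required.

```rust
// A directly self-referential field would make Node's size infinite;
// Box breaks the cycle with heap indirection. Fields already inside a
// Vec or BTreeMap don't have this problem since those heap-allocate.
struct Node {
    value: u64,
    next: Option<Box<Node>>,
}

// Walk the chain to show the structure is usable as a normal linked list.
fn list_len(mut node: &Node) -> usize {
    let mut len = 1;
    while let Some(next) = node.next.as_deref() {
        len += 1;
        node = next;
    }
    len
}
```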
Currently in parsing.rs
we have two places where we use apply_type_aliases
because we don't order definitions via a dependency graph
It would be nice to remove this, but this depends on #93
Given the CDDL:
bytes .size 64
it would be nice to generate the field as bytes: [u8; 64]
instead of bytes: Vec<u8>
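A sketch of the length-checked conversion this could generate at deserialization time (using 4 instead of 64 for brevity; the function name and error text are illustrative):

```rust
use std::convert::TryInto;

// Sketch: validate the byte length once and hand back a fixed-size array,
// so downstream code gets `[u8; N]` instead of an unchecked Vec<u8>.
fn to_fixed_bytes(v: Vec<u8>) -> Result<[u8; 4], String> {
    let len = v.len();
    v.try_into()
        .map_err(|_| format!("bytes .size 4: expected 4 bytes, got {}", len))
}
```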
foo = [
single: [uint]
]
is treated as:
foo = [
single: [* uint]
]
As per the CDDL RFC:
If no occurrence indicator is specified, the group entry is to occur
exactly once (as if 1*1 were specified)
You can work around this by doing:
single_uint = [uint]
foo = [
single: single_uint
]
but it's not ideal. Ideally the fix would make this totally invisible to end-users and have Foo::single
be of type uint
directly with the array part just being a serialization detail.
From byron.cddl, u64 = uint
results in
assertion failed: cddl_ident.0 == \"int\" || super::cddl_prelude(&cddl_ident.0).is_some() ||\n super::is_identifier_user_defined(&cddl_ident.0)', src/intermediate.rs:517:13
Currently, cddl-codegen handles a prelude for types such as u32
That means that if you have a cddl file such as
foo = u32
it will generate the right code, with the native Rust u32 type used.
However, if you use u32 in a more complex struct, the upstream cddl crate will throw
Example:
table_arr_members = {
arr: u32,
}
Will throw in the upstream cddl crate with missing definition for rule u32
We can try and avoid the issue by adding a way to inject the prelude into the upstream CDDL crate via either visited_rule_idents
or in_standard_prelude
. This approach doesn't work well with composing CDDL specs used in cddl-codegen with other CDDL tools that may not support a way to set a custom prelude though such as tools to generate examples, syntax highlighting, validation, etc.
Or, we could integrate something like #72 and then simply delete our prelude logic
#50 adds support for the .cbor
notation
It does so by having foo = bytes .cbor bar
simply make foo be an alias for bar at the type level and handling the cbor-in-cbor encoding in the (de)serialization logic.
This works fine when embedded as part of a larger object, but when the type is used standalone, it just leaves you with type foo = bar
with no (de)serialization wrapping a-la #88
I also noticed the same issue happens with TaggedData
(foo = #6.24(bytes)
generates the wrong code when standalone)
I think probably the best way to fix this is to extend the upstream cddl library to give it parent pointers for all the AST types. Basically we would add a new feature flag that, when enabled, adds a parent member variable that gets filled in on a 2nd pass of the AST using the visitor utilities in the library
Then, from our library, we can use these parent pointers in parsing.rs to know if a type is top-level (needs to be wrapped) or if it's embedded (use current technique)
Having this kind of parent pointer would also improve the logic implemented in #84, as we could check whether a @name definition exists.
Note that we already support the .cbor operator
According to the CBOR RFC, if strict mode is not being followed, duplicate keys in CBOR maps are allowed. cddl-codegen only supports strict mode and can't handle duplicate keys. It could be of use to some users to be able to support this, especially on a per-map basis to avoid having an ugly API for areas where duplicate keys are not allowed.
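A minimal sketch of what a duplicate-tolerant map representation could look like (purely illustrative, not an existing cddl-codegen type): storing entries as a list of pairs tolerates duplicate keys while still allowing lookups.

```rust
// Sketch: entries kept as a list of pairs, so duplicate CBOR map keys
// survive deserialization instead of being rejected in strict mode.
struct DupMap(Vec<(u64, String)>);

impl DupMap {
    fn insert(&mut self, key: u64, value: String) {
        self.0.push((key, value)); // duplicates allowed
    }

    // Return every value stored under `key`, in insertion order.
    fn get_all(&self, key: u64) -> Vec<&str> {
        self.0
            .iter()
            .filter(|(k, _)| *k == key)
            .map(|(_, v)| v.as_str())
            .collect()
    }
}
```

Exposing something like this only on the specific maps that allow duplicates would keep the API clean everywhere else.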
The CDDL:
babbage_tx_out = {
0 : address
, 1 : value
, ? 2 : datum_option
, ? 3 : script_ref
}
script_ref = #6.24(bytes .cbor script)
is generating as:
pub type ScriptRef = Vec<u8>;
as if the .cbor
is ignored in this specific case. It should be generating as a Script
instance (serialized as bytes then tagged), or as a newtype (after #112)
Since we have a unit test that is like this but untagged, it's likely the tag that's messing things up.
From byron.cddl
block = [0, ebblock]
/ [1, mainblock]
Building the project from git, with Rust versions from 1.56 to nightly, results in build failures:
error[E0308]: mismatched types
--> /home/chrysn/.cargo/registry/src/github.com-1ecc6299db9ec823/lexical-core-0.6.7/src/atof/algorithm/bhcomp.rs:62:24
|
62 | let bytes = bits / Limb::BITS;
| ^^^^^^^^^^ expected `usize`, found `u32`
error[E0277]: cannot divide `usize` by `u32`
--> /home/chrysn/.cargo/registry/src/github.com-1ecc6299db9ec823/lexical-core-0.6.7/src/atof/algorithm/bhcomp.rs:62:22
|
62 | let bytes = bits / Limb::BITS;
| ^ no implementation for `usize / u32`
|
= help: the trait `Div<u32>` is not implemented for `usize`
A blanket cargo update
resolves these but surfaces other errors.
Fixing the cddl
crate version to 0.5.2
fixes one of them, with one around BINARY_WRAPPERS
being left over that appears to be due to only partially committed changes internal to this code.
CBOR has support for unordered maps.
This is actually problematic for us because it means it's possible that Type.from_bytes(a).to_bytes() !== a
which is non-obvious and can lead to strange bugs where deserializing and then re-serializing a transaction gets rejected by a node (because signatures depend on the byte order of what you sign)
We can use a crate called linked_hash_map
to maintain a map interface while still providing guarantee on serialization / deserialization order.
We can use this as the table type for CDDL codegen, but this problem also exists for other types defined in shelley.cddl which aren't table types (map types instead) such as TransactionBody
The problem for fixed-key maps is that we represent them in Rust as plain structs with no information about the deserialization order used to parse them.
transaction_body =
{ 0 : [transaction_input]
, 1 : [* transaction_output]
, 2 : coin ; fee
, 3 : uint ; ttl
, ? 4 : [* certificate]
, ? 5 : withdrawals
, ? 6 : update
, ? 7 : metadata_hash
}
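A minimal sketch of the round-trip hazard: collecting parsed entries into an ordered map such as BTreeMap re-sorts the keys, losing the wire order, while keeping the entries as encountered preserves it (which is effectively what linked_hash_map gives us while still exposing a map API).

```rust
use std::collections::BTreeMap;

// BTreeMap re-sorts the keys, so serializing from it loses the order
// the entries had on the wire.
fn btree_key_order(entries: &[(u64, &str)]) -> Vec<u64> {
    let map: BTreeMap<u64, &str> = entries.iter().cloned().collect();
    map.keys().cloned().collect()
}

// Keeping entries in encounter order preserves the original byte layout
// on re-serialization.
fn wire_key_order(entries: &[(u64, &str)]) -> Vec<u64> {
    entries.iter().map(|(k, _)| *k).collect()
}
```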
As of the fixes in #53 babbage.cddl
now runs through codegen, but there are various compile errors still existing relating to the above types:
expected u64, found i128
running against babbage.cddl
we get:
| for (key_bytes, key, value) in key_order {
| ^^^^^^^^^ doesn't have a size known at compile-time
in one spot (TransactionBody::withdrawals). All the other key_order iterations compile fine.
example:
pool_params = ( operator: pool_keyhash
, vrf_keyhash: vrf_keyhash
, pledge: coin
, cost: coin
, margin: unit_interval
, reward_account: reward_account
, pool_owners: set<addr_keyhash>
, relays: [* relay]
, pool_metadata: pool_metadata / null
)
coin = uint
will cause the coin: coin
in pool_params
to be treated as RustType::Rust(RustIdent("Coin"))
instead of RustType::Alias(AliasIdent::Rust(RustIdent("Coin")), RustType::Primitive(Primitive::U64))
which results in to/from wasm generating incorrect code as it thinks it's a rust struct not a primitive. This is a problem as it will try and take it by reference (important since &u64
params are compile errors in wasm_bindgen) and will clone it excessively (less important).
This would likely be fixed by #93
Currently the output of cddl-codegen is not produced in any deterministic way such as alphabetical order. This means that a small change in the CDDL spec can lead to a large diff in the generated code, making it hard to compare results.
We should define some ordering of the output to avoid this. This seems to be a limitation of https://github.com/carllerche/codegen and they suggest using rustfmt for more formatting rules
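A minimal sketch of the idea (illustrative only): buffering generated definitions keyed by identifier before emission yields a stable alphabetical order regardless of the order rules appear in the spec.

```rust
use std::collections::BTreeMap;

// Sketch: collect (identifier, body) pairs, then emit in sorted key order
// so the same spec always produces the same output.
fn emit_sorted(defs: Vec<(&str, &str)>) -> Vec<String> {
    let sorted: BTreeMap<&str, &str> = defs.into_iter().collect();
    sorted
        .into_iter()
        .map(|(name, body)| format!("{} = {}", name, body))
        .collect()
}
```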
It was added to the cddl crate in version 0.9 as well
https://datatracker.ietf.org/doc/html/draft-ietf-cbor-cddl-control-01#section-2.1
For example, in [a] or [* a], it seems that only for generics do we ignore the * and always end up with a single-element array. In the non-generic case this is handled properly (I'm not sure which PR fixed this, because it used to be an issue for non-generics too, with [a] always interpreted as [* a], i.e. the opposite of the current problem)
Consider the following:
single_field_struct = [uint]
uint_array = [*uint]
generic_single_field_struct<a> = [a]
generic_array<a> = [*a]
occurence_test = [
single_field_struct,
uint_array,
gen_single_field: generic_single_field_struct<uint>,
gen_array: generic_array<uint>,
]
the fields single_field_struct
and uint_array
correctly generate as a single-uint
struct and a Vec<uint>
respectively, but the last two fields both generate as a single uint
struct even though gen_array
should be handled as a Vec<uint>
here.
We should always respect the occurrence operators to decide what kind of array (expandable vs fixed single element) in all cases.
We need to standardize where to declare map/array wasm wrappers when they're implicit e.g.:
foo = {
implicit_arr: [* bar]
implicit_map: { * abc => xyz }
}
Currently it's fine when there's only one file they could exist in but this could potentially not be the case.
Some first possibilities are:
However, for maps, option 2) is problematic as there are potentially two types, and both 1) and 2) are problematic as two different areas could implicitly use the same type.
We could potentially either:
TransactionBodyList
in lib.rs
when it's only used in block.rs
and TransactionBody
is in transaction.rs
?)
The int type needs some work done on it. It is hard-coded right now, but we need it to have different traits implemented depending on the feature flags passed. There is also no wasm wrapper around it, and it does not preserve its encodings. I'm partway through transitioning it to be code-generated instead of static and implementing those things.
Not sure about floating point spec in CBOR vs floating point implementation in Rust. Additionally, I'm not sure if we support float types at all in the first place such as in FixedValue
(and I would recommend against using them)
Suppose the following CDDL:
bootstrapEraDistr = 1
#[wasm_bindgen]
#[derive(Clone, Debug, Eq, Ord, PartialEq, PartialOrd)]
pub struct BootstrapEraDistr {
}

#[wasm_bindgen]
impl BootstrapEraDistr {
    pub fn new() -> Self {
        Self {
        }
    }
}

impl Deserialize for BootstrapEraDistr {
    fn deserialize<R: BufRead + Seek>(raw: &mut Deserializer<R>) -> Result<Self, DeserializeError> {
        (|| -> Result<_, DeserializeError> {
            let len = raw.array()?;
            let mut read_len = CBORReadLen::new(len);
            read_len.read_elems(1)?;
            let ret = Self::deserialize_as_embedded_group(raw, len);
            match len {
                cbor_event::Len::Len(_) => (),
                cbor_event::Len::Indefinite => match raw.special()? {
                    CBORSpecial::Break => (),
                    _ => return Err(DeserializeFailure::EndingBreakMissing.into()),
                },
            }
            ret
        })().map_err(|e| e.annotate("BootstrapEraDistr"))
    }
}

impl DeserializeEmbeddedGroup for BootstrapEraDistr {
    fn deserialize_as_embedded_group<R: BufRead + Seek>(raw: &mut Deserializer<R>, len: cbor_event::Len) -> Result<Self, DeserializeError> {
        (|| -> Result<_, DeserializeError> {
            let index_0_value = raw.unsigned_integer()?;
            if index_0_value != 1 {
                return Err(DeserializeFailure::FixedValueMismatch{ found: Key::Uint(index_0_value), expected: Key::Uint(1) }.into());
            }
            Ok(())
        })().map_err(|e| e.annotate("index_0"))?;
        Ok(BootstrapEraDistr {
        })
    }
}
should not expose Fixed type in member, only needed for serializaiton: Fixed(Uint(1))
I think at some point we had some logic to fast-fail on reserved keywords, but we seem to have missed string
as a reserved keyword.
We should disallow things that are reserved keywords in
String, for example, ends up generating code like impl TryFrom<String> for String {
which is invalid
Possibly related to this comment: // TODO: do we need to cover any other (rust) reserved keywords?
Some types such as addresses require a custom to/from json that converts them to base58 or bech32. It would be nice if there was a way to get the codegen to avoid adding the automatic to/from json if specified
We blindly import various things in some places for simplicity's sake in the code generation.
Some have been fixed in #137 but others remain:
[] prelude::*
in lib.rs
[] crate::cbor_encodings::*
in serialize.rs
[] super::*
in serialize.rs
[] super::*
in cbor_encodings.rs
[] OrderedHashMap
/ BTreeMap
in every struct file, regardless of whether it's needed
Running on babbage.cddl we get some compile errors in the output code:
outer_len_encoding
a type parameter T declared on the enum Option
Currently, we parse rules as they come doing multiple passes. It would be nice if we instead generated a dependency graph of rules and then parsed rules based on this graph
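If we had such a graph, the ordering pass might look roughly like this DFS post-order sketch (illustrative only; real code would also need cycle detection for recursive types):

```rust
use std::collections::{BTreeMap, BTreeSet};

// Sketch: order rules so each rule is visited after everything it
// references. `deps` maps a rule name to the names it depends on.
fn topo_order(deps: &BTreeMap<&str, Vec<&str>>) -> Vec<String> {
    fn visit<'a>(
        name: &'a str,
        deps: &BTreeMap<&'a str, Vec<&'a str>>,
        seen: &mut BTreeSet<&'a str>,
        out: &mut Vec<String>,
    ) {
        if !seen.insert(name) {
            return; // already handled
        }
        for &dep in deps.get(name).into_iter().flatten() {
            visit(dep, deps, seen, out);
        }
        out.push(name.to_string()); // post-order: dependencies first
    }
    let mut seen = BTreeSet::new();
    let mut out = Vec::new();
    for &name in deps.keys() {
        visit(name, deps, &mut seen, &mut out);
    }
    out
}
```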