I've been working on disassembling Unity3D archives, and it turns out that they freque


Alignment options (kaitai-io/kaitai_struct)

Comments (25)

GreyCat commented on May 18, 2024

Um, what is the chunk_size? Actually, right now you can do alignment manually by using _io.pos, something like:

- id: padding
  size: (4 - _io.pos) % 4

That is:

if _io.pos is 0, 4, 8, etc, you'll skip 0 bytes
if _io.pos is 1, 5, 9, you'll skip 3 bytes.
if _io.pos is 2, 6, 10, you'll skip 2 bytes.
if _io.pos is 3, 7, 11, you'll skip 1 byte.

One can use other IO objects, i.e. _root._io, _parent._io, etc, if needed to align to position in some other stream.
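The table above can be checked with a few lines of Python. This is only a sketch of the arithmetic, not Kaitai Struct code: `padding` is a hypothetical helper name, and it relies on Python's `%` returning a non-negative result for a positive divisor (in C or Java the same expression can go negative and would need adjusting).

```python
def padding(pos: int, align: int = 4) -> int:
    """Number of bytes to skip so that pos + padding is a multiple of align."""
    # Python's % is non-negative for a positive divisor, so (4 - 5) % 4 == 3.
    return (align - pos) % align


# Reproduce the cases listed above for 4-byte alignment.
for pos in range(12):
    print(f"pos={pos:2d} -> skip {padding(pos)} byte(s)")
```

The same helper works for any positive alignment, e.g. `padding(13, 8)` gives the skip needed to reach the next multiple of 8.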

from kaitai_struct.

GreyCat commented on May 18, 2024

I agree. So, here's the idea:

We don't actually need any "post-attribute" alignment calls. Typically, what people care about is getting this exact attribute aligned to a multiple of some number.
We need to cater for both bit-level and byte-level alignment. Currently, we effectively have "everything but bit-level constructs (i.e. type: bX) aligned to 1 byte".
We need both a type-wide alignment default and per-field alignment for individual seq fields.

So, to summarize all that, the proposal is:

We specify alignment with align: X, where X can be:
- default, which is the current default behavior (i.e. align everything to byte boundary except stuff clearly working on bit-level)
- bit, which aligns everything to bit level
- a number, which aligns everything to this number of bytes
We can specify type-wide alignment in meta, e.g.

types:
  custom:
    seq:
      - id: baz
        type: b8

meta:
  align: bit
seq:
  - id: foo
    type: b3 # spans bits 7..5 of byte 0
  - id: bar
    type: custom # spans bits 4..0 of byte 0 + bits 7..5 of byte 1

We can specify per-attribute alignment in the attribute, e.g.

seq:
  - id: foo
    type: b3
  - id: bar
    type: custom
    align: bit

or, for byte alignment, as per @mnakamura1337's original idea:

seq:
  - id: first
    type: u2
  - id: second
    type: u2
    align: 8
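To make the intended meaning of the byte-alignment example concrete, here is a small Python simulation. It is a sketch of the proposed semantics only (nothing here is implemented in Kaitai Struct); `read_aligned_u2le` is a hypothetical helper, and little-endian byte order is an assumption, since the proposal does not specify endianness.

```python
import io
import struct


def read_aligned_u2le(stream: io.BytesIO, align: int = 1) -> int:
    """Skip to the next multiple of `align`, then read a little-endian u2."""
    # Pre-align: move forward 0..align-1 bytes so that tell() % align == 0.
    stream.seek((align - stream.tell()) % align, io.SEEK_CUR)
    return struct.unpack("<H", stream.read(2))[0]


# first occupies bytes 0..1, bytes 2..7 are padding, second starts at byte 8.
data = bytes([0x01, 0x00]) + bytes(6) + bytes([0x02, 0x00])
stream = io.BytesIO(data)
first = read_aligned_u2le(stream)            # read at offset 0
second = read_aligned_u2le(stream, align=8)  # skips 6 bytes, reads at offset 8
print(first, second, stream.tell())
```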


KOLANICH commented on May 18, 2024

@fedorg, you can use fields without ids


LogicAndTrick commented on May 18, 2024

I like the first option you propose, but I'd rename it to skip because it's more obvious to see what's happening:

seq:
  - id: first
    type: u2
  - skip: 8   # reads 8 bytes and throws them away
  - id: second
    type: u2

Or perhaps just support a field with no name that never gets assigned to anything (+ issue #9):

seq:
  - id: first
    type: u2
  - type: b8   # unnamed field - gets thrown away
  - id: second
    type: u2


GreyCat commented on May 18, 2024

Thanks for the suggestion. I've been thinking of alignment stuff too, but I haven't found a nice all-around solution yet.

I've thought of that obvious "sandwiching extra fields" solution, but it has 2 big flaws:

It can potentially become very bothersome to use on some real-life data structures, when you're supposed to align every field, i.e.:

seq:
  - align: 8
  - id: field1
    type: some_variable_user_type
  - align: 8
  - id: field2
    type: some_variable_user_type
  - align: 8
  - id: field3
    type: some_variable_user_type
  - align: 8

Even worse, in some cases (e.g. C structs with mixed integer types, laid out by the compiler), technically you have to append and prepend that align: X with different sizes, i.e.:

seq:
  - align: 2
  - id: field1
    type: u2
  - align: 2
  - align: 8
  - id: field2
    type: u8
  - align: 8
  - align: 4
  - id: field3
    type: u4
  - align: 4

It's a huge overkill.

It's not really compatible with the current YAML model, which implies that every array element is a field and thus has an ID. If we allow non-field entries, we're effectively switching to imperative code disguised as YAML, not a real declarative description. Then we'll get ourselves goto operators in place of fields, etc., etc.

I like your "extra attributes" approach, though it still leaves lots of questions unanswered. For example, if you want to read an array of strz strings, what would be the semantics of:

seq:
  - id: strings
    type: strz
    encoding: UTF-8
    repeat: eos
    post-align: 8

Should it do alignment skips between each of the strings or just once after the whole array?
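The two possible readings give different results on the same bytes, which a short Python simulation can show. This is a sketch under assumptions: `post-align` is only a proposed key, and `read_all` with its `per_element` flag is a hypothetical helper that models "align after each string" versus "align once after the whole array".

```python
def read_strz(buf: bytes, pos: int):
    """Read a zero-terminated UTF-8 string; return it and the next position."""
    end = buf.index(b"\x00", pos)
    return buf[pos:end].decode("utf-8"), end + 1


def align_up(pos: int, align: int) -> int:
    """Round pos up to the next multiple of align."""
    return pos + (align - pos) % align


def read_all(buf: bytes, align: int, per_element: bool):
    """Read strz strings until end of buffer under one of the two semantics."""
    pos, strings = 0, []
    while pos < len(buf):
        text, pos = read_strz(buf, pos)
        strings.append(text)
        if per_element:
            pos = align_up(pos, align)  # skip padding after every string
    if not per_element:
        pos = align_up(pos, align)      # skip padding once, after the array
    return strings, pos


# Each string is followed by zero padding up to the next 8-byte boundary.
padded = b"ab\x00" + bytes(5) + b"cde\x00" + bytes(4)
print(read_all(padded, 8, per_element=True))
print(read_all(padded, 8, per_element=False))
```

With per-element alignment the padding is skipped; with whole-array alignment every padding byte is parsed as an empty string.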


GreyCat commented on May 18, 2024

@LogicAndTrick

I like the first option you propose, but I'd rename it to skip because it's more obvious to see what's happening:

That would be extremely misleading. The idea is not to skip exactly 8 bytes, but to align, i.e. skip from 0 to 7 bytes to make the current stream position divisible by 8. skip: 8 looks very much like a plain "skip 8 bytes" directive to me.


LogicAndTrick commented on May 18, 2024

Oops, I was talking about simple skipping, rather than trying to automatically set up the alignment. I think it's better to keep it simple and just say "throw X number of bytes away before moving on".


LogicAndTrick commented on May 18, 2024

Now that I'm on the same page: couldn't you specify the width of the whole structure? Basically how C structs are laid out - alignment is a property of an object, not an individual field. Wouldn't something like this be easier to use?

struct_name:
  width: 8 # specifying the width turns on alignment behaviour
  seq:
    - id: one
      type: u4 # first four bytes - 4 left on this 'line'
    - id: two
      type: u8 # doesn't fit - skips 4 bytes before reading
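One possible reading of this "width" idea can be sketched in Python. This is an interpretation, not an agreed design: `layout` is a hypothetical helper that treats the stream as lines of `width` bytes and moves a field to the next line when it does not fit in what is left of the current one.

```python
def layout(sizes, width):
    """Return the start offset of each field under the 'line width' rule."""
    offsets, pos = [], 0
    for size in sizes:
        used = pos % width
        # If the current line is partly used and the field does not fit in
        # the remainder, skip ahead to the start of the next line.
        if used and size > width - used:
            pos += width - used
        offsets.append(pos)
        pos += size
    return offsets


print(layout([4, 8], width=8))        # the u4 / u8 example above
print(layout([4, 2, 2, 8], width=8))  # small fields share a line
```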


mnakamura1337 commented on May 18, 2024

@LogicAndTrick The main problem described here is that I don't really know the value of X. Just skipping X bytes is relatively easy already:

  - id: some_fake_name
    size: 8

I don't see how it can get any easier. That said, I like the idea of dropping the "ID is mandatory" requirement. It doesn't help much, at least from the end user's POV. When reverse engineering formats with still-unknown fields, I frequently find myself doing tedious mapping work like:

  - id: unknown1
    type: u4
  - id: unknown2
    type: u4
  - id: unknown3
    type: u2

etc. It's not like placing these "unknownX" names manually exactly helps anything here.


LogicAndTrick commented on May 18, 2024

Yeah, my bad, I think I understand the issue now :)


GreyCat commented on May 18, 2024

Basically how C structs are laid out - alignment is a property of an object, not an individual field. Wouldn't something like this be easier to use?

Actually, that's what @mnakamura1337 proposed as his very first idea:

align per-type directive that affects all seq fields in a given type (or maybe even its subtypes)

The question of C struct layout can be incredibly complex and it's definitely not only about "single alignment per structure". There are quite a few tricks there.

By default, the start of each integer value is aligned (i.e. "pre-aligned" in @mnakamura1337's terminology) to the size of the value. This can be disabled by using stuff like #pragma pack(1) for the rest of the translation unit or by using __attribute__((__packed__)) on a struct.

One can use __attribute__((packed)) on an individual field to disable padding, i.e.:

struct Foo {
    char field1;
    uint32_t field2 __attribute__((__packed__));
    uint16_t field3;
};

would result in 11 22 22 22|22 00 33 33 in the memory.

There's also stuff like __declspec(align(4)) in MSVC, which can be applied both to a whole structure and at element level. When applied to the whole structure, it does not align members, but it pads the overall size of the structure to be a multiple of the given number.

There's a "standard" (but not yet supported by MSVC, as far as I know) alignas specifier in C++11. It can also be applied both to structures and to individual fields.
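The default "pre-align each integer to its own size" rule can be observed from Python through ctypes, which follows the platform's C layout. This is a sketch: it mirrors struct Foo above without the packed attribute, `natural_offsets` is a hypothetical helper, and the offsets assume a mainstream ABI where uint32_t is 4-byte aligned.

```python
import ctypes


class Foo(ctypes.Structure):
    # Same members as struct Foo above, but with default (unpacked) layout.
    _fields_ = [
        ("field1", ctypes.c_char),
        ("field2", ctypes.c_uint32),
        ("field3", ctypes.c_uint16),
    ]


def natural_offsets(sizes):
    """Offsets when each field is pre-aligned to a multiple of its own size."""
    pos, offsets = 0, []
    for size in sizes:
        pos += (size - pos) % size  # padding before the field
        offsets.append(pos)
        pos += size
    return offsets


ctypes_offsets = [getattr(Foo, name).offset for name, _ in Foo._fields_]
print(ctypes_offsets, natural_offsets([1, 4, 2]), ctypes.sizeof(Foo))
```

The total size reported by ctypes also includes trailing padding, so that the struct size is a multiple of its largest member alignment.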


mnakamura1337 commented on May 18, 2024

I like your "extra attributes" approach, though it still leaves lots of questions unanswered. For example, if you want to read an array of strz strings, what would be the semantics of:
seq:
  - id: strings
    type: strz
    encoding: UTF-8
    repeat: eos
    post-align: 8
Should it do alignment skips between each of the strings or just once after the whole array?

It doesn't really matter - both are OK, and for either version you can work around it with extra inner types to do precisely what you want. Doing alignment padding skips after each array element would probably be the least surprising for an average user.


LogicAndTrick commented on May 18, 2024

One can use __attribute__((packed)) on an individual field to disable padding

Huh, I've never seen this kind of thing before, ignore my uninformed suggestions then! The syntax looks horrible, does that kind of stuff actually get used very often?


GreyCat commented on May 18, 2024

You mean the C syntax or the KSY syntax proposed by @mnakamura1337?


winterheart commented on May 18, 2024

Hello.

I got this workaround for aligning:

size: ((chunk_size - 1) / 4 + 1) * 4

Here 4 is the alignment in bytes. Still, you need to know the whole size of the data chunk (chunk_size). This workaround is only for data analysis; I doubt that an actual generated parser will accept it.


winterheart commented on May 18, 2024

chunk_size is the size that I get from a field, but the actual file structure has alignment padding that chunk_size doesn't count. For example, when chunk_size is 13, the structure actually occupies 16 bytes.
Your code works too, thanks.
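The round-up formula can be checked in Python. This is only a sketch of the arithmetic: `padded_size` is a hypothetical helper, and `//` is Python's floor division, which matters for the edge case chunk_size == 0 (a truncating division, as in C, would give 4 there instead of 0).

```python
def padded_size(chunk_size: int, align: int = 4) -> int:
    """Round chunk_size up to the next multiple of align."""
    return ((chunk_size - 1) // align + 1) * align


for size in (1, 4, 13, 16):
    print(size, "->", padded_size(size))
```

For a chunk that starts at an aligned position, this gives the same total as chunk_size plus the (align - pos) % align padding from the _io.pos approach.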


fedorg commented on May 18, 2024

Is there any way I could just skip the padding so it doesn't clutter my structure? Right now my generated objects are littered with dummy fields like padding013.


KOLANICH commented on May 18, 2024

I guess it's better to configure alignment in meta.


fireundubh commented on May 18, 2024

Oh. I was experimenting with the IDE and just realized this issue was still open. Any progress on @GreyCat's proposal? I can test.

Also, thanks to @GreyCat for the _io.pos workaround.


GreyCat commented on May 18, 2024

@fireundubh Unfortunately, it's not implemented yet.


tan-wei commented on May 18, 2024

First of all, thanks for this awesome tool. It has kept me excited all month.
I'm trying to parse a protocol that is based on an arbitrary bit stream. It's just a simple TLV format, except that all the types are bX (where X is any number from 1 to 32). There are no delimiters or padding in the stream.
It looks like this:

  | packet1_enum | packet1_bit_length | packet1_body | packet2_enum | packet2_bit_length | packet2_body | ...

Naturally, we would use packet_enum to switch on the type of packet_body, and add a user-defined type to the .ksy file for each kind of packet. But that can't work in this situation because of alignment. Is there any workaround right now, or should I just wait for this issue to be closed? :-P

Thanks very much! Hope you have a nice day.
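For comparison, a hand-written reader for such an unaligned bit-level TLV stream can be sketched in Python. Everything specific here is an assumption for illustration: the 4-bit tag and 8-bit length widths, the big-endian (MSB-first) bit order, and the names `BitReader` and `parse_packets` are not taken from the protocol described above.

```python
class BitReader:
    """Reads MSB-first bit fields from a byte string with no alignment."""

    def __init__(self, data: bytes):
        self.value = int.from_bytes(data, "big")
        self.total = len(data) * 8
        self.pos = 0  # current position in bits

    def read(self, nbits: int) -> int:
        if self.pos + nbits > self.total:
            raise EOFError("not enough bits left")
        shift = self.total - self.pos - nbits
        self.pos += nbits
        return (self.value >> shift) & ((1 << nbits) - 1)


def parse_packets(data: bytes, count: int, tag_bits: int = 4, len_bits: int = 8):
    """Parse `count` packets of (tag, bit length, body) laid out back to back."""
    reader = BitReader(data)
    packets = []
    for _ in range(count):
        tag = reader.read(tag_bits)
        length = reader.read(len_bits)
        body = reader.read(length)  # body is `length` bits, not bytes
        packets.append((tag, length, body))
    return packets


# Two packets, 32 bits in total: tag=1, len=3, body=101; tag=2, len=5, body=11001.
bits = "0001" + "00000011" + "101" + "0010" + "00000101" + "11001"
data = int(bits, 2).to_bytes(len(bits) // 8, "big")
print(parse_packets(data, 2))
```

Note that the second packet starts at bit 15, in the middle of a byte, which is exactly the case that default byte alignment of user-defined types gets in the way of.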


GreyCat commented on May 18, 2024

@tan-wei Unfortunately, I can't think of a workaround right now, and, moreover, it will probably depend on yet another unimplemented feature, #112, i.e. you'll need some way to ignore arbitrary bit-sized packet bodies.


tan-wei commented on May 18, 2024

@GreyCat Thanks for your answer. Protocols in which all types are arbitrary bit widths are very rare in everyday use. So, according to your reply, the protocol I want to parse cannot be parsed with KS for now. I read #112 just before. If feature #112 is implemented, I think the problem will be solved.

Thanks for your quick reply. :-)


smarek commented on May 18, 2024

I'd like to point this thread to #576, which is a use case where all types are arbitrary bit widths and no part of the protocol is byte-aligned (specifically, the TETRA radio protocol).

My proposal is the following: an "option to turn off alignment completely".
Some, and maybe most, of the relevant tickets could be solved if we could turn off alignment completely, instead of configuring it further or having to manually calculate the bit offset of a given field.
This would allow users/developers to rely on the DSL-defined bit/byte/word sizes and order of fields, instead of having to tackle "smart framework assumptions".


jaroslaw-wieczorek commented on May 18, 2024

Why is repeat: eos not working for one-bit fields?

use repeat: eos

meta:
  id: test
  file-extension: test
  endian: be

seq:
 - id: values
   type: b1
   repeat: eos

Return:

use repeat: expr

meta:
  id: test
  file-extension: test
  endian: be

seq:
 - id: values
   type: b1
   repeat: expr
   repeat-expr: 8

Return:

Raw data (1 byte):
