Code Monkey home page Code Monkey logo

Comments (8)

programmerjake avatar programmerjake commented on June 26, 2024 1

well, conveniently every type combination that is valid to type pun (so doesn't try to e.g. type pun i32x3 to i64x2 where the last i64 is half undef) already satisfies the alignment requirements if we decide the alignment is gcd(sizeof(element) * length, next_power_of_2(sizeof(element) * length), some_global_constant_limit) (compatible with what i proposed for portable-simd) because:

  • assuming some_global_constant_limit >= sizeof(element) for all possible element types (handles wanting vector alignment to not increase without bound)
  • assuming element types are power-of-2 sized which afaict is true for RVV
  • assuming sizeof(target_element) * target_length <= sizeof(source_element) * source_length aka. that the type pun is valid because the target vector's elements are completely within the source elements and not taking bytes from after the end of the source elements
  • every type pun where sizeof(target_element) <= sizeof(source_element) the alignment works out so the target vector type needs the same or less alignment than the source (always the same alignment if sizeof(target_element) * target_length == sizeof(source_element) * source_length)
  • every type pun where sizeof(target_element) > sizeof(source_element) the assumption that the type pun is valid means that the source vector's length is a multiple of sizeof(target_element) / sizeof(source_element) which is a power of two therefore the alignment again works out so the target vector type needs the same or less alignment than the source (always the same alignment if sizeof(target_element) * target_length == sizeof(source_element) * source_length)

from riscv-elf-psabi-doc.

aswaterman avatar aswaterman commented on June 26, 2024

And we have LMUL in the vector extension, we might also consider that in the alignment if needed.

Most implementations (especially in applications processors) will be designed to run LMUL=1 code efficiently, and so using greater alignment when LMUL>1 will probably not be profitable.

My gut feeling is that using the existing stack alignment (16 bytes) is the way to go, but I don't have a strong argument for this proposal. One consideration is that V requires VLEN >= 128, and VLEN=128 will be a popular choice for many apps processors, so 16 bytes seems like a natural choice.

Of course, when optimizing for a specific target, we can increase the alignment when stack-allocating vectors, without breaking the ABI. So if we picked an ABI alignment of 16 bytes, we wouldn't be screwing the VLEN=256 implementations too badly.

from riscv-elf-psabi-doc.

nick-knight avatar nick-knight commented on June 26, 2024

Although not for the faint of heart, some programmers do perform type-punning on vector registers, reinterpreting the bits with the effect of fissioning or fusing consecutive vector elements. (In fact, the V-extension intrinsics provide auxiliary intrinsics to assist.) Without mandating stricter-than-element-width alignment, when compiling a C program like the following,

// vec_u8 passed on stack, according to current calling convention
void foo (vuint8m1_t vec_u8) {
  // 8-bit arithmetic with very high vector register pressure    
  vuint64m1_t vec_u64 = vreinterpret_v_u8m1_u64m1(vec_u8); // need to reload it
  // 64-bit arithmetic
}

the compiler may need to reload vec_u8 with a vl1re8.v, which may set the wrong expectations on an implementation that internally rearranges vector data for different element widths (to reduce datapath wiring costs). On the other hand, if alignment were >= 8B, the compiler could safely generate the appropriate vl1re*.v.

I admit this is a contrived scenario. I mention this only to point out a subtlety with Kito's first option,

Alignment as the element width

from riscv-elf-psabi-doc.

kito-cheng avatar kito-cheng commented on June 26, 2024

Sounds 128 bits alignment should be most nature choose for the alignment of vector types for all different LMUL, and that's also resolve the potential issue which @nick-knight mentioned, actually I heard same issue during collecting issue from different community guys, so the issue might not be existing in synthetic benchmark/testcase I think.

Although one arguments is we could have VLEN=32 or VLEN=64, 128 bits alignment might waste some stack space for those 2 configurations, but that should be rare configuration, and even zve32* or zve64* configure still could have VLEN >= 128.

This topic will put into next psABI call :)

from riscv-elf-psabi-doc.

workingjubilee avatar workingjubilee commented on June 26, 2024

It's worth noting, since the case of VLEN of 256 or 512 was discussed, that x86-64's AVX, the closest comparison, specifies that __m256 and __m512 are aligned to 32 and 64 bytes. However, in practice, the VEX encoding of these SIMD instructions guarantees, in general, that using these instructions off-alignment is acceptable (with I believe the notable exception of vmovaps?), whereas the SSE instructions and their legacy encodings sometimes enforces 16 byte alignment. So you would not be alone in having these types have a cut-off point at 16 byte alignment, even if it is more "effective" than "by-spec" in that case.

from riscv-elf-psabi-doc.

programmerjake avatar programmerjake commented on June 26, 2024

note that rust's project-portable-simd may define vector types (not necessarily the same ones as are used for C FFI) to require alignment be small enough that the vector types have no padding (all Rust types have size that is a multiple of alignment), this allows reinterpreting pointers to aligned portions of any valid array slice as pointers to vector types (e.g. &[u32] -> &[Simd<u32, 7>]): rust-lang/portable-simd#319

from riscv-elf-psabi-doc.

workingjubilee avatar workingjubilee commented on June 26, 2024

That's not unambiguously decided, however, or at least, I believe the point regarding preferred type punning alignment is a good one.

from riscv-elf-psabi-doc.

kito-cheng avatar kito-cheng commented on June 26, 2024

Further discussion move to here: #380 :)

from riscv-elf-psabi-doc.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.