
Comments (4)

kito-cheng commented on July 24, 2024

We have an existing issue (#45) for the alignment, which appears to differ between gcc and clang for size > 16 bytes,

Oops, thanks for digging out this issue, which was created years ago... I think we should spend some time standardizing that.

but I noticed gcc isn't even compatible with itself (compiler explorer link): for vector size = 2*XLEN, the vector is passed in memory if vectorization is enabled, in integer registers otherwise.

Yeah, that's a known issue: --param riscv-autovec-preference=fixed-vlmax is an ABI-incompatible option, and that should at least be documented.

Do we want to pass fixed-size vectors in vector registers if an appropriate vector calling convention is in use? (This would have been a comment on #389 without the above issue.) This would substantially complicate the compatibility story, since the vector calling convention could no longer be treated as a strict superset of the non-vector calling convention, and we may be able to get most of the benefit using module-internal fastcc-type optimizations.

I had some off-list discussion with @lhtin; let me dump some of it here:

The short answer in my heart is: yes, we should consider passing fixed-size vectors in vector registers.

However, there is a really complicated compatibility issue between zvl32b, zvl64b and zvl128b...

NOTE: I didn't use zve32* or zve64* here, since those zve* extensions can still be combined with zvl128b, in which case the issues described below go away; zvl32b and zvl64b are more precise.

Let me try to describe this with two different options: 1) better compatibility, 2) better performance/usability.

  1. better compatibility

If we consider compatibility among zvl32b, zvl64b and zvl128b, then we must assume the smallest possible vector register, so... pass a 32-bit fixed-size vector in a single vector register, a 64-bit fixed-size vector in two vector registers, and a 128-bit fixed-size vector in four vector registers.

That would be a bad design, because we can expect Linux-class RISC-V CPUs to have the V extension, which implies zvl128b, and this design then wastes most of the vector register space.

But this is the way to go if we don't want to define multiple ABI/calling convention variants for zvl32b, zvl64b and zvl128b.

  2. better performance/usability

The V extension requires zvl128b, which means a vector register is at least 128 bits, so the most intuitive design is to pass a fixed-size vector in a single vector register (m1, or LMUL=1 in RVV terms) if its length is less than or equal to 128 bits, pass 129 to 256 bits in two vector registers, and so on up to 1024 bits at LMUL=8.

However, this design can't be applied to zvl32b and zvl64b, so it would cause compatibility issues.
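To make the trade-off between the two options concrete, here is a minimal C sketch of the register-count arithmetic. The helper names vregs_needed and utilization_percent are hypothetical and not part of any psABI text; the sketch only restates the numbers given above, assuming the convention is parameterized by the smallest VLEN it must support.

```c
/* Hypothetical helper: number of vector registers used to pass a
   fixed-size vector of `bits` bits when the calling convention assumes
   each register holds at least `min_vlen` bits. */
unsigned vregs_needed(unsigned bits, unsigned min_vlen)
{
    return (bits + min_vlen - 1) / min_vlen; /* ceil(bits / min_vlen) */
}

/* Hypothetical helper: percentage of the occupied register space that
   actually holds argument data when the hardware VLEN is `vlen` bits. */
unsigned utilization_percent(unsigned bits, unsigned min_vlen, unsigned vlen)
{
    return 100u * bits / (vregs_needed(bits, min_vlen) * vlen);
}
```

With min_vlen = 32 (option 1), a 128-bit vector occupies four registers, and on VLEN=128 hardware only a quarter of that register space carries data. With min_vlen = 128 (option 2), the same vector occupies one register.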


So... here is an aggressive idea: we could design a calling convention that takes an argument.

e.g. void __attribute__ ((riscv_vector_cc(vls-vlen=128))) f (int32x4_t) declares a function with the vector ABI that passes a 128-bit vector in a vector register, as in option 2 above.

Then, with vls-vlen=128 as the default, void __attribute__ ((riscv_vector_cc)) f (int32x4_t) will pass int32x4_t in a vector register, so most users don't need to specify vls-vlen= in the attribute.

What about zvl32b and zvl64b? The user must specify vls-vlen in the attribute, or use an option such as -mdefault-vector-abi-vls-len=[32|64].

This design has one more advantage: users can pass a 256-bit fixed-size vector if they want to optimize their program.
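For readers unfamiliar with fixed-size vector types, the following sketch shows one way int32x4_t could be declared, assuming the GCC/Clang vector_size extension (the thread does not say how the type is defined). The function sum_lanes is a hypothetical example. The proposed attribute argument is shown only in a comment, since it is a proposal from this discussion rather than an existing compiler feature.

```c
#include <stdint.h>

/* Assumed definition: four 32-bit lanes, 128 bits in total. */
typedef int32_t int32x4_t __attribute__((vector_size(16)));

/* Under the proposal above, this function would additionally carry
   __attribute__((riscv_vector_cc(vls-vlen=128))) so that `v` arrives in
   a single vector register. The attribute is left out here because that
   spelling is only a proposal. */
int32_t sum_lanes(int32x4_t v)
{
    return v[0] + v[1] + v[2] + v[3];
}
```

Without such an attribute, how v is passed depends on the target's default calling convention, which is exactly the question this issue is about.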


Or, as a last alternative, we don't do anything in psABI land and just let compilers use their module-internal fastcc.

from riscv-elf-psabi-doc.

sorear commented on July 24, 2024

I think I agree that this needs to be parameterized and controlled by language-specific mechanisms (from the ABI perspective), which means some combination of GNU attributes, explicitly ABI-affecting compiler options, and implementation-dependent fastcc mechanisms (from the riscv-c-api-doc perspective).

We have three options to choose from (or for the compiler to choose from for fastcc) on a per-function basis:

  1. Pass in ceil(N/XLEN) integer registers, for N <= 2*XLEN, in memory otherwise. Efficient for naturally XLEN-aligned integer data, or if the P extension is present; otherwise, the argument registers need either unpack steps or a series of vector slides (possibly with a different SEW than the real computation) before use.
  2. Pass in ceil(N/MINVLEN) vector registers, for N <= 8*MINVLEN and MINVLEN a parameter of the function's calling convention, in memory for too large N. If the runtime VLEN is greater than MINVLEN the actual data will be present in the low-numbered vector registers per the normal rules for vector register groups. This is a calling convention parameter only; it is separate from the VLEN>=X or VLEN=X requirements that may be imposed by function code. Efficient if VLEN = MINVLEN or if the hardware implements fast operations for vl <= maxvl/2.
  3. Always pass in memory. Supports all vector lengths and element sizes with roughly equal efficiency.
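The three options can be sketched as small C classifiers. This is only an illustration of the rules as listed; the type and function names (arg_passing, pass_in_xregs, pass_in_vregs, pass_in_memory) are hypothetical, and N, XLEN and MINVLEN are all taken in bits.

```c
enum arg_location { IN_XREGS, IN_VREGS, IN_MEMORY };

struct arg_passing {
    enum arg_location where;
    unsigned nregs; /* 0 when the argument is passed in memory */
};

static unsigned ceil_div(unsigned a, unsigned b)
{
    return (a + b - 1) / b;
}

/* Option 1: ceil(N/XLEN) integer registers for N <= 2*XLEN, else memory. */
struct arg_passing pass_in_xregs(unsigned n_bits, unsigned xlen)
{
    if (n_bits <= 2 * xlen)
        return (struct arg_passing){ IN_XREGS, ceil_div(n_bits, xlen) };
    return (struct arg_passing){ IN_MEMORY, 0 };
}

/* Option 2: ceil(N/MINVLEN) vector registers for N <= 8*MINVLEN,
   else memory. MINVLEN is a parameter of the calling convention. */
struct arg_passing pass_in_vregs(unsigned n_bits, unsigned minvlen)
{
    if (n_bits <= 8 * minvlen)
        return (struct arg_passing){ IN_VREGS, ceil_div(n_bits, minvlen) };
    return (struct arg_passing){ IN_MEMORY, 0 };
}

/* Option 3: always in memory, regardless of size. */
struct arg_passing pass_in_memory(unsigned n_bits)
{
    (void)n_bits;
    return (struct arg_passing){ IN_MEMORY, 0 };
}
```

For example, on RV64 a 128-bit vector takes two integer registers under option 1, while a 256-bit vector falls back to memory; under option 2 with MINVLEN=128, the 256-bit vector takes two vector registers.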

Functions using option 2 should probably have call-saved registers under the same rules as eventually adopted for vector types.

Should the default behavior be 1 or 3? If we treat the behavior of gcc without --param riscv-autovec-preference=fixed-vlmax as the de facto ABI, it has to be 1.

The attribute name should express the fact that it is specific to fixed-size vectors. I am thinking of something like riscv_fixed_vector_cc(xregs), riscv_fixed_vector_cc(memory), riscv_fixed_vector_cc(vregs(MINVLEN)), with MINVLEN defaulting to 128. riscv_fixed_vector_cc(vregs) is still a bit of a mouthful; can we shorten it without creating an ambiguity with the scalable vector calling convention?

(Besides the ratification of C23, what else needs to happen before we can start talking about [[riscv::fixed_vector_cc(vregs)]]?)

Maybe there is an argument for defining riscv_vector_cc as primarily enabling call-saved vector registers, and affecting the fixed vector calling convention as a side effect.

Do you have a sense of the amount of new code being written using fixed-size vectors for RISC-V? If the major use case is legacy code using portable fixed-size vectors or a RISC-V implementation of the SSE / NEON intrinsics, then it would make sense to focus more on fastcc support than defining the attributes. The default / externally visible calling convention needs to be defined in any case.


Amanieu commented on July 24, 2024

This was raised in the context of Rust support for the V extension. The specific concern is in the context of a program compiled without the V extension enabled, but where certain functions are marked #[target_feature(enable = "v")]. This could potentially lead to different functions disagreeing on how to pass fixed-length vectors as arguments.

If the default calling convention allows passing fixed-length vectors in vector registers, then this really should be a separate -mabi variant. After all, defining the calling convention is the entire point of -mabi. Alternatively, a separate opt-in calling convention (such as "vectorcall" on x86) could be used to opt in to passing fixed-length vectors in vector registers.

This is not a concern for scalable vectors since, unlike fixed-length vectors, no values of this type can be instantiated without the V extension.


kito-cheng commented on July 24, 2024

For the base calling convention part:
#406

The vector calling convention will be a separate PR, to be created later.

