Comments (4)
We have an existing issue (#45) for the alignment, which appears to differ between gcc and clang for size > 16 bytes,
Oops, thanks for digging out this issue, which was created years ago... I think we should spend some time on standardizing that.
but I noticed gcc isn't even compatible with itself (compiler explorer link): for vector size = 2*XLEN, the vector is passed in memory if vectorization is enabled, in integer registers otherwise.
Yeah, that's kind of a known issue: --param riscv-autovec-preference=fixed-vlmax is an ABI-incompatible option, and that should be documented at least.
Do we want to pass fixed-size vectors in vector registers if an appropriate vector calling convention is in use? (This would have been a comment on #389 without the above issue.) This would substantially complicate the compatibility story, since the vector calling convention could no longer be treated as a strict superset of the non-vector calling convention, and we may be able to get most of the benefit using module-internal fastcc-type optimizations.
I had some off-list discussion with @lhtin, so let me dump some of it here:
Short answer in my heart is: yes, we should consider passing fixed-size vectors in vector registers.
However, there is a really complicated compatibility issue between zvl32b, zvl64b and zvl128b...
NOTE: I didn't use zve32* or zve64* here, since those zve* extensions can still be combined with zvl128b, in which case the issues described below go away, so referring to zvl32b and zvl64b is more precise.
Let me try to describe this with two different options: 1) better compatibility, 2) better performance/usability.
- better compatibility
If we consider compatibility among zvl32b, zvl64b and zvl128b, then we must assume the smallest possible vector register, so... pass a 32-bit fixed-size vector in a single vector register, a 64-bit fixed-size vector in two vector registers, and a 128-bit fixed-size vector in four vector registers.
That would be a bad design, because we can expect Linux-class RISC-V CPUs to have the v extension, which implies zvl128b, and this design would then waste most of the vector register space.
But this is the way to go if we don't want to define multiple ABI/calling convention variants for zvl32b, zvl64b and zvl128b.
- better performance/usability
The v extension requires zvl128b, which means a vector register is at least 128 bits, so the most intuitive design is to pass a fixed-size vector in a single vector register (m1/LMUL=1 in RVV terms) if its length is less than or equal to 128 bits, pass 129~256 bits in two vector registers, and so on up to 1024 bits at LMUL=8.
However, this design can't be applied to zvl32b and zvl64b; it would cause compatibility issues.
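The register cost of the two options above comes down to one rounding-up division. A minimal sketch in C, assuming the only difference between the options is the minimum VLEN that the calling convention takes for granted (32 bits for the compatibility option, 128 bits for the performance option). The function names `vregs_needed` and `print_cost` are made up for this illustration and come from no specification.

```c
#include <assert.h>
#include <stdio.h>

/* Number of vector registers needed for a fixed-size vector of
 * `vec_bits` bits when the calling convention assumes every vector
 * register holds at least `min_vlen_bits` bits (rounds up).
 * Hypothetical helper, not part of any psABI text. */
unsigned vregs_needed(unsigned vec_bits, unsigned min_vlen_bits)
{
    assert(min_vlen_bits > 0);
    return (vec_bits + min_vlen_bits - 1) / min_vlen_bits;
}

/* Print the cost of one vector size under both assumptions:
 * minimum VLEN of 32 (zvl32b) versus 128 (zvl128b). */
void print_cost(unsigned vec_bits)
{
    printf("%4u-bit vector: %u reg(s) assuming zvl32b, "
           "%u reg(s) assuming zvl128b\n",
           vec_bits,
           vregs_needed(vec_bits, 32),
           vregs_needed(vec_bits, 128));
}
```

Calling `print_cost(128)` shows the trade-off described above: four registers under the zvl32b assumption, one under the zvl128b assumption.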
So... here is an aggressive idea: we could design a calling convention that takes an argument,
e.g. void __attribute__ ((riscv_vector_cc(vls-vlen=128))) f (int32x4_t)
to declare a function with the vector ABI that passes a 128-bit vector in a vector register, like option 2 mentioned above.
And then make vls-vlen=128 the default, so void __attribute__ ((riscv_vector_cc)) f (int32x4_t) will pass int32x4_t in a vector register, and most users don't need to specify vls-vlen= in the attribute.
How about zvl32b and zvl64b? The user must specify vls-vlen in the attribute, or we provide an option such as -mdefault-vector-abi-vls-len=[32|64].
This design also comes with one more advantage: users can pass 256-bit fixed-size vectors if they want to optimize their program.
Or the last alternative is that we don't do anything in psABI land, and just let compilers use their module-internal fastcc.
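To make the attribute idea concrete, here is a rough sketch of what user code might look like. It assumes the GNU `vector_size` extension for the `int32x4_t` type and wraps the attribute in a macro, `FIXED_VEC_CC`, which is a name invented for this example. The parameterized `vls-vlen=` form is only the proposal from this comment, so the sketch uses the plain `riscv_vector_cc` spelling, and only where the compiler reports support for it. On any other target the macro expands to nothing and the default calling convention applies.

```c
#include <assert.h>
#include <stdint.h>

/* 128-bit vector of four int32_t lanes (GNU vector extension). */
typedef int32_t int32x4_t __attribute__((vector_size(16)));
static_assert(sizeof(int32x4_t) == 16, "int32x4_t must be 128 bits");

/* FIXED_VEC_CC is a hypothetical macro for this sketch. It selects
 * the vector calling convention only when targeting RISC-V with the
 * V extension and when the compiler knows the attribute. */
#if defined(__riscv_vector) && defined(__has_attribute)
#  if __has_attribute(riscv_vector_cc)
#    define FIXED_VEC_CC __attribute__((riscv_vector_cc))
#  endif
#endif
#ifndef FIXED_VEC_CC
#  define FIXED_VEC_CC /* fall back to the default calling convention */
#endif

/* Sum the four lanes of a fixed-size vector argument. */
FIXED_VEC_CC int32_t sum_lanes(int32x4_t v)
{
    return v[0] + v[1] + v[2] + v[3];
}
```

Whether the argument actually travels in a vector register depends on the convention that ends up being standardized. The source code stays the same either way, which is part of the appeal of an attribute with a sensible default.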
from riscv-elf-psabi-doc.
I think I agree that this needs to be parameterized and controlled by (from the ABI perspective) language-specific mechanisms, or (from the riscv-c-api-doc perspective) some combination of GNU attributes, explicitly ABI-affecting compiler options, and implementation-dependent fastcc mechanisms.
We have three options to choose from (or for the compiler to choose from for fastcc) on a per-function basis:
- Pass in ceil(N/XLEN) integer registers, for N <= 2*XLEN, in memory otherwise. Efficient for naturally XLEN-aligned integer data, or if the P extension is present; otherwise, the argument registers need either unpack steps or a series of vector slides (possibly with a different SEW than the real computation) before use.
- Pass in ceil(N/MINVLEN) vector registers, for N <= 8*MINVLEN and MINVLEN a parameter of the function's calling convention, in memory for too large N. If the runtime VLEN is greater than MINVLEN the actual data will be present in the low-numbered vector registers per the normal rules for vector register groups. This is a calling convention parameter only; it is separate from the VLEN>=X or VLEN=X requirements that may be imposed by function code. Efficient if VLEN = MINVLEN or if the hardware implements fast operations for vl <= maxvl/2.
- Always pass in memory. Supports all vector lengths and element sizes with roughly equal efficiency.
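The three options in the list can be read as one classification function. The sketch below is a rough illustration in C that uses only the thresholds stated in the list (N <= 2*XLEN for integer registers, N <= 8*MINVLEN for vector registers). All type and function names are hypothetical, and it ignores details such as alignment and running out of argument registers.

```c
#include <assert.h>

/* Which of the three options a function's calling convention uses.
 * All names in this sketch are hypothetical. */
enum fv_policy { FV_POLICY_XREGS, FV_POLICY_VREGS, FV_POLICY_MEMORY };

/* Where the argument ends up. */
enum fv_loc { FV_XREGS, FV_VREGS, FV_MEMORY };

struct fv_class {
    enum fv_loc loc;
    unsigned nregs; /* 0 when the argument is passed in memory */
};

static unsigned ceil_div(unsigned a, unsigned b)
{
    return (a + b - 1) / b;
}

/* Classify an N-bit fixed-size vector argument.
 * xlen:     integer register width in bits (32 or 64)
 * min_vlen: MINVLEN parameter of the calling convention, in bits */
struct fv_class classify_fixed_vector(enum fv_policy policy,
                                      unsigned n_bits,
                                      unsigned xlen,
                                      unsigned min_vlen)
{
    struct fv_class in_memory = { FV_MEMORY, 0 };

    assert(xlen > 0 && min_vlen > 0);
    switch (policy) {
    case FV_POLICY_XREGS: /* option 1: ceil(N/XLEN) integer registers */
        if (n_bits <= 2 * xlen)
            return (struct fv_class){ FV_XREGS, ceil_div(n_bits, xlen) };
        return in_memory;
    case FV_POLICY_VREGS: /* option 2: ceil(N/MINVLEN) vector registers */
        if (n_bits <= 8 * min_vlen)
            return (struct fv_class){ FV_VREGS, ceil_div(n_bits, min_vlen) };
        return in_memory;
    case FV_POLICY_MEMORY: /* option 3: always in memory */
    default:
        return in_memory;
    }
}
```

For example, with XLEN = 64 and MINVLEN = 128, a 256-bit vector goes to memory under option 1 but occupies two vector registers under option 2.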
Functions using option 2 should probably have call-saved registers under the same rules as eventually adopted for vector types.
Should the default behavior be 1 or 3? If we treat the behavior of gcc without --param riscv-autovec-preference=fixed-vlmax as the de facto ABI, it has to be 1.
The attribute name should express the fact that it is specific to fixed-size vectors. I am thinking something like riscv_fixed_vector_cc(xregs), riscv_fixed_vector_cc(memory), riscv_fixed_vector_cc(vregs(MINVLEN)), with VLEN defaulting to 128. riscv_fixed_vector_cc(vregs) is still a bit of a mouthful; can we shorten it without creating an ambiguity with the scalable vector calling convention?
(Besides the ratification of C23, what else needs to happen before we can start talking about [[riscv::fixed_vector_cc(vregs)]]?)
Maybe there is an argument for defining riscv_vector_cc as primarily enabling call-saved vector registers, and affecting the fixed vector calling convention as a side effect.
Do you have a sense of the amount of new code being written using fixed-size vectors for RISC-V? If the major use case is legacy code using portable fixed-size vectors or a RISC-V implementation of the SSE / NEON intrinsics, then it would make sense to focus more on fastcc support than defining the attributes. The default / externally visible calling convention needs to be defined in any case.
This was raised in the context of Rust support for the V extension. The specific concern is in the context of a program compiled without the V extension enabled, but where certain functions are marked #[target_feature(enable = "v")]. This could potentially lead to different functions disagreeing on how to pass fixed-length vectors as arguments.
If the default calling convention allows passing fixed-length vectors in vector registers, then this really should be a separate -mabi variant. After all, defining the calling convention is the entire point of -mabi. Alternatively, a separate opt-in calling convention (such as "vectorcall" on x86) could be used to opt in to passing fixed-length vectors in vector registers.
This is not a concern for scalable vectors since, unlike fixed-length vectors, no values of this type can be instantiated without the V extension.
For the base calling convention part: #406
The vector calling convention will be a separate PR, to be created later.