riscv-non-isa / riscv-elf-psabi-doc

A RISC-V ELF psABI Document

Home Page: https://jira.riscv.org/browse/RVG-4

License: Creative Commons Attribution 4.0 International

Makefile 14.30% Python 85.70%

riscv-elf-psabi-doc's People

kito-cheng michaeljclark dshorner maxnordlund pkmx nonerkao asb terudeca lu-ping shlevy abbylin0323 sebhub guiwei789 jim-wilson rbarraud yohnwang mhorne zongbox dujianqiang1981 bingoniutou suziyousif heloizamartins hellenavilarosa fintelia paul-walmsley-sifive maskray lenary gnzlbg fengjixuchui edward-jones sturem peaksong aprnath sifive obijuan s5pv210 t-j-teru andrii-i luismarques atishp04 lijinpei xuao-hefei nick-knight ruinland piaopiaohun tjupathfinder bjcolbe liuqingfa konrad-schwarz ebahapo chaoyangnz laokz rouchel wangxiuquan80 luyahan rincewindshat williamwangpeng bwalderman jrtc27 hsiangkai craigblackmore kallisti5 justanotherdot zakk0610 xiaoerlang0359 joshua-401 xerpi saivk rick-yin colinianking orb1t-ua ianchen0119 xypron joe-degs samuelgruetter xukang919 liaoshihua lyctw byronze cmuellner fioraking rui314 doytsujin icenowy wei-zhanpeng arcbbb pcwang-thead patrick-rivos lhtin gezhongfengge palmer-dabbelt minseongg sanbuphy kisssko yanghyu luhaocong kasperk81 cloudspurs xionghul keanusgithub

riscv-elf-psabi-doc's Issues

FP calling convention, mixed int and fp args

Krste pointed out that the FP calling convention isn't explicit about what happens when you have both int and fp args. It only says that FP regs are still used even if the int regs are exhausted. But doesn't say what happens if the int regs haven't been exhausted yet. Or what happens with int args after you start using FP regs for args. In particular, if you have 8 int args and 8 fp args (which are smaller than int/fp regs), then you can pass all of them in registers no matter what order they appear in.

There is apparently an old RISC-V calling convention which can still be found on the web which says otherwise.

Classes with virtual methods appear to be forced to pass in memory

http://refspecs.linuxbase.org/cxxabi-1.83.html#calls implies that classes without nontrivial copy constructors are supposed to be handled like structs, but this code is passing a one-word class in memory:

class A {
public:
virtual void fn();
};

extern void abc(A);
void cba() { A q; abc(q); }

Idea: General syscall ABI

Right now the Linux syscall interface uses a7 for the syscall number and a0-a6 for integer arguments. This is quite adequate for Linux, where syscalls have at most 7 integer arguments; but I think it'd be cleaner to have a general syscall ABI defined with the same functionality as the base ABI.

Straw proposal: syscall number in t0 or t1 (TBD; t1 is friendlier to millicode) before ecall, everything else as for the base calling convention.

RV32E tracking issue

I've been having some questions about RV32E support in LLVM+Clang. It's difficult to consider adding support as the RV32E instruction set extension itself isn't yet "frozen" and the ABI isn't fully documented. A little more documentation on the proposed ABI has been added recently (thanks!), but I thought it would be worth making an issue to track the work that still needs to be done.

As far as I can see, we need:

Syscall ABI for RV32E (#11). Discussion on that issue also mentions dynamic linking
An integer register convention table for RV32E.
Add an RV32E table to "Default ABIs and type sizes" (or else specify if RV32E is identical to RV32G). Are long long and doubles still 8-byte aligned? Is long double still 16 bytes?

Are there other issues that need to be addressed? Please note: I'm simply creating this issue to track what needs to be done, not to claim it - I'm not currently distributing or directly supporting RV32E IP myself.

long double in structs, call/return inconsistency

Nested structs with long doubles (more generally: float types = 2 * XLEN > FLEN) and other zero-byte fields are in some cases passed in GPR pairs. It'd be simpler if we could say that anything with a real long double field (in particular, not a zero-length array of long double) is forced to memory passing.

Alignment for __attribute__((vector_size(N))) objects should be specified

Both GCC and Clang support this syntax and generate working code even when compiling for a system with no vector support. Therefore, it would be good if we can standardise the ABI details.

FP calling convention: no mention of unions in struct flattening description

Consider the following declarations:

struct s1_ty { float f; int i;  };
struct s2_ty { float f; char c;  };
union u_ty { struct s1_ty s1; struct s2_ty s2; };
struct s3_ty { union u_ty u; };

As it stands, the FP calling convention makes no mention of unions. So if I had a function taking u_ty as an argument I might reasonably pass this according to the integer calling convention. If a function took s3_ty, there's a question of whether a union can ever be "flattened". For instance in this particular case, the first field of the union will always be a float. Presumably even if you did "flatten" the fp field, you'd still consider the second field of the flattened result to be a union of int and char, which would be an "aggregate" rather than an "integer". The degenerate case would be a union of two identical structs, containing a real and an integer.

I assume the intention is that unions are never "flattened"?

Stack alignment is unspecified

e_flags field for ILP32-on-RV64

Presumably, an ILP32-on-RV64 ABI would use ELF32, so the ELF class no longer unambiguously equals XLEN.

This isn't an issue for x86-64's x32 ABI: since x86-64 uses a different ELF machine code than IA-32, there is no ambiguity.

Bit fields are unspecified

The AArch64 AAPCS and x86-64 psABI both make an effort to standardise bit field representation. I don't know how much code is out there that exposes bit fields in its public ABI, but it may be worth specifying something sensible here.

Documentation of TLS data structures

We've been implementing TLS support in lld: https://reviews.llvm.org/D39324, but the documentation regarding the layout of TLS structures is missing.

If I'm understanding correctly from reading the various TLS headers in glibc: tp is set up to point to the end of the TCB and the beginning of the static TLS block; this means that it is neither variant I nor II as described in ELF Handling for Thread-Local Storage. The front of the TCB contains a pointer to the DTV, and each pointer in DTV points to 0x800 past the start of a TLS block to make full use of the range of load/store instructions.

Is it safe to assume that this will be the case across all platforms? These values are hardcoded in bfd as well. What is the motivation behind making tp point to the end of the TCB, as opposed to following one of the two variants defined in the ELF TLS paper?

Calling convention weirdness: mixed struct returns

A struct { double x; long y; } is passed in a GPR and an FPR on call, but in two GPRs on return. It'd be didactically useful to have the rules for return values be precisely the same as for call arguments.

R_RISCV_CALL vs R_RISCV_CALL_PLT

int foo();
int g() { return foo()+1; }

This compiles to R_RISCV_CALL (call foo) in -fno-pic mode and R_RISCV_CALL_PLT (call foo@plt) in -fpie/-fpic mode.

IMHO the distinction is not really useful. We can avoid R_RISCV_CALL and use R_RISCV_CALL_PLT everywhere (also in -fno-pic mode). If the target symbol is non-preemptable (local/hidden/not -shared/etc), the linker can omit PLT creation for R_RISCV_CALL_PLT. AFAICT, the only differences are these lines:

// binutils-gdb/bfd/elfnn-riscv.c
	case R_RISCV_CALL_PLT:
	  /* This symbol requires a procedure linkage table entry.  We
	     actually build the entry in adjust_dynamic_symbol,
	     because this might be a case of linking PIC code without
	     linking in any dynamic objects, in which case we don't
	     need to generate a procedure linkage table after all.  */

	  if (h != NULL)
	    {
	      h->needs_plt = 1;
	      h->plt.refcount += 1;
	    }
	  break;

	case R_RISCV_CALL:
	  /* Handle a call to an undefined weak function.  This won't be
	     relaxed, so we have to handle it here.  */
	  if (h != NULL && h->root.type == bfd_link_hash_undefweak
	      && h->plt.offset == MINUS_ONE)
	    {
	      /* We can use x0 as the base register.  */
	      bfd_vma insn = bfd_get_32 (input_bfd,
					 contents + rel->r_offset + 4);
	      insn &= ~(OP_MASK_RS1 << OP_SH_RS1);
	      bfd_put_32 (input_bfd, insn, contents + rel->r_offset + 4);
	      /* Set the relocation value so that we get 0 after the pc
		 relative adjustment.  */
	      relocation = sec_addr (input_section) + rel->r_offset;
	    }

I don't know much about BFD internals, but it appears that the existence of a PLT in -fpie/-fpic code makes no difference. R_RISCV_CALL_PLT is not necessary to force the creation of a PLT if the symbol is non-preemptable.

In -fno-pic code, there may be a so-called "canonical PLT" (created due to an absolute/pc relative relocation to a function). However, its handling has nothing to do with the dichotomy of R_RISCV_CALL{,_PLT}.

I don't know what the weak-undef code does.

On x86_64, newer gas/llvm-mc (after r358652) produces R_X86_64_PLT32 for call foo and call foo@plt. R_X86_64_PLT32 can be optimized to R_X86_64_PC32 if the symbol is non-preemptable.

powerpc32 ELFv1 specifies R_PPC_LOCAL24PC, R_PPC_REL24 and R_PPC_PLTREL24. powerpc64 ELFv2 just uses R_PPC64_REL24.

about Calling Convention

The RISC-V integer calling convention at https://github.com/riscv/riscv-elf-psabi-doc/blob/master/riscv-elf.md says: "Aggregates whose total size is no more than XLEN bits are passed in a register, with the fields laid out as though they were passed in memory. If no register is available, the aggregate is passed on the stack. Aggregates whose total size is no more than 2✕XLEN bits are passed in a pair of registers".
Does this mean that an aggregate larger than 2✕XLEN must be placed on the stack according to the RISC-V calling convention? If so, the calling convention seems to conflict with the code generated by GCC's -fipa-sra optimization, which may use more than two registers when converting aggregate types to scalars. Does this mean that -fipa-sra should be disabled on RISC-V?
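
The size classes in the quoted text can be sketched as a small classifier. This is only an illustration: the enum and function names are made up, and the third case assumes the rule that follows the quoted passage in the psABI, namely that aggregates larger than 2✕XLEN are passed by reference (the caller passes an address), not copied onto the stack as a value.

```c
#include <assert.h>

/* How an aggregate argument travels under the integer calling convention.
   All names here are illustrative, not taken from any compiler. */
enum agg_passing {
    AGG_ONE_REGISTER,   /* size <= XLEN */
    AGG_REGISTER_PAIR,  /* XLEN < size <= 2*XLEN */
    AGG_BY_REFERENCE    /* size > 2*XLEN: assumed to be passed as an address */
};

static enum agg_passing classify_aggregate(unsigned size_bits, unsigned xlen)
{
    assert(xlen == 32 || xlen == 64);
    if (size_bits <= xlen)
        return AGG_ONE_REGISTER;
    if (size_bits <= 2 * xlen)
        return AGG_REGISTER_PAIR;
    return AGG_BY_REFERENCE;
}
```

On RV64, for example, a 16-byte struct falls in the register-pair class and a 24-byte struct in the by-reference class. The sketch ignores register exhaustion, where the quoted text says the aggregate goes on the stack instead.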

Misleading example in floating point calling convention section

The document currently states: "struct { struct { float f[1]; } g[2]; int h; } and struct { float f; float g; int h; } are treated the same"

There are multiple possible fixes depending on the point that is being made. Perhaps { struct { float f[1]; } float g[1]; int h; } is what was intended? However regardless of whether you flatten the struct fields, this is (by my reading) always passed identically to the integer calling convention. struct { struct { float f[1]; } int h[1]; } would perhaps be a better example?

How to relax absolute address?

I get the following disassembly code from an object file

0000000000000000 <init_module>:
    0:   000007b7            lui a5,0x0
             0: R_RISCV_HI20 name
             0: R_RISCV_RELAX    *ABS*
    4:   0007b583            ld  a1,0(a5) # 0 <init_module>
             4: R_RISCV_LO12_I   name
             4: R_RISCV_RELAX    *ABS*
    8:   00000537            lui a0,0x0
             8: R_RISCV_HI20 .LC0
             8: R_RISCV_RELAX    *ABS*
    c:   ff010113            addi    sp,sp,-16
   10:   00050513            mv  a0,a0
             10: R_RISCV_LO12_I  .LC0
             10: R_RISCV_RELAX   *ABS*
   14:   00113423            sd  ra,8(sp)
   18:   00000317            auipc   t1,0x0
             18: R_RISCV_CALL    printk
             18: R_RISCV_RELAX   *ABS*
   1c:   000300e7            jalr    t1
   20:   00813083            ld  ra,8(sp)
   24:   00000513            li  a0,0
   28:   01010113            addi    sp,sp,16
   2c:   00008067            ret

As mentioned in the ABI doc, it makes sense to place an REL entry of type R_RISCV_RELAX in address 18 to relax the function call to a JAL instruction:

Procedure call linker relaxation allows the AUIPC+JALR pair to be relaxed to the JAL instruction when the procedure or PLT entry is within (-2MiB to +2MiB-1) of the instruction pair.

However, I am confused about the REL entry of type R_RISCV_RELAX at address 0. How can absolute addressing be relaxed?

Relocations documentation

The meaning of the relocations' operands in the Details column is not documented. Presumably S stands for symbol and A for addend. But maybe I'm interpreting it incorrectly as something seems to be wrong. For instance:

Enum	ELF Reloc Type	Description	Details
1	R_RISCV_32	Runtime relocation	word32 = S + A
39	R_RISCV_SUB32	32-bit label subtraction	word32 = S - A

My understanding of R_RISCV_32 is that it should replace the existing value at some memory location. Therefore the interpretation of word32 = S + A seems straightforward, with S being the referenced symbol and A being an addend offset. So if your assembly file has something like .word some_symbol+4 you'd get a R_RISCV_32 relocation with word32 = &some_symbol + 4. But then that interpretation doesn't work for R_RISCV_SUB32. If you have .word sym2+4-sym1 you get the two relocations R_RISCV_ADD32 sym2+4 and R_RISCV_SUB32 sym1+0. So shouldn't R_RISCV_SUB32 be documented as word32 -= S + A, or something along those lines?

Please clarify this issue and what the operands should refer to.
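
The reading proposed above (ADD32 adds S + A to the existing word, SUB32 subtracts S + A from it) can be written out as a sketch. The helper names are hypothetical and not taken from any linker; the word is assumed to start at zero before the two relocations are applied.

```c
#include <stdint.h>

/* Proposed semantics: both relocations modify the word already in place. */
static uint32_t reloc_add32(uint32_t word, uint32_t S, uint32_t A)
{
    return word + (S + A);      /* word32 += S + A */
}

static uint32_t reloc_sub32(uint32_t word, uint32_t S, uint32_t A)
{
    return word - (S + A);      /* word32 -= S + A */
}

/* Resolve `.word sym2+4-sym1`, which the assembler emits as
   R_RISCV_ADD32 sym2+4 followed by R_RISCV_SUB32 sym1+0. */
static uint32_t resolve_label_difference(uint32_t sym1, uint32_t sym2)
{
    uint32_t word = 0;
    word = reloc_add32(word, sym2, 4);
    word = reloc_sub32(word, sym1, 0);
    return word;
}
```

With sym1 at 0x1000 and sym2 at 0x1010 this yields 0x14, which is sym2 + 4 - sym1. The documented form word32 = S - A cannot produce that result from the two relocations.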

On RISC-V, lazily bound functions must follow the ABI

See this x86 ABI bug for more details: https://sourceware.org/bugzilla/show_bug.cgi?id=21265

I vote we solve this on RISC-V by just requiring that lazily bound functions follow the standard ABI. This lets us avoid saving the rest of the X registers, and assuming we can do the fixup without any extensions (which seems reasonable, and is what we currently do despite it not actually being enforced) we won't need to save F or V state here.

I don't know where this should go in the manual, @asb?

Document in-memory representation of _Bool

The in-memory representation of _Bool is not documented anywhere.

The SysV64 ABI specifies it as follows:

Booleans, when stored in a memory object, are stored as single byte objects the value
of which is always 0 (false) or 1 (true).

Which section of the ABI would be appropriate to add this type of wording ?

Some parts of the ISA, e.g., the RISC-V Vector ISA, use 0 for false, and ~0 for true. It might be saner to make the representation of _Bool match in all contexts and use ~0 for true everywhere. ~0 is a mask that selects all bits. Being able to just do a "logical and" with a _Bool to select all or no bits would be quite useful and elegant.
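
For comparison, here is a sketch of the two representations discussed above. It assumes the SysV-style rule (one byte holding 0 or 1) and shows that the ~0 mask is one negation away from it. The function names are illustrative only.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(bool) == 1, "sketch assumes a one-byte _Bool");

/* Read back the byte that a _Bool occupies in memory. */
static unsigned char bool_byte(bool b)
{
    unsigned char byte;
    memcpy(&byte, &b, 1);
    return byte;
}

/* Turn a 0/1 boolean into a 0/~0 mask using unsigned negation. */
static uint32_t bool_to_mask(bool b)
{
    return (uint32_t)0 - (uint32_t)b;
}

/* Keep all bits of value when keep is true, none otherwise. */
static uint32_t select_bits(bool keep, uint32_t value)
{
    return value & bool_to_mask(keep);
}
```

The conversion costs one instruction, so a 0/1 in-memory representation does not rule out mask-style selection.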

Support Thread-Local Storage Descriptors (TLSDESC)

TLSDESC (-mtls-dialect=gnu2) improves traditional General Dynamic and Local Dynamic TLS models (-mtls-dialect=gnu). In the most common case that TLS variables are defined in initially-loaded modules, it simplifies the work in __tls_get_addr and (probably more importantly) switches to a custom calling convention that doesn't clobber any register ("preserve any registers they modify"). This speedup is significant.

The linker may relax the TLSDESC code sequence to Initial Exec (targeting an executable, the symbol is preemptable) or Local Exec (targeting an executable, the symbol is non-preemptable) if applicable.

The initial (static) and outstanding (dynamic) relocation types for TLSDESC have to be defined, as well as how static relocation types are relaxed to Initial Exec and Local Exec models.

TLSDESC is currently available on x86, x86-64, arm and aarch64.

x86: https://www.fsfla.org/~lxoliva/writeups/TLS/RFC-TLSDESC-x86.txt
ARM: https://www.fsfla.org/~lxoliva/writeups/TLS/RFC-TLSDESC-ARM.txt (the published paper was referenced by ELF for the Arm 64-bit Architecture (AArch64) and ARM 32-bit)

As I understand it, TLSDESC is a strict improvement, so it might be worth defaulting to TLSDESC and probably deprecating relocation types for General Dynamic/Local Dynamic.

Consider standardising which stack slot fp points to

In the ARM world, there were complaints that GCC and LLVM ended up making different choices regarding the stack slot fp points to. See here and here. There is a desire for "fast unwinding", i.e. unwinding without having to use DWARF metadata.

Given that the frame pointer will likely be omitted in the common case and the prevalence of optimisations like shrink wrapping, this feels like a minor point. However it did cause pain in the ARM community so thought I'd raise it here for consideration.

Size of standard C types should be described

There is a table in the v2.1 spec which should be moved here. It might also be worth expanding to include a mapping of _Complex and _Bool.

The table in the v2.1 specification lists long double as 16 bytes in RV32. Is that correct given that RV32IMFDQ is specifically disallowed in the spec? Or is the idea that it's a handy way to access 128-bit fp implemented via software emulation routines?

_Complex complexity

The AMD64 psABI specifies that _Complex FOO is always exactly the same as struct { FOO re; FOO im; }. What is implemented in RISC-V gcc is significantly more complicated. For instance, a _Complex float can be passed in three different ways on RV64G hard-float:

As two adjacent FPRs, like fa4 and fa5, if there are 0–6 argument slots used already. This passes them in the recoded float format.
Multiplexed into a single 64-bit GPR a7 if there are 7 argument slots used.
Passed on the stack with 32-bit alignment if 8 or more argument slots used.
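
The three cases listed above depend only on how many argument slots are already used, which can be sketched as follows. The enum and function names are hypothetical; the thresholds are exactly those given in the list for RV64G hard-float.

```c
/* Where a _Complex float argument ends up, per the observed gcc behaviour. */
enum cfloat_passing {
    CFLOAT_FPR_PAIR,    /* two adjacent FPRs */
    CFLOAT_PACKED_GPR,  /* both halves packed into a7 */
    CFLOAT_STACK        /* on the stack, 32-bit aligned */
};

static enum cfloat_passing classify_complex_float(unsigned slots_used)
{
    if (slots_used <= 6)
        return CFLOAT_FPR_PAIR;
    if (slots_used == 7)
        return CFLOAT_PACKED_GPR;
    return CFLOAT_STACK;
}
```

Any FFI that interoperates with gcc would need an equivalent of this switch, which is the complexity the issue is weighing.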

If we keep this, it'll add complexity to the psABI document, libffi, and other non-gcc programs which need to interwork with gcc such as llvm and various non-libffi ffis. If we don't it'll require gcc changes and break compiled programs with complex number arguments (probably just LAPACK at this point). That complexity would only be justifiable if it gave a "sufficiently large" performance benefit to "sufficiently many" programs, but I don't have a great understanding of the tradeoffs here.

Attn @aswaterman @kito-cheng because this is a gcc calling convention issue. I am going to be filing a few more issues for complicated spots in the calling convention so we can decide which to keep.

The MVP calling convention is something close to the following:

1. All arguments which are larger than 2 * XLEN, have nontrivial copy constructors, or that are larger than XLEN and contain (transitively) a floating point field larger than FLEN, are passed in memory and replaced in the argument list by a pointer.
2. If the return value would need to be passed in memory if it were an argument, it is written into a caller-allocated buffer which becomes the first argument, and then replaced by void.
3. All arguments are expanded to a minimum of XLEN bits size and alignment with unspecified padding.
4. All arguments are placed in a notional struct, following normal alignment rules (which will always be XLEN or 2 * XLEN)
5. The first 8 * XLEN bits of the struct are passed in registers a0 - a7, except for rule (7).
6. If an argument narrower than XLEN consists of a single field of integral type, it is widened according to the sign of its type up to 32 bits and then sign extended to XLEN.
7. If FLEN >= XLEN and a [XLEN, FLEN]-aligned portion of the first 8 * XLEN of the argument list represents a field of floating point type (possibly inside a struct, but not if inside a union), that argument will be placed in the faX register of the same number instead. If FLEN = 2 * XLEN this will leave both aX and a(X+1) empty. (If FLEN = 4 * XLEN then this rule cannot trigger because floats will be passed in memory by rule 1.) This rule is not applied for arguments to a variadic function which do not correspond to a formal parameter.
8. sp points at the portion of the argument struct which is not passed in registers.
9. On exit, the return value is mapped to registers as if it were the only argument of a function, following rules 3–7.

(Not 100% certain about the sign extension rule here; I'd also like to write up a rationale in a bit to see if this makes sense)
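
The widening rule for narrow integer arguments (extend to 32 bits according to the type's signedness, then sign-extend to XLEN) can be sketched for XLEN = 64. The function name and its parameters are illustrative assumptions, and the rule itself is the tentative one stated above.

```c
#include <assert.h>
#include <stdint.h>

/* raw holds the argument in its low `width` bits (1..32).
   Returns the 64-bit register image under the two-step rule. */
static uint64_t extend_int_arg_rv64(uint32_t raw, unsigned width, int is_signed)
{
    assert(width >= 1 && width <= 32);
    uint32_t v = raw;
    if (width < 32) {
        /* Step 1: widen to 32 bits according to the type's signedness. */
        uint32_t mask = (UINT32_C(1) << width) - 1;
        v &= mask;
        if (is_signed && (v >> (width - 1)) != 0)
            v |= ~mask;
    }
    /* Step 2: sign-extend bit 31 to 64 bits, whatever the type's signedness. */
    if (v & UINT32_C(0x80000000))
        return UINT64_C(0xFFFFFFFF00000000) | v;
    return v;
}
```

The notable consequence is that an unsigned 32-bit value with bit 31 set, such as 0x80000000, arrives as 0xFFFFFFFF80000000, while an unsigned char 0xFF arrives as 0xFF.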

R_RISCV_32_PCREL relocation is undocumented

I see binutils 2.29 added a new R_RISCV_32_PCREL relocation. Could somebody please document its purpose and semantics?

Too many floating point ABI variants?

The rules for the floating point calling convention result in rather a high number of mutually incompatible ABIs. Is this desirable?

Currently we have:

RV32E
RV32I
RV32IF
RV32G (RV32IFD)
RV64I
RV64IF
RV64G (RV64IFD)
RV64IFDQ

Plus of course RV128 in the future. There is no RV32IFDQ as this combination is disallowed by the ISA specification.

Clarity on rules for bitfields in fp+int structs eligible for the FP calling convention

There is a mention for bitfields, but it's still slightly under-documented.

The behaviour I'm observing from GCC is that:

struct s1 { float f; long long int j : 32; } and similar are passed as FPR+GPR on ILP32D (the doc should clarify that this is still allowable, despite long long int being 64 bits).
A struct with a float + any number of zero-width bitfields is passed in an FPR, but passed according to the integer calling convention if there is e.g. a zero-width bitfield + an int or non-zero-width bitfield

Can someone please confirm the above are expected behaviour, suitable for documentation in the psabi-doc?

Alignment of scalar values should be specified

The calling convention description is incomplete without specifying the alignment of scalar values. Of course the alignment is usually sizeof(ty), but some 32-bit ABIs allow doubles to be 4-byte aligned.

Claiming the calling convention

I've just done a bunch of tests to nail this down for the libffi port, so I should probably document this very soon.

Spilling fp registers with length > FLEN

Section "Hardware floating-point calling convention" specifies only the usage of argument registers (argument passing and value returning). It should also specify that floating point values wider than FLEN stored in callee-saved registers aren't preserved across function calls. This allows registers to be spilled by code compiled without support for Q or D.

Sign extension of int when passing int+fp struct?

When using the hard FP ABI, a struct containing just an int+fp may be passed in one GPR and one FPR. If the integer is < XLEN, is its extension to XLEN bits specified? i.e. is it passed as if it were a scalar parameter (extended according to the sign of the type up to 32-bits, then sign extended to XLEN), or are the bits beyond the width of the original type undefined?

DWARF register mappings

The DWARF specification says that ABI implementations should define the hardware register -> DWARF register mappings (§2.6.1.1.2). It seems that this is not included in the RISC-V psABI, nor can I seem to find this mapping anywhere else in official RISC-V documentation. As an example, the x86-64 psABI documents the register mapping in §3.6.2.

For context, this is needed in Mono to do unwinding. E.g. for x86-64: https://github.com/mono/mono/blob/aec2773e1db0799479161688d6161f5d5ce586a3/mono/mini/unwind.c#L46-L51
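
For illustration, a mapping of the kind requested might look like the sketch below. The numbering shown (integer registers x0-x31 as DWARF 0-31, floating-point registers f0-f31 as DWARF 32-63) is an assumption based on what existing toolchains appear to use and is not stated anywhere in the text above. The function names are hypothetical.

```c
#include <assert.h>

/* Assumed numbering: x0..x31 map to DWARF registers 0..31. */
static unsigned dwarf_regnum_x(unsigned n)
{
    assert(n < 32);
    return n;
}

/* Assumed numbering: f0..f31 map to DWARF registers 32..63. */
static unsigned dwarf_regnum_f(unsigned n)
{
    assert(n < 32);
    return 32 + n;
}
```

An unwinder such as Mono's would use a table like this to translate CFI register numbers into saved-register slots.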

Clarification For Empty Struct Handling in C++

This issue is about my interpretation of the ABI and GCC's implementation of the ABI. GCC is definitely broken in one specific case, but what the fix is depends on the ABI, and it's not obvious to me exactly what the existing text means.

Let's start with an example of what GCC does wrong. This is actually a C++ test, not C, which is important:

#define MAKE_STRUCT_PASSING_TEST(type,val)                              \
  static struct struct_ ## type ## _t                                   \
  {                                                                     \
    struct { } e;                                                       \
    struct { type f; } s;                                               \
  } global_struct_ ## type = { {}, { val } };                           \
                                                                        \
  static bool                                                           \
  check_struct_ ## type (struct_ ## type ## _t obj)                     \
  {                                                                     \
    return (obj.s.f == global_struct_ ## type .s.f);                    \
  }                                                                     \
                                                                        \
  int                                                                   \
  main ()                                                               \
  {                                                                     \
    bool result = check_struct_ ## type ( global_struct_ ## type );     \
    return result ? 0 : 1;                                              \
  }

MAKE_STRUCT_PASSING_TEST(float,2.5)

The relevant part of the ABI document is this:

Empty structs or union arguments or return values are ignored by C compilers which support them as a non-standard extension. This is not the case for C++, which requires them to be sized types.

As I said, this is a C++ test, so the empty sub-struct will have size 1-byte, but with trailing padding will account for 4-bytes in the containing struct.

Currently GCC is a bit of a mess in its handling of these cases. In the example above with a float argument GCC tries to ignore the empty struct, but then incorrectly passes the float field (it actually passes the empty struct instead). If the above example is changed to contain an integer field then GCC correctly passes the full struct, including the empty part and the integer field using two integer registers.

The quoted part of the ABI document doesn't really say what should happen with empty C++ structs, it seems to me the ABI document simply states that such structures are non-zero sized, which is true.

However, as far as I understand it, the content of that non-zero sized space is undefined, and as such a compiler could ignore the empty structure in C++ just as it does in C.

In the above case the caller could pass the non-empty fields and the callee can simply make up content with which to fill the non-zero sized empty struct, this feels just as valid as passing over some undefined bytes.

I currently see two routes forward, one would be we change the above quoted part to read something like:

Empty structs or union arguments or return values are ignored by C compilers which support them as a non-standard extension. In C++ empty structs and unions are required to be sized types, however, as their content is undefined, they are similarly ignored when passing arguments.

We then fix GCC to correctly ignore the empty structs in C++, this would fix the float case which is just broken, but would be ABI breaking in GCC for the case of passing an empty struct and an integer.

Alternatively, we make the intention of the ABI clearer by changing the above quoted text to read something like:

Empty structs or union arguments or return values are ignored by C compilers which support them as a non-standard extension. This is not the case for C++, which requires them to be sized types, and these are passed as integer arguments.

We then fix GCC to correctly pass over the empty structs in C++, this would fix the float case which is currently just broken, and would maintain the existing ABI in GCC for the integer case. However, the final ABI would be slightly less efficient in this case (which is probably an edge case anyway, so we probably don't care too much about efficiency).

Stack boundary for RV32E

The stack pointer need only be aligned to a 32-bit boundary.

Previously we only required the stack pointer to be aligned to a 32-bit boundary. However, we now allow RV32EF and RV32EFD, and this might be a problem for RV32EFD.

Here are two possible solutions:

Change to a 64-bit boundary.
Define more ABIs? ILP32E, ILP32EF and ILP32ED.
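
To make the concern concrete, here is a sketch showing that a stack pointer satisfying the 32-bit rule need not be 8-byte aligned. It assumes that a double spill slot is wanted on an 8-byte boundary, which is an assumption of this example and not something the text above settles. The names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Round sp down to a power-of-two boundary given in bytes. */
static uint32_t align_sp_down(uint32_t sp, uint32_t alignment_bytes)
{
    assert(alignment_bytes != 0 &&
           (alignment_bytes & (alignment_bytes - 1)) == 0);
    return sp & ~(alignment_bytes - 1);
}

/* True if an 8-byte double could be stored at sp without misalignment. */
static int can_hold_aligned_double(uint32_t sp)
{
    return (sp % 8) == 0;
}
```

An sp of 0x1004 is valid under a 32-bit boundary but fails the 8-byte check. Code would have to realign it, to 0x1000 here, or accept misaligned accesses.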

Clarifying the behaviour of "callee-save" floating point registers in the soft-float ABI

This document should clarify whether fs0-fs11 should be considered callee-save when compiling for the soft-float ABI. e.g. -march=rv32imaf -mabi=ilp32.

I would suggest that all floating point registers must be considered temporaries when compiling for the soft float ABI. Consider linking object file F.o compiled with -march=rv32imaf -mabi=ilp32 and D.o compiled with -march=rv32imafd -mabi=ilp32. If code in D.o stores a double in fs0 and code in F.o attempts to spill it, the value would obviously be corrupted. We could of course define new bits in e_flags and linker behaviour to try catch these problems at link time (ensuring code compiled with FLEN=0 is only linked with code compiled with one other FLEN value, e.g. a mix of objects with flen=0 and flen=32 is fine, but not flen=0 and flen=32 and flen=64).
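
The corruption described above can be simulated on raw register bits. The sketch assumes the F.o code saves fs0 with a 32-bit store and reloads it with a 32-bit load, and that on a hart with FLEN=64 the load NaN-boxes the value by setting the upper 32 bits to all ones. The function name is hypothetical.

```c
#include <stdint.h>

/* Model of an FLEN=32 callee "preserving" a 64-bit register. */
static uint64_t spill_and_reload_flen32(uint64_t fs0_bits)
{
    uint32_t stack_slot = (uint32_t)fs0_bits;          /* fsw: low 32 bits only */
    return UINT64_C(0xFFFFFFFF00000000) | stack_slot;  /* flw: NaN-boxed reload */
}
```

The double 1.5 has the bit pattern 0x3FF8000000000000. After the round trip the register holds 0xFFFFFFFF00000000, so the caller's value is lost even though the callee followed its own save/restore rules.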

Customized relocation type

In our current CPU design, we extend the RISC-V ISA by adding several instructions, such as GP-implied load/store. To handle these instructions, we also extend the set of relocation types. However, there is currently no rule or guideline for adding customized relocation types. To avoid conflicts, in our current design new relocation IDs are assigned starting from 255 in decreasing order. Does anyone have an idea or proposal for dealing with this issue?

Documentation does not capture different addresses in R_RISCV_PCREL_HI20 and R_RISCV_PCREL_LO12_[IS]

The documentation says the following about the PCREL relocations:

23 R_RISCV_PCREL_HI20 PC-relative reference %pcrel_hi(symbol) (U-Type)
24 R_RISCV_PCREL_LO12_I PC-relative reference %pcrel_lo(symbol) (I-Type)
25 R_RISCV_PCREL_LO12_S PC-relative reference %pcrel_lo(symbol) (S-Type)

This led me to believe that the same data address would be referenced in the HI20 as the LO12.

However, upon compiling an assembly file containing an access to a global of the following form:

ld t4,my_global+8

and inspecting the relocation table generated with elfdump, I discovered that the symbols referenced by the PCREL_HI20 and the PCREL_LO12_I relocations were different:

entry: 4
        r_offset: 0x10
        r_info: 0x400000017
        r_addend: 8

entry: 5
        r_offset: 0x10
        r_info: 0x33
        r_addend: 8

entry: 6
        r_offset: 0x14
        r_info: 0x2900000018
        r_addend: 0

Looking at these symbols, the one referenced by the PCREL_HI20 was the global:

entry: 4
        st_name: my_global
        st_value: 0
        st_size: 16
        st_info: STT_OBJECT STB_LOCAL
        st_shndx: 3

But the one referenced by the PCREL_LO12_I was a label into the text section:

entry: 45
        st_name: .L0 
        st_value: 0x44
        st_size: 0
        st_info: STT_NOTYPE STB_LOCAL
        st_shndx: 1

It appears that the PCREL_HI20 part refers to the data address being accessed, while the PCREL_LO12 part refers to a label indicating the start PC being accessed relative to. After going on this journey of discovery, I did find that the examples in the RISCV asm documentation could have given me this clue, although the example that describes this behavior points it out very subtly:

1:      auipc a0, %pcrel_hi(msg)    # load msg(hi)
        addi a0, a0, %pcrel_lo(1b)  # load msg(lo)

It's pretty easy to not notice that 1b is in pcrel_lo rather than msg, especially if you weren't looking for this piece of information specifically.

Would it be possible to update the documentation to indicate this behavior more clearly?
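
The arithmetic behind the two relocations may help explain the behaviour. Both halves are derived from the same offset, the target address minus the address of the auipc, which is why the LO12 relocation has to name the auipc's label and not the data symbol. The sketch below uses hypothetical function names and 32-bit addresses for brevity.

```c
#include <stdint.h>

/* Low 12 bits of (target - auipc_pc), sign-extended as the load or addi will see them. */
static int32_t pcrel_lo12(uint32_t target, uint32_t auipc_pc)
{
    uint32_t off = target - auipc_pc;
    return (int32_t)((off & 0xFFFu) ^ 0x800u) - 0x800;
}

/* Upper 20 bits for the auipc, adjusted so that (hi << 12) + lo == offset. */
static uint32_t pcrel_hi20(uint32_t target, uint32_t auipc_pc)
{
    uint32_t off = target - auipc_pc;
    int32_t lo = pcrel_lo12(target, auipc_pc);
    return ((off - (uint32_t)lo) >> 12) & 0xFFFFFu;
}
```

As a made-up example, with the auipc at 0x44 and the target at 0x10008, the offset is 0xFFC4, which splits into hi20 = 0x10 and lo12 = -60. Computing the low part relative to the address of the second instruction would give a different, wrong value.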

No mention of which register is the frame pointer (fp)

There's a missing trivial indication that x8 (s0) is the frame pointer. See for example https://riscv.org/wp-content/uploads/2015/01/riscv-calling.pdf .

Predefined macros

We have a fair number of RISC-V predefined macros which are either useless or redundant with standard macros or gcc extensions. We have an opportunity to remove them now and then spec the macros that RISC-V C compilers should implement. More details exist on private emails right now, but I'll make a PR soon.

Vararg details are underspecified

At the bare minimum we should specify the va_list struct. The document should probably go much further and give a detailed run-down, much like the AArch64 and x86-64 psABI docs.

Document _Alignof(max_align_t)

The "Default ABIs and C type sizes" section does not document the value of _Alignof(max_align_t), which is implementation-defined and part of the platform ABI.

long double has an alignment requirement of 16, which forces _Alignof(max_align_t) >= 16 on the 32-bit and 64-bit ISAs.

All memory allocations returned by malloc for sizes >= _Alignof(max_align_t) need to be aligned to a _Alignof(max_align_t) boundary, so from this POV, it makes sense to pick it "as large as necessary, as small as possible".

The SysV 32 and 64-bit ABIs set _Alignof(max_align_t) == 16, and this would also make sense here AFAICT.

The only reason to pick a higher value is if we plan to have types in the ISA with a fundamental alignment larger than 16 bytes according to the C standard.
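
The "as large as necessary, as small as possible" choice amounts to taking the largest alignment among the fundamental types. A small Python sketch of that reasoning follows. The value 16 for long double is the one cited above; the other entries are assumed from the psABI C type table for the LP64 ABIs, and the names are illustrative only.

```python
# Alignments in bytes, assumed for the LP64 ABIs.
# long double = 16 is the requirement cited in this issue.
LP64_ALIGNMENTS = {
    "char": 1,
    "short": 2,
    "int": 4,
    "long": 8,
    "long long": 8,
    "void *": 8,
    "float": 4,
    "double": 8,
    "long double": 16,
}

def max_fundamental_alignment(alignments):
    """Smallest value that satisfies every listed type's alignment."""
    return max(alignments.values())

if __name__ == "__main__":
    print(max_fundamental_alignment(LP64_ALIGNMENTS))
```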

double under -m32 -mfloat-abi=soft

The same reasoning as for long double under -m64 would apply here and double values, being = 2 * XLEN and > FLEN, should be passed exclusively in memory.

PLT for RVE

Hi, guys

RVE has no t3 register, so I propose the following: replace t2 with t0 and t3 with t2 in the PLT entries.

Dong

...The first entry in the PLT occupies two 16 byte entries:

1:  auipc  t0, %pcrel_hi(.got.plt)
    sub    t1, t1, t2               # shifted .got.plt offset + hdr size + 12
    l[w|d] t2, %pcrel_lo(1b)(t0)    # _dl_runtime_resolve
    addi   t1, t1, -(hdr size + 12) # shifted .got.plt offset
    addi   t0, t0, %pcrel_lo(1b)    # &.got.plt
    srli   t1, t1, log2(16/PTRSIZE) # .got.plt offset
    l[w|d] t0, PTRSIZE(t0)          # link map
    jr     t2

Subsequent function entry stubs in the PLT take up 16 bytes and load a function pointer from the GOT. On the first call to a function, the entry redirects to the first PLT entry which calls _dl_runtime_resolve and fills in the GOT entry for subsequent calls to the function:

1:  auipc  t2, %pcrel_hi(function@.got.plt)
    l[w|d] t2, %pcrel_lo(1b)(t2)
    jalr   t1, t2
    nop
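
For readers following the header comments, here is a rough Python sketch of the t1 arithmetic, which is the same whichever scratch registers are used. It is only my reading of the comments: it assumes the jalr in a stub leaves t1 pointing at the stub's trailing nop (stub address + 12) and that the value subtracted in the header is the start of the PLT. The function and constant names are made up.

```python
PLT_HEADER_SIZE = 32   # the first PLT entry occupies two 16-byte slots
PLT_ENTRY_SIZE = 16    # each subsequent function stub

def got_plt_offset(plt_index, ptr_size):
    """Mirror the header's t1 computation for the plt_index-th stub."""
    # After "sub": t1 is the return address relative to the PLT start,
    # i.e. shifted .got.plt offset + hdr size + 12.
    t1 = PLT_HEADER_SIZE + plt_index * PLT_ENTRY_SIZE + 12
    # "addi t1, t1, -(hdr size + 12)": shifted .got.plt offset.
    t1 -= PLT_HEADER_SIZE + 12
    # "srli t1, t1, log2(16/PTRSIZE)": scale 16-byte stubs down to pointers.
    shift = (PLT_ENTRY_SIZE // ptr_size).bit_length() - 1
    return t1 >> shift

if __name__ == "__main__":
    for ptr_size in (4, 8):
        print(ptr_size, [got_plt_offset(i, ptr_size) for i in range(4)])
```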

Using RVE ABI with rv32i.

The current docs say
The RV32E calling convention may only be used with the RV32E ISA, hence the role of registers x16-x31 and f0-f31 is not defined. A future version of this specification may relax this constraint.

But it is easy to make the RVE ABI work with rv32i by making the extra registers call-clobbered (i.e. caller-saved). This is how FP regs already work when using, for instance, the ilp32 ABI with the rv32gc architecture. This is how the proposed EABI works with registers x16-x31. And this is also how the gcc RVE implementation works, though it needed a small bug fix to make a6 and a7 usage depend on the ABI instead of the architecture. So I think this should be allowed.

Floating-Point Control and Status Register (FCSR) not mentioned

The Floating-Point Control and Status Register (FCSR) contains reserved bits, the rounding mode (frm) and the accrued exception flags (fflags).

The frm field is probably a system-wide setting.

What about the fflags? Are they supposed to be preserved across function calls?

Two clarifications.

Hi,

I assume that packed structs are always passed on the stack? Can't find it documented?

Also, an RV32IMAD (no F extension), should it still have float arguments in float registers? (Unlikely combination, but should be documented).

Syscall ABI for RV32E

The current syscall ABI implemented in newlib passes arguments in $a0~$a3 and the syscall number in $a7 [1].

However, $a7 is not available in RV32E, so shall we put it in $a4 or $a5 instead?

[1] https://github.com/riscv/riscv-newlib/blob/riscv-newlib-2.4.0/libgloss/riscv/machine/syscall.h

Document implications of signal handling

There's no red zone so the stack pointer must be decreased before any data can be stored in the frame. From openhwgroup/cv32e40p#10 .

Should we constrain the target label of a %pcrel_lo to be in the same section?

From the wording, it is unclear to me if the following is acceptable or not

.text
.globl _start

foo:
    addi a1, a0, %pcrel_lo(label)
    ret

.section .text.new_section
_start:
label:
    auipc a0, %pcrel_hi(bar)
    j foo

bar:
    ret

GNU ld complains with dangerous relocation: %pcrel_lo missing matching %pcrel_hi, so perhaps there is an underlying assumption that the reference and the label must be found in the same section?

I think the assumption is reasonable but I fail to see the text explicitly constraining this. For example, something like this

The `R_RISCV_PCREL_LO12_I` or `R_RISCV_PCREL_LO12_S` relocations contain
a label pointing to an instruction with a `R_RISCV_PCREL_HI20` relocation
entry that points to the target symbol:

 - At label: `R_RISCV_PCREL_HI20` relocation entry ⟶ symbol
- - `R_RISCV_PCREL_LO12_I` relocation entry ⟶ label
+ - `R_RISCV_PCREL_LO12_I` relocation entry ⟶ label. The reference to the label
+   and the label definition must be in the same section.

What do you think?

Thanks!

Document the non-atomic relaxation pitfall

This should be in the main document somewhere but right now I'm writing this primarily for @PkmX .

Consider the following:

.option norvc
.option relax
__global_pointer$:
.skip 2028
L1:
addi a0, a0, %lo(L3)  // (D)
L2:
lui a1, %hi(L1) // (A)
addi a1, a1, %lo(L1) // (B)
lui a0, %hi(L3) // (C)
j L1
L3:

Relaxation of one instruction at a time in address order with immediate update of symbol values will break this code. When (D) is visited, L3 - __global_pointer$ is out of range for a 12-bit immediate so the addi has to be kept, but when (C) is visited that offset is in-range because (A) + (B) can be relaxed to a single instruction. Relaxing (C) then causes a0 to be not set when control flow reaches (D).

It is necessary for all linked %hi / %lo pairs to make a consistent decision regarding whether the relaxation should be performed; since there's no metadata explicitly linking the pairs, the easiest way to arrange this is with a two-pass algorithm:

1. Visit each relaxable relocation, determine if it can be relaxed, and perform byte modifications but do not change any symbol values. The current binutils implementation uses R_RISCV_DELETE pseudo-relocations to record bytes which need to be deleted in the next phase; since it is never written into object files it is not an actual ABI relocation.
2. Perform all pending byte deletions.

As an additional benefit, the two-pass algorithm is linear time and somewhat parallelizable.
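
The two passes can be sketched in Python roughly as follows. This is a toy model, not the binutils code: relocations are (offset, bytes_saved, target) tuples sorted by offset, the only relaxation modelled is the gp-relative one (signed 12-bit offset from __global_pointer$), and all names are hypothetical. The point is that every decision in pass 1 reads the frozen symbol table, so both halves of a %hi / %lo pair see the same distance.

```python
from bisect import bisect_left
from itertools import accumulate

def relax_two_pass(relocs, symbols, gp):
    """relocs: list of (offset, bytes_saved, target), sorted by offset.
    symbols: dict name -> address. Returns (pending deletions, new symbols)."""
    # Pass 1: decide with frozen symbol values; only record the deletions
    # (this list plays the role of the R_RISCV_DELETE pseudo-relocations).
    pending = [(off, saved) for off, saved, target in relocs
               if -2048 <= symbols[target] - gp < 2048]

    # Pass 2: apply every deletion, then shift each symbol by the number
    # of bytes removed strictly before it.
    offsets = [off for off, _ in pending]
    removed = [0] + list(accumulate(saved for _, saved in pending))
    new_symbols = {name: addr - removed[bisect_left(offsets, addr)]
                   for name, addr in symbols.items()}
    return pending, new_symbols

if __name__ == "__main__":
    # Layout of the example above with __global_pointer$ at 0.
    symbols = {"L1": 2028, "L2": 2032, "L3": 2048}
    relocs = [
        (2028, 0, "L3"),  # (D) addi %lo(L3)
        (2032, 4, "L1"),  # (A) lui  %hi(L1), deleted if relaxed
        (2036, 0, "L1"),  # (B) addi %lo(L1)
        (2040, 4, "L3"),  # (C) lui  %hi(L3), deleted if relaxed
    ]
    print(relax_two_pass(relocs, symbols, gp=0))
```

In this model (C) and (D) are both left alone even though L3 ends up in range after the deletions, which is the consistent outcome the pair needs.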

Alignment requirements of small aggregates/scalars passed on the stack should be specified

Imagine I have a double or a struct containing a double on an RV32I system. This probably has 8-byte alignment (see #19). In the case that there are no GPRs available, will the stack slot it is written to be 4 or 8-byte aligned? (i.e. aligned according to the original type, or according to xlen).

There's a related case with a struct containing an int32 on an RV64I system. The struct has 4-byte alignment, but when I fail to allocate a GPR for it there's the choice of assigning a 4-byte sized, 4-byte aligned stack slot, an 8-byte sized, 4-byte aligned stack slot, or an 8-byte sized, 8-byte aligned stack slot. I think "bits past the end of an aggregate whose size in bits is not divisible by XLEN, are undefined" suggests an 8-byte stack slot, but the alignment is still unspecified.

Is the intended alignment rule max(xlen_align, object_align) for these cases?
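
To make the question concrete, here is a small Python sketch of what the max(xlen_align, object_align) rule would give for the two cases above. It is the candidate rule being asked about, not something the document currently states. Rounding the slot size up to a multiple of XLEN is an extra assumption based on the "bits past the end" quote, and the function name is made up.

```python
def stack_slot(next_offset, size, align, xlen_bytes):
    """Return (start, end) of the stack slot for one argument.

    Candidate rule: slot alignment = max(XLEN alignment, object alignment);
    slot size is assumed to be rounded up to a multiple of XLEN.
    """
    slot_align = max(xlen_bytes, align)
    start = (next_offset + slot_align - 1) // slot_align * slot_align
    slot_size = (size + xlen_bytes - 1) // xlen_bytes * xlen_bytes
    return start, start + slot_size

if __name__ == "__main__":
    # double on RV32 (XLEN = 4 bytes) when the next free offset is 4.
    print(stack_slot(4, size=8, align=8, xlen_bytes=4))
    # struct { int32_t } on RV64 (XLEN = 8 bytes) at offset 0.
    print(stack_slot(0, size=4, align=4, xlen_bytes=8))
```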