In the ARM world, there were complaints that GCC and LLVM ended up making different ch

The GCC docs say <div class="snippet-clipboard-content notranslate position-relati

Consider standardising which stack slot fp points to about riscv-elf-psabi-doc HOT 14 CLOSED

riscv-non-isa commented on July 20, 2024

Consider standardising which stack slot fp points to

from riscv-elf-psabi-doc.

Comments (14)

palmer-dabbelt commented on July 20, 2024

I think we should standardize this. I see no reason to pick one approach over the other, is there any reason this isn't an arbitrary choice? If it is arbitrary, can we just pick what GCC is doing and write it down?

Here's my test

$ cat frametest.c 
extern void inc(int *i);

int fptest(void)
{
	int i = 0;
	inc(&i);
	return i;
}
$ riscv64-unknown-linux-gnu-gcc frametest.c -o - -S -O3 -fno-omit-frame-pointer
	.file	"frametest.c"
	.option nopic
	.text
	.align	2
	.globl	fptest
	.type	fptest, @function
fptest:
	add	sp,sp,-32
	sd	s0,16(sp)
	add	s0,sp,32
	add	a0,s0,-20
	sd	ra,24(sp)
	sw	zero,-20(s0)
	call	inc
	ld	ra,24(sp)
	lw	a0,-20(s0)
	ld	s0,16(sp)
	add	sp,sp,32
	jr	ra
	.size	fptest, .-fptest
	.ident	"GCC: (GNU) 6.1.0"

which looks to me like the return link would be at -8(fp), so we're matching what LLVM does for ARM. You should verify that, though, as I'm very bad at arithmetic.

from riscv-elf-psabi-doc.

asb commented on July 20, 2024

This is the most usual thing to do, apparently the ARM GCC behaviour is a holdover from the old procedure calling standard (ref).

One of the *Sanitizer devs kindly gave me some extra feedback on this - he made the point that although fp elimination is common, typically with a sanitizer the top frames of a stack trace are in code compiled by you anyway. Plus even besides the performance issue, JITted code often doesn't have the DWARF info you might otherwise use to determine what fp is pointing to.

from riscv-elf-psabi-doc.

sorear commented on July 20, 2024

We'd get better compression if we used positive offsets from the frame pointer. Is that possible to arrange without breaking whatever invariant LLVM relies upon?

from riscv-elf-psabi-doc.

jim-wilson commented on July 20, 2024

You get a better testcase if you add an alloca call, or alternatively, use a variable length array, to force the compiler to use the fp for local variables. But alloca is easier.

gamma02:2025$ cat tmp.c
#include <alloca.h>

extern void inc(int *i);

int fptest(void)
{
  int i = 0;
  inc(&i);

  int *ptr = alloca (64);
  *ptr = 0;
  inc(ptr);

  return *ptr + i;
}
gamma02:2026$ cat tmp.s
	.file	"tmp.c"
	.option nopic
	.text
	.align	1
	.globl	fptest
	.type	fptest, @function
fptest:
	addi	sp,sp,-48
	sd	ra,40(sp)
	sd	s0,32(sp)
	sd	s1,24(sp)
	addi	s0,sp,48
	sw	zero,-36(s0)
	addi	a0,s0,-36
	call	inc
	addi	sp,sp,-80
	mv	s1,sp
	sw	zero,0(s1)
	mv	a0,s1
	call	inc
	lw	a0,0(s1)
	lw	a5,-36(s0)
	addw	a0,a0,a5
	addi	sp,s0,-48
	ld	ra,40(sp)
	ld	s0,32(sp)
	ld	s1,24(sp)
	addi	sp,sp,48
	jr	ra
	.size	fptest, .-fptest
	.ident	"GCC: (GNU) 7.2.0"
gamma02:2027$

The compiler does not use negative offsets in the prologue/epilogue, as they are all sp relative. We do use a negative offset from fp to access the local variable i.

The fp currently points at the top of the register save area, which is above the local variable area. We could use a positive fp offset if we moved the fp down below the register save area and the local variable area. However, if we did that, it would no longer be possible to do stack frame unwinding without access to dwarf2 unwind info, as the offset to the frame pointer would now include local var and register save areas, which are both variable size. The compiler knows the sizes of them, but a run-time stack unwinder would not. It is not safe to assume that you can calculate the size by disassembling code, because optimizations like shrink wrapping means the frame size may change during the function. And optimizations like basic block reordering means you can't assume a linear scan of the code will work. Also, functions can have multiple entry and exit points. Etc. If you want easy stack unwinding ala x86, then the frame pointer needs to point to the top of the register save area so that the ra is always a fixed offset from it.

MIPS made the mistake of making the FP point to the bottom of the stack frame, and stack unwinding has never been reliable on MIPS without the dwarf2 unwind info. Some people (like Cisco) have gone to great lengths to try to make this work without dwarf2 unwind info, and they have not been successful.

Of course, easy stack unwinding is not guaranteed by any standard. It is just that so many people have written so much code for x86 where easy stack unwind is possible, that they now expect every other target to provide easy stack unwinding also. If you want to someday replace x86, then you need to provide this feature, which means fp must be at the top of the stack frame near where the ra is stored.

from riscv-elf-psabi-doc.

sorear commented on July 20, 2024

Do unwinders actually need the CFA? As long as the relative relationship between the fp and ra is maintained, can the pair not go at the bottom of the frame instead of the top?

from riscv-elf-psabi-doc.

jim-wilson commented on July 20, 2024

You need the CFA if there is a variable offset between fp and the fp/ra save area. You get a variable offset if the fp doesn't point at the register save area.

Moving the fp/ra to the bottom of the frame sounds odd. That would mean having two register save areas. Unless maybe we put the local variable area above the register save area. Moving stuff around has the risk of creating accidental ABI incompatibilities though.

If we only move the fp/ra save area, that breaks -msave-restore. If we move the entire register save area, I think that still breaks -msave-restore, just not as badly, as we would need different offsets for fp/ra, putting them at the bottom instead of the top, which would be an ABI change.

Moving either fp/ra or all of them probably breaks -fstack-protector, as you can no longer use a single canary to separate the compiler controlled part of the frame from the user controlled part of the frame. This is probably not a commonly used option, but for people that need it, living without it is not an option, as code security is important.

There may also be other stuff that breaks. I just mentioned the first two I thought of.

from riscv-elf-psabi-doc.

sorear commented on July 20, 2024

If the Zcmp extension is ratified in its current public review form, gcc and clang will very likely have an ABI change flag day (only affecting -fno-omit-frame-pointer users); see linked issue.

from riscv-elf-psabi-doc.

enh-google commented on July 20, 2024

the current state is pretty annoying here --- it seems like GCC made the "wrong" choice initially (that is; did something different to every other widely-used architecture[1]), and since then llvm has copy & pasted that mistake, and now we're getting to copy & paste it into any code that uses __builtin_frame_address(0). (and, even more annoyingly, since this does just seem to be "repeated copy & paste of an accident", there aren't any good references on the internet. https://maskray.me/blog/2020-11-08-stack-unwinding is about as good as it gets with its "On RISC-V and LoongArch[2], use fp[-2] instead of fp[0].", or there's the llvm hack https://reviews.llvm.org/D87579 where the developers wonder why this is, but no-one has anything better than "determined experimentally".)

you might think this doesn't matter much as divergences from other architectures go, since it's only for debugging code, and only "quick and dirty" debugging code at that (since a serious crash debugger needs to use a proper DWARF-based unwind). but -- as someone currently going through an OS looking at this -- you'd be surprised how many people want to debug their programs, and you'd be surprised how many want a "quick and dirty" (well, "quick" anyway) solution. and all of those currently need a risc-v specific hack.

is there anything we can do about this mistake[3] before it gets any more entrenched? (the "we only care about 2c microcontrollers" assumption that no-one wants/needs -fno-omit-frame-pointer on riscvarchive/riscv-code-size-reduction#194 is sad too, and makes me think we're stuck with this mistake now, so we may as well at least document it?)

if not, can we at least document "yeah, someone messed up, don't try to find a reason behind this [or question why i'm sending you this dodgy-looking patch for your nice clean portable-to-everything-except-risc-v code], it's just the way it is". the current situation of "risc-v is different from everyone else, but no-one knows why, and there's no documentation easily findable on the internet" seems like the worst of all worlds.

having looked at llvm trying to find a comment explaining why we need __builtin_frame_address(0) - 16, i learned that sparc is so different it needs its own file. but i'm happy to call sparc dead commercially, and they have the excuse of being older than many software engineers. risc-v is too new to use that argument.
so hilariously, this has even been copy & pasted into a different architecture now too!
i know this is an arbitrary decision in a technical sense, but as "speaker for the ecosystem that wants the same code to 'just work' across multiple architectures", deviation that doesn't actually gain anything is a mistake.

from riscv-elf-psabi-doc.

palmer-dabbelt commented on July 20, 2024

The GCC docs say

The frame is the area on the stack that holds local variables and saved
registers.  The frame address is normally the address of the first word
pushed on to the stack by the function.  However, the exact definition
depends upon the processor and the calling convention.  If the processor
has a dedicated frame pointer register, and the function has a frame,
then @code{__builtin_frame_address} returns the value of the frame
pointer register.

and my simple example seems to return FP

$ cat builtin_frame_address.c 
void * bfa (void)
{
  return __builtin_frame_address(0);
}
$ riscv64-unknown-linux-gnu-gcc builtin_frame_address.c -O3 -S -o-
bfa:
	addi	sp,sp,-16
	sd	s0,8(sp)
	addi	s0,sp,16
	mv	a0,s0
	ld	s0,8(sp)
	addi	sp,sp,16
	jr	ra

So i think this isn't quite a bug, as it's within what the docs allow. Being different is a headache, though. As far as I know it wasn't really on purpose, it's just what GCC happened to do. We probably didn't even really think about this when doing the original GCC port.

I wouldn't be opposed to having some sort of -mno-weird-buildin-frame-address type argument that makes this match everyone else, with some way for source code to interrogate that (maybe a builtin define?). I'm not sure that would actually make things better, though, as this is the sort of thing that bites people who don't know to look for it. I don't think we can just change the default over as it'll break the code that has been ported, and I'm not sure if having __builtin_frame_address(0) return something that's not FP would be legal.

from riscv-elf-psabi-doc.

enh-google commented on July 20, 2024

I'm not sure that would actually make things better, though...

yeah, especially because then there's no one fix for risc-v --- "now you have two problems" :-(

this really should be documented as an unfortunate difference from other architectures though, if only to make it easier for folks to get those "here's the hack you need for risc-v" patches accepted.

from riscv-elf-psabi-doc.

jrtc27 commented on July 20, 2024

I mean, PowerPC is different, S/390 is different, MIPS is different (and rather broken), and although you can chase the frames on Arm as if it were x86 the pointers themselves aren't at a well-defined location within the frame so it's less useful than x86. Plus you have SPARC and IA-64 that make things even weirder due to register windowing. Yes, ideally RISC-V would have looked more like x86 so people could continue to pretend this was a target-independent thing, but in reality it never has been and I don't think it's all that surprising that unwinding requires target-dependent knowledge. Arguably on architectures where you don't have a hardware call stack like x86 the RISC-V model makes more sense as it cleanly delimits the start and end of the frame; RA and SP being saved outside of [SP, FP) only gets done on other architectures because that's what you naturally got on x86 and so was chosen pragmatically for compatibility rather than because it made sense as a model.

It is of course many years too late to change for RISC-V so let's try and be more constructive and focus on what we can do, namely document our standard.

from riscv-elf-psabi-doc.

sorear commented on July 20, 2024

Taking that as a "no" on "should the order of the saved frame pointer and return address be swapped for compatibility with the order enforced by cm.push {ra,s0}, 0".

from riscv-elf-psabi-doc.

jrtc27 commented on July 20, 2024

Zcmp needs to not break this de-facto ABI (that should become "real" ABI) if that's what you mean? Whether that's changing the ISA extension to be more useful or just not caring and saying "don't use those instructions" in the case where unwinding needs to work I don't really care, that's up to the ISA designers to decide what trade-offs they want to make.

from riscv-elf-psabi-doc.

ilovepi commented on July 20, 2024

Judging by the content of this thread, the decision about the Frame Pointer ABI is settled (for better or worse). Is this now documented anywhere? If not, is there a reason why we haven't done so?

from riscv-elf-psabi-doc.

Consider standardising which stack slot fp points to about riscv-elf-psabi-doc HOT 14 CLOSED

Comments (14)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent