The current code generator is a pretty janky mess. This is mostly because I don't really have a solid mental model for what steps are necessary to translate MIR into something like x64. There's a lot of edgecases and stuff to think about, and things which we can neatly ignore in the higher-level MIR become painfully present in the code generator. Things like:
lea
ing function pointers while mov
ing constants (as is done in 0836bf8)
mov
ing to the 8-bit registers doesn't clear the high-bits, so results look wrong
- which kinds of operands are legal varies by instruction, so extra instructions may need to be added or operands may be flipped around
- handling values which are 64-bits or larger is hard! they don't fit neatly into registers and must live on the stack, so whether we refer to some value directly through a register or indirectly through a pointer depends on its size
There is a lot of thorny stuff here, and I'm not really sure how to work around them. I think one very fundamental issue with the current approach is that the code generator treats registers as if they are exactly like variables in the source, just with a limited amount. But this isn't the case; registers are not like variables at all!
Variables |
Registers |
Unbounded size |
Bounded (usually small) size |
Unbounded number |
Very limited number |
Limited scope |
Global |
Hard to accidentally overwrite |
Clobbering happens all the time |
Yes, keeping things in registers is a hugely beneficial optimization, but it should be just that - an optimization. Because what is it that does have all of the properties of variables? Stack-allocated locals!
This rant of sorts was spurred on by the commit linked above, which revealed to me just how borked the current setup is. To be concrete, use the compiler as is on the commit above and compile with --no-eval
the following code
fun main (?: 1) = apply (id, 9)
fun apply (f: 10 -> 10, x) = f x
fun id (x: 10) = x
The expected result of this should be 9
, because it just passes 9
to the identity function through some convoluted means. But what you actually get if you link and execute is instead something like this:
corollary master * = $ link /entry:wmain /subsystem:console .\artifacts\test.lib Kernel32.lib /out:artifacts\test.exe
Microsoft (R) Incremental Linker Version 14.33.31630.0
Copyright (C) Microsoft Corporation. All rights reserved.
corollary master * = $ .\artifacts\test.exe
corollary master * = $ echo $LASTEXITCODE
663425033
That's not 9
! Is the code generator wrong? Well, kind of. See, 663425033
in hex is 278b1009
. And as it turns out, the lowest byte will always be 09
when you execute. The code generator is actually working precisely as intended. The problem lies in apply
, whose assembly code looks something like
_napply:
push rbp
mov rbp,rsp
mov rdx,rdi
mov dil,sil
call rdx
leave
ret
It expects x
in sil
(lower 8-bits of rsi
) and f
in rdi
, but f
expects its argument (a number) in dil
(lower 8-bits of rdi
), so apply
shuffles things around to make that happen. The problem is that mov dil, sil
doesn't actually clear the upper 56 bits of rdi
, so when control eventually comes back to the entry point and it passes the value in rdi
to ExitProcess
(or exit
or whatever), the only relevant stuff is in the lowest part of the register and everything else contains garbage.
I could presumably add in a bunch of extra code to make sure registers are zero when I don't use all of them, but I don't think that would actually solve anything. As mentioned, I think the problem is much more fundamental and a thorough restructuring is required to make this more maintainable, more correct, and less ad-hoc.