kavon / ghc-llvm
improvements to llvm to support functional-language compilers, e.g., ghc
License: Other
Whether you emit an x86ISD::CALL or an x86ISD::CPS_CALL, somehow after isel has finished we end up with a broken MBB. In particular, notice that vreg1 is never defined, and the use really should refer to vreg2.
ISEL: Match complete!
===== Instruction selection ends:
Selected selection DAG: BB#0 'test:'
SelectionDAG has 18 nodes:
t0: ch = EntryToken
t5: i64,ch,glue = ADJCALLSTACKDOWN64 TargetConstant:i64<0>, TargetConstant:i64<0>, TargetConstant:i32<0>, t0
t2: i32,ch = CopyFromReg t0, Register:i32 %vreg0
t7: ch,glue = CopyToReg t5:1, Register:i32 %EDI, t2
t10: ch,glue = CALL64pcrel32 TargetGlobalAddress:i64<i32 (i32)* @someCallee> 0, Register:i32 %EDI, RegisterMask:Untyped, t7, t7:1
t11: i64,ch,glue = ADJCALLSTACKUP64 TargetConstant:i64<0>, TargetConstant:i64<0>, t10, t10:1
t13: i32,ch,glue = CopyFromReg t11:1, Register:i32 %EAX, t11:2
t15: i32,ch = CopyFromReg t0, Register:i32 %vreg1
t17: ch,glue = CopyToReg t13:1, Register:i32 %EAX, t15
t18: ch = RET TargetConstant:i32<0>, Register:i32 %EAX, t17, t17:1
Total amount of phi nodes to update: 0
*** MachineFunction at end of ISel ***
# Machine code for function test: IsSSA, TracksLiveness
Function Live Ins: %EDI in %vreg0
BB#0: derived from LLVM BB %0
Live Ins: %EDI
%vreg0<def> = COPY %EDI; GR32:%vreg0
ADJCALLSTACKDOWN64 0, 0, 0, %RSP<imp-def,dead>, %EFLAGS<imp-def,dead>, %RSP<imp-use>
%EDI<def> = COPY %vreg0; GR32:%vreg0
CALL64pcrel32 <ga:@someCallee>, <regmask %BH %BL %BP %BPL %BX %EBP %EBX %RBP %RBX %R12 %R13 %R14 %R15 %R12B %R13B %R14B %R15B %R12D %R13D %R14D %R15D %R12W %R13W %R14W %R15W>, %RSP<imp-use>, %EDI<imp-use>, %RSP<imp-def>, %EAX<imp-def>
ADJCALLSTACKUP64 0, 0, %RSP<imp-def,dead>, %EFLAGS<imp-def,dead>, %RSP<imp-use>
%vreg2<def> = COPY %EAX; GR32:%vreg2
%EAX<def> = COPY %vreg1; GR32:%vreg1
RET 0, %EAX
Assertion failed: (MRI->getVRegDef(reg) && "Register use before def!"), function HandleVirtRegUse, file /Users/kavon/msr/llvm-dev/src/lib/CodeGen/LiveVariables.cpp, line 133.
0 llc 0x0000000110e752cc llvm::sys::PrintStackTrace(llvm::raw_ostream&) + 60
1 llc 0x0000000110e75759 PrintStackTraceSignalHandler(void*) + 25
2 llc 0x0000000110e724e9 llvm::sys::RunSignalHandlers() + 425
3 llc 0x0000000110e759d2 SignalHandler(int) + 354
4 libsystem_platform.dylib 0x00007fff83d3252a _sigtramp + 26
5 libsystem_platform.dylib 0x00007fff509a2e50 _sigtramp + 3435596096
6 libsystem_c.dylib 0x00007fff8c26a6df abort + 129
7 libsystem_c.dylib 0x00007fff8c231dd8 basename + 0
8 llc 0x000000010fdc875d llvm::LiveVariables::HandleVirtRegUse(unsigned int, llvm::MachineBasicBlock*, llvm::MachineInstr&) + 157
9 llc 0x000000010fdcb2c8 llvm::LiveVariables::runOnInstr(llvm::MachineInstr&, llvm::SmallVectorImpl<unsigned int>&) + 1080
10 llc 0x000000010fdcb9d2 llvm::LiveVariables::runOnBlock(llvm::MachineBasicBlock*, unsigned int) + 930
11 llc 0x000000010fdcc204 llvm::LiveVariables::runOnMachineFunction(llvm::MachineFunction&) + 644
12 llc 0x000000010fe84ae1 llvm::MachineFunctionPass::runOnFunction(llvm::Function&) + 449
13 llc 0x00000001103b120f llvm::FPPassManager::runOnFunction(llvm::Function&) + 399
14 llc 0x00000001103b1525 llvm::FPPassManager::runOnModule(llvm::Module&) + 117
15 llc 0x00000001103b220f (anonymous namespace)::MPPassManager::runOnModule(llvm::Module&) + 1967
16 llc 0x00000001103b17e6 llvm::legacy::PassManagerImpl::run(llvm::Module&) + 342
17 llc 0x00000001103b2e01 llvm::legacy::PassManager::run(llvm::Module&) + 33
18 llc 0x000000010f25ffd2 compileModule(char**, llvm::LLVMContext&) + 21938
19 llc 0x000000010f25a6d7 main + 2471
20 libdyld.dylib 0x00007fff965755ad start + 1
Stack dump:
0. Program arguments: llc -debug-only=isel intrinsic.ll
1. Running pass 'Function Pass Manager' on module 'intrinsic.ll'.
2. Running pass 'Live Variable Analysis' on function '@test'
One of the poor register choices LLVM makes for the function pointer in a tail call comes from not updating X86RegisterInfo::getGPRsForTailCall for the GHC calling convention. The GR64_TC register class is specialized to the C conventions:
// in X86InstrInfo.td
// GPRs available for tailcall.
// It represents GR32_TC, GR64_TC or GR64_TCW64.
def ptr_rc_tailcall : PointerLikeRegClass<4>; // the 4 corresponds to the Kind in X86RegisterInfo::getPointerRegClass
// in X86RegisterInfo.td
def GR64_TC : RegisterClass<"X86", [i64], 64, (add RAX, RCX, RDX, RSI, RDI,
R8, R9, R11, RIP)>;
See if you can dig around further and improve the other register choices as well.
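A sketch of the kind of change meant here; the override point is the getGPRsForTailCall hook named above, but the GHC branch and the chosen register class are assumptions, not a tested fix:

```cpp
// Sketch only (the GHC case is an assumed addition, not a tested fix).
const TargetRegisterClass *
X86RegisterInfo::getGPRsForTailCall(const MachineFunction &MF) const {
  const Function *F = MF.getFunction();
  if (F && F->getCallingConv() == CallingConv::GHC)
    // GHC pins its own registers, so the C-convention GR64_TC
    // restriction (RAX, RCX, RDX, RSI, RDI, ...) is unnecessarily
    // narrow here.
    return &X86::GR64RegClass;
  // ... existing GR32_TC / GR64_TC / GR64_TCW64 selection ...
}
```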
It seems that if you use too many float/double values in a call/return, llc does a horrible job: it no longer eliminates the passing of undef values around:
declare ghccc {i64, i64} @bar (i64, i64)
declare ghccc {i64, i64, float, double, float, double, float, double, float, double, float, double, float, double, float} @llvm.experimental.cpscall.x ({i64, i64, float, double, float, double, float, double, float, double, float, double, float, double, float} (i64, i64*, float, double, float, double, float, double, float, double, float, double, float, double, float)*, i64, i32, i16, ...)
define ghccc {i64, i64, float, double, float, double, float, double, float, double, float, double, float, double, float} @foo (i64 %x, i64* %y, float %a, double %b) naked {
entryBlock:
%x1 = add i64 %x, 1
%fnPtr = bitcast i64* %y to {i64, i64, float, double, float, double, float, double, float, double, float, double, float, double, float} (i64, i64*, float, double, float, double, float, double, float, double, float, double, float, double, float)*
%retVals = call ghccc {i64, i64, float, double, float, double, float, double, float, double, float, double, float, double, float} %fnPtr(i64 %x1, i64* undef, float undef, double undef, float undef, double undef, float undef, double undef, float undef, double undef, float undef, double undef, float undef, double undef, float undef)
;;;;; WITH INTRINSIC
; %retVals = call ghccc {i64, i64, float, double, float, double, float, double, float, double, float, double, float, double, float} ({i64, i64, float, double, float, double, float, double, float, double, float, double, float, double, float} (i64, i64*, float, double, float, double, float, double, float, double, float, double, float, double, float)*, i64, i32, i16, ...) @llvm.experimental.cpscall.x(
; {i64, i64, float, double, float, double, float, double, float, double, float, double, float, double, float} (i64, i64*, float, double, float, double, float, double, float, double, float, double, float, double, float)* %fnPtr, i64 1337, i32 2, i16 1,
; i64 %x1, i64* %y, float undef, double undef, float undef, double undef, float undef, double undef, float undef,
; double undef, float undef, double undef, float undef, double undef, float undef)
%newX = extractvalue {i64, i64, float, double, float, double, float, double, float, double, float, double, float, double, float} %retVals, 2
; %newY = extractvalue {i64, i64, float, double, float, double, float, double, float, double, float, double, float, double, float} %retVals, 3
; %fnPtr = inttoptr i64 %newY to {i64, i64} (i64, i64)*
; %retVals2 = call ghccc {i64, i64} %fnPtr(i64 %newX, i64 %newY), !cps.retpt !{!"kungfoo", !"56"}
; %newX2 = extractvalue {i64, i64} %retVals2, 0
; %newY2 = extractvalue {i64, i64} %retVals2, 1
%diffX = fsub float %newX, 1.0
%updated = insertvalue {i64, i64, float, double, float, double, float, double, float, double, float, double, float, double, float} %retVals, float %diffX, 2
ret {i64, i64, float, double, float, double, float, double, float, double, float, double, float, double, float} %updated
}
generates:
.section __TEXT,__text,regular,pure_instructions
.macosx_version_min 10, 11
.section __TEXT,__literal4,4byte_literals
.p2align 2
LCPI0_0:
.long 3212836864 ## float -1
.section __TEXT,__text,regular,pure_instructions
.globl _foo
.p2align 4, 0x90
_foo: ## @foo
.cfi_startproc
## BB#0: ## %entryBlock
movq %r13, 40(%rsp) ## 8-byte Spill
incq %rbp
leaq 48(%rsp), %r13
callq *%r12
movq 48(%rsp), %rdx
movq 56(%rsp), %rcx
movsd 72(%rsp), %xmm8 ## xmm8 = mem[0],zero
movss 80(%rsp), %xmm9 ## xmm9 = mem[0],zero,zero,zero
movsd 88(%rsp), %xmm10 ## xmm10 = mem[0],zero
movss 96(%rsp), %xmm11 ## xmm11 = mem[0],zero,zero,zero
movsd 104(%rsp), %xmm12 ## xmm12 = mem[0],zero
movss 112(%rsp), %xmm5 ## xmm5 = mem[0],zero,zero,zero
movsd 120(%rsp), %xmm6 ## xmm6 = mem[0],zero
movss 128(%rsp), %xmm7 ## xmm7 = mem[0],zero,zero,zero
movsd 136(%rsp), %xmm0 ## xmm0 = mem[0],zero
movss 144(%rsp), %xmm1 ## xmm1 = mem[0],zero,zero,zero
movsd 152(%rsp), %xmm2 ## xmm2 = mem[0],zero
movss 160(%rsp), %xmm3 ## xmm3 = mem[0],zero,zero,zero
movss 64(%rsp), %xmm4 ## xmm4 = mem[0],zero,zero,zero
addss LCPI0_0(%rip), %xmm4
movq 40(%rsp), %rax ## 8-byte Reload
movss %xmm3, 112(%rax)
movsd %xmm2, 104(%rax)
movss %xmm1, 96(%rax)
movsd %xmm0, 88(%rax)
movss %xmm7, 80(%rax)
movsd %xmm6, 72(%rax)
movss %xmm5, 64(%rax)
movsd %xmm12, 56(%rax)
movss %xmm11, 48(%rax)
movsd %xmm10, 40(%rax)
movss %xmm9, 32(%rax)
movsd %xmm8, 24(%rax)
movq %rcx, 8(%rax)
movq %rdx, (%rax)
movss %xmm4, 16(%rax)
movq (%rax), %rcx
jmpq *%rcx ## TAILCALL
.cfi_endproc
.subsections_via_symbols
In the NCG, we see code such as
leaq _s2IX_closure$def+1(%rip), %rax
but in LLVM we see an add instruction emitted. We should find a way to get LLVM to emit the above instead of
leaq _s2IX_closure$def(%rip), %rax
addq $1, %rax
We may have to add this as an optimization to LLC, since we happen to do this a lot but other languages probably don't.
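One possible angle (an assumption, not something verified): if the tagged address were emitted as a constant GEP rather than a ptrtoint plus add, the x86 addressing-mode matcher could fold the +1 into the leaq displacement itself. Symbol names below are illustrative:

```llvm
; what we effectively get today (the add is folded too late):
%c  = ptrtoint i8* @s2IX_closure to i64
%t0 = add i64 %c, 1

; a constant-GEP form the addressing-mode matcher can fold into one
; instruction, i.e. leaq _s2IX_closure+1(%rip), %rax:
%t1 = ptrtoint i8* getelementptr (i8, i8* @s2IX_closure, i64 1) to i64
```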
It looks like the stack protector implementation does something similar to what we want to do, which involves splitting apart blocks during LLVM IR -> SelectionDAG IR translation. To quote:
"While one can not handle multiple IR level basic blocks at the SelectionDAG Level, one can generate multiple machine basic blocks for one IR level basic block [in SelectionDAGBuilder]. This is how we handle bit tests and switches."
They have a very detailed comment describing the steps and process to do the block splitting.
In X86ISelLowering.cpp, we have the following checks. Notice that we do not get the sibling call optimization if we use musttail, which is silly. Change this so IsSibcall is set to true when the call is a musttail.
bool IsMustTail = CLI.CS && CLI.CS->isMustTailCall();
if (IsMustTail) {
// Force this to be a tail call. The verifier rules are enough to ensure
// that we can lower this successfully without moving the return address
// around.
isTailCall = true;
} else if (isTailCall) {
// Check if it's really possible to do a tail call.
isTailCall = IsEligibleForTailCallOptimization(Callee, CallConv,
isVarArg, SR != NotStructReturn,
MF.getFunction()->hasStructRetAttr(), CLI.RetTy,
Outs, OutVals, Ins, DAG);
// Sibcalls are automatically detected tailcalls which do not require
// ABI changes.
if (!MF.getTarget().Options.GuaranteedTailCallOpt && isTailCall)
IsSibcall = true;
}
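The proposed change might look like the sketch below, mirroring the existing sibcall condition; whether a musttail call always satisfies the sibcall preconditions is the open question:

```cpp
// Sketch of the proposed change (not a tested fix).
if (IsMustTail) {
  // Force this to be a tail call, as before.
  isTailCall = true;
  // Proposed: under the default (non-guaranteed-TCO) options, a
  // musttail call should also count as a sibcall, matching the
  // condition used in the else-if branch below.
  if (!MF.getTarget().Options.GuaranteedTailCallOpt)
    IsSibcall = true;
}
```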
As you can see below, if one of the arguments to the CPSCALL is an undef (or, in MBB speak, an IMPLICIT_DEF) value, we're free to eliminate that binding, which leaves the TCRETURN referencing an unbound register.
I think the solution is to use the call-sequence start/end instructions to protect the physical register bindings; I have a feeling that's part of what their purpose is.
handleOverflow:
%retFromGC = call ghccc {i8**, i64} @doGC(i8** %newSP, i64 undef), !cps.retpt !0
==================================================================
BB#6: derived from LLVM BB %handleOverflow
Predecessors according to CFG: BB#5
ADJCALLSTACKDOWN64 0, 0, 0, %RSP<imp-def,dead>, %EFLAGS<imp-def,dead>, %RSP<imp-use>
%R13<def> = COPY %vreg9; GR64:%vreg9
%vreg24<def> = IMPLICIT_DEF; GR64:%vreg24
%RBP<def> = COPY %vreg24; GR64:%vreg24
CPSCALLd64 <ga:@doGC>, <regmask>, %RSP<imp-use>, %R13<imp-use>, %RBP<imp-use>, %RSP<imp-def>, %R13<imp-def>, %RBP<imp-def>
ADJCALLSTACKUP64 0, 0, %RSP<imp-def,dead>, %EFLAGS<imp-def,dead>, %RSP<imp-use>
%vreg25<def> = COPY %R13; GR64:%vreg25
%vreg26<def> = COPY %RBP; GR64:%vreg26
%vreg10<def> = COPY %vreg25; GR64:%vreg10,%vreg25
%vreg11<def> = COPY %vreg26; GR64:%vreg11,%vreg26
JMP_1 <BB#5>
Successors according to CFG: BB#5(?%)
=======================================================================
BB#6: derived from LLVM BB %handleOverflow
Predecessors according to CFG: BB#5
ADJCALLSTACKDOWN64 0, 0, 0, %RSP<imp-def,dead>, %EFLAGS<imp-def,dead>, %RSP<imp-use>
%R13<def> = COPY %vreg9; GR64:%vreg9
%vreg24<def> = IMPLICIT_DEF; GR64:%vreg24
%RBP<def> = COPY %vreg24; GR64:%vreg24
ADJCALLSTACKUP64 0, 0, %RSP<imp-def>, %EFLAGS<imp-def,dead>, %RSP<imp-use>
%vreg30<def> = LEA64r %RIP, 1, %noreg, <MCSym=LretptStandin2>, %noreg; GR64:%vreg30
MOV64mr %RBP, 1, %noreg, 52, %noreg, %vreg30<kill>; GR64:%vreg30
TCRETURNdi64 <ga:@doGC>, 0, %RSP<imp-use>, %R13<imp-use>, %RBP<imp-use>
Successors according to CFG: BB#12(?%)
BB#12: EH LANDING PAD
Live Ins: %R13 %RBP
Predecessors according to CFG: BB#6
EH_LABEL <MCSym=LretptStandin2>
%vreg25<def> = COPY %R13<kill>; GR64:%vreg25
%vreg26<def> = COPY %RBP<kill>; GR64:%vreg26
%vreg10<def> = COPY %vreg25; GR64:%vreg10,%vreg25
%vreg11<def> = COPY %vreg26; GR64:%vreg11,%vreg26
JMP_1 <BB#5>
Successors according to CFG: BB#5(0x80000000 / 0x80000000 = 100.00%)
========================================================================
BB#10: derived from LLVM BB %handleOverflow
Predecessors according to CFG: BB#8
ADJCALLSTACKDOWN64 0, 0, 0, %RSP<imp-def,dead>, %EFLAGS<imp-def,dead>, %RSP<imp-use>
%R13<def> = COPY %vreg9; GR64:%vreg9
ADJCALLSTACKUP64 0, 0, %RSP<imp-def>, %EFLAGS<imp-def,dead>, %RSP<imp-use>
%vreg30<def> = LEA64r %RIP, 1, %noreg, <MCSym=LretptStandin2>, %noreg; GR64:%vreg30
MOV64mr %RBP<undef>, 1, %noreg, 52, %noreg, %vreg30; GR64:%vreg30
TCRETURNdi64 <ga:@doGC>, 0, %RSP<imp-use>, %R13<imp-use>, %RBP<imp-use>
Successors according to CFG: BB#11(?%)
========================================================================
*** Bad machine code: Using an undefined physical register ***
- function: foo
- basic block: BB#10 handleOverflow (0x7fcb2b8153c8)
- instruction: TCRETURNdi64
- operand 4: %RBP<imp-use>
LLVM ERROR: Found 1 machine code errors.
spectral/hartel/typecheck segfaults due to a jump to an address that is not code. This program faults even when compiled without GHC optimizations, and the failure happens regularly (only input 600 tested so far).
LBB247_10: ## in Loop: Header=BB247_5 Depth=1
pushq %rax
movq %rbx, %rax
LBB247_5: ## %cbJG
## =>This Inner Loop Header: Depth=1
movq 8(%rbp), %r14
movq 16(%rbp), %rcx
movl %eax, %edx
andl $7, %edx
addq $-2, %rdx
cmpq $3, %rdx
ja LBB247_7
## BB#6: ## %cbJG
## in Loop: Header=BB247_5 Depth=1
movslq (%rsi,%rdx,4), %rdx
addq %rsi, %rdx
jmpq *%rdx # <<<< sometimes a bad value
What is odd here is that we have an indirect jump that was not marked as a TAILCALL by LLVM, and the computation of the address is quite unusual. Currently, I'm trying to figure out how we even get this code in the _rDQ_info$def function.
IMO, the CSE that happens in isel as a side effect of the design of SDNode is not a good idea. It causes the store/load of a constant across a call when it would be better to rematerialize it.
It seems something fundamental is broken: almost none of the nofib benchmarks compile right now. I'm going to guess it has to do with generating symbol aliases in LLVM IR, since I see a few things broken like this on Mac as well.
==nofib== bernouilli: time to link bernouilli follows...
Main.o:(.text+0x240): undefined reference to `_S6L1_srt'
Main.o:(.text+0x3b0): undefined reference to `_S6L1_srt'
Main.o:(.text+0x560): undefined reference to `_S6L1_srt'
Main.o:(.text+0x580): undefined reference to `_S6L1_srt'
Main.o:(.text+0x5a0): undefined reference to `_S6L1_srt'
Main.o:(.text+0x7d0): more undefined references to `_S6L1_srt' follow
Main.o:(.text+0x1df0): undefined reference to `_u7tZ_srtd$def'
Main.o:(.text+0x1e20): undefined reference to `_u7u0_srtd$def'
Main.o:(.text+0x1f20): undefined reference to `_S6L1_srt'
Main.o:(.text+0x1f80): undefined reference to `_u7u1_srtd$def'
collect2: error: ld returned 1 exit status
`gcc' failed in phase `Linker'. (Exit code: 1)
<<ghc: 17518096 bytes, 8 GCs, 572628/943064 avg/max bytes residency (2 samples), 30M in use, 0.001 INIT (0.001 elapsed), 0.023 MUT (0.406 elapsed), 0.014 GC (0.014 elapsed) :ghc>>
../../mk/target.mk:46: recipe for target 'bernouilli' failed
make[1]: *** [bernouilli] Error 1
Failed making all in bernouilli: 1
../mk/ghc-recurse.mk:65: recipe for target 'all' failed
make: *** [all] Error 1
kavon@tormenta:~/haskell/alt-ghc/nofib/imaginary$
Right now we emit:
LCAFE0:
movq (%r13), %rax
jmpq *%rax ## TAILCALL
for a CPS_RET. We can save an instruction if we add a special pattern, as we do for CPSCALLm64, that looks for the load, so that we emit a TCRETURNmi64 instead and get:
LCAFE0:
jmpq *(%r13) ## TAILCALL
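A pattern along these lines might do it; everything below is a sketch with assumed node and instruction names, modeled on how the existing memory-form tail-call patterns fold a load:

```
// Sketch only (the X86cpsret node name is assumed): fold the load
// feeding the CPS return into the memory-operand tail call, so
// `movq (%r13), %rax ; jmpq *%rax` becomes `jmpq *(%r13)`.
def : Pat<(X86cpsret (loadi64 addr:$dst)),
          (TCRETURNmi64 addr:$dst, 0)>;
```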
It turns out that "operand bundles", which are essentially metadata on calls, are illegal for the optimizer to drop. Thus, with a combination of a special lowering for the GHC calling convention and these bundles carrying the additional necessary information down the pipeline, we can remove the need to add an intrinsic to the LLVM IR entirely:
call void @y() [ "ghc_id"(i64 123910421), "ghc_spArgNum"(i32 3) ]
See ~6:14 in Hal's talk here: https://www.youtube.com/watch?v=jII0AcgU_5c
in imaginary/bernouilli
/Users/kavon/msr/ghc/inplace/bin/ghc-stage2 -O2 -Rghc-timing -H32m -hisuf hi -fllvm -rtsopts -c Main.hs -o Main.o
Main.hs:26:21: warning: [-Wtabs]
Tab character found here.
Please use spaces instead.
|
26 | bernoulli 1 = -(1%2)
| ^^^^
WARNING: file compiler/simplCore/OccurAnal.hs, line 2163 Just 1 []
<unknown>:0:0: error: error: invalid symbol redefinition
Assertion failed: (MBB->getParent() == Entry.MF && "MachineFunction Parent changed"), function getAddrLabelSymbolToEmit, file /Users/kavon/msr/llvm-dev/src/lib/CodeGen/MachineModuleInfo.cpp, line 147.
0 llc 0x000000010f9c29dc llvm::sys::PrintStackTrace(llvm::raw_ostream&) + 60
1 llc 0x000000010f9c2e69 PrintStackTraceSignalHandler(void*) + 25
2 llc 0x000000010f9bfbc9 llvm::sys::RunSignalHandlers() + 425
3 llc 0x000000010f9c30e2 SignalHandler(int) + 354
4 libsystem_platform.dylib 0x00007fff9056b52a _sigtramp + 26
5 libsystem_platform.dylib 0x00007fae9942fe48 _sigtramp + 149702968
6 libsystem_c.dylib 0x00007fff90aac6df abort + 129
7 libsystem_c.dylib 0x00007fff90a73dd8 basename + 0
8 llc 0x000000010ea05b5d llvm::MMIAddrLabelMap::getAddrLabelSymbolToEmit(llvm::MachineBasicBlock*, llvm::Twine*) + 253
9 llc 0x000000010ea06c0f llvm::MachineModuleInfo::getAddrLabelSymbolsWithRequest(llvm::MachineBasicBlock const*, llvm::Twine*) + 175
10 llc 0x000000010e09c7c5 llvm::MachineModuleInfo::getAddrLabelSymbol(llvm::MachineBasicBlock const*, llvm::Twine*) + 37
11 llc 0x000000010dfc60b1 llvm::X86TargetLowering::EmitCPSCall(llvm::MachineInstr&, llvm::MachineBasicBlock*, unsigned int) const + 4625
12 llc 0x000000010dfc8139 llvm::X86TargetLowering::EmitInstrWithCustomInserter(llvm::MachineInstr&, llvm::MachineBasicBlock*) const + 5689
13 llc 0x000000010e844a15 (anonymous namespace)::ExpandISelPseudos::runOnMachineFunction(llvm::MachineFunction&) + 245
14 llc 0x000000010e9c7491 llvm::MachineFunctionPass::runOnFunction(llvm::Function&) + 449
15 llc 0x000000010eefd4df llvm::FPPassManager::runOnFunction(llvm::Function&) + 399
16 llc 0x000000010eefd7f5 llvm::FPPassManager::runOnModule(llvm::Module&) + 117
17 llc 0x000000010eefe4df (anonymous namespace)::MPPassManager::runOnModule(llvm::Module&) + 1967
18 llc 0x000000010eefdab6 llvm::legacy::PassManagerImpl::run(llvm::Module&) + 342
19 llc 0x000000010eeff0d1 llvm::legacy::PassManager::run(llvm::Module&) + 33
20 llc 0x000000010ddfaa8a compileModule(char**, llvm::LLVMContext&) + 23146
21 llc 0x000000010ddf4c7e main + 2910
22 libdyld.dylib 0x00007fff946f75ad start + 1
Stack dump:
0. Program arguments: llc -O2 -disable-machine-cse -relocation-model=pic /var/folders/4k/zvw3b3px7851gfzyk28m84cm0000gn/T/ghc27954_0/ghc_3.bc -o /var/folders/4k/zvw3b3px7851gfzyk28m84cm0000gn/T/ghc27954_0/ghc_4.lm_s --enable-tbaa=true -mattr=+sse2
1. Running pass 'Function Pass Manager' on module '/var/folders/4k/zvw3b3px7851gfzyk28m84cm0000gn/T/ghc27954_0/ghc_3.bc'.
2. Running pass 'Expand ISel Pseudo-instructions' on function '@"r6Gi_info$def"'
Originally found in imaginary/integrate, but this invalid RA isn't hit by that particular program. It's probably the cause of other invalid-instruction bugs, though:
# AFTER LLC
LBB0_5: ## %c6sZ
## in Loop: Header=BB0_4 Depth=1
movq $24, 904(%r13)
movq %rax, %r12
leaq L7133701809754890517_1(%rip), %rax
movq %rax, (%rbp)
jmp _stg_gc_pp ## TAILCALL
LBB0_6: ## in Loop: Header=BB0_4 Depth=1
L7133701809754890517_1:
.p2align 4, 0x90
LBB0_4: ## =>This Inner Loop Header: Depth=1
movq %r12, %rax
addq $24, %rax
cmpq %rax, 856(%r13)
jb LBB0_5
# AFTER MANGLER
LBB0_5: ## %c6sZ
## in Loop: Header=BB0_4 Depth=1
movq $24, 904(%r13)
movq %rax, %r12
leaq L7133701809754890517_1(%rip), %rax
movq %rax, (%rbp)
jmp _stg_gc_pp ## TAILCALL
LBB0_6: ## in Loop: Header=BB0_4 Depth=1
.p2align 4, 0x90
.quad 0
.quad 30
L7133701809754890517_1:
.p2align 4, 0x90
LBB0_4: ## =>This Inner Loop Header: Depth=1
movq %r12, %rax
addq $24, %rax
cmpq %rax, 856(%r13)
jb LBB0_5
# AFTER DUMPING EXECUTABLE
0000000100001781 movq $0x18, %cs:0x388(%r13)
000000010000178d movq %rax, %r12
0000000100001790 leaq 0x19(%rip), %rax
0000000100001797 movq %rax, (%rbp)
000000010000179b jmp 0x1000d8538
00000001000017a0 addb %al, (%rax)
00000001000017a2 addb %al, (%rax)
00000001000017a4 addb %al, (%rax)
00000001000017a6 addb %al, (%rax)
00000001000017a8 .byte 0x1e #bad opcode
00000001000017a9 addb %al, (%rax)
00000001000017ab addb %al, (%rax)
00000001000017ad addb %al, (%rax)
00000001000017af addb %cl, -0x20(%rcx,%rcx,4)
00000001000017b3 addq $0x18, %rax
00000001000017b7 cmpq %rax, 0x358(%r13)
00000001000017be jb 0x100001782
_stg_gc_pp:
00000001000d8538 leaq 0x29(%rip), %rax
# WHEN TRYING TO FIND CORRECT RA:
# NOTE: the assembly above is missing the MOV because
# the disassembler doesn't handle the TNTC info properly.
#
# movq has encoding: REX.W + 8B /r
# REX prefixes are 1 byte, and the /r is apparently 1 byte:
(lldb) di -s 0x1000017a9 -c 5
integrate`Main_ints_info:
0x1000017a9 <+145>: addb %al, (%rax)
0x1000017ab <+147>: addb %al, (%rax)
0x1000017ad <+149>: addb %al, (%rax)
0x1000017af <+151>: addb %cl, -0x20(%rcx,%rcx,4)
0x1000017b3 <+155>: addq $0x18, %rax
(lldb) di -s 0x1000017b3 -c 5
integrate`Main_ints_info:
0x1000017b3 <+155>: addq $0x18, %rax
0x1000017b7 <+159>: cmpq %rax, 0x358(%r13)
0x1000017be <+166>: jb 0x100001782 ; <+106>
0x1000017c0 <+168>: leaq 0xb58f1(%rip), %rcx ; ghczmprim_GHCziTypes_ZC_con_info
0x1000017c7 <+175>: movq %rcx, 0x8(%r12)
(lldb) di -s 0x1000017AF -c 5
integrate`Main_ints_info:
0x1000017af <+151>: addb %cl, -0x20(%rcx,%rcx,4)
0x1000017b3 <+155>: addq $0x18, %rax
0x1000017b7 <+159>: cmpq %rax, 0x358(%r13)
0x1000017be <+166>: jb 0x100001782 ; <+106>
0x1000017c0 <+168>: leaq 0xb58f1(%rip), %rcx ; ghczmprim_GHCziTypes_ZC_con_info
(lldb) di -s 0x1000017AE -c 5
integrate`Main_ints_info:
0x1000017ae <+150>: addb %al, (%rax)
0x1000017b0 <+152>: movq %r12, %rax
0x1000017b3 <+155>: addq $0x18, %rax
0x1000017b7 <+159>: cmpq %rax, 0x358(%r13)
0x1000017be <+166>: jb 0x100001782 ; <+106>
(lldb) di -s 0x1000017AD -c 5
integrate`Main_ints_info:
0x1000017ad <+149>: addb %al, (%rax)
0x1000017af <+151>: addb %cl, -0x20(%rcx,%rcx,4)
0x1000017b3 <+155>: addq $0x18, %rax
0x1000017b7 <+159>: cmpq %rax, 0x358(%r13)
0x1000017be <+166>: jb 0x100001782 ; <+106>
(lldb) di -s 0x1000017b0 -c 5
integrate`Main_ints_info:
0x1000017b0 <+152>: movq %r12, %rax
0x1000017b3 <+155>: addq $0x18, %rax
0x1000017b7 <+159>: cmpq %rax, 0x358(%r13)
0x1000017be <+166>: jb 0x100001782 ; <+106>
0x1000017c0 <+168>: leaq 0xb58f1(%rip), %rcx ; ghczmprim_GHCziTypes_ZC_con_info
(lldb)
# 0x1000017b0 is the proper return address, not 0x1000017a9
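The length arithmetic can be checked directly. Assuming the standard three-byte encoding of movq %r12, %rax (REX.W prefix, opcode, ModRM), the addq at 0x1000017b3 starts exactly one mov-length after 0x1000017b0, which is why that is the proper return address:

```python
# movq %r12, %rax = REX.W (1 byte) + opcode (1 byte) + ModRM (1 byte)
MOV_LEN = 3
ra = 0x1000017b0          # candidate return address (start of the movq)
addq_addr = 0x1000017b3   # next instruction per the lldb disassembly

# The candidate RA is a valid instruction boundary exactly when the
# following addq begins one mov-length later.
assert ra + MOV_LEN == addq_addr
```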
A great example of this seems to be the @llvm.addressofreturnaddress intrinsic. Here are some places you can start digging around:
➤ ag LowerADDROFRETURNADDR
lib/Target/X86/X86ISelLowering.h
1173: SDValue LowerADDROFRETURNADDR(SDValue Op, SelectionDAG &DAG) const;
lib/Target/X86/X86ISelLowering.cpp
20612:SDValue X86TargetLowering::LowerADDROFRETURNADDR(SDValue Op,
23827: case ISD::ADDROFRETURNADDR: return LowerADDROFRETURNADDR(Op, DAG);
kavon@cronus:~/m/l/src|master
➤ ag ADDROFRETURNADDR
include/llvm/CodeGen/ISDOpcodes.h
73: FRAMEADDR, RETURNADDR, ADDROFRETURNADDR,
lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
1015: case ISD::ADDROFRETURNADDR:
lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp
103: case ISD::ADDROFRETURNADDR: return "ADDROFRETURNADDR";
lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
4800: setValue(&I, DAG.getNode(ISD::ADDROFRETURNADDR, sdl,
lib/Target/X86/X86ISelLowering.h
1173: SDValue LowerADDROFRETURNADDR(SDValue Op, SelectionDAG &DAG) const;
lib/Target/X86/X86ISelLowering.cpp
20612:SDValue X86TargetLowering::LowerADDROFRETURNADDR(SDValue Op,
23827: case ISD::ADDROFRETURNADDR: return LowerADDROFRETURNADDR(Op, DAG);
Example from GHC's testsuite/tests/typecheck/should_compile/T7408.hs
%ln1tv = getelementptr inbounds i64, i64* %ln1sP, i32 107
%ln1tT = call ghccc @llvm.experimental.cpscall(
{ i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64 } (i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64)* %ln1tM, i64 7133701809754871206, i32 0, i16 1,
i64* %ln1sP, .. ETC ..) ; <<<
Instcombine turns the base of the GEP, %ln1sP, into an i8* (with a new name), which no longer matches the callee's type when we check it during verification.
%ln1tv = getelementptr inbounds i8, i8* %ln1sM, i64 856
; ...
; ...
i8* %ln1sM, .. ETC ..) ; <<< callee expected i64*
Instead of teaching instcombine to update the callee, it's easier to change the kind of equality applied between parameters and arguments, allowing any argument that can be losslessly bitcast to the parameter type.
If some calling convention actually relies on differences in pointee type, this suggested fix will not work; AFAIK pointers will become opaque in LLVM soon anyway.
At the moment, cpscall is only supported on x86 anyway, so modify the following:
void TargetPassConfig::addISelPrepare() {
addPreISel();
addPass(createCPSCallPrepPass());
// ...
}
so that we add the pass in the target specific PreISel function, as discussed in #11
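Concretely, the move might look like the sketch below; X86PassConfig::addPreISel is the existing x86 override point in X86TargetMachine.cpp, but the exact placement within it is an assumption:

```cpp
// Sketch: in X86TargetMachine.cpp, add the pass only for x86,
// rather than unconditionally in TargetPassConfig::addISelPrepare().
bool X86PassConfig::addPreISel() {
  addPass(createCPSCallPrepPass());  // cpscall is x86-only for now
  // ... existing preISel passes ...
  return true;
}
```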
Check out the code (at the very bottom of this post) in imaginary/integrate. Part of the main loop in the calculation makes many non-tail calls, all of which return and perform a heap test. The only difference between the blocks below is the block branched to after returning from the GC.
An idea: identify the redundancy and replace these blocks with something like this:
%addr = blockaddress(@foo, %block1)
br label %commonBlock
commonBlock:
; call and return from GC
indirectbr i8* %addr, [label %block1, label %block2] ; ...
Main challenges:
Potential Solution:
One nice way of designing it might be to say that the intrinsic takes a block address (label) as an argument. The intrinsic then returns a token, from which the return value and return target (label) are extracted, and we perform an indirectbr on the return target. Since an indirectbr must list all possible targets, if the target list contains only one label we can simply turn it into a standard unconditional br; otherwise, we can perform the combining optimization.
I think the main point is that we can solve the problem in commonBlock in the example above, where %addr is live across the call, by simply saying that we can "pass" the block address through the intrinsic via the token.
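In IR, that design might look like the following; every intrinsic name here is hypothetical:

```llvm
; Hypothetical shape of the token-based design (all intrinsic names
; are invented for illustration):
%tok = call token (...) @llvm.experimental.cpscall.tok(
           {i64, i64} (i64, i64)* %fnPtr, i64 %arg,
           i8* blockaddress(@foo, %block1))
%ret = call {i64, i64} @llvm.cps.results(token %tok)
%tgt = call i8* @llvm.cps.rettarget(token %tok)
; a single-target indirectbr degrades to a plain br;
; otherwise the combining optimization applies.
indirectbr i8* %tgt, [label %block1, label %block2]
```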
The ugly code below, followed by its output:
c6AZ: ; preds = %c6zU
%ln6LF = getelementptr inbounds i64, i64* %Base_Var.7, i64 113
store i64 16, i64* %ln6LF, align 8, !tbaa !4
%ln6LN = call ghccc { i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double, float, double, float, double, float, double, float, double } ({ i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double, float, double, float, double, float, double, float, double } (i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64)*, i64, i32, i16, ...) @llvm.experimental.cpscall.sl_p0i64p0i64p0i64i64i64i64i64i64i64i64f32f64f32f64f32f64f32f64f32f64f32f64s.p0f_sl_p0i64p0i64p0i64i64i64i64i64i64i64i64f32f64f32f64f32f64f32f64f32f64f32f64sp0i64p0i64p0i64i64i64i64i64i64i64i64f({ i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double, float, double, float, double, float, double, float, double } (i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64)* bitcast (i8* @stg_gc_unpt_r1 to { i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double, float, double, float, double, float, double, float, double } (i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64)*), i64 7133701809754890954, i32 0, i16 1, i64* %Base_Var.7, i64* %Sp_Var.7, i64* %ln6JH, i64 %R1_Var.7, i64 undef, i64 undef, i64 undef, i64 undef, i64 undef, i64 %SpLim_Var.7) #2
%ln6LO = extractvalue { i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double, float, double, float, double, float, double, float, double } %ln6LN, 0
%ln6LP = extractvalue { i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double, float, double, float, double, float, double, float, double } %ln6LN, 1
%ln6LQ = extractvalue { i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double, float, double, float, double, float, double, float, double } %ln6LN, 2
%ln6LR = extractvalue { i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double, float, double, float, double, float, double, float, double } %ln6LN, 3
%ln6LS = extractvalue { i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double, float, double, float, double, float, double, float, double } %ln6LN, 9
br label %c6zU
c6AW: ; preds = %c6zM
%ln6LU = getelementptr inbounds i64, i64* %Base_Var.6, i64 113
store i64 16, i64* %ln6LU, align 8, !tbaa !4
%ln6M2 = call ghccc { i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double, float, double, float, double, float, double, float, double } ({ i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double, float, double, float, double, float, double, float, double } (i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64)*, i64, i32, i16, ...) @llvm.experimental.cpscall.sl_p0i64p0i64p0i64i64i64i64i64i64i64i64f32f64f32f64f32f64f32f64f32f64f32f64s.p0f_sl_p0i64p0i64p0i64i64i64i64i64i64i64i64f32f64f32f64f32f64f32f64f32f64f32f64sp0i64p0i64p0i64i64i64i64i64i64i64i64f({ i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double, float, double, float, double, float, double, float, double } (i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64)* bitcast (i8* @stg_gc_unpt_r1 to { i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double, float, double, float, double, float, double, float, double } (i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64)*), i64 7133701809754890946, i32 0, i16 1, i64* %Base_Var.6, i64* %Sp_Var.6, i64* %ln6IH, i64 %R1_Var.6, i64 undef, i64 undef, i64 undef, i64 undef, i64 undef, i64 %SpLim_Var.6) #2
%ln6M3 = extractvalue { i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double, float, double, float, double, float, double, float, double } %ln6M2, 0
%ln6M4 = extractvalue { i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double, float, double, float, double, float, double, float, double } %ln6M2, 1
%ln6M5 = extractvalue { i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double, float, double, float, double, float, double, float, double } %ln6M2, 2
%ln6M6 = extractvalue { i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double, float, double, float, double, float, double, float, double } %ln6M2, 3
%ln6M7 = extractvalue { i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double, float, double, float, double, float, double, float, double } %ln6M2, 9
br label %c6zM
c6AT: ; preds = %c6zB
%ln6M9 = getelementptr inbounds i64, i64* %Base_Var.5, i64 113
store i64 16, i64* %ln6M9, align 8, !tbaa !4
%ln6Mh = call ghccc { i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double, float, double, float, double, float, double, float, double } ({ i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double, float, double, float, double, float, double, float, double } (i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64)*, i64, i32, i16, ...) @llvm.experimental.cpscall.sl_p0i64p0i64p0i64i64i64i64i64i64i64i64f32f64f32f64f32f64f32f64f32f64f32f64s.p0f_sl_p0i64p0i64p0i64i64i64i64i64i64i64i64f32f64f32f64f32f64f32f64f32f64f32f64sp0i64p0i64p0i64i64i64i64i64i64i64i64f({ i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double, float, double, float, double, float, double, float, double } (i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64)* bitcast (i8* @stg_gc_unpt_r1 to { i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double, float, double, float, double, float, double, float, double } (i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64)*), i64 7133701809754890935, i32 0, i16 1, i64* %Base_Var.5, i64* %Sp_Var.5, i64* %ln6HE, i64 %R1_Var.5, i64 undef, i64 undef, i64 undef, i64 undef, i64 undef, i64 %SpLim_Var.5) #2
%ln6Mi = extractvalue { i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double, float, double, float, double, float, double, float, double } %ln6Mh, 0
%ln6Mj = extractvalue { i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double, float, double, float, double, float, double, float, double } %ln6Mh, 1
%ln6Mk = extractvalue { i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double, float, double, float, double, float, double, float, double } %ln6Mh, 2
%ln6Ml = extractvalue { i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double, float, double, float, double, float, double, float, double } %ln6Mh, 3
%ln6Mm = extractvalue { i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double, float, double, float, double, float, double, float, double } %ln6Mh, 9
br label %c6zB
c6AQ: ; preds = %c6zq
%ln6Mo = getelementptr inbounds i64, i64* %Base_Var.4, i64 113
store i64 16, i64* %ln6Mo, align 8, !tbaa !4
%ln6Mw = call ghccc { i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double, float, double, float, double, float, double, float, double } ({ i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double, float, double, float, double, float, double, float, double } (i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64)*, i64, i32, i16, ...) @llvm.experimental.cpscall.sl_p0i64p0i64p0i64i64i64i64i64i64i64i64f32f64f32f64f32f64f32f64f32f64f32f64s.p0f_sl_p0i64p0i64p0i64i64i64i64i64i64i64i64f32f64f32f64f32f64f32f64f32f64f32f64sp0i64p0i64p0i64i64i64i64i64i64i64i64f({ i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double, float, double, float, double, float, double, float, double } (i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64)* bitcast (i8* @stg_gc_unpt_r1 to { i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double, float, double, float, double, float, double, float, double } (i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64)*), i64 7133701809754890924, i32 0, i16 1, i64* %Base_Var.4, i64* %Sp_Var.4, i64* %ln6GD, i64 %R1_Var.4, i64 undef, i64 undef, i64 undef, i64 undef, i64 undef, i64 %SpLim_Var.4) #2
%ln6Mx = extractvalue { i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double, float, double, float, double, float, double, float, double } %ln6Mw, 0
%ln6My = extractvalue { i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double, float, double, float, double, float, double, float, double } %ln6Mw, 1
%ln6Mz = extractvalue { i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double, float, double, float, double, float, double, float, double } %ln6Mw, 2
%ln6MA = extractvalue { i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double, float, double, float, double, float, double, float, double } %ln6Mw, 3
%ln6MB = extractvalue { i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double, float, double, float, double, float, double, float, double } %ln6Mw, 9
br label %c6zq
c6AN: ; preds = %c6zf
%ln6MD = getelementptr inbounds i64, i64* %Base_Var.3, i64 113
store i64 16, i64* %ln6MD, align 8, !tbaa !4
%ln6ML = call ghccc { i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double, float, double, float, double, float, double, float, double } ({ i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double, float, double, float, double, float, double, float, double } (i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64)*, i64, i32, i16, ...) @llvm.experimental.cpscall.sl_p0i64p0i64p0i64i64i64i64i64i64i64i64f32f64f32f64f32f64f32f64f32f64f32f64s.p0f_sl_p0i64p0i64p0i64i64i64i64i64i64i64i64f32f64f32f64f32f64f32f64f32f64f32f64sp0i64p0i64p0i64i64i64i64i64i64i64i64f({ i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double, float, double, float, double, float, double, float, double } (i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64)* bitcast (i8* @stg_gc_unpt_r1 to { i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double, float, double, float, double, float, double, float, double } (i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64)*), i64 7133701809754890913, i32 0, i16 1, i64* %Base_Var.3, i64* %Sp_Var.3, i64* %ln6FG, i64 %R1_Var.3, i64 undef, i64 undef, i64 undef, i64 undef, i64 undef, i64 %SpLim_Var.3) #2
%ln6MM = extractvalue { i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double, float, double, float, double, float, double, float, double } %ln6ML, 0
%ln6MN = extractvalue { i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double, float, double, float, double, float, double, float, double } %ln6ML, 1
%ln6MO = extractvalue { i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double, float, double, float, double, float, double, float, double } %ln6ML, 2
%ln6MP = extractvalue { i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double, float, double, float, double, float, double, float, double } %ln6ML, 3
%ln6MQ = extractvalue { i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double, float, double, float, double, float, double, float, double } %ln6ML, 9
br label %c6zf
c6AK: ; preds = %c6z4
%ln6MS = getelementptr inbounds i64, i64* %Base_Var.2, i64 113
store i64 16, i64* %ln6MS, align 8, !tbaa !4
%ln6N0 = call ghccc { i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double, float, double, float, double, float, double, float, double } ({ i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double, float, double, float, double, float, double, float, double } (i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64)*, i64, i32, i16, ...) @llvm.experimental.cpscall.sl_p0i64p0i64p0i64i64i64i64i64i64i64i64f32f64f32f64f32f64f32f64f32f64f32f64s.p0f_sl_p0i64p0i64p0i64i64i64i64i64i64i64i64f32f64f32f64f32f64f32f64f32f64f32f64sp0i64p0i64p0i64i64i64i64i64i64i64i64f({ i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double, float, double, float, double, float, double, float, double } (i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64)* bitcast (i8* @stg_gc_unpt_r1 to { i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double, float, double, float, double, float, double, float, double } (i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64)*), i64 7133701809754890902, i32 0, i16 1, i64* %Base_Var.2, i64* %Sp_Var.2, i64* %ln6EF, i64 %R1_Var.2, i64 undef, i64 undef, i64 undef, i64 undef, i64 undef, i64 %SpLim_Var.2) #2
%ln6N1 = extractvalue { i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double, float, double, float, double, float, double, float, double } %ln6N0, 0
%ln6N2 = extractvalue { i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double, float, double, float, double, float, double, float, double } %ln6N0, 1
%ln6N3 = extractvalue { i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double, float, double, float, double, float, double, float, double } %ln6N0, 2
%ln6N4 = extractvalue { i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double, float, double, float, double, float, double, float, double } %ln6N0, 3
%ln6N5 = extractvalue { i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double, float, double, float, double, float, double, float, double } %ln6N0, 9
br label %c6z4
c6AH: ; preds = %c6yT
%ln6N7 = getelementptr inbounds i64, i64* %Base_Var.1, i64 113
store i64 16, i64* %ln6N7, align 8, !tbaa !4
%ln6Nf = call ghccc { i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double, float, double, float, double, float, double, float, double } ({ i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double, float, double, float, double, float, double, float, double } (i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64)*, i64, i32, i16, ...) @llvm.experimental.cpscall.sl_p0i64p0i64p0i64i64i64i64i64i64i64i64f32f64f32f64f32f64f32f64f32f64f32f64s.p0f_sl_p0i64p0i64p0i64i64i64i64i64i64i64i64f32f64f32f64f32f64f32f64f32f64f32f64sp0i64p0i64p0i64i64i64i64i64i64i64i64f({ i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double, float, double, float, double, float, double, float, double } (i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64)* bitcast (i8* @stg_gc_unpt_r1 to { i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double, float, double, float, double, float, double, float, double } (i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64)*), i64 7133701809754890891, i32 0, i16 1, i64* %Base_Var.1, i64* %Sp_Var.1, i64* %ln6DC, i64 %R1_Var.1, i64 undef, i64 undef, i64 undef, i64 undef, i64 undef, i64 %SpLim_Var.1) #2
%ln6Ng = extractvalue { i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double, float, double, float, double, float, double, float, double } %ln6Nf, 0
%ln6Nh = extractvalue { i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double, float, double, float, double, float, double, float, double } %ln6Nf, 1
%ln6Ni = extractvalue { i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double, float, double, float, double, float, double, float, double } %ln6Nf, 2
%ln6Nj = extractvalue { i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double, float, double, float, double, float, double, float, double } %ln6Nf, 3
%ln6Nk = extractvalue { i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double, float, double, float, double, float, double, float, double } %ln6Nf, 9
br label %c6yT
c6AE: ; preds = %c6yE
%ln6Nm = getelementptr inbounds i64, i64* %Base_Var.0, i64 113
store i64 16, i64* %ln6Nm, align 8, !tbaa !4
%ln6Nu = call ghccc { i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double, float, double, float, double, float, double, float, double } ({ i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double, float, double, float, double, float, double, float, double } (i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64)*, i64, i32, i16, ...) @llvm.experimental.cpscall.sl_p0i64p0i64p0i64i64i64i64i64i64i64i64f32f64f32f64f32f64f32f64f32f64f32f64s.p0f_sl_p0i64p0i64p0i64i64i64i64i64i64i64i64f32f64f32f64f32f64f32f64f32f64f32f64sp0i64p0i64p0i64i64i64i64i64i64i64i64f({ i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double, float, double, float, double, float, double, float, double } (i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64)* bitcast (i8* @stg_gc_unpt_r1 to { i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double, float, double, float, double, float, double, float, double } (i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64)*), i64 7133701809754890876, i32 0, i16 1, i64* %Base_Var.0, i64* %Sp_Var.0, i64* %ln6Ct, i64 %R1_Var.0, i64 undef, i64 undef, i64 undef, i64 undef, i64 undef, i64 %SpLim_Var.0) #2
%ln6Nv = extractvalue { i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double, float, double, float, double, float, double, float, double } %ln6Nu, 0
%ln6Nw = extractvalue { i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double, float, double, float, double, float, double, float, double } %ln6Nu, 1
%ln6Nx = extractvalue { i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double, float, double, float, double, float, double, float, double } %ln6Nu, 2
%ln6Ny = extractvalue { i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double, float, double, float, double, float, double, float, double } %ln6Nu, 3
%ln6Nz = extractvalue { i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double, float, double, float, double, float, double, float, double } %ln6Nu, 9
br label %c6yE
.p2align 4, 0x90
LBB4_29: ## %c6AZ
## in Loop: Header=BB4_26 Depth=1
movq $16, 904(%r13)
movq %rax, %r12
leaq L7133701809754890954_1(%rip), %rax
movq %rax, (%rbp)
jmp _stg_gc_unpt_r1 ## TAILCALL
# ...
.p2align 4, 0x90
LBB4_31: ## %c6AW
## in Loop: Header=BB4_23 Depth=1
movq $16, 904(%r13)
movq %rax, %r12
leaq L7133701809754890946_1(%rip), %rax
movq %rax, (%rbp)
jmp _stg_gc_unpt_r1 ## TAILCALL
# ...
.p2align 4, 0x90
LBB4_33: ## %c6AT
## in Loop: Header=BB4_20 Depth=1
movq $16, 904(%r13)
movq %rax, %r12
leaq L7133701809754890935_1(%rip), %rax
movq %rax, (%rbp)
jmp _stg_gc_unpt_r1 ## TAILCALL
"The register %R14 needs to be live in to BB#1, but is missing from the live-in list.
LLVM ERROR: Invalid global physical register"
This error occurs when compiling x2n1, and is raised from LiveRangeCalc::findReachingDefs on the following code.
176B BB#1: derived from LLVM BB %c71x
Predecessors according to CFG: BB#0
192B ADJCALLSTACKDOWN64 0, 0, 0, %RSP<imp-def,dead>, %EFLAGS<imp-def,dead>, %RSP<imp-use>
208B %XMM0<def> = COPY %vreg252; FR64:%vreg252
224B CALL64pcrel32 <ga:@rintDouble>, <regmask %BH %BL %BP %BPL %BX %EBP %EBX %RBP %RBX %R12 %R13 %R14 %R15 %R12B %R13B %R14B %R15B %R12D %R13D %R14D %R15D %R12W %R13W %R14W %R15W>, %RSP<imp-use>, %XMM0<imp-use,kill>, %RSP<imp-def>, %XMM0<imp-def>
240B ADJCALLSTACKUP64 0, 0, %RSP<imp-def,dead>, %EFLAGS<imp-def,dead>, %RSP<imp-use>
256B %vreg255<def> = COPY %XMM0<kill>; FR64:%vreg255
288B %vreg256<def,tied1> = ADD64ri8 %vreg256<tied0>, -8, %EFLAGS<imp-def,dead>; GR64:%vreg256
304B ADJCALLSTACKDOWN64 0, 0, 0, %RSP<imp-def,dead>, %EFLAGS<imp-def,dead>, %RSP<imp-use>
320B %R13<def> = COPY %vreg242; GR64:%vreg242
336B %RBP<def> = COPY %vreg256; GR64:%vreg256
352B %R12<def> = COPY %vreg244; GR64:%vreg244
368B %RBX<def> = COPY %vreg245; GR64:%vreg245
384B %R15<def> = COPY %vreg251; GR64:%vreg251
400B %XMM1<def> = COPY %vreg255; FR64:%vreg255
416B ADJCALLSTACKUP64 0, 0, %RSP<imp-def>, %EFLAGS<imp-def,dead>, %RSP<imp-use>
432B %vreg508<def> = LEA64r %RIP, 1, %noreg, <MCSym=L7133701809754892656_0>, %noreg; GR64:%vreg508
448B MOV64mr %RBP, 1, %noreg, 0, %noreg, %vreg508; GR64:%vreg508
464B TCRETURNdi64 <ga:@integerzmgmp_GHCziIntegerziType_decodeDoubleInteger_info>, 0, %RSP<imp-use>, %R13<imp-use>, %RBP<imp-use>, %R12<imp-use>, %RBX<imp-use>, %R14<imp-use,undef>, %RSI<imp-use,undef>, %RDI<imp-use,undef>, %R8<imp-use,undef>, %R9<imp-use,undef>, %R15<imp-use>, %XMM1<imp-use>
Successors according to CFG: BB#2(?%)
480B BB#2: ADDRESS TAKEN
Live Ins: %R13 %RBP %R12 %RBX %R14 %RSI %RDI %R8 %R9 %R15 %XMM1 %XMM2 %XMM3 %XMM4 %XMM5 %XMM6 %XMM8 %XMM9 %XMM10 %XMM11 %XMM12 %XMM13
Predecessors according to CFG: BB#1
496B %vreg22<def> = COPY %R13; GR64:%vreg22
512B %vreg376<def> = COPY %RBP; GR64:%vreg376
528B %vreg24<def> = COPY %R12; GR64:%vreg24
544B %vreg25<def> = COPY %RBX; GR64:%vreg25
560B %vreg26<def> = COPY %R14<kill>; GR64:%vreg26
576B %vreg27<def> = COPY %R15; GR64:%vreg27
Successors according to CFG: BB#3(0x80000000 / 0x80000000 = 100.00%)
We need to add checks to the verifier pass to ensure that cpscalls are well-formed, among them that the relevant operands are of type i64*, or at least of pointer type.
In the example below, the constant global's address is spilled to and reloaded from the C stack across the call, which should not happen.
; ModuleID = 'spill_nt.ll'
source_filename = "spill_nt.ll"
@ghczmprim_GHCziTypes_True_closure = external global i8
declare ghccc { i64, i64* } @bar(i64, i64*)
; Function Attrs: naked
define ghccc { i64, i64* } @foo(i64 %__x, i64* %__y) #0 {
cazq:
%lnbeg = ptrtoint i8* @ghczmprim_GHCziTypes_True_closure to i64
%lnbeh = add i64 %lnbeg, 2
%retVals = call ghccc { i64, i64* } ({ i64, i64* } (i64, i64*)*, i64, i32, i16, ...)
@llvm.experimental.cpscall.sl_i64p0i64s.p0f_sl_i64p0i64si64p0i64f(
{ i64, i64* } (i64, i64*)* @bar, i64 1337, i32 2, i16 1,
i64 %lnbeh, i64* %__y)
%lnbeg2 = ptrtoint i8* @ghczmprim_GHCziTypes_True_closure to i64
%lnbeh2 = add i64 %lnbeg2, 2
%updated = insertvalue { i64, i64* } %retVals, i64 %lnbeh2, 0
ret { i64, i64* } %updated
}
declare ghccc { i64, i64* }
@llvm.experimental.cpscall.sl_i64p0i64s.p0f_sl_i64p0i64si64p0i64f(
{ i64, i64* } (i64, i64*)*, i64, i32, i16, ...)
attributes #0 = { naked }
.section __TEXT,__text,regular,pure_instructions
.macosx_version_min 10, 11
.globl _foo
.p2align 4, 0x90
_foo: ## @foo
.cfi_startproc
## BB#0: ## %cazq
movq _ghczmprim_GHCziTypes_True_closure@GOTPCREL(%rip), %r13
addq $2, %r13
movq %r13, (%rsp) ## 8-byte Spill
leaq L1337_0(%rip), %rax
movq %rax, 2(%rbp)
jmp _bar ## TAILCALL
LBB0_1:
L1337_0:
movq (%rsp), %r13 ## 8-byte Reload
movq (%r13), %rax
jmpq *%rax ## TAILCALL
.cfi_endproc
.subsections_via_symbols
As seen in #11, CSE of constants across a vanilla call is valid in LLVM.
Consider the following program:
declare void @foo(i64)
@extern_glob = external global i8
define void @bar() {
%x_addr = ptrtoint i8* @extern_glob to i64
%x = add i64 %x_addr, 2
call void @foo(i64 %x)
%y_addr = ptrtoint i8* @extern_glob to i64
%y = add i64 %y_addr, 2
call void @foo(i64 %y)
ret void
}
Currently, LLVM optimizations such as early-cse or gvn always inline the address calculation, like so:
define void @bar() {
call void @foo(i64 add (i64 ptrtoint (i8* @extern_glob to i64), i64 2))
call void @foo(i64 add (i64 ptrtoint (i8* @extern_glob to i64), i64 2))
ret void
}
But it is also valid to produce the following:
define void @bar() {
%x_addr = ptrtoint i8* @extern_glob to i64
%x = add i64 %x_addr, 2
call void @foo(i64 %x)
call void @foo(i64 %x)
ret void
}
If the calls were specifically cpscalls, this kind of CSE introduces a value that is live across the cpscall, which will then be allocated an LLVM stack slot.
Questions:
Should we use cpscallprep to enforce our requirement?
We can do this for all backends that support tail calls by modifying SelectionDAGBuilder::LowerCallTo.
In there, we basically just want to do
if (callingConv == ghccc)
  isTailCall = true;
after the "isInTailPosition" check, because that check will return false. I think we're okay after that, because that's the only place we check whether the call instruction is followed by a ret.
Michal brought up the existing @llvm.invariant
intrinsics for marking a region of memory as immutable over a section of a block.
I think this, or something like it, would help the LLVM optimizer gain information during analysis.
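To make the idea concrete, here is a minimal sketch (not taken from this repo) of how the invariant intrinsics could bracket such a region. It assumes the type-suffixed llvm.invariant.start/end declarations used by recent LLVM releases with typed pointers; the @use callee is a hypothetical stand-in:

```llvm
; Hypothetical example: mark 8 bytes at %p as immutable over a region.
declare {}* @llvm.invariant.start.p0i8(i64, i8* nocapture)
declare void @llvm.invariant.end.p0i8({}*, i64, i8* nocapture)
declare void @use(i64)

define void @example(i8* %p) {
  ; begin the invariant region
  %inv = call {}* @llvm.invariant.start.p0i8(i64 8, i8* %p)
  ; within the region, loads from %p may be CSE'd even across calls,
  ; since the memory is known not to be mutated
  %q = bitcast i8* %p to i64*
  %v = load i64, i64* %q
  call void @use(i64 %v)
  ; end the invariant region
  call void @llvm.invariant.end.p0i8({}* %inv, i64 8, i8* %p)
  ret void
}
```

The token returned by llvm.invariant.start ties the matching llvm.invariant.end to the same region, which is how the optimizer knows the extent over which the memory is immutable.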