kavon / ghc-llvm
improvements to llvm to support functional-language compilers, e.g., ghc
License: Other
Whether you emit an x86ISD::CALL or an x86ISD::CPS_CALL, somehow after isel has finished we end up with a broken MBB. In particular, notice that vreg1 is never defined, and the use really should refer to vreg2.
ISEL: Match complete!
===== Instruction selection ends:
Selected selection DAG: BB#0 'test:'
SelectionDAG has 18 nodes:
t0: ch = EntryToken
t5: i64,ch,glue = ADJCALLSTACKDOWN64 TargetConstant:i64<0>, TargetConstant:i64<0>, TargetConstant:i32<0>, t0
t2: i32,ch = CopyFromReg t0, Register:i32 %vreg0
t7: ch,glue = CopyToReg t5:1, Register:i32 %EDI, t2
t10: ch,glue = CALL64pcrel32 TargetGlobalAddress:i64<i32 (i32)* @someCallee> 0, Register:i32 %EDI, RegisterMask:Untyped, t7, t7:1
t11: i64,ch,glue = ADJCALLSTACKUP64 TargetConstant:i64<0>, TargetConstant:i64<0>, t10, t10:1
t13: i32,ch,glue = CopyFromReg t11:1, Register:i32 %EAX, t11:2
t15: i32,ch = CopyFromReg t0, Register:i32 %vreg1
t17: ch,glue = CopyToReg t13:1, Register:i32 %EAX, t15
t18: ch = RET TargetConstant:i32<0>, Register:i32 %EAX, t17, t17:1
Total amount of phi nodes to update: 0
*** MachineFunction at end of ISel ***
# Machine code for function test: IsSSA, TracksLiveness
Function Live Ins: %EDI in %vreg0
BB#0: derived from LLVM BB %0
Live Ins: %EDI
%vreg0<def> = COPY %EDI; GR32:%vreg0
ADJCALLSTACKDOWN64 0, 0, 0, %RSP<imp-def,dead>, %EFLAGS<imp-def,dead>, %RSP<imp-use>
%EDI<def> = COPY %vreg0; GR32:%vreg0
CALL64pcrel32 <ga:@someCallee>, <regmask %BH %BL %BP %BPL %BX %EBP %EBX %RBP %RBX %R12 %R13 %R14 %R15 %R12B %R13B %R14B %R15B %R12D %R13D %R14D %R15D %R12W %R13W %R14W %R15W>, %RSP<imp-use>, %EDI<imp-use>, %RSP<imp-def>, %EAX<imp-def>
ADJCALLSTACKUP64 0, 0, %RSP<imp-def,dead>, %EFLAGS<imp-def,dead>, %RSP<imp-use>
%vreg2<def> = COPY %EAX; GR32:%vreg2
%EAX<def> = COPY %vreg1; GR32:%vreg1
RET 0, %EAX
Assertion failed: (MRI->getVRegDef(reg) && "Register use before def!"), function HandleVirtRegUse, file /Users/kavon/msr/llvm-dev/src/lib/CodeGen/LiveVariables.cpp, line 133.
0 llc 0x0000000110e752cc llvm::sys::PrintStackTrace(llvm::raw_ostream&) + 60
1 llc 0x0000000110e75759 PrintStackTraceSignalHandler(void*) + 25
2 llc 0x0000000110e724e9 llvm::sys::RunSignalHandlers() + 425
3 llc 0x0000000110e759d2 SignalHandler(int) + 354
4 libsystem_platform.dylib 0x00007fff83d3252a _sigtramp + 26
5 libsystem_platform.dylib 0x00007fff509a2e50 _sigtramp + 3435596096
6 libsystem_c.dylib 0x00007fff8c26a6df abort + 129
7 libsystem_c.dylib 0x00007fff8c231dd8 basename + 0
8 llc 0x000000010fdc875d llvm::LiveVariables::HandleVirtRegUse(unsigned int, llvm::MachineBasicBlock*, llvm::MachineInstr&) + 157
9 llc 0x000000010fdcb2c8 llvm::LiveVariables::runOnInstr(llvm::MachineInstr&, llvm::SmallVectorImpl<unsigned int>&) + 1080
10 llc 0x000000010fdcb9d2 llvm::LiveVariables::runOnBlock(llvm::MachineBasicBlock*, unsigned int) + 930
11 llc 0x000000010fdcc204 llvm::LiveVariables::runOnMachineFunction(llvm::MachineFunction&) + 644
12 llc 0x000000010fe84ae1 llvm::MachineFunctionPass::runOnFunction(llvm::Function&) + 449
13 llc 0x00000001103b120f llvm::FPPassManager::runOnFunction(llvm::Function&) + 399
14 llc 0x00000001103b1525 llvm::FPPassManager::runOnModule(llvm::Module&) + 117
15 llc 0x00000001103b220f (anonymous namespace)::MPPassManager::runOnModule(llvm::Module&) + 1967
16 llc 0x00000001103b17e6 llvm::legacy::PassManagerImpl::run(llvm::Module&) + 342
17 llc 0x00000001103b2e01 llvm::legacy::PassManager::run(llvm::Module&) + 33
18 llc 0x000000010f25ffd2 compileModule(char**, llvm::LLVMContext&) + 21938
19 llc 0x000000010f25a6d7 main + 2471
20 libdyld.dylib 0x00007fff965755ad start + 1
Stack dump:
0. Program arguments: llc -debug-only=isel intrinsic.ll
1. Running pass 'Function Pass Manager' on module 'intrinsic.ll'.
2. Running pass 'Live Variable Analysis' on function '@test'
One of the poor register choices LLVM makes for the function pointer in a tail call comes from not updating X86RegisterInfo::getGPRsForTailCall for the GHC calling convention. The GR64_TC register class is specialized to the C conventions:
// in X86InstrInfo.td
// GPRs available for tailcall.
// It represents GR32_TC, GR64_TC or GR64_TCW64.
def ptr_rc_tailcall : PointerLikeRegClass<4>; // the 4 corresponds to the Kind in X86RegisterInfo::getPointerRegClass
// in X86RegisterInfo.td
def GR64_TC : RegisterClass<"X86", [i64], 64, (add RAX, RCX, RDX, RSI, RDI,
R8, R9, R11, RIP)>;
See if you can dig around further and improve the other register choices as well.
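A sketch of the kind of change meant here; the override point is the getGPRsForTailCall hook named above, but the GHC branch and the chosen register class are assumptions, not a tested fix:

```cpp
// Sketch only (the GHC case is an assumed addition, not a tested fix).
const TargetRegisterClass *
X86RegisterInfo::getGPRsForTailCall(const MachineFunction &MF) const {
  const Function *F = MF.getFunction();
  if (F && F->getCallingConv() == CallingConv::GHC)
    // GHC pins its own registers, so the C-convention GR64_TC
    // restriction (RAX, RCX, RDX, RSI, RDI, ...) is unnecessarily
    // narrow here.
    return &X86::GR64RegClass;
  // ... existing GR32_TC / GR64_TC / GR64_TCW64 selection ...
}
```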
It seems that if you use too many float/double values in a call/return, llc does a horrible job: it no longer eliminates the passing of undef values around:
declare ghccc {i64, i64} @bar (i64, i64)
declare ghccc {i64, i64, float, double, float, double, float, double, float, double, float, double, float, double, float} @llvm.experimental.cpscall.x ({i64, i64, float, double, float, double, float, double, float, double, float, double, float, double, float} (i64, i64*, float, double, float, double, float, double, float, double, float, double, float, double, float)*, i64, i32, i16, ...)
define ghccc {i64, i64, float, double, float, double, float, double, float, double, float, double, float, double, float} @foo (i64 %x, i64* %y, float %a, double %b) naked {
entryBlock:
%x1 = add i64 %x, 1
%fnPtr = bitcast i64* %y to {i64, i64, float, double, float, double, float, double, float, double, float, double, float, double, float} (i64, i64*, float, double, float, double, float, double, float, double, float, double, float, double, float)*
%retVals = call ghccc {i64, i64, float, double, float, double, float, double, float, double, float, double, float, double, float} %fnPtr(i64 %x1, i64* undef, float undef, double undef, float undef, double undef, float undef, double undef, float undef, double undef, float undef, double undef, float undef, double undef, float undef)
;;;;; WITH INTRINSIC
; %retVals = call ghccc {i64, i64, float, double, float, double, float, double, float, double, float, double, float, double, float} ({i64, i64, float, double, float, double, float, double, float, double, float, double, float, double, float} (i64, i64*, float, double, float, double, float, double, float, double, float, double, float, double, float)*, i64, i32, i16, ...) @llvm.experimental.cpscall.x(
; {i64, i64, float, double, float, double, float, double, float, double, float, double, float, double, float} (i64, i64*, float, double, float, double, float, double, float, double, float, double, float, double, float)* %fnPtr, i64 1337, i32 2, i16 1,
; i64 %x1, i64* %y, float undef, double undef, float undef, double undef, float undef, double undef, float undef,
; double undef, float undef, double undef, float undef, double undef, float undef)
%newX = extractvalue {i64, i64, float, double, float, double, float, double, float, double, float, double, float, double, float} %retVals, 2
; %newY = extractvalue {i64, i64, float, double, float, double, float, double, float, double, float, double, float, double, float} %retVals, 3
; %fnPtr = inttoptr i64 %newY to {i64, i64} (i64, i64)*
; %retVals2 = call ghccc {i64, i64} %fnPtr(i64 %newX, i64 %newY), !cps.retpt !{!"kungfoo", !"56"}
; %newX2 = extractvalue {i64, i64} %retVals2, 0
; %newY2 = extractvalue {i64, i64} %retVals2, 1
%diffX = fsub float %newX, 1.0
%updated = insertvalue {i64, i64, float, double, float, double, float, double, float, double, float, double, float, double, float} %retVals, float %diffX, 2
ret {i64, i64, float, double, float, double, float, double, float, double, float, double, float, double, float} %updated
}
generates:
.section __TEXT,__text,regular,pure_instructions
.macosx_version_min 10, 11
.section __TEXT,__literal4,4byte_literals
.p2align 2
LCPI0_0:
.long 3212836864 ## float -1
.section __TEXT,__text,regular,pure_instructions
.globl _foo
.p2align 4, 0x90
_foo: ## @foo
.cfi_startproc
## BB#0: ## %entryBlock
movq %r13, 40(%rsp) ## 8-byte Spill
incq %rbp
leaq 48(%rsp), %r13
callq *%r12
movq 48(%rsp), %rdx
movq 56(%rsp), %rcx
movsd 72(%rsp), %xmm8 ## xmm8 = mem[0],zero
movss 80(%rsp), %xmm9 ## xmm9 = mem[0],zero,zero,zero
movsd 88(%rsp), %xmm10 ## xmm10 = mem[0],zero
movss 96(%rsp), %xmm11 ## xmm11 = mem[0],zero,zero,zero
movsd 104(%rsp), %xmm12 ## xmm12 = mem[0],zero
movss 112(%rsp), %xmm5 ## xmm5 = mem[0],zero,zero,zero
movsd 120(%rsp), %xmm6 ## xmm6 = mem[0],zero
movss 128(%rsp), %xmm7 ## xmm7 = mem[0],zero,zero,zero
movsd 136(%rsp), %xmm0 ## xmm0 = mem[0],zero
movss 144(%rsp), %xmm1 ## xmm1 = mem[0],zero,zero,zero
movsd 152(%rsp), %xmm2 ## xmm2 = mem[0],zero
movss 160(%rsp), %xmm3 ## xmm3 = mem[0],zero,zero,zero
movss 64(%rsp), %xmm4 ## xmm4 = mem[0],zero,zero,zero
addss LCPI0_0(%rip), %xmm4
movq 40(%rsp), %rax ## 8-byte Reload
movss %xmm3, 112(%rax)
movsd %xmm2, 104(%rax)
movss %xmm1, 96(%rax)
movsd %xmm0, 88(%rax)
movss %xmm7, 80(%rax)
movsd %xmm6, 72(%rax)
movss %xmm5, 64(%rax)
movsd %xmm12, 56(%rax)
movss %xmm11, 48(%rax)
movsd %xmm10, 40(%rax)
movss %xmm9, 32(%rax)
movsd %xmm8, 24(%rax)
movq %rcx, 8(%rax)
movq %rdx, (%rax)
movss %xmm4, 16(%rax)
movq (%rax), %rcx
jmpq *%rcx ## TAILCALL
.cfi_endproc
.subsections_via_symbols
In the NCG, we see code such as
leaq _s2IX_closure$def+1(%rip), %rax
but in LLVM we see an add instruction emitted. We should find a way to get LLVM to emit the above instead of
leaq _s2IX_closure$def(%rip), %rax
addq $1, %rax
We may have to add this as an optimization to LLC, since we happen to do this a lot but other languages probably don't.
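One possible angle (an assumption, not something verified): if the tagged address were emitted as a constant GEP rather than a ptrtoint plus add, the x86 addressing-mode matcher could fold the +1 into the leaq displacement itself. Symbol names below are illustrative:

```llvm
; what we effectively get today (the add is folded too late):
%c  = ptrtoint i8* @s2IX_closure to i64
%t0 = add i64 %c, 1

; a constant-GEP form the addressing-mode matcher can fold into one
; instruction, i.e. leaq _s2IX_closure+1(%rip), %rax:
%t1 = ptrtoint i8* getelementptr (i8, i8* @s2IX_closure, i64 1) to i64
```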
It looks like the stack protector implementation does something similar to what we want to do, which involves splitting apart blocks during LLVM IR -> SelectionDAG IR translation. To quote:
"While one can not handle multiple IR level basic blocks at the SelectionDAG Level, one can generate multiple machine basic blocks for one IR level basic block [in SelectionDAGBuilder]. This is how we handle bit tests and switches."
They have a very detailed comment describing the steps and process to do the block splitting.
In X86ISelLowering.cpp, we have the following checks. Notice that we do not get the sibling call optimization if we use musttail, which is silly. Change this so IsSibcall is set to true when the call is a musttail.
bool IsMustTail = CLI.CS && CLI.CS->isMustTailCall();
if (IsMustTail) {
// Force this to be a tail call. The verifier rules are enough to ensure
// that we can lower this successfully without moving the return address
// around.
isTailCall = true;
} else if (isTailCall) {
// Check if it's really possible to do a tail call.
isTailCall = IsEligibleForTailCallOptimization(Callee, CallConv,
isVarArg, SR != NotStructReturn,
MF.getFunction()->hasStructRetAttr(), CLI.RetTy,
Outs, OutVals, Ins, DAG);
// Sibcalls are automatically detected tailcalls which do not require
// ABI changes.
if (!MF.getTarget().Options.GuaranteedTailCallOpt && isTailCall)
IsSibcall = true;
}
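The proposed change might look like the sketch below, mirroring the existing sibcall condition; whether a musttail call always satisfies the sibcall preconditions is the open question:

```cpp
// Sketch of the proposed change (not a tested fix).
if (IsMustTail) {
  // Force this to be a tail call, as before.
  isTailCall = true;
  // Proposed: under the default (non-guaranteed-TCO) options, a
  // musttail call should also count as a sibcall, matching the
  // condition used in the else-if branch below.
  if (!MF.getTarget().Options.GuaranteedTailCallOpt)
    IsSibcall = true;
}
```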
As you can see below, if one of the arguments to the CPSCALL is an undef (or, in MBB speak, an IMPLICIT_DEF) value, we're free to eliminate that binding, which leaves the TCRETURN referencing an unbound register.
I think the solution is to use the call-sequence start/end instructions to protect the physical register bindings; I have a feeling that's part of what their purpose is.
handleOverflow:
%retFromGC = call ghccc {i8**, i64} @doGC(i8** %newSP, i64 undef), !cps.retpt !0
==================================================================
BB#6: derived from LLVM BB %handleOverflow
Predecessors according to CFG: BB#5
ADJCALLSTACKDOWN64 0, 0, 0, %RSP<imp-def,dead>, %EFLAGS<imp-def,dead>, %RSP<imp-use>
%R13<def> = COPY %vreg9; GR64:%vreg9
%vreg24<def> = IMPLICIT_DEF; GR64:%vreg24
%RBP<def> = COPY %vreg24; GR64:%vreg24
CPSCALLd64 <ga:@doGC>, <regmask>, %RSP<imp-use>, %R13<imp-use>, %RBP<imp-use>, %RSP<imp-def>, %R13<imp-def>, %RBP<imp-def>
ADJCALLSTACKUP64 0, 0, %RSP<imp-def,dead>, %EFLAGS<imp-def,dead>, %RSP<imp-use>
%vreg25<def> = COPY %R13; GR64:%vreg25
%vreg26<def> = COPY %RBP; GR64:%vreg26
%vreg10<def> = COPY %vreg25; GR64:%vreg10,%vreg25
%vreg11<def> = COPY %vreg26; GR64:%vreg11,%vreg26
JMP_1 <BB#5>
Successors according to CFG: BB#5(?%)
=======================================================================
BB#6: derived from LLVM BB %handleOverflow
Predecessors according to CFG: BB#5
ADJCALLSTACKDOWN64 0, 0, 0, %RSP<imp-def,dead>, %EFLAGS<imp-def,dead>, %RSP<imp-use>
%R13<def> = COPY %vreg9; GR64:%vreg9
%vreg24<def> = IMPLICIT_DEF; GR64:%vreg24
%RBP<def> = COPY %vreg24; GR64:%vreg24
ADJCALLSTACKUP64 0, 0, %RSP<imp-def>, %EFLAGS<imp-def,dead>, %RSP<imp-use>
%vreg30<def> = LEA64r %RIP, 1, %noreg, <MCSym=LretptStandin2>, %noreg; GR64:%vreg30
MOV64mr %RBP, 1, %noreg, 52, %noreg, %vreg30<kill>; GR64:%vreg30
TCRETURNdi64 <ga:@doGC>, 0, %RSP<imp-use>, %R13<imp-use>, %RBP<imp-use>
Successors according to CFG: BB#12(?%)
BB#12: EH LANDING PAD
Live Ins: %R13 %RBP
Predecessors according to CFG: BB#6
EH_LABEL <MCSym=LretptStandin2>
%vreg25<def> = COPY %R13<kill>; GR64:%vreg25
%vreg26<def> = COPY %RBP<kill>; GR64:%vreg26
%vreg10<def> = COPY %vreg25; GR64:%vreg10,%vreg25
%vreg11<def> = COPY %vreg26; GR64:%vreg11,%vreg26
JMP_1 <BB#5>
Successors according to CFG: BB#5(0x80000000 / 0x80000000 = 100.00%)
========================================================================
BB#10: derived from LLVM BB %handleOverflow
Predecessors according to CFG: BB#8
ADJCALLSTACKDOWN64 0, 0, 0, %RSP<imp-def,dead>, %EFLAGS<imp-def,dead>, %RSP<imp-use>
%R13<def> = COPY %vreg9; GR64:%vreg9
ADJCALLSTACKUP64 0, 0, %RSP<imp-def>, %EFLAGS<imp-def,dead>, %RSP<imp-use>
%vreg30<def> = LEA64r %RIP, 1, %noreg, <MCSym=LretptStandin2>, %noreg; GR64:%vreg30
MOV64mr %RBP<undef>, 1, %noreg, 52, %noreg, %vreg30; GR64:%vreg30
TCRETURNdi64 <ga:@doGC>, 0, %RSP<imp-use>, %R13<imp-use>, %RBP<imp-use>
Successors according to CFG: BB#11(?%)
========================================================================
*** Bad machine code: Using an undefined physical register ***
- function: foo
- basic block: BB#10 handleOverflow (0x7fcb2b8153c8)
- instruction: TCRETURNdi64
- operand 4: %RBP<imp-use>
LLVM ERROR: Found 1 machine code errors.
spectral/hartel/typecheck segfaults due to a jump to an address that is not code. This program faults even when compiled without GHC optimizations, and the failure happens regularly (only input 600 tested so far).
LBB247_10: ## in Loop: Header=BB247_5 Depth=1
pushq %rax
movq %rbx, %rax
LBB247_5: ## %cbJG
## =>This Inner Loop Header: Depth=1
movq 8(%rbp), %r14
movq 16(%rbp), %rcx
movl %eax, %edx
andl $7, %edx
addq $-2, %rdx
cmpq $3, %rdx
ja LBB247_7
## BB#6: ## %cbJG
## in Loop: Header=BB247_5 Depth=1
movslq (%rsi,%rdx,4), %rdx
addq %rsi, %rdx
jmpq *%rdx # <<<< sometimes a bad value
What is odd here is that we have an indirect jump that was not marked as a TAILCALL by LLVM, and the computation of the address is quite unusual. Currently, I'm trying to figure out how we even get this code in the _rDQ_info$def function.
IMO, the CSE that happens in isel as a side effect of the design of SDNode is not a good idea. It causes the store/load of a constant across a call when it would be better to rematerialize it.
It seems something fundamental is broken: almost none of the nofib benchmarks compile right now. I'm going to guess it has to do with generating symbol aliases in LLVM IR, since I see a few things broken like this on Mac as well.
==nofib== bernouilli: time to link bernouilli follows...
Main.o:(.text+0x240): undefined reference to `_S6L1_srt'
Main.o:(.text+0x3b0): undefined reference to `_S6L1_srt'
Main.o:(.text+0x560): undefined reference to `_S6L1_srt'
Main.o:(.text+0x580): undefined reference to `_S6L1_srt'
Main.o:(.text+0x5a0): undefined reference to `_S6L1_srt'
Main.o:(.text+0x7d0): more undefined references to `_S6L1_srt' follow
Main.o:(.text+0x1df0): undefined reference to `_u7tZ_srtd$def'
Main.o:(.text+0x1e20): undefined reference to `_u7u0_srtd$def'
Main.o:(.text+0x1f20): undefined reference to `_S6L1_srt'
Main.o:(.text+0x1f80): undefined reference to `_u7u1_srtd$def'
collect2: error: ld returned 1 exit status
`gcc' failed in phase `Linker'. (Exit code: 1)
<<ghc: 17518096 bytes, 8 GCs, 572628/943064 avg/max bytes residency (2 samples), 30M in use, 0.001 INIT (0.001 elapsed), 0.023 MUT (0.406 elapsed), 0.014 GC (0.014 elapsed) :ghc>>
../../mk/target.mk:46: recipe for target 'bernouilli' failed
make[1]: *** [bernouilli] Error 1
Failed making all in bernouilli: 1
../mk/ghc-recurse.mk:65: recipe for target 'all' failed
make: *** [all] Error 1
kavon@tormenta:~/haskell/alt-ghc/nofib/imaginary$
Right now we emit:
LCAFE0:
movq (%r13), %rax
jmpq *%rax ## TAILCALL
for a CPS_RET. We can save an instruction if we add a special pattern, as we do for CPSCALLm64, that looks for the load, so that we emit a TCRETURNmi64 instead and get:
LCAFE0:
jmpq *(%r13) ## TAILCALL
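A pattern along these lines might do it; everything below is a sketch with assumed node and instruction names, modeled on how the existing memory-form tail-call patterns fold a load:

```
// Sketch only (the X86cpsret node name is assumed): fold the load
// feeding the CPS return into the memory-operand tail call, so
// `movq (%r13), %rax ; jmpq *%rax` becomes `jmpq *(%r13)`.
def : Pat<(X86cpsret (loadi64 addr:$dst)),
          (TCRETURNmi64 addr:$dst, 0)>;
```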
It turns out that "operand bundles", which are essentially metadata on calls, are illegal for the optimizer to drop. Thus, with a combination of a special lowering for the GHC calling convention and these bundles carrying the additional necessary information down the pipeline, we can remove the need to add an intrinsic to the LLVM IR entirely:
call void @y() [ "ghc_id"(i64 123910421), "ghc_spArgNum"(i32 3) ]
See ~6:14 in Hal's talk here: https://www.youtube.com/watch?v=jII0AcgU_5c
in imaginary/bernouilli
/Users/kavon/msr/ghc/inplace/bin/ghc-stage2 -O2 -Rghc-timing -H32m -hisuf hi -fllvm -rtsopts -c Main.hs -o Main.o
Main.hs:26:21: warning: [-Wtabs]
Tab character found here.
Please use spaces instead.
|
26 | bernoulli 1 = -(1%2)
| ^^^^
WARNING: file compiler/simplCore/OccurAnal.hs, line 2163 Just 1 []
<unknown>:0:0: error: error: invalid symbol redefinition
Assertion failed: (MBB->getParent() == Entry.MF && "MachineFunction Parent changed"), function getAddrLabelSymbolToEmit, file /Users/kavon/msr/llvm-dev/src/lib/CodeGen/MachineModuleInfo.cpp, line 147.
0 llc 0x000000010f9c29dc llvm::sys::PrintStackTrace(llvm::raw_ostream&) + 60
1 llc 0x000000010f9c2e69 PrintStackTraceSignalHandler(void*) + 25
2 llc 0x000000010f9bfbc9 llvm::sys::RunSignalHandlers() + 425
3 llc 0x000000010f9c30e2 SignalHandler(int) + 354
4 libsystem_platform.dylib 0x00007fff9056b52a _sigtramp + 26
5 libsystem_platform.dylib 0x00007fae9942fe48 _sigtramp + 149702968
6 libsystem_c.dylib 0x00007fff90aac6df abort + 129
7 libsystem_c.dylib 0x00007fff90a73dd8 basename + 0
8 llc 0x000000010ea05b5d llvm::MMIAddrLabelMap::getAddrLabelSymbolToEmit(llvm::MachineBasicBlock*, llvm::Twine*) + 253
9 llc 0x000000010ea06c0f llvm::MachineModuleInfo::getAddrLabelSymbolsWithRequest(llvm::MachineBasicBlock const*, llvm::Twine*) + 175
10 llc 0x000000010e09c7c5 llvm::MachineModuleInfo::getAddrLabelSymbol(llvm::MachineBasicBlock const*, llvm::Twine*) + 37
11 llc 0x000000010dfc60b1 llvm::X86TargetLowering::EmitCPSCall(llvm::MachineInstr&, llvm::MachineBasicBlock*, unsigned int) const + 4625
12 llc 0x000000010dfc8139 llvm::X86TargetLowering::EmitInstrWithCustomInserter(llvm::MachineInstr&, llvm::MachineBasicBlock*) const + 5689
13 llc 0x000000010e844a15 (anonymous namespace)::ExpandISelPseudos::runOnMachineFunction(llvm::MachineFunction&) + 245
14 llc 0x000000010e9c7491 llvm::MachineFunctionPass::runOnFunction(llvm::Function&) + 449
15 llc 0x000000010eefd4df llvm::FPPassManager::runOnFunction(llvm::Function&) + 399
16 llc 0x000000010eefd7f5 llvm::FPPassManager::runOnModule(llvm::Module&) + 117
17 llc 0x000000010eefe4df (anonymous namespace)::MPPassManager::runOnModule(llvm::Module&) + 1967
18 llc 0x000000010eefdab6 llvm::legacy::PassManagerImpl::run(llvm::Module&) + 342
19 llc 0x000000010eeff0d1 llvm::legacy::PassManager::run(llvm::Module&) + 33
20 llc 0x000000010ddfaa8a compileModule(char**, llvm::LLVMContext&) + 23146
21 llc 0x000000010ddf4c7e main + 2910
22 libdyld.dylib 0x00007fff946f75ad start + 1
Stack dump:
0. Program arguments: llc -O2 -disable-machine-cse -relocation-model=pic /var/folders/4k/zvw3b3px7851gfzyk28m84cm0000gn/T/ghc27954_0/ghc_3.bc -o /var/folders/4k/zvw3b3px7851gfzyk28m84cm0000gn/T/ghc27954_0/ghc_4.lm_s --enable-tbaa=true -mattr=+sse2
1. Running pass 'Function Pass Manager' on module '/var/folders/4k/zvw3b3px7851gfzyk28m84cm0000gn/T/ghc27954_0/ghc_3.bc'.
2. Running pass 'Expand ISel Pseudo-instructions' on function '@"r6Gi_info$def"'
Originally found in imaginary/integrate, but this invalid RA isn't hit by that particular program. It's probably the cause of other invalid-instruction bugs, though:
# AFTER LLC
LBB0_5: ## %c6sZ
## in Loop: Header=BB0_4 Depth=1
movq $24, 904(%r13)
movq %rax, %r12
leaq L7133701809754890517_1(%rip), %rax
movq %rax, (%rbp)
jmp _stg_gc_pp ## TAILCALL
LBB0_6: ## in Loop: Header=BB0_4 Depth=1
L7133701809754890517_1:
.p2align 4, 0x90
LBB0_4: ## =>This Inner Loop Header: Depth=1
movq %r12, %rax
addq $24, %rax
cmpq %rax, 856(%r13)
jb LBB0_5
# AFTER MANGLER
LBB0_5: ## %c6sZ
## in Loop: Header=BB0_4 Depth=1
movq $24, 904(%r13)
movq %rax, %r12
leaq L7133701809754890517_1(%rip), %rax
movq %rax, (%rbp)
jmp _stg_gc_pp ## TAILCALL
LBB0_6: ## in Loop: Header=BB0_4 Depth=1
.p2align 4, 0x90
.quad 0
.quad 30
L7133701809754890517_1:
.p2align 4, 0x90
LBB0_4: ## =>This Inner Loop Header: Depth=1
movq %r12, %rax
addq $24, %rax
cmpq %rax, 856(%r13)
jb LBB0_5
# AFTER DUMPING EXECUTABLE
0000000100001781 movq $0x18, %cs:0x388(%r13)
000000010000178d movq %rax, %r12
0000000100001790 leaq 0x19(%rip), %rax
0000000100001797 movq %rax, (%rbp)
000000010000179b jmp 0x1000d8538
00000001000017a0 addb %al, (%rax)
00000001000017a2 addb %al, (%rax)
00000001000017a4 addb %al, (%rax)
00000001000017a6 addb %al, (%rax)
00000001000017a8 .byte 0x1e #bad opcode
00000001000017a9 addb %al, (%rax)
00000001000017ab addb %al, (%rax)
00000001000017ad addb %al, (%rax)
00000001000017af addb %cl, -0x20(%rcx,%rcx,4)
00000001000017b3 addq $0x18, %rax
00000001000017b7 cmpq %rax, 0x358(%r13)
00000001000017be jb 0x100001782
_stg_gc_pp:
00000001000d8538 leaq 0x29(%rip), %rax
# WHEN TRYING TO FIND CORRECT RA:
# NOTE: the assembly above is missing the MOV because
# the disassembler doesn't handle the TNTC info properly.
#
# movq has encoding: REX.W + 8B /r
# REX prefixes are 1 byte, and the /r is apparently 1 byte:
(lldb) di -s 0x1000017a9 -c 5
integrate`Main_ints_info:
0x1000017a9 <+145>: addb %al, (%rax)
0x1000017ab <+147>: addb %al, (%rax)
0x1000017ad <+149>: addb %al, (%rax)
0x1000017af <+151>: addb %cl, -0x20(%rcx,%rcx,4)
0x1000017b3 <+155>: addq $0x18, %rax
(lldb) di -s 0x1000017b3 -c 5
integrate`Main_ints_info:
0x1000017b3 <+155>: addq $0x18, %rax
0x1000017b7 <+159>: cmpq %rax, 0x358(%r13)
0x1000017be <+166>: jb 0x100001782 ; <+106>
0x1000017c0 <+168>: leaq 0xb58f1(%rip), %rcx ; ghczmprim_GHCziTypes_ZC_con_info
0x1000017c7 <+175>: movq %rcx, 0x8(%r12)
(lldb) di -s 0x1000017AF -c 5
integrate`Main_ints_info:
0x1000017af <+151>: addb %cl, -0x20(%rcx,%rcx,4)
0x1000017b3 <+155>: addq $0x18, %rax
0x1000017b7 <+159>: cmpq %rax, 0x358(%r13)
0x1000017be <+166>: jb 0x100001782 ; <+106>
0x1000017c0 <+168>: leaq 0xb58f1(%rip), %rcx ; ghczmprim_GHCziTypes_ZC_con_info
(lldb) di -s 0x1000017AE -c 5
integrate`Main_ints_info:
0x1000017ae <+150>: addb %al, (%rax)
0x1000017b0 <+152>: movq %r12, %rax
0x1000017b3 <+155>: addq $0x18, %rax
0x1000017b7 <+159>: cmpq %rax, 0x358(%r13)
0x1000017be <+166>: jb 0x100001782 ; <+106>
(lldb) di -s 0x1000017AD -c 5
integrate`Main_ints_info:
0x1000017ad <+149>: addb %al, (%rax)
0x1000017af <+151>: addb %cl, -0x20(%rcx,%rcx,4)
0x1000017b3 <+155>: addq $0x18, %rax
0x1000017b7 <+159>: cmpq %rax, 0x358(%r13)
0x1000017be <+166>: jb 0x100001782 ; <+106>
(lldb) di -s 0x1000017b0 -c 5
integrate`Main_ints_info:
0x1000017b0 <+152>: movq %r12, %rax
0x1000017b3 <+155>: addq $0x18, %rax
0x1000017b7 <+159>: cmpq %rax, 0x358(%r13)
0x1000017be <+166>: jb 0x100001782 ; <+106>
0x1000017c0 <+168>: leaq 0xb58f1(%rip), %rcx ; ghczmprim_GHCziTypes_ZC_con_info
(lldb)
# 0x1000017b0 is the proper return address, not 0x1000017a9
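The length arithmetic can be checked directly. Assuming the standard three-byte encoding of movq %r12, %rax (REX.W prefix, opcode, ModRM), the addq at 0x1000017b3 starts exactly one mov-length after 0x1000017b0, which is why that is the proper return address:

```python
# movq %r12, %rax = REX.W (1 byte) + opcode (1 byte) + ModRM (1 byte)
MOV_LEN = 3
ra = 0x1000017b0          # candidate return address (start of the movq)
addq_addr = 0x1000017b3   # next instruction per the lldb disassembly

# The candidate RA is a valid instruction boundary exactly when the
# following addq begins one mov-length later.
assert ra + MOV_LEN == addq_addr
```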
A great example of this seems to be the @llvm.addressofreturnaddress intrinsic. Here are some places you can start digging around:
➤ ag LowerADDROFRETURNADDR
lib/Target/X86/X86ISelLowering.h
1173: SDValue LowerADDROFRETURNADDR(SDValue Op, SelectionDAG &DAG) const;
lib/Target/X86/X86ISelLowering.cpp
20612:SDValue X86TargetLowering::LowerADDROFRETURNADDR(SDValue Op,
23827: case ISD::ADDROFRETURNADDR: return LowerADDROFRETURNADDR(Op, DAG);
kavon@cronus:~/m/l/src|master
➤ ag ADDROFRETURNADDR
include/llvm/CodeGen/ISDOpcodes.h
73: FRAMEADDR, RETURNADDR, ADDROFRETURNADDR,
lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
1015: case ISD::ADDROFRETURNADDR:
lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp
103: case ISD::ADDROFRETURNADDR: return "ADDROFRETURNADDR";
lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
4800: setValue(&I, DAG.getNode(ISD::ADDROFRETURNADDR, sdl,
lib/Target/X86/X86ISelLowering.h
1173: SDValue LowerADDROFRETURNADDR(SDValue Op, SelectionDAG &DAG) const;
lib/Target/X86/X86ISelLowering.cpp
20612:SDValue X86TargetLowering::LowerADDROFRETURNADDR(SDValue Op,
23827: case ISD::ADDROFRETURNADDR: return LowerADDROFRETURNADDR(Op, DAG);
Example from GHC's testsuite/tests/typecheck/should_compile/T7408.hs
%ln1tv = getelementptr inbounds i64, i64* %ln1sP, i32 107
%ln1tT = call ghccc @llvm.experimental.cpscall(
{ i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64 } (i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64)* %ln1tM, i64 7133701809754871206, i32 0, i16 1,
i64* %ln1sP, .. ETC ..) ; <<<
Instcombine turns the base of the GEP, %ln1sP, into an i8* (with a new name), which no longer matches the callee's type when we check it during verification.
%ln1tv = getelementptr inbounds i8, i8* %ln1sM, i64 856
; ...
; ...
i8* %ln1sM, .. ETC ..) ; <<< callee expected i64*
Instead of teaching instcombine to update the callee, it's easier to change the kind of equality applied between parameters and arguments, allowing any argument that can be losslessly bitcast to the parameter type.
If some calling convention actually relies on differences in pointee type, this suggested fix will not work; AFAIK pointers will become opaque in LLVM soon anyway.
At the moment, cpscall is only supported on x86 anyway, so modify the following:
void TargetPassConfig::addISelPrepare() {
addPreISel();
addPass(createCPSCallPrepPass());
// ...
}
so that we add the pass in the target specific PreISel function, as discussed in #11
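Concretely, the move might look like the sketch below; X86PassConfig::addPreISel is the existing x86 override point in X86TargetMachine.cpp, but the exact placement within it is an assumption:

```cpp
// Sketch: in X86TargetMachine.cpp, add the pass only for x86,
// rather than unconditionally in TargetPassConfig::addISelPrepare().
bool X86PassConfig::addPreISel() {
  addPass(createCPSCallPrepPass());  // cpscall is x86-only for now
  // ... existing preISel passes ...
  return true;
}
```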
Check out the code (at the very bottom of this post) in imaginary/integrate. Part of the main loop in the calculation makes many non-tail calls, all of which return and perform a heap test. The only difference between the blocks below is the block branched to after returning from the GC.
An idea: identify the redundancy and replace these blocks with something like this:
%addr = blockaddress(@foo, %block1)
br label %commonBlock
commonBlock:
; call and return from GC
indirectbr i8* %addr, [label %block1, label %block2] ; ...
Main challenges:
Potential Solution:
One nice way of designing it might be to say that the intrinsic takes a block address (label) as an argument. The intrinsic then returns a token, from which the return value and return target (label) are extracted, and we perform an indirectbr on the return target. Since an indirectbr must list all possible targets, if the target list contains only one label we can simply turn it into a standard unconditional br; otherwise, we can perform the combining optimization.
I think the main point is that we can solve the problem in commonBlock in the example above, where %addr is live across the call, by simply saying that we can "pass" the block address through the intrinsic via the token.
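In IR, that design might look like the following; every intrinsic name here is hypothetical:

```llvm
; Hypothetical shape of the token-based design (all intrinsic names
; are invented for illustration):
%tok = call token (...) @llvm.experimental.cpscall.tok(
           {i64, i64} (i64, i64)* %fnPtr, i64 %arg,
           i8* blockaddress(@foo, %block1))
%ret = call {i64, i64} @llvm.cps.results(token %tok)
%tgt = call i8* @llvm.cps.rettarget(token %tok)
; a single-target indirectbr degrades to a plain br;
; otherwise the combining optimization applies.
indirectbr i8* %tgt, [label %block1, label %block2]
```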
The ugly code below, followed by its output:
c6AZ: ; preds = %c6zU
%ln6LF = getelementptr inbounds i64, i64* %Base_Var.7, i64 113
store i64 16, i64* %ln6LF, align 8, !tbaa !4
%ln6LN = call ghccc { i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double, float, double, float, double, float, double, float, double } ({ i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double, float, double, float, double, float, double, float, double } (i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64)*, i64, i32, i16, ...) @llvm.experimental.cpscall.sl_p0i64p0i64p0i64i64i64i64i64i64i64i64f32f64f32f64f32f64f32f64f32f64f32f64s.p0f_sl_p0i64p0i64p0i64i64i64i64i64i64i64i64f32f64f32f64f32f64f32f64f32f64f32f64sp0i64p0i64p0i64i64i64i64i64i64i64i64f({ i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double, float, double, float, double, float, double, float, double } (i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64)* bitcast (i8* @stg_gc_unpt_r1 to { i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double, float, double, float, double, float, double, float, double } (i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64)*), i64 7133701809754890954, i32 0, i16 1, i64* %Base_Var.7, i64* %Sp_Var.7, i64* %ln6JH, i64 %R1_Var.7, i64 undef, i64 undef, i64 undef, i64 undef, i64 undef, i64 %SpLim_Var.7) #2
%ln6LO = extractvalue { i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double, float, double, float, double, float, double, float, double } %ln6LN, 0
%ln6LP = extractvalue { i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double, float, double, float, double, float, double, float, double } %ln6LN, 1
%ln6LQ = extractvalue { i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double, float, double, float, double, float, double, float, double } %ln6LN, 2
%ln6LR = extractvalue { i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double, float, double, float, double, float, double, float, double } %ln6LN, 3
%ln6LS = extractvalue { i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double, float, double, float, double, float, double, float, double } %ln6LN, 9
br label %c6zU
c6AW: ; preds = %c6zM
%ln6LU = getelementptr inbounds i64, i64* %Base_Var.6, i64 113
store i64 16, i64* %ln6LU, align 8, !tbaa !4
%ln6M2 = call ghccc { i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double, float, double, float, double, float, double, float, double } ({ i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double, float, double, float, double, float, double, float, double } (i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64)*, i64, i32, i16, ...) @llvm.experimental.cpscall.sl_p0i64p0i64p0i64i64i64i64i64i64i64i64f32f64f32f64f32f64f32f64f32f64f32f64s.p0f_sl_p0i64p0i64p0i64i64i64i64i64i64i64i64f32f64f32f64f32f64f32f64f32f64f32f64sp0i64p0i64p0i64i64i64i64i64i64i64i64f({ i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double, float, double, float, double, float, double, float, double } (i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64)* bitcast (i8* @stg_gc_unpt_r1 to { i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double, float, double, float, double, float, double, float, double } (i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64)*), i64 7133701809754890946, i32 0, i16 1, i64* %Base_Var.6, i64* %Sp_Var.6, i64* %ln6IH, i64 %R1_Var.6, i64 undef, i64 undef, i64 undef, i64 undef, i64 undef, i64 %SpLim_Var.6) #2
%ln6M3 = extractvalue { i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double, float, double, float, double, float, double, float, double } %ln6M2, 0
%ln6M4 = extractvalue { i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double, float, double, float, double, float, double, float, double } %ln6M2, 1
%ln6M5 = extractvalue { i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double, float, double, float, double, float, double, float, double } %ln6M2, 2
%ln6M6 = extractvalue { i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double, float, double, float, double, float, double, float, double } %ln6M2, 3
%ln6M7 = extractvalue { i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double, float, double, float, double, float, double, float, double } %ln6M2, 9
br label %c6zM
c6AT: ; preds = %c6zB
%ln6M9 = getelementptr inbounds i64, i64* %Base_Var.5, i64 113
store i64 16, i64* %ln6M9, align 8, !tbaa !4
%ln6Mh = call ghccc { i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double, float, double, float, double, float, double, float, double } ({ i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double, float, double, float, double, float, double, float, double } (i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64)*, i64, i32, i16, ...) @llvm.experimental.cpscall.sl_p0i64p0i64p0i64i64i64i64i64i64i64i64f32f64f32f64f32f64f32f64f32f64f32f64s.p0f_sl_p0i64p0i64p0i64i64i64i64i64i64i64i64f32f64f32f64f32f64f32f64f32f64f32f64sp0i64p0i64p0i64i64i64i64i64i64i64i64f({ i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double, float, double, float, double, float, double, float, double } (i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64)* bitcast (i8* @stg_gc_unpt_r1 to { i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double, float, double, float, double, float, double, float, double } (i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64)*), i64 7133701809754890935, i32 0, i16 1, i64* %Base_Var.5, i64* %Sp_Var.5, i64* %ln6HE, i64 %R1_Var.5, i64 undef, i64 undef, i64 undef, i64 undef, i64 undef, i64 %SpLim_Var.5) #2
%ln6Mi = extractvalue { i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double, float, double, float, double, float, double, float, double } %ln6Mh, 0
%ln6Mj = extractvalue { i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double, float, double, float, double, float, double, float, double } %ln6Mh, 1
%ln6Mk = extractvalue { i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double, float, double, float, double, float, double, float, double } %ln6Mh, 2
%ln6Ml = extractvalue { i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double, float, double, float, double, float, double, float, double } %ln6Mh, 3
%ln6Mm = extractvalue { i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double, float, double, float, double, float, double, float, double } %ln6Mh, 9
br label %c6zB
c6AQ: ; preds = %c6zq
%ln6Mo = getelementptr inbounds i64, i64* %Base_Var.4, i64 113
store i64 16, i64* %ln6Mo, align 8, !tbaa !4
%ln6Mw = call ghccc { i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double, float, double, float, double, float, double, float, double } ({ i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double, float, double, float, double, float, double, float, double } (i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64)*, i64, i32, i16, ...) @llvm.experimental.cpscall.sl_p0i64p0i64p0i64i64i64i64i64i64i64i64f32f64f32f64f32f64f32f64f32f64f32f64s.p0f_sl_p0i64p0i64p0i64i64i64i64i64i64i64i64f32f64f32f64f32f64f32f64f32f64f32f64sp0i64p0i64p0i64i64i64i64i64i64i64i64f({ i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double, float, double, float, double, float, double, float, double } (i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64)* bitcast (i8* @stg_gc_unpt_r1 to { i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double, float, double, float, double, float, double, float, double } (i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64)*), i64 7133701809754890924, i32 0, i16 1, i64* %Base_Var.4, i64* %Sp_Var.4, i64* %ln6GD, i64 %R1_Var.4, i64 undef, i64 undef, i64 undef, i64 undef, i64 undef, i64 %SpLim_Var.4) #2
%ln6Mx = extractvalue { i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double, float, double, float, double, float, double, float, double } %ln6Mw, 0
%ln6My = extractvalue { i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double, float, double, float, double, float, double, float, double } %ln6Mw, 1
%ln6Mz = extractvalue { i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double, float, double, float, double, float, double, float, double } %ln6Mw, 2
%ln6MA = extractvalue { i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double, float, double, float, double, float, double, float, double } %ln6Mw, 3
%ln6MB = extractvalue { i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double, float, double, float, double, float, double, float, double } %ln6Mw, 9
br label %c6zq
c6AN: ; preds = %c6zf
%ln6MD = getelementptr inbounds i64, i64* %Base_Var.3, i64 113
store i64 16, i64* %ln6MD, align 8, !tbaa !4
%ln6ML = call ghccc { i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double, float, double, float, double, float, double, float, double } ({ i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double, float, double, float, double, float, double, float, double } (i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64)*, i64, i32, i16, ...) @llvm.experimental.cpscall.sl_p0i64p0i64p0i64i64i64i64i64i64i64i64f32f64f32f64f32f64f32f64f32f64f32f64s.p0f_sl_p0i64p0i64p0i64i64i64i64i64i64i64i64f32f64f32f64f32f64f32f64f32f64f32f64sp0i64p0i64p0i64i64i64i64i64i64i64i64f({ i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double, float, double, float, double, float, double, float, double } (i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64)* bitcast (i8* @stg_gc_unpt_r1 to { i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double, float, double, float, double, float, double, float, double } (i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64)*), i64 7133701809754890913, i32 0, i16 1, i64* %Base_Var.3, i64* %Sp_Var.3, i64* %ln6FG, i64 %R1_Var.3, i64 undef, i64 undef, i64 undef, i64 undef, i64 undef, i64 %SpLim_Var.3) #2
%ln6MM = extractvalue { i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double, float, double, float, double, float, double, float, double } %ln6ML, 0
%ln6MN = extractvalue { i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double, float, double, float, double, float, double, float, double } %ln6ML, 1
%ln6MO = extractvalue { i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double, float, double, float, double, float, double, float, double } %ln6ML, 2
%ln6MP = extractvalue { i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double, float, double, float, double, float, double, float, double } %ln6ML, 3
%ln6MQ = extractvalue { i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double, float, double, float, double, float, double, float, double } %ln6ML, 9
br label %c6zf
c6AK: ; preds = %c6z4
%ln6MS = getelementptr inbounds i64, i64* %Base_Var.2, i64 113
store i64 16, i64* %ln6MS, align 8, !tbaa !4
%ln6N0 = call ghccc { i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double, float, double, float, double, float, double, float, double } ({ i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double, float, double, float, double, float, double, float, double } (i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64)*, i64, i32, i16, ...) @llvm.experimental.cpscall.sl_p0i64p0i64p0i64i64i64i64i64i64i64i64f32f64f32f64f32f64f32f64f32f64f32f64s.p0f_sl_p0i64p0i64p0i64i64i64i64i64i64i64i64f32f64f32f64f32f64f32f64f32f64f32f64sp0i64p0i64p0i64i64i64i64i64i64i64i64f({ i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double, float, double, float, double, float, double, float, double } (i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64)* bitcast (i8* @stg_gc_unpt_r1 to { i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double, float, double, float, double, float, double, float, double } (i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64)*), i64 7133701809754890902, i32 0, i16 1, i64* %Base_Var.2, i64* %Sp_Var.2, i64* %ln6EF, i64 %R1_Var.2, i64 undef, i64 undef, i64 undef, i64 undef, i64 undef, i64 %SpLim_Var.2) #2
%ln6N1 = extractvalue { i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double, float, double, float, double, float, double, float, double } %ln6N0, 0
%ln6N2 = extractvalue { i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double, float, double, float, double, float, double, float, double } %ln6N0, 1
%ln6N3 = extractvalue { i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double, float, double, float, double, float, double, float, double } %ln6N0, 2
%ln6N4 = extractvalue { i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double, float, double, float, double, float, double, float, double } %ln6N0, 3
%ln6N5 = extractvalue { i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double, float, double, float, double, float, double, float, double } %ln6N0, 9
br label %c6z4
c6AH: ; preds = %c6yT
%ln6N7 = getelementptr inbounds i64, i64* %Base_Var.1, i64 113
store i64 16, i64* %ln6N7, align 8, !tbaa !4
%ln6Nf = call ghccc { i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double, float, double, float, double, float, double, float, double } ({ i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double, float, double, float, double, float, double, float, double } (i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64)*, i64, i32, i16, ...) @llvm.experimental.cpscall.sl_p0i64p0i64p0i64i64i64i64i64i64i64i64f32f64f32f64f32f64f32f64f32f64f32f64s.p0f_sl_p0i64p0i64p0i64i64i64i64i64i64i64i64f32f64f32f64f32f64f32f64f32f64f32f64sp0i64p0i64p0i64i64i64i64i64i64i64i64f({ i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double, float, double, float, double, float, double, float, double } (i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64)* bitcast (i8* @stg_gc_unpt_r1 to { i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double, float, double, float, double, float, double, float, double } (i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64)*), i64 7133701809754890891, i32 0, i16 1, i64* %Base_Var.1, i64* %Sp_Var.1, i64* %ln6DC, i64 %R1_Var.1, i64 undef, i64 undef, i64 undef, i64 undef, i64 undef, i64 %SpLim_Var.1) #2
%ln6Ng = extractvalue { i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double, float, double, float, double, float, double, float, double } %ln6Nf, 0
%ln6Nh = extractvalue { i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double, float, double, float, double, float, double, float, double } %ln6Nf, 1
%ln6Ni = extractvalue { i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double, float, double, float, double, float, double, float, double } %ln6Nf, 2
%ln6Nj = extractvalue { i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double, float, double, float, double, float, double, float, double } %ln6Nf, 3
%ln6Nk = extractvalue { i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double, float, double, float, double, float, double, float, double } %ln6Nf, 9
br label %c6yT
c6AE: ; preds = %c6yE
%ln6Nm = getelementptr inbounds i64, i64* %Base_Var.0, i64 113
store i64 16, i64* %ln6Nm, align 8, !tbaa !4
%ln6Nu = call ghccc { i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double, float, double, float, double, float, double, float, double } ({ i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double, float, double, float, double, float, double, float, double } (i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64)*, i64, i32, i16, ...) @llvm.experimental.cpscall.sl_p0i64p0i64p0i64i64i64i64i64i64i64i64f32f64f32f64f32f64f32f64f32f64f32f64s.p0f_sl_p0i64p0i64p0i64i64i64i64i64i64i64i64f32f64f32f64f32f64f32f64f32f64f32f64sp0i64p0i64p0i64i64i64i64i64i64i64i64f({ i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double, float, double, float, double, float, double, float, double } (i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64)* bitcast (i8* @stg_gc_unpt_r1 to { i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double, float, double, float, double, float, double, float, double } (i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64)*), i64 7133701809754890876, i32 0, i16 1, i64* %Base_Var.0, i64* %Sp_Var.0, i64* %ln6Ct, i64 %R1_Var.0, i64 undef, i64 undef, i64 undef, i64 undef, i64 undef, i64 %SpLim_Var.0) #2
%ln6Nv = extractvalue { i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double, float, double, float, double, float, double, float, double } %ln6Nu, 0
%ln6Nw = extractvalue { i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double, float, double, float, double, float, double, float, double } %ln6Nu, 1
%ln6Nx = extractvalue { i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double, float, double, float, double, float, double, float, double } %ln6Nu, 2
%ln6Ny = extractvalue { i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double, float, double, float, double, float, double, float, double } %ln6Nu, 3
%ln6Nz = extractvalue { i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64, float, double, float, double, float, double, float, double, float, double, float, double } %ln6Nu, 9
br label %c6yE
.p2align 4, 0x90
LBB4_29: ## %c6AZ
## in Loop: Header=BB4_26 Depth=1
movq $16, 904(%r13)
movq %rax, %r12
leaq L7133701809754890954_1(%rip), %rax
movq %rax, (%rbp)
jmp _stg_gc_unpt_r1 ## TAILCALL
# ...
.p2align 4, 0x90
LBB4_31: ## %c6AW
## in Loop: Header=BB4_23 Depth=1
movq $16, 904(%r13)
movq %rax, %r12
leaq L7133701809754890946_1(%rip), %rax
movq %rax, (%rbp)
jmp _stg_gc_unpt_r1 ## TAILCALL
# ...
.p2align 4, 0x90
LBB4_33: ## %c6AT
## in Loop: Header=BB4_20 Depth=1
movq $16, 904(%r13)
movq %rax, %r12
leaq L7133701809754890935_1(%rip), %rax
movq %rax, (%rbp)
jmp _stg_gc_unpt_r1 ## TAILCALL
"The register %R14 needs to be live in to BB#1, but is missing from the live-in list.
LLVM ERROR: Invalid global physical register"
This error occurs when compiling x2n1, and is raised from LiveRangeCalc::findReachingDefs on the following code.
176B BB#1: derived from LLVM BB %c71x
Predecessors according to CFG: BB#0
192B ADJCALLSTACKDOWN64 0, 0, 0, %RSP<imp-def,dead>, %EFLAGS<imp-def,dead>, %RSP<imp-use>
208B %XMM0<def> = COPY %vreg252; FR64:%vreg252
224B CALL64pcrel32 <ga:@rintDouble>, <regmask %BH %BL %BP %BPL %BX %EBP %EBX %RBP %RBX %R12 %R13 %R14 %R15 %R12B %R13B %R14B %R15B %R12D %R13D %R14D %R15D %R12W %R13W %R14W %R15W>, %RSP<imp-use>, %XMM0<imp-use,kill>, %RSP<imp-def>, %XMM0<imp-def>
240B ADJCALLSTACKUP64 0, 0, %RSP<imp-def,dead>, %EFLAGS<imp-def,dead>, %RSP<imp-use>
256B %vreg255<def> = COPY %XMM0<kill>; FR64:%vreg255
288B %vreg256<def,tied1> = ADD64ri8 %vreg256<tied0>, -8, %EFLAGS<imp-def,dead>; GR64:%vreg256
304B ADJCALLSTACKDOWN64 0, 0, 0, %RSP<imp-def,dead>, %EFLAGS<imp-def,dead>, %RSP<imp-use>
320B %R13<def> = COPY %vreg242; GR64:%vreg242
336B %RBP<def> = COPY %vreg256; GR64:%vreg256
352B %R12<def> = COPY %vreg244; GR64:%vreg244
368B %RBX<def> = COPY %vreg245; GR64:%vreg245
384B %R15<def> = COPY %vreg251; GR64:%vreg251
400B %XMM1<def> = COPY %vreg255; FR64:%vreg255
416B ADJCALLSTACKUP64 0, 0, %RSP<imp-def>, %EFLAGS<imp-def,dead>, %RSP<imp-use>
432B %vreg508<def> = LEA64r %RIP, 1, %noreg, <MCSym=L7133701809754892656_0>, %noreg; GR64:%vreg508
448B MOV64mr %RBP, 1, %noreg, 0, %noreg, %vreg508; GR64:%vreg508
464B TCRETURNdi64 <ga:@integerzmgmp_GHCziIntegerziType_decodeDoubleInteger_info>, 0, %RSP<imp-use>, %R13<imp-use>, %RBP<imp-use>, %R12<imp-use>, %RBX<imp-use>, %R14<imp-use,undef>, %RSI<imp-use,undef>, %RDI<imp-use,undef>, %R8<imp-use,undef>, %R9<imp-use,undef>, %R15<imp-use>, %XMM1<imp-use>
Successors according to CFG: BB#2(?%)
480B BB#2: ADDRESS TAKEN
Live Ins: %R13 %RBP %R12 %RBX %R14 %RSI %RDI %R8 %R9 %R15 %XMM1 %XMM2 %XMM3 %XMM4 %XMM5 %XMM6 %XMM8 %XMM9 %XMM10 %XMM11 %XMM12 %XMM13
Predecessors according to CFG: BB#1
496B %vreg22<def> = COPY %R13; GR64:%vreg22
512B %vreg376<def> = COPY %RBP; GR64:%vreg376
528B %vreg24<def> = COPY %R12; GR64:%vreg24
544B %vreg25<def> = COPY %RBX; GR64:%vreg25
560B %vreg26<def> = COPY %R14<kill>; GR64:%vreg26
576B %vreg27<def> = COPY %R15; GR64:%vreg27
Successors according to CFG: BB#3(0x80000000 / 0x80000000 = 100.00%)
We need to add checks to the verifier pass to ensure that cpscalls are well-formed, among them that the relevant operands are of type i64*, or at least of pointer type.
In the example below, the constant global's address is spilled to and reloaded from the C stack across the call, which should not happen.
; ModuleID = 'spill_nt.ll'
source_filename = "spill_nt.ll"
@ghczmprim_GHCziTypes_True_closure = external global i8
declare ghccc { i64, i64* } @bar(i64, i64*)
; Function Attrs: naked
define ghccc { i64, i64* } @foo(i64 %__x, i64* %__y) #0 {
cazq:
%lnbeg = ptrtoint i8* @ghczmprim_GHCziTypes_True_closure to i64
%lnbeh = add i64 %lnbeg, 2
%retVals = call ghccc { i64, i64* } ({ i64, i64* } (i64, i64*)*, i64, i32, i16, ...)
@llvm.experimental.cpscall.sl_i64p0i64s.p0f_sl_i64p0i64si64p0i64f(
{ i64, i64* } (i64, i64*)* @bar, i64 1337, i32 2, i16 1,
i64 %lnbeh, i64* %__y)
%lnbeg2 = ptrtoint i8* @ghczmprim_GHCziTypes_True_closure to i64
%lnbeh2 = add i64 %lnbeg2, 2
%updated = insertvalue { i64, i64* } %retVals, i64 %lnbeh2, 0
ret { i64, i64* } %updated
}
declare ghccc { i64, i64* }
@llvm.experimental.cpscall.sl_i64p0i64s.p0f_sl_i64p0i64si64p0i64f(
{ i64, i64* } (i64, i64*)*, i64, i32, i16, ...)
attributes #0 = { naked }
.section __TEXT,__text,regular,pure_instructions
.macosx_version_min 10, 11
.globl _foo
.p2align 4, 0x90
_foo: ## @foo
.cfi_startproc
## BB#0: ## %cazq
movq _ghczmprim_GHCziTypes_True_closure@GOTPCREL(%rip), %r13
addq $2, %r13
movq %r13, (%rsp) ## 8-byte Spill
leaq L1337_0(%rip), %rax
movq %rax, 2(%rbp)
jmp _bar ## TAILCALL
LBB0_1:
L1337_0:
movq (%rsp), %r13 ## 8-byte Reload
movq (%r13), %rax
jmpq *%rax ## TAILCALL
.cfi_endproc
.subsections_via_symbols
As seen in #11, CSE of constants across a vanilla call is valid in LLVM.
Consider the following program:
declare void @foo(i64)
@extern_glob = external global i8
define void @bar() {
%x_addr = ptrtoint i8* @extern_glob to i64
%x = add i64 %x_addr, 2
call void @foo(i64 %x)
%y_addr = ptrtoint i8* @extern_glob to i64
%y = add i64 %y_addr, 2
call void @foo(i64 %y)
ret void
}
Currently, LLVM optimizations such as early-cse or gvn always inline the address calculation, like so:
define void @bar() {
call void @foo(i64 add (i64 ptrtoint (i8* @extern_glob to i64), i64 2))
call void @foo(i64 add (i64 ptrtoint (i8* @extern_glob to i64), i64 2))
ret void
}
But it is also valid to produce the following:
define void @bar() {
%x_addr = ptrtoint i8* @extern_glob to i64
%x = add i64 %x_addr, 2
call void @foo(i64 %x)
call void @foo(i64 %x)
ret void
}
If the calls were specifically cpscalls, this kind of CSE introduces a value that is live across the cpscall, which will then be allocated an LLVM stack slot.
Questions:
Should we use cpscallprep to enforce our requirement?
We can do this for all backends that support tail calls by modifying SelectionDAGBuilder::LowerCallTo.
In there, we basically just want to do
if (callingConv == ghccc)
  isTailCall = true;
after the "isInTailPosition" check, because that check will return false. I think we're okay after that, because that's the only place we check whether the call instruction is followed by a ret.
Michal brought up the existing @llvm.invariant
intrinsics for marking a region of memory as immutable over a section of a block.
I think this, or something like it, would help the LLVM optimizer gain information during analysis.
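To make the idea concrete, here is a minimal sketch (not taken from this repo) of how the invariant intrinsics could bracket such a region. It assumes the type-suffixed llvm.invariant.start/end declarations used by recent LLVM releases with typed pointers; the @use callee is a hypothetical stand-in:

```llvm
; Hypothetical example: mark 8 bytes at %p as immutable over a region.
declare {}* @llvm.invariant.start.p0i8(i64, i8* nocapture)
declare void @llvm.invariant.end.p0i8({}*, i64, i8* nocapture)
declare void @use(i64)

define void @example(i8* %p) {
  ; begin the invariant region
  %inv = call {}* @llvm.invariant.start.p0i8(i64 8, i8* %p)
  ; within the region, loads from %p may be CSE'd even across calls,
  ; since the memory is known not to be mutated
  %q = bitcast i8* %p to i64*
  %v = load i64, i64* %q
  call void @use(i64 %v)
  ; end the invariant region
  call void @llvm.invariant.end.p0i8({}* %inv, i64 8, i8* %p)
  ret void
}
```

The token returned by llvm.invariant.start ties the matching llvm.invariant.end to the same region, which is how the optimizer knows the extent over which the memory is immutable.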