llvm-mos's Introduction

LLVM-MOS

LLVM-MOS is an LLVM fork supporting the MOS 65xx series of microprocessors.

For more information about this project, please see llvm-mos.org.

Notice

The llvm-mos project is not officially affiliated with or endorsed by the LLVM Foundation or LLVM project. Our project is a fork of LLVM that provides a new backend/target; our project is based on LLVM, not a part of LLVM. Our use of LLVM or other related trademarks does not imply affiliation or endorsement.

This repository only contains the core llvm-mos utilities, and it doesn't form a complete toolchain. Accordingly, there are no official binary releases for this repository; it's for internal development only.

Please see our SDK to get started.

Building LLVM-MOS

If you wish to modify the compiler, then you'll need to compile LLVM-MOS from source.

Generally, compiling LLVM-MOS follows the same convention as compiling LLVM. First, please review the hardware and software requirements for building LLVM.

Once you meet those requirements, you may use the following formula within your build environment:

Clone the LLVM-MOS repository

On Linux and macOS:

git clone https://github.com/llvm-mos/llvm-mos.git

On Windows:

git clone --config core.autocrlf=false https://github.com/llvm-mos/llvm-mos.git

If you fail to use the --config flag as above, then verification tests will fail on Windows.

Configure the LLVM-MOS project

cd llvm-mos
cmake -C clang/cmake/caches/MOS.cmake [-G <generator>] -S llvm -B build [...]

This configuration command seeds the CMake cache with values from MOS.cmake. Feel free to review and adjust these values for your environment.

Additional options can be added to the cmake command, which override the values provided in MOS.cmake. A handful are listed below. For a complete list of options, see Building LLVM with CMake.

  • -G <generator> --- Lets you choose the CMake generator for your build environment. CMake will try to automatically detect your build tools and use them; however, it's recommended to install Ninja and pass Ninja as the argument to the -G option.

  • -DLLVM_ENABLE_PROJECTS=... --- semicolon-separated list of the LLVM sub-projects you'd like to additionally build. Can include any of: clang, clang-tools-extra, lldb, or lld.

  • -DCMAKE_INSTALL_PREFIX=directory --- Specify for directory the full path name of where you want the LLVM tools and libraries to be installed (default /usr/local).

  • -DCMAKE_BUILD_TYPE=type --- Valid options for type are Debug, Release, RelWithDebInfo, and MinSizeRel. Default is MinSizeRel, if you are using the MOS.cmake cache file.

  • -DLLVM_ENABLE_ASSERTIONS=On --- Compile with assertion checks enabled (default is Yes for Debug builds, No for all other build types).

Build the LLVM-MOS project

cmake --build build [-- [options] <target>]

The default target will build all of LLVM. The check-all target will run the regression tests. The distribution target will build a collection of all the LLVM-MOS tools, suitable for redistribution.

CMake will generate targets for each tool and library, and most LLVM sub-projects generate their own check-<project> target.

Running a serial build will be slow. To improve speed, try running a parallel build. That's done by default in Ninja; for make, use the option -j NNN, where NNN is the number of parallel jobs, e.g. the number of CPUs you have.

Help us out

We need your help! Please review the issue tracker, please review the current state of the code, and jump in and help us with pull requests for bug fixes.

All LLVM-MOS code should observe the LLVM coding standards. clang-format and clang-tidy are aids for this; this repo contains appropriate configuration files for them.

Code should be appropriately documented and well tested. We're not quite as picky as the upstream LLVM project, but a compiler is too complex a project to thrive without a high bar for code quality.

You can submit issues via the issue tracker. Please note, we don't have the bandwidth yet to handle "why dosent my pogrem compil" type requests; it helps to do at least some legwork to figure out what's going on first. Small reproducers are tremendously helpful, and for more finicky issues, they're essentially required.

Additionally, the current state of our documentation at https://llvm-mos.org can always use improvements and clarifications.

llvm-mos's Issues

Data type narrowing and LSR

Due to memory and computational constraints inherent to the 6502 architecture, it is important to keep data types as narrow as possible. llvm-mos should be able to detect the maximum range of a variable and narrow it down whenever possible. Currently, if the programmer uses wider types than necessary, it can lead to unnecessary complexity in the output program.

The following code shows this issue:

char gStuff[64];

void foo(char x) {
    for (char i = 0; i < 64; i++) {
        gStuff[i] = i;
    }
}

When the program above is compiled in -O2 mode, the output is imperfect but fairly tight:

foo:                                    ; @foo
; %bb.0:                                ; %entry
	ldx	#0
LBB0_1:                                 ; %for.body
                                        ; =>This Inner Loop Header: Depth=1
	txa
	sta	gStuff,x
	inx
	cpx	#64
	bne	LBB0_1
; %bb.2:                                ; %for.cond.cleanup
	rts
.Lfunc_end0:

However, changing the type of i from char to int results in a drastic increase in complexity:

foo:                                    ; @foo
; %bb.0:                                ; %entry
	ldx	#0
	lda	#0
	ldy	#0
	sty	__foo_sstk                      ; 1-byte Folded Spill
LBB0_1:                                 ; %for.body
                                        ; =>This Inner Loop Header: Depth=1
	sta	gStuff,x
	clv
	ldy	__foo_sstk                      ; 1-byte Folded Reload
	sty	mos8(__rc3)
	cpy	#0
	bcs	LBB0_3
; %bb.2:                                ; %for.body
                                        ;   in Loop: Header=BB0_1 Depth=1
	bit	__set_v
LBB0_3:                                 ; %for.body
                                        ;   in Loop: Header=BB0_1 Depth=1
	sta	mos8(__rc2)
	cmp	#63
	ldy	#0
	bcs	LBB0_5
; %bb.4:                                ; %for.body
                                        ;   in Loop: Header=BB0_1 Depth=1
	ldy	#1
LBB0_5:                                 ; %for.body
                                        ;   in Loop: Header=BB0_1 Depth=1
	lda	mos8(__rc3)
	cmp	#0
	beq	LBB0_8
; %bb.6:                                ; %for.body
                                        ;   in Loop: Header=BB0_1 Depth=1
	ldy	#1
	bvs	LBB0_8
; %bb.7:                                ; %for.body
                                        ;   in Loop: Header=BB0_1 Depth=1
	ldy	#0
LBB0_8:                                 ; %for.body
                                        ;   in Loop: Header=BB0_1 Depth=1
	lda	mos8(__rc2)
	clc
	adc	#1
	sta	mos8(__rc2)
	lda	mos8(__rc3)
	adc	#0
	sta	__foo_sstk                      ; 1-byte Folded Spill
	lda	mos8(__rc2)
	inx
                                        ; kill: def $ylsb killed $ylsb def $y
	cpy	#0
	bne	LBB0_1
; %bb.9:                                ; %for.cond.cleanup
	rts

This occurs despite the program performing the same task. Since the variable i is restricted to the range 0 .. 64, the type can be narrowed to an 8-bit integer.

llvm-mos should be able to detect simple cases where a variable can be narrowed, and make it so.
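
As a quick sanity check that narrowing is safe in this case, both loops can be run on a host machine and compared. This is only a sketch: fill_narrow, fill_wide, and loops_agree are hypothetical names, and the narrow counter is unsigned char so the result doesn't depend on whether plain char is signed.

```c
#include <string.h>

#define N 64

/* Narrow counter: every value i takes (0..64) fits in 8 bits. */
void fill_narrow(char *dst) {
    for (unsigned char i = 0; i < N; i++) {
        dst[i] = (char)i;
    }
}

/* Same loop with a wider counter; the stored bytes are identical. */
void fill_wide(char *dst) {
    for (int i = 0; i < N; i++) {
        dst[i] = (char)i;
    }
}

/* Returns 1 when both loops produce the same 64 bytes. */
int loops_agree(void) {
    char a[N], b[N];
    fill_narrow(a);
    fill_wide(b);
    return memcmp(a, b, N) == 0;
}
```

Since the trip count is a compile-time constant below 256, a range analysis only needs the loop bound to prove the counter fits in 8 bits.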

"Unsupported physical register copy" when copying between MOSReg1Class

I'm getting an assert in MOSInstrInfo::copyPhysRegImpl() when codegen attempts to copy an Anyi1 to another.

It's also not immediately clear what the best way is to copy the bit back and forth between C, V, A, X, and Y. I'm not sure whether you intended copies from 1-bit to 8-bit registers to kill the other 7 bits of the register when copying C or V to an 8-bit type.

The mechanism I imagined, for getting data in and out of P, would be PHP/PLA and PHA/PLP pairs. I am confused what your strategy is on i1? Woz stored i1's in the high bit of 8 bit values: https://www.pagetable.com/?p=33
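
To make the two representations above concrete, here's a rough C model of (a) pulling C and V out of a copy of the status register, as a PHP/PLA pair would leave it in A, and (b) the high-bit convention for booleans. The bit positions are the standard 6502 P layout (C is bit 0, V is bit 6). The function names are hypothetical, and nothing here claims to be how llvm-mos lowers these copies.

```c
#include <stdbool.h>
#include <stdint.h>

#define P_CARRY    0x01u /* bit 0 of the status register */
#define P_OVERFLOW 0x40u /* bit 6 of the status register */

/* After PHP/PLA, A holds a copy of P; mask out the flag of interest. */
bool carry_from_p(uint8_t p)    { return (p & P_CARRY) != 0; }
bool overflow_from_p(uint8_t p) { return (p & P_OVERFLOW) != 0; }

/* High-bit convention: the boolean lives in bit 7 of a byte, so loading
   the byte sets N and the value can be tested with BMI/BPL. */
uint8_t bool_to_high_bit(bool b)  { return b ? 0x80u : 0x00u; }
bool high_bit_to_bool(uint8_t v)  { return (v & 0x80u) != 0; }
```

Note that in both models the other 7 bits of the 8-bit value carry no information, which is the "kill the other bits" question raised above.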

The following can be inserted into postrapseudo.mir to make sure the ALSB<->C cases are handled somehow. All the i1 copy types should probably be checked though...

---
name: alsb_c
# CHECK-LABEL: name: alsb_c
body: |
  bb.0.entry:
    liveins: $c
    renamable $alsb = COPY killed renamable $c, implicit-def $a
...
---
name: c_alsb
# CHECK-LABEL: name: c_alsb
body: |
  bb.0.entry:
    liveins: $c
    renamable $c = COPY killed renamable $alsb
...

Implement setjmp/longjmp support

Setjmp and longjmp routines should be written for supported target platforms and included alongside whatever minimal libc we have, before a full libc is ported. These aren't required for a freestanding C implementation, but they're the riskiest part of a full C library, so having them will provide early validation for our approach.
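
For reference, this is the kind of standard C usage that any implementation would need to support. It's a minimal host-side sketch, not MOS-specific code; recover_point, fail_with, and run_guarded are hypothetical names. The setjmp call sits directly in an if condition because the C standard only allows it in a few expression contexts.

```c
#include <setjmp.h>

static jmp_buf recover_point;
static volatile int last_code;   /* written before the jump, read after */

/* Record an error code and jump back to the saved context. */
void fail_with(int code) {
    last_code = code;
    longjmp(recover_point, 1);
}

/* Returns 0 when the work completes normally, or the recorded error
   code when control arrives back here via longjmp. */
int run_guarded(int should_fail) {
    if (setjmp(recover_point) != 0) {
        return last_code;
    }
    if (should_fail) {
        fail_with(7);
    }
    return 0;
}
```

A useful smoke test for a new target is exactly this round trip: the stack pointer(s) and callee-saved state must be the same after the longjmp as at the setjmp.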

Implement and/or stub all libcalls emitted by the legalizer or instruction selector

All libcalls capable of being emitted by the legalizer or instruction selector need to actually point to something in the SDK. Minimal implementations should be provided for each.

Some of these libcalls (e.g., soft float routines) will require a considerable amount of code to implement, and we probably shouldn't write it ourselves. We'd want to incorporate an off-the-shelf solution written in C. Accordingly, this issue should be solved piecemeal, with each libcall going in at some point after enough of the C spec has been implemented to correctly compile the libraries used to implement that libcall.

Write 65xx simulator

In order to test generated code easily, we want to simulate CPUs within the 65xx family.

Although LLVM doesn't currently have any facility for enabling the generation of simulators, the more I look at this problem, the more I think that the simplest and most direct way to write one, is to create a novel tablegen backend for it.

This sounds nastier than I think it will actually be. And, if I'm right, the result will be immediately useful upstream.

I purposely don't intend to design a cycle-accurate emulator, for several reasons. First, no code generated by llvm-mos should be timing sensitive. Second, we specifically need to be able to simulate code running on all the odd 65xx variants out there, and we don't have netlists for, say, the 65816 or the Ricoh processors. Third, supporting additional processors in the simulator would be a small and incremental amount of work.

The simulated target will purposely have a trivial virtual architecture. Code will be loaded at a fixed address ($0200), and simulator code can thunk to the underlying operating system, by reading and writing a memory range at a fixed location ($FFxx for 6502, $FFFFxx for the 65816.)

If I structure this code correctly, then the tablegen backend should be able to generate both a traditional stepped simulator, as well as a qemu-style optimized backend that treats non-branching code as invariant and thus compiles blocks of 65xx code to extremely fast native code. Not that we care about high-speed simulation specifically for the 65xx, but I think that the LLVM community would care a lot about high-speed CPU simulation in llvm. Like... a LOT.

See also Klaus2m5's code for validating emulators. https://github.com/Klaus2m5/6502_65C02_functional_tests
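
To give a feel for how small a stepped, non-cycle-accurate simulator can be, here's a hand-written C sketch (not tablegen output) that loads an image at $0200 and executes just enough opcodes to run the char-indexed loop from the narrowing issue. The Sim struct and function names are hypothetical, only the Z flag is modeled, the $FFxx thunk region is left out, and placing gStuff at $0300 is an arbitrary choice for the demo.

```c
#include <stdint.h>
#include <string.h>

#define LOAD_ADDR 0x0200u   /* fixed load address described above */

typedef struct {
    uint8_t a, x;
    uint8_t zero;           /* Z flag; other flags are not modeled */
    uint16_t pc;
    uint8_t mem[0x10000];
} Sim;

/* Demo image: fills $0300..$033F with 0..63 (gStuff assumed at $0300). */
const uint8_t fill_loop[] = {
    0xA2, 0x00,             /* 0200: ldx #0        */
    0x8A,                   /* 0202: txa           */
    0x9D, 0x00, 0x03,       /* 0203: sta $0300,x   */
    0xE8,                   /* 0206: inx           */
    0xE0, 0x40,             /* 0207: cpx #64       */
    0xD0, 0xF7,             /* 0209: bne $0202     */
    0x60                    /* 020B: rts           */
};

/* Clear the machine, copy the image to LOAD_ADDR, and point PC at it. */
void sim_load(Sim *s, const uint8_t *image, uint16_t len) {
    memset(s, 0, sizeof *s);
    memcpy(&s->mem[LOAD_ADDR], image, len);
    s->pc = LOAD_ADDR;
}

/* Fetch a little-endian 16-bit operand and advance PC past it. */
uint16_t sim_fetch16(Sim *s) {
    uint16_t lo = s->mem[s->pc++];
    uint16_t hi = s->mem[s->pc++];
    return (uint16_t)(lo | (hi << 8));
}

/* Execute one instruction; returns 0 on RTS or an unsupported opcode. */
int sim_step(Sim *s) {
    uint8_t op = s->mem[s->pc++];
    switch (op) {
    case 0xA2:                                  /* LDX #imm */
        s->x = s->mem[s->pc++];
        s->zero = (s->x == 0);
        return 1;
    case 0x8A:                                  /* TXA */
        s->a = s->x;
        s->zero = (s->a == 0);
        return 1;
    case 0x9D: {                                /* STA abs,X */
        uint16_t base = sim_fetch16(s);
        s->mem[(uint16_t)(base + s->x)] = s->a;
        return 1;
    }
    case 0xE8:                                  /* INX */
        s->x++;
        s->zero = (s->x == 0);
        return 1;
    case 0xE0:                                  /* CPX #imm (Z only) */
        s->zero = (s->x == s->mem[s->pc++]);
        return 1;
    case 0xD0: {                                /* BNE rel */
        int8_t off = (int8_t)s->mem[s->pc++];
        if (!s->zero) s->pc = (uint16_t)(s->pc + off);
        return 1;
    }
    case 0x60:                                  /* RTS ends the program */
    default:
        return 0;
    }
}

/* Step until the program stops or max_steps is hit; returns steps run. */
long sim_run(Sim *s, long max_steps) {
    long n = 0;
    while (n < max_steps && sim_step(s)) n++;
    return n;
}
```

Calling sim_load with fill_loop and then sim_run executes one ldx plus 64 iterations of five instructions. A generated simulator would presumably derive the per-opcode cases from the instruction definitions rather than a hand-written switch.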

Propagate --num-imag-ptr choices to object files

If different sources are compiled with different --num-imag-ptr settings, comedy will ensue.

The linker ought to be able to verify that the number of imaginary pointers is the same for all objects being linked, and it should refuse to run, with a detailed explanation, if this is not so.
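
The check itself is simple once the setting is recorded per object. Here's a sketch in C under the assumption that each object carries its imaginary-pointer count in some metadata; the ObjectInfo struct and function name are hypothetical, and how the value would actually be stored in an object file is not decided here.

```c
#include <stddef.h>

/* Hypothetical per-object record; a real linker would read this from
   object-file metadata whose format is still to be determined. */
typedef struct {
    const char *name;
    unsigned num_imag_ptrs;
} ObjectInfo;

/* Returns the index of the first object whose setting differs from
   object 0, or -1 when all objects agree (or the list is empty). */
int first_imag_ptr_mismatch(const ObjectInfo *objs, size_t count) {
    for (size_t i = 1; i < count; i++) {
        if (objs[i].num_imag_ptrs != objs[0].num_imag_ptrs) {
            return (int)i;
        }
    }
    return -1;
}
```

Returning the offending index lets the caller name both object files and both values in the error message.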

Determine interrupt handling calling convention in C

If a C function is called as an interrupt handler directly from assembly, that function must preserve any caller-saved registers that it might modify. (Callee-saved registers must already be preserved by all C functions, including the interrupt handler.) This creates a tradeoff between the performance of interrupt handlers and the size of the ZP register file. RISC targets with 30 or so registers do actually save all of them on interrupt, but that's not reasonable on the 6502, where such saving would be far too expensive, and there can be up to 256 registers to save.

A traditional way that handwritten ASM deals with this is by assigning nonoverlapping ZP regions to interrupt handlers and the rest of the program. That way, nothing needs to be saved/restored on interrupt. The user would need to set this up themselves via the way they compile and link their binary together; the mechanism for this needs to be determined.

Alternatively, the only other simple way I know of to reduce the interrupt overhead to an acceptable level is to cap the number of available ZP registers to something like 32 or 16. This would not necessarily preclude the use of the rest of the ZP; static stack or memory could be allocated there by optimization passes to be developed later. This isn't quite as performant as the first approach though, since long parameter lists would have to be passed through the stack. In practice though, it's unlikely that LLVM will be able to take advantage of such a large register set, so we may want to do this even if the first approach is also possible.

There's one more known approach, based on the observation that the function in which the interrupt occurred itself has a caller. That caller must function correctly no matter which caller-saved registers its callee, and transitively the interrupt handler, writes to. That means that the only caller-saved registers that actually need to be saved by the handler are those live in the function in which the interrupt occurred.

A zero page register could be reserved to mark the "possibly-live high-water-mark" on entry to each function call. The interrupt handler could then look at this location to see how many zero-page registers it needs to save/restore. The downside is that this variable would need to be modified in every single C function's prologue. Assembly language routines would also need to do this to keep their ZP registers from being clobbered, which is quite onerous.
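
Here's a host-side C model of the high-water-mark idea, just to show the save/restore cost scaling with the mark rather than with the full register file. Everything in it is hypothetical: the 32-register file size, the zp and zp_high_water names, and the handler body, which is modeled as clobbering every register.

```c
#include <stdint.h>
#include <string.h>

#define ZP_REGS 32            /* hypothetical size of the ZP register file */

uint8_t zp[ZP_REGS];          /* stand-in for the zero-page registers */
uint8_t zp_high_water;        /* count of possibly-live registers, set in prologues */

/* Model of an interrupt: save only the registers below the mark, let
   the handler clobber everything, then restore the saved ones.
   Returns how many bytes had to be saved. */
unsigned interrupt_with_partial_save(void) {
    uint8_t saved[ZP_REGS];
    unsigned n = zp_high_water;
    if (n > ZP_REGS) n = ZP_REGS;
    memcpy(saved, zp, n);
    memset(zp, 0xFF, ZP_REGS);    /* handler body: worst-case clobber */
    memcpy(zp, saved, n);
    return n;
}
```

Registers at or above the mark come back clobbered, which is only acceptable because nothing in the interrupted function has them live.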

65816 support

The 65816 has an instruction architecture similar to that of its 8-bit counterpart. Register sizes are wider, and there are more addressing modes.

It seems likely that the 65816's wider registers could speed up a number of llvm-mos operations.

I can think of several strategies for modifying or adding a back end specifically for the 816. We could lie to codegen and claim that the 65816 is actually a 32-bit processor, of which only the first 24 bits are used when rendering machine instructions.

Pseudo-instruction Conflict Rematerialization

Instead of saving and restoring a conflicting physical register around a pseudoinstruction, it may be possible to rematerialize the definition afterwards. Similar logic to that in the register allocator can be used for this purpose.

If the Register Scavenger ends up being used for pseudo-instruction expansion, then this logic should be added to the register scavenger. Hopefully, this would improve scavenging behavior for other backends as well.

Initial test-suite pass

Now that the test-suite is hooked up, a number of examples are currently capable of compiling and running under the simulator. It looks like they broadly don't work, though. We should see what cases are currently capable of compiling, add each of them to an allowlist (or all the others to a denylist), and repair them one by one. This will establish a baseline of working functionality for the compiler.

Modify sim binary format to be size-sensitive

The current simulator format is a 64KiB binary image that makes the test suite's size tracking and comparison useless. We should rework the format so it's proportional to the size of the emitted sections; that way we can track code size gains and losses.

Build end-to-end test suite with Machine Verifier

Once the end-to-end test suite is compiling (but maybe not yet entirely passing), we should figure out how to build it with the MachineVerifier passes enabled. While this will be quite slow, it should help catch mistakes that wouldn't show up otherwise, and this may save time debugging the broken tests.

Floating Point Support

An existing soft float library should be compiled for the platform. Floating point operations should be made to call out to this library. The library should be included in the package of compiler support routines. The link process should be arranged to ensure that they are not included in the binary unless they are actually used.

Audit Stack Pointer Set Order

The order in which the bytes of the stack pointer are manipulated should be carefully audited. Interrupt handlers that call C routines may see intermediate states while SP is being updated, so all such intermediate states must be guaranteed safe.
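
One way to reason about the intermediate states is to compute what an interrupt would observe between the two byte stores. The C sketch below does that under stated assumptions that are mine, not settled project decisions: a 16-bit soft stack pointer that grows downward, a handler that allocates below whatever SP it observes, and memory at or above the higher of the old and new SP treated as live. The function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Value an interrupt would observe between the two byte stores. */
uint16_t sp_intermediate(uint16_t old_sp, uint16_t new_sp, bool high_first) {
    return high_first
        ? (uint16_t)((new_sp & 0xFF00u) | (old_sp & 0x00FFu))
        : (uint16_t)((old_sp & 0xFF00u) | (new_sp & 0x00FFu));
}

/* Assumed safety rule: the intermediate value must not lie above both
   the old and the new SP, or a handler could overwrite live frames. */
bool sp_update_is_safe(uint16_t old_sp, uint16_t new_sp, bool high_first) {
    uint16_t limit = old_sp > new_sp ? old_sp : new_sp;
    return sp_intermediate(old_sp, new_sp, high_first) <= limit;
}
```

Under these assumptions, allocating from $1000 to $0FE0 is safe high byte first (the handler sees $0F00) but not low byte first (it sees $10E0), and the reverse holds when deallocating. An audit would need to confirm whether the real prologue and epilogue sequences match that pattern, and whether the lower bound of the stack also needs checking.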

High-level feature specification and delivery

We're at the point in the project where we can describe high-level features that must be present within the llvm-mos universe. We can sort these conceptually into must-haves and nice-to-haves. This issue is to discuss which features ought to be present in the deliverables, what priority each has in development, and how those features should be provided to the users.

The following is a brainstormed list in vaguely sorted order.

Necessary

  • clang & lld
    • Binary distributions for Windows, Linux, MacOS, etc.
  • per-platform linker scripts
  • per-platform crt0 support
  • example programs demonstrating working code in assembler and C
  • A core set of targets. (apple2e, commodore-disk, nes-ines, atari-xex, ...?)

Important

  • Additional targets (atari2600, bbcmicro, acorn, ohio scientific... ???)
  • at least one libc, presumably pre-compiled either generically or for various targets
  • Outreach to interested developers to influence design decisions early
  • llvm test-suite as part of the continuous build, particularly in the single-file and gcc c torture tests
  • a null emulator target for testing on the host that LLVM is compiled on
  • Documentation and wiki of existing parts and their features
  • Automated tests of compiled code on emulated targets

Nice to have

  • lldb support for MOS targets
  • real-time debugging, either in mame or some other lldb-supporting emulator
  • Packaging in rpm, deb, installers, etc.
  • cross-platform libraries for bitmap graphics and/or peripheral I/O
  • Sponsor support

Once this list is discussed, it will be split into individual issues and prioritized.

Refactoring fixit sweep, isel through AsmPrinter

Now that development is proceeding in a joint codebase, code cleanliness, simplicity, and readability are a risk factor for the project.
A sweep of the codebase should be performed to simplify, refactor, and clean the implementation. It's worth taking minor performance hits for simplicity's sake, so long as those hits do not represent architectural hurdles to undo. These optimizations will be re-done in cost/benefit order after C99 has been reached; they were only in the De-Risk phase to ensure that they were possible, but the code doesn't need to carry their weight while it's being generalized.

Edit:
As a result of this factor, every newly-introduced concept should have an accompanying documentation comment. This includes pseudoinstructions, logical instructions, pseudoregisters, and code generation library routines, among others.

Audit Instruction Selector

For each possible combination of legal machine instructions, the MOS instruction selector needs to be able to emit corresponding machine instructions/pseudos. The set of legal opcodes and types should be carefully audited to ensure that no legal cases escape the instruction selector. This needs to be done after the legalizer audit, since that is likely to introduce new cases that the instruction selector must handle.

Rewrite MOSAsmParser

The MOSAsmParser works, but it could probably work in about a third of the code that is currently present.

Rather than repair it, it should probably be rewritten more simply from scratch, with roughly equivalent functionality.

Get test suite passing at -O0, -O3, and -Oz

After targeting -Os, we need to make sure that the other optimization levels work as well. This should be done after auditing, as it should be a mostly perfunctory check at that point.

"cc65 compatibility"

Whenever I've brought up feature requests for a new compiler online, one of the first requests that comes up is "cc65 compatibility."

On its face, this seems like a sensible request, but the deeper that you think about it, the less sensible this request becomes. ABI compatibility? All the #pragma's? How about all the predefined macros? Read and write cc65 library files? Support cl65 linker scripts?

I'd like to break out "cc65 compatibility," which I think is an abstract notion, into a smaller set of specific and measurable features.

1. ca65 compatibility. More specifically, a version of llvm-mc that's capable of emulating ca65, to the point where it can compile and link all the assembly code from the cc65 sdk. This is not necessarily the same as complete ca65 emulation, whatever that is. In practice, people could use the cc65 compiler to generate assembly code, and then pass their assembly through llvm-mc and our linker, to receive ELF introspection and real-time debugging and other things that we can do.

2. ABI compatibility. More specifically, a version of llvm-mos that observes this calling convention: https://github.com/cc65/wiki/wiki/Parameter-passing-and-calling-conventions . I'm a bit worried about even making this available, because all comparisons of llvm-mos to cc65 that use such a convention will cause llvm-mos to look slow and embarrassing. A great part of llvm-mos's benefits is the control it has over the register allocator, which would all go away if the ABI were compatible. Thoughts?

3. clang-cc65 compatibility. Considering all the command line flags, #pragmas, and constants that cc65 assumes to exist, and all of which are deviations from ANSI C, I'm going to flat out state that we should not bother. Honestly, I'd be more tempted to simply give such users a Python script that simply strips all the cc65-specific #pragma's from their source code, converts all the asm commands to clang form, and just give them a header file in the SDK that sets all the cc65-specific constants. If that.

The end goal, of whatever we call cc65 compatibility, is to get people to leave cc65 alone and to proceed to use a more performant compiler. To that end, I think we might be able to get ahead of the problem, by publishing a few benchmarks of our own, comparing cc65's performance to llvm-mos. Interestingly, implementing item 1 above makes it straightforward to try these benchmarks ourselves.

In the pessimal case, our answer to the question of cc65 compatibility is "if it's C99, we can compile it." In a more optimal case, perhaps I can work on item 1, once some higher priority items are dealt with.

Pseudo-instruction Scavenging

Instead of saving and restoring live registers tightly around conflicting pseudoinstructions, larger regions can be spilled to hopefully keep the register free for other pseudoinstructions that need it. This logic is already implemented in RegisterScavenger; we may be able to reuse it.

"Possibly-Recursive" Optimization Remark

It'll be important for users to be able to detect and diagnose when LLVM considers a routine to be possibly-recursive. An optimization remark should be generated indicating which routines are considered recursive and why.

LLVM's optimization remark framework should be analyzed and a CUJ for using it for this purpose should be formed. Instructions for tuning code for non-recursiveness detection should be added to the project README.
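
As a sketch of what such a remark would report, the following C code flags every function that can reach itself in a call graph, using Warshall's transitive closure on a small adjacency matrix. This is an illustration only: the fixed MAX_FUNCS bound and the function name are hypothetical, and LLVM's real analysis works on its own call graph and must also treat indirect and external calls conservatively.

```c
#include <stdbool.h>

#define MAX_FUNCS 8   /* hypothetical fixed bound for the sketch */

/* calls[i][j] is true when function i may call function j.
   On return, possibly_recursive[i] is true when i can reach itself. */
void find_possibly_recursive(bool calls[MAX_FUNCS][MAX_FUNCS], int n,
                             bool possibly_recursive[MAX_FUNCS]) {
    bool reach[MAX_FUNCS][MAX_FUNCS];
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            reach[i][j] = calls[i][j];
    /* Warshall's algorithm: allow paths through intermediate node k. */
    for (int k = 0; k < n; k++)
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                if (reach[i][k] && reach[k][j])
                    reach[i][j] = true;
    for (int i = 0; i < n; i++)
        possibly_recursive[i] = reach[i][i];
}
```

The "why" part of the remark could name one edge on the cycle, which the closure alone doesn't record; a DFS that keeps its path would be needed for that.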

Support interrupts

Presently, a function statically allocates its stack frame if it can be proven not to recurse. However, even code that does not directly recurse can still end up with multiple invocations active at the same time.

If a function is on the logical call stack when an interrupt occurs, the same function may end up getting called by the interrupt handler. The mechanism for this may not be directly visible to the C compiler: the ASM-language interrupt handler may call an external C symbol that eventually ends up calling the function. Notably, this can occur even if the function is a leaf, and even if its address is never given to a function pointer.

Accordingly, the static stack optimization needs to be reserved for functions that are both provably nonrecursive and not callable by interrupt handlers, directly or transitively. A function being directly callable in an interrupt should be fairly rare, so it shouldn't be too burdensome to require an annotation in this case. For now, we can probably use __attribute__((interrupt)) for this purpose. In that case, we can find all functions that conservatively might be called by such a function, then somehow prevent the static stack from being used in any of them.
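
Finding "all functions that conservatively might be called" by an annotated function is a reachability problem on the call graph. Below is a minimal worklist sketch in C; the adjacency-matrix representation, the MAX_FUNCS bound, and the function name are hypothetical, and a real pass would also have to treat indirect calls and external symbols conservatively.

```c
#include <stdbool.h>

#define MAX_FUNCS 8   /* hypothetical fixed bound for the sketch */

/* calls[f][g] is true when f may call g; is_interrupt[i] marks the
   annotated roots. On return, reachable[i] is true for every function
   that is a root or is callable, directly or transitively, from one. */
void mark_interrupt_reachable(bool calls[MAX_FUNCS][MAX_FUNCS], int n,
                              const bool is_interrupt[MAX_FUNCS],
                              bool reachable[MAX_FUNCS]) {
    int worklist[MAX_FUNCS];
    int top = 0;
    for (int i = 0; i < n; i++) {
        reachable[i] = is_interrupt[i];
        if (reachable[i]) worklist[top++] = i;
    }
    /* Each function is pushed at most once, so the worklist never
       holds more than n entries. */
    while (top > 0) {
        int f = worklist[--top];
        for (int g = 0; g < n; g++) {
            if (calls[f][g] && !reachable[g]) {
                reachable[g] = true;
                worklist[top++] = g;
            }
        }
    }
}
```

Any function marked reachable would then be excluded from the static stack optimization, alongside the possibly-recursive ones.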

PostRA pseudo scheduling

Post-register-allocation pseudo-instructions cannot report their full side-effect profile, since their implementation will wildly differ depending on where the register allocator places their inputs and outputs. Once register allocation occurs, their side effect profiles will be known, and they might be reschedulable to locations where they don't interfere with live registers.

Why it's risky:
Without this, pseudoinstructions may expand to a very large amount of expensive save/restore code.

It's unclear how this rescheduling can be safely performed. The true requirements of a pseudoinstruction expansion would consist of some combination of virtual and physical registers. The virtual registers would need to be assigned a free physical register of the appropriate class, while the physical registers may interfere with other physical registers currently live around the pseudo.

Choose strategy for dealing with 65xx spurious reads and writes

The 6502 has a few quirks that can cause spurious reads and writes to occur. The CMOS edition fixes some, but not all, of these quirks. To avoid unintentionally triggering hardware behaviors, volatile memory accesses need to avoid any instruction that can cause spurious memory accesses. This provides a programmer mechanism to ensure that hardware register access precisely follows the C abstract machine model.

Note: if no platform we plan on supporting has read/write triggered hardware, we can skip this. I'll admit to not having done a thorough literature review of the OS manuals; it shouldn't be too hard to provide this guarantee, so it's likely less work to do this than to verify that we don't have to.

Repair spill hoisting optimization

At the very end of register allocation, LLVM's Greedy register allocator performs an optimization called "spill hoisting." First, redundant spills are eliminated within each basic block, and then spills are hoisted out of frequently-executed basic blocks up into ones executed less frequently.

Unfortunately, this optimization hoists spills by deleting them and re-emitting them, potentially sourcing the value from a completely different register. Unlike other targets, spilling certain registers on the 6502 may require a temporary virtual register, so a spill may go from not needing a virtual register to needing one as it's hoisted. LLVM contains logic to prevent spills that need virtual registers from being hoisted (introduced by X86), but not for this case, since the spill didn't originally have one, but gained one after being hoisted.

Repairing this optimization is nontrivial, so this optimization was disabled entirely for MOS. It's unclear how important this optimization actually is in practice, especially when compared with the other low-hanging fruit available. Accordingly, once we have effective benchmarking established, and once easy and impactful optimizations are exhausted, this should be measured and reexamined. It may be possible to do this piecemeal; it should be easier to reenable redundant spill elimination than spill hoisting.

Implement G_ICMP/G_SELECT

There's a "not yet implemented" assert that gets hit at https://github.com/llvm-mos/llvm-mos/blob/main/llvm/lib/Target/MOS/MOSInstructionSelector.cpp#L220 under a variety of conditions.

Codegen previous to instruction selection is capable of emitting sequences of the following form:

%65:any(s1) = G_ICMP intpred(ult), %99:any(s8), %64:any
%66:any(s1) = G_ICMP intpred(eq), %99:any(s8), %64:any
%67:any(s1) = G_ICMP intpred(ult), %98:any(s8), %63:any
%33:any(s1) = G_SELECT %66:any(s1), %67:any, %65:any

LLVM is trying to pipeline these G_ICMP calls for us, and as a result lowering is not only harder, we're doing some work that we are about to throw away.
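
The four-instruction sequence above looks like the usual split of a 16-bit unsigned less-than into byte comparisons, with %99/%64 as the high bytes and %98/%63 as the low bytes (that reading is an assumption based on the operand pattern). Here it is in C next to a short-circuit form that skips the low-byte compare when the high bytes differ, which is closer to what a hand-written 6502 routine would do. Function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mirrors the select-based form: all three compares are evaluated. */
bool ult16_select(uint8_t a_hi, uint8_t a_lo, uint8_t b_hi, uint8_t b_lo) {
    bool hi_ult = a_hi < b_hi;          /* %65 */
    bool hi_eq  = a_hi == b_hi;         /* %66 */
    bool lo_ult = a_lo < b_lo;          /* %67 */
    return hi_eq ? lo_ult : hi_ult;     /* %33 */
}

/* Short-circuit form: the low bytes are compared only when needed. */
bool ult16_branchy(uint8_t a_hi, uint8_t a_lo, uint8_t b_hi, uint8_t b_lo) {
    if (a_hi != b_hi) return a_hi < b_hi;
    return a_lo < b_lo;
}

/* True when both forms match the native 16-bit comparison for a, b. */
bool ult16_forms_agree(uint16_t a, uint16_t b) {
    uint8_t a_hi = (uint8_t)(a >> 8), a_lo = (uint8_t)(a & 0xFF);
    uint8_t b_hi = (uint8_t)(b >> 8), b_lo = (uint8_t)(b & 0xFF);
    bool expected = a < b;
    return ult16_select(a_hi, a_lo, b_hi, b_lo) == expected &&
           ult16_branchy(a_hi, a_lo, b_hi, b_lo) == expected;
}
```

The select form is what makes one of the compares wasted work on every execution; recognizing the pattern during selection would let the backend emit the branchy shape instead.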

Audit Copy Pseudos

LLVM can emit three types of COPY pseudo instructions:

  1. Copies between two registers in the same register class are considered safe essentially anywhere in the pipeline.
  2. Copies between register classes are reserved for use in the instruction selector.
  3. The register allocator can emit copies between a register in one register class and another register in the "largest legal superclass". This allows the register allocator to widen register classes when splitting live ranges via copy.

We should make sure that all of these types of possible copies are handled in copyPhysReg by auditing the register class definitions, the instruction selector, and the getLargestLegalSuperclass() function, respectively.

Get automatic releases working again

The LLVM cache file was moved from the old SDK into LLVM proper, with the rest of the distribution configurations.

Make sure that GitHub is capable of building a release distribution, on every checkin, on every branch, in any fork of llvm-mos.

Review llvm-mos.org

The llvm-mos.org web site now has a more substantial body of information in it. I'd like to modify the projects' READMEs to refer to various pages on llvm-mos.org.

The current contents of the readme have been migrated to https://llvm-mos.org/wiki/Getting_started . The git readme should redirect to the appropriate pages on llvm-mos.org.

Additionally, articles on llvm-mos.org should be audited and added as necessary. The process for adding or modifying content should currently be relatively low friction.

Verify inline assembly in C/C++ works per gcc/llvm documentation

Even though it's not technically part of the C99 standard, inline assembly support is crucial to the effective use of this compiler.

Accordingly, all relevant features of inline assembly should work according to the GCC documentation. It's unclear exactly what this entails, so this issue is a placeholder which should be replaced with specific work to complete the current inline assembly implementation.

Get SingleSource test suite passing

It looks like it's much easier than I thought to debug broken test cases; now that we have a working printf, printf debugging works pretty well, and the test cases are pretty small. Accordingly, we can sprint directly towards getting as many of the 1500 or so SingleSource tests passing as possible.

A great number of tests assume 32-bit integers; we should repair as many of these as possible afterwards if the cases they cover are actually applicable to MOS.

Finally, we can begin the original audit process to make sure there's nothing we missed. I'd expect that LLVM's SingleSource test suite isn't perfectly comprehensive; we'll need to build up the LLVM codegen unit tests for any cases we discover during the audit sweep.

Audit Register Scavenger

The register scavenger may be called upon to spill and reload any register in an allocatable register class that is used inside an instruction sequence that is scavenged. The emission of such instruction sequences (all are presently in PEI) should be examined to determine this set, and we should ensure that the scavenger can save and restore at least one copy of each such register.

There's a sliding scale of support on this; the scavenger is "last ditch", so every target has some small fixed cap to the number of possible saves. We just need to make sure that that limit is high enough that we never run into it in practice. One per register class seems like a good starting point; fewer than that is asking for trouble.

Audit Legalizer

The GlobalISel IR Translator phase can produce a great variety of global machine instructions and types from the input IR. The IR translator implementation should be carefully examined and audited to ensure that no matter what it emits, the MOS legalizer is capable of handling it. Only IR constructs producible by Clang in C99 strictly need to be handled, but the generality of the legalizer should not be artificially limited.

G_PTR_ADD can create G_CONSTANT with no assigned register bank

In some cases, the MOSCombiner pass may create a new virtual register and assign a G_CONSTANT to it. This will happen if, during the MOSCombiner step, the CombinerHelper::matchPtrAddImmedChain and CombinerHelper::applyPtrAddImmedChain functions get called.

However, the MOSCombiner pass happens after register bank selection. Because the newly created virtual register is not assigned to any bank, verification fails after the MOSCombiner pass, and compilation aborts.

Several workarounds are possible. We could recreate the MOSCombinerHelper and special-case these functions. Alternatively, we could run a "mini" register bank selection pass ourselves, after MOSCombiner, which should be easy since we have one bank.

All this happens before the MOSInstructionSelector, so changes there won't help.

Before MOSCombiner pass:

bb.1.entry:
liveins: $a, $x, $y, $rc6, $rs1
%0:any(p0) = COPY $rs1
%3:any(s8) = COPY $a
%4:any(s8) = COPY $x
%5:any(s8) = COPY $y
%6:any(s8) = COPY $rc6
G_STORE %3:any(s8), %0:any(p0) :: (store 1 into %ir.x11, !tbaa !2)
%13:any(s8) = G_CONSTANT i8 1
%17:any(p0) = G_PTR_ADD %0:any, %13:any(s8)
G_STORE %4:any(s8), %17:any(p0) :: (store 1 into %ir.x11 + 1, !tbaa !2)
%14:any(s8) = G_CONSTANT i8 2
%8:any(p0) = G_PTR_ADD %0:any, %14:any(s8)
G_STORE %5:any(s8), %8:any(p0) :: (store 1 into %ir.y2, !tbaa !7)
%11:any(p0) = G_PTR_ADD %8:any, %13:any(s8)
G_STORE %6:any(s8), %11:any(p0) :: (store 1 into %ir.y2 + 1, !tbaa !7)
RTS

After:

bb.1.entry:
liveins: $a, $x, $y, $rc6, $rs1
%0:any(p0) = COPY $rs1
%3:any(s8) = COPY $a
%4:any(s8) = COPY $x
%5:any(s8) = COPY $y
%6:any(s8) = COPY $rc6
G_STORE %3:any(s8), %0:any(p0) :: (store 1 into %ir.x11, !tbaa !2)
%13:any(s8) = G_CONSTANT i8 1
%17:any(p0) = G_PTR_ADD %0:any, %13:any(s8)
G_STORE %4:any(s8), %17:any(p0) :: (store 1 into %ir.x11 + 1, !tbaa !2)
%14:any(s8) = G_CONSTANT i8 2
%8:any(p0) = G_PTR_ADD %0:any, %14:any(s8)
G_STORE %5:any(s8), %8:any(p0) :: (store 1 into %ir.y2, !tbaa !7)
%19:_(s8) = G_CONSTANT i8 3
%11:any(p0) = G_PTR_ADD %0:any, %19:_(s8)
G_STORE %6:any(s8), %11:any(p0) :: (store 1 into %ir.y2 + 1, !tbaa !7)
RTS
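The fold visible in the two dumps, and the "mini" register bank selection workaround mentioned earlier, can be modeled in a few lines. This is a standalone sketch, not LLVM code: VReg, Function, foldPtrAddChain, and assignMissingBanks are hypothetical stand-ins, and an empty bank string is used here to mean "no bank assigned".

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a virtual register; not an LLVM type.
struct VReg {
  int id;
  std::string bank; // empty string means "no register bank assigned"
};

struct Function {
  std::vector<VReg> vregs;
  // Creates a fresh virtual register the way the combiner does in the
  // dump above: without a register bank.
  int createVReg() {
    int id = static_cast<int>(vregs.size());
    vregs.push_back({id, ""});
    return id;
  }
};

// Folds (base + inner) + outer into base + (inner + outer). The summed
// offset needs a new constant, hence a new, bankless virtual register.
int foldPtrAddChain(Function &f, int inner, int outer, int &foldedOffset) {
  foldedOffset = inner + outer;
  return f.createVReg();
}

// The "mini" register bank selection: with a single bank, every bankless
// virtual register can simply be assigned to it. Returns the fix count.
int assignMissingBanks(Function &f, const std::string &onlyBank) {
  int fixed = 0;
  for (VReg &r : f.vregs) {
    if (r.bank.empty()) {
      r.bank = onlyBank;
      ++fixed;
    }
  }
  return fixed;
}
```

With offsets 2 and 1, the fold produces the constant 3 in a register with no bank, matching %19 in the "After" dump; running the fix-up afterwards assigns it to the only bank.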

Resolve Mechanism for Generating Platform Binaries

The mechanism by which users will generate platform binaries from C and ASM sources has come into question. The available solution space spans presupplied makefiles and shell scripts, Clang driver integration, lld linker scripts, and various combinations of the above. The space should be explored and multiple candidate solutions generated, to make the pros and cons clearer, from the perspective of usability, maintainability, and extensibility.

Support feature bits in object files

Support has been laid out already for feature bits, so that the various 6502 variants can be marked with their own special instructions. However, there's no provision for making sure to link only compatible versions of the 6502. It should be possible to query one of the flags in the ELF header of any object file, to see which processor features are supported.
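A compatibility check over such flags could be as simple as a subset test. The sketch below assumes the features are stored as a bitmask; the bit names and positions are invented for illustration and are not the actual llvm-mos ELF flag layout.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical feature bits. The real layout is whatever the llvm-mos ELF
// definition assigns; this sketch does not reproduce it.
constexpr uint32_t FeatureBase6502 = 1u << 0;
constexpr uint32_t Feature65C02 = 1u << 1;
constexpr uint32_t FeatureOther = 1u << 2; // placeholder for a further variant

// An object can run on a target only if every feature it uses is present
// on that target.
bool isCompatible(uint32_t objectFlags, uint32_t targetFlags) {
  return (objectFlags & ~targetFlags) == 0;
}

// Linking several objects yields the union of their requirements.
uint32_t mergeFlags(uint32_t a, uint32_t b) { return a | b; }
```

A linker following this scheme would merge the flags of all inputs and reject the link if the merged set is not compatible with the requested target.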

"No live subrange at use" after register allocation

I'm seeing a case where, when running llc with -verify-machineinstrs, the MachineVerifier step reports that no live subrange exists at the point where an operand is used.

Some evidence suggests that MOSInstrInfo::loadStoreRegStackSlot() may be involved in this behavior, but I have low confidence in my assessment.

The command line that is generating this error is as follows:

llc -O2 -debug --debug-pass=Details -verify-machineinstrs -o game.s game.ll

game.ll and llc's log file are attached.

It would be helpful to understand what debugging strategy you would follow, to track down the source of this behavior.

.ll source: game.zip
build.log

The end of the log file is as follows:

# End machine code for function one_frame.

*** Bad machine code: No live subrange at use ***
- function:    one_frame
- basic block: %bb.5 for.body (0x166807ee988) [1660B;1828B)
- instruction: 1692B	%688:gpr = COPY %686.subhi:imag16
- operand 1:   %686.subhi:imag16
- interval:    %686 [1668r,1692r:0)  0@1668r L0000000000000002 [1668r,1676r:0)  0@1668r weight:2.326217e-01
- at:          1692B

*** Bad machine code: No live subrange at use ***
- function:    one_frame
- basic block: %bb.5 for.body (0x166807ee988) [1660B;1828B)
- instruction: 1716B	%678:gpr = COPY %677.sublo:imag16
- operand 1:   %677.sublo:imag16
- interval:    %677 [1708r,1732r:0)  0@1708r L0000000000000020 [1708r,1732r:0)  0@1708r weight:2.304062e-01
- at:          1716B
LLVM ERROR: Found 2 machine code errors.
PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace.
Stack dump:
0.	Program arguments: C:\\git\\llvm-mos\\build\\bin\\llc.exe -O2 -debug --debug-pass=Details -verify-machineinstrs -o C:\\git\\llvm-mos/build/game.s C:\\git\\llvm-mos/llvm/test/CodeGen/MOS/game.ll
1.	Running pass 'Function Pass Manager' on module 'C:\git\llvm-mos/llvm/test/CodeGen/MOS/game.ll'.
2.	Running pass 'Verify generated machine code' on function '@one_frame'
Exception thrown at 0x00007FF69C4F63E0 in llc.exe: 0xC000001D: Illegal Instruction.
The program '[15316] llc.exe' has exited with code 0 (0x0).



Support function pointer calls via indirect jump

Since the 6502 has no indirect call instruction (JSR takes only an absolute address), function pointer calls must be emitted by JSR-ing to an indirect jump. The indirect jump can be implemented either via the RTS trick or via a real JMP indirect instruction.
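The RTS trick depends on a detail of the hardware: RTS pulls a return address from the stack and resumes at that address plus one. The following standalone model shows which bytes must be pushed, and in which order, to land on a given target. The Stack type and function names are hypothetical; only the RTS behavior itself is taken from the 6502.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Minimal model of the 6502 hardware stack, just enough for the RTS trick.
struct Stack {
  std::vector<uint8_t> bytes;
  void push(uint8_t b) { bytes.push_back(b); }
  uint8_t pull() {
    assert(!bytes.empty());
    uint8_t b = bytes.back();
    bytes.pop_back();
    return b;
  }
};

// RTS pulls the low byte, then the high byte, and resumes at that address
// plus one. To "jump" to target, push (target - 1): high byte first.
void pushForRts(Stack &s, uint16_t target) {
  uint16_t adjusted = static_cast<uint16_t>(target - 1);
  s.push(static_cast<uint8_t>(adjusted >> 8));
  s.push(static_cast<uint8_t>(adjusted & 0xFF));
}

// What RTS does to the program counter.
uint16_t rts(Stack &s) {
  uint8_t lo = s.pull();
  uint8_t hi = s.pull();
  return static_cast<uint16_t>(((hi << 8) | lo) + 1);
}
```

When the caller reaches this code through a JSR, the target function's own RTS later returns to the original call site, which is what makes the combination behave like an indirect call.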

Promote Stack Slots not Live Across Calls to ZP Regs

Any stack slots that can be proven not to live across any calls can be promoted to any available zero-page registers.

Such values would normally be in zero page registers; if they weren't, it means that they were either spilled or they had their address taken. If they were spilled, optimizations may have freed up zero page registers since. If they had their address taken, they may be freely assigned to zero page registers, since all those locations have real memory addresses.
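The core test is whether any call falls strictly inside a slot's live range. Here is a minimal sketch under simplifying assumptions: straight-line code, one contiguous range per slot expressed as instruction indices, and one zero-page register per promoted slot. SlotLiveness and both functions are hypothetical names, not part of llvm-mos.

```cpp
#include <cassert>
#include <vector>

// Hypothetical summary of a stack slot: the instruction index that first
// defines it and the index of its last use.
struct SlotLiveness {
  int slot;
  int firstDef;
  int lastUse;
};

// A slot is live across a call if the call lies strictly between its
// definition and its last use. A call that is itself the definition or
// the last use does not count.
bool liveAcrossCall(const SlotLiveness &s, const std::vector<int> &callIndices) {
  for (int c : callIndices)
    if (s.firstDef < c && c < s.lastUse)
      return true;
  return false;
}

// Picks slots to promote, in order, until the free zero-page registers
// run out.
std::vector<int> promotableSlots(const std::vector<SlotLiveness> &slots,
                                 const std::vector<int> &callIndices,
                                 int freeZpRegs) {
  std::vector<int> result;
  for (const SlotLiveness &s : slots) {
    if (static_cast<int>(result.size()) >= freeZpRegs)
      break;
    if (!liveAcrossCall(s, callIndices))
      result.push_back(s.slot);
  }
  return result;
}
```

A real implementation would work on control-flow-aware liveness rather than a single index range, and would need a better selection order than first-come, first-served.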

Jump Tables

LLVM can be made to emit jump tables for dense portions of switch statements. This allows constant time dispatch instead of a logarithmic number of comparisons.

The 6502 could implement this using either the RTS trick or an indirect JMP. LLVM needs to be tuned to only emit jump tables when they would be more efficient than the replaced comparisons; indirect branches are fairly cheap, so this may be "almost always."
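The tuning question can be framed as a cost comparison. The sketch below is one possible heuristic, not LLVM's actual one: it requires a minimum table density and then compares a fixed dispatch cost against the depth of a balanced comparison tree. Every name and every cycle figure is a placeholder chosen for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Placeholder costs for illustration only; these are not measurements.
struct DispatchCosts {
  int cyclesPerCompare; // one compare-and-branch step of a binary search
  int cyclesJumpTable;  // range check, table load, and indirect branch
};

// Number of compare steps a balanced binary search needs to separate
// numCases cases, i.e. ceil(log2(numCases)).
int compareDepth(int numCases) {
  int depth = 0;
  while ((1 << depth) < numCases)
    ++depth;
  return depth;
}

// Use a table only when it is dense enough to justify its size and its
// dispatch is cheaper than the comparison tree it replaces.
bool shouldUseJumpTable(int numCases, int64_t minCase, int64_t maxCase,
                        int minDensityPercent, const DispatchCosts &costs) {
  assert(numCases > 0 && maxCase >= minCase);
  int64_t range = maxCase - minCase + 1;
  bool denseEnough = int64_t(numCases) * 100 >= range * minDensityPercent;
  bool faster =
      costs.cyclesJumpTable < compareDepth(numCases) * costs.cyclesPerCompare;
  return denseEnough && faster;
}
```

If indirect branches really are cheap relative to a compare-and-branch step, cyclesJumpTable drops and the second condition passes for all but the smallest switches, which is the "almost always" outcome described above.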

Add doxygen comments to non-overridden functions

All non-overridden functions should have Doxygen comments describing their purpose and broad semantics. Any footguns should be called out.

Any functions where the implementation mechanism or rationale is non-obvious should have additional comments giving this information.

Audit Stack Slot Spill/Reload

The register allocator may spill or reload a virtual register of any allocatable register class to the stack. All such register classes need to be examined to ensure that the target-specific load/store code can handle them.

SEV replacement will not work as expected

To compensate for the missing SEV instruction on the 6502, a case in MOSInstrInfo.cpp has been added to try to emit a BIT instruction in place of a SEV instruction:

  case MOS::V:
    if (Val == 1) {
      // The V flag may be set by a BIT of a 16-bit address, which contains a
      // byte with bit 6 set.
      Builder.buildInstr(MOS::BITAbs, {Register(MOS::V)}, {Register(MOS::A)})
          .addExternalSymbol("__set_v");
      MI.eraseFromParent();
      return;
    }

The intention here is to use a BIT instruction to compensate. However, there are two problems here. First, if this code actually executes, then in debug mode an assert will occur as follows:

*** Bad machine code: Using an undefined physical register ***
- function:    set_entities
- basic block: %bb.2 for.body (0x29026f327b8)
- instruction: $v = BITAbs $a, &__set_v
- operand 1:   $a

This assert is thrown from MachineVerifier.cpp, around line 2231.

Second, this workaround does not do what it's intended to do. The BIT instruction overwrites whatever's in the N flag with whatever's in bit 7 of the accessed memory, and it also sets Z from the AND of A with that memory. Ergo, set_v trashes N (and Z) unexpectedly.

Codegen does try to use V as a general purpose 1 bit flag.

Several workarounds are possible. A software implementation of SEV is possible, but writing a version that keeps all the other flags intact might go something like:

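php        ; push the processor status
pla        ; pull it into A (clobbers A)
ora #$40   ; set bit 6, the V flag
pha        ; push the modified status
plp        ; pull it back into the status register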

which seems pretty expensive just to set the V bit, and it trashes the accumulator to boot. Better off, perhaps, not to assume the V bit is generally usable by codegen as a 1-bit register.
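The two approaches can be compared with a small model of the flag effects. This is a standalone sketch with hypothetical names (Cpu, bit, setVViaStack); it models only the accumulator and status register, and relies on the documented 6502 behavior that PHP pushes the status byte with bits 4 and 5 set.

```cpp
#include <cassert>
#include <cstdint>

// Bit positions in the 6502 status register P.
constexpr uint8_t FlagN = 0x80;
constexpr uint8_t FlagV = 0x40;
constexpr uint8_t FlagZ = 0x02;

struct Cpu {
  uint8_t a; // accumulator
  uint8_t p; // processor status
};

// BIT: N and V are copied from bits 7 and 6 of the operand; Z is set when
// (A AND operand) is zero. The accumulator is unchanged.
void bit(Cpu &cpu, uint8_t operand) {
  uint8_t p = cpu.p & static_cast<uint8_t>(~(FlagN | FlagV | FlagZ));
  p = static_cast<uint8_t>(p | (operand & (FlagN | FlagV)));
  if ((cpu.a & operand) == 0)
    p = static_cast<uint8_t>(p | FlagZ);
  cpu.p = p;
}

// PHP / PLA / ORA #$40 / PHA / PLP: only V changes in P, but A is left
// holding the pushed status byte with V set.
void setVViaStack(Cpu &cpu) {
  uint8_t pushed = static_cast<uint8_t>(cpu.p | 0x30); // PHP sets bits 4 and 5
  cpu.a = static_cast<uint8_t>(pushed | FlagV);        // PLA ; ORA #$40
  cpu.p = static_cast<uint8_t>(cpu.p | FlagV);         // PHA ; PLP
}
```

Starting from a state with N set, bit() with an operand of $40 sets V but clears N and may set Z, whereas setVViaStack() leaves N and Z alone and overwrites A instead.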
