Overview
Requirements
- Multiple register classes
- Register class conflicts
- PHI nodes
- Spill and filling
Wants
- Host flag handling
- Host flag retaining and spilling
- Callee/Caller understanding
- RA constraints on instructions
Requirements
Register classes
We need to support multiple register classes. Most are fairly trivial.
- GPRs
- These are 32bit and 64bit sized GPRs that map directly to x86-64 and AArch64 host registers
- AArch64: x0-x30
- X86-64: RAX-R15
- FPRs
- These are 64bit and 128bit vector registers that map directly to x86-64 and AArch64 host registers
- AArch64: v0-v31
- X86-64: xmm0-xmm15
- Flag register
- These map directly to the x86-64 and AArch64 host registers.
- Not strictly necessary today, but will be in the future when we want optimal codegen
- The host backend will need to provide a list of IR ops that overwrite the flag register to ensure correct spilling
- AArch64: PSTATE.<N,Z,C,V>
- X86-64: EFLAGS
- GPRPair
- For 128bit compare and swap specifically we need paired GPR operations.
- AArch64's CASP* instructions take two GPRPair arguments for Expected and Desired results
- These need to directly conflict with the GPR class for the RA to function
- Pairs {x0, x1}, {x2, x3}, ... conflict with GPRs x0, x1, x2, x3, ...
- Need to be consecutive and start on an even register number for AArch64
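The pairing rule above can be sketched as follows. This is only an illustration, not the allocator's actual code: the function names and the flat integer numbering of registers (x0 = 0, x1 = 1, ...) are assumptions made for the example.

```python
# Hypothetical sketch: GPRPair index <-> GPR index aliasing on AArch64.
# Pair n covers the consecutive GPRs x(2n) and x(2n+1), so the first
# register of every pair is even-numbered.

def pair_to_gprs(pair):
    """Return the two GPR indices that pair `pair` occupies."""
    return (2 * pair, 2 * pair + 1)

def gpr_to_pair(gpr):
    """Return the index of the pair that GPR `gpr` belongs to."""
    return gpr // 2

def conflicts(pair, gpr):
    """True if allocating `pair` makes `gpr` unavailable, and vice versa."""
    return gpr in pair_to_gprs(pair)
```

With this numbering, pair 1 is {x2, x3}, so `conflicts(1, 2)` is true and `conflicts(1, 4)` is false.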
Register class conflicts
Currently the only register class conflict we need to understand is the one between GPR and GPRPair. In the future we may use SVE for AVX256 emulation, in which case the SVE class would conflict with the FPR class. That is a future goal, though.
PHI nodes
We currently support PHI nodes only to a limited extent. We will need fuller support in the future, once we hoist variable usage to the top of loops. The current problem is that a PHI node's live range isn't calculated correctly and breaks when going through a backedge. The PHI's live range currently ends at its last use in the block, rather than at the backedge of the block that goes back to its declaration.
- Currently unused, but once fixed it would improve performance of x86 string ops and pave the way for future hoisting
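The intended fix can be sketched like this. It is a minimal illustration under assumed conventions, not the real implementation: instructions are numbered linearly, a backedge is a hypothetical `(source_index, target_index)` tuple, and the function name is made up for the example.

```python
# Hypothetical sketch: extend a PHI's live range to cover a loop backedge.
# A backedge (source, target) jumps from instruction index `source` back to
# the earlier index `target`.

def phi_live_range_end(phi_index, last_use_index, backedges):
    """Return the index at which the PHI stops being live."""
    end = last_use_index
    for source, target in backedges:
        # If the PHI is declared inside the region the backedge loops over,
        # it must stay live until the backedge itself, not just its last use.
        if target <= phi_index <= source:
            end = max(end, source)
    return end
```

For a PHI declared at index 2 with its last use at index 5 and a backedge from index 9 to index 2, this gives 9 instead of 5, so the register is not reused before the loop iterates again.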
Spill and filling
Spilling and filling is currently supported. It is fairly straightforward: a host register is spilled to a slot on the stack and refilled when necessary. Live ranges are calculated for the spill slots, and slots are reused where possible.
- Currently the spill slot size is fixed at 16 bytes.
- Causes some wasted stack space when spilling smaller elements
- It might be nice to improve spill slot calculation to pack slots by size more cleanly
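One possible way to pack by size is sketched below, compared against the fixed 16-byte scheme. This is an assumption about what "pack by size" could look like, not a description of existing code. The function names are hypothetical, sizes are assumed to be powers of two, and slot reuse based on live ranges is ignored to keep the example short.

```python
# Hypothetical sketch: fixed 16-byte spill slots vs. size-packed slots.

def fixed_slot_layout(sizes, slot_size=16):
    """Stack bytes used when every value gets a full fixed-size slot."""
    return len(sizes) * slot_size

def packed_layout(sizes):
    """Assign naturally aligned offsets, placing the largest values first.

    Returns (offsets, total_bytes); offsets[i] is the stack offset of the
    i-th input value.
    """
    order = sorted(range(len(sizes)), key=lambda i: sizes[i], reverse=True)
    offsets = [0] * len(sizes)
    cursor = 0
    for i in order:
        size = sizes[i]
        cursor = (cursor + size - 1) // size * size  # round up to alignment
        offsets[i] = cursor
        cursor += size
    return offsets, cursor
```

For spilled values of 4, 16, 8 and 4 bytes, the fixed scheme uses 64 bytes, while `packed_layout([4, 16, 8, 4])` returns offsets `[24, 0, 16, 28]` and a total of 32 bytes.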
Wants
Host flag handling
Currently, host flag handling just means reading the flags out at comparison instruction time (FCMP in this case) and moving them into a GPR. It would be nice to keep them in the host flag register as long as possible so we can DCE unused flags. This is mostly a longer-term goal.
- Spilling & Filling is possible on both x86-64 and AArch64.
- Could either spill to GPR and keep around or just dump to a spill stack at that point
- AArch64: Uses MRS/MSR NZCV
- X86-64: Uses LAHF/SAHF (these cover SF, ZF, AF, PF and CF; OF is not included and needs separate handling)
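Deciding when a flag value has to be spilled could use the list of flag-overwriting IR ops that the host backend provides. The sketch below is only an illustration. The function name and the representation of the block as a list of op names are assumptions, and every op name except FCMP is hypothetical.

```python
# Hypothetical sketch: does a flag value survive until its last use?
# `clobbering_ops` stands in for the backend-provided list of IR ops that
# overwrite the host flag register.

def needs_flag_spill(ops, producer_index, last_use_index, clobbering_ops):
    """True if any op strictly between producer and last use overwrites flags."""
    between = ops[producer_index + 1:last_use_index]
    return any(op in clobbering_ops for op in between)

clobbering = {"FCMP"}
# Flags produced at index 0 and consumed at index 2, nothing clobbers them.
keep_in_flags = not needs_flag_spill(["FCMP", "ADD", "SELECT"], 0, 2, clobbering)
# A second FCMP in between overwrites the flags, so the first result must be
# spilled (to a GPR or a stack slot) before that point.
must_spill = needs_flag_spill(["FCMP", "FCMP", "SELECT"], 0, 2, clobbering)
```

A flag value with no later use at all would not need spilling either and could simply be removed by DCE.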
Callee/Caller understanding
Let the CPU backend tell the RA which registers are callee-saved and which are caller-saved, and let the CPU backend query which values are live at a given Node location. This is necessary to make C ABI function calls more efficient in the JITs.
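A rough sketch of how the two pieces of information combine is shown below. The names are hypothetical and registers are plain integers. The callee-saved set follows the AArch64 procedure call standard (x19-x28) purely as an example of what a backend might report.

```python
# Hypothetical sketch: which live registers must be saved around a C ABI call?
# The backend reports the callee-saved set; the RA reports what is live at
# the call's Node.

CALLEE_SAVED = frozenset(range(19, 29))  # x19-x28 under AAPCS64

def registers_to_save(live_at_node, callee_saved=CALLEE_SAVED):
    """Return the live registers the callee may clobber, in ascending order."""
    return sorted(r for r in live_at_node if r not in callee_saved)
```

With x0, x4, x19 and x20 live at the call, `registers_to_save({0, 4, 19, 20})` returns `[0, 4]`: only the caller-saved registers that actually hold live values are saved, instead of every allocatable register.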
RA constraints on instructions
Add constraints to the IR ops that the RA must fulfill. Specifically, this is to remove extraneous MOV instructions from the code generated for IR ops. A good example is the CAS/CASPair instructions, where the Expected value must also be the destination. Being able to add a constraint on the IR such as Dest = Arg[0] will allow us to remove extraneous moves in quite a few instructions.
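The effect of such a constraint can be illustrated with a toy emitter. This is a sketch under assumptions, not the backend's real code: the function name and the string-based instruction output are made up for the example, and registers are plain integers.

```python
# Hypothetical sketch: a CAS emitter with and without a Dest = Arg[0] constraint.
# On AArch64 the first CAS operand holds the Expected value and receives the
# loaded value, so Dest and Expected must end up in the same host register.

def emit_cas(dest_reg, expected_reg, desired_reg, addr_reg):
    """Return the host instructions for one CAS IR op."""
    code = []
    if dest_reg != expected_reg:
        # Without the constraint the RA may pick different registers, so the
        # backend has to copy Expected into Dest first.
        code.append(f"mov x{dest_reg}, x{expected_reg}")
    code.append(f"casal x{dest_reg}, x{desired_reg}, [x{addr_reg}]")
    return code
```

If the RA honors Dest = Arg[0], `emit_cas(3, 3, 4, 5)` yields the single instruction `casal x3, x4, [x5]`; otherwise a call such as `emit_cas(2, 3, 4, 5)` needs the extra `mov x2, x3` in front of it.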