Inline allocation refactoring
Thursday, November 9, 2006
On PowerPC and x86 processors with SSE2, Factor supports floating point intrinsics, and the final boxing of a floating point value is entirely open-coded, so no GC check was done. Fixing this was the final stage of the Allocation refactoring.
At the end of a basic block, the compiler calls end-basic-block
, which
boxes any floating point values stored in registers, and moves them to
the data stack. It also moves all other integer values and pointers to
the stack too. Within a basic block, no stack writes are performed, and
stack shuffling takes place at compile time in the form of register
renaming. The fix for the inline allocation GC issue was to simply leave
1kb free at the top of the heap at all times, and if any float values
were boxed, compile a call to a C function which performs a GC check in
end-basic-block
. This smells like the old issue that the entire GC
refactoring was supposed to avoid (the GC on 75% heap consumption being
inadequate) however very little memory is allocated by boxing floats,
much less than 1Kb.
However there was one sticking point; in a handful of locations,
end-basic-block
would be called to generate machine code in a context
where certain register values are not to be clobbered. For example,
conditionals worked this way: first, the value at the top of the stack
would be popped (which might not generate any code at all, if it is in a
register), then the basic block would end so registers would sync with
the in-memory stack, and then the conditional jump was performed based
on the value of this register. This would no longer work, since
end-basic-block
could now generate a call into C, and thus the
register holding the condition would be clobbered. The fix is not to end
the basic block at a conditional jump, but instead “split” the virtual
register representation of the stack at both branches. So for example if
r3
held some known value before a conditional, the compiler could
reuse this value in either branch without reading it back in.
Another place where the compiler called end-basic-block
was when
generating fixnum intrinsics which could overflow. These open-coded
intrinsics are more efficient than calls into C, since in the common
case of there not being an overflow, the code path re-uses registers
rather than reading from the stack, and so on. For example, fixnum+
adds two integers, and if the result overflows, it allocates a bignum.
Again, the problem was that it would read two values from the stack into
registers (which might not have generated any machine code at all, if
the value were already in registers), then it would end the basic block,
then it would add them, and if an overflow occurred, it would call into
C to allocate a bignum. This worked in the past, but now that the end of
a basic block can call GC, the fix here is to open-code the bignum
allocation code, and therefore avoid the need to end the basic block
before fixnum+
is assembled in.
The other cases where end-basic-block
were called remain untouched;
namely, before jumping to another word definition, before returning from
a word definition, and before an explicit alien call into C. In these
cases, no values were held on in registers so a GC is safe. The removal
of calls to end-basic-block
from around conditionals and overflowing
fixnum primitives means the compiler can keep more values in registers,
generate less code to shuffle values between registers and memory, and
box less floats. And of course, it also means that certain operations
won’t run out of memory even if plenty is available, simply because no
GC check gets done.
I still have to fix the fixnum*
intrinsic on both platforms; this
results in an inline allocation of a 2-cell bignum, which is trickier
than 1-cell bignum like fixnum+
. I’m also hunting down a compiler bug
which results in some words from contrib/crypto
being miscompiled.
Once these two items are done, I will release 0.86.