Memory allocation refactoring
Thursday, November 2, 2006
Factor’s garbage collector moves objects around in memory. This means that if there is not enough memory to satisfy an allocation request, the garbage collector has to be able to find all pointers to live objects and update them if necessary.
Until now, memory allocation in the Factor runtime did not check to see if the heap was full:
INLINE void allot_barrier(CELL address)
{
    F_CARD *ptr = ADDR_TO_CARD(address);
    F_CARD c = *ptr;
    CELL b = card_base(c);
    CELL a = (address & ADDR_CARD_MASK);
    *ptr = (card_marked(c) | ((b < a) ? b : a));
}

INLINE void *allot_zone(F_ZONE *z, CELL a)
{
    CELL h = z->here;
    z->here = h + align8(a);
    allot_barrier(h);
    return (void*)h;
}

INLINE void *allot(CELL a)
{
    return allot_zone(&nursery,a);
}
Instead, every primitive that could at some stage allocate memory would first check whether the heap was 75% full, and if so, call the GC. The idea was to call the GC early enough, and hope that the heap did not completely fill up before the next GC check. If the heap did fill up, a guard page would be hit, but all we could do at that point was kill Factor, since calling the GC from the signal handler was out of the question: suspended stack frames could contain C local variables pointing to Factor objects.

Indeed, the GC check was always performed at points where no Factor object could be stored in a C local variable, which kept things very simple. The disadvantages were that you could not use more than 75% of the heap, every primitive had to remember to call the GC check, and if some primitive entered with the heap 74% full and suddenly allocated a large amount of memory, Factor would die with an ‘out of memory’ error even if a GC would have freed enough. This approach also made it hard to implement a growable data heap: at what point do you grow? If the GC check did not free enough memory, you could grow the heap there; but if the GC check did free enough memory and the primitive subsequently allocated a huge array, you would run out of memory instead of growing the heap. So the manual GC check, and the restriction against performing a GC while C variables contained roots, had to go.
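The old eager-check scheme can be sketched as follows. The 75% threshold and the `F_ZONE` fields `here` and `limit` come from the post; the `start` field, the `should_gc` helper, and the exact arithmetic are illustrative assumptions, not the actual runtime code:

```c
#include <assert.h>

typedef unsigned long CELL;

typedef struct
{
    CELL start;   /* assumed field: base address of the zone */
    CELL here;    /* next free address (as in the post) */
    CELL limit;   /* end of the zone (as in the post) */
} F_ZONE;

/* Hypothetical sketch of the old check: returns nonzero when the zone
   is more than 75% full, the point at which primitives would call the
   GC and hope the heap did not fill up before the next check. */
static int should_gc(F_ZONE *z)
{
    CELL size = z->limit - z->start;
    CELL used = z->here - z->start;
    return used * 4 > size * 3;   /* used/size > 3/4, without division */
}
```

The weakness described above falls out directly: a primitive entering at 74% passes the check, yet nothing stops it from allocating past `limit` afterwards.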
So I decided to redo things, and instead potentially perform GC from the lowest-level allocation functions. The allot() function now looks like this:
INLINE void maybe_gc(CELL a)
{
    if(nursery.here + a > nursery.limit)
        garbage_collection(NURSERY,false);
}

INLINE void *allot(CELL a)
{
    maybe_gc(a);
    return allot_zone(&nursery,a);
}
This means that before calling a runtime function which can potentially allocate memory, the C code has to register any local variables which can contain objects as GC roots. For example, here is the low-level function implementing the <array> primitive:
F_ARRAY *allot_array(CELL type, F_FIXNUM capacity, CELL fill)
{
    int i;
    REGISTER_ROOT(fill);
    F_ARRAY *array = allot_array_internal(type, capacity);
    UNREGISTER_ROOT(fill);
    for(i = 0; i < capacity; i++)
        set_array_nth(array,i,fill);
    return array;
}
The REGISTER_ROOT macro adds the object to a stack, internal to the runtime, and the UNREGISTER_ROOT macro removes the topmost object and stores it back in the given variable. The GC scans this stack for roots just like it does the data stack, etc. This stack is not accessible from the Factor side, and if an error is thrown between calls to REGISTER_ROOT and UNREGISTER_ROOT, its contents are reset.
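The mechanics of the root stack can be sketched like this. The macro names REGISTER_ROOT and UNREGISTER_ROOT are from the post, but everything else here (the array, the depth counter, the fake relocating GC) is an illustrative assumption, not Factor's actual implementation:

```c
#include <assert.h>

typedef unsigned long CELL;

/* A small fixed-size root stack, internal to the runtime. */
#define ROOT_STACK_SIZE 32
static CELL gc_roots[ROOT_STACK_SIZE];
static int gc_roots_depth = 0;

/* Push the object onto the root stack... */
#define REGISTER_ROOT(obj)   (gc_roots[gc_roots_depth++] = (obj))
/* ...and pop it back into the variable, picking up any new address
   the GC may have written while it was registered. */
#define UNREGISTER_ROOT(obj) ((obj) = gc_roots[--gc_roots_depth])

/* Stand-in for a moving collection: a real GC would walk
   gc_roots[0..depth) and rewrite each entry to the object's new
   location. Here we simulate relocation with a fixed offset. */
static void fake_gc(void)
{
    int i;
    for(i = 0; i < gc_roots_depth; i++)
        gc_roots[i] += 0x1000;
}
```

This also shows why forgetting a REGISTER_ROOT is so dangerous: a plain C local is invisible to the GC, so after a collection it silently points at the object's old location.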
There are only a few cases where GC roots must be registered this way, and most of them are in the bignum code, especially after I simplified a lot of runtime C code and moved a number of primitives from C code into the Factor library. However, forgetting to register a root can have disastrous consequences, namely random, hard-to-reproduce crashes (after all, 99% of the time a given allocation call does not GC). I developed two testing tools to help catch this:
- After a GC, overwriting the former semispace with junk data so that reading from it immediately causes bogus behavior, rather than subtle corruption
- Modifying the allocation function in the runtime to call the GC every time, or every 100 allocations
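The first debugging aid might look something like this minimal sketch; the function name, the junk byte, and the semispace layout are my own illustrative choices, not Factor's actual code:

```c
#include <assert.h>
#include <string.h>

/* A small stand-in for the semispace the GC just evacuated. */
#define SEMISPACE_CELLS 256
static unsigned long from_space[SEMISPACE_CELLS];

/* After a copying collection, overwrite the abandoned semispace with
   a junk pattern. Any stale pointer into it (i.e. a root somebody
   forgot to register) then reads obviously-bogus data and fails fast,
   instead of reading valid-looking objects and corrupting state subtly. */
static void clear_dead_semispace(void)
{
    memset(from_space, 0xBA, sizeof from_space);
}
```

The second tool, forcing a GC on every allocation (or every 100th), attacks the same bug from the other side: it makes the rare "allocation moved my object" case happen constantly, so a missing root crashes immediately instead of once in a blue moon.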
These two debugging features slow things down considerably, but they have helped me find a number of problems, and I’m confident I’ll get the root registration 100% correct this way. The first item will be available to users as the -S command line switch; it was actually first suggested by Doug Coleman in the context of cryptography algorithms, where you don’t want your private key to hang around in memory longer than needed in case your swap file is stolen (although this seems way too paranoid to me). Obviously the second feature will be removed before release, and I don’t intend to add a switch to GC on every allocation after this refactoring is done.
I’ve almost finished getting everything working again. Compiled alien calls and callbacks need to be updated now, since the code generated by the compiler to convert Factor objects to C values (and vice versa) now needs to accommodate the fact that objects may be moved around in the middle of this process. Also, in one place the runtime’s allot() function is bypassed and allocation of a boxed float is done manually in a compiled intrinsic; this ensures that it can be done in a “basic block” without clobbering register contents, improving the efficiency of compiled floating point code.
Interestingly enough, this refactoring has shaved several hundred lines of C code off the runtime. It gave me a chance to review the C code, clean it up, and remove some redundancies.
Once everything is stable and I fix some minor bugs on my to-do list, I will release 0.86. I will not implement a growable data heap until 0.87; however, with this refactoring it has become much easier to do.