Reducing image size by eliminating literal tables from code heap entries
Sunday, December 6, 2009
Introduction
The compiled code heap consists of code blocks which reference other code blocks and objects in the data heap. For example, consider the following word:
: hello ( -- ) "Hello world" print ;
The machine code for this word is the following:
000000010ef55f10: 48b87b3fa40d01000000 mov rax, 0x10da43f7b
000000010ef55f1a: 4983c608 add r14, 0x8
000000010ef55f1e: 498906 mov [r14], rax
000000010ef55f21: 48bb305ff50e01000000 mov rbx, 0x10ef55f30 (hello+0x20)
000000010ef55f2b: e9e0ffabff jmp 0x10ea15f10 (print)
The immediate operand of the first mov
instruction (0x10da43f7b
) is
the address of the string "Hello world"
in the data heap. The
immediate operand of the last jmp
instruction (0x10ea15f10
) is the
address of the machine code of the print
word in the code heap.
Unlike some dynamic language JITs where all references to data and
compiled code from machine code are done via indirection tables, Factor
embeds the actual addresses of the data in the code. This means that the
garbage collector needs to be able to find all pointers in a code heap
block (for the “sweep” phase of garbage collection), and update them
(for the “compact” phase).
Relocation tables
Associated to each code block is a relocation table, which tells the VM what instruction operands contain special values that it must be aware of. The relocation table is a list of triples, packed into a byte array:
- The relocation type is an instance of the
relocation_type
enum in instruction_operands.hpp. This value tells the VM what kind of value to deposit in the operand – possibilities include a data heap pointer, the machine code of a word, and so on. - The relocation class is an instance of the
relocation_class
enum in instruction_operands.hpp. This value tells the VM how the operand is represented – the instruction format, whether or not it is a relative address, and such. - The relocation offset is a byte offset from the start of the code block where the value is to be stored.
Code that needs to inspect relocation table entries uses the
each_instruction_operand()
method defined in
code_block.hpp.
This is a template method which can accept any object overloading
operator()
.
Literal tables
The next part is what I just finished refactoring. I’ll describe the old approach first. The simplest way, and what Factor used until now, is the following. Relocation table entries that expect a parameter, such as those that deposit addresses from the data heap and code heap, take the parameter from a literal table associated to each code block. When the compiler compiles a word, it spits out some machine code and a literal table. It hands these off to the Factor VM. The “sweep” phase of the garbage collector traces each code block’s literal table, and the “compact” phase, after updating the literal table, stores the operands back in each instruction referenced from the relocation table.
Eliminating literal tables
The problem with the old approach is that the garbage collector doesn’t really need the literal table. The address of each data heap object and code block referenced from machine code is already stored in the machine code itself. Indeed, the only thing missing until now was a way to read instruction operands out of instructions. With this in place, code blocks no longer had to hold on to the literal table after being constructed. Each code block’s literal table is only used to deposit the initial values into machine code when a code block is first compiled by the compiler. Subsequently, the literal table becomes garbage and is collected by the garbage collector. When tracing code blocks, the garbage collector traverses the instruction operands themselves, using the relocation table alone. In addition to the space savings gained by not keeping these arrays of literals around, another interesting side-effect of this refactoring is that a full garbage collection no longer resets generic word call sites back to the cold call entry point, which would effectively discard all inline caches (read about inline caching in Factor).
Coping with code redefinition
A call to a word is compiled as a direct jump in Factor. This means that
if a word is redefined and recompiled, existing call sites need to be
updated to point to the new definition. The implementation of this is
slightly more subtle now that literal tables are gone. Every code block
in the code heap has a reference to an owner
object in its header (see
the code_block
struct in
code_blocks.cpp).
The owner is either a word or a quotation. Words and quotations, in
turn, have a field which references a compiled code heap block. The
latter always points at the most recent compiled definition of that
object (note that quotations are only ever compiled once, because they
are immutable. Words, however, can be redefined by reloading source
files). At the end of each compilation unit, the code heap is traversed,
and each code block is updated in turn. The code block’s relocation
table is consulted, and instruction operands which reference compiled
code heap blocks are updated. Before this would be done by overwriting
all call targets from the literal table. Now, this is accomplished by
looking at the owner object of the current target, and then storing the
owner’s most recent code block back in the instruction operand. This is
implemented in the update_word_references()
method defined in
code_blocks.cpp.
In addition to helping with redefinition, the owner object reference is
used to construct call stack traces.
Additional space savings in deployed binaries
Normally, every compiled code block references its owner object, so that code redefinition can work. This means that if a word or quotation is live, then the code block corresponding to its most recent definition will be live, and vice versa. In deployed images where the compiler and debugger have been stripped out, words cannot be redefined and stack traces are not needed, so the owner field can be stripped out. This means that a word object can get garbage collected at deploy time even if its compiled code block is called. As it turns out, most words are never used as objects, and can be stripped out in this manner. So the literal table removal has an even bigger effect in deployed images than development images.
Size comparison
The following table shows image sizes before and after this change on Mac OS X x86-64.
Image | Before (Megabytes) | After (Megabytes) |
---|---|---|
Development image | 92 Mb | 90 Mb |
Minimal development image | 8.6 Mb | 8.2 Mb |
Deployed hello-ui | 2.3 Mb | 1.5 Mb |
Deployed bunny | 3.5 Mb | 3.1 Mb |