Reducing image size by eliminating literal tables from code heap entries

Sunday, December 6, 2009

Introduction

The compiled code heap consists of code blocks which reference other code blocks and objects in the data heap. For example, consider the following word:

: hello ( -- ) "Hello world" print ;

The machine code for this word is the following:

000000010ef55f10: 48b87b3fa40d01000000  mov rax, 0x10da43f7b
000000010ef55f1a: 4983c608              add r14, 0x8
000000010ef55f1e: 498906                mov [r14], rax
000000010ef55f21: 48bb305ff50e01000000  mov rbx, 0x10ef55f30 (hello+0x20)
000000010ef55f2b: e9e0ffabff            jmp 0x10ea15f10 (print)

The immediate operand of the first mov instruction (0x10da43f7b) is the address of the string "Hello world" in the data heap. The immediate operand of the last jmp instruction (0x10ea15f10) is the address of the machine code of the print word in the code heap.
Unlike some dynamic language JITs where all references to data and compiled code from machine code are done via indirection tables, Factor embeds the actual addresses of the data in the code. This means that the garbage collector needs to be able to find all pointers in a code heap block (for the “sweep” phase of garbage collection), and update them (for the “compact” phase).

Relocation tables

Associated to each code block is a relocation table, which tells the VM what instruction operands contain special values that it must be aware of. The relocation table is a list of triples, packed into a byte array:

The relocation type is an instance of the relocation_type enum in instruction_operands.hpp. This value tells the VM what kind of value to deposit in the operand – possibilities include a data heap pointer, the machine code of a word, and so on.
The relocation class is an instance of the relocation_class enum in instruction_operands.hpp. This value tells the VM how the operand is represented – the instruction format, whether or not it is a relative address, and such.
The relocation offset is a byte offset from the start of the code block where the value is to be stored.

Code that needs to inspect relocation table entries uses the each_instruction_operand() method defined in code_block.hpp. This is a template method which can accept any object overloading operator().

Literal tables

The next part is what I just finished refactoring. I’ll describe the old approach first. The simplest way, and what Factor used until now, is the following. Relocation table entries that expect a parameter, such as those that deposit addresses from the data heap and code heap, take the parameter from a literal table associated to each code block. When the compiler compiles a word, it spits out some machine code and a literal table. It hands these off to the Factor VM. The “sweep” phase of the garbage collector traces each code block’s literal table, and the “compact” phase, after updating the literal table, stores the operands back in each instruction referenced from the relocation table.

Eliminating literal tables

The problem with the old approach is that the garbage collector doesn’t really need the literal table. The address of each data heap object and code block referenced from machine code is already stored in the machine code itself. Indeed, the only thing missing until now was a way to read instruction operands out of instructions. With this in place, code blocks no longer had to hold on to the literal table after being constructed. Each code block’s literal table is only used to deposit the initial values into machine code when a code block is first compiled by the compiler. Subsequently, the literal table becomes garbage and is collected by the garbage collector. When tracing code blocks, the garbage collector traverses the instruction operands themselves, using the relocation table alone. In addition to the space savings gained by not keeping these arrays of literals around, another interesting side-effect of this refactoring is that a full garbage collection no longer resets generic word call sites back to the cold call entry point, which would effectively discard all inline caches (read about inline caching in Factor).

Coping with code redefinition

A call to a word is compiled as a direct jump in Factor. This means that if a word is redefined and recompiled, existing call sites need to be updated to point to the new definition. The implementation of this is slightly more subtle now that literal tables are gone. Every code block in the code heap has a reference to an owner object in its header (see the code_block struct in code_blocks.cpp). The owner is either a word or a quotation. Words and quotations, in turn, have a field which references a compiled code heap block. The latter always points at the most recent compiled definition of that object (note that quotations are only ever compiled once, because they are immutable. Words, however, can be redefined by reloading source files). At the end of each compilation unit, the code heap is traversed, and each code block is updated in turn. The code block’s relocation table is consulted, and instruction operands which reference compiled code heap blocks are updated. Before this would be done by overwriting all call targets from the literal table. Now, this is accomplished by looking at the owner object of the current target, and then storing the owner’s most recent code block back in the instruction operand. This is implemented in the update_word_references() method defined in code_blocks.cpp. In addition to helping with redefinition, the owner object reference is used to construct call stack traces.

Additional space savings in deployed binaries

Normally, every compiled code block references its owner object, so that code redefinition can work. This means that if a word or quotation is live, then the code block corresponding to its most recent definition will be live, and vice versa. In deployed images where the compiler and debugger have been stripped out, words cannot be redefined and stack traces are not needed, so the owner field can be stripped out. This means that a word object can get garbage collected at deploy time even if its compiled code block is called. As it turns out, most words are never used as objects, and can be stripped out in this manner. So the literal table removal has an even bigger effect in deployed images than development images.

Size comparison

The following table shows image sizes before and after this change on Mac OS X x86-64.

Image	Before (Megabytes)	After (Megabytes)
Development image	92 Mb	90 Mb
Minimal development image	8.6 Mb	8.2 Mb
Deployed hello-ui	2.3 Mb	1.5 Mb
Deployed bunny	3.5 Mb	3.1 Mb