Two-tier compilation comes to Factor

Monday, September 10, 2007

Factor 0.90 uses the following execution model:

Words with a static stack effect are compiled to machine code by the optimizing compiler.
Words which use continuations but otherwise have static stack effects run in the interpreter.
Words with dynamic stack effects run in the interpreter.
Primitives which modify the code heap are considered to have a dynamic stack effect, only to make sure that words calling them do not compile. This is because adding a new compiled code block can trigger a code GC, and code GC is not smart enough to scan the native call stack to identify return addresses as roots.

The interpreter is a rather inefficient affair, quotations are stored on an interpreter callstack; execution of a quotation proceeds by inspecting each element. If the element is a word, it is executed, if it is a wrapped word, it is unwrapped, and otherwise, it is a literal pushed on the data stack. This entails a runtime type dispatch for each element of the quotation.

What I’m working on right now is a new execution model. The code isn’t ready yet and probably won’t be for another week, but here is how everything works so far:

Words with a static stack effect are compiled to machine code by the optimizing compiler, just like before.
Words which use continuations but otherwise have static stack effects are compiled to machine code by the optimizing compiler - this is new!
Words with dynamic stack effects are compiled to machine code by the new non-optimizing compiler.
The GC now knows about native stack frames and is now able to traverse them when scanning for roots.

This is really hardcore – Factor does not have an interpreter of any form now, except for the single-stepper in tools.interpreter. The amount of code which is compiled optimized has increased now that I/O can compile (I/O uses continuations under the hood to implement non-blocking, and would run in the interpreter; a significant performance problem). Not only that, but the remaining, truly dynamic code is compiled by a “mop-up” compiler.

Ages ago, quotations used to be linked lists. When I made them a distinct type of their own, I had this form of “quotation compilation” in mind as an eventual possibility. See this old article, and this one. It is good that more than a year later I’ve finally carried over this transition to its logical conclusion.

Quotations are still sequences, but are now immutable, and have an additional slot holding a pointer to a compiled code block.

The new non-optimizing compiler is written in C due to bootstrap reasons, but it is very trivial: it simply unrolls the interpreter loop, eliminating the dynamic dispatch, and using the native call stack instead of a fake interpreter call stack. Everything it compiles is a concatenation of a fixed number of basic blocks, which are assembled and provided by Factor code. So the C side is essentially just a glorified use-case of memcpy(). For example,
[ 2 + * ]
compiles to very naive machine code which performs the following actions:

A stack frame prologue – this is omitted if the only calls to words are in tail position
Push literal 2
Call + word
Tail call (jump to) * word

While the generated code is only 2x faster than the interpreter, compile time is very fast and I don’t expect any performance problems for code which constructs quotations on the fly.

Because Factor now controls the stack frame layout from top to bottom, implementing capture and restoration of compiled call stacks is relatively trivial. Scanning them for GC roots is a bit trickier. Note that Factor’s GC is not conservative in any way; it manages to precisely identify all roots. Some language implementors claim that conservative collection is the only way to scavenge the native stack for roots, but this is false.

And not having an interpreter is a political victory of sorts. Factor is not an interpreted language, period – it’s thoroughly natively compiled.

The Self language uses a similar, but more advanced implementation strategy; a fast, non-inlining compiler compiles all code, and a highly optimizing compiler compiles hot-spots. Factor’s optimizing compiler compiles all code for which it can infer a stack effect, and it doesn’t try to do runtime type feedback.