How Factor implements closures
Monday, January 18, 2010
The recent blog post from the Clozure CL team on Clozure CL’s implementation of closures inspired me to do a similar writeup about Factor’s implementation. It is often said that “closures are a poor man’s objects”, or “objects are a poor man’s closures”. Factor takes the former view, because as you will see they are largely implemented within the object system itself.
Quotations
First, let us look at quotations, and what happens internally when a quotation is called. A quotation is a sequence of literals and words. Quotations do not close over any lexical environment; they are entirely self-contained, and their evaluation semantics only depend on their elements, not any state from the time they were constructed. So quotations are anonymous functions but not closures.
Internally, a quotation is a pair, consisting of an array and a machine code entry point. The array stores the quotation’s elements, and when you print a quotation with the prettyprinter, this is how Factor knows what its elements are:
( scrachpad ) [ "out.txt" utf8 [ "Hi" print ] with-file-writer ] .
[ "out.txt" utf8 [ "Hi" print ] with-file-writer ]
The machine code entry point refers to a code block in the code heap containing the quotation’s compiled definition.
The call
generic word calls quotations. This word can take any
callable,
not just a quotation, and has a method for each type of callable. The
callable
class includes quotations, as well as curry
and compose
which are discussed below. This means that closure invocation is
implemented on top of method dispatch in Factor.
For reasons that will become clear in the last section on compiler optimizations, most quotations never have their entry points called directly, and so it would be a waste of time to compile all quotations as they are read by the parser.
Instead, all freshly-parsed quotations have their entry points set to
the lazy-jit-compile
primitive from the
kernel.private
vocabulary.
The call
generic word has a method on the quotation class. This method
invokes the
(call)
primitive from the kernel.private vocabulary. The (call)
primitive
does not type check, since by the time it is called it is known that the
input is in fact a quotation. This primitive has a very simple machine
code definition:
mov eax,[esi] ! Pop datastack
sub esi,4
jmp [eax+13] ! Jump to quotation's entry point
Note that the quotation is stored in the EAX register; this is
important. Recall that initially, a quotation’s entry point is set to
the lazy-jit-compile
word, and that all quotations initially share
this entry point.
This word, which is not meant to be invoked directly, compiles
quotations the first time they are called. Since all quotations share
the same initial entry point, it needs to know which quotation invoked
it. This is done by passing the quotation to this word in the EAX
register. The lazy-jit-compile
word compiles this quotation, sets its
entry point to the new compiled code block, and then calls it again. On
subsequent calls of the same quotation, the new compiled definition will
be jumped to directly, and the lazy-jit-compile
entry point is not
involved.
If you’re interested in the definition of lazy-jit-compile
, search for
it in these files:
The curry and compose words
curry
The curry
word is the fundamental constructor for making closures. It takes a
value and a callable
, and returns something that prints out as if it
were a new quotation:
( scratchpad ) 5 [ + 2 * ] curry .
[ 5 + 2 * ]
This is not a quotation though, but rather an instance of the curry
class. Instances of this class are pairs of elements: an object, and
another callable
.
Instances of curry
are callable
, because the call
generic word has
a method on curry
. This method pushes the first element of the pair on
the data stack, then recursively calls call
on the second element.
Calls to curry
can be chained together:
( scratchpad ) { "a" "b" "c" } 1 2 [ 3array ] curry curry map .
{ { "a" 1 2 } { "b" 1 2 } { "c" 1 2 } }
Note that using curry
, many callables can be constructed which share
the same compiled definition; only the data value differes.
Technically, curry
is just an optimization – it would be possible to
simulate it by using sequence words to construct a new quotation with
the desired value prepended, but this would be extremely inefficient.
Prepending an element would take O(n) time, and furthermore, result in
the new quotation being compiled the first time the result was called.
Indeed, in some simpler concatenative languages such as
Joy, quotations
are just linked lists, and they execute in the interpreter, so partial
application can be done by using the cons
primitive for creating a new
linked list.
compose
The
compose
word takes two callable
s and returns a new instance of the compose
class. Instances of this class are pairs of callable
s.
( scratchpad ) [ 3 + ] [ sqrt ] compose .
[ 3 + sqrt ]
As with curry
, the call
generic word has a method on the compose
class. This method recursively applies call
to both elements.
It is possible to express the effect of compose
using curry
and
dip
:
: my-compose ( quot1 quot2 -- compose )
[ [ call ] dip call ] curry curry ; inline
The main reason compose
exists as a distinct type is to make the
result prettyprint better. Were it defined as above, the result would
not prettyprint in a nice way:
( scratchpad ) [ 3 + ] [ sqrt ] [ [ call ] dip call ] curry curry .
[ [ 3 + ] [ sqrt ] [ call ] dip call ]
The curry
and compose
constructors are sufficient to express all
higher-level forms of closures.
An aside: compose and dip
It is possible to express compose
using curry
and dip
. Conversely,
it is also possible to express dip
using compose
and curry
.
: my-dip ( a quot -- ) swap [ ] curry compose call ;
Here is an example of how this works. Suppose we have the following code:
123 321 [ number>string ] my-dip
Using the above definition of my-dip
, we get
123 321 [ number>string ] swap [ ] curry compose call ! substitute definition of 'my-dip'
123 [ number>string ] 321 [ ] curry compose call ! evaluate swap
123 [ number>string ] [ 321 ] compose call ! evaluate curry
123 [ number>string 321 ] call ! evaluate compose
123 number>string 321 ! evaluate call
"123" 321 ! evaluate number>string
Fry syntax
The fry syntax
provides nicer syntax sugar for more complicated usages of curry
and
compose
. Beginners learning Factor should start with fry syntax, and
probably don’t need to know about curry
and compose
at all; but this
syntax desugars trivially into curry
and compose
, as explained in
the documentation.
Lexical variables
Code written with the locals vocabulary can create closures by referencing lexical variables from nested quotations. For example, here is a word from the compiler which computes the breadth-first order on a control-flow graph:
:: breadth-first-order ( cfg -- bfo )
<dlist> :> work-list
cfg post-order length :> accum
cfg entry>> work-list push-front
work-list [
[ accum push ]
[ dom-children work-list push-all-front ] bi
] slurp-deque
accum ;
In the body of the word, three lexical variables are used; the input
parameter cfg
, and the two local bindings work-list
and accum
.
When the parser reads the above definition, it creates several
quotations whose bodies reference local variables, for example,
[ accum push ]
. Before defining the new word, however, the :: parsing
word rewrites this into concatenative code.
Suppose we were doing this rewrite by hand. The first step would be to
make closed-over variables explicit. We can do this by currying the
values of accum
and work-list
onto the two quotations passed to
bi
:
:: breadth-first-order ( cfg -- bfo )
<dlist> :> work-list
cfg post-order length :> accum
cfg entry>> work-list push-front
work-list [
accum [ push ] curry
work-list [ [ dom-children ] dip push-all-front ] curry bi
] slurp-deque
accum ;
And now, we can do the same transformation on the quotation passed to
slurp-deque
:
:: breadth-first-order ( cfg -- bfo )
<dlist> :> work-list
cfg post-order length :> accum
cfg entry>> work-list push-front
work-list accum work-list [
[ [ push ] curry ] dip
[ [ dom-children ] dip push-all-front ] curry bi
] curry curry slurp-deque
accum ;
Note how three usages of curry
have appeared, and all lexical variable
usage now only occurs at the same level where the variable is actually
defined. The final rewrite into concatenative code is trivial, and
involves some tricks with dup
and dip
.
A lexical closure closing over many variables will be rewritten into
code which builds a long chain of curry
instances, essentially a
linked list. This is less efficient than closure representations used in
Lisp implementations, where typically all closed-over values are stored
in a single array. However in practice this is not usually a problem,
because of optimizations outlined below.
Mutable local variables
Mutable local variables are denoted by suffixing their name with !
.
Here is an example of a loop that counts a mutable variable up to 100:
:: counted-loop-test ( -- )
0 :> i!
100 [ i 1 + i! ] times ;
Factor distinguishes between binding (done with :>
) and assignment
(done with foo!
where foo
is a local previously declared to be
mutable). At a fundamental level, all bindings and closures are
immutable; code using mutable locals is rewritten to close over a
mutable heap-allocated box instead, and reads and writes involve an
extra indirection. The previous example is roughly equivalent to the
following, where we use an immutable local variable holding a mutable
one-element array:
:: counted-loop-test ( -- )
{ 0 } clone :> i
100 [ i first 1 + i set-first ] times ;
For this reason, code that uses mutable locals does not optimize as well, and iterative loops that uses mutable local variables can run slower than tail-recursive functions which uses immutable local variables. This might be addressed in the future with improved compiler optimizations.
Optimizations
Quotation inlining
If after inlining, the compiler sees that call
is applied to a literal
quotation, it inlines the quotation’s body at the call site. This
optimization works very well when quotations are used as “downward
closures”, and this is why most quotations never have their dynamic
entry point invoked at all.
Escape analysis
Since curry
and compose
are ordinary tuple classes, any time some
code constructs instances of curry
and compose
, and immediately
unpacks them, the compiler’s escape analysis pass can eliminate the
allocations entirely.
The escape analysis pass has no special knowledge of curry
and
compose
; it applies an optimization intended for object-oriented
code.
Again, this helps optimize code with “downward closures”, with the
result that most usages of curry
and compose
will never allocate any
memory at run time.
For more details, see my blog post about Factor’s escape analysis algorithm.