Efficient partial function application (aka compiled curry)
Monday, August 6, 2007
Note: the code described here is not in darcs yet, but will make it there within the new few days after I fix a few residual bugs.
Ever since quotations became a first-class data type, Factor has had a
curry
word:
3 [ + ] curry .
[ 3 + ]
This is essentially a partial function application. (I’m aware that
currying and function application is not the same, but this word was
named before I was aware of that fact, and anyway, curry
reads better
than papply
or partial
.)
Previously, this was an expensive word to use. Not only did it allocate
a new quotation and copy the existing quotation’s elements, but words
which called quotations produced by curry
did not compile, and had to
run in the interpreter; this is due to the compiler’s design, which
“lifts” all quotations up to their call site.
Now, this has all changed. For interpreted code, curry
now runs in
constant time, not linear time proportional to the quotation’s length,
because it allocates a small object holding the object together with the
quotation. These objects are instances of a curry
class, but they
print like quotations and are equal to quotations having the same
elements, so they’re essentially identical:
3 [ + ] curry [ 3 + ] = .
t
When a curried quotation is called by call
in the interpreter, the VM
simply pushes the curried object first, then calls the quotation.
In compiled code, we can do even better now. The compiler knows about
curry
and applies a rewrite rule to the dataflow graph. Essentially,
it replaces all occurrences of curry
’s output value with a pair of
values, being the object and quotation; it also “widens” all stack
shuffling from the point where the curry is created to where it is
called. If the curry is passed to an ordinary word, the compiler inserts
a real call to curry
which allocates memory; however, the key idea
behind this rewrite rule is that curry instantiation is “lazy”, and is
not done at all if the curry is simply passed downward to a
combinator.
For example, consider this example:
: foo ( x y -- x' y' ) 3 [ + ] curry 2apply ;
This word adds 3 to the top two stack elements. The compiler first
inlines the 2apply
word:
: foo ( x y -- x' y' ) 3 [ + ] curry tuck >r >r call r> r> call ;
Now it applies the rewrite rule:
: foo ( x y -- x' y' ) 3 [ + ] abc--bcabc >r >r >r call r> r> r> call ;
The compiler represents stack shuffles in symbolic form, internally; I’m
writing abc--bcabc
to mean a shuffle with that effect, for which no
word exists in the library.
Now, it lifts the quotation up to the call site:
: foo ( x y -- x' y' ) 3 [ + ] abc--bcabc >r >r >r drop + r> r> r> drop + ;
Finally, it removes the literal value [ + ]
, since it is now dead; it
travels around the stack without ever being used for anything, and is
dropped:
: foo ( x y -- x' y' ) 3 ab--bab >r >r + r> r> + ;
So in this case, the compiler will produce the exact same code for the following two snippets:
3 [ + ] curry 2apply
3 tuck >r >r + r> r> +
While [ + ] curry 2apply
is not a useful thing to write, since the
direct expansion is simpler, it does demonstrate how the compiler is
able to remove the object allocation from this code.
So, curry
is efficient now, both in the interpreter and compiler. But
what does this mean?
The role played by curry
here is somewhat like a Lisp lambda which
closes over one free variable. We can use curry
to package up a bunch
of values, pass them to the combinator, and fiddle with them from inside
a quotation. Until now, you either had to pay a performance penalty for
using curry
, or you would use stack juggling tricks. For example,
consider the each
and 2each
combinators. Here is the current
implementation:
: (each) ( seq quot i -- seq quot i )
[ rot nth-unsafe swap call ] 3keep ; inline
: each ( seq quot -- )
over length [ (each) ] repeat 2drop ; inline
: (2each) ( quot seq seq i -- quot seq seq i )
[ 2nth-unsafe rot dup slip ] 3keep ; inline
: 2each ( seq1 seq2 quot -- )
-rot 2dup min-length [ (2each) ] repeat 3drop ; inline
Here is an implementation using curry
:
: (each) ( seq quot -- n quot' )
>r [ length ] keep r>
[ >r nth-unsafe r> call ] 2curry ; inline
: each ( seq quot -- )
(each) each-integer ; inline
: (2each) ( seq1 seq2 quot -- n quot' )
>r [ min-length ] 2keep r>
[ >r 2nth-unsafe r> call ] 3curry ; inline
: 2each ( seq1 seq2 quot -- )
(2each) each-integer ; inline
Note that each-integer
is the new version of repeat
, with a slightly
altered stack effect.
In the old code, the quotation passed to repeat
has to manually save
and restore various values that it needs; now that curry
is efficient,
we just package those values up with the quotation before passing it
along to each-integer
.
It is somewhat ironic that while Factor is as “concatenative” language,
until now concatenating quotations together was discouraged, at least
where performance was important! But now, we can easily write an
efficient compose
word which takes a pair of quotations:
: compose [ >r call r> call ] 2curry ; inline
For example,
[ 2 2 ] [ + ] compose .
[ [ 2 2 ] [ + ] >r call r> call ]
It is pretty clear that this quotation has the same effect when called
as [ 2 2 + ]
. The difference between compose
, which only works on
quotations and produces a funny result, and append
, which works on all
sequences, is that since compose
is built from curry
, it runs in
O(1) time, and the compiler is able to eliminate its runtime cost
altogether.
For example, consider the assoc-each
combinator. It is implemented in
terms of the assoc-find
combinator, since assoc-find
is the only
primitive combinator each assoc instance must implement. On each
iteration, we want to call the quotation, but always output f
, forcing
assoc-find
to continue the iteration until the end. Formerly,
assoc-each
was implemented as follows. First, we had a utility word,
assoc-find-with
, which called assoc-find
while retaining a parameter
on the stack. It was in implemented using a assoc-with
word which was
also used for assoc-each-with
, and so on:
: assoc-with 2swap [ >r -rot r> call ] 2keep ; inline
: assoc-find-with ( obj assoc quot -- key value ? )
swap [ assoc-with rot ] assoc-find
>r >r 2nip r> r> ; inline
: assoc-each ( assoc quot -- )
swap [ rot call f ] assoc-find-with 3drop ; inline
This is ugly as hell, and essentially boilerplate. Here is the new way:
: assoc-each ( assoc quot -- )
[ f ] compose assoc-find 3drop ; inline
That’s it; no need to define -with
words, and no unnecessary stack
shuffling. We exploit the “concatenative” nature of Factor here –
composition of sequences is the same as composition of functions!
Which brings me to the next topic; all the “manually curried”
combinators you know and love, such as each-with
, map-with
, and so
forth, have been moved to a deprecated
vocabulary. They’re going away
soon. Updating code is easy.
swap [ swap XYZ ] each-with === [ XYZ ] curry each
[ XYZ ] each-with === [ XYZ ] curry* each
Same for map-with
, subset-with
, and so on. The curry*
word is just
partial application on the second stack element; it is defined in terms
of curry
.
All the sequence and assoc combinators have been greatly simplified. In
fact, I was able to move the 2swap
word out of the core and into
extra/shuffle
; stack shuffling has been simplified by curry
to such
an extent that in particular, 2swap
is simply not needed anymore.
Other more complex shuffle words such as roll
and pick
are used less
frequently now, as well.
By lowering the cost of an abstraction, I was able to simplify code by a non-trivial amount. There’s a general theme here: the goal of a compiler is to make idiomatic code run fast. It is not good enough to only accelerate hand-unrolled, tuned code chock-full of irrelevant detail and declarations. Nobody wants to write unnecessarily complex code just to please the compiler.