Efficient partial function application (aka compiled curry)

Monday, August 6, 2007

Note: the code described here is not in darcs yet, but will make it there within the new few days after I fix a few residual bugs.

Ever since quotations became a first-class data type, Factor has had a curry word:

  3 [ + ] curry .
[ 3 + ]

This is essentially a partial function application. (I’m aware that currying and function application is not the same, but this word was named before I was aware of that fact, and anyway, curry reads better than papply or partial.)

Previously, this was an expensive word to use. Not only did it allocate a new quotation and copy the existing quotation’s elements, but words which called quotations produced by curry did not compile, and had to run in the interpreter; this is due to the compiler’s design, which “lifts” all quotations up to their call site.

Now, this has all changed. For interpreted code, curry now runs in constant time, not linear time proportional to the quotation’s length, because it allocates a small object holding the object together with the quotation. These objects are instances of a curry class, but they print like quotations and are equal to quotations having the same elements, so they’re essentially identical:

3 [ + ] curry [ 3 + ] = .
  t

When a curried quotation is called by call in the interpreter, the VM simply pushes the curried object first, then calls the quotation.

In compiled code, we can do even better now. The compiler knows about curry and applies a rewrite rule to the dataflow graph. Essentially, it replaces all occurrences of curry’s output value with a pair of values, being the object and quotation; it also “widens” all stack shuffling from the point where the curry is created to where it is called. If the curry is passed to an ordinary word, the compiler inserts a real call to curry which allocates memory; however, the key idea behind this rewrite rule is that curry instantiation is “lazy”, and is not done at all if the curry is simply passed downward to a combinator.

For example, consider this example:

: foo ( x y -- x' y' ) 3 [ + ] curry 2apply ;

This word adds 3 to the top two stack elements. The compiler first inlines the 2apply word:

: foo ( x y -- x' y' ) 3 [ + ] curry tuck >r >r call r> r> call ;

Now it applies the rewrite rule:

: foo ( x y -- x' y' ) 3 [ + ] abc--bcabc >r >r >r call r> r> r> call ;

The compiler represents stack shuffles in symbolic form, internally; I’m writing abc--bcabc to mean a shuffle with that effect, for which no word exists in the library.

Now, it lifts the quotation up to the call site:

: foo ( x y -- x' y' ) 3 [ + ] abc--bcabc >r >r >r drop + r> r> r> drop + ;

Finally, it removes the literal value [ + ], since it is now dead; it travels around the stack without ever being used for anything, and is dropped:

: foo ( x y -- x' y' ) 3 ab--bab >r >r + r> r> + ;

So in this case, the compiler will produce the exact same code for the following two snippets:

3 [ + ] curry 2apply
3 tuck >r >r + r> r> +

While [ + ] curry 2apply is not a useful thing to write, since the direct expansion is simpler, it does demonstrate how the compiler is able to remove the object allocation from this code.

So, curry is efficient now, both in the interpreter and compiler. But what does this mean?

The role played by curry here is somewhat like a Lisp lambda which closes over one free variable. We can use curry to package up a bunch of values, pass them to the combinator, and fiddle with them from inside a quotation. Until now, you either had to pay a performance penalty for using curry, or you would use stack juggling tricks. For example, consider the each and 2each combinators. Here is the current implementation:

: (each) ( seq quot i -- seq quot i )
    [ rot nth-unsafe swap call ] 3keep ; inline

: each ( seq quot -- )
    over length [ (each) ] repeat 2drop ; inline

: (2each) ( quot seq seq i -- quot seq seq i )
    [ 2nth-unsafe rot dup slip ] 3keep ; inline

: 2each ( seq1 seq2 quot -- )
    -rot 2dup min-length [ (2each) ] repeat 3drop ; inline

Here is an implementation using curry:

: (each) ( seq quot -- n quot' )
    >r [ length ] keep r>
    [ >r nth-unsafe r> call ] 2curry ; inline

: each ( seq quot -- )
    (each) each-integer ; inline

: (2each) ( seq1 seq2 quot -- n quot' )
    >r [ min-length ] 2keep r>
    [ >r 2nth-unsafe r> call ] 3curry ; inline

: 2each ( seq1 seq2 quot -- )
    (2each) each-integer ; inline

Note that each-integer is the new version of repeat, with a slightly altered stack effect.

In the old code, the quotation passed to repeat has to manually save and restore various values that it needs; now that curry is efficient, we just package those values up with the quotation before passing it along to each-integer.

It is somewhat ironic that while Factor is as “concatenative” language, until now concatenating quotations together was discouraged, at least where performance was important! But now, we can easily write an efficient compose word which takes a pair of quotations:

: compose [ >r call r> call ] 2curry ; inline

For example,

[ 2 2 ] [ + ] compose .
  [ [ 2 2 ] [ + ] >r call r> call ]

It is pretty clear that this quotation has the same effect when called as [ 2 2 + ]. The difference between compose, which only works on quotations and produces a funny result, and append, which works on all sequences, is that since compose is built from curry, it runs in O(1) time, and the compiler is able to eliminate its runtime cost altogether.

For example, consider the assoc-each combinator. It is implemented in terms of the assoc-find combinator, since assoc-find is the only primitive combinator each assoc instance must implement. On each iteration, we want to call the quotation, but always output f, forcing assoc-find to continue the iteration until the end. Formerly, assoc-each was implemented as follows. First, we had a utility word, assoc-find-with, which called assoc-find while retaining a parameter on the stack. It was in implemented using a assoc-with word which was also used for assoc-each-with, and so on:

: assoc-with 2swap [ >r -rot r> call ] 2keep ; inline

: assoc-find-with ( obj assoc quot -- key value ? )
    swap [ assoc-with rot ] assoc-find
    >r >r 2nip r> r> ; inline

: assoc-each ( assoc quot -- )
    swap [ rot call f ] assoc-find-with 3drop ; inline

This is ugly as hell, and essentially boilerplate. Here is the new way:

: assoc-each ( assoc quot -- )
    [ f ] compose assoc-find 3drop ; inline

That’s it; no need to define -with words, and no unnecessary stack shuffling. We exploit the “concatenative” nature of Factor here – composition of sequences is the same as composition of functions!

Which brings me to the next topic; all the “manually curried” combinators you know and love, such as each-with, map-with, and so forth, have been moved to a deprecated vocabulary. They’re going away soon. Updating code is easy.

swap [ swap XYZ ] each-with  ===  [ XYZ ] curry each
[ XYZ ] each-with  ===  [ XYZ ] curry* each

Same for map-with, subset-with, and so on. The curry* word is just partial application on the second stack element; it is defined in terms of curry.

All the sequence and assoc combinators have been greatly simplified. In fact, I was able to move the 2swap word out of the core and into extra/shuffle; stack shuffling has been simplified by curry to such an extent that in particular, 2swap is simply not needed anymore. Other more complex shuffle words such as roll and pick are used less frequently now, as well.

By lowering the cost of an abstraction, I was able to simplify code by a non-trivial amount. There’s a general theme here: the goal of a compiler is to make idiomatic code run fast. It is not good enough to only accelerate hand-unrolled, tuned code chock-full of irrelevant detail and declarations. Nobody wants to write unnecessarily complex code just to please the compiler.