Factor Language Blog

Evaluating a quadratic polynomial

Friday, March 30, 2007

Christopher Diggins writes:

I am about to release a version of Cat which supports optional named parameters. The turning point in making this decision to add support for named parameters to a version of Cat occurred when I tried to implement a simple quadratic equation with names.

He shows the following program for evaluating a quadratic polynomial f(x) = ax^2 + bx + c, which takes the values x, a, b, c as input:

define quadratic_point_free
{
    [dupd [sqr] dip swap [*] dip] dipd [* +] dip +
}

This is not good stack code.

I’m not writing this to bash Christopher, but I think he has not had much experience actually writing code in a stack language. Implementing an interpreter for a language doesn’t force one to practice the idioms and problem-solving techniques employed when actually programming in that language; likewise, one doesn’t need a deep understanding of a language’s implementation to use it. So I think Chris is committing the classic newcomer’s mistake when learning a language: trying to relate its features to what he already knows, and being somewhat surprised to find that they don’t map one-to-one. In Christopher’s case, however, he has the freedom to change the language as he sees fit. Since I don’t know Cat very well, I can’t say if named parameters would be a good thing or not. However, I think a feature like this needs a more compelling use case than encoding polynomial evaluations.

So why is this code in bad taste? First, there’s no stack effect comment. It wasn’t hard for me to see that the stack effect is ( x a b c -- f(x) ), but it should be documented. However, the real problem is that this code operates at a level of abstraction which is too low.

Since he is a compiler developer, I imagine Christopher wrote the above code by manually “compiling” it in his head. This is the wrong way to approach a stack language. Instead, I have found two alternative approaches to be far more fruitful:

  • Write your code to take advantage of stack language idioms, and combine existing words to express your problem in a simple way. That is, implement a Domain Specific Language (DSL)!
  • Write code which writes tedious code and performs this compilation process for you. That is, implement a DSL with more hair!

I’ll show how to apply both approaches to solving this problem in Factor. In both cases we use Factor features not found in Cat, but they would be trivial to add. Cat, Joy and Factor are all very closely related, and share the same evaluation model and representation of code as quotations. Factor is compiled and its compiler transforms are unique among concatenative languages, but a hypothetical compiled Joy or Cat could implement the same functionality with very little effort. Almost everything I write below also applies to low-level Forth.

Without further ado, I’ll start with the first approach.

Just because the formula at hand has two additions and three multiplications doesn’t mean the symbols in the program must map to them one-to-one. The set of polynomials with real (or complex) coefficients forms a vector space. Polynomials are vectors. Polynomial evaluation is the dot product:
ax^2+bx+c = (a,b,c).(x^2,x,1)
Integers are sequences in Factor, and generating a sequence of the form { 1 x x^2 ... } is easy by mapping over an integer with the ^ word:

: powers ( x n -- seq ) [ ^ ] map-with ;
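
As a quick check (my own example, not part of the original post), evaluating powers with x = 2 and n = 4 should print the first four powers of two:

  2 4 powers .
{ 1 2 4 8 }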

As long as the coefficients of the quadratic polynomial are given as a single object on the stack, a sequence { c b a }, we are done:

: quadratic ( c,b,a x -- f(x) ) 3 powers v. ;

However, we may as well generalize to evaluating a polynomial of any degree:

: poly-eval ( x coeff -- f(x) ) [ length powers ] keep v. ;

For example, to compute f(x) = 1024x^3 + 115x^2 - 71x + 576 with the value of x already on the stack, you write:

{ 576 -71 115 1024 } poly-eval
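
As a quick sanity check (my own test), evaluating this polynomial at x = 2 should give 1024*8 + 115*4 - 71*2 + 576 = 9086:

  2 { 576 -71 115 1024 } poly-eval .
9086
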
Certainly a lot simpler than the equivalent using the + and * words directly. And of course it will work with bignums, rationals, floats, and complex numbers. The polynomial is also now a first-class entity in the language; even though it is just a sequence of numbers, we have still captured the essence of a polynomial, rather than threading it through the expression as a series of scalars. This means we can pass the polynomial around as a single value and manipulate it with sequence words: we can add two polynomials together, multiply a polynomial by a scalar, or juggle the coefficients around.
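
For instance (a sketch of my own, assuming the v+ and n*v words from Factor’s vector arithmetic library), adding two polynomials or scaling one by a constant is just sequence arithmetic on the coefficients:

  { 1 2 3 } { 4 5 6 } v+ .
{ 5 7 9 }

  2 { 1 2 3 } n*v .
{ 2 4 6 }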

So to sum things up, I would consider the following a better implementation of Christopher’s function:

: powers ( x n -- seq ) [ ^ ] map-with ;
: poly-eval ( x coeff -- f(x) ) [ length powers ] keep v. ;

This is useful as a general library routine, and it is very simple considering what it does. Also note that the way we approached the problem was different. We solved a general problem and the special case fell out for free. (In fact, the libs/math library already implements polynomial evaluation. This whole discussion is moot.)

Now, let us consider the second approach to problem solving: metaprogramming.

Back to Chris’s function; here is a literal translation into Factor:

: quadratic-eval ( x a b c -- f(x) )
    >r >r dupd >r sq r> swap >r * r> r> r> >r * + r> + ;

This looks like the output of an infix DSL compiler. It would be easy to implement an infix math DSL in Factor (easier than embedding Factor features into applicative languages!) and it would probably generate something almost identical to the above output, given input like
INFIX" a*x*x+b*x+c"
However I won’t show you how to build an infix DSL. Instead, I’ll solve a more limited problem; polynomial evaluation using Horner’s Rule. The result will be a poly-eval word called in the exact same manner:
{ 576 -71 115 1024 } poly-eval
But instead, a compile-time transform will generate efficient code which does not actually create or iterate over arrays, or call the ^ word at run time. Such an approach is overkill in most situations. However, if your program needs to evaluate a lot of polynomials, it can speed things up. It also shows that the Factor compiler is very easy to extend, because it is written in, and compiles, a stack language.

Since we will be defining a compile-time transform, we need to write a word which takes a sequence of coefficients and generates a quotation; the quotation receives the input x on the stack and evaluates the polynomial at that input. This word will be recursive. Consider the base case first, evaluating a constant polynomial:

: poly-eval-base ( c -- quot )
    [ nip ] curry ;

For example, the code to evaluate the constant polynomial { 3.5 } is 3.5 nip; keep in mind the top of the stack is the value of x (which is disregarded by constant functions).
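
To see it in action (my own test), push a dummy value of x, build the quotation, and call it:

  7 3.5 poly-eval-base call .
3.5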

Now if we have a way of computing some polynomial g(x), we write a word which computes x*g(x) + c:

: poly-eval-step ( quot c -- newquot )
    [ >r dupd call * r> + ] curry curry ;
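
Again, a quick test of my own: with x = 5 on the stack, a sub-quotation computing the constant 2, and c = 3, the generated code should compute 5*2 + 3:

  5 [ 2 nip ] 3 poly-eval-step call .
13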

The recursion puts the two together:

: poly-eval-quot ( seq -- quot )
    dup length 1 = [
        first poly-eval-base
    ] [
        unclip >r poly-eval-quot r> poly-eval-step
    ] if ;

The above three words primarily take sequences apart and create new ones. The recursion is really a left fold; a left fold combinator could be added to the Factor library, but I haven’t needed it yet.

We can test this code first:

  3 { 1 2 2 } poly-eval-quot call .
25

In fact, it is true that 25 = 1 + 2*3 + 2*3*3. As it stands, this new code-generating implementation is not any more efficient: instead of creating an intermediate array of exponents, we create an intermediate quotation of code. The trick is that if the sequence of coefficients is known at compile time, we can generate the quotation at compile time too:

: poly-eval ( x coeff -- f(x) ) poly-eval-quot call ;

\ poly-eval 1 [ poly-eval-quot ] define-transform

Consider the following word:

: foo { 11.4 23.7 78.22 99.6 -174.67 } poly-eval ;

We can define a specializer for it. This tells the compiler that while the word can accept any type of object as input, a certain type of input is expected to be much more common than the others; in our case, we optimize for the case where the input is a floating-point value:

\ foo { float } "specializer" set-word-prop

Now, take a look at the PowerPC disassembly for the case where the input is indeed a floating-point value:

0x059ea500:     lwz     r11,0(r14)
0x059ea504:     lfd     f0,3(r11)
0x059ea508:     lis     r11,1438
0x059ea50c:     ori     r11,r11,42632
0x059ea510:     lwz     r11,0(r11)
0x059ea514:     lfd     f1,3(r11)
0x059ea518:     fmul    f0,f0,f1
0x059ea51c:     lis     r11,1438
0x059ea520:     ori     r11,r11,42648
0x059ea524:     lwz     r11,0(r11)
0x059ea528:     lfd     f1,3(r11)
0x059ea52c:     fadd    f0,f0,f1
0x059ea530:     lwz     r11,0(r14)
0x059ea534:     lfd     f1,3(r11)
0x059ea538:     fmul    f1,f1,f0
0x059ea53c:     lis     r11,1438
0x059ea540:     ori     r11,r11,42644
0x059ea544:     lwz     r11,0(r11)
0x059ea548:     lfd     f0,3(r11)
0x059ea54c:     fadd    f1,f1,f0
0x059ea550:     lwz     r11,0(r14)
0x059ea554:     lfd     f0,3(r11)
0x059ea558:     fmul    f0,f0,f1
0x059ea55c:     lis     r11,1438
0x059ea560:     ori     r11,r11,42640
0x059ea564:     lwz     r11,0(r11)
0x059ea568:     lfd     f1,3(r11)
0x059ea56c:     fadd    f0,f0,f1
0x059ea570:     lwz     r11,0(r14)
0x059ea574:     lfd     f1,3(r11)
0x059ea578:     fmul    f1,f1,f0
0x059ea57c:     lis     r11,1438
0x059ea580:     ori     r11,r11,42636
0x059ea584:     lwz     r11,0(r11)
0x059ea588:     lfd     f0,3(r11)
0x059ea58c:     fadd    f1,f1,f0
0x059ea590:     lis     r12,1
0x059ea594:     ori     r12,r12,39104
0x059ea598:     lwz     r12,0(r12)
0x059ea59c:     lwz     r11,4(r12)
0x059ea5a0:     addi    r11,r11,16
0x059ea5a4:     stw     r11,4(r12)
0x059ea5a8:     addi    r11,r11,-16
0x059ea5ac:     li      r12,43
0x059ea5b0:     stw     r12,0(r11)
0x059ea5b4:     stfd    f1,8(r11)
0x059ea5b8:     ori     r12,r11,5
0x059ea5bc:     stw     r12,0(r14)
0x059ea5c0:     lis     r12,1
0x059ea5c4:     ori     r12,r12,4492
0x059ea5c8:     mtlr    r12
0x059ea5cc:     blrl

The generated assembly will look like gibberish to people who are not familiar with the internals of the Factor compiler (anyone except for me?), so you’ll have to take my word for this, but in fact we have the following:

  • The compiler doesn’t do a good job of eliminating redundant loads from memory, and the above code could be a lot better.
  • But…
  • There is no recursion, sequence iteration, or sequence allocation in the above assembly.
  • The only place where a floating point value is boxed on the heap is at the end, after the final value has been computed. The allocation is done with customized inline assembly.
  • There are no subroutine calls until the end, where the GC check is called.
  • The input value is unboxed (too many times) and the constant coefficient values are unnecessarily unboxed, because the compiled code refers to boxed float objects not unboxed float values. This could be improved. However unboxing a float is inlined, and only takes two instructions.
  • All this is despite the fact that the code was generated from the following high-level description of the problem:
    { 11.4 23.7 78.22 99.6 -174.67 } poly-eval

We can compare the performance of the two implementations using the following benchmark:

: my-poly { 11.4 23.7 78.22 99.6 -174.67 } poly-eval ;
\ my-poly { float } "specializer" set-word-prop
: benchmark 1000000 [ >float my-poly drop ] each ;

It evaluates a fixed polynomial on every integer between 0 and 1000000, discarding the results. All arithmetic is done in floating point. Here are the results:

Implementation            Run time (ms)
Naive                              2713
Compile-time transform              102

This is a 27x speed improvement!

Here is the grand finale. Suppose the polynomial coefficients are not fixed at compile time, but come from some input file which the user loads while the program runs. The user might not be aware there is such a thing as a compiler. Well, in Factor, compile time is any time. At the time the file is loaded, you can generate the evaluation quotations and compile them just by calling compile-quot. This technique is widespread among high-level languages. There are documented cases where Common Lisp code can beat optimized C by using information only available at run time to optimize code. In stack languages, higher-order functions and code generation come even more naturally than in Lisp.
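
Here is a rough sketch of what that might look like; read-coefficients is a hypothetical word standing in for whatever parses the user’s file, and the exact calling convention of compile-quot is an assumption on my part, based only on the description above:

: load-user-poly ( path -- quot )
    read-coefficients    ! hypothetical: parse the file into { c0 c1 c2 ... }
    poly-eval-quot ;     ! build the Horner evaluator quotation, as above

! Per the description above, the resulting quotation can then be handed
! to compile-quot before it is called in an inner loop.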

Nothing we did called for named parameters. Instead, one has to learn to “context switch” into the stack language mindset. It is a very different way of looking at the world.