Specialized float arrays

Wednesday, July 25, 2007

I added float arrays to Factor. A float array behaves like an array of floats, except the representation is more efficient; individual elements are not boxed.

Literal float arrays look like F{ 0.64 0.85 0.43 0.16 0.37 }.
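Since a float array behaves like any other sequence, the generic sequence words should work on it unchanged. Here is a minimal sketch, not taken from the implementation itself. It assumes the F{ syntax is already in the search path, as it is in the listener, and that length and first come from the sequences vocabulary:

```factor
USING: kernel prettyprint sequences ;

! A literal float array with five elements
F{ 0.64 0.85 0.43 0.16 0.37 }

! Generic sequence words apply; this prints the element count, 5
dup length .

! Reading a single element hands back an ordinary (boxed) float
first .
```

The storage is unboxed, but any element that escapes onto the stack is boxed again.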

Float arrays can provide a performance benefit if the compiler is able to infer enough information to unbox float values. For example, consider the following word:

: v+ [ + ] 2map ;

It takes two arrays and adds their elements pairwise. Let’s time this word with normal arrays:

( scratchpad ) { 0.64 0.85 0.43 0.16 0.37 0.64 0.85 0.43 0.16 0.37 } dup [ 1000000 [ 2dup v+ drop ] times ] time
3200 ms run / 25 ms GC time

Now float arrays:

( scratchpad ) F{ 0.64 0.85 0.43 0.16 0.37 0.64 0.85 0.43 0.16 0.37 } dup [ 1000000 [ 2dup v+ drop ] times ] time
3653 ms run / 70 ms GC time

It is actually slower! This is because each element access has to allocate a new float on the heap. But now, let's use the new hints vocabulary to tell the compiler that v+ should be optimized for float arrays:

HINTS: v+ float-array float-array ;

This has the effect of compiling a version of this word specialized to float arrays. Here, the compiler can work some magic and eliminate boxing altogether:

( scratchpad ) F{ 0.64 0.85 0.43 0.16 0.37 0.64 0.85 0.43 0.16 0.37 } dup [ 1000000 [ 2dup v+ drop ] times ] time
974 ms run / 10 ms GC time
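To check that the specialized word still computes the same sums, the pieces above can be put together and run on a small input. This is only a sketch: the vocabulary names float-arrays and hints in the USING: line are assumptions about where these words live, and the three-element inputs are made up for illustration.

```factor
! Assumed vocabulary names; adjust to wherever float-array and HINTS: are defined
USING: float-arrays hints kernel math prettyprint sequences ;

! Pairwise addition, exactly as defined earlier
: v+ [ + ] 2map ;

! Ask the compiler for a version specialized to two float arrays
HINTS: v+ float-array float-array ;

! Each output element is the sum of the corresponding inputs:
! 1.0 + 10.0, 2.0 + 20.0, 3.0 + 30.0
F{ 1.0 2.0 3.0 } F{ 10.0 20.0 30.0 } v+ .
```

The hint only adds a specialized code path. Calling v+ on ordinary arrays should still work and go through the generic version.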

I used float arrays to make the spectral norm benchmark faster: the run time went from 120 seconds to 30 seconds, a 4x speedup. The raytracer was not improved by float arrays, though; I need to investigate why.

Also, float array operations are only compiled efficiently on PowerPC right now. I need to write new assembly intrinsics for the other platforms.

Float arrays can be passed to C functions directly. Long-term, somebody should look into using SSE2 and AltiVec to optimize vector operations on float arrays. That would really rock.