Specialized float arrays
Wednesday, July 25, 2007
I added float arrays to Factor. A float array behaves like an array of floats, except the representation is more efficient; individual elements are not boxed.
Literal float arrays look like F{ 0.64 0.85 0.43 0.16 0.37 }.
Float arrays can provide a performance benefit if the compiler is able to infer enough information to unbox float values. For example, consider the following word:
: v+ [ + ] 2map ;
It takes two arrays and adds elements pairwise. Let’s try timing the performance of this word with normal arrays:
( scratchpad ) { 0.64 0.85 0.43 0.16 0.37 0.64 0.85 0.43 0.16 0.37 } dup [ 1000000 [ 2dup v+ drop ] times ] time
3200 ms run / 25 ms GC time
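To make the behavior of v+ concrete, here is a minimal sketch on a small input. The stack effect declaration and the sample values are my additions for illustration and are not part of the definition above.

```factor
! Same definition as above, with an explicit stack effect added.
: v+ ( seq1 seq2 -- seq ) [ + ] 2map ;

! Adds corresponding elements and prints the resulting array.
{ 1.0 2.0 3.0 } { 0.5 0.5 0.5 } v+ .
! prints { 1.5 2.5 3.5 }
```

Because 2map builds a new sequence on every call, the timing loop drops each result right away; only the cost of the additions and allocations is being measured.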
Now float arrays:
( scratchpad ) F{ 0.64 0.85 0.43 0.16 0.37 0.64 0.85 0.43 0.16 0.37 } dup [ 1000000 [ 2dup v+ drop ] times ] time
3653 ms run / 70 ms GC time
It is actually slower! This is because each element access has to allocate a new float on the heap. But now, let's use the new hints vocabulary to give the compiler a hint that v+ should be optimized for float arrays:
HINTS: v+ float-array float-array ;
This has the effect of compiling a version of this word specialized to float arrays. Here, the compiler can work some magic and eliminate boxing altogether:
( scratchpad ) F{ 0.64 0.85 0.43 0.16 0.37 0.64 0.85 0.43 0.16 0.37 } dup [ 1000000 [ 2dup v+ drop ] times ] time
974 ms run / 10 ms GC time
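For reference, the word and its hint could sit together in a source file roughly as sketched below. The float-arrays vocabulary name in the USING: line and the stack effect declaration are assumptions on my part; only the hints vocabulary, the v+ definition and the HINTS: line come from the text above.

```factor
! Assumed vocabulary names; adjust to wherever float-array lives in your image.
USING: float-arrays hints kernel math sequences ;

! Pairwise addition, unchanged from the earlier definition.
: v+ ( seq1 seq2 -- seq ) [ + ] 2map ;

! Ask the compiler for a version of v+ specialized to two float arrays.
HINTS: v+ float-array float-array ;
```

The hint does not change what v+ accepts. Ordinary arrays still work; they just do not take the specialized path.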
I used float arrays to make the spectral norm benchmark faster: the run time went from 120 seconds to 30 seconds. The raytracer was not improved by float arrays, though; I need to investigate why.
Also, float array operations are only compiled efficiently on PowerPC right now. I need to code some new assembly intrinsics for the other platforms.
Float arrays can be passed to C functions directly. Long-term, somebody should look into using SSE2 and AltiVec to optimize vector operations on float arrays. That would really rock.