Improved performance on the raytracer benchmark

Saturday, September 13, 2008

The new optimizer was missing two optimizations performed by the old one; arithmetic identity optimizations, and modular arithmetic optimizations. This brings the total number of optimization passes to 11 (def-use analysis runs twice).

The first optimization involves removing such redundant code sequences as 0 +, 1 * and 0 bitor. The old optimizer performed this unconditionally, which was incorrect since most of these identities fail to hold for floats. For example, -0.0 0 + is 0.0, so in this case 0 + is not a no-op, and 1.0 0.0 / 0.0 * is NaN, so 0.0 * does not always yield zero. However, they are valid for integers, so the new optimizer only applies these optimizations to integer-valued expressions.

The second optimization is described in the blog post I linked to.

I also found and fixed a code generator regression which has been there for ages, and a type inference regression which appeared with the new code generator. Fixing these two regressions and implementing the two missing optimizations above improved performance of benchmark.raytracer by almost 2x; the run time went from 12 seconds to 6.5 seconds on my 2.4 GHz Core 2 Duo MacBook Pro.

The Java version runs in 1.1 seconds with the server VM (I modified the code to run the benchmark 10 times in a loop to give the JIT a chance to do some runtime profiling, and I measured the run time internally so as to avoid JVM startup overhead). The Common Lisp version runs in 3.5 seconds under SBCL. Note that I’m running the benchmark with n=200 and level=3.

Factor is slowly inching towards the performance offered by other compilers. Hopefully one day I’ll be able to close the gap.