Improved performance on the raytracer benchmark
Saturday, September 13, 2008
The new optimizer was missing two optimizations performed by the old one; arithmetic identity optimizations, and modular arithmetic optimizations. This brings the total number of optimization passes to 11 (def-use analysis runs twice).
The first optimization involves removing such redundant code sequences
as 0 +
, 1 *
and 0 bitor
. The old optimizer performed this
unconditionally, which was incorrect since most of these identities fail
to hold for floats. For example, -0.0 0 +
is 0.0
, so in this case
0 +
is not a no-op, and 1.0 0.0 / 0.0 *
is NaN
, so 0.0 *
does
not always yield zero. However, they are valid for integers, so the new
optimizer only applies these optimizations to integer-valued
expressions.
The second optimization is described in the blog post I linked to.
I also found and fixed a code generator regression which has been there
for ages, and a type inference regression which appeared with the new
code generator. Fixing these two regressions and implementing the two
missing optimizations above improved performance of
benchmark.raytracer
by almost 2x; the run time went from 12 seconds to
6.5 seconds on my 2.4 GHz Core 2 Duo MacBook Pro.
The Java
version
runs in 1.1 seconds with the server VM (I modified the code to run the
benchmark 10 times in a loop to give the JIT a chance to do some runtime
profiling, and I measured the run time internally so as to avoid JVM
startup overhead). The Common Lisp
version
runs in 3.5 seconds under SBCL. Note that I’m running the benchmark with
n=200
and level=3
.
Factor is slowly inching towards the performance offered by other compilers. Hopefully one day I’ll be able to close the gap.