Comparing Factor's performance against V8, LuaJIT, SBCL, and CPython
Saturday, May 29, 2010
Together with Daniel Ehrenberg and Joe Groff, I’m writing a paper about Factor for DLS2010. We would appreciate feedback about the draft version of the paper. As part of the paper we include a performance comparison between Factor, V8, LuaJIT, SBCL, and CPython. The performance comparison consists of some benchmarks from The Computer Language Benchmarks Game. I’m posting the results here first, in case there’s something really stupid here.
Language implementations
Factor and V8 were built from their respective repositories. SBCL is version 1.0.38. LuaJIT is version 2.0.0beta4. CPython is version 3.1.2. All language implementations were built as 64-bit binaries and run on a 2.4 GHz Intel Core 2 Duo.
Benchmark implementations
Factor implementations of the benchmarks can be found in our source repository:
Implementations for the other languages can be found in the benchmarks game CVS repository:
In order to make the reverse complement benchmark work with SBCL on Mac OS X, I had to apply this patch; I don’t understand why:
--- bench/revcomp/revcomp.sbcl 9 Feb 2007 17:17:26 -0000 1.4
+++ bench/revcomp/revcomp.sbcl 29 May 2010 08:32:19 -0000
@@ -26,8 +26,7 @@
(defun main ()
(declare (optimize (speed 3) (safety 0)))
- (with-open-file (in "/dev/stdin" :element-type +ub+)
- (with-open-file (out "/dev/stdout" :element-type +ub+ :direction :output :if-exists :append)
+ (let ((in sb-sys:*stdin*) (out sb-sys:*stdout*))
(let ((i-buf (make-array +buffer-size+ :element-type +ub+))
(o-buf (make-array +buffer-size+ :element-type +ub+))
(chunks nil))
@@ -72,4 +71,4 @@
(setf start 0)
(go read-chunk))))
end-of-input
- (flush-chunks)))))))
+ (flush-chunks))))))
Running the benchmarks
I used Factor’s deploy tool to generate minimal images for the Factor benchmarks, and then ran them from the command line:
./factor -e='USE: tools.deploy "benchmark.nbody-simd" deploy'
time benchmark.nbody-simd.app/Contents/MacOS/benchmark.nbody-simd
For the scripting language implementations (LuaJIT and V8) I ran the scripts from the command line:
time ./d8 ~/perf/shootout/bench/nbody/nbody.javascript -- 1000000
time ./src/luajit ~/perf/shootout/bench/nbody/nbody.lua-2.lua 1000000
For SBCL, I did what the shootout does, and compiled each file into a new core:
ln -s ~/perf/shootout/bench/nbody/nbody.sbcl .
cat > nbody.sbcl_compile <<EOF
(proclaim '(optimize (speed 3) (safety 0) (debug 0) (compilation-speed 0) (space 0)))
(handler-bind ((sb-ext:defconstant-uneql (lambda (c) (abort c))))
(load (compile-file "nbody.sbcl" )))
(save-lisp-and-die "nbody.core" :purify t)
EOF
sbcl --userinit /dev/null --load nbody.sbcl_compile
cat > nbody.sbcl_run <<EOF
(proclaim '(optimize (speed 3) (safety 0) (debug 0) (compilation-speed 0) (space 0)))
(main) (quit)
EOF
time sbcl --dynamic-space-size 500 --noinform --core nbody.core --userinit /dev/null --load nbody.sbcl_run 1000000
For CPython, I precompiled each script into bytecode first:
python3.1 -OO -c "from py_compile import compile; compile('nbody.python3-4.py')"
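As a side note on what that one-liner does: `py_compile.compile` byte-compiles the script to a `.pyc` file so the startup cost of parsing and compiling isn’t charged to the timed run. A minimal sketch of the behavior, using a throwaway stand-in script rather than the real benchmark source (note that Python 3.1 wrote the `.pyc` next to the source, while 3.2+ places it under `__pycache__`):

```python
import os
import py_compile
import tempfile

# Write a tiny stand-in script (the real benchmark source is nbody.python3-4.py).
src = os.path.join(tempfile.mkdtemp(), "nbody_demo.py")
with open(src, "w") as f:
    f.write("print('hello')\n")

# compile() byte-compiles the script and returns the path to the .pyc
# (on 3.2+; Python 3.1 placed the file next to the source instead).
pyc = py_compile.compile(src)
print(pyc, os.path.exists(pyc))
```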
Benchmark results
All running times are wall-clock times from the Unix time command. I ran each benchmark 5 times and used the best result.
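The best-of-5 procedure can be sketched as a small wrapper; this is an illustrative harness of my own, not the exact commands used for the post (it uses modern Python and a placeholder command in place of a real benchmark binary):

```python
import subprocess
import time

def best_of(cmd, runs=5):
    """Run cmd `runs` times and return the best wall-clock time in seconds."""
    times = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, check=True)   # one timed run of the benchmark
        times.append(time.perf_counter() - start)
    return min(times)                     # best result, as in the tables below

# Substitute a real benchmark invocation for the trivial `true` command here.
print("%.3fs" % best_of(["true"]))
```

Taking the minimum rather than the mean filters out noise from other processes and cold caches, which is the usual convention for this kind of micro-benchmarking.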
| Benchmark | Factor | LuaJIT | SBCL | V8 | CPython |
|---|---|---|---|---|---|
| fasta | 2.597s | 1.689s | 2.105s | 3.948s | 35.234s |
| reverse-complement | 2.377s | 1.764s | 2.955s | 3.884s | 1.669s |
| nbody | 0.393s | 0.604s | 0.402s | 4.569s | 37.086s |
| binary-trees | 1.764s | 6.295s | 1.349s | 2.119s | 19.886s |
| spectral-norm | 1.377s | 1.358s | 2.229s | 12.227s | 1m44.675s |
| regex-dna | 0.990s | N/A | 0.973s | 0.166s | 0.874s |
| knucleotide | 1.820s | 0.573s | 0.766s | 1.876s | 1.805s |
Benchmark analysis
Some notes on the results:
- There is no Lua implementation of the regex-dna benchmark.
- Some of the SBCL benchmark implementations can make use of multiple cores if SBCL is compiled with thread support. However, by default, thread support seems to be disabled on Mac OS X. None of the other language implementations being tested have native thread support, so this is a single-core performance test.
- Factor’s string manipulation still needs work. The fasta, knucleotide and reverse-complement benchmarks are not as fast as they should be.
- The binary-trees benchmark is a measure of how fast objects can be allocated, and how fast the garbage collector can reclaim dead objects. LuaJIT loses big here, perhaps because it lacks generational garbage collection, and because Lua’s tables are an inefficient object representation.
- The regex-dna benchmark is a measure of how efficient the regular expression implementation is in the language. V8 wins here, because it uses Google’s heavily-optimized Irregexp library.
- Factor beats the other implementations on the nbody benchmark because it is able to make use of SIMD.
- For some reason SBCL is slower than the others on spectral-norm. It should be generating the same code.
- The benchmarks exercise too few language features. Any benchmark that relies on native-sized integer arithmetic (for example, an implementation of the SHA-1 algorithm) would shine on SBCL and suffer on all the others. Similarly, any benchmark that requires packed binary data support would shine on Factor and suffer on all the others. However, the benchmarks in the shootout consist mostly of scalar floating-point code and text manipulation.
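To make the binary-trees point above concrete, the benchmark’s inner workload is essentially building and walking short-lived complete binary trees, so almost all of its time goes to allocation and garbage collection. A stripped-down sketch of that workload (not the shootout source, and using plain tuples as nodes):

```python
def make_tree(depth):
    # Allocate a complete binary tree of the given depth; every node is
    # a fresh 2-tuple, so this is almost pure allocation work.
    if depth == 0:
        return (None, None)
    return (make_tree(depth - 1), make_tree(depth - 1))

def check_tree(node):
    # Walk the tree and count nodes, so the allocations can't be
    # optimized away.
    left, right = node
    if left is None:
        return 1
    return 1 + check_tree(left) + check_tree(right)

# The benchmark repeatedly builds and discards trees like this, which is
# exactly the short-lived-object workload a generational collector
# handles well.
print(check_tree(make_tree(10)))  # prints 2047, i.e. 2**11 - 1 nodes
```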
Conclusions
Factor’s performance is coming along nicely. I’d like to submit Factor to the computer language shootout soon. Before doing that, we need a Debian package, and the deploy tool needs to be easier to use from the command line.