Factor Language Blog

Comparing Factor's performance against V8, LuaJIT, SBCL, and CPython

Saturday, May 29, 2010

Together with Daniel Ehrenberg and Joe Groff, I’m writing a paper about Factor for DLS 2010. We would appreciate feedback on the draft version of the paper. As part of the paper we include a performance comparison between Factor, V8, LuaJIT, SBCL, and CPython. The comparison consists of some benchmarks from The Computer Language Benchmarks Game. I’m posting the results here first, in case I’ve done something really stupid.

Language implementations

Factor and V8 were built from their respective repositories. SBCL is version 1.0.38. LuaJIT is version 2.0.0beta4. CPython is version 3.1.2. All language implementations were built as 64-bit binaries and run on a 2.4 GHz Intel Core 2 Duo.

Benchmark implementations

Factor implementations of the benchmarks can be found in our source repository:

Implementations for the other languages can be found in the Computer Language Benchmarks Game CVS repository:

Benchmark           LuaJIT                 SBCL                      V8                                   CPython
binary-trees        binarytrees.lua-2.lua  binarytrees.sbcl          binarytrees.javascript               binarytrees.python3-6.python3
fasta               fasta.lua              fasta.sbcl                fasta.javascript-2.javascript        fasta.python3-2.python3
knucleotide         knucleotide.lua-2.lua  knucleotide.sbcl-3.sbcl   knucleotide.javascript-3.javascript  knucleotide.python3-4.python3
nbody               nbody.lua-2.lua        nbody.sbcl                nbody.javascript                     nbody.python3-4.python3
regex-dna           N/A                    regexdna.sbcl-3.sbcl      regexdna.javascript                  regexdna.python3
reverse-complement  revcomp.lua            revcomp.sbcl              revcomp.javascript-2.javascript      revcomp.python3-4.python3
spectral-norm       spectralnorm.lua       spectralnorm.sbcl-3.sbcl  spectralnorm.javascript              spectralnorm.python3-5.python3

In order to make the reverse complement benchmark work with SBCL on Mac OS X, I had to apply this patch; I don’t understand why:

--- bench/revcomp/revcomp.sbcl 9 Feb 2007 17:17:26 -0000 1.4
+++ bench/revcomp/revcomp.sbcl 29 May 2010 08:32:19 -0000
@@ -26,8 +26,7 @@
 
 (defun main ()
   (declare (optimize (speed 3) (safety 0)))
-  (with-open-file (in "/dev/stdin" :element-type +ub+)
-    (with-open-file (out "/dev/stdout" :element-type +ub+ :direction :output :if-exists :append)
+  (let ((in sb-sys:*stdin*) (out sb-sys:*stdout*))
       (let ((i-buf (make-array +buffer-size+ :element-type +ub+))
             (o-buf (make-array +buffer-size+ :element-type +ub+))
             (chunks nil))
@@ -72,4 +71,4 @@
                         (setf start 0)
                         (go read-chunk))))
            end-of-input
-             (flush-chunks)))))))
+             (flush-chunks))))))

Running the benchmarks

I used Factor’s deploy tool to generate minimal images for the Factor benchmarks, and then ran them from the command line:

./factor -e='USE: tools.deploy "benchmark.nbody-simd" deploy'
time benchmark.nbody-simd.app/Contents/MacOS/benchmark.nbody-simd

For the scripting language implementations (LuaJIT and V8) I ran the scripts from the command line:

time ./d8 ~/perf/shootout/bench/nbody/nbody.javascript -- 1000000
time ./src/luajit ~/perf/shootout/bench/nbody/nbody.lua-2.lua 1000000

For SBCL, I did what the shootout does, and compiled each file into a new core:

ln -s ~/perf/shootout/bench/nbody/nbody.sbcl .

cat > nbody.sbcl_compile <<EOF
(proclaim '(optimize (speed 3) (safety 0) (debug 0) (compilation-speed 0) (space 0)))
(handler-bind ((sb-ext:defconstant-uneql (lambda (c) (abort c))))
  (load (compile-file "nbody.sbcl" )))
(save-lisp-and-die "nbody.core" :purify t)
EOF

sbcl --userinit /dev/null --load nbody.sbcl_compile

cat > nbody.sbcl_run <<EOF
(proclaim '(optimize (speed 3) (safety 0) (debug 0) (compilation-speed 0) (space 0)))
(main) (quit)
EOF

time sbcl --dynamic-space-size 500 --noinform --core nbody.core --userinit /dev/null --load nbody.sbcl_run 1000000

For CPython, I precompiled each script into bytecode first:

python3.1 -OO -c "from py_compile import compile; compile('nbody.python3-4.py')"
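
The same precompilation step can be scripted for several files at once. The following is only a sketch, not what was used for the measurements: the helper name precompile is made up for illustration, and the optimize argument of py_compile.compile is an assumption about the Python version, since it only exists in Python 3.2 and later (on 3.1 the -OO command-line flag shown above is the way to request it).

```python
import py_compile
import sys


def precompile(paths, optimize=2):
    """Byte-compile each script and return the paths of the compiled files.

    optimize=2 corresponds to running the interpreter with -OO.
    precompile is a hypothetical helper name, not part of any benchmark setup.
    """
    # doraise=True turns compile errors into exceptions instead of printing them.
    return [py_compile.compile(p, doraise=True, optimize=optimize) for p in paths]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Script paths are taken from the command line.
    for compiled in precompile(sys.argv[1:]):
        print(compiled)
```

Saved under a name of your choice, the script takes the source files as arguments and prints where each bytecode file was written.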

Benchmark results

All running times are wall clock time from the Unix time command. I ran each benchmark 5 times and used the best result.
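
The best-of-five procedure can be automated with a small harness. This is a rough sketch rather than the method actually used (the numbers in this post come from the Unix time command): the function name best_wall_time is made up, and sending the benchmark's output to the null device is an assumption of mine. Benchmarks that read their input from standard input would also need a stdin argument, which is omitted here.

```python
import subprocess
import sys
import time


def best_wall_time(cmd, runs=5):
    """Run cmd `runs` times and return the smallest wall-clock time in seconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        # Assumption: discard the benchmark's output so terminal I/O
        # does not influence the measurement.
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
        timings.append(time.perf_counter() - start)
    return min(timings)


if __name__ == "__main__" and len(sys.argv) > 1:
    # The command to time is everything after the script name.
    print(f"{best_wall_time(sys.argv[1:]):.3f}s")
```

For example, passing ./src/luajit followed by the script path and 1000000 as arguments would time the LuaJIT nbody run shown earlier. Taking the minimum rather than the mean is a deliberate choice: it filters out runs disturbed by other activity on the machine.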

Benchmark           Factor    LuaJIT    SBCL      V8        CPython
fasta               2.597s    1.689s    2.105s    3.948s    35.234s
reverse-complement  2.377s    1.764s    2.955s    3.884s    1.669s
nbody               0.393s    0.604s    0.402s    4.569s    37.086s
binary-trees        1.764s    6.295s    1.349s    2.119s    19.886s
spectral-norm       1.377s    1.358s    2.229s    12.227s   1m44.675s
regex-dna           0.990s    N/A       0.973s    0.166s    0.874s
knucleotide         1.820s    0.573s    0.766s    1.876s    1.805s
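
To compare implementations across benchmarks it can help to normalize each row to its fastest entry. The sketch below does that with the times from the table above; the names RESULTS, relative_to_fastest and fastest are made up for illustration, 1m44.675s is written as 104.675 seconds, and the missing LuaJIT regex-dna entry is represented as None.

```python
# Wall-clock times in seconds, copied from the results table.
# None marks the missing LuaJIT implementation of regex-dna.
RESULTS = {
    "fasta": {"Factor": 2.597, "LuaJIT": 1.689, "SBCL": 2.105, "V8": 3.948, "CPython": 35.234},
    "reverse-complement": {"Factor": 2.377, "LuaJIT": 1.764, "SBCL": 2.955, "V8": 3.884, "CPython": 1.669},
    "nbody": {"Factor": 0.393, "LuaJIT": 0.604, "SBCL": 0.402, "V8": 4.569, "CPython": 37.086},
    "binary-trees": {"Factor": 1.764, "LuaJIT": 6.295, "SBCL": 1.349, "V8": 2.119, "CPython": 19.886},
    "spectral-norm": {"Factor": 1.377, "LuaJIT": 1.358, "SBCL": 2.229, "V8": 12.227, "CPython": 104.675},
    "regex-dna": {"Factor": 0.990, "LuaJIT": None, "SBCL": 0.973, "V8": 0.166, "CPython": 0.874},
    "knucleotide": {"Factor": 1.820, "LuaJIT": 0.573, "SBCL": 0.766, "V8": 1.876, "CPython": 1.805},
}


def relative_to_fastest(times):
    """Divide each time by the fastest time for the same benchmark.

    Implementations without a result (None) are left out.
    """
    present = {impl: t for impl, t in times.items() if t is not None}
    best = min(present.values())
    return {impl: t / best for impl, t in present.items()}


def fastest(times):
    """Return the name of the implementation with the smallest time."""
    present = {impl: t for impl, t in times.items() if t is not None}
    return min(present, key=present.get)


if __name__ == "__main__":
    for benchmark, times in RESULTS.items():
        ratios = relative_to_fastest(times)
        cells = ", ".join(f"{impl} {ratio:.2f}x" for impl, ratio in ratios.items())
        print(f"{benchmark}: fastest is {fastest(times)}; {cells}")
```

A ratio of 1.00x marks the fastest implementation for that benchmark; every other value says how many times slower an implementation was.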

Benchmark analysis

Some notes on the results:

  • There is no Lua implementation of the regex-dna benchmark.
  • Some of the SBCL benchmark implementations can make use of multiple cores if SBCL is compiled with thread support. However, by default, thread support seems to be disabled on Mac OS X. None of the other language implementations being tested have native thread support, so this is a single-core performance test.
  • Factor’s string manipulation still needs work. The fasta, knucleotide and reverse-complement benchmarks are not as fast as they should be.
  • The binary-trees benchmark is a measure of how fast objects can be allocated, and how fast the garbage collector can reclaim dead objects. LuaJIT loses big here, perhaps because it lacks generational garbage collection, and because Lua’s tables are an inefficient object representation.
  • The regex-dna benchmark is a measure of how efficient the regular expression implementation is in the language. V8 wins here, because it uses Google’s heavily-optimized Irregexp library.
  • Factor beats the other implementations on the nbody benchmark because it is able to make use of SIMD.
  • For some reason SBCL is slower than the others on spectral-norm. It should be generating the same code.
  • The benchmarks exercise too few language features. Any benchmark that uses native-sized integers (for example, an implementation of the SHA1 algorithm) would shine on SBCL and suffer on all the others. Similarly, any benchmark that requires packed binary data support would shine on Factor and suffer on all the others. However, the benchmarks in the shootout consist mostly of scalar floating-point code and text manipulation.

Conclusions

Factor’s performance is coming along nicely. I’d like to submit Factor to the computer language shootout soon. Before doing that, we need a Debian package, and the deploy tool needs to be easier to use from the command line.