Persistent hashtables
Wednesday, August 6, 2008
Long time no blog! I’ve been very busy (with Factor), and there has been a lot of activity lately. I’ve been working flat-out on the new compiler, which I will blog about later.
Yesterday and today though, I worked on porting Clojure’s PersistentHashMap to Factor. We already had persistent vectors; the hashtables are more complex. My implementation is pretty much a direct port of the Java code, and in some parts it uses locals heavily. I plan on revisiting it at some point and using more Factor idioms. Even though the code is not idiomatic, it is still pretty easy to follow, and Factor’s direct support from having multiple return values eliminated the need for a temporary ‘box’ object.
My initial tests seem to indicate persistent hashtables are 2-4x slower than normal hashtables for common operations. This isn’t so bad, since its only a constant time slowdown, and you gain immutability, but it is still a bit disappointing. Perhaps they can be optimized further.
The reason I wanted to implement persistent hashtables in the first place is to use them in the new Factor compiler. Instead of cloning a large hashtable repeatedly in one place, which I suspected was leading to quadratic algorithmic complexity, I wanted to use persistent hashtables so that I could add entries in constant time while still retaining older versions for backtracking.
It turns out I found another way to get the performance gain I wanted, and so persistent hashtables are unused in the new compiler for now.
However they were fun to implement, and they’re now part of the basis
library; an intermediate layer between core
and extra
, in the
persistent.hashtables
vocabulary. They sit alongside
persistent.vectors
and persistent.heaps
, the latter of which was
implemented by Daniel
Ehrenberg.
The new basis
vocabulary root will contain libraries which are not
fundamentally part of the language core, and are not required for
bootstrap, but those that anyone would expect a language implementation
to provide. Examples include various data structures, I/O, XML parsing,
database access, the GUI toolkit, the web framework, and so on. I also
moved the optimizing compiler to basis, since its not required for
bootstrap either.
In order for a vocabulary to be moved to basis
, it must implement some
functionality that’s frequently used required, and have high quality
documentation, robust and optimized implementation, and as many unit
tests as possible. Over the next few months, I hope that more
vocabularies are moved to basis
and that the code quality there only
goes up.