Persistent hashtables

Wednesday, August 6, 2008

Long time no blog! I’ve been very busy (with Factor), and there has been a lot of activity lately. I’ve been working flat-out on the new compiler, which I will blog about later.

Yesterday and today though, I worked on porting Clojure’s PersistentHashMap to Factor. We already had persistent vectors; the hashtables are more complex. My implementation is pretty much a direct port of the Java code, and in some parts it uses locals heavily. I plan on revisiting it at some point and using more Factor idioms. Even though the code is not idiomatic, it is still pretty easy to follow, and Factor’s direct support from having multiple return values eliminated the need for a temporary ‘box’ object.

My initial tests seem to indicate persistent hashtables are 2-4x slower than normal hashtables for common operations. This isn’t so bad, since its only a constant time slowdown, and you gain immutability, but it is still a bit disappointing. Perhaps they can be optimized further.

The reason I wanted to implement persistent hashtables in the first place is to use them in the new Factor compiler. Instead of cloning a large hashtable repeatedly in one place, which I suspected was leading to quadratic algorithmic complexity, I wanted to use persistent hashtables so that I could add entries in constant time while still retaining older versions for backtracking.

It turns out I found another way to get the performance gain I wanted, and so persistent hashtables are unused in the new compiler for now.

However they were fun to implement, and they’re now part of the basis library; an intermediate layer between core and extra, in the persistent.hashtables vocabulary. They sit alongside persistent.vectors and persistent.heaps, the latter of which was implemented by Daniel Ehrenberg.

The new basis vocabulary root will contain libraries which are not fundamentally part of the language core, and are not required for bootstrap, but those that anyone would expect a language implementation to provide. Examples include various data structures, I/O, XML parsing, database access, the GUI toolkit, the web framework, and so on. I also moved the optimizing compiler to basis, since its not required for bootstrap either.

In order for a vocabulary to be moved to basis, it must implement some functionality that’s frequently used required, and have high quality documentation, robust and optimized implementation, and as many unit tests as possible. Over the next few months, I hope that more vocabularies are moved to basis and that the code quality there only goes up.