Removing duplicates, boolean arrays, ARM
Sunday, January 21, 2007
Comp Sci lesson learned: For some reason I was always under the
impression that one cannot remove duplicates from an array while
preserving order in O(n) time. This is why Factor had two words, prune
and hash-prune
.
The former preserved order and ran in quadratic time; it would loop over the input sequence, and see if each element was already present in an accumulator, if not, it would add it to the accumulator.
The latter did not preserve order and ran in linear time; it would add all elements as keys in a new hashtable, then output the sequence of keys.
Well it turns out there’s a way to remove duplicates while keeping order
in O(n) time. The idea is that as you inspect each element of the input
sequence in turn, you check if it is present in a hashtable; if it is,
you keep going, otherwise you push the element on an accumulator and add
it to the hashtable. I guess this is obvious to most people but for some
reason I was not aware of this algorithm and did not even think this is
possible. The implementation is simple enough, and I changed the prune
word to use this algorithm, and removed hash-prune
:
: (prune) ( hash vec elt -- )
rot 2dup hash-member?
[ 3drop ] [ dupd dupd set-hash swap push ] if ; inline
: prune ( seq -- newseq )
dup length <hashtable> over length <vector>
rot [ >r 2dup r> (prune) ] each nip ;
A new data structure: The Unix native I/O code has had some words
for managing a space-efficient bit array for quite some time. These
words were not really useful outside of Unix I/O, and were not safe
either. I cleaned them up and made them more useful; now Factor has a
boolean-array
class which supports the sequence protocol and behaves
just like an array, except it can only store t or f, and it uses much
less space. Dan asked for this to make the XML parser more efficient,
and it was about time Factor got a boolean array type, anyway.
ARM: Doug Coleman got me a Netstix ARM computer as an early birthday present. Thanks Doug! I was going to buy one anyway when I felt like porting Factor to ARM, but I guess he really wanted an ARM port now :-) So I’ve decided to make this port a priority for 0.88. I’m still finishing some compiler changes and setting up the software on the Netstix, but once that is done I will start working on the port. Running on ARM will be nice, because it will open up the world of cellphones and handhelds to Factor. Especially because Factor is both high level and fast.