Factor Language Blog

Converting a table to a tree

Thursday, April 3, 2008

I found an interesting Haskell snippet in a reddit comment.

From the comment:

*Here’s a slightly longer function I created a while ago:
*

listToForest :: Eq a => [[a]] -> Forest a
listToForest = map toTree . groupBy ((==) `on` head) . filter (/= [])
               where toTree = Node . (head . head) <*> (listToForest . map tail)

The function turns a table of results, such as that which might come back from a database query, into a tree. e.g.

[[A, B, C],
 [A, B, D],
 [A, E, F]]

Becomes:

    A 
   / \
  B   E
 / \   \
C   D   F

I decided to try implementing this in Factor, and I came up with three versions.

All versions represent a tree as its root node, and each node is a hashtable mapping the key of the child to the child node.

The first version does not use any utility words other than what’s in the standard library; it consists of a single, rather long word:

: list>forest ( list -- forest )
    [ empty? not ] subset
    H{ } clone [ [ >r unclip r> [ ?push ] change-at ] curry each ] keep
    [ list>forest ] assoc-map ;

The second version uses a couple of utilities:

: push-at ( value key assoc -- )
    [ ?push ] change-at ;

: classify ( seq classifier -- assoc )
    H{ } clone [ [ slip push-at ] 2curry each ] keep ;

: list>forest ( list -- forest )
    [ empty? not ] subset [ unclip ] classify [ list>forest ] assoc-map ;

The third version has a different implementation of classify which uses extra/fry syntax sugar:

: push-at ( value key assoc -- )
    [ ?push ] change-at ;

: classify ( seq classifier -- assoc )
    H{ } clone [ '[ @ , push-at ] each ] keep 

: list>forest ( list -- forest )
    [ empty? not ] subset [ unclip ] classify [ list>forest ] assoc-map ;

If this was a one-off thing I’d use the first version. If I was writing production code, I would use one of the second two, and I might even put the utilities in a shared vocabulary since they’re generally useful. In the last version, the only stack manipulation operator is keep, and everything else just happens to be on the stack in the correct order. This is how good stack code is written.

Some explanation:

  • ?push is like push except it makes a new vector if the input is f
  • slip calls a quotation underneath the top of the stack.
  • push-at adds a value to the sequence stored at a key in a hashtable.
  • classify takes a sequence and a quotation which decomposes a sequence element into a key and a value. It builds an association mapping all the unique keys to corresponding values.
  • unclip takes a sequence and outputs the rest of the sequence followed by the first element.
  • Haskell’s groupBy is a generalization of my classify; it works with an equivalence relation rather than a “fibration” into a set with an implied equivalence relation (wrong term, but I’m not sure what the correct term is).
  • My classify runs in linear time and I suspect groupBy is quadratic, but I could be wrong.

I’d be interested in seeing a version of this function in idiomatic Common Lisp or Scheme. I suspect it would be somewhat longer, but not horribly so.