Prevalence of shuffle words and dataflow combinators
Monday, December 15, 2008
Today on IRC, binarybandit wrote this word:
: pie-chart ( keys values -- url )
[ "|" join ] [ [ number>string ] map "," join ] bi*
"http://chart.apis.google.com/chart?cht=p&chs=500x250&chl=%s&chd=t:%s" sprintf ;
(Side note: it uses the recently-contributed printf vocabulary, which
is a nice piece of work: it parses the format string using PEGs and
compiles it to quotations at compile time.)
This inspired me to make some pie charts showing the frequency of usages of shuffle words and dataflow combinators in the Factor library. So first, we need a word which takes a sequence of words and counts the number of usages of each, normalizing the results so that their sum is 1:
: usage-histogram ( words -- keys values )
[ [ name>> ] [ usage length ] bi ] { } map>assoc
dup values sum '[ _ /f ] assoc-map
sort-values unzip ;
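To see what the normalizing step does on its own, here is a minimal sketch that runs it on a hand-written association list instead of real usage counts. The word name normalize-counts and the sample counts are made up for illustration, and it uses curry in place of the fried quotation '[ _ /f ] only to avoid depending on the fry vocabulary; the two should behave the same here. The USING: line assumes the usual vocabulary names (sort-values in sorting, unzip in assocs).

```factor
USING: assocs kernel math sequences sorting ;
IN: scratchpad

! Hypothetical helper: the normalizing half of usage-histogram,
! with curry standing in for the fried quotation '[ _ /f ].
: normalize-counts ( assoc -- keys values )
    dup values sum [ /f ] curry assoc-map
    sort-values unzip ;

! Made-up counts, not measurements from the Factor library.
{ { "dup" 6 } { "swap" 3 } { "rot" 1 } } normalize-counts
```

The counts sum to 10, so each value is divided by 10 and the pairs are sorted by ascending value: the stack should end up holding { "rot" "swap" "dup" } and { 0.1 0.3 0.6 }, which is the shape pie-chart expects.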
Now we can use a meta-programming trick to extract the list of shufflers from the shuffle words help article:
"shuffle-words" >link uses [ word? ] filter usage-histogram pie-chart print
Here is the resulting chart:
Note that it counts the number of words that use each shuffler, so a word
that calls a shuffler more than once is counted only once, and a word which
uses two distinct shufflers is counted twice. You can see that dup, drop
and swap are by far the most popular, and the more complex patterns are
rarely used. It is interesting that when beginners first start writing
Factor they tend to write code which is heavily biased towards the less
frequently-used shufflers. I think teaching dataflow combinators before
shufflers can help here, because they make it easier to structure your
code in a clean way.
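As a small illustration of that last point, here is the same computation written both ways. This is only a sketch; the word names mean-shuffled and mean-cleaved are invented for the comparison and are not part of the Factor library.

```factor
USING: kernel math sequences ;
IN: scratchpad

! Shuffler style: the reader has to track the stack by hand
! to see that the sequence is used twice.
: mean-shuffled ( seq -- x ) dup sum swap length / ;

! Combinator style: bi applies both quotations to the same
! input, so the "use it twice" intent is visible in the code.
: mean-cleaved ( seq -- x ) [ sum ] [ length ] bi / ;
```

Both words leave the same value for a non-empty sequence of numbers, but the second one says directly that sum and length are computed from one input, with no dup or swap to decode.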
For cleave/spread/apply combinators, we have to do a bit more work to get a list of them programmatically, since they’re spread over several help articles:
{ "cleave-combinators" "spread-combinators" "apply-combinators" }
[ >link uses ] map concat { either? both? } diff
[ word? ] filter
Now we can make the pie chart as before. Here is the result:
Again, bi completely dominates.
I’m not sure what to make of these results, other than that the Google Charts API is pretty nifty, and that there is untapped potential for “code data mining” in Factor’s reflection capabilities. We could use this to discover potential abstractions and patterns.