Tool to tally contributor patch counts

Tuesday, January 9, 2007

I wrote a little Factor program which displays the number of patches submitted by each contributor to the Factor darcs repository. You can find it in demos/contributors.

It works as follows:

Spawns a darcs process, asking it to emit a changelog in XML
Parses the XML and extracts the author attributes of the patch tags
Computes the tally
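
For readers who don't know Factor, the three steps above can be sketched in Python. This is only an illustration under stated assumptions, not the Factor program itself: the function names read_changelog, authors and tally are made up for this sketch, the sample XML is invented, and it assumes the changelog is a root element whose patch children each carry an author attribute.

```python
import subprocess
import xml.etree.ElementTree as ET
from collections import Counter


def read_changelog(repo_dir):
    """Step 1: run darcs in repo_dir and return its XML changelog as text."""
    result = subprocess.run(
        ["darcs", "changes", "--xml-output"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    )
    return result.stdout


def authors(changelog_xml):
    """Step 2: collect the author attribute of every <patch> child of the root."""
    root = ET.fromstring(changelog_xml)
    return [patch.get("author") for patch in root.findall("patch")]


def tally(author_list):
    """Step 3: count patches per author, highest count first."""
    return Counter(author_list).most_common()


# Invented sample standing in for real darcs output.
SAMPLE = """<changelog>
  <patch author="alice@example.com"><name>first</name></patch>
  <patch author="bob@example.com"><name>second</name></patch>
  <patch author="alice@example.com"><name>third</name></patch>
</changelog>"""

print(tally(authors(SAMPLE)))
# [('alice@example.com', 2), ('bob@example.com', 1)]
```

Against a real repository, read_changelog would supply the text that SAMPLE stands in for here.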

This code uses the new hash-prune word, which is only found in 0.88. It is like prune (which removes duplicates from a sequence), except that it does not preserve the order of the elements and is much faster.
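
The trade-off between the two words can be illustrated in Python. This is a sketch of the general idea, not of Factor's actual implementation: the functions below merely borrow the names prune and hash_prune, and the list-scanning version is an assumption about one plausible way an order-preserving word could end up slower.

```python
def prune(seq):
    """Order-preserving duplicate removal; scans the result list for each element."""
    result = []
    for item in seq:
        if item not in result:
            result.append(item)
    return result


def hash_prune(seq):
    """Hash-based duplicate removal; the order of the result is unspecified."""
    return list(set(seq))


names = ["slava", "erg", "slava", "chris", "erg"]
print(prune(names))               # ['slava', 'erg', 'chris']
print(sorted(hash_prune(names)))  # sorted only to make the printed order stable
```

For the tally, order does not matter because the result is sorted by count afterwards, so the faster unordered variant is enough.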

Here is the output, with actual e-mail addresses censored:

{ 
    { 1535 { "slava@..." } }
    { 270 { "chris.double@..." } }
    { 226 { "erg@t..." } }
    { 180 { "wayo.cavazos@..." } }
    { 50 { "matthew.willis@..." } }
    { 33 { "microdan@..." } }
    { 11 { "Benjamin Pollack <benjamin.pollack@...>" } }
    { 7 { "chapman.alex@..." } }
    { 4 { "Kevin Reid <kpreid@...>" } }
    { 2 { "lypanov@..." } }
    { 1 { "agl@..." } }
}

I’d like to do more with XML in the future, so that I can suggest some new abstractions to Daniel and help clean up the naming scheme of the XML processing words. I think Factor has the potential to simplify XML processing considerably compared to many other languages.

Here is the code:

REQUIRES: libs/process libs/xml ;
USING: memory io process sequences prettyprint kernel arrays
xml xml-utils ;
IN: contributors

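! Run darcs from the directory containing the Factor image
! and parse its XML output.
: changelog ( -- xml )
    image parent-dir cd
    "darcs changes --xml-output" "r" <process-stream> read-xml ;

! Extract the author attribute of each child tag.
: authors ( xml -- seq )
    children-tags [ "author" <name-tag> prop-name ] map ;

! Count how many entries in authors are equal to author.
: patch-count ( authors author -- n )
    swap [ = ] subset-with length ;

! Pair each distinct author with their patch count.
: patch-counts ( authors -- assoc )
    dup hash-prune [ [ patch-count ] keep 2array ] map-with ;

! Print the tally, largest count first.
: contributors ( -- )
    changelog authors patch-counts sort-keys reverse . ;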

PROVIDE: demos/contributors ;

MAIN: demos/contributors contributors ;
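
For comparison, here is a rough Python transliteration of the three tallying words. It is a sketch rather than an exact equivalent: the function names mirror the Factor words, the sample list is invented, and ties between equal counts are broken by author name here, which may differ from what the Factor version prints.

```python
def patch_count(authors, author):
    """Number of entries in authors equal to author."""
    return len([a for a in authors if a == author])


def patch_counts(authors):
    """One (count, author) pair per distinct author."""
    return [(patch_count(authors, author), author) for author in set(authors)]


def contributors(authors):
    """Pairs sorted by count, largest first."""
    return sorted(patch_counts(authors), reverse=True)


sample = ["slava", "erg", "slava", "chris", "slava", "erg"]
print(contributors(sample))
# [(3, 'slava'), (2, 'erg'), (1, 'chris')]
```

Note that patch_count rescans the whole list once per distinct author, so the work grows with the number of authors times the number of patches. That is fine for a repository of this size.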