Introducing the profiler
Monday, May 14, 2007
Factor now has a simple call-counting profiler. Usage is simple. You must first enable it, which recompiles all words:
USE: profiler
enable-profiler
Then you can use a combinator to profile quotations:
( scratchpad ) [ "hello " "world" append ] profile
Then you can view results:
( scratchpad ) profile.
profiling 1
append 1
call 1
> 2
< 2
+ 2
check-copy 2
length 4
(copy) 13
Here we see that very few actual words were called; append
is
compiled, and so much inlining and open-coding occurs that pretty much
the only indication of any work being done is the (copy)
word being
invoked once per character.
Another example is reversing a string:
( scratchpad ) [ "hello world" reverse ] profile
( scratchpad ) profile.
profiling 1
> 1
< 1
+ 1
like 1
reverse 1
call 1
check-copy 1
frange? 3
column? 3
groups? 3
- 11
circular? 11
<= 12
(copy) 12
gb? 14
reversed? 14
length 18
1- 22
nth-unsafe 22
>fixnum 33
Here we see more work is done. Type checks, generic arithmetic, and so
on. This is because of how reverse
is implemented; it creates a
virtual reversal sequence then converts this into a sequence of the same
type as the input.
Of course, the profiler is a lot more useful when applied to non-trivial
computations. Here the simple list produced by profile.
gets rather
large, so various other words, such as vocab-profile.
,
usage-profile.
and vocabs-profile.
come in handy. All these words
are documented in the help system, along with a general article
describing the profiler itself. Grab Factor from darcs and enter
"profiling" help
in the UI listener.
The profiler has a long history. I believe even before CFactor got a native compiler, it had a call counting profiler for a little while. When the compiler came along, I removed it because it didn’t work for compiled code. In Factor 0.89, I re-implemented it, intending to add support for compiled profiling before releasing 0.89, however I felt this would delay the release too much, so I left it undocumented and not fully functional.
Last week, I was under the impression that accurate counting was too
much overhead anyway, especially with compiled code, so I scrapped the
profiler and re-implemented it as a sampling profiler which took
measurements every 10ms. Unfortunately, sampling was not as accurate as
I would like, and sampling compiled code was complex; it would require
writing code for every CPU/OS combination that Factor runs on to get the
program counter from a suspended context. Not to mention this didn’t
work
right
on Mac OS X. So I went back to accurate call counting, and implemented
support for profiling compiled code by compiling a simple prolog which
increments a word’s counter at the beginning of every word. This is why
you must call enable-profiling
which recompiles all words first.
Except for pathological cases such as Fibonacci, the overhead is not
that bad, and the implementation is simple.
So the journey is complete and Factor has a profiler now, which supports
both interpreted and compiled code. Even though the profiler is pretty
simple (you basically get a flat call count, not a call graph) I’ve
already used it to optimize a few things. I will use it some more to
optimize the compiler; now that enable-profiler
recompiles all words,
I have even more incentive to speed up compile times.