A compiler optimization
Friday, January 12, 2007
Here are the times to run the demos/raytracer
benchmark in recent
Factor versions:
| Version | Time (seconds) |
|---|---|
| 0.85 | 43 |
| 0.86 | 51 |
| 0.87 | 51 |
| 0.88 | 29 |
I’m still not sure what’s behind the regression from 0.85 to 0.86, maybe I can reclaim that time too.
The performance gain comes from improved type inference. The compiler is
now able to remove overflow checks on loop index variables if it can
prove that they are bounded above by a fixnum. The raytracer benefits
from this because it calls vector
arithmetic words with arrays as arguments, and the length of an array is
always a fixnum.
Another less impressive gain shows up, for a similar reason, when reading a 3 MB file into a string:
| Version | Time (seconds) |
|---|---|
| 0.87 | 1.8 |
| 0.88 | 1 |
The code to read a file into a string:

```factor
"/Users/slava/repos/Factor/boot.image.ppc" 4 1024 * 1024 * [ stream-copy ] keep >string drop
```
I/O performance still sucks, as you can see. I omitted the 0.85 and 0.86 results because they are the same as 0.87.
I will now explain where the performance gain comes from.
The running example which will be used in the description below is the following short snippet:

```factor
dup type 0 eq? [ dup 5 < [ 1+ ] when ] when
```
The compiler here is able to remove both the type dispatch and the
overflow check on the `1+`, transforming it from a full-fledged call to a
polymorphic word into a single instruction. This example is contrived,
but it is exactly the simplest form of the very realistic case of a loop
counter being incremented in a recursive word, bounded by a fixnum such
as an array length.
The type inferencer maintains two forms of state: the association between values and their inferred classes in the current control flow path, and a map of constraints, where a constraint is basically a logical statement of the form "the value X has class Y". In the constraint map, each key/value pair acts as an implication: if the constraint in the key becomes true, then the inferencer assumes that the constraint in the value becomes true as well.
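The two pieces of state can be sketched roughly as follows. This is a hypothetical Python illustration, not the Factor compiler's actual representation; the tuple encoding of facts is invented for clarity:

```python
# Hypothetical sketch of the inferencer state described above. A "fact" is a
# hashable tuple such as ("class", value, cls) or ("literal", value, obj).

classes = {}      # value -> inferred class along the current control flow path
constraints = {}  # fact -> fact it implies (the key/value implication pairs)

def assume(fact):
    """Record a fact, then fire any implications it triggers, transitively."""
    pending, seen = [fact], set()
    while pending:
        f = pending.pop()
        if f in seen:
            continue
        seen.add(f)
        if f[0] == "class":
            classes[f[1]] = f[2]  # update the per-path class assignment
        if f in constraints:
            pending.append(constraints[f])  # the implication fires
```

The point is that assuming one fact can cascade: each fired implication is itself checked against the constraint map.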
In the above example, type inference proceeds as follows. When the
inferencer sees the call to `type`, it adds a bunch of constraint pairs
to the map of constraints:

```
(the output of 'type' is equal to 0) -> (the input is a fixnum)
(the output of 'type' is equal to 1) -> (the input is a bignum)
...
```
Then the inferencer sees the `eq?` word, and notices that one of its
inputs is a literal. This forces a new constraint pair to be added to
the constraint map:

```
(the output of 'eq?' is equal to t) -> (the first input of 'eq?' is equal to 0)
```
Next, the inferencer proceeds to infer types inside the `when`
conditional. Upon entering the body of the conditional, the inferencer
makes an assumption:

```
(the type of the top of the stack is t)
```

Upon making this assumption, it checks the constraint map for any
additional facts that this assumption may imply. As it turns out, the
constraints set up previously yield some information. It was recorded
that the assumption that the top of the stack is true implies
`(the first input of 'eq?' is equal to 0)`, which in turn implies
`(the input to 'type' is a fixnum)`.
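This two-hop chain of implications can be modeled in miniature. The following hypothetical Python sketch uses plain strings for facts (the real inferencer uses structured constraint objects) and only models the propagation:

```python
# Constraint map for the running example: each key implies its value.
constraints = {
    "output of eq? is t": "first input of eq? equals 0",
    "first input of eq? equals 0": "input of type is a fixnum",
}

def consequences(fact):
    """Follow the implication chain starting from an assumed fact."""
    known = [fact]
    while known[-1] in constraints:
        known.append(constraints[known[-1]])
    return known

# Entering the 'when' branch assumes the branch condition was true:
facts = consequences("output of eq? is t")
```

Starting from the single branch assumption, the inferencer ends up with three facts, the last of which is the fixnum class it needs.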
So inside the branch, the type inferencer now knows that the top of the
stack is a fixnum. The call to `<` is inlined and the type dispatch is
removed, replacing it with `fixnum<`.
This is where the fun begins. If `fixnum<` returns true, then the first
input is less than the second. This is completely obvious. However, the
second input must be less than or equal to the most positive fixnum. So
by transitivity, the first input must be strictly less than the most
positive fixnum. The type inferencer knows this, and the `fixnum<` word
defines a constraint to this effect.
There is a new type in Factor now:

```factor
PREDICATE: fixnum small 1+ fixnum? ;
```
The `fixnum<` word defines a constraint of the form:

```
(the output of 'fixnum<' is t) -> (the first input is small)
```
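The transitivity argument can be checked numerically. This is a Python sketch; the fixnum bound used here is an assumed illustrative value, since the real most-positive-fixnum depends on the platform word size and Factor's tag bits:

```python
# Assumed illustrative bound, not Factor's actual value.
MOST_POSITIVE_FIXNUM = 2**29 - 1

def is_small(a):
    # "small" per the PREDICATE above: a fixnum whose successor is still a
    # fixnum, i.e. incrementing it cannot overflow.
    return a + 1 <= MOST_POSITIVE_FIXNUM

# If fixnum< returned true for (a, b), then a < b; since b itself is a
# fixnum, b <= MOST_POSITIVE_FIXNUM, so a < MOST_POSITIVE_FIXNUM, and
# therefore a + 1 cannot overflow:
a, b = 5, MOST_POSITIVE_FIXNUM
assert a < b and is_small(a)
```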
Once again, we have a branch, inside which the type inferencer assumes
the top of the stack is `t`, which by a previous implication forces the
type inferencer to assume the top of the stack is small.

At this point, the call to `1+` is reached, and the definition of the
polymorphic `1+` is inlined and specialized to the case of a fixnum
input. This replaces the call to `1+` with `1 fixnum+`.
The type inference is almost done. However, there is still one thing to
do. An optimizer hook for the `fixnum+` word notices that one of its
inputs is small and the other input is 1, and replaces the call to
`fixnum+` with `fixnum+fast`, which has no overflow check and compiles
to a single instruction.
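The hook's decision amounts to a simple rewrite rule, sketched here in hypothetical Python (the function name and string encoding are invented, not the compiler's API):

```python
def rewrite_add(augend_class, literal_addend):
    # Hypothetical optimizer hook: if one input is known to be "small"
    # (its successor still fits in a fixnum) and the other is the literal 1,
    # the sum cannot overflow, so the unchecked instruction is safe.
    if augend_class == "small" and literal_addend == 1:
        return "fixnum+fast"
    return "fixnum+"  # otherwise, keep the overflow-checked word
```

So `1 fixnum+` on a value known to be small becomes `fixnum+fast`, while an unbounded fixnum keeps the checked `fixnum+`.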
Here is an example of very pretty stack code defining the type inference
rule for `<`:

```factor
! if a is less than b and b is a fixnum, then a is small
\ < [
    [
        small 0 `input class,
        fixnum 0 `input class,
        fixnum 1 `input class,
        general-t 0 `output class,
    ] set-constraints
] "constraints" set-word-prop
```
It is worth noting that as long as your integers still have arbitrary precision, none of these optimizations would be any easier with a static type system. I’m sure GHC does something very similar to this.