An algorithm for escape analysis
Sunday, August 10, 2008
In compiler implementation, escape analysis is the process of discovering which values defined within a procedure are returned or passed down to another subroutine – these are escaping values. Compound objects which are allocated inside the procedure, and which do not escape, are candidates for stack allocation. If applied in isolation, the main benefit of stack allocation is reduced GC overhead; however with generational garbage collection, this is not so important. The real payoff comes from combining stack allocation with unboxing – if all usages of a compound object are replaced by its constituent values, further optimizations can be applied; the constituents might get stored in registers, or become candidates for value numbering, etc. Sometimes this optimization is referred to as scalar replacement.
I implemented escape analysis in the new compiler I’m working on for Factor. Since I couldn’t find a description of the algorithm with the properties I wanted in the literature, I came up with my own and documented it here.
To start with, in Factor, almost everything is done with subroutine calls – words are very short, and object slot access involves calling another word. And most objects pass through shuffle words on their way to their destination. So escape analysis only makes sense after inlining and type inference, and we also need to treat shuffle words specially, not as other subroutine calls. The latter is already taken care of, however; the compiler has special knowledge of shuffle words so that it can perform non-local optimizations.
A more subtle issue is that of writable slots. At present, my escape analysis only considers tuples where all slots are declared read-only as candidates for unboxing. Writing into slots creates difficulties; aliasing must be considered, and mixing the unboxing of mutable objects together with multi-shot continuations doesn’t seem to make sense either (capturing a continuation copies the stack but not the heap; so if we restore a multishot continuation several times and read a mutable slot, we can observe whether unboxing took place or not by seeing if restoring the continuation restored its old value). Technically, I could extend it to unbox tuples whose slots are never written into, not just those whose slots are forced to be read-only. I will probably make this change if it avoids complicating the code too much.
Escape analysis is very beneficial in Factor; for example, iteration over a virtual sequence can be rewritten by the compiler to avoid allocating the virtual sequence altogether. It also enables programming idioms such as “cursors”: you can represent the iteration state over a collection as an object, and have the object be unboxed at compile time, turning high-level functional code into tight loops in registers. In general, by reducing the cost of allocating temporary objects, better factoring is encouraged and cleaner code can be written.
The idea is basically to calculate which values may be unboxed, and for each such value, we replace its allocation with a no-op, replace each slot selection with a stack shuffle, and “widen” each shuffle that it passes through to permute its slots.
For example, suppose we define a new data type, the academic functional
programmer’s favorite,
TUPLE: cons { car read-only } { cdr read-only } ;
Now if we have this code,
: foo-0 ( -- a b ) 1 2 cons boa dup car>> swap cdr>> ;
We would expect the compiler to convert it to something like the following:
: foo-0' ( -- a b ) 1 2 2dup drop rot nip ;
The transformation proceeds as follows:
- the constructor was removed altogether
- the
dup
became2dup
because now the cons is two items on the stack - the first slot selection (
car>>
) becamedrop
since this drops the second slot value while retaining the first one - the
swap
becamerot
since one of the inputs was the tuple, which is now two values on the stack - the second slot selection (
cdr>>
) became anip
since this drops the first slot value while retaining the second one
The codegen would then simplify the shuffling to a no-op, and the word would become nothing more than a push of two literal values on the stack.
To formalize the above, we need to come up with a good definition for a value which “escapes”. We formulate this in terms of the inputs and outputs of the various nodes in the high-level SSA IR of the Factor compiler. Since this is SSA, each value is defined only once, so we may identity a value with its definition; we can say that a value is the result of a specific operation, such as a memory allocation.
First, every value has an entry in the allocation map. The entry is
either f
, meaning the value’s definition has not been seen yet, t
,
meaning the value is allocated outside of this word, or a sequence of
values, meaning that this value is a tuple allocation whose slots are
from this sequence. When the analysis is done, no value has an entry of
f
, however while the analysis is in progress, we may encounter such
cases, and they have to be handled – basically, an entry of f
means
“this entry may change to anything else in the future”.
Clearly, a value which is returned from the word in question escapes –
these values are stored as the input to a #return
node in the compiler
IR. Also, values which are inputs to subroutine calls escape – these
are the #call
nodes. One exception is that the slot selection
primitive slot
, when called with a constant integer argument, does not
force its input value to escape; after all, we need to have some way of
accessing slots of stack-allocated objects. Calls to this primitive
appear after type inference allows slot accessors to be inlined.
Finally, values which are inputs to a call of the tuple construction
primitive <tuple-boa>
are not considered to escape either; we would
like to unbox nested (but not circular) structure.
In the following two, the allocation escapes:
: foo-1 ( -- cons ) 1 2 cons boa ;
: foo-2 ( -- ) 1 2 cons boa . ;
Here it doesn’t – after inlining, car>>
becomes 3 slot
:
: foo-3 ( -- obj ) 1 2 cons boa car>> ;
In the following, neither of the two allocations escape:
: foo-4 ( -- obj ) 1 2 cons boa 3 cons boa car>> car>> ;
Now we run into our first complication. Consider the following code:
: foo-5 ( -- cons ) 1 2 cons boa 3 cons boa car>> ;
The first allocation escapes, the second one does not. Now clearly the
result of car>>
(3 slot
after inlining) escapes by our above
criteria; how do we link it with the output of 1 2 cons boa
? Well, we
can construct an undirected graph where the vertices are values, and two
vertices are linked by an edge if a sufficient and necessary condition
for one of them to escape is for the other to escape. We can thus look
at connected components of this graph instead of individual values – if
any member of the connected component escapes, they all do. I call this
graph the “escaping values graph”.
So in the above example foo-5
, we add an edge between the first input
to the second cons boa
call, and the output of car>>
, and when we
encounter the #return
node and see that the output of car>>
escapes,
we will also know that its input escapes; hence the first allocation
escapes, but the second one does not.
A number of nodes in the IR are created when shuffle words are converted
– these are #>r
, #r>
, and #shuffle
. They do not make values
escape, but they do define new values (this is SSA), and whether those
values escape or not depends on whether their inputs escape, and vice
versa.
This allows us to catch that the following allocation escapes:
: foo-6 ( -- cons ) 1 2 cons boa >r "Hello world" print r> ;
but the following doesn’t:
: foo-7 ( -- obj ) 1 2 cons boa >r "Hello world" print r> cdr>> ;
A related type of node is the #copy
node, which arises as a result of
other optimizations. We also add edges to the escaping values graph
mapping its inputs to its outputs.
This pretty much sums it up for straight-line code without branches or recursion. You build a value graph to track value equivalence, mark certain values as escaping when you encounter call and return nodes, then when you’re done, you know that the allocations whose connected components in the graph do not contain escaping values can be safely unboxed.
The situation with conditionals is trickier. Suppose you have a word
bar
which outputs a cons, but cannot be inlined for whatever reason,
and you use the word as follows:
: foo-8 ( ? -- obj ) [ 1 2 cons boa ] [ bar ] if car>> ;
Here, the first allocation does not escape – the only place it can ever
be used is the car>>
; but if we replace it with two items on the stack
and change the call to car>>
into a stack shuffle, then in the event
of the second branch being taken, we would get incorrect behavior; here
we certainly do want the accessor to be called, and we don’t expect
bar
to return two values on the stack either.
So clearly, in the above, we have to somehow infer that the result of
the allocation escapes, even though it is never an input to a call or a
return. How do we do this? Well, here is the SSA form for the above, in
somewhat stylized, C-like form, with x
the top of the stack:
if(x)
y1 = cons(1,2);
else
y2 = bar();
y = phi(y1,y2);
z = y.car;
return (z);
The problem here is the #phi
node has two inputs, one of which has an
entry in the allocation map {y1,y2}
, and the other one has a value of
t
, since it is allocated outside of this word. Therefore, we cannot
unbox y1
, since that would mess up the #phi
node. So we add a check
for this; if the inputs to a #phi
nodes have incompatible allocation
map entries, either because of them is a sequence of values and the
other is t
, or because they are both sequences of values of different
length, we mark all values as escaping, ruling out any unboxing. On the
other hand, if all entries are compatible, we have to add new #phi
nodes to unify the slots. Even if unboxing does not take place for other
reasons, these #phi
nodes are important to track relations between
tuple slots.
For example, consider this program:
: foo-9 ( ? -- obj ) [ 1 2 cons boa ] [ 3 4 cons boa ] if car>> ;
The two allocations are compatible and the compiler should rewrite the above into roughly the following:
: foo-9' ( ? -- obj ) [ 1 ] [ 3 ] if ;
Here is the SSA form of foo-9
:
if(x)
{
a1 = 1;
b1 = 2;
y1 = cons(a1,b1);
}
else
{
a2 = 3;
b2 = 4;
y2 = cons(a2,b2);
}
y = phi(y1,y2);
z = y.car;
return (z);
Escape analysis would add the following entries to the allocation map:
a1 => t
a1 => t
y1 => (a1,b1)
a2 => t
a3 => t
y2 => (a2,b2)
a3 => t
b3 => t
y => (a3,b3)
The last three entries are added as a result of the #phi
node being
traversed.
As for the value graph, the slot access would link z
with a3
as soon
as we encounter the statement z = y.car;
.
However, we actually need to add more edges to the value graph than just
this one. Consider the following program, almost the same as foo-9
except we return the cons cells on the stack:
: foo-10 ( ? -- cons ) [ 1 2 cons boa ] [ 3 4 cons boa ] if ;
Here, we cannot unbox either value and both allocations escape. However,
while we would end up marking the output of the #phi
as an escaping
value, nothing would be done to the outputs of cons boa
, resulting in
incorrect code. So what we need to do is link the inputs and outputs of
a phi
in the escaping values graph, thus if any one of them escapes,
they all escape. Here is the SSA form for foo-10
:
if(x)
{
a1 = 1;
b1 = 2;
y1 = cons(a1,b1);
}
else
{
a2 = 3;
b2 = 4;
y2 = cons(a2,b2);
}
y = phi(y1,y2);
/* Inserted by escape analysis */
a3 = phi(a1,a2);
b3 = phi(b1,b2);
return (y);
Here is the allocation map:
a1 => t
a1 => t
y1 => (a1,b1)
a2 => t
a3 => t
y2 => (a2,b2)
a3 => t
b3 => t
y => (a3,b3)
And here are the edges of the escaping values graph for foo-10
:
a3 -- a1
a3 -- a2
b3 -- b1
b3 -- b2
y -- y1
y -- y2
Because y
escapes, none of the values in its connected component –
namely, y
, y1
, and y2
– are candidates for unboxing.
This approach handles all other corner cases involving branches also:
: foo-11 ( ? -- cons ) [ 1 2 cons boa dup . ] [ 3 4 cons boa ] if car>> ;
Here, the first allocation cannot be unboxed, and this prevents the
second allocation from being unboxed since they meet at a #phi
node.
Here is another example:
: foo-12 ( ? -- cons ) [ 1 2 cons 3 cons boa ] [ bar 4 cons boa ] if car>> car>> ;
What happens here? Well, let’s look at the SSA form:
if(x)
{
a1 = 1;
b1 = 2;
y1 = cons(a1,b1);
c1 = 3;
z1 = cons(y1,c1);
}
else
{
y2 = bar();
c2 = 4;
z2 = cons(y2,c2);
}
z = phi(z1,z2);
/* Inserted by escape analysis */
y3 = phi(y1,y2);
c3 = phi(c1,c3);
u = z.car;
v = u.car;
return(u);
Here is the allocation map:
a1 => t
b1 => t
y1 => {a1,b1}
c1 => t
z1 => {y1,c1}
y2 => t
c2 => t
z2 => {y2,c2}
y3 => t
c3 => t
z => {y3,c3}
u => t
v => t
Here are the edges of the value graph:
y3 -- y1
y3 -- y2
z -- z1
z -- z2
v -- y3
Note that while y1
does not escape, it is in the same connected
component as y2
, which does escape, therefore y1
cannot be unboxed.
However, z1
and z2
can both be unboxed.
Recursion and loops arising from usage of combinators such as each
create #phi
nodes where some of the inputs are defined after the
#phi
node in a post-order traversal of the dominator graph.
For example,
10 [ . ] each
yields the following SSA form:
int x0 = 0;
int x1 = 10;
L: x2 = phi(x0,x3);
if(x2 < x1)
{
.(x2);
x3 = x2 + 1;
goto L;
}
Note that x3
is defined after the #phi
node referencing it.
When the escape allocation encounters, some of the inputs have entries
of f
in the allocation map. These values are ignored when checking for
compatibility; we optimistically assume that they will later be defined
in a way which does not force the value to be escaping. After the first
iteration of the analysis, all values should now have non-f
entries in
the allocation map. We repeat the analysis until the allocation map
entries of the outputs of #phi
nodes stop changing; while it is
possible to construct code where this can take any number of iterations,
in real examples it should terminate after one or two passes.
For example, in the following, the allocations can be unboxed:
: foo-13 ( n -- obj ) 1 2 cons boa swap [ drop 3 4 cons boa ] times car>> ;
We’d expect the compiler to convert the code into the following:
: foo-13' ( n -- obj ) 1 2 rot [ 2drop 3 4 ] times drop ;
Here is the SSA form:
int a1 = 1;
int b1 = 2;
int y1 = cons(a1,b1);
int x0 = 0;
L: x2 = phi(x1,x3);
/* Inserted by escape analysis */
a3 = phi(a1,a2);
b3 = phi(b1,b2);
y2 = phi(y1,y3);
if(x2 < n)
{
x3 = x2 + 1;
a2 = 3;
b2 = 4;
y3 = cons(a2,b2);
goto L;
}
z = y2.car;
return(z);
Here is the allocation map:
a1 => t
b1 => t
y1 => {a1,b1}
x0 => t
x2 => t
a3 => t
b3 => t
y2 => {a3,b3}
x3 => t
a2 => t
b2 => t
y3 => {a2,b2}
z => t
And the edges of the escaping values graph:
a3 -- a1
a3 -- a2
x2 -- x1
x2 -- x3
b3 -- b1
b3 -- b2
y2 -- y1
y2 -- y3
The only value marked as escaping is z
, and its connected component
only contains itself, so the two allocations may be unboxed.
On the other hand, if we have
: foo-14 ( n -- obj ) 1 2 cons boa [ 3 cons boa ] times car>> ;
Unboxing cannot take place. We cannot unbox the second allocation, because it builds recursive structure, and we cannot unbox the first because its result is stored inside the second allocation, which is not unboxed. Here is the SSA form:
int a1 = 1;
int b1 = 2;
int y1 = cons(a1,b1);
int x0 = 0;
L: x2 = phi(x1,x3);
y2 = phi(y1,y3);
/* Inserted by escape analysis */
a3 = phi(y2,a1);
b3 = phi(b1,a2);
if(x2 < n)
{
x3 = x2 + 1;
a2 = 3;
y3 = cons(y2,a2);
goto L;
}
And the allocation map:
a1 => t
b1 => t
y1 => {a1,b1}
x0 => t
x2 => t
a3 => t
b3 => t
y2 => {a3,b3}
x3 => t
a2 => t
y3 => {y2,a2}
z => t
And the edges of the escaping values graph:
a3 -- a1
a3 -- y2
x2 -- x1
x2 -- x3
b3 -- b1
b3 -- a2
y2 -- y1
y2 -- y3
We see that y2
and y3
are in the same connected component as a3
,
which has an allocation map entry of t
, therefore neither one may be
unboxed.
As far as the implementation goes, the above gives a high-level overview. The only tricky part is the implementation of the escaping values graph. I used a disjoint set (union find); adding an edge adds equivalences, and marking a value as an escaping value makes it equivalent to a special “escaping” object. Checking if a value is in the same connected component as an escaping value then is just a check to see if the value is equivalent to “escaping”. A disjoint set is a slightly exotic data structure which perform this check in what is essentially constant time.
I implemented this algorithm; for now, you can find it in the unfinished/compiler/tree/escape-analysis directory. The pass which actually performs the unboxing using the information collected during escape analysis is in unfinished/compiler/tree/tuple-unboxing. In comparison to the analysis, the rewrite stage is pretty simple, so I won’t describe the details here.