ARM port day 2
Wednesday, January 31, 2007
I worked on the ARM assembler today; it now supports all ARM condition codes (almost every instruction is conditional), as well as “addressing mode 1” instructions (arithmetic), multiplication, and “addressing mode 2” instructions (single-register load/store). Multiple loads/stores and a few other rarely used instructions are still missing.
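The condition sits in a fixed 4-bit field (bits 31..28) of the instruction word, so supporting every condition code mostly comes down to OR-ing one more value into the encoding. Here is a minimal C sketch of that idea; the enum and function names are hypothetical and not taken from the Factor sources. It is checked against the standard encoding of `sub ip, fp, #4`.

```c
#include <assert.h>
#include <stdint.h>

/* ARM condition codes; the value goes in bits 31..28 of (almost) every
   instruction. AL ("always") is what an unconditional instruction uses. */
enum arm_cond {
    COND_EQ = 0x0, COND_NE = 0x1, COND_CS = 0x2, COND_CC = 0x3,
    COND_MI = 0x4, COND_PL = 0x5, COND_VS = 0x6, COND_VC = 0x7,
    COND_HI = 0x8, COND_LS = 0x9, COND_GE = 0xA, COND_LT = 0xB,
    COND_GT = 0xC, COND_LE = 0xD, COND_AL = 0xE
};

/* Replace the condition field of an already-encoded instruction. */
static uint32_t with_cond(uint32_t insn, enum arm_cond cond)
{
    return (insn & 0x0FFFFFFFu) | ((uint32_t)cond << 28);
}

void check_conditions(void)
{
    /* sub ip, fp, #4 encodes as 0xE24BC004 (AL condition). */
    uint32_t sub = 0xE24BC004u;
    assert(with_cond(sub, COND_EQ) == 0x024BC004u); /* subeq ip, fp, #4 */
    assert(with_cond(sub, COND_NE) == 0x124BC004u); /* subne ip, fp, #4 */
}
```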
The last time I was porting Factor to a new platform, the compiler
worked differently: it wrote generated machine code directly to
memory, which made the assembler impossible to unit test. Now the
assembler just appends machine code to the array being built by an
enclosing call to make, which makes it very easy to unit test. So I
did test-driven development today: I used the GNU assembler to assemble
bits of code, added a unit test asserting that Factor generated the
same machine code for the same input, and coded away until the tests
passed.
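The same testing idea can be sketched in C: the encoder only ever appends words to a buffer, so a test can run it into a fresh buffer and compare the result with the words GNU as produced for the same source. The buffer type, `SUB_imm` and the test helpers are hypothetical; the expected word 0xE24BC004 is the standard ARM encoding of `sub ip, fp, #4`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* A growable buffer of 32-bit instruction words; the assembler only
   appends to it, so tests can inspect the result afterwards. */
struct code_buf {
    uint32_t *words;
    size_t len, cap;
};

static void emit(struct code_buf *buf, uint32_t word)
{
    if (buf->len == buf->cap) {
        size_t cap = buf->cap ? buf->cap * 2 : 16;
        uint32_t *words = realloc(buf->words, cap * sizeof *words);
        if (!words)
            abort();
        buf->words = words;
        buf->cap = cap;
    }
    buf->words[buf->len++] = word;
}

/* Hypothetical encoder for "sub rd, rn, #imm" with imm < 256 (no rotation),
   always using the AL condition. */
static void SUB_imm(struct code_buf *buf, unsigned rd, unsigned rn, unsigned imm)
{
    emit(buf, 0xE2400000u | (rn << 16) | (rd << 12) | (imm & 0xFFu));
}

/* Run a code generator into a fresh buffer and compare with expected words. */
static int assembles_to(void (*gen)(struct code_buf *),
                        const uint32_t *expected, size_t n)
{
    struct code_buf buf = {0};
    gen(&buf);
    int ok = buf.len == n &&
             (n == 0 || memcmp(buf.words, expected, n * sizeof *expected) == 0);
    free(buf.words);
    return ok;
}

static void gen_prologue_sub(struct code_buf *buf)
{
    SUB_imm(buf, 12, 11, 4); /* sub ip, fp, #4 (ip = r12, fp = r11) */
}

void check_sub(void)
{
    /* Expected word: what GNU as emits for "sub ip, fp, #4". */
    static const uint32_t expected[] = { 0xE24BC004u };
    assert(assembles_to(gen_prologue_sub, expected, 1));
}
```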
ARM assembly is quite interesting, with lots of operand and addressing modes. The instruction encoding also involves lots of bit fields, so I had to come up with a new abstraction for building integers out of shifted fields combined with bitwise or.
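The core of such an abstraction is small: describe each field as a value plus the bit position of its least significant bit, then shift and OR them all together. A minimal C sketch (the `field` struct and `bitfield` function are hypothetical stand-ins, not Factor's implementation), using the PowerPC D-form layout as the example:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One field of an instruction word: a value and the bit position of its
   least significant bit. */
struct field {
    uint32_t value;
    unsigned shift;
};

/* OR together every value shifted into place. */
static uint32_t bitfield(const struct field *fields, size_t n)
{
    uint32_t word = 0;
    for (size_t i = 0; i < n; i++)
        word |= fields[i].value << fields[i].shift;
    return word;
}

/* PowerPC D-form (e.g. addi): opcode, rD, rA, 16-bit signed immediate. */
static uint32_t d_form(uint32_t opcode, uint32_t d, uint32_t a, int32_t simm)
{
    struct field f[] = {
        { opcode, 26 },
        { d, 21 },
        { a, 16 },
        /* keep only the low 16 bits of the two's-complement immediate */
        { (uint32_t)simm & 0xFFFFu, 0 },
    };
    return bitfield(f, sizeof f / sizeof f[0]);
}
```

For example, `d_form(14, 3, 1, 8)` yields 0x38610008, the encoding of `addi r3, r1, 8`.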
The PowerPC instruction format is not as complicated as ARM's, so the PowerPC assembler used to just have words that shifted and or'ed values together by hand:
: insn ( operand opcode -- ) 26 shift bitor , ;
: a-form ( d a b c xo rc -- n )
    >r 1 shift >r 6 shift >r 11 shift >r 16 shift >r 21 shift
    r> bitor r> bitor r> bitor r> bitor r> bitor ;
: b-form ( bo bi bd aa lk -- n )
    >r 1 shift >r 2 shift >r 16 shift >r 21 shift
    r> bitor r> bitor r> bitor r> bitor ;
: d-form ( d a simm -- n )
    HEX: ffff bitand >r 16 shift >r 21 shift r> bitor r> bitor ;
: sd-form ( d a simm -- n ) swapd d-form ;
: i-form ( li aa lk -- n )
    >r 1 shift bitor r> bitor ;
: x-form ( a s b xo rc -- n )
    swap
    >r 1 shift >r 11 shift >r swap 16 shift >r 21 shift
    r> bitor r> bitor r> bitor r> bitor ;
: xfx-form ( d spr xo -- n )
    1 shift >r 11 shift >r 21 shift r> bitor r> bitor ;
: xo-form ( d a b oe rc xo -- n )
    swap
    >r 1 shift >r 10 shift >r 11 shift >r 16 shift >r 21 shift
    r> bitor r> bitor r> bitor r> bitor r> bitor ;
This was not too bad for PowerPC, but for ARM this strategy would have been unmanageable from the start. Here is what the same part of the PowerPC assembler looks like with the new abstraction:
! Each spec entry is the shift for one stack value; the first entry applies to the topmost value.
: insn ( operand opcode -- ) { 26 0 } bitfield , ;
: a-form ( d a b c xo rc -- n ) { 0 1 6 11 16 21 } bitfield ;
: b-form ( bo bi bd aa lk -- n ) { 0 1 2 16 21 } bitfield ;
: s>u16 ( s -- u ) HEX: ffff bitand ;
: d-form ( d a simm -- n ) s>u16 { 0 16 21 } bitfield ;
: sd-form ( d a simm -- n ) s>u16 { 0 21 16 } bitfield ;
: i-form ( li aa lk -- n ) { 0 1 0 } bitfield ;
: x-form ( a s b xo rc -- n ) { 1 0 11 21 16 } bitfield ;
: xfx-form ( d spr xo -- n ) { 1 11 21 } bitfield ;
: xo-form ( d a b oe rc xo -- n ) { 1 0 10 11 16 21 } bitfield ;
This expresses the intent of the code much more clearly. The ARM assembler uses much more complicated bitfield specifiers, such as:
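: (BX) ( Rm l -- )
    {
        { 1 24 }           ! literal 1 shifted to bit 24
        { 1 21 }
        { BIN: 1111 16 }   ! bits 19..16 must all be ones
        { BIN: 1111 12 }
        { BIN: 1111 8 }
        5                  ! l from the stack: bit 5 selects BLX
        { 1 4 }
        { register 0 }     ! Rm from the stack, via the register word
    } insn ;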
In the above word, we are building a bit field where some values come
from the stack (a bare shift amount, such as 5), some are literals
({ value shift } pairs), and the last one is obtained by applying the
register word to a stack value. Writing this out by hand would be a
pain in any language. Fortunately, Factor makes it very easy to build
mini-DSLs like this.
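To show what such a specifier has to express, here is a C sketch of a table-driven encoder where each entry is either a literal or a shift applied to the next caller-supplied argument, optionally passed through a conversion. All names are hypothetical, and unlike the Factor word the condition field is included explicitly. Bits 19..16, 15..12 and 11..8 are all ones in ARM's BX/BLX (register) encoding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum entry_kind { LITERAL, ARG };

/* A spec entry: a literal (value, shift) pair, or a shift for the next
   argument, optionally converted first (like applying `register`). */
struct entry {
    enum entry_kind kind;
    uint32_t value;                /* used by LITERAL */
    unsigned shift;
    uint32_t (*convert)(uint32_t); /* optional, used by ARG */
};

/* Hypothetical stand-in for Factor's `register` word. */
static uint32_t reg(uint32_t r)
{
    assert(r < 16);
    return r;
}

static uint32_t encode(const struct entry *spec, size_t n,
                       const uint32_t *args, size_t nargs)
{
    uint32_t word = 0;
    size_t next = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t v;
        if (spec[i].kind == LITERAL) {
            v = spec[i].value;
        } else {
            assert(next < nargs);
            v = args[next++];
            if (spec[i].convert)
                v = spec[i].convert(v);
        }
        word |= v << spec[i].shift;
    }
    assert(next == nargs); /* every argument consumed */
    return word;
}

/* BX/BLX (register), mirroring the (BX) spec; l = 1 selects BLX. */
static uint32_t BX(uint32_t cond, uint32_t rm, uint32_t l)
{
    static const struct entry spec[] = {
        { ARG, 0, 28, NULL },     /* condition */
        { LITERAL, 1, 24, NULL },
        { LITERAL, 1, 21, NULL },
        { LITERAL, 0xF, 16, NULL },
        { LITERAL, 0xF, 12, NULL },
        { LITERAL, 0xF, 8, NULL },
        { ARG, 0, 5, NULL },      /* l */
        { LITERAL, 1, 4, NULL },
        { ARG, 0, 0, reg },       /* Rm */
    };
    uint32_t args[] = { cond, l, rm };
    return encode(spec, sizeof spec / sizeof spec[0], args, 3);
}
```

With the AL condition (0xE), `BX(0xE, 14, 0)` gives 0xE12FFF1E (`bx lr`) and `BX(0xE, 3, 1)` gives 0xE12FFF33 (`blx r3`).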
Here are some ARM assembly instructions, with GNU and Factor syntax side by side:
sub ip, fp, #4               IP FP 4 SUB
addeqs r0, ip, r9, lsl #2    R0 IP R9 2 <LSL> S ?EQ ADD
ldr r1, [r5, #-4]            R1 R5 4 <-> LDR
str r1, [r5, #8]!            R1 R5 8 <!+> STR
As you can see, the Factor syntax looks a bit funny, but remember that the whole point of this exercise is an assembler library that the compiler calls to emit code; it is not meant for users who want to write applications in assembly.
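For reference, here is a C sketch of how the fields behind the last three rows of the table pack into instruction words: a data-processing instruction with a register operand shifted by an immediate (addressing mode 1), and a single load/store with a 12-bit immediate offset (addressing mode 2). The function names are hypothetical; the data-processing opcode numbers are ARM's (SUB = 2, ADD = 4).

```c
#include <assert.h>
#include <stdint.h>

enum { LSL = 0, LSR = 1, ASR = 2, ROR = 3 };

/* Data-processing instruction, register operand shifted by an immediate. */
static uint32_t dp_reg_shift(uint32_t cond, uint32_t opcode, uint32_t s,
                             uint32_t rd, uint32_t rn, uint32_t rm,
                             uint32_t shift_type, uint32_t shift_imm)
{
    return cond << 28 | opcode << 21 | s << 20 | rn << 16 | rd << 12
         | shift_imm << 7 | shift_type << 5 | rm;
}

/* Single load/store with an immediate offset.
   pre: P bit (pre-indexed), writeback: W bit ("!"), load: L bit. */
static uint32_t ls_imm(uint32_t cond, uint32_t load, uint32_t rd, uint32_t rn,
                       int32_t offset, uint32_t pre, uint32_t writeback)
{
    uint32_t up = offset >= 0; /* U bit: add or subtract the offset */
    uint32_t mag = offset >= 0 ? (uint32_t)offset : 0u - (uint32_t)offset;
    assert(mag < 4096);
    return cond << 28 | 1u << 26 | pre << 24 | up << 23 | writeback << 21
         | load << 20 | rn << 16 | rd << 12 | mag;
}
```

For example, `addeqs r0, ip, r9, lsl #2` is `dp_reg_shift(0x0, 4, 1, 0, 12, 9, LSL, 2)` = 0x009C0109, and `str r1, [r5, #8]!` is `ls_imm(0xE, 0, 1, 5, 8, 1, 1)` = 0xE5A51008.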
Another thing I did was sort out how to flush the instruction cache on
ARM (thanks to Mackenzie Straight aka eiz in #concatenative), and I
provided a proper implementation of the flush_icache function, which
the Factor VM calls to flush the instruction cache when a new compiled
code block is added. On PowerPC, this is implemented in assembly, but
on ARM only the kernel is permitted to flush the instruction cache, so
it has to go through a system call. Also, glibc does not actually
export a wrapper for this system call. However, it is easy enough to
call it directly using macros from asm/unistd.h:
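#include <asm/unistd.h> /* __ARM_NR_cacheflush and the _syscall3 macro */
/* glibc has no cacheflush() wrapper, so define one that calls the kernel */
#define __NR_cacheflush __ARM_NR_cacheflush
INLINE _syscall3(void,cacheflush,void *,start,void *,end,unsigned long,flags);
/* Called by the VM after a new compiled code block is written at [start, start + len) */
INLINE void flush_icache(CELL start, CELL len)
{
 cacheflush((void *)start,(void *)(start + len),0);
}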
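Newer kernel headers no longer export the `_syscallN` macros, so a present-day variant would more likely use the GCC/Clang builtin `__builtin___clear_cache`. On ARM Linux, libgcc implements it with the same cacheflush system call; on x86 it is effectively a no-op. This is a sketch under those assumptions: `CELL` is approximated with `uintptr_t`, and `install_code` is a hypothetical helper showing when the flush happens.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for the VM's CELL type (a machine word). */
typedef uintptr_t CELL;

/* Flush [start, start + len) from the instruction cache. */
static void flush_icache(CELL start, CELL len)
{
    __builtin___clear_cache((char *)start, (char *)(start + len));
}

/* Hypothetical helper: copy freshly generated machine code into place and
   make sure the CPU will not execute stale instructions from that range. */
static void install_code(void *dst, const void *src, size_t n)
{
    assert(n == 0 || (dst != NULL && src != NULL));
    memcpy(dst, src, n);
    flush_icache((CELL)dst, (CELL)n);
}
```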
Tomorrow, I’ll do some more work on the assembler and see what other instructions are needed by the backend. I’ll also start the backend proper and have Factor compiling simple (and perhaps complex) words. And of course I’ll document what I did in a blog entry, just like I did today and yesterday.