Switching call stacks on different platforms
Sunday, April 4, 2010
User-space implementations of coroutines and green threads need to be able to switch the CPU’s call stack pointer between different memory regions. Since this is inherently CPU- and OS-specific, I’ll limit my discussion to CPUs and platforms that Factor supports.
System APIs for switching call stacks
- Windows has an API for creating and switching between contexts, which it calls “fibers”. The main functions to look at are CreateFiber() and SwitchToFiber(). On Windows, by far the easiest way to switch call stacks is to just use fibers.
- Most Unix systems have a set of functions operating on
ucontext
s, such asmakecontext()
,swapcontext()
and so on. On some systems, these APIs are poorly implemented and documented. - The C standard library functions
setjmp()
andlongjmp()
can be (ab)used to switch contexts. Thejmpbuf
structure stored to bysetjmp()
and read from bylongjmp()
contains a snapshot of all registers, including the call stack pointer. Once you’ve allocated your own call stack, you can capture ajmp_buf
withsetjmp()
, change the call stack pointer, and then resume execution withlongjmp()
. As far as I know, this is completely undocumented and unsupported with every major C compiler. - The most direct way is to write some assembly to switch the relevant
registers directly. Paradoxically, this approach is the most
portable out of all of these, because while it is CPU-specific, the
details are pretty much the same across most OSes, Windows being the
exception. Switching call stacks on Windows is a bit more involved
than just changing the
ESP
register, and requires updating some Windows-specific in-memory structures. However, it is still easy enough to do directly, if you do not wish to use the fiber API. I will describe the details at the end of this post.
High-level libraries for switching call stacks
A couple of existing libraries implement high-level, cross-platform abstractions using some combination of the above mechanisms.
- libcoroutine uses fibers on Windows, and ucontext and longjmp on Unix.
- libcoro uses a handful of Unix-specific APIs if they’re available, or falls back onto some hand-coded assembly routines. The latter works on Windows.
NetBSD/x86 limitation
There is no realiable way to switch call stacks on NetBSD/x86, because
of a limitation in the implementation of libpthread
. Even if your
program doesn’t use pthreads, it might link with a library that does,
such as Glib. The pthread library replaces various C standard library
functions with its own versions, and since this includes some extremely
common calls such as malloc()
, almost nothing will work as a result.
The reason is that the NetBSD pthread library uses a silly trick to implement thread-local storage, one which unfortunately assumes that every native thread has exactly one call stack. The trick is to always allocate thread call stacks on multiples of the maximum call stack size. Then, each thread’s unique ID can just be the result of masking the call stack pointer by the maximum call stack size.
Windows-specific concerns
So far, I’ve only worked out the details of switching call stacks on 32-bit Windows. I’ll talk about 64-bit Windows in another post.
Windows has a mechanism known as Structured Exception Handling, which is a sort of scoped exception handling mechanism with kernel support. Windows uses SEH to deliver processor traps to user-space applications (illegal memory access, division by zero, etc). Some C++ compilers also use SEH to implement C++ exceptions.
On Windows, simply changing the ESP
register to point at another
memory region is insufficient because structured exception handling must
know the current call stack’s beginning and end, as well as a pointer to
the innermost exception handler record.
These three pieces of information are stored in a per-thread information block, known as the TIB. The fiber switching APIs update the TIB for you, which is why Microsoft recommends their use.
The TIB
The TIB is defined by the NT_TIB
struct in winnt.h
:
typedef struct _NT_TIB {
struct _EXCEPTION_REGISTRATION_RECORD *ExceptionList;
PVOID StackBase;
PVOID StackLimit;
PVOID SubSystemTib;
_ANONYMOUS_UNION union {
PVOID FiberData;
DWORD Version;
} DUMMYUNIONNAME;
PVOID ArbitraryUserPointer;
struct _NT_TIB *Self;
} NT_TIB,*PNT_TIB;
The relevant fields that must be updated when switching call stacks are
ExceptionList
, StackBase
and StackLimit
.
Note that since the x86 call stack grows down, StackLimit
is the start
of the call stack’s memory region and StackBase
is the end.
Structured exception handlers
Structured exception handlers are stored in a linked list on the call
stack, with the head of the list pointed at by the ExceptionList
field
of the TIB:
struct _EXCEPTION_REGISTRATION_RECORD
{
PEXCEPTION_REGISTRATION_RECORD Next;
PEXCEPTION_DISPOSITION Handler;
} EXCEPTION_REGISTRATION_RECORD, *PEXCEPTION_REGISTRATION_RECORD;
The exception list should be saved and restored when switching between
call stacks, and a new call stack should begin with an empty list
(ExceptionList
set to NULL
).
If the ExceptionList
field of the TIB or the Next
field of an
exception record point outside the call stack, then the handler in
question will not be called at all.
Accessing the TIB
On x86-32, the current thread’s TIB is stored starting at address 0 in
the segment pointed at by the FS segment register. The Self
field
always points at the struct itself. Since Windows uses flat addressing,
the segment storing the TIB begins somewhere in the linear 32-bit
address space, so an assembly routine to fetch the TIB and return a
pointer to it in EAX looks like this:
fetch_tib:
mov eax,[fs:24]
ret
This assembly routine can then be called from C code, which can manipulate the TIB like any other structure.
Sample code for updating Windows-specific structures
The basis/cpu/x86/32/winnt/bootstrap.factor file defines assembly subroutines for updating the TIB when switching call stacks. These routines are invoked by the x86-32 non-optimizing compiler backend: