Advanced floating point features: exceptions, rounding modes, denormals, unordered compares, and more
Saturday, September 12, 2009
Factor now has a nice library for introspecting and modifying the floating point environment. Joe Groff implemented most of it, and I helped out with debugging and additional floating point comparison operations. All of these features are part of the IEEE floating point specification and are implemented on all modern CPUs; however, few programming languages expose a nice interface for working with them. C compilers typically provide hard-to-use low-level intrinsics, and other languages tend not to bother at all. Two exceptions are the SBCL compiler and the D language.
The new functionality is mostly contained in the math.floats.env vocabulary, with a few new words in math for good measure. The new code is in the repository but it is not completely stable yet; there are still some issues we need to work out on the more obscure platforms. To follow along with the examples below, you’ll need to get a git checkout from the master branch and load the vocabulary in your listener:
USE: math.floats.env
The first two features, floating point exceptions and traps, are useful for debugging numerical algorithms and detecting potentially undesirable situations (NaNs appearing, underflow, overflow, etc).
Floating point exceptions
One of the first things people learn about floating point is that it has “special” values: positive and negative infinity, and not-a-number (NaN) values. These appear as the results of computations where the answer is undefined (division by zero, square root of -1, etc) or the answer is too small or large to be represented as a float (2 to the power of 10000, etc). A less widely-known fact is that when a special value is computed, “exception flags” are set in a hardware register which can be read back in. Most languages do not offer any way to access this functionality.
In Factor, exception flags can be read using the collect-fp-exceptions
combinator, which first clears the flags, calls a quotation, then
outputs any flags which were set. For example, division by zero sets the
division by zero exception flag and returns infinity:
( scratchpad ) [ 1.0 0.0 / ] collect-fp-exceptions . .
{ +fp-zero-divide+ }
1/0.
Dividing 1 by 3 sets the inexact flag, because the result (0.333….) cannot be represented as a float:
( scratchpad ) [ 1.0 3.0 / ] collect-fp-exceptions . .
{ +fp-inexact+ }
0.3333333333333333
The fact that 1/3 does not have a terminating decimal or binary expansion is well known. However, one thing that many beginners find surprising is that some numbers which have terminating decimal expansions nevertheless cannot be represented precisely as floats, because they do not terminate in binary (one classic case is 1.0 - 0.9 - 0.1 != 0.0):
( scratchpad ) [ 4.0 10.0 / ] collect-fp-exceptions . .
{ +fp-inexact+ }
0.4
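The same surprise is easy to reproduce in any language with IEEE doubles. Here is a quick Python sketch (not Factor) of the 1.0 - 0.9 - 0.1 case, using float.hex to show that 0.1 and 0.9 are stored as rounded binary values:

```python
# Neither 0.1 nor 0.9 terminates in binary, so each is stored rounded;
# the hex form shows the rounded mantissa directly.
print((0.1).hex())  # 0x1.999999999999ap-4
print((0.9).hex())  # 0x1.ccccccccccccdp-1

# The rounding errors do not cancel, so the classic identity fails:
print(1.0 - 0.9 - 0.1 == 0.0)  # False
print(1.0 - 0.9 - 0.1)         # a tiny nonzero residue
```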
Raising a number to a power that is too large sets both the inexact and overflow flags, and returns infinity:
( scratchpad ) [ 2.0 10000.0 ^ ] collect-fp-exceptions . .
{ +fp-inexact+ +fp-overflow+ }
1/0.
The square root of 4 is an exact value; no exceptions are set:
( scratchpad ) [ 4.0 sqrt ] collect-fp-exceptions . .
{ }
2.0
The square root of 2, on the other hand, is not exact:
( scratchpad ) [ 2.0 sqrt ] collect-fp-exceptions . .
{ +fp-inexact+ }
1.414213562373095
Factor supports complex numbers, so taking the square root of -1 returns an exact value and does not set any exceptions:
( scratchpad ) [ -1.0 sqrt ] collect-fp-exceptions . .
{ }
C{ 0.0 1.0 }
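Python draws a similar line between its numeric tower and raw float operations, which makes for a handy comparison (this is Python, not Factor): cmath.sqrt returns an exact complex result for -1, while the floats-only math.sqrt surfaces the invalid operation as a language-level error rather than a quiet NaN.

```python
import cmath
import math

# cmath.sqrt is defined over the complex plane, so sqrt(-1) is exact:
print(cmath.sqrt(-1.0))  # 1j

# math.sqrt operates on floats only; the invalid operation surfaces
# as a ValueError instead of silently producing a NaN:
try:
    math.sqrt(-1.0)
except ValueError as e:
    print("math domain error:", e)
```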
However, we can observe the invalid operation exception flag being set if we call the internal fsqrt word, which operates on floats only and either calls the libc function or uses the SQRTSD instruction on SSE2:
( scratchpad ) USE: math.libm [ -1.0 fsqrt ] collect-fp-exceptions . .
{ +fp-invalid-operation+ }
NAN: 8000000000000
I describe the new NAN: syntax later in this post.
Signaling traps
Being able to inspect floating point exceptions set after a piece of code runs is all well and good, but what if you have a tricky underflow bug, or a NaN is popping up somewhere, and you want to know exactly where? In this case it is possible to set a flag in the FPU’s control register which triggers a trap when an exception is raised. This trap is delivered to the Factor process as a signal (Unix), Mach exception (Mac OS X), or SEH exception (Windows). Factor then throws it as an exception which can be caught using any of Factor’s error handling words, or just left unhandled in which case it will bubble up to the listener.
The with-fp-traps combinator takes a list of traps and runs a quotation with those traps enabled; when the quotation completes (or throws an error), the former FPU state is restored (indeed it has to be this way, since running the Factor UI’s rendering code with traps enabled quickly kills it). The all-fp-exceptions word is equivalent to specifying { +fp-invalid-operation+ +fp-overflow+ +fp-underflow+ +fp-zero-divide+ +fp-inexact+ }.
Here is an example:
( scratchpad ) all-fp-exceptions [ 0.0 0.0 / ] with-fp-traps
Floating point trap
Without the combinator wrapped around it, 0.0 0.0 / simply returns a NaN value without throwing anything.
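Different languages pick different defaults here. As a point of contrast (this is Python, not Factor), CPython effectively runs with a divide-by-zero trap permanently enabled for float division, while other invalid operations still produce a quiet NaN with no fuss:

```python
import math

inf = float("inf")

# Python turns float division by zero into an exception up front,
# rather than returning an infinity or a NaN:
try:
    1.0 / 0.0
except ZeroDivisionError:
    print("trapped: 1.0 / 0.0")

# But other invalid operations, like inf - inf, quietly produce NaN:
x = inf - inf
print(math.isnan(x))  # True
```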
Rounding modes
Unlike exceptions and traps, which do not change the result of a computation but merely set status flags (or interrupt it), the next two features, the rounding mode and the denormal mode, actually change the results of computations. As with exceptions and traps, they are implemented as scoped combinators rather than global state changes, ensuring that code using these features is ‘safe’ and cannot change the floating point state of surrounding code.
If a floating point operation produces an inexact result, there is the question of how the result will be rounded to a value representable as a float. There are four rounding modes in IEEE floating point:
+round-nearest+ (round to nearest, ties to even; the default)
+round-down+ (round toward negative infinity)
+round-up+ (round toward positive infinity)
+round-zero+ (round toward zero, i.e. truncate)
Here is an example of an inexact computation performed with two different rounding modes, the default (+round-nearest+) and +round-up+:
( scratchpad ) 1.0 3.0 / .
0.3333333333333333
( scratchpad ) +round-up+ [ 1.0 3.0 / ] with-rounding-mode .
0.3333333333333334
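Python’s binary floats have no portable stdlib hook for the hardware rounding mode, but the decimal module exposes the same idea: a scoped context whose rounding mode decides which way the last retained digit of an inexact result goes. A rough analogue of the example above:

```python
from decimal import ROUND_CEILING, ROUND_FLOOR, Decimal, localcontext

# 1/3 is inexact in decimal too; the direction of the last retained
# digit depends on the rounding mode active in the enclosing context.
with localcontext() as ctx:
    ctx.rounding = ROUND_FLOOR      # analogous to +round-down+
    down = Decimal(1) / Decimal(3)

with localcontext() as ctx:
    ctx.rounding = ROUND_CEILING    # analogous to +round-up+
    up = Decimal(1) / Decimal(3)

print(down)       # ...3333, last digit rounded down
print(up)         # ...3334, last digit rounded up
print(up > down)  # True
```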
Denormals
Denormal numbers are numbers where the exponent consists of zero bits (the minimum value) but the mantissa is not all zeros. Denormal numbers are undesirable because they have lower precision than normal floats, and on some CPUs computations with denormals are less efficient than with normals. IEEE floating point supports two denormal modes: you can elect to have denormals “flush” to zero (+denormal-flush+), or you can “keep” denormals (+denormal-keep+). The latter is the default:
( scratchpad ) +denormal-flush+ [ 51 2^ bits>double 0.0 + ] with-denormal-mode .
0.0
( scratchpad ) 51 2^ bits>double 0.0 + .
1.112536929253601e-308
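The same denormal value can be poked at from Python with the struct module: reinterpret the bit pattern 2^51 as a double, as in the listing above, and note that it sits below the smallest normal double. A sketch:

```python
import struct
import sys

# Bit pattern with all exponent bits zero and mantissa bit 51 set:
# a denormal, equal to 2**-1023, below the smallest normal double.
denormal = struct.unpack("<d", struct.pack("<Q", 1 << 51))[0]
print(denormal)                       # 1.1125369292536007e-308
print(denormal < sys.float_info.min)  # True

# The smallest denormal is 2**-1074; halving it rounds to zero, which
# is exactly what a flush-to-zero mode does for the whole denormal range.
smallest = 2.0 ** -1074
print(smallest)        # 5e-324
print(smallest / 2.0)  # 0.0
```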
Ordered and unordered comparisons
In math, for any two numbers a and b, one of the following three properties holds:
- a < b
- a = b
- a > b
In floating point, there is a fourth possibility: a and b are unordered. This occurs if one of the two values is a NaN. The unordered? predicate tests for this possibility:
( scratchpad ) NAN: 8000000000000 1.0 unordered? .
t
If an ordered comparison word such as < or >= is called with two values which are unordered, it returns f and sets the +fp-invalid-operation+ exception:
( scratchpad ) NAN: 8000000000000 1.0 [ < ] collect-fp-exceptions . .
{ +fp-invalid-operation+ }
f
If traps are enabled this will throw an error:
( scratchpad ) NAN: 8000000000000 1.0 { +fp-invalid-operation+ } [ < ] with-fp-traps
Floating point trap
If your numerical algorithm has a legitimate use for NaNs, and you wish to run it with traps enabled, and have certain comparisons not signal traps when inputs are NaNs, you can use unordered comparisons in those cases instead:
( scratchpad ) NAN: 8000000000000 1.0 [ u< ] collect-fp-exceptions . .
{ }
f
Unordered versions of all the comparisons are now defined: u<, u<=, u>, and u>=. Equality of numbers is always unordered, so it does not raise traps if one of the inputs is a NaN. In particular, if both inputs are NaNs, equality always returns f:
( scratchpad ) NAN: 8000000000000 dup [ number= ] collect-fp-exceptions . .
{ }
f
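Python’s comparison operators behave like these unordered, quiet variants: any ordered comparison against a NaN quietly answers False, and equality involving NaN is always False, even against itself. A quick sketch:

```python
import math

nan = float("nan")

# Ordered comparisons against NaN answer False without signaling:
print(nan < 1.0)   # False
print(nan >= 1.0)  # False

# Equality with NaN is always False, even reflexively, so testing
# for NaN has to go through math.isnan instead of ==:
print(nan == nan)       # False
print(nan != nan)       # True
print(math.isnan(nan))  # True
```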
Half-precision floats
Everyone and their drunk buddy knows about IEEE single-precision (32-bit) and double-precision (64-bit) floats; IEEE also defines half-precision 16-bit floats. These are not used nearly as much; they come up in graphics programming, for example, since GPUs use them for certain calculations with color components where you don’t need more accuracy. The half-floats vocabulary provides some support for working with half-floats. It defines a pair of words for converting Factor’s double-precision floats to and from half-floats, as well as C type support for passing half-floats to C functions via FFI, and support for building packed arrays of half-floats for passing to the GPU.
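The precision cost of half-floats is easy to see from Python, whose struct module has an 'e' format for IEEE binary16 (the helper name below is just for illustration): pack a double down to 16 bits and back, and compare.

```python
import struct

def to_half_and_back(x):
    """Illustrative helper: round-trip a float through half precision."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

third = 1.0 / 3.0
half_third = to_half_and_back(third)

# Half floats keep only 10 mantissa bits, so 1/3 loses precision
# around the fourth decimal place:
print(third)       # 0.3333333333333333
print(half_third)  # roughly 0.33325

# Small dyadic fractions survive the round trip exactly:
print(to_half_and_back(1.5) == 1.5)  # True
```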
Literal syntax for NaNs and hexadecimal floats
You may have noticed the funny NAN: syntax above. Previously, all NaN values would print as 0/0., but this is inaccurate since not all NaNs are created equal: because of how IEEE floating point works, a value is a NaN if the exponent consists of all ones, leaving the mantissa unspecified. The mantissa is known as the “NaN payload” in this case. NaNs now print out, and can be parsed back in, using a syntax that makes the payload explicit. A NaN can also be constructed with an arbitrary payload using the <fp-nan> word:
( scratchpad ) HEX: deadbeef <fp-nan> .
NAN: deadbeef
The old 0/0. syntax still works; it is shorthand for NAN: 8000000000000, the canonical “quiet” NaN.
Some operations produce NaNs with different payloads:
( scratchpad ) USE: math.libm
( scratchpad ) 2.0 facos .
NAN: 8000000000022
In general, there is very little you can do with the NaN payload.
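The same payload games can be played from Python with the struct module (the helper names below are made up for illustration): set all exponent bits, stash a payload in the mantissa, and read it back. This sketch keeps the quiet bit (bit 51) set, so the value round-trips safely without being a signaling NaN:

```python
import math
import struct

QUIET_BIT = 1 << 51            # top mantissa bit: the "quiet" flag
MANTISSA_MASK = (1 << 52) - 1

def make_nan(payload):
    """Illustrative helper: build a quiet NaN carrying `payload`."""
    bits = (0x7FF << 52) | QUIET_BIT | (payload & MANTISSA_MASK)
    return struct.unpack("<d", struct.pack("<Q", bits))[0]

def nan_payload(x):
    """Illustrative helper: extract the mantissa bits of a NaN."""
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    return bits & MANTISSA_MASK

x = make_nan(0xDEADBEEF)
print(math.isnan(x))        # True
print(hex(nan_payload(x)))  # the quiet bit plus 0xdeadbeef
```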
A more useful feature is hexadecimal float literals. When reading a float from a decimal string, or printing a float to a decimal string, there is sometimes ambiguity due to rounding. No such problem exists with hexadecimal floats.
An example of printing a number as a decimal and a hexadecimal float:
( scratchpad ) USE: math.constants
( scratchpad ) pi .
3.141592653589793
( scratchpad ) pi .h
1.921fb54442d18p1
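Python grew the same facility as of 2.6: float.hex and float.fromhex give an exact, round-trippable textual form, and the mantissa digits match Factor’s .h output above:

```python
import math

# Exact hexadecimal form of pi; the mantissa digits 921fb54442d18
# are the same ones printed in the listener session above.
print(math.pi.hex())  # 0x1.921fb54442d18p+1

# The round trip through text is exact, with no decimal rounding:
print(float.fromhex(math.pi.hex()) == math.pi)  # True
print(float.fromhex("0x1.921fb54442d18p+1"))    # 3.141592653589793
```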
Java supports hexadecimal float literals as of Java 1.5. Hats off to the Java designers for adding this! It would be nice if they would add the rest of the IEEE floating point functionality in Java 7.
Signed zero
Unlike twos-complement integer arithmetic, IEEE floating point has both positive and negative zero. Negative zero appears as the result of computations on very small negative numbers that underflowed. It also has applications in complex analysis, because it allows a choice of branch cut to be made. Factor’s abs word used to be implemented incorrectly on floats: it would check if the input was less than zero, and if so multiply it by negative one. This was a problem because negative zero is not less than zero, and so the absolute value of negative zero would be reported as negative zero. The correct implementation of the absolute value function on floats is to simply clear the sign bit. It works properly now:
( scratchpad ) -0.0 abs .
0.0
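The bug is easy to reproduce in any language with IEEE doubles. Here is a Python sketch of the wrong and right implementations (naive_abs and sign_bit_abs are made-up names, standing in for the old and new behavior):

```python
import math
import struct

def naive_abs(x):
    """The buggy scheme: negate only when x compares less than zero."""
    return -x if x < 0 else x

def sign_bit_abs(x):
    """The correct scheme: clear the sign bit, no comparison involved."""
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    return struct.unpack("<d", struct.pack("<Q", bits & (2**63 - 1)))[0]

# -0.0 is not less than zero, so the naive version leaves it negative;
# copysign exposes the sign that -0.0 == 0.0 would otherwise hide.
print(math.copysign(1.0, naive_abs(-0.0)))     # -1.0
print(math.copysign(1.0, sign_bit_abs(-0.0)))  # 1.0
```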
Implementation
The implementation of the above features consists of several parts:
- Cross-platform Factor code in the math.floats.env vocabulary implementing the high-level API
- Assembly code in vm/cpu-x86.32.S, vm/cpu-x86.64.S, and vm/cpu-ppc.S to read and write x87, SSE2 and PowerPC FPU control registers
- Low-level code in math.floats.env.x86 and math.floats.env.ppc which implements the high-level API in terms of the assembly functions, by calling them via Factor’s FFI and parsing control registers into a cross-platform representation in terms of Factor symbols
- Miscellaneous words for taking floats apart into their bitwise representation in the math vocabulary
- Compiler support for ordered and unordered floating point compare instructions in compiler.cfg.instructions
- CPU-specific code generation for ordered and unordered floating point compare instructions in cpu.x86 and cpu.ppc