Advanced floating point features: exceptions, rounding modes, denormals, unordered compares, and more
Saturday, September 12, 2009
Factor now has a nice library for introspecting and modifying the floating point environment. Joe Groff implemented most of it, and I helped out with debugging and additional floating point comparison operations. All of these features are part of the IEEE floating point specification and are implemented on all modern CPUs; however, few programming languages expose a nice interface for working with them. C compilers typically provide hard-to-use low-level intrinsics, and other languages tend not to bother at all. Two exceptions are the SBCL compiler and the D language.
The new functionality is mostly contained in the math.floats.env vocabulary, with a few new words in math for good measure. The new code is in the repository but it is not completely stable yet; there are still some issues we need to work out on the more obscure platforms. To follow along with the examples below, you’ll need to get a git checkout from the master branch and load the vocabulary in your listener:
USE: math.floats.env
The first two features, floating point exceptions and traps, are useful for debugging numerical algorithms and detecting potentially undesirable situations (NaNs appearing, underflow, overflow, etc).
Floating point exceptions
One of the first things people learn about floating point is that it has “special” values: positive and negative infinity, and not-a-number (NaN) values. These appear as the results of computations where the answer is undefined (division by zero, square root of -1, etc) or the answer is too small or large to be represented as a float (2 to the power of 10000, etc). A less widely-known fact is that when a special value is computed, “exception flags” are set in a hardware register which can be read back in. Most languages do not offer any way to access this functionality.
In Factor, exception flags can be read using the collect-fp-exceptions
combinator, which first clears the flags, calls a quotation, then
outputs any flags which were set. For example, division by zero sets the
division by zero exception flag and returns infinity:
( scratchpad ) [ 1.0 0.0 / ] collect-fp-exceptions . .
{ +fp-zero-divide+ }
1/0.
Dividing 1 by 3 sets the inexact flag, because the result (0.333….) cannot be represented as a float:
( scratchpad ) [ 1.0 3.0 / ] collect-fp-exceptions . .
{ +fp-inexact+ }
0.3333333333333333
The fact that 1/3 does not have a terminating decimal or binary expansion is well known. However, one thing that many beginners find surprising is that some numbers which have terminating decimal expansions nevertheless cannot be represented precisely as floats, because they do not terminate in binary (one classic case is 1.0 - 0.9 - 0.1 != 0.0):
( scratchpad ) [ 4.0 10.0 / ] collect-fp-exceptions . .
{ +fp-inexact+ }
0.4
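The same surprise is easy to reproduce in any language with IEEE doubles. Here is a quick Python sketch (not Factor) of the 1.0 - 0.9 - 0.1 case, using float.hex to show that 0.1 and 0.9 are stored as rounded binary values:

```python
# Neither 0.1 nor 0.9 terminates in binary, so each is stored rounded;
# the hex form shows the rounded mantissa directly.
print((0.1).hex())  # 0x1.999999999999ap-4
print((0.9).hex())  # 0x1.ccccccccccccdp-1

# The rounding errors do not cancel, so the classic identity fails:
print(1.0 - 0.9 - 0.1 == 0.0)  # False
print(1.0 - 0.9 - 0.1)         # a tiny nonzero residue
```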
Raising a number to a power that is too large sets both the inexact and overflow flags, and returns infinity:
( scratchpad ) [ 2.0 10000.0 ^ ] collect-fp-exceptions . .
{ +fp-inexact+ +fp-overflow+ }
1/0.
The square root of 4 is an exact value; no exceptions are set:
( scratchpad ) [ 4.0 sqrt ] collect-fp-exceptions . .
{ }
2.0
The square root of 2, on the other hand, is not exact:
( scratchpad ) [ 2.0 sqrt ] collect-fp-exceptions . .
{ +fp-inexact+ }
1.414213562373095
Factor supports complex numbers, so taking the square root of -1 returns an exact value and does not set any exceptions:
( scratchpad ) [ -1.0 sqrt ] collect-fp-exceptions . .
{ }
C{ 0.0 1.0 }
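Python draws a similar line between its numeric tower and raw float operations, which makes for a handy comparison (this is Python, not Factor): cmath.sqrt returns an exact complex result for -1, while the floats-only math.sqrt surfaces the invalid operation as a language-level error rather than a quiet NaN.

```python
import cmath
import math

# cmath.sqrt is defined over the complex plane, so sqrt(-1) is exact:
print(cmath.sqrt(-1.0))  # 1j

# math.sqrt operates on floats only; the invalid operation surfaces
# as a ValueError instead of silently producing a NaN:
try:
    math.sqrt(-1.0)
except ValueError as e:
    print("math domain error:", e)
```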
However, we can observe the invalid operation exception flag being set if we call the internal fsqrt word, which operates on floats only and either calls the libc function or uses the SQRTSD instruction on SSE2:
( scratchpad ) USE: math.libm [ -1.0 fsqrt ] collect-fp-exceptions . .
{ +fp-invalid-operation+ }
NAN: 8000000000000
I describe the new NAN: syntax later in this post.
Signaling traps
Being able to inspect floating point exceptions set after a piece of code runs is all well and good, but what if you have a tricky underflow bug, or a NaN is popping up somewhere, and you want to know exactly where? In this case it is possible to set a flag in the FPU’s control register which triggers a trap when an exception is raised. This trap is delivered to the Factor process as a signal (Unix), Mach exception (Mac OS X), or SEH exception (Windows). Factor then throws it as an exception which can be caught using any of Factor’s error handling words, or just left unhandled in which case it will bubble up to the listener.
The with-fp-traps combinator takes a list of traps and runs a quotation with those traps enabled; when the quotation completes (or throws an error), the former FPU state is restored (indeed it has to be this way, since running the Factor UI’s rendering code with traps enabled quickly kills it). The all-fp-exceptions word is equivalent to specifying { +fp-invalid-operation+ +fp-overflow+ +fp-underflow+ +fp-zero-divide+ +fp-inexact+ }.
Here is an example:
( scratchpad ) all-fp-exceptions [ 0.0 0.0 / ] with-fp-traps
Floating point trap
Without the combinator wrapped around it, 0.0 0.0 / simply returns a NaN value without throwing anything.
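Different languages pick different defaults here. As a point of contrast (this is Python, not Factor), CPython effectively runs with a divide-by-zero trap permanently enabled for float division, while other invalid operations still produce a quiet NaN with no fuss:

```python
import math

inf = float("inf")

# Python turns float division by zero into an exception up front,
# rather than returning an infinity or a NaN:
try:
    1.0 / 0.0
except ZeroDivisionError:
    print("trapped: 1.0 / 0.0")

# But other invalid operations, like inf - inf, quietly produce NaN:
x = inf - inf
print(math.isnan(x))  # True
```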
Rounding modes
Unlike exceptions and traps, which do not change the result of a computation but merely set status flags (or interrupt it), the next two features, the rounding mode and the denormal mode, actually change the results of computations. As with exceptions and traps, they are implemented as scoped combinators rather than global state changes, ensuring that code using these features is ‘safe’ and cannot change the floating point state of surrounding code.
If a floating point operation produces an inexact result, there is the question of how the result will be rounded to a value representable as a float. There are four rounding modes in IEEE floating point:
+round-nearest+ (round to nearest, ties to even; the default)
+round-down+ (round toward negative infinity)
+round-up+ (round toward positive infinity)
+round-zero+ (round toward zero, i.e. truncate)
Here is an example of an inexact computation performed with two different rounding modes, the default (+round-nearest+) and +round-up+:
( scratchpad ) 1.0 3.0 / .
0.3333333333333333
( scratchpad ) +round-up+ [ 1.0 3.0 / ] with-rounding-mode .
0.3333333333333334
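Python’s binary floats have no portable stdlib hook for the hardware rounding mode, but the decimal module exposes the same idea: a scoped context whose rounding mode decides which way the last retained digit of an inexact result goes. A rough analogue of the example above:

```python
from decimal import ROUND_CEILING, ROUND_FLOOR, Decimal, localcontext

# 1/3 is inexact in decimal too; the direction of the last retained
# digit depends on the rounding mode active in the enclosing context.
with localcontext() as ctx:
    ctx.rounding = ROUND_FLOOR      # analogous to +round-down+
    down = Decimal(1) / Decimal(3)

with localcontext() as ctx:
    ctx.rounding = ROUND_CEILING    # analogous to +round-up+
    up = Decimal(1) / Decimal(3)

print(down)       # ...3333, last digit rounded down
print(up)         # ...3334, last digit rounded up
print(up > down)  # True
```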
Denormals
Denormal numbers are numbers where the exponent consists of zero bits (the minimum value) but the mantissa is not all zeros. Denormal numbers are undesirable because they have lower precision than normal floats, and on some CPUs computations with denormals are less efficient than with normals. IEEE floating point supports two denormal modes: you can elect to have denormals “flush” to zero (+denormal-flush+), or you can “keep” denormals (+denormal-keep+). The latter is the default:
( scratchpad ) +denormal-flush+ [ 51 2^ bits>double 0.0 + ] with-denormal-mode .
0.0
( scratchpad ) 51 2^ bits>double 0.0 + .
1.112536929253601e-308
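The same denormal value can be poked at from Python with the struct module: reinterpret the bit pattern 2^51 as a double, as in the listing above, and note that it sits below the smallest normal double. A sketch:

```python
import struct
import sys

# Bit pattern with all exponent bits zero and mantissa bit 51 set:
# a denormal, equal to 2**-1023, below the smallest normal double.
denormal = struct.unpack("<d", struct.pack("<Q", 1 << 51))[0]
print(denormal)                       # 1.1125369292536007e-308
print(denormal < sys.float_info.min)  # True

# The smallest denormal is 2**-1074; halving it rounds to zero, which
# is exactly what a flush-to-zero mode does for the whole denormal range.
smallest = 2.0 ** -1074
print(smallest)        # 5e-324
print(smallest / 2.0)  # 0.0
```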
Ordered and unordered comparisons
In math, for any two numbers a and b, one of the following three properties holds:
- a < b
- a = b
- a > b
In floating point, there is a fourth possibility: a and b are unordered. This occurs if one of the two values is a NaN. The unordered? predicate tests for this possibility:
( scratchpad ) NAN: 8000000000000 1.0 unordered? .
t
If an ordered comparison word such as < or >= is called with two values which are unordered, it returns f and sets the +fp-invalid-operation+ exception:
( scratchpad ) NAN: 8000000000000 1.0 [ < ] collect-fp-exceptions . .
{ +fp-invalid-operation+ }
f
If traps are enabled this will throw an error:
( scratchpad ) NAN: 8000000000000 1.0 { +fp-invalid-operation+ } [ < ] with-fp-traps
Floating point trap
If your numerical algorithm has a legitimate use for NaNs, and you wish to run it with traps enabled, and have certain comparisons not signal traps when inputs are NaNs, you can use unordered comparisons in those cases instead:
( scratchpad ) NAN: 8000000000000 1.0 [ u< ] collect-fp-exceptions . .
{ }
f
Unordered versions of all the comparisons are now defined: u<, u<=, u>, and u>=. Equality of numbers is always unordered, so it does not raise traps if one of the inputs is a NaN. In particular, if both inputs are NaNs, equality always returns f:
( scratchpad ) NAN: 8000000000000 dup [ number= ] collect-fp-exceptions . .
{ }
f
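Python’s comparison operators behave like these unordered, quiet variants: any ordered comparison against a NaN quietly answers False, and equality involving NaN is always False, even against itself. A quick sketch:

```python
import math

nan = float("nan")

# Ordered comparisons against NaN answer False without signaling:
print(nan < 1.0)   # False
print(nan >= 1.0)  # False

# Equality with NaN is always False, even reflexively, so testing
# for NaN has to go through math.isnan instead of ==:
print(nan == nan)       # False
print(nan != nan)       # True
print(math.isnan(nan))  # True
```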
Half-precision floats
Everyone and their drunk buddy knows about IEEE single-precision (32-bit) and double-precision (64-bit) floats; IEEE also defines half-precision 16-bit floats. These are not used nearly as much; they come up in graphics programming, for example, since GPUs use them for certain calculations with color components where you don’t need more accuracy. The half-floats vocabulary provides some support for working with half-floats. It defines a pair of words for converting Factor’s double-precision floats to and from half-floats, as well as C type support for passing half-floats to C functions via FFI, and support for building packed arrays of half-floats for passing to the GPU.
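The precision cost of half-floats is easy to see from Python, whose struct module has an 'e' format for IEEE binary16 (the helper name below is just for illustration): pack a double down to 16 bits and back, and compare.

```python
import struct

def to_half_and_back(x):
    """Illustrative helper: round-trip a float through half precision."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

third = 1.0 / 3.0
half_third = to_half_and_back(third)

# Half floats keep only 10 mantissa bits, so 1/3 loses precision
# around the fourth decimal place:
print(third)       # 0.3333333333333333
print(half_third)  # roughly 0.33325

# Small dyadic fractions survive the round trip exactly:
print(to_half_and_back(1.5) == 1.5)  # True
```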
Literal syntax for NaNs and hexadecimal floats
You may have noticed the funny NAN: syntax above. Previously, all NaN values would print as 0/0., but this is inaccurate since not all NaNs are created equal: because of how IEEE floating point works, a value is a NaN if the exponent consists of all ones, leaving the mantissa unspecified. The mantissa is known as the “NaN payload” in this case. NaNs now print out, and can be parsed back in, using a syntax that makes the payload explicit. A NaN can also be constructed with an arbitrary payload using the <fp-nan> word:
( scratchpad ) HEX: deadbeef <fp-nan> .
NAN: deadbeef
The old 0/0. syntax still works; it is shorthand for NAN: 8000000000000, the canonical “quiet” NaN.
Some operations produce NaNs with different payloads:
( scratchpad ) USE: math.libm
( scratchpad ) 2.0 facos .
NAN: 8000000000022
In general, there is very little you can do with the NaN payload.
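The same payload games can be played from Python with the struct module (the helper names below are made up for illustration): set all exponent bits, stash a payload in the mantissa, and read it back. This sketch keeps the quiet bit (bit 51) set, so the value round-trips safely without being a signaling NaN:

```python
import math
import struct

QUIET_BIT = 1 << 51            # top mantissa bit: the "quiet" flag
MANTISSA_MASK = (1 << 52) - 1

def make_nan(payload):
    """Illustrative helper: build a quiet NaN carrying `payload`."""
    bits = (0x7FF << 52) | QUIET_BIT | (payload & MANTISSA_MASK)
    return struct.unpack("<d", struct.pack("<Q", bits))[0]

def nan_payload(x):
    """Illustrative helper: extract the mantissa bits of a NaN."""
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    return bits & MANTISSA_MASK

x = make_nan(0xDEADBEEF)
print(math.isnan(x))        # True
print(hex(nan_payload(x)))  # the quiet bit plus 0xdeadbeef
```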
A more useful feature is hexadecimal float literals. When reading a float from a decimal string, or printing a float to a decimal string, there is sometimes ambiguity due to rounding. No such problem exists with hexadecimal floats.
An example of printing a number as a decimal and a hexadecimal float:
( scratchpad ) USE: math.constants
( scratchpad ) pi .
3.141592653589793
( scratchpad ) pi .h
1.921fb54442d18p1
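Python grew the same facility as of 2.6: float.hex and float.fromhex give an exact, round-trippable textual form, and the mantissa digits match Factor’s .h output above:

```python
import math

# Exact hexadecimal form of pi; the mantissa digits 921fb54442d18
# are the same ones printed in the listener session above.
print(math.pi.hex())  # 0x1.921fb54442d18p+1

# The round trip through text is exact, with no decimal rounding:
print(float.fromhex(math.pi.hex()) == math.pi)  # True
print(float.fromhex("0x1.921fb54442d18p+1"))    # 3.141592653589793
```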
Java supports hexadecimal float literals as of Java 1.5. Hats off to the Java designers for adding this! It would be nice if they would add the rest of the IEEE floating point functionality in Java 7.
Signed zero
Unlike twos-complement integer arithmetic, IEEE floating point has both positive and negative zero. Negative zero appears as the result of computations on very small negative numbers that underflowed. It also has applications in complex analysis, because it allows a choice of branch cut to be made. Factor’s abs word used to be implemented incorrectly on floats: it would check if the input was less than zero, and if so multiply it by negative one. This was a problem because negative zero is not less than zero, and so the absolute value of negative zero would be reported as negative zero. The correct implementation of the absolute value function on floats is to simply clear the sign bit. It works properly now:
( scratchpad ) -0.0 abs .
0.0
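The bug is easy to reproduce in any language with IEEE doubles. Here is a Python sketch of the wrong and right implementations (naive_abs and sign_bit_abs are made-up names, standing in for the old and new behavior):

```python
import math
import struct

def naive_abs(x):
    """The buggy scheme: negate only when x compares less than zero."""
    return -x if x < 0 else x

def sign_bit_abs(x):
    """The correct scheme: clear the sign bit, no comparison involved."""
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    return struct.unpack("<d", struct.pack("<Q", bits & (2**63 - 1)))[0]

# -0.0 is not less than zero, so the naive version leaves it negative;
# copysign exposes the sign that -0.0 == 0.0 would otherwise hide.
print(math.copysign(1.0, naive_abs(-0.0)))     # -1.0
print(math.copysign(1.0, sign_bit_abs(-0.0)))  # 1.0
```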
Implementation
The implementation of the above features consists of several parts:
- Cross-platform Factor code in the math.floats.env vocabulary implementing the high-level API
- Assembly code in vm/cpu-x86.32.S, vm/cpu-x86.64.S, and vm/cpu-ppc.S to read and write x87, SSE2 and PowerPC FPU control registers
- Low-level code in math.floats.env.x86 and math.floats.env.ppc which implements the high-level API in terms of the assembly functions, by calling them via Factor’s FFI and parsing control registers into a cross-platform representation in terms of Factor symbols
- Miscellaneous words for taking floats apart into their bitwise representation in the math vocabulary
- Compiler support for ordered and unordered floating point compare instructions in compiler.cfg.instructions
- CPU-specific code generation for ordered and unordered floating point compare instructions in cpu.x86 and cpu.ppc