On floating point determinism


There was recently a small discussion on Twitter about floating point determinism. It started off as a C# discussion, but then drifted into out-of-context quotes coming from multiple platforms and languages, which added confusion; and it’s hard to explain all of this within a 140 character limit.

So, the issue I’m addressing here… is floating point deterministic?

Floating point standards

First we have to distinguish between floating point, its standard, and its implementation. There are many standards (and non-standards; for example, the Playstation 2 did not produce IEEE 754 compliant floating point math at all), but the most common one is IEEE 754, so I will talk only about that one.

In theory, all implementations strictly conforming to the standard should be deterministic even across different platforms*. However, in practice this is not true at all. The causes range from simple problems, like a CPU not really conforming to the standard, to deeper, hard-to-solve problems inherent to how floating point numbers work.

*Update: Not even in theory. (thanks Bruce!)

Floating point is not associative… nor anything

Floating point numbers are not real numbers. This means that the following two formulas can yield slightly different results:

a + (b + c)  !=  (a + b) + c

Floating point will be deterministic if you always do (a + b) + c on all your platforms, or if you do a + (b + c) on all of them. But as soon as you start to mix them, all hell breaks loose.

For C & C++ programs, this deviation may happen because you used a different compiler, a new version of the same compiler, different compile settings (e.g. turning optimizations on/off), or a compiler for a different architecture, which has a different instruction set and is optimized in a different way.

For example, a + b + c + d can be optimized by a compiler to perform “(a + b) + (c + d)” unless it’s on strict or precise floating point settings (fast-math style options such as GCC’s -ffast-math or MSVC’s /fp:fast are what allow this kind of reordering). And some compilers will refuse to optimize that unless the brackets are explicitly written.

(a + b) + (c + d) is an optimization because a single CPU core can perform the first two sums in “parallel” using what’s known as instruction pairing (or just pipelining), since there is no data dependency between them.

What happens if the Windows version compiles as (a + b) + (c + d); GCC for Linux compiles as (a+c) + (b + d); the XBox360 version compiles as ((a + b) + c) + d, and the PS3 version as a + (b + (c + d))?

The answer is that the code is not deterministic across platforms, but it is deterministic within each platform. All users running the same build on Windows will get the same results, and all users running the same build on Linux will get the same results. But the results of the Linux & Windows versions won’t match.

Update: It was pointed out to me by Christer Ericson (thanks!) that a + b + c is either (a + b) + c; or a + (b +c); so the expression “a + b + c != a + (b + c)  !=  (a + b) + c” is not true; but the expression “a + (b + c)  !=  (a + b) + c” is. I’ve corrected the error.

 

Complicating things at assembly level

Above I just said that floating point isn’t associative… nor anything. That’s because floating point operations may look in practice as if they aren’t even commutative (a + b = b + a), although in theory they really are.

For example the following bit of code can surprise you:

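#include <cmath>
#include <iostream>

void foo(double x, double y)
{
  // Both calls receive equal arguments, yet one result may be rounded when it
  // is stored to memory while the other stays in a wider register.
  if (std::cos(x) != std::cos(y))
  {
    std::cout << "Huh?!?\n";  // you might end up here when x == y!!
  }
}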

int main()
{
  foo(1.0, 1.0);
  return 0;
}

Yes, you may or may not end up in the “Huh?!?” section. Note the “you may or may not”. This is exactly the opposite of what someone looking for determinism wants.

The reasons for this behavior are already documented in the C++ FAQ, so if you’re a curious programmer you probably already know about it and it shouldn’t surprise you.

But if it does, here is a brief recap of the C++ FAQ’s explanation: at assembly level, the first cosine is calculated and its result is stored in a temporary variable in RAM, getting truncated/rounded as a side effect. Then the second cosine is calculated, kept in a register at full precision, and compared with the truncated value in RAM.

Another possibility is that both results are kept in registers and the comparison succeeds as expected. That will depend on the register pressure caused by the surrounding code, and on the compiler.

It’s a matter of quoting in context

Given the same compiler, same platform and same architecture, you will always have determinism guaranteed, at least as long as you stick to IEEE 754 (and as long as you do your job to keep it working, i.e. don’t use uninitialized variables, cause race conditions, or rely on other sources of nondeterminism, etc.).

And this is what was being talked about in this link and in this one. Those quotes were about games written in C++, for a particular platform.

I agree with eemerson that having deterministic games is a huge bonus. You get free gameplay features (replays, load & save games), multiplayer features (lockstep), and an extremely powerful debugging tool (you can reproduce almost any bug, even crashes, and analyze frame by frame how it evolves and the conditions that caused it).

By the way, Havok is a fully deterministic floating point based physics engine, because:

  1. The manual says so :P (and documents how to take proper care of the engine to maintain its deterministic property)
  2. I use it in Distant Souls, and I can attest that I rely on its determinism for lockstep multiplayer and replays, and never had a problem keeping determinism except when it was my fault. Havok even warns you when you use a function in a way that can break determinism. It rocks.

An excellent in-depth technical analysis of how to keep floating point determinism working in practice was written by Glenn Fiedler.

Implementations aren’t all the same

A standard is always the same, but the interpretations are not. Sometimes standards allow for specific deviations and “undefined behaviors”. Otherwise, there would be no difference between Internet Explorer, Mozilla, Chrome and Opera, and all OpenGL driver implementations would render the same way and not bi*ch about proper GLSL syntax not compiling (that was a rant).

What this means is that even if you manage to produce very similar instructions at assembly level in two different IEEE compliant architectures that should behave the same (i.e. same truncation, same order of operations, etc), you still may get different results.

Sometimes it’s not even an interpretation, but just implementation bugs. There are very notorious ones and not so notorious ones: I, for instance, happened to own a (now defunct) Sempron 2200+ that produced slightly different floating point results when working with extremely large values. Sorry, I can’t remember the stepping number (but it was a Socket 462, single core Sempron from 2004). We found out when we were testing a cancelled RTS game and made a simple floating point determinism checker. We tested many Intel & AMD machines and they all passed, except for my Sempron, which consistently failed a few tests involving large numbers. We were shocked, and this kinda explained why I got very occasional desyncs playing Warcraft III when I was younger.

I prefer ints for determinism

I rely on floating point determinism. My animations affect the gameplay and thus have to be deterministic. The physics also need it.

However, I use the right representation for the right job, and honestly floating point details can be hard to track and visualize.

I keep health & damage as ints. Stat boosts and debuffs are ints too. I try to keep as much of the logic side as I can as ints. They’re easier to work with, and more predictable. I’m not a purist. Like a tweet suggested, this can easily devolve like the UDP vs TCP debate.

I don’t See Sharp

Ok, bad pun. The original tweet was about C#. As the Stack Overflow link states, C# uses a JIT, which:

  1. Can produce floating point instructions that vary across different versions of the .NET runtime, even on the same platform & arch (see the associativity optimization problem).
  2. Is allowed to work in higher precision than requested (see the cos(1) != cos(1) problem) and cannot be forced to work in a particular fixed precision.

Because of at least these two reasons, C# code may not even produce the same results on the same machine after a simple DLL upgrade. Definitely, C# and floating point determinism don’t mix. And that’s ultimately what the tweet was about.



6 thoughts on “On floating point determinism”

  • Bruce Dawson

    You start by saying that these three formulae are not the same:

    a + b + c != a + (b + c) != (a + b) + c

    Clearly the second two are not the same, but it’s not clear what you mean by the first one and it should compile to one of the other two, probably (a + b) + c. The only other plausible possibility is that it could compile to (a + c) + b. However that seems unlikely, and if that is what you mean you should say so.

    Similarly, you ask what happens if “the XBox360 version compiles as a + b + c + d?”, but it’s not clear what you mean by that.

    Finally, you say “In theory, all implementations strictly conforming to the standard should be deterministic even across different platforms”, but this is not true. The IEEE standard does not, last I checked, promise determinism. It leaves some decisions up to the implementation. For instance, intermediate precision is not mandated by the standard and can legitimately vary depending on your compiler and processor. See this article for details:

    http://randomascii.wordpress.com/2012/03/21/intermediate-floating-point-precision/

    Other things that are not fully specified are the results of transcendentals such as sin and cos.

  • Matias (post author)

    Thanks Bruce!

    Indeed, the a + b + c confusion was brought to my attention by Christer Ericson on Twitter, but I was too tired to change it immediately. While fixing the post, I noticed the XBox360 example had the same flaw. Your comment went through moderation a couple of minutes after my changes. I apologize for not being quicker to fix it.

    As for IEEE not promising determinism, you’re right. Thank you for pointing it out. I don’t know why I said that. It would only add confusion to my very own next statement: “What this means is that even if you manage to produce very similar instructions at assembly level in two different IEEE compliant architectures that should behave the same (i.e. same truncation, same order of operations, etc), you still may get different results.”
    I’ve corrected the first statement to avoid any confusion.

  • jon w

    Also, on Intel 32-bit CPUs, but not AMD, you have to turn off 80-bit precision doubles in the register file.
    Also, set the rounding mode flags consistently.
    And SSE changes between revisions!

  • Bruce Dawson

    Note that the cos(x) != cos(y) case should be very rare, in the form you give. VC++ sets the x87 FPU to double precision, and you are calling the double precision version of cos, so all the results should be double precision and there is no possibility of a problem. If you set the FPU to extended-double-precision then it could be an issue. Also, if you call the float versions of cos (which return a float) then it could be a problem.

    In general this issue should only apply to x87 FPUs, which are going away. On any other FPU this issue can be avoided relatively easily by compiler writers.

    Note that register pressure should not be relevant in this case. When the compiler calls cos() the second time it is, in VC++, required to flush all data from the x87 registers, so the first result will always be stored to memory.

    Note that floating-point determinism is not simple, by any means. In fact this pushed me to finish off a blog post on the topic. Coming soon.