Bizarre floating-point behavior with vs. without extra variables, why?

Question

When I run the following code in VC++ 2013 (32-bit, no optimizations):

#include <cmath>
#include <iostream>
#include <limits>

double mulpow10(double const value, int const pow10)
{
    static double const table[] =
    {
        1E+000, 1E+001, 1E+002, 1E+003, 1E+004, 1E+005, 1E+006, 1E+007,
        1E+008, 1E+009, 1E+010, 1E+011, 1E+012, 1E+013, 1E+014, 1E+015,
        1E+016, 1E+017, 1E+018, 1E+019,
    };
    return pow10 < 0 ? value / table[-pow10] : value * table[+pow10];
}

int main(void)
{
    double d = 9710908999.008999;
    int j_max = std::numeric_limits<double>::max_digits10;
    while (j_max > 0 && (
        static_cast<double>(
            static_cast<unsigned long long>(
                mulpow10(d, j_max))) != mulpow10(d, j_max)))
    {
        --j_max;
    }
    double x = std::floor(d * 1.0E9);
    unsigned long long y1 = x;
    unsigned long long y2 = std::floor(d * 1.0E9);
    std::cout
        << "x == " << x << std::endl
        << "y1 == " << y1 << std::endl
        << "y2 == " << y2 << std::endl;
}

I get

x  == 9.7109089990089994e+018
y1 == 9710908999008999424
y2 == 9223372036854775808

in the debugger.

I'm mindblown. Can someone please explain to me how the heck y1 and y2 have different values?

Update:

This only seems to happen under /Arch:SSE2 or /Arch:AVX, not /Arch:IA32 or /Arch:SSE.

The difference is because it's 64-bit float in one case and 80-bit float (that's how float is represented in math co-processor) in another. This should give you a good start. — sharptooth, Jan 31 '14 at 11:27
@sharptooth: Sorry what? Isn't everything here a `double`, which is 64-bit? — user541686, Jan 31 '14 at 11:27
@Mehrdad Floating point arithmetic may be executed with greater precision than indicated by its type, and the situations in which this happens don't need to be documented. — , Jan 31 '14 at 11:29
@Mehrdad: I don't know any more details, I just remember I've heard something like that. I'd write an answer if I had a better explanation. — sharptooth, Jan 31 '14 at 11:29
@hvd: Er, that makes no sense to me -- the value of `x` I see is **correct**, whether its arithmetic was done with 64-bit or 80-bit precision. Since it's a 64-bit variable that means 64 bits were enough to represent it. That also means converting it to an `unsigned long long` should give the correct answer whether it's 64 bits or 80 bits. Who cares if `x` is 64-bit or 80-bit if it has the correct value? How does that affect anything? — user541686, Jan 31 '14 at 11:31
@Mehrdad By what logic is it correct? It is not equal to the mathematical value of `std::floor(9710908999.0089989 * 1.0E9)`. You have a point that the value of `y2` is too significantly different to be caused by this, but it's normal if it's not *exactly* equal to `y1`. — , Jan 31 '14 at 11:35
@hvd: Yes, I know floating point isn't exact, you're missing the point completely by nitpicking there. By "correct" I just mean it makes sense, i.e. it's within roundoff error. 9223372036854775808 is **not** correct or within roundoff error, it's a sign of some overflow. — user541686, Jan 31 '14 at 11:36
For what it worth, I don't reproduce on Linux with gcc which I know has issues with extended precision. — AProgrammer, Jan 31 '14 at 11:37
Other compilers seem to be more consistent: http://codepad.org/fuoERz3k gives y1 = y2 = 9710908999008999424. — Paul R, Jan 31 '14 at 11:37
@AProgrammer: See my update. It works fine in VC++ too if I just don't use SSE2. So I'm really confused... is it a codegen bug? It happens in Debug mode... — user541686, Jan 31 '14 at 11:39
`is it a codegen bug?` I wouldn't look for another explanation. Floating point bugs happen routinely with Microsoft compilers. At one of my jobs we had to switch several modules to icc because of them. — n. 'pronouns' m., Jan 31 '14 at 11:46
Given the small size of the example, I would simply look at the generated code to find out what is going on. — JasonD, Jan 31 '14 at 11:48
@Mehrdad What is your version of VS compiler ? After testing it with VS2013, it shows coherent results with every arch setting. — galop1n, Jan 31 '14 at 11:49
Linking the wrong standard library? (I'm not a VS user, I don't even know if they have different standard libraries for SSE/SSE2 options) — AProgrammer, Jan 31 '14 at 11:51
@n.m.: Interesting, I'm a little hesitant to call it a bug, I'm wondering if it might be allowed by C++. I'll post the disassembly in a few mins. — user541686, Jan 31 '14 at 11:52
@Mehrdad I couldn't reproduce it with VC++ (2012), regardless of the `/Arch` arguments. — James Kanze, Jan 31 '14 at 11:54
I just realized it seems to depend on the code around it (I was testing it in the middle of another function)... let me see if I can give standalone example. — user541686, Jan 31 '14 at 11:55
Has anybody pointed out yet that 9223372036854775808 == 2^63 ? Looks like a bug residue to me. — laune, Jan 31 '14 at 12:01
@JamesKanze: I updated it with a piece of code to reproduce it. Can you reproduce it? — user541686, Jan 31 '14 at 12:12
@galop1n: I updated the question and gave a sample piece of code that reproduces the bug for me. — user541686, Jan 31 '14 at 12:13
@Mehrdad I still can't reproduce it. I added output to your code, so that I could see the values. But maybe it depends on the optimization options. — James Kanze, Jan 31 '14 at 12:41
I get the same result you get. I'm seeing a call to __ftol2 (convert to signed long long), which matches the possible problem suggested in the answers, but makes no sense at all to me. — , Jan 31 '14 at 17:43
@JamesKanze: I don't have any optimizations enabled actually. The compiler flags `"%VS120COMNTOOLS%..\..\VC\bin\cl.exe" /nologo /I "%VS120COMNTOOLS%..\..\VC\include" Temp.cpp /link /LibPath:"%VS120COMNTOOLS%..\..\VC\lib" /LibPath:"%ProgramFiles(x86)%\Microsoft SDKs\Windows\v7.1A\Lib"` trigger this just fine. Are you printing via `std::cout` or via `printf`? For some reason printf doesn't seem to work, it prints zeros for me. — user541686, Jan 31 '14 at 17:43
@hvd: Thanks, glad to know you could repro it. The bizarre thing is: if you get rid of the `static_cast(static_cast(mulpow10(d, j_max))) != mulpow10(d, j_max)` condition in the loop, you don't get this output anymore, even though there's still `__ftol2` being used! — user541686, Jan 31 '14 at 17:44
Hmm. __ftol2 is actually used for both conversions to signed long long and to unsigned long long, but when converting to unsigned long long, quite some extra code surrounds it to handle out-of-range values. — , Jan 31 '14 at 17:57
@hvd: It seems like calling `_fpreset()` before `y2 = std::floor(d * 1.0E9)` fixes it, but I'm baffled why. — user541686, Jan 31 '14 at 18:46
@Mehrdad I've been looking at the generated code, and in the non-working version, the correct value is stored in memory, another value is loaded from memory, and then the just stored (!) value is loaded from memory again, but it doesn't give the original value. — , Jan 31 '14 at 19:07
@Mehrdad It looks like there is an FPU stack overflow. Some part of the `while` loop is leaving values on the FPU stack, and loading the just stored value later fails because there is no more room for that value. — , Jan 31 '14 at 19:31
@hvd: Hmm what would cause that? I don't really know how the FPU works unfortunately, or why it needs a stack, etc. — user541686, Jan 31 '14 at 19:57
@Mehrdad The FPU has eight registers, but uses them in a stack-like fashion, so loading a floating-point value into a register always loads it into register ST0. The other registers get moved around: ST1 gets ST0's old value, ST2 gets ST1's old value, etc. There are also popping instructions to free the stack. The push and pop operations need to be balanced, or things break. :) — , Jan 31 '14 at 20:06
@hvd: Interesting, thanks for the info. I'm a bit confused though, why doesn't this happen when I specify /Arch:IA32? — user541686, Jan 31 '14 at 21:15
It's the SSE2 implementation of the conversion helper function that has this issue. — , Jan 31 '14 at 21:30

ecatmur · Answer 1 · 2014-01-31T11:58:53.077

5

9223372036854775808 is 0x8000000000000000; that is, it is equal to INT64_MIN cast to uint64_t.

It looks like your compiler is casting the return value of floor to long long and then casting that result to unsigned long long.

Note that it is quite usual for overflow in floating-point-to-integral conversion to yield the least representable value (e.g. cvttsd2siq on x86-64):

When a conversion is inexact, a truncated result is returned. If a converted result is larger than the maximum signed doubleword integer, the floating-point invalid exception is raised, and if this exception is masked, the indefinite integer value (80000000H) is returned.

(this is from the doubleword documentation, but the quadword behaviour is the same.)

edited Jan 31 '14 at 11:58

answered Jan 31 '14 at 11:53

ecatmur

137,771
23
263
343

2

Why would the compiler convert `std::floor(9710908999.0089989 * 1.0E9)` to `long long` and then to `unsigned long long` but not convert `x` to `long long` and then to `unsigned long long`? – Eric Postpischil Jan 31 '14 at 12:17
Interestingly I don't get any FP exceptions even when I use `/fp:except`... but good point about the value. – user541686 Jan 31 '14 at 17:50

score 5 · Accepted Answer · edited Jun 20 '20 at 09:12

5

You are converting out-of-range double values to unsigned long long. This is not allowed in standard C++, and Visual C++ appears to treat it really badly in SSE2 mode: it leaves a number on the FPU stack, eventually overflowing it and making later code that uses the FPU fail in really interesting ways.

A reduced sample is

double d = 1E20;
unsigned long long ull[] = { d, d, d, d, d, d, d, d };
if (floor(d) != floor(d)) abort();

This aborts if ull has eight or more elements, but passes if it has up to seven.

The solution is not to convert floating point values to an integer type unless you know that the value is in range.

4.9 Floating-integral conversions [conv.fpint]

A prvalue of a floating point type can be converted to a prvalue of an integer type. The conversion truncates; that is, the fractional part is discarded. The behavior is undefined if the truncated value cannot be represented in the destination type. [ Note: If the destination type is bool, see 4.12. -- end note ]

The rule that out-of-range values wrap when converted to an unsigned type only applies if the value as already of some integer type.

For whatever it's worth, though, this doesn't seem like it's intentional, so even though the standard permits this behaviour, it may still be worth reporting this as a bug.

edited Jun 20 '20 at 09:12

Community

1
1

answered Jan 31 '14 at 20:00

+1 wow I did not know this is undefined behavior. Thanks for the analysis! – user541686 Jan 31 '14 at 21:32
By the way, how do I test if the value is within range? I can't convert `std::numeric_limits::max()` to `double` since it would lose precision... – user541686 Jan 31 '14 at 21:36
@Mehrdad I think you could use `< max()` for the common case, and add special handling for the edge case, but I'm not completely sure. – Jan 31 '14 at 21:45
@Mehrdad Thinking about it some more, you need to know if `double(std::numeric_limits::max())` is exact, rounded down, or rounded up. I'm not sure what the easiest way to do that would be, but as a last resort, convert it to a string. If it's exact, then the conversion is safe if `x - double(std::numeric_limits::max()) < 1`. If it's rounded up, then the conversion is safe if `x < double(std::numeric_limits::max())`. If it's rounded down, then the conversion is safe if `x <= std::numeric_limits::max()`. – Jan 31 '14 at 22:40
@Mehrdad I hope there is a way to write this in a simple way that works for all three cases, but I cannot come up with anything. (I'm ignoring negative values, as you probably noticed.) – Jan 31 '14 at 22:41
He's converting to `unsigned long long`, though, not `signed long long`. The result is within range for an `unsigned long long`. – tmyklebu Feb 01 '14 at 05:14
1

@tmyklebu The conversion that ends up failing is within range, but an earlier conversion is not, and that earlier conversion is the problem. – Feb 01 '14 at 05:43

score 2 · Answer 3 · answered Jan 31 '14 at 12:08

2

Hypothesis: It is a bug. The compiler converts double to unsigned long long correctly but converts extended-precision floating-point (possibly long double) to unsigned long long incorrectly. Details:

double              x = std::floor(9710908999.0089989 * 1.0E9);

This computes the value on the right-hand side and stores it in x. The value on the right-hand side might be computed with extended precision, but it is, as the rules of C++ require, converted to double when stored in x. The exact mathematical value would be 9710908999008998870, but rounding it to the double format produces 9710908999008999424.

unsigned long long y1 = x;

This converts the double value in x to unsigned long long, producing the expected 9710908999008999424.

unsigned long long y2 = std::floor(9710908999.0089989 * 1.0E9);

This computes the value on the right-hand side using extended precision, producing 9710908999008998870. When the extended-precision value is converted to unsigned long long, there is a bug, producing 2⁶³ (9223372036854775808). This value is likely the “out of range” error value produced by an instruction that converts the extended-precision format to a 64-bit integer. The compiler has used an incorrect instruction sequence to convert its extended-precision format to an unsigned long long.

answered Jan 31 '14 at 12:08

Eric Postpischil

141,624
10
138
247

I don't think this answer is right because it runs fine on /Arch:IA32... shouldn't it break there if it had trouble with extended precision? – user541686 Jan 31 '14 at 12:18
+1 logtwo(9710908999.0089989 * 1.0E9) = 63.0743 clearly fits into an unsigned lomng long – Jan 31 '14 at 12:18
1

@Mehrdad Different platforms / different instruction sequences – Jan 31 '14 at 12:23
2

@Mehrdad: The code in the compiler to generate instructions for IA-32 has to be somewhat different from the code to generate instructions for Intel 64. The code may be written years apart by different people. Certainly one may have a bug that the other does not. – Eric Postpischil Jan 31 '14 at 12:52
@EricPostpischil: What's Intel 64? I'm not testing on Intel 64, this is all 32-bit code. I'm only testing on /Arch:IA32 and /Arch:SSE2. But if you think it's a bug with one and not the other then I understand. – user541686 Jan 31 '14 at 17:28
@Mehrdad: “Intel 64” is the proper name for the 64-bit Intel architecture; that is the way it is used in Intel documentation. I presumed it might be in use since you mentioned a difference using the `/Arch:IA32` switch or not. However, I see from [this documentation](http://msdn.microsoft.com/en-us/library/7t5yh4fd.aspx) that the switch is misnamed; it does not select the IA-32 architecture but rather distinguishes the lack of extensions such as SSE, SSE2, and AVX. Regardless, the code in the compiler to generate instructions for different target features must be different. – Eric Postpischil Jan 31 '14 at 17:49

score 0 · Answer 4 · answered Jan 31 '14 at 11:35

You have casted y1 as a double before casting it again to a long. the value of x isn't the "floor" value but a rounded value for floor.

Same logic would apply with casting integers and floats. float x = (float)((int) 1.5) will give a different value to float x = 1.5

Bizarre floating-point behavior with vs. without extra variables, why?

Update:

4 Answers4

Linked

Related