How can a big number fit precisely into `double` with 32-bit GCC?

Question

Consider the following code:

#include <iostream>
int main() {
    long long x = 123456789123456789;
    std::cout << std::fixed;
    auto y = static_cast<double>(x);  // (1)
    std::cout << static_cast<long long>(y) << "n";  // (2)
    std::cout << y << "n";
    std::cout << (x == static_cast<long long>(y)) << "n";  // (3)
    std::cout << static_cast<long long>(static_cast<double>(x)) << "n";  // (4)
    std::cout << (x == static_cast<long long>(static_cast<double>(x))) << "n";  // (5)
}

When compiled with 32-bit GCC on Linux (g++ -m32 a.cpp), it prints as follows:

123456789123456784
123456789123456784.000000
0
123456789123456789
1

Note that result of converting long long to double and then back to long long is different depending on how it was done. If I do it via a separate variable double y (lines (1) and (2)), the result ends with 4. But if I do everything in one expression (line (4)), the result ends with 9 just as the original value.

This is quite inconvenient: there exists no double that results in 123456789123456789 when converted to long long, and the check in line (3) confirms that. However, the check in the line (5) passes as if there is one. Is it a bug in GCC or my program?

This behavior started in GCC 9 according to Godbolt link above, GCC 8 works fine. Even funnier, if I add -O2, all expressions are optimized away during compilation and the output is:

123456789123456784
123456789123456784.000000
0
123456789123456784
0

And if I read x from std::cin and keep -O2, the intermediate variable ends in 4, but after conversion to long long a wild 9 appears.

123456789123456789
123456789123456784.000000
1
123456789123456789
1

I believe there is no undefined behavior in the program above:

Conversion from an integer type to a floating-point type is kind of defined: it results in a rounded floating-point value. So either 123456789123456784 or 123456789123456790.
Both of these values fit into long long.

yeputonsyeputons 8,67334 silver badges67 bronze badges · Accepted Answer · 2023-10-14 21:05:26Z

This seems like another instance of the infamous GCC (non-)bug 323. One might argue it’s closer to e.g. bug 85957 as the issue happens with integers and without any computations with floating-point numbers.

However, the underlying issue is probably the same: GCC uses long double for computations under the hood in 32-bit mode because that’s what 8086’s FPU (8087) used back in the old days. Looking at the disassembly:

; auto y = static_cast<double>(x);  // (1)
    ; Load `x` from address `ebp-16` into `ST(0)`, the 80-bit top of the FPU's stack:
    fild    QWORD PTR [ebp-16]
    ; Load the top of the stack into `y` at `ebp-24`, truncated from 80 bits to 64 bits
    fstp    QWORD PTR [ebp-24]
; std::cout << static_cast<long long>(y) << "n";  // (2)
    ; Load `y` from `ebp-24` into `ST(0)`, that's 64 bits value already
    fld     QWORD PTR [ebp-24]
    ...
; std::cout << static_cast<long long>(static_cast<double>(x)) << "n";  // (4)
    ; Load `x` from address `ebp-16` into `ST(0)`, the 80-bit top of the FPU's stack:
    fild    QWORD PTR [ebp-16]
    ; Work with `ST(0)` directly, that's 80-bit
    ...

So, line (1) actually stores the double into a memory location and truncates it to 64 bits, line (2) later takes these 64 bits from the memory. But line (4) works directly with 80-bit registers and 123456789123456789 fits into 80-bit IEEE extended double precision number precisely. Hence, no rounding, and after a bit of code we have this exact value back in long long.

Surprisingly, even adding -msse -msse2 -mfpmath=sse -march=skylake options do not change the outcome on the most recent GCC 13.2. I thought "using SSE instructions instead of x87" (as I believe is the default with g++ -m64) would change something, but it’s not.

If truncating to 64 bits is important, I’d suggest going through an intermediate volatile double y variable:

#include <iostream>
int main() {
    long long x;
    std::cin >> x;
    std::cout << std::fixed;
    volatile auto y = static_cast<double>(x);  // (1)
    std::cout << static_cast<long long>(y) << "n";  // (2)
    std::cout << y << "n";
    std::cout << (x == static_cast<long long>(y)) << "n";  // (3)
    std::cout << static_cast<long long>(static_cast<double>(x)) << "n";  // (4)
    std::cout << (x == static_cast<long long>(static_cast<double>(x))) << "n";  // (5)
}

when compiled with g++ -m32 and given 123456789123456789 as the input it prints:

123456789123456784
123456789123456784.000000
0
123456789123456789
1

volatile forces GCC to actually store and load double into memory, forcing rounding 80-bit value to 64 bits. Not sure how documented that is. A program-wide documented option would be -ffloat-store.

score 6 · Accepted Answer · 2023-10-15 00:53:42Z

GCC does document this non-standard behavior in https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html.

When no -std=c++XX option is given (i.e. if a default -std=gnu++XX is used), then -fexcess-precision is set to fast.

The effect of this is that, in violation of the C and C++ standards, GCC assumes that operations can always be carried out in higher precision than the types permit and whether this will happen in any given instance is unspecified.

The C and C++ standards however require casts and assignment to round to a value representable in the actual target type. Therefore the result of your checks is not permitted to be 1 as you describe. (However, for other, i.e. arithmetic, operations in a single expression the standards do explicitly permit operation in higher precision.)

You get the standard-conforming behavior with -fexcess-precision=standard which is automatically enabled with a -std=c++XX option. However, for C++, it has been implemented only since GCC 13.

Similarly GCC defaults to -ffp-contract=fast, which is also non-conforming and permits GCC to contract floating point operations across statements, i.e. to assume infinite precision intermediate results, while the standards again don’t permit this across statements, casts or assignment. The standard-conforming option is -ffp-contract=on (or -ffp-contract=off to completely disable it).

(This however doesn’t mean that there aren’t still bugs that make the behavior non-conforming as discussed in the bug reports linked in the other answer.)

How can a big number fit precisely into `double` with 32-bit GCC?

How can a big number fit precisely into `double` with 32-bit GCC?

2 Answers 2

2 Answers
2