Argument order to std::min changes compiler output for floating-point

Question

I was fiddling in Compiler Explorer, and I found that the order of arguments passed to std::min changes the emitted assembly.

Here's the example on Compiler Explorer (Godbolt):

#include <algorithm>  // std::min

double std_min_xy(double x, double y) {
    return std::min(x, y);
}

double std_min_yx(double x, double y) {
    return std::min(y, x);
}

This is compiled (with -O3 on clang 9.0.0, for example), to:

std_min_xy(double, double):                       # @std_min_xy(double, double)
        minsd   xmm1, xmm0
        movapd  xmm0, xmm1
        ret
std_min_yx(double, double):                       # @std_min_yx(double, double)
        minsd   xmm0, xmm1
        ret

This persists if I change the std::min to an old-school ternary operator. It also persists across all the modern compilers I tried out (clang, gcc, icc).

The underlying instruction is minsd. Reading the documentation, the first argument of minsd is also the destination for the answer. Apparently xmm0 is where my function is supposed to put its return value, so if xmm0 is used as the first argument, there is no movapd needed. But if xmm0 is the second argument, then it has to movapd xmm0, xmm1 to get the value into xmm0. (editor's note: yes, x86-64 System V passes FP args in xmm0, xmm1, etc., and returns in xmm0.)

My question: why doesn't the compiler switch the order of the arguments itself, so that this movapd isn't necessary? It surely must know that the order of arguments to minsd does not change the answer? Is there some side-effect that I'm not appreciating?

Probably simply because saving a single register swap in a very rare case isn't worth the effort — Alan Birtles, Sep 26 '20 at 21:24
@AlanBirtles I would hope that's not how people writing compiler optimizations think. It pains me. I'm trying to make myself not care, because it won't matter in my context, but still it hurts. — RaveTheTadpole, Sep 26 '20 at 22:02
@AlanBirtles: Rave is correct, that's not how compiler devs think. If this was really a missed optimization (instead of required by strict FP semantics), gcc and clang developers would probably appreciate having a missed-optimization bug filed. (Although it would probably already be a known bug; hard-register constraints placed by calling convention requirements do sometimes lead to wasted `mov` or `movaps` instructions when gcc does register allocation, which you wouldn't see in the middle of a larger function after inlining.) — Peter Cordes, Sep 26 '20 at 22:35
@bolov I meant that the compiler could have switched the order of the `minsd` operands to save the `movapd`. (But, as I'm learning, it can't do that.) — RaveTheTadpole, Sep 26 '20 at 22:48

Peter Cordes · Accepted Answer · 2020-09-28T00:22:14.387

minsd a,b is not commutative for some special FP values, and neither is std::min, unless you use -ffast-math.

minsd a,b exactly implements (a<b) ? a : b including everything that implies about signed-zero and NaN in strict IEEE-754 semantics. (i.e. it keeps the source operand, b, on unordered¹ or equal). As Artyer points out, -0.0 and +0.0 compare equal (i.e. -0. < 0. is false), but they are distinct.
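As a rough illustration of the rule above, here is a minimal sketch that models `minsd dst, src` as a plain C++ function. The name `minsd_model` is hypothetical (it is not an intrinsic or a library call); it only mirrors the `(a<b) ? a : b` description, and `std_min_xy` is the function from the question.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical model of `minsd dst, src` (not a real intrinsic): the result
// is dst only when dst < src is true; on equal or unordered operands the
// source operand is kept.
double minsd_model(double dst, double src) {
    return (dst < src) ? dst : src;
}

// Same function as in the question. clang compiled it to
// `minsd xmm1, xmm0`, i.e. dst = y and src = x in the model above.
double std_min_xy(double x, double y) {
    return std::min(x, y);
}
```

Under this model, `minsd_model(0.0, -0.0)` and `minsd_model(-0.0, 0.0)` return zeros of opposite sign, and `minsd_model(1.0, NaN)` is NaN while `minsd_model(NaN, 1.0)` is 1.0, which is why the operands cannot be swapped under strict FP semantics.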

std::min is defined in terms of a < comparison expression (cppreference), with (b<a) ? b : a as a possible implementation, so it returns its first argument when the two compare equal or unordered. This is unlike std::fmin, which handles NaN symmetrically (it returns the non-NaN operand when exactly one operand is NaN), among other things. (fmin originally came from the C math library, not a C++ template.)
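To make the contrast concrete, the following sketch checks how std::min and std::fmin react to the position of a NaN argument. The helper names and `kNaN` are made up for illustration, and the behavior shown assumes default (non-fast-math) compilation.

```cpp
#include <algorithm>
#include <cmath>
#include <limits>

// Illustrative constant, not part of the original question.
const double kNaN = std::numeric_limits<double>::quiet_NaN();

// std::min evaluates a single `<` comparison, so with a NaN argument the
// result depends on which position the NaN is in.
bool min_first_nan_is_nan()  { return std::isnan(std::min(kNaN, 1.0)); }
bool min_second_nan_is_nan() { return std::isnan(std::min(1.0, kNaN)); }

// std::fmin gives the same answer in either order when one operand is NaN.
bool fmin_is_symmetric() {
    return std::fmin(kNaN, 1.0) == std::fmin(1.0, kNaN);
}
```

With a typical standard library, the two std::min helpers return different values, while `fmin_is_symmetric()` holds. That asymmetry is what the compiler has to preserve.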

See What is the instruction that gives branchless FP min and max on x86? for much more detail about minss/minsd / maxss/maxsd (and the corresponding intrinsics, which follow the same non-commutative rules except in some GCC versions.)

Footnote 1: Remember that NaN<b is false for any b, and for any comparison predicate other than !=. e.g. NaN == b is false, and so is NaN > b. Even NaN == NaN is false (while NaN != NaN is true). When one or more of a pair are NaN, they are "unordered" wrt. each other.
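A small sketch of what "unordered" means in practice. The helper names are hypothetical; `std::isunordered` is the standard `<cmath>` function.

```cpp
#include <cmath>
#include <limits>

// Illustrative helper: a quiet NaN of type double.
double quiet_nan() { return std::numeric_limits<double>::quiet_NaN(); }

// True when every ordering/equality comparison between a and b is false,
// which is the case exactly when at least one of them is NaN.
bool all_comparisons_false(double a, double b) {
    return !(a < b) && !(a > b) && !(a <= b) && !(a >= b) && !(a == b);
}
```

For any pair involving NaN this agrees with `std::isunordered(a, b)`; for ordinary values such as 1.0 and 2.0 it returns false.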

With -ffast-math (to tell the compiler to assume no NaNs, and other assumptions and approximations), compilers will optimize either function to a single minsd. https://godbolt.org/z/a7oK91

For GCC, see https://gcc.gnu.org/wiki/FloatingPointMath
clang supports similar options, including -ffast-math as a catch-all.

Some of those options should be enabled by almost everyone, except for weird legacy codebases, e.g. -fno-math-errno. (See this Q&A for more about recommended math optimizations). And gcc -fno-trapping-math is a good idea because -ftrapping-math doesn't fully work anyway, despite being on by default (some optimizations can still change the number of FP exceptions that would be raised if exceptions were unmasked, including sometimes even from 1 to 0 or 0 to non-zero, IIRC). gcc -ftrapping-math also blocks some optimizations that are 100% safe even wrt. exception semantics, so it's pretty bad. In code that doesn't use fenv.h, you'll never know the difference.

But treating std::min as commutative can only be accomplished with options that assume no NaNs and ignore signed-zero semantics, so it definitely can't be called "safe" for code that cares about exactly what happens with NaNs. E.g. -ffinite-math-only assumes no NaNs (and no infinities).

clang -funsafe-math-optimizations -ffinite-math-only will do the optimization you're looking for. (unsafe-math-optimizations implies a bunch of more specific options, including not caring about signed zero semantics).

Some of the details that `-ffast-math` ignores are not that subtle. I was unpleasantly surprised that it optimizes `isnan()` to `false`: https://godbolt.org/z/zs31Yn — jpa, Sep 27 '20 at 05:54
@jpa: Yeah, "subtle" wasn't a great description, except for code where NaN is something that doesn't happen under normal conditions and you aren't trying to handle it. Edited to be more specific. — Peter Cordes, Sep 27 '20 at 06:18
Note that the need for `movapd` is fixed with `-mavx` option (assuming the target CPU supports AVX), since AVX adds non-destructive source (3-operand) coding of the instructions. — Ruslan, Sep 27 '20 at 18:54
@Ruslan: True. You could still invent cases where it costs an extra instruction, though, e.g. where one operand is memory like `x = std::min(array[i], x)`. Even AVX would need a separate load in a loop instead of a memory source operand because only the 2nd source can be memory. (And it couldn't auto-vectorize with a horizontal min at the end: `std::min` isn't associative either for FP). — Peter Cordes, Sep 27 '20 at 19:07

Artyer · Answer 2 · 2020-09-26T22:31:37.070

Consider: std::signbit(std::min(+0.0, -0.0)) == false && std::signbit(std::min(-0.0, +0.0)) == true.
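The expression above can be checked with a short sketch like the following. The helper names are made up for illustration, and the results assume IEEE-754 doubles and no fast-math options.

```cpp
#include <algorithm>
#include <cmath>

// +0.0 and -0.0 compare equal, so std::min returns its first argument
// in both calls; only the sign bit tells the two results apart.
bool min_pos_neg_is_negative() { return std::signbit(std::min(+0.0, -0.0)); }
bool min_neg_pos_is_negative() { return std::signbit(std::min(-0.0, +0.0)); }
```

The first helper is expected to return false and the second true, matching the expression above.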

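The only other difference is when an argument is NaN: every < comparison involving NaN is false, so std::min returns its first argument. For example, std::min(NaN, 1.0) is NaN but std::min(1.0, NaN) is 1.0, and if both arguments are (possibly different) NaNs, the first one is returned.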

You can allow gcc to reorder the arguments by using the -funsafe-math-optimizations -fno-math-errno options (both enabled by -ffast-math). -funsafe-math-optimizations allows the compiler to not care about signed zero, and -ffinite-math-only allows it to not care about NaNs.

edited Sep 26 '20 at 22:31

answered Sep 26 '20 at 21:34

Artyer

`-fno-math-errno` should be irrelevant here; `std::min` is a C++ STL template function, *not* a C math library function that historically had errno-setting on NaN semantics like `fmin` or `sqrt`. `-fno-math-errno` is always a good idea, though, except in historical codebases that check `errno` instead of using `fenv.h`. – Peter Cordes Sep 27 '20 at 17:20
Should be but apparently isn't, with clang9.0. https://godbolt.org/z/4bdvMa shows that `-funsafe-math-optimizations` alone doesn't do it, but adding `-fno-math-errno` does optimize it as commutative. This may be a clang9 bug, perhaps also implying a no-NaN assumption that unsafe-math doesn't include? With clang 10.0 and with GCC, unsafe-math + no-math-errno still preserves the operand-order difference: some other part of fast-math makes the difference. – Peter Cordes Sep 27 '20 at 20:12

score 5 · Answer 3 · edited Sep 27 '20 at 18:19

To expand on the existing answers that say std::min isn't commutative: Here's a concrete example that reliably distinguishes std_min_xy from std_min_yx. Godbolt:

bool distinguish1() {
    return 1 / std_min_xy(0.0, -0.0) > 0.0;
}
bool distinguish2() {
    return 1 / std_min_yx(0.0, -0.0) > 0.0;
}

distinguish1() evaluates to 1 / 0.0 > 0.0, i.e. INFTY > 0.0, or true.
distinguish2() evaluates to 1 / -0.0 > 0.0, i.e. -INFTY > 0.0, or false.
(All this under IEEE rules, of course. I don't think the C++ standard mandates that compilers preserve this particular behavior. Honestly, I was surprised that the expression -0.0 actually evaluated to a negative zero in the first place!)

-ffinite-math-only eliminates this way of telling the difference, and -ffinite-math-only -funsafe-math-optimizations completely eliminates the difference in codegen.
