60

There are cases where you know that a certain floating-point expression will always be non-negative. For example, when computing the length of a vector, one does sqrt(a[0]*a[0] + ... + a[N-1]*a[N-1]) (NB: I am aware of std::hypot, this is not relevant to the question), and the expression under the square root is clearly non-negative. However, GCC outputs the following assembly for sqrt(x*x):

        mulss   xmm0, xmm0
        pxor    xmm1, xmm1
        ucomiss xmm1, xmm0
        ja      .L10
        sqrtss  xmm0, xmm0
        ret
.L10:
        jmp     sqrtf

That is, it compares the result of x*x to zero, and if the result is non-negative, it does the sqrtss instruction, otherwise it calls sqrtf.

So, my question is: how can I force GCC into assuming that x*x is always non-negative so that it skips the comparison and the sqrtf call, without writing inline assembly?

I wish to emphasize that I am interested in a local solution, and not doing things like -ffast-math, -fno-math-errno, or -ffinite-math-only (though these do indeed solve the issue, thanks to ks1322, harold, and Eric Postpischil in the comments).

Furthermore, "force GCC into assuming x*x is non-negative" should be interpreted as assert(x*x >= 0.f), so this also excludes the case of x*x being NaN.

I am OK with compiler-specific, platform-specific, CPU-specific, etc. solutions.

Peter Cordes
lisyarus
  • I would have expected that `x*x` is convincing enough. Fwiw, also placing the return inside an `if (x*x > 0)` won't change the assembly much: https://godbolt.org/z/5im2-z – 463035818_is_not_a_number Aug 27 '19 at 11:42
  • 12
    `x*x` is not necessarily zero or positive. It may be a NaN. I am not sure that is what GCC is dealing with here, though. – Eric Postpischil Aug 27 '19 at 11:43
  • 8
    `-fno-math-errno` is the safer option that also removes the call to `sqrtf` – harold Aug 27 '19 at 11:44
  • 1
    @EricPostpischil Sure! I still want to force the compiler into thinking it is not NaN, though. – lisyarus Aug 27 '19 at 11:46
  • @ks1322 This indeed does work, but is too harsh, probably. I would be happy with a more specific solution. – lisyarus Aug 27 '19 at 11:46
  • @harold This also works, thank you. Still, I'd want a more specific solution than disabling math-errno entirely. – lisyarus Aug 27 '19 at 11:48
  • 1
    If `sqrtf` sets `errno` upon an error, then GCC is correct not to use `sqrtss` for the NaN case. (Whether math routines set errno is implementation-dependent according to the C standard, and C++ inherits that.) – Eric Postpischil Aug 27 '19 at 11:48
  • 6
    Adding `-ffinite-math-only` tells GCC it can assume there are no infinities or NaNs. Using this eliminates the branch and the call to `sqrtf`. Since infinity is not an error for `sqrtf`, this confirms GCC’s concern in the sample code in the question is a NaN. Unfortunately, I do not see a switch to just say assume no NaNs, rather than assume no NaNs or infinities, and inserting `if (std::isnan(x)) return x;` before the `sqrt` does not result in GCC recognizing `x*x` cannot be a NaN. – Eric Postpischil Aug 27 '19 at 12:01
  • @EricPostpischil: If you allow infinities but not NaN, what should `inf - inf` and `inf / inf` evaluate to? – dan04 Aug 27 '19 at 19:52
  • 4
    @dan04: The switch does not say you cannot have NaNs; it says the compiler may assume there are no NaNs. So then it is your responsibility to avoid NaNs or suffer the consequences. If you evaluated the quotient of two infinities, the subsequent code might have been optimized with the assumption that a NaN was not produced, so it might go down the wrong path, for example. – Eric Postpischil Aug 27 '19 at 20:21
  • Isn't the usual way to avoid compiler-generated inefficiencies to simply write manual assembler code? – Peter - Reinstate Monica Aug 28 '19 at 12:18
  • @PeterA.Schneider Not really. There are sometimes many ways you can help the compiler - writing the code differently, using intrinsics, etc. – lisyarus Aug 28 '19 at 12:22

4 Answers

50

You can write assert(x*x >= 0.f) as a compile-time promise instead of a runtime check as follows in GNU C:

#include <cmath>

float test1 (float x)
{
    float tmp = x*x;
    if (!(tmp >= 0.0f)) 
        __builtin_unreachable();    
    return std::sqrt(tmp);
}

(related: What optimizations does __builtin_unreachable facilitate? You could also wrap if(!x)__builtin_unreachable() in a macro and call it promise() or something.)
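A minimal sketch of such a macro, assuming GNU C++ (GCC or Clang). The names PROMISE and test1_promise are made up for illustration and do not come from any library. Keep in mind that a false condition is undefined behaviour, not a checked error.

```cpp
#include <cmath>

// Hypothetical helper (the name is an assumption): tells the optimizer that
// `cond` always holds. If it is ever false, behaviour is undefined.
// Relies on the GNU extension __builtin_unreachable().
#define PROMISE(cond)                \
    do {                             \
        if (!(cond))                 \
            __builtin_unreachable(); \
    } while (0)

// Same as test1 above, written with the macro.
float test1_promise(float x)
{
    float tmp = x * x;
    PROMISE(tmp >= 0.0f);  // also rules out NaN, because NaN >= 0 is false
    return std::sqrt(tmp);
}
```

The do/while wrapper only makes the macro behave like a single statement, so it can be used safely inside an unbraced if/else.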

But gcc doesn't know how to take advantage of that promise that tmp is non-NaN and non-negative. We still get (Godbolt) the same canned asm sequence that checks for x>=0 and otherwise calls sqrtf to set errno. Presumably that expansion into a compare-and-branch happens after other optimization passes, so it doesn't help for the compiler to know more.

This is a missed-optimization in the logic that speculatively inlines sqrt when -fmath-errno is enabled (on by default unfortunately).

What you want instead is -fno-math-errno, which is safe globally

This is 100% safe if you don't rely on math functions ever setting errno. Nobody wants that; that's what NaN propagation and/or sticky flags that record masked FP exceptions are for, e.g. C99/C++11 fenv access via #pragma STDC FENV_ACCESS ON and then functions like fetestexcept(). See the example in feclearexcept, which shows using it to detect division by zero.

The FP environment is part of thread context while errno is global.
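As a rough illustration of the fenv approach, here is a sketch that detects an invalid sqrt through the sticky flags instead of errno. The helper name sqrt_raises_invalid is hypothetical, and the volatile variables are a practical workaround that assumes the compiler does not honour #pragma STDC FENV_ACCESS (GCC has historically not implemented it), so treat this as a sketch rather than a guaranteed-portable recipe.

```cpp
#include <cfenv>
#include <cmath>

// Hypothetical helper: reports whether evaluating sqrt(x) raised the
// IEEE "invalid" exception, using the sticky FP status flags, not errno.
bool sqrt_raises_invalid(double x)
{
    // volatile discourages the compiler from folding or moving the sqrt;
    // this is a workaround, not something the standard guarantees.
    volatile double in = x;
    std::feclearexcept(FE_ALL_EXCEPT);    // start from clean flags
    volatile double out = std::sqrt(in);  // may raise FE_INVALID
    (void)out;
    return std::fetestexcept(FE_INVALID) != 0;
}
```

Because the flags live in the per-thread FP environment, this check does not interfere with other threads the way a shared error variable would.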

Support for this obsolete misfeature is not free; you should just turn it off unless you have old code that was written to use it. Don't use it in new code: use fenv. Ideally support for -fmath-errno would be as cheap as possible, but the rarity of anyone actually using __builtin_unreachable() or other things to rule out a NaN input presumably made it not worth developers' time to implement the optimization. Still, you could report a missed-optimization bug if you wanted.

Real-world FPU hardware does in fact have these sticky flags that stay set until cleared, e.g. x86's mxcsr status/control register for SSE/AVX math, or hardware FPUs in other ISAs. On hardware where the FPU can detect exceptions, a quality C++ implementation will support stuff like fetestexcept(). And if not, then math-errno probably doesn't work either.

errno for math was an old obsolete design that C / C++ is still stuck with by default, and is now widely considered a bad idea. It makes it harder for compilers to inline math functions efficiently. Or maybe we're not as stuck with it as I thought: Why errno is not set to EDOM even sqrt takes out of domain arguement? explains that setting errno in math functions is optional in ISO C11, and an implementation can indicate whether it does so or not (through the math_errhandling macro). Presumably in C++ as well.
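For what it's worth, the way an implementation indicates this is the standard math_errhandling macro from <cmath> / <math.h>. A small sketch of querying it follows; the two function names are made up for illustration.

```cpp
#include <cmath>

// True if this implementation reports math errors through errno.
bool math_uses_errno()
{
    return (math_errhandling & MATH_ERRNO) != 0;
}

// True if this implementation reports math errors through
// floating-point exception flags (testable with fetestexcept).
bool math_uses_fp_exceptions()
{
    return (math_errhandling & MATH_ERREXCEPT) != 0;
}
```

The result can depend on compiler flags: as far as I know, glibc's headers drop MATH_ERRNO from math_errhandling when compiling with -fno-math-errno, but that is an observation about one implementation, not a guarantee.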

It's a big mistake to lump -fno-math-errno in with value-changing optimizations like -ffast-math or -ffinite-math-only. You should strongly consider enabling it globally, or at least for the whole file containing this function.

float test2 (float x)
{
    return std::sqrt(x*x);
}
# g++ -fno-math-errno -std=gnu++17 -O3
test2(float):   # and test1 is the same
        mulss   xmm0, xmm0
        sqrtss  xmm0, xmm0
        ret

You might as well use -fno-trapping-math too, if you aren't ever going to unmask any FP exceptions with feenableexcept(). (That option isn't required for this optimization, though; it's only the errno-setting crap that's a problem here.)

-fno-trapping-math doesn't assume no-NaN or anything, it only assumes that FP exceptions like Invalid or Inexact won't ever actually invoke a signal handler instead of producing NaN or a rounded result. -ftrapping-math is the default but it's broken and "never worked" according to GCC dev Marc Glisse. (Even with it on, GCC does some optimizations which can change the number of exceptions that would be raised from zero to non-zero or vice versa. And it blocks some safe optimizations). But unfortunately, https://gcc.gnu.org/bugzilla/show_bug.cgi?id=54192 (make it off by default) is still open.

If you actually ever did unmask exceptions, it might be better to have -ftrapping-math, but again it's very rare that you'd ever want that instead of just checking flags after some math operations, or checking for NaN. And it doesn't actually preserve exact exception semantics anyway.

See SIMD for float threshold operation for a case where -ftrapping-math incorrectly blocks a safe optimization. (Even after hoisting a potentially-trapping operation so the C does it unconditionally, gcc makes non-vectorized asm that does it conditionally! So not only does it block vectorization, it changes the exception semantics vs. the C abstract machine.)

Peter Cordes
  • 3
    `assert(x*x >= 0.f)` won't get into preprocessed code in release mode (with `NDEBUG` defined). – Ruslan Aug 27 '19 at 20:54
  • 2
    @Ruslan: I can't think of a way to word that first sentence that's as clear and easy to read while avoiding implying that `assert()` is *always* a runtime check instead of sometimes nothing at all. :/ I'm just going to leave it. I guess I could put a footnote inside the answer, but if anyone else is bothered by me glossing over that, upvote Ruslan's comment :) – Peter Cordes Aug 28 '19 at 03:55
  • *Real-world FPU hardware does in fact have these sticky flags that stay set until cleared* C is supposed to be portable and you can't be sure that code will always be run on hardware that behaves like this. You can't know for sure that the code ends up running on systems with different CPUs, older CPUs or that future CPUs won't change the behavior you are expecting. – StephenG Aug 28 '19 at 11:25
  • 1
    @StephenG: In C, `#pragma STDC FENV_ACCESS ON` support and associated [fenv stuff](https://en.cppreference.com/w/cpp/numeric/fenv) is (I think) required at least in theory as part of an ISO C implementation. My point was that most C++ implementations also support it in practice, thanks to widespread hardware (or soft-fp) support. Those flags aren't a special or recent hardware features, it's standard on mainstream FPU hardware. (e.g. on x86, it's part of x87 since 8087, and also SSE.) And of course also in non-x86 ISAs. Anyway, that's why there is a *portable* ISO C way to access it! – Peter Cordes Aug 28 '19 at 12:10
  • 1
    @StephenG: More importantly, I doubt math errno is supported on C or C++ implementations on systems without FPU flags, like maybe some software FP. If the HW or SW can detect FP exceptions, a quality implementation will expose that via `fenv`. Therefore your point isn't a reason for code to check `errno` after math functions, or to avoid `-fno-math-errno`. – Peter Cordes Aug 28 '19 at 12:11
  • 2
    @StephenG -- outside of a few rather unconventional environments (Cell vector processors, some earlier GPGPUs, and prehistoric things that most folks won't ever touch), IEEE 754 support (which is what gives you the sticky flags etc) *will* be present – LThode Aug 28 '19 at 13:46
  • While I'm not quite satisfied with any of the answers (which are still awesome, though), I'll accept this one due to being the most thorough and involved. – lisyarus Aug 29 '19 at 14:25
  • 1
    @lisyarus: the only remedy would be to report the missed-optimization gcc bug and hope that `sqrt()` expansion for `-fmath-errno` gets more logic to skip the runtime check when it can be checked at compile time. I'm pretty sure there's nothing better you can do with current gcc. (since enabling `-fno-math-errno` on a per-function basis with pragmas or attributes doesn't seem to work). Is there any specific reason you find using `-fno-math-errno` unsatisfactory? – Peter Cordes Aug 29 '19 at 14:42
  • @PeterCordes Thank you, I was thinking about reporting a bug too. I actually like the idea of using `-fno-math-errno` in general; from my floating-point experience, checking `errno` & trying to recover from floating-point errors is pretty much useless. However, in cases like this I usually hope to convince the compiler to do The Right Thing™ just tweaking the code, without any invasive actions. – lisyarus Aug 29 '19 at 15:07
  • 1
    @lisyarus: the real point of this answer is that checking `errno` is a terrible way to detect math errors in the first place. Use `fetestexcept` instead if you ever want it for any reason (including reporting and aborting, not just trying to recover). FPU flags (the FP environment) is part of thread context while `errno` is global. `-fno-math-errno` isn't "invasive", it just turns off a backwards-compatibility feature that you aren't using and should never use. Support for this misfeature is not free, and spending effort making it cheaper in some rare cases has low benefit vs. turning off. – Peter Cordes Aug 29 '19 at 15:25
  • 1
    I filed a bug on GCC bugzilla: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91645. After a bit of discussion, they've found a version that ***does*** the optimization: namely, one has to do `if (std::isless(tmp, 0.f)) __builtin_unreachable();`. – lisyarus Sep 05 '19 at 09:10
  • This answer blurs the distinction between signalling and quiet NaNs. Please note that [POSIX `sqrt`](https://pubs.opengroup.org/onlinepubs/9699919799/functions/sqrt.html) should not set `errno` when passed a quiet NaN, the only type of NaN that could arise in this example. The story is more complicated for signalling NaNs: [glibc](http://www.gnu.org/software/libc/manual/html_node/Math-Error-Reporting.html) again doesn't set `errno`, while [POSIX](http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1011.htm) optionally supports setting `errno`. – nknight Oct 15 '20 at 20:13
11

Pass the option -fno-math-errno to gcc. This fixes the problem without making your code unportable or leaving the realm of ISO/IEC 9899:2011 (C11).

This option tells GCC that it does not need to set errno when a math library function fails:

       -fno-math-errno
           Do not set "errno" after calling math functions that are executed
           with a single instruction, e.g., "sqrt".  A program that relies on
           IEEE exceptions for math error handling may want to use this flag
           for speed while maintaining IEEE arithmetic compatibility.

           This option is not turned on by any -O option since it can result
           in incorrect output for programs that depend on an exact
           implementation of IEEE or ISO rules/specifications for math
           functions. It may, however, yield faster code for programs that do
           not require the guarantees of these specifications.

           The default is -fmath-errno.

           On Darwin systems, the math library never sets "errno".  There is
           therefore no reason for the compiler to consider the possibility
           that it might, and -fno-math-errno is the default.

Given that you don't seem to be particularly interested in math routines setting errno, this seems like a good solution.

fuz
  • Thank you for your effort, but I specifically stated in the question that compiler options (and `-fno-math-errno` in particular) are not an option; I want an ad-hoc solution for a specific case. – lisyarus Aug 27 '19 at 12:17
  • @lisyarus Sorry, seems like I have missed this. I think you can set this option using an `__attribute__` for just a single function. Would this solve your problem? – fuz Aug 27 '19 at 12:18
  • It seems like a thing I would be happy with! However, I have no idea on how to put `no-math-errno` into a function attribute. – lisyarus Aug 27 '19 at 12:20
  • 5
    @lisyarus It should work with `__attribute__((optimize ("no-math-errno")))` or `#pragma GCC optimize ("no-math-errno")` but I couldn't get either to work. Weird. – fuz Aug 27 '19 at 12:27
  • Maybe I'll file a bug on this, too. – lisyarus Aug 29 '19 at 15:17
  • @lisyarus Let me know when you did. – fuz Sep 03 '19 at 11:18
5

Without any global options, here is a (low-overhead, but not free) way to get a square root with no branch:

#include <immintrin.h>

float test(float x)
{
    return _mm_cvtss_f32(_mm_sqrt_ss(_mm_set1_ps(x * x)));
}

(on godbolt)

As usual, Clang is smart about its shuffles. GCC and MSVC lag behind in that area, and don't manage to avoid the broadcast. MSVC is doing some mysterious moves as well.

There are other ways to turn a float into an __m128, for example _mm_set_ss. For Clang that makes no difference, for GCC that makes the code a little bigger and worse (including a movss reg, reg which counts as a shuffle on Intel, so it doesn't even save on shuffles).
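For reference, the _mm_set_ss variant mentioned above would look roughly like this. It is x86-only (SSE), and the function name test_set_ss is just an illustrative choice.

```cpp
#include <immintrin.h>

// Variant of test() that uses _mm_set_ss instead of _mm_set1_ps:
// the low lane holds x*x and the upper three lanes are zeroed.
float test_set_ss(float x)
{
    return _mm_cvtss_f32(_mm_sqrt_ss(_mm_set_ss(x * x)));
}
```

Both variants return the same value; they differ only in the instructions the compiler emits to build the __m128 operand.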

harold
  • Thank you! Not sure whether I am ok with directly calling SSE intrinsics (it's the compiler's job, right?), but this is still an interesting way to do it. – lisyarus Aug 27 '19 at 12:19
  • @lisyarus well the point of their existence is so you can use them when the compiler could not, so this seems like a fine (but perhaps unusual) use-case to me – harold Aug 27 '19 at 12:23
  • Right, but i want the code to still work as expected on, say, platforms that don't have SSE support, probably. – lisyarus Aug 27 '19 at 12:25
  • @lisyarus so a Pentium 2? Or a different ISA like ARM? – harold Aug 27 '19 at 12:27
  • ARM probably, yes. But the point is that I want just to help the compiler in optimization, not do it entirely myself. – lisyarus Aug 27 '19 at 12:28
  • @lisyarus OK I can see that, I don't know how to do it then, sorry – harold Aug 27 '19 at 12:30
4

After about a week, I asked about the matter on GCC Bugzilla, and they provided a solution that is the closest to what I had in mind:

#include <cmath>

float test (float x)
{
    float y = x*x;
    if (std::isless(y, 0.f))
        __builtin_unreachable();
    return std::sqrt(y);
}

that compiles to the following assembly:

test(float):
    mulss   xmm0, xmm0
    sqrtss  xmm0, xmm0
    ret

I'm still not quite sure what exactly happens here, though.
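For completeness, here is a sketch of the same idiom applied to the vector-length case from the question. The function name length is illustrative, and I have not verified that every GCC version drops the branch in this form, so check the generated assembly before relying on it.

```cpp
#include <cmath>
#include <cstddef>

// Euclidean length of a[0..n-1], with the isless-based promise
// placed right before the square root.
float length(const float* a, std::size_t n)
{
    float sum = 0.0f;
    for (std::size_t i = 0; i < n; ++i)
        sum += a[i] * a[i];
    // Promise: sum is never less than zero (GNU extension).
    if (std::isless(sum, 0.0f))
        __builtin_unreachable();
    return std::sqrt(sum);
}
```

One difference from assert(x*x >= 0.f): std::isless(NaN, 0.f) is false, so this promise does not make a NaN input undefined behaviour; it only rules out values that compare less than zero.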

lisyarus