
Check out this simple code:

#include <cmath>

float foo(float in) {
    return sqrtf(in);
}

With -ffast-math, clang generates sqrtss, as expected. But if I also use -fstack-protector-all, it changes sqrtss to rsqrtss, as you can see on godbolt. Why?

geza
  • It generates `rsqrtss` with `-ffast-math -O3` also. – TypeIA Dec 17 '18 at 21:11
  • Related to https://stackoverflow.com/questions/1528727/why-is-sse-scalar-sqrtx-slower-than-rsqrtx-x? – Matthieu Brucher Dec 17 '18 at 21:16
  • @TypeIA: that is more or less understandable. I just don't get what stack protection has to do with sqrt. – geza Dec 17 '18 at 21:18
  • @MatthieuBrucher I think you're right, I was just about to post that link myself. It makes sense that `-ffast-math -O3` would select the optimization. I guess the question is really why `-ffast-math -fstack-protector-all` does too. – TypeIA Dec 17 '18 at 21:19
  • Probably one of the heuristics inside LLVM... – Matthieu Brucher Dec 17 '18 at 21:19

1 Answer


The short and sweet:

rsqrtss is safer and, as a result, less accurate and slower.

sqrtss is faster and, as a result, less safe.

Why is rsqrtss safer?

  • It doesn't use the whole XMM register.

Why is rsqrtss slower?

  • Because it needs more registers to perform the same action as sqrtss.

Why does rsqrtss use a reciprocal?

  • In a pinch, it seems that the reciprocal of a square root can be calculated faster and with less memory. Pico-spelenda: Lots of math.

The long and bitter:

Research

  • What does -ffast-math do?

    -ffast-math
        Enable fast-math mode. This defines the __FAST_MATH__ preprocessor
        macro, and lets the compiler make aggressive, potentially-lossy
        assumptions about floating-point math. These include:
    
        Floating-point math obeys regular algebraic rules for real numbers (e.g. + and * are associative, x/y == x * (1/y), and (a + b) * c == a * c + b * c),
        operands to floating-point operations are not equal to NaN and Inf, and
        +0 and -0 are interchangeable.
    
  • What does -fstack-protector-all do?

    • This answer can be found here.

    • Basically, it "forces the usage of stack protectors for all functions".

  • What is a "stack protector"?

    • A nice article for you.

    • The blissfully short, quite terribly succinct SparkNotes version is:

      • A "stack protector" is used to prevent exploitation of stack overwrites. The stack protector as implemented in gcc and clang adds an additional guard variable to each function’s stack area.
    • Interesting Drawback To Note:

      "Adding these checks will lead to a little runtime overhead: More stack space is needed, but that is negligible except for really constrained systems...Do you aim for maximum security at the cost of performance? -fstack-protector-all is for you."

  • What is sqrtss?

    According to @godbolt:

        Computes the square root of the low single-precision floating-point value
        in the second source operand and stores the single-precision floating-point
        result in the destination operand. The second source operand can be an XMM
        register or a 32-bit memory location. The first source and destination
        operands is an XMM register.
    
  • What is a "source operand"?

    • A tutorial can be found here

    • In essence, an operand is a location of data in a computer. Imagine the simple instruction x + x = y. You need to know what 'x' is, which is the source operand, and where the result will be stored, 'y', which is the destination operand. Notice how the '+' symbol, commonly called the 'operation', can be ignored here because it doesn't matter in this example.

  • What is an "XMM register"?

    • An explanation can be found here.

    • It's just a specific type of register. It's primarily used in floating-point math (which, surprisingly enough, is the math you are trying to do).

  • What is rsqrtss?

    • Again, according to @godbolt:

      Computes an approximate reciprocal of the square root of the low
      single-precision floating-point value in the source operand (second operand)
      stores the single-precision floating-point result in the destination operand.
      The source operand can be an XMM register or a 32-bit memory location. The
      destination operand is an XMM register. The three high-order doublewords of
      the destination operand remain unchanged. See Figure 10-6 in the Intel® 64 and
      IA-32 Architectures Software Developer’s Manual, Volume 1, for an illustration
      of a scalar single-precision floating-point operation.
      
  • What is a "doubleword"?

    • A simple definition.

    • It is a unit of measurement of computer memory, just like 'bit' or 'byte'. However, unlike 'bit' or 'byte', its size is not universal and depends on the architecture of the computer. On x86, a doubleword is 32 bits.

  • What does "Figure 10-6 in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1" look like?

    • Here you go.



Disclaimer: Most of this knowledge comes from outside sources. I literally installed clang just now to help answer your question. I'm not an expert.
  • Welcome to stackoverflow! :) You wrote a lot of things that are "around" the subject, but my question remains unanswered. And there are some inaccuracies in your answer as well (like the difference between `sqrtss` and `rsqrtss`). – geza Dec 18 '18 at 02:08