Why does clang generate rsqrt, if stack-protector is turned on?

Question

Check out this simple code:

#include <cmath>

float foo(float in) {
    return sqrtf(in);
}

With -ffast-math, clang generates sqrtss, as expected. But if I also add -fstack-protector-all, it changes sqrtss to rsqrtss, as you can see on godbolt. Why?

Related to https://stackoverflow.com/questions/1528727/why-is-sse-scalar-sqrtx-slower-than-rsqrtx-x? — Matthieu Brucher, Dec 17 '18 at 21:16
@TypeIA: that is more or less understandable. I just don't get what stack protection has to do with sqrt. — geza, Dec 17 '18 at 21:18
@MatthieuBrucher I think you're right, I was just about to post that link myself. It makes sense that `-ffast-math -O3` would select the optimization. I guess the question is really why `-ffast-math -fstack-protector-all` does too. — TypeIA, Dec 17 '18 at 21:19

score 0 · Answer 1 · answered Dec 18 '18 at 00:57

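The short and sweet:

rsqrtss is faster but less accurate: it only returns an approximation of 1/sqrt(x).

sqrtss is slower but exact: it returns the correctly rounded square root.

Does rsqrtss use less of the XMM register?

No. Both are scalar instructions that compute only the low single-precision value of an XMM register. The upper three values are not part of the computation for either one.

Why does the rsqrtss version need more instructions?

Because rsqrtss alone does not give sqrt(x). The compiler has to multiply the estimate by x and typically adds a Newton-Raphson refinement step, so the sequence uses more instructions and registers than a single sqrtss, even though it can still finish sooner.

Why does rsqrtss use a reciprocal?

Because an approximate reciprocal square root is cheap for the hardware to produce (Intel documents a relative error of at most 1.5 × 2^-12), and sqrt(x) = x * (1/sqrt(x)). The rest is lots of math.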
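To make the reciprocal trick concrete, here is a minimal sketch in portable C++ (no intrinsics) of computing sqrt(x) as x * (1/sqrt(x)) with one Newton-Raphson refinement step. The helper names are hypothetical. The deliberately perturbed starting value used below is an assumption standing in for the roughly 12-bit estimate rsqrtss returns. This is not meant to reproduce clang's exact instruction sequence.

```cpp
#include <cassert>
#include <cmath>

// One Newton-Raphson step for y ~= 1/sqrt(x):
//   y' = y * (1.5 - 0.5 * x * y * y)
// If y has relative error e, y' has relative error of about -1.5 * e^2.
double refine_rsqrt(double x, double y) {
    return y * (1.5 - 0.5 * x * y * y);
}

// sqrt(x) = x * (1/sqrt(x)), valid for x > 0.
// The identity breaks down at x == 0, where 1/sqrt(0) is infinite
// and 0 * infinity is NaN.
double sqrt_from_rsqrt(double x, double rsqrt_estimate) {
    assert(x > 0.0);
    return x * refine_rsqrt(x, rsqrt_estimate);
}
```

For example, starting from an estimate that is 0.1% too large, one refinement step leaves a relative error of roughly 1.5 × 10^-6. That quadratic convergence is why a single step after rsqrtss gets close to, but not exactly, full single precision.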

The long and bitter:

Research

What does -ffast-math do?

-ffast-math
    Enable fast-math mode. This defines the __FAST_MATH__ preprocessor
    macro, and lets the compiler make aggressive, potentially-lossy
    assumptions about floating-point math. These include:

    - Floating-point math obeys regular algebraic rules for real numbers (e.g. + and * are associative, x/y == x * (1/y), and (a + b) * c == a * c + b * c),
    - operands to floating-point operations are not equal to NaN and Inf, and
    - +0 and -0 are interchangeable.
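A small illustration of why these assumptions are "potentially lossy": in IEEE-754 single precision, addition is not associative, so a compiler that is allowed to reassociate under -ffast-math may change the result. The two function names are hypothetical. Compiled normally (without -ffast-math), each must be evaluated exactly as written.

```cpp
#include <cassert>

// Same three operands, different grouping.
// Without -ffast-math the compiler must keep this order of evaluation.
float sum_left(float a, float b, float c)  { return (a + b) + c; }
float sum_right(float a, float b, float c) { return a + (b + c); }
```

sum_left(1e20f, -1e20f, 1.0f) gives 1.0f, while sum_right(1e20f, -1e20f, 1.0f) gives 0.0f: the 1.0f is far below the spacing of floats near 1e20, so it is lost when it is added to -1e20f first.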

What does -fstack-protector-all do?
- This answer can be found here.
- Basically, it "forces the usage of stack protectors for all functions".
What is a "stack protector"?
- A nice article for you.
- The blissfully short, quite terribly succinct SparkNotes version:
  - A "stack protector" is used to prevent exploitation of stack overwrites. The stack protector as implemented in GCC and clang adds an additional guard variable (a "canary") to each function's stack area and checks it before the function returns. If the guard has been overwritten, the program is aborted.
- Interesting Drawback To Note:
  
  "Adding these checks will lead to a little runtime overhead: More stack space is needed, but that is negligible except for really constrained systems...Do you aim for maximum security at the cost of performance? -fstack-protector-all is for you."
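A conceptual simulation of the idea, not how GCC or clang actually emit it: a byte array stands in for a stack frame holding an 8-byte buffer followed by a guard value. The type, function, and canary constant are all hypothetical. In real code the compiler places the canary between the local variables and the saved return address, takes its value from a per-process random value, checks it in the function epilogue, and calls __stack_chk_fail on a mismatch.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical layout: 8-byte local buffer, then an 8-byte guard.
constexpr std::size_t kBufSize = 8;
constexpr std::uint64_t kCanary = 0x5A17C0DEDEADBEEFull;  // hypothetical value

struct SimulatedFrame {
    std::array<unsigned char, kBufSize + sizeof(std::uint64_t)> bytes{};
    // "Prologue": store the guard right after the buffer.
    SimulatedFrame() { std::memcpy(bytes.data() + kBufSize, &kCanary, sizeof kCanary); }
};

// Copies `len` bytes into the buffer without checking against kBufSize,
// mimicking an unchecked copy. Bytes past the buffer land on the guard.
// Returns false if the guard changed (where the real check would abort).
bool copy_and_check(SimulatedFrame& f, const unsigned char* src, std::size_t len) {
    assert(len <= f.bytes.size());  // keeps the simulation itself well-defined
    std::memcpy(f.bytes.data(), src, len);
    std::uint64_t guard;
    std::memcpy(&guard, f.bytes.data() + kBufSize, sizeof guard);  // "epilogue" check
    return guard == kCanary;
}
```

Copying 8 bytes leaves the guard intact; copying 12 bytes overwrites part of it, which the check detects.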

What is sqrtss?

According to the instruction reference that godbolt displays:

    Computes the square root of the low single-precision floating-point value
    in the second source operand and stores the single-precision floating-point
    result in the destination operand. The second source operand can be an XMM
    register or a 32-bit memory location. The first source and destination
    operands is an XMM register.
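A minimal sketch of that "low value only" behavior using the SSE intrinsic _mm_sqrt_ss from <xmmintrin.h>, which compilers typically lower to sqrtss. It assumes an x86 or x86-64 target; the helper name sqrt_low_lane is hypothetical.

```cpp
#include <cassert>
#include <xmmintrin.h>  // SSE intrinsics (x86/x86-64 only)

// Square-roots lane 0 of `in` and writes all four lanes of the result to `out`.
// _mm_sqrt_ss copies lanes 1..3 from its argument unchanged.
void sqrt_low_lane(const float in[4], float out[4]) {
    assert(in != nullptr && out != nullptr);
    __m128 v = _mm_loadu_ps(in);  // lane 0 = in[0]
    __m128 r = _mm_sqrt_ss(v);    // scalar square root of lane 0 only
    _mm_storeu_ps(out, r);
}
```

With in = {4, 10, 20, 30}, out becomes {2, 10, 20, 30}: only lane 0 is square-rooted, and the other three lanes pass through from the input.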

What is a "source operand"?
- A tutorial can be found here.
- In essence, an operand is where an instruction reads its input from or writes its result to: a register, a memory location, or an immediate value. Consider y = x + x. The value being read, 'x', is the source operand, and the place where the result is stored, 'y', is the destination operand. The '+' is the operation itself and doesn't matter for this distinction.
What is an "XMM register"?
- An explanation can be found here.
- It's one of the 128-bit SSE registers (xmm0 to xmm15 on x86-64). On x86-64 they are used for scalar floating-point math as well as SIMD, which, surprisingly enough, is the math you are trying to do.

What is rsqrtss?

Again, according to the instruction reference that godbolt displays:

    Computes an approximate reciprocal of the square root of the low
    single-precision floating-point value in the source operand (second operand)
    stores the single-precision floating-point result in the destination operand.
    The source operand can be an XMM register or a 32-bit memory location. The
    destination operand is an XMM register. The three high-order doublewords of
    the destination operand remain unchanged. See Figure 10-6 in the Intel® 64 and
    IA-32 Architectures Software Developer’s Manual, Volume 1, for an illustration
    of a scalar single-precision floating-point operation.
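A sketch that measures how approximate the result is, assuming an x86 or x86-64 target. _mm_rsqrt_ss, _mm_set_ss, and _mm_cvtss_f32 are the standard SSE intrinsics from <xmmintrin.h>; the two helper functions are hypothetical. The check compares the estimate against a double-precision reference and relies on Intel's documented bound of |relative error| ≤ 1.5 × 2^-12 for rsqrtss. The exact bits returned can differ between CPU vendors and generations; only the bound is documented.

```cpp
#include <cassert>
#include <cmath>
#include <xmmintrin.h>  // SSE intrinsics (x86/x86-64 only)

// Hardware estimate of 1/sqrt(x) for lane 0 (rsqrtss via _mm_rsqrt_ss).
float rsqrt_estimate(float x) {
    return _mm_cvtss_f32(_mm_rsqrt_ss(_mm_set_ss(x)));
}

// Relative error of that estimate against a double-precision reference.
double rsqrt_relative_error(float x) {
    assert(x > 0.0f && std::isfinite(x));
    double exact = 1.0 / std::sqrt(static_cast<double>(x));
    return std::fabs(rsqrt_estimate(x) - exact) / exact;
}
```

The error is small but far larger than float's rounding error (about 6e-8), which is why the compiler follows rsqrtss with extra arithmetic when it uses it to compute sqrt.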

What is a "doubleword"?
- A simple definition.
- It is a unit of data size, like 'bit' or 'byte'. The size of a 'word' in general depends on the architecture, but in Intel's x86 terminology a word is 16 bits, so a doubleword is 32 bits (and a quadword is 64 bits).
What does "Figure 10-6 in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1" look like?
- Here you go.
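A compile-time sketch of the sizes involved, assuming an x86 or x86-64 target (for the __m128 type from <xmmintrin.h>). The Word, Doubleword, and Quadword aliases are hypothetical names that simply follow Intel's terminology.

```cpp
#include <cassert>
#include <cstdint>
#include <xmmintrin.h>  // __m128 (x86/x86-64 only)

// Intel terminology: byte = 8 bits, word = 16, doubleword = 32, quadword = 64.
using Word = std::uint16_t;
using Doubleword = std::uint32_t;
using Quadword = std::uint64_t;

static_assert(sizeof(Word) == 2 && sizeof(Doubleword) == 4 && sizeof(Quadword) == 8,
              "x86 word sizes");

// An XMM register is 128 bits, i.e. four doublewords. A scalar
// single-precision instruction works on the low doubleword only.
static_assert(sizeof(__m128) == 4 * sizeof(Doubleword), "XMM = 4 doublewords");

constexpr int kDoublewordsPerXmm = sizeof(__m128) / sizeof(Doubleword);
```

So "the three high-order doublewords" in the rsqrtss description are simply the other three single-precision slots of the XMM register.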

Disclaimer: Most of this knowledge comes from outside sources. I literally installed clang just now to help answer your question. I'm not an expert.

Welcome to Stack Overflow! :) You wrote a lot of things that are "around" the subject, but my question remains unanswered. And there are some inaccuracies in your answer as well (like the difference between `sqrtss` and `rsqrtss`). — geza, Dec 18 '18 at 02:08
