fast inversion algorithm slower than math.h 1/sqrt function

Question

i just want to learn why fast inversion algorithm is slower than math.h sqrt function. Here is my code sample

Code tries to demonstrate compare slow inversion and fast inversion. While debugging i see 1 seconds for slow inversion and 4 second for fast inversion. Where is the problem?

    #include<stdio.h>
    #include<time.h>
    #include<math.h>
    #include"inverse.h"

    #define SIZE 256

    int main()
    {
       char buffer[SIZE];
       time_t curtime;
       time_t curtime2;
       struct tm *loctime;
       int i = 0;
       float x = 0;

       curtime = time(NULL);
       loctime = localtime (&curtime);
       fputs (asctime (loctime), stdout);

       while(i < 100000000)
       {
          i++;
          //x = 1/sqrt(465464.015465);
          x = inverse_square_root(465464.015465);
       }

       curtime = time(NULL);
       loctime = localtime (&curtime);
       fputs (asctime (loctime), stdout);

       getchar();
       return 0;
    }

    float inverse_square_root(float number)
    {
       long i;
       float x2, y;
       const float threehalfs = 1.5F;

       x2 = number * 0.5F;
       y  = number;
       i  = * ( long * ) &y;             // evil floating point bit level hacking
       i  = 0x5f3759df - ( i >> 1 );     // what the heck?
       y  = * ( float * ) &i;
       y  = y * ( threehalfs - ( x2 * y * y ) );   // 1st iteration
    // y  = y * ( threehalfs - ( x2 * y * y ) );   // 2nd iteration, this can be removed
       return y;
    }

Have you tried looking at the assembler code generated for both versions? Floating point libraries are often heavily optimized, and use hardware support. — Chris J. Kiick, Mar 14 '14 at 15:56

score 2 · Accepted Answer · edited May 23 '17 at 12:13

2

"The problem" might be that you have hardware that implements sqrt() now, making it faster than a software approach. It's hard to tell without much more detail about your system and perhaps some profiling and disassembly data.

See this answer for details about the number of cycles for the x86 fsqrt instruction, for instance.

edited May 23 '17 at 12:13

Community

1
1

answered Mar 14 '14 at 15:51

unwind

364,555
61
449
578

On unix/linux, you can compile to assembly-only by doing `gcc -S your_source_file.c`, or disassemble the final executable with `objdump -d your_executable`. You can then check the assembly listing to see exactly what hardware instructions are being used. – laindir Mar 14 '14 at 15:54

score 1 · Answer 2 · edited May 23 '17 at 12:21

1

Contrary to this question, sqrt or inverse sqrt may have been optimized in CPU level.
Further: have you benchmarked the code with highest level of optimizations?

The odd magic constant exploits the representation of 32-bit IEEE floating point, extracting a good initial approximation for the Newton's iteration.

edited May 23 '17 at 12:21

Community

1
1

answered Mar 14 '14 at 15:54

Aki Suihkonen

15,929
1
30
50

In gcc, use e.g. `-march=native -ffast-math -O6 -S` – Aki Suihkonen Mar 14 '14 at 15:59

score 1 · Answer 3 · answered Mar 14 '14 at 17:08

1

If you really want to demonstrate 'slow' vs 'fast' you need to actually know what both algorithms do, since there's no special reason to think sqrt() is slow. Write your own slow_sqrt function.

answered Mar 14 '14 at 17:08

user3303729

284
1
6

fast inversion algorithm slower than math.h 1/sqrt function

3 Answers3