1

i just want to learn why fast inversion algorithm is slower than math.h sqrt function. Here is my code sample

Code tries to demonstrate compare slow inversion and fast inversion. While debugging i see 1 seconds for slow inversion and 4 second for fast inversion. Where is the problem?

    #include<stdio.h>
    #include<time.h>
    #include<math.h>
    #include"inverse.h"

    #define SIZE 256

    int main()
    {
       char buffer[SIZE];
       time_t curtime;
       time_t curtime2;
       struct tm *loctime;
       int i = 0;
       float x = 0;

       curtime = time(NULL);
       loctime = localtime (&curtime);
       fputs (asctime (loctime), stdout);

       while(i < 100000000)
       {
          i++;
          //x = 1/sqrt(465464.015465);
          x = inverse_square_root(465464.015465);
       }

       curtime = time(NULL);
       loctime = localtime (&curtime);
       fputs (asctime (loctime), stdout);

       getchar();
       return 0;
    }

    float inverse_square_root(float number)
    {
       long i;
       float x2, y;
       const float threehalfs = 1.5F;

       x2 = number * 0.5F;
       y  = number;
       i  = * ( long * ) &y;             // evil floating point bit level hacking
       i  = 0x5f3759df - ( i >> 1 );     // what the heck?
       y  = * ( float * ) &i;
       y  = y * ( threehalfs - ( x2 * y * y ) );   // 1st iteration
    // y  = y * ( threehalfs - ( x2 * y * y ) );   // 2nd iteration, this can be removed
       return y;
    }
Robert Harvey
  • 168,684
  • 43
  • 314
  • 475
  • 1
    Have you tried looking at the assembler code generated for both versions? Floating point libraries are often heavily optimized, and use hardware support. – Chris J. Kiick Mar 14 '14 at 15:56

3 Answers3

2

"The problem" might be that you have hardware that implements sqrt() now, making it faster than a software approach. It's hard to tell without much more detail about your system and perhaps some profiling and disassembly data.

See this answer for details about the number of cycles for the x86 fsqrt instruction, for instance.

Community
  • 1
  • 1
unwind
  • 364,555
  • 61
  • 449
  • 578
  • On unix/linux, you can compile to assembly-only by doing `gcc -S your_source_file.c`, or disassemble the final executable with `objdump -d your_executable`. You can then check the assembly listing to see exactly what hardware instructions are being used. – laindir Mar 14 '14 at 15:54
1

Contrary to this question, sqrt or inverse sqrt may have been optimized in CPU level.
Further: have you benchmarked the code with highest level of optimizations?

The odd magic constant exploits the representation of 32-bit IEEE floating point, extracting a good initial approximation for the Newton's iteration.

Community
  • 1
  • 1
Aki Suihkonen
  • 15,929
  • 1
  • 30
  • 50
1

If you really want to demonstrate 'slow' vs 'fast' you need to actually know what both algorithms do, since there's no special reason to think sqrt() is slow. Write your own slow_sqrt function.

user3303729
  • 284
  • 1
  • 6