
I prepared two sample programs to show that a thread doing integer calculations is faster than a thread doing the same calculations with doubles.

The only difference between the two programs is that the first uses only integers and the other uses only doubles.

The time difference between them is almost 30% (a sketch of how the runs can be timed follows the second listing).

The reason might be very simple/basic, but can anyone give me the possible reason(s)?

Note: please ignore the logic of the code; it was prepared just for this demo.

Using integer:

    #include <stdio.h>
    #include <pthread.h>

    pthread_t pth1,pth2,pth3,pth4;

    void *threadfunc1(void *parm)
    {
        int i,j,k,l;
        j = 0;
        k = 0;
        l = 5;
        for (i = 0; i < 5000000; i ++) {
            j = k + 152;
            k = j + 21;
            l = j + k + (j * 5) + (k * 2) + (l * 3);
            j = k + ((l + j)/ k) + j + k + (l / k);
            j = 0;
            k = 0;
            l = 5;
        }
        printf("Completed Thread 1\n");
        return NULL ;
    }
    void *threadfunc2(void *parm)
    {
        int i,j,k,l;
        j = 0;
        k = 0;
        l = 5;
        for (i = 0; i < 5000000; i ++) {
            j = k + 152;
            k = j + 21;
            l = j + k + (j * 5) + (k * 2) + (l * 3);
            j = k + ((l + j)/ k) + j + k + (l / k);
            j = 0;
            k = 0;
            l = 5;
        }
        printf("Completed Thread 2\n");
        return NULL ;
    }


    int main () {
        pthread_create(&pth1, NULL, threadfunc1, "foo");
        pthread_create(&pth2, NULL, threadfunc2, "foo");
        pthread_join( pth1, NULL);
        pthread_join( pth2, NULL);
        return 1;
    }

Using double:

    #include <stdio.h>
    #include <pthread.h>

    pthread_t pth1,pth2,pth3,pth4;

    void *threadfunc1(void *parm)
    {
        double i,j,k,l;
        j = 0;
        k = 0;
        l = 5;
        for (i = 0; i < 5000000; i ++) {
            j = k + 152;
            k = j + 21;
            l = j + k + (j * 5) + (k * 2) + (l * 3);
            j = k + ((l + j)/ k) + j + k + (l / k);
            j = 0;
            k = 0;
            l = 5;
        }
        printf("Completed Thread 1\n");
        return NULL ;
    }
    void *threadfunc2(void *parm)
    {
        double i,j,k,l;
        j = 0;
        k = 0;
        l = 5;
        for (i = 0; i < 5000000; i ++) {
            j = k + 152;
            k = j + 21;
            l = j + k + (j * 5) + (k * 2) + (l * 3);
            j = k + ((l + j)/ k) + j + k + (l / k);
            j = 0;
            k = 0;
            l = 5;
        }
        printf("Completed Thread 2\n");
        return NULL ;
    }


    int main () {
        pthread_create(&pth1, NULL, threadfunc1, "foo");
        pthread_create(&pth2, NULL, threadfunc2, "foo");
        pthread_join( pth1, NULL);
        pthread_join( pth2, NULL);
        return 1;
    }
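
For reference, the runs can be timed by adding `#include <time.h>` at the top of either listing and replacing its main() with the sketch below (a minimal sketch, assuming a POSIX system that provides clock_gettime; compile with gcc -pthread, and older glibc may also need -lrt):

    /* Drop-in replacement main() (sketch): reports the wall-clock time
       spent creating, running and joining the two worker threads. */
    int main() {
        struct timespec start, end;
        double elapsed;

        clock_gettime(CLOCK_MONOTONIC, &start);
        pthread_create(&pth1, NULL, threadfunc1, "foo");
        pthread_create(&pth2, NULL, threadfunc2, "foo");
        pthread_join(pth1, NULL);
        pthread_join(pth2, NULL);
        clock_gettime(CLOCK_MONOTONIC, &end);

        elapsed = (end.tv_sec - start.tv_sec)
                + (end.tv_nsec - start.tv_nsec) / 1e9;
        printf("Elapsed: %.3f s\n", elapsed);
        return 0;
    }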
Vishwadeep Singh

2 Answers


This difference is due to the use of floating point arithmetic. For example, have a look at the following simple program:

    #include <stdlib.h>
    #include <stdio.h>

    /* TYPE is the arithmetic type under test; define it on the command
       line (e.g. -DTYPE=int or -DTYPE=double), otherwise it defaults to int. */
    #ifndef TYPE
    #define TYPE int
    #endif

    int main(int argc, char *argv[]) {
      TYPE i, s = 0;

      for (i = 0; i < 100; i++) {
        s += i;
      }

      /* Cast so the format specifier is valid for both int and double. */
      printf("Sum=%f\n", (double)s);
      return 0;
    }

Compile it with `gcc -o main main.c` (adding `-DTYPE=int` or `-DTYPE=double` as needed) and have a look at the disassembly of its main() function, for instance with `objdump -d main`, for TYPE defined as the fixed-point (integer) type versus double. [Screenshot: fixed vs. float disassembly, no optimization; arrows mark the for(){} loop in main.] The target is an x86 processor.

With `gcc -O3 -o main main.c` fixed point still wins. [Screenshot: disassembly comparison at -O3.]

Thus fixed point is preferable for high-speed computation if the algorithm allows its use. The situation remains almost the same if double is replaced with float.
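
To illustrate what fixed point means here (a hypothetical Q16.16 sketch, not taken from the benchmark above): fractional values are stored in a plain 32-bit integer with 16 fractional bits, so add, multiply and divide all run on the ordinary integer units.

    #include <stdint.h>
    #include <stdio.h>

    /* Hypothetical Q16.16 fixed-point type: 16 integer bits, 16 fractional bits. */
    typedef int32_t q16_16;
    #define Q_ONE (1 << 16)

    static q16_16 q_from_double(double x) { return (q16_16)(x * Q_ONE); }
    static double q_to_double(q16_16 x)   { return (double)x / Q_ONE; }

    /* Widen to 64 bits so the intermediate product/quotient does not overflow. */
    static q16_16 q_mul(q16_16 a, q16_16 b) { return (q16_16)(((int64_t)a * b) >> 16); }
    static q16_16 q_div(q16_16 a, q16_16 b) { return (q16_16)(((int64_t)a << 16) / b); }

    int main() {
        q16_16 x = q_from_double(1.5);
        q16_16 y = q_from_double(2.25);
        printf("1.5 * 2.25 = %f\n", q_to_double(q_mul(x, y)));  /* 3.375    */
        printf("1.5 / 2.25 = %f\n", q_to_double(q_div(x, y)));  /* ~0.66667 */
        return 0;
    }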

Moreover, some processors have no floating point hardware at all and rely on specially optimized emulation libraries (for instance, the TI C64x+ family). In that case the performance difference between fixed and floating point can be around 10x.

Michael
  • Why do you have to use a legacy target? – Aki Suihkonen Dec 02 '13 at 08:26
  • @AkiSuihkonen I meant that I used an x86-compatible processor, not exactly an 8086 or similar processor. Do you think it would be better to correct it? – Michael Dec 02 '13 at 08:37
  • The stack-based FP processor is probably slower (having 80-bit internal precision) than its xmm-based counterpart. – Aki Suihkonen Dec 02 '13 at 08:46
  • @AkiSuihkonen Undoubtedly, `SSE` can be advantageous, but can the compiler itself add it to the program code? I think [intrinsics](http://www.cs.uaf.edu/2009/fall/cs301/lecture/11_13_sse_intrinsics.html) should be used to do that. In addition, it is not very comfortable to use in loops like those in the question. – Michael Dec 02 '13 at 09:08
  • My gcc 4.6.3 on x64 produces SSE instructions by default. – Aki Suihkonen Dec 02 '13 at 09:33
  • @AkiSuihkonen My gcc-4.8.1 needs the flags -mfpmath=sse and -march=corei7 to start generating them. By default - no SSE (Ubuntu 13.10/32 bits). You're right - it becomes a little faster: 0m3.246s (no SSE) vs 0m2.981s (SSE). But fixed point still wins: 0m2.599s. – Michael Dec 02 '13 at 10:12

Floating point arithmetic operations take more CPU cycles than integer ones; the hardware is far more complex.

This has nothing to do with threads.

Also, most processors have more parallel execution resources for integer operations than for floating point, since integer operations are more common in general.
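
A rough way to observe this per-operation cost (not part of the original answer; a minimal single-threaded sketch whose absolute numbers vary by CPU and compiler flags) is to time a dependent chain of the same arithmetic done in int and in double:

    #include <stdio.h>
    #include <time.h>

    #define N 50000000L

    int main() {
        /* volatile keeps the compiler from removing or collapsing the loops */
        volatile int    vi = 1;
        volatile double vd = 1.0;
        clock_t t0;
        long i;

        t0 = clock();
        for (i = 0; i < N; i++) vi = (vi + 7) / 3;
        printf("int:    %.2f s\n", (double)(clock() - t0) / CLOCKS_PER_SEC);

        t0 = clock();
        for (i = 0; i < N; i++) vd = (vd + 7.0) / 3.0;
        printf("double: %.2f s\n", (double)(clock() - t0) / CLOCKS_PER_SEC);

        return 0;
    }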

egur