
I prepared two sample programs to show that a thread doing integer calculations is faster than a thread doing the same calculations with doubles.

The only difference between the two programs is that the first uses only integers and the other uses only doubles.

The time difference between them is almost 30% (a sketch of how the runs can be timed follows the second listing).

The reason might be very simple/basic, but can anyone give me the possible reason(s)?

Note: please ignore the logic of the code; it was prepared just for this demo.

Using integer:

    #include <stdio.h>
    #include <pthread.h>

    pthread_t pth1,pth2,pth3,pth4;

    void *threadfunc1(void *parm)
    {
        int i,j,k,l;
        j = 0;
        k = 0;
        l = 5;
        for (i = 0; i < 5000000; i ++) {
            j = k + 152;
            k = j + 21;
            l = j + k + (j * 5) + (k * 2) + (l * 3);
            j = k + ((l + j)/ k) + j + k + (l / k);
            j = 0;
            k = 0;
            l = 5;
        }
        printf("Completed Thread 1\n");
        return NULL ;
    }
    void *threadfunc2(void *parm)
    {
        int i,j,k,l;
        j = 0;
        k = 0;
        l = 5;
        for (i = 0; i < 5000000; i ++) {
            j = k + 152;
            k = j + 21;
            l = j + k + (j * 5) + (k * 2) + (l * 3);
            j = k + ((l + j)/ k) + j + k + (l / k);
            j = 0;
            k = 0;
            l = 5;
        }
        printf("Completed Thread 2\n");
        return NULL ;
    }


    int main () {
        pthread_create(&pth1, NULL, threadfunc1, "foo");
        pthread_create(&pth2, NULL, threadfunc2, "foo");
        pthread_join( pth1, NULL);
        pthread_join( pth2, NULL);
        return 1;
    }

Using double:

    #include <stdio.h>
    #include <pthread.h>

    pthread_t pth1,pth2,pth3,pth4;

    void *threadfunc1(void *parm)
    {
        double i,j,k,l;
        j = 0;
        k = 0;
        l = 5;
        for (i = 0; i < 5000000; i ++) {
            j = k + 152;
            k = j + 21;
            l = j + k + (j * 5) + (k * 2) + (l * 3);
            j = k + ((l + j)/ k) + j + k + (l / k);
            j = 0;
            k = 0;
            l = 5;
        }
        printf("Completed Thread 1\n");
        return NULL ;
    }
    void *threadfunc2(void *parm)
    {
        double i,j,k,l;
        j = 0;
        k = 0;
        l = 5;
        for (i = 0; i < 5000000; i ++) {
            j = k + 152;
            k = j + 21;
            l = j + k + (j * 5) + (k * 2) + (l * 3);
            j = k + ((l + j)/ k) + j + k + (l / k);
            j = 0;
            k = 0;
            l = 5;
        }
        printf("Completed Thread 2\n");
        return NULL ;
    }


    int main () {
        pthread_create(&pth1, NULL, threadfunc1, "foo");
        pthread_create(&pth2, NULL, threadfunc2, "foo");
        pthread_join( pth1, NULL);
        pthread_join( pth2, NULL);
        return 1;
    }
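
For reference, the runs can be timed by adding `#include <time.h>` at the top of either listing and replacing its main() with the sketch below (a minimal sketch, assuming a POSIX system that provides clock_gettime; compile with gcc -pthread, and older glibc may also need -lrt):

    /* Drop-in replacement main() (sketch): reports the wall-clock time
       spent creating, running and joining the two worker threads. */
    int main() {
        struct timespec start, end;
        double elapsed;

        clock_gettime(CLOCK_MONOTONIC, &start);
        pthread_create(&pth1, NULL, threadfunc1, "foo");
        pthread_create(&pth2, NULL, threadfunc2, "foo");
        pthread_join(pth1, NULL);
        pthread_join(pth2, NULL);
        clock_gettime(CLOCK_MONOTONIC, &end);

        elapsed = (end.tv_sec - start.tv_sec)
                + (end.tv_nsec - start.tv_nsec) / 1e9;
        printf("Elapsed: %.3f s\n", elapsed);
        return 0;
    }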
Vishwadeep Singh

2 Answers


This difference is due to the use of floating point arithmetic. For example, have a look at the following simple program:

    #include <stdlib.h>
    #include <stdio.h>

    /* TYPE is the arithmetic type under test; define it on the command
       line (e.g. -DTYPE=int or -DTYPE=double), otherwise it defaults to int. */
    #ifndef TYPE
    #define TYPE int
    #endif

    int main(int argc, char *argv[]) {
      TYPE i, s = 0;

      for (i = 0; i < 100; i++) {
        s += i;
      }

      /* Cast so the format specifier is valid for both int and double. */
      printf("Sum=%f\n", (double)s);
      return 0;
    }

Compile it with `gcc -o main main.c` (adding `-DTYPE=int` or `-DTYPE=double` as needed) and have a look at the disassembly of its main() function, for instance with `objdump -d main`, for TYPE defined as the fixed-point (integer) type versus double. [Screenshot: fixed vs. float disassembly, no optimization; arrows mark the for(){} loop in main.] The target is an x86 processor.

With `gcc -O3 -o main main.c` fixed point still wins. [Screenshot: disassembly comparison at -O3.]

Thus fixed point is preferable for high-speed computation if the algorithm allows its use. The situation remains almost the same if double is replaced with float.
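
To illustrate what fixed point means here (a hypothetical Q16.16 sketch, not taken from the benchmark above): fractional values are stored in a plain 32-bit integer with 16 fractional bits, so add, multiply and divide all run on the ordinary integer units.

    #include <stdint.h>
    #include <stdio.h>

    /* Hypothetical Q16.16 fixed-point type: 16 integer bits, 16 fractional bits. */
    typedef int32_t q16_16;
    #define Q_ONE (1 << 16)

    static q16_16 q_from_double(double x) { return (q16_16)(x * Q_ONE); }
    static double q_to_double(q16_16 x)   { return (double)x / Q_ONE; }

    /* Widen to 64 bits so the intermediate product/quotient does not overflow. */
    static q16_16 q_mul(q16_16 a, q16_16 b) { return (q16_16)(((int64_t)a * b) >> 16); }
    static q16_16 q_div(q16_16 a, q16_16 b) { return (q16_16)(((int64_t)a << 16) / b); }

    int main() {
        q16_16 x = q_from_double(1.5);
        q16_16 y = q_from_double(2.25);
        printf("1.5 * 2.25 = %f\n", q_to_double(q_mul(x, y)));  /* 3.375    */
        printf("1.5 / 2.25 = %f\n", q_to_double(q_div(x, y)));  /* ~0.66667 */
        return 0;
    }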

Moreover, some processors have no floating point hardware at all and rely on specially optimized emulation libraries (for instance, the TI C64x+ family). In that case the performance difference between fixed and floating point can be around 10x.

Michael
  • Why do you have to use a legacy target? – Aki Suihkonen Dec 02 '13 at 08:26
  • @AkiSuihkonen I meant that I used an x86-compatible processor, not exactly an 8086 or similar processor. Do you think it would be better to correct it? – Michael Dec 02 '13 at 08:37
  • The stack-based FP processor is probably slower (having 80-bit internal precision) than its xmm-based counterpart. – Aki Suihkonen Dec 02 '13 at 08:46
  • @AkiSuihkonen Undoubtedly, `SSE` can be advantageous, but can the compiler itself add it to the program code? I think [intrinsics](http://www.cs.uaf.edu/2009/fall/cs301/lecture/11_13_sse_intrinsics.html) should be used to do that. In addition, it is not very comfortable to use in loops like those in the question. – Michael Dec 02 '13 at 09:08
  • My gcc 4.6.3 on x64 produces SSE instructions by default. – Aki Suihkonen Dec 02 '13 at 09:33
  • @AkiSuihkonen My gcc-4.8.1 needs the flags -mfpmath=sse and -march=corei7 to start generating them. By default - no SSE (Ubuntu 13.10/32 bits). You're right - it becomes a little faster: 0m3.246s (no SSE) vs 0m2.981s (SSE). But fixed point still wins: 0m2.599s. – Michael Dec 02 '13 at 10:12

Floating point arithmetic operations take more CPU cycles than integer ones; the hardware is far more complex.

This has nothing to do with threads.

Also, most processors have more parallel execution resources for integer operations than for floating point, since integer operations are more common in general.
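
A rough way to observe this per-operation cost (not part of the original answer; a minimal single-threaded sketch whose absolute numbers vary by CPU and compiler flags) is to time a dependent chain of the same arithmetic done in int and in double:

    #include <stdio.h>
    #include <time.h>

    #define N 50000000L

    int main() {
        /* volatile keeps the compiler from removing or collapsing the loops */
        volatile int    vi = 1;
        volatile double vd = 1.0;
        clock_t t0;
        long i;

        t0 = clock();
        for (i = 0; i < N; i++) vi = (vi + 7) / 3;
        printf("int:    %.2f s\n", (double)(clock() - t0) / CLOCKS_PER_SEC);

        t0 = clock();
        for (i = 0; i < N; i++) vd = (vd + 7.0) / 3.0;
        printf("double: %.2f s\n", (double)(clock() - t0) / CLOCKS_PER_SEC);

        return 0;
    }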

egur