
I have read a few posts (e.g., C++ built-in types) saying that, on a modern Intel Xeon CPU, there is no performance difference between using int32_t and using double.

However, I have noticed that when I do element-wise vector multiplication,

std::vector<T> a, b, c;
// run some initialization
for (std::size_t i = 0; i < 1000000; ++i) {
    c[i] = a[i] * b[i];
}

if I set T to int32_t, this piece of code runs much faster than if I set T to double.
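
A minimal self-contained version of what I am timing looks roughly like this (simplified sketch; the real initialization fills a and b with actual data, and time_multiply is just the name I use here):

#include <chrono>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <vector>

template <typename T>
void time_multiply(const char* name) {
    const std::size_t n = 1000000;
    std::vector<T> a(n, T(3)), b(n, T(5)), c(n);

    auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < n; ++i) {
        c[i] = a[i] * b[i];
    }
    auto stop = std::chrono::steady_clock::now();

    // Print one element so the compiler cannot discard the whole loop.
    std::printf("%-8s %.6f s (c[0] = %g)\n", name,
                std::chrono::duration<double>(stop - start).count(),
                static_cast<double>(c[0]));
}

int main() {
    time_multiply<std::int32_t>("int32_t");
    time_multiply<double>("double");
}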

I am running this on a Xeon E5620 with CentOS.

Can anyone clarify a bit here? Is using int32_t faster or not?

SovietFrontier

2 Answers


You're running a million multiplications, reading two million inputs and writing one million outputs, i.e. three million values in total. With 4-byte values that's 12 MB of data; with 8-byte values it's 24 MB. The E5620 has 12 MB of L3 cache, so the int32_t working set just fits in cache, while the double working set does not and has to stream from main memory.
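
To make the arithmetic explicit, here is a quick back-of-the-envelope check (the 12 MB figure is the E5620's L3 cache size; in practice the threshold is fuzzier because other data also occupies the cache):

// Back-of-the-envelope working-set check (illustrative arithmetic, not a benchmark).
#include <cstddef>
#include <cstdint>
#include <cstdio>

int main() {
    const std::size_t n = 1000000;                  // elements per vector
    const std::size_t vectors = 3;                  // a, b and c
    const std::size_t cache = 12u * 1024u * 1024u;  // E5620 L3 cache: 12 MB

    const std::size_t ws32 = vectors * n * sizeof(std::int32_t);  // 12,000,000 bytes
    const std::size_t ws64 = vectors * n * sizeof(double);        // 24,000,000 bytes

    std::printf("int32_t working set: %zu bytes (%s the 12 MB L3)\n",
                ws32, ws32 <= cache ? "fits in" : "exceeds");
    std::printf("double  working set: %zu bytes (%s the 12 MB L3)\n",
                ws64, ws64 <= cache ? "fits in" : "exceeds");
}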

MSalters

These are the results on my CPU:

Intel(R) Core(TM) i5-8250U CPU @ 1.60GHz, gcc 7.3

pure gcc, no optimization

short add/sub: 1.586071 [0]
short mul/div: 5.601069 [1]
long add/sub: 1.659803 [0]
long mul/div: 8.145207 [0]
long long add/sub: 1.826622 [0]
long long mul/div: 8.161891 [0]
float add/sub: 2.685403 [0]
float mul/div: 3.758135 [0]
double add/sub: 2.662717 [0]
double mul/div: 4.189572 [0]

with gcc -O3

short add/sub: 0.000001 [0]
short mul/div: 4.491903 [1]
long add/sub: 0.000000 [0]
long mul/div: 6.535028 [0]
long long add/sub: 0.000000 [0]
long long mul/div: 6.543064 [0]
float add/sub: 1.182737 [0]
float mul/div: 2.218142 [0]
double add/sub: 1.183991 [0]
double mul/div: 2.529001 [0]

The result really depends on your architecture and on the optimization level. I remember a SPARC workstation at my university, some 20 years ago, that had better floating-point performance than integer performance.
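
If you want to reproduce this kind of per-operation timing, a rough sketch along these lines works. This is not the exact benchmark that produced the numbers above; note that the 0.000000 add/sub timings at -O3 strongly suggest those loops were optimized away entirely, so the sketch forces every operation through a volatile sink:

// Rough per-type mul/div timing loop (illustrative sketch only).
#include <chrono>
#include <cstdio>

template <typename T>
void time_mul_div(const char* name) {
    const long iterations = 100000000;
    volatile T sink = T(1);      // volatile: the compiler must keep every access
    const T x = T(1.000001);     // truncates to 1 for the integer types

    auto start = std::chrono::steady_clock::now();
    for (long i = 0; i < iterations; ++i) {
        sink = sink * x;         // one multiply per iteration
        sink = sink / x;         // one divide per iteration
    }
    auto stop = std::chrono::steady_clock::now();

    std::printf("%-10s mul/div: %f s\n", name,
                std::chrono::duration<double>(stop - start).count());
}

int main() {
    time_mul_div<int>("int");
    time_mul_div<long long>("long long");
    time_mul_div<float>("float");
    time_mul_div<double>("double");
}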

Please also see this nice talk.

kelalaka