5

If one has to calculate a fraction of a given int value, say:

int j = 78;
int i = 5* j / 4;

Is this faster than doing:

int i = 1.25*j; // ?

If it is, is there a conversion factor one could use to decide which to use, as in how many int divisions can be done in the same time as one float multiplication?

Edit: I think the comments make it clear that the floating point math will be slower, but the question is, by how much? If I need to replace each float multiplication by N int divisions, for what N will this not be worth it anymore?

phuclv
nbubis
  • Have you benchmarked each one? – Mysticial Sep 27 '13 at 23:02
  • How many of those numbers are dynamic? – Kerrek SB Sep 27 '13 at 23:02
  • @KerrekSB is referring to the compiler optimizing all this for you. Hence why benchmarking is important. – Adam Sep 27 '13 at 23:03
  • @KerrekSB - all numbers are dynamic. The example is just an example, in reality these will be dynamic. – nbubis Sep 27 '13 at 23:04
  • I'd say if you have three integers `a`, `b` and `c` and you want to compute `a * b / c`, then you should write `int compute(int a, int b, int c) { return a * b / c; }`. I doubt you could do better. – Kerrek SB Sep 27 '13 at 23:07
  • This seems to be premature optimization; also note that the `1.25` literal is actually a `double`, not a `float`. – LihO Sep 27 '13 at 23:08
  • The two alternatives you showed might yield different results. Choose based on the desired result first, then based on performance. – Oswald Sep 27 '13 at 23:08
  • Is proper rounding the result value a possible issue? – Jongware Sep 27 '13 at 23:11
  • I don't see what you're hoping to achieve, either. If all three inputs are dynamic, then you could at best compute the floating point multiplier as `static_cast<double>(a) / static_cast<double>(c)`, then you need to convert `b` to floating point, multiply, and then round back using C++-mandated rounding, all and any of which is unlikely to be faster than two integer operations. – Kerrek SB Sep 27 '13 at 23:11
  • You can also do integer shifts and adds rather than a multiply (depends on what your hardware has). Using C and the compiler there is conversion overhead, etc., that you won't have staying fixed point. But the answer is non-deterministic; you have to just test it... – old_timer Sep 27 '13 at 23:14
  • Are you sure that your program's performance bottleneck is this operation? (I'd bet it isn't.) Measure it, and then optimize the bottleneck. – pts Sep 27 '13 at 23:19
  • @pts - when this is the only operation, and you repeat it 10^7 times, it is a bottleneck. – nbubis Sep 27 '13 at 23:29
  • Your premise is actually somewhat flawed here. You can't possibly have all numbers be dynamic if you're able to switch between `5/4` and `1.25`. If the numerator and denominator were not compile-time constants, how can you know to multiply by `1.25`? (without actually doing a division which you're trying to avoid) – Mysticial Sep 27 '13 at 23:31
  • Basic arithmetic is so performance irrelevant I don't know why you are concerned about optimizing this. Are you going to run this code in a toaster? – Havenard Sep 27 '13 at 23:33
  • Just to be clear: a multiplication is executed at CPU level in a single operation, and division is one of the most expensive of the CPU's basic operations. Would the version using multiple operations, a division included, be the faster one just because it sticks to integers? I sincerely doubt it. – Havenard Sep 27 '13 at 23:37

3 Answers

5

You've said all the values are dynamic, which makes a difference. For the specific values 5 * j / 4, the integer operations are going to be blindingly fast, because pretty much the worst case is that the compiler optimises them to two shifts and one addition, plus some messing around to cope with the possibility that j is negative. If the CPU can do better (single-cycle integer multiplication or whatever) then the compiler typically knows about it. The limits of compilers' abilities to optimize this kind of thing basically come when you're compiling for a wide family of CPUs (generating lowest-common-denominator ARM code, for example), where the compiler doesn't really know much about the hardware and therefore can't always make good choices.

I suppose that if a and b are fixed for a while (but not known at compile time), then it's possible that computing k = double(a) / b once and then int(k * x) for many different values of x, might be faster than computing a * x / b for many different values of x. I wouldn't count on it.

If all the values vary each time, then it seems unlikely that the floating-point division to compute the 1.25, followed by floating-point multiplication, is going to be any faster than the integer multiplication followed by integer division. But you never know, test it.

It's not really possible to give simple relative timings for this on modern processors, it really depends a lot on the surrounding code. The main costs in your code often aren't the "actual" ops: it's "invisible" stuff like instruction pipelines stalling on dependencies, or spilling registers to stack, or function call overhead. Whether or not the function that does this work can be inlined might easily make more difference than how the function actually does it. As far as definitive statements of performance are concerned you can basically test real code or shut up. But the chances are that if your values start as integers, doing integer ops on them is going to be faster than converting to double and doing a similar number of double ops.

Steve Jessop
1

It is impossible to answer this question out of context. Additionally 5*j/4 does not generally produce the same result as (int) (1.25*j), due to properties of integer and floating-point arithmetic, including rounding and overflow.

If your program is doing mostly integer operations, then the conversion of j to floating point, multiplication by 1.25, and conversion back to integer might be free because it uses floating-point units that are not otherwise engaged.

Alternatively, on some processors, the operating system might mark the floating-point state to be invalid, so that the first time a process uses it, there is an exception, the operating system saves the floating-point registers (which contain values from another process), restores or initializes the registers for your process, and returns from the exception. This would take a great deal of time, relative to normal instruction execution.

The answer also depends on characteristics of the specific processor model the program is executing on, as well as the operating system, how the compiler translates the source into assembly, and possibly even what other processes on the system are doing.

Also, the performance difference between 5*j/4 and (int) (1.25*j) is most often too small to be noticeable in a program unless it or operations like it are repeated a great many times. (And, if they are, there may be huge benefits to vectorizing the code, that is, using the Single Instruction Multiple Data [SIMD] features of many modern processors to perform several operations at once.)

Eric Postpischil
0

In your case, 5*j/4 would be much faster than 1.25*j, because division by a power of 2 can be done with a simple right shift, and 5*j can be done in a single instruction on many architectures, such as LEA on x86 or ADD with shift on ARM. Most others would need at most 2 instructions, like j + (j >> 2), but even then it's still probably faster than a floating-point multiplication. Moreover, by doing int i = 1.25*j you need 2 conversions, from int to double and back, plus 2 cross-domain data movements, which are generally very costly.

In other cases, when the fraction is not representable in binary floating-point (like 3*j/10), using int multiply/divide would be more correct (because 0.3 isn't exactly 0.3 in floating-point), and most probably faster (because the compiler can optimize division by a constant into a multiplication).


In cases where i and j are already of a floating-point type, multiplying by another floating-point value might be faster, because moving values between the int and float domains and converting between the types both take time, as noted above.

An important difference is that 5*j/4 will overflow in the intermediate product if j is too large, whereas 1.25*j won't (the double product is exact here, though converting a too-large result back to int is still a problem).

That said, there's no general answer to the questions "which is faster" and "how much faster", as it depends on the specific architecture and the specific context. You must measure on your system and decide. But if an expression is applied repeatedly to a lot of values, then it's time to move to SIMD.


phuclv