
Based on this question: Floating point division vs floating point multiplication. Division is slower than multiplication for several reasons.

Will the compiler usually replace division with multiplication when it is possible?

For example:

```cpp
float a;
// During runtime a = 5.4f
float b = a / 10.f;
```

Will it be:

```cpp
float a;
// During runtime a = 5.4f
float b = a * 0.1f;
```

If this is considered a compiler-dependent question, I am using the VS2013 default compiler. However, it would be nice to get a generic answer about the theoretical validity of this optimization.

  • Wouldn't the compiler have to do a division in order to be able to multiply by the reciprocal? – NathanOliver Feb 19 '16 at 13:16
  • This is not a case covered by "if possible"; it is a case where it's not possible unless a loss of accuracy is accepted. So, hopefully, only when compiling with a flag that specifically allows it. – harold Feb 19 '16 at 13:16
  • @NathanOliver That would be a compile-time division, so it will not hurt. – Humam Helfawi Feb 19 '16 at 13:18
  • http://goo.gl/AV8MlT Doesn't look like the compiler would optimize here. – Simon Kraemer Feb 19 '16 at 13:19
  • You can look at the assembly output in MSVC++ with the `fp:fast` option: https://msdn.microsoft.com/en-us/library/e7s85ffb.aspx – Serge Rogatch Feb 19 '16 at 13:29
  • I know this is for VS2013, but in the interest of GCC users: in GCC, the flag that specifically enables this particular optimization is `-freciprocal-math`, which is also automatically enabled when selecting any of `-funsafe-math-optimizations`, `-ffast-math`, or `-Ofast`. See https://gcc.gnu.org/onlinedocs/gcc-4.9.1/gcc/Optimize-Options.html – Pedro Gimeno Nov 26 '18 at 19:56
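
To make the flag discussion in the comments concrete, here is a minimal test case (a sketch added here; the file and function names are made up, and the expected assembly is a guess that should be verified on your own toolchain):

```cpp
// scale.cpp -- feed this to a compiler and inspect the generated assembly
// to see whether the division is turned into a multiplication, e.g.:
//   g++ -O2 -S scale.cpp                    (plain: a divss is expected)
//   g++ -O2 -freciprocal-math -S scale.cpp  (a mulss may appear instead)
//   cl /O2 /fp:fast /FA scale.cpp           (MSVC, per the comments above)
float scale(float a) {
    return a / 10.f; // candidate for a * 0.1f under relaxed FP rules
}
```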

1 Answer


No, the compiler is not allowed to do that in the general case: the two operations can produce results that are not bit-identical, due to the representation error of the reciprocal.

In your example, 0.1 does not have an exact representation as a float. This causes the result of multiplying by 0.1f to differ from the result of dividing by 10.f:

```cpp
#include <iostream>

int main() {
    float f = 21736517;
    float a = f / 10.f;
    float b = f * 0.1f;
    std::cout << (a == b) << std::endl; // prints 0: the results differ
}
```

Demo.
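
To see how the two results differ at the bit level, the snippet can be extended (a small sketch added here, not part of the original answer) to print both values in hexadecimal floating-point notation:

```cpp
#include <iostream>

int main() {
    float f = 21736517;
    // std::hexfloat prints the exact bit-level value of each result,
    // making the difference between the two rounded quotients visible
    std::cout << std::hexfloat
              << "f / 10.f = " << f / 10.f << '\n'
              << "f * 0.1f = " << f * 0.1f << '\n';
}
```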

Note: as njuffa correctly notes in the comments below, there are situations in which the compiler could make such optimizations for a wide set of numbers, as described in this paper. For example, multiplying or dividing by a power of two is equivalent to adding to the exponent field of the IEEE-754 float representation, so it can be done exactly.
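
As a quick illustration of the power-of-two case (a sketch added here, not from the original answer), the following compares `x / 2.0f` against `x * 0.5f` bit-for-bit over a sample of positive finite floats; because 0.5f is exactly representable, both expressions denote the same correctly rounded value and should agree everywhere:

```cpp
#include <cstdint>
#include <cstring>
#include <iostream>

int main() {
    std::uint32_t mismatches = 0;
    // Sample positive finite float bit patterns (0x7F800000 is +inf).
    for (std::uint32_t bits = 0; bits < 0x7F800000u; bits += 9973) {
        float x, d, m;
        std::memcpy(&x, &bits, sizeof x); // reinterpret the bits as a float
        d = x / 2.0f;
        m = x * 0.5f;
        if (std::memcmp(&d, &m, sizeof d) != 0) ++mismatches;
    }
    std::cout << "mismatches: " << mismatches << '\n'; // expected: 0
}
```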

  • Though the compiler will do the transformation if you tell it that you don't care (GCC's `-ffast-math`, whatever MSVC's equivalent is). – Marc Glisse Feb 19 '16 at 13:25
  • Given how imprecisely C++ defines floating point, I think you are wrong. The compiler *is* allowed to do that. However, most of them appear to choose not to (preferring to provide more accuracy, at the cost of speed). – Martin Bonner supports Monica Feb 19 '16 at 13:44
  • One has to separate out the cases where a floating-point division can easily be replaced with a multiplication *while maintaining bit-identical results*. For platforms with IEEE-754 arithmetic, this is true for constant divisors that are a power of 2, when the inverse is representable. I have seen compilers apply this optimization (e.g. division by 2.0 becomes multiplication by 0.5). There is a technique applicable to a wider range of other constant divisors, as described in [this paper](http://perso.ens-lyon.fr/nicolas.brisebarre/Publi/fpdivision.pdf). Sadly, I have not seen any compiler use it. – njuffa Feb 19 '16 at 17:04
  • @njuffa Thank you very much for a great comment! – Sergey Kalinichenko Feb 19 '16 at 17:51
  • @njuffa If I were a conscientious compiler author, I would still worry about `x * C1` rounding up to `+inf` when `x / C` remained finite, among other corner cases. They could start by replacing `x_float / C_float` with `(float)((double) x_float * double_C1)`, which is faster on many target architectures, does not require an FMA, and which no compiler currently uses either. – Pascal Cuoq Feb 19 '16 at 17:53
  • @PascalCuoq I have used the technique from the paper (note that I posted a link to a *draft/preprint*, as the published paper is paywalled, best I know) as a manual optimization and don't recall encountering any issues. As you are probably aware, mixing float and double computation has practical issues (vastly different throughput on many GPUs, an obstacle to SIMD vectorization). As far as the use of FMA is concerned: support for this operation is fast becoming ubiquitous (GPUs, x86, Power, SPARC, ARM), and certainly any forward-looking research or compiler work should assume it is available (IMHO). – njuffa Feb 19 '16 at 18:05
  • @njuffa To be clear, you are talking about replacing a single-precision division `x / 0x1.f7cf3p-4f` with `fmaf(-0x1.3da6a4p-22f, x, x * 0x1.042974p+3f)`, right? – Pascal Cuoq Feb 19 '16 at 18:34
  • @PascalCuoq Replacing `x / 0x1.f7cf3p-4f` with `fmaf(0x1.042974p+3f, x, -0x1.3da6a4p-22f * x)` is, I think, what you meant? If so, yes. I can see that it breaks down for x = +/-INF and `|x| <= 0x1.f1eb78p-108f`. Where I used this previously as a manual optimization, neither very large nor very small operands would occur, so there were no problems. Using this technique in a compiler as a general solution would require range analysis or a fast-path/slow-path approach, which may well be impractical. – njuffa Feb 19 '16 at 19:08
  • @njuffa Sorry, I was mis-remembering the technique, but yes, infinities causing problems when C1 and C2 have different signs is what I was getting at. – Pascal Cuoq Feb 19 '16 at 19:14
  • I definitely appreciate the difficulty of adopting this technique as a general-purpose solution (obvious issues arise when the head and tail of the precomputed reciprocal are of opposite sign, and when the dividend is so small as to cause underflow in the tail computation of the quotient). To be clear, I am only referring to transformations that guarantee bitwise identical results and are thus "safe". There are already plenty of compiler optimization options available for people who don't mind the occasional wrong result :-) – njuffa Feb 19 '16 at 19:19
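
For readers who want to experiment with the FMA-based replacement discussed in the comments above, here is a rough test harness (a sketch assuming IEEE-754 `float`, a correctly rounded `fmaf`, and C++17 for the hexadecimal float literals; the constants are copied verbatim from the comments, and the bitwise match should be verified rather than taken on faith):

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>
#include <iostream>

// Divide by C, or multiply by the two-part reciprocal (C1, C2) via FMA:
//   x / C  ->  fmaf(C1, x, C2 * x)
const float C  = 0x1.f7cf3p-4f;
const float C1 = 0x1.042974p+3f;   // rounded reciprocal of C
const float C2 = -0x1.3da6a4p-22f; // correction term for C1

float by_division(float x) { return x / C; }
float by_fma(float x)      { return std::fmaf(C1, x, C2 * x); }

int main() {
    std::uint32_t mismatches = 0;
    // Sample inputs away from the known failure cases: infinities and
    // |x| <= 0x1.f1eb78p-108f (see njuffa's comment above).
    for (std::uint32_t bits = 0x20000000u; bits < 0x7F000000u; bits += 65537) {
        float x, d, f;
        std::memcpy(&x, &bits, sizeof x);
        d = by_division(x);
        f = by_fma(x);
        if (std::memcmp(&d, &f, sizeof d) != 0) ++mismatches;
    }
    // 0 if the expansion reproduces the division exactly over this range
    std::cout << "mismatches: " << mismatches << '\n';
}
```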