11

I saw a tweet recently that confused me (this was posted by an XNA coder, in the context of writing an XNA game):

Microoptimization tip of the day: when possible, use multiplication instead of division in high frequency areas. It's a few cycles faster.

I was quite surprised, because I always thought compilers were pretty smart (for example, using bit-shifting where they can), and I recently read a post by Shawn Hargreaves saying much the same thing. I wondered how much truth there was in this, since there are lots of calculations in my game.

I asked for a sample, but the original poster was unable to give one. He did, however, say this:

Not necessarily when it's something like "center = width / 2". And I've already determined "yes, it's worth it". :)

So, I'm curious...

Can anyone give an example of some code where you can change a division to a multiplication and get a performance gain, where the C# compiler wasn't able to do the same thing itself?

Danny Tuppeny
  • 1
    Related: [Optimizing integer divisions with Multiply Shift in C#](http://www.codeproject.com/KB/cs/FindMulShift.aspx), found via the related question [What's the fastest way to divide an integer by 3?](http://stackoverflow.com/questions/171301/whats-the-fastest-way-to-divide-an-integer-by-3), shows how divisions can be reduced, e.g. as a compiler optimization. It relies on the divisor being a *constant*, which is about the best the compiler can do while preserving the programmer's semantics. As soon as the divisor is not a constant, about the best a compiler can do is just "perform it as is". –  Feb 20 '11 at 00:04
  • @pst Interesting stuff - thanks! – Danny Tuppeny Feb 21 '11 at 07:08

5 Answers

7

Most compilers can do a reasonable job of optimizing when you give them a chance. For example, if you're dividing by a constant, chances are pretty good that the compiler can/will optimize that so it's done about as quickly as anything you can reasonably substitute for it.

When, however, you have two values that aren't known ahead of time and you need to divide one by the other, there isn't much the compiler can do. If there were, it would do it; and for that matter, if there were much room to optimize the operation itself, the CPU would do it so the compiler didn't have to.

Edit: Your best bet for something like that (that's reasonably realistic) would probably be something like:

double scale_factor = get_input();

// one division per element
for (size_t i=0; i<values.size(); i++)
    values[i] /= scale_factor;

This is relatively easy to convert to something like:

// one division up front, then one multiplication per element
scale_factor = 1.0 / scale_factor;

for (size_t i=0; i<values.size(); i++)
    values[i] *= scale_factor;

I can't really guarantee much one way or the other about a particular compiler doing that. It's basically a combination of strength reduction and loop hoisting. There are certainly optimizers that know how to do both, but what I've seen of the C# compiler suggests that it may not (but I never tested anything exactly like this, and the testing I did was a few versions back...)
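To make the two variants above concrete, here is a minimal, self-contained C++ sketch of both loops. The function names `scale_by_division` and `scale_by_reciprocal` are made up for illustration and do not come from any library. It also hints at why a compiler may be reluctant to do this rewrite on its own: the two versions are not guaranteed to produce bit-identical results.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: divides every element by scale_factor,
// performing one floating-point division per element.
void scale_by_division(std::vector<double>& values, double scale_factor) {
    for (std::size_t i = 0; i < values.size(); ++i)
        values[i] /= scale_factor;
}

// Hypothetical helper: computes the reciprocal once, outside the loop,
// then performs one multiplication per element.
void scale_by_reciprocal(std::vector<double>& values, double scale_factor) {
    assert(scale_factor != 0.0);
    const double inverse = 1.0 / scale_factor;  // the only division
    for (std::size_t i = 0; i < values.size(); ++i)
        values[i] *= inverse;
}
```

With a scale factor of 2.0 the two functions agree exactly, because 0.5 is representable in binary floating point. With a scale factor of 5.0 they can differ in the last bit: in IEEE double arithmetic, 3.0 / 5.0 and 3.0 * (1.0 / 5.0) are not the same value. For graphics-style code that is usually acceptable, but it is a change in results, so it is a decision for the programmer rather than something the compiler can silently assume.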

Mark A. Donohoe
Jerry Coffin
  • 2
    @DanTup - Danny Tuppeny: Rearranging an entire algorithm to work inverted is beyond even the most sophisticated compilers. – Jerry Coffin Feb 19 '11 at 22:44
  • @Jerry That makes sense, though I don't think the intention of the tweet was to do that (I'd call that optimising the algorithm more than "changing divisions to multiplications"). I updated my question to be a little less vague on what I was after: Can anyone give an example of some code where you can change a division to a multiplication and get a performance gain, where the C# compiler wasn't able to do the same thing itself. – Danny Tuppeny Feb 19 '11 at 23:03
  • 1
    @Jerry: Would that transformation produce the same result? I have a feeling floating point arithmetic will screw it up somewhere... Sure the small difference probably won't matter most of the time, but I don't think it is ok for a compiler to do that. – R. Martinho Fernandes Feb 19 '11 at 23:54
  • @Martinho Fernandes: I haven't read that part of the C# spec recently enough to be able to say for sure. I suggested it specifically because I wouldn't really expect most compilers to do it, but in many cases (e.g., graphics) the result would be fine, and the transformation is easy. – Jerry Coffin Feb 20 '11 at 00:02
  • @Jerry The sample looks fairly simple, but I can see a few reasons why the compiler wouldn't/couldn't change it. I haven't tested it out (not figured out how to get cs running on my iPad...) but I'd guess it's along the same lines as the original tweeter was going :-) – Danny Tuppeny Feb 20 '11 at 11:15
4

Although the compiler can turn divisions and multiplications by powers of 2 into shifts, other divisors can be difficult or impossible to optimize. Try optimizing a division by 17 and you'll see why. This of course assumes the compiler doesn't know ahead of time that you are dividing by 17 (that is, the divisor is a run-time variable, not a constant).

Darkhydro
  • 1
    +1 For the simple "difficult or *impossible* to optimize ... not a constant". It can actually optimize out a good bit more (integer) division than that. See http://stackoverflow.com/questions/171301/whats-the-fastest-way-to-divide-an-integer-by-3 -- not that it necessarily does. –  Feb 20 '11 at 00:08
3

Bit late but never mind.

The answer to your question is yes.

Have a look at my article here, http://www.codeproject.com/KB/cs/UniqueStringList2.aspx, which uses information based on the article mentioned in the first comment to your question.

I have a QuickDivideInfo struct which stores the magic number and the shift for a given divisor, thus allowing division and modulo to be calculated using faster multiplication. I pre-computed (and tested!) QuickDivideInfos for a list of Golden Prime Numbers. For x64 at least, the .Divide method on QuickDivideInfo is inlined and is 3x quicker than using the divide operator (on an i5); it works for all numerators except int.MinValue and cannot overflow, since the multiplication is stored in 64 bits before shifting. (I've not tried on x86, but if it doesn't inline for some reason then the neatness of the Divide method would be lost and you would have to inline it manually.)

So the above will work in all scenarios (except int.MinValue) if you can precalculate. If you trust the code that generates the magic number/shift, then you can deal with any divisor at runtime.
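For readers who want to see the shape of the "magic number plus shift" idea, here is a rough C++ sketch. It is not the QuickDivideInfo code from the article: it handles unsigned 32-bit values only, the names `FastDivisor`, `make_fast_divisor` and `fast_divide` are invented for this example, and the formulas follow the well-known multiply-and-shift scheme for division by an invariant integer (as described by Granlund and Montgomery), as best I understand it.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical type (not the article's QuickDivideInfo): precomputed
// multiplier and shifts for unsigned 32-bit division by a fixed divisor.
struct FastDivisor {
    std::uint64_t multiplier;  // the "magic number"; always below 2^32
    unsigned shift1;
    unsigned shift2;
};

// Precompute the magic number and shifts for divisor d (d must be non-zero).
FastDivisor make_fast_divisor(std::uint32_t d) {
    assert(d != 0);
    unsigned l = 0;  // l = ceil(log2(d))
    while ((std::uint64_t{1} << l) < d) ++l;
    const std::uint64_t pow2 = std::uint64_t{1} << l;
    FastDivisor f;
    f.multiplier = ((pow2 - d) << 32) / d + 1;  // one real division, done once
    f.shift1 = l < 1 ? l : 1;                   // min(l, 1)
    f.shift2 = l > 1 ? l - 1 : 0;               // max(l - 1, 0)
    return f;
}

// Compute n / d using one 64-bit multiplication, an add and shifts.
std::uint32_t fast_divide(std::uint32_t n, const FastDivisor& f) {
    // High 32 bits of the 64-bit product; the product cannot overflow
    // because both factors are below 2^32.
    const std::uint32_t t =
        static_cast<std::uint32_t>((f.multiplier * n) >> 32);
    return (t + ((n - t) >> f.shift1)) >> f.shift2;
}
```

The expensive part (`make_fast_divisor`) runs once per divisor, so this only pays off when the same divisor is reused many times, for example a hash table's bucket count. Whether it actually beats the hardware divide on a given CPU and runtime is something to measure rather than assume.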

Other well-known small divisors with a very limited range of numerators could be written inline and may well be faster if they don't need an intermediate long.

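Division by a power of two: I would expect the compiler to deal with this (as in your width / 2 example) since the divisor is constant. If it doesn't, then changing it to width >> 1 should be fine, provided width is never negative (the shift rounds a negative odd value down rather than toward zero).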
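One caveat with replacing `/ 2` by `>> 1` by hand is that the two are only interchangeable for non-negative values. The small C++ sketch below (function names are illustrative only) shows the difference; C# integer division and `>>` on `int` behave the same way, as far as I know. In C++ before C++20 the result of right-shifting a negative value is implementation-defined, although mainstream compilers use an arithmetic shift.

```cpp
#include <cassert>

// Integer division truncates toward zero.
int half_by_division(int width) { return width / 2; }

// An arithmetic right shift rounds toward negative infinity instead.
int half_by_shift(int width) { return width >> 1; }

// Hypothetical use: a shift is a safe stand-in for "/ 2" only when the
// value is known to be non-negative, as a width normally is.
int center_of(int width) {
    assert(width >= 0);
    return width >> 1;
}
```

For 7 both functions give 3, but for -7 the division gives -3 while the shift gives -4. This is why a compiler that turns a signed `/ 2` into a shift has to emit a small correction for negative inputs.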

Simon Hewitt
-2

To give some numbers, this page

http://cs.smith.edu/dftwiki/index.php/CSC231_Pentium_Instructions_and_Flags

lists instruction timings for the Pentium (in clock cycles), and they aren't good:

  • IMUL 10 or 11
  • FMUL 3+1
  • IDIV 46 (32 bits operand)
  • FDIV 39

We are speaking of BIG differences.

xanatos
  • I'm not asking about the difference in speed of actual operations on the CPU, but rather, the difference in writing C#, and why the compiler wouldn't be able to optimise this. – Danny Tuppeny Feb 19 '11 at 22:11
  • @DanTup You mean it should multiply x 0.5 instead of dividing by 2? Or (even better) that it should shift by 1 if it's an integer? – xanatos Feb 19 '11 at 22:13
  • @xanatos I'm asking under what circumstances would the compiler *not* be able to optimise something that I could easily optimise. Should I change all my "/ 2" to "* 0.5"? If I have varA * varB should I change it to division? etc. – Danny Tuppeny Feb 19 '11 at 22:34
  • 1
    @DanTup if you consider the side-effects, then probably it can't. For example with floating points, I'm not sure / 5 and * 0.2 are the same thing. 0.2 can't be represented exactly in floating point (it's periodic in base 2), so you have moved the error to "before your operation" instead of "after your operation". Is 10000000000 / 5 the same as 10000000000 * 0.2 (where 0.2 is really only an approximation, something like 0.200000000000000011...)? – xanatos Feb 19 '11 at 22:38
  • @xanatos This is exactly one of the things I was thinking about. If the change could result in different values, then the compiler will of course not be able to change it. This would explain an optimisation you could make "by hand", but the original tweet seemed to suggest you can make changes with no side-effects, but a boost in performance. Is there any possibility of this? – Danny Tuppeny Feb 19 '11 at 22:44
  • Whilst `/5` and `*0.2` may be different, from the programmer's perspective, in many real-world situations, either would suffice. In my application, I'd be happy for the compiler to do this, but I can see scenarios where it would not be good to allow. – David Heffernan Feb 19 '11 at 22:44
  • I updated my question to be a little less vague on what I was after: Can anyone give an example of some code where you can change a division to a multiplication and get a performance gain, where the C# compiler wasn't able to do the same thing itself. – Danny Tuppeny Feb 19 '11 at 23:04
  • 3
    It's worth pointing out that the resource you have linked is from ***1993***. These days the differences aren't even remotely as "BIG" as your answer would imply. – Andrew Russell Feb 20 '11 at 02:33
  • They are numbers for the "Pentium", probably the "Original Pentium" :-) – xanatos Feb 20 '11 at 06:48
  • Multiply is still significantly fewer clock cycles (much faster) than Divide on modern processors such as SandyBridge, IvyBridge, etc. See: https://stackoverflow.com/questions/4125033/floating-point-division-vs-floating-point-multiplication and https://stackoverflow.com/questions/12333638/how-much-cycles-math-functions-take-on-modern-processors and other more current links. – deegee Nov 28 '17 at 20:33
-2
 while(start<=end)
    {
    int mid=(start+end)/2;
    if(mid*mid==A)
    return mid;
    if(mid*mid<A)
    {
    start=mid+1;
    ans=mid;
    }
    else
    end=mid-1;
    }

If I do it this way, the outcome is TIME LIMIT EXCEEDED for the square root of 2147483647.

But if I do it the following way, it finishes in time, which suggests to me that here the division version is handled faster than the multiplication version.

while(start<=end)
    {
    int mid=(start+end)/2;
    if(mid==A/mid)
    return mid;
    if(mid<A/mid)
    {
    start=mid+1;
    ans=mid;
    }
    else
    end=mid-1;
    }
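One thing worth checking before drawing a speed conclusion from the two loops above: for A = 2147483647 the first midpoint is about 2^30, and `mid*mid` does not fit in a 32-bit `int`, so the comparison is made on an overflowed value. That alone may explain the time-out. The C++ sketch below is my own reconstruction, not the poster's full program (the function names and the initial values of `start`, `end` and `ans` are assumptions). It shows two overflow-free variants: one comparing via division, and one that keeps the multiplication but does it in 64 bits.

```cpp
#include <cassert>
#include <cstdint>

// Floor of the square root, comparing mid against a / mid so that no
// intermediate value can overflow a 32-bit int.
int isqrt_by_division(int a) {
    assert(a >= 0);
    if (a < 2) return a;  // also keeps mid >= 1, so a / mid is safe
    int start = 1, end = a, ans = 0;
    while (start <= end) {
        const int mid = start + (end - start) / 2;  // avoids start + end overflow
        if (mid == a / mid) return mid;
        if (mid < a / mid) {
            start = mid + 1;
            ans = mid;
        } else {
            end = mid - 1;
        }
    }
    return ans;
}

// Same search, but squaring mid in 64 bits instead of dividing.
int isqrt_by_wide_multiply(int a) {
    assert(a >= 0);
    int start = 0, end = a, ans = 0;
    while (start <= end) {
        const int mid = start + (end - start) / 2;
        const std::int64_t square = static_cast<std::int64_t>(mid) * mid;
        if (square == a) return mid;
        if (square < a) {
            start = mid + 1;
            ans = mid;
        } else {
            end = mid - 1;
        }
    }
    return ans;
}
```

Both functions return 46340 for 2147483647 after roughly 31 iterations, so once the overflow is removed the multiplication version is not obviously the slow one.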
Parveen Kumar
  • What do you mean by "the outcome is the TIME LIMIT EXCEEDED"? Do you mean an infinite loop, or do you mean it's just slower than expected? – Ignatius Apr 06 '19 at 12:18
  • It's taking more time to compute than the version that uses division @Taegyung – Parveen Kumar Apr 07 '19 at 06:29