12

Say I have a really performance-critical loop in my program where I need to check whether a point is inside a rectangle, and I know at compile time that the lower bounds are always going to be 0, like the following: (x >= 0 && y >= 0 && x < width && y < height)

Could I eliminate the first two comparisons by type-punning x and y to unsigned integers (for instance with something like reinterpret_cast<>() or a union in C++), since the sign bit would guarantee any negative number would turn into an unsigned int large enough to fail the bounds check? If so, how would you go about implementing this in C++ or another language? Could you gain any performance improvement by doing this?

Stuntddude

2 Answers

12

Yes, it's a perfectly valid optimization when you're testing a signed integer and the lower bound is zero. In fact it's such a common optimization that your compiler will almost certainly do it automatically; obfuscating your code by doing it yourself is very likely to be a pointless premature optimization.

I just tested this on GCC 4.9, and confirmed by inspecting the generated assembly code that it performs this optimization automatically at -O1 and above. I would expect all modern compilers to do the same.
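For illustration, here is a minimal sketch of the two forms being compared. The function names are hypothetical, and it assumes x, y, width and height are all plain int and that width and height are never negative; without that last assumption the two versions are not equivalent.

```cpp
#include <cassert>

// Straightforward version: four signed comparisons.
inline bool in_bounds_signed(int x, int y, int width, int height) {
    return x >= 0 && y >= 0 && x < width && y < height;
}

// Hand-written version: two unsigned comparisons.
// Converting a negative int to unsigned wraps it to a value larger than
// any non-negative int, so it fails the "< width" test automatically.
// ASSUMPTION: width and height are non-negative.
inline bool in_bounds_unsigned(int x, int y, int width, int height) {
    assert(width >= 0 && height >= 0);
    return static_cast<unsigned>(x) < static_cast<unsigned>(width) &&
           static_cast<unsigned>(y) < static_cast<unsigned>(height);
}
```

Compiling both with optimization enabled and comparing the generated assembly is one way to check whether your own compiler already treats them the same.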

Ross Smith
  • If I understand your test results, you're saying that manually doing this doesn't matter... it happens regardless? – Drew Dormann Jan 19 '15 at 02:13
  • 1
    Well, it doesn't have to obfuscate the code, if placed in an appropriately named inline function. – Ben Voigt Jan 19 '15 at 02:13
  • 3
    That's right, the compiler is smart enough to spot this optimization opportunity and do it for you. – Ross Smith Jan 19 '15 at 02:13
  • 1
    So the compiler detects that you are performing a greater than zero test on an integer, and doing another less than test (against a positive value!) on the same integer, and then switches it to unsigned comparison mode to save less than 1 clock cycle? – BWG Jan 19 '15 at 02:14
  • 1
    @BWG what do you mean by "less than one clock cycle"? It's converting two comparisons into one when reconsidering the signed-ness of the variable allows that. – Drew Dormann Jan 19 '15 at 02:17
  • I guess I should have expected that GCC would have a case for this. I take it it's a fair assumption that the JVM's just-in-time compiler will give a similar result for the Java equivalent? – Stuntddude Jan 19 '15 at 02:19
  • 2
    @Stuntddude that's a brand-new question. For other people. – Drew Dormann Jan 19 '15 at 02:32
  • 1
    @Stuntddude Java doesn't have unsigned integers, but of course that doesn't stop a compiler from making use of unsigned op codes. – rici Jan 19 '15 at 04:05
  • 1
    I confirm that the Clang/LLVM toolchain also optimizes this starting at O1: it produces `%1 = icmp sgt i32 %i, -1` for `i >= 0 and i <= 2147483647`. – Matthieu M. Jan 19 '15 at 10:29
  • 2
    Isn't this optimization only automatically possible if compiler can prove `width` is non-negative? – zch Jan 19 '15 at 10:37
3

Maybe...

Whilst on "paper" this would seem to allow you to perform only two comparisons rather than four (which is nice), you cannot guarantee how it will perform. Most CPUs these days can execute several independent operations at once, and the four comparisons you have are easily computed in parallel.

The answer depends on the compiler, the CPU and also the code before and after the check, so my answer is "maybe".

Avoid casting x and y to a type of a different size than the one they currently have - e.g. a cast from int8_t to uint8_t is fine, but int8_t to uint32_t might incur a penalty.

Rewriting as you desire (assuming x and y are int8_t; substitute the unsigned type that matches their actual size):

if ( ( static_cast<uint8_t>(x) < width ) &&
     ( static_cast<uint8_t>(y) < height ) )

Testing the performance delta is quite difficult. You will need to wrap your code with some assembly using the RDTSC instruction to read the timestamp counter before and after. You will likely also need a serializing instruction such as CPUID, so that earlier instructions have finished before each reading is taken.
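As a rougher but portable starting point, a timing harness could look something like the sketch below. Note that it swaps in std::chrono::steady_clock in place of RDTSC/CPUID, so it is much coarser than cycle counting and is only meaningful over many repetitions. The helpers count_hits and time_ns are hypothetical names, and the code assumes int coordinates with non-negative width and height.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical workload: scans a grid that overshoots the rectangle on
// every side and counts the points that pass the unsigned bounds check.
// ASSUMPTION: width and height are non-negative.
inline std::int64_t count_hits(int width, int height) {
    std::int64_t hits = 0;
    for (int y = -height; y < 2 * height; ++y) {
        for (int x = -width; x < 2 * width; ++x) {
            if (static_cast<unsigned>(x) < static_cast<unsigned>(width) &&
                static_cast<unsigned>(y) < static_cast<unsigned>(height)) {
                ++hits;
            }
        }
    }
    return hits;
}

// Hypothetical helper: calls f() `reps` times and returns the elapsed
// wall-clock time in nanoseconds. The volatile sink keeps the compiler
// from discarding the results of f().
template <typename F>
std::int64_t time_ns(F&& f, int reps) {
    volatile std::int64_t sink = 0;
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < reps; ++i) {
        sink = sink + f();
    }
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
}
```

To compare variants, time a second workload that uses the four-comparison form with the same arguments, and repeat the measurement several times, since a single run is easily dominated by noise.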

In short, to me your optimization seems reasonable, but probably won't yield much if anything. It will work though.

stackmate