x86 Comparison Instruction That Uses XOR Instead of Subtraction

Question

I've heard that the x86 comparison instruction cmp x, y does a subtraction and sets various flags based on the result.

Now, what if I just want to test whether the two operands are equal? Wouldn't doing an XOR instead of a subtraction be much faster? My question is: is there an instruction that does a comparison with an XOR to tell whether the two operands are equal? Maybe it would look like this: cmpeq x, y or cmpxor x, y.

I would guess that if I just want to test for equality, a cmpxor would be faster than cmp, which does a subtraction. Is there an instruction like cmpxor that would give me a speedup?

I should also say that I'm aware that xor sets the zero flag. But xor x, y overwrites x, and I don't want that. I want a comparison instruction that leaves both operands alone.

It wouldn't be “much faster”, since `cmp` is already as fast as any other instruction (including `xor`). On some µarchs, the `cmp` instruction can even be fused with a dependent branch instruction into a single µop by the front end, effectively making it even faster. This is all documented in Intel’s Optimization Manual, which is freely available and worth reading. — Stephen Canon, May 07 '13 at 13:49
Ira Baxter said it very well: *Technically a SUB should take longer than XOR because the carry has to "ripple" through all the bits, whereas XOR is bit-by-bit parallel.* That was my logic. — Aaron, May 07 '13 at 14:51
[*The best choice for comparing a register with zero on modern x86 is `test reg, reg`*](https://stackoverflow.com/a/33724806/995714) — phuclv, Apr 05 '18 at 16:18
@Aaron no modern CPU uses a ripple-carry adder like that; otherwise a simple 32-bit addition/subtraction would take a ridiculous 32 cycles, and twice that on a 64-bit system. There are many faster variants, like the [carry-lookahead adder](https://en.wikipedia.org/wiki/Carry-lookahead_adder), which trade space for speed. Similar techniques are used to speed up more complex operations like multiplication and square root, so that they can finish in a few cycles — phuclv, Apr 05 '18 at 16:21

Ira Baxter · Accepted Answer · 2014-03-04T19:00:56.033

Basic machine operations such as XOR, SUB, CMP and TEST are all simple enough that they execute extremely fast. They also set the same condition code bits. From the point of view of compare-for-equal, they all set the Z bit the same way; the other bits are set differently because these operations compute different results.

For the x86 CPUs, there is no difference in the execution times of these, because they all use identical pathways through the chip. Consequently you can use any of them without a performance penalty wherever it computes the answer that you want. (Technically a SUB should take longer than XOR because the carry has to "ripple" through all the bits, whereas XOR is bit-by-bit parallel. The CPU designers have figured out ways to build extremely fast carry-computing logic, so the effective time difference isn't significant. They have huge motivation to do so, since most of what a computer does is "add".)

As a style convention, if you think you are "comparing two (machine-word-sized) values", you should probably use the CMP instruction, because that communicates what you are thinking to the reader of your code. It also has the advantage that it doesn't destroy one of the operands, which, once you've written enough code, you will find a very persuasive argument for using it instead of XOR. (TEST has this nice property too, and is useful for checking bits.)

There are compares of other kinds of values for which other x86 instructions are better: floating compares, string compares, vector register compares, etc. These instructions take different times than the basic operations because they must do more complicated things (like comparing multiple data words).

Do you have a table where one could look up the execution times? I was looking for this, but haven't found one. — Devolus, May 07 '13 at 13:28
Intel's performance optimization manuals have just such tables — jalf, May 07 '13 at 13:29
So there would be no difference in speed anyway between a `cmp` and `cmpxor`. That answers my question, thanks. — Aaron, May 07 '13 at 14:56
A pretty well-known resource for such tables is in Agner Fog's optimization guides, http://www.agner.org/optimize/ - item 4. Those cover non-Intel x86-compatible CPUs (AMD, and others) as well. — FrankH., May 09 '13 at 10:14
