4

Is there a major advantage to using "or" instead of "cmp"?

Consider this function prologue:

push  ebp
mov   ebp,esp
push  ebx

xor   eax,eax        ;error return code
mov   ecx,[ebp +8]   ;first integer arg after return addr.
mov   edx,[ebp +12]  ;second integer argument

The function should calculate a / b or a % b.
First I need to check for a zero divisor.

My intuitive move would be to assemble

cmp  edx,0
je   InvalidDivisor

But advanced books on assembly language use this instead:

or   edx,edx
jz   InvalidDivisor   

My question is: why is this second solution "more correct"?
Wouldn't it take longer to compute an OR and then check the zero flag than to just compare two values?

Is it just a matter of more advanced coding style?

Peter Cordes
clockw0rk
  • They have the same latency and throughput, but the `or` form is smaller since there's no need to encode an immediate into the machine code. – Michael Mar 20 '17 at 17:53
  • this has been asked and answered several times, need to find the duplicate link... – old_timer Mar 20 '17 at 19:56

4 Answers

5

or edx,edx is two bytes and cmp edx,0 is three, so you know which to pick if you care about size.

If you care more about speed, then you actually need to measure. The or instruction writes its destination register (with an unchanged value), which might add latency if the next instruction reads the same register.

The best choice for comparing a register with zero is test reg, reg.

83 fa 00  cmp    edx,0x0
09 d2     or     edx,edx ; Smaller
85 d2     test   edx,edx ; Smaller and better, updates ZF but does not store the result
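
As a rough sketch of how the zero check could look inside the function from the question, here is a version that uses test. It assumes 32-bit cdecl and NASM syntax; the name checked_div, the local label, and the register assignment are hypothetical and not from the question. It returns 0 as the error code, as the question's prologue does, and it does not guard against INT_MIN / -1, which makes idiv fault.

```nasm
; Hypothetical sketch: int checked_div(int a, int b), 32-bit cdecl.
; Returns a / b, or 0 (the question's error code) when b is 0.
; Not handled: INT_MIN / -1, which makes idiv fault.
checked_div:
    push  ebp
    mov   ebp, esp

    xor   eax, eax          ; error return code
    mov   ecx, [ebp + 12]   ; second argument: divisor b
    test  ecx, ecx          ; ZF = 1 if b == 0; ecx is not written
    jz    .invalid_divisor

    mov   eax, [ebp + 8]    ; first argument: dividend a
    cdq                     ; sign-extend eax into edx:eax for idiv
    idiv  ecx               ; eax = quotient, edx = remainder

.invalid_divisor:
    pop   ebp
    ret
```

The divisor is loaded into ecx rather than edx here because idiv uses edx:eax as the dividend; for a % b, the remainder would be taken from edx after idiv.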
Community
Anders
  • Choosing this answer because Anders posted a link to more detailed explanations. Cheers to all the other posters, and thank you very much. – clockw0rk Mar 20 '17 at 18:15
3

Both instructions assemble to:

83 fa 00                cmp    edx,0x0
09 d2                   or     edx,edx

As you can see, using or is shorter (so there is less code to load at run time) and has the same effect on the zero flag. However, it is actually better to use:

85 d2                   test   edx,edx 

which also sets the zero flag if edx is zero. Since test only writes the flags and leaves edx alone, later instructions that read edx do not have to wait for it, even if the CPU would not have figured out on its own that or edx,edx leaves the value unchanged.
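
To illustrate the dependency point, here is a small sketch in NASM syntax. The table lookup through ebx is a made-up follow-on instruction, not something from the question; it only stands in for "any later instruction that reads edx".

```nasm
; Variant A: or rewrites edx (with the same value), so the load below
; has to wait for the or to produce its result.
    or    edx, edx
    jz    InvalidDivisor
    mov   eax, [ebx + edx*4]   ; reads the edx written by or

; Variant B: test writes only the flags, so the load can use the
; original edx without waiting for the test.
    test  edx, edx
    jz    InvalidDivisor
    mov   eax, [ebx + edx*4]   ; does not depend on test
```

Whether the extra step in variant A is measurable depends on the CPU and the surrounding code, so treat this as an illustration of the dependency rather than a benchmark.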

sannaj
2

While the example in the question is for the Intel x86, the CMP and OR instructions exist in other processors as well.

On the MOS 6502, where there are not that many registers and you might pass parameters or return values in status flags as well, you may want to avoid instructions that affect the C flag. So you might prefer EOR (exclusive or), AND, or ORA to CMP. On the 6502, almost all instructions that copy data affect the N and Z flags.

On the Atmel 8-bit AVR microcontroller series, there is a handy instruction CPSE (compare and skip if equal) which does not affect any flags, if I remember correctly. AVR-GCC designates one of the 32 registers as a "zero register" and then emits code to use CPSE with that register.

Marko Mäkelä
1

You don't state explicitly what processor this is for, but in general terms:

You're coding in assembler, therefore you care about memory and clock-cycles.

All you want to do is detect whether EDX is zero. ORing EDX with itself will set the Z status bit if EDX is zero, without changing the contents of EDX, and more quickly than a direct compare with zero.

Comparing a register with an immediate value likely takes (at least) an extra cycle to load and an extra byte (or 2, or 4) for the constant value '0'.
On the other hand, there are a limited number of registers and the reference to EAX is likely encoded directly in the instruction using 3 or 4 bits.

Ed Randall
  • Actual x86 CPUs can decode a 3-byte `cmp` along with up to 3 or 4 other instructions in the same clock cycle. But yes, code-size is the disadvantage of `cmp reg,imm8` vs. `or reg,reg`. There are plenty of cases where a 3-byte `cmp` would be better than a 2-byte `or`, though, on out-of-order exec CPUs, so this is not the full story. Fortunately `test reg,reg` gives us the best of both worlds. – Peter Cordes Feb 23 '20 at 08:02