6

What's the most efficient way to set and also to clear the zero flag (ZF) in x86-64?

Methods that work without the need for a register with a known value, or without any free registers at all are preferred, but if a better method is available when those or other assumptions are true it is also worth mentioning.

BeeOnRope

2 Answers

9

ZF=0

This is harder. cmp between any two regs known to be not equal. Or cmp reg,imm with any value some reg couldn't possibly have. e.g. cmp reg,1 with any known-zero register.

In general test reg,reg is good with any known-non-0 register value, e.g. a pointer.

I don't see a way to create ZF=0 in one instruction without a false dependency on some input reg. xor eax,eax followed by inc eax (or dec eax) will do the trick in 2 uops if you don't mind destroying a register, and the xor-zeroing breaks any false dependency. (not doesn't set FLAGS, and neg will just do 0-0 = 0.)

or eax, -1 doesn't need any pre-condition for the register value. (It has a false dependency on the old value but not a true one, since the result is all-ones either way, so you can pick any register even if it might be zero.) The immediate doesn't have to be -1: any non-zero immediate guarantees a non-zero result, so if a different constant leaves something useful in the register, so much the better.

or eax,-1 FLAG results: ZF=0 PF=1 SF=1 CF=0 OF=0 (AF=undefined).
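
As a rough sketch of the ZF=0 options above, in NASM syntax: the label name is made up for illustration, and the first variant assumes rdi holds a known non-null pointer (e.g. a first argument under the System V calling convention). The three variants are shown back to back only for compactness; each one leaves ZF=0 on its own.

```nasm
; NASM syntax, x86-64. Hypothetical label; assumes rdi is a non-null pointer.
section .text
global zf_clear_examples
zf_clear_examples:
    ; 1. Known non-zero register: nothing is modified except FLAGS.
    test  rdi, rdi        ; rdi & rdi != 0  ->  ZF=0

    ; 2. No precondition on eax, but eax is destroyed (false dep on old eax).
    or    eax, -1         ; eax = 0xFFFFFFFF  ->  ZF=0 PF=1 SF=1 CF=0 OF=0

    ; 3. Two uops, no dependency on the old value of eax.
    xor   eax, eax        ; eax = 0 (on its own this leaves ZF=1)
    inc   eax             ; eax = 1  ->  ZF=0; inc leaves CF unmodified

    setnz al              ; al = 1 because ZF=0 (setcc does not write FLAGS)
    movzx eax, al         ; return value 1
    ret
```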

If you need to do this in a loop, you can obviously set up for it outside the loop, if you can dedicate a register to being non-zero for use with test.


ZF=1

xor-zeroing (like xor eax,eax using any free register) is definitely the most efficient way on SnB-family (same cost as a 2-byte nop, or 3-byte if your free reg is r8d..r15d and needs a REX prefix): 1 front-end uop, zero back-end uops, and the FLAGS result is ready in the same cycle it issues. (Relevant only in case the front-end was stalled, or a uop depending on it issues in the same cycle and there aren't any older uops in the RS.)

Flag results: ZF=1 PF=1 SF=0 CF=0 OF=0 (AF=undefined).
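
A minimal sketch of xor-zeroing used purely for its FLAGS result, in NASM syntax. The label is hypothetical and eax is assumed to be free to clobber.

```nasm
; NASM syntax, x86-64. Hypothetical label; eax is assumed free.
section .text
global zf_set_xor
zf_set_xor:
    xor   eax, eax        ; eax = 0  ->  ZF=1 PF=1 SF=0 CF=0 OF=0
    setz  al              ; al = 1 because ZF=1; upper bits of eax are already 0
    ret                   ; returns 1
```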

xor-zero is very cheap on all other uarches as well, of course: no input dependencies, and doesn't need any pre-existing register value. (And thus doesn't contribute to P6-family register-read stalls). So it will be at worst tied with anything else you could do on any other uarch (where it does require an execution unit.)

(On early P6-family, before Pentium M, xor-zeroing does not break dependencies; it only triggers the special al=eax state that avoids partial-register stuff. But none of those CPUs are x86-64, all 32-bit only.)

It's pretty common to want a zeroed register for something anyway, e.g. as a sub destination for 0 - x to copy-and-negate, so take advantage of it by putting the xor-zeroing where you need it to also create a useful FLAG condition.
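
For instance, a copy-and-negate might look like the sketch below (NASM syntax; the function name and the System V argument register edi are assumptions for illustration). The point is only that the zeroing instruction you needed anyway also leaves ZF=1 and CF=0 for anything scheduled right after it.

```nasm
; NASM syntax, x86-64. Hypothetical: int neg_copy(int x), x in edi (System V).
section .text
global neg_copy
neg_copy:
    xor   eax, eax        ; eax = 0; side effect: ZF=1, CF=0
    ; A flag consumer placed here (jz, cmovz, setz, adc, ...) sees ZF=1 / CF=0.
    sub   eax, edi        ; eax = 0 - x; edi is left intact. This rewrites FLAGS.
    ret
```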


As @prl suggested, cmp same,same with any register will work without disturbing a value. I suspect this is not special-cased as dependency-breaking the way sub same,same is on some CPUs, so pick a "cold" register. Again 2 or 3 bytes, 1 uop. It can macro-fuse with a JCC, but that would be dumb (unless the JCC is also a branch target from some other condition?)

Flag results: same as xor-zeroing, except that AF is defined (AF=0) because cmp is a subtraction.

Downsides:

  • (probably) false dependency
  • on P6-family can contribute to a register-read stall, so pick a cold register you're already reading in nearby instructions.
  • needs a back-end execution unit on SnB-family

Just for fun, other as-cheap alternatives include test al, 0. That's 2 bytes for AL (the short-form encoding has no modrm byte), or 3 or 4 bytes for any other 8-bit register: (REX) + opcode + modrm + imm8. The original register value doesn't matter because an imm8 of zero guarantees that reg & 0 = 0.
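
The two non-destructive ways of setting ZF can be sketched as follows (NASM syntax; the label is hypothetical, and edi stands in for whatever "cold" register you would actually pick). Either instruction alone is enough; both are shown for comparison.

```nasm
; NASM syntax, x86-64. Hypothetical label; edi is just an example register.
section .text
global zf_set_nondestructive
zf_set_nondestructive:
    cmp   edi, edi        ; edi - edi = 0  ->  ZF=1; edi unchanged (2 bytes)
    test  al, 0           ; al & 0 = 0     ->  ZF=1; al unchanged (2 bytes: A8 00)

    setz  al              ; al = 1 because ZF=1
    movzx eax, al         ; return value 1
    ret
```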


If you happen to have a 1 or -1 in a register you can destroy, 32-bit mode inc or dec would set ZF in only 1 byte. But in x86-64 that's at least 2 bytes. Nothing comes to mind for a 1-byte instruction in 64-bit mode that's actually efficient and sets FLAGS.


ZF=!CF

sbb same,same can set ZF=!CF (leaving CF unmodified), setting the reg to 0 (CF=0, so ZF=1) or -1 (CF=1, so ZF=0). On Bulldozer-family, this has no dependency on the GP register, only CF, but on other uarches it's not special-cased and there is a false dep on the reg.
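
A small sketch of the sbb trick in NASM syntax. The function name is hypothetical; neg is used here only as a convenient way to produce a CF value from the argument, since neg sets CF exactly when its operand is non-zero.

```nasm
; NASM syntax, x86-64. Hypothetical: int zf_after_sbb(int c), c in edi (System V).
; Returns the ZF value left by sbb when CF = (c != 0).
section .text
global zf_after_sbb
zf_after_sbb:
    neg   edi             ; CF = 1 if edi was non-zero, else CF = 0
    sbb   eax, eax        ; eax = eax - eax - CF = -CF; CF comes out unchanged
                          ; CF=0 -> eax = 0  -> ZF=1
                          ; CF=1 -> eax = -1 -> ZF=0
    setz  al              ; al = ZF, i.e. the inverse of CF
    movzx eax, al         ; returns 1 for c == 0, 0 otherwise
    ret
```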


ZF=!bool(integer register)

To set ZF=!integer_reg (ZF=1 exactly when the register is zero), obviously the normal test reg,reg is your best bet. (Better than and reg,reg or or reg,reg, unless you're intentionally rewriting the register to avoid P6 register-read stalls.)
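
As an illustration, the usual test reg,reg idiom in NASM syntax (the function name and the System V argument register rdi are assumptions). The xor-zeroing goes before the test so that it does not overwrite the FLAGS result that setz reads.

```nasm
; NASM syntax, x86-64. Hypothetical: int is_zero(long x), x in rdi (System V).
section .text
global is_zero
is_zero:
    xor   eax, eax        ; zero the return register first (this writes FLAGS)
    test  rdi, rdi        ; ZF=1 iff rdi == 0; rdi is not modified
    setz  al              ; al = ZF
    ret                   ; returns 1 if x == 0, else 0
```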


Other FLAGS conditions:

CF has clc/stc/cmc instructions. (clc is as efficient as xor-zeroing on SnB-family.)
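
A brief sketch of the dedicated CF instructions, in NASM syntax with a hypothetical label. Note that stc, clc and cmc modify only CF, so a ZF value set earlier survives them.

```nasm
; NASM syntax, x86-64. Hypothetical label; eax is assumed free.
section .text
global cf_demo
cf_demo:
    xor   eax, eax        ; eax = 0; ZF=1, CF=0
    stc                   ; CF=1; ZF is still 1
    cmc                   ; CF=0 (complemented)
    clc                   ; CF=0 (already clear)
    stc                   ; CF=1 again
    setc  al              ; al = CF = 1
    ret                   ; returns 1
```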

Peter Cordes
  • @BeeOnRope: luckily I was already working on a ZF=0 section, because I figured people searching on this might well be wanting to inverse. – Peter Cordes Feb 03 '19 at 03:29
3

Assuming you don’t need to preserve the values of the other flags,

cmp eax, eax
prl