ZF=0
This is harder. `cmp` between any two regs known to be not equal works, or `cmp reg,imm` with any value some reg couldn't possibly have, e.g. `cmp reg,1` with any known-zero register.
In general, `test reg,reg` is good with any known-non-zero register value, e.g. a pointer.
I don't see a way to create ZF=0 in one instruction without a false dependency on some input reg. `xor eax,eax` / `inc eax` (or `dec eax`) will do the trick in 2 uops if you don't mind destroying a register, and the xor-zeroing breaks any false dependency. (`not` doesn't set FLAGS, and `neg` will just do 0-0 = 0.)
`or eax, -1` doesn't need any pre-condition for the register value. (It has a false dependency, but not a true dependency, so you can pick any register even if it might be zero.) The immediate doesn't have to be -1: any non-zero immediate guarantees a non-zero result, and -1 isn't gaining you anything in particular, so if you can make it something useful, so much the better.
`or eax,-1` FLAGS results: ZF=0 PF=1 SF=1 CF=0 OF=0 (AF=undefined).
If you need to do this in a loop, you can obviously set up for it outside the loop, if you can dedicate a register to being non-zero for use with `test`.
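As a rough sketch of the ZF=0 options above, in NASM syntax for x86-64. The label names are hypothetical, and the last variant assumes RSP is non-zero, which holds in any normally running code but is still a precondition:

```nasm
; NASM syntax, x86-64. Label names are made up for illustration.
section .text

global clear_zf_or
clear_zf_or:
    or      eax, -1         ; EAX = 0xFFFFFFFF whatever it held before, so ZF=0
    ret                     ; clobbers EAX; false dependency on the old EAX

global clear_zf_two_uops
clear_zf_two_uops:
    xor     eax, eax        ; dependency-breaking zero (ZF=1 at this point)
    inc     eax             ; EAX = 1, so ZF=0
    ret

global clear_zf_test
clear_zf_test:
    test    rsp, rsp        ; assumption: RSP is non-zero, so RSP & RSP != 0 and ZF=0
    ret                     ; no register is modified
```

The `test` variant is the only one that leaves every register intact, at the cost of depending on a register that is known to be non-zero.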
ZF=1
xor-zeroing (like `xor eax,eax` using any free register) is definitely the most efficient way on SnB-family (same cost as a 2-byte `nop`, or 3-byte if your free reg is r8d..r15d and needs a REX prefix): 1 front-end uop, zero back-end uops, and the FLAGS result is ready in the same cycle it issues. (Relevant only in case the front-end was stalled, or a uop depending on it issues in the same cycle and there aren't any older uops in the RS.)
Flag results: ZF=1 PF=1 SF=0 CF=0 OF=0 (AF=undefined).
xor-zero is very cheap on all other uarches as well, of course: no input dependencies, and doesn't need any pre-existing register value. (And thus doesn't contribute to P6-family register-read stalls). So it will be at worst tied with anything else you could do on any other uarch (where it does require an execution unit.)
(On early P6-family, before Pentium M, `xor`-zeroing does not break dependencies; it only triggers the special al=eax state that avoids partial-register stuff. But none of those CPUs are x86-64, all 32-bit only.)
It's pretty common to want a zeroed register for something anyway, e.g. as a `sub` destination for `0 - x` to copy-and-negate, so take advantage of it by putting the xor-zeroing where you need it to also create a useful FLAG condition.
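A minimal sketch of that idea, in NASM syntax. The register assignment (input in EDI, result in EAX) and the label name are assumptions chosen for illustration, not something the answer prescribes:

```nasm
; NASM syntax, x86-64. Hypothetical helper: EAX = -EDI, leaving EDI intact.
section .text

global negate_copy
negate_copy:
    xor     eax, eax        ; EAX = 0, and as a side effect ZF=1
    ; Anything that wants ZF=1 (setz, cmovz, jz, ...) can consume it here,
    ; before the next FLAGS-writing instruction.
    sub     eax, edi        ; EAX = 0 - EDI; FLAGS are rewritten from this result
    ret
```

The zeroing instruction was needed anyway for the copy-and-negate, so the ZF=1 condition comes for free as long as the consumer sits between the `xor` and the `sub`.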
As @prl suggested, `cmp same,same` with any register will work without disturbing a value. I suspect this is not special-cased as dependency breaking the way `sub same,same` is on some CPUs, so pick a "cold" register. Again 2 or 3 bytes, 1 uop. It can macro-fuse with a JCC, but that would be dumb (unless the JCC is also a branch target from some other condition?)
Flag results: same as xor-zeroing.
Downsides:
- (probably) false dependency
- on P6-family can contribute to a register-read stall, so pick a cold register you're already reading in nearby instructions.
- needs a back-end execution unit on SnB-family
Just for fun, other as-cheap alternatives include `test al, 0`. 2 bytes for AL, 3 or 4 bytes for any other 8-bit register: (REX) + opcode + modrm + imm8. The original register value doesn't matter because an imm8 of zero guarantees that `reg & 0 = 0`.
If you happen to have a 1 or -1 in a register you can destroy, 32-bit mode `dec` or `inc` (respectively) would set ZF in only 1 byte. But in x86-64 that's at least 2 bytes. Nothing comes to mind for a 1-byte instruction in 64-bit mode that's actually efficient and sets FLAGS.
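For a side-by-side view of the sizes discussed above, here are the ZF=1 candidates in NASM syntax with their 64-bit-mode machine code in the comments. The choice of registers is arbitrary; the encodings shown are the standard ones for these forms:

```nasm
; NASM syntax, x86-64. Every instruction below leaves ZF=1
; (the inc/dec lines assume the stated starting value).
section .text

zf1_candidates:
    xor     eax, eax        ; 31 C0        2 bytes, destroys EAX
    xor     r8d, r8d        ; 45 31 C0     3 bytes (REX prefix)
    cmp     eax, eax        ; 39 C0        2 bytes, EAX preserved
    test    al, 0           ; A8 00        2 bytes (short AL, imm8 form)
    test    cl, 0           ; F6 C1 00     3 bytes (opcode + modrm + imm8)
    test    r8b, 0          ; 41 F6 C0 00  4 bytes (REX + opcode + modrm + imm8)

    mov     eax, 1          ; set up a known 1 just for the demonstration
    dec     eax             ; FF C8        2 bytes in 64-bit mode (48 in 32-bit mode)
    mov     eax, -1         ; set up a known -1 just for the demonstration
    inc     eax             ; FF C0        2 bytes in 64-bit mode (40 in 32-bit mode)
    ret
```

The 1-byte `inc`/`dec` encodings (0x40..0x4F) are the bytes that x86-64 repurposed as REX prefixes, which is why only the 2-byte forms exist in 64-bit mode.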
ZF=!CF
`sbb same,same` sets ZF=!CF (leaving CF unmodified), and sets the reg to 0 (CF=0) or -1 (CF=1): the result is 0 - CF, which is zero only when CF was clear. On Bulldozer-family, this has no dependency on the GP register, only CF, but on other uarches it's not special-cased and there is a false dep on the reg.
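Working through both cases of `sbb same,same` makes the flag relationship concrete. This is only an illustrative sketch in NASM syntax; the label is hypothetical and `stc`/`clc` are used just to force each starting state:

```nasm
; NASM syntax, x86-64. sbb eax,eax computes EAX - EAX - CF = -CF.
section .text

global sbb_demo
sbb_demo:
    stc                     ; CF=1
    sbb     eax, eax        ; EAX = -1 (non-zero), so ZF=0; the borrow keeps CF=1
    clc                     ; CF=0
    sbb     eax, eax        ; EAX = 0, so ZF=1; no borrow, so CF stays 0
    ret
```

In both cases ZF ends up as the inverse of the incoming CF, and CF itself comes out with the value it went in with.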
ZF=!bool(integer register)
To set ZF according to whether an integer register is zero (ZF=1 if the register is 0, ZF=0 otherwise), obviously the normal `test reg,reg` is your best bet. (Better than `and reg,reg` or `or reg,reg`, unless you're intentionally rewriting the register to avoid P6 register-read stalls.)
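As a small usage sketch in NASM syntax: the helper name and the choice of EDI as input and EAX as output are assumptions made for the example.

```nasm
; NASM syntax, x86-64. Hypothetical helper: returns EAX = 1 if EDI == 0, else 0.
section .text

global is_zero
is_zero:
    xor     eax, eax        ; zero EAX first; doing it after the test would clobber FLAGS
    test    edi, edi        ; ZF=1 if EDI == 0, else ZF=0; EDI is not modified
    setz    al              ; AL = ZF, upper bytes of EAX already zero
    ret
```

Zeroing the destination before the `test` avoids a `movzx` afterwards and avoids writing a partial register with stale upper bytes.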
Other FLAGS conditions:
CF has `clc` / `stc` / `cmc` instructions. (`clc` is as efficient as xor-zeroing on SnB-family.)