In x86-64, all 16 general-purpose registers are 8 bytes wide. When a function (the callee) pushes the callee-saved registers it wants to use (%rbx, %rbp and %r12-%r15) at the start of a call, it has no way of knowing whether the caller stored 64-bit, 32-bit, 16-bit or 8-bit values in them. So does it always have to use pushq to push all 8 bytes of each register onto the stack, rather than pushl? In other words, are pushl and pushw ever used in x86-64?

Peter Cordes
tgrnie

1 Answer

The entire register is call-preserved, not just the low dword or word. Normal functions always save/restore the whole qword register because that's the only safe thing to do, and it's also efficient enough that there's no reason to create a mechanism for functions to know when they could do anything else.

It's always efficient to read a full register after the 32-bit low half was written, because 32-bit register writes implicitly zero-extend to 64-bit. Reading a 64-bit register after the caller wrote the low 8 or 16 bits could cause a partial-register stall on Intel P6-family microarchitectures, if the caller was careless about how it used the register before making a call. On modern uarches (not Intel P6), the 8/16-bit operand-size register write already paid whatever merging penalty might have existed (typically a false dependency). (I'm glossing over a couple of details, like partial AH renaming still being a thing on modern Intel, including Skylake.)
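To make the difference between operand sizes concrete, here is a minimal C sketch that models what each kind of write does to the full 64-bit register. The function names (`write32`, `write16`, `write8`, `demo_register_writes`) are made up for this illustration, and it models only the architectural result, not timing or stalls.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of how x86-64 updates a 64-bit general-purpose register
   for each operand size. All names here are hypothetical. */

/* 32-bit write (e.g. movl): the upper 32 bits are zeroed. */
static uint64_t write32(uint64_t old, uint32_t v) {
    (void)old;              /* the old value is not an input: nothing to merge */
    return (uint64_t)v;
}

/* 16-bit write (e.g. movw): the upper 48 bits of the old value are kept. */
static uint64_t write16(uint64_t old, uint16_t v) {
    return (old & ~UINT64_C(0xFFFF)) | v;
}

/* 8-bit write to the low byte (e.g. movb to %bl): the upper 56 bits are kept. */
static uint64_t write8(uint64_t old, uint8_t v) {
    return (old & ~UINT64_C(0xFF)) | v;
}

/* Small usage example with a recognizable bit pattern. */
static void demo_register_writes(void) {
    const uint64_t old = UINT64_C(0x1122334455667788);
    assert(write32(old, 0xAABBCCDDu) == UINT64_C(0x00000000AABBCCDD));
    assert(write16(old, 0xBEEF)      == UINT64_C(0x112233445566BEEF));
    assert(write8(old, 0x99)         == UINT64_C(0x1122334455667799));
}
```

Only the 8-bit and 16-bit cases depend on the old register value, which is where a merging cost or false dependency can come from.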


While you could move the stack pointer with sub $24, %rsp and use movl or movb to store the 32-bit or 8-bit low parts of some registers, that's only safe if you know something about how your caller uses registers and want to optimize accordingly. (Making your function dependent on the caller's internals, not just the ABI). Even if that was an option for some helper function, it normally wouldn't be worth it to reduce the footprint of your stack frame by a few bytes.

(It's rare for functions to be using 16-bit data, but 8-bit data is not rare. bool and char are common. Compilers usually use movzx aka movzbl loads from memory to zero-extend to full registers, and can often use 32-bit operand size to avoid actually dealing with partial register shenanigans. But they wouldn't care if you saved/restored only the low 8 bits with a mov store / movzbl reload, for registers where a compiler is keeping a zero-extended bool or char.)

Are pushl and pushw ever used in x86-64?

pushl literally doesn't exist in 64-bit mode; 32-bit operand size for push is not encodable, even with a REX prefix that has W=0.

pushw is encodable but never used by compilers in 32 or 64-bit mode. (And generally not useful or recommended for humans, except for weird corner cases or hacks like maybe shellcode. I did use it once when code-golfing (optimizing for code size), to merge two 16-bit values into one register for Adler-32.)

If a compiler did want to do word or dword stores (e.g. in unoptimized builds to spill incoming register args), it would just use movw or movl.

You generally want to keep the stack aligned by 16 so you're ready to make another function call; that's why I suggested sub $24, %rsp above. (On function entry, RSP points at the return address your caller pushed. RSP+8 and RSP-8 are 16-byte aligned.)
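The alignment arithmetic can be checked with a few lines of C. This is only a sketch: `rsp_mod16_after` is a hypothetical helper, and the entry address is an arbitrary example value picked so that RSP % 16 == 8, as it is on function entry.

```c
#include <assert.h>
#include <stdint.h>

/* Returns RSP % 16 after a prologue that does `pushes` 8-byte pushes
   followed by `sub $sub_bytes, %rsp`. Hypothetical helper for illustration. */
static unsigned rsp_mod16_after(unsigned pushes, unsigned sub_bytes) {
    /* Arbitrary example entry value, chosen so that RSP % 16 == 8:
       the caller's call pushed an 8-byte return address onto a
       16-byte-aligned stack. */
    uint64_t rsp = UINT64_C(0x7fffffffe008);
    assert(rsp % 16 == 8);

    rsp -= 8u * (uint64_t)pushes + sub_bytes;  /* the stack grows downward */
    return (unsigned)(rsp % 16);
}
```

For example, `rsp_mod16_after(0, 24)` is 0 (aligned, ready for another call), while `rsp_mod16_after(0, 16)` is 8 (misaligned). Saving all six call-preserved registers with pushes gives `rsp_mod16_after(6, 0) == 8`, so one more push or a `sub $8, %rsp` is needed before calling anything.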


pushq %reg is very efficient on modern CPUs: it decodes to a single uop on CPUs with a stack engine, which handles the RSP updates outside the OoO exec back-end. It's so efficient that clang uses push %rax or another dummy register instead of sub $8, %rsp when it only needs to move the stack pointer by 8 bytes, e.g. to realign the stack before another call.

pushq %reg is a 1-byte instruction (or 2 bytes for r8..r15, including a REX prefix).
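As a rough illustration of those sizes, here is a small C sketch that emits the machine-code bytes for a register push. `encode_push` and its parameters are hypothetical names for this example; the byte values follow the standard `50+r` opcode form, with a `0x66` operand-size prefix for pushw and a REX.B prefix (`0x41`) for r8..r15.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode `pushq %reg`, or `pushw %reg` when word_size is nonzero, into out[].
   reg is the hardware register number: 0=rax, 1=rcx, 2=rdx, 3=rbx,
   4=rsp, 5=rbp, 6=rsi, 7=rdi, 8..15=r8..r15.
   Returns the instruction length in bytes (1 to 3).
   There is no 32-bit case: without the 0x66 prefix the push is always
   64-bit in 64-bit mode, whatever REX.W is. */
static size_t encode_push(unsigned reg, int word_size, uint8_t out[3]) {
    size_t n = 0;
    assert(reg < 16);
    if (word_size)
        out[n++] = 0x66;                     /* operand-size prefix: 16-bit push */
    if (reg >= 8)
        out[n++] = 0x41;                     /* REX.B: high bit of the register number */
    out[n++] = (uint8_t)(0x50 + (reg & 7));  /* opcode 50+r with the low 3 bits */
    return n;
}
```

With this, `pushq %rbx` comes out as the single byte `53`, `pushq %r12` as `41 54`, and `pushw %ax` as `66 50`.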

Peter Cordes