
I was reading a textbook that shows the assembly code generated for a C function:

C Code:

void proc(long a1, long *a1p,
          int a2, int *a2p,
          short a3, short *a3p,
          char a4, char *a4p)
{
   *a1p += a1;
   *a2p += a2;
   *a3p += a3;
   *a4p += a4;
}

Generated assembly code:

proc:
   movq 16(%rsp), %rax     //Fetch a4p (64 bits)     Line 1
   addq %rdi, (%rsi)       //*a1p += a1 (64 bits)    Line 2
   addl %edx, (%rcx)       //*a2p += a2 (32 bits)    Line 3
   addw %r8w, (%r9)        //*a3p += a3 (16 bits)    Line 4
   movl 8(%rsp), %edx      //Fetch a4 ( 8 bits)      Line 5
   addb %dl, (%rax)        //*a4p += a4 ( 8 bits)    Line 6
   ret                     //Return

Observe that the movl instruction of line 5 reads 4 bytes from memory; the following addb instruction only makes use of the low-order byte.

But I'm wondering why we don't use "movb" in line 5 directly, like this:

...
movb 8(%rsp), %dl      //Fetch a4 ( 8 bits)      
addb %dl, (%rax)        //*a4p += a4 ( 8 bits)    

Isn't this approach more concise and straightforward?

amjad
  • does this help to answer your question? https://stackoverflow.com/questions/1898834/why-would-one-use-movl-1-eax-as-opposed-to-say-movb-1-eax – John Herwig Jul 15 '20 at 02:10
  • Well, in machine code, it's 4 bytes either way. – Calculuswhiz Jul 15 '20 at 02:15
  • When a scalar `char` value is passed to a function, it is sign extended to 32 bits, and that 32 bit value is pushed onto the stack [as if you passed an `int`]. That is, the effective prototype is: `...,int a4,...`. IIRC [and I could be wrong about this], doing `movl` is actually _faster_ than doing `movb` [on an x86] in this context. The memory _fetch_ from the stack still has to populate the cache line, so it's already fetched the other bytes. – Craig Estey Jul 15 '20 at 02:25
  • Basically the same as [Instructions to copy the low byte from an int to a char: Simpler to just do a byte load?](https://stackoverflow.com/q/62787483), except that the value in memory *is* just a char, not an `int`, so it's more surprising that a compiler would use a dword load instead of a `movzbl 8(%rsp), %edx`. But actual performance problems (store-forwarding stall) are unlikely because as @CraigEstey points out, [narrow args are extended to 32-bit by gcc/clang as an undocumented extension to the x86-64 System V ABI](https://stackoverflow.com/a/36760539/224132) – Peter Cordes Jul 15 '20 at 02:28
  • Both ways are valid, though; clang chooses to use a `movb` load: https://godbolt.org/z/969jv3 – Peter Cordes Jul 15 '20 at 02:29
  • @PeterCordes Hey, Peter glad to see you're "on the job" ;-) I was considering finding a recent page you had answered/commented and link this to you because I felt it would be something you were interested in. I guess I shouldn't have worried because of the `x86-64` tag. – Craig Estey Jul 15 '20 at 02:34
  • @CraigEstey: heh, I had family visiting for the past week so I wasn't putting as much time into stack overflow as usual. But yeah, feel free to ping me with a link if there's a question you think I might have missed. – Peter Cordes Jul 15 '20 at 02:36
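
To make the loads discussed in the comments concrete, here is a small sketch (AT&T syntax, using the same stack layout as in the question: `a4` at `8(%rsp)`, `a4p` already loaded into `%rax`). Only one of the three loads would actually appear before the `addb`; per the comments, the wider load is harmless because narrow arguments occupy a full stack slot and are extended to 32 bits by gcc/clang, and the encodings for these operands show that the dword and byte loads are the same size:

   //Sketch based on the comments above; pick any ONE of the three loads
   movl   8(%rsp), %edx    //dword load (gcc's choice)                   8b 54 24 08     (4 bytes)
   movzbl 8(%rsp), %edx    //zero-extending byte load, no partial write  0f b6 54 24 08  (5 bytes)
   movb   8(%rsp), %dl     //plain byte load (what clang emits)          8a 54 24 08     (4 bytes)
   addb   %dl, (%rax)      //*a4p += a4; only %dl is read in every case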

0 Answers