I'm learning about data movement (MOV) in assembly.
I compiled some code on an x86_64 Ubuntu 18.04 machine to look at the generated assembly:

typedef unsigned char src_t;
typedef xxx dst_t;

dst_t cast(src_t *sp, dst_t *dp) {
    *dp = (dst_t)*sp;
    return *dp;
}

where src_t is unsigned char. For dst_t, I tried char, short, int and long. The results are shown below:

// typedef unsigned char src_t;
// typedef char dst_t;
//  movzbl  (%rdi), %eax
//  movb    %al, (%rsi)

// typedef unsigned char src_t;
// typedef short dst_t;
//  movzbl  (%rdi), %eax
//  movw    %ax, (%rsi)

// typedef unsigned char src_t;
// typedef int dst_t;
//  movzbl  (%rdi), %eax
//  movl    %eax, (%rsi)

// typedef unsigned char src_t;
// typedef long dst_t;
//  movzbl  (%rdi), %eax
//  movq    %rax, (%rsi)

Why is movzbl used in every case? Shouldn't the load instruction correspond to dst_t? Thanks!

Peter Cordes
Sean
  • *where `src_t` is `unsigned char`* When you cast a `char` pointer to another type, if the source memory isn't actually of the type you cast to, you are violating [the strict aliasing rule](https://stackoverflow.com/questions/98650/what-is-the-strict-aliasing-rule), and you may also be violating any alignment restrictions your system imposes - like [here](https://stackoverflow.com/questions/19114491/structure-assignment-in-linux-fails-in-arm-but-succeeds-in-x86). – Andrew Henle Oct 24 '19 at 10:33
  • @AndrewHenle: the OP is casting the `*sp` *value* to `int`, not the *pointer* `sp` to `int *`. Strict aliasing doesn't come into it. From their 2nd code block, we see `typedef unsigned char src_t;` I think just `unsigned` (int) in the first code block is a typo. I fixed that mistake in how the question is presented. – Peter Cordes Oct 24 '19 at 10:37
  • @fuz: The return value really is just `al` when `dst_t` is `char`. The x86-64 System V ABI specifies that high bits of return-value register can hold garbage. (And even the unwritten convention that clang relies on of [extending narrow args to 32-bit](https://stackoverflow.com/questions/36706721/is-a-sign-or-zero-extension-required-when-adding-a-32bit-offset-to-a-pointer-for/36760539#36760539) only applies to args, not return values.) I posted an answer here that addresses this question from all the possible angles I could think of :P – Peter Cordes Oct 24 '19 at 10:41
  • @AndrewHenle: (edit: you deleted your comment while I was typing this). Why are you assuming that the caller passed the address of something other than a `dst_t` object as the 2nd arg? That's also clearly not what the OP is asking about; the generated asm makes sense for the no-UB case (as it must), and that's what's being asked about, not inlining into some unspecified caller. – Peter Cordes Oct 24 '19 at 10:45
  • @PeterCordes There's no context provided. It's more of a "be careful" warning, hence it's just a comment. – Andrew Henle Oct 24 '19 at 10:46
  • @AndrewHenle: Fair enough I guess, but would you really comment to "be careful" on every question involving a function that takes a pointer arg? Anyway, the `movzbl` load would still be there (for the return value) even if we remove the `dp` output arg entirely. Storing the result as well as returning does nicely expose the fact that the compiler is widening more (or apparently "less" for rax) than the width of `dst_t` which we can see from the reg in the store. – Peter Cordes Oct 24 '19 at 10:49

1 Answer

If you're wondering why not movzbw (%rdi), %ax for short, that's because writing to 8-bit and 16-bit partial registers has to merge with the previous high bytes.

Writing a 32-bit register like EAX implicitly zero-extends into the full RAX, avoiding a false dependency on the old value of RAX or any ALU merging uop. (See: Why do x86-64 instructions on 32-bit registers zero the upper part of the full 64-bit register?)

The "normal" way to load a byte on x86 is with movzbl or movsbl, the same as on a RISC machine: ARM's ldrb / ldrsb, or MIPS's lbu / lb.

The weird-CISC thing that GCC usually avoids is a merge with the old value that replaces only the low bits, like movb (%rdi), %al. (See: Why doesn't GCC use partial registers?) Clang is more reckless and will more often write partial regs, not just read them for stores. You might well see clang load into just %al and store when dst_t is signed char.
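
The register-width rules above can be sketched as plain C arithmetic. This is only a model under stated assumptions: the function names (write_low8, write_low16, write_low32, check_register_model) are hypothetical, and the code manipulates a uint64_t standing in for RAX rather than touching real registers. It says nothing about performance, only about which bits survive each kind of write.

```c
#include <assert.h>
#include <stdint.h>

/* Model only: a uint64_t stands in for RAX. These helpers are
   hypothetical names for illustration, not a real API. */

/* 8-bit write (like movb into %al): bits 63..8 of the old value are kept,
   so the result depends on the old register contents. */
static uint64_t write_low8(uint64_t old, uint8_t v) {
    return (old & ~UINT64_C(0xFF)) | v;
}

/* 16-bit write (like movzbw into %ax): bits 63..16 of the old value are kept. */
static uint64_t write_low16(uint64_t old, uint16_t v) {
    return (old & ~UINT64_C(0xFFFF)) | v;
}

/* 32-bit write (like movzbl into %eax): the upper 32 bits become zero,
   so the old value does not matter at all. */
static uint64_t write_low32(uint64_t old, uint32_t v) {
    (void)old; /* deliberately unused: no merge with the old contents */
    return (uint64_t)v;
}

/* Self-check of the model with one arbitrary old register value. */
void check_register_model(void) {
    const uint64_t old = UINT64_C(0x1122334455667788);
    assert(write_low8(old, 0xAB)  == UINT64_C(0x11223344556677AB));
    assert(write_low16(old, 0xAB) == UINT64_C(0x11223344556600AB));
    assert(write_low32(old, 0xAB) == UINT64_C(0x00000000000000AB));
}
```

Calling check_register_model() from a main function exercises all three cases. Only the 32-bit write produces a result that is independent of the old value, which is the property that lets the hardware avoid a dependency on the previous RAX.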


If you're wondering why not movsbl (%rdi), %eax (sign-extension)

The source value is unsigned, therefore zero-extension (not sign-extension) is the correct way to widen it according to C semantics. To get movsbl, you'd need something like return (int)(signed char)*sp.

In *dp = (dst_t)*sp; the cast to dst_t is already implicit from the assignment to *dp.


The value-range for unsigned char is 0..255 (on x86 where CHAR_BIT = 8).

Zero-extending this to signed int produces values in the range 0..255, i.e. it preserves every value as a non-negative signed integer.

Sign-extending this to signed int would produce values in the range -128..+127, changing the value of every unsigned char >= 128. That conflicts with the C rule that widening conversions preserve values.
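
A small C sketch of the two widening behaviours, for comparison. The function names are hypothetical. One assumption to flag: converting an out-of-range value such as 200 to signed char is implementation-defined in C (before C23); the expected values in the check assume the usual two's-complement wraparound that GCC and clang document for x86-64.

```c
#include <assert.h>

/* Value-preserving widening: what the C conversion from unsigned char
   to int requires, and what a zero-extending load implements. */
int widen_unsigned(unsigned char uc) {
    return (int)uc;
}

/* Reinterpret the byte as signed first, then widen: this is the kind of
   source that would call for a sign-extending load instead.
   Assumption: the conversion to signed char wraps modulo 256
   (implementation-defined, true for GCC and clang). */
int widen_as_signed(unsigned char uc) {
    return (int)(signed char)uc;
}

/* Check every possible byte value. */
void check_widening(void) {
    for (int v = 0; v <= 255; v++) {
        unsigned char uc = (unsigned char)v;
        /* Zero-extension keeps the value: 0..255 stays 0..255. */
        assert(widen_unsigned(uc) == v);
        /* Sign-extension maps 128..255 to -128..-1. */
        int expected = (v < 128) ? v : v - 256;
        assert(widen_as_signed(uc) == expected);
    }
}
```

Calling check_widening() from main runs through all 256 byte values. The two functions agree only for values below 128, which is why sign-extension is not a valid way to implement the conversion from unsigned char.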


Shouldn't it correspond to dst_t?

It has to widen to at least the width of dst_t. It turns out that widening all the way to 64-bit by using movzbl (with the top 32 bits handled by the implicit zero-extension from writing a 32-bit reg) is the most efficient way to widen at all.

Storing to *dp is a nice demo that the asm is for a dst_t with a width other than 32-bit.

Anyway, note that there's only one conversion happening. Your src_t gets converted to dst_t in al/ax/eax/rax by the load instruction, then stored to a dst_t of whatever width, and also left in that register as the return value.

A zero-extending load is normal even if you're just going to read the low byte of that result.
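
To see that every dst_t gets the same value out of a single widening, here is a sketch that instantiates the question's function once per destination type. The DEFINE_CAST macro and the cast_* and check_casts names are hypothetical helpers added for illustration; the function body is the one from the question. For the char case the code only inspects the low byte, because converting 200 to a signed char is implementation-defined.

```c
#include <assert.h>

typedef unsigned char src_t;

/* Hypothetical helper: stamps out the question's cast() for one dst_t. */
#define DEFINE_CAST(name, dst_t)                  \
    dst_t cast_##name(src_t *sp, dst_t *dp) {     \
        *dp = (dst_t)*sp;                         \
        return *dp;                               \
    }

DEFINE_CAST(char, char)
DEFINE_CAST(short, short)
DEFINE_CAST(int, int)
DEFINE_CAST(long, long)

void check_casts(void) {
    src_t s = 200; /* >= 128, so sign-extension would change it */
    char c; short h; int i; long l;

    /* Wider types: the value is preserved, stored and returned. */
    assert(cast_short(&s, &h) == 200 && h == 200);
    assert(cast_int(&s, &i) == 200 && i == 200);
    assert(cast_long(&s, &l) == 200L && l == 200L);

    /* Same-width type: only the low byte matters, and it is unchanged
       whether plain char is signed or unsigned on this target. */
    assert((unsigned char)cast_char(&s, &c) == 200);
    assert((unsigned char)c == 200);
}
```

Compiling this with optimization and looking at the asm (for example with gcc -O2 -S) is one way to compare the load and store instructions per type; the exact output depends on the compiler and version.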

Peter Cordes