repeat / broadcast a byte to every position of an integer register

Question

I am struggling with a problem in assembly, where I have to take the lowest byte (FF) of a 32-bit hex value and copy it into every byte of the register:

0x045893FF      input
0xFFFFFFFF      output

What I did is:

movl $0x045893FF, %eax
shl $24, %eax     # keep only the low byte, now in the top position: 0xFF000000

Now I want to copy this byte into the rest of the register.

Does your system really have 4-bit bytes? You're using `%eax`, which makes me think x86 (which has 8-bit bytes), but you say that `FF` is two bytes. — Carl Norum, Mar 20 '12 at 00:14
@Carl: There is no rule stating that a byte must have 8 bits :) http://en.wikipedia.org/wiki/Octet_(computing) — Kendall Frey, Mar 20 '12 at 00:15
I didn't say there was such a rule, did I? I'm just surprised. — Carl Norum, Mar 20 '12 at 00:15
@CarlNorum You need to squeeze the digits hard enough: `sqezl %eax` — Gunther Piez, Mar 20 '12 at 00:38
I was mistaken, sorry. I meant one byte, not two bytes, for FF — juliensaad, Mar 20 '12 at 00:44

Gunther Piez · Accepted Answer · 2012-03-20T00:42:05.980

5

You could do it for instance like this:

mov %al, %ah    #0x0458FFFF
mov %ax, %bx    #0xFFFF
shl $16, %eax   #0xFFFF0000
mov %bx, %ax    #0xFFFFFFFF

Another way would be:

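movzx %al, %eax          # zero-extend the low byte: 0x000000FF
imul $0x01010101, %eax   # replicate it into all four bytes: 0xFFFFFFFF

The second version is likely faster on modern CPUs, since it avoids partial-register writes.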

edited Mar 20 '12 at 00:42

answered Mar 20 '12 at 00:21

great answer, that was exactly what I was looking for! Thank you very much – juliensaad Mar 20 '12 at 00:43
Why not `movsx %al, %eax` - after all, `0xff`, treated as signed, will extend to `0xffffffff` directly. No need for the "replicate bytes by multiplication" trick. – FrankH. Mar 20 '12 at 17:30
Agreed though that of course this only works for the special case of the byte being `0xff`. – FrankH. Mar 20 '12 at 17:31
@FrankH. As assembly programmer, you are entitled to do code obfuscation at your will. So I take advantage of that and insert a multiplication for an innocent move operation where it comes unexpected and hurts the reader the most. – Gunther Piez Mar 20 '12 at 18:46
The multiply way is *much* faster on Intel P6-family (partial register stalls), and still likely better on modern x86. Certainly for throughput, especially given the gratuitous partial-register write in `mov %ax, %bx` instead of `mov %eax, %ebx`. (Writing AX at the end is needed to merge into EAX, or with BMI2 and fast SHLD you could replace the last 3 insns with `rorx $16, %eax, %edx` + `shld $16, %ebx, %eax`. But then you'd still have [an AH merge on Intel](https://stackoverflow.com/questions/45660139/how-exactly-do-partial-registers-on-haswell-skylake-perform-writing-al-seems-to).) – Peter Cordes Feb 16 '21 at 11:58
See also [Why doesn't GCC use partial registers?](https://stackoverflow.com/q/41573502). (And note the multiply version could save 1 cycle of latency on Intel SnB-family by using a different temporary, allowing mov-elimination to work: `movzbl %al, %ecx` / `imul $0x1010101, %ecx, %eax`) – Peter Cordes Feb 16 '21 at 12:00

score 1 · Answer 2 · answered Mar 04 '13 at 18:57

Another solution using one register (EAX) only:

            ; reg = dcba (a is the low byte)
mov ah, al  ; reg = dcaa
ror eax, 8  ; reg = adca
mov ah, al  ; reg = adaa
ror eax, 8  ; reg = aada
mov ah, al  ; reg = aaaa

This is a bit slower than the above solution, though.

Luis Paris

score 1 · Answer 3 · answered Mar 20 '12 at 00:26

I am used to NASM assembly syntax, but this should be fairly simple.

; this is a comment btw
mov eax, 0x045893FF ; mov to, from

mov ah, al
mov bx, ax
shl eax, 16
mov ax, bx

; eax = 0xFFFFFFFF

Kendall Frey