
I have a question about how to move values around in the x86 eax register. I know that the 32-bit register breaks down into smaller component registers, with the lower 16 bits being ax, and that those 16 bits break down further into the 8-bit registers ah and al.

I'm currently writing a program for an x86 Assembly Language assignment that wants me to move four 8-bit hex values around in the register using only the mov, add, and sub commands. The program starts by having you shift the values of the variables around by adding and subtracting them, and that's no problem.

The second part (phase2) is to put each of the values into one of the four 8-bit positions of eax. But I know you can only directly access the lower two 8-bit positions ("ah" and "al"). I need to somehow move ah and al together into the upper 16 bits of eax, pushing the values stored in ah and al left by two byte positions? (Question mark, because I do not know.) I am fairly certain that I can then just load the remaining values into ah and al to finish the solution.

I believe the way to do this is to add 'some hex value' to ah and have that overflow to the left, but I can't seem to wrap my head around the logic of it. "Logically," I want to say this seems like the best course of action, but I'm not sure how to implement it. And since I can't wrap my head around it, I can't find the hidden algorithm I'm supposed to find. Phase2 is only supposed to be approximately 21 lines, so I know it is not a massive column of add instructions.

Any direction on how to think about this would be highly appreciated. Thanks to whomever.

.386
.model flat,stdcall
.stack 4096
ExitProcess proto,dwExitCode:dword

.data
    var1 BYTE 'A'
    var2 BYTE 'B'
    var3 BYTE 'C'
    var4 BYTE 'D'
    
.code
main proc
;phase1
mov al, var1; store 'A'
mov ah, var4; store 'D'
mov var1, ah; move 'D' to var1
sub ah, 1; make ah 'C'
mov var4, ah; move 'C' to var4
sub ah, 1; make ah 'B'
mov var3, ah; move 'B' to var3
mov var2, al; move 'A' to var2

    ;var1 BYTE 'D'
    ;var2 BYTE 'A'
    ;var3 BYTE 'B'
    ;var4 BYTE 'C'


;phase2
mov ah, var1; store 'D'
mov al, var2; store 'A'

; this is where I want to shift al and ah left two bytes 
; once the first two bytes of eax equal 'DA' move 'B' 'C' 
; into ah and al

mov ah, var3; store 'B'
mov al, var4; store 'C'

;eax should read 'DABC' = 44414243h
    
    invoke ExitProcess,0
main endp
end main
  • Try shifting bits. – Erik Eidt Mar 14 '21 at 23:24
  • Thanks for the quick reply Erik. If you mean the shift instructions shr and shl, the assignment restricts us to the mov add and sub commands. The shift command is what all the searches tell me to do, but I cannot use it. It's a logic hex adding problem. – DeweyWoodz Mar 14 '21 at 23:43
  • A shift left is multiplication by two, which can be achieved by an addition. So 16 additions are a boring but working way to shift by 16 bits. Alternatively you can rearrange the data in memory. – Jester Mar 14 '21 at 23:48
  • Thanks, Jester, I'll try that. But would it just be a repeated `add ah, al` kind of thing? Or do you mean like `add ax, '0FFh'` sixteen times? -edit- Or, once ah and al are in place, move that to bx, then add bx to ax 16 times...? – DeweyWoodz Mar 15 '21 at 00:02

1 Answer

If you can't use shl eax, 16 like a normal person, your other options include the following:

  • add eax,eax repeated 16 times (yuck, slow), in a loop partially unrolled, or fully unrolled.
  • store / reload at an offset: also slow, but only for latency (store-forwarding stall). Throughput can be ok, while latency is pretty close to the same 16 cycles as the 16x add way on a typical modern x86.
    sub  esp, 16             ; reserve some stack space.

    ...
    mov  [esp+2], ax         ; 2 byte store
    mov  eax, [esp]          ; 4-byte reload with previous AX in the top half
    
    mov  ah, ...             ; overwrite whatever garbage in the low 2 bytes
    mov  al, ...

x86 is little-endian, so a load/store of EAX at addr loads/stores AL at that same addr and AH at addr+1, with the high 2 bytes of EAX coming from addr+2 and addr+3.
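Applied to the variables from the question, the store/reload idea could look something like the following. This is only a sketch: `scratch` is a hypothetical 4-byte variable in `.data` that I've used instead of stack space, the variables are assumed to hold their post-phase1 values, and the program skeleton is copied from the question.

```asm
.386
.model flat,stdcall
.stack 4096
ExitProcess proto,dwExitCode:dword

.data
    var1 BYTE 'D'                   ; values as they stand after phase1
    var2 BYTE 'A'
    var3 BYTE 'B'
    var4 BYTE 'C'
    scratch DWORD 0                 ; hypothetical scratch space, not in the question

.code
main proc
    mov ah, var1                    ; AH = 'D' (44h)
    mov al, var2                    ; AL = 'A' (41h), so AX = 4441h

    mov word ptr [scratch+2], ax    ; 2-byte store: 41h at scratch+2, 44h at scratch+3
    mov eax, scratch                ; 4-byte reload: EAX = 44410000h

    mov ah, var3                    ; AH = 'B' (42h)
    mov al, var4                    ; AL = 'C' (43h), so EAX = 44414243h

    invoke ExitProcess,0
main endp
end main
```

Because `scratch` starts out as zero, the low 2 bytes after the reload are zero rather than garbage, but they get overwritten by the last two `mov` instructions either way.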

Reading EAX after writing AH and AL will also force the CPU to merge partial registers if it renamed AH (and maybe AL) separately from the full EAX, but clearly if you're restricting yourself to only a tiny subset of the ISA then high performance isn't your top goal. (For more details, see "Why doesn't GCC use partial registers?" and "How exactly do partial registers on Haswell/Skylake perform? Writing AL seems to have a false dependency on RAX, and AH is inconsistent".)

For the store-forwarding stall part, see "Can modern x86 implementations store-forward from more than one prior store?"


Depending on how much you're doing with the new low part (the new AH and AL), you might actually do that work in a separate register (like DH and DL), so out-of-order exec can get started on it without a false dependency on the store-forwarding reload, especially on CPUs that don't rename AL (or even AH) separately from EAX. (i.e. CPUs other than the Intel P6 family, which runs up through crusty old Nehalem.)

So you'd do

    mov  [esp+2], ax         ; 2 byte store
    mov  eax, [esp]          ; 4-byte reload with previous AX in the top half
    
    mov  dl, ...
    mov  dh, ...
    ; ... more computation with these two

    mov  ax, dx              ; replace low 2 bytes of EAX

mov ax,dx might need to wait for the old EAX value to be "ready", i.e. for the reload to complete, so it can merge into it as part of running that instruction. (On Intel Sandybridge-family, and on all non-Intel CPUs.) So this lets the computations on DL/DH overlap with the store-forwarding latency.

Just to be clear, all this discussion about tradeoffs is about performance, not correctness; all ways I've shown here are fully correct. (unless I made a mistake :P)
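For the first option, a fully unrolled version using only `mov` and `add` might look like the sketch below. It assumes the variables already hold their post-phase1 values and reuses the program skeleton from the question. Each `add eax, eax` doubles EAX, which is a left shift by one bit, so 16 of them move AX into the top half. Whatever was in the top half beforehand gets shifted out, so EAX doesn't need to be zeroed first.

```asm
.386
.model flat,stdcall
.stack 4096
ExitProcess proto,dwExitCode:dword

.data
    var1 BYTE 'D'           ; values as they stand after phase1
    var2 BYTE 'A'
    var3 BYTE 'B'
    var4 BYTE 'C'

.code
main proc
    mov ah, var1            ; AH = 'D' (44h)
    mov al, var2            ; AL = 'A' (41h), so AX = 4441h

    ; 16 doublings = shift left by 16 bits
    add eax, eax            ; 1
    add eax, eax            ; 2
    add eax, eax            ; 3
    add eax, eax            ; 4
    add eax, eax            ; 5
    add eax, eax            ; 6
    add eax, eax            ; 7
    add eax, eax            ; 8
    add eax, eax            ; 9
    add eax, eax            ; 10
    add eax, eax            ; 11
    add eax, eax            ; 12
    add eax, eax            ; 13
    add eax, eax            ; 14
    add eax, eax            ; 15
    add eax, eax            ; 16: EAX = 44410000h

    mov ah, var3            ; AH = 'B' (42h)
    mov al, var4            ; AL = 'C' (43h), so EAX = 44414243h

    invoke ExitProcess,0
main endp
end main
```

That is 20 instructions for phase2, which is in the same ballpark as the roughly 21 lines the question mentions.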

Peter Cordes
  • Thank you, Peter. You are correct that I can't use shift like a normal person. I think that since this is my first introduction to Assembly, the idea is to force me to use the baby steps until I understand them fully. This answer was perfect, btw. Sixteen of those `add eax, eax` got the job done. And I understand how. Thanks! – DeweyWoodz Mar 15 '21 at 02:32
  • @DeweyWoodz: I mentioned `shl` for the benefit of future readers who find the title but don't have your restrictions. Yeah, I wouldn't be surprised if the point of the exercise was to make sure you understood x86's funky partial registers and/or endianness. IDK if they expect a memory store/reload or an ADD loop, but the store/reload is more efficient. You could also store the other 2 bytes to build a complete "image" in memory of the dword value you want, then load that. (Then you "just" have one store-forwarding stall, no partial-register merging). – Peter Cordes Mar 15 '21 at 02:41
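
The memory "image" idea from the last comment could be sketched roughly as follows, using only `mov`. `image` is a hypothetical 4-byte variable, not something from the question, and the variables are assumed to hold their post-phase1 values. Since x86 is little-endian, the byte that should end up in AL goes at the lowest address.

```asm
.386
.model flat,stdcall
.stack 4096
ExitProcess proto,dwExitCode:dword

.data
    var1 BYTE 'D'                   ; values as they stand after phase1
    var2 BYTE 'A'
    var3 BYTE 'B'
    var4 BYTE 'C'
    image DWORD 0                   ; hypothetical buffer for the dword image

.code
main proc
    ; x86 has no memory-to-memory mov, so each byte goes through AL
    mov al, var4
    mov byte ptr [image], al        ; lowest byte, will become AL: 'C' (43h)
    mov al, var3
    mov byte ptr [image+1], al      ; will become AH: 'B' (42h)
    mov al, var2
    mov byte ptr [image+2], al      ; bits 16..23: 'A' (41h)
    mov al, var1
    mov byte ptr [image+3], al      ; bits 24..31: 'D' (44h)

    mov eax, image                  ; single 4-byte load: EAX = 44414243h

    invoke ExitProcess,0
main endp
end main
```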