Add a 16-bit value to a 64-bit register

Question

Here is what I want to do:

   add     rsi, word [rsi+16]

I want to read the unsigned short value at address rsi+16 and add it to rsi.

Here is the error I get from NASM:

s2.asm:62: error: mismatch in operand sizes

This is strange. Why can't NASM and my CPU add a 16-bit value to a 64-bit register?

Here is what I do instead, which works:

   mov     rbx,0
   mov     bx, word [rsi+16]
   add     rsi, rbx

This seems strange. Is there a better way to do this?

Thanks

Instruction operands must have the same size, except for the sign- and zero-extending move instructions. And [don't use `mov reg, 0`. Always use `xor reg, reg`](https://stackoverflow.com/q/33666617/995714) — phuclv, Mar 07 '18 at 13:22
As such the solution is to either extend the operand or use a 16 bit addition and handle carry. — Jester, Mar 07 '18 at 13:24
[Cannot move 8 bit address to 16 bit register](https://stackoverflow.com/q/33959446/995714), [Error “operands do not match: 16 bit and 8 bit register”](https://stackoverflow.com/q/24274265/995714) — phuclv, Mar 07 '18 at 13:29
`movzx ebx, word [rsi+16]` will zero-extend word into `rbx` (write into `ebx` in x86_64 will automatically clear upper 32 bits of `rbx`, but the instruction encoding is 1 byte shorter than `movzx rbx, word [rsi+16]`). Then `add rsi,rbx` ... i.e. your workaround is correct in principle, and it's not strange, use combinatorics knowledge to imagine the explosion of machine instruction encodings to have all possible combinations of operands. The conversion between types are so rare in the code (if written by somebody understanding this limitation), that paying extra ins. for conversion is best. — Ped7g, Mar 07 '18 at 13:35
@LưuVĩnhPhúc I don't agree to the “always.” There are valid reasons to use `mov reg,0`, the one I know being that you want to preserve flags. For more discussion, read [this bug report](https://github.com/golang/go/issues/22325). — fuz, Mar 07 '18 at 14:22
@TigerTV.ru that will have incorrect result. (the OP asked specifically for 64b add result) I.e. `1 + 0xFFFF` would result into zero, instead of `0x10000`. — Ped7g, Mar 07 '18 at 14:35
@TigerTV.ru sure.. how? `mov ebx,0` `setc bl` `shl ebx,16` `add rsi,rbx` (this is just result fixing code after `add si, word [rsi+16]`)? Would work, but performance wise this is much worse than two instructions `movzx + add`, which also read much better displaying the original intent in quite straightforward way. — Ped7g, Mar 07 '18 at 14:38
@Ped7g: so, it will work only if the exceptional situation doesn't happen. — TigerTV.ru, Mar 07 '18 at 14:52
@TigerTV.ru yes, if you know you are adding "fake" 64b value, i.e. the result of `add` will never set CF, you can use `add si,[rsi+16]`, but you will still pay performance penalty on some architectures, when you will use full `rsi` which is partially updated by `si` only, so performance wise it is still better to use some spare register to extend the word value to 64b first, and then add two 64b registers (if you will use `rsi`). Or if you need only word values, then use only `si`. — Ped7g, Mar 07 '18 at 15:27
@fuz: the takeaway from that bug report is that code-gen needs to do the `xor`-zeroing ahead of the flag setting. IDK why this wasn't obvious to whoever created that buggy design in the first place; `xor`-zero / `test` / `setcc` works, but `test` / `xor`-zero / `setcc` doesn't, as described in the bottom of the answer Lu'u linked to when he said "always". The only time you'd want to use `test` / `setcc` / `movzx eax,al` or use `mov eax,0` after flag-setting is when register pressure leaves you with no spare registers until after the flag-setting instruction. And `movq` is silly vs. `movl`. — Peter Cordes, Mar 08 '18 at 05:04
Anyway, `mov eax,0` is sometimes the least-bad option, but it's not great. (And has huge downsides before partial-register stuff on P6-family, like Nehalem: https://stackoverflow.com/questions/41573502/why-doesnt-gcc-use-partial-registers). — Peter Cordes, Mar 08 '18 at 05:06
@PeterCordes : given this is shell code question `mov eax, 0` is undesirable for the 0 in the byte stream (unless of course you have a decoder that can deal with it) — Michael Petch, Mar 08 '18 at 05:13
@PeterCordes that bug looks to me (didn't bother to study it deeply enough, as I'm not interested into "go" language), like the optimizer does replace `mov reg,0` in some kind of JIT way, i.e. after the original code was already compiled and optimized, so the optimizer has no idea whether flags at that point have to be preserved or not, which did introduce all kind of weird bugs (no sh*t, Sherlock, really?). :) ... pretty irrelevant to C/C++/asm people, who pay attention to the machine code at build time, and don't bother with JIT or post-link code patching later. — Ped7g, Mar 09 '18 at 12:38

TigerTV.ru · Answer 1 · 2018-03-08T19:10:09.640

Instruction operands must have the same size; the exceptions are the sign- and zero-extending move instructions (movsx and movzx).
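To make that exception concrete, here is a minimal NASM sketch of the two extending loads. It assumes rsi points at readable memory and that the word at [rsi+16] happens to hold 0xFFFF; the destination registers eax and rdx are arbitrary choices for illustration.

```nasm
; Illustration only: assumes the word at [rsi+16] holds 0xFFFF.
        movzx   eax, word [rsi+16]   ; zero-extend: rax = 0x000000000000FFFF (65535)
        movsx   rdx, word [rsi+16]   ; sign-extend: rdx = 0xFFFFFFFFFFFFFFFF (-1)
```

For an unsigned short, as in the question, the zero-extending form is the one that applies.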

In your case, the only way to add 16 bits to rsi in a single instruction is a 16-bit add on its low word, si:

add si, word [rsi+16]

which assembles to:

\x66\x03\x76\x10

Because si (one word in size) is the low 16 bits of rsi, you can add to si without disturbing the upper bytes of rsi.

But this gives the same result as a 64-bit add only if the 16-bit add doesn't carry out. For example:

Say esi=0x0000FFFF and we add 1 to si. The result is esi=0x00000000, with CF set because of the carry-out from the 16-bit add.
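The same case as a NASM sketch, with the 64-bit add alongside for comparison. The immediate values are chosen only for illustration:

```nasm
; 16-bit add: the carry out of bit 15 goes to CF, not into bit 16 of rsi.
        mov     esi, 0x0000FFFF   ; rsi = 0x000000000000FFFF (32-bit write zero-extends)
        add     si, 1             ; si wraps to 0x0000, CF = 1, rsi = 0

; 64-bit add on the same starting value: the carry propagates into bit 16.
        mov     esi, 0x0000FFFF
        add     rsi, 1            ; rsi = 0x0000000000010000, CF = 0
```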

If you do need carry to propagate to the rest of RSI, zero-extend into any other register.

movzx  rax, word [rsi+16]
add    rsi, rax

which assembles to:

\x48\x0F\xB7\x46\x10
\x48\x01\xC6
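As Ped7g points out in the comments, writing a 32-bit register in 64-bit mode zeroes the upper 32 bits of the full register, so a 32-bit destination should give the same result with an encoding one byte shorter (no REX.W prefix). A sketch, keeping rax as the arbitrary scratch register:

```nasm
        movzx   eax, word [rsi+16]   ; 0F B7 46 10 (4 bytes); bits 63..32 of rax are zeroed
        add     rsi, rax             ; 48 01 C6
```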

Also Ped7g noted:

but you will still pay performance penalty on some architectures, when you will use full rsi which is partially updated by si only, so performance wise it is still better to use some spare register to extend the word value to 64b first, and then add two 64b registers (if you will use rsi).

See also [Why doesn't GCC use partial registers?](https://stackoverflow.com/questions/41573502/why-doesnt-gcc-use-partial-registers) for possible performance issues from writing SI and then reading RSI on P6-family CPUs, although that's not relevant for shellcode exploit payloads.

I think the general moral of this story is "don't use different types for values which are expected to interoperate". There are valid reasons sometimes to break this rule, i.e. having large amount of values which are only of 8/16/32b range, but the processing itself must be done in 64b range (or even more), then just pay the price of conversion, it's not that big, especially if it is put at reasonable place of algorithm, i.e. loading the compressed value from memory directly into 64b register, etc... But whenever possible use the same type for all values involved in the particular calculation. — Ped7g, Mar 09 '18 at 12:45
