
I am using the NASM assembler on 64-bit Linux. There is something about variables and registers that I can't understand. I create a variable named "msg":

 msg db "hello, world"  

Now, when I want to write to stdout, I move msg into the rsi register, but I don't understand the mov instruction at the bit level. The rsi register is 64 bits wide, while msg holds 12 characters of 8 bits each, so msg occupies 12 * 8 = 96 bits, which is obviously more than 64 bits.

So how is it even possible to write an instruction like:
mov rsi, msg
without overflowing the storage available in rsi?

Or does the rsi register contain the memory location of the first symbol of the string and after writing 1 symbol it changes to the memory location of the next symbol?

Sorry if I wrote complete nonsense. I'm new to assembly and I haven't been able to get the hang of it for a while.

Peter Cordes
  • It's the address of the message, just like how in C you'd use a pointer to the string. – harold Nov 05 '17 at 15:19
  • Thanks for the fast response. I studied high-level languages other than C, so I'm not familiar with pointers. But I understand now. – korsunskyroma Nov 05 '17 at 15:24

2 Answers


In NASM syntax (unlike MASM syntax), mov rsi, symbol puts the address of the symbol into RSI. (It does so inefficiently, with a 64-bit absolute immediate; use a RIP-relative LEA or mov esi, symbol instead. See "How to load address of function or label into register in GNU Assembler".)

mov rsi, [symbol] would load 8 bytes starting at symbol. It's up to you to choose a useful place to load 8 bytes from when you write an instruction like that.

mov   rsi,  msg           ; rsi  = address of msg.  Use lea rsi, [rel msg] instead
movzx eax, byte [rsi+1]   ; rax  = 'e' (upper 7 bytes zeroed)
mov   edx, [msg+6]        ; rdx  = ' wor' (upper 4 bytes zeroed)

Note that you can use mov esi, msg because symbol addresses always fit in 32 bits (in the default "small" code model, where all static code/data goes in the low 2GB of virtual address space). NASM makes this optimization for you with assemble-time constants (like mov rax, 1), but it probably can't with link-time constants. Writing a 32-bit register zeroes the upper half of the full 64-bit register; see "Why do x86-64 instructions on 32-bit registers zero the upper part of the full 64-bit register?"
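To make the address-versus-contents distinction concrete, here is a minimal sketch of a complete program that writes the string to stdout. It assumes x86-64 Linux and a nasm + ld toolchain; the file name, the msglen constant, and the trailing newline byte are illustrative additions, not part of the question's code.

```nasm
; hello.asm - a sketch, assuming x86-64 Linux
; assumed build: nasm -f elf64 hello.asm && ld -o hello hello.o

default rel                     ; make [msg] RIP-relative by default

section .rodata
msg:    db "hello, world", 10   ; 13 bytes in memory (10 = newline, added for this sketch)
msglen  equ $ - msg             ; assemble-time constant: number of bytes at msg

section .text
global _start
_start:
    mov     eax, 1              ; system call number 1 = write
    mov     edi, 1              ; file descriptor 1 = stdout
    lea     rsi, [msg]          ; rsi = address of the first byte, not the bytes themselves
    mov     edx, msglen         ; rdx = how many bytes the kernel should read from that address
    syscall

    mov     eax, 60             ; system call number 60 = exit
    xor     edi, edi            ; exit status 0
    syscall
```

Only an 8-byte address and a length ever sit in registers; the 13 bytes of text stay in memory, and the kernel reads them from there.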

and after writing 1 symbol it changes to the memory location of the next symbol?

No, if you want that you have to inc rsi. There is no magic. Pointers are just integers that you manipulate like any other integers, and strings are just bytes in memory.

Accessing registers doesn't magically modify them.

There are instructions like lodsb and pop that load from memory and increment a pointer (rsi or rsp respectively), but x86 doesn't have any pre/post-increment/decrement addressing modes, so you can't get that behaviour with mov even if you want it. Use add/sub or inc/dec.
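As a rough illustration of the two options, the sketch below counts the bytes of a string twice: once with a plain load followed by an explicit inc, and once with lodsb. It assumes x86-64 Linux and a nasm + ld toolchain; the 0 terminator, the label names, and returning the count as the exit status are choices made for this example only.

```nasm
; walk.asm - a sketch, assuming x86-64 Linux
; assumed build: nasm -f elf64 walk.asm && ld -o walk walk.o

default rel

section .rodata
msg:    db "hello, world", 0    ; 0 terminator added so the loops know where to stop

section .text
global _start
_start:
    ; Variant 1: plain load, then advance the pointer explicitly
    lea     rsi, [msg]          ; rsi = address of the first byte
    xor     edi, edi            ; edi = number of non-zero bytes seen
.count:
    movzx   eax, byte [rsi]     ; load one byte; rsi itself is unchanged by the load
    inc     rsi                 ; the pointer moves only because we increment it
    test    al, al
    jz      .done
    inc     edi
    jmp     .count
.done:

    ; Variant 2: lodsb loads the byte and advances rsi in one instruction
    cld                         ; direction flag = 0, so lodsb increments rsi
    lea     rsi, [msg]
    xor     ecx, ecx            ; ecx = number of non-zero bytes seen
.count2:
    lodsb                       ; al = byte at [rsi], then rsi = rsi + 1
    test    al, al
    jz      .done2
    inc     ecx
    jmp     .count2
.done2:

    mov     eax, 60             ; exit; edi still holds the count from variant 1
    syscall
```

Both loops visit the same bytes; the only difference is whether the increment is a separate instruction or built into the load.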

Peter Cordes

Disclaimer: I'm not familiar with the flavor of assembly that you're dealing with, so the following is more general. The particular flavor may have more features than what I'm used to. In general, assembly deals with single byte/word entities where the size depends on the processor. I've done quite a bit of work on 8 and 16-bit processors, so that is where my answer is coming from.

General statements about Assembly: Assembly is just like a high level language, except you have to handle a lot more of the details. So if you're used to some operation in say C, you can start there and then break the operation down even further.

For instance, if you have declared two variables that you want to add, that's pretty easy in C:

x = a + b;

In assembly, you have to break that down further:

mov R1, a  * get value from a into register R1
mov R2, b  * get value from b into register R2
add R1,R2  * perform the addition (the result typically goes into a particular location; I'll call it the accumulator)
mov x, acc * store the result of the addition from the accumulator into x

Depending on the flavor of assembly and the processor, you may be able to directly refer to variables in the addition instruction, but like I said I would have to look at the specific flavor you're working with.
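For the NASM / x86-64 flavor the question is about, the same addition might look like the sketch below. It assumes x86-64 Linux, 4-byte integers, and the example values 2 and 3; here the add instruction does take a memory operand directly, and the result lands in the destination register rather than in a separate accumulator.

```nasm
; add.asm - a sketch, assuming x86-64 Linux
; assumed build: nasm -f elf64 add.asm && ld -o add add.o

default rel

section .data
a:      dd 2                    ; example value, chosen for this sketch
b:      dd 3                    ; example value, chosen for this sketch

section .bss
x:      resd 1                  ; 4 bytes reserved for the result

section .text
global _start
_start:
    mov     eax, [a]            ; load the 4-byte value stored at a
    add     eax, [b]            ; add the 4-byte value stored at b; the sum stays in eax
    mov     [x], eax            ; store the sum into x

    mov     edi, eax            ; use the sum as the exit status so it can be inspected
    mov     eax, 60             ; system call number 60 = exit
    syscall
```

Note the brackets: [a] means the value stored at a, whereas a bare a in NASM would be the address.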

Comments on your specific question: If you have a string of characters, then you would have to move each one individually using a loop of some sort. I would set up a register to contain the starting address of your string, and then increment that register after each character is moved. It acts like a pointer in C. You will need to have some sort of indication for the termination of the string or another value that tells the size of the string, so you know when to stop.
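The loop described above could be sketched in NASM as follows, using an explicit length rather than a terminator. It assumes x86-64 Linux; the buffer buf, the constant msglen, and the trailing newline are names and details invented for this example. Copying is only needed when the bytes must actually be moved; to print the string, passing its address and length to write is enough.

```nasm
; copy.asm - a sketch, assuming x86-64 Linux
; assumed build: nasm -f elf64 copy.asm && ld -o copy copy.o

default rel

section .rodata
msg:    db "hello, world", 10   ; newline added for this sketch
msglen  equ $ - msg             ; size of the string in bytes

section .bss
buf:    resb msglen             ; destination buffer of the same size

section .text
global _start
_start:
    lea     rsi, [msg]          ; source pointer
    lea     rdi, [buf]          ; destination pointer
    mov     ecx, msglen         ; bytes left to copy
.copy:
    mov     al, [rsi]           ; load one byte from the source
    mov     [rdi], al           ; store it to the destination
    inc     rsi                 ; advance both pointers by hand
    inc     rdi
    dec     ecx
    jnz     .copy               ; repeat until the count reaches zero

    mov     eax, 1              ; write(1, buf, msglen)
    mov     edi, 1
    lea     rsi, [buf]
    mov     edx, msglen
    syscall

    mov     eax, 60             ; exit(0)
    xor     edi, edi
    syscall
```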

rallen911