What exactly is the difference between mov and lea when I use them to get an address?

Say I have a program that prints a character string starting from its 5th character, as shown below:

section .text
    global _start
_start:
    mov edx, 0x06  ;the length of msg from its 5th char to the last is 6.
    lea ecx, [msg + 4]
    mov ebx, 1
    mov eax, 4
    int 0x80

section .data
msg db '1234567890'

If I swap `lea ecx, [msg + 4]` for `mov ecx, msg + 4`, would the program behave differently?

I tried both and the outputs appeared to be the same. However, in the comments on the first answer to "What's the purpose of the LEA instruction?", someone seemed to claim that something like `mov ecx, msg + 4` is invalid, and I don't see why. Can someone help me understand this? Thanks in advance!

Michael Petch
glenjoker
  • "`mov ecx, msg + 4` was invalid, but I failed to see it" - Have you even tried it? `mov ecx, msg + 4` is not a valid instruction. – Mysticial Feb 18 '16 at 07:35
  • Thanks for your reply, and yes, I tried it. As I mentioned in my question, I tried both, and they gave me identical outputs. – glenjoker Feb 18 '16 at 07:39
  • @Mysticial: It's certainly valid NASM syntax. – Michael Feb 18 '16 at 07:41
  • @Michael I guess NASM does constant propagation? – Mysticial Feb 18 '16 at 07:42
  • @Michael: So I can use them interchangeably? – glenjoker Feb 18 '16 at 07:43
  • @Mysticial: `mov ecx, msg + 4` just loads the address of `msg`, plus 4, into `ecx`. I don't see why that's any more special than being able to do `lea ecx, [msg + 4]`. – Michael Feb 18 '16 at 07:44
  • @glenjoker: The encoding for `mov` is shorter, but `lea` is more flexible because it lets you do things like `lea ecx, [msg + eax*4]`. – Michael Feb 18 '16 at 07:45
  • @Michael Oh ic, so the assembler treats it as memory operand. As someone who only reads assembly output by compilers, I wouldn't have known that since they always inline the address with the `[]`. – Mysticial Feb 18 '16 at 07:48
  • @Mysticial: no, you're thinking of MASM syntax, where it *is* invalid. In NASM syntax, it's an immediate constant. (MASM: `mov ecx, OFFSET msg + 4` or something). Linker relocations allow symbol+displacement, so `symbol + 4` is still a link-time constant. The exact same thing happens to generate the 4B displacement for `LEA`'s effective address. – Peter Cordes Feb 18 '16 at 07:50
  • @Michael: Okay, that makes sense. Thank you for your clarification. – glenjoker Feb 18 '16 at 07:50
  • @PeterCordes Ahh... Good to know. Thanks! – Mysticial Feb 18 '16 at 07:52
  • @PeterCordes: Um, wouldn't `mov ecx, msg+4` in MASM syntax mean the same thing as `mov ecx, [msg+4]`? I haven't used MASM/TASM in a while, but that's the way I recall it. – Michael Feb 18 '16 at 07:53
  • @Michael: That's probably not valid. I think you'd need `mov ecx, msg[4]`, or `[msg + 4]`. MASM allows the square brackets, so IMO you should *always* use them when referencing memory for readability, NASM style. I haven't ever used MASM. I only bothered learning anything about it to read/answer SO questions. I'm not a MASM programmer, I just play one on the Internet. – Peter Cordes Feb 18 '16 at 07:54
  • `mov reg, label+imm` assembles for me with MASM 6.14 anyway. I didn't bother to look what it assembled _to_. – Michael Feb 18 '16 at 07:58

1 Answer

When the absolute address is a link-time constant, `mov r32, imm32` and `lea r32, [addr]` will both get the job done. The imm32 can be any valid NASM expression, and here `msg + 4` is a link-time constant. The assembler leaves a 4-byte placeholder in the .o that already contains the +4, along with a relocation that refers to `msg`. When the linker copies the bytes from the .o to its output, it finds the final address of `msg`, adds it to the placeholder, and stores the result in place.

Exactly the same thing happens to the 4-byte displacement in `lea`'s effective address.
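
As a rough sketch of what that looks like in the object file, here are the two forms side by side for 32-bit code. The opcode bytes in the comments are the standard encodings; the placeholder value of 4 assumes `msg` sits at offset 0 of `.data` and that the assembler emits an ELF32 object with in-place addends, which is what I'd expect from `nasm -felf32` but is worth confirming with `objdump -dr` on your own build.

```nasm
; Illustration only: assemble with  nasm -felf32  and inspect with  objdump -dr
section .text
    mov    ecx, msg + 4         ; B9 <imm32>     : msg+4 is the 32-bit immediate
    lea    ecx, [msg + 4]       ; 8D 0D <disp32> : msg+4 is the 32-bit displacement

    ; In both cases the 4-byte field is expected to hold the addend (4) in the
    ; .o, with an absolute 32-bit relocation (R_386_32 in ELF) attached to it.
    ; The linker adds the final address to that field.

section .data
msg db '1234567890'
```

After linking, both instructions leave the same value in `ecx`; only the instruction bytes differ.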


`mov` has a slightly shorter encoding (5 bytes for `mov r32, imm32` versus 6 bytes for `lea r32, [disp32]` in 32-bit mode) and can run on more execution ports. Use `mov reg, imm` unless you can take advantage of `lea` to do some useful math with registers at the same time, for example `lea ecx, [msg + 4 + eax*4 + edx]`.
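
To make the "useful math" point concrete, here is a sketch of what that single `lea` replaces. The fragment assumes `eax` and `edx` already hold an index and an offset; that setup is hypothetical and not part of the original program.

```nasm
section .text
    ; One instruction: ecx = msg + 4 + eax*4 + edx, flags untouched
    lea    ecx, [msg + 4 + eax*4 + edx]

    ; The same result without lea takes four instructions and modifies EFLAGS
    mov    ecx, eax
    shl    ecx, 2               ; ecx = eax*4
    add    ecx, edx             ; ecx = eax*4 + edx
    add    ecx, msg + 4         ; ecx = msg + 4 + eax*4 + edx

section .data
msg db '1234567890'
```

When the address is just a constant like `msg + 4`, there is no register math to fold in, so the shorter `mov` is the natural choice.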

In 64-bit mode, where RIP-relative addressing is possible, using `lea` lets you write efficient position-independent code (code that doesn't need to be modified if it is mapped at a different virtual address). There's no way to achieve this with `mov`. See "How to load address of function or label into register in GNU Assembler" (which also covers NASM) and "Referencing the contents of a memory location. (x86 addressing modes)".
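
For illustration, a minimal 64-bit port of the question's program using a RIP-relative `lea` might look like the following. This is a sketch that assumes Linux x86-64 and the native `syscall` ABI (call numbers 1 for write and 60 for exit), built with something like `nasm -felf64` followed by `ld`; the `default rel` directive is what makes the bracketed symbol reference RIP-relative.

```nasm
default rel                     ; [symbol] operands become RIP-relative

section .text
    global _start
_start:
    lea    rsi, [msg + 4]       ; position-independent address of the 5th char
    mov    edx, msgsize - 4     ; number of bytes to write
    mov    edi, 1               ; stdout
    mov    eax, 1               ; __NR_write on x86-64
    syscall                     ; write(1, msg+4, msgsize-4)

    mov    eax, 60              ; __NR_exit on x86-64
    xor    edi, edi             ; exit status 0
    syscall                     ; exit(0)

section .rodata
msg db '1234567890'             ; not null-terminated, no newline
msgsize equ $-msg
```

Because the `lea` encodes only the distance from the instruction to `msg + 4`, the code keeps working unmodified wherever the loader maps it.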

Also see the tag wiki for many good links.


Also note that you can use a symbolic constant for the size instead of hard-coding it. You can also format and comment your code better: indenting the operands to a common column looks less messy in code where some instructions have longer mnemonics.

section .text
    global _start
_start:
    mov    edx, msgsize - 4
    mov    ecx, msg + 4     ; In MASM syntax, this would be mov ecx, OFFSET msg + 4
    mov    ebx, 1       ; stdout
    mov    eax, 4       ; NR_write
    int    0x80         ; write(1, msg+4, msgsize-4)

    mov    eax, 1       ; NR_exit
    xor    ebx, ebx     ; exit status goes in ebx, the first syscall argument
    int    0x80         ; exit(0)
    ;; otherwise execution falls through into non-code and segfaults

section .rodata
msg db '1234567890'     ; note, not null-terminated, and no newline
msgsize equ $-msg       ; current position - start of message
Peter Cordes