740

For me, it just seems like a funky MOV. What's its purpose and when should I use it?

Michael Petch
user200557
  • 5
    See also [Using LEA on values that aren't addresses / pointers?](//stackoverflow.com/a/46597375): LEA is just a shift-and-add instruction. It was probably added to 8086 because the hardware is already there to decode and calculate addressing modes, not because it's "intended" only for use with addresses. Remember that pointers are just integers in assembly. – Peter Cordes Apr 17 '18 at 05:10

16 Answers

860

As others have pointed out, LEA (load effective address) is often used as a "trick" to do certain computations, but that's not its primary purpose. The x86 instruction set was designed to support high-level languages like Pascal and C, where arrays—especially arrays of ints or small structs—are common. Consider, for example, a struct representing (x, y) coordinates:

struct Point
{
     int xcoord;
     int ycoord;
};

Now imagine a statement like:

int y = points[i].ycoord;

where points[] is an array of Point. Assuming the base of the array is already in EBX, and variable i is in EAX, and xcoord and ycoord are each 32 bits (so ycoord is at offset 4 bytes in the struct), this statement can be compiled to:

MOV EDX, [EBX + 8*EAX + 4]    ; right side is "effective address"

which will land y in EDX. The scale factor of 8 is because each Point is 8 bytes in size. Now consider the same expression used with the "address of" operator &:

int *p = &points[i].ycoord;

In this case, you don't want the value of ycoord, but its address. That's where LEA (load effective address) comes in. Instead of a MOV, the compiler can generate

LEA ESI, [EBX + 8*EAX + 4]

which will load the address in ESI.

I. J. Kennedy
  • 127
    Wouldn't it have been cleaner to extend the `mov` instruction and leave off the brackets? `MOV EDX, EBX + 8*EAX + 4` – Natan Yellin Aug 15 '11 at 12:43
  • 15
    @imacake By replacing LEA with a specialized MOV you keep the syntax clean: [] brackets are always the equivalent of dereferencing a pointer in C. Without brackets, you always deal with the pointer itself. – Natan Yellin Nov 04 '11 at 13:54
  • 1
    @Natan Hey ASM is never trivial anyway! :D – imacake Nov 04 '11 at 18:40
  • 148
    Doing math in a MOV instruction (EBX+8*EAX+4) isn't valid. LEA ESI, [EBX + 8*EAX + 4] is valid because this is an addressing mode that x86 supports. http://en.wikipedia.org/wiki/X86#Addressing_modes – Erik Jan 07 '12 at 06:07
  • So `LEA` is EXACTLY the same thing as `MOV` except it can copy the value into indexing registers (and can do the addressing calculation)? – Jonathan Dickinson Oct 23 '12 at 07:12
  • 32
    @JonathanDickinson LEA is like a `MOV` with an indirect source, except it only does the indirection and not the `MOV`. It doesn't actually *read from* the computed address, just computes it. – hobbs Aug 28 '13 at 02:57
  • 26
    Erik, your comment is not accurate. MOV eax, [ebx+8*ecx+4] is valid. However, MOV returns the contents of that memory location, whereas LEA returns the address – Olorin Apr 23 '15 at 15:40
  • And there is even more general segment version of form `ds:(bx)` that specifies the segment: http://stackoverflow.com/questions/18736663/what-does-the-colon-mean-in-x86-assembly-gas-syntax-as-in-dsbx – Ciro Santilli新疆棉花TRUMP BAN BAD May 10 '15 at 10:06
  • @CiroSantilli巴拿馬文件六四事件法轮功 but that doesn't affect result of `lea`. – Ruslan May 06 '16 at 11:20
  • 11
    I still don't understand why the brackets are required. `LEA` doesn't do anything with whatever is stored at EBX + 8*EAX + 4; it just "loads the address" of whatever is stored there.. which is like doing `ptr2 = &(*ptr1)`. Too convoluted for what's basically just another arithmetic operation. – Martin Oct 03 '16 at 18:59
  • 6
    @Martin The syntax used in LEA has to use the parentheses, because LEA is built on addressing mode syntax. given `MOV dest, whatever`, we can replace `MOV` with `LEA` to make `LEA dest, whatever`, without changing the syntax of `whatever`. This is important because `whatever` has a syntax which denotes some kind of addressing mode. We want the address which pops out of that addressing mode, rather than the addressed object. **Not all addressing modes have brackets!** For instance, `MOV EAX, globalvar` goes to `LEA EAX, globalvar`. – Kaz Apr 26 '17 at 18:53
  • In that case, I'd argue that `LEA` should have its own, separate syntax since the brackets don't add any real value. – Martin Apr 26 '17 at 19:03
  • 4
    @Martin The missing piece of the puzzle is that LEA applies to any addressing mode, including ones that don't have a syntax with brackets. It's orthogonal to that syntax and I believe it's that way at the instruction encoding level also: the "mod/rm" bits in LEA are like those in MOV and other instructions. We could invent a syntax like this `MOV dest, &source` where the `&` converts it to a `LEA` (and is available only in `MOV`, not in `ADD` or whatever). A half page of Unix scripting can provide this syntax as a preprocessor. :) – Kaz Apr 26 '17 at 19:46
  • Shouldn't the syntax of mov be `mov src dest`? – Emma He Jul 12 '17 at 00:29
  • @EmmaHe It depends on which assembler you're using. If the assembler uses the Intel syntax (for example MASM, NASM, FASM) the equivalent is `mov dest src`, but if the assembler follows the AT&T syntax (such as GAS) it would be `mov src dest`. – arthropod Apr 12 '18 at 01:26
  • 1
    Changing the **assembly syntax** to `MOV EDX, EBX + 8*EAX + 4` (without the brackets) would not change a thing. This would just be a different spelling for exactly the same opcode and operands. A conventional disassembler would look at the instruction that was assembled from this syntax and print it as `LEA EDX, [EBX + 8*EAX + 4]`. However, the syntax `MOV EDX, EBX + 8*EAX + 4` looks silly because it's doing arithmetic but is being called "move". `LEA` is good the way it is: perform just the address calculation of a MOV and then yield that as the result. – Kaz Jun 07 '19 at 00:35
  • from http://www.cs.virginia.edu/~evans/cs216/guides/x86.html: mov syntax mov , or mov ,; so EBX + 8*EAX + 4 is not a const but a mem. if it's a memory, it must use bracket and `mov` will move the data out instead of the address. – Izana May 22 '20 at 23:52
  • Wouldn't it have to be `mov edx, [ebx + 8*eax]` and then `mov edx, [edx + 4]` ? – Ed_ Jul 01 '20 at 06:22
590

From the "Zen of Assembly" by Abrash:

LEA, the only instruction that performs memory addressing calculations but doesn't actually address memory. LEA accepts a standard memory addressing operand, but does nothing more than store the calculated memory offset in the specified register, which may be any general purpose register.

What does that give us? Two things that ADD doesn't provide:

  1. the ability to perform addition with either two or three operands, and
  2. the ability to store the result in any register; not just one of the source operands.

And LEA does not alter the flags.

Examples

  • LEA EAX, [ EAX + EBX + 1234567 ] calculates EAX + EBX + 1234567 (that's three operands).
  • LEA EAX, [ EBX + ECX ] calculates EBX + ECX without overwriting either with the result.
  • LEA EAX, [ EBX + N * EBX ] multiplies by a constant (two, three, five or nine); N can be 1, 2, 4 or 8.

Another use case is in loops: the difference between LEA EAX, [ EAX + 1 ] and INC EAX is that the latter changes EFLAGS but the former does not, so the result of an earlier CMP is preserved.

Community
Frank Krueger
  • 43
    @AbidRahmanK some examples: `LEA EAX, [ EAX + EBX + 1234567 ]` calculates the sum of `EAX`, `EBX` and `1234567` (that's three operands). `LEA EAX, [ EBX + ECX ]` calculates `EBX + ECX` _without_ overriding either with the result. The third thing `LEA` is used for (not listed by Frank) is _multiplication by constant_ (by two, three, five or nine), if you use it like `LEA EAX, [ EBX + N * EBX ]` (`N` can be 1,2,4,8). Other usecase is handy in loops: the difference between `LEA EAX, [ EAX + 1 ]` and `INC EAX` is that the latter changes `EFLAGS` but the former does not; this preserves `CMP` state – FrankH. Aug 22 '13 at 10:01
  • @FrankH. I still don't understand, so it loads a pointer onto somewhere else? –  Oct 27 '13 at 15:04
  • 6
    @ripDaddy69 yes, sort of - if by "load" you mean "performs the address calculation / pointer arithmetics". It does _not access memory_ (i.e. not "dereference" the pointer as it'd be called in C programming terms). – FrankH. Oct 29 '13 at 09:04
  • 2
    +1: This makes explicit what kinds of 'tricks' `LEA` can be used for... (see "LEA (load effective address) is often used as a "trick" to do certain computations" in IJ Kennedy's popular answer above) – Assad Ebrahim Mar 31 '14 at 00:45
  • 6
    There's a big difference between 2 operand LEA which is fast and 3 operand LEA which is slow. The Intel Optimization manual says fast path LEA is single cycle and slow path LEA takes three cycles. Moreover, on Skylake there are two fast path functional units (ports 1 and 5) and there's only one slow path functional unit (port 1). Assembly/Compiler Coding Rule 33 in the manual even warns against using 3 operand LEA. – Olsonist Apr 01 '19 at 15:46
  • @Olsonist Interesting note. – St.Antario Apr 28 '19 at 07:07
  • 1
    I felt a lack of exact numbers for this example, so here they are. Let's say EBX=5, ECX=3. Then after `LEA EAX, [EBX + ECX]` EAX will contain 8. And after `LEA EAX, [EBX + ECX + 2]` EAX will contain 10. – Pavel Sapehin Sep 19 '19 at 04:53
  • " (N can be 1,2,4,8)." As documented in Intel® 64 and IA-32 Architectures Software Developer’s Manual Volume 1 3.7.5 Specifying an Offset – Roland May 10 '20 at 16:19
119

Another important feature of the LEA instruction is that it does not alter the condition codes such as CF and ZF, while computing the address by arithmetic instructions like ADD or MUL does. This feature decreases the level of dependency among instructions and thus makes room for further optimization by the compiler or hardware scheduler.

user157251
Angus Lee
  • 2
    Yes, `lea` is sometimes useful for the compiler (or human coder) to do math without clobbering a flag result. But `lea` isn't faster than `add`. Most x86 instructions write flags. High-performance x86 implementations have to rename EFLAGS or otherwise avoid [the write-after-write hazard](https://en.wikipedia.org/wiki/Hazard_(computer_architecture)#Data_hazards) for normal code to run fast, so instructions that avoid flag writes aren't better because of that. (*partial* flag stuff can create issues, see [INC instruction vs ADD 1: Does it matter?](//stackoverflow.com/q/36510095)) – Peter Cordes Apr 16 '18 at 02:58
  • 2
    @PeterCordes : Hate to bring this up here but - am I alone in thinking this new [x86-lea] tag is redundant and unnecessary? – Michael Petch Apr 16 '18 at 03:43
  • 2
    @MichaelPetch: Yeah, I think it's too specific. It seems to confuse beginners who don't understand machine language and that everything (including pointers) is just bits / bytes / integers, so there are lots of questions about it with huge numbers of votes. But having a tag for it implies that there's room for an open-ended number of future questions, when in fact there are about 2 or 3 total that aren't just duplicates. (what is it? How to use it for multiplying integers? and how it runs internally on AGUs vs. ALUs and with what latency / throughput. And maybe its "intended" purpose) – Peter Cordes Apr 16 '18 at 05:14
  • @PeterCordes : I agree, and if anything all these posts being edited are pretty much a duplicate of a few of the existing LEA related questions. Rather than a tag, any duplicates should be identified and marked imho. – Michael Petch Apr 16 '18 at 05:16
  • 1
    @EvanCarroll: hang on tagging all the LEA questions, if you haven't already finished. As discussed above, we think [tag:x86-lea] too specific for a tag, and there's not a lot of scope for future non-duplicate questions. I think it would be a lot of work to *actually* choose a "best" Q&A as a dup target for most of them, though, or to actually decide which ones to get mods to merge. – Peter Cordes Apr 16 '18 at 05:19
99

Despite all the explanations, LEA is an arithmetic operation:

LEA Rt, [Rs1+a*Rs2+b] =>  Rt = Rs1 + a*Rs2 + b

It's just that its name is extremely stupid for a shift+add operation. The reason for that was already explained in the top-rated answers (i.e. it was designed to directly map high-level memory references).

user157251
hdante
  • 8
    And that the arithmetic is performed by the address-calculation hardware. – Ben Voigt Jul 12 '13 at 17:37
  • 31
    @BenVoigt I used to say that, because I'm an old bloke :-) Traditionally, x86 CPUs did use the addressing units for this, agreed. But the "separation" has become very blurry these days. Some CPUs no longer have _dedicated_ AGUs at all, others have chosen not to execute `LEA` on the AGUs but on the ordinary integer ALUs. One has to read the CPU specs very closely these days to find out "where stuff runs" ... – FrankH. Aug 22 '13 at 10:06
  • 2
    @FrankH.: out-of-order CPUs typically run LEA on ALUs, while some in-order CPUs (like Atom) sometimes run it on an AGUs (because they can't be busy handling a memory access). – Peter Cordes Dec 03 '15 at 17:03
  • 4
    No, the name is not stupid. `LEA` gives you the address which arises from any memory-related addressing mode. It is not a shift and add operation. – Kaz Apr 26 '17 at 18:54
  • 3
    FWIW there are very few (if any) current x86 CPUs that perform the operation on the AGU. Most or all just use an ALU like any other arithmetic op. – BeeOnRope Apr 26 '17 at 22:06
  • 1
    It's very simple to understand the problem if you consider that an instruction set architecture is created to isolate the architecture details from the upper layers. Embedding the fact that LEA is executed in an AGU is clearly irrelevant to the ISA and produces a wrong interpretation of the architecture and doubts about the instruction itself. – hdante Apr 27 '17 at 14:13
  • 1
    @Kaz: The only time `lea` isn't a shift-and-add instruction is x86-64 RIP-relative addressing modes. You can also use it as a move (immediate or register), but that's just a degenerate case of shifting by zero or adding nothing. I agree that the name is sensible, but it really is just a shift-and-add instruction that takes advantage of the memory-operand machine-code encoding. (Or it was when it was named, before x86-64 existed.) All x86 addressing modes (except RIP relative) are subsets of `[base + index*scale + displacement]`: i.e. shift and add. – Peter Cordes Jul 27 '17 at 03:49
  • @PeterCordes I suspect that in 386, at least, and its predecessors, it used the same logic as address mode calculation, subsequently retaining the address as the output, rather than doing a memory cycle to load or store through the calculated address. Most of those calculations are shift-and-add business, sure. – Kaz Jul 27 '17 at 03:55
  • 2
    @Kaz: Yeah, it's a useful instruction and was cheap to implement in early CPUs by exposing the address-generation hardware. Some even still work that way, like K8/K10 for complex-LEA. (As [Andy Glew explains](https://stackoverflow.com/questions/791798/what-is-the-eu-in-x86-architecture-calculates-effective-address), some CPUs have ALUs that can only handle 2 inputs, so some LEAs have to run on an AGU). But *the reason it's useful is that it's a shift+add instruction, and that it doesn't destroy any of its input registers*. The execution details only matter when tuning for specific uarches. – Peter Cordes Jul 27 '17 at 04:12
  • My main point is that you should think of it as a general-purpose shift+add instruction so you aren't confused by seeing it used for non-pointers, and so you can spot use-cases where it saves instructions. It's fast in all CPUs, but with weird latency quirks in in-order Atom where it does run on the AGUs. (Inputs need to be ready sooner than for ALU uops). (And yes, [RossRidge says 8086 ran it on the AGU](https://stackoverflow.com/a/29203242/224132), although I seem to recall someone else saying that 8086 didn't have dedicated AGUs, and did address calcs in the ALU... Anyway, same diff) – Peter Cordes Jul 27 '17 at 04:16
  • @PeterCordes It's not just that LEA is likely implemented with the same addressing logic, but more importantly, the programmer view **through the assembly language** is that the same operand **syntax** is used that is used for expressing addressing modes. – Kaz Jul 28 '17 at 03:28
  • in x86-16 you don't even have a shift and have a very limited number of register choices – phuclv Aug 08 '18 at 15:19
83

One more thing about the LEA instruction: you can also use it to quickly multiply a register by 3, 5 or 9.

LEA EAX, [EAX * 2 + EAX]   ;EAX = EAX * 3
LEA EAX, [EAX * 4 + EAX]   ;EAX = EAX * 5
LEA EAX, [EAX * 8 + EAX]   ;EAX = EAX * 9
Koray Tugay
GJ.
  • 13
    +1 for the trick. But I would like to ask a question (may be stupid), why not directly multiply with three like this `LEA EAX, [EAX*3]` ? – Abid Rahman K Apr 06 '13 at 06:57
  • 14
    @Abid Rahman K: There is no such instruction in the x86 CPU instruction set. – GJ. Apr 06 '13 at 17:23
  • 58
    @AbidRahmanK although the Intel asm syntax makes it look like a multiplication, the lea instruction can encode only shift operations. The opcode has 2 bits to describe the shift, hence you can multiply only by 1, 2, 4 or 8. – mkm Aug 05 '13 at 13:03
  • can you also use it for multiplying registers by 2,4 or 8? – Koray Tugay Jan 15 '15 at 19:42
  • 2
    @Koray Tugay: Yes you can, like: `lea eax, [eax * 8]` – GJ. Jan 15 '15 at 20:12
  • 1
    @GJ. Why did you give examples for 3 5 and 9 in your answer? – Koray Tugay Jan 15 '15 at 20:21
  • 6
    @Koray Tugay: You can use a shift left, like the `shl` instruction, for multiplying registers by 2, 4, 8, 16... it is faster and shorter. But for multiplying by numbers that are not a power of 2 we normally use the `mul` instruction, which is more pretentious and slower. – GJ. Jan 15 '15 at 20:45
  • 1
    maybe `lea` will be used when we need to preserve the flags, like in some cases that `lea` is used instead of `add` – phuclv Feb 13 '15 at 17:29
  • 9
    @GJ. although there's no such encoding, some assemblers accept this as a shortcut, e.g. fasm. So e.g. `lea eax,[eax*3]` would translate to equivalent of `lea eax,[eax+eax*2]`. – Ruslan May 06 '16 at 11:24
  • 1
    @KorayTugay: It's sometimes worth using `lea edx, [eax*8]` to copy+shift, even though with no base register, the addressing mode needs a disp32 (4 bytes of zeros in this case). So that LEA takes 7 bytes to encode: opcode + ModR/M + SIB(indexed addressing mod) + disp32=0. mov + shl is actually smaller, but is 2 uops. With a base register, for `[eax + eax]`, `[eax + eax*2]`, or whatever, no displacement is needed. – Peter Cordes Mar 30 '18 at 00:51
  • 1
    @GJ.: normally you use `imul ecx, ebx, 12345`. You only use `mul` if you want the high-half result of a full multiply. It can be worth using 2 `lea` instructions to replace a multiply, but on modern CPUs not 3. Intel CPUs have 1c latency LEA with 1 or 2 components, including base + scaled-index, so you can do `lea eax, [eax+eax*4]` / `lea eax, [edx + eax*2]` as [part of a `tot = tot*10 + digit` loop for `atoi`, for example](https://stackoverflow.com/questions/19309749/nasm-assembly-convert-input-to-integer/49548057#49548057). That's 2 cycle latency vs. 4 for `imul eax,10` / `add eax,edx` – Peter Cordes Mar 30 '18 at 00:58
65

lea is an abbreviation of "load effective address". It loads the address of the location referenced by the source operand into the destination operand. For instance, you could use:

lea ebx, [ebx+eax*8]

to advance the pointer in ebx by eax elements (in an array of 64-bit elements) with a single instruction. Basically, you benefit from the complex addressing modes supported by the x86 architecture to manipulate pointers efficiently.

mmx
28

The biggest reason to use LEA over a MOV is that you need to perform arithmetic on the registers you are using to calculate the address. You can perform what amounts to pointer arithmetic on several registers in combination, effectively for "free."

What's really confusing about it is that you typically write an LEA just like a MOV but you aren't actually dereferencing the memory. In other words:

MOV EAX, [ESP+4]

This will move the content of what ESP+4 points to into EAX.

LEA EAX, [EBX*8]

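This will move the effective address EBX * 8 into EAX, not what is found in that location. As you can see, the addressing mode can also scale a register by 2, 4 or 8, so LEA can multiply as well as add; a MOV with the same operand would perform the same calculation but then read memory at that address instead of returning it.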

David Hoelzer
  • Sorry everyone. @big.heart fooled me by giving an answer to this three hours ago, getting it to show up as "new" in my Assembly question scouring. – David Hoelzer May 06 '15 at 01:01
  • 2
    Why does the syntax use brackets when it does not do memory addressing? – golopot Mar 18 '17 at 22:28
  • 3
    @q4w56 This is one of those things where the answer is, "That's just how you do it." I believe it's one of the reasons that people have such a hard time figuring out what `LEA` does. – David Hoelzer Mar 19 '17 at 07:32
  • 3
    @q4w56: it's a shift+add instruction that uses memory operand syntax *and* machine-code encoding. On some CPUs it may even use the AGU hardware, but that's a historical detail. The still-relevant fact is that the decoder hardware already exists for decoding this kind of shift+add, and LEA lets us use it for arithmetic instead of memory addressing. (Or for address calculations if one input actually is a pointer). – Peter Cordes Sep 30 '17 at 00:44
23

The 8086 has a large family of instructions that accept a register operand and an effective address, perform some computations to compute the offset part of that effective address, and perform some operation involving the register and the memory referred to by the computed address. It was fairly simple to have one of the instructions in that family behave as above except for skipping that actual memory operation. Thus, the instructions:

mov ax,[bx+si+5]
lea ax,[bx+si+5]

were implemented almost identically internally. The difference is a skipped step. Both instructions work something like:

temp = fetched immediate operand (5)
temp += bx
temp += si
address_out = temp  (skipped for LEA)
trigger 16-bit read  (skipped for LEA)
temp = data_in  (skipped for LEA)
ax = temp

As for why Intel thought this instruction was worth including, I'm not exactly sure, but the fact that it was cheap to implement would have been a big factor. Another factor would have been the fact that Intel's assembler allowed symbols to be defined relative to the BP register. If fnord was defined as a BP-relative symbol (e.g. BP+8), one could say:

mov ax,fnord  ; Equivalent to "mov ax,[BP+8]"

If one wanted to use something like stosw to store data to a BP-relative address, being able to say

mov ax,0 ; Data to store
mov cx,16 ; Number of words
lea di,fnord
rep stos fnord  ; Address is ignored EXCEPT to note that it's an SS-relative word ptr

was more convenient than:

mov ax,0 ; Data to store
mov cx,16 ; Number of words
mov di,bp
add di,offset fnord ; i.e. 8
rep stos fnord  ; Address is ignored EXCEPT to note that it's an SS-relative word ptr

Note that forgetting the word "offset" would cause the contents of location [BP+8], rather than the value 8, to be added to DI. Oops.

1201ProgramAlarm
supercat
12

As the existing answers mention, LEA performs address arithmetic without accessing memory and can save the result to a register other than its inputs, unlike the simple two-operand form of the add instruction. The real underlying performance benefit is that modern processors have a separate LEA ALU and port for effective address generation (for LEA and for other memory-reference addresses), which means the arithmetic in an LEA and other ordinary ALU arithmetic can be done in parallel in one core.

Check this article of Haswell architecture for some details about LEA unit: http://www.realworldtech.com/haswell-cpu/4/

Another important point not mentioned in other answers is that the LEA REG, [MemoryAddress] instruction is PIC (position-independent code): the instruction encodes MemoryAddress as a PC-relative address. This is different from MOV REG, MemoryAddress, which encodes a relative virtual address and requires relocation/patching on modern operating systems (where ASLR is a common feature). So LEA can be used to convert such non-PIC code to PIC.

Thomson
  • 3
    The "separate LEA ALU" part is mostly untrue. Modern CPUs execute `lea` on one or more of the same ALUs that execute other arithmetic instructions (but generally fewer of them than other arithmetic). For instance, the Haswell CPU mentioned can execute `add` or `sub` or most other basic arithmetic operations on _four different_ ALUs, but can only execute `lea` on one (complex `lea`) or two (simple `lea`). More importantly, those two `lea`-capable ALUs are simply two of the four that can execute other instructions, so there is no parallelism benefit as claimed. – BeeOnRope Apr 26 '17 at 22:04
  • 1
    The article you linked (correctly) shows that LEA is on the same port as an integer ALU (add/sub/boolean), and the integer MUL unit in Haswell. (And vector ALUs including FP ADD/MUL/FMA). The simple-only LEA unit is on port 5, which also runs ADD/SUB/whatever, and vector shuffles, and other stuff. The only reason I'm not downvoting is that you point out the use of RIP-relative LEA (for x86-64 only). – Peter Cordes Sep 30 '17 at 00:50
12

The LEA (Load Effective Address) instruction is a way of obtaining the address which arises from any of the Intel processor's memory addressing modes.

That is to say, if we have a data move like this:

MOV EAX, <MEM-OPERAND>

it moves the contents of the designated memory location into the target register.

If we replace the MOV by LEA, then the address of the memory location is calculated in exactly the same way by the <MEM-OPERAND> addressing expression. But instead of the contents of the memory location, we get the location itself into the destination.

LEA is not a specific arithmetic instruction; it is a way of intercepting the effective address arising from any one of the processor's memory addressing modes.

For instance, we can use LEA on just a simple direct address. No arithmetic is involved at all:

MOV EAX, GLOBALVAR   ; fetch the value of GLOBALVAR into EAX
LEA EAX, GLOBALVAR   ; fetch the address of GLOBALVAR into EAX.

This is valid; we can test it at the Linux prompt:

$ as
LEA 0, %eax
$ objdump -d a.out

a.out:     file format elf64-x86-64

Disassembly of section .text:

0000000000000000 <.text>:
   0:   8d 04 25 00 00 00 00    lea    0x0,%eax

Here, there is no addition of a scaled value, and no offset. Zero is moved into EAX. We could do that using MOV with an immediate operand also.

This is the reason why people who think that the brackets in LEA are superfluous are severely mistaken; the brackets are not LEA syntax but are part of the addressing mode.

LEA is real at the hardware level. The generated instruction encodes the actual addressing mode and the processor carries it out to the point of calculating the address. Then it moves that address to the destination instead of generating a memory reference. (Since the address calculation of an addressing mode in any other instruction has no effect on CPU flags, LEA has no effect on CPU flags.)

Contrast with loading the value from address zero:

$ as
movl 0, %eax
$ objdump -d a.out | grep mov
   0:   8b 04 25 00 00 00 00    mov    0x0,%eax

It's a very similar encoding, see? Just the 8d of LEA has changed to 8b.

Of course, this LEA encoding is longer than moving an immediate zero into EAX:

$ as
movl $0, %eax
$ objdump -d a.out | grep mov
   0:   b8 00 00 00 00          mov    $0x0,%eax

There is no reason for LEA to exclude this possibility though just because there is a shorter alternative; it's just combining in an orthogonal way with the available addressing modes.

codeforester
Kaz
8

The LEA instruction can be used to avoid time-consuming calculations of effective addresses by the CPU. If an address is used repeatedly, it is more efficient to store it in a register than to calculate the effective address every time it is used.

red-E
  • Not necessarily on modern x86. Most of the addressing modes have the same cost, with some caveats. So `[esi]` is rarely cheaper than say `[esi + 4200]` and is only rarely cheaper than `[esi + ecx*8 + 4200]`. – BeeOnRope Jun 26 '16 at 05:29
  • @BeeOnRope `[esi]` isn't cheaper than `[esi + ecx*8 + 4200]`. But why bother comparing? They are not equivalent. If you want the former to designate the same memory location as the latter, you need additional instructions: you have to add to `esi` the value of `ecx` multiplied by 8. Uh oh, multiplication is going to clobber your CPU flags! Then you have to add the 4200. These additional instructions add to the code size (taking up space in the instruction cache, cycles to fetch). – Kaz Apr 26 '17 at 19:28
  • 2
    @Kaz - I think you were missing my point (or else I missed the point of the OP). My understanding is that the OP is saying that if you are going to use something like `[esi + 4200]` repeatedly in a sequence of instructions, then it is better to first load the effective address into a register and use that. For example, rather than writing `add eax, [esi + 4200]; add ebx, [esi + 4200]; add ecx, [esi + 4200]`, you should prefer `lea edi, [esi + 4200]; add eax, [edi]; add ebx, [edi]; add ecx, [edi]`, which is rarely faster. At least that's the plain interpretation of this answer. – BeeOnRope Apr 26 '17 at 21:06
  • So the reason I was comparing `[esi]` and `[esi + 4200]` (or `[esi + ecx*8 + 4200]` is that this is the simplification the OP is proposing (as I understand it): that N instructions with the same complex address are transformed into N instructions with simple (one reg) addressing, plus one `lea`, since complex addressing is "time consuming". In fact, it is slower even on modern x86, but only latency-wise which seems unlikely to matter for consecutive instructions with the same address. – BeeOnRope Apr 26 '17 at 21:09
  • @BeeOnRope I see; we do this LEA up front to simplify the subsequent addresses (eliminate repetition of the same effective address). That doesn't, by itself, buy anything. Still, one thing we did achieve was that `ecx` is no longer required as an operand once we have the address in `esi`. If the register can be put to use, that could help in some way. – Kaz Apr 26 '17 at 21:51
  • 1
    Perhaps you relieve some register pressure, yes - but the opposite may be the case: if the registers you generated the effective address with are live, you need _another_ register to save the result of `lea` so it increases pressure in that case. In general, storing intermediates is a cause of register pressure, not a solution to it - but I think in most situations it is a wash. @Kaz – BeeOnRope Apr 26 '17 at 22:00
8

Many of the answers here are already complete, but I'd like to add one more code example showing how the lea and mov instructions behave differently when given the same operand expression.

To make a long story short, both lea and mov can take a source operand enclosed in parentheses. The expression inside the () is calculated in the same way for both; however, the two instructions interpret the calculated value differently.

Whether the expression is used with lea or mov, the src value is calculated as below.

D ( Rb, Ri, S ) => (Reg[Rb]+S*Reg[Ri]+ D)

However, when it is used with the mov instruction, the CPU treats the calculated value as an address, reads the value stored at that address, and stores it in the destination.

In contrast, when the lea instruction is executed with the above expression, it loads the calculated value itself into the destination, without accessing memory.

The code below executes the lea and mov instructions with the same operands. To make the difference visible, I added a user-level signal handler that catches the segmentation fault caused when mov accesses an invalid address.

Example code

#define _GNU_SOURCE 1  /* To pick up REG_RIP */
#include <stdio.h> 
#include <string.h>
#include <stdlib.h>
#include <stdint.h>
#include <signal.h>


uint32_t
register_handler (uint32_t event, void (*handler)(int, siginfo_t*, void*))
{
        uint32_t ret = 0;
        struct sigaction act;

        memset(&act, 0, sizeof(act));
        act.sa_sigaction = handler;
        act.sa_flags = SA_SIGINFO;
        ret = sigaction(event, &act, NULL);
        return ret;
}

void
segfault_handler (int signum, siginfo_t *info, void *priv)
{
        ucontext_t *context = (ucontext_t *)(priv);
        uint64_t rip = (uint64_t)(context->uc_mcontext.gregs[REG_RIP]);
        uint64_t faulty_addr = (uint64_t)(info->si_addr);

        printf("inst at 0x%lx tries to access memory at %ld, but failed\n",
                rip,faulty_addr);
        exit(1);
}

int
main(void)
{
        int result_of_lea = 0;

        register_handler(SIGSEGV, segfault_handler);

        //initialize registers %eax = 1, %ebx = 2

        // the compiler will emit something like
           // mov $1, %eax
           // mov $2, %ebx
        // because of the input operands
        asm("lea 4(%%rbx, %%rax, 8), %%edx \t\n"
            :"=d" (result_of_lea)   // output in EDX
            : "a"(1), "b"(2)        // inputs in EAX and EBX
            : // no clobbers
         );

        //lea 4(rbx, rax, 8),%edx == lea (rbx + 8*rax + 4),%edx == lea(14),%edx
        printf("Result of lea instruction: %d\n", result_of_lea);

        asm volatile ("mov 4(%%rbx, %%rax, 8), %%edx"
                       :
                       : "a"(1), "b"(2)
                       : "edx"  // if it didn't segfault, it would write EDX
          );
}

Execution result

Result of lea instruction: 14
inst at 0x4007b5 tries to access memory at 14, but failed
Peter Cordes
ruach
  • 1
    Breaking up your inline asm into separate statements is unsafe, and your clobbers lists are incomplete. The basic-asm block tells the compiler it has no clobbers, but it actually modifies several registers. Also, you can use `=d` to tell the compiler the result is in EDX, saving a `mov`. You also left out an early-clobber declaration on the output. This does demonstrate what you're trying to demonstrate, but is also a misleading bad example of inline asm that will break if used in other contexts. That's a Bad Thing for a Stack Overflow answer. – Peter Cordes Jan 03 '19 at 08:49
  • If you don't want to write `%%` on all those register names in Extended asm, then use input constraints. like `asm("lea 4(%%ebx, %%eax, 8), %%edx" : "=d"(result_of_lea) : "a"(1), "b"(2));`. Letting the compiler init registers means you don't have to declare clobbers, either. You're overcomplicating things by xor-zeroing before mov-immediate overwrites the whole register, too. – Peter Cordes Jan 03 '19 at 08:55
  • @PeterCordes Thanks, Peter, do you want me to delete this answer or modify it following your comments? – ruach Jan 03 '19 at 08:57
  • If you did want to leave in the mov instructions instead of telling people to single-step in GDB to see the compiler generated insns, `asm("mov ... %%eax\n" "mov ... %%ebx\n" "lea 4(%%ebx, %%eax, 8), %%edx" : "=d"(result_of_lea) :: "eax", "ebx");`. It doesn't need to be `volatile` because we print the result; it won't be optimized away. (The `mov` does, because you didn't write any code to use the `mov` result.) – Peter Cordes Jan 03 '19 at 08:57
  • 1
    If you fix the inline asm, it's not doing any harm and maybe makes a good concrete example for beginners that didn't understand the other answers. No need to delete, and it's an easy fix like I showed in my last comment. I think it would be worth an upvote if the bad example of inline asm was fixed into a "good" example. (I didn't downvote) – Peter Cordes Jan 03 '19 at 08:59
  • @PeterCordes I tried to modify the code following your comments; could you please go through it once again to check if it is still wrong? I appreciate your help. – ruach Jan 03 '19 at 09:16
  • Yup, that looks right. You might add a clobber or dummy output for the `mov`, though. Fixed that for you and added comments to the asm statements. – Peter Cordes Jan 03 '19 at 09:55
  • Thanks :D I also learned that input operand can be used to let the compiler initialize some registers. – ruach Jan 03 '19 at 12:38
  • BTW, it's strange to use 32-bit addressing modes in x86-64 code. Your signal handler uses `REG_RIP` and other 64-bit-specific stuff, so it would have been more normal to use `lea 4(%%rbx, %%rax, 8), %%edx`, so the machine code doesn't need an address-size override prefix or a REX prefix. – Peter Cordes Jan 03 '19 at 12:50
  • Oh yes, you are right, I should have made the two pieces of code match. However, I used 32-bit assembly intentionally because some other answers use the same code to wrongly say that mov cannot be used with the 4(ebx, eax, 8) format; but I agree the mismatched code may confuse others. If I modify the code as you recommended, should I initialize rax & rbx rather than eax & ebx? – ruach Jan 03 '19 at 13:11
  • 1
    Where does anyone say that `mov 4(%ebx, %eax, 8), %edx` is invalid? Anyway, yes, for `mov` it would make sense to write `"a"(1ULL)` to tell the compiler you have a 64-bit value, and thus it needs to make sure it's extended to fill the whole register. In practice it will still use `mov $1, %eax`, because writing EAX zero-extends into RAX, unless you have a weird situation of surrounding code where the compiler knew that RAX = `0xff00000001` or something. For `lea`, you're still using 32-bit operand-size, so any stray high bits in input registers have no effect on the 32-bit result. – Peter Cordes Jan 03 '19 at 13:22
  • See [Which 2's complement integer operations can be used without zeroing high bits in the inputs, if only the low part of the result is wanted?](https://stackoverflow.com/q/34377711) for LEA register choices, and [Why do x86-64 instructions on 32-bit registers zero the upper part of the full 64-bit register?](https://stackoverflow.com/q/11177137) for writing to 32-bit registers. – Peter Cordes Jan 03 '19 at 13:25
  • Turns out `"a"(1)` does always result in the full RAX = 1, not just the low 32 bits (EAX). This appears to be an implementation detail, though, because the effect I was talking about (not zero or sign-extending to the full register for `int` inputs) does happen for variables that aren't known at compile time. https://godbolt.org/z/vocejs shows gcc and clang both `mov` repeatedly for constants but not runtime variables. But gcc does reuse the EAX=1 if you repeat the same input. (clang doesn't!) – Peter Cordes Jan 03 '19 at 13:46
  • @PeterCordes Thanks for the useful references and the code. I am still a bit confused about the part where you mentioned variables that aren't known at compile time. Does it mean that the second asm statement in the function foo is translated into mov rax? And for the above code in my answer, it seems that using 64-bit input operands and a 32-bit output operand is the most efficient form of the lea instruction in terms of binary size (1 byte less than with a 64-bit output). Is that correct? Thanks a lot! – ruach Jan 04 '19 at 01:12
  • Yes, the default address size in 64-bit mode is 64-bit, and the default operand-size is 32-bit. Like I explained in detail in the footnotes in [Which 2's complement integer operations can be used without zeroing high bits in the inputs, if only the low part of the result is wanted?](https://stackoverflow.com/q/34377711), those sizes avoid both REX and address-size prefixes. As for `mov rax`, no, neither statement writes `rax`. If you meant `mov rdx, [mem]`, no, that would be a 64-bit load. You could think of `mov` to a 32-bit register as being implicitly a `movzx qword, dword`, but that's – Peter Cordes Jan 04 '19 at 02:12
6

Here is an example.

#include <assert.h>

// compute parity of permutation from lexicographic index
int parity (int p)
{
  assert (p >= 0);
  int r = p, k = 1, d = 2;
  while (p >= k) {
    p /= d;
    d += (k << 2) + 6; // only one lea instruction
    k += 2;
    r ^= p;
  }
  return r & 1;
}

With -O (optimize) as a compiler option, gcc will use a single lea instruction for the indicated line of code.

user3634373
4

LEA is just an "arithmetic" instruction.

MOV transfers data between operands, but LEA just calculates.

the accountant
  • LEA obviously moves data; it has a destination operand. LEA doesn't always calculate; it calculates if the effective address expressed in the source operand calculates. LEA EAX, GLOBALVAR doesn't calculate; it just moves the address of GLOBALVAR into EAX. – Kaz Apr 26 '17 at 19:26
  • @Kaz thanks for your feedback. My source was "LEA (load effective address) is essentially an arithmetic instruction—it doesn’t perform any actual memory access, but is commonly used for calculating addresses (though you can calculate general purpose integers with it)." from the [Eldad-Eilam book](https://www.amazon.com/Reversing-Secrets-Engineering-Eldad-Eilam/dp/0764574817), page 149 – the accountant Jul 13 '17 at 20:26
  • @Kaz: That's why LEA is redundant when the address is already a link-time constant; use `mov eax, offset GLOBALVAR` instead. You *can* use LEA, but it's slightly larger code-size than `mov r32, imm32` and runs on fewer ports, *because it still goes through the address-calculation process*. `lea reg, symbol` is only useful in 64-bit for a RIP-relative LEA, when you need PIC and/or addresses outside the low 32 bits. In 32 or 16-bit code, there is zero advantage. LEA is an arithmetic instruction that exposes the ability of the CPU to decode / compute addressing modes. – Peter Cordes Mar 29 '18 at 01:37
  • 1
    @Kaz: by the same argument, you could say that `imul eax, edx, 1` doesn't calculate: it just copies edx to eax. But actually it runs your data through the multiplier with 3 cycle latency. Or that `rorx eax, edx, 0` just copies (rotate by zero). – Peter Cordes Mar 29 '18 at 01:39
  • @PeterCordes My point is that both LEA EAX, GLOBALVAR and MOV EAX, GLOBALVAR just grab the address from an immediate operand. There is no multiplier of 1, or offset of 0 being applied; it could be that way at the hardware level but it's not seen in the assembly language or instruction set. – Kaz Mar 29 '18 at 14:00
  • @Kaz: At the machine-code level, the addressing mode does have to encode that there's no base. I admit that's not quite the same as an explicit `+0` or something. But there's still a ModR/M byte for LEA, to specify the `disp32` immediate displacement, while `MOV r32,imm32` is just opcode + imm32, if you're thinking about the machine-code instruction set rather than the asm-source level syntax. (And you're using MASM syntax, which omits the `[]` around symbols as memory operands. So `mov eax, OFFSET globalvar` is taking something simpler than `globalvar`: just its offset.) – Peter Cordes Mar 29 '18 at 14:22
3

All normal "calculating" instructions, such as addition, multiplication, and exclusive or, set status flags like zero and sign. If you use a complicated address, as in AX xor:= mem[0x333 + BX + 8*CX], the flags are set according to the xor operation.

Now you may want to use the address multiple times. Loading such an address into a register is never intended to set status flags, and luckily it doesn't. The phrase "load effective address" makes the programmer aware of that. That is where the weird expression comes from.

It is clear that once the processor is capable of using the complicated address to process its content, it is capable of calculating it for other purposes. Indeed it can be used to perform a transformation x <- 3*x+1 in one instruction. This is a general rule in assembly programming: Use the instructions however it rocks your boat. The only thing that counts is whether the particular transformation embodied by the instruction is useful for you.

Bottom line

MOV, X| T| AX'| R| BX|

and

LEA, AX'| [BX]

have the same effect on AX, and neither affects the status flags. (This is ciasdis notation.)

  • "This is a general rule in assembly programming: Use the instructions however it rocks your boat." I wouldn't personally hand that advice out, on account of things like `call lbl` `lbl: pop rax` technically "working" as a way to get the value of `rip`, but you'll make branch prediction very unhappy. Use the instructions however you want, but don't be surprised if you do something tricky and it has consequences you didn't foresee – The6P4C Dec 15 '19 at 07:23
  • @The6P4C That is a useful caveat. However if there is no alternative to making the branch prediction unhappy, one has to go for it. There is another general rule in assembly programming. There may be alternative ways to do something and you must choose wisely from alternatives. There are hundreds of ways to get the content of register BL into register AL. If the remainder of RAX need not be preserved LEA may be an option. Not affecting the flags may be a good idea on some of the thousands of types of x86 processors. Groetjes Albert – Albert van der Horst Jan 28 '20 at 15:21
0

Forgive me if someone already mentioned, but in the days of x86 when memory segmentation was still relevant, you may not get the same results from these two instructions:

LEA AX, DS:[0x1234]

and

LEA AX, CS:[0x1234]
tzoz
  • 2
    The "effective address" is just the "offset" part of a `seg:off` pair. LEA isn't affected by the segment base; both those instructions will (inefficiently) put `0x1234` into AX. x86 unfortunately doesn't have an easy way to calculate a full linear address (effective + segment base) into a register or register-pair. – Peter Cordes Jun 07 '20 at 03:26
  • 1
    @PeterCordes Very useful, thanks for correcting me. – tzoz Jun 08 '20 at 20:45