
IDM says a memory operand uses the SS segment if EBP is used as the base register. As a result, [ebp + esi] and [esi + ebp] reference the SS and DS segments, respectively. See NASM's doc: 3.3 Effective Addresses.

In the same section, the NASM manual describes how NASM generates shorter machine code by replacing [eax*2] with [eax+eax].

However, NASM does the same for [ebp*2] (which has no base register), generating [ebp + ebp].

I suspect [ebp+ebp] references SS segment, and [ebp*2] references DS segment.

I asked NASM this question. They think [ebp*2] and [ebp+ebp] are the same, but that doesn't make sense to me. Obviously, [ebp+ebp] (with ebp as the base register) references the SS segment. If they're the same, [ebp*2] must reference SS too. That would mean SS is referenced whenever ebp is the base or the index register, which in turn would mean that both [ebp + esi] and [esi + ebp] reference SS, so those two would also have to be the same.

Does anyone know which segment [ebp*2] uses?

zx485
wildpie
  • Notice that if you had supported your claim "IDM says" with a link the same way you did with the NASM docs, you would have figured this out yourself a long time ago. – Ben Voigt Apr 08 '18 at 19:52
  • **This NASM optimization (`[ebp*2]` -> `[ebp+ebp]`) assumes a flat memory model where `ss` and `ds` are equivalent**, which is the case under all the major mainstream x86 OSes. It's an interesting corner case because a pure `[idx*2]` addressing mode without a register or 32-bit absolute base is also very unusual (except for LEA to copy-and-shift). Normally people use real pointers instead of faking word-addressable memory by scaling them by 2, or whatever you're doing. – Peter Cordes Apr 08 '18 at 22:46
  • *I asked NASM this question.* Do you mean you asked the NASM *developers*? Or that you assembled code with NASM and/or disassembled with `ndisasm` to see what the program itself "thought"? Because the info you got was wrong: `[esi + ebp]` uses `ds`. And if you're assuming that `ss` and `ds` are interchangeable, you'd optimize `[ebp + esi]` to `[esi + ebp]` to avoid needing a disp8 = 0. (EBP as a base register is only encodable with a disp8 or disp32; the encoding that would mean EBP with no displacement actually means there's a disp32 with no base register, but potentially an index.) – Peter Cordes Apr 08 '18 at 22:55
  • This sounds like a great reason to prefer AT&T syntax. – R.. GitHub STOP HELPING ICE Apr 08 '18 at 23:04
  • @PeterCordes : He originally asked on the old (defunct) NASM forum that was on Sourceforge https://sourceforge.net/p/nasm/discussion/167169/thread/18e79c06/ . He had a problem getting email activated on nasm.us – Michael Petch Apr 08 '18 at 23:10
  • @BenVoigt Not exactly; see my comment to Sep Roland. – wildpie Apr 09 '18 at 00:12
  • @PeterCordes Thanks for your comments; very good point about the flat memory model that x86 OSes use. I was focusing on the correctness of the assembler. I'm writing a simple assembler, so this assumption doesn't apply to me, but it's a very good point. – wildpie Apr 09 '18 at 00:16
  • @PeterCordes One thing, though: NASM can be used to write code for any purpose, so it's not a good idea to assume the program it builds always runs under some OS. It could be used to write a boot loader that then enters protected mode. NASM should offer a mode without this assumption. I ran NASM again without the "-f elf32" option, which selects the bin format, and it still optimizes [ebp*2] into [ebp+ebp]. I'm not sure what the bin format means exactly, but at least it's not elf32, where NASM can reasonably assume the code will run under a Unix system. – wildpie Apr 09 '18 at 00:40
  • Indeed, any assumption of a flat memory model should be optional. This just explains why it was overlooked, since NASM does it even for `[symbol + ebp*2]`. `bin` is a flat binary, with no implications about what you might do with the resulting machine code. e.g. use it as a .COM executable, a boot sector, or embed it into something else. (The default mode for `bin` is `bits 16`, i.e. 16-bit mode.) – Peter Cordes Apr 09 '18 at 00:43
  • related: [How to force NASM to encode `[1 + rax*2]` as `disp32 + index*2` instead of `disp8 + base + index`?](https://stackoverflow.com/q/48848230/995714) – phuclv Apr 09 '18 at 02:03

2 Answers


Below figure 3-11, which deals with Offset = Base + (Index * Scale) + Displacement, the Intel manual tells us:

The uses of general-purpose registers as base or index components are restricted in the following manner:

  • The ESP register cannot be used as an index register.
  • When the ESP or EBP register is used as the base, the SS segment is the default segment. In all other cases, the DS segment is the default segment.

This means that NASM is wrong when it changes [ebp*2] into [ebp+ebp] (in order to avoid the 32-bit displacement).

[ebp*2] uses DS because ebp is not used as base
[ebp+ebp] uses SS because one of the ebp is used as base
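Here is a rough Python sketch of the rule quoted above. The function name `default_segment` is hypothetical and does not come from any library. Only the base register decides the default segment, so passing the index changes nothing.

```python
from typing import Optional


def default_segment(base: Optional[str], index: Optional[str] = None) -> str:
    """Default segment of a 32-bit memory operand, following the rule quoted above.

    Only the base register matters: ESP or EBP as the base implies SS; anything
    else, including having no base register at all, implies DS. `index` is
    accepted only to show that it never influences the result.
    """
    if base is not None and base.lower() in ("esp", "ebp"):
        return "SS"
    return "DS"


print(default_segment(None, "ebp"))   # [ebp*2]   -> DS (ebp is only the index)
print(default_segment("ebp", "ebp"))  # [ebp+ebp] -> SS (ebp is the base)
```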

It would then be best to tell NASM that you don't want this behaviour.
Until the NASM authors realize their mistake, you can disable this behaviour (where EBP is used as an index) by writing:

[NoSplit ebp*2]

Sep Roland
  • Thanks for confirming. This quotation is actually on the page right after the one I was reading (Table 3-5, Default Segment Selection Rules), which doesn't mention the "in all other cases" part. If only I had read one more page. – wildpie Apr 08 '18 at 21:44
  • @MichaelPetch Accepted. I browse Stack Overflow quite often, but this is my first question. – wildpie Apr 08 '18 at 22:13
  • @wildpie : no problem at all. I had noticed this was your first question, so it was just an FYI. Thanks for taking the time to consider accepting the answer. – Michael Petch Apr 08 '18 at 22:14
  • It would be more correct to say that you *need* to disable this behaviour, not that you "don't need" the optimization, when `ss` vs `ds` matters and you're using `ebp`. In any other case, it doesn't change behaviour (other registers still use `ds`, and `esp` can't be an index so `[esp*2]` isn't encodeable either way). – Peter Cordes Apr 08 '18 at 22:50

Indeed, NASM's optimization choices are inconsistent, assuming that ss and ds are interchangeable (i.e. a flat memory model) when splitting [ebp*2] into [ebp+ebp] to save 3 bytes (disp32 vs. disp8), but not optimizing [ebp + esi] into [esi + ebp] to avoid a disp8.

(And the NASM manual even mentions the different default segment, contradicting the conclusion you drew from the wrong info you got about [0 + ebp*2] vs. [0+ebp+ebp*1].)

EBP or ESP as a base register implies SS; otherwise the default is DS. When two registers are used in a NASM addressing mode, the first one is the base, unless you write [ebp*1 + esi], explicitly applying the scale factor to the first one. An index register never implies a segment, which makes sense if you think about the design intent: an index is relative to a segment:offset given by a base register or an absolute disp32.
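The base/index assignment described above can be sketched in Python. `nasm_base_index` and `segment_for` are hypothetical names for a simplified parser, not NASM's actual implementation. The sketch handles only `+`-separated 32-bit register terms and skips constant or symbol terms. It does not model NASM's own splitting of `[reg*2]`, so `[ebp*2]` is reported as written.

```python
import re
from typing import List, Optional, Tuple

REG32 = r"e[a-d]x|e[sd]i|e[sb]p"


def nasm_base_index(operand: str) -> Tuple[Optional[str], Optional[str], int]:
    """Return (base, index, scale) for a NASM-style 32-bit memory operand.

    Simplified sketch: only '+'-separated terms; constant/symbol terms are skipped.
    """
    if "-" in operand:
        raise ValueError("this sketch does not handle subtraction")
    base: Optional[str] = None
    index: Optional[str] = None
    scale = 1
    unscaled: List[str] = []  # registers written without '*n', in source order
    for term in operand.strip("[] ").split("+"):
        m = re.fullmatch(rf"({REG32})\s*(?:\*\s*([1248]))?", term.strip().lower())
        if m is None:
            continue  # displacement term: doesn't affect which register is the base
        if m.group(2) is not None:  # explicitly scaled register -> index
            if index is not None:
                raise ValueError(f"two scaled registers in {operand!r}")
            index, scale = m.group(1), int(m.group(2))
        else:
            unscaled.append(m.group(1))
    if len(unscaled) > 2 or (index is not None and len(unscaled) > 1):
        raise ValueError(f"too many registers in {operand!r}")
    if len(unscaled) == 2:
        base, index = unscaled  # first register written is the base
    elif len(unscaled) == 1:
        base = unscaled[0]
    return base, index, scale


def segment_for(operand: str) -> str:
    """Default segment: SS only when the base register is ESP or EBP."""
    base, _, _ = nasm_base_index(operand)
    return "SS" if base in ("esp", "ebp") else "DS"


for op in ("[ebp + esi]", "[esi + ebp]", "[ebp*1 + esi]", "[ebp*2]", "[ebp + ebp]"):
    print(f"{op:15} {segment_for(op)}")  # SS, DS, DS, DS, SS
```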

As written, [ebp*2] is an indexed addressing mode, implicitly requiring 4 bytes of zeros as a 32-bit displacement. You can get NASM to encode it that way with [nosplit ebp*2].

Perhaps NASM and YASM overlooked this corner case because flat memory models are nearly universal outside of 16-bit code. (And 16-bit addressing modes are different and don't support scale factors. You can, however, use 32-bit addressing modes in 16-bit code to take advantage of scale factors and the wider choice of registers, even in pure real mode rather than "unreal" mode, which lets you set segment limits high enough that offsets above 0xFFFF are usable.)

All mainstream 32- and 64-bit x86 OSes use a flat memory model, where SS and DS are interchangeable, making this optimization safe under those OSes when you aren't doing anything weird. Segmentation was sometimes used to make non-executable stacks before that was supported by page tables, but that's still a flat memory model. (64-bit code fixes the base/limit for CS/DS/ES/SS, so this optimization is always safe there unless SS is an unusable segment entirely, like maybe write-protected if that's possible.)

Still, any assumption of a flat memory model should be optional. This is a bug in NASM and YASM. They should either respect the difference between SS and DS, or should take full advantage of a flat memory model to help out programmers who don't remember which addressing modes have "hidden" extra bytes required, like optimizing [ebp+esi] with no displacement into [esi+ebp]. Preferably there should be an option or directive to tell the assembler that it can assume SS and DS are the same.

Operands to LEA can always take advantage, because LEA only deals with the offset part of the address, so segments are irrelevant. (And this would be the most common use case for an addressing mode like [ebp*2] with no displacement: using that as a memory address would maybe emulate word-addressable memory? That's just weird; normally there's a real pointer as one component of the address.)


Understanding x86 32/64-bit addressing modes:

Other than 64-bit RIP-relative addressing, 32/64-bit addressing modes are any subset of disp0/8/32 + base_reg + idx_reg*1/2/4/8, where each of the 3 terms (components) is optional. But at least one of disp32 or a base register is required. (See also Referencing the contents of a memory location. (x86 addressing modes)).
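The encoding rules can be made concrete with a Python sketch of a ModR/M + SIB encoder for 32-bit addressing modes. The names `encode_mem`, `REG` and `SCALE_BITS` are hypothetical. The sketch ignores prefixes and 16-bit and RIP-relative modes. It always picks the shortest displacement, which an assembler can't do for symbols that need a relocation. You can compare its output with the `mov eax, ...` bytes in the objdump listing at the end of this answer.

```python
from typing import Optional

# Register numbers used in the ModR/M and SIB fields (32-bit mode)
REG = {"eax": 0, "ecx": 1, "edx": 2, "ebx": 3, "esp": 4, "ebp": 5, "esi": 6, "edi": 7}
SCALE_BITS = {1: 0b00, 2: 0b01, 4: 0b10, 8: 0b11}


def encode_mem(reg_field: int, base: Optional[str], index: Optional[str] = None,
               scale: int = 1, disp: int = 0) -> bytes:
    """ModR/M [+ SIB] [+ displacement] for [base + index*scale + disp] (32-bit addressing only)."""
    if index == "esp":
        raise ValueError("ESP cannot be an index register")
    if base is None:
        # No base register: the only option is mod=00 with a mandatory disp32.
        disp32 = disp.to_bytes(4, "little", signed=True)
        if index is None:
            return bytes([(reg_field << 3) | 0b101]) + disp32  # absolute [disp32], no SIB
        sib = (SCALE_BITS[scale] << 6) | (REG[index] << 3) | 0b101  # SIB base=101 + mod=00 => disp32
        return bytes([(reg_field << 3) | 0b100, sib]) + disp32
    b = REG[base]
    if disp == 0 and b != REG["ebp"]:
        mod, disp_bytes = 0b00, b""
    elif -128 <= disp <= 127:
        # EBP has no disp0 form (that encoding means "disp32, no base"), so it gets disp8=0
        mod, disp_bytes = 0b01, disp.to_bytes(1, "little", signed=True)
    else:
        mod, disp_bytes = 0b10, disp.to_bytes(4, "little", signed=True)
    if index is None and b != REG["esp"]:
        return bytes([(mod << 6) | (reg_field << 3) | b]) + disp_bytes  # no SIB needed
    # SIB form: rm=100 is the SIB escape; index field 100 means "no index" (used for an ESP base)
    idx = 0b100 if index is None else REG[index]
    sib = (SCALE_BITS[scale] << 6) | (idx << 3) | b
    return bytes([(mod << 6) | (reg_field << 3) | 0b100, sib]) + disp_bytes


# mov eax, [nosplit ebp*2]  vs  mov eax, [ebp+ebp]   (opcode 8B, reg field 0 = EAX)
print((b"\x8b" + encode_mem(0, None, "ebp", 2)).hex(" "))   # 8b 04 6d 00 00 00 00
print((b"\x8b" + encode_mem(0, "ebp", "ebp", 1)).hex(" "))  # 8b 44 2d 00
```

`bytes.hex` with a separator argument needs Python 3.8 or later.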

[disp32=0 + ebp*2] (with disp32=zero) has default segment = DS. You can get this encoding in NASM from [nosplit ebp*2], and addresses like [ebp*4] can't be split.

[ebp + ebp + disp8=0] has default segment = SS, because EBP is used as a base register.

The encoding that would mean ebp with no displacement actually means disp32 with no base reg, so the disp32 is effectively the base (implying segment register DS, because the base isn't EBP or ESP). This is the case with or without a SIB byte, so [ebp + ebp*1] still has to be encoded with a disp8=0. Other registers don't have that problem, so normally splitting saves 4 bytes instead of just 3 for EBP. (Except for r13 which uses the same ModR/M encoding as RBP, I guess so that part of the decode hardware doesn't need the extra bit from the REX prefix.)
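A small Python sketch can count only the ModR/M, SIB and displacement bytes. It uses a hypothetical helper, `addr_bytes`, and shows where the "3 bytes for EBP, 4 for other registers" difference comes from.

```python
from typing import Optional


def addr_bytes(base: Optional[str], index: Optional[str], disp: int = 0) -> int:
    """Bytes taken by ModR/M + SIB + displacement for a 32-bit addressing mode (sketch)."""
    n = 1  # ModR/M byte
    if index is not None or base == "esp":
        n += 1  # SIB byte (ESP as a base always needs one)
    if base is None:
        return n + 4  # no base register: a disp32 is mandatory, even when it's 0
    if disp == 0 and base != "ebp":
        return n  # no displacement at all
    if -128 <= disp <= 127:
        return n + 1  # disp8; EBP with "no displacement" still needs disp8=0
    return n + 4  # disp32


for reg in ("eax", "ebp"):
    unsplit = addr_bytes(None, reg)  # [reg*2]  -> disp32=0 + reg*2
    split = addr_bytes(reg, reg)     # [reg + reg*1]
    print(f"[{reg}*2]: {unsplit} bytes, [{reg}+{reg}]: {split} bytes, saves {unsplit - split}")
# saves 4 for eax, but only 3 for ebp
```

When the constant needs a disp32 anyway, nothing is saved. For example, `addr_bytes(None, "ebp", 1234)` and `addr_bytes("ebp", "ebp", 1234)` are both 6.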

ESP can't be an index register, so [esp*2] is impossible to encode with or without splitting. So the special case of NASM's optimization only affects EBP*2. (base=ESP is the escape code for a SIB byte, and index=ESP in the SIB byte means no index, allowing you to encode [esp + 12].)
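Decoding SIB bytes shows the SIB side of the ESP restriction. This Python sketch uses a hypothetical helper, `decode_sib`, and prints placeholders instead of displacement values. An index field of 100 (ESP's register number) always decodes as "no index", whatever the scale bits say.

```python
NAMES = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def decode_sib(mod: int, sib: int) -> str:
    """Render a SIB byte (32-bit mode) as an address expression; displacement values omitted."""
    if mod not in (0, 1, 2):
        raise ValueError("mod=11 is a register operand, there is no SIB byte")
    scale = 1 << (sib >> 6)
    index = (sib >> 3) & 0b111
    base = sib & 0b111
    # base field 101 with mod=00 means "no base register, disp32 follows"
    parts = ["disp32" if (base == 0b101 and mod == 0) else NAMES[base]]
    # index field 100 (ESP's number) means "no index", so ESP can never be an index
    if index != 0b100:
        parts.append(f"{NAMES[index]}*{scale}")
    if mod == 1:
        parts.append("disp8")
    elif mod == 2:
        parts.append("disp32")
    return "[" + " + ".join(parts) + "]"


print(decode_sib(0, 0x24))  # [esp]  (the usual encoding of [esp])
print(decode_sib(0, 0x6d))  # [disp32 + ebp*2]
print(decode_sib(0, 0x65))  # [disp32]  (scale=2, index=100: no index, so no "esp*2")
```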

But unfortunately NASM/YASM split EBP*2 even when there is a constant that needs a disp32 anyway, like [symbol + ebp*2], where it doesn't save any bytes and in fact hurts performance for LEA (but not loads/stores) on Sandybridge-family CPUs. 3-component lea eax, [symbol + ebp + ebp*1] is slower than 2-component lea eax, [symbol + ebp*2]: higher latency and 1-per-clock throughput instead of 2. According to http://agner.org/optimize/, those would be equally slow on AMD Bulldozer/Ryzen, because a scaled index makes a "slow-LEA" even with only 2 components.
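A few lines of Python can model the split and its side effects. `nasm_like_split` is a hypothetical model of the behaviour visible in the objdump listing at the end of this answer, not NASM's source code. When there is no base, `reg*1` becomes a base and `reg*2` becomes `reg + reg*1`. `lea_components` counts base, index and displacement terms as written, in the sense used in the paragraph above. A zero `disp` counts as absent, even though an unsplit `[reg*2]` is still encoded with a disp32 of 0.

```python
from typing import NamedTuple, Optional


class Mem(NamedTuple):
    base: Optional[str]
    index: Optional[str]
    scale: int
    disp: int  # 0 = no displacement; a symbol address counts as non-zero


def nasm_like_split(m: Mem, nosplit: bool = False) -> Mem:
    """Hypothetical model of the splitting seen in the listing: only applies when there is no base."""
    if nosplit or m.base is not None or m.index is None:
        return m
    if m.scale == 1:
        return Mem(m.index, None, 1, m.disp)     # [ebp*1 + d] -> [ebp + d], no SIB
    if m.scale == 2:
        return Mem(m.index, m.index, 1, m.disp)  # [ebp*2 + d] -> [ebp + ebp*1 + d]
    return m  # *4 and *8 can't be rewritten as base + index


def segment_of(m: Mem) -> str:
    return "SS" if m.base in ("esp", "ebp") else "DS"


def lea_components(m: Mem) -> int:
    """Count base, index and displacement terms as written."""
    return sum([m.base is not None, m.index is not None, m.disp != 0])


sym = 0x2C  # stands in for the address of `sym` in the listing
before = Mem(None, "ebp", 2, sym)  # [sym + ebp*2]
after = nasm_like_split(before)    # [sym + ebp + ebp*1]
print(segment_of(before), "->", segment_of(after))          # DS -> SS
print(lea_components(before), "->", lea_components(after))  # 2 -> 3
```

The default segment changes from DS to SS, and the LEA gains a third component without getting any shorter.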

IDK if any old CPUs do better with an unscaled index and 3-component addressing modes, for LEA or for actual memory operands.


NASM and YASM behaviour:

 $ nasm -felf32 -g -Fdwarf foo.asm
 $ objdump -drwC -Mintel -S foo.o | sed 's/DWORD PTR//'
 # (edited to put the NASM source line's addressing mode onto the same line as the disassembler output, instead of separate lines)
00000000 <sym-0x2c>:
   0:   8b 04 2e                mov    eax, [esi+ebp*1]         ; [esi+ebp]
   3:   8b 44 35 00             mov    eax, [ebp+esi*1+0x0]     ; [ebp + esi]
   7:   8b 04 2e                mov    eax, [esi+ebp*1]         ; [ebp*1 + esi]
   a:   8b 44 2d 00             mov    eax, [ebp+ebp*1+0x0]     ; [ebp*2]
   e:   8b 04 6d 00 00 00 00    mov    eax, [ebp*2+0x0]         ; [nosplit ebp*2]
  15:   8b 45 00                mov    eax, [ebp+0x0]           ; [ebp*1]   ; "split" into base=ebp with no SIB byte
  18:   8b 04 2d 00 00 00 00    mov    eax, [ebp*1+0x0]         ; [nosplit ebp*1]
  1f:   8b 84 2d d2 04 00 00    mov    eax, [ebp+ebp*1+0x4d2]   ; [ebp*2 + 1234]   ; bad split for LEA, neutral on modern CPUs for load/store
  26:   8b 85 15 cd 5b 07       mov    eax, [ebp+0x75bcd15]     ; [ebp*1 + 123456789]
sym:       ; using a symbol reference instead of a numeric constant doesn't change anything
  2c:   8b 84 2d 2c 00 00 00    mov    eax, [ebp+ebp*1+0x2c]    2f: R_386_32    .text   ; [ebp*2 + sym]
  33:   8b 84 2d 2c 00 00 00    mov    eax, [ebp+ebp*1+0x2c]    36: R_386_32    .text   ; [sym + ebp*2]

YASM encodes all these cases identically to NASM.

Peter Cordes