message db "Enter a digit ", 0xA,0xD
Length equ $- message

Is it used to get the length of a string?
How does it work internally?

Peter Cordes
Naveen prakash
  • Have you tried [reading the documentation](http://www.nasm.us/doc/)? – Some programmer dude Nov 26 '17 at 09:36
  • I tried but could not understand! – Naveen prakash Nov 26 '17 at 09:37
  • `$` is usually referred to as the “location counter”. (In other assemblers it can be `*` or `.`) – Paul R Nov 26 '17 at 09:42
  • Related: what happens if you put `Length equ $- message` in the wrong place might help you understand how it works: https://stackoverflow.com/questions/26897633/in-nasm-labels-next-to-each-other-in-memory-are-causing-printing-issues – Peter Cordes Nov 26 '17 at 09:47
  • (`I tried but could not understand!` there's nothing wrong with failing. But it doesn't hurt to document and present what you have tried - even if it should go without saying that you tried *something* before asking for help.) – greybeard Nov 26 '17 at 09:58
  • also it helps tremendously to explain which parts of the docs you *do* understand and which you don't, especially if there is a particular word you don't understand. Then the answers can explain the things you are missing. (Explaining the whole `$` business from scratch would involve explaining what a computer is, what a CPU is, etc... a whole book.) – Ped7g Nov 26 '17 at 13:12
  • BTW, did you mean `0xD, 0xA` (CR LF) rather than the other way around? – Toby Speight Jan 12 '18 at 15:06

1 Answer


This gets the assembler to calculate the string length for you at assemble time.

`$` is the address of the current position before emitting the bytes (if any) for the line it appears on. Section 3.5 of the manual doesn't go into much detail.

`$ - msg` is like doing `here - msg`, i.e. the distance in bytes between the current position (at the end of the string) and the start of the string. (See also this tutorial on NASM labels and directives like `resb`.)

(Related: most other x86 assemblers also use `$` the same way, except for GAS, which uses `.` (period). The MMIX assembler uses `@`, which has the right semantic meaning.)
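As a rough sketch of how the computed length typically gets used: the data lines below follow the question (with the bytes in CR LF order), while the `_start` code around them is my own illustration and assumes x86-64 Linux and a static executable built with something like `nasm -felf64` and `ld`.

```nasm
section .rodata
message: db "Enter a digit ", 0xD, 0xA   ; 14 characters + CR LF
Length   equ $ - message                 ; $ is just past the LF here, so Length = 16

section .text
global _start
_start:
    mov   eax, 1                ; __NR_write on x86-64 Linux
    mov   edi, 1                ; fd 1 = stdout
    lea   rsi, [rel message]    ; buffer address
    mov   edx, Length           ; assembles as an immediate: mov edx, 16
    syscall

    mov   eax, 60               ; __NR_exit
    xor   edi, edi              ; status 0
    syscall
```

`Length` never occupies memory; it's only an assemble-time constant, so there's nothing to load at run time.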


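To understand it better, it may help to see what happens when you get it wrong: [In NASM labels next to each other in memory are causing printing issues](https://stackoverflow.com/questions/26897633/in-nasm-labels-next-to-each-other-in-memory-are-causing-printing-issues). This person used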

HELLO_MSG db 'Hello, World!',0
GOODBYE_MSG db 'Goodbye!',0

hlen equ $ - HELLO_MSG
glen equ $ - GOODBYE_MSG

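resulting in `hlen` including the length of both strings, because `$` on the `hlen` line is the position after `GOODBYE_MSG`, not the position right after `HELLO_MSG`.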
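A minimal sketch of the fix, reusing the names from that question: put each `equ` directly after the data it measures, so `$` is still at the end of that string.

```nasm
HELLO_MSG   db 'Hello, World!', 0
hlen        equ $ - HELLO_MSG      ; 14: 13 characters plus the terminating 0

GOODBYE_MSG db 'Goodbye!', 0
glen        equ $ - GOODBYE_MSG    ; 9: 8 characters plus the terminating 0
```

With the original ordering, `hlen` would have been 23 (14 + 9) while `glen` would still be 9.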

`equ` evaluates the right-hand side right away, to a constant value. (In some assemblers like FASM, `equ` is a text substitution, and you have to use `glen = $ - GOODBYE_MSG` to evaluate `$` at this position, instead of having `$` evaluated later inside a `mov ecx, glen` instruction or something. But NASM's `equ` evaluates on the spot; use `%define` for text substitutions.)
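To see the difference in NASM, here is a small sketch meant to be assembled as a flat binary (`nasm -f bin`). The names `len_equ` and `len_def` are made up for this example.

```nasm
msg:     db "abc"
len_equ  equ $ - msg           ; evaluated right here: 3
%define  len_def ($ - msg)     ; plain text substitution, evaluated wherever it's used

         db 0, 0               ; two more bytes, so the position moves on
         db len_equ            ; emits 3
         db len_def            ; expands to ($ - msg) on this line: $ is msg+6, emits 6
```

A hexdump of the output should show `61 62 63 00 00 03 06`.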


Using $ is exactly equivalent to putting a label at the start of the line and using it instead of $.

The object-size example can also be done using regular labels:

msg:   db "Enter a digit "
msgend: 
Length equ msgend - msg
Length2 equ $ - msg     ; Length2 = Length

newline: db 0xA,0xD
Length3 equ $ - msg     ; Length3 includes the \n\r LF CR sequence as well.
                        ; sometimes that *is* what you want

You can put `Length equ msgend - msg` anywhere, or use `mov ecx, msgend - msg` directly. (It's sometimes useful to have a label on the end of something, e.g. `cmp rsi, msgend` / `jb .loop` at the bottom of a loop.)

BTW, it's usually CR LF (`0xD, 0xA`), not LF CR.


Less obvious examples:

times 4  dd $

assembles the same as this (but without creating a symbol table entry or clashing with an existing name):

here:    times 4 dd here

In times 4 dd $, $ doesn't update to its own address for each dword, it's still the address of the start of the line. (Try it in a file by itself and hexdump the flat binary: it's all zeros.)


But a %rep block is expanded by the preprocessor before $ is evaluated, so

%rep 4
    dd $
%endrep

does produce 0, 4, 8, 12 (starting from an output position of 0 in a flat binary for this example.)

$ nasm -o foo  rep.asm  && hd foo
00000000  00 00 00 00 04 00 00 00  08 00 00 00 0c 00 00 00  

Manually encoding jump displacements:

A normal direct call is E8 rel32, with the displacement calculated relative to the end of the instruction. (i.e. relative to EIP/RIP while the instruction is executing, because RIP holds the address of the next instruction. RIP-relative addressing modes work this way, too.) A dword is 4 bytes, so in a dd pseudo-instruction with one operand, the address of the end is $+4. You could of course just put a label on the next line and use that.

earlyfunc:           ; before the call
    call func        ; let NASM calculate the offset
    db  0xE8
    dd  func - ($ + 4)       ; or do it ourselves
    db  0xE8
    dd  earlyfunc - ($ + 4)  ; and it still works for negative offsets

    ...

func:                ; after the call

disassembly output (from objdump -drwC -Mintel):

0000000000400080 <earlyfunc>:
  400080:       e8 34 00 00 00          call   4000b9 <func>    # encoded by NASM
  400085:       e8 2f 00 00 00          call   4000b9 <func>    # encoded manually
  40008a:       e8 f1 ff ff ff          call   400080 <earlyfunc>  # and backwards works too.

If you get the offset wrong, objdump will put the symbolic part as func+8, for example. The relative displacements in the first 2 call instructions differ by 5 because call rel32 is 5 bytes long and they have the same actual destination, not the same relative displacement. Note that the disassembler takes care of adding the rel32 to the end address of each call instruction to show you absolute destination addresses.

You can use db target - ($+1) to encode the offset for a short jmp or jcc. (But beware: db 0xEB, target - ($+1) isn't right, because the end of the instruction is actually $+2 when you put both the opcode and displacement as multiple args for the same db pseudo-instruction.)
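A small sketch of both spellings, assembled as a flat binary. The label names and the `nop` filler are just for illustration.

```nasm
    db  0xEB                     ; opcode for jmp rel8
    db  target - ($ + 1)         ; $ is this displacement byte, so the instruction ends at $+1
    nop
    nop
target:
    db  0xEB, target2 - ($ + 2)  ; opcode and rel8 in one db: $ is the opcode, end is $+2
    nop
target2:
    ret
```

This should assemble to `EB 02 90 90 EB 01 90 C3`: the first jump skips the two `nop`s, the second skips one.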


Related: $$ is the start of the current section, so $ - $$ is how far into the current section you are. But this is only within the current file, so linking two files that put stuff in .rodata is different from having two section .rodata blocks in the same source file. See What's the real meaning of $$ in nasm.

By far the most common use is times 510-($-$$) db 0 / dw 0xAA55 to pad (with db 0) a boot sector out to 510 bytes, and then add the boot sector signature to make 512 bytes. (The NASM manual explains how this works)
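For context, a minimal boot-sector skeleton using that idiom might look like this. The code before the padding is a placeholder of mine; only the last two lines are the pattern being described.

```nasm
bits 16
org 0x7C00                  ; BIOS loads the boot sector at this address

start:
    cli                     ; placeholder code: just halt forever
.hang:
    hlt
    jmp .hang

times 510-($-$$) db 0       ; $-$$ = bytes emitted so far, so this pads up to offset 510
dw 0xAA55                   ; boot signature in bytes 510 and 511 (stored as 55 AA)
```

Assembling with `nasm -f bin` should give a file of exactly 512 bytes. If the code ever grows past 510 bytes, the `times` count goes negative and NASM reports an error instead of silently producing a broken sector.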

Peter Cordes
  • Nice examples! You could perhaps add another one where you manually create a `call rel32` instruction using `db`, `dd`, and `$`. – fuz Nov 26 '17 at 18:35
  • @fuz: Thanks for the suggestion. Finally got around to finishing this edit. – Peter Cordes Dec 11 '17 at 11:48
  • Fun fact: the MMIX assembler uses `@` instead of `$` because `@` clearly denotes the place where we are at. – fuz Dec 11 '17 at 11:56
  • @PeterCordes can we consider **$** to be equal to **RIP**? I was reading some code and found this `lea rsi, [rel $ +0xffffffffffffffff ]`, which really confused me. What is that supposed to mean? – zerocool Oct 24 '20 at 05:05
  • @zerocool: No, RIP-relative addressing is relative to the start of the *next* instruction, but `$` is the start of *this* instruction. `[rel $ - 1]` would be the byte before that LEA instruction, addressed with a RIP-relative addressing mode. (I didn't check the machine code to see if that's how NASM actually assembles it). `[rip - 1]` would be the last byte of the LEA instruction, the high byte of the rel32. – Peter Cordes Oct 24 '20 at 05:24
  • so `lea rsi, [rel msg]` is equal to `lea rsi, [rip + msg - nextInsn]` ??? @PeterCordes – zerocool Oct 24 '20 at 06:01
  • @zerocool: In machine code encoding, yes. See [How do RIP-relative variable references like "\[RIP + \_a\]" in x86-64 GAS Intel-syntax work?](https://stackoverflow.com/q/54745872) for examples of machine code. (Note that GAS `.intel_syntax noprefix` does literally use `[rip + symbol]` as asm source syntax with the same meaning as `[rel symbol]`, even though that does *not* reflect the machine code encoding. But `[rip + A - B]` with a difference between symbols should probably work like `[rip + 1 – Peter Cordes Oct 24 '20 at 06:26