How would I find the length of a string using NASM?

Question

I'm trying to make a program using NASM that takes input from command line arguments. Since string length is not provided, I'm trying to make a function to compute my own. Here is my attempt, which takes a pointer to a string in the ebx register, and returns the length of the string in ecx:

len:
    push ebx
    mov ecx,0
    dec ebx
    count:
        inc ecx
        inc ebx
        cmp ebx,0
        jnz count
    dec ecx
    pop ebx
    ret

My method is to go through the string, character by character, and check if it's null. If it's not, I increment ecx and go to the next character. I believe the problem is that cmp ebx,0 is incorrect for what I'm trying to do. How would I properly go about checking whether the character is null? Also, are there other things that I could be doing better?

`cmp ebx,0` is wrong and also the `push ebx` at the end should probably be `pop ebx` (otherwise you'll get a *stack overflow* !). — Paul R, Jun 24 '11 at 14:03
try and get into the habit of copying and pasting your actual code, rather than re-typing it. Also please edit your question so that it matches the actual code. — Paul R, Jun 24 '11 at 15:05
@paul-r That's what I usually do. I was working in VirtualBox and clipboard sharing was not set up properly. — Vineel Adusumilli, Sep 01 '11 at 00:35

score 5 · Accepted Answer · answered Jun 24 '11 at 14:11

You are comparing the value in ebx with 0, which is not what you want. The value in ebx is the address of a character in memory, so it has to be dereferenced like this:

cmp byte[ebx], 0

Also, the last push ebx should be pop ebx.
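
For reference, here is a sketch of the routine from the question with only the comparison changed to a byte-sized memory operand. It assumes 32-bit code and the question's own convention (pointer in ebx, length returned in ecx); the comments are added for illustration.

```nasm
; len: ebx = pointer to a NUL-terminated string
; returns the length in ecx; ebx is preserved
len:
    push ebx
    mov ecx,0
    dec ebx                 ; start one byte early; the loop increments first
    count:
        inc ecx
        inc ebx
        cmp byte [ebx],0    ; test the byte ebx points at, not the pointer itself
        jnz count
    dec ecx                 ; undo the increment that counted the terminator
    pop ebx
    ret
```

The `byte` size specifier is required here because neither operand of `cmp [ebx],0` tells NASM how many bytes to compare.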

mtvec

score 1 · Answer 2 · edited Oct 18 '18 at 16:06

Here is how I do it in a 64-bit Linux executable that checks argv[1]. The kernel starts a new process with argc and argv[] on the stack, as documented in the x86-64 System V ABI.

_start:
    pop    rsi              ; number of arguments (argc)
    pop    rsi              ; argv[0] the command itself (or program name)
    pop    rsi              ; rsi = argv[1], a pointer to a string
    mov    ecx, 0           ; counter
.repeat:
    lodsb                   ; byte in AL
    test   al,al            ; check if zero
    jz     .done            ; if zero then we're done
    inc    ecx              ; increment counter
    jmp    .repeat          ; repeat until zero
.done:
    ; string is unchanged, ecx contains the length of the string


; unused, we look at command line args instead
section .rodata
    asciiz:    db    "This is a string with 36 characters.", 0

This is slow and inefficient, but easy to understand.
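
To try the same byte-at-a-time loop end to end, one option is to return the count as the process exit status. The sketch below is an assumption-laden variant, not the code above verbatim: it reads argv[1] straight from the stack instead of popping, counts in edi, exits quietly with 0 if no argument was given, and uses the Linux x86-64 `exit` system call (number 60). The build commands in the comments are assumed, and the file name `len.asm` is hypothetical.

```nasm
; build (assumed): nasm -felf64 len.asm && ld -o len len.o
; run:             ./len hello ; echo $?      -> prints 5
global _start

section .text
_start:
    mov    rsi, [rsp+16]    ; [rsp] = argc, [rsp+8] = argv[0], [rsp+16] = argv[1]
    xor    edi, edi         ; edi = length so far, also the exit status
    test   rsi, rsi         ; argv[1] is NULL when no argument was passed
    jz     .exit
.repeat:
    lodsb                   ; al = [rsi], then rsi += 1
    test   al, al           ; reached the terminating zero?
    jz     .exit
    inc    edi
    jmp    .repeat
.exit:
    mov    eax, 60          ; SYS_exit on x86-64 Linux
    syscall                 ; exit(edi)
```

Only the low 8 bits of the exit status reach the shell, so this trick is only useful for strings shorter than 256 bytes.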

For efficiency, you'd want:

- only 1 branch in the loop (Why are loops always compiled into "do...while" style (tail jump)?)
- to avoid a false dependency by loading with movzx instead of merging into the previous RAX value (Why doesn't GCC use partial registers?)
- to subtract pointers after the loop instead of incrementing a counter inside.
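
A minimal sketch combining those three points follows. The label `strlen_sketch` and the register convention (pointer in rdi, length returned in rax) are assumptions chosen for illustration, not part of the original answer.

```nasm
; strlen_sketch: rdi = pointer to a NUL-terminated string
; returns the length in rax; clobbers edx
strlen_sketch:
    mov    rax, rdi             ; rax walks the string, rdi keeps the start
.loop:
    movzx  edx, byte [rax]      ; zero-extending load, no merge into an old register value
    inc    rax                  ; advance before testing (test below sets the flags we branch on)
    test   edx, edx
    jnz    .loop                ; the only branch in the loop, at the bottom
    sub    rax, rdi             ; rax is now one past the terminator...
    dec    rax                  ; ...so subtract one more to get the length
    ret
```

For an empty string the loop body runs once, leaving rax at start + 1, and the final `sub`/`dec` pair yields 0.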

And of course SSE2 is always available in x86-64, so we should use that to check in chunks of 16 bytes (after reaching an alignment boundary). See optimized hand-written strlen implementations such as the one in glibc (https://code.woboq.org/userspace/glibc/sysdeps/x86_64/strlen.S.html).
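
As a rough illustration of the 16-byte idea, here is one possible shape. It is not glibc's code. Instead of stepping bytewise up to the alignment boundary, this sketch rounds the pointer down to a 16-byte boundary and shifts out the mask bits that belong to bytes before the string. The label `strlen_sse2` and the register convention (pointer in rdi, length in rax) are assumptions.

```nasm
; strlen_sse2: rdi = pointer to a NUL-terminated string
; returns the length in rax; clobbers rcx, rdx, xmm0, xmm1
strlen_sse2:
    mov      rax, rdi
    and      rax, -16           ; round down to a 16-byte boundary
    pxor     xmm0, xmm0         ; sixteen zero bytes to compare against
    movdqa   xmm1, [rax]        ; aligned load, cannot cross into another page
    pcmpeqb  xmm1, xmm0         ; 0xFF in every byte position that holds 0
    pmovmskb edx, xmm1          ; one mask bit per byte
    mov      ecx, edi
    and      ecx, 15            ; number of bytes that precede the string in this chunk
    shr      edx, cl            ; drop their bits; bit i now refers to rdi + i
    test     edx, edx
    jnz      .in_first_chunk
.loop:
    add      rax, 16
    movdqa   xmm1, [rax]
    pcmpeqb  xmm1, xmm0
    pmovmskb edx, xmm1
    test     edx, edx
    jz       .loop              ; no zero byte in this chunk, keep going
    bsf      edx, edx           ; index of the first zero byte within the chunk
    add      rax, rdx           ; address of the terminator
    sub      rax, rdi           ; minus the start gives the length
    ret
.in_first_chunk:
    bsf      eax, edx           ; after the shift, the bit index is the length
    ret
```

The aligned loads may read bytes outside the string itself, but never outside the 16-byte blocks that contain part of it, which is why they cannot fault on an unmapped page.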

score 0 · Answer 3 · answered Jun 26 '11 at 18:07

Here is how I would have coded it:

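len:
    push ebx                ; ebx is not modified below; saved anyway
    mov  eax, ebx           ; eax walks the string, ebx keeps the start
lp:
    cmp  byte [eax], 0      ; reached the terminating zero?
    jz   lpend
    inc  eax
    jmp  lp
lpend:
    sub  eax, ebx           ; end pointer - start pointer = length
    pop  ebx
    ret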

(The result is in eax). Likely there are better ways.

ShinTakezou

Pointer-increment is better, but the terrible loop structure is a bottleneck on many CPUs. You have 2 branches (one taken, one not-taken) in each loop iteration, so it can only run at one iteration per 2 clocks on CPUs before Haswell, i.e. ones that existed when you wrote this. [Why are loops always compiled into "do...while" style (tail jump)?](https://stackoverflow.com/q/47783926). (Of course only checking one byte at a time is pretty terrible for modern x86 with SSE2.) – Peter Cordes Oct 18 '18 at 15:52
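
Applied to the 32-bit routine in this answer, the single-branch structure the comment asks for might look like the sketch below. It keeps the same convention (pointer in ebx, result in eax). Dropping the push/pop pair is an editorial choice, possible because ebx is never written.

```nasm
; len: ebx = pointer to a NUL-terminated string
; returns the length in eax; ebx is left untouched
len:
    lea  eax, [ebx-1]       ; start one byte early; the loop increments first
.lp:
    inc  eax
    cmp  byte [eax], 0      ; reached the terminating zero?
    jne  .lp                ; one taken branch per character
    sub  eax, ebx           ; terminator address - start = length
    ret
```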
