I am a beginner in assembly and I have this homework where I have to create a strlen function to find the length of any string.

I tried subtracting 4 from edx because I am seeing 4 extra characters at the end, but that did not fix anything. They are still there.

section .data   
text: db "Hello world, trying to find length of string using function."     ;our string to be outputted

section .text
global _start   ;declared for linker

_start:     
    mov eax, 4      ;system call number (sys write)
    mov ebx, 1      ;file descriptor to write-only
    mov ecx, text   ;message to output
    call strlen
    mov edx, len    ;length of string to print
    int 80h         ;interrupt

exit:       
    mov eax, 1  ;system call number (sys exit)
    mov ebx, 0  ;file descriptor to read-only
    int 80h     ;interrupt

strlen: 
    push ebp        ;prologue, save base pointer
    mov ebp, esp    ;copy esp to ebp
    push edi        ;push edi for use

                    ;body
    mov edi, text   ;save text to edi, and i think when i do that edi expands? if text = 5 bytes, and edi was originally 4, then edi becomes 5?
    sub edi, esp    ;subtract edi starting point by the esp starting point to get len. ex: edi = 100, esp = 95
    mov [len], edi  ;copy value of edi onto len

    pop edi         ;epilogue, pop edi out of stack
    mov esp, ebp    ;return esp back to top of stack
    pop ebp         ;pop ebp back to original
    ret             ;return address



section .bss    
len: resb 4 ;4 byte to integer

Let's say I have the following code in the .data section:

section .data   
text: db "Hello world, trying to find length of string using function."

The expected output should be "Hello world, trying to find length of string using function.", however I am getting "Hello world, trying to find length of string using function.####" where # is any random character.

This is the terminal output :

Thank you.

Jeemong
  • 1
    You're not looking at bytes in the string, you're calculating the distance between the `.data` section and the stack. There are a ton of examples of strlen functions here on Stack Overflow if you want to look one up, otherwise restart your `strlen` function from scratch because your current attempt is on completely the wrong track. You aren't even using the arg in `ecx` you pass in the caller, you're hard-coding it as taking the strlen of `text`. – Peter Cordes Oct 01 '19 at 04:59
  • @Peter Cordes Thanks, this is my first assignment (the professor just asked us to find the length of the text in .data without specifying it) and I got it to work after I realized that I was looking at the distance between .data sections and not at the bytes. I now have this as my [code](https://i.gyazo.com/e4497b47aeccf497f3f4cc606cf23d69.png) and it seems to output what I want. Please let me know if there is anything I can fix. – Jeemong Oct 01 '19 at 05:42
  • `text` - the caller's EDI value makes no sense either. If that works, it's by coincidence. `[esp]` points at the last thing you pushed. You need a loop, not a subtract. Or you could put a label at the *end* of the string and do `mov eax, text_end - text` to get the assembler to calculate the length for you at assemble time. Anyway, if you have an answer, feel free to post an answer to your own question. (But if you'd posted the text you made an image of, I'd have downvoted it because it's still not close to being useful.) – Peter Cordes Oct 01 '19 at 05:48
  • 3
    Most assemblers do not add a terminating 0 to a string declared using db "...". To handle this use db "...",0 to append the terminating 0. – rcgldr Oct 01 '19 at 05:57
  • @Peter Cordes I think it's a fluke, when I was cleaning the code and removed the .bss section, it messed up and I started getting a lot of random symbols at the end. Why does EDI not make sense? My plan was to set it to the string, and then I can subtract the memory address of ESP from EDI to get the bytes since ESP is at the end of EDI? I feel like I am misunderstanding something here of how this works. – Jeemong Oct 01 '19 at 06:04
  • @rcgldr Sorry but what does that do? I am very new to this. – Jeemong Oct 01 '19 at 06:05
  • Nowhere in your code does anything get a pointer to the end of the string. And no, ESP isn't "at the end of EDI". EDI is just an integer register; there's no magic that gives its value any special relation to ESP. Unless you mean that ESP is *pointing* to a saved EDI value, but that saved value has nothing to do with anything. Go talk to your instructor, and/or single-step through your code in a debugger and look at register values. Maybe also use `strace ./my_program` to see what length you end up passing to the system call. – Peter Cordes Oct 01 '19 at 06:06
  • @PeterCordes I did `strace ./my_program` and got `write(1, "The function strlen is able to d"..., 543516756The Function strelen is able to detect the length of this text. Yeah!) = 2048`, so yeah I think it's broken. I'm going to talk to my professor about this, this isn't even an assembly course, but we have to learn about it in lab, so we literally have like a few slides to go off of... Thanks anyways. – Jeemong Oct 01 '19 at 06:13
  • 2
    Yup, you're just calculating some random-garbage large value for the length instead of looping over the bytes, counting until you find a terminating `0` byte. (Which you have to make sure is there in the data with `db "foo", 0`). Then `write()` writes data up until it crosses into an unmapped page and returns `-EFAULT`, but copying the valid bytes to the stdout file descriptor has already happened. Probably the bytes after your string are all `0` anyway by coincidence so you don't notice them on a terminal. `'\0'` prints as zero-width. – Peter Cordes Oct 01 '19 at 06:16
  • @Jeemong - strlen scans for a zero, but the text doesn't have a zero. Unlike C, you have to explicitly include a 0 after the text when declaring a string. In this case it should be | text: db "Hello world, trying to find length of string using function.",0 | . – rcgldr Oct 01 '19 at 13:05
  • 1
    Addition to my earlier comment: besides calculating garbage, you're using the absolute *address* of `len` as the length instead of the contents of that memory. So two showstopper bugs, one of them just syntax, the other one based on apparent fundamental misunderstanding(s). :/ (Sep's answer pointed this out; I hadn't looked at the caller when commenting.) – Peter Cordes Oct 01 '19 at 15:44
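
The two ways of getting a length mentioned in the comments above (letting the tool compute it from the data's extent, versus scanning for a terminating zero at run time) have close analogues in C. The sketch below is only an illustration under that analogy, not NASM code; the function names `text_len_static` and `text_len_scanned` are hypothetical. Note that a C string literal gets its terminating zero automatically, whereas `db "..."` does not.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The compiler appends the terminating '\0' here; NASM's db does not. */
static const char text[] =
    "Hello world, trying to find length of string using function.";

/* Length known at compile time, comparable to `text_end - text` in NASM.
   The -1 drops the terminating zero that sizeof counts. */
size_t text_len_static(void)
{
    return sizeof text - 1;
}

/* Length found at run time by scanning for the terminating zero. */
size_t text_len_scanned(void)
{
    size_t n = strlen(text);
    assert(n == sizeof text - 1);   /* both approaches must agree */
    return n;
}
```

The compile-time form only works when the data itself is visible at the point of use; given just a pointer, the run-time scan is the only option.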

1 Answer

Prior to calling strlen, you've loaded ECX with the address of the string for which you desire to know the length. Then use ECX in your function directly.
You don't need to use the prolog/epilog code on this little task.

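strlen: push    ecx                ; Save start address (and preserve ECX for the caller)
        dec     ecx                ; Cancel out the first INC below
.loop:  inc     ecx                ; Advance to the next byte
        cmp     byte [ecx], 0      ; NASM syntax: size keyword without `ptr`
        jne     .loop              ; Repeat until the terminating zero is found
        sub     ecx, [esp]         ; Address of the zero minus the start address
        mov     [len], ecx         ; Save length
        pop     ecx                ; Restore start address
        ret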

This code runs through the string until it finds a zero. At that point the starting address (It's on the stack at [esp]) is subtracted from the address where the zero was found (It's in ECX). This produces the length.
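
For readers who find C easier to follow, here is a minimal sketch of the same idea: walk a pointer forward until it reaches the zero byte, then subtract the starting address. It assumes a zero-terminated input, and the name `my_strlen` is hypothetical; it is an illustration of the technique, not a translation the assembler would produce.

```c
#include <assert.h>
#include <stddef.h>

/* Counts the bytes before the terminating zero.
   `start` plays the role of the address saved on the stack,
   `p` plays the role of ECX. */
size_t my_strlen(const char *start)
{
    assert(start != NULL);
    const char *p = start;
    while (*p != '\0') {            /* the cmp/jne pair in the loop */
        p++;                        /* inc ecx */
    }
    return (size_t)(p - start);     /* sub ecx, [esp] */
}
```

Calling `my_strlen("Hello")` yields 5, and an empty string yields 0, because the loop body never runs when the first byte is already zero.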

Instead of putting the result in a memory variable, you could choose to return it in the EDX register - ready to use next!

This version of strlen can only work if you make sure that the string is actually zero-terminated. Just append the zero.

section .data   
text: db "Hello world, trying to find length of string using function.",0

This is NASM, so `mov edx, len` loads the address of `len`, not the value stored there:

call strlen
mov edx, len    ;length of string to print
int 80h         ;interrupt

You need the square brackets around len in order to fetch the length that is stored at that location.

call    strlen
mov     edx, [len]    ; Length of string to print
int     80h
Sep Roland
  • I see that this is a loop but I thought that when the stack increases, the memory addresses are descending? For example the first char of `ECX` could be at memory address `900` and its end would be somewhere `<900`? Or am I wrong? The code makes sense in my head if it's ascending but not if it's descending. EX: `ECX` ranges from `900` to `910`, we keep incrementing `ECX` until it reaches `0` while `ESP` stays at `900`, so, 910 - 900 = length of 10? – Jeemong Oct 01 '19 at 20:37
  • @Jeemong: to cancel out the `inc ecx` for the first iteration. The other alternative is to enter the loop with a `jmp` to the `cmp/jne` loop condition to check the first byte. i.e. to implement `while(*p++){}` in terms of `do{}while(*++p);` – Peter Cordes Oct 02 '19 at 04:26
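
The two loop shapes mentioned in the last comment can be sketched in C as follows. This is an illustration under assumptions: the function names are hypothetical, and an index is used instead of a pointer because forming a pointer one element before the start of an array is undefined in C, while unsigned wraparound of a `size_t` is well defined.

```c
#include <assert.h>
#include <stddef.h>

/* Shape used in the answer: step back once, then increment before each test. */
size_t len_dec_then_inc(const char *s)
{
    assert(s != NULL);
    size_t i = (size_t)-1;          /* like `dec ecx`: one step before the start */
    do {
        i++;                        /* like `inc ecx`; the first pass lands on index 0 */
    } while (s[i] != '\0');
    return i;
}

/* Alternative shape: test the first byte before any increment,
   which in assembly means entering the loop with a jmp to the condition. */
size_t len_check_first(const char *s)
{
    assert(s != NULL);
    size_t i = 0;
    while (s[i] != '\0') {
        i++;
    }
    return i;
}
```

Both functions return the same count for any zero-terminated string; the first saves a jump into the loop at the cost of the extra decrement.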