As part of my project, an RPN calculator with unlimited precision, I'm trying to write a method that takes in a buffer of at most 80 bytes. Since I want to support unlimited precision (or at least precision limited only by the heap size), I want to take that buffer and create a pointer to the head of a linked list that represents the number. For example, if the input inside the buffer is: 0x7D12AF

Then the linked list generated by that input will look something like :

[AF, addr1] -addr1->[12, addr2] -addr2-> [7D, addr3] -addr3-> 0

where each link is 5 bytes: 4 for the pointer to the next link and 1 byte for the data. Here's my shot at it. I'd take any suggestion, since I'm really not sure about what I'm doing (assume atoi takes in a hex digit and converts it to its numeric value):

`section .bss
        ptr: resb 5
 section .data

section .text
align 16
extern malloc

_buffersize:
    push ebp
    mov ebp, esp
    push ecx
    mov ecx, [ebp+8]
    xor eax, eax

    .loop:
        cmp byte [ecx], 20h         ; operand size must be explicit for a memory/immediate compare
        jle .done
        inc ecx
        inc eax
        jmp .loop

    .done:
        pop ecx
        mov esp, ebp
        pop ebp
        ret

_listify:
    push ebp
    mov ebp, esp

    mov edx, [ebp+8]                ; pointer to the first byte in the number_string
    pushad

    push edx                        ; push function argument
    call _buffersize                ; eax now holds the size of the buffer 
    add esp, 4                      ; clean up stack after call


    mov ecx, eax                    ; count for the loop

    .loop:
        pushad                      ; allocate 5 bytes for a node : 4 for a next ptr, 1 for data
        push 5
        call malloc                 ; eax now points to the 5 bytes allocated
        add esp, 4                  ; clean up stack after call to malloc
        mov [ptr], eax              ; now ptr points to the address in memory of the 5 allocated bytes 
        popad
        movzx eax, byte [edx]       ; load the byte pointed to by edx, zero-extended
        push eax                    ; pass it as the argument for atoi (atoi converts a single hex digit to its numeric value)
        call _atoi
        add esp, 4                  ; eax now holds the numeric value of that 1-byte character
        mov ebx, [ptr]              ; ebx points to the allocated memory
        mov dword [ebx], 0          ; the address of the next link is NULL as we're inserting at the head of the list
        mov [ebx+4], al             ; ebx should now point to 5 bytes in memory: a 4-byte next pointer followed by 1 data byte holding 0 <= number < 16
        mov [ptr], ebx              ; now ptr points to the address of the newly updated linked list representing the number
        inc edx                     ; get ready to read next byte
       loop .loop  

    popad
    mov esp, ebp
    pop ebp
    ret
`

Another question I have: is there a way to store a number in its hex representation? I think it's kind of a stupid question because the representation is just how I look at it but the value is the same. So converting a hex digit's ASCII representation to an int is just the same, and to make it hex I should just treat it that way when converting from char to int and vice versa. Please correct me if I'm wrong. Thanks!

Peter Cordes
Ed_

1 Answer

4 for the pointer to the next link, and one byte for the data

So your proposed format only uses 20% of the space for actual data. Actually much less than that, because malloc has internal overhead and each allocation will be at least 8-byte aligned, maybe 16. So you're wasting at least 7/8 or 15/16 of your memory / cache footprint, and more when you include malloc overhead.

See this for more about why it's terrible and what you should do instead, and also an implementation for adding linked lists with 1 hex digit (4 bits) per node, instead of your proposed 8 bits (2 hex digits).

Use an array, use realloc if you need to grow it. This lets you add in 32 or 64-bit chunks (in 64-bit mode). If you want, save realloc calls by allocating extra space like C++ std::vector does, tracking allocated vs. in-use space separately.

Arrays are easier and more efficient to loop over.
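As a rough illustration of the array approach, here is a minimal sketch in C rather than assembly. The `bignum` type and the `bignum_push` / `bignum_add` names are hypothetical, invented for this example. It assumes 32-bit chunks stored least-significant first, and it doubles the allocation on growth, in the spirit of `std::vector`.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical bignum type: a little-endian array of 32-bit chunks. */
typedef struct {
    uint32_t *limbs;   /* limbs[0] is the least significant chunk */
    size_t    used;    /* chunks currently holding data */
    size_t    cap;     /* chunks allocated */
} bignum;

/* Append one chunk at the most significant end, doubling the
   allocation when it is full. Returns 0 on success, -1 if realloc
   fails (in which case n is left unchanged). */
static int bignum_push(bignum *n, uint32_t limb)
{
    assert(n != NULL && n->used <= n->cap);
    if (n->used == n->cap) {
        size_t new_cap = n->cap ? n->cap * 2 : 4;
        uint32_t *p = realloc(n->limbs, new_cap * sizeof *p);
        if (p == NULL)
            return -1;
        n->limbs = p;
        n->cap = new_cap;
    }
    n->limbs[n->used++] = limb;
    return 0;
}

/* dst += src, 32 bits at a time with a carry between chunks.
   Returns 0 on success, -1 on allocation failure. */
static int bignum_add(bignum *dst, const bignum *src)
{
    uint64_t carry = 0;
    for (size_t i = 0; i < src->used || carry; i++) {
        if (i == dst->used && bignum_push(dst, 0) != 0)
            return -1;
        uint64_t sum = (uint64_t)dst->limbs[i] + carry;
        if (i < src->used)
            sum += src->limbs[i];
        dst->limbs[i] = (uint32_t)sum;   /* low 32 bits stay in this chunk */
        carry = sum >> 32;               /* high bit carries into the next */
    }
    return 0;
}
```

A zero-initialized `bignum` (`bignum a = {0};`) is a valid empty number here, because `realloc(NULL, size)` behaves like `malloc(size)`. In asm the inner loop would map naturally onto `add` / `adc` over the two arrays.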


Is there a way to store a number in its hex representation? I think it's kind of a stupid question because the representation is just how I look at it but the value is the same

ASCII hex is a serialization format for numbers; it uses two ASCII bytes per 8 bits (2 nibbles) of data. See How to convert a binary integer number to a hex string? for how you convert a binary integer to a hex string.

To do the reverse, converting from hex to a binary integer in a register, you can convert one digit at a time and do total = (total<<4) | digit, where digit is an integer in the 0..15 range. Given an ASCII character c, you can subtract '0' and branch on the result being > 9; if it is, the character is a letter, so use c - 'A' + 10 instead (after checking that it really is in the 'A'..'F' range, or 'a'..'f' if you accept lowercase).
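The digit conversion and the shift-and-OR accumulation can be sketched in C as follows. The names `hex_digit_value` and `parse_hex_u32` are hypothetical, and the sketch assumes ASCII input of at most 8 digits so the result fits in one 32-bit register.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 0..15 for a valid ASCII hex digit, or -1 otherwise. */
static int hex_digit_value(unsigned char c)
{
    unsigned d = (unsigned)(c - '0');
    if (d <= 9)
        return (int)d;                  /* '0'..'9' */
    d = (unsigned)((c | 0x20) - 'a');   /* fold 'A'..'F' onto 'a'..'f' */
    if (d <= 5)
        return (int)d + 10;
    return -1;
}

/* Parse s[0..len) as hex into *out, most significant digit first.
   Returns 0 on success, -1 on an invalid digit or a length that does
   not fit in 32 bits. */
static int parse_hex_u32(const char *s, size_t len, uint32_t *out)
{
    assert(s != NULL && out != NULL);
    if (len == 0 || len > 8)
        return -1;
    uint32_t total = 0;
    for (size_t i = 0; i < len; i++) {
        int digit = hex_digit_value((unsigned char)s[i]);
        if (digit < 0)
            return -1;
        total = (total << 4) | (uint32_t)digit;
    }
    *out = total;
    return 0;
}
```

The unsigned compare after the subtraction does the range check in one branch: anything below `'0'` wraps around to a large value and fails `d <= 9`. The same trick works in asm with `sub` / `cmp` / `ja`.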

For arbitrary-length hex input, you can start at the end of a buffer and convert 2 hex digits to a byte, store it in the buffer and decrement the pointer.

(If the input ends up being an odd number of hex digits, this is a problem because you want the start of your number to be aligned to a byte boundary. So if you know the length of the hex string, use that to decide whether to start by converting the first digit by itself or not. Or if you have a pointer to the end, you could read digits backwards.

Prefer explicit-length strings / buffers for handling ASCII digits, so you know how many digits you have in the first place, without having to loop through looking for a 0 byte as the terminator of a C implicit-length string.)
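A minimal C sketch of the arbitrary-length case, reading digits backwards from the end of an explicit-length string. It assumes a separate output array with the least significant byte first (the same order as the AF, 12, 7D example in the question) rather than converting in place. The names `nibble_value` and `hex_to_bytes_le` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 0..15 for a valid ASCII hex digit, or -1 otherwise. */
static int nibble_value(unsigned char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Convert hex[0..len) to bytes, least significant byte first.
   out must have room for (len + 1) / 2 bytes.
   Returns the number of bytes written, or 0 on invalid input. */
static size_t hex_to_bytes_le(const char *hex, size_t len, uint8_t *out)
{
    assert(hex != NULL && out != NULL);
    size_t n = 0;
    size_t i = len;
    while (i > 0) {
        int lo = nibble_value((unsigned char)hex[--i]);
        int hi = 0;   /* odd length: the top byte gets an implicit leading 0 */
        if (i > 0)
            hi = nibble_value((unsigned char)hex[--i]);
        if (lo < 0 || hi < 0)
            return 0;
        out[n++] = (uint8_t)((hi << 4) | lo);
    }
    return n;
}
```

Because the loop walks from the last digit, an odd digit count needs no special case up front: the leftover first digit simply becomes the low nibble of the final byte.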

Peter Cordes
  • This is kind of a comment on the first question (which got too long to post as a comment), answering 2nd question (the conceptual one about hex vs. numbers). That's one of the reasons why SO discourages posting 2 questions in 1. :/ – Peter Cordes May 23 '20 at 00:35
  • If I were to use an array instead, and the buffer size is bounded by 80, then an array representing a number could never exceed 80 bytes (right?), so I could statically allocate space for each number and have a pointer to it from each stack entry. – Ed_ May 23 '20 at 10:20
  • @sadElephent: You could statically allocate a linked list, too; that choice is orthogonal. But yes, if you're going to statically allocate with a fixed upper bound then a linked list makes even less sense. – Peter Cordes May 23 '20 at 15:55
  • So I actually took your suggestion and decided to use an array, but in a way that each byte in the array represents 1 digit, and I dynamically allocated it using malloc. So basically my stack is a bunch of array pointers, each of log_16(number) digits. I want to be able to perform a bitwise OR and AND on two numbers represented by these arrays (in reverse), say: [1,A] | [2,B], so the numbers are actually 1010 0001 | 1011 0010 = 1011 0011. How would I go about that? – Ed_ May 30 '20 at 05:15
  • @sadElephent: ok, so you decided to store 1 nibble per byte instead of packing your value into contiguous bits. Strange choice but ok. Bitwise AND and OR simply work the same way they would for whole bytes. Padding lines up with padding, values line up with values. As a bonus you can do more than 1 byte at a time because carry isn't needed. So you can go 4 bytes at a time with dword `or`, or even 16 bytes at a time with SSE2 `por`. i.e. just OR each byte from one array into the corresponding byte of the other array, as many bytes at a time as you want. – Peter Cordes May 30 '20 at 05:18
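The byte-wise OR described in the last comment could look like this in C. This is a sketch: the function name `or_bytes` is hypothetical, and it assumes both arrays have the same length (a shorter number would be padded with zero digits first).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* dst[i] |= src[i] for i in [0, len), 4 bytes at a time where possible.
   No carry crosses byte boundaries, so the chunk size is arbitrary. */
static void or_bytes(uint8_t *dst, const uint8_t *src, size_t len)
{
    assert(dst != NULL && src != NULL);
    size_t i = 0;
    for (; i + 4 <= len; i += 4) {
        uint32_t a, b;
        memcpy(&a, dst + i, 4);   /* memcpy sidesteps alignment and aliasing issues */
        memcpy(&b, src + i, 4);
        a |= b;
        memcpy(dst + i, &a, 4);
    }
    for (; i < len; i++)          /* leftover 0..3 bytes */
        dst[i] |= src[i];
}
```

AND works the same way with `&=` in place of `|=`. In asm the 4-byte step is a dword load, `or`, and store.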