Storage and String manipulation in x86 Assembly Language

Question

I've just picked up learning assembly language coding.

Q: Convert a character string that represents any signed integer to its 2’s complement value, with the result stored in consecutive locations of memory in little endian order.

For example, -1 = 0xFFFFFFFFFFFFFFFF, assuming 2's complement codewords are 64-bit. I've used the number -149 in my code, which should result in 0xffff ffff ffff ff6b.

            .data
S:  .string "-149"
Result:     .quad

            .text
            .globl main

main: 
    mov     S,%rax
    cmp     %rax,0
    jl      positive
    sub     %rax,%rax
    not     S
    add     S,%rax
    sub     $30,%rax
    not     %rax
    add     $1, %rax
    mov     %rax,Result

positive:
    sub     $30,%rax
    not     %rax
    add     $1,%rax 
    mov     %rax,Result

In GDB, the bytes stored for the string look like this:

(gdb) x/24xb &S
0x601038:   0x2d    0x31    0x34    0x39    0x00    0x00    0x00    0x00
0x601040:   0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00
0x601048:   0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00

If I wanted to do any computations on -149, I'd have to somehow access these locations in memory. How do I go about doing this?

If I know that the 4 is in the 10's place, I could multiply it by 10 to get 40, then add the 9, and similarly multiply the 1 by 100 to get 100 and add that as well.

How do I access them to do the computation?

It seems you haven't nailed the other part too. We can't answer debug questions here, please clarify what the exact problem is. The part about consecutive "locations" is effectively unclear: x86 are little endian and depending on the size, in bits, of the numbers that you have to handle, that request may just translate to a store into memory. — Margaret Bloom, Jun 23 '16 at 07:55
Thanks Margaret, specifically I'm wondering about how to apply logical/arithmetic operations to a .string stored signed integer value. — Egyptian_Coder, Jun 23 '16 at 22:31

score 3 · Accepted Answer · edited May 23 '17 at 12:08

How do I access them to do the computation?

A string is stored as consecutive characters in memory. If it's ASCII (not UTF-8), each character is a single byte.

So you can access them one at a time with byte loads/stores, like movzbl 2(%rsi), %eax to get the 3rd character, if rsi points to the start of the string.

Or, if %rdi points to the last character (the ones place in a decimal number), then imul $10, -1(%rdi), %ecx will set %cl to the second-last character times its place-value. (The upper bytes of %ecx will be garbage, because the 32-bit load also reads the bytes that follow; it's probably better to do a movzx load first and then a multiply. This does work, though, to get the low 8 bits correct.)
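As a rough illustration of the byte-at-a-time approach, here is a minimal sketch of a left-to-right conversion loop. It is not the asker's code or a definitive implementation. It assumes GNU as AT&T syntax on x86-64 Linux, a NUL-terminated ASCII string with an optional leading '-', and it does no overflow checking. The local labels (.Lnext, .Ldigit, .Ldone, .Lstore) are made up for this example.

```asm
        .data
S:      .string "-149"
Result: .quad   0                   # 8 bytes reserved for the converted value

        .text
        .globl  main
main:
        lea     S(%rip), %rsi       # rsi = pointer to the current character
        xor     %eax, %eax          # rax = running total (starts at 0)
        xor     %ecx, %ecx          # ecx = 1 if the number is negative
        movzbl  (%rsi), %edx        # zero-extending load of the first byte
        cmp     $0x2d, %dl          # 0x2d is ASCII '-'
        jne     .Ldigit             # no sign: treat this byte as a digit
        mov     $1, %ecx            # remember the sign
        inc     %rsi                # skip past the '-'
.Lnext:
        movzbl  (%rsi), %edx        # load the next character
.Ldigit:
        sub     $0x30, %edx         # ASCII '0'..'9' becomes 0..9
        cmp     $9, %edx
        ja      .Ldone              # unsigned compare: stops on NUL or any non-digit
        imul    $10, %rax, %rax     # total = total * 10
        add     %rdx, %rax          # total = total + digit
        inc     %rsi
        jmp     .Lnext
.Ldone:
        test    %ecx, %ecx
        jz      .Lstore
        neg     %rax                # two's complement negation (same as not, then add 1)
.Lstore:
        mov     %rax, Result(%rip)  # an 8-byte store is already little endian on x86
        xor     %eax, %eax          # return 0 from main
        ret
```

Multiplying the running total by 10 before adding each digit applies the place values without needing to know the string length up front. For "-149" the total goes 1, 14, 149, and the final neg leaves 0xffffffffffffff6b in Result; in GDB, x/8xb &Result should then show 0x6b followed by seven 0xff bytes.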

At the other end of the complexity spectrum, have a look at this SSE4.1 IPv4 dotted-quad string to 32bit integer converter. Specifically, the decimal place-value part after the shuffle, using pmaddubsw (_mm_maddubs_epi16) with a vector of [ ..., 100, 10, 1 ] to apply the place-value and one step of horizontal adding, then phaddw to horizontally add the up-to-three digits from each dotted quad.

Also see: How to implement atoi using SIMD?

See also the x86 tag wiki for lots of other links.

Yes: "If it's ASCII (not UTF-8)…" You can't take a specification on string values without requiring it to specify the character set and encoding. @Egyptian_Coder Send it back. — Tom Blodget, Jun 24 '16 at 17:18
@TomBlodget: In an asm homework assignment, you can assume ASCII. In this specific case, it's supposed to be a decimal number, which means it definitely is the ASCII subset of UTF-8. I wrote that sentence as a warning that the "normal" (easy) way of addressing strings assumes one byte per char, not as a suggestion that he should find out if he needs to actually handle UTF-8 in asm. — Peter Cordes, Jun 24 '16 at 17:38

score -1 · Answer 2 · answered Jun 23 '16 at 10:11

Well, I expect this will not even compile (for example cmp %rax,0 is not a valid combination in AT&T syntax, this looks like you would want Intel syntax).

And there are some things which don't make any sense, like not S ... what do you think that would do? If you would annotate it as byte ptr, it would invert the '<' character (actually why do you have '<' and '>' in the S string is confusing me too).

etc, etc...

So first try to compile it, then open it in a debugger, step instruction by instruction, and keep looking at the CPU registers, the memory, and the instruction reference guide until it makes sense. It may take some time, but probably not that long: a few days and you will get the hang of it.

`cmp %rax, 0` is a compare with an absolute addressing mode. `cmp` is available with a memory operand as the first or second operand, so that should assemble, but not to what the OP probably wanted. — Peter Cordes, Jun 23 '16 at 12:29
Please don't pollute the site with generic answers. Either provide a full one, comment the question or flag it as unanswerable. — Margaret Bloom, Jun 23 '16 at 12:39
