1

Why do I get this error?

Segmentation fault (core dumped)

Here is the assembly code:

.intel_syntax noprefix
    
.data

    message: .asciz "Hello World!\n"

.text

.global main

main:
    lea rdi, message
    call printf

    ret
Jester
  • 2
    You get a segfault because 1) you do not maintain stack aligned to 16 bytes and 2) you do not zero the AL register to indicate no vector registers used. Incidentally, the earlier revision did not have problem #1 which then also makes #2 irrelevant so that should have worked. – Jester Apr 17 '21 at 12:32

1 Answer

2

The problem

The x86-64 System V ABI requires the stack pointer to be 16-byte aligned immediately before every call instruction. Because call pushes an 8-byte return address, RSP is off by 8 on function entry. Subtracting 8 * n from RSP, where n is an odd number, therefore makes the stack 16-byte aligned again.

If you break this rule, the functions you call may crash: they assume RSP was aligned at the call site and may use instructions that require 16-byte-aligned memory operands, such as movdqa or movaps, on their stack frame.
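To make the arithmetic concrete, here is a minimal sketch (not part of the original post) that reserves 8 * 3 = 24 bytes, an odd multiple of 8, and tracks rsp modulo 16 in the comments. The label fmt, the value 42, and the use of the reserved space as a local variable are assumptions made up for illustration.

```asm
.intel_syntax noprefix

.data
    fmt: .asciz "%d\n"          /* hypothetical format string for this sketch */

.text
.global main

main:                           /* on entry: rsp % 16 == 8 (return address pushed) */
    sub rsp, 24                 /* 8 * 3, n = 3 is odd: rsp % 16 == 0 */
    mov dword ptr [rsp], 42     /* use part of the reserved space as a local */
    mov esi, [rsp]              /* second argument: the local value */
    lea rdi, [rip + fmt]        /* first argument: the format string */
    xor eax, eax                /* AL = 0: no vector registers carry arguments */
    call printf                 /* stack is 16-byte aligned at this call */
    add rsp, 24                 /* undo the reservation: rsp % 16 == 8 again */
    xor eax, eax                /* return 0 from main */
    ret
```

With an even n (for example sub rsp, 16) the stack would still be misaligned by 8 at the call, which is the situation in the question.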

Solution

ammarfaizi2@integral:/tmp/test_asm$ cat test.S
.intel_syntax noprefix
    
.data

    message: .asciz "Hello World!\n"

.text

.global main

main:
    sub rsp, 8
    xor eax, eax
    lea rdi, [rip + message]
    call printf
    add rsp, 8
    ret
ammarfaizi2@integral:/tmp/test_asm$ gcc test.S -o test
ammarfaizi2@integral:/tmp/test_asm$ ./test
Hello World!
ammarfaizi2@integral:/tmp/test_asm$ 

Recommendation

If calling a function is the last thing you do before ret, you can replace the call/ret pair with a tail call: jmp to the target function instead. On entry to main, RSP is 8 bytes away from 16-byte alignment, which is exactly the state the callee expects to see after a call, so no stack adjustment is needed. If you set up a stack frame earlier in the function, tear it down before the jmp.

To support PIE and PIC, use RIP-relative addressing to access static storage. PIE also lets the loader randomize the executable's load address (ASLR), which improves security. Many current Linux distributions configure GCC to build PIE executables by default.

This line from the solution accesses static storage with RIP-relative addressing:

lea rdi, [rip + message]

Execution

ammarfaizi2@integral:/tmp/test_asm$ cat test.S
.intel_syntax noprefix
    
.data

    message: .asciz "Hello World!\n"

.text

.global main

main:
    xor eax, eax
    lea rdi, [rip + message]
    jmp printf

ammarfaizi2@integral:/tmp/test_asm$ gcc test.S -o test
ammarfaizi2@integral:/tmp/test_asm$ ./test
Hello World!
ammarfaizi2@integral:/tmp/test_asm$ 

Edit

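Added xor eax, eax for safety: printf is variadic, and the ABI expects AL to hold an upper bound on the number of vector registers used for arguments (0 here). See: glibc scanf Segmentation faults when called from a function that doesn't align RSP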

Ammar Faizi
  • 1
    Nice point about the tail call. It may be worth adding that you changed the `lea` instruction to `lea rdi, [rip + message]` to allow the creation of a PIE executable (otherwise a relocation error would arise). Another solution would have been to instruct GCC to create an EX ELF with `--static` (PIE are SH ELFs with non-overridable symbols). – Margaret Bloom Apr 17 '21 at 14:06
  • @MargaretBloom oh right, I added the info about RIP relative addressing. – Ammar Faizi Apr 17 '21 at 14:10
  • 1
    It's still unsafe to leave AL unset. Older GCC versions compile variadic functions to use AL for a computed jump into a sequence of `movaps` stores. (Current GCC, and thus current builds of libc, just check 0 / non-0 to conditionally run all 8 or not, so the ABI violation of possibly having AL > 8 doesn't in practice cause problems.) Anyway, you should use `xor eax, eax`. Or use `puts` which isn't variadic. – Peter Cordes Apr 17 '21 at 21:25
  • 1
    Aligned stores to dump the possible XMM args for variadic functions are the usual reason for printf to fault (with AL!=0), but code-gen for some other functions does sometimes include 16-byte aligned load or store. e.g. [glibc scanf Segmentation faults when called from a function that doesn't align RSP](https://stackoverflow.com/q/51070716) – Peter Cordes Apr 17 '21 at 21:28
  • @PeterCordes is it ok if we use `xor al, al` instead of `xor eax, eax`? – Ammar Faizi Apr 18 '21 at 14:41
  • I think like, why should we clear 32-bit (actually zero extends to 64-bit), while the callee only needs 8-bit part of it? – Ammar Faizi Apr 18 '21 at 14:50
  • 1
    @AmmarFaizi: It's ok for correctness, but worse for efficiency, to modify just the low byte of the existing RAX value instead of using a zeroing idiom to just write a new value to the register. [What is the best way to set a register to zero in x86 assembly: xor, mov or and?](https://stackoverflow.com/a/33668295) / [Why doesn't GCC use partial registers?](https://stackoverflow.com/a/41574531). `xor al,al` isn't special-cased as a zeroing idiom on most CPUs, so `mov al,0` would actually be better on some. But still worse than `xor eax,eax`. – Peter Cordes Apr 18 '21 at 21:18
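As a sketch of the `puts` alternative mentioned in the comments above (not code from the original thread): `puts` is not variadic, so AL does not have to be set, but the stack still has to be aligned before the call. The trailing newline is dropped from the string here on the assumption that `puts` appends one itself, and the final `xor eax, eax` is only there so that `main` returns 0.

```asm
.intel_syntax noprefix

.data
    message: .asciz "Hello World!"  /* no trailing newline: puts adds one */

.text
.global main

main:
    sub rsp, 8                  /* rsp % 16 == 8 on entry, so this realigns it */
    lea rdi, [rip + message]    /* only argument: pointer to the string */
    call puts                   /* not variadic, AL is ignored */
    add rsp, 8                  /* restore rsp before returning */
    xor eax, eax                /* return 0 from main */
    ret
```

It can be built the same way as the answer's examples, with `gcc test.S -o test`.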