related: How to remove "noise" from GCC/clang assembly output? The .cfi
directives are not directly useful to you, and the program would work without them. (It's stack-unwind info needed for exception handling and backtraces, so -fomit-frame-pointer
can be enabled by default. And yes, gcc emits this even for C.)
As far as the number of asm source lines needed to produce a value Hello World program, obviously we want to use libc functions to do more work for us.
@Zwol's answer has the shortest implementation of your original C code.
Here's what you could do by hand, if you don't care about the exit status of your program, just that it prints your string.
# Hand-optimized asm, not compiler output
.globl main # necessary for the linker to see this symbol
main:
# main gets two args: argv and argc, so we know we can modify 8 bytes above our return address.
movl $.LC0, 4(%esp) # replace our first arg with the string
jmp puts # tail-call puts.
# you would normally put the string in .rodata, not leave it in .text where the linker will mix it with other functions.
.section .rodata
.LC0:
.asciz "Hello world" # asciz zero-terminates
The equivalent C (you just asked for the shortest Hello World, not one that had identical semantics):
int main(int argc, char **argv) {
return puts("Hello world");
}
Its exit status is implementation-defined but it definitely prints. puts(3)
returns "a non-negative number", which could be outside the 0..255 range, so we can't say anything about the program's exit status being 0 / non-zero in Linux (where the process's exit status is the low 8 bits of the integer passed to the exit_group()
system call (in this case by the CRT startup code that called main()).
Using JMP to implement the tail-call is a standard practice, and commonly used when a function doesn't need to do anything after another function returns. puts()
will eventually return to the function that called main()
, just like if puts() had returned to main() and then main() had returned. main()'s caller still has to deal with the args it put on the stack for main(), because they're still there (but modified, and we're allowed to do that).
gcc and clang don't generate code that modifies arg-passing space on the stack. It is perfectly safe and ABI-compliant, though: functions "own" their args on the stack, even if they were const
. If you call a function, you can't assume that the args you put on the stack are still there. To make another call with the same or similar args, you need to store them all again.
Also note that this calls puts()
with the same stack alignment that we had on entry to main()
, so again we're ABI-compliant in preserving the 16B alignment required by modern version of the x86-32 aka i386 System V ABI (used by Linux).
.string
zero-terminates strings, same as .asciz
, but I had to look it up to check. I'd recommend just using .ascii
or .asciz
to make sure you're clear on whether your data has a terminating byte or not. (You don't need one if you use it with explicit-length functions like write()
)
In the x86-64 System V ABI (and Windows), args are passed in registers. This makes tail-call optimization a lot easier, because you can rearrange args or pass more args (as long as you don't run out of registers). This makes compilers willing to do it in practice. (Because as I said, they currently don't like to generate code that modifies the incoming arg space on the stack, even though the ABI is clear that they're allowed to, and compiler generated functions do assume that callees clobber their stack args.)
clang or gcc -O3 will do this optimization for x86-64, as you can see on the Godbolt compiler explorer:
#include <stdio.h>
int main() { return puts("Hello World"); }
# clang -O3 output
main: # @main
movl $.L.str, %edi
jmp puts # TAILCALL
# Godbolt strips out comment-only lines and directives; there's actually a .section .rodata before this
.L.str:
.asciz "Hello World"
Static data addresses always fit in the low 31 bits of address-space, and executable don't need position-independent code, otherwise the mov
would be lea .LC0(%rip), %rdi
. (You'll get this from gcc if it was configured with --enable-default-pie
to make position-independent executables.)
How to load address of function or label into register in GNU Assembler
Hello World using 32-bit x86 Linux int 0x80
system calls directly, no libc
See Hello, world in assembly language with Linux system calls? My answer there was originally written for SO Docs, then moved here as a place to put it when SO Docs closed down. It didn't really belong here so I moved it to another question.
related: A Whirlwind Tutorial on Creating Really Teensy ELF Executables for Linux. The smallest binary file you can run that just makes an exit() system call. That is about minimizing the binary size, not the source size or even just the number of instructions that actually run.