
This is my assembly level code ...

section .text
global _start
_start: mov eax, 4
        mov ebx, 1
        mov ecx, mesg
        mov edx, size
        int 0x80
exit:   mov eax, 1
        int 0x80
section .data
mesg    db      'KingKong',0xa
size    equ     $-mesg

Output:

root@bt:~/Arena# nasm -f elf a.asm -o a.o
root@bt:~/Arena# ld -o out a.o
root@bt:~/Arena# ./out 
KingKong

My question is: what is global _start used for? I searched on Google and found that it is used to tell the starting point of my program. Why can't we just have the _start label alone tell where the program starts, as in the version below, which produces a warning from ld?

section .text
_start: mov eax, 4
        mov ebx, 1
        mov ecx, mesg
        mov edx, size
        int 0x80
exit:   mov eax, 1
        int 0x80
section .data
mesg    db      'KingKong',0xa
size    equ     $-mesg

root@bt:~/Arena# nasm -f elf a.asm
root@bt:~/Arena# ld -e _start -o out a.o
ld: warning: cannot find entry symbol _start; defaulting to 0000000008048080
root@bt:~/Arena# ld -o out a.o
ld: warning: cannot find entry symbol _start; defaulting to 0000000008048080
Peter Cordes
vikkyhacks

4 Answers

45

The global directive is NASM-specific syntax (other assemblers have equivalents, such as .global in GAS). It exports a symbol from your code: the symbol is marked as global in the symbol table of the generated object file (a.o), together with the location it points to. Here you mark _start as global, so the linker (ld) can look up that name and its value in the object file and record it as the entry point of the output executable. When you run the executable, execution starts at the address labeled _start.

If the global directive is missing for a symbol, that symbol is not exported from the object file (it stays local to it), so the linker will not consider it when looking for the entry symbol or resolving references from other files.

If you want to use an entry point name other than _start (which is the default), you can pass the -e option to ld, like:

ld -e my_entry_point -o out a.o
Sedat Kapanoglu
  • `_start` is just the one ld "knows" by default. `ld -o out a.o -e _main` would work. – Frank Kotler Jul 27 '13 at 14:59
  • @ssg thanks for your answer, though. I can clearly get your point that I can change _start to _main or whatever I want, but I cannot understand why I need the `global _start` line. Why can't the linker just search for _start in my program and set that as the start point for execution? Why use the `global` directive? – vikkyhacks Jul 27 '13 at 15:36
  • @vikkyhacks the reason is that symbols are not exported from the object code unless you specifically add a `global` directive for them. Otherwise they are all treated as local symbols, which the linker ignores when it looks for the entry point. – Sedat Kapanoglu Jul 27 '13 at 15:55
  • I'd suggest using `ld -e my_entry_point` in your example. It won't generally work to do that with a `main()` generated by a C compiler, so using `_main` is just going to be confusing to people that don't understand how all the pieces fit together yet. – Peter Cordes Apr 14 '16 at 03:44
5

A label is not global until you declare it to be global, so you have to use the global directive.

The linker needs a global "_start" label; if there is no global _start address, the linker will complain because it can't find one. You didn't declare _start as global, so it is not visible outside that module/object of code, and therefore not visible to the linker.

This is the opposite of C, where things are implied to be global unless you declare them to be local:

unsigned int hello;
int fun ( int a )
{
  return(a+1);
}

Here hello and fun are global, visible outside the object, but this

static unsigned int hello;
static int fun ( int a )
{
  return(a+1);
}

makes them local, not visible outside the object.

In assembly, these labels are all local:

_start:
hello:
fun:
more_fun:

These are now global, available to the linker and other objects:

global _start
_start:
global hello
hello:
...
old_timer
  • Understand that these directives are specific to the assembler, the program that assembles the assembly language into machine code. Assembly languages generally do not have standards, so each assembler can do its own thing; note "Intel format" vs "AT&T format" being an extreme example for the same instruction set. Likewise, some may want "global" and others may require ".global". So you are learning the nuances of the toolchain, not necessarily the instruction set. – old_timer Jul 27 '13 at 15:26
  • Well, that really makes it very hard to digest. We have the concept of local and global variables in C because of the functions that are used; are there scopes in assembly language? (Correct me if I am wrong, I have just started assembly.) Why can't the linker just search for _start in my program and set that as the start point for execution? What info does it lack to do that? – vikkyhacks Jul 27 '13 at 15:39
  • @vikkyhacks, I guess you can think of labels in assembly as "static" symbols in a C context, at least by default. That is, they're only usable at file/translation unit scope. Defining a label with `.global` makes the assembler export it (add it to the symbol table of the object) so that the linker can find it later for use in other translation units (or for program startup, in your case). – Carl Norum Jul 27 '13 at 16:09
  • local and global to C is relative to the context, particularly the object made from a single C file (or function). No different with assembly, the label/variable is relative to the context, the object made from that single source file. – old_timer Jul 27 '13 at 16:37
  • Just like in C or any other language, the linker only "searches" the object for labels/variables/functions defined as GLOBAL. It has nothing to do with the source language; in each particular language you have to define that label/variable/function/etc. as local to the object or context, or global for everyone to use (outside the object). – old_timer Jul 27 '13 at 16:40
  • The linker could search that object and find it, but that would be a violation of the linker's job. At best the linker could give you an error such as "_start found but not global". ld is open source, so you are welcome to add such a feature. – old_timer Jul 27 '13 at 16:41
5

_start is used by the default Binutils' ld linker script as the entry point

We can see the relevant part of that linker script with:

 ld -verbose a.o | grep ENTRY

which outputs:

ENTRY(_start)

The ELF file format (and, I suppose, other object formats) explicitly states the address at which the program will start running, through the e_entry header field.

ENTRY(_start) tells the linker to set that field to the address of the symbol _start when generating the ELF file from object files.

Then when the OS starts running the program (exec system call on Linux), it parses the ELF file, loads the executable code into memory, and sets the instruction pointer to the specified address.

The -e flag mentioned by Sedat overrides the default _start symbol.

You can also replace the entire default linker script with the -T <script> option, here is a concrete example that sets up some bare metal assembly stuff.

.global is an assembler directive that marks the symbol as global in the ELF file

The ELF file contains some metadata for every symbol, indicating its visibility.

The easiest way to observe this is with the nm tool.

For example in a Linux x86_64 GAS freestanding hello world:

main.S

.text
.global _start
_start:
asm_main_after_prologue:
    /* write */
    mov $1, %rax   /* syscall number */
    mov $1, %rdi   /* stdout */
    lea msg(%rip), %rsi  /* buffer */
    mov $len, %rdx /* len */
    syscall

    /* exit */
    mov $60, %rax   /* syscall number */
    mov $0, %rdi    /* exit status */
    syscall
msg:
    .ascii "hello\n"
    len = . - msg


compile and run:

gcc -ffreestanding -static -nostdlib -o main.out main.S
./main.out

nm gives:

00000000006000ac T __bss_start
00000000006000ac T _edata
00000000006000b0 T _end
0000000000400078 T _start
0000000000400078 t asm_main_after_prologue
0000000000000006 a len
00000000004000a6 t msg

and man nm tells us that:

If lowercase, the symbol is usually local; if uppercase, the symbol is global (external).

so we see that _start is visible externally (upper case T), but msg, which we didn't mark as .global, isn't (lower case t).

The linker then knows how to blow up if multiple global symbols with the same name are seen, or do smarter things if more exotic symbol types are seen.

If we don't mark _start as global, ld becomes sad and says:

cannot find entry symbol _start

0

_start is just a label that points to a memory address; the global directive makes that label visible to the linker. For ELF binaries, _start is the default symbol that the linker uses as the address where the program starts.

There is also main (or _main, or main_), which is known to the C language and is called by "startup code" that is "usually" linked in if you're using C.

Hope this helps.

Moksh