
I saw a snippet of code on CodeGolf that's intended as a compiler bomb, where main is declared as a huge array. I tried the following (non-bomb) version:

```c
int main[1] = { 0 };
```

It seems to compile fine under Clang and with only a warning under GCC:

```
warning: 'main' is usually a function [-Wmain]
```

The resulting binary is, of course, garbage.

But why does it compile at all? Is it even allowed by the C specification? The section that I think is relevant says:

5.1.2.2.1 Program startup

The function called at program startup is named main. The implementation declares no prototype for this function. It shall be defined with a return type of int and with no parameters [...] or with two parameters [...] or in some other implementation-defined manner.

Does "some other implementation-defined manner" include a global array? (It seems to me that the spec still refers to a function.)

If not, is it a compiler extension? Or is it a feature of the toolchains that serves some other purpose, which they decided to make available through the frontend?

Theodoros Chatzigiannakis
  • It **doesn't** compile. ISO C forbids zero sized arrays. – Jens Jan 13 '16 at 11:00
  • It's not allowed by the C specification. Compilers often implement stuff not covered by the specification. – M.M Jan 13 '16 at 11:06
  • Related question: [How can a program with a global variable called main instead of a main function work?](http://stackoverflow.com/q/32851184/1708801). I think also inspired by a codegolf question. – Shafik Yaghmour Oct 06 '16 at 20:21
  • @M.M Especially in the case of [Malbolge](http://web.archive.org/web/20000815230017/http:/www.mines.edu/students/b/bolmstea/malbolge/) – MilkyWay90 Jun 21 '19 at 21:56

6 Answers


It's because C allows for a "non-hosted" or freestanding environment, which doesn't require a main function. This means that the name main is free for other uses, which is why the language as such allows such declarations. Most compilers are designed to support both kinds of environment (the difference is mostly in how linking is done), and therefore they don't reject constructs that would be illegal in a hosted environment.

The section you quote applies to hosted environments; the corresponding text for freestanding environments (5.1.2.1) is:

In a freestanding environment (in which C program execution may take place without any benefit of an operating system), the name and type of the function called at program startup are implementation-defined. Any library facilities available to a freestanding program, other than the minimal set required by clause 4, are implementation-defined.

If you then link it as usual, things go wrong, because the linker normally knows little about the nature of the symbols (their type, or even whether a symbol is a function or a variable). In this case the linker will happily resolve calls to main to the variable named main. If the symbol is not found at all, the result is a link error.

If you link it as usual, you are effectively using the compiler in hosted mode, and then not defining main as required is undefined behavior per Annex J.2:

the behavior is undefined in the following circumstances:

  • ...
  • program in a hosted environment does not define a function named main using one of the specified forms (5.1.2.2.1)

The purpose of the freestanding option is to allow C to be used in environments where (for example) the standard library or CRT initialization is not available. This means that the code that runs before main is called (the CRT initialization that sets up the C runtime) might not be provided, and you would be expected to provide it yourself (and you may decide to have a main or not).

skyking
  • This compiles and links fine (well, with a warning) with gcc 4.9.3 on cygwin: `int f(int argc,char **argv) { return 0; } char *main = (char *)f;` – Peter - Reinstate Monica Jan 13 '16 at 11:05
  • @PeterA.Schneider But if it runs OK it's just pure luck. The CRT-init will try to call `main` which is where the pointer is stored and not what it points at. – skyking Jan 13 '16 at 11:11
  • It links but segfaults. Btw, I don't think the question has much to do with "freestanding". For example, the following compiles and links (to a dll) in VS13: `namespace Main_abused { class Program { int Main = 0; } } `. It's rather that main (and Main in C#) are not keywords, and the C linkers are dumb, err, simple. – Peter - Reinstate Monica Jan 13 '16 at 11:14
  • @PeterA.Schneider I disagree, a program with `main` defined in a way different from what's required by the standard (or specified by the implementation) is malformed. – skyking Jan 13 '16 at 11:25
  • This isn't really accurate. The hosted section of C99/C11 has a badly worded sentence, "or in some other implementation-defined manner", which is completely unclear. So nobody really knows which forms of main are allowed... [Discussed in detail here](http://stackoverflow.com/questions/204476/what-should-main-return-in-c-and-c/31263079#31263079). – Lundin May 13 '16 at 09:26
  • I really don't see the ambiguity in that sentence (except possibly whether you need to document the lack of additional forms). You mention that the rationale is not normative, and neither is section 5.1.2.2.3 (regarding the signature of main), so the fact that they point in different directions does not imply an ambiguity. – skyking May 13 '16 at 10:08

If you are interested in how to create a program in a main array, see https://jroweboy.github.io/c/asm/2015/01/26/when-is-main-not-a-function.html. The example source there just contains a char (and later int) array called main, which is filled with machine instructions.

The main steps and problems were:

  • Obtain the machine instructions of a main function from a gdb memory dump and copy them into the array
  • Make the data in main[] executable by declaring it const (data is apparently either writable or executable, but not both)
  • Last detail: change an address so that it refers to the actual string data.
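As a sketch of the first step, assuming a little-endian target such as x86-64, the hypothetical helper `pack_le32` below turns a raw byte dump into the 32-bit words used as initializers. It is not code from the linked article. For example, the bytes `55 48 89 e5` (the common `push %rbp; mov %rsp,%rbp` function prologue) pack into -443987883, and `b8 01 00 00` packs into 440, which are the first two elements of the array below.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (assumes a little-endian target): pack a raw
   machine-code dump into 32-bit words, the form taken by the int
   initializers of main[]. */
void pack_le32(const unsigned char *bytes, size_t n, int32_t *out)
{
    assert(n % 4 == 0); /* caller pads the dump to whole words */
    for (size_t i = 0; i < n / 4; i++) {
        const unsigned char *b = bytes + 4 * i;
        /* Least significant byte first: b[0] lands in bits 0..7. */
        uint32_t u = (uint32_t)b[0]
                   | (uint32_t)b[1] << 8
                   | (uint32_t)b[2] << 16
                   | (uint32_t)b[3] << 24;
        /* Reinterpret the bit pattern as a signed word. int32_t is
           two's complement, so 0xE5894855 reads back as -443987883. */
        memcpy(&out[i], &u, sizeof u);
    }
}
```

If a dump's length is not a multiple of four, it can be padded (for instance with `0x90` no-op bytes) before packing. Whether a `const` array actually ends up in executable memory depends on the toolchain; recent GNU ld releases reportedly keep read-only data out of the executable segment by default, so the trick may not work there.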

The resulting C code is just

```c
const int main[] = {
    -443987883, 440, 113408, -1922629632,
    4149, 899584, 84869120, 15544,
    266023168, 1818576901, 1461743468, 1684828783,
    -1017312735
};
```

but results in an executable program on a 64-bit PC:

```
$ gcc -Wall final_array.c -o sixth
final_array.c:1:11: warning: ‘main’ is usually a function [-Wmain]
 const int main[] = {
           ^
$ ./sixth 
Hello World!
```
Peter - Reinstate Monica
tymmej

The problem is that main is not a reserved identifier. The C standard only says that in a hosted environment the function called at program startup is named main. But nothing in the standard prevents you from abusing the same identifier for other sinister purposes.

GCC gives you a smug warning "main is usually a function", hinting that the use of the identifier main for other unrelated purposes isn't a brilliant idea.


Silly example:

```c
#include <stdio.h>

int main (void)
{
  int main = 5;
  main:

  printf("%d\n", main);
  main--;

  if(main)
  {
    goto main;
  }
  else
  {
    int main (void);
    main();
  }
}
```

This program will repeatedly print the numbers 5,4,3,2,1 until it gets a stack overflow and crashes (don't try this at home). Unfortunately, the above program is a strictly conforming C program and the compiler can't stop you from writing it.
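A tamer, non-recursive sketch of the same point follows; the names `struct counter` and `sum_down` are made up for illustration. C keeps separate name spaces for labels, struct members, and ordinary identifiers, and a parameter's scope is only the function body, so `main` can play all of these roles without touching the program's actual entry point.

```c
#include <assert.h>

/* A struct member named main: members live in their own name space. */
struct counter {
    int main;
};

/* Hypothetical example: sum main + (main - 1) + ... + 1, using a
   parameter and a label that are both named main. */
int sum_down(int main)
{
    assert(main >= 0);
    int total = 0;
main:                 /* labels have their own name space too */
    if (main > 0) {
        total += main;
        main--;
        goto main;
    }
    return total;
}
```

Calling `sum_down(5)` gives 15, and `struct counter c = { 3 };` makes `c.main` equal 3; neither affects the real `main` function.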

Lundin

After compilation, main is just another symbol in an object file, like many others (global functions, global variables, etc.).

The linker will link the symbol main regardless of its type. Indeed, the linker cannot see the C type of the symbol at all (it can see that the symbol isn't in the .text section, but it doesn't care ;))

With gcc, the default entry point is _start, which in turn calls main() after preparing the runtime environment. So it will jump to the address of the integer array, which will usually result in an illegal instruction, a segfault, or some other bad behaviour.
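To make the call chain concrete, here is a toy model in portable C. It is an illustration under stated assumptions, not how glibc actually implements `_start`, and `toy_start` and `demo_entry` are hypothetical names. The point is that the startup code holds only an address and calls it; it has no way to check whether that address belongs to a function or to an array.

```c
#include <assert.h>
#include <stddef.h>

/* The startup code only knows "call this address with argc/argv". */
typedef int (*entry_fn)(int argc, char **argv);

/* Hypothetical toy model of C runtime startup (NOT glibc's _start). */
int toy_start(entry_fn entry, int argc, char **argv)
{
    assert(entry != NULL);
    /* A real startup routine would typically set up the environment
       and stdio first, and pass main's result to exit() afterwards. */
    return entry(argc, argv);
}

/* Stand-in for a user's main: reports how many arguments it received. */
int demo_entry(int argc, char **argv)
{
    (void)argv;
    return argc;
}
```

With `char *args[] = { "prog", "x", NULL };`, `toy_start(demo_entry, 2, args)` returns 2. If the symbol bound to the entry point named an array instead, the same call would jump into the array's bytes.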

This all of course has nothing to do with the C-standard.

Ctx

It only compiles because you don't use the proper options (and it works because linkers sometimes care only about the names of symbols, not their types).

```
$ gcc -std=c89 -pedantic -Wall x.c
x.c:1:5: warning: ISO C forbids zero-size array ‘main’ [-Wpedantic]
 int main[0];
     ^
x.c:1:5: warning: ‘main’ is usually a function [-Wmain]
```
Jens
  • It still compiles and links. The only difference is that it warns you that `main` is usually a function (then it continues and links anyway). – skyking Jan 13 '16 at 11:14
  • @skyking You want the compile/link to fail? Add `-Werror` then. – Jens Jan 13 '16 at 12:22
  • But then (other) valid C programs would fail to compile as well. – skyking Jan 13 '16 at 12:33
  • @skyking Then add `-Wno-*` for the warnings you choose to accept. More often than not, warnings are easy to fix, and if they aren't, something is wrong with the code, IMNSHO. I have used `-Werror` for years now and it has proven valuable. New warnings are impossible to miss and must be fixed to continue. – Jens Jan 13 '16 at 12:45
  • I agree that `-Werror` and enabling warnings is a good idea, but that doesn't contradict the fact that doing so will cause the compiler to fail to compile valid C programs. – skyking Jan 13 '16 at 12:46
  • @skyking Can you give an example of a valid C program that fails to compile? The GCC developers might want to have a look at it. – Jens Jan 08 '19 at 00:40
  • The whole idea of `-Werror` is to fail compilation if a warning is issued. Warnings can be issued on valid C programs. I'm quite sure the GCC developers don't want to have a look at that. – skyking Jan 08 '19 at 22:22
```c
const int main[1] = { 0xc3c3c3c3 };
```

This compiles and runs on x86_64, and does nothing but return :D
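Why it just returns: 0xC3 is the single-byte x86 `ret` instruction, and since all four bytes of the word are 0xC3, the first byte fetched at the array's address is a `ret` regardless of byte order. The sketch below only inspects the array's bytes and never executes them; `ret_word` and `object_bytes` are hypothetical names, and it assumes `uint32_t` is available.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Every byte is 0xC3 (x86 `ret`), so endianness does not matter. */
const uint32_t ret_word[1] = { 0xc3c3c3c3u };

/* Copy an object's representation into a byte buffer: these are the
   bytes a CPU would fetch if it jumped to the object's address.
   Returns the number of bytes copied. */
size_t object_bytes(const void *obj, size_t size,
                    unsigned char *out, size_t out_len)
{
    assert(obj != NULL && out != NULL);
    size_t n = size < out_len ? size : out_len;
    memcpy(out, obj, n);
    return n;
}
```

Dumping `ret_word` into a 4-byte buffer yields `c3 c3 c3 c3`, four `ret` instructions in a row.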

Zibri