
In C, when a function returns a pointer to one of its local (on-the-stack) variables, the calling function gets null returned instead. Why does that happen?

I can do this in C on my hardware

#include <stdio.h>

void A() {
    int A = 5;
}

void B() {
    // B will be 5 even when uninitialised due to the B stack frame using
    // the old memory layout of A
    int B;
    printf("%d\n", B);
}

int main() {
    A();
    B();
}

This works because the stack frame memory doesn't get reset, so B's frame overlays A's old record on the stack.

However I can't do

int* C() {
    int C = 10;
    return &C;
}

int main() {
    // D will be null ?
    int* D = C();
}

I know I shouldn't write code like this: it's UB, it behaves differently on different hardware, compilers could optimize it in ways that change the behaviour of the example, and the value would get clobbered by the next function call in this example anyway.

But I was wondering why specifically D is null when compiled with GCC, and why I get a segmentation fault if I try to access that memory address. Shouldn't the bits still be there?

Is it the compiler doing this?

Atrox1449
  • This is all undefined behaviour. Your program does not comply with the rules of the C language, therefore none of the guarantees in said rules apply. – M.M Jun 04 '20 at 20:59
  • My guess is - the compiler was doing you a favor by making your program crash sooner rather than later. I've just tried the scenario in MSVC 2019 - got a warning and the value of D was not NULL. The compiler sees you're returning a local address and knows how wrong that is. If you returned a local address via a more subtle mechanism, you might've got the actual pointer to the no longer valid stack location. – Seva Alekseyev Jun 04 '20 at 21:02
  • If you have compiler optimizations active, then most of your assumptions will likely be wrong. For example, any good optimizer will optimize the line `int A = 5;` away, as the result is not used. Also, the compiler is allowed to treat any code paths which cause undefined behavior as unreachable, which means that it is also allowed to simply optimize them away. I suggest you read [this answer by Microsoft Blogger Raymond Chen](https://stackoverflow.com/a/9452284) for more information. – Andreas Wenzel Jun 04 '20 at 21:08

2 Answers


GCC sees the undefined behaviour (UB) visible at compile time and decides to just return NULL on purpose. This is good: noisy failure right away on first use of a value is easier to debug. Returning NULL was a new feature somewhere around GCC5; as @P__J__'s answer shows on Godbolt, GCC4.9 prints non-null stack addresses.

Other compilers may behave differently, but any decent compiler will warn about this error. See also "What Every C Programmer Should Know About Undefined Behavior".

Or with optimization disabled, you could hide the UB from the compiler by going through a tmp variable, like int *p = &C; return p; because gcc -O0 doesn't optimize across statements. (Or with optimization enabled, make that pointer variable volatile to launder the value through it, hiding the source of the pointer value from the optimizer.)

#include <stdio.h>

int* C() {
    int C = 10;
    int *volatile p = &C;    // volatile pointer to plain int
    return p;                // still UB, but hidden from the compiler
}

int main()
{
    int* D = C();
    printf("%p\n", (void *)D);
    if (D){
        printf("%#x\n", (unsigned)*D);   // cast because %x expects an unsigned int
    }
}

Compiling and running on the Godbolt compiler explorer, with gcc10.1 -O3 for x86-64:

0x7ffcdbf188e4
0x7ffc

Interestingly, the dead store to int C was optimized away, although the variable does still have an address. Its address is taken, but the variable holding that address doesn't escape the function until int C goes out of scope, at the same moment the address is returned. Thus no well-defined access to the value 10 is possible, and it is valid for the compiler to make this optimization. Making int C volatile as well would give us the value.

The asm for C() is:

C:
        lea     rax, [rsp-12]            # address in the red-zone, below RSP
        mov     QWORD PTR [rsp-8], rax   # store to a volatile local var, also in the red zone
        mov     rax, QWORD PTR [rsp-8]   # reload it as return value
        ret

The version that actually runs is inlined into main and behaves similarly. It's loading some garbage value from the callstack that was left there, probably the top half of an address. (x86-64's 64-bit addresses only have 48 significant bits. The low half of the canonical range always has 16 leading zero bits).

But it's memory that wasn't written by main, so perhaps an address used by some function that ran before main.


// B will be 5 even when uninitialised due to the B stack frame using
// the old memory layout of A
int B;

Nothing about that is guaranteed. It's just luck that that happens to work out when optimization is disabled. With a normal level of optimization like -O2, reading an uninitialized variable might just read as 0 if the compiler can see that at compile time. Definitely no need for it to load from the stack.

And the other function would have optimized away a dead store.

GCC also warns for use-uninitialized.

Peter Cordes

This is undefined behaviour (UB), but many modern compilers, when they detect a function returning a reference to an automatic storage variable, return NULL instead as a precaution (for example, newer versions of gcc).

example here: https://godbolt.org/z/H-zU4C

0___________
  • Returning the address of an automatic variable wouldn't invoke Undefined Behavior if the caller made no attempt to do anything with the pointer thus returned. The pointer's value would be Indeterminate when control returns to the caller, but since the pointer would not have become Indeterminate until after control reached the `return` statement, the `return` statement itself would not invoke UB, and if the function were invoked without using the return value, there would be no UB in the caller either. – supercat Jun 07 '20 at 17:31
  • It is quite obvious that if you do not use this pointer then there is no UB. That case is not interesting when we analyse the code. NULL pointers are OK unless we dereference them, etc. Typical nitpick. – 0___________ Jun 07 '20 at 17:35