Malloc altering behavior of uninitialized variable in separate function?

Question

This is a question for my Programming Langs Concepts/Implementation class. Given the following C code snippet:

void foo()  
{ 
    int i; 
    printf("%d ", i++); 
} 
void main()  
{ 
    int j; 
    for (j = 1; j <= 10; j++)  
        foo(); 
}

The local variable i in foo is never initialized but behaves similarly to a static variable on most systems, meaning the program will print 0 1 2 3 4 5 6 7 8 9. I understand why it does this (the memory location of i never changes), but the homework asks me to modify the code (without changing foo) to alter this behavior. I've come up with a solution that works and makes the program print ten 0's, but I don't know if it's the "right" solution and, to be honest, I don't exactly know why it works.

Here is my solution:

void main()  
{ 
    int j; 
    void* some_ptr = NULL;
    for (j = 1; j <= 10; j++)
    {
        some_ptr = malloc(sizeof(void*));
        foo();
        free(some_ptr);
    }
}

My original thought process was that i wasn't changing locations because there was no other memory manipulation happening around the calls of foo, so allocating a variable should disrupt that. But since some_ptr is allocated on the heap and i is on the stack, shouldn't the allocation of some_ptr have no effect on i? My thought is that the compiler is playing some games with the optimization of that subroutine call. Could anyone clarify?

What was the exact question you were asked? There is no way that `i` in `foo()` can be initialised from elsewhere except by some kind of fudge that makes it appear that the undefined behaviour behaves like you *think it should have*. In C, `i` will not be initialised on *any* systems, certainly not *most*. — Weather Vane, Apr 12 '16 at 00:30
@WeatherVane The exact wording of the question was: Consider the following C program: [code block as above] Local variable i in subroutine foo is never initialized. On many systems, however, variable i appears to “remember” its value between the calls to foo, and the program will print 0 1 2 3 4 5 6 7 8 9. (a) Suggest an explanation for this behavior. (b) Change the code above (without modifying function foo) to alter this behavior. — zwaller, Apr 12 '16 at 00:37
My test does not print `0` thru `9`. It's undefined behaviour. So all you can do is to change it to another undefined behaviour. It has nothing to do with `malloc` and everything to do with overwriting the stack location where `i` is stored when `foo` is called. — Weather Vane, Apr 12 '16 at 00:39
See [What should `main()` return in C and C++](http://stackoverflow.com/questions/204476/). — Jonathan Leffler, Apr 12 '16 at 00:42
That's what I was afraid of, it's not a terribly well conceived question for a homework. As of the undefined behavior I've only tested it on my machine with gcc and it did print 0-9. Anyways, thank you — zwaller, Apr 12 '16 at 00:43
There lies the danger - your lovely project prints `0..9` every time, until you demonstrate it on your funders' computer. Then it will print `14698336 14698337 14698338 14698339 14698340 14698341 14698342 14698343 14698344 14698345`. How rash of your professor to say otherwise. — Weather Vane, Apr 12 '16 at 00:48

wallyk · Accepted Answer · 2016-04-12T01:01:45.197

score 3

There cannot be a "right" solution. But there can be a class of solutions which work for a particular CPU architecture, ABI, compiler, and compiler options.

Changing the code to something like this overwrites stack memory beyond main's own frame, in the slot that foo uses for i, which should have the targeted effect in many, if not most, environments.

void foo()  
{ 
    int i; 
    printf("%d ", i++); 
} 
void main()  
{ 
    int j;
    int a [2];

    for (j = 1; j <= 10; j++)
    {
        foo();
        a [-5] = j * 100;
    }
}

Output (gcc x64 on Linux):

0 100 200 300 400 500 600 700 800 900

The index -5 is the number of words of stack used for overhead and variables spanning the two functions: the return address, the saved stack link value, and so on. The stack likely looks like this during a call to foo(), with i sitting in the slot that main() later overwrites through a[-5]:

i
saved stack link
return address
main's j
(must be something else)
main's a[]

I guessed -5 on the second try. -4 was my first guess.
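Instead of guessing the index, the distance can be estimated by comparing addresses. The following is only a sketch: `words_between` and `probe_offset` are hypothetical helper names that do not appear in the answer, the code assumes a flat address space in which `intptr_t` holds a byte address, and the probe's frame need not match the layout of foo(), so the result is a starting point rather than the exact index.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Distance from `from` to `to` in int-sized words, computed on integer
 * addresses. Subtracting the pointers directly would be undefined for
 * unrelated objects; the integer conversion is implementation-defined. */
ptrdiff_t words_between(const int *from, const int *to)
{
    assert(from != NULL && to != NULL);
    intptr_t a = (intptr_t)from;
    intptr_t b = (intptr_t)to;
    return (ptrdiff_t)((b - a) / (intptr_t)sizeof(int));
}

/* Hypothetical probe: reports where one of its own locals sits relative
 * to `base`, which is meant to be an array in the caller's frame. */
ptrdiff_t probe_offset(const int *base)
{
    volatile int local = 0;   /* volatile keeps the local in memory */
    /* The pointer is only converted to a number, never dereferenced. */
    return words_between(base, (const int *)&local);
}
```

Calling `probe_offset(a)` from main() with the answer's array would be expected to give a small negative value on a downward-growing stack, but nothing guarantees that, and the value can change with the compiler and its options.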

edited Apr 12 '16 at 01:01

answered Apr 12 '16 at 00:46


This is a very good answer, I feel a bit better knowing that the "right" answer is pretty much up to interpretation. Very cool what you did with the array, by the way. – zwaller Apr 12 '16 at 00:58
What the heck is portable undefined behaviour? `a [-5]` is also undefined behaviour. Affecting one UB by another, can only teach OP how not to code. – Weather Vane Apr 12 '16 at 01:08
@WeatherVane: This question is not about *proper coding methodologies*, but more about *How can something like this happen?* It can happen by an errant pointer use, array bounds error, or corrupting the stack. Since so many implementations use the stack the same way, this technique is somewhat *portable* though I did not say that. Suppose you wanted to write a virus: what kind of *non undefined behavior* would you use? :-) – wallyk Apr 12 '16 at 01:24
True, you didn't use the word "portable" but that is what "should affect many, if not most, environments" means. – Weather Vane Apr 12 '16 at 01:34

score 2 · Answer 2 · answered Apr 12 '16 at 00:51

When you call foo() from main(), the (uninitialized) variable i is allocated at a memory address. In the original code, it so happens that it is zero (on your machine, with your compiler, and your chosen compilation options, your environment settings, and given the current phase of the moon — it might change when any of these, or a myriad other factors, changes).

By calling another function before calling foo(), you allow the other function to overwrite the memory location that foo() will use for i with a different value. It isn't guaranteed to change; you could, by bad luck, replace the zero with another zero.

You could perhaps use another function:

static void bar(void)
{
    int j;
    for (j = 10; j < 20; j++)
        printf("%d\n", j);
}

Calling that before calling foo() will likely change the value in i. Calling malloc() changes things too, and calling pretty much any function will probably change it.

However, it must be (re)emphasized that the original code is laden with undefined behaviour, and calling other functions doesn't make it any less undefined. Anything can happen and it is valid.
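The overwrite can be modelled in well-defined C by replacing the real stack with an ordinary array. This is a toy model under stated assumptions, not what any compiler actually emits: `stack_mem`, `sp`, `model_foo` and `model_bar` are invented names, and the single shared slot stands in for the location of i.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a stack: an array of int slots plus an index. */
enum { STACK_WORDS = 16 };
static int stack_mem[STACK_WORDS];   /* static storage, so it starts zeroed */
static size_t sp = STACK_WORDS;      /* grows downward, as on x86-64 */

/* Models foo(): claims one slot for "i", reads it without writing it
 * first, increments it, and releases the slot without clearing it. */
int model_foo(void)
{
    assert(sp > 0);
    sp -= 1;                       /* "allocate" i */
    int seen = stack_mem[sp]++;    /* like printf("%d ", i++) */
    sp += 1;                       /* release; the stale value remains */
    return seen;
}

/* Models an intervening call whose own local lands on the same slot. */
void model_bar(int value)
{
    assert(sp > 0);
    sp -= 1;
    stack_mem[sp] = value;         /* like int j = value; */
    sp += 1;
}
```

In this model, repeated calls to `model_foo()` count upward from zero, while calling `model_bar(5)` before each one makes every call see 5, which matches the "all 5's" result reported in the comment below.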

This makes a lot of sense. I went with the same idea but instead of the loop in `bar` I only had `int j = 5;`. The program then printed all 5's. — zwaller, Apr 12 '16 at 01:13

score 1 · Answer 3 · answered Apr 12 '16 at 00:33

The variable i in foo is simply uninitialized, and an uninitialized automatic variable has an indeterminate value upon entering the block. That you saw it print particular values is entirely a coincidence, and to write standard-conforming C you should never rely on such behavior. Always initialize automatic variables before using them.

From C11 6.2.4p6:

For such an object that does not have a variable length array type, its lifetime extends from entry into the block with which it is associated until execution of that block ends in any way. (Entering an enclosed block or calling a function suspends, but does not end, execution of the current block.) If the block is entered recursively, a new instance of the object is created each time. The initial value of the object is indeterminate. If an initialization is specified for the object, it is performed each time the declaration or compound literal is reached in the execution of the block; otherwise, the value becomes indeterminate each time the declaration is reached.
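For contrast, here is a minimal sketch of the two well-defined ways to write such a counter, depending on whether the value is meant to persist. The names `next_static`, `next_auto` and `check_counters` are illustrative only and do not come from the question.

```c
#include <assert.h>

/* Static storage duration: initialized to 0 once and kept across calls,
 * so this is the defined way to get the "remembering" behavior. */
int next_static(void)
{
    static int i;        /* implicitly zero-initialized */
    return i++;
}

/* Automatic storage with an initializer: the initialization runs every
 * time the declaration is reached, so each call starts from 0. */
int next_auto(void)
{
    int i = 0;
    return i++;
}

/* Documents the guaranteed results of the two functions above. */
void check_counters(void)
{
    int first = next_static();
    assert(next_static() == first + 1);   /* the value persisted */
    assert(next_auto() == 0);
    assert(next_auto() == 0);             /* always starts over */
}
```

Unlike the code in the question, both functions give the same results on every conforming implementation and at every optimization level.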

The question wasn't whether the code conforms to standard C or not, and by all accounts, it's terrible code. The class I am taking is a precursor to compiler design and the fact that the above code is entirely legal has to be dealt with when designing a compiler. I understand good programming standards and don't appreciate the patronizing response. — zwaller, Apr 12 '16 at 00:41
I find it curious that if a "goto" jumps from a point after a variable is used to a point before it's declared, and a second "goto" skips its declaration, the value remains defined, but if the declaration is encountered in execution order, the value becomes indeterminate at that time. I wonder if that was intentional or an oversight? — supercat, Apr 13 '16 at 18:53

score 1 · Answer 4 · answered Apr 12 '16 at 00:57

The reason the uninitialized value seems to keep its value from past calls is that it is on the stack and the stack pointer happens to have the same value every time the function is called.

The reason your code might be changing the value is that you started calling other functions: malloc and free. Their internal stack variables are using the same location as i in foo().
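The claim that the local ends up at the same address on every call can be checked without reading any uninitialized memory. This is a sketch with hypothetical names (`local_slot`, `same_slot_each_call`); it assumes that converting a pointer to `uintptr_t` yields a comparable address, and its result depends on the compiler, its options, and any sanitizers in use.

```c
#include <assert.h>
#include <stdint.h>

/* Returns the address of one of its own locals as an integer.
 * Only the number is used afterwards; the pointer is never dereferenced. */
uintptr_t local_slot(void)
{
    volatile int probe = 0;          /* volatile keeps it in memory */
    return (uintptr_t)(void *)&probe;
}

/* Returns 1 if `calls` back-to-back calls all reported the same address
 * for the local, and 0 as soon as one call reports a different address. */
int same_slot_each_call(int calls)
{
    assert(calls > 0);
    uintptr_t first = local_slot();
    for (int k = 1; k < calls; k++) {
        if (local_slot() != first)
            return 0;
    }
    return 1;
}
```

A result of 1 from `same_slot_each_call(10)` would be consistent with the explanation above, and a call to another function between the probes leaves the address the same while possibly changing what is stored there.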

As for optimization, small programs like this are in danger of disappearing entirely. GCC or Clang might decide that, since using an uninitialized variable is undefined behavior, the compiler is within its rights to remove the code completely. Or it might keep i in a register set to zero, conclude that every printf call outputs zero, and then reduce your entire program to a single puts("0000000000") call.
