C++ default initialization doesn't zero out variables with automatic storage duration, so why the special treatment for variables with static storage duration?

Was this something defined by C that C++ simply has to stay compatible with? If so, why did C decide to do zero-initialization?

If a file-scope static variable is given an initializer, it is zero-initialized first and then constant- or dynamic-initialized again. Isn't that redundant? For example, the following code is from cppreference: http://en.cppreference.com/w/cpp/language/zero_initialization

#include <string>

double f[3]; // zero-initialized to three 0.0's
int* p;   // zero-initialized to null pointer value
std::string s; // zero-initialized to indeterminate value
               // then default-initialized to ""
int main(int argc, char* argv[])
{
    static int n = argc; // zero-initialized to 0
                         // then copy-initialized to argc
    delete p; // safe to delete a null pointer
}

In this case, why can't n be initialized to argc directly?

EDIT: Part of this question has been answered by the question here: Static variable initialization? But I don't think it's a duplicate, because the answers to the other question didn't address my second question, i.e. why the initialization has two stages. Besides, the title of the other post doesn't really say what exactly the question is.

swang
  • See also http://www.youtube.com/watch?v=48kP_Ssg2eY it's a video from a D conference, but about C++ and covering this issue. – johannes Sep 09 '14 at 11:40
  • +1 on the non-duplicate. We are trying to figure out why we are catching a Valgrind finding for a variable declared at file scope with an initial value of 0 or false. The double-initialization you discuss may be the cause. We've moved it to the Valgrind list at [Uninitialized access findings on non-static file scope variables that's been initialized?](https://sourceforge.net/p/valgrind/mailman/message/34369193/). – jww Aug 12 '15 at 20:09
  • Plus, the cited duplicate is kind of a crappy, open-ended question. It asks about static variables in Java, C and C++. The answers provided don't have half the value of @TonyD's answer below. – jww Aug 12 '15 at 20:11

3 Answers

The behaviour on the Operating Systems where C was developed has shaped these Standard stipulations. As applications load, the OS loader provides some memory for the BSS. It's desirable to clear it to zeros because if some other process had been using that memory earlier, the program you're starting could snoop on the prior process's memory contents, potentially seeing passwords, conversations or other data. Not every early or simple OS cares about this, but most do, so on most the initialisation is effectively "free" as it's a task the OS will do anyway.

Having this default of 0 makes it easy for the implementation to refer to flags set during dynamic initialisation, as there will be no uninitialised memory read and consequent undefined behaviour. For example, given...

void f() { static int n = g(); }

...the compiler/implementation may implicitly add something like a static bool __f_statics_initialised variable too - which "luckily" defaults to 0 / false due to the zeroing behaviour - along with initialisation code akin to (a possibly thread safe version of)...

if (!__f_statics_initialised)
{
    n = g();
    __f_statics_initialised = true;
}
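
A hand-written version of that guard can be sketched in ordinary C++. This is only an illustration under stated assumptions: `g_calls`, the return value 42 and `demo_f_runs_g_once` are made up for the example, the flag is spelled `initialised` to avoid a reserved double-underscore name, and unlike what real compilers emit since C++11 this version is not thread safe.

```cpp
#include <cassert>

// Hypothetical stand-in for g(): counts how often it runs.
static int g_calls = 0;
int g() { ++g_calls; return 42; }

// Hand-written equivalent of `static int n = g();` inside f().
int f()
{
    static bool initialised;  // no initializer: zero-initialized to false
    static int n;             // no initializer: zero-initialized to 0
    if (!initialised)
    {
        n = g();              // runs only on the first call
        initialised = true;
    }
    return n;
}

// Calls f() twice and checks that g() ran exactly once.
void demo_f_runs_g_once()
{
    int first = f();
    int second = f();
    assert(first == second);
    assert(g_calls == 1);
}
```

The guard only works because `initialised` is guaranteed to read as false before the first call, which is exactly the zeroing behaviour described above.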

For the above scenario the initialisation is done on first call, but for global variables it's done in an unspecified per-object ordering, sometime before main() is invoked. In that scenario, having some object-specific initialisation code and dynamic initialisation able to differentiate statics in uninitialised state from those they know need to be set to non-zero values makes it easier to write robust start-up code. For example, functions can check if a non-local static pointer is still 0, and new an object for it if so.
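
As a rough sketch of that last point, the following uses a non-local static pointer that is created on first use. The names `registry`, `get_registry` and `register_name` are hypothetical, and the object is deliberately never deleted to sidestep destruction-order problems.

```cpp
#include <cassert>
#include <string>
#include <vector>

// No initializer, so only zero-initialization applies: this is a null
// pointer before any dynamic initializer in the program runs.
static std::vector<std::string>* registry;

// Hypothetical accessor: creates the registry the first time it is needed.
std::vector<std::string>& get_registry()
{
    if (registry == nullptr)
        registry = new std::vector<std::string>();  // intentionally never deleted
    assert(registry != nullptr);
    return *registry;
}

// Hypothetical helper that is called during dynamic initialization.
bool register_name(const char* name)
{
    get_registry().push_back(name);
    return true;
}

// Dynamic initializers: whichever one runs first finds the pointer still null.
static const bool registered_a = register_name("a");
static const bool registered_b = register_name("b");
```

Had `registry` been given a dynamic initializer of its own, an initializer in another translation unit could run first and have its work overwritten, so the pattern relies on the pointer receiving nothing but zero-initialization.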

It's also noteworthy that many CPUs have highly efficient instructions to zero out large swathes of memory.

Tony Delroy
  • Thanks, so zero-initialization actually means setting 0 at byte level, regardless of the type? – swang Sep 09 '14 at 11:22
  • There is nothing in C or C++ requiring a .bss to be present. The languages don't even require an OS: static initialization is done in the same manner no matter if the system is a hosted or freestanding system. – Lundin Sep 09 '14 at 11:25
  • @Lundin: sure, but the historical environment on which C first ran has shaped the language requirements of the implementation. – Tony Delroy Sep 09 '14 at 11:33
  • @swang: anything that shouldn't be set to all-bits-0 must be special-cased by the implementation... for example, if a `nullptr` for some type was actually `0xFFFF`, then the compiler would probably move it to the data segment where that value can be read in with the executable image. – Tony Delroy Sep 09 '14 at 11:35

Zero-initialization of globals comes "for free" because the storage for them is allocated in the "BSS" segment before main() starts. That is, when you access your pointer p, the pointer itself must be stored somewhere, and that somewhere is actually a specific chunk of bits in BSS. Since it must be initialized to something, why not zero?
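
A minimal sketch of what that guarantee looks like from inside the program, reusing `f` and `p` from the question. `Pair`, `pair_obj` and the two functions are made up for the example.

```cpp
#include <cassert>

double f[3];   // static storage, no initializer: three 0.0 values
int* p;        // static storage, no initializer: null pointer

struct Pair { int a; char c; long b; };   // hypothetical aggregate
Pair pair_obj;                            // each member is zero-initialized

// Returns true if every global above holds its zero value.
bool globals_start_zeroed()
{
    for (int i = 0; i < 3; ++i)
        if (f[i] != 0.0) return false;
    return p == nullptr && pair_obj.a == 0 && pair_obj.c == 0 && pair_obj.b == 0;
}

void check_globals()
{
    assert(globals_start_zeroed());
    delete p;   // deleting a null pointer is a no-op
}
```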

Now, why don't automatic/stack variables do this? Because that would cost time: allocation on the stack is nothing more than incrementing (or decrementing, a matter of perspective) the stack pointer. Whatever garbage was there can be left there (according to C). Since we can't get zero-init for free, we don't get it at all (because again, it's C, where we don't like to pay for things we don't use).
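
To make the contrast concrete, here is a small sketch with hypothetical functions `next_id` and `sum_first`: the static local starts at zero without asking, while the automatic variable has to request it explicitly.

```cpp
#include <cassert>

int next_id()
{
    static int counter;   // static storage: starts at 0 without an initializer
    return ++counter;
}

int sum_first(int count)
{
    assert(count >= 0);
    int total{};          // automatic storage: {} explicitly requests zero
    // A plain `int total;` would leave the value indeterminate,
    // and it must not be read before being assigned.
    for (int i = 1; i <= count; ++i)
        total += i;
    return total;
}
```

The `{}` on `total` is the "pay for what you use" part: the zeroing happens on every call, so the language makes you ask for it.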

Default-initializing a std::string or other class type is a bit more complex: C++ requires that it is initialized somehow, and the default constructor is of course the one that gets used, and yes, technically it is zero-initialized first, but as discussed that zero-init happened "for free." It might be permissible for an implementation which can sufficiently analyze std::string to determine at build time how to initialize its bits as if the default constructor were called, but I don't know if any implementation does that.
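
For class types with a constexpr constructor, the standard does require the build-time route (constant initialization) when the arguments are constant expressions. The sketch below contrasts that with a dynamically initialized variable by reading both from initializers that run earlier in the same translation unit. All names are hypothetical, and the `volatile` read is only there to stop the compiler from evaluating the second initializer at compile time, which it is otherwise permitted to do.

```cpp
#include <cassert>

struct Point
{
    int x, y;
    constexpr Point(int x_, int y_) : x(x_), y(y_) {}
};

extern Point origin_offset;   // defined further down
extern int runtime_value;     // defined further down

// Within one translation unit, dynamic initialization follows definition
// order, so these two run before runtime_value's initializer below.
int early_view_of_point = origin_offset.x;
int early_view_of_runtime = runtime_value;

// constexpr constructor with constant arguments: constant-initialized,
// so the final value is in place before any dynamic initialization.
Point origin_offset(3, 4);

// The volatile read forces this initializer to run at program start-up.
volatile int runtime_source = 7;
int read_runtime_source() { return runtime_source; }
int runtime_value = read_runtime_source();

void check_initialisation_order()
{
    assert(early_view_of_point == 3);    // already had its final value
    assert(early_view_of_runtime == 0);  // still in the zero-initialized state
    assert(runtime_value == 7);          // dynamic initialization ran later
}
```

This is also the one situation where the zero-initialized stage is observable: without it, `early_view_of_runtime` would be reading uninitialised memory.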

John Zwinck
  • `.bss` does not actually occupy storage on disk (what would be the point, seeing as it's all zero?); it just describes a memory range within which offsets can be assigned. – ecatmur Sep 09 '14 at 11:06
  • Thanks, but why is variable n in the above example initialized in two stages rather than one? Shouldn't the compiler figure out at compile time that the value we want is argc, just store that in BSS, and skip the zeroing stage? – swang Sep 09 '14 at 11:07
  • @swang: Under the "as-if" rule, it could be initialised in a single stage, since there's nothing to observe it in the zero-initialised state. It just has to behave *as if* initialisation followed the rules laid out by the standard. (But it couldn't store the value in BSS, since that's always zero-initialised; and it couldn't store it in a static data section, since the value of `argc` is only known at runtime.) – Mike Seymour Sep 09 '14 at 11:09
  • @ecatmur, can you explain it a bit more please? Where do the globals get stored then? – swang Sep 09 '14 at 11:11
  • @swang static variable storage gets allocated when the program is loaded from disk by the OS. There is no need for anything to be stored on disk, unless the initial value is non-zero. – ecatmur Sep 09 '14 at 11:14
  • @Mike Seymour, I see, since like Tony D said BSS is always zeroed by the OS, you always get the zero initialization stage for free anyway. And if n was assigned a constant, like static int n = 5, then 5 will be stored in the data section? In this case n will not be stored in BSS but the data section instead, right? – swang Sep 09 '14 at 11:20
  • @swang: Yes. `n` would be stored in the initialized data section. And the two-stage initialization of in-function statics is there for a reason: it's the only way to let you have an influence on the order of initialization. Otherwise it's undefined, leading often to hard-to-track-down bugs - worse, bugs that happen only on some builds. Initialized global variables are **more evil in C++ than in C** - for that very reason. – Kuba hasn't forgotten Monica Sep 22 '14 at 01:17

Global and static variables in C have a fixed memory address for the lifetime of the program. This enables the program launcher to initialize them by copying the appropriate memory region from the executable file into the computer's memory.

As a consequence, C can (and must) provide an initial value for every static/global variable. If the user does not provide a value, the standard behavior is to use zero. Unlike for local variables, this costs neither extra memory nor extra time (since a value must be written anyway).

However, this behavior (storing static initial data in the executable) can be very wasteful if you have large arrays without any initial data. In fact, it seems that modern C compilers are able to avoid this waste and will zero-fill large arrays instead of storing zeros in the program executable. Nevertheless, once the rule has been laid down, they are forced to fill the region even if the user might not need it. In any case, this is a very cheap operation which is performed once at program startup.

Emanuele Paolini
  • good point about the arrays, otherwise if I declare a static int[10000000000] it could make the executable extremely large.. – swang Sep 09 '14 at 11:25