Confusion about array initialization in C

Question

In C, if I initialize an array like this:

int a[5] = {1,2};

then all the elements of the array that are not initialized explicitly will be initialized implicitly with zeros.
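As a minimal sketch of that rule (the helper name `fill_partial` is made up for illustration and is not part of the original question), the array can be copied out and inspected:

```c
#include <stddef.h>

/* Hypothetical helper: copies a partially initialized array into out[5]. */
void fill_partial(int out[5])
{
    int a[5] = {1, 2};   /* a[2], a[3] and a[4] are implicitly set to 0 */

    for (size_t i = 0; i < 5; i++)
        out[i] = a[i];
}
```

Calling `fill_partial` on a five-element buffer should leave it holding 1, 2, 0, 0, 0.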

But, if I initialize an array like this:

int a[5]={a[2]=1};

printf("%d %d %d %d %d\n", a[0], a[1],a[2], a[3], a[4]);

output:

1 0 1 0 0

I don't understand, why does a[0] print 1 instead of 0? Is it undefined behaviour?

Note: This question was asked in an interview.

A very deep question. I wonder if the interviewer knows the answer themselves. I don't. Indeed ostensibly the value of the expression `a[2] = 1` is `1`, but I'm not sure if you are allowed to take the result of a designated initialiser expression as the value of the first element. The fact you've added the lawyer tag means I think we need an answer citing the standard. — Bathsheba, Sep 13 '18 at 06:59
Well if that's their favourite question, you may well have dodged a bullet. Personally I prefer a written programming exercise (with access to a compiler and debugger) to be taken over a few hours rather than "ace" style questions such as the above. I could *conject* an answer, but I don't think it would have any real factual basis. — Bathsheba, Sep 13 '18 at 07:04
@Bathsheba Especially if the answer to the [spin-off question](https://stackoverflow.com/questions/52307772/assignment-of-array-elements-in-array-initialization) concludes that this is not well-defined. Then this question is only answerable for the specific compiler. — Kami Kaze, Sep 13 '18 at 07:28
Shall we close this as a duplicate to the spin off question? — Bathsheba, Sep 13 '18 at 07:29
@Bathsheba I would do the opposite, as the answer here now answers both questions. — Kami Kaze, Sep 13 '18 at 07:36
@KamiKaze: I've asked the mods to merge them; no guarantee they will though. My feeling is that the other question is a little more canonical, if you get my meaning. But let's see what the bosses think! — Bathsheba, Sep 13 '18 at 07:37
@Bathsheba would be the best. Still I would give the credit for the question to OP, as he came up with the topic. But this is not for me to decide just what I feel would be "the right thing". — Kami Kaze, Sep 13 '18 at 07:39
@Bathsheba Designated initializers must start with `[` or `.`. `a[2] = 1` is a normal expression, no designator. — melpomene, Sep 13 '18 at 08:13
Maybe related to [Strange values while initializing array using designated initializers](https://stackoverflow.com/q/28813617/1708801) — Shafik Yaghmour, Sep 13 '18 at 13:16
I think that a[0] = 1 depends on the endianness of the CPU architecture.... — Rui F Ribeiro, Sep 13 '18 at 20:48
I'm going to remember this construct if I need it for [code golfing](https://codegolf.stackexchange.com/) sometime! — ErikF, Sep 14 '18 at 04:19
Ugh, I really wish employers would stop using code golf questions to vet candidates. So unproductive. — coderkevin, Sep 14 '18 at 05:25
Agreed that this is a very poor interview question. I would be happy not to have this job. — Steve Johnson, Sep 18 '18 at 20:07

melpomene · Accepted Answer · 2018-09-13T15:56:59.953

TL;DR: I don't think the behavior of int a[5]={a[2]=1}; is well defined, at least in C99.

The funny part is that the only bit that makes sense to me is the part you're asking about: a[0] is set to 1 because the assignment operator returns the value that was assigned. It's everything else that's unclear.
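The uncontroversial part, that an assignment expression has a value, can be sketched on its own, outside of any initializer list. The function name `assignment_value` is an assumption made for this example only:

```c
/* Hypothetical helper: stores 7 through target and returns the value
 * of the assignment expression itself. */
int assignment_value(int *target)
{
    int result = (*target = 7);   /* the assignment expression yields 7 */
    return result;
}
```

Here both the object pointed to and the returned value end up as 7, and nothing about it is unclear because the assignment does not modify the object being initialized.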

If the code had been int a[5] = { [2] = 1 }, everything would've been easy: That's a designated initializer setting a[2] to 1 and everything else to 0. But with { a[2] = 1 } we have a non-designated initializer containing an assignment expression, and we fall down a rabbit hole.
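For comparison, here is a small sketch of the well-defined designated-initializer form. The wrapper function `fill_designated` is hypothetical and only exists so the result can be checked:

```c
#include <stddef.h>

/* Hypothetical helper: copies a designated-initialized array into out[5]. */
void fill_designated(int out[5])
{
    int a[5] = { [2] = 1 };   /* C99 designated initializer: only a[2] is 1 */

    for (size_t i = 0; i < 5; i++)
        out[i] = a[i];
}
```

With this form the contents should be 0, 0, 1, 0, 0 on any conforming C99 compiler.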

Here's what I've found so far:

a must be a local variable.
6.7.8 Initialization
4. All the expressions in an initializer for an object that has static storage duration shall be constant expressions or string literals.
a[2] = 1 is not a constant expression, so a must have automatic storage.
a is in scope in its own initialization.
6.2.1 Scopes of identifiers
7. Structure, union, and enumeration tags have scope that begins just after the appearance of the tag in a type specifier that declares the tag. Each enumeration constant has scope that begins just after the appearance of its defining enumerator in an enumerator list. Any other identifier has scope that begins just after the completion of its declarator.
The declarator is a[5], so variables are in scope in their own initialization.
a is alive in its own initialization.
6.2.4 Storage durations of objects
4. An object whose identifier is declared with no linkage and without the storage-class specifier static has automatic storage duration.
5. For such an object that does not have a variable length array type, its lifetime extends from entry into the block with which it is associated until execution of that block ends in any way. (Entering an enclosed block or calling a function suspends, but does not end, execution of the current block.) If the block is entered recursively, a new instance of the object is created each time. The initial value of the object is indeterminate. If an initialization is specified for the object, it is performed each time the declaration is reached in the execution of the block; otherwise, the value becomes indeterminate each time the declaration is reached.
There is a sequence point after a[2]=1.
6.8 Statements and blocks
4. A full expression is an expression that is not part of another expression or of a declarator. Each of the following is a full expression: an initializer; the expression in an expression statement; the controlling expression of a selection statement (if or switch); the controlling expression of a while or do statement; each of the (optional) expressions of a for statement; the (optional) expression in a return statement. The end of a full expression is a sequence point.
Note that e.g. in int foo[] = { 1, 2, 3 } the { 1, 2, 3 } part is a brace-enclosed list of initializers, each of which has a sequence point after it.
Initialization is performed in initializer list order.
6.7.8 Initialization
17. Each brace-enclosed initializer list has an associated current object. When no designations are present, subobjects of the current object are initialized in order according to the type of the current object: array elements in increasing subscript order, structure members in declaration order, and the first named member of a union. [...]
19. The initialization shall occur in initializer list order, each initializer provided for a particular subobject overriding any previously listed initializer for the same subobject; all subobjects that are not initialized explicitly shall be initialized implicitly the same as objects that have static storage duration.
However, initializer expressions are not necessarily evaluated in order.
6.7.8 Initialization
23. The order in which any side effects occur among the initialization list expressions is unspecified.
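The difference between "initialization order" and "evaluation order" in the last two points can be sketched with initializers that have side effects but do not touch the array itself. The names `counter`, `next_value` and `fill_in_some_order` are assumptions made for this example:

```c
static int counter;

/* Hypothetical helper with a visible side effect: bumps a counter. */
static int next_value(void)
{
    return ++counter;
}

/* The two calls may be evaluated in either order, so out[] may end up
 * as {1, 2} or as {2, 1}; the standard permits both. */
void fill_in_some_order(int out[2])
{
    counter = 0;
    int a[2] = { next_value(), next_value() };

    out[0] = a[0];
    out[1] = a[1];
}
```

Whichever order a compiler picks, the first initializer's result still lands in `a[0]` and the second in `a[1]`. Only the order in which the calls run is left open.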

However, that still leaves some questions unanswered:

Are sequence points even relevant? The basic rule is:
6.5 Expressions
2. Between the previous and next sequence point an object shall have its stored value modified at most once by the evaluation of an expression. Furthermore, the prior value shall be read only to determine the value to be stored.
a[2] = 1 is an expression, but initialization is not.

This is slightly contradicted by Annex J:
J.2 Undefined behavior
- Between two sequence points, an object is modified more than once, or is modified and the prior value is read other than to determine the value to be stored (6.5).
Annex J says any modification counts, not just modifications by expressions. But given that annexes are non-normative, we can probably ignore that.
How are the subobject initializations sequenced with respect to initializer expressions? Are all initializers evaluated first (in some order), then the subobjects are initialized with the results (in initializer list order)? Or can they be interleaved?

I think int a[5] = { a[2] = 1 } is executed as follows:

Storage for a is allocated when its containing block is entered. The contents are indeterminate at this point.
The (only) initializer is executed (a[2] = 1), followed by a sequence point. This stores 1 in a[2] and returns 1.
That 1 is used to initialize a[0] (the first initializer initializes the first subobject).

But here things get fuzzy because the remaining elements (a[1], a[2], a[3], a[4]) are supposed to be initialized to 0, but it's not clear when: Does it happen before a[2] = 1 is evaluated? If so, a[2] = 1 would "win" and overwrite a[2], but would that assignment have undefined behavior because there is no sequence point between the zero initialization and the assignment expression? Are sequence points even relevant (see above)? Or does zero initialization happen after all initializers are evaluated? If so, a[2] should end up being 0.

Because the C standard does not clearly define what happens here, I believe the behavior is undefined (by omission).

Instead of undefined I would argue that it's *unspecified*, which leaves things open for interpretation by the implementations. — Some programmer dude, Sep 13 '18 at 08:24
I don't think any of the stuff before your last horizontal rule is relevant to the question; the only issue is when the zero initialization happens (which the standard doesn't seem to say). The same issue would be raised by `int a[5] = { a[2] = a[3] };` — M.M, Sep 13 '18 at 09:03
"we fall into a rabbit hole" LOL! Never heard that for an UB or unspecified stuff. — BЈовић, Sep 13 '18 at 09:16
I think 6.7.8.19 can be read in two ways, both of which give "valid" outcomes, but both of which are undesirable readings. 1) We consider `a[2] = 1` to initialize `a[2]` (even though it is explicitly not "an initializer" for `a[2]`) because it "sets an initial value" (as parenthetically mentioned in 5.1.2), which makes the outcome with `a[2] == 1` well-defined (because it was "initialized explicitly" and so mustn't be set to 0). This is undesirable because it sneaks in initialization without an initializer, bypassing all the careful words about how the order of initializers is not specified... — Jeroen Mostert, Sep 13 '18 at 12:18
... or 2) we consider `a[2] = 1` not to initialize by the narrow interpretation of "initialize" where an assignment does not "initialize", but then `a[2]` must be set to 0 by 6.7.8.19. This is undesirable because it clashes with the most obvious implementation of initialization (initialize the whole block to 0, then process initializers), puts an undue burden on the compiler, and makes the whole thing useless anyway. I would agree that by the way things are currently worded, the whole thing should probably be considered undefined by virtue of the standard not being explicit enough. — Jeroen Mostert, Sep 13 '18 at 12:19
In the above, for "the order of initializers", read "the order of side effects of initializers", of course. The order of the initializers is specified. — Jeroen Mostert, Sep 13 '18 at 12:25
It appears to me that the key question is ordering of implicit initialization with respect to initialization of elements for which there are explicit initializers. I do not take 6.7.8/19 to specify this, as "initializer-list order" is not meaningful for elements that have no corresponding initializer. The observed behavior follows from implicit initialization being performed before explicit initializers are evaluated. I find that a very reasonable -- indeed, likely -- implementation, but I agree that the standard does not specify this ordering. — John Bollinger, Sep 13 '18 at 15:37
@Someprogrammerdude I don't think it can be unspecified ("*behavior where this International Standard provides two or more possibilities and imposes no further requirements on which is chosen in any instance*") because the standard doesn't really provide any possibilities among which to choose. It simply doesn't say what happens, which I believe falls under "*Undefined behavior is [...] indicated in this International Standard [...] by the omission of any explicit definition of behavior.*" — melpomene, Sep 13 '18 at 15:52
@BЈовић It's a reference to Alice in Wonderland: https://www.merriam-webster.com/dictionary/rabbit%20hole — melpomene, Sep 13 '18 at 15:58
@BЈовић It's also a very nice description not only for undefined behaviour, but also for defined behaviour that needs a thread like this one to explain. — gnasher729, Sep 13 '18 at 22:40
@M.M OP asked two questions: "*why does a[0] print 1 instead of 0? Is it undefined behaviour?*" The first part of my reply answers both: The whole construct is undefined, but `a[0]` being 1 makes sense because `=` returns the value being assigned. The rest of my answer explains in detail why I think the behavior is undefined. — melpomene, Sep 14 '18 at 12:19
Would you say, then, that the program behavior would still be undefined if the initializer were instead `{ a[0] = 1 }`? — John Bollinger, Sep 14 '18 at 15:11
@JohnBollinger I think `int a[5] = { a[0] = 1 }` has defined behavior and must set `a[0]` to `1`. — melpomene, Sep 14 '18 at 19:08
Hmmm. I'm probing your logic, of course. I think I'm prepared to accept that the original exhibits UB, but I am not, at this point, prepared to accept that the alternative I proposed does not also do so. It, too, features two different side effects on the same object (`a[0]` in that case) whose relative order seems not to be defined. That the two produce the same effect on that object is irrelevant. Consider the statement `a[0] = a[0] = 1;`, for example. I think it clear that this exhibits UB according to the standard. — John Bollinger, Sep 14 '18 at 19:23
@JohnBollinger The difference is that you cannot actually initialize the `a[0]` subobject before evaluating its initializer, and evaluating any initializer includes a sequence point (because it's a "full expression"). Therefore I believe modifying the subobject we're initializing is fair game. — melpomene, Sep 14 '18 at 19:41

user694733 · Answer 2 · 2018-09-13T08:17:55.230

22

I don't understand, why does a[0] print 1 instead of 0?

Presumably a[2]=1 initializes a[2] first, and the result of the expression is used to initialize a[0].

From N2176 (C17 draft):

6.7.9 Initialization

The evaluations of the initialization list expressions are indeterminately sequenced with respect to one another and thus the order in which any side effects occur is unspecified. ¹⁵⁴⁾

So it would seem that output 1 0 0 0 0 would also have been possible.

Conclusion: Don't write initializers that modify the initialized variable on the fly.
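If the goal really is the array 1 0 1 0 0, one way to get it without modifying the array inside its own initializer is to split the work into two steps. This is only a sketch, and the function name `fill_well_defined` is hypothetical:

```c
#include <stddef.h>

/* Hypothetical helper: builds 1 0 1 0 0 without touching the array
 * from inside its own initializer. */
void fill_well_defined(int out[5])
{
    int a[5] = {1};   /* a[0] is 1, the remaining elements are 0 */
    a[2] = 1;         /* ordinary assignment after initialization is complete */

    for (size_t i = 0; i < 5; i++)
        out[i] = a[i];
}
```

A designated initializer such as `{ [0] = 1, [2] = 1 }` should work equally well under C99 or later.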

edited Sep 13 '18 at 08:17

answered Sep 13 '18 at 07:28

That part does not apply: There is only one initializer expression here, so it doesn't need to be sequenced with anything. – melpomene Sep 13 '18 at 07:43
@melpomene There is the `{...}` expression which initializes `a[2]` to `0`, and `a[2]=1` sub-expression which initializes `a[2]` to `1`. – user694733 Sep 13 '18 at 07:47
`{...}` is a braced initializer list. It is not an expression. – melpomene Sep 13 '18 at 07:49
@melpomene Ok, you may be right there. But I would still argue there are still 2 competing side-effects so that paragraph stands. – user694733 Sep 13 '18 at 08:03
@melpomene there are two things to be sequenced: the first initializer, and the setting of other elements to 0 – M.M Sep 13 '18 at 09:01
@M.M Exactly, which is why the quote from 6.7.9/23 doesn't apply: It only covers sequencing among initializer expressions, not between an initializer expression and initialization of subobjects (which is what this question boils down to). – melpomene Sep 13 '18 at 15:55

Jonathan Leffler · Answer 3 · 2018-09-23T13:57:04.637

I think the C11 standard covers this behaviour and says that the result is unspecified, and I don't think C18 made any relevant changes in this area.

The standard language is not easy to parse. The relevant section of the standard is §6.7.9 Initialization. The syntax is documented as:

initializer:
                assignment-expression
                { initializer-list }
                { initializer-list , }
initializer-list:
                designation_opt initializer
                initializer-list , designation_opt initializer
designation:
                designator-list =
designator-list:
                designator
                designator-list designator
designator:
                [ constant-expression ]
                . identifier

Note that one of the terms is assignment-expression, and since a[2] = 1 is indubitably an assignment expression, it is allowed inside initializers for arrays with non-static duration:

§4 All the expressions in an initializer for an object that has static or thread storage duration shall be constant expressions or string literals.
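A rough sketch of what that constraint allows and forbids, using a side-effect-free expression so that nothing else is in doubt. The function `fill_from_runtime` and its parameter `n` are invented for this example:

```c
/* Hypothetical helper: initializes an automatic array from a run-time value. */
void fill_from_runtime(int n, int out[2])
{
    /* Allowed: a has automatic storage duration, so its initializers
     * may be arbitrary (non-constant) assignment-expressions. */
    int a[2] = { n + 1, n * 2 };

    /* Not allowed: with static storage duration the same initializers
     * would have to be constant expressions.
     *
     * static int s[2] = { n + 1, n * 2 };
     */

    out[0] = a[0];
    out[1] = a[1];
}
```

For `n == 3` this should yield 4 and 6. Uncommenting the `static` line is expected to produce a compile-time diagnostic.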

One of the key paragraphs is:

§19 The initialization shall occur in initializer list order, each initializer provided for a particular subobject overriding any previously listed initializer for the same subobject;¹⁵¹⁾ all subobjects that are not initialized explicitly shall be initialized implicitly the same as objects that have static storage duration.

¹⁵¹⁾ Any initializer for the subobject which is overridden and so not used to initialize that subobject might not be evaluated at all.

And another key paragraph is:

§23 The evaluations of the initialization list expressions are indeterminately sequenced with respect to one another and thus the order in which any side effects occur is unspecified.¹⁵²⁾

¹⁵²⁾ In particular, the evaluation order need not be the same as the order of subobject initialization.

I'm fairly sure that paragraph §23 indicates that the notation in the question:

int a[5] = { a[2] = 1 };

leads to unspecified behaviour. The assignment to a[2] is a side-effect, and the evaluations of the expressions are indeterminately sequenced with respect to one another. Consequently, I don't think there is a way to appeal to the standard and claim that a particular compiler is handling this correctly or incorrectly.

There is only one initialization list expression, so §23 is not relevant. — melpomene, Mar 05 '19 at 07:59

score 2 · Answer 4 · answered Sep 13 '18 at 09:43

My understanding is that a[2]=1 returns the value 1, so the code becomes

int a[5]={a[2]=1} --> int a[5]={1}

int a[5]={1} assigns the value 1 to a[0].

Hence it prints 1 for a[0].

For example, the initialization

char str[10]={'H','a','i'};

has the same effect on the first three elements as

str[0] = 'H';
str[1] = 'a';
str[2] = 'i';
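A small sketch of the character-array case, assuming nothing beyond the standard library. The function name `hai_is_terminated` is made up for illustration. It also shows that the elements without an initializer are zero, so the array is a valid string:

```c
#include <string.h>

/* Hypothetical helper: returns 1 if the partially initialized array
 * reads back as the string "Hai". */
int hai_is_terminated(void)
{
    char str[10] = {'H', 'a', 'i'};   /* str[3] through str[9] are '\0' */

    return strlen(str) == 3 && strcmp(str, "Hai") == 0;
}
```

This only illustrates ordinary in-order initialization. It does not say anything about the `{a[2]=1}` case, where the initializer modifies the array itself.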

Karthika

This is a [language-lawyer] question, but this is not an answer that works with the standard, thus making it irrelevant. Plus there are also 2 much more in-depth answers available and your answer does not seem to add anything. – Kami Kaze Sep 14 '18 at 08:20
I have a doubt. Is the concept I posted wrong? Could you clarify this for me? – Karthika Sep 14 '18 at 09:56
You just speculate about reasons, while there is a very good answer already given with the relevant parts of the standard. Just saying how it could happen is not what the question is about. It is about what the standard says should happen. – Kami Kaze Sep 14 '18 at 10:00
But the person who posted the question above asked for the reason why this happens, so I posted this answer. But the concept is correct, right? – Karthika Sep 14 '18 at 10:23
OP asked "*Is it undefined behaviour?*". Your answer doesn't say. – melpomene Sep 14 '18 at 12:12
There is no concept because it is undefined behaviour; speculation about how a possible end state is generated is meaningless because it might change on a whim. OP asked if this is handled by the standard, but your answer does not cite the standard at all. – Kami Kaze Oct 04 '18 at 06:47

Battle · Answer 5 · 2018-09-14T13:58:06.317

1

I try to give a short and simple answer for the puzzle: int a[5] = { a[2] = 1 };

First a[2] = 1 is set. That means the array says: 0 0 1 0 0
But behold, given that you did it in the { } brackets, which are used to initialize the array in order, it takes the first value (which is 1) and sets that to a[0]. It is as if int a[5] = { a[2] }; would remain, where we already got a[2] = 1. The resulting array is now: 1 0 1 0 0

Another example: int a[6] = { a[3] = 1, a[4] = 2, a[5] = 3 }; - Even though the order is somewhat arbitrary, assuming it goes from left to right, it would go in these 6 steps:

0 0 0 1 0 0
1 0 0 1 0 0
1 0 0 1 2 0
1 2 0 1 2 0
1 2 0 1 2 3
1 2 3 1 2 3

edited Sep 14 '18 at 13:58

answered Sep 13 '18 at 11:58

`A = B = C = 5` is not a declaration (or initialization). It's a normal expression that parses as `A = (B = (C = 5))` because the `=` operator is right associative. That doesn't really help with explaining how initialization works. The array actually starts existing when the block it is defined in is entered, which can be long before the actual definition is executed. – melpomene Sep 13 '18 at 15:48
This is a [language-lawyer] question, but this is not an answer that works with the standard, thus making it irrelevant. Plus there are also 2 much more in-depth answers available and your answer does not seem to add anything. – Kami Kaze Sep 14 '18 at 08:19
"*It goes from left to right, each starting with the internal declaration*" is incorrect. The C standard explicitly says "*The order in which any side effects occur among the initialization list expressions is unspecified.*" – melpomene Sep 14 '18 at 12:14
@melpomene - So it's actually just arbitrary in C? Alright, I'll edit it again. The result should still be the same though, however the steps may not. Also I remember the dark times when I was forced to code in C in a course... how I do not miss those days. – Battle Sep 14 '18 at 13:56
The result "should" not be anything in particular because the behavior is undefined. See my answer for rationale. – melpomene Sep 14 '18 at 13:57
@melpomene - Even if it is not well defined in the standards, in practice the compiler may handle it consistently. I cite you: "Because the C standard does not clearly define what happens here, I believe the behavior is undefined (by omission)." - Given that it can be tested, there is no need for speculation. And there is no reason to assert that your speculation, founded on the lack of information (as you said - "by omission"), is a valid proof. There is theory, and there is practice. You made arguments to why in theory it is not (well) defined - fine. But neither of us tested it. – Battle Sep 14 '18 at 14:10
You cannot test for undefined behavior. How would that even work? Any observed result would be consistent with undefined behavior. The compiler may handle it consistently, but in that case it must be documented in the compiler manual. Otherwise things could silently change with another compiler, or the next version of the compiler, or different optimization settings, or the same code in a different context. – melpomene Sep 14 '18 at 14:19
@melpomene - You test the code from my example sufficient times and see if the results are consistent. Make variations where you think inconsistencies emerge. If it's not consistent, you can say it is not definitive and indeed somewhat arbitrary. That would be solid proof. Otherwise it's what? Lacking documentation and speculation based on said lacking documentation. Well, tell the people who work on the compiler to fix their documentation then. Prove me wrong and I will gladly yield and delete or edit my answer. – Battle Sep 14 '18 at 14:37
"*You test the code from my example sufficient times and see if the results are consistent.*" That's not how it works. You don't seem to understand what undefined behavior is. *Everything* in C has undefined behavior by default; it's just that some parts have behavior that is defined by the standard. To prove that something has defined behavior, you must cite the standard and show where it defines what should happen. In the absence of such a definition, the behavior is undefined. – melpomene Sep 14 '18 at 14:49
The assertion in point (1) is an enormous leap over the key question here: does the implicit initialization of element a[2] to 0 occur before the side effect of the `a[2] = 1` initializer expression is applied? The observed result is as if it was, but the standard does not appear to specify that that should be the case. *That* is the center of the controversy, and this answer completely overlooks it. – John Bollinger Sep 14 '18 at 15:02
"Undefined behavior" is a technical term with a narrow meaning. It doesn't mean "behavior we're not really sure about". The key insight here is that no test, with no compiler, can ever show a particular program is or is not well-behaved *according to the standard*, because if a program has undefined behavior, the compiler is allowed to do *anything* -- including working in a perfectly predictable and reasonable manner. It's not simply a quality of implementation issue where the compiler writers document things -- that's unspecified or implementation-defined behavior. – Jeroen Mostert Sep 14 '18 at 16:04
TL;DR: with careful tests you can establish (to any degree you desire) how any one particular compiler processes this code, but that won't tell you anything about what the standard *says* should happen with this code, and therein lies the question. It's clear enough how an "obvious" or "reasonable" compiler might process this code, but that's not what `language-lawyer` is about. If C was defined according to how one particular reference compiler did things (some languages do work this way), it would be another matter. – Jeroen Mostert Sep 14 '18 at 16:05
@Jeroen Mostert - The question was this: "I don't understand, why does a[0] print 1 instead of 0? Is it undefined behaviour?" - I answered the first part and expanded on it with another example, and remained concise. There is the voting system, I see you both have cast your downvotes, be happy and move on. I corrected what needed to be, but that's it. Hell, this is not natural science, this is just about some humans not doing their jobs well enough (not that I want to criticize them), and leaving behaviors undefined. That's ALL. Once fixed (if ever), this conversation becomes obsolete. – Battle Sep 14 '18 at 18:48

score 0 · Answer 6 · answered Sep 19 '18 at 07:05

The assignment a[2]= 1 is an expression that has the value 1, and you essentially wrote int a[5]= { 1 }; (with the side effect that a[2] is assigned 1 as well).

Yves Daoust

But it is unclear when the side effect is evaluated and the behaviour might change depending on the compiler. Also the standard seems to state that this is undefined behaviour making explanations for compiler specific realisations not helpful. – Kami Kaze Oct 04 '18 at 06:52
@KamiKaze: sure, the value 1 landed there by accident. – Yves Daoust Oct 04 '18 at 07:09

score 0 · Answer 7 · answered Nov 19 '18 at 01:57

I believe that int a[5]={ a[2]=1 }; is a good example of a programmer shooting himself or herself in the foot.

I might be tempted to think that what you meant was int a[5]={ [2]=1 }; which would be a C99 designated initializer setting element 2 to 1 and the rest to zero.

In the rare case that you really really meant int a[5]={ 1 }; a[2]=1;, then that would be a funny way of writing it. Anyhow, this is what your code boils down to, even though some here pointed out that it's not well defined when the write to a[2] is actually executed. The pitfall here is that a[2]=1 is not a designated initializer but a simple assignment which itself has the value 1.

Sven

It looks like this language-lawyer topic is asking for references from the standard drafts. That is why you were downvoted (I didn't do it; as you can see, I was downvoted for the same reason). I think what you wrote is completely fine, but it looks like all these language lawyers here are either from the committee or something like that. So they are not asking for help at all; they are trying to check whether the draft covers the case or not, and most of the guys here are triggered if you post an answer as if you were helping them. I guess I'll delete my answer :) It would have been helpful if the rules of this topic had been stated clearly. – Abdurrahim Dec 08 '18 at 02:39
