133

I was under the impression that accessing a union member other than the last one set is UB, but I can't seem to find a solid reference (other than answers claiming it's UB but without any support from the standard).

So, is it undefined behavior?

Ayxan Haqverdili
Luchian Grigore
  • 3
    C99 (and I believe C++11 as well) explicitly allow type-punning with unions. So I think it falls under "implementation defined" behavior. – Mysticial Jul 07 '12 at 07:40
  • 1
    I have used it on several occasions to convert from individual int to char. So, I definitely know it is not undefined. I used it on the Sun CC compiler. So, it might still be compiler dependent. – go4sri Jul 07 '12 at 07:55
  • 44
    @go4sri: Clearly, you don't know what it means for behavior to be undefined. The fact that it appeared to work for you in some instance does not contradict its undefinededness. – Benjamin Lindley Jul 07 '12 at 07:58
  • 1
    This might be a good read: http://davmac.wordpress.com/2010/02/26/c99-revisited/ – Mysticial Jul 07 '12 at 08:03
  • 4
    Related: [Purpose of Unions in C and C++](http://stackoverflow.com/q/2310483/183120) – legends2k Oct 11 '13 at 05:11
  • 4
    @Mysticial, the blog post you link to is very specifically regarding C99; this question is tagged only for C++. – davmac Dec 02 '13 at 16:16
  • Reading or assigning a non active member? – curiousguy Oct 30 '19 at 21:29

5 Answers

141

The confusion is that C explicitly permits type-punning through a union, whereas C++ (C++11) has no such permission.

6.5.2.3 Structure and union members

95) If the member used to read the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called ‘‘type punning’’). This might be a trap representation.

The situation with C++:

9.5 Unions [class.union]

In a union, at most one of the non-static data members can be active at any time, that is, the value of at most one of the non-static data members can be stored in a union at any time.

C++ later has language permitting the use of unions containing structs with common initial sequences; this doesn't however permit type-punning.

To determine whether union type-punning is allowed in C++, we have to search further. Recall that C99 is a normative reference for C++11 (and C99 has similar language to C11 permitting union type-punning):

3.9 Types [basic.types]

4 - The object representation of an object of type T is the sequence of N unsigned char objects taken up by the object of type T, where N equals sizeof(T). The value representation of an object is the set of bits that hold the value of type T. For trivially copyable types, the value representation is a set of bits in the object representation that determines a value, which is one discrete element of an implementation-defined set of values. 42
42) The intent is that the memory model of C++ is compatible with that of ISO/IEC 9899 Programming Language C.

It gets particularly interesting when we read

3.8 Object lifetime [basic.life]

The lifetime of an object of type T begins when: — storage with the proper alignment and size for type T is obtained, and — if the object has non-trivial initialization, its initialization is complete.

So for a primitive type (which ipso facto has trivial initialization) contained in a union, the lifetime of the object encompasses at least the lifetime of the union itself. This allows us to invoke

3.9.2 Compound types [basic.compound]

If an object of type T is located at an address A, a pointer of type cv T* whose value is the address A is said to point to that object, regardless of how the value was obtained.

Assuming that the operation we are interested in is type-punning i.e. taking the value of a non-active union member, and given per the above that we have a valid reference to the object referred to by that member, that operation is lvalue-to-rvalue conversion:

4.1 Lvalue-to-rvalue conversion [conv.lval]

A glvalue of a non-function, non-array type T can be converted to a prvalue. If T is an incomplete type, a program that necessitates this conversion is ill-formed. If the object to which the glvalue refers is not an object of type T and is not an object of a type derived from T, or if the object is uninitialized, a program that necessitates this conversion has undefined behavior.

The question then is whether an object that is a non-active union member is initialized by a store to the active union member. As far as I can tell, it is not. Access to a union through a non-active member is therefore defined, and defined to follow the object and value representation, only if:

  • a union is copied into char array storage and back (3.9:2), or
  • a union is bytewise copied to another union of the same type (3.9:3), or
  • a union is accessed across language boundaries by a program element conforming to ISO/IEC 9899 (so far as that is defined) (3.9:4 note 42).

Access without one of the above interpositions is undefined behaviour. This has implications for the optimisations allowed to be performed on such a program, as the implementation may of course assume that undefined behaviour does not occur.

That is, although we can legitimately form an lvalue to a non-active union member (which is why assigning to a non-active member without construction is ok) it is considered to be uninitialized.
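As a hedged illustration of the first bullet above (copying through byte storage), here is a minimal C++11 sketch that reads the bits of a `float` with `std::memcpy` instead of through a non-active union member. The names `float_bits`, `IntOrFloat` and `write_then_read_same_member` are made up for this example, and the expected value in `demo` assumes a 32-bit IEEE 754 `float`.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper (not from the standard): returns the object
// representation of a float as a 32-bit unsigned integer.
std::uint32_t float_bits(float f) {
    static_assert(sizeof(float) == sizeof(std::uint32_t),
                  "sketch assumes a 32-bit float");
    std::uint32_t bits;
    // Copying the bytes of a trivially copyable object is defined,
    // unlike reading a union member that is not the active one.
    std::memcpy(&bits, &f, sizeof bits);
    return bits;
}

union IntOrFloat {
    std::uint32_t i;
    float f;
};

// Assigning to a non-active member makes it active; reading it back is fine.
std::uint32_t write_then_read_same_member() {
    IntOrFloat u;
    u.f = 1.0f;  // f is the active member
    u.i = 42u;   // now i is the active member
    return u.i;  // reads the active member
}

void demo() {
    assert(float_bits(1.0f) == 0x3F800000u);  // assumes IEEE 754 single precision
    assert(write_then_read_same_member() == 42u);
}
```

The function never reads `u.f` after `u.i` has been written, which is exactly the access the answer above argues is undefined in C++.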

Ben Voigt
ecatmur
  • If I am not mistaken (I only have a draft version of the C99 standard), this explicit paragraph about type punning was not in C99. Though, maybe we can infer it from other information in the standard as you did it for C++. Nevertheless, this addition seems to reveal that it was not clear in previous versions of the standard. – mpu Aug 17 '12 at 09:55
  • 1
    @mpu it should be present; look for 6.5.2.3, footnote 82. – ecatmur Aug 17 '12 at 10:06
  • 5
    3.8/1 says an object's lifetime ends when its storage is reused. That indicates to me that a non-active member of a union's lifetime has ended because its storage has been reused for the active member. That would mean you're limited in how you use the member (3.8/6). – bames53 Oct 18 '12 at 19:35
  • 1
    @bames53 good point, but if it has trivial initialization then its lifetime starts again immediately or when the non-active member is accessed (*storage with the proper alignment and size for type `T` is obtained*). – ecatmur Oct 19 '12 at 08:05
  • 2
    Under that interpretation then every bit of memory simultaneously contains objects of all types that are trivially initializable and have appropriate alignment... So then does the lifetime of any non-trivially initializable type immediately end as its storage is reused for all these other types (and not restart because they're not trivially initializable)? – bames53 Oct 19 '12 at 14:08
  • @bames53 I don't think that would count as "reuse"; that would require at the least using the object as an lvalue. – ecatmur Oct 19 '12 at 14:28
  • I guess 'use' and 'reuse' aren't explicitly defined. Nor is 'obtained'. Is storage obtained every time a non-active member is accessed, or is it only obtained once during the original allocation? Anyway, this answer is a great summary of the issues. – bames53 Oct 19 '12 at 15:06
  • I've put in a larger excerpt of the rule on (g)lvalue-to-rvalue conversion, since it seems the other part of it could be relevant as well (the object to which the glvalue refers, does that have the type of the active member, and not the type of the glvalue undergoing attempted conversion?) – Ben Voigt Jul 24 '13 at 01:00
  • 1
    You may find some of the references I link in [this answer](http://stackoverflow.com/a/20956250/1708801) on type-punning interesting. Especially the quote by Pascal Cuoq in my footnote. Also a side question, since you invoke C99 being a normative reference for C++11 do you have a position on [Can we apply content not explicitly cited from the normative references to the C++ standard?](http://stackoverflow.com/q/23020323/1708801)? – Shafik Yaghmour Jun 25 '14 at 12:26
  • 3
    The wording 4.1 is completely and utterly broken and has since been rewritten. It disallowed all sorts of perfectly valid things: it disallowed custom `memcpy` implementations (accessing objects using `unsigned char` lvalues), it disallowed accesses to `*p` after `int *p = 0; const int *const *pp = &p;` (even though the implicit conversion from `int**` to `const int*const*` is valid), it disallowed even accessing `c` after `struct S s; const S &c = s;`. [CWG issue 616](http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_defects.html#616). Does the new wording allow it? There's also [basic.lval]. –  Sep 14 '14 at 10:04
  • 1
    @hvd undefined behavior resulting from evaluation of expressions producing indeterminate values has now (cf. n3936) moved to [dcl.init]/12. This resolves the `memcpy` issue (it's now written in terms of narrow character types). – ecatmur Sep 15 '14 at 07:54
  • 1
    @ecatmur That issue does say it's about indeterminate values, but it includes all other issues related to that paragraph. The issue with `memcpy` isn't about indeterminate values: using `memcpy` to copy an already initialised value doesn't read any indeterminate values. –  Sep 15 '14 at 08:10
  • Allowing type punning via unions is a crazy idea of the C committee that cannot be made to work. – curiousguy Aug 18 '15 at 15:33
  • @hvd "_doesn't read any indeterminate values_" Even when you read padding bytes? – curiousguy Aug 18 '15 at 15:53
  • 1
    @curiousguy IIRC I was thinking of simple types (and didn't clarify properly). Fair point, for other types, using `memcpy` to copy already initialised values can cause reads of indeterminate values. –  Aug 18 '15 at 21:51
  • 1
    @curiousguy: Allowing type punning via unions is an idea that works just fine if the lifetime of any object that's contained within another is the lifetime of the container, and object accesses behave as ways of accessing the underlying storage. Neither principle works very well in C++, but both work just fine in the language invented by Dennis Ritchie. – supercat Jan 20 '17 at 23:00
  • 1
    The standard should be changed to explicitly allow type punning via unions given some stringent restrictions on what exactly is in the union. Anything that has a non-trivial constructor or destructor would be UB if it was one of the punned types. I'm not even sure if such things are allowed to be union members at all. – Omnifarious Mar 20 '17 at 02:20
  • 2
    @Omnifarious: That would make sense, though it would also need to clarify (and the C Standard also needs to clarify, btw) what the unary `&` operator means when applied to a union member. I would think the resulting pointer should be usable to access the member at least until the next direct or indirect use of any other member lvalue, but in gcc the pointer isn't usable even that long, which raises a question of what the `&` operator is supposed to mean. – supercat Apr 23 '17 at 18:30
  • 4
    One question regarding *"Recall that c99 is a normative reference for C++11"* Isn't that only relevant, where the c++ standard explicitly refers to the C standard (e.g. for the c library functions)? – MikeMB Sep 15 '17 at 12:11
  • @MikeMB yes, but [basic.types]/4 footnote 42 says "*The intent is that the memory model of C++ is compatible with that of ISO/IEC 9899 Programming Language C.*". It's a bit tenuous, admittedly. – ecatmur Sep 15 '17 at 14:19
  • Do all major compilers allow this as an extension? – Demi Jan 21 '19 at 01:25
  • [basic.life]/1 was updated to restrict the lifetime of union members in [P0137](http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0137r1.html) for C++17. – Davis Herring May 03 '19 at 03:56
28

The C++11 standard says it this way

9.5 Unions

In a union, at most one of the non-static data members can be active at any time, that is, the value of at most one of the non-static data members can be stored in a union at any time.

If only one value is stored, how can you read another? It just isn't there.
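One common way to respect the "at most one active member" rule is to record which member was written last and refuse to read any other. The following is only a sketch under that assumption; `Value`, `make_int`, `make_float` and `try_get_int` are hypothetical names, not anything defined by the standard.

```cpp
#include <cassert>

// Hypothetical tagged union: `active` records which member was written last.
struct Value {
    enum Kind { Int, Float } active;
    union {
        int i;
        float f;
    };
};

Value make_int(int v)     { Value x; x.active = Value::Int;   x.i = v; return x; }
Value make_float(float v) { Value x; x.active = Value::Float; x.f = v; return x; }

// Writes the int to `out` and returns true only if the int member is active.
bool try_get_int(const Value& x, int& out) {
    if (x.active != Value::Int) return false;
    out = x.i;  // reads the active member only
    return true;
}

void demo() {
    int out = 0;
    assert(try_get_int(make_int(7), out) && out == 7);
    assert(!try_get_int(make_float(1.0f), out));  // int member is not active
}
```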


The gcc documentation lists this under Implementation defined behavior

  • A member of a union object is accessed using a member of a different type (C90 6.3.2.3).

The relevant bytes of the representation of the object are treated as an object of the type used for the access. See Type-punning. This may be a trap representation.

indicating that this is not required by the C standard.


2016-01-05: Through the comments I was linked to C99 Defect Report #283 which adds a similar text as a footnote to the C standard document:

78a) If the member used to access the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called "type punning"). This might be a trap representation.

Not sure if it clarifies much though, considering that a footnote is not normative for the standard.

Community
Bo Persson
  • I stumbled upon this, but it doesn't really say accessing another value is UB. Also, *some* value **is** there since the memory is common to all members. – Luchian Grigore Jul 07 '12 at 07:49
  • 12
    @LuchianGrigore: UB isn't what standard says is UB, instead it's what the standard doesn't describe how it should work. This is exactly such case. Does the standard describe what happens? Does it say that it's implementation defined? No and no. So it's UB. Moreover, regarding the "members share the same memory address" argument, you'll have to refer to the aliasing rules, which will bring you to UB again. – Yakov Galka Jul 07 '12 at 07:52
  • 3
    @ybungalobill there are actually loads of places where the standard says the behavior is undefined. Also, it's not clear what "active" means. – Luchian Grigore Jul 07 '12 at 07:53
  • 5
    @Luchian: It's quite clear what active means, *"that is, the value of at most one of the non-static data members can be stored in a union at any time."* – Benjamin Lindley Jul 07 '12 at 07:55
  • 6
    @LuchianGrigore: Yes there are. There is infinite amount of cases that the standard does not (and cannot) address. (C++ is a Turing complete VM so it's incomplete.) So what? It does explain what "active" mean, refer to the above quote, after "that is". – Yakov Galka Jul 07 '12 at 07:55
  • 2
    It is undefined behaviour to try to read an object that isn't there (like using a dangling pointer). The union only contains one value, the one last written to. – Bo Persson Jul 07 '12 at 07:56
  • 8
    @LuchianGrigore: Omission of explicit definition of behavior is also considered undefined behavior, according to the definitions section. – jxh Jul 07 '12 at 07:59
  • @Bo: Is it in the standard that all members of a union are stored at the same memory address? If so, then it would be standard-defined behavior that when you store the value into one of the union fields, the other fields values change as well. They aren't "not there" - you've just written to a memory location where you _know_ something resides. And it would be entirely predictable behavior if you know the endianness and size of all the members of the union, as well as any padding rules. Why is it not clear that this is well-defined behavior? – Claudiu Jul 07 '12 at 15:52
  • @BoPersson: e.g. `char hello[4]; int *p1 = (int *)hello; *p1 = 10;` is it now undefined behavior to access `hello`? – Claudiu Jul 07 '12 at 15:53
  • 5
    @Claudiu That's UB for a different reason - it violates strict aliasing. – Mysticial Jul 07 '12 at 16:27
  • 1
    @Claudiu - The standard *does* say that the last written value is there, and nothing else. Many compilers will allow you to try to read the value as another type, but that would be implementation specific. – Bo Persson Jul 07 '12 at 17:18
  • @Bo Persson, from what I understand from the first quote, it simply states that you cannot store two values at once, which is understandable since all members share the same memory. Why is it undefined behaviour? The union is doing exactly what it is supposed to do. It is the "casting" by accessing different members (of different types) that may trigger undefined behavior, no? – Eitan T Aug 14 '12 at 09:27
  • 2
    @EitanT - Yes, the undefined behavior is trying to read a member that isn't there (and that's not unique to unions :-). A complication is that gcc promises to do its best if you try this type punning, and other compilers want to be gcc compatible, so they allow it too. So it often works in practice, except when it doesn't and you run into things like *"This may be a trap representation"*. – Bo Persson Aug 14 '12 at 09:39
  • @ybungalobill "_Does the standard describe what happens?_" Yes: you are simply using a lvalue – curiousguy Aug 18 '15 at 16:01
  • @Mysticial "_it violates strict aliasing_" how? – curiousguy Aug 18 '15 at 16:02
  • 2
    @curiousguy You can't dereference through an incompatible type pointer. In this case dereferencing as `int*` is not compatible with `char*`. There is [an exception](http://stackoverflow.com/questions/23848188/strict-aliasing-rule-and-char-pointers) that allows `char*` to alias with anything, but not the other way around. – Mysticial Aug 18 '15 at 16:39
  • @Mysticial Pointers don't alias, only lvalues do. – curiousguy Aug 18 '15 at 16:49
  • Is there a way to stop GCC from letting us do this, or at least provide stern warnings? The lack thereof made me think all type punning was OK up until now. As mentioned on Jerry's excellent answer, I'm just hoping that the proviso for "structs that share a common initial sequence" is going to save my (projects') ass(es) in this situation. – underscore_d Dec 31 '15 at 13:59
  • What you quoted for GCC is actually taken from the C standard, not implementation-defined. See: http://stackoverflow.com/questions/11639947/is-type-punning-through-a-union-unspecified-in-c99-and-has-it-become-specified This does not hold for the C++ standard, however, and I'm not sure whether `g++` makes the same guarantee for C++. – underscore_d Jan 05 '16 at 10:50
  • @underscore - Ok, I haven't read the C standard *that* closely, so I might have missed some footnotes. :-) I was directed to the GCC documentation by the compiler, when it didn't like some of my code and suggested union type punning instead. To be compatible, the other major compilers will allow this too, so it is kind of a de facto standard on Windows and Linux (where a trap representation isn't present either). Complicated, this... – Bo Persson Jan 05 '16 at 11:20
  • Could you edit to reflect that GCC is following the C standard there? What _is_ implementation defined is that `g++` inherits the same rule: https://gcc.gnu.org/onlinedocs/gcc/C_002b_002b-Implementation.html . So, yes, it certainly is complicated! It's lucky implementations have _de facto_ standards, especially if you're me and coding C++ where - if my understanding of the Standard is correct - I'm depending on `g++` applying the C rules. I hate having to rely on implementation-defined behaviour, but at least it's not _un_ defined and won't delete my code... – underscore_d Jan 05 '16 at 11:35
  • @underscore_d: De-facto standards are great when they are respected. The authors of the Standard focused on things which they felt should be required even on platforms where they would be somewhat impractical (e.g. requiring that unsigned types behave as a ring with a power-of-two modulus even on platforms whose arithmetic instructions would wrap mod some other value) but saw no need to mandate behaviors which would be commonplace, practical, and useful on 99% of machines. If the lack of a mandate didn't stop compilers from supporting such behaviors before 1989, there was no reason to... – supercat Aug 08 '16 at 16:40
  • ...think it would discourage them from continuing to support such behaviors in the 21st century. – supercat Aug 08 '16 at 16:40
  • FYI, union type-punning *is* well-defined in ISO C99 and later, at least GCC devs think so. The GNU extension you cite applies to C89/C90, and to C++: that's why it mentions C90. I thought I recalled seeing a clearer statement of GNU C++ union type-punning in the GCC manual at one point, but currently (https://gcc.gnu.org/onlinedocs/gcc/Structures-unions-enumerations-and-bit-fields-implementation.html) it's still listed under the C implementation-defined behaviour, not C++. It does link to a section about `-fstrict-aliasing` that compares union (safe) vs. pointer-cast (unsafe). – Peter Cordes Jan 07 '21 at 22:32
  • I think the GCC manual may consider the "C implementation-defined behaviour" as applying to C++ as well, even though C and C++ are more properly separate languages. Ah yes, [C++ Implementation-Defined Behavior](https://gcc.gnu.org/onlinedocs/gcc/C_002b_002b-Implementation.html) says: *Some choices are documented in the corresponding document for the C language. See [C Implementation](https://gcc.gnu.org/onlinedocs/gcc/C-Implementation.html).* So that covers GNU C++ explicitly. – Peter Cordes Jan 07 '21 at 22:39
18

I think the closest the standard comes to saying it's undefined behavior is where it defines the behavior for a union containing a common initial sequence (C99, §6.5.2.3/5):

One special guarantee is made in order to simplify the use of unions: if a union contains several structures that share a common initial sequence (see below), and if the union object currently contains one of these structures, it is permitted to inspect the common initial part of any of them anywhere that a declaration of the complete type of the union is visible. Two structures share a common initial sequence if corresponding members have compatible types (and, for bit-fields, the same widths) for a sequence of one or more initial members.

C++11 gives similar requirements/permission at §9.2/19:

If a standard-layout union contains two or more standard-layout structs that share a common initial sequence, and if the standard-layout union object currently contains one of these standard-layout structs, it is permitted to inspect the common initial part of any of them. Two standard-layout structs share a common initial sequence if corresponding members have layout-compatible types and either neither member is a bit-field or both are bit-fields with the same width for a sequence of one or more initial members.

Though neither states it directly, these both carry a strong implication that "inspecting" (reading) a member is "permitted" only if 1) it is (part of) the member most recently written, or 2) is part of a common initial sequence.

That's not a direct statement that doing otherwise is undefined behavior, but it's the closest of which I'm aware.
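To make the common-initial-sequence permission concrete, here is a minimal C++11 sketch. The type and function names (`IntMsg`, `FloatMsg`, `Message`, `message_tag`, `make_float_msg`) are invented for illustration and appear in neither standard.

```cpp
#include <cassert>

// Two standard-layout structs; only the leading `tag` member is common,
// because `int value` and `float value` are not layout-compatible.
struct IntMsg   { int tag; int value; };
struct FloatMsg { int tag; float value; };

union Message {
    IntMsg   as_int;
    FloatMsg as_float;
};

// Inspects the common initial part through `as_int`, whichever struct is active.
int message_tag(const Message& m) {
    return m.as_int.tag;
}

Message make_float_msg(float v) {
    Message m;
    m.as_float = FloatMsg{2, v};  // as_float is now the active member
    return m;
}

void demo() {
    const Message m = make_float_msg(1.5f);
    assert(message_tag(m) == 2);       // covered by the quoted permission
    assert(m.as_float.value == 1.5f);  // reads the active member
    // m.as_int.value would read outside the common initial sequence,
    // which the quoted text does not permit.
}
```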

Jerry Coffin
  • To make this complete, you need to know what "layout-compatible types" are for C++, or "compatible types" are for C. – Michael Anderson Aug 15 '12 at 08:32
  • 2
    @MichaelAnderson: Yes and no. You need to deal with those when/if you want to be certain whether something falls within this exception -- but the real question here is whether something that clearly falls outside the exception truly gives UB. I think that's strongly enough implied here to make the intent clear, but I don't think it's ever directly stated. – Jerry Coffin Aug 15 '12 at 15:43
  • This "common initial sequence" thing might just have saved 2 or 3 of my projects from the Rewrite Bin. I was livid when I first read about most punning uses of `union`s being undefined, since I'd been given the impression by a particular blog that this was OK, and built several large structures and projects around it. Now I _think_ I might be OK after all, since my `union`s do contain classes having the same types at the front – underscore_d Dec 31 '15 at 13:55
  • @JerryCoffin, I think you were hinting at the same question as me: what if our `union` contains _e.g._ a `uint8_t` and a `class Something { uint8_t myByte; [...] };` - I would assume this proviso would also apply here, but it's worded very deliberately to only allow for `struct`s. Luckily I'm already using those instead of raw primitives :O – underscore_d Dec 31 '15 at 14:04
  • @underscore_d: The C standard at least sort of covers that question: "A pointer to a structure object, suitably converted, points to its initial member (or if that member is a bit-field, then to the unit in which it resides), and vice versa." – Jerry Coffin Dec 31 '15 at 16:00
  • Thanks, Jerry. 2 things I wonder from that: I assume it's in a C version/section that's normative for C++? and do rules for pointers (aliasing) implicitly apply to unions (type punning) too? In the course of trying to wrap my head around this, I frequently see these two things discussed as though they're equivalent, but I don't know whether that's correct. – underscore_d Dec 31 '15 at 16:26
  • This should be the accepted answer; it suits my ability to understand the problem. – metablaster Sep 24 '19 at 02:22
12

Something not yet mentioned by the other answers is footnote 37 in paragraph 21 of section 6.2.5:

Note that aggregate type does not include union type because an object with union type can only contain one member at a time.

This requirement seems to clearly imply that you must not write to one member and read from another. In that case it might be undefined behavior by lack of specification.

mpu
  • Many implementations document their storage formats and layout rules. Such a specification would in many cases imply what the effect of reading storage of one type and writing as another would be in the absence of rules saying compilers don't have to actually use their defined storage format except when things are read and written using pointers of a character type. – supercat Sep 20 '16 at 21:28
-3

I will explain this with an example.
Assume we have the following union:

union A{
   int x;
   short y[2];
};

I will assume that sizeof(int) gives 4 and that sizeof(short) gives 2.
When you write union A a = {10}, that creates a new variable of type A and puts the value 10 in it.

Your memory should look like this (remember that all of the union members share the same location; the diagram assumes a big-endian layout):

       |                   x                   |
       |        y[0]       |       y[1]        |
       -----------------------------------------
   a-> |0000 0000|0000 0000|0000 0000|0000 1010|
       -----------------------------------------

As you can see, the value of a.x is 10, the value of a.y[1] is 10, and the value of a.y[0] is 0.

Now, what will happen if I do this?

a.y[0] = 37;

Our memory will look like this:

       |                   x                   |
       |        y[0]       |       y[1]        |
       -----------------------------------------
   a-> |0000 0000|0010 0101|0000 0000|0000 1010|
       -----------------------------------------

This will turn the value of a.x to 2424842 (in decimal).

Now, if your union has a float or double, your memory map will be more of a mess, because of the way floating-point numbers are stored. You can find more info here.
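The arithmetic above can be checked without reading a non-active union member by copying the bytes with `std::memcpy`. This is only a sketch: `overlay_halves` and `is_big_endian` are hypothetical helpers, and it assumes the fixed-width types `std::int16_t` and `std::int32_t` exist on the platform. The result depends on byte order, so the value 2424842 from the diagrams appears only on a big-endian machine.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: builds a 32-bit integer whose bytes are those of two
// consecutive 16-bit values, the way y[0] and y[1] overlay x in the union.
std::int32_t overlay_halves(std::int16_t y0, std::int16_t y1) {
    const std::int16_t y[2] = {y0, y1};
    std::int32_t x;
    static_assert(sizeof x == sizeof y, "sketch assumes 2 x 16 bits == 32 bits");
    std::memcpy(&x, y, sizeof x);  // copies the object representation
    return x;
}

// True on machines that store the most significant byte first.
bool is_big_endian() {
    const std::uint16_t probe = 0x0102;
    unsigned char first;
    std::memcpy(&first, &probe, 1);
    return first == 0x01;
}

void demo() {
    const std::int32_t x = overlay_halves(37, 10);
    // Big-endian:    0x0025000A == 2424842 (matches the diagrams above).
    // Little-endian: 0x000A0025 == 655397.
    assert(x == (is_big_endian() ? 2424842 : 655397));
}
```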

Marco A.
elyashiv