
I was reading about order of evaluation violations, and they give an example that puzzles me.

1) If a side effect on a scalar object is unsequenced relative to another side effect on the same scalar object, the behavior is undefined.

// snip
f(i = -1, i = -1); // undefined behavior

In this context, i is a scalar object, which apparently means

Arithmetic types (3.9.1), enumeration types, pointer types, pointer to member types (3.9.2), std::nullptr_t, and cv-qualified versions of these types (3.9.3) are collectively called scalar types.

I don't see how the statement is ambiguous in that case. It seems to me that regardless of whether the first or second argument is evaluated first, i ends up as -1, and both arguments are also -1.

Can someone please clarify?


UPDATE

I really appreciate all the discussion. So far, I like @harmic's answer a lot, since it exposes the pitfalls and intricacies of defining this statement in spite of how straightforward it looks at first glance. @acheong87 points out some issues that come up when using references, but I think that's orthogonal to the unsequenced-side-effects aspect of this question.


SUMMARY

Since this question got a ton of attention, I will summarize the main points/answers. First, allow me a small digression to point out that "why" can have closely related yet subtly different meanings, namely "for what cause", "for what reason", and "for what purpose". I will group the answers by which of those meanings of "why" they addressed.

for what cause

The main answer here comes from Paul Draper, with Martin J contributing a similar but not as extensive answer. Paul Draper's answer boils down to

It is undefined behavior because it is not defined what the behavior is.

The answer is overall very good in terms of explaining what the C++ standard says. It also addresses some related cases of UB such as f(++i, ++i); and f(i=1, i=-1);. In the first of the related cases, it's not clear if the first argument should be i+1 and the second i+2 or vice versa; in the second, it's not clear if i should be 1 or -1 after the function call. Both of these cases are UB because they fall under the following rule:

If a side effect on a scalar object is unsequenced relative to another side effect on the same scalar object, the behavior is undefined.

Therefore, f(i=-1, i=-1) is also UB since it falls under the same rule, even though the intention of the programmer is (IMHO) obvious and unambiguous.

Paul Draper also makes it explicit in his conclusion that

Could it have been defined behavior? Yes. Was it defined? No.

which brings us to the question of "for what reason/purpose was f(i=-1, i=-1) left as undefined behavior?"

for what reason / purpose

Although there are some (perhaps careless) oversights in the C++ standard, many omissions are well-reasoned and serve a specific purpose. Although I am aware that the purpose is often either "make the compiler-writer's job easier" or "faster code", I was mainly interested to know whether there is a good reason to leave f(i=-1, i=-1) as UB.

harmic and supercat provide the main answers that give a reason for the UB. harmic points out that an optimizing compiler might break up the ostensibly atomic assignment operations into multiple machine instructions, and that it might further interleave those instructions for optimal speed. This could lead to some very surprising results: i ends up as -2 in his scenario! Thus, harmic demonstrates how assigning the same value to a variable more than once can have ill effects if the operations are unsequenced.

supercat provides a related exposition of the pitfalls of trying to get f(i=-1, i=-1) to do what it looks like it ought to do. He points out that on some architectures, there are hard restrictions against multiple simultaneous writes to the same memory address. A compiler could have a hard time catching this if we were dealing with something less trivial than f(i=-1, i=-1).

davidf also provides an example of interleaving instructions very similar to harmic's.

Although harmic's, supercat's, and davidf's examples are all somewhat contrived, taken together they still provide a tangible reason why f(i=-1, i=-1) should be undefined behavior.

I accepted harmic's answer because it did the best job of addressing all meanings of why, even though Paul Draper's answer addressed the "for what cause" portion better.

other answers

JohnB points out that if we consider overloaded assignment operators (instead of just plain scalars), then we can run into trouble as well.

Nicu Stiurca

  • A scalar object is an object of scalar type. See 3.9/9: "Arithmetic types (3.9.1), enumeration types, pointer types, pointer to member types (3.9.2), `std::nullptr_t`, and cv-qualified versions of these types (3.9.3) are collectively called *scalar types*." – Rob Kennedy Feb 10 '14 at 07:12
  • Maybe there's an error on the page, and they actually meant `f(i-1, i = -1)` or something similar. – Mr Lister Feb 10 '14 at 08:41
  • Take a look at this question: http://stackoverflow.com/a/4177063/71074 – Robert S. Barnes Feb 10 '14 at 10:34
  • @RobKennedy Thanks. Does "arithmetic types" include bool? – Nicu Stiurca Feb 10 '14 at 14:25
  • Well, I guess the simple answer is: the evaluation order of the parameters would depend on the calling convention used by the compiler/architecture - over which the programmer doesn't have much control - and so the behavior may differ from one compiler/architecture to another. – JohnTortugo Feb 11 '14 at 20:10
  • SchighSchagh, your update should be in the answer section. – Grijesh Chauhan Feb 13 '14 at 05:32
  • You should probably mention in your summary that since C++17 this is not UB anymore, see [@AlexDs answer](https://stackoverflow.com/a/46186122) – chtz Mar 29 '18 at 14:36

11 Answers


Since the operations are unsequenced, there is nothing to say that the instructions performing the assignment cannot be interleaved. It might be optimal to do so, depending on CPU architecture. The referenced page states this:

If A is not sequenced before B and B is not sequenced before A, then two possibilities exist:

  • evaluations of A and B are unsequenced: they may be performed in any order and may overlap (within a single thread of execution, the compiler may interleave the CPU instructions that comprise A and B)

  • evaluations of A and B are indeterminately-sequenced: they may be performed in any order but may not overlap: either A will be complete before B, or B will be complete before A. The order may be the opposite the next time the same expression is evaluated.

That by itself doesn't seem like it would cause a problem - assuming that the operation being performed is storing the value -1 into a memory location. But there is also nothing to say that the compiler cannot optimize that into a separate set of instructions that has the same effect, but which could fail if the operation was interleaved with another operation on the same memory location.

For example, imagine that it was more efficient to zero the memory, then decrement it, compared with loading the value -1 in. Then this:

f(i=-1, i=-1)

might become:

clear i
clear i
decr i
decr i

Now i is -2.

It is probably a bogus example, but it is possible.

harmic

  • Very nice example of how the expression could actually do something unexpected while conforming to sequencing rules. Yes, a bit contrived, but so is the code snippet I'm asking about in the first place. :) – Nicu Stiurca Feb 10 '14 at 08:00
  • And even if the assignment is done as an atomic operation, it is possible to conceive of a superscalar architecture where both assignments are made simultaneously, causing a memory access conflict that results in a failure. The language is designed so that compiler writers have as much freedom as possible in using the advantages of the target machine. – ach Feb 10 '14 at 08:22
  • I really like your example of how even assigning the same value to the same variable in both parameters could result in an unexpected result, because the two assignments are unsequenced. – Martin J. Feb 10 '14 at 11:17
  • +1e+6 (ok, +1) for the point that compiled code isn't always what you would expect. Optimizers are *really* good at throwing these sorts of curves at you when you don't follow the rules :P – Corey Feb 11 '14 at 06:34
  • On the ARM processor, a 32-bit load can take up to 4 instructions: it does `load 8bit immediate and shift` up to 4 times. Usually the compiler will do indirect addressing to fetch a number from a table to avoid this. (-1 can be done in 1 instruction, but another example could be chosen.) – ctrl-alt-delor Feb 11 '14 at 12:26
  • On something like a PIC 16Cxx, `unsigned char foo=255;` could either be `movlw 255 / movwf _foo` or `clrf _foo / decf _foo`. The former will load the working register with 255; the latter will leave it undisturbed. – supercat Sep 03 '14 at 21:22
  • I think this answer does more harm than good. It doesn't matter whether or not you can think of some way it might fail. The standard says it's undefined, and the standard is the authoritative source for what it does or does not define. Period. – David Schwartz Sep 24 '16 at 21:58
  • Nice example! The clang compiler caught this situation with `-Wunsequenced` (I presume GCC would do so as well). – Eljay Dec 01 '17 at 23:39

First, "scalar object" means a type like an int, float, or a pointer (see What is a scalar Object in C++?).


Second, it may seem more obvious that

f(++i, ++i);

would have undefined behavior. But

f(i = -1, i = -1);

is less obvious.

A slightly different example:

int i;
f(i = 1, i = -1);
std::cout << i << "\n";

What assignment happened "last", i = 1 or i = -1? It's not defined in the standard. Really, that means i could be 5 (see harmic's answer for a completely plausible explanation of how this could be the case). Or your program could segfault. Or reformat your hard drive.

But now you ask: "What about my example? I used the same value (-1) for both assignments. What could possibly be unclear about that?"

You are correct...except in the way the C++ standards committee described this.

If a side effect on a scalar object is unsequenced relative to another side effect on the same scalar object, the behavior is undefined.

They could have made a special exception for your special case, but they didn't. (And why should they? What use would that ever possibly have?) So, i could still be 5. Or your hard drive could be empty. Thus the answer to your question is:

It is undefined behavior because it is not defined what the behavior is.

(This deserves emphasis because many programmers think "undefined" means "random", or "unpredictable". It doesn't; it means not defined by the standard. The behavior could be 100% consistent, and still be undefined.)

Could it have been defined behavior? Yes. Was it defined? No. Hence, it is "undefined".

That said, "undefined" doesn't mean that a compiler will format your hard drive...it means that it could and it would still be a standards-compliant compiler. Realistically, I'm sure g++, Clang, and MSVC will all do what you expected. They just wouldn't "have to".


A different question might be Why did the C++ standards committee choose to make this side-effect unsequenced?. That answer will involve history and opinions of the committee. Or What is good about having this side-effect unsequenced in C++?, which permits any justification, whether or not it was the actual reasoning of the standards committee. You could ask those questions here, or at programmers.stackexchange.com.

Paul Draper

  • "Realistically, I'm sure g++, Clang, and MSVC will all do what you expected." - So long as you don't pass any options specifically to catch code like this, anyway. –  Feb 10 '14 at 06:59
  • @hvd, yes, in fact I know that if you enable `-Wsequence-point` for g++, it will warn you. – Paul Draper Feb 10 '14 at 07:02
  • Why is this less obvious? You are changing the same parameter before a sequence point. – BЈовић Feb 10 '14 at 07:49
  • @PaulDraper I was actually thinking more of something like clang's `-fsanitize=undefined` which, even if it might not catch this in its current form (I'm not sure), could in the future. –  Feb 10 '14 at 08:11
  • @BЈовић, It's less obvious because you are changing it to the same value in both operations, and neither assignment relies on the previous value, so the undefined order of operations appears irrelevant to the casual reader. – Paul Butcher Feb 10 '14 at 10:01
  • "I'm sure g++, Clang, and MSVC will all do what you expected" I wouldn't trust a modern compiler. They're evil. For example, they might recognize that this is undefined behaviour and assume this code is unreachable. If they don't do so today, they might do so tomorrow. Any UB is a ticking time bomb. – CodesInChaos Feb 10 '14 at 10:29
  • `harmic` provided a great example of WHY this UB can generate ill effects. Modern compilers really do sometimes perform such optimisations. Not on your CPU? Move it to Itanium. Or AVR. Or Atari. (Let me just add that those platform examples are rhetorical and not verified ;) ) – quetzalcoatl Feb 10 '14 at 11:05
  • Traditionally, one might sometimes point out that the compiler is at liberty to react to undefined constructs by making demons fly out of your nose. Hence whatever behaviour a compiler does do in the face of an undefined construct is a *nasal demon*. – Jon Hanna Feb 10 '14 at 11:47
  • I'm curious - how do you justify saying that "...`i` could be 5." in the context of the code under discussion? I'll agree that given `f(i = 1, i = -1)` the variable `i` could be either 1 or -1, but I don't see how `i` could be assigned a value of 5 given the code as shown. – Bob Jarvis - Reinstate Monica Feb 10 '14 at 12:06
  • @BobJarvis: because it's undefined behaviour. So if some compiler decides it wants to set i to 5, or do anything else, then it is still a compliant compiler. I don't think he actually meant that i==5 would be a likely result. – RemcoGerlich Feb 10 '14 at 12:57
  • @BlacklightShining "your answer is bad because it is not good" is not very useful feedback, is it? – Vincent van der Weele Feb 10 '14 at 13:06
  • @RemcoGerlich: the undefined behavior is the order of evaluation of the assignments. If you say "assign 1" followed by "assign -1" and the compiler sets i to 5, that's not "undefined behavior" - that's just wrong. – Bob Jarvis - Reinstate Monica Feb 10 '14 at 13:10
  • @BobJarvis have a look at harmic's answer and tell us why this (or any other resulting number) would not be standard compliant. – Hulk Feb 10 '14 at 13:13
  • @Hulk: as harmic said, this is a bogus example. If the compiler generates code which does not assign an intended value, it's wrong. It might have a choice of *which* value to assign in *which* order, but it can't assign an improper value. In this example he's assuming that the instructions to clear and decrement the value of `i` are interleaved; this is not reasonable behavior on the part of the compiler. It could generate `clear; decr; clear; decr`, but `clear; clear; decr; decr` would not result in anything that the code required, and the only thing generated should be a bug report. – Bob Jarvis - Reinstate Monica Feb 10 '14 at 13:19
  • @BobJarvis I don't claim that this kind of behavior would be reasonable, but I don't think it's explicitly defined by the standard - and as others have mentioned, once you've got UB, anything is allowed to happen, including nasal demons and formatted hard drives. – Hulk Feb 10 '14 at 13:34
  • @BobJarvis, you're wrong in assuming the serializability of operations executed on the data. Even if the assumption is valid for the current mainstream hardware architectures, lots of other architectures have existed and are to be developed in the future. – ach Feb 10 '14 at 13:57
  • @BobJarvis It may even be more reasonable than you think. When you're granted the luxury of making something UB, you can use that to optimize. When you do optimizations while not caring about how one use case works, you can get funny results. – Cruncher Feb 10 '14 at 14:00
  • @BobJarvis The compiler has absolutely no obligation to generate even remotely correct code in the face of undefined behaviour. It can even assume that this code is never called and thus replace the whole thing with a nop (note that compilers actually make such assumptions in the face of UB). Therefore I'd say that the correct reaction to such a bug report can only be "closed, works as intended". – Grizzly Feb 10 '14 at 14:12
  • @Grizzly: In fact, I've even seen an optimization guide *specifically advise* writing code that can have undefined behavior to allow compiler optimizations. (The example was to use *signed* loop index variables, so that the compiler didn't have to worry about producing correct behavior of an *unsigned* loop index variable overflowing.) –  Feb 10 '14 at 14:41
  • `It is undefined behavior because it is not defined what the behavior is.` To me, this tautological answer is rather unsatisfactory. Personally, I tend to associate UB with things that inherently don't make sense or are otherwise problematic to define. I appreciate the discussion here, but I prefer harmic's answer which does a better job of exposing the pitfalls of trying to define this. (Your answer almost hints that the spec committee was too lazy.) – Nicu Stiurca Feb 10 '14 at 14:56
  • @SchighSchagh, the "reason" is this: in the general case, the behavior is uncertain, since there is no relative "order" for function parameter evaluation. In your specific case, there is never any reason to do it, so they didn't worry about making an exception. – Paul Draper Feb 10 '14 at 16:47
  • @Heuster: guess you didn't get the tautological irony. – rsenna Feb 10 '14 at 17:28
  • @BobJarvis: Perhaps the particular CPU architecture requires setting bit value 4 before setting 1, then unsetting 4. Resulting in value 5. I know, crazy, but stuff like that sometimes happens. – Zan Lynx Feb 10 '14 at 17:29
  • @SchighSchagh Sometimes a rephrasing of the terms (which only on the surface appears to be a tautological answer) is what people need. Most people new to technical specifications think `undefined behavior` means `something random will happen`, which is far from the case most of the time. – Izkata Feb 10 '14 at 17:38
  • @SchighSchagh: "Personally, I tend to associate UB with things that inherently don't make sense or are otherwise problematic to define" -- that's the mistake that this answer is trying to correct. You should not associate undefined behavior with things that are problematic to define, you should associate it with things that in fact are not defined. It's nice if you can be shown an example why it benefits some hypothetical implementation for it to be undefined (like harmic's answer does), but all that's needed to clarify your statement "`i` ends up as `-1`" is that it isn't guaranteed ;-) – Steve Jessop Feb 10 '14 at 19:03
  • @Heuster It's about as informative as `It is undefined behavior because it is not defined what the behavior is.`. – Blacklight Shining Feb 11 '14 at 16:46
  • @BlacklightShining, I didn't ask the question. I just answered it. (And that is the correct answer.) – Paul Draper Feb 11 '14 at 18:14
  • @PaulDraper Correct, yes, in the sense that `x == x` is correct. _Correct_ does not mean _helpful._ If I asked _Why isn't X the case?_, anyone could answer _X is not the case because it is not the case that X._, and they would be similarly _correct._ But such [tautologies](https://xkcd.com/703/) are by no means helpful. In any case, [your recent clarification](https://stackoverflow.com/revisions/21670570/8) has removed the tautology from the tautology, so to speak. [Have an upvote.](https://stackoverflow.com/q/21670459#comment32765597_21670570) – Blacklight Shining Feb 15 '14 at 05:20
  • @CodesInChaos A quote that is very applicable here: "The worst possible outcome of undefined behavior is to have it do what you were expecting." – dgnuff Jun 22 '18 at 05:00

A practical reason to not make an exception from the rules just because the two values are the same:

// config.h
#define VALUEA  1

// defaults.h
#define VALUEB  1

// prog.cpp
f(i = VALUEA, i = VALUEB);

Consider the case where this was allowed.

Now, some months later, the need arises to change

 #define VALUEB 2

Seemingly harmless, isn't it? And yet suddenly prog.cpp wouldn't compile anymore. But we feel that compilation should not depend on the value of a literal.

Bottom line: there is no exception to the rule because it would make successful compilation depend on the value (rather than the type) of a constant.

EDIT

@HeartWare pointed out that constant expressions of the form A DIV B are not allowed in some languages when B is 0, and cause compilation to fail. Hence changing a constant could cause compilation errors in some other place. Which is, IMHO, unfortunate. But it is certainly good to restrict such things to the unavoidable.

Ingo

  • Sure, but the example **does** use integer literals. Your `f(i = VALUEA, i = VALUEB);` definitely has the potential for undefined behaviour. I hope you aren't really coding against values behind identifiers. – Wolf Feb 10 '14 at 10:38
  • @Wolf But the compiler doesn't see preprocessor macros. And even if this were not so, it is hard to find an example in any programming language where source code compiles until one changes some int constant from 1 to 2. This is simply unacceptable and unexplainable, while you see very good explanations here why that code is broken even with the same values. – Ingo Feb 10 '14 at 11:40
  • Yes, the compiler does not see macros. But was **this** the question? – Wolf Feb 10 '14 at 11:53
  • Your answer is missing the point, read [harmic's answer](http://stackoverflow.com/a/21671069/2932052) and the OP's comment on it. – Wolf Feb 10 '14 at 12:45
  • "And even if this were not so, it is hard to find an example in any programming language, where a source code compiles until one changes some int constant from 1 to 2". That's an easy one :-). Just try to compile this: CONST A = 1; CONST B = 2; CONST C = B DIV (2-A); then change CONST A = 1 to CONST A = 2 and see if the compiler will compile it :-) :-). – HeartWare Feb 10 '14 at 12:55
  • @HeartWare In what language? – Ingo Feb 10 '14 at 12:56
  • PASCAL / Delphi / Object Pascal – HeartWare Feb 10 '14 at 12:56
  • They complain about that? Then they're broken, IMHO. I would accept, even welcome, a warning that there is a zero-divide. But not a refusal to compile it. – Ingo Feb 10 '14 at 13:01
  • What value should C then have in statements that use it? Like passing it as a parameter to a function that accepts an integer? How should the compiler compile this statement: SomeProcedure(A,B,C); – HeartWare Feb 10 '14 at 13:04
  • Oh - and Microsoft's C# compiler complains with "Division by constant zero" in this code: private const int A = 2; private const int B = 2; private const int C = B/(2 - A); whereas this code works as expected: private const int A = 1; private const int B = 2; private const int C = B/(2 - A); So the only change to make it not compile was to modify the definition of a constant from 1 to 2 :-). – HeartWare Feb 10 '14 at 13:07
  • It could do `SomeProcedure(A, B, B DIV (2-A))`. Anyway, if the language states that CONST must be fully evaluated at compile time, then, of course, my claim is not valid for that case. Since it somehow blurs the distinction of compile time and runtime. Would it also notice if we write `CONST C = X(2-A); FUNCTION X:INTEGER(CONST Y:INTEGER) = B/Y;`? Or are functions not allowed? – Ingo Feb 10 '14 at 13:08
  • CONST expressions (in both PASCAL and C#) must be fully evaluable at compile time, so no function calls are allowed. And besides - even if the compiler were to compile it as SomeProcedure(A, B, B DIV (2-A)) it would still not compile, all because I changed some int constant from 1 to 2. So it isn't that hard to find examples that prevent compilation of the source file when I change the definition of a constant from 1 to 2 :-). – HeartWare Feb 10 '14 at 13:11
  • @HeartWare - "must be fully evaluable at compile time, so no function calls are allowed" - this is a non sequitur, especially when the compiler knows the function and it only contains code that is allowed in constant expressions. -- However, I must accept that my claim was easy to refute. Still, I don't think this is how it should be. – Ingo Feb 10 '14 at 13:15
  • Touché :-). Let me amend it, then: CONST expressions (in both PASCAL and C#) must be fully evaluable by the compiler without the help of user-defined code, at compile time, so no user-defined function calls are allowed (but e.g. ABS, SQRT and other built-in intrinsic functions are allowed, because they are compiler-supplied "magic" functions and not actual code that is called as a subroutine - at least in PASCAL/Delphi). – HeartWare Feb 10 '14 at 13:24
  • @HeartWare I see why they don't allow it. One could easily write whole programs in CONST FUNCTIONS (note that functional languages like Haskell provide **only** such functions, apart from primitives) and could do one's computations entirely in the compiler. That'd be fun. :) – Ingo Feb 10 '14 at 13:37
  • `Yet, we feel that compilation should not depend on the value of a literal.` Meh. You can do a lot of cool stuff with literals, especially in template meta-programming, but obviously not every possible literal will always make sense. Also, I think your answer misses the point a little. But thank you for the discussion anyway. – Nicu Stiurca Feb 10 '14 at 15:21
  • @SchighSchagh It was written on the assumption that it is clear now why `f(i = 1, i = 2)` is rightly undefined, and tries to give a practical (rather than technical) reason why *not* to make an exception from the rule in the special case `f(i = 1, i = 1)` (as someone around here suggested). – Ingo Feb 10 '14 at 15:28

The confusion is that storing a constant value into a local variable is not one atomic instruction on every architecture C is designed to run on. The processor the code runs on matters more than the compiler in this case. For example, on ARM, where a single instruction cannot carry a complete 32-bit constant, storing an int in a variable needs more than one instruction. Here is pseudo code where you can only store 8 bits at a time and must work in a 32-bit register; i is an int32:

reg = 0xFF; // first instruction
reg |= 0xFF00; // second
reg |= 0xFF0000; // third
reg |= 0xFF000000; // fourth
i = reg; // last

You can imagine that if the compiler wants to optimize, it may interleave the same sequence twice, and you don't know what value will get written to i. Let's say that it is not very smart:

reg = 0xFF;
reg |= 0xFF00;
reg |= 0xFF0000;
reg = 0xFF;
reg |= 0xFF000000;
i = reg; // writes 0xFF0000FF == -16776961
reg |= 0xFF00;
reg |= 0xFF0000;
reg |= 0xFF000000;
i = reg; // writes 0xFFFFFFFF == -1

However, in my tests gcc is kind enough to recognize that the same value is used twice and generates it once, doing nothing weird. I get -1, -1. But my example is still valid, in that it is important to consider that even a constant may not be as obvious as it seems to be.

davidf

  • I suppose that on ARM the compiler will just load the constant from a table. What you describe seems more like MIPS. – ach Feb 11 '14 at 20:04
  • @AndreyChernyakhovskiy Yep, but in a case when it's not simply `-1` (that the compiler has stored somewhere) but rather `3^81 mod 2^32`, yet still constant, the compiler might do exactly what's done here, and at some level of optimization may interleave the call sequences to avoid waiting. – yo' Feb 14 '14 at 10:05
  • @tohecz, yeah, I've checked it up already. Indeed, the compiler is too smart to load every constant from a table. Anyway, it would never use the same register to compute the two constants. This would just as surely 'undefine' the defined behaviour. – ach Feb 14 '14 at 13:25
  • @AndreyChernyakhovskiy But you are probably not "every C++ compiler programmer in the world". Remember that there are machines with only 3 short registers available for computations. – yo' Feb 14 '14 at 13:26
  • @tohecz, consider the example `f(i = A, j = B)` where `i` and `j` are two separate objects. This example has no UB. A machine having 3 short registers is no excuse for the compiler to mix the two values of `A` and `B` in the same register (as shown in @davidf's answer), because it would break the program semantics. – ach Feb 15 '14 at 13:25
  • This answer is nonsense. This is a pure question about the standard. What any actual platform does or doesn't do is irrelevant. Even if there existed no platform where this could possibly fail, the behavior would still be undefined because the standard clearly says it doesn't define it. – David Schwartz Sep 24 '16 at 22:00

Behavior is commonly specified as undefined if there is some conceivable reason why a compiler which was trying to be "helpful" might do something which would cause totally unexpected behavior.

In the case where a variable is written multiple times with nothing to ensure that the writes happen at distinct times, some kinds of hardware might allow multiple "store" operations to be performed simultaneously to different addresses using a dual-port memory. However, some dual-port memories expressly forbid the scenario where two stores hit the same address simultaneously, regardless of whether or not the values written match. If a compiler for such a machine notices two unsequenced attempts to write the same variable, it might either refuse to compile or ensure that the two writes cannot get scheduled simultaneously. But if one or both of the accesses is via a pointer or reference, the compiler might not always be able to tell whether both writes might hit the same storage location. In that case, it might schedule the writes simultaneously, causing a hardware trap on the access attempt.

Of course, the fact that someone might implement a C compiler on such a platform does not suggest that such behavior shouldn't be defined on hardware platforms when using stores of types small enough to be processed atomically. Trying to store two different values in unsequenced fashion could cause weirdness if a compiler isn't aware of it; for example, given:

uint8_t v;  // Global

void hey(uint8_t *p)
{
  moo(v=5, (*p)=6);
  zoo(v);
  zoo(v);
}

if the compiler inlines the call to "moo" and can tell that it doesn't modify "v", it might store a 5 to v, then store a 6 to *p, then pass 5 to "zoo", and then pass the contents of v to "zoo". If "zoo" doesn't modify "v", there should be no way the two calls could be passed different values, but that could easily happen anyway. On the other hand, in cases where both stores would write the same value, such weirdness could not occur, and on most platforms there would be no sensible reason for an implementation to do anything weird. Unfortunately, some compiler writers don't need any excuse for silly behaviors beyond "because the Standard allows it", so even those cases aren't safe.

supercat

The fact that the result would be the same in most implementations in this case is incidental; the order of evaluation is still undefined. Consider f(i = -1, i = -2): here, order matters. The only reason it doesn't matter in your example is the accident that both values are -1.

Given that the expression is specified as one with an undefined behaviour, a maliciously compliant compiler might display an inappropriate image when you evaluate f(i = -1, i = -1) and abort the execution - and still be considered completely correct. Luckily, no compilers I am aware of do so.

Kevin
Amadan

It looks to me like the only rule pertaining to sequencing of function argument expressions is here:

3) When calling a function (whether or not the function is inline, and whether or not explicit function call syntax is used), every value computation and side effect associated with any argument expression, or with the postfix expression designating the called function, is sequenced before execution of every expression or statement in the body of the called function.

This does not define sequencing between argument expressions, so we end up in this case:

1) If a side effect on a scalar object is unsequenced relative to another side effect on the same scalar object, the behavior is undefined.

In practice, on most compilers, the example you quoted will run fine (as opposed to "erasing your hard disk" and other theoretical undefined behavior consequences).
It is, however, a liability, as it depends on specific compiler behaviour, even if the two assigned values are the same. Also, obviously, if you tried to assign different values, the results would be "truly" undefined:

bool f(int l, int r) {
    return l < -1;
}

int i;
auto b = f(i = -1, i = -2);
if (b) {
    formatDisk();
}
Martin J.

C++17 defines stricter evaluation rules. In particular, it makes the evaluations of function arguments indeterminately sequenced with respect to one another (the order is unspecified, but the evaluations cannot interleave).

N4659 §4.6:15
Evaluations A and B are indeterminately sequenced when either A is sequenced before B or B is sequenced before A, but it is unspecified which. [ Note: Indeterminately sequenced evaluations cannot overlap, but either could be executed first. —end note ]

N4659 §8.2.2:5
The initialization of a parameter, including every associated value computation and side effect, is indeterminately sequenced with respect to that of any other parameter.

It allows some cases which would be UB before:

f(i = -1, i = -1); // value of i is -1
f(i = -1, i = -2); // value of i is either -1 or -2, but not specified which one
AlexD
  • Thank you for adding this update for C++17, so I didn't have to. ;) – Yakk - Adam Nevraumont Oct 26 '17 at 19:27
  • Awesome, thanks a lot for this answer. Slight followup: if `f`'s signature were `f(int a, int b)`, does C++17 guarantee that `a == -1` and `b == -2` if called as in the second case? – Nicu Stiurca Nov 02 '17 at 02:46
  • Yes. If we have parameters `a` and `b`, then either `i`-then-`a` are initialized to -1, afterwards `i`-then-`b` are initialized to -2, or the other way around. In both cases, we end up with `a == -1` and `b == -2`. At least this is how I read "_The initialization of a parameter, including every associated value computation and side effect, is indeterminately sequenced with respect to that of any other parameter_". – AlexD Nov 02 '17 at 22:25
  • I think it has been the same in C since forever. – fuz Jul 13 '18 at 15:19

The assignment operator could be overloaded, in which case the order could matter:

struct A {
    bool first;
    A () : first (false) {
    }
    const A & operator = (int i) {
        first = !first;
        return * this;
    }
};

void f (A a1, A a2) {
    // ...
}


// ...
A i;
f (i = -1, i = -1);   // the argument evaluated first has ax.first == true
JohnB
  • True enough, but the question was about _scalar types_, which others have pointed out means essentially int family, float family, and pointers. – Nicu Stiurca Feb 10 '14 at 14:30
  • The real problem in this case is that the assignment operator is stateful, so even regular manipulation of the variable is prone to issues like this. – AJMansfield Feb 10 '14 at 16:18

This is just answering the "I'm not sure what "scalar object" could mean besides something like an int or a float".

I would interpret "scalar object" as an abbreviation of "scalar type object", or just "scalar type variable". Then pointers and enum constants are also of scalar type.

Here is an MSDN article on scalar types.

Peng Zhang
  • This reads a bit like a "link only answer". Can you copy the relevant bits from that link to this answer (in a blockquote)? – Cole Johnson Feb 10 '14 at 16:51
  • @ColeJohnson This is not a link only answer. The link is only for further explanation. My answer is "pointer", "enum". – Peng Zhang Feb 10 '14 at 23:16
  • I didn't say your answer _was_ a link only answer. I said it _"reads like [one]"_. I suggest you read up why we don't want link only answers in the help section. The reason being, if Microsoft updates their URLs in their site, that link breaks. – Cole Johnson Feb 19 '14 at 18:01

Actually, there's a reason not to depend on the compiler checking that i is assigned the same value twice and replacing the two assignments with a single one: what if the assigned values come from expressions?

int ipow(int base, int exp);  // some integer power function, defined elsewhere

void g(int a, int b, int c, int n) {
    int i;
    // hey, compiler has to prove Fermat's theorem now!
    f(i = 1, i = (ipow(a, n) + ipow(b, n) == ipow(c, n)));
}
polkovnikov.ph
  • Don't need to prove Fermat's theorem: just assign `1` to `i`. Either both arguments assign `1` and this does the "right" thing, or the arguments assign different values and it's undefined behavior so our choice is still permitted. –  Aug 08 '15 at 03:07