48

There's been some debate going on in this question about whether the following code is legal C++:

std::list<item*>::iterator i = items.begin();
while (i != items.end())
{
    bool isActive = (*i)->update();
    if (!isActive)
    {
        items.erase(i++);  // *** Is this undefined behavior? ***
    }
    else
    {
        other_code_involving(*i);
        ++i;
    }
}

The problem here is that erase() will invalidate the iterator in question. If that happens before i++ is evaluated, then incrementing i like that is technically undefined behavior, even if it appears to work with a particular compiler. One side of the debate says that all function arguments are fully evaluated before the function is called. The other side says, "the only guarantees are that i++ will happen before the next statement and after i++ is used. Whether that is before erase(i++) is invoked or afterwards is compiler dependent."

I opened this question to hopefully settle that debate.

Community
  • 1
  • 1
Michael Kristofik
  • 31,476
  • 15
  • 72
  • 121
  • 4
    Love the question. There's something fascinating about a language where even such trivial code as f(i++) can spark long debates about the semantics, with multiple cross-references to the standard... :) – jalf Feb 28 '09 at 16:20
  • You are really mixing two questions here: (1) whether it is generally legal to use the postincrement operator in a function call argument, and (2) whether a specific use will produce the intended result. It's always legal, but whether it will produce the desired result depends on the specific case. – nobody Feb 28 '09 at 16:31

8 Answers8

63

Quoth the C++ standard 1.9.16:

When calling a function (whether or not the function is inline), every value computation and side effect associated with any argument expression, or with the postfix expression designating the called function, is sequenced before execution of every expression or statement in the body of the called function. (Note: Value computations and side effects associated with the different argument expressions are unsequenced.)

So it would seem to me that this code:

foo(i++);

is perfectly legal. It will increment i and then call foo with the previous value of i. However, this code:

foo(i++, i++);

yields undefined behavior because paragraph 1.9.16 also says:

If a side effect on a scalar object is unsequenced relative to either another side effect on the same scalar object or a value computation using the value of the same scalar object, the behavior is undefined.

Michael Kristofik
  • 31,476
  • 15
  • 72
  • 121
  • It ain't undefined here. It will do both, but in a undefined order. – Migol Feb 28 '09 at 15:35
  • 5
    The second one is undefined, yeah. Modifying a variable twice in the same expression is undefined, not just the order in which they're evaluated. In practical terms, both may increment the original i, rather than the one updated by the other increment. Nothing wrong with the first one. – jalf Feb 28 '09 at 15:47
  • Wouldn't foo(++i, ++i) be more undefined ? – Benoît Mar 01 '09 at 09:55
  • 24
    How can it be "more undefined"? – Ed S. Mar 01 '09 at 10:55
  • @Ed hehe exactly, how can 0 be more of a 0 – Ólafur Waage Mar 01 '09 at 14:03
  • I always thought that `foo(i++)` is fine too, but I recently came across this MSDN page: https://docs.microsoft.com/en-us/cpp/cpp/postfix-increment-and-decrement-operators-increment-and-decrement. `When a postfix operator is applied to a function argument, the value of the argument is not guaranteed to be incremented or decremented before it is passed to the function.`. What does it mean? Does it say that `foo(i++)` is undefined after all? – Valentin Nov 05 '17 at 16:39
  • @Valentin I don't know which version of the standard the MSDN page refers to, but the current standard says the exact opposite. Paragraph 4.6.18 of the current working draft says `When calling a function (whether or not the function is inline), every value computation and side effect associated with any argument expression, or with the postfix expression designating the called function, is sequenced before execution of every expression or statement in the body of the called function.` – Michael Kristofik Nov 15 '17 at 20:05
  • @Valentin: Perhaps it is talking about a parameter’s initialization potentially observing the old value if it has another way of referring to the variable being incremented/decremented? – Davis Herring Jul 12 '19 at 15:43
14

To build on Kristo's answer,

foo(i++, i++);

yields undefined behavior because the order that function arguments are evaluated is undefined (and in the more general case because if you read a variable twice in an expression where you also write it, the result is undefined). You don't know which argument will be incremented first.

int i = 1;
foo(i++, i++);

might result in a function call of

foo(2, 1);

or

foo(1, 2);

or even

foo(1, 1);

Run the following to see what happens on your platform:

#include <iostream>

using namespace std;

void foo(int a, int b)
{
    cout << "a: " << a << endl;
    cout << "b: " << b << endl;
}

int main()
{
    int i = 1;
    foo(i++, i++);
}

On my machine I get

$ ./a.out
a: 2
b: 1

every time, but this code is not portable, so I would expect to see different results with different compilers.

Bill the Lizard
  • 369,957
  • 201
  • 546
  • 842
  • 1
    It might also result in foo(2,2). You don't know which value of i each increment uses, and it might use the original value in both. In theory, it could result in foo(42, -263) as well, but we're probably not likely to see that particular result from a real world compiler. ;) – jalf Feb 28 '09 at 15:50
  • oh wait, aren't we off by one in all the examples? Shouldn't it be foo(1,1), foo(1,2) and foo(2,1)? – jalf Feb 28 '09 at 15:51
  • Are you sure? Won't the function call always be foo(1,1)? The question is what value i will have afterwards. – Steve Rowe Feb 28 '09 at 15:52
  • @Steve Rowe: Suppose the leftmost incrmeent is evaluated first, yielding 1 as the first argument to the function, and setting i to 2. Then the rightmost increment is evaluated, yielding 2 as the second argument, and setting the final i to 3. Then foo gets called with (1,2), and i=3 afterwards. – jalf Feb 28 '09 at 16:02
  • @Bill: No, they don't need to have different values, It's undefined (http://www.research.att.com/~bs/bs_faq2.html#evaluation-order). There's no guarantee that they'll be evaluated sequentially (or at all, if we want to be pedantic). – jalf Feb 28 '09 at 16:16
  • @jalf: Thanks for catching both my mistakes. – Bill the Lizard Feb 28 '09 at 16:17
  • I believe Steve is right. There can never be (2,2) only (1,1) perhaps. – Milan Babuškov Mar 01 '09 at 11:31
  • Your link to ATT Research on evaluation of arguments is broken. :( – Galaxy Jun 24 '18 at 21:20
  • @Galaxy Thank you for pointing that out. Unfortunately, I can't find where that page was moved to, so I had to remove the reference. – Bill the Lizard Jun 25 '18 at 00:36
  • You understate the breadth of **undefined** behavior. The optimizer may just eliminate all code that leads to this call. – Davis Herring Jul 12 '19 at 15:45
5

The standard says the side effect happens before the call, so the code is the same as:

std::list<item*>::iterator i_before = i;

i = i_before + 1;

items.erase(i_before);

rather than being:

std::list<item*>::iterator i_before = i;

items.erase(i);

i = i_before + 1;

So it is safe in this case, because list.erase() specifically doesn't invalidate any iterators other than the one erased.

That said, it's bad style - the erase function for all containers returns the next iterator specifically so you don't have to worry about invalidating iterators due to reallocation, so the idiomatic code:

i = items.erase(i);

will be safe for lists, and will also be safe for vectors, deques and any other sequence container should you want to change your storage.

You also wouldn't get the original code to compile without warnings - you'd have to write

(void)items.erase(i++);

to avoid a warning about an unused return, which would be a big clue that you're doing something odd.

Pete Kirkham
  • 46,814
  • 5
  • 86
  • 159
  • erase() returns an iterator for sequence containers, but not for associative containers like set and map. – bk1e Feb 28 '09 at 17:36
  • Added 'sequence' to the post. Some implementations do have the return, but it's not standard. – Pete Kirkham Feb 28 '09 at 17:46
  • In the first example, it should be items.erase(i_before); – jmucchiello Mar 01 '09 at 09:47
  • Surely you'd need to set warning levels very high to be warned about unused return values? Have to confess I've never bothered with that -- it requires prepending "(void)" to a helluva lot of perfectly valid code. Or am I missing something? – j_random_hacker Mar 03 '09 at 10:31
  • Probably not that much valid code; I don't find in onerous and have found otherwise hidden bugs due to it. – Pete Kirkham Mar 03 '09 at 11:03
3

It's perfectly OK. The value passed would be the value of "i" before the increment.

Himadri Choudhury
  • 9,389
  • 6
  • 36
  • 43
3

++Kristo!

The C++ standard 1.9.16 makes a lot of sense with respect to how one implements operator++(postfix) for a class. When that operator++(int) method is called, it increments itself and returns a copy of the original value. Exactly as the C++ spec says.

It's nice to see standards improving!


However, I distinctly remember using older (pre-ANSI) C compilers wherein:

foo -> bar(i++) -> charlie(i++);

Did not do what you think! Instead it compiled equivalent to:

foo -> bar(i) -> charlie(i); ++i; ++i;

And this behavior was compiler-implementation dependent. (Making porting fun.)


It's easy enough to test and verify that modern compilers now behave correctly:

#define SHOW(S,X)  cout << S << ":  " # X " = " << (X) << endl

struct Foo
{
  Foo & bar(const char * theString, int theI)
    { SHOW(theString, theI);   return *this; }
};

int
main()
{
  Foo f;
  int i = 0;
  f . bar("A",i) . bar("B",i++) . bar("C",i) . bar("D",i);
  SHOW("END ",i);
}


Responding to comment in thread...

...And building on pretty much EVERYONE's answers... (Thanks guys!)


I think we need spell this out a bit better:

Given:

baz(g(),h());

Then we don't know whether g() will be invoked before or after h(). It is "unspecified".

But we do know that both g() and h() will be invoked before baz().

Given:

bar(i++,i++);

Again, we don't know which i++ will be evaluated first, and perhaps not even whether i will be incremented once or twice before bar() is called. The results are undefined! (Given i=0, this could be bar(0,0) or bar(1,0) or bar(0,1) or something really weird!)


Given:

foo(i++);

We now know that i will be incremented before foo() is invoked. As Kristo pointed out from the C++ standard section 1.9.16:

When calling a function (whether or not the function is inline), every value computation and side effect associated with any argument expression, or with the postfix expression designating the called function, is sequenced before execution of every expression or statement in the body of the called function. [ Note: Value computations and side effects associated with different argument expressions are unsequenced. -- end note ]

Though I think section 5.2.6 says it better:

The value of a postfix ++ expression is the value of its operand. [ Note: the value obtained is a copy of the original value -- end note ] The operand shall be a modifiable lvalue. The type of the operand shall be an arithmetic type or a pointer to a complete effective object type. The value of the operand object is modified by adding 1 to it, unless the object is of type bool, in which case it is set to true. [ Note: this use is deprecated, see Annex D. -- end note ] The value computation of the ++ expression is sequenced before the modification of the operand object. With respect to an indeterminately-sequenced function call, the operation of postfix ++ is a single evaluation. [ Note: Therefore, a function call shall not intervene between the lvalue-to-rvalue conversion and the side effect associated with any single postfix ++ operator. -- end note ] The result is an rvalue. The type of the result is the cv-unqualified version of the type of the operand. See also 5.7 and 5.17.

The standard, in section 1.9.16, also lists (as part of its examples):

i = 7, i++, i++;    // i becomes 9 (valid)
f(i = -1, i = -1);  // the behavior is undefined

And we can trivially demonstrate this with:

#define SHOW(X)  cout << # X " = " << (X) << endl
int i = 0;  /* Yes, it's global! */
void foo(int theI) { SHOW(theI);  SHOW(i); }
int main() { foo(i++); }

So, yes, i is incremented before foo() is invoked.


All this makes a lot of sense from the perspective of:

class Foo
{
public:
  Foo operator++(int) {...}  /* Postfix variant */
}

int main() {  Foo f;  delta( f++ ); }

Here Foo::operator++(int) must be invoked prior to delta(). And the increment operation must be completed during that invocation.


In my (perhaps overly complex) example:

f . bar("A",i) . bar("B",i++) . bar("C",i) . bar("D",i);

f.bar("A",i) must be executed to obtain the object used for object.bar("B",i++), and so on for "C" and "D".

So we know that i++ increments i prior to calling bar("B",i++) (even though bar("B",...) is invoked with the old value of i), and therefore i is incremented prior to bar("C",i) and bar("D",i).


Getting back to j_random_hacker's comment:

j_random_hacker writes:
+1, but I had to read the standard carefully to convince myself that this was OK. Am I right in thinking that, if bar() was instead a global function returning say int, f was an int, and those invocations were connected by say "^" instead of ".", then any of A, C and D could report "0"?

This question is a lot more complicated than you might think...

Rewriting your question as code...

int bar(const char * theString, int theI) { SHOW(...);  return i; }

bar("A",i)   ^   bar("B",i++)   ^   bar("C",i)   ^   bar("D",i);

Now we have only ONE expression. According to the standard (section 1.9, page 8, pdf page 20):

Note: operators can be regrouped according to the usual mathematical rules only where the operators really are associative or commutative.(7) For example, in the following fragment: a=a+32760+b+5; the expression statement behaves exactly the same as: a=(((a+32760)+b)+5); due to the associativity and precedence of these operators. Thus, the result of the sum (a+32760) is next added to b, and that result is then added to 5 which results in the value assigned to a. On a machine in which overflows produce an exception and in which the range of values representable by an int is [-32768,+32767], the implementation cannot rewrite this expression as a=((a+b)+32765); since if the values for a and b were, respectively, -32754 and -15, the sum a+b would produce an exception while the original expression would not; nor can the expression be rewritten either as a=((a+32765)+b); or a=(a+(b+32765)); since the values for a and b might have been, respectively, 4 and -8 or -17 and 12. However on a machine in which overflows do not produce an exception and in which the results of overflows are reversible, the above expression statement can be rewritten by the implementation in any of the above ways because the same result will occur. -- end note ]

So we might think that, due to precedence, that our expression would be the same as:

(
       (
              ( bar("A",i) ^ bar("B",i++)
              )
          ^  bar("C",i)
       )
    ^ bar("D",i)
);

But, because (a^b)^c==a^(b^c) without any possible overflow situations, it could be rewritten in any order...

But, because bar() is being invoked, and could hypothetically involve side effects, this expression cannot be rewritten in just any order. Rules of precedence still apply.

Which nicely determines the order of evaluation of the bar()'s.

Now, when does that i+=1 occur? Well it still has to occur before bar("B",...) is invoked. (Even though bar("B",....) is invoked with the old value.)

So it's deterministically occurring before bar(C) and bar(D), and after bar(A).

Answer: NO. We will always have "A=0, B=0, C=1, D=1", if the compiler is standards-compliant.


But consider another problem:

i = 0;
int & j = i;
R = i ^ i++ ^ j;

What is the value of R?

If the i+=1 occurred before j, we'd have 0^0^1=1. But if the i+=1 occurred after the whole expression, we'd have 0^0^0=0.

In fact, R is zero. The i+=1 does not occur until after the expression has been evaluated.


Which I reckon is why:

i = 7, i++, i++; // i becomes 9 (valid)

Is legal... It has three expressions:

  • i = 7
  • i++
  • i++

And in each case, the value of i is changed at the conclusion of each expression. (Before any subsequent expressions are evaluated.)


PS: Consider:

int foo(int theI) { SHOW(theI);  SHOW(i);  return theI; }
i = 0;
int & j = i;
R = i ^ i++ ^ foo(j);

In this case, i+=1 has to be evaluated before foo(j). theI is 1. And R is 0^0^1=1.

Community
  • 1
  • 1
Mr.Ree
  • 7,990
  • 25
  • 28
  • +1, but I had to read the standard carefully to convince myself that this was OK. Am I right in thinking that, if bar() was instead a global function returning say int, f was an int, and those invocations were connected by say "^" instead of ".", then any of A, C and D could report "0"? – j_random_hacker Mar 03 '09 at 10:25
  • Responded by amending original post. (It took more than 300 chars.) – Mr.Ree Mar 05 '09 at 18:14
  • The `R` examples have undefined behavior due to reading and incrementing `i` without sequencing. – Davis Herring Jul 12 '19 at 15:49
1

To build on MarkusQ's answer: ;)

Or rather, Bill's comment to it:

(Edit: Aw, the comment is gone again... Oh well)

They're allowed to be evaluated in parallel. Whether or not it happens in practice is technically speaking irrelevant.

You don't need thread parallelism for this to occur though, just evaluate the first step of both (take the value of i) before the second (increment i). Perfectly legal, and some compilers may consider it more efficient than fully evaluating one i++ before starting on the second.

In fact, I'd expect it to be a common optimization. Look at it from an instruction scheduling point of view. You have the following you need to evaluate:

  1. Take the value of i for the right argument
  2. Increment i in the right argument
  3. Take the value of i for the left argument
  4. Increment i in the left argument

But there's really no dependency between the left and the right argument. Argument evaluation happens in an unspecified order, and need not be done sequentially either (which is why new() in function arguments is usually a memory leak, even when wrapped in a smart pointer) It's also undefined what happens when you modify the same variable twice in the same expression. We do have a dependency between 1 and 2, however, and between 3 and 4. So why would the compiler wait for 2 to complete before computing 3? That introduces added latency, and it'll take even longer than necessary before 4 becomes available. Assuming there's a 1 cycle latency between each, it'll take 3 cycles from 1 is complete until the result of 4 is ready and we can call the function.

But if we reorder them and evaluate in the order 1, 3, 2, 4, we can do it in 2 cycles. 1 and 3 can be started in the same cycle (or even merged into one instruction, since it's the same expression), and in the following, 2 and 4 can be evaluated. All modern CPU's can execute 3-4 instructions per cycle, and a good compiler should try to exploit that.

jalf
  • 229,000
  • 47
  • 328
  • 537
  • 1
    What's the point of analyzing what a compiler is likely to do with something undefined? If you want speed, wouldn't it work better just to emit no code for undefined expressions? – David Thornley Mar 05 '09 at 18:16
  • What the compiler does with undefined expressions doesn't matter speedwise or otherwise, because nothing undefined should be in your code in the first place. If your code invokes undefined behavior, you've already lost. ;) – jalf Mar 05 '09 at 23:32
  • The point in my post was simply to illustrate a way in which the expression could yield a "surprising" value, and a reason why the compiler might actually choose to do this instead of the more predictable results. Of course relying on any of this would be stupid. :) – jalf Mar 05 '09 at 23:33
0

Sutter's Guru of the Week #55 (and the corresponding piece in "More Exceptional C++") discusses this exact case as an example.

According to him, it is perfectly valid code, and in fact a case where trying to transform the statement into two lines:

items.erase(i);
i++;

does not produce code that is semantically equivalent to the original statement.

Boojum
  • 6,326
  • 1
  • 28
  • 32
-1

To build on Bill the Lizard's answer:

int i = 1;
foo(i++, i++);

might also result in a function call of

foo(1, 1);

(meaning that the actuals are evaluated in parallel, and then the postops are applied).

-- MarkusQ

MarkusQ
  • 21,092
  • 3
  • 53
  • 67
  • I think it was supposed to be foo(1,1). Bill had an off by one error earlier, and I think MarkusQ just copied that in his post – jalf Mar 01 '09 at 12:32
  • 3
    It might also result in the computer turning into a stuffed panda, according to the standard. Undefined is undefined, and arguing about what one particular implementation is likely to do is pointless. – David Thornley Mar 05 '09 at 18:18
  • @DavidThornley Wish there would be implementation turning PC into stuffed pandas... – cerkiewny Jul 08 '14 at 09:05