What is the rationale for self-assignment-unsafe move assignment operators in the standard library?

Question

The standard library policy about move assignment is that the implementation is allowed to assume that self-assignment will never happen; this seems to me a really bad idea, given that:

the "regular" ("copy") assignment contract in C++ has always been regarded as safe against self-assignment; now we have yet another incoherent corner case of C++ to remember and to explain - and a subtly dangerous one, too; I think we all agree that what is needed in C++ is not more hidden traps;
it complicates algorithms - anything in the remove_if family need to take care of this corner case;
it would be really easy to fulfil this requirement - where you implement move with swap it comes for free, and even in other cases (where you can get some performance boost with ad-hoc logic) it's just a single, (almost) never taken branch, which is virtually free on any CPU¹; also, in most interesting cases (moves involving parameters or locals) the branch would be removed completely by the optimizer when inlining (which should happen almost always for "simple" move assignment operators).

So, why such a decision?

¹ Especially in library code, where implementers can liberally exploit compiler-specific hints about "branch expected outcome" (think __builtin_expect in gcc/__assume in VC++).

Found [this answer](http://stackoverflow.com/a/9322542/4342498). It is a good read but really does not answer the question. — NathanOliver, Oct 04 '16 at 12:43
@NathanOliver: yep, I already read that, and unfortunately it left me with more doubts than before =( — Matteo Italia, Oct 04 '16 at 12:55
[LWG 2468](http://cplusplus.github.io/LWG/lwg-active.html#2468)'s current direction is to say that self-move-assignment results in valid but unspecified state. — T.C., Oct 04 '16 at 22:39
@T.C.: the rationale being...? Are they really so afraid of this little branch to prefer a concrete risk of subtle errors? — Matteo Italia, Oct 04 '16 at 23:34
@MatteoItalia: "*where you implement move with swap*" Which is a thing you should never do, so I'm not really sure what your point is. Swap, the *far less common* operation, is what should be implemented in terms of move. Not the other way around. — Nicol Bolas, Oct 05 '16 at 04:08
@NicolBolas: the "big five" are easily implemented in terms of copies and swaps, generalizing the well-known pattern traditionally used for the "big three" (with nice exception safety behavior for free). If you *measure* that it's a performance problem, then we can check alternatives. It seems that everybody here is *extremely* keen of premature optimization. — Matteo Italia, Oct 05 '16 at 05:13
A performance test of the copy swap idiom when you have a `vector` as a data member: http://www.youtube.com/watch?v=vLinb2fgkHk&t=35m30s `string` data members will behave similarly. Data members which are classes containing strings and vectors will behave similarly. — Howard Hinnant, Oct 12 '16 at 17:16

score 5 · Accepted Answer · answered Oct 04 '16 at 12:29

5

Moved from objects in std are supposed to be discarded or assigned to prior to being reused. Anything that is not completely free beyond that is not promised.

Sometimes things are free. Like a move-constructed-from container is empty. Note that some move-assiged-from cases have no such guarantee, as some implementations may choose to move elements instead of buffers. Why the difference? One was a free extra guarantee, the other not.

A branch or other check is not completely free. It takes up a branch prediction slot, and even if predicted is merely almost free.

On top of that, a = std::move(a); is evidence of a logic error. Assign-from a (within std) means you will only assign-to or discard a. Yet here you are wanting it to have specific state on the next line. Either you know you are self-assigning, or you do not. If you do not, you are now moving from object you are also populating and you do not know it.

The principle of "do little things to keep things safe" clashes with "you do not pay for that which you do not use". In this case, the second won.

answered Oct 04 '16 at 12:29

Yakk - Adam Nevraumont

235,777
25
285
465

side note regarding: `a = std::move(a)` occurences in swap(): http://stackoverflow.com/a/13129446/717732 – quetzalcoatl Oct 04 '16 at 12:43
3

The performance excuse is completely preposterous; I'd like to see at least *one* benchmark where this has any relevance (and I can safely bet my most prized possessions that nobody in the committee ever saw such a benchmark). Most importantly, when your performance problems start being one extra branch predictor slot you stop using the STL or whatever bit of generic code you don't control completely, write your stuff from scratch tailored to your problem doing *exactly* just what you want, and check the assembly output of your critical loops at every configuration change. – Matteo Italia Oct 04 '16 at 12:48
Also, I thought your second paragraph, but that's not an answer to the question - it's more like an explanation of how this logic is implemented in standardese. My question is why someone thought that this logic is correct. The spirit of move is not "wreck havoc in the moved-from object and - if it's not too much of a hassle - maybe move its content on the lhs object", it's "move the content to the lhs object, and leave the rhs however it's more convenient". Where lhs and rhs coincide I don't think there's a doubt about what I'm asking for (hint: it's not "wreak havoc in my object"). – Matteo Italia Oct 04 '16 at 23:42
1

@MatteoItalia Yes, that is a position to take. Except then you'd have to add a "unsafe-move-from" function, because the branch/test is a non-zero cost that someone who knows they are not moving from themselves has to pay. And you shouldn't pay for what you do not use. – Yakk - Adam Nevraumont Oct 04 '16 at 23:45
Show me some benchmark where the difference is even measurable, and show me that someone took some kind of measurement of this before deciding on this defective behavior. Dangerous behavior should have some serious motivation, not idle concerns made out of thin air. Incidentally, after reading the link by T.C. I'm even more convinced that this hasn't even been actually seriously though out, given that the standard itself is not coherent with itself in the treatment of move-self-assignment. – Matteo Italia Oct 04 '16 at 23:52
Also, safety and least astonishment regularly trump "you don't pay for what you do not use". `std::vector::push_back` has to jump through hoops to make sure to work correctly when it receives a reference that is inside the vector itself, and that's a good thing. Yet, there's no `push_back_trust_me_its_not_in_the_vector`, exactly as there's no `push_back_trust_me_theres_enough_capacity`. YDPFWYDU is an important principle for the *core language*, as it allows the standard library to be "you pay for some agreed sane semantics, if you don't like it you can reimplement your fat-free version". – Matteo Italia Oct 05 '16 at 00:04
1

@MatteoItalia Can you show some realistic code that would be broken under the LWG 2468 proposal? – M.M Oct 05 '16 at 03:25
@M.M. I have two instances of this thing coming straight from production that broke yesterday. Think `std::remove_if`-like loops (the classic read pointer/write pointer loop), with some additional logic of course, because otherwise straight `remove_if` would have been use). But again, that's backwards. Things should be safe and unsurprising by default, and made unsafe if there are real performance concerns. It's two days we are talking and nobody gave a single benchmark that proves that this behavior actually gives measurable any performance advantage. – Matteo Italia Oct 05 '16 at 05:06
1

@MatteoItalia it's well known that moving leaves the source in an unspecified state, so it's not surprising (to me anyway) that self-move leaves the source in an unspecified state too. OTOH it would be surprising to me if adding an extra CPU instruction for every single move operation didn't slow the program down. (Not saying it does or doesn't, just that it would be surprising if it didn't, because common sense suggests that, one instruction is slower than zero instructions). – M.M Oct 05 '16 at 12:05
1

@M.M: re-read the second half of my first comment; it's also well known that moving leaves the destination in a well defined state (actually, that's its entire point). If source and destination are the same, the only way to respect this contract is to make it a no-op, since the destination has to have the required value, and the source being untouched is just a special case of "undefined state". – Matteo Italia Oct 06 '16 at 05:22
On the performance part, of course doing something is slower or the same than doing nothing, but here we are talking about a minuscule (if any) slowdown (a compare+well predicted branch is generally 2 cycles on architectures with a branch predictor, and in older ARMs I did some tests and it can usually be implemented with predicated instructions, which are 1 cycle each as well) in exchange for a death trap less *in a general-purpose library* (again-not core language, you can implement your faster-than-light move in your custom types). – Matteo Italia Oct 06 '16 at 05:27
Accepting because, although IMO the reasoning behind this is flawed and the performance concerns unjustified, it's most probably the actual line of thought of the committee, which is what I originally asked. – Matteo Italia Oct 12 '16 at 16:27

score 5 · Answer 2 · edited May 23 '17 at 12:06

Yakk gives a very good answer (as usual and upvoted) but this time I wanted to add just a little more information.

The policy on self-move-assignment has shifted a tiny bit over the past half-decade. We have just recently clarified this corner case in LWG 2468. Actually, I should be more precise: An informal group between meetings agreed to the resolution of this issue, and it is likely to be voted into the C++1z working draft next month (Nov 2016).

The gist of the issue is to modify the MoveAssignable requirements to clarify that if the target and source of a move assignment are the same object, then there are no requirements on the value of the object after the assignment (except that it must be a valid state). It further clarifies that if this object is being used with the std::lib, it must still meet the requirements of the algorithm (e.g. LessThanComparable) whether or not it was move-assigned or even self-move-assigned.

So...

T x, y;
x = std::move(y);  // The value of y is unspecified and x == the old y
x = std::move(x);  // The value of x is unspecified

But both x and y are still in valid states. No memory has been leaked. No undefined behavior has occurred.

Rationale For This Position

It is still performance. However it is recognized that swap(x, x) has been legal since C++98 and does occur in the wild. Furthermore since C++11 swap(x, x) performs a self move assignment on x:

T temp = std::move(x);
x = std::move(x);
x = std::move(temp);

Prior to C++11, swap(x, x) was (a rather expensive) no-op (using copy instead of move). LWG 2468 clarifies that with C++11 and after, swap(x, x) is still a (not quite as expensive) no-op (using move instead of copy).

Details:

T temp = std::move(x);
// temp now has the value of the original x, and x's value is unspecified
x = std::move(x);
// x's value is still unspecified
x = std::move(temp);
// x's value now has the value of temp, which is also the original x value

To accomplish this no-op, self-move-assignment on x can do anything it wants as long as it leaves x in a valid state without asserting or throwing an exception.

If you want to specify that for your type T self-move-assignment is a no-op, that is perfectly fine. The std::lib does exactly that for unique_ptr.

If you want to specify that for your type U self-move-assignment leaves it in a valid but unspecified state, that is also fine. The std::lib does exactly that for vector. Some implementations (I believe VS) go to the trouble to make self-move-assignment on vector a no-op. Other's don't (such as libc++).

So you can confirm me that, for the resolution, you went on with the "undefined-but-valid-state" for *performance reasons*? Can you show me the benchmarks that adviced you that the trivial solution to this problem (requiring self-move in standard containers to be a no-op) resulted in measurable and unacceptable slowdown? — Matteo Italia, Oct 05 '16 at 05:18
Changing `MoveAssignable` doesn't do much to self-move-assignment of types in the standard library though. Very few types in the standard library are explicitly required to be `MoveAssignable`. — T.C., Oct 05 '16 at 09:09
@HowardHinnant: no need to hurry, I just though that when the working group added death traps for performance reasons they had some extensive test suite backing their decision, so providing this data would have been a breeze. If it's not like this, it's a bit troubling (especially given that performance is all but intuitive these days). — Matteo Italia, Oct 06 '16 at 05:47
But most importantly: please, *please*, ***please***, either you decide to fix this thing *for real* (self assignment=no-op), or leave it illegal/undefined behavior. Making it "legally undefined" means that we keep the death trap, and we don't even get decent tooling (think libstdc++ debug STL) to diagnose it (since self-assignment becomes legal - although with death-trap semantics - adding an assertion in debug mode would break on now-legal code). — Matteo Italia, Oct 06 '16 at 05:52

What is the rationale for self-assignment-unsafe move assignment operators in the standard library?

2 Answers2

Linked