How to actually implement the rule of five?

Question

UPDATE at the bottom

q1: How would you implement the rule of five for a class that manages rather heavy resources, but of which you want it to be passed around by value because that greatly simplifies and beautifies it's usage? Or are not all five items of the rule even needed?

In practice, I'm starting something with 3D imaging where an image is usually 128*128*128 doubles. Being able though to write things like this would make the math alot easier:

Data a = MakeData();
Data c = 5 * a + ( 1 + MakeMoreData() ) / 3;

q2: Using a combination of copy elision / RVO / move semantics the compiler should be able to this this with a minimum of copying, no?

I tried to figure out how to do this so I started with the basics; suppose an object implementing the traditional way of implementing copy and assignment:

class AnObject
{
public:
  AnObject( size_t n = 0 ) :
    n( n ),
    a( new int[ n ] )
  {}
  AnObject( const AnObject& rh ) :
    n( rh.n ),
    a( new int[ rh.n ] )
  {
    std::copy( rh.a, rh.a + n, a );
  }
  AnObject& operator = ( AnObject rh )
  {
    swap( *this, rh );
    return *this;
  }
  friend void swap( AnObject& first, AnObject& second )
  {
    std::swap( first.n, second.n );
    std::swap( first.a, second.a );
  }
  ~AnObject()
  {
    delete [] a;
  }
private:
  size_t n;
  int* a;
};

Now enter rvalues and move semantics. As far as I can tell this would be a working implementation:

AnObject( AnObject&& rh ) :
  n( rh.n ),
  a( rh.a )
{
  rh.n = 0;
  rh.a = nullptr;
}

AnObject& operator = ( AnObject&& rh )
{
  n = rh.n;
  a = rh.a;
  rh.n = 0;
  rh.a = nullptr;
  return *this;
}

However the compiler (VC++ 2010 SP1) is not too happy with this, and compilers are usually correct:

AnObject make()
{
  return AnObject();
}

int main()
{
  AnObject a;
  a = make(); //error C2593: 'operator =' is ambiguous
}

q3: How to solve this? Going back to AnObject& operator = ( const AnObject& rh ) certainly fixes it but don't we lose a rather important optimization opportunity?

Apart from that, it's clear that the code for the move constructor and assignment is full of duplication. So for now we forget about the ambiguity and try to solve this using copy and swap but now for rvalues. As explained here we wouldn't even need a custom swap but instead have std::swap do all the work, which sounds very promising. So I wrote the following, hoping std::swap would copy construct a temporary using the move constructor, then swap it with *this:

AnObject& operator = ( AnObject&& rh )
{
  std::swap( *this, rh );
  return *this;
}

But that doesn't work out and instead leads to a stack overflow due to infinite recursion since std::swap calls our operator = ( AnObject&& rh ) again. q4: Can someone provide an example of what is meant in the example then?

We can solve this by providing a second swap function:

AnObject( AnObject&& rh )
{
  swap( *this, std::move( rh ) );
}

AnObject& operator = ( AnObject&& rh )
{
  swap( *this, std::move( rh ) );
  return *this;
}

friend void swap( AnObject& first, AnObject&& second )
{
  first.n = second.n;
  first.a = second.a;
  second.n = 0;
  second.a = nullptr;
}

Now there's almost twice the amount code, however the move part of it pays of by alowing pretty cheap moving; but on the other hand the normal assignment can't benefit from copy elision anymore. At this point I'm really confused though, and not seeing anymore what's right and wrong, so I'm hoping to get some input here..

UPDATE So it seems there are two camps:

one saying to skip the move assignment operator and continue doing what C++03 taught us, ie write a single assignment operator that passes the argument by value.
the other one saying to implement the move assignment operator (after all, it's C++11 now) and have the copy assignment operator take its argument by reference.

(ok and there's the 3rd camp telling me to use a vector, but that's sort of out of scope for this hypothetical class. Ok in real life I would use a vector, and there would be also other members, but since the move constructor/assignment are not automatically generated (yet?) the question would still hold)

Unfortunately I cannot test both implementations in a real world scenario since this project has just started and the way the data will actually flow is not known yet. So I simply implemented both of them, added counters for allocation etc and ran a couple of iterations of approx. this code, where T is one of the implementations:

template< class T >
T make() { return T( narraySize ); }

template< class T >
void assign( T& r ) { r = make< T >(); }

template< class T >
void Test()
{
  T a;
  T b;
  for( size_t i = 0 ; i < numIter ; ++i )
  {
    assign( a );
    assign( b );
    T d( a );
    T e( b );
    T f( make< T >() );
    T g( make< T >() + make< T >() );
  }
}

Either this code is not good enough to test what I'm after, or the compiler is just too smart: doesn't matter what I use for arraySize and numIter, the results for both camps are pretty much identical: same number of allocations, very slight variations in timing but no reproducable significant difference.

So unless someone can point to a better way to test this (given that the actual usage scnearios are not known yet), I'll have to conclude that it doesn't matter and hence is left to the taste of the developper. In which case I'd pick #2.

I believe that with C++0x, for many objects, there is no point in defining copy semantics at all. The best way to avoid expensive copies is to forbid them totally. Define a nothrow `swap` function, a move constructor, and implement move assigment with "move and swap". — Alexandre C., May 12 '11 at 10:15
Any elision optimisations that apply to copies also apply to moves so there's no reason to not write a move operator. Consider copy-on-write implicit sharing to get the best of both worlds w.r.t. value-vs-reference semantics. — spraff, Oct 31 '11 at 13:32

Howard Hinnant · Accepted Answer · 2011-05-12T12:12:56.157

17

You've missed a significant optimization in your copy assignment operator. And subsequently the situation has gotten confused.

  AnObject& operator = ( const AnObject& rh )
  {
    if (this != &rh)
    {
      if (n != rh.n)
      {
         delete [] a;
         n = 0;
         a = new int [ rh.n ];
         n = rh.n;
      }
      std::copy(rh.a, rh.a+n, a);
    }
    return *this;
  }

Unless you really never think you'll be assigning AnObjects of the same size, this is much better. Never throw away resources if you can recycle them.

Some might complain that the AnObject's copy assignment operator now has only basic exception safety instead of strong exception safety. However consider this:

Your clients can always take a fast assignment operator and give it strong exception safety. But they can't take a slow assignment operator and make it faster.
template <class T>
T&
strong_assign(T& x, T y)
{
    swap(x, y);
    return x;
}

Your move constructor is fine, but your move assignment operator has a memory leak. It should be:

  AnObject& operator = ( AnObject&& rh )
  {
    delete [] a;
    n = rh.n;
    a = rh.a;
    rh.n = 0;
    rh.a = nullptr;
    return *this;
  }

...

Data a = MakeData();
Data c = 5 * a + ( 1 + MakeMoreData() ) / 3;
q2: Using a combination of copy elision / RVO / move semantics the compiler should be able to this this with a minimum of copying, no?

You may need to overload your operators to take advantage of resources in rvalues:

Data operator+(Data&& x, const Data& y)
{
   // recycle resources in x!
   x += y;
   return std::move(x);
}

Ultimately resources ought to be created exactly once for each Data you care about. There should be no needless new/delete just for the purpose of moving things around.

edited May 12 '11 at 12:12

answered May 12 '11 at 12:00

Howard Hinnant

179,402
46
391
527

thanks, you made some good points. I guess the only way to know if this implementation is more performant than Anthony's is to measure it. Also for the operator overloads, I already figured there would be a lot of them since they should be defined both for Data objects and scalars. Possible solution here: http://stackoverflow.com/questions/2696156/how-to-reduce-redundant-code-when-adding-new-c0x-rvalue-reference-operator-over – stijn May 12 '11 at 14:25
I'm going to downvote this, because it would be trivial to make the copy assignment operator strongly exception safe and I definitely think that you should. – Puppy May 12 '11 at 18:57
3

@DeadMG: Thanks for identifying your downvote. I so strongly disagree with your position that I will gladly accept a million downvotes without changing my position. The copy/swap idiom is little more than a poor man's move assignment operator for C++03. Once you have a real move assignment operator in your toolbox, it is not logical to blindly continue along that path. Everyone interested should count allocations and measure performance. Test both lvalue and rvalue rhs's. Test your common use cases. Otoh, if you do not care about performance then move members are a waste of time. – Howard Hinnant May 13 '11 at 02:23
PS: If you accept Matthieu M. 's answer, then you implicitly get my answer. I'm telling you how `std::vector` works. – Howard Hinnant May 13 '11 at 02:37
@DeadMG How can you make the copy assignment strongly exception safe *and* have the optimization path in? I don't see a way around that. I like the tradeoff in this answer, by having an optimized operator=, and still a way to have exception safe assignments by use of a wrapper. – Johannes Schaub - litb May 14 '11 at 23:48
In your last point, wouldn't it be better to take the lhs by value? That way, you won't need two operators and with a move constructor still should get the same behaviour you want. Or am I wrong in some point? – Xeo Nov 18 '11 at 23:02
"Better" is qualitative here. Your suggestion certainly could reduce the number of overloads to one, and still be efficient in 90% of the use cases. If you really want to cover all of the use cases with as much efficiency as possible, you may need to use more overloads. E.g. what if the lhs is an lvalue and the rhs is an rvalue, and `Data` is a type such that you can reuse rhs's resources (e.g. if x+y == y+x). Which design is better is probably dependent upon a number of factors and I wouldn't want to claim one answer fits all. – Howard Hinnant Nov 18 '11 at 23:46
In your assignment operator, shouldn't you only reallocate if `n < rh.n` ? You'd need to differentiate between size and capacity then, but maybe allocation is expensive enough to allow for an extra member integer? – wilhelmtell Dec 28 '11 at 20:11
@wilhelmtell: In answering questions I try not to drag unrelated issues in. In the OP's code `n` appeared to be both the size and capacity of the allocated block and thus an invariant is that `size == capacity`. Certainly if I were designing a `std::vector`-like object, I would consider separating `size` from `capacity` and only reallocate when `capacity` is exceeded. And indeed, `std::vector` is a prime example where using the "copy/swap" idiom of assignment would be a very poor choice (imho). No implementation I'm aware of does so. – Howard Hinnant Dec 28 '11 at 20:25

Anthony Williams · Answer 2 · 2011-05-12T12:28:04.400

13

If your object is resource-heavy, you might want to avoid copying altogether, and just provide the move constructor and move assignment operator. However, if you really want copying too, it is easy to provide all of the operations.

Your copy operations look sensible, but your move operations don't. Firstly, though an rvalue reference parameter will bind to an rvalue, within the function it is an lvalue, so your move constructor ought to be:

AnObject( AnObject&& rh ) :
  n( std::move(rh.n) ),
  a( std::move(rh.a) )
{
  rh.n = 0;
  rh.a = nullptr;
}

Of course, for fundamental types like you've got here it doesn't actually make a difference, but it's as well to get in the habit.

If you provide a move-constructor, then you don't need a move-assignment operator when you define copy-assignment like you have --- because you accept the parameter by value, an rvalue will be moved into the parameter rather than copied.

As you found, you can't use std::swap() on the whole object inside a move-assignment operator, since that will recurse back into the move-assignment operator. The point of the comment in the post you linked to is that you don't need to implement a custom swap if you provide move operations, as std::swap will use your move operations. Unfortunately, if you don't define a separate move assignment operator this doesn't work, and will still recurse. You can of course use std::swap to swap the members:

AnObject& operator=(AnObject other)
{
    std::swap(n,other.n);
    std::swap(a,other.a);
    return *this;
}

Your final class is thus:

class AnObject
{
public:
  AnObject( size_t n = 0 ) :
    n( n ),
    a( new int[ n ] )
  {}
  AnObject( const AnObject& rh ) :
    n( rh.n ),
    a( new int[ rh.n ] )
  {
    std::copy( rh.a, rh.a + n, a );
  }
  AnObject( AnObject&& rh ) :
    n( std::move(rh.n) ),
    a( std::move(rh.a) )
  {
    rh.n = 0;
    rh.a = nullptr;
  }
  AnObject& operator = ( AnObject rh )
  {
    std::swap(n,rh.n);
    std::swap(a,rh.a);
    return *this;
  }
  ~AnObject()
  {
    delete [] a;
  }
private:
  size_t n;
  int* a;
};

edited May 12 '11 at 12:28

answered May 12 '11 at 10:46

Anthony Williams

62,015
12
122
149

while I love the simplicity, this still recurses in the assignment operator. – stijn May 12 '11 at 11:04
The best way to implement the move ctor is actually to default initialize the `this` object and then just swap each member in the body of the move ctor. :) – Xeo May 13 '11 at 11:26
1

@Xeo Default-initializing `*this` will leave pointer and integer members **uninitialized**, and thus holding random values. Swapping with the other object will then likely lead to undefined behaviour. Initializing them to proper null values and then swapping leads to excessive writes. It is best to initialize by moving, and then clear out the "other", as in my code. – Anthony Williams May 13 '11 at 14:08
1

@Anthony: I was talking about delegating the move ctor to the default ctor `AnObject(AnObject&& other) : AnObject() { other.swap(*this); }` which is C++0x/11. – Xeo May 13 '11 at 14:11
@Xeo: Yes, that's legal C++0x, but it is less efficient in this case as I described in my previous comment (especially in this case, since the default constructor always calls `new[]`). In general, it might be a reasonable strategy. – Anthony Williams May 13 '11 at 14:42
@Anthony: Well, tbh, the default ctor of course shouldn't do such a thing. The OP better split that in two ctor, as allocating `0` byte is possible and well defined, but actually totally useless and a waste of time for the program. :) – Xeo May 13 '11 at 14:47
If `AnObject` aggregated something with a throwing copy ctor (instead of an `int`) `AnObject`'s copy ctor would not be exception safe. – deft_code Nov 19 '11 at 00:37
@deft_code Yes, that's true. If your circumstances are different then you can't just blindly copy code, you have to pay attention to things like that. – Anthony Williams Nov 21 '11 at 08:20

score 4 · Answer 3 · answered May 12 '11 at 12:28

Let me help you:

#include <vector>

class AnObject
{
public:
  AnObject( size_t n = 0 ) : data(n) {}

private:
  std::vector<int> data;
};

From the C++0x FDIS, [class.copy] note 9:

If the definition of a class X does not explicitly declare a move constructor, one will be implicitly declared as defaulted if and only if

X does not have a user-declared copy constructor,

X does not have a user-declared copy assignment operator,

X does not have a user-declared move assignment operator,

X does not have a user-declared destructor, and

the move constructor would not be implicitly defined as deleted.

[ Note: When the move constructor is not implicitly declared or explicitly supplied, expressions that otherwise would have invoked the move constructor may instead invoke a copy constructor. —end note ]

Personally, I am much more confident in std::vector correctly managing its resources and optimizing the copies / moves that in any code I could write.

the whole AnObject class was just an example, in the actual code I would not use it as such but indeed resort to std::vector as well.. — stijn, May 12 '11 at 14:10
@stijn: I understood, however I have found that about every single time I would design my own "resources handling" class, I could find what I needed either in the STL or Boost and use it instead. The only one I could never find was one to automatically handle the Pimpl idiom, so I had to roll my own. — Matthieu M., May 13 '11 at 06:18

score 1 · Answer 4 · answered Feb 10 '17 at 13:29

With help of delegating constructor, you only need to implement each concept once;

default init
resource delete
swap
copy

the rest just uses those.

Also don't forget to make move-assignment (and swap) noexcept, if helps performance a lot if you, for example, put your class in a vector

#include <utility>

// header

class T
{
public:
    T();
    T(const T&);
    T(T&&);
    T& operator=(const T&);
    T& operator=(T&&) noexcept;
    ~T();

    void swap(T&) noexcept;

private:
    void copy_from(const T&);
};

// implementation

T::T()
{
    // here (and only here) you implement default init
}

T::~T()
{
    // here (and only here) you implement resource delete
}

void T::swap(T&) noexcept
{
    using std::swap; // enable ADL
    // here (and only here) you implement swap, typically memberwise swap
}

void T::copy_from(const T& t)
{
    if( this == &t ) return; // don't forget to protect against self assign
    // here (and only here) you implement copy
}

// the rest is generic:

T::T(const T& t)
    : T()
{
    copy_from(t);
}

T::T(T&& t)
    : T()
{
    swap(t);
}

auto T::operator=(const T& t) -> T&
{
    copy_from(t);
    return *this;
}

auto T::operator=(T&& t) noexcept -> T&
{
    swap(t);
    return *this;
}

score 1 · Answer 5 · answered May 12 '11 at 18:50

1

Since I haven't seen anyone else explicitly point this out...

Your copy assignment operator taking its argument by value is an important optimization opportunity if (and only if) it's passed an rvalue, due to copy elision. But in a class with an assignment operator that explicitly only takes rvalues (i.e., one with a move assignment operator), this is a nonsensical scenario. So, modulo the memory leaks that have already been pointed out in other answers, I'd say your class is already ideal if you simply change the copy assignment operator to take its argument by const reference.

answered May 12 '11 at 18:50

ildjarn

59,718
8
115
201

that's a good point, but I already kinda mentioned it in the quetsion: if I declare the move assignment operator I'm automatically forced to use the const reference in the copy assignment operator by the compiler. – stijn May 13 '11 at 10:52
@stijn : My point is not that you're forced to do anything; my point is that the 'optimization opportunity' becomes completely irrelevant as copy elision when calling the copy assignment constructor can never occur on a movable type. I.e., because the 'optimization opportunity' no longer exists, there's no need to fight the compiler on this one. – ildjarn May 13 '11 at 16:54

score 1 · Answer 6 · edited May 23 '17 at 11:45

q3 of the Original Poster

I think you (and some of the other responders) misunderstood what the compiler error meant, and came to the wrong conclusions because of it. The compiler thinks that the (move) assignment call is ambiguous, and it's right! You have multiple methods that are equally qualified.

In your original version of the AnObject class, your copy constructor takes in the old object by const (lvalue) reference, while the assignment operator takes its argument by (unqualified) value. The value argument is initialized by the appropriate transfer constructor from whatever was on the right-side of the operator. Since you have only one transfer constructor, that copy constructor is always used, no matter if the original right-side expression was a lvalue or a rvalue. This makes the assignment operator act as the copy-assignment special member function.

The situation changes once a move constructor is added. Whenever the assignment operator is called, there are two choices for the transfer constructor. The copy constructor will still be used for lvalue expressions, but the move constructor will be used whenever a rvalue expression is given instead! This makes the assignment operator simultaneously act as the move-assignment special member function.

When you added a traditional move-assignment operator, you gave the class two versions of the same special member function, which is an error. You already have what you wanted, so just get rid of the traditional move-assignment operator, and no other changes should be needed.

In the two camps listed in your update, I guess I'm technically in the first camp, but for entirely different reasons. (Don't skip the (traditional) move-assignment operator because it's "broken" for your class, but because it's superfluous.)

BTW, I'm new to reading about C++11 and StackOverflow. I came up with this answer from browsing another S.O. question before seeing this one. (Update: Actually, I still had the page open. The link goes to the specific response by FredOverflow that shows the technique.)

About the 2011-May-12 Response by Howard Hinnant

(I'm too much of a newbie to directly comment on responses.)

You don't need to explicitly check for self-assignment if a later test would already cull it. In this case, n != rh.n would already take care of most of it. However, the std::copy call is outside of that (currently) inner if, so we would get n component-level self-assignments. It's up to you to decide if those assignments would be too anti-optimal even though self-assignment should be rare.

How to actually implement the rule of five?

6 Answers6

Linked

Related