
I created the following class to understand the behavior of std::sort:

#include <algorithm>
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

class X {
public:
  X(int i) : i_(i) { }
  X(X&& rhs) noexcept : i_(std::move(rhs.i_)) { mc_++; }
  X& operator=(X&& rhs) noexcept {
    i_ = std::move(rhs.i_); ao_++; return *this;
  }
  void swap(X& rhs) noexcept { std::swap(i_, rhs.i_); sw_++; } 
  friend bool operator<(const X& lhs, const X& rhs) {
    return lhs.i_ < rhs.i_;
  }
  static void reset() { mc_ = ao_ = sw_ = 0; }
private:
  int i_;
  static size_t mc_, ao_, sw_; // function-call counters
};
size_t X::mc_ = 0, X::ao_ = 0, X::sw_ = 0; // definitions of the static counters
// void swap(X& lhs, X& rhs) { lhs.swap(rhs); }

I then ran the following benchmark code:

int main() {
  std::vector<X> v;
  for (int i = 0; i < 1000000; i++) v.emplace_back(i);

  std::mt19937 g(0xa41bc9); // fixed seed to compare measurements
  std::shuffle(v.begin(), v.end(), g);

  X::reset();
  std::sort(std::begin(v), std::end(v));
}

The complete code is available in an online IDE: https://wandbox.org/permlink/nbwRKptakgCSHK4f.

The measured numbers of calls of the individual functions are as follows (all compiled with -O2 or /O2):

   function:   move ctor    operator=        swap
  GCC 7.1.0:   5,007,335   11,700,048           0
clang 4.0.0:   4,932,061    9,973,899           0
 MSVC 19.11:   8,580,356   21,521,211           0

If I uncomment the swap function, the situation gets better:

   function:   move ctor    operator=        swap
  GCC 7.1.0:     999,999    3,685,376   4,007,336
clang 4.0.0:      72,554      254,885   4,859,507
 MSVC 19.11:     906,593    6,173,685   7,673,763
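The effect of uncommenting the free swap can be illustrated with the usual "using std::swap; swap(a, b);" idiom: an unqualified call lets argument-dependent lookup (ADL) prefer a non-member swap declared next to the type over the generic std::swap template. Element exchanges in standard algorithms go through std::iter_swap, which is specified as an unqualified swap(*a, *b); whether and how often std::sort exchanges elements at all is up to the implementation. This is only an illustrative sketch: demo, Y, custom_swaps and exchange_elements are hypothetical names, and the inline static member requires C++17.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

namespace demo {  // hypothetical namespace, for illustration only

struct Y {
  int value;
  static inline std::size_t custom_swaps = 0;  // counts calls of the custom swap (C++17)
};

// Non-member swap that ADL finds for demo::Y, analogous to the
// commented-out free swap for X in the question.
void swap(Y& lhs, Y& rhs) noexcept {
  std::swap(lhs.value, rhs.value);
  ++Y::custom_swaps;
}

}  // namespace demo

// Hypothetical helper: bring std::swap into scope, then call swap
// unqualified so that ADL can pick a type-specific overload if one exists.
template <typename T>
void exchange_elements(T& a, T& b) {
  using std::swap;
  swap(a, b);
}
```

For demo::Y the non-template demo::swap is an exact match and wins overload resolution; for a type such as int, ADL finds nothing and the call falls back to std::swap.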

However, there are still many calls of the move constructor (plus the destructor) and the move assignment operator. What bothers me is efficiency. For instance, calls of swap and operator= can be inlined by the compiler, but I guess the compiler may not be able to "inline" (optimize out) the construction and destruction of objects.

Why are construction and assignment of objects used in sorting at all? In-place sorting (which std::sort usually is) can be implemented purely with compare and swap operations.
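As a point of comparison for that claim, here is a minimal sketch of an in-place sort that accesses elements only through operator< and std::iter_swap (the name swap_only_insertion_sort is hypothetical). It avoids temporaries only if the element type provides its own swap; otherwise std::iter_swap falls back to std::swap, which itself move-constructs a temporary.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>

// Insertion sort using only operator< and std::iter_swap: each element is
// bubbled leftwards by adjacent swaps until its predecessor is not greater.
template <typename RandomIt>
void swap_only_insertion_sort(RandomIt first, RandomIt last) {
  if (first == last) return;
  for (RandomIt i = std::next(first); i != last; ++i) {
    for (RandomIt j = i; j != first && *j < *std::prev(j); --j) {
      std::iter_swap(j, std::prev(j));  // unqualified swap(*j, *prev) via ADL
    }
  }
  assert(std::is_sorted(first, last));  // debug-only postcondition
}
```

Moving an element k positions to the left costs k swaps here.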

UPDATE

My assumption was wrong. It seems perfectly legal to optimize out object creation/destruction, such as in:

X temp = std::move(x1);  
x1 = std::move(x2);
x2 = std::move(temp);

Therefore, such code can be as efficient as a custom swap. Online example: https://godbolt.org/g/ud4u9U. There are no calls to the move constructor or the move assignment operator, even though they are not trivial; their bodies are inlined into main.
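The three-line sequence above is, in typical implementations, what the generic std::swap template does: one move construction and two move assignments. The sketch below makes those operations countable; three_move_swap and Counted are hypothetical names (Counted is modeled on the question's X), and the inline static members require C++17. Nothing in the code forces temp into memory; after inlining, an optimizer may keep it in a register, which is consistent with the godbolt listing.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Exchange a and b with one move construction and two move assignments,
// the same sequence as the `temp` example in the update.
template <typename T>
void three_move_swap(T& a, T& b) noexcept(std::is_nothrow_move_constructible<T>::value &&
                                          std::is_nothrow_move_assignable<T>::value) {
  T temp = std::move(a);
  a = std::move(b);
  b = std::move(temp);
}

// Hypothetical counting type, in the spirit of the question's class X.
struct Counted {
  int v;
  static inline int moves_constructed = 0;
  static inline int moves_assigned = 0;
  explicit Counted(int i) : v(i) {}
  Counted(Counted&& o) noexcept : v(o.v) { ++moves_constructed; }
  Counted& operator=(Counted&& o) noexcept {
    v = o.v;
    ++moves_assigned;
    return *this;
  }
};
```

One call to three_move_swap on two Counted objects performs exactly one move construction and two move assignments.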

Daniel Langr
  • Don't think this is a duplicate. There is nothing about _"Why isn't `std::sort` implemented purely by `swap`?"_ in the referred post. – Daniel Langr Aug 10 '17 at 21:54
  • Were you expecting it to call your `swap` method in the original version? `std::sort` isn't defined to look for `swap` methods. – user2357112 supports Monica Aug 10 '17 at 21:55
  • @user2357112 No, I was not. I added this measurement just for comparison. – Daniel Langr Aug 10 '17 at 21:56
  • Potential dupe https://stackoverflow.com/q/14212701/3002139 – Baum mit Augen Aug 10 '17 at 21:56
  • "calls of swap and operator= can be inlined by the compiler, but I guess the compiler may not "inline" (optimize out) the construction/destruction of objects" - why would you expect that? – user2357112 supports Monica Aug 10 '17 at 21:58
  • @BaummitAugen Thanks, this seems to be a reason. – Daniel Langr Aug 10 '17 at 21:59
  • "For instance, calls of `swap` and `operator=` can be inlined by the compiler, but I guess the compiler may not "inline" (optimize out) the construction/destruction of objects." -- Why wouldn't the compiler be able to inline the constructor and destructor? –  Aug 10 '17 at 21:59
  • @DanielLangr If you agree it's a dupe tell me, I can hammer. – Baum mit Augen Aug 10 '17 at 22:00
  • Likely wrong assumption. I meant that the object needs to be created (in memory?) while swapping of numbers, pointers, etc. can be done using registers. – Daniel Langr Aug 10 '17 at 22:00
  • @BaummitAugen Ok, I agree. – Daniel Langr Aug 10 '17 at 22:03
  • @user2357112 @hvd Namely, `swap(x1, x2);` needs to touch only 2 memory locations (`x1.i_` and `x2.i_`). Is it true for `X temp = std::move(x1); x1 = std::move(x2); x2 = std::move(temp);` as well? I am not sure whether the object `temp` must be physically created on the stack. – Daniel Langr Aug 10 '17 at 22:09
  • I don't know the details here, but `std::sort` is probably fairly optimized. I would guess they tried if that helps. – Baum mit Augen Aug 10 '17 at 22:11
  • @BaummitAugen It seems that compilers can completely optimize away object creation/destruction in such cases, see https://godbolt.org/g/ud4u9U. – Daniel Langr Aug 11 '17 at 06:13

1 Answer


std::sort() is a hybrid algorithm. While a vanilla quicksort could get away with only swap() operations (from std::partition()), the real sorting approach most likely uses insertion, heap, and/or merge sort, too. For these sorting algorithms it is generally more effective to lift an object out of the way (by move construction), move objects into the current "hole", and finally move the lifted object into the last remaining hole. It may be reasonable to keep just one temporary object, but the algorithm most likely uses several functions, and keeping one temporary object around is somewhat impractical (although this is something I hadn't contemplated before and is possibly worth trying).
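The "hole" technique described above can be sketched as an insertion sort. The name hole_insertion_sort is hypothetical, and this is not the code of any particular standard library; it only shows the pattern of one move construction, a series of shifting move assignments, and a final move assignment into the hole.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <utility>

// Insertion sort using a "hole": lift the element out (move construction),
// shift larger predecessors one slot right (move assignment each),
// then move the lifted element into the final hole (move assignment).
template <typename RandomIt>
void hole_insertion_sort(RandomIt first, RandomIt last) {
  if (first == last) return;
  for (RandomIt i = std::next(first); i != last; ++i) {
    typename std::iterator_traits<RandomIt>::value_type lifted = std::move(*i);
    RandomIt hole = i;  // *hole is now moved-from
    while (hole != first && lifted < *std::prev(hole)) {
      *hole = std::move(*std::prev(hole));  // shift right; the hole moves left
      --hole;
    }
    *hole = std::move(lifted);  // fill the final hole
  }
  assert(std::is_sorted(first, last));  // debug-only postcondition
}
```

Moving an element k positions costs one move construction plus k + 1 move assignments, whereas the swap-only version needs k swaps, i.e. 3k moves when each swap is the generic std::swap.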

Earlier this year, my Quicker Sorting talk was recorded at the Italian C++ Conference; it goes over the details of making quicksort quick.

The upshot is: if you want to sort objects, you had better make sure that copy/move construction, copy/move assignment, the destructor, and swap() are fast. I can imagine that keeping a temporary object could lessen the need for construction and destruction, but the assignments would remain. A dedicated destructive move could possibly improve performance, too, but I haven't experimented with that (yet).

Dietmar Kühl
  • Insertion sort and heapsort can be implemented with compare&swap operations only as well (see, e.g., https://github.com/DanielLangr/AQsort/blob/master/include/impl/sequential_sort.h). But I see your point and understand that move can be sometimes more efficient, such as with `_GLIBCXX_MOVE_BACKWARD3`. – Daniel Langr Aug 11 '17 at 06:29
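The comment's point that heapsort also works with compare and swap only can be sketched as a textbook max-heap heapsort in which every element exchange goes through std::iter_swap. The names sift_down_by_swap and swap_only_heap_sort are hypothetical; this is not how any particular standard library implements std::make_heap or std::sort_heap.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>

// Restore the max-heap property for the subtree rooted at index `root`,
// considering only the first `size` elements; exchanges use swap only.
template <typename RandomIt>
void sift_down_by_swap(RandomIt first, std::ptrdiff_t root, std::ptrdiff_t size) {
  for (;;) {
    std::ptrdiff_t largest = root;
    const std::ptrdiff_t left = 2 * root + 1;
    const std::ptrdiff_t right = left + 1;
    if (left < size && first[largest] < first[left]) largest = left;
    if (right < size && first[largest] < first[right]) largest = right;
    if (largest == root) return;
    std::iter_swap(first + root, first + largest);
    root = largest;
  }
}

template <typename RandomIt>
void swap_only_heap_sort(RandomIt first, RandomIt last) {
  const std::ptrdiff_t n = last - first;
  // Build a max-heap bottom-up.
  for (std::ptrdiff_t i = n / 2 - 1; i >= 0; --i) sift_down_by_swap(first, i, n);
  // Repeatedly swap the maximum to the end of the shrinking heap.
  for (std::ptrdiff_t end = n - 1; end > 0; --end) {
    std::iter_swap(first, first + end);
    sift_down_by_swap(first, 0, end);
  }
  assert(std::is_sorted(first, last));  // debug-only postcondition
}
```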