
The std::sort algorithm (and its cousins std::partial_sort and std::nth_element) from the C++ Standard Library is, in most implementations, a complicated hybrid of more elementary sorting algorithms, such as selection sort, insertion sort, quick sort, merge sort, or heap sort.

There are many questions here and on sister sites such as https://codereview.stackexchange.com/ related to bugs, complexity and other aspects of implementations of these classic sorting algorithms. Most of the offered implementations consist of raw loops, use index manipulation and concrete types, and are generally non-trivial to analyse in terms of correctness and efficiency.

Question: how can the above mentioned classic sorting algorithms be implemented using modern C++?

  • no raw loops, but combining the Standard Library's algorithmic building blocks from <algorithm>
  • iterator interface and use of templates instead of index manipulation and concrete types
  • C++14 style, including the full Standard Library, as well as syntactic noise reducers such as auto, template aliases, transparent comparators and polymorphic lambdas.

Notes:

  • for further references on implementations of sorting algorithms see Wikipedia, Rosetta Code or http://www.sorting-algorithms.com/
  • according to Sean Parent's conventions (slide 39), a raw loop is a for-loop longer than composition of two functions with an operator. So f(g(x)); or f(x); g(x); or f(x) + g(x); are not raw loops, and neither are the loops in selection_sort and insertion_sort below.
  • I follow Scott Meyers's terminology to denote the current C++1y already as C++14, and to denote C++98 and C++03 both as C++98, so don't flame me for that.
  • As suggested in the comments by @Mehrdad, I provide four implementations as a Live Example at the end of the answer: C++14, C++11, C++98 and Boost and C++98.
  • The answer itself is presented in terms of C++14 only. Where relevant, I denote the syntactic and library differences where the various language versions differ.
    It would be great to add the C++Faq tag to the question, though it would require to lose at least one of the others. I would suggest removing the versions (as it is a generic C++ question, with implementations available in most versions with some adaptation). – Matthieu M. Jul 09 '14 at 12:20
  • @TemplateRex Well, technically, if it's not *FAQ* then this question is too broad (guessing - I didn't downvote). Btw. good job, lots of useful information, thanks :) – BartoszKP Jul 15 '14 at 21:15

2 Answers


Algorithmic building blocks

We begin by assembling the algorithmic building blocks from the Standard Library:

#include <algorithm>    // min_element, iter_swap, 
                        // upper_bound, rotate, 
                        // partition, 
                        // inplace_merge,
                        // make_heap, sort_heap, push_heap, pop_heap,
                        // is_heap, is_sorted
#include <cassert>      // assert 
#include <functional>   // less
#include <iterator>     // distance, begin, end, next
  • the iterator tools such as the non-member std::begin() / std::end(), as well as std::next(), are only available as of C++11. For C++98, one needs to write them oneself. There are substitutes from Boost.Range in boost::begin() / boost::end(), and from Boost.Utility in boost::next().
  • the std::is_sorted algorithm is only available for C++11 and beyond. For C++98, this can be implemented in terms of std::adjacent_find and a hand-written function object. Boost.Algorithm also provides a boost::algorithm::is_sorted as a substitute.
  • the std::is_heap algorithm is only available for C++11 and beyond.

Syntactical goodies

C++14 provides transparent comparators of the form std::less<> that act polymorphically on their arguments. This avoids having to provide an iterator's type. This can be used in combination with C++11's default function template arguments to create a single overload for sorting algorithms that take < as comparison and those that have a user-defined comparison function object.

template<class It, class Compare = std::less<>>
void xxx_sort(It first, It last, Compare cmp = Compare{});

In C++11, one can define a reusable template alias to extract an iterator's value type which adds minor clutter to the sort algorithms' signatures:

template<class It>
using value_type_t = typename std::iterator_traits<It>::value_type;

template<class It, class Compare = std::less<value_type_t<It>>>
void xxx_sort(It first, It last, Compare cmp = Compare{});

In C++98, one needs to write two overloads and use the verbose typename xxx<yyy>::type syntax:

template<class It, class Compare>
void xxx_sort(It first, It last, Compare cmp); // general implementation

template<class It>
void xxx_sort(It first, It last)
{
    xxx_sort(first, last, std::less<typename std::iterator_traits<It>::value_type>());
}
  • Another syntactical nicety is that C++14 facilitates wrapping user-defined comparators through polymorphic lambdas (with auto parameters that are deduced like function template arguments).
  • C++11 only has monomorphic lambdas, that require the use of the above template alias value_type_t.
  • In C++98, one either needs to write a standalone function object or resort to the verbose std::bind1st / std::bind2nd / std::not1 type of syntax.
  • Boost.Bind improves this with boost::bind and _1 / _2 placeholder syntax.
  • C++11 and beyond also have std::find_if_not, whereas C++98 needs std::find_if with a std::not1 around a function object.

C++ Style

There is no generally acceptable C++14 style yet. For better or for worse, I closely follow Scott Meyers's draft Effective Modern C++ and Herb Sutter's revamped GotW. I use the following style recommendations:

  • Herb Sutter's "Almost Always Auto" and Scott Meyers's "Prefer auto to specific type declarations" recommendation, for which the brevity is unsurpassed, although its clarity is sometimes disputed.
  • Scott Meyers's "Distinguish () and {} when creating objects" and consistently choose braced-initialization {} instead of the good old parenthesized initialization () (in order to side-step all most-vexing-parse issues in generic code).
  • Scott Meyers's "Prefer alias declarations to typedefs". For templates this is a must anyway, and using it everywhere instead of typedef saves time and adds consistency.
  • I use a for (auto it = first; it != last; ++it) pattern in some places, in order to allow for loop invariant checking for already sorted sub-ranges. In production code, the use of while (first != last) and a ++first somewhere inside the loop might be slightly better.

Selection sort

Selection sort does not adapt to the data in any way, so its runtime is always O(N²). However, selection sort has the property of minimizing the number of swaps. In applications where the cost of swapping items is high, selection sort very well may be the algorithm of choice.

To implement it using the Standard Library, repeatedly use std::min_element to find the remaining minimum element, and iter_swap to swap it into place:

template<class FwdIt, class Compare = std::less<>>
void selection_sort(FwdIt first, FwdIt last, Compare cmp = Compare{})
{
    for (auto it = first; it != last; ++it) {
        auto const selection = std::min_element(it, last, cmp);
        std::iter_swap(selection, it); 
        assert(std::is_sorted(first, std::next(it), cmp));
    }
}

Note that selection_sort has the already processed range [first, it) sorted as its loop invariant. The minimal requirements are forward iterators, compared to std::sort's random access iterators.

Details omitted:

  • selection sort can be optimized with an early test if (std::distance(first, last) <= 1) return; (or for forward / bidirectional iterators: if (first == last || std::next(first) == last) return;).
  • for bidirectional iterators, the above test can be combined with a loop over the interval [first, std::prev(last)), because the last element is guaranteed to be the minimal remaining element and doesn't require a swap.

Insertion sort

Although it is one of the elementary sorting algorithms with O(N²) worst-case time, insertion sort is the algorithm of choice either when the data is nearly sorted (because it is adaptive) or when the problem size is small (because it has low overhead). For these reasons, and because it is also stable, insertion sort is often used as the recursive base case (when the problem size is small) for higher overhead divide-and-conquer sorting algorithms, such as merge sort or quick sort.

To implement insertion_sort with the Standard Library, repeatedly use std::upper_bound to find the location where the current element needs to go, and use std::rotate to shift the remaining elements upward in the input range:

template<class FwdIt, class Compare = std::less<>>
void insertion_sort(FwdIt first, FwdIt last, Compare cmp = Compare{})
{
    for (auto it = first; it != last; ++it) {
        auto const insertion = std::upper_bound(first, it, *it, cmp);
        std::rotate(insertion, it, std::next(it)); 
        assert(std::is_sorted(first, std::next(it), cmp));
    }
}

Note that insertion_sort has the already processed range [first, it) sorted as its loop invariant. Insertion sort also works with forward iterators.

Details omitted:

  • insertion sort can be optimized with an early test if (std::distance(first, last) <= 1) return; (or for forward / bidirectional iterators: if (first == last || std::next(first) == last) return;) and a loop over the interval [std::next(first), last), because the first element is guaranteed to be in place and doesn't require a rotate.
  • for bidirectional iterators, the binary search to find the insertion point can be replaced with a reverse linear search using the Standard Library's std::find_if_not algorithm.

Four Live Examples (C++14, C++11, C++98 and Boost, C++98) for the fragment below:

using RevIt = std::reverse_iterator<BiDirIt>;
auto const insertion = std::find_if_not(RevIt(it), RevIt(first), 
    [=](auto const& elem){ return cmp(*it, elem); }
).base();
  • For random inputs this gives O(N²) comparisons, but this improves to O(N) comparisons for almost sorted inputs. The binary search always uses O(N log N) comparisons.
  • For small input ranges, the better memory locality (cache, prefetching) of a linear search might also dominate a binary search (one should test this, of course).

Quick sort

When carefully implemented, quick sort is robust and has O(N log N) expected complexity, but with O(N²) worst-case complexity that can be triggered with adversarially chosen input data. When a stable sort is not needed, quick sort is an excellent general-purpose sort.

Even for the simplest versions, quick sort is quite a bit more complicated to implement using the Standard Library than the other classic sorting algorithms. The approach below uses a few iterator utilities to locate the middle element of the input range [first, last) as the pivot, then uses two calls to std::partition (each of which is O(N)) to three-way partition the input range into segments of elements that are smaller than, equal to, and larger than the selected pivot, respectively. Finally, the two outer segments with elements smaller than and larger than the pivot are recursively sorted:

template<class FwdIt, class Compare = std::less<>>
void quick_sort(FwdIt first, FwdIt last, Compare cmp = Compare{})
{
    auto const N = std::distance(first, last);
    if (N <= 1) return;
    auto const pivot = *std::next(first, N / 2);
    auto const middle1 = std::partition(first, last, [=](auto const& elem){ 
        return cmp(elem, pivot); 
    });
    auto const middle2 = std::partition(middle1, last, [=](auto const& elem){ 
        return !cmp(pivot, elem);
    });
    quick_sort(first, middle1, cmp); // assert(std::is_sorted(first, middle1, cmp));
    quick_sort(middle2, last, cmp);  // assert(std::is_sorted(middle2, last, cmp));
}

However, quick sort is rather tricky to get correct and efficient, as each of the above steps has to be carefully checked and optimized for production-level code. In particular, for O(N log N) complexity, the pivot has to result in a balanced partition of the input data, which cannot be guaranteed in general for an O(1) pivot selection, but which can be guaranteed if one sets the pivot as the O(N) median of the input range.

Details omitted:

  • the above implementation is particularly vulnerable to special inputs, e.g. it has O(N²) complexity for the "organ pipe" input 1, 2, 3, ..., N/2, ..., 3, 2, 1 (because the middle is always larger than all other elements).
  • median-of-3 pivot selection from randomly chosen elements from the input range guards against almost sorted inputs for which the complexity would otherwise deteriorate to O(N²).
  • 3-way partitioning (separating elements smaller than, equal to and larger than the pivot) as shown by the two calls to std::partition is not the most efficient O(N) algorithm to achieve this result.
  • for random access iterators, a guaranteed O(N log N) complexity can be achieved through median pivot selection using std::nth_element(first, middle, last), followed by recursive calls to quick_sort(first, middle, cmp) and quick_sort(middle, last, cmp).
  • this guarantee comes at a cost, however, because the constant factor of the O(N) complexity of std::nth_element can be more expensive than that of the O(1) complexity of a median-of-3 pivot followed by an O(N) call to std::partition (which is a cache-friendly single forward pass over the data).

Merge sort

If using O(N) extra space is of no concern, then merge sort is an excellent choice: it is the only stable O(N log N) sorting algorithm.

It is simple to implement using Standard algorithms: use a few iterator utilities to locate the middle of the input range [first, last) and combine two recursively sorted segments with a std::inplace_merge:

template<class BiDirIt, class Compare = std::less<>>
void merge_sort(BiDirIt first, BiDirIt last, Compare cmp = Compare{})
{
    auto const N = std::distance(first, last);
    if (N <= 1) return;                   
    auto const middle = std::next(first, N / 2);
    merge_sort(first, middle, cmp); // assert(std::is_sorted(first, middle, cmp));
    merge_sort(middle, last, cmp);  // assert(std::is_sorted(middle, last, cmp));
    std::inplace_merge(first, middle, last, cmp); // assert(std::is_sorted(first, last, cmp));
}

Merge sort requires bidirectional iterators, the bottleneck being the std::inplace_merge. Note that when sorting linked lists, merge sort requires only O(log N) extra space (for recursion). The latter algorithm is implemented by std::list<T>::sort in the Standard Library.

Heap sort

Heap sort is simple to implement, performs an O(N log N) in-place sort, but is not stable.

The first loop, the O(N) "heapify" phase, puts the range into heap order. The second loop, the O(N log N) "sortdown" phase, repeatedly extracts the maximum and restores heap order. The Standard Library makes this extremely straightforward:

template<class RandomIt, class Compare = std::less<>>
void heap_sort(RandomIt first, RandomIt last, Compare cmp = Compare{})
{
    lib::make_heap(first, last, cmp); // assert(std::is_heap(first, last, cmp));
    lib::sort_heap(first, last, cmp); // assert(std::is_sorted(first, last, cmp));
}

In case you consider it "cheating" to use std::make_heap and std::sort_heap, you can go one level deeper and write those functions yourself in terms of std::push_heap and std::pop_heap, respectively:

namespace lib {

// NOTE: is O(N log N), not O(N) as std::make_heap
template<class RandomIt, class Compare = std::less<>>
void make_heap(RandomIt first, RandomIt last, Compare cmp = Compare{})
{
    for (auto it = first; it != last;) {
        std::push_heap(first, ++it, cmp); 
        assert(std::is_heap(first, it, cmp));           
    }
}

template<class RandomIt, class Compare = std::less<>>
void sort_heap(RandomIt first, RandomIt last, Compare cmp = Compare{})
{
    for (auto it = last; it != first;) {
        std::pop_heap(first, it--, cmp);
        assert(std::is_heap(first, it, cmp));           
    } 
}

}   // namespace lib

The Standard Library specifies both push_heap and pop_heap as complexity O(log N). Note however that the outer loop over the range [first, last) results in O(N log N) complexity for make_heap, whereas std::make_heap has only O(N) complexity. For the overall O(N log N) complexity of heap_sort it doesn't matter.

Details omitted: O(N) implementation of make_heap

Testing

Here are four Live Examples (C++14, C++11, C++98 and Boost, C++98) testing all five algorithms on a variety of inputs (not meant to be exhaustive or rigorous). Just note the huge differences in the LOC: C++11/C++14 need around 130 LOC, C++98 and Boost 190 (+50%) and C++98 more than 270 (+100%).

    I kinda disagree with the `auto it = first` pattern. Some iterators are not trivially copyable and I doubt that you can rely on the compiler to optimize the copy, just use the `first` iterator when possible, it is passed by value for this reason. – sbabbi Jul 09 '14 at 10:25
    While [I disagree with your use of `auto`](http://josephmansfield.uk/articles/dont-use-auto-unless-you-mean-it.html) (and many people disagree with me), I enjoyed seeing the standard library algorithms being used well. I'd been wanting to see some examples of this kind of code after seeing Sean Parent's talk. Also, I had no idea `std::iter_swap` existed, although it seems strange to me that it's in ``. – Joseph Mansfield Jul 09 '14 at 10:29
  • @sbabbi tnx, that's a good point, let me update. Note that in `selection_sort` and `insertion_sort`, the `it` is used to do the `assert()` for the loop invariant. In production code, `while(first != last)` and a `++first` inside the loop is probably better. – TemplateRex Jul 09 '14 at 10:32
  • @JosephMansfield yes, I'm a big fan of "`auto` all the way to the bank" :-) I'll put in a comment. BTW, `iter_swap(a, b)` just does `swap(*a, *b)`, but I use the former to make the difference with `std::rotate` more transparent. – TemplateRex Jul 09 '14 at 10:35
  • HP / Microsoft std::sort() uses quick sort unless the recursion gets too deep, in which case it switches to heap sort. HP / Microsoft std::stable_sort() uses merge sort. Apparently due to a remnant of recursive code (now only one level occurs), it allocates a temp buffer half the size of the data and does a non-recursive bottom up merge sort on each half, then moves one half into the temp buffer and does a final merge. It's old code as the copyright date is 1994. – rcgldr Jul 09 '14 at 11:02
  • @rcgldr the current libstdc++/libc++ implementations are also a complicated mix of routines that adapt to the size and nature of the input data. – TemplateRex Jul 09 '14 at 11:07
  • @TemplateRex IIRC `iter_swap` is specialized for list iterators, and just ajdust some pointers instead of moving the values. – sbabbi Jul 09 '14 at 11:08
    @sbabbi The entire standard library is based on the principle that iterators are cheap to copy; it passes them by value, for example. If copying an iterator isn't cheap, then you're going to suffer performance problems everywhere. – James Kanze Jul 09 '14 at 13:02
    Great post. Regarding the cheating part of [std::]make_heap. If std::make_heap is considered cheating, so would std::push_heap. I.e. cheating = not implementing the actual behaviour defined for a heap structure. I would find it instructive have push_heap included as well. – Captain Giraffe Jul 09 '14 at 16:50
  • @TemplateRex - For the HP / Microsoft STL implementations, the only adaptation that I didn't mention is that if the number of elements is less than 32, then insertion sort is used. Otherwise, it's as described in my previous post, std::sort may switch from quick sort to heap sort, std::stable_sort always uses bottom up merge sort (as long as there are 32 or more elements). – rcgldr Jul 09 '14 at 17:07
  • @CaptainGiraffe after studying the libc++/libstdc++ implementations of `push_heap` and `pop_heap`, it looks to me as if they are primitive algorithms, not easily expressible in terms of smaller Standard algorithms. – TemplateRex Jul 09 '14 at 17:18
  • @TemplateRex: Nobody seems to have mentioned it, but a quicksort that has a limit on depth then switches to heapsort (often switching to insertion sort for small sets) is called Introsort: http://en.wikipedia.org/wiki/Introsort This is what's used in C++ in every library I know of – Mooing Duck Jul 09 '14 at 21:16
  • @MooingDuck correct, and it is discussed in [the first link](http://stackoverflow.com/questions/22339240/what-algorithms-are-used-in-c11-stdsort-in-different-stl-implementations) in the question. – TemplateRex Jul 09 '14 at 21:24
  • @TemplateRex Allow me to make an attempt without the raw loops. I'll add it as an answer, you can edit it in as you please. – Captain Giraffe Jul 09 '14 at 22:51
  • Worth noting : your version of insertion sort is linearithmic in the case of already sorted data rather than being linear. – Michael Graczyk Jul 14 '14 at 15:32
  • Considering there is nothing specific to C++11 in the meat of most of your answers, you might as well stop using `auto` in most places, that way people can use your code with C++03 too. – user541686 Jul 15 '14 at 08:01
  • @Mehrdad I expanded the Q&A with 4 versions: C++14/C++11/C++98 and Boost/C++98. Main answer is still written in C++14 with some notes on syntactical, library and style issues. You were right that most (except heap sort) of the answer carries through to C++98, albeit at a +100% LOC penalty. C++11/14 are really a car, not a faster horse! ;-) But thanks again for driving these points home! – TemplateRex Jul 15 '14 at 19:45
  • @TemplateRex under which license is this code? cc by-sa 3.0 with attribution ? – gnzlbg Aug 07 '14 at 11:52
  • @TemplateRex could you provide code for the optimized versions (without asserts and with the proposed optimizations)? In particular note that for the early test optimizations using `std::distance` is not efficient since for Forward and Bidirectional iterators computing the whole distance is O(N) and not required (the only thing required is to know if the distance is >= 1). – gnzlbg Aug 07 '14 at 13:16
    @gnzlbg The asserts you can comment out, of course. The early test can be tag-dispatched per iterator category, with the current version for random access, and `if (first == last || std::next(first) == last)`. I might update that later. Implementing the stuff in the "omitted details" sections is beyond the scope of the question, IMO, because they contain links to entire Q&As themselves. Implementing real-word sorting routines is hard! – TemplateRex Aug 07 '14 at 13:27
    Great post. Though, you've cheated with your quicksort by using `nth_element` in my opinion. `nth_element` does half a quicksort already (including the partitioning step and a recursion on the half that includes the n-th element you're interested in). – sellibitze Aug 07 '14 at 16:10
  • @sellibitze tnx, although `nth_element` does not sort *within* the two half segments, it only sorts *between* the elements smaller/greater than the middle. In that sense, it is similar to `partition` which separates on a predicate. Both are `O(N)`. – TemplateRex Aug 07 '14 at 16:48
  • And where are non-comparing algorithms like counting or bucket sort? – enedil Jan 14 '15 at 14:26
    @enedil I decided to limit the scope of my answer to comparison based sorting, mainly because the ones you mention are a bit more advanced. Feel free to add your own answer in Modern C++ style for these other interesting algorithms. – TemplateRex Jan 14 '15 at 20:06
  • For the `selection_sort`, you mention you can iterate to `prev(last)` for bidirectional iterators, but in truth it can be done trivially with forward iterators as well: `; std::next(it) != last; ` – Mooing Duck Mar 26 '15 at 17:31
    @MooingDuck true, but the `next(it)` would add an extra increment per iteration. For bidir, you can factor the `prev(last)` out of the loop. – TemplateRex Mar 26 '15 at 21:32
  • For all I can tell, your *merge sort* is the only one that requires the value type to be *MoveConstructible*. That requirement results from `inplace_merge` using of a temporary buffer; EoP describes a way to merge w/o constructing any new elements (subdivide each range using the same pivot, rotate the two middle ranges, recur on each original range). – dyp Sep 02 '15 at 13:11
  • @dyp not sure I follow: AFAICS std::sort requires its iterators to be ValueSwappable, i.e. dereferenced iterators are Swappable, and Swappable implies MoveConstructible. Similar for std::rotate used in InsertionSort. – TemplateRex Sep 02 '15 at 20:02
  • While `std::swap` requires MoveConstructible, I don't see how/why Swappable should require MoveConstructible. Anyway, it's a good point: if a type is swappable it is probably also MoveConstructible. – dyp Sep 02 '15 at 20:08
  • @dyp I think the chain of reasoning goes like: Swappable requires in [swappable.requirements]/2 calls to `std::swap` to be valid, and `std::swap` requires in [utility.swap]/2 its arguments to be MoveDestructible. Perhaps it would be good if the Standard contained a note that Swappable implies MoveConstructible and MoveAssignable. – TemplateRex Sep 03 '15 at 06:12
  • I don't think `std::swap` needs to be valid. As far as I understand [swappable.requirements], `using std::swap; swap(t, u); swap(u, t);` needs to be valid. – dyp Sep 03 '15 at 08:33
  • @dyp you are right, [N3048](http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2010/n3048.html) removed the MoveConstructible requrements on Swappable, mainly to deal with proxies. Nevertheless, it appears that most mutating sequence algos I use in this Q&A such as `rotate`, `inplace_merge` and `make_heap` (and indeed all `std::sort`, `nth_element` and related algos) all require that the iterator value types are MoveConstructible. So even a rotate based merge sort would still require MoveConstructible, see [alg.rotate]/4. – TemplateRex Sep 03 '15 at 11:25
  • Oh, thanks. I didn't realize this had been removed. I'm a bit surprised that `rotate` requires MoveConstructible, though. – dyp Sep 03 '15 at 12:31
  • Hmm, that `nth_element` for quick sort guarantees perfect pivoting for the sort itself. So you essentially sidestep all the complex issues with pivoting by punting them to `nth_element`'s implementation. – T.C. Oct 14 '15 at 05:09
  • But you are not partitioning based on the element in the middle; you are partitioning on the actual median, which makes the discussion of "organ pipe" inputs misleading (unless that input happens to be the worst case for the algorithm used by `nth_element`, which I doubt.) – T.C. Oct 14 '15 at 20:26
  • But your code is not using "the middle element" (`*pivot` prior to the `nth_element` call) as the pivot - that would be using something like `std::partition`; it's using the actual median of the sequence. – T.C. Oct 14 '15 at 21:07
  • @T.C. updated the quick_sort section to reflect your comments. Thanks! – TemplateRex Oct 15 '15 at 21:18
    It would be cool to make a note about quicksort not being able to sort move-only types as impemented here because of the pivot copy. It might be worth adding a note since `std::sort` works with move-only types. – Morwenn Oct 24 '15 at 18:12
  • @Morwenn Thanks for pointing that out. For MoveConstructible types, I would have to do something like: swap the pivot element to the end, do the first partition, swapping the pivot back into the middle before the second partition and the recursion. Especially with forward iterators, finding the locations to do the swapping is annoying. – TemplateRex Oct 24 '15 at 20:10
  • @Morwenn I'll keep this in mind until there are enough further comments to warrant another edit (I don't want to do too many updates, so as not to appear to be trying to get on top of the Active Questions list for rep purposes). – TemplateRex Oct 24 '15 at 20:13
  • "for random access iterators, a guaranteed O(N log N) complexity can be achieved through median pivot selection using std::nth_element(first, middle, last)" is not true: `std::nth_element` is only O(N) in the *average case*, but the standard puts no bounds on the *worst case*. –  Nov 27 '16 at 00:27
  • I just checked: `std::nth_element` is O(N log N) worst case in libstdc++ and O(N^2) in the Microsoft implementation. –  Nov 27 '16 at 01:06
  • So a quicksort calling `std::nth_element` definitely isn't *guaranteed* O(N log N), even if we ignore the standard's requirements and focus on practical implementations. –  Nov 27 '16 at 01:12
    @Fanael you are right that `nth_element` is only `O(N)` on average according to the Standard. Pre-C++11, `std::sort` was only guaranteed to be `O(N log N)` on average. [It appears](https://www.reddit.com/r/cpp/comments/5n59dn/know_your_algorithms_on_sets_it_really_makes_a/dc9chwp/) that [Introselect](https://en.wikipedia.org/wiki/Introselect) made not only a guaranteed `O(N log N)` possible for `std::sort`, but it could also be used for a guaranteed `O(N)` `nth_element` as well. It might appear in a future Standard, or else vendor competition might make this a quality of implementation issue. – TemplateRex Jan 12 '17 at 20:29
  • The second `std::partition()` in `quick_sort()` seems entirely pointless to me. Also, the idea in insertion sort is *not* to first determine the place where the insertion element should end up and then move the data: the idea is to determine this place *while* moving the data, avoiding unnecessary loads of data into the CPU. – cmaster - reinstate monica Jan 18 '17 at 14:53
    Merge sort is not the only stable O(n log n) sort. Moreover, there exist at least two implementations of *inplace* stable O(n log n) sorts: [Grail sort](https://github.com/Mrrl/GrailSort) and [WikiSort](https://github.com/BonzaiThePenguin/WikiSort) – Ilya Popov Jun 06 '17 at 23:06
    This implementation of merge_sort has the unfortunate effect that each call to `std::inplace_merge` will allocate and deallocate its temporary buffer, rather than reusing that buffer between calls. – David Stone Oct 09 '17 at 14:48
    @DavidStone You are right of course. This Q&A is not meant to be the definite guide to write real life optimized sort routines, but rather to show how to combine basic building blocks. See e.g. the [cpp-sort](https://github.com/Morwenn/cpp-sort) library on how much extra details require careful attention in real life :) – TemplateRex Oct 21 '17 at 09:10
  • Note that using `std::less<>` has the unfortunate consequence of not behaving well with pointers that aren't from the same array. To get around this, you may still want to use `std::less::value_type>` . See [this question](https://stackoverflow.com/q/21079958/1896169) – Justin Oct 27 '17 at 18:53
  • There is an infinite number of stable `O(N log N)` sorting algorithms, not just merge-sort. In fact, merge sort isn't even that fast on a whole range of inputs -- natural merge sort is `O(N)` on sorted output and uses `O(1)` extra space in the same case. For a given pattern of input it *easy* to make a stable sorting algorithm that will run faster than `O(N log N)` (still, `O(N log N)` on average though). – Clearer Apr 05 '18 at 09:32
    "The latter algorithm is implemented by std::list::sort in the Standard Library." std::list has its own merge function, which in turn uses std::list::splice. Visual Studio prior to 2015 was bottom up iterative, using a small array of lists. VS2015+ switched to using iterators (for start and end of runs, to avoid potential allocation issues related to the array of lists) and also to top down recursive, but bottom up could have been implemented using a small array of iterators (to start of runs) as well. VS2015+ std::list:sort has no calls to merge, just calls to splice(). – rcgldr Aug 15 '19 at 00:52

Another small and rather elegant one originally found on code review. I thought it was worth sharing.

Counting sort

While it is rather specialized, counting sort is a simple integer sorting algorithm and can often be really fast provided the values of the integers to sort are not too far apart. It's probably ideal if one ever needs to sort a collection of one million integers known to be between 0 and 100 for example.

To implement a very simple counting sort that works with both signed and unsigned integers, one needs to find the smallest and greatest elements in the collection to sort; their difference gives the size of the array of counts to allocate. Then, a second pass through the collection is done to count the number of occurrences of every element. Finally, we write the required number of occurrences of every integer back to the original collection.

template<typename ForwardIterator>
void counting_sort(ForwardIterator first, ForwardIterator last)
{
    if (first == last || std::next(first) == last) return;

    auto minmax = std::minmax_element(first, last);  // extra pass over the data; avoidable if the value bounds are known (see notes below)
    auto min = *minmax.first;
    auto max = *minmax.second;
    if (min == max) return;

    using difference_type = typename std::iterator_traits<ForwardIterator>::difference_type;
    std::vector<difference_type> counts(max - min + 1, 0);

    for (auto it = first ; it != last ; ++it) {
        ++counts[*it - min];
    }

    for (auto count: counts) {
        first = std::fill_n(first, count, min++);
    }
}

While it is only useful when the range of the integers to sort is known to be small (generally not larger than the size of the collection to sort), making counting sort more generic would make it slower for its best cases. If the range is not known to be small, another algorithm such as radix sort, ska_sort or spreadsort can be used instead.

Details omitted:

  • We could have passed the bounds of the range of values accepted by the algorithm as parameters to totally get rid of the first std::minmax_element pass through the collection. This will make the algorithm even faster when a usefully-small range limit is known by other means. (It doesn't have to be exact; passing a constant 0 to 100 is still much better than an extra pass over a million elements to find out that the true bounds are 1 to 95. Even 0 to 1000 would be worth it; the extra elements are written once with zero and read once).

  • Growing counts on the fly is another way to avoid a separate first pass. Doubling the counts size each time it has to grow gives amortized O(1) time per sorted element (see hash table insertion cost analysis for the proof that exponential growth is the key). Growing at the end for a new max is easy with std::vector::resize to add new zeroed elements. Changing min on the fly and inserting new zeroed elements at the front can be done with std::copy_backward after growing the vector. Then std::fill to zero the new elements.

  • The counts increment loop is a histogram. If the data is likely to be highly repetitive, and the number of bins is small, it can be worth unrolling over multiple arrays to reduce the serializing data dependency bottleneck of store/reload to the same bin. This means more counts to zero at the start, and more to loop over at the end, but should be worth it on most CPUs for our example of millions of 0 to 100 numbers, especially if the input might already be (partially) sorted and have long runs of the same number.

  • In the algorithm above, we use a min == max check to return early when every element has the same value (in which case the collection is sorted). It is actually possible to instead fully check whether the collection is already sorted while finding the extreme values of a collection with no additional time wasted (if the first pass is still memory bottlenecked with the extra work of updating min and max). However such an algorithm does not exist in the standard library and writing one would be more tedious than writing the rest of counting sort itself. It is left as an exercise for the reader.

  • Since the algorithm only works with integer values, static assertions could be used to prevent users from making obvious type mistakes. In some contexts, a substitution failure with std::enable_if_t might be preferred.

  • While modern C++ is cool, future C++ could be even cooler: structured bindings and some parts of the Ranges TS would make the algorithm even cleaner.

Peter Cordes
Morwenn
  • @TemplateRex If it was able to take an arbitrary comparison object, it would make counting sort a comparison sort, and comparison sorts can't have a better worst case than O(n log n). Counting sort has a worst case of O(n + r), which means that it can't be a comparison sort anyway. Integers *can* be compared but this property isn't used to perform the sort (it is only used in the `std::minmax_element` which only collects information). The property used is the fact that integers can be used as indices or offsets, and that they are incrementable while preserving the latter property. – Morwenn May 09 '16 at 10:11
  • Ranges TS is indeed very nice, e.g. the final loop can be over `counts | ranges::view::filter([](auto c) { return c != 0; })` so that you don't have to repeatedly test for nonzero counts inside the `fill_n`. – TemplateRex May 09 '16 at 12:14
  • (I found typos in `small` _an_ `rather` and `appart` - may I keep them til the edit concerning reggae_sort?) – greybeard Jan 17 '17 at 23:23
  • @greybeard You may do whatever you want to :p – Morwenn Jan 17 '17 at 23:25
  • I suspect that growing the `counts[]` on the fly would be a win vs. traversing the input with `minmax_element` before the histogramming. Especially for the use-case where this is ideal, of very large input with many repeats in a small range, because you will quickly grow `counts` to its full size, with few branch mispredicts or size-doublings. (Of course, knowing a small-enough bound on the range will let you avoid a `minmax_element` scan *and* avoid bounds-checking inside the histogram loop.) – Peter Cordes Oct 05 '17 at 07:42
  • For inputs where there are a lot of consecutive repeats of the same number, unrolling over multiple `counts` arrays and summing at the end can be a win on real CPUs. It hides the latency of store/reload to the same bin repeatedly. (This is a somewhat well known histogram optimization for small numbers of bins.) – Peter Cordes Oct 05 '17 at 07:46
  • @PeterCordes Sure, that implementation wasn't meant to be a real-world production-ready one, but merely a simple one that could be written with the rules given by TemplateRex. Feel free to edit and add notes to "Details omitted" if you feel that readers might gain from them. – Morwenn Oct 05 '17 at 07:53
  • Done. I think growing `counts` on the fly is always the way to go. It's the same total work (modulo a bit of copying of counts), but fused into one loop. It's even better for the cases where counting sort is good, and maybe slightly worse for the cases where counting sort is bad (gigantic ranges that require massive copying with a new min). – Peter Cordes Oct 05 '17 at 08:33