Why bother with comparison sorts?

Question

Algorithms like Timsort, Quicksort & Mergesort dominate the "real world" sorting methods. The case for these comparison sorts is quite practical — they've been shown to be the most performant, stable, multipurpose sorting algorithms in a wide variety of environments.

However, it seems like nearly everything that we would sort on a computer is countable / partially ordered. Numbers, characters, strings, even functions are amenable to some meaningful non-comparison sorting method. A candidate here is Radix sort. In general it runs in O(K*n) time, K being the number of bits required to represent a particular item, which in many cases beats the theoretical comparison sort limit of O(n*log(n)) by a wide margin.

What gives?
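
For concreteness, here is a minimal sketch of the kind of radix sort the question has in mind. It assumes the keys are non-negative integers that fit in `key_bits` bits; the function name `radix_sort_u32` and its parameters are made up for illustration and do not come from any library.

```python
def radix_sort_u32(values, bits_per_pass=8, key_bits=32):
    """LSD radix sort for integers in the range [0, 2**key_bits).

    Assumption: keys are non-negative and fit in key_bits bits.
    """
    radix = 1 << bits_per_pass      # number of buckets per pass
    mask = radix - 1
    items = list(values)
    for shift in range(0, key_bits, bits_per_pass):
        # Count how many keys carry each digit value in this pass.
        counts = [0] * radix
        for v in items:
            counts[(v >> shift) & mask] += 1
        # Turn the counts into starting offsets (exclusive prefix sum).
        total = 0
        for d in range(radix):
            counts[d], total = total, total + counts[d]
        # Scatter in input order, which keeps each pass stable.
        out = [0] * len(items)
        for v in items:
            d = (v >> shift) & mask
            out[counts[d]] = v
            counts[d] += 1
        items = out
    return items
```

With the defaults this makes 4 passes over the data regardless of n, which is where the O(K*n) figure comes from.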

+1 I've also been wondering why seemingly no (standard or effectively-standard) library offers something like a clever radix sort variant. Note though: (1) "The number of bits required to represent a particular item" is order `log n` for n distinct items -- that's not really why radix sort can beat comparison based sorts. (2) Introsort is not the only widely-used sorting algorithm, at least two popular standard libraries use Timsort. — , Nov 01 '12 at 22:10
Quicksort isn't by necessity stable, but that's an implementation detail. Quicksort can behave in a stable fashion. — ŹV -, Nov 01 '12 at 22:23
What do you mean with "can behave in a stable fashion"? How would you code a stable quicksort? — Daniel Fischer, Nov 01 '12 at 22:31
@DanielFischer: you can add *as a tie-breaker* a final comparison based on pointer value. (which in C is valid since both pointers should point to elements of the same array-object. But that assumes of course a comparison function, or equivalent) Update: I am confusing quicksort and qsort(). — wildplasser, Nov 01 '12 at 22:52

score 8 · Answer 1 · answered Nov 01 '12 at 22:18

Comparison sorts are based on a really nice abstraction: all you need is a way to compare two elements. Then, depending on your language, with templates (C++), interfaces (Java), typeclasses (Haskell), function objects (JavaScript), etc., you can sort containers that hold arbitrary types; the only thing you need to implement is the comparison.

How would you implement Radix sort for arbitrary types? :)
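
To illustrate the abstraction, here is a rough sketch of a sort that knows nothing about its elements except how to compare two of them. The name `merge_sort` and the `less` parameter are illustrative choices, not any particular library's API.

```python
def merge_sort(items, less):
    """Stable merge sort that only ever calls less(a, b) on the elements."""
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    left = merge_sort(items[:mid], less)
    right = merge_sort(items[mid:], less)
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        # Take from the right only when it is strictly smaller,
        # so equal elements keep their original order.
        if less(right[j], left[i]):
            merged.append(right[j])
            j += 1
        else:
            merged.append(left[i])
            i += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


# Fractions stored as (numerator, denominator) with positive denominators,
# ordered by value without ever converting them to a single integer key.
fractions = [(3, 4), (1, 2), (1, 3), (2, 4)]
by_value = merge_sort(fractions, lambda a, b: a[0] * b[1] < b[0] * a[1])
```

The same function sorts strings by length, records by date, or anything else, as long as the caller supplies `less`.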

Karoly Horvath

You certainly can't, but I think it stands to reason that sorting arbitrary types without a preexisting representation of ordering (natural, lexicographical, etc.) that is immediately obvious occurs virtually never. – ŹV - Nov 01 '12 at 22:22
@ZephyrPellerin I don't want to code a radix-sort like algorithm for every object I use. Comparison based algorithms are nice because the implementation does not depend on the objects to be sorted; so you can code a generic quicksort function (or use one from your language library) and feed it with a comparator to be used for sorting. That's the purpose of an **abstraction**. – Haile Nov 01 '12 at 22:39
@Haile Correct me if I'm mistaken, I have never implemented a radix sort. But AFAIK one only needs a function from items to be sorted to keys (integers), and can then reuse the radix sort by running it on the keys. – Nov 01 '12 at 22:44
more like a set of keys, as a plain integer in most languages clearly can't hold all the possible values. – Karoly Horvath Nov 01 '12 at 22:49
@delnan "one **only** needs a function from items to be sorted to keys (integers)" It doesn't seem trivial at all to me, in the case of arbitrary objects. To write a comparator is much simpler! – Haile Nov 01 '12 at 22:55
@Haile I'm not saying it's trivial (though a comparator isn't trivial either), I'm playing devil's advocate. And I'm pretty sure it could be made easier by providing helper functions (e.g. recursively converting items of a sequence and concatenating their keys, and predefined mappings for builtin types). – Nov 01 '12 at 22:58
Can you please provide an example where you have implemented a comparison function without using an attribute or function that could have readily been provided to a Radix sort algorithm ? i.e. object.age, object.size, even object.name etc. – ŹV - Nov 01 '12 at 23:36
First of all a small note: passing a generic type to the sort can be problematic in some statically typed languages, you probably want to return an integer (a slice of the bits), and feed it with these chunks. So that's your basic interface. – Karoly Horvath Nov 02 '12 at 12:41
You're right, I could do this, but then it becomes quite ugly when you have a compound key, e.g. sorting first by name, then by age, and then by size. With comparison, generating this custom comparator is quite straightforward. I suggest you create a generic Radix sort interface in C++ and then use this interface and the normal comparison based interface and generate the client code which does compound key sorting. My guess is one of them is going to be a one-liner, the other one probably dozens. Personally I rarely had problems with sort performance, and tend to optimize more important things. – Karoly Horvath Nov 02 '12 at 12:43
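
A rough sketch of the compound-key contrast described in the comment above, written in Python rather than C++. The `Person` record, `NAME_WIDTH`, and the field widths are all assumptions invented for the example: names are taken to be ASCII without NUL bytes and at most 16 bytes long, ages to fit in 2 bytes, and sizes in 4 bytes.

```python
from collections import namedtuple

Person = namedtuple("Person", "name age size")  # hypothetical record type

NAME_WIDTH = 16  # assumed maximum name length in bytes


def compare_key(p):
    # Comparison-based sorting: the compound key is a one-liner.
    return (p.name, p.age, p.size)


def radix_key(p):
    # Radix-style sorting needs a fixed-width byte string whose byte order
    # matches the intended ordering, so every field must be encoded by hand.
    name = p.name.encode("ascii")
    if len(name) > NAME_WIDTH or b"\x00" in name:
        raise ValueError("name does not fit the fixed-width encoding")
    return (
        name.ljust(NAME_WIDTH, b"\x00")   # pad so shorter names sort first
        + p.age.to_bytes(2, "big")        # big-endian keeps numeric order
        + p.size.to_bytes(4, "big")
    )


people = [
    Person("bob", 30, 180),
    Person("al", 41, 170),
    Person("bob", 25, 175),
    Person("alice", 30, 165),
]
by_compare = sorted(people, key=compare_key)
by_bytes = sorted(people, key=radix_key)  # same order, via 22-byte keys
```

Here `sorted` is only used to check that the byte encoding preserves the order; a radix sort would consume the 22-byte keys one byte per pass. Negative numbers, non-ASCII names, or variable-length fields would each need extra encoding rules.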

score 6 · Accepted Answer · answered Nov 01 '12 at 22:24

The speed of radix sort depends on the length of the key. If you have long keys like strings, radix sort may be very slow.

Further, when sorting only a few items, the initialization costs may outweigh the actual sorting by an order of magnitude.

For instance, if you sort 32-bit integers using an 8-bit radix, you need to initialize the list of 256 buckets at least 4 times. If you only have 20 or so items to sort, this and the 80 swaps will be far slower than the roughly 200 comparisons/swaps a quicksort needs.

If you sort anything longer, like strings, you have for each character of the longest string a bucket initialization - this may be even worse.
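
The arithmetic above can be sketched with a deliberately crude cost model. Both functions are illustrative assumptions: they count one unit per bucket reset and per element move for radix sort, and roughly n*log2(n) units for a comparison sort, ignoring all constant factors (so the comparison figure comes out lower than the estimate of about 200 given above).

```python
import math


def radix_work(n, key_bits=32, bits_per_pass=8):
    """Bucket resets plus element moves over all passes (crude model)."""
    passes = math.ceil(key_bits / bits_per_pass)
    buckets = 1 << bits_per_pass
    return passes * (buckets + n)


def comparison_work(n):
    """About n*log2(n) comparisons (crude model, constants ignored)."""
    return math.ceil(n * math.log2(n)) if n > 1 else 0


small = (radix_work(20), comparison_work(20))   # (1104, 87)
large = (radix_work(1_000_000), comparison_work(1_000_000))
```

For 20 items the fixed cost of 4 * 256 bucket resets dominates; for a million items the same model puts radix sort ahead.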

Gunther Piez

Are these fundamental problems with radix sort, or just with naive implementations? I lean towards the latter (which would make your point moot), but I'm no expert. – Nov 01 '12 at 22:41
Most of the points about initialization can be easily fixed with a decent implementation (by doing some other kind of sort for reasonably small sizes). But the problem of long keys is a pretty fundamental problem for radix sort (think of all your string keys having the same long prefix). – Keith Randall Nov 01 '12 at 22:46
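
One way to picture the fix suggested in the comment above is an MSD radix sort on strings that hands small buckets to a comparison sort. This is only a sketch: the name `msd_radix_sort` and the `cutoff` value are arbitrary choices for illustration, and the recursion goes one level deeper for every character of shared prefix.

```python
def msd_radix_sort(strings, cutoff=16):
    """MSD radix sort on str, falling back to sorted() for small buckets."""

    def sort_at(items, depth):
        if len(items) <= cutoff:
            # Small bucket: a comparison sort is cheaper than more bucketing.
            return sorted(items)
        finished = []   # strings that end exactly at this depth
        buckets = {}    # next character -> strings sharing that character
        for s in items:
            if len(s) == depth:
                finished.append(s)
            else:
                buckets.setdefault(s[depth], []).append(s)
        result = finished  # shorter strings sort before their extensions
        for ch in sorted(buckets):
            result.extend(sort_at(buckets[ch], depth + 1))
        return result

    return sort_at(list(strings), 0)
```

If every key starts with the same long prefix, each prefix character costs a full pass over the bucket and separates nothing, which is the long-key problem in miniature. In Python a very long shared prefix would also run into the recursion limit.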
@KeithRandall Long prefixes are just as much of a problem for comparisons. – Nov 01 '12 at 22:59
*"The speed of radix sort depends on the length of the key."*... and the speed of comparison sorts doesn't? – user541686 Nov 11 '13 at 08:20

score 1 · Answer 3 · answered Nov 01 '12 at 22:24

Radix sort is useful only for sorting objects with integer keys, and from a practical performance point of view it depends heavily on the length of the keys. For the general case of sorting arbitrary objects, this won't be enough, hence the necessity for comparison-based sorting.

Óscar López

Can you give an example? American flag sort is, in many cases, faster than quicksort for lexicographical ordering of strings. – ŹV - Nov 01 '12 at 22:26
