
What I have in mind is a SortedIterator, which accepts a Less function that would be used to sort the container with all the known algorithms.

The brute force implementation would of course either keep a copy of the original elements, or keep references/pointers to the elements in the original list.

Is there an efficient way to iterate in a well-defined order without actually sorting the list? I'm asking this out of algorithmic curiosity, and I expect the answer to be no (or yes with a big but). It is asked out of a C++ style mindset, but is in fact a quite general language-agnostic premise.

rubenvb
  • The complete iteration through all items in sorted order can't be performed in under O(n log n) time, since that would enable you to sort the input in under O(n log n) time using only comparisons, which violates a known lower bound in this model of computation. – j_random_hacker May 08 '18 at 09:30
  • For a solution using O(1) space, you could keep a single pointer to the most recently output element. Then each subsequent visit scans the entire list of elements, looking for the smallest element that is greater than this element. Producing each element thus takes O(n) time, for O(n^2) time overall to output the complete list in sorted order. – j_random_hacker May 08 '18 at 09:34
  • When you say "container" I assume you mean a dynamic array (`std::vector`)? Because I guess the answer would probably be quite different for hash tables, heaps, linked lists... – jdehesa May 08 '18 at 10:06
  • @jdehesa yeah, an unstructured list of elements. Something `Iterable`. – rubenvb May 08 '18 at 10:10

1 Answer


If you want O(1) memory, the O(n^2) approach is the only one we know of. If anything better existed, we could use it to improve the selection-sort algorithm the same way. Every other sorting mechanism relies on being able to restructure part of the array (merge sort relies on sorting subarrays, quicksort relies on partitioning the array around a pivot, and so on).

Now if you relax the memory constraint you can do something a bit more efficient. For example, you could keep a heap containing the lowest x elements seen so far. After one pass, which costs O(N log x), you have x elements for your iterator to emit. For the next pass, restrict attention to elements strictly greater than the last element you've emitted so far. You'll need N/x passes to get everything, for O((N^2/x) log x) total. If x == 1 this degenerates to O(N^2). If x == N the solution is O(N log N) (but with a larger constant than a typical quicksort). If the data is on disk, I would set x to about as much RAM as you have available, minus a few MB so you can still read large chunks from the drive.

Sorin
  • Your heap strategy gives a nice set of tradeoffs. Another possibility is to get O(n^1.5 log(n)) total time with O(sqrt(n)) memory usage by setting x = sqrt(n) to get n/sqrt(n) = sqrt(n) passes, each costing O(n log sqrt(n)) = O(n log n) time. But I think it's better to say that "nobody knows how to do better (since if someone did, they would have written a better version of selection sort)" than that it's *impossible*. TTBOMK, we don't know that for certain. – j_random_hacker May 08 '18 at 14:29
  • @j_random_hacker Thanks, rephrased. – Sorin May 08 '18 at 14:39
  • Another possibility is to use O(n) additional memory into which you can create a heap in O(n) using the build-heap method. Then, your iterator is simply O(log n) for each element to remove. – Jim Mischel May 08 '18 at 18:04
  • @JimMischel If you can use O(n) additional memory you can just run a sort algorithm. Both will be O(NlogN) overall, but the constant tends to be larger for heaps (will be slightly slower). – Sorin May 11 '18 at 14:43
  • While that's true, the sort has to complete before the first item can be returned. The heap solution speeds up returning the first item, although you're right that in general the overall time will be higher. Perhaps better if you're not going to iterate over the entire collection. – Jim Mischel May 11 '18 at 16:31
  • @JimMischel I don't think even that is right, but I didn't do any benchmarks. To get the first item, you need to insert every item into the heap and that's O(NlogN), same as the other sort algorithms. Given that heaps are not very cache friendly, I would expect you still get the first result faster with the built-in algorithms. I think quick-sort can be implemented so you get the first element right in O(N). – Sorin May 14 '18 at 13:27
  • @Sorin: You can build a heap from a list in O(N). See https://stackoverflow.com/q/9755721/56778. If you have a reference for that Quicksort that will return the first element in O(n), I'd sure like to see it. – Jim Mischel May 14 '18 at 17:03
  • @JimMischel Cool, I didn't know that about the heap. For quicksort, assuming you get decent pivots, you'll do a run over the entire array, then you pick the first half to sort and you'll do a run over half of the array, then pick the first quarter and so on. Every time you push the first element into the next section you'll touch. Overall you'll do `n * (1/2^0 + 1/2^1 + 1/2^2 + 1/2^3 + ...)` checks/swaps; the sum adds up to 2, so you do 2n checks/swaps (O(N)) to get the first element in position. – Sorin May 15 '18 at 08:59