The answer provided by Anonymous is the better solution in this case because we know that the individual arrays are sorted.
You can do it with a heap in O(n log k) worst-case time, where n is the total number of items across all arrays and k is the number of items you want. It requires O(k) extra space.
initialize a MAX heap
for each array
    for each item in the array
        if (heap.count < k)
            heap.insert(item)
        else if (item < heap.peek())
        {
            // item is smaller than the largest item on the heap
            // remove the largest item and replace it with this one
            heap.remove_root()
            heap.insert(item)
        }
        else
        {
            break; // go to next array
                   // see remarks below
        }
Because you know that the arrays are initially sorted, you can include that final optimization I showed. If the item you're looking at is not smaller than the largest item already on the heap, then you know that no other item in the current array will be smaller. So you can skip the rest of the current array.
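As a sketch of how that pseudocode might look in practice, here's a Python version using the standard heapq module. heapq only provides a min-heap, so values are negated to simulate a max-heap; the function name is my own:

```python
import heapq

def smallest_k(arrays, k):
    """Return the k smallest items across several ascending-sorted arrays."""
    heap = []  # max-heap simulated by storing negated values
    for arr in arrays:
        for item in arr:
            if len(heap) < k:
                heapq.heappush(heap, -item)
            elif item < -heap[0]:
                # item is smaller than the largest item on the heap:
                # remove the largest item and replace it with this one
                heapq.heapreplace(heap, -item)
            else:
                break  # arr is sorted, so no later item can qualify
    return sorted(-x for x in heap)

print(smallest_k([[1, 4, 9], [2, 3, 8], [5, 6, 7]], 3))  # → [1, 2, 3]
```

heapreplace does the remove-root-then-insert in a single sift, which is slightly cheaper than a separate pop and push.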
That's the algorithm to give you the smallest k items. If you want the largest k items, build a MIN heap and change if (item < heap.peek()) to if (item > heap.peek()). In that case, you would get better performance by walking the arrays backwards, which reduces the number of heap insertions and removals. If you don't walk the arrays backwards, you won't be able to use the optimization I showed.
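A sketch of that largest-k variant, again in Python. heapq is already a min-heap, so no negation is needed, and reversed() handles the backwards walk; the function name is my own:

```python
import heapq

def largest_k(arrays, k):
    """Return the k largest items across several ascending-sorted arrays."""
    heap = []  # min-heap: heap[0] is the smallest of the current candidates
    for arr in arrays:
        for item in reversed(arr):  # walk backwards: largest items first
            if len(heap) < k:
                heapq.heappush(heap, item)
            elif item > heap[0]:
                # item beats the smallest candidate; swap it in
                heapq.heapreplace(heap, item)
            else:
                break  # everything earlier in arr is even smaller
    return sorted(heap, reverse=True)

print(largest_k([[1, 4, 9], [2, 3, 8], [5, 6, 7]], 3))  # → [9, 8, 7]
```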
Another way to do it would be to concatenate all of the items into a single array and use Quickselect. Quickselect runs in O(n) expected time (the worst case is O(n^2) with naive pivot selection, though median-of-medians pivots make it O(n) worst case). Empirical evidence suggests that using a heap is faster when k < 0.01*n; otherwise, Quickselect is faster. Your mileage may vary, of course, and having to create a single array from the multiple arrays will add processing and memory overhead to Quickselect.
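For comparison, a minimal Quickselect sketch (iterative, Lomuto partition with a random pivot); note it doesn't use the sortedness of the inputs and returns the k smallest items in no particular order. The function name is my own:

```python
import random

def quickselect_smallest_k(items, k):
    """Return the k smallest of items (unordered), in expected O(n) time."""
    items = list(items)  # work on a copy; this is the concatenated array
    lo, hi = 0, len(items) - 1
    while lo < hi:
        # Lomuto partition around a randomly chosen pivot
        p = random.randint(lo, hi)
        items[p], items[hi] = items[hi], items[p]
        pivot, store = items[hi], lo
        for i in range(lo, hi):
            if items[i] < pivot:
                items[i], items[store] = items[store], items[i]
                store += 1
        items[store], items[hi] = items[hi], items[store]
        if store == k:        # the first k slots now hold the k smallest
            break
        elif store < k:
            lo = store + 1    # need more items; recurse into the right side
        else:
            hi = store - 1    # pivot landed too far right; recurse left
    return items[:k]

flat = [1, 4, 9] + [2, 3, 8] + [5, 6, 7]
print(sorted(quickselect_smallest_k(flat, 3)))  # → [1, 2, 3]
```

The copy and concatenation steps are the extra overhead mentioned above; the selection itself touches each element a constant number of times on average.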