
I have n arrays. Each array can be arbitrarily long, and the lengths can differ. All n arrays are sorted.

Now I want to fetch the k smallest elements from these n sorted arrays.

For example, with n = 5 and k = 10:

2 4 6 7 9 

23 45 67 78 99

1 2 6 9 1000 4567 6567 67876

45 56 67 78 89 102 103 104

91 991 9991 99991 

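Now the answer should be 1 2 4 6 7 9 23 45 56 67 (counting each distinct value once; if duplicates are kept, the 10 smallest are 1 2 2 4 6 6 7 9 9 23).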

Would it be O(n*k), i.e. O(n^2) when k = n, in the worst case, and O(k) in the best case?

einpoklum
user609306

2 Answers


It's O(n + k log n), I think.

First, build a min-heap containing the smallest element of each array, storing the index of its array alongside it. Building a heap of size n is O(n). Then repeat k times: take the smallest element from the heap (which is O(log n)) and insert the next element from the array that element came from (also O(log n)). Overall, this is O(n + k log n).
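The steps above can be sketched in Python with the standard heapq module. This is only an illustration under a few assumptions: the function name k_smallest_sorted is made up for this example, the arrays are plain Python lists sorted in ascending order, and duplicates are kept, so for the arrays in the question it returns 1 2 2 4 6 6 7 9 9 23 rather than the deduplicated list.

```python
import heapq

def k_smallest_sorted(arrays, k):
    """Return the k smallest items across ascending-sorted lists (duplicates kept)."""
    # One (value, array index, position) entry per non-empty array.
    heap = [(arr[0], i, 0) for i, arr in enumerate(arrays) if arr]
    heapq.heapify(heap)  # O(n) for n arrays
    result = []
    while heap and len(result) < k:
        value, i, pos = heapq.heappop(heap)  # O(log n)
        result.append(value)
        # Replace the popped item with the next one from the same array, if any.
        if pos + 1 < len(arrays[i]):
            heapq.heappush(heap, (arrays[i][pos + 1], i, pos + 1))  # O(log n)
    return result
```

The heap never holds more than n entries, so the extra space is O(n) regardless of k or of how long the arrays are.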

Paul Hankin

The answer provided by Anonymous is the better solution in this case because we know that the individual arrays are sorted.

You can do it with a heap in O(n log k) time, worst case, where n here means the total number of items across all arrays (not the number of arrays). It will require O(k) extra space.

initialize a MAX heap
for each array
    for each item in the array
        if (heap.count < k)
            heap.insert(item)
        else if (item < heap.peek())
        {
            // item is smaller than the largest item on the heap
            // remove the largest item (the root) and replace it with this one
            heap.remove_root()
            heap.insert(item)
        }
        else
        {
            break;  // go to next array
            // see remarks below
        }

Because you know that the arrays are initially sorted, you can include that final optimization I showed. If the item you're looking at is not smaller than the largest item already on the heap, then you know that no other item in the current array will be smaller. So you can skip the rest of the current array.
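A runnable Python version of the pseudocode might look like the sketch below. It makes some assumptions that are not in the original: the function name k_smallest_bounded_heap is invented here, the items are numbers (so that negating them turns Python's min-heap heapq into a max-heap), and the result is sorted at the end purely for readable output.

```python
import heapq

def k_smallest_bounded_heap(arrays, k):
    """Keep at most k items in a max-heap while scanning ascending-sorted arrays."""
    if k <= 0:
        return []
    # heapq is a min-heap, so store negated values: -heap[0] is the largest kept item.
    heap = []
    for arr in arrays:
        for item in arr:
            if len(heap) < k:
                heapq.heappush(heap, -item)
            elif item < -heap[0]:
                # item beats the largest kept item: pop that one and push item.
                heapq.heapreplace(heap, -item)
            else:
                # The array is ascending, so no later item can be smaller.
                break
    return sorted(-x for x in heap)
```

For the arrays in the question and k = 10, only the first three arrays contribute; the scan of each remaining array stops at its first element.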

That's the algorithm to give you the smallest k items. If you want the largest k items, build a MIN heap and change if (item < heap.peek()) to if (item > heap.peek()). In that case, you would get better performance by walking the arrays backwards. That would reduce the number of heap insertions and removals. If you don't walk the arrays backwards, you won't be able to use the optimization I showed.
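The largest-k variant, walking each array backwards, could be sketched as follows. Again this is an illustration rather than a definitive implementation: k_largest_bounded_heap is a made-up name, and the arrays are assumed to be ascending-sorted Python lists.

```python
import heapq

def k_largest_bounded_heap(arrays, k):
    """Keep at most k items in a min-heap while scanning each array from its end."""
    if k <= 0:
        return []
    heap = []  # min-heap: heap[0] is the smallest item currently kept
    for arr in arrays:
        for item in reversed(arr):  # largest items first
            if len(heap) < k:
                heapq.heappush(heap, item)
            elif item > heap[0]:
                # item beats the smallest kept item: pop that one and push item.
                heapq.heapreplace(heap, item)
            else:
                # Walking backwards, every remaining item is no larger than this one.
                break
    return sorted(heap, reverse=True)
```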

Another way to do it would be to concatenate all of the items into a single array and use Quickselect. Quickselect runs in O(n) time on average, although its worst case is O(n^2). Empirical evidence suggests that using a heap is faster when k < 0.01*n (that is, when k is less than 1% of the total number of items). Otherwise, Quickselect is faster. Your mileage may vary, of course, and having to create a single array from the multiple arrays will add processing and memory overhead to Quickselect.
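For comparison, here is a minimal Quickselect sketch over the concatenated items. The function name k_smallest_quickselect, the random pivot choice, and the Hoare-style partition are choices made for this example, not something prescribed above. The final sort of the k selected items is there only to give a predictable output order and costs an extra O(k log k).

```python
import random
from itertools import chain

def k_smallest_quickselect(arrays, k):
    """Concatenate the arrays, then partition until the k smallest sit in items[:k]."""
    items = list(chain.from_iterable(arrays))  # O(n) extra memory
    k = min(k, len(items))
    if k <= 0:
        return []
    target = k - 1  # index the k-th smallest item must end up at
    lo, hi = 0, len(items) - 1
    while lo < hi:
        pivot = items[random.randint(lo, hi)]
        i, j = lo, hi
        while i <= j:  # Hoare-style partition around the pivot value
            while items[i] < pivot:
                i += 1
            while items[j] > pivot:
                j -= 1
            if i <= j:
                items[i], items[j] = items[j], items[i]
                i += 1
                j -= 1
        if target <= j:
            hi = j      # the k-th smallest is in the left part
        elif target >= i:
            lo = i      # the k-th smallest is in the right part
        else:
            break       # items[target] equals the pivot and is already in place
    return sorted(items[:k])
```

Note that this version ignores the fact that the input arrays are already sorted, which is one reason the heap-based approaches tend to suit this particular problem better.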

Jim Mischel