(I got this as an interview question and would love some help with it.)

You have k sorted lists containing n different numbers in total.
Show how to create a single sorted list containing all the elements from the k lists in O(n * log(k)) time.

user218867
  • Look up [merge sort](https://en.wikipedia.org/wiki/Merge_sort). – Amadan Feb 14 '19 at 13:06
  • What programming language? – Dave Feb 14 '19 at 13:06
  • This is still not totally identical to merge sort, so I guess it's worth asking. – user218867 Feb 14 '19 at 13:14
  • Before asking a question, you should try using the [search bar](https://stackoverflow.com/questions/5055909/algorithm-for-n-way-merge) or [Google](https://www.geeksforgeeks.org/merge-k-sorted-arrays/). – SaiBot Feb 14 '19 at 13:18
  • Possible duplicate of [Algorithm for N-way merge](https://stackoverflow.com/questions/5055909/algorithm-for-n-way-merge) – SaiBot Feb 14 '19 at 13:21

3 Answers


The idea is to use a min heap of size k.

Push all the k lists on the heap (one heap-entry per list), keyed by their minimum (i.e. first) value

Then repeatedly do this:

  1. Extract the top list (having the minimal key) from the heap
  2. Extract the minimum value from that list and push it on the result list
  3. Push the shortened list back (if it is not empty) on the heap, now keyed by its new minimum value

Repeat until all values have been pushed on the result list.
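The steps above can be sketched in Python with the standard `heapq` module. This is a minimal illustration, not the answerer's own code: the function name `merge_k_sorted` is hypothetical, and instead of pushing whole lists it stores tuples `(value, list_index, position)` on the heap, which is an assumed encoding of "a list keyed by its current minimum".

```python
import heapq


def merge_k_sorted(lists):
    """Merge k sorted lists into one sorted list using a min heap of size <= k."""
    # One heap entry per non-empty list: (current value, list index, position).
    # The list index also breaks ties between equal values.
    heap = [(lst[0], i, 0) for i, lst in enumerate(lists) if lst]
    heapq.heapify(heap)

    result = []
    while heap:
        value, i, pos = heap[0]  # peek at the entry with the minimal key
        result.append(value)
        pos += 1
        if pos < len(lists[i]):
            # The list is not exhausted: swap the top entry for the
            # list's next value and restore the heap in O(log k).
            heapq.heapreplace(heap, (lists[i][pos], i, pos))
        else:
            # The list is exhausted: drop its entry from the heap.
            heapq.heappop(heap)
    return result
```

Advancing an index (`pos`) rather than removing the first item of a Python list keeps the per-element extraction from the list at constant cost.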

The initial step will have a time complexity of O(k log k).

The 3 steps above will be repeated n times. At each iteration the cost of each is:

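  1. O(log k) to remove the top from a binary heap (reading it is only O(1), and the removal can be fused with step 3 into a single replace-top operation)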
  2. O(1) if the extraction is implemented using a pointer/index (not shifting all values in the list)
  3. O(log k) as the heap size is never greater than k

So the resulting complexity is O(n log k) (as k ≤ n when no list is empty, the initial step is not significant).

trincot
  • Or put tuples `(element, index, number-of-list)` onto the heap, and after popping push the next element from the same list. – tobias_k Feb 14 '19 at 13:43
  • Another solution would be to maintain the lists in a hash table keyed by first value: initially extract one element from each list and put it on the heap. On extracting from the heap, remove that list from the map; then, if the list is not empty, put it back in the map under its new key (its new first element). – user218867 Feb 14 '19 at 13:43

As the question is stated, there's no need for a k-way merge (or a heap). A standard 2-way merge, applied repeatedly to pairs of lists in rounds until a single sorted list remains, also has time complexity O(n log(k)). If the question had instead asked how to merge the k lists in a single pass, then a k-way merge would be needed.

Consider the case k == 32 and, to simplify the math, assume the lists are paired up so that each merge pass processes all n elements. After the first pass there are k/2 lists, after the second pass k/4 lists, and after log2(k) = 5 passes all k (32) lists have been merged into a single sorted list. Each pass costs O(n), so the total is O(n log2(k)). Which lists are paired within a pass doesn't matter as long as the merges stay roughly balanced; merging the lists one at a time into a single growing result would instead cost O(n k) in the worst case.
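The pass-by-pass scheme can be sketched as follows. This is an illustrative Python version under the assumption that lists are paired with their neighbours in each pass; the names `merge_two` and `merge_pairwise` are hypothetical.

```python
def merge_two(a, b):
    """Standard 2-way merge of two sorted lists."""
    result = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            result.append(a[i])
            i += 1
        else:
            result.append(b[j])
            j += 1
    # At most one of the two tails is non-empty.
    result.extend(a[i:])
    result.extend(b[j:])
    return result


def merge_pairwise(lists):
    """Merge k sorted lists by repeated passes of 2-way merges."""
    runs = [list(lst) for lst in lists]
    if not runs:
        return []
    while len(runs) > 1:
        # One pass: merge neighbouring pairs, roughly halving the run count.
        merged = [merge_two(runs[idx], runs[idx + 1])
                  for idx in range(0, len(runs) - 1, 2)]
        if len(runs) % 2:
            merged.append(runs[-1])  # odd run out is carried to the next pass
        runs = merged
    return runs[0]
```

Each pass touches every element at most once, and the number of runs halves per pass, which gives about log2(k) passes.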

Using a k-way merge is normally only advantageous when merging data on an external device, such as one or more disk drives (or, classically, tape drives), where the I/O time is great enough that the heap overhead can be ignored. For a RAM-based merge / merge sort, the total number of operations is about the same for a 2-way merge / merge sort and a k-way merge / merge sort. On a processor with 16 registers, most of them used as indexes or pointers, an optimized (no heap) 4-way merge (using 8 of the registers as indexes or pointers to the current and ending location of each run) can be a bit faster than a 2-way merge because it is more cache friendly.

rcgldr

When k = 2, you merge the two lists by repeatedly popping the front of whichever list has the smaller front element. In a way, you create a virtual list that supports a pop_front operation implemented as:

    pop_front(a, b): return if front(a) <= front(b) then pop_front(a) else pop_front(b)

You can very well arrange a tree-like merging scheme where such virtual lists are merged in pairs:

    pop_front(a, b, c, d): return if front(a, b) <= front(c, d) then pop_front(a, b) else pop_front(c, d)

Every pop will involve every level in the tree once, leading to a cost of O(log k) per pop.


The above reasoning is wrong because it doesn't account for the front operations, each of which involves a comparison between two elements. These comparisons cascade down the tree and end up requiring a total of k-1 comparisons per output element.

This can be circumvented by "memoizing" the front element, i.e. keeping it next to the two lists after a comparison has been made. Then, when an element is popped, this front element is updated.

This directly leads to the binary min-heap device, as suggested by @trincot.

    5 7 32 21
  5
    6 4 8 23 40
2
    7 7 20 53
  2
    2 4 6 8 10
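A rough Python sketch of the memoized tree described above, assuming the values are never `None` (used here as an "empty" sentinel). The class and function names (`Leaf`, `Merged`, `build_tree`, `merge_with_tree`) are hypothetical and only illustrate the idea.

```python
class Leaf:
    """A real sorted list, consumed through an index."""

    def __init__(self, values):
        self.values = values
        self.pos = 0

    def front(self):
        # None signals an exhausted list (assumes None is never a value).
        return self.values[self.pos] if self.pos < len(self.values) else None

    def pop_front(self):
        value = self.values[self.pos]
        self.pos += 1
        return value


class Merged:
    """Virtual list over two children that memoizes its front element."""

    def __init__(self, left, right):
        self.left, self.right = left, right
        self._refresh()

    def _refresh(self):
        # A single comparison decides which child currently holds the front.
        a, b = self.left.front(), self.right.front()
        if b is None or (a is not None and a <= b):
            self.winner, self.cached = self.left, a
        else:
            self.winner, self.cached = self.right, b

    def front(self):
        return self.cached  # no comparison: the answer was memoized

    def pop_front(self):
        value = self.winner.pop_front()  # descend along one path only
        self._refresh()                  # update the memoized front
        return value


def build_tree(lists):
    """Pair nodes level by level until a single root remains."""
    nodes = [Leaf(lst) for lst in lists] or [Leaf([])]
    while len(nodes) > 1:
        paired = [Merged(nodes[i], nodes[i + 1])
                  for i in range(0, len(nodes) - 1, 2)]
        if len(nodes) % 2:
            paired.append(nodes[-1])
        nodes = paired
    return nodes[0]


def merge_with_tree(lists):
    root = build_tree(lists)
    result = []
    while root.front() is not None:
        result.append(root.pop_front())
    return result
```

Because `front` returns the cached value, a pop performs one comparison per tree level on a single root-to-leaf path rather than k-1 comparisons.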
Yves Daoust