4

I am looking for an algorithm to merge multiple sorted sequences, let's say X sorted sequences with n elements, into one sorted sequence in JavaScript. Can you provide some examples?

Note: I do not want to use any library. I am trying to solve https://icpc.kattis.com/problems/stacking

What is the minimal number of operations needed to merge sorted arrays under these conditions:

Split: a single stack can be split into two stacks by lifting any top portion of the stack and putting it aside to form a new stack.

Join: two stacks can be joined by putting one on top of the other. This is allowed only if the bottom plate of the top stack is no larger than the top plate of the bottom stack, that is, the joined stack has to be properly ordered.

6 Answers

6

History

This problem has been solved for more than a century, going back to Hermann Hollerith and punchcards. Huge sets of punchcards, such as those resulting from a census, were sorted by dividing them into batches, sorting each batch, and then merging the sorted batches--the so-called "merge sort". Those tape drives you see spinning in 1950's sci-fi movies were most likely merging multiple sorted tapes onto one.

Algorithm

All the algorithms you need can be found at https://en.wikipedia.org/wiki/Merge_algorithm. Writing this in JS is straightforward. More information is available in the question Algorithm for N-way merge. See also this question, which is an almost exact duplicate, although I'm not sure any of the answers are very good.

The naive concat-and-resort approach does not even qualify as an answer to the problem. The somewhat naive take-the-next-minimum-value-from-any-input approach is much better, but not optimal, because it takes more time than necessary to find the next input to take a value from. That is why the best solution uses something called a "min-heap" or a "priority queue".

Simple JS solution

Here's a real simple version, which I make no claim to be optimized, other than in the sense of being able to see what it is doing:

const data = [[1, 3, 5], [2, 4]];    

// Merge an array of pre-sorted arrays, based on the given sort criteria.
function merge(arrays, sortFunc) {
  let result = [], next;
   
  // Add an 'index' property to each array to keep track of where we are in it.
  arrays.forEach(array => array.index = 0);
 
  // Find the next array to pull from.
  // Just sort the list of arrays by their current value and take the first one.     
  function findNext() {
    return arrays.filter(array => array.index < array.length)
      .sort((a, b) => sortFunc(a[a.index], b[b.index]))[0];
  }

  // This is the heart of the algorithm.
  while (next = findNext()) result.push(next[next.index++]);

  return result;
}

function arithAscending(a, b) { return a - b; }

console.log(merge(data, arithAscending));

The above code maintains an index property on each input array to remember where we are. The simplistic alternative would be to shift the element from the front of each array when it is its turn to be merged, but that would be rather inefficient.

Optimizing finding the next array to pull from

This naive implementation of findNext, to find the array to pull the next value from, simply sorts the list of inputs by their current element and takes the first array in the result. You can optimize this by using a "min-heap" to manage the arrays in sorted order, which removes the need to re-sort them each time. A min-heap is a binary tree in which each node holds a value no greater than any value in its left and right subtrees, so the minimum is always at the root. You can find information on a JS implementation of a min-heap here.
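As a rough illustration of the min-heap idea, here is a minimal sketch that keeps one "cursor" entry per input array in a binary heap stored in a plain array. The names `mergeWithHeap`, `siftUp`, `siftDown` and the entry shape `{ value, source, index }` are made up for this example and are not taken from any library. It assumes every input is already sorted according to `compare`.

```javascript
// Sketch: k-way merge driven by a binary min-heap of cursor entries.
// Each entry is { value, source, index }; all names are illustrative only.
function mergeWithHeap(arrays, compare = (a, b) => a - b) {
  const heap = [];
  const less = (i, j) => compare(heap[i].value, heap[j].value) < 0;
  const swap = (i, j) => { [heap[i], heap[j]] = [heap[j], heap[i]]; };

  // Move the entry at position i up until its parent is not larger.
  function siftUp(i) {
    while (i > 0) {
      const parent = (i - 1) >> 1;
      if (!less(i, parent)) break;
      swap(i, parent);
      i = parent;
    }
  }

  // Move the entry at position i down until neither child is smaller.
  function siftDown(i) {
    for (;;) {
      const left = 2 * i + 1, right = left + 1;
      let smallest = i;
      if (left < heap.length && less(left, smallest)) smallest = left;
      if (right < heap.length && less(right, smallest)) smallest = right;
      if (smallest === i) return;
      swap(i, smallest);
      i = smallest;
    }
  }

  // Seed the heap with the first element of every non-empty input.
  arrays.forEach((array, source) => {
    if (array.length > 0) {
      heap.push({ value: array[0], source, index: 0 });
      siftUp(heap.length - 1);
    }
  });

  const result = [];
  while (heap.length > 0) {
    const top = heap[0];
    result.push(top.value);
    const next = top.index + 1;
    const array = arrays[top.source];
    if (next < array.length) {
      // Replace the root with the next element from the same input.
      heap[0] = { value: array[next], source: top.source, index: next };
    } else {
      // That input is exhausted: move the last entry to the root.
      const last = heap.pop();
      if (heap.length > 0) heap[0] = last;
    }
    if (heap.length > 0) siftDown(0);
  }
  return result;
}

console.log(mergeWithHeap([[1, 3, 5], [2, 4]])); // [1, 2, 3, 4, 5]
```

Unlike the simple version, this sketch does not attach an `index` property to the input arrays, so the inputs are left unmodified.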

A generator solution

It might be slightly cleaner to write this as a generator which takes a list of iterables as inputs, which includes arrays.

// Test data.
const data = [[1, 3, 5], [2, 4]];

// Merge an array of pre-sorted iterables, based on the given sort criteria.
function* merge(iterables, sortFunc) {
  let next;

  // Create iterators, with "result" property to hold most recent result.
  const iterators = iterables.map(iterable => {
    const iterator = iterable[Symbol.iterator]();
    iterator.result = iterator.next();
    return iterator;
  });

  // Find the next iterator whose value to use.
  function findNext() {
    return iterators
      .filter(iterator => !iterator.result.done)
      // Use sortFunc (not a bare <) so the given sort criteria are honored.
      .reduce((ret, cur) =>
        !ret || sortFunc(cur.result.value, ret.result.value) < 0 ? cur : ret,
         null);
  }

  // This is the heart of the algorithm.
  while (next = findNext()) {
    yield next.result.value;
    next.result = next.next();
  }
}

function arithAscending(a, b) { return a - b; }

console.log(Array.from(merge(data, arithAscending)));
Community
  • 1
  • 1
  • torazaburo, Please provide algorithm for given conditions. – Prakash Barnwal Jan 15 '17 at 05:39
  • @torazaburo This does the job as i have understood the question but it seems not so efficient as you suggest. I have done some tests and merging 1000 random sorted arrays each in random length between 5-10 items took like 1700ms while in my reducing by merging one by one algo took only 110ms for the same amount of data. May be i am doing something wrong. Could you please check [min-heap](https://repl.it/FJjl) vs [dynamical](https://repl.it/FJjn) – Redu Jan 15 '17 at 08:35
5

The naive approach is concatenating all k sequences and sorting the result. But if each sequence has n elements, the cost will be O(k*n*log(k*n)). Too much!

Instead, you can use a priority queue or heap. Like this:

// MinPriorityQueue is not built into JavaScript; it stands for any min-heap
// offering insert, empty and findAndDeleteMin. `sequences` holds the inputs.
var k = sequences.length;
var sorted = [];
var pq = new MinPriorityQueue(function(a, b) {
  return a.number < b.number;
});
var indices = new Array(k).fill(0);
for (var i=0; i<k; ++i) if (sequences[i].length > 0) {
  pq.insert({number: sequences[i][0], sequence: i});
}
while (!pq.empty()) {
  var min = pq.findAndDeleteMin();
  sorted.push(min.number);
  var s = min.sequence;  // the sequence the minimum came from (not the stale i)
  ++indices[s];
  if (indices[s] < sequences[s].length) pq.insert({
    number: sequences[s][indices[s]],
    sequence: s
  });
}

The priority queue only contains at most k elements simultaneously, one for each sequence. You keep extracting the minimum one, and inserting the following element in that sequence.

With this, the cost will be:

  • k*n insertions to a heap of k elements: O(k*n) with a heap that has constant-time insertion, at most O(k*n*log(k)) with a binary heap
  • k*n deletions in a heap of k elements: O(k*n*log(k))
  • Various constant operations for each number: O(k*n)

So only O(k*n*log(k))
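To put the two bounds side by side, here is a short back-of-the-envelope comparison. It looks only at the leading terms and ignores constant factors, so it is a rough guide rather than a measurement:

```latex
\frac{k\,n\,\log(k\,n)}{k\,n\,\log k}
  = \frac{\log k + \log n}{\log k}
  = 1 + \frac{\log n}{\log k}
```

For example, with k = n the ratio is 2, and with k = 1000 and n = 10 it is about 1.33. The heap-based merge saves a factor that grows with the length of the sequences relative to their number.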

Oriol
0

Just add them into one big array and sort it.

You could use a heap, add the first element of each sequence to it, pop the lowest one (that's your first merged element), add the next element from the sequence of the popped element and continue until all sequences are over.

It's much easier to just add them into one big array and sort it, though.

zmbq
  • 4
    This misses the whole point, which is to take advantage of the fact that the inputs are pre-sorted. –  Jan 15 '17 at 05:10
  • Actually, it doesn't. I gave you the merge algorithm. Implementing it, however, can be a waste of your time, because it'll take more than a couple of lines of code. Are you sure there's a performance issue there? Start with the simple solution. If it's too slow, implement something more complicated. – zmbq Jan 15 '17 at 07:19
0

es6 syntax:

function mergeAndSort(arrays) {
    return [].concat(...arrays).sort()
}

function receives array of arrays to merge and sort.

*EDIT: as caught by @Redu, the code above is incorrect. When no comparison function is provided, sort() compares elements as strings by Unicode code point. The fixed (and slower) code is:

function mergeAndSort(arrays) {
    return [].concat(...arrays).sort((a,b)=>a-b)
}
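A small illustration of the difference, using made-up sample values:

```javascript
const values = [10, 9, 1, 100];

// Without a comparator, elements are compared as strings.
const byDefault = [...values].sort();

// With a numeric comparator, elements are compared as numbers.
const numeric = [...values].sort((a, b) => a - b);

console.log(byDefault); // [1, 10, 100, 9]
console.log(numeric);   // [1, 9, 10, 100]
```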
Mirko Vukušić
  • Hi Mirko, thanks but I need algorithm, I am trying to solve https://icpc.kattis.com/problems/stacking – Prakash Barnwal Jan 14 '17 at 22:42
  • This approach won't make any use of the precious nature of the arrays being already sorted. – Redu Jan 15 '17 at 00:56
  • 1
    @Redu, and how would you use the fact that two arrays are already sorted, when you have to sort the result merge of the two? At this moment it has nothing to do with original question though, because it was changed after this answer. In the original question, there was no mention of kattis.com problem, and before that edit, like others, I thought he needs to merge/sort two or more arrays. – Mirko Vukušić Jan 15 '17 at 10:36
  • @Redu, ufff, what a mistake. Yes, default sorting is by string indeed. – Mirko Vukušić Jan 15 '17 at 21:39
0

This is a beautiful question. Unlike concatenating the arrays and applying .sort(), a simple dynamic programming approach with .reduce() would yield a result in O(m*n) time complexity, where m is the number of arrays and n is their average length.

We will handle the arrays one by one. First we will merge the first two arrays and then we will merge the result with the third array and so on.

function mergeSortedArrays(a){
  return a.reduce(function(p,c){
                    var pc = 0,
                        cc = 0,
                       len = p.length < c.length ? p.length : c.length,
                       res = [];
                    while (p[pc] !== undefined && c[cc] !== undefined) p[pc] < c[cc] ? res.push(p[pc++])
                                                                                     : res.push(c[cc++]);
                    return p[pc] === undefined ? res.concat(c.slice(cc))
                                               : res.concat(p.slice(pc));
                  });
}


var sortedArrays = Array(5).fill().map(_ => Array(~~(Math.random()*5)+5).fill().map(_ => ~~(Math.random()*20)).sort((a,b) => a-b));
var sortedComposite = mergeSortedArrays(sortedArrays);

sortedArrays.forEach(a => console.log(JSON.stringify(a)));
console.log(JSON.stringify(sortedComposite));

OK, as per @Mirko Vukušić's comparison of this algorithm with .concat() and .sort(), this algorithm is still the fastest solution in FF but not in Chrome. Chrome's .sort() is actually very fast and I cannot be sure of its time complexity. I just needed to tune it up a little for JS performance without touching the essence of the algorithm at all. So now it seems to be faster than FF's concat and sort.

function mergeSortedArrays(a){
  return a.reduce(function(p,c){
                    var pc = 0,
                        pl =p.length,
                        cc = 0,
                        cl = c.length,
                       res = [];
                    while (pc < pl && cc < cl) p[pc] < c[cc] ? res.push(p[pc++])
                                                             : res.push(c[cc++]);
                    if (cc < cl) while (cc < cl) res.push(c[cc++]);
                    else while (pc < pl) res.push(p[pc++]);
                    return res;
                  });
}

function concatAndSort(a){
  return a.reduce((p,c) => p.concat(c))
          .sort((a,b) => a-b);
}


var sortedArrays = Array(5000).fill().map(_ => Array(~~(Math.random()*5)+5).fill().map(_ => ~~(Math.random()*20)).sort((a,b) => a-b));
console.time("merge");
var mergeSorted = mergeSortedArrays(sortedArrays);
console.timeEnd("merge");
console.time("concat");
var concatSorted = concatAndSort(sortedArrays);
console.timeEnd("concat");

The test data is 5000 random sorted arrays, each with a random length from 5 to 9.
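A possible variation, sketched here under the assumption that the inputs are of roughly similar length: instead of folding every array into one growing result, merge the arrays pairwise in rounds, so each element takes part in about log2(m) merges rather than up to m of them. The names `mergeTwo` and `mergeBalanced` are invented for this example.

```javascript
// Standard two-way merge of two sorted arrays.
function mergeTwo(a, b, compare) {
  const out = [];
  let i = 0, j = 0;
  while (i < a.length && j < b.length) {
    // Taking from `a` on ties keeps the merge stable.
    out.push(compare(b[j], a[i]) < 0 ? b[j++] : a[i++]);
  }
  while (i < a.length) out.push(a[i++]);
  while (j < b.length) out.push(b[j++]);
  return out;
}

// Merge neighbours pairwise, round after round, until one array remains.
// Note: with a single input, that same array is returned (not a copy).
function mergeBalanced(arrays, compare = (x, y) => x - y) {
  if (arrays.length === 0) return [];
  let round = arrays;
  while (round.length > 1) {
    const next = [];
    for (let i = 0; i < round.length; i += 2) {
      next.push(i + 1 < round.length
        ? mergeTwo(round[i], round[i + 1], compare)
        : round[i]); // odd one out moves to the next round unchanged
    }
    round = next;
  }
  return round[0];
}

console.log(mergeBalanced([[1, 4], [2, 3], [0, 5]])); // [0, 1, 2, 3, 4, 5]
```

With m arrays of about n elements each, every round touches all m*n elements and there are about log2(m) rounds, which gives O(m*n*log(m)) overall.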

Redu
  • It seems you are just doing `m` merges, each time with some additional `n` elements. That will cost `O(2n + 3n + ... + m*n) = O(m^2 * n)` – Oriol Jan 15 '17 at 03:07
  • Hi redu, what will be the minimal number of operations needed to merge sorted arrays, under given conditions. if i add a counter and increment it in while loop i am not getting expected output. – Prakash Barnwal Jan 15 '17 at 05:29
  • If, as it seems to me, you are doing *n-1* merges, this seems unlikely to be optimal. –  Jan 15 '17 at 05:31
  • If you could do this in `O(m*n)`, then you could use this algorithm to sort `m` random numbers in `O(m)` just by placing each one in a different sequence with `n=1`. But sorting `m` elements costs `O(m*log(m))` – Oriol Jan 15 '17 at 05:34
  • This is way off from optimal in any way. Consider the amount of code you need to write and the readability of the code, and in the end the performance of this is about 30-40 times slower than a one-liner simple concat.sort(). Here is a jsperf to compare: https://jsperf.com/concat-sort And @Redu you downvoted a 40 times faster and 40 times shorter answer only to offer this? :) – Mirko Vukušić Jan 15 '17 at 10:48
  • @Mirko Vukušić I am not in the habit of down voting anybody regardless of their code. I have just up-voted the question and down voted nobody in this topic. – Redu Jan 15 '17 at 11:47
  • @Redu, Could i ask which sort algo is described here: https://icpc.kattis.com/problems/stacking? Is this parallel merge sort (some variant)? So, if i understand correctly, OP's task is to: 1) apply described algo from the page 2) introduce and return some counter which will return number of operations needed. Thanks. :) – sinisake Jan 15 '17 at 11:55
  • @Mirko Vukušić Your code is not 40 times faster. It's only faster at Chrome and that's because of Chrome's crazy fast `.sort()` algorithm. In FF it's another story. And... once again; in SO you shouldn't accuse people for down voting your answers because here people can down vote answers without owing any explanation to the poster and also, you can accuse somebody totally irrelevant.. just like in this case. – Redu Jan 15 '17 at 17:48
  • @sinisake I still believe my answer is valid for this question. It's sort of a merge sort but the difference is while all the stacks (arrays) are sorted they can be in different length. As far as i remember in merge sort the sub arrays are not sorted but of the same size which would eliminate the second conditional in my answer. – Redu Jan 15 '17 at 17:49
  • @Redu, I apologize for accusing you. It looked logical at the moment but obviously turned out to be wrong. As for the FF / Chrome comparison, I'd like to see a difference but only have FF 50 on Linux, and interestingly the difference there for my test case scenario was even bigger in FF, about 150 times faster. – Mirko Vukušić Jan 15 '17 at 19:09
  • @Redu, Note that I'm aware your code is amazing and I learned a lot from it, I just don't think it fits in this Q because ever since the first post and author's comment (where he posted example of input) it was obvious we deal here with only a few arrays with a few elements. Also I feel like we hijacked this Q and turned it into "fastest way to merge&sort sorted arrays" which was not original Q. – Mirko Vukušić Jan 15 '17 at 19:10
  • I'm really curious about this, especially the FF/Chrome difference, but just can't reproduce it. I tested it even with huge arrays (5000 of 5-10 like above) in the console of FF and Chrome with `console.time()` - FF 50 on Linux has an even larger difference in speed than Chrome. I created a jsperf to test: https://jsperf.com/merge-sort-sorted-arrays ... Chrome is still way faster. Tried in FF, but I get timeouts for `concatAndSort()`, but still obviously the simple func is dramatically faster – Mirko Vukušić Jan 15 '17 at 19:36
  • @Mirko Vukušić: It's really interesting. You are right it's much faster. However there is a slight problem in your code. When you do `.sort()` without the comparator callback it will fall back to the default sorting and [the default sort order is according to string Unicode code points.](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Array/sort) yielding a wrong result. So i would advise you to do like `.sort((a,b) => a-b)`. It is still very fast. To make it even faster i tried `Array.prototype.concat.apply([],a).sort((a,b) => a-b)` for a 5-10% gain. – Redu Jan 15 '17 at 20:25
  • @Mirko Vukušić And Mirko, I believe we haven't hijacked anything. What we are discussing should be the answer to the problem given in the link. Oh BTW now i have up-voted your answer. :) – Redu Jan 15 '17 at 20:29
0

This is a simple javascript algo I came up with. Hope it helps. It will take any number of sorted arrays and do a merge. I am maintaining an array for index of positions of the arrays. It basically iterates through the index positions of each array and checks which one is the minimum. Based on that it picks up the min and inserts into the merged array. Thereafter it increments the position index for that particular array. I feel the time complexity can be improved. Will post back if I come up with a better algo, possibly using a min heap.

function merge() {
   var mergedArr = [],pos = [], finished = 0;
   for(var i=0; i<arguments.length; i++) {
       pos[i] = 0;
   }
   while(finished != arguments.length) {
       var min = null, selected;
       for(var i=0; i<arguments.length; i++) {
          if(pos[i] != arguments[i].length) {
              if(min == null || min > arguments[i][pos[i]]) {
                  min = arguments[i][pos[i]];
                  selected = i;
              }
          }
      }
      mergedArr.push(arguments[selected][pos[selected]]);
      pos[selected]++;
      if(pos[selected] == arguments[selected].length) {
         finished++;
      }
   }
   return mergedArr;
}
poushy
  • If there are `k` sequences, each with `n` elements, this will cost `O(k^2 * n)`, because for each of the `k*n` elements you check it's the minimum among the `k` sequences. – Oriol Jan 15 '17 at 03:12
  • @Oriol - Yes I agree it's not the most efficient algo. It would be better to implement with a min heap as I mentioned. Thank you for calculating the Time complexity. – poushy Jan 15 '17 at 03:16
  • @poushy Are you sure this is merging the arrays at all..? – Redu Jan 15 '17 at 08:39
  • Yes, please try it out on the console. Input should be sorted arrays. Example merge([1,2,3,4],[3,4,5,6]) – poushy Jan 15 '17 at 08:45