
I'm looking for the fastest way to merge multiple pre-sorted arrays into one sorted array without duplicates.

For example:

const arrays = [
    [15, 30, 35, 40, 45, 50],
    [33, 36, 39, 42, 45, 48],
    [37, 38, 39, 40, 41, 42]
];

Should output:

[15, 30, 33, 35, 36, 37, 38, 39, 40, 41, 42, 45, 48, 50]

In reality these arrays are much larger so I'm looking for a fast way to do this.

This question is about performance. I'm aware it can be done with concat and sort in O(n log(n)), but I'm looking for something O(n).
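For reference, the concat-and-sort baseline could look like the sketch below. It assumes the arrays contain only numbers; `mergeBySorting` is a hypothetical name used for illustration. It is mainly useful as a correctness check against a faster approach.

```javascript
// Baseline: flatten, de-duplicate with a Set, then sort numerically.
// O(n log n) in the total number of elements n.
const mergeBySorting = (arrays) => {
  const unique = new Set(arrays.flat());
  // A numeric comparator is required; the default sort compares as strings.
  return [...unique].sort((a, b) => a - b);
};

const example = [
  [15, 30, 35, 40, 45, 50],
  [33, 36, 39, 42, 45, 48],
  [37, 38, 39, 40, 41, 42]
];
console.log(mergeBySorting(example));
```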

parliament
    https://stackoverflow.com/questions/1584370/how-to-merge-two-arrays-in-javascript-and-de-duplicate-items Or do it server side using PHP for example. Pretty easy that way. – rf1234 Mar 10 '21 at 19:57

1 Answer


Untested. I don't think my code is optimal, but I currently don't see a better way to approach this.

Further small improvements might be:

  • using ordinary for loops instead of Array.prototype.reduce
  • removing arrays that reach a "done" status (instead of just flagging them as done)

console.time("Creation of arrays");
const arrays = Array.from({ length: 1000 }, (_, row) => {
  return Array.from({ length: 100000 }, (_, i) => row + i);
});
console.timeEnd("Creation of arrays");

const createMergedArray = (arrays) => {
  console.time("Creation of toProcess");
  const toProcess = arrays.map((array, row) => {
    // Empty arrays start as done; otherwise the loop below never terminates.
    return { done: array.length === 0, row, column: 0, array };
  });
  console.timeEnd("Creation of toProcess");

  const merged = [];

  while (toProcess.some(({ done }) => !done)) {
    const scan = toProcess.reduce(
      (acc, item) => {
        if (!item.done) {
          if (item.array[item.column] < acc.minimum) {
            acc.minimum = item.array[item.column];
            acc.rows = [item.row];
          } else if (item.array[item.column] === acc.minimum) {
            acc.rows.push(item.row);
          }
        }
        return acc;
      },
      { rows: [], minimum: Infinity }
    );

    merged.push(scan.minimum);

    scan.rows.forEach((row) => {
      const item = toProcess[row];
      while (item.array[item.column] === scan.minimum) {
        ++item.column;
      }
      item.done = item.array.length === item.column;
    });
  }

  return merged;
};

console.time("Merging");
const merged = createMergedArray(arrays);
console.timeEnd("Merging");

console.assert(
  merged.every((n, i, arr) => n > arr[i - 1] || 0 === i),
  "Unsorted or duplicates found"
);

You didn't mention the size of your arrays, so I used 1k rows with 100k columns.

If you pass your arrays (number[][]) to createMergedArray, it should give you:

  • an array of unique numbers sorted in ascending order
  • an idea of whether this approach is viable for the size of your data
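The two small improvements mentioned earlier (ordinary for loops instead of reduce, and removing finished arrays instead of flagging them) might be sketched roughly as follows. This is an illustration only, not benchmarked; `createMergedArrayWithLoops` and `cursors` are hypothetical names, and the code assumes every input array is sorted ascending and contains only finite numbers.

```javascript
// Hypothetical variant: plain loops, and exhausted arrays are dropped
// from the working list instead of being flagged as done.
const createMergedArrayWithLoops = (arrays) => {
  // One cursor per non-empty input array.
  const cursors = [];
  for (const array of arrays) {
    if (array.length > 0) cursors.push({ array, column: 0 });
  }

  const merged = [];

  while (cursors.length > 0) {
    // Find the smallest value currently under any cursor.
    let minimum = Infinity;
    for (let i = 0; i < cursors.length; ++i) {
      const value = cursors[i].array[cursors[i].column];
      if (value < minimum) minimum = value;
    }
    merged.push(minimum);

    // Advance every cursor past that value; iterate backwards so that
    // removing a cursor does not disturb the ones still to be visited.
    for (let i = cursors.length - 1; i >= 0; --i) {
      const cursor = cursors[i];
      while (
        cursor.column < cursor.array.length &&
        cursor.array[cursor.column] === minimum
      ) {
        ++cursor.column;
      }
      if (cursor.column === cursor.array.length) {
        // Swap-remove: the order of the cursors does not matter.
        cursors[i] = cursors[cursors.length - 1];
        cursors.pop();
      }
    }
  }

  return merged;
};
```

Each output value still costs one pass over the remaining arrays, so the overall work stays proportional to the number of unique values times the number of arrays; the changes only reduce constant overhead.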
chillin