Selection algorithms on sorted matrix

Question

This is a Google interview question:

Given an N*N matrix in which all rows are sorted and all columns are sorted, find the Kth largest element of the matrix.

Doing it in n^2 is simple, and we can sort the elements using a heap or merge sort (n lg n) and then pick the answer, but is there a better approach, better than (n lg n)?

Example of the array:

 1   5   7  12
 3   6   8  14
 4   9  10  15
11  17  19  20

Here 1<5<7<12 and 1<3<4<11, and similarly for the other rows and columns. Now say we need to find the 10th smallest element; here it is 11. Hope this adds some detail to the question.
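
For reference, the simple baseline mentioned in the question can be sketched as follows: flatten the matrix and let std::nth_element pick the element, which is linear in the number of elements on average and ignores the sorted structure entirely. The name kthSmallestFlat and the vector-of-vectors layout are assumptions made for illustration; k is 1-based and counts from the smallest, as in the example.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Baseline sketch: ignore the row/column ordering, flatten, and select.
// k is 1-based; assumes 1 <= k <= total number of elements (not checked).
int kthSmallestFlat(const std::vector<std::vector<int>>& m, std::size_t k) {
    std::vector<int> flat;
    for (const auto& row : m) {
        flat.insert(flat.end(), row.begin(), row.end());
    }
    // After nth_element, the k-th smallest value sits at index k - 1.
    const auto kth = flat.begin() + static_cast<std::ptrdiff_t>(k - 1);
    std::nth_element(flat.begin(), kth, flat.end());
    return *kth;
}
```

With the example matrix, kthSmallestFlat(m, 10) returns 11. Any approach that exploits the sorted rows and columns has to beat this to be worth the extra complexity.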

possible duplicate of [Find the top k sums of two sorted arrays](http://stackoverflow.com/questions/5000512/find-the-top-k-sums-of-two-sorted-arrays) — templatetypedef, Feb 15 '11 at 07:21
The question # 5000512 was changed, so this is no longer a dupe. — , Feb 15 '11 at 07:31
This problem is definitely not the same as two sorted arrays. — Ted Hopp, Feb 15 '11 at 07:37
O(n lg n)? what do you mean by this? number of elements is n^2 how do u want sort them in n log n? — Saeed Amiri, Feb 15 '11 at 08:41
You can find the kth smallest in linear time *regardless* of the order in the matrix, using the classical selection algorithm. — Darius Bacon, Feb 15 '11 at 08:50
@Darius Bacon, Kth largest element in array of size n is linear with n but Kth largest element in the array of length n^2 is linear with n^2 as OP said O(n^2) algorithm is simple. — Saeed Amiri, Feb 15 '11 at 09:20
@Nohsib - you can sort this array with N lg N comparisons, where N=4? that's 4 * 2 = 8 comparisons to sort 16 numbers. I would like to see how that works. — Peter Recore, Feb 15 '11 at 17:15
@Saeed and @Peter, the poster uses capital N for the matrix dimensions and small n when talking about the size of the input; I take them to be different since n log n is given for the sorting complexity. — Darius Bacon, Feb 16 '11 at 20:21
What do you mean by "All rows are sorted, and all columns are sorted"? for example, [1,4;2,3],how is that sorted according to your precondition? — Zhongjie Wu, Feb 18 '11 at 04:41
I found an O(n) algorithm here: [O(n) solution](http://learn.hackerearth.com/forum/161/kth-largest-element-in-a-2d-array-sorted-along-both-rows-and-columns/) But it only suits integers, not floats. — Kelvin, Dec 24 '12 at 06:58

score 3 · Accepted Answer · answered Feb 15 '11 at 13:51

Yes, there is an O(K) algorithm due to Frederickson and Johnson.

Greg N. Frederickson and Donald B. Johnson. Generalized Selection and Ranking: Sorted Matrices. SIAM J. Comput. 13, pp. 14-30. http://epubs.siam.org/sicomp/resource/1/smjcat/v13/i1/p14_s1?isAuthorized=no

a dabbler


This has the correct and optimal solution. I wanted to expand it into a separate answer to help people stuck in the paywall but... the algorithm is pretty messy. – hugomg Feb 15 '11 at 15:02
It's asymptotically optimal but not especially practical I expect. It's certainly messy enough not to fit in the confines of an interview. – a dabbler Feb 15 '11 at 15:04
@missingno: Not so optimal. I have a more efficient average running time solution to this problem at http://stackoverflow.com/questions/5940420/find-kth-largest-element-from-a-2-d-sorted-array/5941009#5941009 and it isn't that messy of an algorithm. – btilly May 09 '11 at 22:39
unable to access the pdf ?? – prashantitis Jul 22 '13 at 09:43
-1 This answer is totally useless, the article is no longer available. The answer-er has not been online since answering this question. – user568109 Nov 13 '13 at 15:44

DaeMoohn · Answer 2 · 2011-02-15T10:16:58.750

With the matrix given in the example: if you want to search for the 7th biggest element, you know it lies among the elements M[4][1..4] and M[1..4][4]. That gives you two already-sorted arrays, 12,14,15,20 and 11,17,19, which can be merged. Then you apply a binary search, which is O(log N).

To generalize: for the k-th biggest element in this matrix, you have to select the proper layer: [2N-1] + [2(N-1)-1] + ... >= k. So the rule for selecting the layer to look in is Sum[2(N-i)-1] >= k, for i = 0..N-1, where i is the layer's number. Once you have found the layer number i, you have 2(N-i)-1 elements in that layer, which have to be merged and then searched. The complexity of searching that layer is O(log[2(N-i)-1]) = O(log(N-i))...

The arithmetic progression leads to

0 >= i^2 - 2*N*i + k

i_{1,2} = N ± sqrt(N^2 - k), where k is the element we search...
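
The layer arithmetic above can be sketched in integer form, which avoids rounding issues with the square root. This only illustrates the sum and the quadratic; it does not check the answer's premise that the k-th biggest element really sits in that layer. The name layersNeeded is hypothetical.

```cpp
// Smallest number of outer layers i (1 <= i <= N) whose combined size
// (2N-1) + (2(N-1)-1) + ... = i * (2N - i) reaches k.
// Equivalent to the smaller root N - sqrt(N^2 - k), rounded up.
// Assumes N >= 1 and 1 <= k <= N*N (not checked).
long long layersNeeded(long long N, long long k) {
    long long i = 1;
    while (i * (2 * N - i) < k) {
        ++i;
    }
    return i;
}
```

For N = 4 and k = 7 this gives 1, matching the example where the outermost layer holds exactly 7 elements; the layer to search is then layer i - 1 when counting from 0.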

score 0 · Answer 3 · answered Sep 19 '12 at 05:58

My code below is an O(k) algorithm. It does not work on a certain edge case (probably one in each direction: x and y). I listed the edge case so someone can fix it. I'm not going to fix it because it's bed time for me.

Summary of algorithm: you only need to keep track of two candidate #s that might be the smallest, one while proceeding in the x-direction and one while proceeding in the y-direction. Think about it and it might make sense to you.

enum Direction {
  X,
  Y
};

struct Index
{
  Index(int unsigned x, int unsigned y)
    : x(x),
      y(y)
  {}

  void operator = (Index const & rhs)
  {
    x = rhs.x;
    y = rhs.y;
  }

  int unsigned x;
  int unsigned y;
};

int unsigned solve(int unsigned i_k, int unsigned ** i_data, int unsigned i_n)
{
  if (1 == i_k) {
    return i_data[0][0];
  }

  Direction dir = X;
  Index smaller(0,0);
  Index larger(0,0);

  if (i_data[1][0] < i_data[0][1]) {
    dir = X;
    smaller = Index(1,0);
    larger = Index(0,1);
  }
  else {
    dir = Y;
    smaller = Index(0,1);
    larger = Index(1,0);
  }

  for (int unsigned i = 0; i < (i_k - 2); ++i) {
    int unsigned const x = smaller.x;
    int unsigned const y = smaller.y;
    if (X == dir) {
      if ((x + 1) == i_n) {
        // End of row
        smaller = larger;
        larger.x += 1;
        dir = Y;
      }
      else if (i_data[x + 1][y] < i_data[larger.x][larger.y]) {
        smaller.x += 1;
      }
      else {
        smaller = larger;
        larger = Index(x + 1, y);
        dir = Y;
      }
    }
    else {
      if ((y + 1) == i_n) {
        // End of col
        smaller = larger;
        larger.y += 1;
        dir = X;
      }
      else if (i_data[x][y + 1] < i_data[larger.x][larger.y]) {
        smaller.y += 1;
      }
      else {
        smaller = larger;
        larger = Index(x, y + 1);
        dir = X;
      }
    }
  }
  return i_data[smaller.x][smaller.y];
}

The code doesn't work on the following edge case (where we hit the end of a row). I'm going to bed, so feel free to fix this case:

  size = 4;
  data = createMatrix(size);
  data[0][0] = 1; data[1][0] = 6; data[2][0] = 10; data[3][0] = 11;
  data[0][1] = 3; data[1][1] = 7; data[2][1] = 12; data[3][1] = 14;
  data[0][2] = 4; data[1][2] = 8; data[2][2] = 13; data[3][2] = 15;
  data[0][3] = 5; data[1][3] = 9; data[2][3] = 19; data[3][3] = 20;
  answer = solve(14, data, size);
  assertAnswer(answer, 15, ++testNum);
  deleteMatrix(data, size);

score 0 · Answer 4 · answered Jan 25 '13 at 06:42

The following is my C++ solution, which is based on a min heap. When a cell in the matrix is at the top of the min heap, the number to its right and/or the number below it is inserted into the heap.

#include <vector>
#include <algorithm>
#include <functional>

using namespace std;

struct Entry {
    int value;
    int x;
    int y;

    // Inverted comparison so that the std heap functions build a min heap.
    bool operator < (const Entry& other) const {
        return this->value > other.value;
    }
};

bool getKthNumber(int* matrix, int row, int col, int k, int* result){
    if(matrix == NULL || row <= 0 || col <= 0 || result == NULL)
        return false;
    if(k <= 0 || k > row * col)
        return false;

    vector<Entry> minHeap;
    Entry first = {matrix[0], 0, 0};
    minHeap.push_back(first);
    make_heap(minHeap.begin(), minHeap.end());

    for(int i = 0; i < k; ++i){
        first = minHeap[0];
        int x = first.x;
        int y = first.y;
        if(first.y == 0 && first.x < row - 1){
            Entry next = {matrix[(x + 1) * col], x + 1, y};
            minHeap.push_back(next);
            push_heap(minHeap.begin(), minHeap.end());
        }
        if(first.y < col - 1){
            Entry next = {matrix[x * col + y + 1], x, y + 1};
            minHeap.push_back(next);
            push_heap(minHeap.begin(), minHeap.end());
        }

        pop_heap(minHeap.begin(), minHeap.end());
        minHeap.pop_back();
    }

    *result = first.value;
    return true;
}

Nathan · Answer 5 · 2011-02-16T22:30:36.320

You do a breadth-first search starting at (0,0). The two children of (0,0), (0,1) and (1,0), are added to the potential candidates list for the 2nd element. Loop, picking the smallest element in the potential candidates list to be the next element, and add its children to the potential candidates list. Stop when you find the kth element.

Make the potential candidates list a min heap. The heap will never be bigger than n+m.

Also you could do the reverse from the last element (n,m) if k is greater than n*m/2.

Worst Case: this would be n*m/2 lg(n + m), instead of n*m lg(n * m) of sorting.
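
A minimal sketch of the search described above, assuming a vector-of-vectors matrix and a 1-based k counting from the smallest. The candidates list is a std::priority_queue used as a min heap, and a seen table (an addition not mentioned in the answer) keeps a cell from being queued twice when it is reachable from both its left and its upper neighbour. The name kthSmallestHeap is hypothetical, and the reverse search from (n,m) is left out.

```cpp
#include <cstddef>
#include <functional>
#include <queue>
#include <tuple>
#include <vector>

// Assumes a non-empty rectangular matrix with ascending rows and columns,
// and 1 <= k <= rows * cols (not checked).
int kthSmallestHeap(const std::vector<std::vector<int>>& m, std::size_t k) {
    using Cell = std::tuple<int, std::size_t, std::size_t>;  // value, row, col
    const std::size_t rows = m.size();
    const std::size_t cols = m[0].size();

    // std::greater turns the priority queue into a min heap on the value.
    std::priority_queue<Cell, std::vector<Cell>, std::greater<Cell>> candidates;
    std::vector<std::vector<bool>> seen(rows, std::vector<bool>(cols, false));

    candidates.push(Cell(m[0][0], 0, 0));
    seen[0][0] = true;

    for (std::size_t popped = 1;; ++popped) {
        const Cell top = candidates.top();
        candidates.pop();
        const std::size_t r = std::get<1>(top);
        const std::size_t c = std::get<2>(top);
        if (popped == k) {
            return std::get<0>(top);  // the k-th value taken off the heap
        }
        // Queue the two children: the cell below and the cell to the right.
        if (r + 1 < rows && !seen[r + 1][c]) {
            seen[r + 1][c] = true;
            candidates.push(Cell(m[r + 1][c], r + 1, c));
        }
        if (c + 1 < cols && !seen[r][c + 1]) {
            seen[r][c + 1] = true;
            candidates.push(Cell(m[r][c + 1], r, c + 1));
        }
    }
}
```

Each of the k rounds does one pop and at most two pushes, so the cost is O(k lg(heap size)). The seen table costs O(n*m) memory here; a hash set of visited cells would trade that for a smaller footprint when k is small.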

score 0 · Answer 6 · answered Mar 18 '12 at 14:37

You can find the k^th smallest element in O(n log n) expected time, if you notice that:

1. Generating a random number that lies between Array[i][j] and Array[k][l] such that Array[i][j] < Array[k][l] takes O(n) time (expected), and

2. Using [1] as a subroutine, you can use a procedure similar to RANDOMIZED-SELECT to generate the k^th smallest number in the whole array.
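
The answer does not spell out its subroutine, so the sketch below is not that procedure. It shows the O(n) building block such a method relies on, counting how many elements are at most a given value by walking a staircase through the matrix, and then swaps in a plain binary search over the value range instead of random pivots. That substitution only works for integers and costs O(n lg(max - min)). The names countLessEqual and kthSmallestByValue are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Counts elements <= x in O(rows + cols), starting at the bottom-left corner.
// Assumes a non-empty rectangular matrix with ascending rows and columns.
std::size_t countLessEqual(const std::vector<std::vector<int>>& m, long long x) {
    const std::size_t cols = m[0].size();
    std::size_t r = m.size();  // one past the current row
    std::size_t c = 0;
    std::size_t count = 0;
    while (r > 0 && c < cols) {
        if (m[r - 1][c] <= x) {
            count += r;  // m[0..r-1][c] are all <= x because the column is sorted
            ++c;
        } else {
            --r;         // m[r-1][c] is too big, and so is everything to its right
        }
    }
    return count;
}

// k-th smallest (1-based) by binary search on the value, integers only.
// Assumes 1 <= k <= rows * cols (not checked).
int kthSmallestByValue(const std::vector<std::vector<int>>& m, std::size_t k) {
    long long lo = m.front().front();  // smallest element
    long long hi = m.back().back();    // largest element
    while (lo < hi) {
        const long long mid = lo + (hi - lo) / 2;
        if (countLessEqual(m, mid) >= k) {
            hi = mid;       // at least k elements are <= mid
        } else {
            lo = mid + 1;   // fewer than k elements are <= mid
        }
    }
    return static_cast<int>(lo);
}
```

On the example matrix from the question, countLessEqual(m, 11) is 10 and kthSmallestByValue(m, 10) is 11.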

score -1 · Answer 7 · answered Feb 15 '11 at 08:30

Rotate the matrix clockwise by 45 degrees. You will get a diamond-shaped data set. The height will be 2N-1, and the number of elements in each row from the top will be 1,2,3,4,5,4,3,2,1 for N=5.

You will find out that each number in a row is always larger than any number above.

For the k-th row (counting from 1), you will have k elements for k < N and 2N-k elements for k >= N, where k belongs to {1..2N-1}.

By computing accumulative number of elements from row 1 to k-1 and 1 to k, you will find the row where your target locates(sum(1 to k-1)

Now you have located a row of elements, with N elements in the worst case. You can sort them and then find the correct one. This takes O(N ln N).

Since N = sqrt(n), the overall cost of this algorithm is O(sqrt(n) ln(sqrt(n))).

In the example, the third row of the diamond has a number (4) smaller than a number above it (5). — Darius Bacon, Feb 15 '11 at 08:49

score -2 · Answer 8 · answered Feb 15 '11 at 14:44

Based on N, you can find the diagonal where the element is located. For example in the matrix,

 1   5   7  12
 3   6   8  14
 4   9  10  15
11  17  19  20

You can deduce the diagonal by determining the total # of elements in the previous diagonals,

/diagonal#/elements/# of elements/cumulative # of elements/
/d1/ 1         / 1 / 1 /
/d2/ 3 5       / 2 / 1+2 = 3 /
/d3/ 4 6 7     / 3 / 1+2+3 = 6 /
/d4/ 11 9 8 12 / 4 / 1+2+3+4 = 10 /
/d5/ 17 10 14  / 3 /
/d6/ 19 15     / 2 /
/d7/ 20        / 1 /

The reason we need to find the diagonal is that the diagonals above will always have elements smaller than any of the current diagonal's elements, and the diagonals below will always have elements greater than any of the current diagonal's elements.

So you can be sure that diagonal d4 has the required element (since it contains the 7th largest to the 10th largest). Since there were 6 elements up to the previous diagonal, you just need to find the 4th largest element in diagonal d4.

Doesn't work. Look how 4 in the 3rd diagonal is less than the 5 in the 2nd diagonal. — hugomg, Feb 15 '11 at 15:40
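
The objection in the comment can be checked mechanically. The sketch below tests whether every element on one anti-diagonal is smaller than every element on the next one; the name diagonalsAreOrdered is hypothetical and a square matrix is assumed. It returns false for the example matrix, because 5 on d2 is larger than 4 on d3.

```cpp
#include <algorithm>
#include <cstddef>
#include <limits>
#include <vector>

// True if, for every anti-diagonal d (cells with row + col == d), the largest
// element on d is smaller than the smallest element on d + 1.
// Assumes a non-empty square matrix.
bool diagonalsAreOrdered(const std::vector<std::vector<int>>& m) {
    const std::size_t n = m.size();
    for (std::size_t d = 0; d + 2 < 2 * n; ++d) {
        int maxHere = std::numeric_limits<int>::min();
        int minNext = std::numeric_limits<int>::max();
        for (std::size_t r = 0; r < n; ++r) {
            if (d >= r && d - r < n) {
                maxHere = std::max(maxHere, m[r][d - r]);
            }
            if (d + 1 >= r && d + 1 - r < n) {
                minNext = std::min(minNext, m[r][d + 1 - r]);
            }
        }
        if (maxHere >= minNext) {
            return false;  // this diagonal overlaps the next one in value
        }
    }
    return true;
}
```

Sorted rows and columns only order an element against the cells below it and to its right, so nothing forces two cells on neighbouring diagonals to compare in a fixed way unless one of them is directly below or to the right of the other.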
