15

I have a two-dimensional array whose rows and columns are each sorted. How can I find the kth largest element of the array?

Shamim Hafiz
Paul Nibin
  • 3
    If this is a homework exercise, please mark it with *homework* tag. – Tomasz Nurkiewicz May 09 '11 at 17:39
  • 6
    What do you mean by "rows and columns are sorted"? Does the beginning of each row start after the end of the previous row, or are they sorted independently? – AShelly May 09 '11 at 17:40
  • It is not homework. A friend asked me this question. The elements are sorted independently: if you take any element in the array, the elements above it and the elements to its left are always smaller than that element. – Paul Nibin May 09 '11 at 17:41
  • 4
    Funny I just got asked this in an interview recently. – MAK May 09 '11 at 20:22
  • 2
    possible duplicate of [search algorithm](http://stackoverflow.com/questions/5000836/search-algorithm) –  May 09 '11 at 21:47
  • @Moron: It does look like the same problem. In which case it looks like I came up with a better answer than the published solution. This surprises me. – btilly May 09 '11 at 22:39
  • @Btilly: I believe there are proven Omega(k) lower bounds for finding the kth largest. For instance: http://www.springerlink.com/content/y32362r303u8765x/. Even though the paper talks about a different problem, the lower bounds in that paper also apply to this problem I believe, as that is a special case of this. –  May 09 '11 at 22:53
  • @Moron: Is that a bound on the average case or the worst case? My worst case is potentially bad, it is only the average case that I claim is good. – btilly May 09 '11 at 23:20
  • @Btilly: I would guess (I haven't read the paper) that it is the worst case. The same is probably true about the answers in the other SO question. –  May 09 '11 at 23:23

7 Answers

5

If you have an n * n matrix then it is possible to do this in average time O(n * log(n) * log(n)).

What you do is break the matrix into a series of sorted arrays, then do a binary search through all of them at once. For instance, suppose that n = 4 and the matrix is indexed from (0,0) to (3,3), with each cell written as (column, row). We can break it into arrays that go down a column to the rising diagonal, then turn right to finish the row. This would give us the following set of sorted arrays:

  1. (0,0), (0,1), (0,2), (0,3), (1,3), (2,3), (3,3)
  2. (1,0), (1,1), (1,2), (2,2), (3,2)
  3. (2,0), (2,1), (3,1)
  4. (3,0)

This gives us n sorted lists out of the matrix.
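The decomposition can be generated mechanically. The following is a minimal sketch, not the answer's own code: the class name `DiagonalDecomposition` and the method `decompose` are made up for illustration, and it only produces the coordinates in the same notation as the list above.

```java
import java.util.ArrayList;
import java.util.List;

class DiagonalDecomposition {
    // For an n x n matrix, list i starts at (i,0), keeps the first coordinate
    // fixed until it reaches the rising diagonal at (i, n-1-i), and then keeps
    // the second coordinate fixed until the edge of the matrix.
    static List<List<String>> decompose(int n) {
        List<List<String>> lists = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            List<String> cells = new ArrayList<>();
            int turn = n - 1 - i; // second coordinate at which list i turns
            for (int j = 0; j <= turn; j++) {
                cells.add("(" + i + "," + j + ")");
            }
            for (int x = i + 1; x < n; x++) {
                cells.add("(" + x + "," + turn + ")");
            }
            lists.add(cells);
        }
        return lists;
    }

    public static void main(String[] args) {
        // Prints the four lists of the n = 4 example, one per line.
        for (List<String> cells : decompose(4)) {
            System.out.println(cells);
        }
    }
}
```

Every cell appears in exactly one list, since list i has 2(n - i) - 1 cells and these sizes sum to n * n.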

So we need to figure out where the k'th element is in a set of sorted arrays.

We will do this with a binary search for what its value should be.

Start by taking the midpoint of our longest array, which would be the element (0,3) in the example. Then for every other array figure out how many of its elements are bigger than, smaller than, or equal to this value. (You can find this by a binary search.) This lets us figure out which side of that divide the k'th element is on. If it matches the element we just chose, we have the answer. Otherwise we can throw away on average half of each array (sometimes throw away whole arrays) and repeat.

After an average O(log(n)) operations, each of which costs O(n log(n)) to perform, we'll have the answer, leading to my estimate above.
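Here is a rough sketch of that search in Java. It makes one simplifying assumption that is not in the description above: it uses the matrix rows themselves as the sorted lists (as one of the comments below suggests) instead of the diagonal decomposition. The names `SortedRowsSelect`, `bound`, `kthSmallest` and `kthLargest` are invented for the example, and the pivot is the midpoint of the longest remaining window.

```java
class SortedRowsSelect {
    // First index in [lo, hi) whose value is >= target (strict = false)
    // or > target (strict = true); returns hi if there is none.
    static int bound(int[] row, int lo, int hi, int target, boolean strict) {
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (row[mid] < target || (strict && row[mid] == target)) lo = mid + 1;
            else hi = mid;
        }
        return lo;
    }

    // k is 1-based; every row must be sorted in ascending order.
    static int kthSmallest(int[][] rows, int k) {
        int n = rows.length, total = 0;
        int[] lo = new int[n], hi = new int[n];
        for (int i = 0; i < n; i++) { hi[i] = rows[i].length; total += hi[i]; }
        if (k < 1 || k > total) throw new IllegalArgumentException("k out of range");
        while (true) {
            // Pivot: midpoint of the longest remaining window.
            int longest = 0;
            for (int i = 1; i < n; i++) {
                if (hi[i] - lo[i] > hi[longest] - lo[longest]) longest = i;
            }
            int pivot = rows[longest][(lo[longest] + hi[longest]) >>> 1];
            // Count the remaining elements smaller than / not larger than the pivot.
            int[] lt = new int[n], le = new int[n];
            int totalLt = 0, totalLe = 0;
            for (int i = 0; i < n; i++) {
                lt[i] = bound(rows[i], lo[i], hi[i], pivot, false);
                le[i] = bound(rows[i], lo[i], hi[i], pivot, true);
                totalLt += lt[i] - lo[i];
                totalLe += le[i] - lo[i];
            }
            if (k <= totalLt) {
                // The answer is below the pivot: drop everything >= pivot.
                for (int i = 0; i < n; i++) hi[i] = lt[i];
            } else if (k <= totalLe) {
                return pivot;
            } else {
                // The answer is above the pivot: drop everything <= pivot.
                k -= totalLe;
                for (int i = 0; i < n; i++) lo[i] = le[i];
            }
        }
    }

    static int kthLargest(int[][] rows, int k) {
        int total = 0;
        for (int[] row : rows) total += row.length;
        return kthSmallest(rows, total - k + 1);
    }

    public static void main(String[] args) {
        int[][] m = {{1, 5, 9}, {10, 11, 13}, {12, 13, 15}};
        System.out.println(kthLargest(m, 2)); // prints 13
    }
}
```

Each round removes at least the pivot from the candidates, so the loop terminates. How many rounds it takes depends on how well the pivot splits the remaining elements, which is why the estimate above is for the average case only.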

btilly
  • 1
    Interesting! The running time is most likely harder to get precisely right though, since the arrays have different sizes and their inputs are not independent. I'm curious if there is some sort of median-of-medians scheme that would remove the average-ness as well. – hugomg May 09 '11 at 23:15
  • @missingno: That is why I said that there may be an extra factor of `log(n)` in there. We start with `sqrt(n)` arrays, but I suspect that most of them will fall out pretty quickly. However the first pass is `O(sqrt(n) * log(n))` so I'm not going to do better than that. – btilly May 09 '11 at 23:27
  • Could you please post an example over a real matrix? I do not understand what you mean in the third line "suppose that k=9..." – Javi May 26 '14 at 11:25
  • @JaviV I had messed up my explanation pretty badly. Please see the revised one and see if it makes more sense now. – btilly Jul 03 '14 at 19:53
  • `and repeat.` (if not done) Well, what do you select as a _median candidate_ in following iterations? (Middle=median of longest remaining?) – greybeard Jan 01 '17 at 17:11
  • @greybeard Median of longest remaining would be a reasonable choice. But picking randomly will, on average, work out just fine. – btilly Jan 03 '17 at 17:13
  • Since the given matrix is a 2-d sorted matrix, all its rows (or columns) are already sorted, which are just the `a set of sorted arrays` as you said, and there is no need to do any special breaking down of the matrix as described. – nybon Dec 24 '17 at 00:49
  • @nybon The breaking down of the matrix is not strictly necessary. But it does conveniently give a very good starting breaking point. – btilly Dec 24 '17 at 01:09
3

Do an n-way merge across the smallest dimension. When you pull off the k'th item you are done.

Testing shows this running in O(k lg d) time, where d = min(rows, cols).
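A minimal sketch of such a merge, using `java.util.PriorityQueue`. The class name `NWayMergeSelect` and the helper `value` are made up for this example. Because the question asks for the kth largest element, the merge starts from the large end of each list and walks backwards.

```java
import java.util.PriorityQueue;

class NWayMergeSelect {
    // Reads position pos of list number list, where the lists are either the
    // rows (byRows = true) or the columns (byRows = false) of the matrix.
    static int value(int[][] m, boolean byRows, int list, int pos) {
        return byRows ? m[list][pos] : m[pos][list];
    }

    // k is 1-based; rows and columns of m must be sorted in ascending order.
    static int kthLargest(int[][] m, int k) {
        int rows = m.length, cols = m[0].length;
        if (k < 1 || k > (long) rows * cols) {
            throw new IllegalArgumentException("k out of range");
        }
        boolean byRows = rows <= cols;      // merge across the smaller dimension
        int lists = byRows ? rows : cols;
        int len = byRows ? cols : rows;
        // Heap entries are {value, list index, position}; largest value first.
        PriorityQueue<int[]> heap =
                new PriorityQueue<>((a, b) -> Integer.compare(b[0], a[0]));
        for (int i = 0; i < lists; i++) {
            heap.add(new int[] {value(m, byRows, i, len - 1), i, len - 1});
        }
        int[] top = null;
        for (int taken = 0; taken < k; taken++) {
            top = heap.poll();
            if (top[2] > 0) {
                // Replace the removed item with its predecessor in the same list.
                heap.add(new int[] {value(m, byRows, top[1], top[2] - 1), top[1], top[2] - 1});
            }
        }
        return top[0];
    }

    public static void main(String[] args) {
        int[][] m = {{1, 3, 5}, {2, 4, 6}};
        System.out.println(kthLargest(m, 2)); // prints 5
    }
}
```

The heap never holds more than d = min(rows, cols) entries, so each of the k removals costs O(lg d), plus the cost of building the initial heap.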

AShelly
  • Sorry, I couldn't follow you. What do you mean by smallest dimension? If I am right, in mergesort we use the divide and conquer method. We split an array into two and do mergesort on each array. Mergesort again divides the array into 2 and combines the halves in sorted order. We have to do this recursively until the array length is 1. Right? – Paul Nibin May 09 '11 at 17:56
  • Sorry, I meant do the "combine" or Merge step only. If your array has fewer rows than columns, merge the rows. If there are fewer columns, merge those. – AShelly May 09 '11 at 18:11
  • Thank you for the reply. Yes, I think this would get me the answer. And the complexity of merge sort is O(n log n). Is there a way with lower complexity? Thanks once again. – Paul Nibin May 09 '11 at 18:31
  • Since you don't actually have to do the divide part, and you can stop after finding k elements, the complexity should be more like (k * c lg c) where c is the smaller number of columns or rows. See http://stackoverflow.com/questions/5055909/algorithm-for-n-way-merge for an n-way merge algorithm. – AShelly May 09 '11 at 18:52
  • See also http://stackoverflow.com/questions/5783696/merge-n-sorted-arrays-in-ruby-lazily – AShelly May 09 '11 at 19:06
2

There is actually an O(n) divide-and-conquer algorithm that solves the selection problem in a sorted matrix (i.e. finding the kth smallest element in a sorted matrix).

The authors of "Selection in X+Y and Matrices with Sorted Rows and Columns" originally proposed such an algorithm, but the way it works is not that intuitive. A simpler algorithm, the one presented below, can be found in "Selection in a sorted matrix".

Definitions: Assuming a sorted n x m matrix M, with n <= m and no duplicates, we can define a submatrix N such that N consists of all odd-numbered columns and the last column of M. The rank of an element e in a matrix M is defined as rank(M,e) = |{M(i,j) | M(i,j) < e}|.

Main theorem: The algorithm relies on the fact that if M is a sorted matrix, 2*rank(N,e) - 2n <= rank(M,e) <= 2*rank(N,e).

Proof: Taking f(i) = min j s.t. M(i,j) >= e, we can state that

rank(M,e) = sum i=1 to n of f(i)
rank(N,e) = sum i=1 to n of ceil(f(i)/2) <= rank(M,e)/2 + n
=> 2*rank(N,e) - 2n <= rank(M,e)
rank(N,e) >= sum i=1 to n of f(i)/2
=> rank(M,e) <= 2*rank(N,e)

Conquer: In other words, if we are to find an element with rank k in M, we only have to look into the submatrix P of M that is bounded by elements a and b such that rank(N,a) = floor(k/2) and rank(N,b) = ceil(k/2) + n. How many elements are in this submatrix? By the previous inequality and the assumption that there are no duplicates, at most O(n). Therefore we just have to select the (k - rank(M,a))th element in P, and this can be done by rearranging P into an array in O(m), and then running a selection algorithm such as quickselect (linear time on average) to find the actual element. rank(M,a) can be computed in O(m) by starting from the smallest element in the matrix and iterating over the columns until an element larger than a is found, then going to the next row and stepping back through the columns until we find the first element larger than a, and so on. The conquer part thus runs in O(m).
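The rank computation is the easiest piece to show in code. Below is a small sketch under the definition rank(M,e) = |{M(i,j) | M(i,j) < e}| given earlier. The class and method names are made up, and the walk starts at the right end of the first row rather than at the smallest element, which is a mirror image of the same staircase idea.

```java
class SortedMatrixRank {
    // Number of entries of m that are strictly smaller than e, computed in
    // O(rows + cols) by a staircase walk. Rows and columns must be sorted
    // in ascending order.
    static int rank(int[][] m, int e) {
        int count = 0;
        int col = m[0].length; // entries of the current row known to be < e
        for (int row = 0; row < m.length; row++) {
            // Columns are sorted, so the boundary can only move left as we go down.
            while (col > 0 && m[row][col - 1] >= e) {
                col--;
            }
            count += col;
        }
        return count;
    }

    public static void main(String[] args) {
        int[][] m = {{1, 2, 3, 5}, {4, 6, 7, 8}, {9, 10, 11, 12}, {13, 14, 15, 16}};
        System.out.println(rank(m, 7)); // prints 6
    }
}
```

The inner loop moves `col` left at most cols times over the whole run, which is where the linear bound comes from.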

Divide: The only thing left to do is to find a and b such that rank(N,a) = floor(k/2) and rank(N,b) = ceil(k/2) + n. This can be done recursively on N (whose size is half that of M).

Runtime analysis: So all in all, we have an O(m) conquer step. Taking f(n,m) as the complexity of the algorithm for an n x m matrix, with n <= m (if not, the matrix could conceptually be rotated), and writing it as a function of m alone, we can establish the recurrence relation f(m) = c*m + f(m/2). By the master theorem, since f(1) = 1, we find f(n,m) = O(m). The whole algorithm therefore has a running time of O(m), which is O(n) in the case of a square matrix (n.b. this is also O(k), since we can confine the search to the k x k matrix containing the first k columns and rows).
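For readers who prefer to see the bound directly, the recurrence can also be unrolled by hand. This is only a worked check, assuming for simplicity that m is a power of two and f(1) = 1 as stated above:

```latex
\begin{aligned}
f(m) &= cm + f(m/2) \\
     &= cm + c\,\frac{m}{2} + f(m/4) \\
     &= c\,m \sum_{i=0}^{\log_2 m - 1} 2^{-i} + f(1) \\
     &\le 2cm + 1 = O(m)
\end{aligned}
```

The geometric series is what keeps the total linear: each level of the recursion costs half as much as the one before it.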

For the general case of a matrix with duplicates, one could tag the matrix's elements with their row and column numbers so that ties are broken consistently.

Jean Gauthier
2

Suppose I have matrix as shown below

1    2    3    4    5
6    7    8    9    10
11  12  13  14  15
16  17  18  19  20
21  22  23  24  25

When I was thinking about a solution for this problem, I saw that the first largest element will always be at (4,4). And the second largest element will be at (3,4) or (4,3) and it cannot be in (4,4). So I was thinking whether the possible positions of the kth largest element could be found in terms of the matrix size and k.

Suppose set of possible locations of kth largest element = f( size(matrix), k ).

But I could not find a simple closed form for f(). In the code posted below, the possible locations are found by testing every cell.

Then, instead of checking the elements at all locations, I only need to check the elements at the possible locations.

To count the numbers larger than an element, we can proceed as follows.

Say I want to find how many elements are larger than 14. The element to the right of 14 (15), the elements below 14 (19, 24), and all the elements between them (20, 25) are certainly greater than 14, as rows and columns are sorted. Then there are 2 sub-matrices, one above and to the right of 14 (which includes 5 and 10) and one below and to the left of 14 (which includes 16, 17, 18, 21, 22, 23), which may or may not contain elements larger than 14. So if we find and add the number of elements larger than 14 from these 3 regions, we will have the number of elements greater than 14.

For each possible position, we can find the number of larger elements in the matrix. If there are k-1 larger elements, then the element at the current position is the kth largest element.

package test;

import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class NewTest
{
    private static int matrixSize = 25;

    private static Map < Integer, List < Point > > largestEltVsPossiblePositions = new HashMap < Integer, List < Point >>();

    static
    {
        // In the initialize method, I am populating the map
        // "largestEltVsPossiblePositions" with kth largest element and its
        // possible positions. That is 1st largest element will always be in
        // (24,24) and 2nd largest element will be (23,24) and (24,23). Like
        // that I am populating the possible locations for all the nth largest
        // elements. This map we need to initialize only once.
        initialize();
    }

    private static void initialize()
    {
        for ( int i = 1; i <= matrixSize * matrixSize; i++ )
        {
            //Getting the possible locations for each number and putting in the map.
            List < Point > possiblePositions = getPossiblePositions( matrixSize, i );
            largestEltVsPossiblePositions.put( i, possiblePositions );
        }
    }

    /**
     * @param args
     */
    public static void main( String [] args )
    {
        //        int matrixSize = 5;
        //        for ( int i = 1; i <= matrixSize * matrixSize; i++ )
        //        {
        //            List < Point > possiblePositions = getPossiblePositions( matrixSize, i );
        //            System.out.println( i + " --- " + possiblePositions.size() + " - " + possiblePositions );
        //        }

        //creating a test array.
         int [][] matrix = createTestArray();

         long currentTimeMillis = System.currentTimeMillis();
         findKthLargestElement( matrix, 7 );
         System.out.println( "Total time : " + ( System.currentTimeMillis() -
             currentTimeMillis ) );

         currentTimeMillis = System.currentTimeMillis();
         findKthLargestElement( matrix, 27 );
         System.out.println( "Total time : " + ( System.currentTimeMillis() -
             currentTimeMillis ) );

         currentTimeMillis = System.currentTimeMillis();
         findKthLargestElement( matrix, 34 );
         System.out.println( "Total time : " + ( System.currentTimeMillis() -
             currentTimeMillis ) );

         currentTimeMillis = System.currentTimeMillis();
         findKthLargestElement( matrix, 624 );
         System.out.println( "Total time : " + ( System.currentTimeMillis() -
             currentTimeMillis ) );

         currentTimeMillis = System.currentTimeMillis();
         findKthLargestElement( matrix, 2 );
         System.out.println( "Total time : " + ( System.currentTimeMillis() -
             currentTimeMillis ) );

         currentTimeMillis = System.currentTimeMillis();
         findKthLargestElement( matrix, 4 );
         System.out.println( "Total time : " + ( System.currentTimeMillis() -
             currentTimeMillis ) );

         currentTimeMillis = System.currentTimeMillis();
         findKthLargestElement( matrix, 310 );
         System.out.println( "Total time : " + ( System.currentTimeMillis() -
             currentTimeMillis ) );
    }

    private static int [][] createTestArray()
    {
        int [][] matrix = new int [matrixSize] [matrixSize];

        int count = 1;
        for ( int i = 0; i < matrixSize; i++ )
        {
            for ( int j = 0; j < matrixSize; j++ )
            {
                matrix[j][i] = count;
                count++ ;
            }
        }

        return matrix;
    }

    private static void findKthLargestElement( int [][] matrix, int k )
    {
        //Get all the possible positions of this kth largest element.
        List < Point > possiblePoints = largestEltVsPossiblePositions.get( k );

        //I am sorting the points in descending order of the values in them.
        Collections.sort( possiblePoints, new PointComparator( matrix ) );

        for ( Point point : possiblePoints )
        {
            //For a point, If there are exactly k-1, larger elements in the matrix, then it is the kth largest element.
            if ( ( k - 1 ) == getNoofLargerElementsThanKFromMatrix( matrix, point ) )
            {
                System.out.println( "Largest " + k + "th element in the matrix is : " + matrix[point.x][point.y]
                        + " in the co-ordinates : " + point );
                break;
            }
        }
    }

    /*
     * This method will find the elements larger than the element at the specified point from the matrix.
     */
    private static int getNoofLargerElementsThanKFromMatrix( int [][] matrix, Point point )
    {
        int sum = 0;
        // Suppose the point is (x,y). Then all the elements (x+1,y),
        // (x+2,y).... (maxRows,y), (x,y+1), (x,y+2), ... (x,maxCols) and all
        // the numbers between them(x+1,y+1), (x+2,y+1)... (maxRows,maxCols)
        // will be surely greater than the element at the point (x,y.). We are counting those element. 
        sum = ( matrixSize - point.x ) * ( matrixSize - point.y ) - 1;
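        // The upper-right region only exists if there is a row above and a column to the right.
        if ( point.x > 0 && point.y < matrix[0].length - 1 )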
        {
            // In the above case, we were sure that all the elements in that range are greater than element at the point.
            // There is a region in the matrix where there might be elements larger than the element at the point.
            // If the point is (x,y), then the elements from (0,y+1) to
            // (x-1,maxCols), In this region there might be some elements which
            // are larger than the element we need to count those.
            sum = sum + getNumbersGreaterThanKFromUpperMatrix( matrix, point );
        }
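        // The lower-left region only exists if there is a column to the left and a row below.
        // (Without the y check, column 0 below the point would be counted twice.)
        if ( point.y > 0 && point.x < matrix.length - 1 )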
        {
            // It is same as the above case, There is another region in the
            // matrix where there might be elements larger than the element at the point.
            // If the point is (x,y), then the elements from (x+1,0) to
            // (maxRows,y-1), In this region there might be some elements which
            // are larger than the element we need to count those.
            sum = sum + getNumbersGreaterThanKFromLowerMatrix( matrix, point );
        }
        //Once we get all the elements larger than k, we can return it.
        return sum;
    }

    private static int getNumbersGreaterThanKFromUpperMatrix( int [][] matrix, Point point )
    {
        int startY = point.y;
        if ( point.y + 1 != matrix[0].length )
        {
            startY = point.y + 1;
        }
        Point matrixStart = new Point( 0, startY );
        int startX = point.x;
        if ( point.x != 0 )
        {
            startX = point.x - 1;
        }
        Point matrixEnd = new Point( startX, matrix[0].length - 1 );
        return getLargerElementsFromTheMatrix( matrix, matrixStart, matrixEnd, matrix[point.x][point.y] );
    }

    private static int getNumbersGreaterThanKFromLowerMatrix( int [][] matrix, Point point )
    {
        int startX = point.x;
        if ( point.x + 1 != matrix.length )
        {
            startX = point.x + 1;
        }
        Point matrixStart = new Point( startX, 0 );
        int startY = point.y;
        if ( point.y != 0 )
        {
            startY = point.y - 1;
        }
        Point matrixEnd = new Point( matrix.length - 1, startY );
        return getLargerElementsFromTheMatrix( matrix, matrixStart, matrixEnd, matrix[point.x][point.y] );
    }

    private static int getLargerElementsFromTheMatrix( int [][] matrix, Point matrixStart, Point matrixEnd, int elt )
    {
        //If it is a single cell matrix, just check that element in the matrix is larger than the kth element we are checking.
        if ( matrixStart.equals( matrixEnd ) )
        {
            if ( elt <= matrix[matrixStart.x][matrixStart.y] )
            {
                return 1;
            }
            else
            {
                return 0;
            }
        }
        if ( elt <= matrix[matrixStart.x][matrixStart.y] )
        {
            return ( matrixEnd.x - matrixStart.x + 1 ) * ( matrixEnd.y - matrixStart.y + 1 );
        }
        else
        {
            //The start cell is smaller than elt, so skip it and recurse on the rest of the region.
            //A single row or a single column is handled separately so that no cell is counted twice.
            if ( matrixStart.x == matrixEnd.x )
            {
                return getLargerElementsFromTheMatrix( matrix, new Point( matrixStart.x, matrixStart.y + 1 ),
                        matrixEnd, elt );
            }
            if ( matrixStart.y == matrixEnd.y )
            {
                return getLargerElementsFromTheMatrix( matrix, new Point( matrixStart.x + 1, matrixStart.y ),
                        matrixEnd, elt );
            }
            int matrixStartX = matrixStart.x + 1;
            int matrixStartY = matrixStart.y + 1;
            //s1: the block diagonally below the start cell, s2: the rest of the first column, s3: the rest of the first row.
            int s1 = getLargerElementsFromTheMatrix( matrix, new Point( matrixStartX, matrixStartY ), matrixEnd, elt );
            int s2 = getLargerElementsFromTheMatrix( matrix, new Point( matrixStartX, matrixStart.y ), new Point(
                    matrixEnd.x, matrixStart.y ), elt );
            int s3 = getLargerElementsFromTheMatrix( matrix, new Point( matrixStart.x, matrixStartY ), new Point(
                    matrixStart.x, matrixEnd.y ), elt );
            return s1 + s2 + s3;
        }
    }

    //For getting the possible positions of kth largest element.
    private static List < Point > getPossiblePositions( int matrixSize, int k )
    {
        List < Point > points = new ArrayList < Point >();
        k-- ;
        for ( int i = 0; i < matrixSize; i++ )
        {
            for ( int j = 0; j < matrixSize; j++ )
            {
                int minNoGreaterThanIJ = ( matrixSize - i ) * ( matrixSize - j ) - 1;
                int maxNoGreaterThanIJ = matrixSize * matrixSize - ( ( i + 1 ) * ( j + 1 ) );
                if ( minNoGreaterThanIJ <= k && maxNoGreaterThanIJ >= k )
                    points.add( new Point( i, j ) );
            }
        }
        return points;
    }
}

class Point
{
    final int x;
    final int y;

    Point( int x, int y )
    {
        this.x = x;
        this.y = y;
    }

    @Override
    public String toString()
    {
        return "(" + x + "," + y + ")";
    }

    @Override
    public int hashCode()
    {
        final int prime = 31;
        int result = 1;
        result = prime * result + x;
        result = prime * result + y;
        return result;
    }

    @Override
    public boolean equals( Object obj )
    {
        if ( this == obj )
            return true;
        if ( obj == null )
            return false;
        if ( getClass() != obj.getClass() )
            return false;
        Point other = ( Point ) obj;
        if ( x != other.x )
            return false;
        if ( y != other.y )
            return false;
        return true;
    }
}

class PointComparator implements Comparator < Point >
{
    private final int [][] matrix;

    public PointComparator( int [][] matrix )
    {
        this.matrix = matrix;
    }

    @Override
    public int compare( Point o1, Point o2 )
    {
        //Descending order of the values stored at the two points.
        return Integer.compare( matrix[o2.x][o2.y], matrix[o1.x][o1.y] );
    }
}

The initialization is done once, at the beginning. When the initialization is done, the possible locations will be calculated and cached. This information can be used to find the kth largest element.

But I am not sure what will be the complexity of this method.

forsvarir
Paul Nibin
1

How about this?

Assuming

  1. Rows and columns are in ascending order wlog.
  2. We have to find the kth smallest number out of m*n numbers (this is the problem statement)
  3. m*n >= k, return null/raise exception otherwise

Maintain a min heap; it never holds more than k + 1 elements.

Push A[0][0] in the heap.

for i = 1 to k
    curr_element = pop min element from heap
    Push the right and bottom neighbors of the popped element from the matrix
        (if they exist and have not been pushed earlier)

return curr_element

Time complexity = the loop runs k times, and each iteration does at most 3 heap operations costing O(log(k)) each, so O(k*log(k)) in total.
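For reference, the pseudocode above could look roughly like this in Java. It is only a sketch: the class name `FrontierHeapSelect` is invented, the heap is a min heap (`java.util.PriorityQueue` with a comparator on the cell values), and a `HashSet` of flattened cell indices is one possible way to remember which cells were already pushed.

```java
import java.util.HashSet;
import java.util.PriorityQueue;
import java.util.Set;

class FrontierHeapSelect {
    // k is 1-based; rows and columns of a must be sorted in ascending order.
    static int kthSmallest(int[][] a, int k) {
        int rows = a.length, cols = a[0].length;
        if (k < 1 || k > (long) rows * cols) {
            throw new IllegalArgumentException("k out of range");
        }
        // Heap entries are {row, col}, ordered by the value stored in that cell.
        PriorityQueue<int[]> heap = new PriorityQueue<>(
                (p, q) -> Integer.compare(a[p[0]][p[1]], a[q[0]][q[1]]));
        Set<Long> pushed = new HashSet<>(); // cells already pushed, as row * cols + col
        heap.add(new int[] {0, 0});
        pushed.add(0L);
        int current = a[0][0];
        for (int i = 0; i < k; i++) {
            int[] cell = heap.poll();
            int r = cell[0], c = cell[1];
            current = a[r][c];
            // Push the bottom and right neighbors unless they were pushed before.
            if (r + 1 < rows && pushed.add((long) (r + 1) * cols + c)) {
                heap.add(new int[] {r + 1, c});
            }
            if (c + 1 < cols && pushed.add((long) r * cols + c + 1)) {
                heap.add(new int[] {r, c + 1});
            }
        }
        return current;
    }

    public static void main(String[] args) {
        int[][] m = {{1, 2, 3, 5}, {4, 6, 7, 8}, {9, 10, 11, 12}, {13, 14, 15, 16}};
        System.out.println(kthSmallest(m, 4)); // prints 4
    }
}
```

To get the kth largest element as the question asks, the same walk can be started from the bottom-right corner with a max heap, pushing the left and top neighbors instead.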

Community
Him
-1

Approach :

  1. Duplicate every node and insert it to the right of original node.
  2. Duplicate random pointer.
  3. Duplicate left pointer.
  4. Separate both tree.

For graphical explanation refer http://mytechspaze.com/index.php/2016/09/15/clone-binary-tree/

Kalpesh Dusane
-1

Let's assume an array with r rows and c columns. Indexes start at 1.

UPDATE: sorry I forgot to mention that first you have to transform k for the below formulas to work:

k = n - (k-1). Where n is the total number of elements, that is r*c.

You can get the row index of the kth largest element: ceil(k/c)

You can get the column index of the k largest element: k%c (% is the Mod operator)

UPDATE: if k%c = 0, set the result to c.

The running time is O(1).

If you have a k=14 for an array with dimensions r=4 and c=4

k = 16 - (14 - 1)

k= 3

ARR[ceil(3/4), 3%4] = ARR[1, 3] will return the kth largest element.

Enrique
  • This won't work if you are allowed to have duplicates in the array, which is what I believe is the case here... – gusbro May 09 '11 at 18:02
  • @gusbro Well the question does not state how duplicates should be handled. If you have a duplicate and it should be handled like a single value this wont work. If the duplicates are independent then this will work. – Enrique May 09 '11 at 18:10
  • I have 3x3 array. {{1,2,3},{4,5,6},{7,8,9}}. And I want to find the 2nd largest element. The second largest element in this case is 8 and the position is a(3,2) if the indexing starts at (1,1) or a(2,1) if the index starts at (0,0). By the solution the 2nd largest element should be a(2/3,2%3) and that would be a(0,2)? It doesnt seem to be correct. And I think, duplicate should be handled. My friend didnt say anything about duplicate values. – Paul Nibin May 09 '11 at 18:13
  • @Paul Nibin in the formula k=8 not 2. I have updated my answer. – Enrique May 09 '11 at 18:21
  • Sorry... I did not see the updated answer. Check another array. {{1,2,9},{4,5,10},{7,8,11}}. I want to find the 2nd largest element. The second largest element in this case is 10 and the position is a(2,3). k=9-(2-1)=9-1=8. By the solution the 2nd largest element should be a(ceil(8/3),8%3) and that would be a(3,2)? but it should be a(2,3). – Paul Nibin May 09 '11 at 18:25
  • @Paul Nibin Yes that wont work because 10 its not at the right position. You said: If you take any element in the array, the elements above that and the elements that are in the left are always smaller to that element. So that array contradicts your previous comment. For example if you take the 8, the 10 and 9 are above but they are not smaller – Enrique May 09 '11 at 18:27
  • There is no way this can possibly work. Suppose k = 4 in a 10 x 10 matrix. Then the k'th biggest element could be in any of the 7 cells `(1, 2), (1, 3), (1, 4), (2, 1), (2, 2), (3, 1), (4, 1)`. – btilly May 09 '11 at 18:29
  • I think I was not very clear in the question. When I said above the element, i meant the elements in the column above the element. If you take 8, the element above the 8 are 5 and 2. I mean if you take any row or any column that will be in sorted order. – Paul Nibin May 09 '11 at 18:34
  • @btbilly (2,1), (3,1) and (4,1) rows are invalid. 1 cannot come after 2 or 3 or 4 in the row. The row and column is sorted. – Paul Nibin May 09 '11 at 18:36
  • @paul-nibin: They are valid. For instance the 4'th element is at `(2, 1)` in the following matrix: `[[1, 2, 3, 5], [4, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]`. Note that every row and column is sorted. Examples for the others are easy to construct. If this is *not* an example, then you need to fix the problem description. – btilly May 09 '11 at 18:53
  • anyway O(1) doesn't seem likely – BlackBear May 09 '11 at 18:57