6

This was one of my interview questions.

We have a matrix of integers (no range is given), populated at random. We need to devise an algorithm that finds the rows that exactly match one or more columns, and return the row number and column number of each match. The matching elements must appear in the same order. For example, if the i'th row matches the j'th column and the i'th row contains the elements [1,4,5,6,3], then the j'th column also contains the elements [1,4,5,6,3]. The matrix is n x n. My solution:

RCEQUAL(A,i1..i2,j1..j2)// A is n*n matrix
if(i2-i1==2 && j2-j1==2 && b[n*i1+1..n*i2] has [j1..j2])
   use brute force to check if the rows and columns are same.
if (any rows and columns are same)
   store the row and column numbers in b[1..n^2].//b[1],b[n+2],b[2n+3].. store row no,
                                                 // b[2..n+1] stores columns that 
                                                 //match with row 1, b[n+3..2n+2] 
                                                 //those that match with row 2,etc..

else
   RCEQUAL(A,1..n/2,1..n/2);
   RCEQUAL(A,n/2..n,1..n/2);
   RCEQUAL(A,1..n/2,n/2..n);
   RCEQUAL(A,n/2..n,n/2..n);

Takes O(n^2). Is this correct? If correct, is there a faster algorithm?
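
For reference, the problem statement above can be pinned down with a brute-force baseline. This is only a sketch, not the divide-and-conquer pseudocode from the question: the function name `matching_pairs_brute_force` is made up for illustration, indices are 0-based, and the matrix is assumed to be a square list of lists. It does up to n comparisons for each of the n*n (row, column) pairs.

```python
def matching_pairs_brute_force(a):
    """Return every (row, column) pair whose elements agree in order.

    Assumes `a` is a square matrix given as a list of lists; indices are 0-based.
    """
    n = len(a)
    pairs = []
    for i in range(n):
        for j in range(n):
            # Row i is a[i][0], ..., a[i][n-1]; column j is a[0][j], ..., a[n-1][j].
            # all() stops at the first mismatch.
            if all(a[i][k] == a[k][j] for k in range(n)):
                pairs.append((i, j))
    return pairs
```

Any faster approach should return the same set of pairs as this baseline.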

Brahadeesh
  • Question - if one of the rows is [1,2,3] and one of the columns is [2,3,1] is that considered a match? – corsiKa Mar 23 '11 at 17:13
  • No, they have to be in the same order. – Brahadeesh Mar 23 '11 at 17:17
  • Isn't it n cubed? If you're comparing each of n rows against each of n columns, that's n*n comparisons, and each comparison is potentially length n (worst case). Average with random data would probably be n squared. – phkahler Mar 23 '11 at 17:33
  • It's T(n)=4T(n/2)+O(1), which is O(n^log4)=O(n^2) (log base 2). – Brahadeesh Mar 24 '11 at 03:33
  • Sorry, it's been a while since this question was asked, but I must ask, isn't this n^2? You'll potentially be doing n/2 comparisons, so your recurrence cannot be + O(1). What's the insight/trick here that is essentially preventing you from comparing each row to every column? – Kakira Jun 27 '15 at 01:59

5 Answers

5

you could build a trie from the data in the rows. then you can compare the columns with the trie.

this would let you exit as soon as the beginning of a column does not match any row. it would also let you check a column against all rows in one pass.

of course the trie is most interesting when n is big (setting up a trie for a small n is not worth it) and when many rows and columns are nearly identical. but even in the worst case, where all integers in the matrix are different, the structure allows for a clear algorithm...
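
A minimal sketch of the trie idea, assuming a square list-of-lists matrix and 0-based indices. The name `matching_pairs_trie` and the nested-dict representation are illustrative choices, not something given in the answer. Each trie node is a dict keyed by the next integer, and the string key `"rows"` (which cannot collide with an integer key) holds the indices of the rows that end at that node.

```python
def matching_pairs_trie(a):
    """Find (row, column) matches by walking each column through a trie of rows."""
    n = len(a)
    root = {}
    # Build the trie: one path per row, sharing common prefixes.
    for i, row in enumerate(a):
        node = root
        for value in row:
            node = node.setdefault(value, {})
        node.setdefault("rows", []).append(i)

    pairs = []
    for j in range(n):
        node = root
        for k in range(n):
            node = node.get(a[k][j])
            if node is None:
                break  # no row starts with this column prefix, stop early
        else:
            # The whole column was consumed, so it equals every row stored here.
            pairs.extend((i, j) for i in node["rows"])
    return pairs
```

Building the trie still reads all n*n entries, as the comments below point out; the saving is that each column is then checked against all rows in a single walk.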

Adrien Plisson
  • He said the matrix is randomly populated. A trie takes advantage of common prefixes. Besides, you'll still have O(n*n) to scan all the data to create the trie. – phkahler Mar 23 '11 at 17:25
  • @phkahler: you are right. however, you need not build the trie up front. you can build the trie while comparing columns and rows. the trie then becomes a kind of memoizing structure and does not require O(n*n) to scan all the rows. – Adrien Plisson Mar 23 '11 at 17:30
  • thinking of it, every algorithm i come up with has a worst-case time of almost O(n*n). the trie will filter out most cases if the integers are truly random, but it hits its worst case if all elements are equal except the last row and last column... – Adrien Plisson Mar 23 '11 at 17:33
  • @Adrien Plisson what about mine? – orlp Mar 23 '11 at 17:34
  • @nightcracker: i have to admit, i have a hard time grasping your algorithm... i don't see how building the diagonals would not be O(n*n), same as building the trie... (the trie solution seems to be O(n*n+n*n) in the worst case, one pass to build the trie from the rows (n*n), another to test the columns) – Adrien Plisson Mar 23 '11 at 20:14
  • @Adrien Plisson: You are right. I think I need to rethink this. Sorry. – orlp Mar 23 '11 at 20:38
1

You could speed up the average case by calculating the sum of each row and column and restricting your brute-force comparison (which you have to do eventually) to the rows whose sum matches the sum of a column.

This doesn't improve the worst case (all rows and columns having the same sum) but if your input is truly random that "won't happen" :-)

corsiKa
  • Yes. I thought of that. But I did not incorporate it as it does not decrease the bound on the running time. Is there an algorithm that can do this in say O(n logn) or O(n) ? – Brahadeesh Mar 23 '11 at 17:01
  • It would seem that because every row would have to be compared to every column that you wouldn't be able to get around the (row*col --> n*n) running time. That being said, they said the same thing about sorting until lg-n sorts came out too... – corsiKa Mar 23 '11 at 17:05
  • Yeah. So that's what I was looking for. Some sort of 2D adaptation of quicksort to this problem. – Brahadeesh Mar 23 '11 at 17:20
  • @Jasie: O(nlogn) is the upper bound in the worst case for heap sort and some others. – phkahler Mar 23 '11 at 17:30
0

This might only work on non-singular matrices (not sure), but...

Let A be a square (and possibly non-singular) NxN matrix. Let A' be the transpose of A. If we create a matrix B that is the horizontal concatenation of A and A' (in other words [A A']) and put it into reduced row echelon form (RREF), we will get a diagonal of all ones in the left half and some square matrix in the right half.

Example:

A = 1 2
    3 4

A'= 1 3
    2 4

B = 1 2 1 3
    3 4 2 4

rref(B) = 1  0 0   -2
          0  1 0.5 2.5

On the other hand, if a column of A were equal to a row of A, then that column of A would be equal to a column of A'. Then we would get a single 1 (with zeros elsewhere) in one of the columns of the right half of rref(B).

Example

A=
 1     2     3     4     5
 2     6    -3     4     6
 3     8    -7     6     9
 4     1     7    -5     3
 5     2     4    -1    -1

A'=
 1     2     3     4     5
 2     6     8     1     2
 3    -3    -7     7     4
 4     4     6    -5    -1
 5     6     9     3    -1

B = 
 1     2     3     4     5     1     2     3     4     5
 2     6    -3     4     6     2     6     8     1     2
 3     8    -7     6     9     3    -3    -7     7     4
 4     1     7    -5     3     4     4     6    -5    -1
 5     2     4    -1    -1     5     6     9     3    -1

rref(B)=
 1     0     0     0     0    1.000  -3.689  -5.921   3.080   0.495
 0     1     0     0     0        0   6.054   9.394  -3.097  -1.024
 0     0     1     0     0        0   2.378   3.842  -0.961   0.009
 0     0     0     1     0        0  -0.565  -0.842   1.823   0.802
 0     0     0     0     1        0  -2.258  -3.605   0.540   0.662

1.000 in the top row of the right half means that the first column of A matches one of its rows. The fact that the 1.000 is in the left-most column of the right half means that it is the first row.
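
A short derivation of why this reading of rref(B) works, under the assumption (flagged at the top of this answer) that A is non-singular. The symbols e_i and e_j, the standard unit vectors, are introduced here and are not in the original. Row-reducing [A A'] until the left half is the identity is the same as multiplying the whole block by the inverse of A on the left:

```latex
\[
\operatorname{rref}\bigl([\,A \mid A^{T}\,]\bigr) = [\,I \mid A^{-1}A^{T}\,]
\]
% A^T e_i is column i of A^T, i.e. row i of A; A e_j is column j of A.
\[
A^{T}e_i = A\,e_j
\quad\Longleftrightarrow\quad
A^{-1}A^{T}e_i = e_j
\]
```

So column i of the right half is the unit vector e_j (a single 1 in position j, zeros elsewhere) exactly when row i of A equals column j of A. In the 5x5 example i = j = 1. With floating-point row reduction the entries would have to be compared against 0 and 1 with some tolerance.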

Phonon
0

I have not looked at your algorithm or at the approaches in the previous answers, but since the matrix has n^2 elements to begin with, I do not think there is a method that does better than O(n^2) :)

dcn
  • This argument is not correct. There are square-matrix problems which have n^2 elements but can be solved in linear time. All I'm saying is, you can have better algorithm than the size of input. – Srikanth Mar 23 '11 at 22:13
  • Of course there are problems where you may not have to consider the entire input (e.g. "Determine the first number in a sequence of n elements"), and I did not state an "argument" for a lower bound of O(n^2) for the given problem. All I am saying is that, for this specific problem, it seemed obvious that you may have to consider all elements in the matrix! – dcn Mar 24 '11 at 09:05
0

IFF the matrix is truly random...

You could create a list of pointers to the columns sorted by the first element. Then create a similar list of the rows sorted by their first element. This takes O(n*logn).

Next create an index into each sorted list initialized to 0. If the first elements match, you must compare the whole row. If they do not match, increment the index of the one with the lowest starting element (either move to the next row or to the next column). Since each index cycles from 0 to n-1 only once, you have at most 2*n comparisons unless all the rows and columns start with the same number, but we said a matrix of random numbers.

The time for a row/column comparison is n in the worst case, but is expected to be O(1) on average with random data.

So 2 sorts of O(nlogn), and a scan of 2*n*1 gives you an expected run time of O(nlogn). This is of course assuming random data. Worst case is still going to be O(n^3) for a large matrix with most elements the same value.
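
The sort-and-scan described above might look roughly like the sketch below. It assumes a square list-of-lists matrix and 0-based indices, and `matching_pairs_sorted_scan` is a made-up name. One detail is added beyond the description: when several rows and columns share the same first element, every row in that run is compared with every column in that run, so that no match is missed.

```python
def matching_pairs_sorted_scan(a):
    """Sort rows and columns by first element, then merge-scan the two lists."""
    n = len(a)
    rows = sorted(range(n), key=lambda i: a[i][0])  # row indices by first element
    cols = sorted(range(n), key=lambda j: a[0][j])  # column indices by first element

    pairs = []
    r = c = 0
    while r < n and c < n:
        row_first = a[rows[r]][0]
        col_first = a[0][cols[c]]
        if row_first < col_first:
            r += 1
        elif row_first > col_first:
            c += 1
        else:
            # Find the run of rows and the run of columns with this first element.
            r_end = r
            while r_end < n and a[rows[r_end]][0] == row_first:
                r_end += 1
            c_end = c
            while c_end < n and a[0][cols[c_end]] == col_first:
                c_end += 1
            for i in rows[r:r_end]:
                for j in cols[c:c_end]:
                    if all(a[i][k] == a[k][j] for k in range(n)):
                        pairs.append((i, j))
            r, c = r_end, c_end
    return pairs
```

With random data the runs are short and most full comparisons fail after a few elements; when many first elements coincide the nested loops degrade toward the cubic worst case mentioned above.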

phkahler