The correct O(n) solution is quite complicated and takes a significant amount of text, code and skill to explain and prove. More precisely, it takes 3 pages to do so convincingly, as can be seen in detail here http://www.cse.yorku.ca/~andy/pubs/X+Y.pdf (found by simonzack in the comments).
It is basically a clever divide-and-conquer algorithm that, among other things, takes advantage of the fact that in a sorted n-by-n matrix one can find, in O(n), the number of elements that are smaller/greater than a given number k. It recursively breaks the matrix down into smaller submatrices (by taking only the odd rows and columns, resulting in a submatrix that has n/2 columns and n/2 rows), which, combined with the step above, results in a complexity of O(n) + O(n/2) + O(n/4) + ... = O(2n) = O(n). It is crazy!
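Written as a recurrence, with c standing for the constant hidden in each O(n) step (a symbol introduced here only for illustration), the sum above is a geometric series:

$$T(n) = T(n/2) + cn \le cn\left(1 + \tfrac{1}{2} + \tfrac{1}{4} + \cdots\right) = 2cn = O(n)$$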
I can't explain it better than the paper, which is why I'll explain a simpler, O(n log n) solution instead :).
O(n * logn) solution:
It's an interview! You can't get that O(n) solution in time. So hey, why not provide a solution that, although not optimal, shows you can do better than the other obvious O(n²) candidates?
I'll make use of the O(n) algorithm mentioned above to find the number of elements that are smaller/greater than a given number k in a sorted n-by-n matrix. Keep in mind that we don't need an actual matrix! The Cartesian sum of two sorted arrays of size n (if they aren't sorted, sorting them first costs only O(n log n)), as described by the OP, results in a sorted n-by-n matrix, which we can simulate by considering the elements of the arrays as follows:
int a[3] = {1, 5, 9};
int b[3] = {4, 6, 8};
// a + b, where row i, column j is a[i] + b[j]:
{1+4, 1+6, 1+8,
5+4, 5+6, 5+8,
9+4, 9+6, 9+8}
Thus each row contains non-decreasing numbers, and so does each column. Now, pretend you're given a number k. We want to find, in O(n), how many of the numbers in this matrix are smaller than k and how many are greater. Clearly, if both counts are less than (n²+1)/2, then k is our median! (This assumes n is odd, so that the n² sums have a single middle element.)
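Before any fast method, a slow reference is handy for checking results. This minimal C sketch (the names cmp_int, brute_median and is_median are hypothetical) materializes all n² sums, sorts them and takes the (n²+1)/2-th smallest; is_median tests the criterion from the paragraph above by counting directly. It assumes n is odd and that the sums fit in an int, and it costs O(n² log n) time and O(n²) memory, so it is only meant for small inputs.

```c
#include <assert.h>
#include <stdlib.h>

/* qsort comparator for ints that avoids overflow from subtraction */
int cmp_int(const void *p, const void *q) {
    int x = *(const int *)p, y = *(const int *)q;
    return (x > y) - (x < y);
}

/* Brute force: build all n*n sums, sort them, take the (n*n+1)/2-th smallest. */
int brute_median(const int *a, const int *b, int n) {
    assert(n > 0);
    int total = n * n;
    int *sums = malloc((size_t)total * sizeof *sums);
    assert(sums != NULL);
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j)
            sums[i * n + j] = a[i] + b[j];
    qsort(sums, (size_t)total, sizeof *sums, cmp_int);
    int median = sums[(total + 1) / 2 - 1];
    free(sums);
    return median;
}

/* The criterion from the text (n odd): fewer than (n*n+1)/2 sums are
   smaller than k and fewer than (n*n+1)/2 sums are greater than k. */
int is_median(const int *a, const int *b, int n, int k) {
    int smaller = 0, greater = 0, half = (n * n + 1) / 2;
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j) {
            if (a[i] + b[j] < k)
                ++smaller;
            else if (a[i] + b[j] > k)
                ++greater;
        }
    return smaller < half && greater < half;
}
```

With the example arrays above, brute_median(a, b, 3) returns 11 and is_median(a, b, 3, 11) is true, while is_median(a, b, 3, 9) is false.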
The algorithm is pretty simple:
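int smaller_than_k(int k){
    // a, b: the sorted input arrays; n: their length (globals here)
    int x = 0, j = n-1;
    for(int i = 0; i < n; ++i){
        // skip columns whose sum in row i is >= k; they stay >= k in later rows
        while(j >= 0 && k <= a[i]+b[j]){
            --j;
        }
        x += j+1; // b[0..j] give sums smaller than k in row i
    }
    return x;
}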
This basically counts how many elements satisfy the condition in each row. Since the rows and columns are already sorted, as seen above, this gives the correct result: a column whose sum is already at least k in row i is also at least k in every later row, so j never has to move back. And since i takes n values and j is decremented at most n times in total, the algorithm is O(n) [note that j is not reset inside the for loop]. The greater_than_k algorithm is similar.
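For reference, here is a self-contained C sketch of both counts, with the arrays and their length passed as parameters instead of the globals a, b and n used above. The greater_than_k shown here is one possible form of the "similar" algorithm (an assumption, not necessarily the original's): it runs the same staircase with > instead of >=, which counts the sums that are at most k, and subtracts that from n*n. It assumes the sums and n*n fit in an int.

```c
#include <assert.h>

/* Count pairs (i, j) with a[i] + b[j] < k; a and b are sorted ascending.
   O(n): j only ever moves left across all rows. */
int smaller_than_k(const int *a, const int *b, int n, int k) {
    assert(n > 0);
    int x = 0, j = n - 1;
    for (int i = 0; i < n; ++i) {
        /* columns whose sum is >= k here stay >= k in every later row */
        while (j >= 0 && a[i] + b[j] >= k)
            --j;
        x += j + 1; /* b[0..j] give sums < k in row i */
    }
    return x;
}

/* Count pairs with a[i] + b[j] > k: count the sums <= k with the same
   staircase, then subtract from n*n. */
int greater_than_k(const int *a, const int *b, int n, int k) {
    assert(n > 0);
    int at_most = 0, j = n - 1;
    for (int i = 0; i < n; ++i) {
        while (j >= 0 && a[i] + b[j] > k)
            --j;
        at_most += j + 1;
    }
    return n * n - at_most;
}
```

For the example arrays, smaller_than_k(a, b, 3, 11) and greater_than_k(a, b, 3, 11) both return 4, which is less than (3²+1)/2 = 5, so 11 is the median.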
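Now, how do we choose k? That is the log part. Binary Search! It has been suggested in other answers/comments that the median must be one of the anti-diagonal sums
candidates[n] = {a[0]+b[n-1], a[1]+b[n-2], ..., a[n-1]+b[0]};
so that one could sort this array [O(n log n)] and binary search over it. That does not hold in general, though: for a = {0, 1, 2, 3, 10} and b = {0, 1, 2, 6, 20}, the median of the 25 sums is 6 = a[0]+b[3], while the anti-diagonal is {20, 7, 4, 4, 10}.
Binary searching over the values themselves does work. smaller_than_k(k) is a non-decreasing (monotonic) function of k, which makes it suitable for binary search: the largest integer k in [a[0]+b[0], a[n-1]+b[n-1]] for which smaller_than_k(k) returns less than (n²+1)/2 is the median. This takes O(log R) iterations, where R = a[n-1]+b[n-1] - (a[0]+b[0]) is the range of the sums, so O(n log R) in total (at most 32 iterations when the sums fit in a 32-bit int). To get a bound that depends only on n, pick k at random among the sums still in play instead of at the midpoint (randomized selection); that takes O(log n) rounds in expectation, for O(n log n) expected time. The binary search over values:
int b_search(){
    // invariant: smaller_than_k(lo) < n2 <= smaller_than_k(hi)
    int lo = a[0]+b[0], hi = a[n-1]+b[n-1]+1, mid, n2 = (n*n+1)/2;
    while(hi-lo > 1){
        mid = lo + (hi-lo)/2; // avoids overflow of hi+lo
        if(smaller_than_k(mid) < n2)
            lo = mid;
        else
            hi = mid;
    }
    return lo; // the median
}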
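Choosing k by randomized selection (essentially quickselect on the implicit matrix) does not depend on the range of the values. This is a minimal C sketch under stated assumptions, not the paper's method: the names row_counts, select_sum and median_sum are hypothetical, sums are assumed to fit in an int, and rand() is used only as a rough random source (the result does not depend on it, only the running time does). Each row keeps a window of columns still in play; each round picks a random pivot p from the windows, counts the sums below p and at most p with the O(n) staircase, and shrinks the windows to the side that must hold the answer. The expected number of rounds is O(log n²) = O(log n), so the expected total is O(n log n).

```c
#include <assert.h>
#include <stdlib.h>

/* cnt[i] = number of j with a[i] + b[j] < p (strict) or <= p (!strict).
   cnt[] is non-increasing in i, so one staircase pass is O(n). Returns the total. */
long long row_counts(const int *a, const int *b, int n, int p, int strict, int *cnt) {
    long long total = 0;
    int j = n - 1;
    for (int i = 0; i < n; ++i) {
        while (j >= 0 && (strict ? a[i] + b[j] >= p : a[i] + b[j] > p))
            --j;
        cnt[i] = j + 1;
        total += cnt[i];
    }
    return total;
}

/* t-th smallest sum (1-based) by randomized selection; expected O(n log n).
   Row i keeps columns lo[i] .. hi[i]-1 in play. Every sum equal to the answer
   stays in play, and each round drops at least the pivot, so the loop ends. */
int select_sum(const int *a, const int *b, int n, long long t) {
    assert(n > 0 && t >= 1 && t <= (long long)n * n);
    int *lo = calloc(n, sizeof *lo);
    int *hi = malloc(n * sizeof *hi);
    int *less = malloc(n * sizeof *less);
    int *le = malloc(n * sizeof *le);
    assert(lo && hi && less && le);
    for (int i = 0; i < n; ++i)
        hi[i] = n;
    int answer = 0;
    for (;;) {
        long long in_play = 0;
        for (int i = 0; i < n; ++i)
            in_play += hi[i] - lo[i];
        /* rough random pick among the in-play sums; only speed depends on it */
        long long r = (((long long)rand() << 15) ^ rand()) % in_play;
        int row = 0;
        while (r >= hi[row] - lo[row]) {
            r -= hi[row] - lo[row];
            ++row;
        }
        int p = a[row] + b[lo[row] + (int)r];
        long long n_less = row_counts(a, b, n, p, 1, less);
        long long n_le = row_counts(a, b, n, p, 0, le);
        if (t <= n_less) {            /* answer < p: drop sums >= p */
            for (int i = 0; i < n; ++i)
                if (hi[i] > less[i]) hi[i] = less[i];
        } else if (t <= n_le) {       /* answer == p */
            answer = p;
            break;
        } else {                      /* answer > p: drop sums <= p */
            for (int i = 0; i < n; ++i)
                if (lo[i] < le[i]) lo[i] = le[i];
        }
    }
    free(lo);
    free(hi);
    free(less);
    free(le);
    return answer;
}

/* Median as used in the text: the (n*n+1)/2-th smallest sum. */
int median_sum(const int *a, const int *b, int n) {
    return select_sum(a, b, n, ((long long)n * n + 1) / 2);
}
```

With a = {1, 5, 9} and b = {4, 6, 8}, median_sum(a, b, 3) returns 11. Comparing against the pivot three ways (less than, equal, greater) is what keeps duplicate sums from breaking the window updates.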