7

Is there a way to find the median of an unsorted array (1) without sorting it, and (2) without using the selection algorithm or median of medians?

I found a lot of other questions similar to mine, but most of the solutions, if not all of them, discussed the selection problem and median of medians.

Monica
  • @GordonLinoff author of the question mentions Hoare's algorithm ("without using select algo") – ig-melnyk Nov 27 '15 at 21:17
  • 3
    Why the arbitrary restrictions? And what have you tried? – SirGuy Nov 27 '15 at 21:22
  • @GordonLinoff: I remember seeing a nice answer on SO that used two heaps, and maintained them at equal sizes as values were added to either side. I've seen that in a few answers on other questions, but I couldn't find one that looked like the one I remember. I think it looked pretty efficient, and there was some trick to removing elements from one heap and adding to the other to rebalance if necessary. Hmm. – Peter Cordes Nov 28 '15 at 07:30

3 Answers

15

You can certainly find the median of an array without sorting it. What is not easy is doing that efficiently.

For example, you could just iterate over the elements of the array; for each element, count the number of elements less than it and the number equal to it, until you find a value with the correct counts. That will be O(n^2) time but only O(1) space.
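The counting idea above can be sketched in Python as follows. This is only an illustration: the function names are made up for this example, and averaging the two middle values for an even-sized array is an assumption borrowed from the heap description in this answer.

```python
def median_by_counting(values):
    """Median in O(n^2) time and O(1) extra space; the input is never reordered."""
    n = len(values)
    if n == 0:
        raise ValueError("median of an empty sequence is undefined")

    def kth_smallest(k):
        # k is a 0-based rank. x holds rank k when at most k elements are
        # smaller than x and more than k elements are <= x.
        for x in values:
            less = sum(1 for y in values if y < x)
            less_or_equal = sum(1 for y in values if y <= x)
            if less <= k < less_or_equal:
                return x
        raise AssertionError("unreachable for totally ordered input")

    if n % 2 == 1:
        return kth_smallest(n // 2)
    # Assumption: for an even count, average the two middle values.
    return (kth_smallest(n // 2 - 1) + kth_smallest(n // 2)) / 2
```

Counting "less than" and "less than or equal" separately is what lets the check work when the array contains duplicates.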

Or you could use a min heap whose size is just over half the size of the array. (That is, if the array has 2k or 2k+1 elements, then the heap should have k+1 elements.) Build the heap from the first k+1 array elements, using the standard heap-building algorithm (which is O(n)). Then, for each remaining element x, if x is greater than the heap's minimum, replace the min element with x and do a SiftUp operation to restore the heap property (which is O(log n); many texts call this step sift-down, since the new root moves down the heap). At the end, the heap holds the k+1 largest elements, so the median is either the heap's minimum element (if the original array's size was odd) or the average of the two smallest elements in the heap (if it was even). So that's a total of O(n log n) time, and O(n) space if you cannot rearrange array elements. (If you can rearrange array elements, you can do this in-place.)
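Here is a minimal Python sketch of the heap approach, using the standard `heapq` module. The function name `median_with_heap` is hypothetical, and this version copies the first k+1 elements into a separate list rather than working in-place.

```python
import heapq


def median_with_heap(values):
    """Median via a min heap holding the k+1 largest elements seen so far."""
    n = len(values)
    if n == 0:
        raise ValueError("median of an empty sequence is undefined")

    size = n // 2 + 1                  # k+1 for n = 2k or n = 2k+1
    heap = list(values[:size])         # copy, so the input is left untouched
    heapq.heapify(heap)                # standard O(size) heap construction

    for x in values[size:]:
        if x > heap[0]:
            # Replace the minimum with x and restore the heap property
            # in O(log size).
            heapq.heapreplace(heap, x)

    if n % 2 == 1:
        return heap[0]
    # Even size: the second-smallest element is one of the root's children.
    return (heap[0] + min(heap[1:3])) / 2
```

For the even case there is no need to pop anything: in a binary min heap the second-smallest element always sits directly below the root.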

rici
  • What if the size of the array is odd? Also when you say " if x is greater than the heap's minimum, replace the min element with x" do I also need to balance the minHeap, because after replacing the tree isn't a minHeap. – Liger Mar 31 '19 at 03:49
  • I don't really understand how it is working but it is working for me for odd and even size of the array. I would like to request an explanation or a reference to "How this works"! – Liger Mar 31 '19 at 04:41
  • 1
    @Milind: Edited the answer to be a bit more precise about even and odd array sizes. Yes, you have to SiftUp after the swap, but that's pretty fast. – rici Apr 03 '19 at 21:50
4

There is a randomized algorithm able to accomplish this task in O(n) steps (average case scenario), but it does involve sorting some subsets of the array. And, because of its random nature, there is no guarantee it will actually ever finish (though this unfortunate event should happen with vanishing probability).

I will leave the main idea here. For a more detailed description and for the proof of why this algorithm works, check here.

Let A be your array and let n=|A|. Let's assume all elements of A are distinct. The algorithm goes like this:

  1. Randomly select t = n^(3/4) elements from A.
  2. Let T be the "set" of the selected elements. Sort T.
  3. Set pl = T[t/2-sqrt(n)] and pr = T[t/2+sqrt(n)].
  4. Iterate through the elements of A and determine how many elements are less than pl (denoted by l) and how many are greater than pr (denoted by r). If l > n/2 or r > n/2, go back to step 1.
  5. Let M be the set of elements in A in between pl and pr. M can be determined in step 4, just in case we reach step 5. If the size of M is no more than 4t, sort M. Otherwise, go back to step 1.
  6. Return m = M[n/2-l] as the median element.

The main idea behind the algorithm is to obtain two elements (pl and pr) that enclose the median element (i.e. pl < m < pr) such that these two are very close to each other in the ordered version of the array (and do this without actually sorting the array). With high probability, all six steps only need to execute once (i.e. you will get pl and pr with these "good" properties from the first and only pass through steps 1-5, so no going back to step 1). Once you find two such elements, you can simply sort the elements in between them and find the median element of A.

Step 2 and Step 5 do involve some sorting (which might be against the "rules" you've mysteriously established :p). If sorting a sub-array is on the table, you should use some sorting method that does this in O(s log s) steps, where s is the size of the array you are sorting. Since T and M are significantly smaller than A, the sorting steps take "less than" O(n) steps. If it is also against the rules to sort a sub-array, then take into consideration that in both cases the sorting is not really needed. You only need to find a way to determine pl, pr and m, which is just another selection problem (with respective indices). While sorting T and M does accomplish this, you could use any other selection method (perhaps something rici suggested earlier).
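The six steps can be sketched in Python roughly as follows. Several details here are assumptions that the description leaves open: the sample is drawn without replacement, sample indices are clamped so small arrays work, M includes pl and pr themselves, the retry test is slightly tighter than "l > n/2 or r > n/2" so the final index is always valid, and the hypothetical `max_attempts` parameter stops the loop if the input breaks the distinct-elements assumption. For an even n this returns the upper of the two middle elements.

```python
import math
import random


def randomized_median(a, rng=random, max_attempts=100):
    """Sampling-based median sketch; assumes the elements of `a` are distinct."""
    n = len(a)
    if n == 0:
        raise ValueError("median of an empty sequence is undefined")

    k = n // 2                            # 0-based rank of the element to return
    t = math.ceil(n ** 0.75)              # step 1: sample size
    half_width = math.ceil(math.sqrt(n))

    for _ in range(max_attempts):
        sample = sorted(rng.sample(a, t))                  # steps 1-2
        pl = sample[max(0, t // 2 - half_width)]           # step 3, clamped
        pr = sample[min(t - 1, t // 2 + half_width)]

        l = sum(1 for x in a if x < pl)                    # step 4
        r = sum(1 for x in a if x > pr)
        m = [x for x in a if pl <= x <= pr]                # candidates for step 5

        # Retry unless rank k lies inside m and m is small enough to sort cheaply.
        if l <= k and r <= n - 1 - k and len(m) <= 4 * t:
            m.sort()
            return m[k - l]                                # step 6

    raise RuntimeError("no good pivots found; the input may contain many duplicates")
```

Whenever this returns, the result is exact: the only randomness is in how many passes it takes to find a good pair pl, pr.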

Community
cobarzan
0

A non-destructive routine selip() is described at http://www.aip.de/groups/soe/local/numres/bookfpdf/f8-5.pdf. It makes multiple passes through the data, at each stage making a random choice of items within the current range of values and then counting the number of items to establish the ranks of the random selection.
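The code below is not the `selip()` routine itself. It is a simplified Python sketch of the same general idea (read-only passes that pick a random item inside the current range of values and count its rank), using a single random pivot per round. All names are hypothetical, and averaging the two middle values for an even count is an assumption.

```python
import random


def kth_smallest_by_narrowing(values, k, rng=random):
    """Return the element of 0-based rank k using only read-only passes."""
    n = len(values)
    if not 0 <= k < n:
        raise IndexError("rank out of range")

    lo = hi = None            # open value bounds around the target; None = unbounded
    while True:
        # Pass 1: pick a uniformly random element strictly inside (lo, hi)
        # with reservoir sampling, so no extra storage is needed.
        pivot, seen = None, 0
        for x in values:
            if (lo is None or x > lo) and (hi is None or x < hi):
                seen += 1
                if rng.randrange(seen) == 0:
                    pivot = x

        # Pass 2: count to establish the rank of the pivot.
        less = sum(1 for x in values if x < pivot)
        less_or_equal = sum(1 for x in values if x <= pivot)

        if less <= k < less_or_equal:
            return pivot
        if k < less:
            hi = pivot        # target is smaller than the pivot
        else:
            lo = pivot        # target is larger than the pivot


def median_by_narrowing(values, rng=random):
    n = len(values)
    if n == 0:
        raise ValueError("median of an empty sequence is undefined")
    if n % 2 == 1:
        return kth_smallest_by_narrowing(values, n // 2, rng)
    lower = kth_smallest_by_narrowing(values, n // 2 - 1, rng)
    upper = kth_smallest_by_narrowing(values, n // 2, rng)
    return (lower + upper) / 2
```

The target value always stays strictly inside the current bounds, so every round finds at least one candidate and the range shrinks until the pivot has the requested rank.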

Markus Jarderot
mcdowella