4

I know the answer is using median of medians but can someone explain how to do it?

user2316569
  • 343
  • 2
  • 8
  • 1
    There are some threads on this here on SO: [Thread 1](http://stackoverflow.com/questions/12545795/explanation-of-the-median-of-medians-algorithm), [Thread 2](http://stackoverflow.com/questions/9489061/understanding-median-of-medians-algorithm). And there are [more](http://stackoverflow.com/search?q=%22median+of+medians%22) – keyser May 04 '13 at 18:30
  • Find the median of each sample of values that fits into memory, then take the median of those medians. – Peter Lawrey May 04 '13 at 19:28
  • I found the code for quickselect if you still need it – aaronman Jul 02 '13 at 21:42

3 Answers

1

There are linear-time algorithms for this. This page might be helpful: http://en.wikipedia.org/wiki/Selection_algorithm. If you are still confused, just ask.

Basically, the selection algorithm works like quicksort, but it only continues into one side of the pivot each time. The goal is to keep partitioning until the range that contains index k shrinks to the single element you are trying to find. Here is Java code I found for quickselect:

```java
// Returns the k-th smallest element (0-based) of arr; reorders arr in place.
public static int selectKth(int[] arr, int k) {
    if (arr == null || k < 0 || arr.length <= k)
        throw new IllegalArgumentException("k must be in [0, arr.length)");

    int from = 0, to = arr.length - 1;

    // if from == to we reached the kth element
    while (from < to) {
        int r = from, w = to;
        int mid = arr[r + (w - r) / 2]; // pivot value, taken from the middle of the range

        // stop when the reader and writer meet
        while (r < w) {
            if (arr[r] >= mid) { // put the large values at the end
                int tmp = arr[w];
                arr[w] = arr[r];
                arr[r] = tmp;
                w--;
            } else { // the value is smaller than the pivot, skip
                r++;
            }
        }

        // if we stepped up (r++) we need to step one down
        if (arr[r] > mid)
            r--;

        // the r pointer is on the end of the first k elements
        if (k <= r) {
            to = r;
        } else {
            from = r + 1;
        }
    }

    return arr[k];
}
```
aaronman
  • 17,266
  • 6
  • 57
  • 78
  • Thanks, but my question is how to use the algorithm when all the numbers won't fit in memory. Can someone please explain in detail? – user2316569 May 04 '13 at 20:01
  • For this algorithm, the numbers do not all need to be in memory at once. Read it. – aaronman May 04 '13 at 20:03
  • Thanks. Can you please confirm my understanding? First I bring the first five numbers into memory, find their median using the selection algorithm, and store the result in memory. Then I bring the next five numbers into memory and store their median as well, and so forth. In the end I will have n/5 numbers in memory. Now I run a selection algorithm on them to find the median of these numbers. – user2316569 May 05 '13 at 09:52
  • Yes, and then you use the select algorithm because you have a guaranteed good pivot – aaronman May 05 '13 at 16:40
  • Another possible solution is maintaining the median as the list grows if that is an option – aaronman May 05 '13 at 16:42
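The group-of-five procedure confirmed in the comments above could be sketched roughly as follows. This is only an illustration under stated assumptions: the class name `GroupMedians` and its methods are hypothetical, the input is modelled as an `Iterator<Integer>` standing in for whatever reads the data from disk, and `Arrays.sort` is used in place of a selection routine to keep the example short. Note that the value it returns is a pivot that is guaranteed to lie reasonably close to the middle, not necessarily the exact median; the n/5 group medians must still fit in memory, otherwise the same step would have to be applied again to them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical helper, not taken from the thread.
class GroupMedians {

    // Reads the values one at a time and keeps only the median of each group of five.
    static int[] mediansOfFive(Iterator<Integer> values) {
        List<Integer> medians = new ArrayList<>();
        int[] group = new int[5];
        int size = 0;
        while (values.hasNext()) {
            group[size++] = values.next();
            if (size == 5) {
                Arrays.sort(group);
                medians.add(group[2]);      // middle of five sorted values
                size = 0;
            }
        }
        if (size > 0) {                     // last, partially filled group
            Arrays.sort(group, 0, size);
            medians.add(group[(size - 1) / 2]);
        }
        int[] result = new int[medians.size()];
        for (int i = 0; i < result.length; i++) {
            result[i] = medians.get(i);
        }
        return result;
    }

    // Median of the group medians: a good pivot, not the exact median in general.
    static int pivot(Iterator<Integer> values) {
        int[] medians = mediansOfFive(values);
        if (medians.length == 0) {
            throw new IllegalArgumentException("no input");
        }
        Arrays.sort(medians);
        return medians[(medians.length - 1) / 2];
    }
}
```

With this pivot in hand, a second pass over the data can count how many values fall below it and discard the side that cannot contain the median, which is the partitioning step of the select algorithm.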
0

Here is the Median of Medians algorithm. Check this out.

stinepike
  • 50,967
  • 14
  • 89
  • 108
0

See the first two answers to this question. If the first one (frequency counts) can work for your data / available storage, you can get the exact answer that way. The second (remedian) is a robust, general method.
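As a rough illustration of the frequency-count idea, here is a minimal sketch. It assumes the values are non-negative integers no larger than a known `maxValue`, so that one counter per possible value fits in memory even when the values themselves do not; the class `CountingMedian` and its parameters are hypothetical and are not taken from the linked answer.

```java
import java.util.Iterator;

// Hypothetical helper; assumes every value lies in [0, maxValue].
class CountingMedian {

    // Memory use depends on maxValue, not on how many values are read.
    static int median(Iterator<Integer> values, int maxValue) {
        long[] counts = new long[maxValue + 1];
        long n = 0;
        while (values.hasNext()) {
            counts[values.next()]++;
            n++;
        }
        if (n == 0) {
            throw new IllegalArgumentException("no input");
        }
        long target = (n - 1) / 2;  // zero-based rank of the lower median
        long seen = 0;
        for (int v = 0; v <= maxValue; v++) {
            seen += counts[v];
            if (seen > target) {
                return v;           // first value whose cumulative count passes the rank
            }
        }
        throw new IllegalStateException("counts do not add up");
    }
}
```

For an even number of values this returns the lower of the two middle values; averaging the two would need one more step in the final loop.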

Community
  • 1
  • 1
Phil Steitz
  • 614
  • 3
  • 10
  • There is also a two-heap algorithm that uses a min and a max heap in parallel to find the median with constant storage even with large numbers. – Thomas Jungblut May 04 '13 at 19:33
  • Can you provide a reference, Thomas, to this constant-storage two-heap algorithm? – Phil Steitz May 05 '13 at 18:39
  • see http://stackoverflow.com/questions/2579912/how-do-i-find-the-median-of-numbers-in-linear-time-using-heaps – Thomas Jungblut May 05 '13 at 19:03
  • Thanks, Thomas. Could be I am misunderstanding the setup in the article, but I don't see a storage bound there. Looks like the heaps end up including all of the values. What am I missing? – Phil Steitz May 05 '13 at 19:37
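For reference, the two-heap approach discussed in these comments could look roughly like the sketch below. The class `RunningMedian` and its method names are hypothetical. As the last comment suspects, this version keeps every value in one of the two heaps, so its storage grows with the input rather than staying constant; the `storedValues` method is there only to make that visible.

```java
import java.util.Collections;
import java.util.PriorityQueue;

// Hypothetical sketch of a running median with a max-heap and a min-heap.
class RunningMedian {
    // Max-heap holding the smaller half of the values seen so far.
    private final PriorityQueue<Integer> low = new PriorityQueue<>(Collections.reverseOrder());
    // Min-heap holding the larger half.
    private final PriorityQueue<Integer> high = new PriorityQueue<>();

    void add(int value) {
        if (low.isEmpty() || value <= low.peek()) {
            low.add(value);
        } else {
            high.add(value);
        }
        // Rebalance so that low has the same size as high, or exactly one more.
        if (low.size() > high.size() + 1) {
            high.add(low.poll());
        } else if (high.size() > low.size()) {
            low.add(high.poll());
        }
    }

    // Lower median of everything added so far.
    int lowerMedian() {
        if (low.isEmpty()) {
            throw new IllegalStateException("no values added");
        }
        return low.peek();
    }

    // Number of values held in memory: always equal to the number of values added.
    int storedValues() {
        return low.size() + high.size();
    }
}
```

Each insertion costs O(log n) and the median is available at any time, but because nothing is ever discarded this does not help when the data exceeds memory.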