59

To find the median of an unsorted array, we can make a min-heap in O(nlogn) time for n elements, and then we can extract one by one n/2 elements to get the median. But this approach would take O(nlogn) time.

Can we do the same by some method in O(n) time? If we can, then please tell or suggest some method.

Elrond_EGLDer
  • 47,430
  • 25
  • 189
  • 180
Luv
  • 4,873
  • 8
  • 42
  • 57
  • possible duplicate of [How to find the kth largest element in an unsorted array of length n in O(n)?](http://stackoverflow.com/questions/251781/how-to-find-the-kth-largest-element-in-an-unsorted-array-of-length-n-in-on) – Saeed Amiri May 27 '12 at 20:18
  • 9
    Keep in mind that if it takes O(nlogn) then you might as well just sort the array and divide the index by 2. – Zombies Oct 27 '13 at 07:07
  • 3
    building heap takes O(n) time not O(nlogn) – GorvGoyl Nov 08 '16 at 18:40
  • 3
    @JerryGoyal, If you have all elements at the same time, then building a heap takes O(n). But if you have stream of elements then, it takes O(nlogn). Its like pushing one element at time, and n times. So, I guess he means stream of elements here. – Raghav Navada Nov 06 '19 at 19:25

9 Answers9

44

You can use the Median of Medians algorithm to find median of an unsorted array in linear time.

Katamaritaco
  • 295
  • 4
  • 12
Sergey Kalinichenko
  • 675,664
  • 71
  • 998
  • 1,399
  • 9
    @KevinKostlan It actually isn't approximate, it's the real median and it finds it in linear time. Notice that after finding the median of medians (which is guaranteed to be greater than at least 30% of the elements and smaller than at least 30% of the elements) you partition the the array using that pivot. Then you recurse (if necessary) into one of those array which is at most %70 the size of the original array in order to find the real median (or in the general case the k-statistic). – dcmm88 Jun 27 '15 at 23:31
  • 1
    @dcmm88: Please read [https://en.wikipedia.org/wiki/Median_of_medians]. In linear time, the best you can get is a good guess. (The moment you recurse you're no longer O(n)/linear - by definition.) – AlanK Apr 24 '19 at 15:21
  • @AlanK the wikipedia page you linked specifically say that it is. https://en.wikipedia.org/wiki/Median_of_medians#Proof_of_O(n)_running_time – dcmm88 Apr 25 '19 at 17:49
  • @dcmm88 Read the first sentence of the article again. MoM is O(n) *and approximate*. When you prescribe recursive repetition of a linear operation over (subsets of) a dataset to obtain a "real median", you are specifying a new algorithm, with greater time complexity, by definition. – AlanK Apr 25 '19 at 18:49
  • @AlanK excuse me, I misinterpreted the reply. I thought approximate was referring to the complexity, not the accuracy. Nevertheless, you can still use median of medians to find the true median in O(n), it's just that the wikipedia page doesn't explain this. I hinted at the solution in my previous reply, and you can find a more detailed explanation [here](http://cs.indstate.edu/~spitla/abstract2.pdf), from https://stackoverflow.com/a/251884/3272850. Basically since you are recursing into a set that is 70% size of previous step, you get a geometric series that sums to some constant times O(n). – dcmm88 Apr 26 '19 at 18:25
15

I have already upvoted the @dasblinkenlight answer since the Median of Medians algorithm in fact solves this problem in O(n) time. I only want to add that this problem could be solved in O(n) time by using heaps also. Building a heap could be done in O(n) time by using the bottom-up. Take a look to the following article for a detailed explanation Heap sort

Supposing that your array has N elements, you have to build two heaps: A MaxHeap that contains the first N/2 elements (or (N/2)+1 if N is odd) and a MinHeap that contains the remaining elements. If N is odd then your median is the maximum element of MaxHeap (O(1) by getting the max). If N is even, then your median is (MaxHeap.max()+MinHeap.min())/2 this takes O(1) also. Thus, the real cost of the whole operation is the heaps building operation which is O(n).

BTW this MaxHeap/MinHeap algorithm works also when you don't know the number of the array elements beforehand (if you have to resolve the same problem for a stream of integers for e.g). You can see more details about how to resolve this problem in the following article Median Of integer streams

rkachach
  • 13,862
  • 5
  • 35
  • 55
  • 5
    Why does this work? Suppose your array is [3, 2, 1]. We would then put the first 2 in a max heap: [3, 2], thus 3 would be the root, so that 2, its child must be smaller than it. And, we would have [1] in the min heap. According to this algorithm, we would then choose the max (root), of the maxHeap as our median. Wouldn't this give us 3? – Arkidillo Apr 22 '17 at 18:22
  • It's O(n^2) time worse case, not O(n). When referring to an algorithm's Big O complexity, without specifying the case, it's typically assumed that you're referring to the worse time. – Rick Apr 26 '17 at 20:37
  • Yeah the answer given is wrong, he said first n/2 elements needs to be added that's not true, in reality you have to add first n/2 (or n/2 +1 if n is odd) smallest element in Max heap and rest in Min heap hence it would ensure correct answer. Follow the link he provided below "Median of integer stream" – codeitram Jan 30 '21 at 11:42
10

Quickselect works in O(n), this is also used in the partition step of Quicksort.

BrokenGlass
  • 149,257
  • 27
  • 271
  • 318
  • 5
    I don't think quickselect would necessarily give the median in ONLY ONE run. It depends on your pivot choice. – Yash Oct 29 '14 at 10:47
  • Unfortunately, quickselect to find median will take O(n^2) in worst case. This occurs when we reduce the array by just 1 element in each iteration of QuickSelect. Consider an already sorted array and we always choose right most element as pivot. I know this is bit foolish to do so but this is how worst cases are. – Vishal Sahu Sep 20 '16 at 15:10
  • @VishalSahu you are wrong. Quickselect runs in O(n), because it always chooses a good pivot – Captain_Obvious Dec 18 '19 at 21:27
  • Quickselect is between O(n) and O(n^2). – Pavan Dittakavi May 14 '21 at 16:10
10

The quick select algorithm can find the k-th smallest element of an array in linear (O(n)) running time. Here is an implementation in python:

import random

def partition(L, v):
    smaller = []
    bigger = []
    for val in L:
        if val < v: smaller += [val]
        if val > v: bigger += [val]
    return (smaller, [v], bigger)

def top_k(L, k):
    v = L[random.randrange(len(L))]
    (left, middle, right) = partition(L, v)
    # middle used below (in place of [v]) for clarity
    if len(left) == k:   return left
    if len(left)+1 == k: return left + middle
    if len(left) > k:    return top_k(left, k)
    return left + middle + top_k(right, k - len(left) - len(middle))

def median(L):
    n = len(L)
    l = top_k(L, n / 2 + 1)
    return max(l)
royhowie
  • 10,605
  • 14
  • 45
  • 66
doizuc
  • 399
  • 4
  • 11
  • How is this linear? If I understand correctly this implementation is O(n^2) in worst case. – akki Jul 17 '18 at 12:47
  • @akki It's "expected value" linear time because of the randomness. The intuition is that the random index will, on average, split the list into a list of 1/4 size and of 3/4 size. – Jack Nov 08 '18 at 21:53
1

The answer is "No, one can't find the median of an arbitrary, unsorted dataset in linear time". The best one can do as a general rule (as far as I know) is Median of Medians (to get a decent start), followed by Quickselect. Ref: [https://en.wikipedia.org/wiki/Median_of_medians][1]

AlanK
  • 1,142
  • 7
  • 13
0

It can be done using Quickselect Algorithm in O(n), do refer to Kth order statistics (randomized algorithms).

Imposter
  • 2,546
  • 1
  • 19
  • 31
0

As wikipedia says, Median-of-Medians is theoretically o(N), but it is not used in practice because the overhead of finding "good" pivots makes it too slow.
http://en.wikipedia.org/wiki/Selection_algorithm

Here is Java source for a Quickselect algorithm to find the k'th element in an array:

/**
 * Returns position of k'th largest element of sub-list.
 * 
 * @param list list to search, whose sub-list may be shuffled before
 *            returning
 * @param lo first element of sub-list in list
 * @param hi just after last element of sub-list in list
 * @param k
 * @return position of k'th largest element of (possibly shuffled) sub-list.
 */
static int select(double[] list, int lo, int hi, int k) {
    int n = hi - lo;
    if (n < 2)
        return lo;

    double pivot = list[lo + (k * 7919) % n]; // Pick a random pivot

    // Triage list to [<pivot][=pivot][>pivot]
    int nLess = 0, nSame = 0, nMore = 0;
    int lo3 = lo;
    int hi3 = hi;
    while (lo3 < hi3) {
        double e = list[lo3];
        int cmp = compare(e, pivot);
        if (cmp < 0) {
            nLess++;
            lo3++;
        } else if (cmp > 0) {
            swap(list, lo3, --hi3);
            if (nSame > 0)
                swap(list, hi3, hi3 + nSame);
            nMore++;
        } else {
            nSame++;
            swap(list, lo3, --hi3);
        }
    }
    assert (nSame > 0);
    assert (nLess + nSame + nMore == n);
    assert (list[lo + nLess] == pivot);
    assert (list[hi - nMore - 1] == pivot);
    if (k >= n - nMore)
        return select(list, hi - nMore, hi, k - nLess - nSame);
    else if (k < nLess)
        return select(list, lo, lo + nLess, k);
    return lo + k;
}

I have not included the source of the compare and swap methods, so it's easy to change the code to work with Object[] instead of double[].

In practice, you can expect the above code to be o(N).

Adam Gawne-Cain
  • 985
  • 10
  • 11
0

Let the problem be: finding the Kth largest element in an unsorted array.

Divide the array into n/5 groups where each group consisting of 5 elements.

Now a1,a2,a3....a(n/5) represent the medians of each group.

x = Median of the elements a1,a2,.....a(n/5).

Now if k<n/2 then we can remove the largets, 2nd largest and 3rd largest element of the groups whose median is greater than the x. We can now call the function again with 7n/10 elements and finding the kth largest value.

else if k>n/2 then we can remove the smallest ,2nd smallest and 3rd smallest element of the group whose median is smaller than the x. We can now call the function of again with 7n/10 elements and finding the (k-3n/10)th largest value.

Time Complexity Analysis: T(n) time complexity to find the kth largest in an array of size n.

T(n) = T(n/5) + T(7n/10) + O(n)

if you solve this you will find out that T(n) is actually O(n)

n/5 + 7n/10 = 9n/10 < n

0

Given two sorted arrays nums1 and nums2 of size m and n respectively, return the median of the two sorted arrays.

Example 1:

Input: nums1 = [1,3], nums2 = [2]
Output: 2.00000
Explanation: merged array = [1,2,3] and median is 2.

Code:

class Solution:
    def findMedianSortedArrays(self, nums1: List[int], nums2: List[int]) -> float:
        merged_array = sorted(nums1 + nums2)
        if len(merged_array) % 2 == 0:
            index = int(len(merged_array)/2)
            output =  (merged_array[index - 1] +  merged_array[index])/2
        else: 
            index = int(len(merged_array)/2)
            output = merged_array[index]
        return output