
You are given an unsorted sequence of integers that flows into your program as a stream.

The integers are too many to fit into memory.

Imagine there is a function:

int getNext() throws NoSuchElementException;

It returns the next integer from the stream.

Write a function to find the median.

Solve the problem in O(n).

Any ideas?

A hint is given: use the heap data structure.

Henk Holterman
SiLent SoNG

4 Answers


You have to maintain two heaps: a max-heap, which contains the smallest N/2 elements seen so far, and a min-heap, which contains the largest N/2 elements. If N is odd, store the extra element aside.

Whenever you call getNext():

If N becomes odd, save the new element as the extra element. If necessary, exchange that element with one from the min-heap or max-heap to satisfy the following condition:

max(max-heap) <= extra element <= min(min-heap).

If N becomes even, do the same as above to get a second extra element. Then add the smaller of the two extra elements to the max-heap and the larger one to the min-heap. Each insertion is O(log N).

Get Median: O(1)
If N is odd, the median is the extra element.
If N is even, the median is the average of the tops of the two heaps.
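A minimal Java sketch of the two-heap approach, using java.util.PriorityQueue. One simplification relative to the answer: instead of keeping the extra element aside, the max-heap is allowed to hold one more element than the min-heap, so its top plays the role of the extra element when N is odd. The class and method names are illustrative.

```java
import java.util.Collections;
import java.util.PriorityQueue;

// Running median with two heaps. Invariants:
//   every value in lower <= every value in upper
//   lower.size() == upper.size() or lower.size() == upper.size() + 1
class RunningMedian {
    // Smaller half of the values seen so far; largest of them on top.
    private final PriorityQueue<Integer> lower = new PriorityQueue<>(Collections.reverseOrder());
    // Larger half of the values seen so far; smallest of them on top.
    private final PriorityQueue<Integer> upper = new PriorityQueue<>();

    // O(log N): insert into the correct half, then move one top across if the sizes drift.
    void add(int x) {
        if (lower.isEmpty() || x <= lower.peek()) {
            lower.add(x);
        } else {
            upper.add(x);
        }
        if (lower.size() > upper.size() + 1) {
            upper.add(lower.poll());
        } else if (upper.size() > lower.size()) {
            lower.add(upper.poll());
        }
    }

    // O(1): top of lower when N is odd, average of both tops when N is even.
    double median() {
        if (lower.isEmpty()) {
            throw new IllegalStateException("no elements yet");
        }
        if (lower.size() > upper.size()) {
            return lower.peek();
        }
        // Average in double arithmetic so large ints cannot overflow.
        return (lower.peek().doubleValue() + upper.peek().doubleValue()) / 2.0;
    }
}
```

To feed it, call add(getNext()) in a loop until getNext() throws NoSuchElementException. Both heaps still hold all N values and each insertion is O(log N), so this does not address the memory limit or the O(n) requirement raised in the comments.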

Tanuj
  • How would you handle the constraint that "the integers are too many to fit in memory"? – Aug 09 '10 at 16:08
  • Not to mention the O(n) time constraint. Even with a Fibonacci heap, the running time will be O(n lg n). – deinst Aug 09 '10 at 17:04

See this paper. It will (likely) take more than one pass. The idea is that in each pass upper and lower bounds are computed such that the median lies between them.
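The bound-narrowing idea can be illustrated with a much simpler scheme than the one in the paper: binary search over the 32-bit value range, where each pass counts how many values fall at or below the midpoint. This is a swapped-in technique, not the paper's algorithm. It assumes the stream can be replayed, modeled here as a hypothetical Supplier<IntStream> that starts a fresh pass each time it is called (for example by re-reading a file). The class name is illustrative.

```java
import java.util.function.Supplier;
import java.util.stream.IntStream;

// Multi-pass selection by binary search over the int value range.
// ASSUMPTION: source.get() replays the whole stream from the start on every call.
// Extra memory is O(1): only the bounds lo/hi and a counter are kept.
final class MultiPassSelect {
    // Returns the k-th smallest value (0-based) in at most 32 passes.
    static int kthSmallest(Supplier<IntStream> source, long k) {
        long lo = Integer.MIN_VALUE; // invariant: the answer lies in [lo, hi]
        long hi = Integer.MAX_VALUE;
        while (lo < hi) {
            long mid = Math.floorDiv(lo + hi, 2L);
            // One full pass over the data: how many values are <= mid?
            long atMostMid = source.get().filter(x -> x <= mid).count();
            if (atMostMid > k) {
                hi = mid;      // the k-th smallest is <= mid
            } else {
                lo = mid + 1;  // the k-th smallest is > mid
            }
        }
        return (int) lo;
    }

    // Median; averages the two middle values when the count is even.
    static double median(Supplier<IntStream> source) {
        long n = source.get().count(); // first pass: count the elements
        if (n == 0) throw new IllegalArgumentException("empty stream");
        if (n % 2 == 1) return kthSmallest(source, n / 2);
        return (kthSmallest(source, n / 2 - 1) + (double) kthSmallest(source, n / 2)) / 2.0;
    }
}
```

Each pass is O(n) and the number of passes is fixed by the 32-bit width (at most 32 per selection, plus one counting pass), so the total is O(n) time with O(1) extra memory. Unlike the general results quoted from the paper, this trick depends on the keys being fixed-width integers.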

A fundamental result from it, with N = size of the data and P = number of passes:

Theorem 2: A P-pass algorithm that selects the Kth highest of N elements requires storage at most $O\big(N^{1/P}(\log N)^{2-2/P}\big)$.

Also, for very small amounts of storage S, i.e., for $2 \le S \le O\big((\log N)^2\big)$, there is a class of selection algorithms that use at most $O\big((\log N)^3/S\big)$ passes.
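Plugging a few values of P into the Theorem 2 bound shows how storage trades off against the number of passes. This substitution is a worked illustration (logs base 2), not something quoted from the paper:

```latex
\begin{aligned}
S(P) &= O\!\left(N^{1/P}(\log N)^{2-2/P}\right) \\
P = 1:&\quad S = O(N) \\
P = 2:&\quad S = O\!\left(\sqrt{N}\,\log N\right) \\
P = \log_2 N:&\quad N^{1/\log_2 N} = 2,\quad (\log N)^{2-2/\log_2 N} \le (\log N)^2
  \ \Rightarrow\ S = O\!\left((\log N)^2\right)
\end{aligned}
```

The last row lines up with the small-storage result: with $S = (\log N)^2$, it gives $O\big((\log N)^3/S\big) = O(\log N)$ passes.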

Read the paper. I'm not really sure what the heap has to do with it.

Svante
deinst

Using a selection algorithm, we can achieve this in O(n) time. But I still don't clearly understand the use of a heap in this case.
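For reference, here is quickselect with a random pivot and three-way partitioning, one selection algorithm with expected O(n) time (median of medians is the variant with a worst-case O(n) guarantee). It assumes all values fit in an int[] in memory, which the question rules out, so it only illustrates the selection step. The class name is illustrative.

```java
import java.util.concurrent.ThreadLocalRandom;

// Quickselect with a random pivot and three-way partitioning: expected O(n) time,
// even with many duplicate values. ASSUMPTION: all values fit in memory as an int[].
final class QuickSelect {
    // Returns the k-th smallest element (0-based); reorders a in place.
    static int select(int[] a, int k) {
        if (k < 0 || k >= a.length) throw new IllegalArgumentException("k out of range");
        int lo = 0, hi = a.length - 1; // invariant: lo <= k <= hi
        while (true) {
            int pivot = a[ThreadLocalRandom.current().nextInt(lo, hi + 1)];
            // Partition a[lo..hi] into: < pivot | == pivot | > pivot.
            int lt = lo, i = lo, gt = hi;
            while (i <= gt) {
                if (a[i] < pivot) swap(a, lt++, i++);
                else if (a[i] > pivot) swap(a, i, gt--);
                else i++;
            }
            if (k < lt) hi = lt - 1;       // answer is among the smaller values
            else if (k > gt) lo = gt + 1;  // answer is among the larger values
            else return pivot;             // k lands in the block equal to pivot
        }
    }

    // Median; averages the two middle values when the count is even.
    static double median(int[] values) {
        int[] a = values.clone(); // leave the caller's array untouched
        int n = a.length;
        if (n % 2 == 1) return select(a, n / 2);
        return (select(a, n / 2 - 1) + (double) select(a, n / 2)) / 2.0;
    }

    private static void swap(int[] a, int i, int j) {
        int t = a[i]; a[i] = a[j]; a[j] = t;
    }
}
```

Calling select twice for an even count is still correct, because the first call only permutes the array.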

Pavan Dittakavi

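Suppose the window over which to maintain the median has size K. Construct a binary search tree from the K numbers: O(K log K) if the tree is kept balanced (an unbalanced BST can degrade to O(K^2)). Do an in-order traversal and stop at the (K/2)th element: O(K/2) = O(K).

The overall time is O(K log K) per window.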
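A sketch of the sliding-window idea in Java. It keeps the tree between windows rather than rebuilding it, using java.util.TreeMap (a red-black tree, i.e. a balanced BST) as a multiset of value counts. Assumptions: the window is the most recent K stream values, and the reported element is the one at 0-based sorted index size/2, which is the median for odd sizes and the upper median for even sizes. The class name WindowMedian is hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Map;
import java.util.TreeMap;

// Sliding-window median over the last k values.
// ASSUMPTION: reports the element at 0-based sorted index size/2.
final class WindowMedian {
    private final int k;
    private final ArrayDeque<Integer> window = new ArrayDeque<>();    // values in arrival order
    private final TreeMap<Integer, Integer> counts = new TreeMap<>(); // value -> multiplicity

    WindowMedian(int k) {
        if (k <= 0) throw new IllegalArgumentException("k must be positive");
        this.k = k;
    }

    // O(log k): add the new value and evict the oldest one once the window is full.
    void add(int x) {
        window.addLast(x);
        counts.merge(x, 1, Integer::sum);
        if (window.size() > k) {
            int old = window.removeFirst();
            // Decrement the count; drop the key when it reaches zero.
            if (counts.merge(old, -1, Integer::sum) == 0) counts.remove(old);
        }
    }

    // O(k): in-order walk of the tree until reaching sorted index size/2.
    int median() {
        if (window.isEmpty()) throw new IllegalStateException("empty window");
        int target = window.size() / 2;
        for (Map.Entry<Integer, Integer> e : counts.entrySet()) {
            target -= e.getValue();
            if (target < 0) return e.getKey();
        }
        throw new AssertionError("unreachable");
    }
}
```

Maintaining the tree makes each update O(log K); the median lookup is still an O(K) in-order walk, which an order-statistic tree (storing subtree sizes in each node) would reduce to O(log K).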

willawill