7

I'm trying to return the running median for a series of streaming numbers. To do that I use a max-heap (which stores the values on the lower half of the series) and a min-heap (which stores the values on the higher half of the series).

In particular I'm using the Python (2.0) built-in min-heap data structure from the heapq module (https://docs.python.org/2/library/heapq.html). To build the max-heap instead I simply use the negative of the numbers I need to push into my heap.

My Python code is the following:

import heapq

maxh = []
minh = []
vals=[1,2,3,4,5,6,7,8,9,10]
for val in vals:

    # Initialize the data-structure and insert/push the 1st streaming value
    if not maxh and not minh:
        heapq.heappush(maxh,-val)
        print float(val)
    elif maxh:

        # Insert/push the other streaming values
        if val>-maxh[0]:
            heapq.heappush(minh,val)
        elif val<-maxh[0]:
            heapq.heappush(maxh,-val)

        # Calculate the median
        if len(maxh)==len(minh):
            print float(-maxh[0]+minh[0])/2
        elif len(maxh)==len(minh)+1:
            print float(-maxh[0])
        elif len(minh)==len(maxh)+1:
            print float(minh[0])

        # If min-heap and max-heap grow unbalanced we rebalance them by
        # removing/popping one element from a heap and inserting/pushing
        # it into the other heap, then we calculate the median
        elif len(minh)==len(maxh)+2:
            heapq.heappush(maxh,-heapq.heappop(minh))
            print float(-maxh[0]+minh[0])/2
        elif len(maxh)==len(minh)+2:
            heapq.heappush(minh,-heapq.heappop(maxh))
            print float(-maxh[0]+minh[0])/2

Below is the full list of test cases I've built to check my code:

vals=[1,2,3,4,5,6,7,8,9,10] # positive numbers, increasing series
vals=[10,9,8,7,6,5,4,3,2,1] # positive numbers, decreasing series
vals=[10,9,11,8,12,7,13,6,14,5] # positive numbers, jumping series (keeping
                                # heaps balanced)

vals=[-10,-9,-8,-7,-6,-5,-4,-3,-2,-1] # negative numbers, increasing series
vals=[-1,-2,-3,-4,-5,-6,-7,-8,-9,-10] # negative numbers, decreasing series
vals=[-10,-9,-11,-8,-12,-7,-13,-6,-14,-5] # negative numbers
                                          # jumping series (keeping heaps
                                          # balanced)

vals=[-5,-4,-3,-2,-1,0,1,2,3,4,5] # mixed positive-negative numbers,
                                  # increasing series
vals=[5,4,3,2,1,0,-1,-2,-3,-4,-5] # mixed positive-negative numbers,
                                  # decreasing series
vals=[0,-1,1,-2,2,-3,3,-4,4,-5,5] # mixed positive-negative numbers,
                                  # jumping series (keeping heaps balanced)

My code seems ok to me but I cannot pass 4 out of 10 test cases with an online judge (https://www.hackerrank.com/challenges/ctci-find-the-running-median/problem).

Do you have any hint?

SergeGardien
  • 127
  • 2
  • 10
  • That problem states that the first number tells how many values will be input. Did you account for thst? – Christopher Bottoms Aug 03 '17 at 11:22
  • No, I didn't account for that because I thought it was irrelevant for sake of a solution. How do you think I could use such information? – SergeGardien Aug 03 '17 at 12:57
  • 1
    Your code can fail if there are duplicate values. If the next item is equal to the value that's currently at the top of `maxh`. – Jim Mischel Aug 03 '17 at 13:24
  • 1
    You probably don't need that information to find a solution, but it could potentially be a cause of failure if you don't have direct control over your input when submitting your code for judgin. But it sounds like @JimMischel has found something more important for you to worry about. – Christopher Bottoms Aug 03 '17 at 13:42
  • 1
    Somehow, I truncated my comment. If the next item is equal to the value that's currently at the top of `maxh`, it won't be added to either heap. The test case `[1,1,2]` should reveal the error. – Jim Mischel Aug 03 '17 at 13:46
  • Great! Thanks you @JimMischel for the hint, I feel a bit of a dumb for not thinking about it :) . I've added the block ``elif val==-maxh[0]: heapq.heappush(minh,val)`` and now I've passed all the test cases! – SergeGardien Aug 03 '17 at 14:03
  • Pardon, I've substituted the block `if val>-maxh[0]: heapq.heappush(minh,val)` with `if val>=-maxh[0]: heapq.heappush(minh,val)` – SergeGardien Aug 03 '17 at 14:12

1 Answers1

7

The problem is here:

    # Insert/push the other streaming values
    if val>-maxh[0]:
        heapq.heappush(minh,val)
    elif val<-maxh[0]:
        heapq.heappush(maxh,-val)

If val == maxh[0], then the item is never pushed onto either heap. You should be able to reveal the error with the test case [1,1,2].

A simple fix would be:

    # Insert/push the other streaming values
    if val >= -maxh[0]:
        heapq.heappush(minh,val)
    else
        heapq.heappush(maxh,-val)
Jim Mischel
  • 122,159
  • 16
  • 161
  • 305