I am trying to solve a Sorting Problem in Python3 from HackerRank: https://www.hackerrank.com/challenges/fraudulent-activity-notifications/problem

This problem requires finding the median of every sliding sublist on a running basis.

My code passes the sample test cases, but some of the actual test cases are terminated due to timeout. I suspect that calling sort() on every iteration to find the median is what makes it slow.

How can I improve my code?

def activityNotifications(expenditure, d):
    totalDays = len(expenditure)
    notified = 0

    for x in range(d, totalDays):
        check = expenditure[x-d:x]
        check.sort()

        if d % 2 == 0:
            median = (check[int(d/2)] + check[int((d-2)/2)])/2
        else:
            median = check[int((d-1)/2)]

        if expenditure[x] >= median * 2:
            notified += 1

    return notified
Anatolii
  • 2
    Yes it most certainly is. The `for x in range()` is O(n) and `sort()` is an O(n) (worst) operation. – pstatix Dec 28 '18 at 07:40
  • @pstatix Thanks. Any lead tips to prevent this? – hellocrypto Dec 28 '18 at 07:54
  • 1
    This is known as the "sliding window median" or "rolling median" problem. There is plenty of literature. – schwobaseggl Dec 28 '18 at 08:21
  • Your code looks as efficient as it can be, barring the import of specialized libraries like `numpy` or `pandas.rolling_median` or using cython to speed it up. Consider looking at https://stackoverflow.com/questions/37671432/how-to-calculate-running-median-efficiently which uses `numpy` – ycx Dec 28 '18 at 08:23
  • @schwobaseggl I'll look into that concept. Thanks! – hellocrypto Dec 29 '18 at 01:08

1 Answer

To find the median at each iteration, you sort the subarray. That is not efficient, especially when d is not small: each iteration costs O(d log d), so the whole run costs O(n * d log d).

Finding a median requires the values in sorted order, but it does not require calling sort(). Each expenditure[i] is in the range [0; 200], so a counting sort is a good fit here. We keep the frequency of each value v in counts[v]. To read the values back in sorted order, we iterate over j from 0 to 200 and visit every j with counts[j] > 0.
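As a minimal sketch of the counting-sort idea described above (the helper name `sorted_from_counts` and the `max_value` parameter are illustrative and not part of the original answer), the sorted order can be rebuilt from the frequency table like this:

```python
def sorted_from_counts(values, max_value=200):
    """Return values in ascending order using a frequency table.

    Assumes every value is an integer in the range [0, max_value].
    """
    counts = [0] * (max_value + 1)
    for v in values:
        counts[v] += 1

    result = []
    # Walking the table from 0 upward visits the values in sorted order.
    for j, frequency in enumerate(counts):
        result.extend([j] * frequency)
    return result


window = [2, 3, 4, 2, 3]
print(sorted_from_counts(window))  # [2, 2, 3, 3, 4]
```

The median lookup never needs to materialise this list; it only needs the running total of frequencies, which is what the answer's code relies on.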

So, if counts holds the frequencies of the expenditure values in the current interval of length d (the interval [i; i + d)), we can find the median by checking at most 201 entries of counts (see the code for details). Moving to the next interval [i + 1; i + d + 1) requires decrementing the frequency of the value that leaves the window, counts[expenditure[i]] -= 1, and incrementing the frequency of the value that enters it, counts[expenditure[i + d]] += 1. This approach takes O(n * 201) time and O(201) space.
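The two steps in the paragraph above (reading the median off the frequency table, then sliding the window by one position) can be sketched separately. The helper `median_from_counts` is a hypothetical name introduced for illustration, and the input list is just an example:

```python
def median_from_counts(counts, d):
    """Median of the d values described by the frequency table counts."""
    lower_rank = (d - 1) // 2  # 0-based rank of the lower middle element
    upper_rank = d // 2        # 0-based rank of the upper middle element
    lower = None
    seen = 0                   # how many values are <= the current one
    for value, frequency in enumerate(counts):
        seen += frequency
        if lower is None and seen > lower_rank:
            lower = value
        if seen > upper_rank:
            # For odd d both ranks coincide, so this is just the value itself.
            return (lower + value) / 2
    raise ValueError("counts holds fewer than d values")


expenditure = [2, 3, 4, 2, 3, 6, 8, 4, 5]
d = 5

counts = [0] * 201
for v in expenditure[:d]:
    counts[v] += 1

medians = []
for i in range(len(expenditure) - d + 1):
    medians.append(median_from_counts(counts, d))
    if i + d < len(expenditure):
        counts[expenditure[i]] -= 1      # value leaving the window
        counts[expenditure[i + d]] += 1  # value entering the window

print(medians)  # [3.0, 3.0, 4.0, 4.0, 5.0]
```

Each slide touches only two table entries, so the per-step cost is dominated by the scan over at most 201 entries.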

Now, please see the code below:

def activityNotifications(expenditure, d):
    totalDays = len(expenditure)
    counts = [0] * 201
    notifications = 0

    for i in range(totalDays):
        # now we have enough data to check if there was any fraudulent activity
        if i >= d:
            # let's count frequencies of numbers in range [i - d; i)
            current_num_of_numbers = 0
            prev_number = -1
            for j in range(201):
                if counts[j] > 0:
                    current_num_of_numbers += counts[j]
                    # now we can determine the median because we have enough numbers
                    if d < (2 * current_num_of_numbers):
                        if (d % 2 == 0) and (current_num_of_numbers - counts[j] == d / 2):
                            median = (prev_number + j) / 2
                        else:
                            median = j

                        # if the condition is met then send a notification
                        if expenditure[i] >= (median * 2):
                            notifications += 1
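                        # the median has been found, so stop scanning counts
                        break
                    prev_number = j
            # slide the window: drop the oldest value before adding the new one
            counts[expenditure[i - d]] -= 1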
        counts[expenditure[i]] += 1

    return notifications
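If the values were not confined to [0; 200], the counting table would no longer apply. One possible alternative, which is not part of the original answer, keeps the window as a sorted list and updates it with the standard `bisect` module. The function name below is illustrative. Each step still costs O(d) in the worst case because list insertion and deletion shift elements, but it avoids re-sorting the window:

```python
from bisect import bisect_left, insort


def activity_notifications_sorted_window(expenditure, d):
    """Count notifications using a sorted sliding window (assumes d >= 1)."""
    window = sorted(expenditure[:d])
    notifications = 0

    for i in range(d, len(expenditure)):
        # Compare against twice the median to stay in integer arithmetic.
        if d % 2 == 0:
            twice_median = window[d // 2 - 1] + window[d // 2]
        else:
            twice_median = 2 * window[d // 2]

        if expenditure[i] >= twice_median:
            notifications += 1

        # Slide the window: remove the oldest value, insert the newest.
        del window[bisect_left(window, expenditure[i - d])]
        insort(window, expenditure[i])

    return notifications


print(activity_notifications_sorted_window([2, 3, 4, 2, 3, 6, 8, 4, 5], 5))  # 2
```

For the bounded range in this problem, the counting approach above remains the simpler choice; the sorted-window version is only a sketch for the more general case.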
Anatolii