
My app receives a stream of doubles with a total count of more than 10 billion, and I need to calculate some statistical parameters from these numbers. I found ways to calculate the average and the mode, but I have problems with the median. There is a max-heap/min-heap solution for this (given in the first answer to "Running median"), but with it I would have to store 5 billion doubles (or more) in each heap. Arrays, lists, and dictionaries can't hold that many elements. How can I do it? And what should I do in the described solution if my next element is equal to an element already in `maxHeap`?
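For reference, the two-heap idea mentioned above can be sketched roughly as follows. This is only an illustration in Python, not the code from the linked answer; the class name `RunningMedian` and its methods are made up for the example. It keeps the lower half of the values in a max-heap and the upper half in a min-heap, and it still needs `O(N)` memory, so it does not solve the 10-billion case by itself. It does show that equal values need no special handling.

```python
import heapq


class RunningMedian:
    """Two-heap running median (hypothetical helper, illustration only)."""

    def __init__(self):
        self._low = []   # max-heap of the lower half, stored as negated values
        self._high = []  # min-heap of the upper half

    def add(self, x: float) -> None:
        # Route the new value through the lower half so that the largest
        # element of the lower half moves to the upper half.
        heapq.heappush(self._low, -x)
        heapq.heappush(self._high, -heapq.heappop(self._low))
        # Rebalance: the lower half may hold at most one element more.
        if len(self._high) > len(self._low):
            heapq.heappush(self._low, -heapq.heappop(self._high))

    def median(self) -> float:
        if not self._low:
            raise ValueError("no data seen yet")
        if len(self._low) > len(self._high):
            return -self._low[0]
        return (-self._low[0] + self._high[0]) / 2.0


rm = RunningMedian()
for value in [5.0, 5.0, 5.0, 1.0, 9.0]:  # duplicates are pushed like any other value
    rm.add(value)
print(rm.median())
```

A value equal to the current top of either heap is simply pushed; the rebalancing step keeps the two halves within one element of each other regardless of ties.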

  • 1
    AFAIK *running* median is called so because it's calculated for last N items. – Sinatr Feb 11 '21 at 13:26
  • 1
    [Wiki](https://en.wikipedia.org/wiki/Moving_average#Moving_median) – Sinatr Feb 11 '21 at 13:32
  • 1
    Assuming you mean that you want to calculate the median for the whole data set every time a new number is added to the data set: You need to have all the numbers to know where the new median would be, so this is always going to require `O(N)` space. – Matthew Watson Feb 11 '21 at 13:37
  • 1
    You can allow large arrays in .NET with the help of the `gcAllowVeryLargeObjects` [tweak](https://docs.microsoft.com/en-us/dotnet/framework/configure-apps/file-schema/runtime/gcallowverylargeobjects-element) – Nikita Sivukhin Feb 11 '21 at 13:38
  • Also, you can improve your algorithm if you can read your stream more than once. In that case you can use a bucketing approach: choose some reasonable `range` value, and then on the first pass count how many numbers fall into each bucket `[x..x + range]`. After this you can determine the bucket in which the median lies, and on the second pass use a naive solution to find the k-th order statistic within that bucket. – Nikita Sivukhin Feb 11 '21 at 13:43
  • "And what must I do in described solution if my next element will be the same as element of maxHeap?" There is no limitation that a min/max heap must only contain unique numbers as far as I know. – JonasH Feb 11 '21 at 16:32
  • You could also consider if you need the exact median or if an approximation would be sufficient. – JonasH Feb 11 '21 at 16:33
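The two-pass bucketing idea from the comments above could look roughly like this. It is a minimal sketch in Python under several assumptions that are not in the original post: the stream can be re-read (modelled here by a hypothetical `make_stream` callable that returns a fresh iterator each time), the total count `n` is known, and the values lie within known bounds `lo < hi`. The function names and the default bucket count are likewise made up for the example.

```python
def bucket_of(x, lo, width, buckets):
    # Map a value to its bucket index; out-of-range values are clamped
    # into the first or last bucket.
    return max(0, min(int((x - lo) / width), buckets - 1))


def kth_smallest(make_stream, k, lo, hi, buckets=1024):
    """Return the k-th smallest value (0-based) using two passes."""
    width = (hi - lo) / buckets

    # Pass 1: count how many values fall into each bucket.
    counts = [0] * buckets
    for x in make_stream():
        counts[bucket_of(x, lo, width, buckets)] += 1

    # Find the bucket that contains rank k.
    seen = 0
    for target, count in enumerate(counts):
        if seen + count > k:
            break
        seen += count
    else:
        raise IndexError("k is larger than the number of values")

    # Pass 2: keep only the values from that bucket and select naively.
    inside = sorted(
        x for x in make_stream()
        if bucket_of(x, lo, width, buckets) == target
    )
    return inside[k - seen]


def median(make_stream, n, lo, hi, buckets=1024):
    # For even n the median is the mean of the two middle order statistics.
    a = kth_smallest(make_stream, (n - 1) // 2, lo, hi, buckets)
    b = kth_smallest(make_stream, n // 2, lo, hi, buckets)
    return (a + b) / 2.0


data = [5.0, 1.0, 9.0, 5.0, 3.0, 7.0]
print(median(lambda: iter(data), len(data), lo=0.0, hi=10.0, buckets=4))
```

Only the counters and the contents of one bucket are held in memory at a time, so the bucket count has to be large enough that the median's bucket fits. If it does not fit, the same procedure can be repeated on that bucket with narrower bounds.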

0 Answers