0

I launch the app and start retrieving a value from a sensor every 50 milliseconds; each value is a float in the range [-1;1]. How would I calculate the median of all values I have received from the sensor since the app launched, without storing them in any vector/array at all?

I understand that I can put the values into a sorted collection and take the middle element, or the sum of the two middle elements divided by 2. I also checked std::nth_element(), which doesn't even require a full sort. However, I want to avoid storing these numbers on the heap, on the stack, or on the hard drive. The app can keep running for hours, so the number of values accumulated from the sensor will be massive.

Any ideas are appreciated.

Ross Stepaniak
  • 4
    Do you really need the median ? average doesn't need storage for example. – Jarod42 Jul 28 '18 at 11:44
  • 1
    Have you already tried those: https://stackoverflow.com/questions/638030/how-to-calculate-or-approximate-the-median-of-a-list-without-storing-the-list ? – Bob__ Jul 28 '18 at 11:45
  • If your data points are floating-point, the best you can do to save memory is to cast them to integers and store these to calculate the median later on. The set of real numbers is bigger than the set of integers, so, hopefully, you'll need to store fewer distinct integers that way. It'll decrease precision, though. – ForceBru Jul 28 '18 at 11:56
  • What is **range** of your data? – MBo Jul 28 '18 at 11:59
  • @Jarod42 Unfortunately, I need median. – Ross Stepaniak Jul 28 '18 at 12:19
  • @MBo The range is [-1;1] float – Ross Stepaniak Jul 28 '18 at 12:22
  • 2
    @Ross Stepaniak What about `I get some integer value` ? – MBo Jul 28 '18 at 12:25
  • @MBo Sorry, I corrected the question. The range is always between -1.0f and 1.0f – Ross Stepaniak Jul 28 '18 at 14:45
  • 1
    OK. Usually a sensor response has a limited set of values (for example, an internal 10-bit ADC produces only 1024 possible float results), so the precision compromise in my answer might be reasonable. – MBo Jul 28 '18 at 15:11

3 Answers

4

For values in a limited range, you can use a histogram approach to keep the storage space fixed.

Create an array of counters and, at every step, increment the counter corresponding to the current value.

Example for 16-bit integer values:

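int H[65536] = {0};  // one counter per possible 16-bit value, zero-initialized
...
H[Value]++;          // Value must be an unsigned index in 0..65535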

For float values (linear mapping with some loss of precision):

int intIndex = (int) (65535 * (Value - RangeMin) / (RangeMax - RangeMin));  // maps [RangeMin, RangeMax] onto 0..65535
H[intIndex]++;

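When the median is needed, sum the H entries starting from the lowest index until the running sum reaches Count/2; the entry at which that happens corresponds to the median.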
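
As a rough illustration of the histogram idea, here is a minimal sketch. The class name `HistogramMedian`, its members, and the choice to clamp out-of-range samples and skip NaNs are assumptions made for this example, not part of the answer. It reports the lower edge of the bin that holds the lower median, so the result is approximate (within one bin width, about 3e-5 for the range [-1, 1]).

```cpp
#include <array>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <limits>

// Fixed-memory median estimate for samples known to lie in [lo, hi].
// Hypothetical helper: the names and the clamping policy are illustrative.
class HistogramMedian {
public:
    static constexpr std::size_t kBins = 65536;

    HistogramMedian(double lo, double hi) : lo_(lo), hi_(hi) {}

    // Count one sample in the bin it maps to.
    void add(double value) {
        if (std::isnan(value)) return;   // assumption: ignore invalid readings
        if (value < lo_) value = lo_;    // assumption: clamp out-of-range readings
        if (value > hi_) value = hi_;
        const auto bin = static_cast<std::size_t>(
            (kBins - 1) * (value - lo_) / (hi_ - lo_));
        ++bins_[bin];
        ++count_;
    }

    // Walk the counters from the lowest bin until half of the samples are covered.
    // Returns NaN if no sample has been added yet.
    double median() const {
        if (count_ == 0) return std::numeric_limits<double>::quiet_NaN();
        const std::uint64_t target = (count_ + 1) / 2;  // 1-based rank of the lower median
        std::uint64_t seen = 0;
        for (std::size_t bin = 0; bin < kBins; ++bin) {
            seen += bins_[bin];
            if (seen >= target) {
                return lo_ + (hi_ - lo_) * bin / (kBins - 1);  // lower edge of that bin
            }
        }
        return hi_;  // not reached while count_ matches the bin totals
    }

private:
    double lo_;
    double hi_;
    std::uint64_t count_ = 0;
    std::array<std::uint32_t, kBins> bins_{};  // 256 KiB of counters, zero-initialized
};
```

The object holds 256 KiB of counters regardless of how long the app runs, so on a platform with a small stack it is probably best declared `static` or as a global. With one sample every 50 ms, a 32-bit counter would take years to overflow.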

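If the median is required at every step, keep the index of the current median entry together with the sums of the counters in the left and right parts of the histogram. Update these sums on every new value, and shift the median index to the right when LeftSum + H[median] becomes smaller than RightSum, or to the left when RightSum + H[median] becomes smaller than LeftSum.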
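
One possible way to keep the median index up to date on every sample is sketched below. It is a variation on the bookkeeping described above rather than a literal transcription: it tracks only the left sum and compares it with a target rank, since the right sum follows from the total count. The class name `RunningMedianBin` and its members are hypothetical, and the caller is assumed to have already mapped each sample to a bin index.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Tracks the bin that contains the lower median after every insertion.
// Hypothetical helper: names and layout are illustrative only.
class RunningMedianBin {
public:
    static constexpr std::size_t kBins = 65536;

    // Record one sample that has already been mapped to a bin index in [0, kBins).
    void add(std::size_t bin) {
        if (bin >= kBins) return;        // assumption: silently ignore bad indices
        ++bins_[bin];
        ++count_;
        if (bin < median_) ++left_;      // the sample landed left of the median bin

        const std::uint64_t target = (count_ + 1) / 2;  // 1-based rank of the lower median
        while (left_ >= target) {        // too much weight on the left: step down
            --median_;
            left_ -= bins_[median_];
        }
        while (left_ + bins_[median_] < target) {  // too little weight: step up
            left_ += bins_[median_];
            ++median_;
        }
    }

    std::size_t medianBin() const { return median_; }  // meaningful once count() > 0
    std::uint64_t count() const { return count_; }

private:
    std::array<std::uint32_t, kBins> bins_{};  // per-bin counters, zero-initialized
    std::uint64_t count_ = 0;   // total number of samples
    std::uint64_t left_ = 0;    // samples in bins strictly below median_
    std::size_t median_ = 0;    // current median bin
};
```

Each insertion moves the index only as far as the rank of the median changes, which is at most one sample, although the loops may step across empty bins on the way. Converting `medianBin()` back to a sensor value uses the inverse of the linear mapping from the answer.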

MBo
1

Assuming you have the values stored in a std::vector (v), the simplest solution I can think of would be

std::nth_element(v.begin(), v.begin() + v.size()/2, v.end());
std::cout << "The median is " << v[v.size()/2] << '\n';

I don't know of any way to calculate the median without storing the intermediate values.

Jesper Juhl
-3

There is no way to find an exact median without storing the values. It's possible for average, but not for medians.
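
For contrast, the constant-memory average mentioned here can be sketched in a few lines. The class name `RunningMean` is hypothetical, and the incremental update is one common formulation rather than anything prescribed in this thread.

```cpp
#include <cstdint>

// Running average that keeps only a count and the current mean.
// Hypothetical helper for illustration.
class RunningMean {
public:
    void add(double value) {
        ++count_;
        // Incremental update: move the mean toward the new value by 1/count.
        mean_ += (value - mean_) / static_cast<double>(count_);
    }

    double mean() const { return mean_; }          // 0.0 until the first sample
    std::uint64_t count() const { return count_; }

private:
    std::uint64_t count_ = 0;
    double mean_ = 0.0;
};
```

No comparable two-variable update is given for the median anywhere in this thread, which is why the other answers either store the samples or bucket them.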

PlinyTheElder
  • 2
    Can you give a hint for the proof that it is not possible? Or is it just an assumption of yours? – MrSmith42 Jul 28 '18 at 13:52
  • I think it's a fairly safe assumption, given that the standard algorithm for this problem seems to be quickselect. If there was a more memory-efficient algorithm for this purpose, it would be known, wouldn't you agree? – PlinyTheElder Jul 28 '18 at 19:41
  • 1
    Your reasoning could be applied to **all** known algorithms, implying that we already know everything there is to know about computer science and no more research is needed... – fjardon Jul 30 '18 at 16:14
  • I have to disagree. Trying to apply that reasoning to everything would be foolish, but the assumption that simple, widely used algorithms have been looked into multiple times by intelligent people is a relatively safe one. For example, you can keep looking for a new integer division algorithm, but it's highly unlikely that you'll find anything more efficient than what has already been found. – PlinyTheElder Jul 31 '18 at 17:08